Office
Supported formats
Microsoft Word and Excel (all popular versions)
Description
The backend performs a thorough analysis of Office files and extracts text, VBA code, embedded objects and a plethora of properties.
Available in Contextal Platform 1.0 and later.
Features
Encryption
Supported encryption formats:
- XOR Obfuscation
- Office Binary Document RC4 Encryption
- Office Binary Document RC4 CryptoAPI Encryption
- ECMA-376 Standard Encryption
- ECMA-376 Agile Encryption
Supported agile algorithms:
- AES-128
- AES-192
- AES-256
Supported agile chaining modes:
- CBC
- CFB-8
Supported agile hashes:
- SHA1
- SHA256
- SHA384
- SHA512
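As a rough illustration of how the two ECMA-376 schemes can be told apart, the sketch below classifies an `EncryptionInfo` stream by the version pair at its start, using the values given in the MS-OFFCRYPTO specification. The function name is hypothetical, and this is not a description of how the backend itself performs the check; it covers only the ECMA-376 container formats, not the XOR or RC4 schemes of binary documents.

```python
import struct


def classify_encryption_info(stream: bytes) -> str:
    """Classify an EncryptionInfo stream by its leading version header."""
    if len(stream) < 4:
        raise ValueError("EncryptionInfo stream is too short")
    # The stream starts with two little-endian 16-bit values: vMajor, vMinor.
    major, minor = struct.unpack_from("<HH", stream, 0)
    if (major, minor) == (4, 4):
        return "agile"       # ECMA-376 Agile Encryption
    if minor == 2 and major in (2, 3, 4):
        return "standard"    # ECMA-376 Standard Encryption
    if minor == 3 and major in (3, 4):
        return "extensible"  # Extensible Encryption (not in the list above)
    return "unknown"


print(classify_encryption_info(struct.pack("<HH", 4, 4)))
```

A real parser would go on to read the XML descriptor (agile) or the binary header (standard) that follows these four bytes.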
VBA
Visual Basic for Applications (VBA) structures are stored in two flavors: in the version-independent format (which is publicly documented) and in the version-dependent format (the infamous, undocumented PerformanceCache).
Since the version-independent data is regularly stomped by malware authors and almost never used by MS Office, this backend also extracts the cached data and decompiles the P-Code.
Although Office forms are rare these days, they are often abused by malware to store fragments of code; this backend extracts all forms as metadata.
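To make the version-dependent data more concrete, here is a minimal sketch that reads the fixed header of a `_VBA_PROJECT` stream as laid out in the MS-OVBA specification: a 2-byte `Reserved1` (0x61CC), a 2-byte `Version`, a 1-byte `Reserved2` and a 2-byte `Reserved3`, followed by the PerformanceCache. The names `VbaProjectHeader` and `parse_vba_project_header` are hypothetical, and the assumption that these header fields correspond to the `version`, `rsvd2` and `rsvd3` values reported in the metadata is ours, not something this page states.

```python
import struct
from typing import NamedTuple


class VbaProjectHeader(NamedTuple):
    version: int     # version of the VBA engine that wrote the cache
    rsvd2: int       # Reserved2, 0x00 per the specification
    rsvd3: int       # Reserved3, undefined per the specification
    cache_size: int  # bytes of PerformanceCache following the header


def parse_vba_project_header(stream: bytes) -> VbaProjectHeader:
    """Parse the 7-byte header at the start of a _VBA_PROJECT stream."""
    if len(stream) < 7:
        raise ValueError("stream too short for a _VBA_PROJECT header")
    # "<HHBH": little-endian, no padding: u16, u16, u8, u16 (7 bytes).
    rsvd1, version, rsvd2, rsvd3 = struct.unpack_from("<HHBH", stream, 0)
    if rsvd1 != 0x61CC:
        raise ValueError(f"unexpected Reserved1 value: {rsvd1:#06x}")
    return VbaProjectHeader(version, rsvd2, rsvd3, len(stream) - 7)


# Synthetic stream: a valid header plus 10 bytes standing in for the cache.
sample = struct.pack("<HHBH", 0x61CC, 151, 0, 1) + b"\x00" * 10
print(parse_vba_project_header(sample))
```

The cache itself has no public specification, which is why decompiling the P-Code requires version-specific knowledge beyond this header.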
Symbols
Object
- ENCRYPTED → the document is encrypted
- DECRYPTED → the document has been successfully decrypted
- LIMITS_REACHED → limits triggered while processing the document
- VBA → the document contains VBA (Visual Basic for Applications)
- CORRUPTED_VBA → the document contains VBA, but it cannot be parsed
- HAS_FORMS → VBA contains forms
- HAS_MACRO_SHEET → the document contains Excel 4.0 macrosheets
- OLE → the document is contained within OLE
- DOC → the document is a legacy Word document
- XLS → the document is a legacy Excel document
- DOCX → the document is an OOXML Word document
- XLSX → the document is an OOXML Excel document
Children
- ENCRYPTED → the parent (the document) is encrypted
- DECRYPTED → this child object has been successfully decrypted
- VBA → this child object contains the result of processing a VBA project
- DECOMPILED → the VBA project was decompiled from P-Code
- CORRUPTED → the VBA contains corrupted content
- TOOBIG → this child object was truncated or was not stored because it exceeds the limits
- NOT_FOUND → this child object was referenced in the document but not found in the document container
Example Metadata
{
  "org": "ctx",
  "object_id": "587bfa9fe6162e0c74dfaa1e48b2ff1b596f803b95648d362daa412cf9dcbb3a",
  "object_type": "Office",
  "object_subtype": "DOCX",
  "recursion_level": 5,
  "size": 250807,
  "hashes": {
    "sha256": "587bfa9fe6162e0c74dfaa1e48b2ff1b596f803b95648d362daa412cf9dcbb3a",
    "sha1": "ee867ac81fcb3e51995dcf90aaad659cd340a750",
    "sha512": "a9a4715ec8d374cfce9137bd5fa91e662f90bade289d07c7f9f134b5bef53338502642c2f14b745946a4dba8e3afb2d914e2aa32dad410c9a27e33837beedbe1",
    "md5": "065fee6d19cb04e56ab15b1682c463b6"
  },
  "ctime": 1725869059.435991,
  "relation_metadata": {
    "decoded_size": 250807,
    "encoded_size": 334443,
    "mime_type": "application/msword"
  },
  "ok": {
    "symbols": [
      "DOCX",
      "VBA"
    ],
    "object_metadata": {
      "_backend_version": "1.0.0",
      "properties": {
        "app_version": "14.0000",
        "application": "Microsoft Office Word",
        "characters": 565152,
        "characters_with_spaces": 662976,
        "created": "2019-06-27 10:51:00.0 +00:00:00",
        "creator": "",
        "doc_security": {
          "locked": true,
          "password_protected": false,
          "read_only_enforced": false,
          "read_only_recommended": false
        },
        "hyperlinks_changed": false,
        "last_modified_by": "",
        "lines": 4709,
        "links_up_to_date": false,
        "modified": "2019-06-27 11:57:00.0 +00:00:00",
        "pages": 120,
        "paragraphs": 1325,
        "revision": "1",
        "scale_crop": false,
        "shared_doc": false,
        "template": "Normal.dotm",
        "total_time": "0s",
        "words": 99149
      },
      "user_properties": {},
      "vba": {
        "rsvd2": 0,
        "rsvd3": 1,
        "version": 151
      }
    }
  }
}
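For readers consuming such records outside the platform, the following sketch shows one way to pull a few triage-relevant fields out of a result. The field paths are taken from the example record; the function name `summarize_office_result` and the choice of fields are hypothetical, and the embedded sample is a trimmed copy of the example rather than a complete record.

```python
import json


def summarize_office_result(record: dict) -> dict:
    """Extract a few triage-relevant fields from one Office result record."""
    ok = record.get("ok", {})
    symbols = set(ok.get("symbols", []))
    props = ok.get("object_metadata", {}).get("properties", {})
    security = props.get("doc_security", {})
    return {
        "subtype": record.get("object_subtype"),
        "has_vba": "VBA" in symbols,
        "encrypted": "ENCRYPTED" in symbols,
        "password_protected": security.get("password_protected", False),
        "app_version": props.get("app_version"),
    }


# Trimmed copy of the example record; only the fields used above are kept.
sample = json.loads("""
{
  "object_type": "Office",
  "object_subtype": "DOCX",
  "ok": {
    "symbols": ["DOCX", "VBA"],
    "object_metadata": {
      "properties": {
        "app_version": "14.0000",
        "doc_security": {"locked": true, "password_protected": false}
      }
    }
  }
}
""")

summary = summarize_office_result(sample)
print(summary)
```

Using `.get()` with defaults keeps the helper tolerant of records where a section is absent, which is an assumption about how incomplete results might look.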
Example Queries
object_type == "Office"
&& @match_object_meta($properties.doc_security.password_protected == true)
- This query matches password-protected documents.
object_type == "Office"
&& @match_object_meta($properties.app_version == "14.0000")
&& @has_symbol("VBA")
- This matches documents created by a specific Office version and containing VBA macros.
Configuration Options
- max_processed_size → maximum size of the input object that will be processed (default: 262144000)
- max_children → maximum number of child objects to create (default: 100)
- max_child_output_size → maximum size of a single child object (default: 41943040)
- sheet_size_limit → size limit of an Excel sheet (default: 5242880)
- shared_strings_cache_limit → size limit of the shared strings cache (default: 10000000)
- create_domain_children → whether to create Domain children out of collected domain names for further processing (default: true)
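The size defaults are easier to reason about in larger units. The sketch below converts three of them, assuming the values are expressed in bytes (the option descriptions do not state the unit); `shared_strings_cache_limit` is left out because its unit is less clear. The names `DEFAULTS` and `to_mib` are illustrative only.

```python
# Defaults copied from the option list; the byte unit is an assumption.
DEFAULTS = {
    "max_processed_size": 262144000,
    "max_child_output_size": 41943040,
    "sheet_size_limit": 5242880,
}

MIB = 1024 * 1024


def to_mib(size_in_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return size_in_bytes / MIB


for name, value in DEFAULTS.items():
    print(f"{name}: {value} bytes = {to_mib(value):g} MiB")
```

Under that assumption the defaults work out to 250 MiB, 40 MiB and 5 MiB respectively.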