Block Malformed Office Documents Used in Phishing Campaigns

December 19, 2024

Contextal Platform Creators

Toward the end of 2024, threat actors began using a new technique to deliver phishing attacks through handcrafted Office files. The legitimate document content is preceded by specially crafted data that disrupts format detection mechanisms. Surprisingly, when Microsoft Office opens such a file (choosing the application based on its extension), it offers to recover the data: it scans for a valid header and opens the Office content embedded in the manipulated file.
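To make the evasion concrete, here is a minimal Python sketch, assuming a format sniffer that only inspects the first bytes of a file. It builds a tiny ZIP whose first entry is [Content_Types].xml (a hypothetical stand-in for a real .docx, not a valid Office document), prepends junk, and then "recovers" it by scanning for the ZIP signature. The helpers naive_detect and recover are hypothetical illustrations; Microsoft Office's actual recovery logic is not described here and may differ.

```python
import io
import zipfile
from typing import Optional

ZIP_MAGIC = b"PK\x03\x04"  # ZIP local file header signature


def build_minimal_package() -> bytes:
    """Build a tiny ZIP whose first entry is [Content_Types].xml.

    Hypothetical stand-in for a .docx; it is not a valid Office document.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", '<?xml version="1.0"?><Types/>')
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


def naive_detect(data: bytes) -> str:
    """Hypothetical magic-byte sniffer that only inspects the first 4 bytes."""
    return "zip" if data.startswith(ZIP_MAGIC) else "unknown"


def recover(data: bytes) -> Optional[bytes]:
    """Hypothetical recovery: skip everything before the first ZIP signature."""
    offset = data.find(ZIP_MAGIC)
    return data[offset:] if offset >= 0 else None


original = build_minimal_package()
malformed = b"junk data prepended by the attacker\n" * 4 + original

print(naive_detect(original))   # zip
print(naive_detect(malformed))  # unknown

with zipfile.ZipFile(io.BytesIO(recover(malformed))) as zf:
    print(zf.namelist())  # ['[Content_Types].xml', 'word/document.xml']
```

Because the offsets stored inside the archive are relative to its own start, slicing off the prepended bytes gives back the original archive unchanged, which is why a simple header scan is enough to "repair" such a file.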

According to our research, existing protections offered by major vendors are ineffective against these files, and it remains relatively easy to create variants that evade detection. Here, we demonstrate how to create a scenario in Contextal Platform that blocks all attacks of this type!

We start by checking whether the file extension is docx, xlsx, or pptx, the Office Open XML formats used by Microsoft Word, Excel, and PowerPoint:

@has_name(iregex(".*(docx|xlsx|pptx)$"))

Refer to the ContexQL documentation for details on the has_name and iregex functions.
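For testing file names outside the platform, this Python sketch approximates the extension check. It assumes that iregex performs a case-insensitive regular-expression match against the object's name; ContexQL's exact matching semantics may differ, and has_office_name is a hypothetical helper.

```python
import re

# Approximates @has_name(iregex(".*(docx|xlsx|pptx)$")), assuming iregex is a
# case-insensitive regex match applied to the object's file name.
OFFICE_NAME = re.compile(r".*(docx|xlsx|pptx)$", re.IGNORECASE)


def has_office_name(name: str) -> bool:
    return OFFICE_NAME.search(name) is not None


for name in ["Invoice.DOCX", "budget.xlsx", "report.pdf", "macro.docm", "notadocx"]:
    print(f"{name}: {has_office_name(name)}")
# Invoice.DOCX, budget.xlsx and notadocx match; report.pdf and macro.docm do not.
```

As written, the pattern does not require a dot before the extension, so a name such as notadocx also matches. A stricter variant would require a literal dot before the extension, at the cost of slightly more escaping in the query string.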

Next, we check that the object type was not detected as Office:

object_type != "Office"

Technically, Office files are ZIP archives, which normally start with a specific byte sequence: the local file header signature PK\x03\x04. To minimize false positives (e.g., triggering on empty files with Office extensions), we check whether the object contains characteristic Office data, using the powerful match_pattern function:

/* The function below looks for "PK\x03\x04" and "[Content_Types].xml",
 * which should be located close to each other.
 */
@match_pattern(504b0304{16-128}5b436f6e74656e745f54797065735d2e786d6c)
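The byte pattern can be approximated with a Python bytes regular expression. This sketch assumes that {16-128} in match_pattern means "any 16 to 128 bytes" between the two hex literals; check the ContexQL documentation for the precise syntax. It also shows why the window works: [Content_Types].xml is commonly the first entry of an Office package, and since the fixed part of a ZIP local file header is 30 bytes, that entry's name begins 26 bytes after the 4-byte signature. The range 16-128 comfortably covers that gap, even with a short extra field.

```python
import io
import re
import zipfile

# Rough Python equivalent of
#   @match_pattern(504b0304{16-128}5b436f6e74656e745f54797065735d2e786d6c)
# assuming {16-128} means "any 16 to 128 bytes" between the two literals.
OOXML_PATTERN = re.compile(
    re.escape(b"PK\x03\x04") + rb".{16,128}" + re.escape(b"[Content_Types].xml"),
    re.DOTALL,  # '.' must also match newline bytes (0x0a)
)


def contains_ooxml(data: bytes) -> bool:
    """Search anywhere in the data, not only at offset 0."""
    return OOXML_PATTERN.search(data) is not None


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
data = buf.getvalue()

# The first file name follows the 30-byte fixed local header, i.e. it starts
# 26 bytes after the end of the 4-byte signature.
gap = data.index(b"[Content_Types].xml") - len(b"PK\x03\x04")
print(gap)                                   # 26
print(contains_ooxml(b"junk" * 100 + data))  # True
print(contains_ooxml(b""))                   # False
```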

With these checks in place, we can confidently flag and block suspicious files. Here's the complete ContexQL query:

@has_name(iregex(".*(docx|xlsx|pptx)$"))
&& object_type != "Office"
/* The function below looks for "PK\x03\x04" and "[Content_Types].xml",
 * which should be located close to each other.
 */
&& @match_pattern(504b0304{16-128}5b436f6e74656e745f54797065735d2e786d6c)
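Putting the three conditions together, the following Python sketch mimics the query's logic end to end. The detected_type function is a hypothetical stand-in for Contextal's object_type, which comes from the platform's own format detection; here it simply assumes a sniffer that recognizes Office data only at offset 0. The sample inputs are synthetic header-like bytes, not complete Office files, and is_malformed_office is a hypothetical helper.

```python
import re

ZIP_MAGIC = b"PK\x03\x04"

# Assumed Python equivalents of the query's name check and byte-pattern check.
OFFICE_NAME = re.compile(r".*(docx|xlsx|pptx)$", re.IGNORECASE)
OOXML_PATTERN = re.compile(
    re.escape(ZIP_MAGIC) + rb".{16,128}" + re.escape(b"[Content_Types].xml"),
    re.DOTALL,
)


def detected_type(data: bytes) -> str:
    """Hypothetical stand-in for object_type: only recognizes Office data at offset 0."""
    return "Office" if OOXML_PATTERN.match(data) else "Unknown"


def is_malformed_office(name: str, data: bytes) -> bool:
    """Apply the query's three conditions, joined like the && operators."""
    return (
        OFFICE_NAME.search(name) is not None          # @has_name(iregex(...))
        and detected_type(data) != "Office"           # object_type != "Office"
        and OOXML_PATTERN.search(data) is not None    # @match_pattern(...) anywhere
    )


# Synthetic header-like bytes (not a complete Office file): signature,
# 26 filler bytes, then the [Content_Types].xml entry name.
office = ZIP_MAGIC + bytes(26) + b"[Content_Types].xml"

samples = [
    ("invoice.docx", office),                  # well-formed: detected as Office
    ("invoice.docx", b"%PDF-1.7\n" + office),  # data prepended: flagged
    ("invoice.docx", b""),                     # empty file: not flagged
    ("invoice.zip", b"junk" + office),         # non-Office name: not flagged
]
for name, data in samples:
    print(name, len(data), is_malformed_office(name, data))
```

Each condition rules out a different kind of benign file: the name check ignores non-Office names, the type check skips well-formed documents, and the pattern check skips empty or unrelated files that merely carry an Office extension.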

Tip: You can further customize this query to your specific needs. For instance, you can require that the malformed file be part of an email by appending the following condition with &&:

@has_ancestor(object_type == "Email")
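The has_ancestor condition relies on Contextal's tree of objects, in which, as we understand it, objects extracted from a container (such as an email attachment) sit below that container. As a rough illustration only, the Python sketch below models such a tree with a hypothetical Obj class and a hypothetical has_ancestor helper, assuming the check walks every object above the current one (excluding the object itself). The real function operates on the platform's own object graph.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Obj:
    """Hypothetical object-tree node; `parent` is the object it was extracted from."""
    object_type: str
    name: str
    parent: Optional["Obj"] = None


def has_ancestor(obj: Obj, pred: Callable[[Obj], bool]) -> bool:
    """Return True if any object above `obj` (not `obj` itself) satisfies `pred`."""
    node = obj.parent
    while node is not None:
        if pred(node):
            return True
        node = node.parent
    return False


def is_email(o: Obj) -> bool:
    return o.object_type == "Email"


email = Obj("Email", "message.eml")
attachment = Obj("Unknown", "invoice.docx", parent=email)
archive = Obj("ZIP", "documents.zip", parent=email)  # hypothetical type name
nested = Obj("Unknown", "invoice.docx", parent=archive)
standalone = Obj("Unknown", "invoice.docx")

print(has_ancestor(attachment, is_email))  # True
print(has_ancestor(nested, is_email))      # True: email -> zip -> docx
print(has_ancestor(standalone, is_email))  # False
```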

As demonstrated, Contextal Platform offers highly adaptable and efficient tools to combat a wide variety of threats. Be sure to explore our other scenarios for more inspiring use cases!

Info: Download the scenario file below (Malformed-Office.json) and upload it using Contextal Console or the ctx command-line tool. When using ctx, don't forget to reload remote scenarios after adding a new one!

Malformed-Office.json
{
  "name": "Malformed Office",
  "creator": "Contextal",
  "description": "Detect and block Office objects, which may have some data prepended to evade scanners - the Office software is still capable of recovering and opening such files.",
  "local_query": "@has_name(iregex(\".*(docx|xlsx|pptx)$\"))\n&& object_type != \"Office\"\n/* the function below looks for \"PK\\x03\\x04\" and \"[Content_Types].xml\",\n * which should be located close to each other.\n */\n&& @match_pattern(504b0304{16-128}5b436f6e74656e745f54797065735d2e786d6c)",
  "action": "BLOCK"
}
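The local_query field stores the multi-line query as a single JSON string, so newlines, double quotes, and backslashes must be escaped. If you build scenario files programmatically, a sketch like the following (using Python's standard json module) handles the escaping for you. The field names are taken from Malformed-Office.json above; this is not an official Contextal tool.

```python
import json

# The multi-line query from the scenario, one list item per line.
LOCAL_QUERY = "\n".join([
    '@has_name(iregex(".*(docx|xlsx|pptx)$"))',
    '&& object_type != "Office"',
    '/* the function below looks for "PK\\x03\\x04" and "[Content_Types].xml",',
    " * which should be located close to each other.",
    " */",
    "&& @match_pattern(504b0304{16-128}5b436f6e74656e745f54797065735d2e786d6c)",
])

scenario = {
    "name": "Malformed Office",
    "creator": "Contextal",
    "description": (
        "Detect and block Office objects, which may have some data prepended "
        "to evade scanners - the Office software is still capable of recovering "
        "and opening such files."
    ),
    "local_query": LOCAL_QUERY,
    "action": "BLOCK",
}

# json.dumps escapes the newlines, double quotes and backslashes in the query.
text = json.dumps(scenario, indent=2)
print(text)
```

Saving text to Malformed-Office.json should yield a file with the same content as the one above, ready to upload as described.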