magicbytes-chacho

Magic Bytes & File Forensics

This walkthrough triages a suspicious DOCX file using basic file-integrity checks, metadata analysis, content inspection, and threat-intelligence lookups. Each step includes context on what to look for:


1. Basic String Extraction

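Use strings to extract readable text from the DOCX file without unzipping it. Because most of a DOCX's content is compressed, the output mainly shows the names of the files inside the archive plus any data stored uncompressed. Running strings again on the extracted files (step 3) helps surface hidden or encoded text, suspicious URLs, and embedded messages.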

strings suspiciousfile.docx

What to Look For

Scan the output for suspicious phrases, URLs, encoded strings (e.g., base64), or unusual content that doesn't fit the context of the document.
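As a rough illustration of what strings does, here is a minimal Python sketch that pulls runs of printable ASCII out of raw bytes and then filters them for URLs. The function names (extract_strings, find_urls), the 4-character minimum, and the URL pattern are assumptions chosen for this example, not part of any tool mentioned above.

```python
import re


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII at least min_len long (roughly what `strings` prints)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def find_urls(strings):
    """Pick out http/https URLs from a list of extracted strings (simple heuristic)."""
    url_re = re.compile(r"https?://[^\s\"'<>]+")
    return [url for s in strings for url in url_re.findall(s)]


# Hypothetical usage against the sample file from this walkthrough:
#   with open("suspiciousfile.docx", "rb") as fh:
#       for url in find_urls(extract_strings(fh.read())):
#           print(url)
```

The same two functions can be pointed at the extracted XML files later, where the text is no longer compressed.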


2. File Identification and Format Check

Use file to confirm that the file's actual type matches its extension and to get a basic idea of its format. If file reports something other than what the .docx extension implies, that is a red flag.

file suspiciousfile.docx

What to Look For

Verify that the output identifies the file as a ZIP archive (some versions of file report it as a Microsoft Word 2007+ document instead), which is the expected format for DOCX files.

Note

If file reports some other format (e.g., JPEG), it can help to search for that format's structure (for example, "jpg file structure"). This will turn up documentation on the magic bytes and other structural components specific to that file type.


3. Unzipping the DOCX File

Since DOCX is a ZIP archive, unzip the file to examine its contents and the internal structure (XML files, metadata, embedded media, etc.).

unzip suspiciousfile.docx -d suspiciousfile_unzipped

What to Look For

After unzipping, inspect the directories (word/, word/media/, _rels/, docProps/, etc.) and individual files (document.xml, .rels files) for embedded content, links, or other suspicious artifacts.
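If you would rather look inside the archive before extracting anything to disk, the listing can be sketched with Python's zipfile module. The helper names and the EXPECTED_PREFIXES tuple are assumptions made for this example; the tuple reflects folders commonly seen in Word-produced files and is not an exhaustive list.

```python
import zipfile

# Top-level entries commonly found in a Word-produced DOCX.
# This is an illustrative assumption, not a list taken from the specification.
EXPECTED_PREFIXES = ("[Content_Types].xml", "_rels/", "docProps/", "word/", "customXml/")


def list_members(docx):
    """Return (name, uncompressed_size) for every entry in the archive."""
    with zipfile.ZipFile(docx) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


def unexpected_members(names):
    """Return entry names that fall outside the commonly seen DOCX layout."""
    return [name for name in names if not name.startswith(EXPECTED_PREFIXES)]


# Hypothetical usage:
#   members = list_members("suspiciousfile.docx")
#   print(unexpected_members([name for name, _ in members]))
```

An entry flagged here is not automatically malicious; it is simply a good place to start looking.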


4. Metadata and EXIF Analysis

Extract metadata with exiftool to check for creation details, authorship, software used, or other relevant information that might reveal manipulation or give context about the file's origin.

exiftool suspiciousfile.docx

What to Look For

Pay attention to unusual authorship, software version discrepancies, or suspicious timestamps. Compare this metadata to known legitimate documents for context.
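Much of what exiftool reports for a DOCX comes from docProps/core.xml inside the archive. The sketch below reads a few of those fields directly, assuming the file follows the usual core-properties layout; the function name and the FIELDS mapping are choices made for this example. Since the input is untrusted XML, run something like this only inside an isolated analysis environment.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an OOXML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Output key -> element to look up (an illustrative subset of the available fields).
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def core_properties(docx):
    """Return selected metadata fields from docProps/core.xml, or {} if the part is missing."""
    with zipfile.ZipFile(docx) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}
        root = ET.fromstring(zf.read("docProps/core.xml"))
    props = {}
    for key, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            props[key] = node.text
    return props
```

A missing core.xml, or a creator that differs sharply from the claimed sender, is worth noting alongside the exiftool output.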


5. File Configuration and Magic Byte Verification

Research the expected configuration and magic bytes for the file type. For a DOCX file, the magic bytes should align with the ZIP file signature (50 4B 03 04).

When the bytes don't match:

Use a hex editor like HexEd.it to inspect the file manually and see what the first few bytes actually are.

What to Look For

Check for ZIP signatures or other standard file format indicators. Inconsistent or missing signatures could indicate tampering.
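The comparison itself is simple enough to sketch in Python: read the first few bytes and match them against a table of known signatures. The SIGNATURES table below is a small illustrative subset of well-known values, and the function names are assumptions made for this example.

```python
# A small, illustrative subset of well-known file signatures.
SIGNATURES = {
    bytes.fromhex("504b0304"): "ZIP archive (DOCX, XLSX, PPTX, JAR, ...)",
    bytes.fromhex("d0cf11e0a1b11ae1"): "OLE2 compound file (legacy DOC/XLS/PPT)",
    b"%PDF-": "PDF document",
    bytes.fromhex("ffd8ff"): "JPEG image",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"MZ": "DOS/Windows executable",
}


def read_header(path, length=8):
    """Read the first `length` bytes of a file."""
    with open(path, "rb") as fh:
        return fh.read(length)


def identify(header):
    """Return a label for the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return "unknown"


# Hypothetical usage:
#   header = read_header("suspiciousfile.docx")
#   print(header.hex(" "), "->", identify(header))
```

For a genuine DOCX the header should begin with 50 4b 03 04; anything else means the extension and the content disagree.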


6. Manual Inspection of Extracted XML Files

After unzipping, examine the internal XML files (such as document.xml, settings.xml, and .rels files) for clues:

  • Look for:
    • Suspicious URLs in .rels files (relationships).
    • Unusual formatting or encoding in the document content (e.g., hidden payloads).
    • Metadata fields like the author, editing history, and other identifying data.

Command

You can manually open and inspect the files using any text editor or use:

cat suspiciousfile_unzipped/word/document.xml
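Reading .rels files by eye gets tedious, so the check for suspicious URLs can be sketched as a short Python helper that lists every relationship marked as external. The function name and the shape of the returned tuples are assumptions made for this example; it relies on the TargetMode="External" attribute that relationship entries use for targets outside the package.

```python
import zipfile
import xml.etree.ElementTree as ET

# Fully qualified tag name of a relationship entry in a .rels part.
REL_TAG = "{http://schemas.openxmlformats.org/package/2006/relationships}Relationship"


def external_targets(docx):
    """Return (rels_file, relationship_kind, target) for every external relationship."""
    hits = []
    with zipfile.ZipFile(docx) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(REL_TAG):
                if rel.get("TargetMode") == "External":
                    # Keep only the last segment of the Type URI for readability.
                    kind = rel.get("Type", "").rsplit("/", 1)[-1]
                    hits.append((name, kind, rel.get("Target")))
    return hits
```

Ordinary hyperlinks show up here too, so the interesting entries are usually the ones whose kind is something other than a hyperlink (for example, a remotely loaded template) or whose target points at an unfamiliar host.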

7. Check File Integrity Against Known Malware

Calculate a hash of the file and look it up on services like VirusTotal to check whether the file has already been flagged as malicious. MD5 is enough for a lookup, but SHA-256 (sha256sum) is less prone to collisions.

md5sum suspiciousfile.docx

What to Look For

Submit the hash to VirusTotal, Hybrid Analysis, or similar services and check for any existing reports or detections.
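If you need several digests at once, or are working somewhere md5sum is not available, the hashing step can be sketched with Python's hashlib. The function name and chunk size are assumptions made for this example.

```python
import hashlib


def file_hashes(path, chunk_size=65536):
    """Return MD5 and SHA-256 hex digests of a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}


# Hypothetical usage:
#   for name, digest in file_hashes("suspiciousfile.docx").items():
#       print(name, digest)
```

Either digest can be pasted into the search box of the lookup service.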


8. Look for Encoded Messages

During XML or string inspection, search for base64 or hex-encoded strings, which may be used to conceal malicious payloads or URLs.

Suspicious Code

If you find suspicious encoded content, decode it using tools like base64 or online decoders to reveal potential hidden content.

echo 'c3VzcGljaW91c3VybA==' | base64 --decode
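When there are many candidate strings, finding and decoding them one at a time gets slow. The sketch below scans text for base64-looking tokens and keeps those that decode to printable text. The 16-character minimum, the regular expression, and the function name are assumptions made for this example; it is a heuristic and will both miss payloads and produce false positives.

```python
import base64
import binascii
import re

# Runs of base64 alphabet characters, optionally followed by padding.
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_candidates(text):
    """Return (token, decoded_text) pairs for tokens that decode to printable UTF-8."""
    results = []
    for token in B64_RE.findall(text):
        if len(token) % 4:
            continue  # valid padded base64 always has a length divisible by 4
        try:
            raw = base64.b64decode(token, validate=True)
            decoded = raw.decode("utf-8")
        except (binascii.Error, ValueError):
            continue  # not valid base64, or not text once decoded
        if decoded.isprintable():
            results.append((token, decoded))
    return results
```

Run against a string containing the sample value above, this would pair the token with its decoded text; binary payloads are skipped by design and still need a manual look.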


9. Examine Embedded Media or Objects

DOCX files may contain embedded media, macros, or OLE objects that could hide malicious content. Review the word/media/ or word/embeddings/ directories for any unusual files.

What to Look For

Analyze embedded images, macros, or objects to ensure they are benign and not being used to execute hidden payloads.
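To get a quick inventory of embedded items, the review of word/media/ and word/embeddings/ can be sketched in Python as below. The function name, the returned dictionary keys, and the choice to flag any entry named vbaProject.bin as a macro container are assumptions made for this example.

```python
import zipfile

WATCHED_DIRS = ("word/media/", "word/embeddings/")
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # signature of an OLE2 compound file


def embedded_items(docx):
    """Summarize embedded media, embedded objects, and macro containers in a DOCX."""
    items = []
    with zipfile.ZipFile(docx) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            name = info.filename
            is_macro = name.lower().endswith("vbaproject.bin")
            if not (name.startswith(WATCHED_DIRS) or is_macro):
                continue
            with zf.open(info) as fh:
                head = fh.read(8)
            items.append({
                "name": name,
                "size": info.file_size,
                "header": head.hex(),        # compare against known magic bytes
                "ole": head == OLE_MAGIC,    # True for OLE2 containers
                "macro": is_macro,
            })
    return items
```

An image whose header does not match its extension, or any entry with "ole" or "macro" set, deserves a closer look with a dedicated tool.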


Pro-Tip: CyberChef

CyberChef gives you a GUI version of hexdump, strings, and other pertinent commands if you prefer a visual drag-and-drop experience over the CLI. It is also helpful for identifying a file type: load your suspicious file as the Input, then drag the "Magic" operation into the Recipe.

File Signature List

Also helpful: Wikipedia List of File Signatures