Magic Bytes & File Forensics
Here's a walkthrough that includes basic checks for file integrity, metadata analysis, content inspection, and security intelligence gathering. Each step is fleshed out with context to guide the triage process:
1. Basic String Extraction
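Use strings to extract readable text from the DOCX file. This can surface suspicious URLs or embedded messages without unzipping the file. Because the parts of a DOCX are usually compressed, expect mostly the archive's internal file names plus any data stored uncompressed or appended to the archive; the document text itself becomes readable only after unzipping (step 3).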
strings suspiciousfile.docx
What to Look For
Scan the output for suspicious phrases, URLs, encoded strings (e.g., base64), or unusual content that doesn't fit the context of the document.
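To narrow a long strings dump down to URL-like tokens, a small filter can help. This is a minimal sketch, not a complete indicator extractor: filter_urls is a hypothetical helper name, and the pattern only catches http and https links.

```shell
# Sketch: keep only URL-like tokens from text on stdin, de-duplicated.
# filter_urls is a hypothetical helper name, not a standard tool.
# Usage: strings -a suspiciousfile.docx | filter_urls
filter_urls() {
    # -o prints only the matching part of each line; -E enables extended regex
    grep -oE "https?://[^[:space:]\"'<>]+" | LC_ALL=C sort -u
}
```

Many of the hits in a DOCX will be benign schema URLs (schemas.openxmlformats.org and similar); the interesting ones are those that do not belong to the Office schemas.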
2. File Identification and Format Check
Use file to confirm that the file type is consistent with its extension and to get a basic idea of its format. If the DOCX file claims to be something else, this is a red flag.
file suspiciousfile.docx
What to Look For
Verify that the output identifies the file as a ZIP archive or a Microsoft Word 2007+ document (the exact wording depends on the version of file). Both are consistent with DOCX, which is a ZIP container.
Note
If file reports some other format (e.g., JPEG), search for that format's structure, for example "jpg file structure". This should turn up documentation of the magic bytes and other structural components specific to that file type.
3. Unzipping the DOCX File
Since DOCX is a ZIP archive, unzip the file to examine its contents and the internal structure (XML files, metadata, embedded media, etc.).
unzip suspiciousfile.docx -d suspiciousfile_unzipped
What to Look For
After unzipping, inspect the directories (word/, word/media/, _rels/, etc.) and individual files (document.xml, .rels files) for embedded content, links, or other suspicious artifacts.
4. Metadata and EXIF Analysis
Extract metadata with exiftool to check for creation details, authorship, software used, or other relevant information that might reveal manipulation or give context about the file's origin.
exiftool suspiciousfile.docx
What to Look For
Pay attention to unusual authorship, software version discrepancies, or suspicious timestamps. Compare this metadata to known legitimate documents for context.
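As a cross-check that does not depend on exiftool, the same core properties can be read from docProps/core.xml once the archive has been extracted as in step 3. The sketch below assumes the standard element names (dc:creator, cp:lastModifiedBy, dcterms:created, dcterms:modified); show_core_props is a hypothetical helper name.

```shell
# Sketch: print selected core properties from an extracted DOCX directory.
# show_core_props is a hypothetical helper name.
# Usage: show_core_props suspiciousfile_unzipped
show_core_props() {
    core="$1/docProps/core.xml"
    [ -f "$core" ] || { echo "no docProps/core.xml in $1" >&2; return 1; }
    for tag in dc:creator cp:lastModifiedBy dcterms:created dcterms:modified; do
        # core.xml is often a single long line, so match each element directly
        value=$(grep -oE "<$tag[^>]*>[^<]*" "$core" | head -n 1 | sed 's/^<[^>]*>//')
        printf '%s: %s\n' "$tag" "$value"
    done
}
```

A property that is missing from the file is printed with an empty value, which is itself worth noting during triage.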
5. File Configuration and Magic Byte Verification
Research the expected configuration and magic bytes for the file type. For a DOCX file, the magic bytes should align with the ZIP file signature (50 4B 03 04).
When the bytes don't match:
Open the file in a hex editor such as HexEd.it, inspect the first few bytes manually, and compare them against known signatures to work out what the file really is.
What to Look For
Check for ZIP signatures or other standard file format indicators. Inconsistent or missing signatures could indicate tampering.
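The signature check can also be done from the command line. A minimal sketch, assuming the standard od and tr utilities are available; has_zip_magic is a hypothetical helper name:

```shell
# Sketch: succeed only if a file starts with the ZIP signature 50 4B 03 04.
# has_zip_magic is a hypothetical helper name.
# Usage: has_zip_magic suspiciousfile.docx && echo "ZIP signature present"
has_zip_magic() {
    # od prints the first 4 bytes as lowercase hex; tr strips spaces and newlines
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "504b0304" ]
}
```

A match only shows that the file begins like a ZIP archive. It says nothing about whether the contents are a well-formed or benign DOCX.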
6. Manual Inspection of Extracted XML Files
After unzipping, examine the internal XML files (such as document.xml, settings.xml, and .rels files) for clues:
- Look for:
  - Suspicious URLs in .rels files (relationships).
  - Unusual formatting or encoding in the document content (e.g., hidden payloads).
  - Metadata fields like the author, editing history, and other identifying data.
Command
You can manually open and inspect the files using any text editor or use:
cat suspiciousfile_unzipped/word/document.xml
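Relationships that point outside the package carry the attribute TargetMode="External", so they can be pulled out of every .rels file in one pass. This is a rough sketch (find_external_rels is a hypothetical helper name) that matches on the raw tag text rather than parsing the XML:

```shell
# Sketch: list relationship entries that point outside the DOCX package.
# find_external_rels is a hypothetical helper name.
# Usage: find_external_rels suspiciousfile_unzipped
find_external_rels() {
    # /dev/null is passed as an extra file so grep always prefixes the file name
    find "$1" -type f -name '*.rels' -exec \
        grep -oE '<Relationship [^>]*TargetMode="External"[^>]*>' /dev/null {} +
}
```

Ordinary hyperlinks in the document also appear as external relationships, so each hit needs to be judged by its Type and Target values rather than treated as malicious on sight.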
7. Check File Integrity Against Known Malware
Calculate a hash of the file and look it up on a service like VirusTotal to see whether the file has already been flagged as malicious. Searching by hash avoids uploading the document itself.
md5sum suspiciousfile.docx
What to Look For
Submit the hash to VirusTotal, Hybrid Analysis, or similar services and check for any existing reports or detections.
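MD5 works as a lookup key, but these services also accept SHA-256, which is the safer choice where collisions are a concern. A minimal sketch, assuming GNU coreutils (on macOS, shasum -a 256 is the usual equivalent); file_sha256 is a hypothetical helper name:

```shell
# Sketch: print only the SHA-256 digest of a file, without the file name.
# file_sha256 is a hypothetical helper name; sha256sum comes from GNU coreutils.
# Usage: file_sha256 suspiciousfile.docx
file_sha256() {
    sha256sum "$1" | cut -d ' ' -f 1
}
```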
8. Look for Encoded Messages
During XML or string inspection, search for base64 or hex-encoded strings, which may be used to conceal malicious payloads or URLs.
Suspicious Code
If you find suspicious encoded content, decode it using tools like base64 or online decoders to reveal potential hidden content.
echo 'c3VzcGljaW91c3VybA==' | base64 --decode
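Hunting for candidates by eye is slow, so the search and the decode can be chained. The sketch below is only a heuristic: decode_b64_candidates is a hypothetical helper name, the 16-character minimum is an arbitrary threshold, and it is meant for payloads that decode to text.

```shell
# Sketch: find base64-looking runs on stdin and print each with its decoded form.
# decode_b64_candidates is a hypothetical helper name; the 16-character minimum is arbitrary.
# Usage: decode_b64_candidates < suspiciousfile_unzipped/word/document.xml
decode_b64_candidates() {
    grep -oE '[A-Za-z0-9+/]{16,}={0,2}' | while IFS= read -r token; do
        # base64 exits non-zero on invalid input; such tokens are skipped
        if decoded=$(printf '%s' "$token" | base64 --decode 2>/dev/null); then
            printf '%s -> %s\n' "$token" "$decoded"
        fi
    done
}
```

Expect false positives: long identifiers and path fragments can look like base64 and may decode to meaningless bytes, so treat the output as leads to review rather than findings.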
9. Examine Embedded Media or Objects
DOCX files may contain embedded media, macros, or OLE objects that could hide malicious content. Review the word/media/ and word/embeddings/ directories for any unusual files. Macro code, when present, is stored in word/vbaProject.bin.
- What to Look For: Analyze embedded images, macros, or objects to ensure they are benign and not being used to execute hidden payloads.
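A quick inventory of the embedded files together with their leading bytes makes mismatches easier to spot, for example a file named like an image whose signature is not an image signature. A minimal sketch over the directory extracted in step 3; list_embedded is a hypothetical helper name:

```shell
# Sketch: list embedded files in an extracted DOCX with their first 8 bytes in hex.
# list_embedded is a hypothetical helper name.
# Usage: list_embedded suspiciousfile_unzipped
list_embedded() {
    for f in "$1"/word/vbaProject.bin "$1"/word/embeddings/* "$1"/word/media/*; do
        [ -f "$f" ] || continue   # skip unmatched globs and missing paths
        magic=$(od -An -tx1 -N8 "$f" | tr -d ' \n')
        printf '%s %s\n' "$magic" "$f"
    done
}
```

For orientation, OLE compound files (including vbaProject.bin) start with d0cf11e0a1b11ae1, PNG images with 89504e470d0a1a0a, and JPEG images with ffd8ff.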
Pro-Tip: CyberChef
CyberChef offers GUI versions of hexdump, strings, and other relevant commands if you prefer a visual drag-and-drop experience to the CLI. It is also handy for identifying a file type: load the suspicious file as the input, then drag the "Magic" operation into the recipe.
File Signature List
Also helpful: Wikipedia List of File Signatures