Security researchers at Proofpoint have released an open-source tool called PDF Object Hashing that helps security teams detect and track malicious PDF documents.
The tool is now available on GitHub and is designed to identify suspicious documents that threat actors use in phishing campaigns, malware distribution, and business email compromise attacks.
PDFs have become the weapon of choice for cybercriminals because they look legitimate to everyday users.
Threat actors frequently send PDFs packed with malicious URLs, QR codes, or fake banking details to trick people into clicking a link, scanning a code, or sending a payment.
The problem is that traditional security tools often fail to catch these threats because PDFs can be modified in countless ways while still appearing identical to users.
The PDF format itself is complex and flexible, which actually works against security teams trying to defend against attacks.
The specification allows multiple ways to represent the same document, giving attackers many creative options to hide their tracks.

Some PDFs are encrypted, which makes analysis even harder since security tools cannot read the text, URLs, or images inside them.
Different parts of a PDF can be stored as plain text or compressed, and important details like domain names might be buried in these compressed sections.
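To illustrate why compressed sections defeat simple string matching, here is a minimal Python sketch. The object layout, the domain `evil.example`, and the helper `inflate_stream` are hypothetical and exist only for this example. The sketch assumes a single stream compressed with the common FlateDecode (zlib) filter.

```python
import re
import zlib

# Hypothetical page content that carries a domain name (illustrative only).
domain = b"evil.example"
content = b"BT (Pay your invoice at http://" + domain + b"/pay) Tj ET\n" * 4

# Build a toy PDF stream object compressed with FlateDecode (zlib).
compressed = zlib.compress(content)
raw_object = (
    b"5 0 obj\n<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
    + compressed
    + b"\nendstream\nendobj\n"
)


def inflate_stream(obj: bytes) -> bytes:
    """Use the /Length entry to slice out the stream bytes, then inflate them."""
    length = int(re.search(rb"/Length\s+(\d+)", obj).group(1))
    start = obj.index(b"stream\n") + len(b"stream\n")
    return zlib.decompress(obj[start:start + length])


# A naive byte search over the raw file misses the domain...
found_in_raw = domain in raw_object
# ...but it is there once the stream is decompressed.
found_after_inflate = domain in inflate_stream(raw_object)
```

A rule that only scans raw bytes for the domain would miss this object, while one that first inflates the stream would find it. Real PDFs may chain several filters or use object streams, which this sketch does not handle.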
These variations make it nearly impossible to create simple detection rules that catch all malicious PDFs.
When attackers change a URL or swap out a fake invoice image, any signature built on that content stops matching, and the threat slips right through traditional defenses.
This cat-and-mouse game has frustrated security teams for years because PDF variations are endless.
Proofpoint’s solution takes a completely different approach to the problem.
Instead of focusing on what is inside the PDF, the tool examines the document’s underlying structure.
It analyzes the types of objects in the PDF and the order in which they appear, ignoring the specific details within those objects.
This creates a “skeleton” or template of the document that remains constant even when attackers modify images, URLs, or text.
The tool then hashes this sequence of object types into a fingerprint that stays the same even if the threat actor changes the lure image or updates the malicious URL.
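The general idea can be sketched in a few lines of Python. This is not Proofpoint's implementation: the regular expressions, the label names "Stream" and "Other", the choice of SHA-256, and the helper `toy_pdf` are all assumptions made for illustration. The sketch also ignores object streams and other complications of real PDFs.

```python
import hashlib
import re

# "N G obj ... endobj" spans in raw PDF bytes (illustrative; ignores object streams).
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj\b(.*?)\bendobj", re.DOTALL)
TYPE_RE = re.compile(rb"/Type\s*/([A-Za-z0-9]+)")


def object_types(pdf_bytes: bytes) -> list[str]:
    """Return one coarse type label per object, in file order."""
    labels = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(1)
        # Look only at the dictionary part, never at stream data.
        dictionary = body.split(b"stream", 1)[0]
        found = TYPE_RE.search(dictionary)
        if found:
            labels.append(found.group(1).decode("ascii"))
        elif b"stream" in body:
            labels.append("Stream")  # assumed label for untyped streams
        else:
            labels.append("Other")   # assumed label for everything else
    return labels


def structure_hash(pdf_bytes: bytes) -> str:
    """Hash the ordered type labels; object contents never enter the hash."""
    skeleton = "|".join(object_types(pdf_bytes))
    return hashlib.sha256(skeleton.encode("ascii")).hexdigest()


def toy_pdf(url: bytes) -> bytes:
    """Hypothetical four-object PDF skeleton whose only variable part is a URL."""
    return (
        b"%PDF-1.7\n"
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
        b"3 0 obj\n<< /Type /Page /Parent 2 0 R /Annots [4 0 R] >>\nendobj\n"
        b"4 0 obj\n<< /Type /Annot /Subtype /Link /A << /S /URI /URI ("
        + url
        + b") >> >>\nendobj\n%%EOF\n"
    )
```

With this sketch, `structure_hash(toy_pdf(b"http://one.example/invoice"))` and `structure_hash(toy_pdf(b"http://two.example/login"))` return the same value, because only the URL differs and the URL is never hashed. Changing the type or order of any object changes the result.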
The technique also works on encrypted PDFs because the document structure remains visible when the contents are hidden.
Proofpoint has already used this tool to track dangerous threat actors. The UAC-0050 group, which targets Ukraine, distributes encrypted PDFs containing malware.
Because the files are encrypted, traditional tools cannot extract the malicious URLs inside them.
However, PDF Object Hashing allowed Proofpoint to identify these threats by analyzing their structure alone, regardless of encryption.
By using PDF Object Hashing alongside traditional detection methods, security teams can catch variations of malicious documents that might otherwise be missed. The shared fingerprints also help analysts recognize when multiple PDF attacks are connected to the same threat actor.
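As a rough illustration of how shared fingerprints could link samples, the Python sketch below groups files whose object-type sequences hash to the same value. The file names, the sample sequences, and the helpers `fingerprint` and `cluster` are hypothetical. A matching fingerprint suggests a shared template, which is a lead for an analyst rather than proof of common authorship.

```python
import hashlib
from collections import defaultdict


def fingerprint(labels: list[str]) -> str:
    """Hash an ordered list of object-type labels (SHA-256 is an assumption)."""
    return hashlib.sha256("|".join(labels).encode("utf-8")).hexdigest()


def cluster(samples: dict[str, list[str]]) -> list[list[str]]:
    """Return groups of sample names that share a structural fingerprint."""
    groups = defaultdict(list)
    for name, labels in samples.items():
        groups[fingerprint(labels)].append(name)
    # Keep only fingerprints seen in more than one sample.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical object-type sequences extracted from three samples.
samples = {
    "invoice_march.pdf": ["Catalog", "Pages", "Page", "XObject", "Annot"],
    "invoice_april.pdf": ["Catalog", "Pages", "Page", "XObject", "Annot"],
    "newsletter.pdf": ["Catalog", "Pages", "Page", "Page", "Font"],
}
related = cluster(samples)
```

Here `related` contains a single group holding the two invoice files, since their skeletons are identical even though their contents may differ.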