Traditional malware analysis struggles to keep pace with the ever-growing volume and complexity of threats. Static and dynamic analysis techniques are used to dissect malware, and AI and machine learning classify samples based on known patterns, but these methods falter against entirely new threats.
Gemini 1.5 Pro, a new AI model capable of processing massive amounts of code, is a significant step forward: it can automate complex code analysis and serve as a powerful assistant in the fight against malware. Increased automation lets analysts handle the overwhelming volume of threats more effectively.
Generative AI (gen AI) is emerging as an assistant to malware analysts. Code Insight, a feature of Google's VirusTotal platform, analyzes code snippets and generates reports describing potential malicious functionality.
While effective for smaller files, current gen AI models struggle with large binaries because of their limited token input capacity. Meanwhile, reverse engineering, the traditional method for analyzing binaries, requires significant expertise and time, making it difficult to scale.
A large language model with a sufficiently large context window changes malware analysis: it can process an entire disassembled executable at once, eliminating context-losing fragmentation and allowing a holistic understanding of the code.
Gemini 1.5 Pro reads large amounts of code and draws on its knowledge of operating systems and security to interpret what the code does. It then predicts the malware's behavior and produces detailed, human-readable reports that include indicators of compromise (IOCs) to help find threats quickly.
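IOCs in a plain-text report can feed automated triage directly. As a minimal, hypothetical sketch (the `extract_iocs` helper and the simplified regex patterns are illustrative assumptions, not part of any Google tooling), IPv4 addresses and SHA-256 hashes could be pulled from a generated report like this:

```python
import re

# Simplified IOC patterns (an assumption for illustration; production
# extractors handle defanged indicators, IPv6, domains, etc.).
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")

def extract_iocs(report: str) -> dict[str, list[str]]:
    """Return deduplicated, sorted IOCs found in a free-text report."""
    return {
        "ipv4": sorted(set(IPV4.findall(report))),
        "sha256": sorted(set(SHA256.findall(report))),
    }

report = "Beacon to 192.168.13.37; dropped file " + "a" * 64
print(extract_iocs(report))
```

The point is only that a structured report makes downstream matching against threat-intel feeds mechanical rather than manual.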
Traditional AI tools for malware analysis struggle with large, decompiled code bases, requiring fragmentation, which hinders analysis. Gemini 1.5 Pro, a large language model, overcomes this limitation by processing the entire code (over 280,000 tokens) in a single pass.
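To see why fragmentation hurts, consider a naive chunking scheme of the kind a smaller-context model would force. This sketch uses a rough character-based token estimate (an assumption; it is not Gemini's real tokenizer): any cross-references between chunks, such as a function defined in one window and called in another, are invisible to the model.

```python
def chunk_code(text: str, max_tokens: int, tokens_per_char: float = 0.25) -> list[str]:
    """Split text into windows that fit a token budget.

    tokens_per_char is a crude heuristic (~4 chars per token), assumed
    here only to illustrate fragmentation; real tokenizers vary.
    """
    max_chars = int(max_tokens / tokens_per_char)
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

code = "x" * 1000
# 1000 chars at a 400-char window -> 3 chunks analyzed in isolation.
print(len(chunk_code(code, max_tokens=100)))
```

A single pass over the whole input, as described above, avoids this loss entirely.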
In testing, it analyzed a WannaCry sample without prior knowledge, identifying malicious behavior, Indicators of Compromise (IOCs), network scanning techniques, and the kill switch mechanism, all within 34 seconds. This demonstrates its ability to perform comprehensive analysis of complex malware in a single run.
Code analysis with Large Language Models (LLMs) hinges on the distinction between disassembly and decompilation. Disassembly translates binary code into assembly language, which is human-readable but low-level and verbose.
Decompilation tries to reconstruct the original source code, offering a more concise and higher-level view.
Decompiled code is better suited for LLM analysis due to its shorter length and structure. However, disassembly remains valuable for detailed, low-level analysis.
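The length difference is easy to demonstrate by analogy using Python's standard-library `dis` module, which disassembles Python bytecode rather than native machine code (an illustrative stand-in, not the article's tooling): even a one-line function expands into many instructions.

```python
import dis

def kill_switch(url_reachable: bool) -> str:
    # Toy stand-in for WannaCry-style logic: one high-level line of source.
    return "exit" if url_reachable else "encrypt"

# Disassembly view: one entry per low-level instruction, far more verbose
# than the single source line it came from.
instructions = list(dis.get_instructions(kill_switch))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
print(len(instructions), "instructions for one line of source")
```

Native x86 disassembly shows the same blow-up, only more severely, which is why decompiled output spends an LLM's token budget more efficiently.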
Gemini 1.5 Pro can handle both disassembly and high-level languages, allowing a flexible approach depending on the scenario. The authors showcased this with an unknown binary, where disassembly analysis revealed it to be a likely game cheat for Grand Theft Auto.
A table from Google Cloud lists the details of four malware samples. The first two, lhdfrgui.exe and tasksche.exe, are Win32 executables (Win32 EXE) from the 2017 WannaCry ransomware attack, each with its own SHA-256 hash and file size. The third sample, EXEC.exe, is a more recent Win32 EXE first seen in April 2022, and the fourth, medui.exe, is the newest, a Win32 EXE first seen in March 2024.