Researchers have introduced ARACNE, a novel autonomous pentesting agent powered by Large Language Models (LLMs) that can execute commands on real Linux shell systems.
This agent is designed to interact with SSH services, leveraging a modular architecture that supports multiple LLMs for enhanced flexibility and effectiveness.
ARACNE’s architecture comprises planner, interpreter, summarizer, and core agent modules, and the LLM-backed modules can each use a different model to optimize performance.
Key Features and Evaluations
The planner module, powered by OpenAI’s o3-mini, generates an attack plan from the provided goal and updates it after each action.
The interpreter module uses LLaMA 3.1 to translate these plans into executable Linux commands.
The summarizer module, which is optional, uses GPT-4o to reduce the context size, allowing for longer attack durations but potentially at the cost of accuracy.
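The division of labor described above can be pictured as a simple loop. The following is a minimal sketch under stated assumptions, not ARACNE's actual code: the class name `AgentSketch`, the prompt strings, and the `max_context_chars` threshold are all hypothetical, and each "model" is reduced to a plain function from text to text so that the example runs without any LLM access.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-in: any LLM is modelled as a function from prompt to reply.
LLM = Callable[[str], str]


@dataclass
class AgentSketch:
    planner: LLM                       # proposes or updates the attack plan
    interpreter: LLM                   # turns the plan into one shell command
    execute: Callable[[str], str]      # runs a command on the target, returns output
    summarizer: Optional[LLM] = None   # optional context compressor
    max_context_chars: int = 2000      # assumed threshold, not from the study
    context: str = ""
    history: List[str] = field(default_factory=list)

    def step(self, goal: str) -> str:
        """Plan, translate to a command, execute it, and record the result."""
        plan = self.planner(f"Goal: {goal}\nContext so far:\n{self.context}")
        command = self.interpreter(plan)
        output = self.execute(command)
        self.history.append(command)
        self.context += f"\n$ {command}\n{output}"
        # Compress the context only when a summarizer is configured.
        if self.summarizer is not None and len(self.context) > self.max_context_chars:
            self.context = self.summarizer(self.context)
        return output

    def run(self, goal: str, is_done: Callable[[str], bool], max_actions: int = 10) -> bool:
        """Repeat steps until the goal check passes or the action budget runs out."""
        for _ in range(max_actions):
            if is_done(self.step(goal)):
                return True
        return False


if __name__ == "__main__":
    # Stub models so the loop can be exercised offline.
    agent = AgentSketch(
        planner=lambda prompt: "list the files in the home directory",
        interpreter=lambda plan: "ls",
        execute=lambda cmd: "flag.txt" if cmd == "ls" else "",
    )
    print(agent.run("find the flag", lambda out: "flag" in out), agent.history)
```

Because each module is just a callable here, swapping one model for another means passing a different function, which is the kind of flexibility the modular design is meant to provide.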
ARACNE connects to target systems over SSH using the Paramiko library, which gives the agent an interactive shell session on the target.
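For readers unfamiliar with Paramiko, the sketch below shows one way an interactive shell session can be opened and driven from Python. It is an illustration under assumptions rather than ARACNE's implementation: the helper names `open_shell` and `run_command`, the timing parameters, and the host and credentials in the usage note are hypothetical. The Paramiko calls themselves (`SSHClient`, `set_missing_host_key_policy`, `connect`, `invoke_shell`, and the channel's `sendall`, `recv_ready`, and `recv`) are part of the library.

```python
import time


def open_shell(host: str, port: int, username: str, password: str):
    """Open an SSH connection and return (client, interactive shell channel)."""
    import paramiko  # third-party dependency: pip install paramiko

    client = paramiko.SSHClient()
    # AutoAddPolicy accepts unknown host keys; acceptable only in a lab setting.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, port=port, username=username, password=password)
    channel = client.invoke_shell()
    return client, channel


def run_command(channel, command: str, settle: float = 0.5, timeout: float = 10.0) -> str:
    """Send one command and collect output until the shell goes quiet.

    `settle` and `timeout` are assumed heuristics: an interactive shell gives
    no explicit end-of-output signal, so the loop stops after a quiet period.
    """
    channel.sendall((command + "\n").encode("utf-8"))
    chunks = []
    now = time.monotonic()
    deadline = now + timeout
    last_data = now
    while time.monotonic() < deadline:
        if channel.recv_ready():
            chunks.append(channel.recv(4096).decode("utf-8", errors="replace"))
            last_data = time.monotonic()
        elif chunks and time.monotonic() - last_data > settle:
            break  # output received and the shell has been quiet long enough
        else:
            time.sleep(0.05)
    return "".join(chunks)
```

A session against an authorized test host might then look like `client, chan = open_shell("lab-host.example", 22, "user", "password")`, followed by `print(run_command(chan, "whoami"))` and `client.close()`; those connection details are placeholders. An interactive channel, unlike one-shot command execution, keeps shell state such as the working directory between commands, which suits an agent that plans step by step.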
ARACNE was evaluated against two platforms: ShelLM, an LLM-based shell honeypot, and the OverTheWire Bandit capture-the-flag challenges.
Against ShelLM, ARACNE achieved a 60% success rate both with and without the summarizer module.
In the Bandit challenges, ARACNE achieved a 57.58% success rate, a slight improvement over previously reported state-of-the-art results.
According to the study, when the agent succeeded it typically needed fewer than five actions, which points to its efficiency.
Future Developments and Ethical Considerations
Future plans include expanding ARACNE’s capabilities by integrating it with established security tools and evaluating its performance against defensive mechanisms like Mantis.
The modular design makes it easy to swap in new LLMs as they become available, which could further improve ARACNE’s performance.
Ethical considerations highlight the dual nature of such technology: while it poses security risks, it also aids in identifying vulnerabilities and improving defenses at a lower cost.
As LLMs continue to evolve, ARACNE’s potential for both offensive and defensive cybersecurity applications is significant.