2026.03.30 • Patrick Binder • Cybersecurity Research
How Do Attackers Get Started with LLMs?
The real question is not whether large language models make attacks fully autonomous. The better question is: How fast is the adversarial scene industrializing the early and middle phases of the attack chain?
While many IT security experts have effectively stuck their heads in the sand—refusing to use these tools due to privacy concerns—attackers are operating without such limitations. We are seeing an adoption rate of 90-95% in the adversarial scene, compared to a mere 20-40% among security professionals. This delta is where the new risk of initial access is born.
This blog post explores the transition from assistive LLMs to partially autonomous cyber operators. Between 2024 and 2026, the discussion shifted from speculation to empirical evidence: Codex CLI, Gemini CLI, and Claude CLI are already succeeding where traditional methods struggle.
The practical near-term impact is a massive acceleration of target understanding, hypothesis generation, and breadth-first vulnerability coverage. While the "Blue Team" is often restricted by data governance, attackers are using frontier models like Opus 4.6 to find high-severity findings in mature open-source projects at industrial tempo.
We must move beyond the "toy" mindset. Every IT-Security Expert in the Blue Team should at least know what an attacker can do with LLMs!
0x01
The Delta: Attackers are Winning the Adoption Race
LLMs are transforming cyber operations, but the most significant shift isn't technical—it's organizational. While many IT security experts have effectively stuck their heads in the sand, refusing to use these models due to fears of leaking customer data to OpenAI or Anthropic, attackers are operating without any such constraints.
In Germany and beyond, we see a stark "Adoption Delta." The adversarial scene is utilizing LLMs at a 90-95% efficiency rate, while legitimate security teams lag behind at 20-40%. This gap allows attackers to industrialize the early phases of the attack chain: target research, pretexting, and rapid variation of tactics. The risk starts here, with the cost of broad vulnerability search falling sharply.
Every IT-Security Expert in the Blue Team should at least know what an attacker can do with LLMs!
0x02
From Chatbots to Agentic Operators (CLI Power)
The era of "static prompting" is over. The new frontier is agentic. Attackers are no longer just asking a chatbot for a payload; they are using Codex CLI, Gemini CLI, and Claude CLI to maintain goals across long horizons.
These agentic operators can read code, inspect commit histories, and iterate based on environmental feedback. Research on frameworks like PentestGPT shows a 228.6% increase in task completion over naive prompting. By offloading the cognitive load of understanding complex routing and authentication states to the model, attackers can navigate multi-step preconditions with industrial tempo.
# Initializing agentic attack sequence gemini-cli --task "Analyze routes and identify potential logic flaws in /api/v2" claude-cli --task "Reason over recent commit history for incomplete security patches" codex-cli --task "Generate multi-step exploit reproduction for identified CVE" # Iterating on feedback loops...
0x03
Industrialized 0-Day Discovery
2026 marked a critical inflection point in vulnerability research. Frontier models like Opus 4.6 and Google’s Big Sleep (Naptime) have proven they can find high-severity vulnerabilities in mature, heavily tested open-source codebases. Google Project Zero reported a real SQLite vulnerability discovered by an LLM before its official release.
For attackers, this means the highest near-term value isn't just autonomous "black-box" attacks, but code-informed research. LLMs are now used to reason over source diffs, identify framework misuse, and search for incomplete fix propagation. The cost of breadth-oriented vulnerability research is falling, allowing attackers to find those crucial entry points that humans often overlook.
SigninLogs | where TimeGenerated > ago(7d) | where ResultType != 0 | summarize FailedAttempts = count(), Users = dcount(UserPrincipalName) by IPAddress | order by FailedAttempts desc
0x04
Chaining Initial Access in the Agentic Era
The benchmark for initial access is no longer just "finding a bug." It's about the "operational glue"—managing sessions, tokens, and multi-step authentication. While benchmarks like CVE-Bench show a 13% success rate for autonomous agents, this number ignores the speed of adaptation.
Attackers are now using LLMs to solve logical hurdles like business-logic abuse and cross-tenant authorization flaws. Furthermore, a new attack surface has emerged: Prompt Injection against browser agents. If a target's internal workflow uses an AI agent for analysis or browser automation, the attacker can redirect that agent's behavior to exfiltrate data or bypass security controls entirely.
az login --use-device-code az account show python3 tools/export_timeline.py --source logs --output ./dist/timeline.md python3 tools/render_report.py --input ./dist/timeline.md --format pdf
0x05
Future: Inference Budget as an Attack Multiplier
The future of cyber operations is tied to inference-time compute. Research shows that as you spend more tokens, agent performance scales log-linearly. Average steps completed on corporate attack chains have increased from 1.7 to nearly 10 in just 18 months.
A determined operator can now "buy" additional capability simply by increasing their token budget. The era of dismissing LLM-enabled attacks as "hype" is over. The web red teaming problem is expanding to include AI-mediated risks, retrieval pipelines, and agent toolchains. The competitive edge belongs to those who adapt fastest.
Each IT-Security Expert in Blue Team should at least know what an attacker can do with LLMs!