Research Blog
Agent Security from an Identity Attacker’s Perspective
Most discussions about agent security still center on prompts, jailbreaks, and model behavior. That is only part of the picture. From an identity attacker’s perspective, the real question is different: what can this agent access, what can it invoke, and what trust has been delegated to it by design? If the model is the brain, identity is the blast radius.
0. The Problem
Modern agents are not just chat surfaces anymore. They are increasingly execution surfaces attached to cloud APIs, messaging systems, internal documents, ticketing systems, CI/CD pipelines, remediation workflows, and privileged automation paths. That changes the security question. The issue is no longer only whether the model can be manipulated into saying something unsafe. The issue is whether the surrounding system allows a manipulator to inherit trust, trigger tools, traverse data boundaries, or transform soft steering into real actions.
Many organizations still frame agent risk as a variant of prompt security. That misses the more durable problem. A weak prompt boundary can matter, but an over-entitled service principal, poorly designed approval chain, or over-trusted tool connector often matters more. An attacker does not need philosophical control over the model. They need practical control over authority.
1. The Identity Attacker Lens
An identity-focused attacker evaluates an agent exactly the way they would evaluate any other privileged operator. Which identities back it? Which tokens exist? Which scopes, roles, or delegated permissions are attached? What actions are possible through tools? What approvals are real and what approvals are ceremonial? What context can the system retrieve and under what boundary conditions?
This framing matters because it cuts through hype immediately. A highly aligned model connected to over-broad credentials is still dangerous. A carefully filtered prompt layer wrapped around sloppy delegated execution is still dangerous. From the attacker’s point of view, the model is simply one interface into a larger identity system. If that identity system is weak, the whole agent is weak.
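The evaluation questions above can be made concrete as an inventory exercise. A minimal sketch, with all names and scope strings hypothetical, that models an agent's backing identities and enumerates the union of everything they can reach:

```python
from dataclasses import dataclass, field

@dataclass
class BackingIdentity:
    """One principal the agent can act as: a service account, token, or delegation."""
    name: str
    scopes: set[str] = field(default_factory=set)

@dataclass
class AgentProfile:
    """The identity footprint of an agent: who it can be, what it can reach."""
    identities: list[BackingIdentity]

    def blast_radius(self) -> set[str]:
        # The union of every scope reachable through any backing identity.
        # This, not the prompt layer, is what an identity attacker maps first.
        if not self.identities:
            return set()
        return set().union(*(i.scopes for i in self.identities))

agent = AgentProfile(identities=[
    BackingIdentity("ci-bot", {"repo:write", "pipeline:trigger"}),
    BackingIdentity("docs-reader", {"wiki:read"}),
])
print(sorted(agent.blast_radius()))  # ['pipeline:trigger', 'repo:write', 'wiki:read']
```

The point of the exercise is that the aggregate set, not any single identity, defines the agent's effective authority.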
2. High-Risk Abuse Themes
Token Concentration
Agents often sit near high-value credentials, API tokens, or delegated sessions. That makes them attractive trust amplifiers.
Delegated Execution
The dangerous question is not whether the system can answer, but whether it can act on behalf of a more trusted principal.
Tool Trust Abuse
If a tool is trusted and loosely bounded, steering the tool can be operationally equivalent to compromising the identity directly.
Other high-risk themes follow naturally. Over-entitled service principals give agents authority they do not operationally need. Retrieval systems become reconnaissance layers when context scoping is weak. Cross-project or cross-tenant operation creates confidentiality spillover. Approval models often fail not because they are absent, but because they are attached too late in the workflow or can be socially bypassed by how requests are framed.
There is also a subtle but important risk in context aggregation. An agent that appears to merely summarize can still accumulate enough operational context to expose decisions, secrets, relationships, or next steps across boundaries the user never intended to merge. That is not a classic jailbreak problem. It is a trust-composition problem.
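One way to make that trust-composition risk observable is to label every retrieved item with its trust boundary and flag sessions that merge boundaries outside the task's expected scope. A hedged sketch; the boundary labels and session shape are illustrative, not a real retrieval API:

```python
def boundaries_crossed(retrievals: list[dict]) -> set[str]:
    """Collect the distinct trust boundaries a session has pulled context from."""
    return {r["boundary"] for r in retrievals}

def aggregation_alert(retrievals: list[dict], allowed: set[str]) -> bool:
    # Alert when the session has merged context from any boundary outside
    # the task's expected scope -- an aggregation risk, not a jailbreak.
    return bool(boundaries_crossed(retrievals) - allowed)

session = [
    {"doc": "q3-roadmap", "boundary": "project-a"},
    {"doc": "vendor-contract", "boundary": "legal"},
]
print(aggregation_alert(session, allowed={"project-a"}))  # True: legal was never in scope
```

The check is deliberately blunt: it does not reason about content at all, only about which boundaries were composed, which is exactly the property prompt-level filtering misses.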
3. Detection and Defense
Detection priorities for agent systems should include unusual tool invocation patterns, deviations in scope or action frequency, anomalous identity use tied to the agent’s backing principals, approval flow irregularities, and cross-context data access sequences that exceed the expected task boundary. If the system can call tools, defenders need telemetry that connects the request, the retrieved context, the identity used, and the downstream action attempted or completed.
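That telemetry join can be as simple as one correlated record per tool call. A minimal sketch; the field names are assumptions for illustration, not an established schema:

```python
import json
import time

def tool_call_event(request_id, context_ids, identity, tool, action, outcome):
    """One record that ties together the request, the retrieved context,
    the identity used, and the downstream action -- the join defenders
    usually lack when these signals live in separate logs."""
    return {
        "ts": time.time(),
        "request_id": request_id,
        "context_ids": context_ids,  # what the agent retrieved before acting
        "identity": identity,        # which backing principal was used
        "tool": tool,
        "action": action,
        "outcome": outcome,          # e.g. attempted / completed / denied
    }

event = tool_call_event("req-123", ["doc-9"], "ci-bot", "pipeline", "trigger", "completed")
print(json.dumps(event, indent=2))
```

With this record in place, the detection priorities above become queries: unusual invocation patterns are aggregations over `tool` and `action`, and cross-context sequences are aggregations over `context_ids` per `request_id`.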
Defensive controls should be identity-first. Least privilege for agent identities is non-negotiable. Tool connectors should be sharply scoped. Retrieval should be domain-bounded. High-risk actions should sit behind explicit approval checkpoints. Credentials should be short-lived where possible. Analysis and change authority should be separated. Session isolation should reflect actual organizational trust boundaries, not just engineering convenience.
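Several of those controls compose into a single pre-invocation gate. A sketch, assuming a simple string scope model and a caller-supplied approval callback, both hypothetical:

```python
# Actions that must pass an explicit approval checkpoint (illustrative set).
HIGH_RISK = {"delete", "deploy", "grant"}

def gate_tool_call(identity_scopes: set[str], required_scope: str,
                   action: str, approve) -> bool:
    """Identity-first gate: least privilege is checked first, then an
    explicit approval checkpoint for high-risk actions."""
    if required_scope not in identity_scopes:
        return False  # the agent identity simply lacks the scope
    if action in HIGH_RISK and not approve(action):
        return False  # approval sits before the action, not after it
    return True

# A read with the right scope passes; a deploy without approval does not.
print(gate_tool_call({"repo:read"}, "repo:read", "read", approve=lambda a: False))    # True
print(gate_tool_call({"deploy:prod"}, "deploy:prod", "deploy", approve=lambda a: False))  # False
```

Note the ordering: the scope check runs before the approval callback, so an over-entitled identity is denied by policy rather than by a human who may be socially engineered into clicking through.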
4. Research Direction
Agent security deserves to be treated as an authorization and identity discipline at least as much as a model-safety discipline. That means researchers should be modeling delegated execution paths, privilege inheritance, context mixing, approval workflows, and tool-chain abuse with the same seriousness currently reserved for prompt injection. In practice, many of the highest-impact weaknesses will emerge from the connective tissue between model, tools, identity, and workflow design.
The next generation of meaningful security work in this space will come from teams that can bridge offensive reasoning, identity architecture, and real operational control design. That is where the strongest unanswered questions still sit.
5. Takeaways
An attacker does not care whether compromise begins with a prompt flaw, a workflow weakness, or an over-entitled service principal. They care about access, authority, and reachable impact. Agent security becomes much easier to reason about once it is viewed through that lens. The right question is not only whether the model is safe. The right question is whether the surrounding identity and execution design deserves trust at all.