
The Tool-Scope Contract for LLM Agents

Mar 20, 2026

TL;DR

If your LLM agent can call tools, the key security question is not “can we block all prompt injection?” but “what is the worst thing an injected prompt can make the system do?”

A Tool-Scope Contract is a lightweight control layer with three rules:

  1. Default deny: no tool call unless explicitly allowed by task policy.
  2. Least privilege inputs/outputs: tools get only the minimum fields needed.
  3. Approval gates for high-impact actions: destructive/external actions require human confirmation.

This does not eliminate prompt injection, but it sharply reduces blast radius.

Context

Yesterday’s Source-Lock method focused on writing reliability. Today’s layer is system reliability: what happens when models are connected to external data and real actions.

Current guidance from security and risk standards (e.g., the OWASP Top 10 for LLM Applications and NIST's AI risk guidance) converges on the same theme:

  1. Treat content from untrusted sources as data, never as instructions.
  2. Apply least privilege to every tool, API, and data store the model can reach.
  3. Keep a human in the loop for consequential, hard-to-reverse actions.

The practical takeaway: treat agent behavior as a policy-enforced workflow, not pure model judgment.

Key Points

1) Prompt injection is a boundary failure, not only a prompt failure

If instructions and untrusted content are mixed, the model may treat hostile text as authority. Even strong prompting cannot guarantee perfect separation.

So the control plane must live outside prose prompts:

  1. Per-task policies that default to deny.
  2. Argument validation that runs before any tool executes.
  3. Approval gates and decision logging around high-impact actions.

2) Define a Tool-Scope Contract per task type

Each task should have a short policy object that specifies:

  1. Which tools are allowed, with per-tool input constraints (domains, paths).
  2. Which actions must escalate to a human for approval.
  3. Which actions are hard-denied.
  4. The default for everything else: deny.

A full example appears under Steps / Code below.

If a task requests out-of-scope behavior, fail closed and ask for approval.

3) Add approval gates where reversibility is low

Use human confirmation for actions that are:

  1. Destructive or irreversible (e.g., delete, force-push).
  2. External-facing (e.g., sending messages, write API calls).
  3. Security-sensitive (e.g., executing code or touching credentials).

This gives you a reliable stop point when model confidence and real-world risk diverge.
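As a concrete illustration, the gate can be a small pure function that classifies a proposed tool call before execution. This is a minimal sketch; the function names and the high-impact set are hypothetical, though the set mirrors the escalate list in the policy sketch below.

```python
# High-impact tool names that always require a human decision.
# Mirrors the "escalate" list in the minimal policy sketch.
HIGH_IMPACT = {"message.send", "exec", "delete"}

def requires_approval(tool_name: str) -> bool:
    """True if the tool falls into a high-impact category."""
    return tool_name in HIGH_IMPACT

def gate(tool_name: str, human_approved: bool) -> str:
    """Decide what to do with a proposed tool call.

    Low-impact tools pass through; high-impact tools are allowed
    only with explicit human approval, otherwise they escalate.
    """
    if not requires_approval(tool_name):
        return "allow"
    return "allow" if human_approved else "escalate"
```

The key property is that the stop point is deterministic code, not model judgment: no amount of injected text can flip `requires_approval` for a given tool name.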

4) Validate tool arguments, not just model intent

Even if intent looks benign, argument payloads can be harmful.

Minimum checks:

  1. Schema: arguments match the expected types and required fields.
  2. Paths: file operations resolve inside approved directories.
  3. Domains: network calls reach only allowlisted hosts.

This blocks many accidental and adversarial misuse paths before execution.
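The path and domain checks above can be sketched with the standard library alone. The allowlists here reuse the values from the minimal policy sketch below; the function names are illustrative, not from a specific framework.

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

# Values taken from the minimal policy sketch in this post.
ALLOWED_DOMAINS = {"docs.example.com", "nist.gov", "owasp.org"}
ALLOWED_PATH_PREFIXES = ("/workspace/notes/",)

def check_fetch_url(url: str) -> bool:
    """Allow only http(s) URLs whose exact host is allowlisted."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_DOMAINS

def check_read_path(path: str) -> bool:
    """Allow only paths inside approved directories; reject traversal."""
    clean = PurePosixPath(path)
    if ".." in clean.parts:  # block /workspace/notes/../secrets escapes
        return False
    return any(str(clean).startswith(p) for p in ALLOWED_PATH_PREFIXES)
```

Note that these run on the *arguments* the model produced, after intent has already been judged benign, which is exactly the layer point 4 argues for.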

5) Log policy decisions as first-class telemetry

Store:

  1. The requested tool and its arguments.
  2. The decision: allow, deny, or escalate.
  3. The policy rule that matched and the recorded reason.

These logs make postmortems and policy iteration far easier than inspecting model text alone.
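One way to make decisions first-class is to emit one structured record per decision. A minimal sketch, assuming JSON lines as the log format (the field names are illustrative):

```python
import json
import time

def log_decision(tool: str, args: dict, decision: str,
                 rule: str, reason: str) -> str:
    """Emit one structured policy-decision record as a JSON line."""
    record = {
        "ts": time.time(),
        "tool": tool,
        "args": args,
        "decision": decision,  # "allow" | "deny" | "escalate"
        "rule": rule,          # which policy entry matched
        "reason": reason,
    }
    line = json.dumps(record, sort_keys=True)
    print(line)  # in production, route this to your telemetry pipeline
    return line
```

Because every record carries the matched rule, a postmortem can grep for a rule name instead of re-reading model transcripts.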

Steps / Code

12-minute Tool-Scope Contract pass

Minute 0-2: List all tools an agent can call for this workflow
Minute 2-4: Mark each tool as read-only, write, external, or destructive
Minute 4-6: Set default-deny + explicit allowlist by task type
Minute 6-8: Add schema/path/domain validation to allowed tools
Minute 8-10: Add human approval for high-impact categories
Minute 10-12: Add decision logging (allow/deny/escalate + reason)

Minimal policy sketch

{
  "task": "summarize_external_articles",
  "default": "deny",
  "allow": {
    "web_fetch": { "domains": ["docs.example.com", "nist.gov", "owasp.org"] },
    "read": { "paths": ["/workspace/notes/"] }
  },
  "escalate": ["message.send", "exec", "delete"],
  "deny": ["secrets.read", "repo.force_push"]
}

Ship rule

If an action is out of scope and no human approval exists:
DENY, log reason, and return a safe alternative.
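The ship rule can be collapsed into one fail-closed decision function over the policy object above. A sketch, assuming the policy's lists have been loaded into sets (the `decide` name and precedence order are my choices, with explicit deny strongest):

```python
# Flattened view of the minimal policy sketch above.
POLICY = {
    "default": "deny",
    "allow": {"web_fetch", "read"},
    "escalate": {"message.send", "exec", "delete"},
    "deny": {"secrets.read", "repo.force_push"},
}

def decide(tool: str, policy: dict = POLICY) -> tuple[str, str]:
    """Fail-closed decision: deny > escalate > allow > default (deny)."""
    if tool in policy["deny"]:
        return "deny", f"{tool} is explicitly denied"
    if tool in policy["escalate"]:
        return "escalate", f"{tool} requires human approval"
    if tool in policy["allow"]:
        return "allow", f"{tool} is allowlisted for this task"
    return policy["default"], f"{tool} not covered by policy; failing closed"
```

The last branch is the ship rule: anything the policy does not mention is denied with a logged reason, never silently executed.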

Trade-offs

Costs

  1. Extra engineering overhead to define and maintain policies.
  2. More user friction on high-impact actions due to approval prompts.
  3. Initial false denies while policy coverage matures.

Benefits

  1. Lower blast radius from direct/indirect prompt injection.
  2. Better operational auditability and incident response.
  3. Clearer team ownership of risk decisions.
  4. More predictable agent behavior in production.

Final Take

You probably won’t “solve” prompt injection at the model layer alone.

But you can make it much less dangerous by treating tool use as a contract: default deny, minimal scope, and explicit human gates for high-impact moves.

That is a practical security posture you can apply today without waiting for perfect models.
