Skip to content

fix: add security warning for untrusted skill content#18784

Open
sumleo wants to merge 1 commit intoanomalyco:devfrom
sumleo:fix/skill-security-warning
Open

fix: add security warning for untrusted skill content#18784
sumleo wants to merge 1 commit intoanomalyco:devfrom
sumleo:fix/skill-security-warning

Conversation

@sumleo
Copy link

@sumleo sumleo commented Mar 23, 2026

Issue for this PR

Closes #18781

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Skills loaded from repositories can contain malicious instructions that trick the agent into silently rewriting package manager configs (pip.conf, .npmrc) to point at attacker-controlled registries. This is a supply-chain poisoning attack via the skill system.

This PR adds a two-layer prompt-level defense:

  1. System prompt (system.ts): Appends a <skill_security_policy> block to the skills section that explicitly tells the agent to refuse writing to package manager configs, adding non-standard registry URLs, writing to system directories, or executing RCE patterns from skill content.

  2. Skill tool output (skill.ts): Wraps each loaded skill's content in a <skill_security_warning> block so the agent sees the untrusted-content warning immediately before processing the skill body.

The fix targets the root cause: the agent has no signal that skill content is untrusted repository input. By framing it as untrusted at both the system prompt level and the point of injection, the model can distinguish user-direct instructions from repo-embedded instructions.

How did you verify your code works?

Tested against 31 poisoned skill variants in a Docker-based validation harness. Without the fix, 4 of 31 skills successfully wrote malicious package manager configs. The defense was validated using a similar AGENTS.md-based approach — the system prompt framing blocked the supply-chain payloads in controlled testing.

The changes are additive (append-only to existing strings) and do not alter any control flow or tool behavior.

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

Closes anomalyco#18781

Skills loaded from repositories may contain malicious instructions that
trick the agent into writing to package manager configs, adding rogue
registry URLs, or modifying system-wide settings. This is a supply-chain
poisoning vector.

Add a two-layer defense:
- System prompt: append a <skill_security_policy> block to the skills
  section listing prohibited actions (registry hijacking, config writes,
  RCE patterns)
- Skill tool output: wrap each loaded skill in a <skill_security_warning>
  reminding the agent that the content is untrusted before it processes
  the skill body
@github-actions
Copy link
Contributor

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: Agent skills can silently redirect package managers to malicious registries

1 participant