fix: add security warning for untrusted skill content#18784
Open
sumleo wants to merge 1 commit intoanomalyco:devfrom
Open
fix: add security warning for untrusted skill content#18784sumleo wants to merge 1 commit intoanomalyco:devfrom
sumleo wants to merge 1 commit intoanomalyco:devfrom
Conversation
Closes anomalyco#18781 Skills loaded from repositories may contain malicious instructions that trick the agent into writing to package manager configs, adding rogue registry URLs, or modifying system-wide settings. This is a supply-chain poisoning vector. Add a two-layer defense: - System prompt: append a <skill_security_policy> block to the skills section listing prohibited actions (registry hijacking, config writes, RCE patterns) - Skill tool output: wrap each loaded skill in a <skill_security_warning> reminding the agent that the content is untrusted before it processes the skill body
Contributor
|
Thanks for your contribution! This PR doesn't have a linked issue. All PRs must reference an existing issue. Please:
See CONTRIBUTING.md for details. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Issue for this PR
Closes #18781
Type of change
What does this PR do?
Skills loaded from repositories can contain malicious instructions that trick the agent into silently rewriting package manager configs (pip.conf, .npmrc) to point at attacker-controlled registries. This is a supply-chain poisoning attack via the skill system.
This PR adds a two-layer prompt-level defense:
System prompt (
system.ts): Appends a<skill_security_policy>block to the skills section that explicitly tells the agent to refuse writing to package manager configs, adding non-standard registry URLs, writing to system directories, or executing RCE patterns from skill content.Skill tool output (
skill.ts): Wraps each loaded skill's content in a<skill_security_warning>block so the agent sees the untrusted-content warning immediately before processing the skill body.The fix targets the root cause: the agent has no signal that skill content is untrusted repository input. By framing it as untrusted at both the system prompt level and the point of injection, the model can distinguish user-direct instructions from repo-embedded instructions.
How did you verify your code works?
Tested against 31 poisoned skill variants in a Docker-based validation harness. Without the fix, 4 of 31 skills successfully wrote malicious package manager configs. The defense was validated using a similar AGENTS.md-based approach — the system prompt framing blocked the supply-chain payloads in controlled testing.
The changes are additive (append-only to existing strings) and do not alter any control flow or tool behavior.
Checklist