This hook runs automatically on an agent event, with no chance for you to review it first. The spotted command the flagged value pipes whatever the remote server returns at that moment straight into a shell — if the URL is ever compromised, attacker code runs on your machine.
Detection logic
regex `(?i)\b(?:curl|wget|fetch|invoke-webrequest|iwr)\b[^|;\n]*\|\s*(?:bash|sh|zsh|fish|powershell|pwsh|cmd|python|node|perl|ruby)\b` against .claude/hooks/**, hooks/**, **/*.hook.sh, **/*.hook.ps1, **/*.hook.bat, **/SessionStart*, **/SessionEnd*, **/PreToolUse*, **/PostToolUse*
Limitations (3)
- Cannot detect the pattern when curl output is written to a file first then executed in a separate command. The two-step variant requires a sequence-aware analyzer (deferred to v2).
- Cannot detect dynamic-URL construction (e.g. curl "$BASE/install.sh" | bash where $BASE is variable).
- PowerShell IWR pipelines have many equivalent phrasings; the regex covers the canonical ones.
This hook runs automatically on an agent event, with no chance for you to stop it. The spotted command the flagged value recursively force-deletes a root, home, or variable-expanded path — if the path resolves wrong at runtime, it irreversibly destroys data.
Detection logic
regex `(?i)\brm\s+(?:-[rRfF]+\s+|--recursive\s+|--force\s+)+(?:/(?:\s|$|\*|[a-zA-Z][a-zA-Z0-9/_-]*)|\$\w+|~/?\s*$|"\$\{?\w+\}?")` against .claude/hooks/**, hooks/**, **/*.hook.sh, **/*.hook.ps1, **/SessionStart*, **/SessionEnd*, **/PreToolUse*, **/PostToolUse*
Limitations (3)
- Cannot detect `rm -rf` with paths constructed via complex variable expansion (the rule catches `$VAR` directly but not `"${BASE}/${SUBDIR}"` chains).
- Cannot detect PowerShell `Remove-Item -Recurse -Force` equivalents — that pattern is a v2 extension.
- Cannot distinguish a `rm -rf` that legitimately cleans a known temp directory from one that destroys user data; the rule treats hook-scope `rm -rf` as universally suspicious.
An MCP tool's description is read by the agent as trusted instructions. This one embeds bidirectional-override characters (the flagged value) so the manifest renders one thing to a human reviewer but tokenizes a different, reordered instruction to the agent.
Detection logic
regex `[\u{202A}-\u{202E}\u{2066}-\u{2069}]` against tools/manifest.json, **/tools/manifest.json, manifest.json, **/server.py, **/server.ts, **/server.js, **/*.toolmanifest.json
Limitations (2)
- Cannot detect BiDi attacks via inherent-RTL scripts (Arabic / Hebrew text used in MCP tools for non-English users).
- Cannot distinguish legitimate i18n use of BiDi overrides from injection.
An MCP tool's description is injected into the agent's context as trusted instructions. This one runs past 2000 characters — far beyond the 100–500 a real tool needs — room to bury reasoning chains or system-prompt overrides a human reviewer skims past.
Detection logic
regex `(?is)"description"\s*:\s*"[^"]{2000,}"` against tools/manifest.json, **/tools/manifest.json, manifest.json, **/*.toolmanifest.json
Limitations (3)
- Length-based heuristic — does not analyze the description content. A legitimate complex tool may have a long description; vendor-appeal data will tune the threshold.
- Cannot detect description-creep injected via dynamic construction (server-side string concatenation).
- Cannot detect multi-line JSON-formatted descriptions where the length is hidden by line wrapping in the source view.
This server publishes a tool whose name (the flagged value) uses an "internal/private" naming convention. The agent still sees and may invoke it, but a human reviewing the manifest skips over names that visually signal "not for me" — letting a hidden tool act unreviewed.
Detection logic
regex `(?is)"name"\s*:\s*"[^"]*(?:_internal|__|\.hidden|_meta|_sys)[^"]*"` against tools/manifest.json, **/tools/manifest.json, manifest.json, **/*.toolmanifest.json
Limitations (2)
- Heuristic-based: detects tool naming conventions suggestive of hidden / shadow tools. Cannot detect a shadow tool with a normal-looking name that becomes hidden via runtime registration order.
- Cannot detect shadow tools registered exclusively at runtime (no entry in the static manifest).
An MCP tool's description is fed into the agent's context as trusted instructions. This one hides invisible plane-14 tag characters (the flagged value) the agent tokenizes as commands, while a human reading the manifest sees the description unchanged.
Detection logic
regex `[\u{E0000}-\u{E007F}]` against tools/manifest.json, **/tools/manifest.json, manifest.json, **/server.py, **/server.ts, **/server.js, **/*.toolmanifest.json
Limitations (3)
- Cannot detect tag-channel characters re-encoded as numeric entities; rule operates on decoded text only.
- Cannot detect tag-channel chars embedded only in runtime-constructed tool descriptions (e.g. a tool whose description is f-stringed from user input).
- Server-source detection is regex-based; obfuscation via string concatenation evades the rule.
This plugin references the AWS credentials file or the access-key fields stored inside it (the flagged value). Those are long-lived keys with broad cloud access, so any code that reads them can hand your whole AWS account to whatever it contacts next.
Detection logic
regex `(?i)(~/.aws/credentials|~/.aws/config|/\.aws/credentials|aws_access_key_id|aws_secret_access_key|aws_session_token)` against **/*.py, **/*.ts, **/*.js, **/*.mjs, **/*.cjs, **/*.go, **/*.rb, **/*.java
Limitations (3)
- Legitimate plugins that interact with AWS (deployment tools, S3 clients, etc.) need to reference the credentials file path. The rule's coarse detection means FP risk on every AWS-using plugin.
- Cannot distinguish a plugin that reads .aws/credentials in a documented, user-consented flow from one that does so for exfiltration.
- v2 will refine via composite (AWS-file read PLUS unexpected-endpoint HTTP call).
This plugin reads from the environment (where API keys, tokens, and AWS credentials live) and also makes outbound network calls. When both sit in the same code, a secret read from the env can be packed into a request and sent off the machine the moment the plugin runs.
Detection logic
AND of 2 sub-triggers
Limitations (3)
- Composite trigger detects only the *coexistence* of env-read and HTTP-call primitives in the same plugin. Cannot prove that env values flow into HTTP bodies (taint analysis is deferred to v2).
- FP risk on legitimate plugins that legitimately read env (e.g. an API key) AND call out (e.g. to the API endpoint that requires the key). The pattern is normal for any service-integration plugin; treat the rule as a signal-to-review, not a verdict.
- Cannot detect dynamic env access via Reflect / getattr / runtime-string.
A GitHub token (the flagged value) is committed directly into this plugin's source. Anyone who reads the repo — including everyone who installs the plugin — gets the token, so it must be treated as already compromised.
Detection logic
regex `\b(?:ghp_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9_]{82}|gho_[A-Za-z0-9]{36}|ghu_[A-Za-z0-9]{36}|ghs_[A-Za-z0-9]{36}|ghr_[A-Za-z0-9]{36})\b` against **/*.py, **/*.ts, **/*.js, **/*.mjs, **/*.cjs, **/*.go, **/*.rb, **/*.java, **/*.md, **/*.json, **/*.yaml, **/*.yml
Limitations (3)
- Detects only the canonical GitHub token prefixes (ghp_ / github_pat_ / gho_ / ghu_ / ghs_ / ghr_). Cannot detect legacy 40-char hex tokens without high FP risk.
- Detects committed tokens; cannot detect tokens read at runtime from env or from external secrets store.
- Cannot distinguish revoked / expired tokens from active ones.
A rules file is injected verbatim into every agent prompt. Unicode tag-channel codepoints like the flagged value render as nothing to a human reviewer but are tokenized as text by the model — so hidden instructions steer every session while staying invisible in any editor or diff.
Detection logic
regex `[\u{E0000}-\u{E007F}]` against .cursorrules, .cursor/rules/**, .windsurfrules, .windsurf/rules/**, .continuerules, CONTINUE.md, **/rules.md, **/RULES.md
Limitations (2)
- Cannot detect tag-channel characters re-encoded as numeric entities.
- Cannot detect tag-channel chars in dynamically-loaded rules (e.g. rules constructed at IDE-runtime).
A Right-to-Left Override or isolate character (U+202A–U+202E, U+2066–U+2069) reorders how the line displays without changing what the tokenizer reads — the Trojan Source trick. The file can look harmless to you while injecting hostile instructions at the token level, and the deception survives copy-paste into a sandbox.
Detection logic
regex `[\u{202A}-\u{202E}\u{2066}-\u{2069}]` against **/*.md, **/*.yaml, **/*.yml, **/*.json, **/SKILL.md, **/*.py, **/*.ts, **/*.js, **/*.sh
Limitations (2)
- Cannot detect BiDi attacks that use RTL-marker-free scripts (Arabic / Hebrew script blocks have inherent RTL); rule scopes to explicit BiDi-override codepoints only.
- Cannot distinguish legitimate i18n use (mixed-script documentation) from injection — fires on any explicit override character. Reviewers should examine context.
Plane-14 tag characters (U+E0000–U+E007F) render as absolutely nothing to a person reading the file in any editor, yet every LLM tokenizer turns them into real tokens. An attacker can hide a whole second instruction in them — telling the agent to drop its safety rules or exfiltrate data — that no reviewer ever sees in the source.
Detection logic
regex `[\u{E0000}-\u{E007F}]` against **/*.md, **/*.yaml, **/*.yml, **/*.json, **/SKILL.md
Limitations (3)
- Cannot detect tag-channel characters re-encoded as numeric entities (`󠀁`); rule operates on decoded text only.
- Cannot detect tag-channel chars hidden inside compressed payloads (gzip / brotli); only post-decompression bytes are scanned.
- Does not distinguish documented use of plane-14 characters (rare) from injection — fires on any presence.
This hook runs automatically on an agent event. The spotted command the flagged value Base64-decodes a blob and pipes it straight into a shell — encoding that hides the real commands from review, with no legitimate reason for a hook to obscure its own plain text.
Detection logic
regex `(?i)\b(?:echo|printf)\s+["']?[A-Za-z0-9+/=]{32,}["']?\s*\|\s*(?:base64\s+(?:-d|--decode)|openssl\s+base64\s+-d)\s*\|\s*(?:bash|sh|zsh)\b|\bbase64\s+(?:-d|--decode)\s+<<<\s*["'][A-Za-z0-9+/=]{32,}["']\s*\|\s*(?:bash|sh)\b` against .claude/hooks/**, hooks/**, **/*.hook.sh, **/SessionStart*, **/PreToolUse*, **/PostToolUse*
Limitations (3)
- Cannot detect base64-shell pipelines that split the encoded payload across variables before piping.
- Cannot detect hex-encoded or otherwise-encoded shell payloads (those would be a separate rule).
- Cannot detect base64-decode chains that route through a temp file before execution.
This hook runs automatically on an agent event. The spotted command the flagged value uses eval on command-substituted or variable content — the actual code is assembled at runtime from values not visible in the source, defeating any static review.
Detection logic
regex `(?i)\beval\s+["']?\$\(.*\)|\beval\s+["']?[^"'\n]*\$\{?\w+\}?[^"'\n]*["']?|\bsource\s+<\(.+\)` against .claude/hooks/**, hooks/**, **/*.hook.sh, **/SessionStart*, **/PreToolUse*, **/PostToolUse*
Limitations (3)
- Legitimate uses of eval exist (string-substitution before execution); the FP risk is real but the hook-scope filter bounds it.
- Cannot detect non-shell eval equivalents (Python exec(), JavaScript eval(), etc.) in hook scripts written in those languages — those would be separate rules.
- Cannot detect indirect-eval (functions whose body executes a variable).
This hook runs automatically on an agent event. The spotted command the flagged value wires a shell to an outbound TCP socket (netcat or /dev/tcp) — a reverse shell that hands an attacker persistent, interactive access to your machine the moment it runs.
Detection logic
regex `(?i)\b(?:nc|netcat|ncat|socat)\s+(?:-[a-z]+\s+)*\S+\s+\d+|/dev/tcp/\S+/\d+|bash\s+-i\s+>&\s*/dev/tcp/` against .claude/hooks/**, hooks/**, **/*.hook.sh, **/SessionStart*, **/PreToolUse*, **/PostToolUse*
Limitations (3)
- Cannot detect reverse-shell patterns implemented via Python sockets, Node net.connect, or other native runtime APIs (the regex covers shell-level patterns only).
- Cannot distinguish a reverse-shell setup from a legitimate use of nc for protocol testing (rare in hook scope, but possible).
- Operates on static text only — runtime-constructed connection strings are not detected.
This hook runs automatically on an agent event. The spotted command the flagged value invokes sudo in a way that skips the password prompt (a piped password, -n/-S, or NOPASSWD) — so it can run as root without you ever confirming.
Detection logic
regex `(?i)\bsudo\s+(?:-[ASEnk]+\s+)?\b|\becho\s+["']?\$[A-Z_]+["']?\s*\|\s*sudo\s+-S\b|\bNOPASSWD\b` against .claude/hooks/**, hooks/**, **/*.hook.sh, **/SessionStart*, **/PreToolUse*
Limitations (3)
- Legitimate use case: installation hooks for tools that genuinely require root (rare in agent-tool ecosystems). Shadow window measures real-world FP.
- Cannot distinguish a sudo invocation that would prompt the user (acceptable) from a sudo invocation backed by NOPASSWD or a piped-password (unacceptable).
- Does not cover Windows runas / elevation equivalents — v2 extension.
The server's source spawns processes (e.g. subprocess, child_process, exec()) but its manifest never declares that capability, so the consuming client can't warn the user that this server may run shell commands on their machine.
Detection logic
AND of 2 sub-triggers
Limitations (3)
- Heuristic detection of process-spawning primitives — cannot detect dynamically-loaded process libraries (e.g. importlib-resolved subprocess wrappers).
- Manifest capability declarations are advisory; the MCP protocol does not enforce them. The rule flags discrepancy as a transparency signal, not a hard guarantee.
- FP risk on MCP servers that subprocess only for legitimate internal lifecycle (worker management, not user-facing tool execution). Shadow window measures.
An MCP tool's description is read by the agent as trusted instructions. This manifest packs three or more invisible zero-width characters (the flagged value) that the agent tokenizes but a human reviewer cannot see — a way to smuggle hidden instructions past review.
Detection logic
regex `[\u{200B}-\u{200D}\u{2060}\u{FEFF}]` against tools/manifest.json, **/tools/manifest.json, manifest.json, **/*.toolmanifest.json
Limitations (3)
- Requires ≥3 zero-width characters to fire (FP reduction).
- Cannot detect zero-width characters re-encoded as numeric entities.
- Cannot detect zero-width characters in dynamically-constructed tool descriptions.
This plugin references an SSH private-key path or a private-key file header (the flagged value). An SSH private key authenticates you to servers and Git remotes, so code that reads it can impersonate you wherever that key is trusted.
Detection logic
regex `(?i)(~/.ssh/id_rsa|~/.ssh/id_ed25519|~/.ssh/id_ecdsa|/\.ssh/id_[a-z]+|BEGIN\s+(?:RSA|OPENSSH|DSA|EC)\s+PRIVATE\s+KEY)` against **/*.py, **/*.ts, **/*.js, **/*.mjs, **/*.cjs, **/*.go, **/*.rb, **/*.java
Limitations (3)
- Legitimate SSH-using plugins (git wrappers, SSH-key-rotation tools) reference the canonical paths.
- Cannot detect runtime-resolved key paths (e.g. paths read from config).
- Cannot detect direct memory read of the SSH agent.
This plugin embeds a chat-platform or request-capture webhook URL (the flagged value). Webhooks are the classic exfiltration drop: a plugin collects env, files, or system info and posts it to a hardcoded endpoint the attacker watches.
Detection logic
regex `(?i)https://(?:hooks\.slack\.com/services/|discord(?:app)?\.com/api/webhooks/|outlook\.office\.com/webhook/|api\.telegram\.org/bot|webhook\.site/|requestcatcher\.com/|pipedream\.com/|n8n\.cloud/|zapier\.com/hooks/)\S+` against **/*.py, **/*.ts, **/*.js, **/*.mjs, **/*.cjs, **/*.go, **/*.rb, **/*.java, **/*.md, **/*.json, **/*.yaml, **/*.yml
Limitations (3)
- Many legitimate plugins use webhook URLs (Slack notifiers, Discord bots, monitoring integrations). Shadow window measures real-world FP cost.
- Cannot detect webhook URLs constructed at runtime from variables.
- Cannot distinguish a user-configured legitimate webhook from a hardcoded exfiltration endpoint without consulting the rest of the plugin source for the configuration mechanism.
A base64 string of 128+ characters appears in a documentation file. Encoded prompt injection hides the hostile instruction in base64 — invisible to keyword filters — and relies on the agent's ability to decode it at runtime. There is no normal authoring reason to embed a multi-hundred-byte base64 blob in skill docs.
Detection logic
regex `(?<![A-Za-z0-9+/=])(?:[A-Za-z0-9+/]{4}){32,}(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?(?![A-Za-z0-9+/=])` against **/*.md, **/*.yaml, **/*.yml, **/*.json, **/SKILL.md, **/CLAUDE.md
Limitations (3)
- Cannot distinguish injection payloads from legitimate base64 (e.g. embedded test fixtures, SHA hashes presented in base64, signed manifests). Operates on the *presence* of a sufficiently long base64-shaped string in a documentation file as a signal.
- Does not decode the payload at scan time — the engine flags the pattern; review by maintainer or vendor surfaces the decoded content.
- Requires ≥128 base64 chars (32 quartets) to fire. Shorter strings (typical signatures / hashes) do not trigger.
A fenced bash/python block in SKILL.md carries a natural-language imperative — "now run this", "execute the following command" — directing the agent to execute the fenced content. What looks like documentation becomes an executable payload the agent may run without ever asking you.
Detection logic
regex `(?ms)^```(?:bash|sh|zsh|powershell|pwsh|cmd|python|node|js|ts)\s*$\s+(?:.*?\b(?:please|now|then|next|first)\s+(?:run|execute|invoke|call|trigger)\b.*?|.*?(?:run|execute|invoke|call|trigger)\s+(?:this|the\s+(?:above|below|following))\s+(?:command|code|script)\b.*?)```` against **/*.md, **/SKILL.md
Limitations (3)
- Cannot detect imperative instructions that don't use the curated verb set (run/execute/invoke/call/trigger).
- Cannot detect fenced-imperative when the natural-language wrapper is outside the fence (most skills place instructions BEFORE a fence, not inside it).
- May over-fire on tutorials demonstrating the safe execution of example code; shadow window measures real-world impact.
A hex string of 256+ characters appears in a documentation file — well above a single SHA-256 hash (64 chars). Like its base64 sibling, hex-encoding hides a hostile instruction from keyword filters while staying trivially decodable by the agent at runtime.
Detection logic
regex `(?i)(?<![0-9a-f])([0-9a-f]{256,})(?![0-9a-f])` against **/*.md, **/*.yaml, **/*.yml, **/SKILL.md, **/CLAUDE.md
Limitations (3)
- Cannot distinguish hex-encoded injection payloads from legitimate hex constants (SHA-256 of a known artifact, signed manifest entries). Operates on the *presence* of a sufficiently long hex-shaped string in documentation as a signal.
- Requires ≥256 hex chars (128 bytes decoded) to fire — well above typical SHA-256 (64 hex chars) so single-hash inclusions do not trigger.
- Does not decode the payload at scan time.
The text the flagged value is the classic direct prompt-injection phrasing. Placed in a skill body that the agent reads as trusted instructions, it tries to make the agent abandon its prior rules and follow whatever comes next — a full system-prompt override.
Detection logic
regex `(?i)\b(ignore|disregard|forget|override)\s+(all\s+)?(previous|prior|above|earlier)\s+(instructions?|prompts?|rules?|system\s+prompts?)\b` against **/*.md, **/*.yaml, **/*.yml, **/*.json, **/SKILL.md
Limitations (3)
- False-positives on roleplay skills, jailbreak research documentation, prompt-engineering tutorials, and any meta-discussion of LLM safety. Shadow window measures real-world FP rate.
- Cannot detect non-English equivalents of the imperative pattern. Coverage at W2 is English-only; multilingual extension is a Phase 2 RFC.
- Cannot detect the pattern when split across formatting or hidden in comments.
The phrase the flagged value matches a known role-override jailbreak — DAN, "developer mode", "jailbroken", "unrestricted", or an evil/malicious persona. Embedded in a skill, it tries to push the agent into an adversarial role that drops its safety constraints, without the person running the skill realizing it.
Detection logic
regex `(?i)\b(you\s+are|act\s+as|pretend\s+(to\s+be|you\s+are)|roleplay\s+as)\s+(DAN|developer\s+mode|jailbroken|unrestricted|uncensored|an?\s+(evil|malicious|harmful|amoral)\s+\w+)\b` against **/*.md, **/*.yaml, **/*.yml, **/*.json, **/SKILL.md
Limitations (3)
- FP rate on jailbreak-research and red-team skills is expected to be non-trivial — shadow window measures real-world impact.
- Cannot detect novel role-override personas coined after the rule lands. The DAN/developer-mode/jailbroken seed list is curated; new personas require a v2 rule revision.
- Cannot detect role-override embedded in tool-call arguments dynamically constructed at runtime.
Zero-width spaces and joiners (U+200B–U+200D, U+2060, U+FEFF) are invisible to a human reading the file but are emitted as distinct tokens by the agent's tokenizer. Several of them clustered together — this rule fires on three or more — can carry a hidden instruction past a reviewer who sees only ordinary text.
Detection logic
regex `[\u{200B}-\u{200D}\u{2060}\u{FEFF}]` against **/*.md, **/*.yaml, **/*.yml, **/*.json, **/SKILL.md
Limitations (3)
- Requires ≥3 zero-width characters to fire (FP reduction). A single zero-width space — e.g. a copy-paste artifact from a Slack message — does not trigger.
- Cannot detect zero-width characters re-encoded as numeric entities (`​`); rule operates on decoded text only.
- Does not currently distinguish between zero-width characters used legitimately in CJK script and the same characters used adversarially.
This hook runs automatically on an agent event. The spotted command the flagged value grants world-writable permissions (777 / a+rwx) — breaking least-privilege and leaving the files open for any other process on the system to tamper with.
Detection logic
regex `(?i)\bchmod\s+(?:-R\s+)?(?:777|a\+rwx|o\+rwx|0?777)\b` against .claude/hooks/**, hooks/**, **/*.hook.sh, **/SessionStart*, **/PreToolUse*, **/PostToolUse*
Limitations (3)
- Legitimate use: hooks that prepare a shared-temp-directory or socket for cross-user access. Shadow window measures FP rate.
- Cannot detect chmod 666 / 644 / other wide-but-not-fully-permissive patterns that still violate least-privilege.
- Cannot detect equivalent setfacl / takeown / icacls patterns on other platforms.
A rules file is loaded verbatim into the agent as standing instructions for every session. A non-disclosure imperative like the flagged value steers the agent to conceal behaviour, hide tool calls, or withhold information on every prompt — never appearing in any single conversation you'd review.
Detection logic
regex `(?im)^\s*(?:•|-|\*|\d+\.)\s*(?:always|never|must|do\s+not|don[’']t|under\s+no\s+circumstances)\s+(?:reveal|disclose|share|leak|tell\s+(?:the\s+)?(?:user|anyone)|mention)\b` against .cursorrules, .cursor/rules/**, .windsurfrules, .windsurf/rules/**, .continuerules, CONTINUE.md, **/rules.md, **/RULES.md
Limitations (2)
- Cursor / Windsurf rule files frequently contain legitimate confidentiality instructions ('always keep API keys out of logs'). Shadow window measures FP rate.
- Cannot distinguish user-facing imperatives (legitimate) from model-facing imperatives (suspicious).
A rules file is loaded verbatim into the agent. A word like the flagged value mixes Cyrillic, Greek, or mathematical look-alike letters with Latin ones — it reads normally to you but is distinct text to the model, letting an instruction evade keyword review while still steering the session.
Detection logic
regex `[\u{0400}-\u{04FF}\u{1D400}-\u{1D7FF}\u{0370}-\u{03FF}](?:[a-zA-Z]|[\u{0400}-\u{04FF}\u{1D400}-\u{1D7FF}\u{0370}-\u{03FF}]){4,}` against .cursorrules, .cursor/rules/**, .windsurfrules, .windsurf/rules/**, .continuerules, CONTINUE.md, **/rules.md, **/RULES.md
Limitations (3)
- FP risk on multilingual rules content (legitimate use of Greek / Cyrillic identifiers in technical documentation).
- Cannot detect single-character homoglyph swaps; only mixed-script words ≥5 chars long.
- v2 will integrate UTS #39 confusables data.
The text the flagged value tells the agent to skip the normal "ask the user first" gate. Used adversarially it removes the human-in-the-loop check before destructive or sensitive actions, turning a normally-gated agent into a fire-and-forget executor.
Detection logic
regex `(?i)\b(do\s+not|don[’']t|never|no\s+need\s+to)\s+(ask|confirm|prompt|request\s+permission|wait\s+for\s+approval|seek\s+consent|verify\s+with\s+the\s+user)\b` against **/*.md, **/SKILL.md, **/CLAUDE.md
Limitations (3)
- Legitimate use case: autonomous-mode skills (background tasks, scheduled jobs) explicitly opt out of approval prompts. Shadow window will quantify the FP rate on this legitimate cohort.
- Cannot distinguish skill-level autonomy declarations from injection-style mid-skill instructions; both fire on the same regex.
- Cannot detect equivalents in non-English text.
Long runs of emoji — especially with variation selectors and zero-width joiners — tokenize in model-specific ways that can encode an instruction while reading as harmless decoration to a person. This rule flags ten or more consecutive emoji, a length that is rare in genuine documentation but common in this obfuscation technique.
Detection logic
regex `(?u)(?:[\U0001F300-\U0001F9FF☀-➿]\u{FE0F}?\u{200D}?){10,}` against **/*.md, **/*.yaml, **/*.yml, **/SKILL.md, **/CLAUDE.md
Limitations (3)
- Cannot distinguish emoji-smuggled payloads from heavy-emoji documentation (welcome banners, decorative section dividers). Shadow window measures real-world impact.
- Emoji-variation-selector (FE0F) and zero-width-joiner (200D) sequences are tokenized differently across models; rule fires on the literal codepoint sequence, not on per-model token expansion.
- Cannot detect non-emoji obfuscation via mathematical alphanumeric symbols (U+1D400+) — that pattern is covered by SS-SKILL-INJECT-HOMOGLYPH-01.
A word can look like plain English while secretly containing Cyrillic, Greek, or mathematical letters that are visually identical to ASCII — Cyrillic 'а' for Latin 'a', Greek 'ο' for 'o'. This rule flags mixed-script words of five or more characters, a technique used to slip past keyword filters while reading normally to a person.
Detection logic
regex `[\u{0400}-\u{04FF}\u{1D400}-\u{1D7FF}\u{0370}-\u{03FF}](?:[a-zA-Z]|[\u{0400}-\u{04FF}\u{1D400}-\u{1D7FF}\u{0370}-\u{03FF}]){4,}` against **/*.md, **/*.yaml, **/*.yml, **/*.json, **/SKILL.md, **/CLAUDE.md
Limitations (3)
- FP risk on multilingual documentation (Russian / Greek / mathematical) that legitimately uses non-Latin scripts. Shadow window measures the cost.
- Cannot detect single-character homoglyph swaps in identifiers — only mixed-script words ≥5 chars long.
- Does not currently consult a homoglyph-confusable table (UTS #39); a v2 revision will integrate the Unicode confusables data.
A bulleted imperative like the flagged value tells the agent to never reveal, disclose, or mention something to the user. Used adversarially it can instruct the agent to hide its tool calls or lie about what it did — stripping the transparency a user relies on to trust the agent.
Detection logic
regex `(?im)^\s*(?:•|-|\*|\d+\.)\s*(?:always|never|must|do\s+not|don[’']t|under\s+no\s+circumstances)\s+(?:reveal|disclose|share|leak|tell\s+(?:the\s+)?(?:user|anyone)|mention)\b` against **/*.md, **/SKILL.md, **/CLAUDE.md
Limitations (3)
- FP risk is substantial: legitimate skills often contain instructions about confidentiality or scope boundaries. Shadow window measures real-world impact.
- Cannot detect imperative patterns expressed indirectly ('please keep X private', 'X should remain hidden') — only the strict imperative form.
- Does not distinguish between user-facing imperatives (legitimate) and model-facing imperatives (suspicious).
The text the flagged value asks the agent to disclose its hidden system prompt or initial instructions. That is often the first step of a larger attack: knowing the system prompt lets an attacker craft inputs that defeat its constraints by mimicking its own voice.
Detection logic
regex `(?i)\b(repeat|reveal|output|print|show|tell\s+me)\s+(your|the)\s+(system\s+prompt|initial\s+instructions?|original\s+prompts?|hidden\s+instructions?|prompt\s+template)\b` against **/*.md, **/*.yaml, **/*.yml, **/*.json, **/SKILL.md
Limitations (3)
- FP risk on skills designed to introspect or debug model behaviour (prompt-engineering tutorials, evaluation toolkits).
- Cannot detect indirect leak attempts that don't use the canonical 'system prompt' vocabulary.
- Does not currently chain with model-level mitigations (modern models partially refuse this class of request); SaferSkills flags the intent regardless of model response.