How Scoring Works (Overview)

A SaferSkills score is a single 0–100 number computed deterministically from five weighted sub-scores, then bucketed into one of four color bands. It is methodology, not opinion: every point of penalty traces to a documented rule and a quotable line of evidence, with no LLM in the verdict path. This page is the overview; the full methodology carries the exact math.

Every scan produces five sub-scores, each scored 0–100, then combined by a fixed weighted sum:

| Sub-score | Weight | What it catches |
|---|---|---|
| Security | 35% | Prompt injection, obfuscation, dangerous shell, credential exfiltration |
| Supply Chain | 20% | Typosquat, owner-transfer, hash drift, unsigned releases |
| Maintenance | 15% | Commit recency, commit frequency, issue-response time, CI health |
| Transparency | 15% | LICENSE / README / CHANGELOG / SECURITY.md / manifest presence |
| Community | 15% | Stars, contributors, cross-registry presence, fork health |

Security carries the most weight because it catches the threats that actually compromise a machine. A 35% weight alone, however, cannot stop the other four sub-scores from outweighing a serious security finding; the severity ceiling (below) corrects that.

Each finding carries a severity tier with a fixed penalty: info 0 · low 5 · medium 12 · high 25 · critical 40. A sub-score is max(0, 100 − Σ penalties). The aggregate score is the rounded weighted sum:

aggregate = round(0.35·security + 0.20·supply + 0.15·maintenance
+ 0.15·transparency + 0.15·community)
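
As a rough illustration of the two formulas above, the following Python sketch computes each sub-score from its findings and then the weighted aggregate. The penalty values and weights come from this page; the dictionary keys, function names, and sample findings are hypothetical. The page only says "rounded", so the sketch assumes round-half-up and keeps the weights as whole percents to stay in integer arithmetic.

```python
# Penalty per severity tier, as listed on this page.
PENALTY = {"info": 0, "low": 5, "medium": 12, "high": 25, "critical": 40}

# Weights in whole percent so the arithmetic stays in integers.
WEIGHTS = {
    "security": 35,
    "supply_chain": 20,
    "maintenance": 15,
    "transparency": 15,
    "community": 15,
}


def sub_score(severities):
    """max(0, 100 - sum of penalties) for one sub-score's findings."""
    return max(0, 100 - sum(PENALTY[s] for s in severities))


def weighted_aggregate(sub_scores):
    """Rounded weighted sum, before any severity ceiling is applied."""
    total = sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)
    # Assumption: round half up. The page does not state the rounding mode.
    return (total + 50) // 100


# Hypothetical findings, grouped by the sub-score they count against.
findings = {
    "security": ["high", "medium"],
    "supply_chain": ["low"],
    "maintenance": [],
    "transparency": ["low", "low"],
    "community": [],
}

scores = {name: sub_score(sevs) for name, sevs in findings.items()}
print(scores["security"])          # 63
print(weighted_aggregate(scores))  # 85
```

In this sample a high-severity security finding still leaves a weighted sum of 85, which is the dilution problem the severity ceiling exists to prevent.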

A pure weighted sum would let a critical security flaw hide behind good docs and a healthy star count, so a severity ceiling caps the whole aggregate by the worst active finding: one active critical finding holds the aggregate to at most 15, and one active high finding holds it to at most 45. This is why a serious flaw can never be diluted by the 65% of weight that sits outside Security. The exact step-by-step math (per-finding penalty, running sub-score, weighted aggregate, ceiling application) is in the detailed methodology, and every public report renders it inline.
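
The ceiling step can be sketched in a few lines of Python. The caps of 15 and 45 are the ones stated above; the function name is hypothetical. The sketch assumes that tiers below high impose no cap, that findings from any sub-score count toward the ceiling, and that the caller passes only findings considered active.

```python
# Caps stated on this page; lower tiers are assumed to impose no cap.
CEILING = {"critical": 15, "high": 45}


def apply_ceiling(weighted, active_severities):
    """Cap the weighted aggregate by the worst active finding."""
    caps = [CEILING[s] for s in active_severities if s in CEILING]
    # min() over the uncapped score and every applicable cap picks the
    # strictest limit, so a critical always wins over a high.
    return min([weighted] + caps)


print(apply_ceiling(85, ["high", "medium", "low"]))  # 45
print(apply_ceiling(85, ["critical", "high"]))       # 15
print(apply_ceiling(30, ["high"]))                   # 30 (already below the cap)
print(apply_ceiling(92, ["low"]))                    # 92 (no cap applies)
```

The ceiling only ever lowers a score: an aggregate already under the cap passes through unchanged.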

The aggregate buckets into four color bands:

| Band | Range | Label |
|---|---|---|
| Green | 80–100 | Approved |
| Yellow | 60–79 | Watch |
| Orange | 40–59 | Caution |
| Red | 0–39 | Block |
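
A minimal sketch of the bucketing, using the ranges and labels from the table above. The function name and the tuple layout are hypothetical, and the sketch assumes the aggregate is already an integer from 0 to 100.

```python
# (lower bound, band, label), ordered from highest band to lowest.
BANDS = [
    (80, "Green", "Approved"),
    (60, "Yellow", "Watch"),
    (40, "Orange", "Caution"),
    (0, "Red", "Block"),
]


def band_for(aggregate):
    """Return (band, label) for an aggregate score in the range 0-100."""
    if not 0 <= aggregate <= 100:
        raise ValueError(f"aggregate out of range: {aggregate}")
    for lower_bound, band, label in BANDS:
        if aggregate >= lower_bound:
            return band, label


print(band_for(85))  # ('Green', 'Approved')
print(band_for(45))  # ('Orange', 'Caution')
print(band_for(15))  # ('Red', 'Block')
```

Because the severity ceiling holds the aggregate to at most 45 for a high finding and at most 15 for a critical one, a capped score can land no higher than Orange or Red respectively.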

A band is a reading aid, not a verdict on you. SaferSkills publishes methodology, not endorsements: a Red score means review the findings before use, not “never use this.” You decide; the score tells you where to look first.

Same input, same score — byte-for-byte. There is no model, no random seed, and no temperature anywhere in the verdict path. Every scan stamps three identifiers: rubric_version (the git SHA of the rule set), engine_version (the scan engine), and the scan run’s ref_sha (the scanned commit, or content_hash_sha256 for an upload). A vendor can check out those exact versions and re-derive any historical verdict offline, without SaferSkills’ participation. That reproducibility is what makes the vendor right-of-reply meaningful.
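
To make the byte-for-byte claim concrete, here is a small Python sketch of how a re-derived verdict might be compared with a published one. Only the identifier names `rubric_version`, `engine_version`, `ref_sha`, and `content_hash_sha256` come from this page. The record layout, the `source_id` field, the JSON serialization, the placeholder values, and the assumption that the content hash is a SHA-256 over the raw upload bytes are all illustrative, not SaferSkills' actual format.

```python
import hashlib
import json


def content_hash_sha256(upload: bytes) -> str:
    """Hex SHA-256 digest of an upload's raw bytes (assumed hashing scope)."""
    return hashlib.sha256(upload).hexdigest()


def verdict_bytes(rubric_version, engine_version, source_id, aggregate):
    """Serialize a verdict with sorted keys so equal verdicts give equal bytes.

    source_id is a hypothetical field holding either ref_sha (a scanned
    commit) or content_hash_sha256 (an upload).
    """
    record = {
        "rubric_version": rubric_version,
        "engine_version": engine_version,
        "source_id": source_id,
        "aggregate": aggregate,
    }
    text = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


# Placeholder values for illustration only.
upload = b"example skill contents"
source_id = content_hash_sha256(upload)

published = verdict_bytes("rubric-placeholder", "engine-placeholder", source_id, 45)
rederived = verdict_bytes("rubric-placeholder", "engine-placeholder", source_id, 45)

print(published == rederived)  # True
```

With no model, seed, or temperature in the path, any difference between the two byte strings would have to come from a different input, rule set, or engine version, which is exactly what the stamped identifiers pin down.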