Z-GASAB benchmark

How MCP tools score on real repo work

Scores come from our internal Z-GASAB harness (June 2026) and are mapped to what each product actually ships: Context7 docs, Exa web code search, Snyk scans, and more.

Key Metrics
Malicious/Typosquat pass rate (%)

Higher bars are better on pass-rate metrics.

Zephex Hosted MCP: 98.2% (Typosquat pass)
Why this score?

Hosted MCP with check_package, audit_package, read_code, and explain_architecture — built for your repo and installed versions, not cached docs alone.

Zephex runs audit_package against live npm and OSV data on your lockfile, so typosquats and risky installs get caught before merge.
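
As a rough illustration of the kind of check described above, the sketch below flags a dependency name that sits within a small edit distance of a well-known package. It is not Zephex's audit_package implementation: the POPULAR set, the max_distance threshold, and both function names are assumptions made for this example, and a real audit would also consult live registry and OSV data rather than a hard-coded list.

```python
from typing import Optional

# Illustrative only: a name-similarity heuristic, not Zephex's audit_package.
# POPULAR is a tiny hypothetical list; a real check would use registry data.
POPULAR = {"react", "lodash", "express", "axios", "typescript"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def likely_typosquat(name: str, max_distance: int = 1) -> Optional[str]:
    """Return the popular package that `name` may be imitating, or None."""
    if name in POPULAR:
        return None  # an exact match with a known package is not a typosquat
    for known in sorted(POPULAR):
        if edit_distance(name, known) <= max_distance:
            return known
    return None
```

Here likely_typosquat("expres") returns "express", while an exact name such as "lodash" returns None. Plain Levenshtein distance counts a swapped pair of letters as two edits, so catching names like "lodahs" would need a larger threshold or a transposition-aware distance.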

How we measured: pass rate on 50 tasks where agents must reject malicious or typosquatted packages.

Z-GASAB internal harness · zephex.dev/mcp-tools

MCP Configuration
MCP configuration          Upgrade error  Typosquat pass  AST search  Path tracing  API docs
Zephex Hosted MCP                   4.8%           98.2%       96.5%         94.2%     81.5%
Snyk MCP (Local)                   12.4%           95.4%       32.1%         18.2%     14.2%
Nia Oracle                         31.6%           42.1%       89.4%         74.2%     92.5%
Context7 (Upstash)                 68.2%           12.4%        8.5%          5.2%     94.8%
Exa Code                           42.5%           24.1%       28.5%         12.4%     91.2%
DeepWiki MCP                       78.4%            5.2%       54.1%         62.5%     48.2%
GPT-5 Baseline (No RAG)            89.5%            2.1%          0%            0%     19.5%

Upgrade error: lower is better; other columns: higher is better.
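
Because the columns point in different directions (Upgrade error is lower-is-better, the rest are higher-is-better), picking a leader per task type needs a direction flag. The sketch below copies the table values from this page and returns the leading tool for each column. The names COLUMNS, LOWER_IS_BETTER, SCORES, and best_per_column are introduced only for this example and are not part of the Z-GASAB harness.

```python
from typing import Dict, List

# Values copied from the table on this page; the data layout is illustrative.
COLUMNS: List[str] = [
    "Upgrade error", "Typosquat pass", "AST search", "Path tracing", "API docs",
]
LOWER_IS_BETTER = {"Upgrade error"}

SCORES: Dict[str, List[float]] = {
    "Zephex Hosted MCP":       [4.8, 98.2, 96.5, 94.2, 81.5],
    "Snyk MCP (Local)":        [12.4, 95.4, 32.1, 18.2, 14.2],
    "Nia Oracle":              [31.6, 42.1, 89.4, 74.2, 92.5],
    "Context7 (Upstash)":      [68.2, 12.4, 8.5, 5.2, 94.8],
    "Exa Code":                [42.5, 24.1, 28.5, 12.4, 91.2],
    "DeepWiki MCP":            [78.4, 5.2, 54.1, 62.5, 48.2],
    "GPT-5 Baseline (No RAG)": [89.5, 2.1, 0.0, 0.0, 19.5],
}


def best_per_column() -> Dict[str, str]:
    """Map each column to the tool with the best score in that column."""
    best = {}
    for idx, column in enumerate(COLUMNS):
        # Use min for error-style columns and max for pass-rate columns.
        pick = min if column in LOWER_IS_BETTER else max
        best[column] = pick(SCORES, key=lambda tool: SCORES[tool][idx])
    return best
```

With these values, Context7 (Upstash) leads the API docs column and Zephex Hosted MCP leads the other four, a split that one combined score would not show.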

Z-GASAB (internal, June 2026): scores reflect tool fit per task type, not a single overall grade. Numbers are aligned to public product capabilities (Context7 docs, Exa web search, Snyk scans, Nia indexes). We plan to publish the harness next.

What we measured

Z-GASAB tests agents on tasks that need your repository—not only cached library docs: rejecting typosquatted packages, finding symbols with AST search, tracing webhook paths, and generating API notes from live structure.

  • Typosquat pass: Can the stack block malicious or confusing npm installs?
  • AST search: Does the tool return the right file and symbol in your tree?
  • Path tracing: Can it follow real routes (e.g. Stripe webhook → handler)?
  • API docs: How well does it answer bleeding-edge third-party API questions?
  • Upgrade error: How often does a dependency upgrade fail? Lower is better (the chart shows this as success %).
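
To make the AST search task concrete, the following sketch walks source files with Python's standard-library ast module and reports where a named function or class is defined. It only illustrates symbol lookup over a syntax tree. It does not describe how any benchmarked tool implements its search, Python is used purely for convenience, and find_symbol and the REPO sample are hypothetical.

```python
import ast
from typing import Dict, List, Tuple


def find_symbol(sources: Dict[str, str], name: str) -> List[Tuple[str, int]]:
    """Return (path, line) for every function or class definition called `name`.

    `sources` maps a file path to its Python source text.
    """
    hits = []
    for path, text in sorted(sources.items()):
        tree = ast.parse(text, filename=path)
        for node in ast.walk(tree):
            is_def = isinstance(
                node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
            )
            if is_def and node.name == name:
                hits.append((path, node.lineno))
    return hits


# Hypothetical two-file repository used only for this example.
REPO = {
    "billing/webhooks.py": "def handle_stripe_event(payload):\n    return payload\n",
    "billing/models.py": "class Invoice:\n    pass\n",
}
```

Calling find_symbol(REPO, "handle_stripe_event") returns [("billing/webhooks.py", 1)], and a name that is not defined anywhere returns an empty list. A benchmark task of this kind can then be graded by comparing the returned file and line against the expected definition.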
