PAI Lab public benchmark

Public agent repositories, measured against visible PSF evidence.

PAI scans public GitHub metadata and file paths for signs of production AI discipline: evals, output schemas, observability, deployment gates, human oversight, security policy, and provider resilience. This is evidence coverage, not certification.

Repositories: 20
Eval evidence: 0
Human oversight: 5
Observability: 13
Evidence coverage table

Recently active public AI agent repositories

Projects are discovered through GitHub repository search, then scanned for visible PSF-aligned evidence in their public file tree. Higher coverage means more evidence was visible to the scanner, not that PAI has certified or endorsed the project.
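
As a rough illustration of what path-based scanning means, the sketch below groups file paths under evidence categories by keyword. The category names come from the table on this page; the keyword patterns and the `visible_evidence` helper are assumptions made for illustration, not the scanner's actual rules.

```python
import re

# Hypothetical keyword rules. The real scanner's patterns are not published
# on this page; these are guesses based on the paths shown in the table.
RULES = {
    "AI observability instrumentation": re.compile(
        r"telemetry|observability|trace|monitoring", re.I
    ),
    "Human approval gates": re.compile(r"approval", re.I),
    "Security policy and secret hygiene": re.compile(
        r"security\.md|dependabot\.yml|secret", re.I
    ),
}


def visible_evidence(paths, max_per_category=3):
    """Group file paths under every category whose pattern they match,
    keeping at most max_per_category example paths per category."""
    found = {}
    for path in paths:
        for category, pattern in RULES.items():
            if pattern.search(path):
                examples = found.setdefault(category, [])
                if len(examples) < max_per_category:
                    examples.append(path)
    return found


if __name__ == "__main__":
    sample = [
        "SECURITY.md",
        "src/telemetry/stats.ts",
        "docs/adr/0006-approval-gate-design.md",
        "README.md",
    ]
    for category, examples in visible_evidence(sample).items():
        print(category + ": " + " | ".join(examples))
```

Matching on paths alone cannot tell whether a file is actually used in production, which is why the results are described as evidence coverage rather than certification.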

GitHub public repository search
Repository | Coverage | Grade | Visible evidence
Oolab-labs/patchwork-os

personal AI runtime, local-first. MCP bridge giving Claude Code 170+ tools (LSP, debugger, terminal, git) inside VS Code, Cursor, Windsurf, or JetBrains. Optional Patchwork layer adds YAML recipes, an approval queue, and an oversight dashboard. Your models, your machine, your policy.

16 stars | TypeScript | Updated May 13, 2026 | ai-agent, anthropic, approval-queue
Coverage: 79%
Grade: A
D1 2/2 | D2 1/2 | D3 2/2 | D4 2/2 | D5 2/2 | D6 1/2 | D7 1/2 | D8 1/2
AI observability instrumentation: dashboard/src/app/traces | dashboard/src/app/traces/error.tsx | dashboard/src/app/traces/layout.tsx
Human approval gates: docs/adr/0006-approval-gate-design.md | src/__tests__/approvalGate.e2e.test.ts | src/__tests__/approvalQueue.test.ts

DevSwat-ResonantGenesis/RG_IDE

Resonant IDE — AI-native code editor built on VS Code OSS. Agentic loop, 59 local tools, 11 AI providers + BYOK + Ollama, AST Code Visualizer, Hash Sphere memory, DSID blockchain identity.

10 stars | TypeScript | Updated May 13, 2026 | ai, ai-agent, code-editor
Coverage: 78%
Grade: A
D1 2/2 | D2 2/2 | D3 2/2 | D4 1/2 | D5 2/2 | D6 0/2 | D7 1/2 | D8 2/2
AI observability instrumentation: .github/instructions/telemetry.instructions.md | .vscode/extensions/vscode-selfhost-test-provider/src/stackTraceParser.ts | build/azure-pipelines/common/extract-telemetry.ts
Security policy and secret hygiene: .github/dependabot.yml | SECURITY.md | extensions/microsoft-authentication/src/betterSecretStorage.ts

vm0-ai/vm0

Zero, your trustworthy AI teammate for real work.

1,109 stars | TypeScript | Updated May 13, 2026 | agentic-workflow, ai-agent, ai-runtime
Coverage: 77%
Grade: A
D1 2/2 | D2 1/2 | D3 2/2 | D4 2/2 | D5 2/2 | D6 0/2 | D7 1/2 | D8 2/2
AI observability instrumentation: ansible/playbooks/provision-monitoring.yml | crates/guest-agent/src/telemetry.rs | crates/guest-common/src/telemetry.rs
Security policy and secret hygiene: .github/dependabot.yml | SECURITY.md | e2e/tests/03-runner/t29-zero-secret.bats

matevip/mateclaw

🤖 MateClaw — Your second brain with Multi-Agent Orchestration, MCP Protocol, Skills & Memory, Dream, and Multi-Channel Support. Built on Spring AI Alibaba.

454 stars | Java | Updated May 13, 2026 | agent, ai-agent, dingtalk-robot
Coverage: 65%
Grade: A
D1 1/2 | D2 2/2 | D3 1/2 | D4 1/2 | D5 2/2 | D6 1/2 | D7 1/2 | D8 1/2
AI observability instrumentation: mateclaw-server/src/main/resources/skills/popular-web-designs/templates/sentry.md
Human approval gates: mateclaw-server/src/main/java/vip/mate/approval/ApprovalWorkflowService.java | mateclaw-server/src/main/java/vip/mate/approval/event/WorkflowApprovalResolvedEvent.java | mateclaw-server/src/main/java/vip/mate/workflow/runtime/ApprovalResumeBridge.java

HankHuang0516/EClaw

E-Claw - OpenClaw Channel for agent-to-agent communication

5 stars | JavaScript | Updated May 13, 2026 | ai-agent, android, electronic-pet
Coverage: 63%
Grade: A
D1 2/2 | D2 0/2 | D3 1/2 | D4 1/2 | D5 2/2 | D6 1/2 | D7 1/2 | D8 1/2
AI observability instrumentation: app/src/main/java/com/hank/clawlive/data/remote/TelemetryHelper.kt | app/src/main/java/com/hank/clawlive/data/remote/TelemetryInterceptor.kt | backend/device-telemetry.js
Human approval gates: .github/workflows/railway-preview-cleanup.yml

bug-ops/zeph

Memory-first Rust AI agent for long-running work. Temporal graph memory, self-learning skills, multi-model cascade routing. Hybrid inference: Ollama · Claude · Gemini · OpenAI · GGUF · TEE. MCP + ACP + A2A. Sub-agents. One binary.

32 stars | Rust | Updated May 13, 2026 | a2a, ai-agent, candle
Coverage: 58%
Grade: A
D1 2/2 | D2 1/2 | D3 1/2 | D4 1/2 | D5 1/2 | D6 0/2 | D7 2/2 | D8 1/2
AI observability instrumentation: book/src/advanced/observability.md | crates/zeph-config/src/telemetry.rs | crates/zeph-core/src/debug_dump/trace.rs
Security policy and secret hygiene: .zeph/skills/rust-agent-handoff/references/security.md | SECURITY.md | book/src/reference/security.md

hrygo/hotplex

HotPlex — Unified access layer for AI Coding Agent.

8 stars | Go | Updated May 13, 2026 | aep, ai-agent, claude-code
Coverage: 53%
Grade: A
D1 2/2 | D2 1/2 | D3 0/2 | D4 1/2 | D5 2/2 | D6 1/2 | D7 1/2 | D8 0/2
AI observability instrumentation: configs/monitoring | configs/monitoring/alerts.yml | configs/monitoring/grafana
Human approval gates: docs/archive/specs/Review-Gateway-Async-Init.md

Gitlawb/openclaude

runs anywhere. uses anything

26,480 stars | TypeScript | Updated May 13, 2026 | ai, ai-agent, ai-tools
Coverage: 52%
Grade: A
D1 1/2 | D2 2/2 | D3 1/2 | D4 1/2 | D5 1/2 | D6 0/2 | D7 1/2 | D8 1/2
AI observability instrumentation: scripts/no-telemetry-growthbook-stub.test.ts | scripts/no-telemetry-plugin.ts | src/commands/ant-trace
Security policy and secret hygiene: SECURITY.md | src/bridge/bridgePermissionCallbacks.ts | src/bridge/workSecret.test.ts

thenamespace/namera

Namera is a programmable wallet layer that enables agents to securely interact with smart wallets using scoped access and defined execution rules.

5 stars | TypeScript | Updated May 13, 2026 | ai, ai-agent, ai-agent-tools
Coverage: 46%
Grade: B
D1 1/2 | D2 1/2 | D3 2/2 | D4 0/2 | D5 2/2 | D6 0/2 | D7 1/2 | D8 0/2
Security policy and secret hygiene: apps/docs/content/blog/agent-need-permissions-not-private-keys.mdx
Schema or contract validation: apps/cli/src/commands/schema/index.ts | apps/cli/src/schema/chain.ts | apps/cli/src/schema/common.ts

esengine/DeepSeek-Reasonix

DeepSeek-native AI coding agent for your terminal. Engineered around prefix-cache stability — leave it running.

1,607 stars | TypeScript | Updated May 13, 2026 | agent, agent-framework, ai-agent
Coverage: 32%
Grade: B
D1 1/2 | D2 0/2 | D3 0/2 | D4 2/2 | D5 1/2 | D6 0/2 | D7 1/2 | D8 0/2
AI observability instrumentation: src/cli/ui/slash/handlers/observability.ts | src/telemetry | src/telemetry/stats.ts
Security policy and secret hygiene: SECURITY.md | dashboard/src/panels/permissions.ts | src/cli/ui/slash/handlers/permissions.ts

alibaba/obz-cli

Multi-backend observability CLI for metrics, logs, and traces — unified interface, AI-Agent friendly

15 stars | Rust | Updated May 13, 2026 | ai-agent, cli, logs
Coverage: 29%
Grade: B
D1 1/2 | D2 0/2 | D3 0/2 | D4 1/2 | D5 1/2 | D6 0/2 | D7 1/2 | D8 0/2
AI observability instrumentation: crates/core/src/model/trace.rs | crates/providers/src/victoriatraces | crates/providers/src/victoriatraces/convert.rs
Security policy and secret hygiene: .github/dependabot.yml | SECURITY.md

Lifecycle-Innovations-Limited/claude-ops

Business operating system for Claude Code — 25 skills, 13 agents, smart daemon. Unified inbox (WhatsApp/Email/Slack/Telegram), autonomous PR merge, full-AWS monitoring, revenue (Stripe+RevenueCat), e-commerce (Shopify), marketing (Klaviyo/Meta/GA4), voice (Bland/ElevenLabs), APM (Datadog/NewRelic/OTEL), YOLO mode.

8 stars | Shell | Updated May 13, 2026 | agent-teams, ai-agent, ai-automation
Coverage: 27%
Grade: C
D1 1/2 | D2 0/2 | D3 0/2 | D4 0/2 | D5 2/2 | D6 0/2 | D7 1/2 | D8 0/2
Security policy and secret hygiene: .github/dependabot.yml | SECURITY.md | claude-ops/bin/ops-prevent-secret-commit
Release and deployment gates: .github/workflows/ci.yml | .github/workflows/release.yml | claude-ops/.github/workflows/cross-os.yml

visual-req/visual-spec

Give your short business requirement, follow instructions, answer a few questions, get your specification, codes, tests.

28 stars | HTML | Updated May 13, 2026 | ai, ai-agent, requirements-engineering
Coverage: 23%
Grade: B
D1 1/2 | D2 0/2 | D3 1/2 | D4 0/2 | D5 1/2 | D6 0/2 | D7 1/2 | D8 0/2
Security policy and secret hygiene: .trae/skills/visual-spec/prompts/vspec_detail/data_permission.md | .trae/skills/visual-spec/prompts/vspec_detail/rbac.md | docs/en-US/tools/access-control-rbac.md
Prompt, policy, or model versioning: .trae/skills/visual-spec/prompts/harness | .trae/skills/visual-spec/prompts/harness/post_append_test_coverage_check.md | .trae/skills/visual-spec/prompts/harness/post_impl_verify.md

tony1223/better-agent-terminal

Multi-workspace terminal aggregator with Claude Code AI integration

387 stars | TypeScript | Updated May 13, 2026 | ai-agent, anthropic, claude
Coverage: 22%
Grade: C
D1 1/2 | D2 0/2 | D3 0/2 | D4 0/2 | D5 1/2 | D6 0/2 | D7 1/2 | D8 0/2
Security policy and secret hygiene: node-sidecar/src/handlers/claude-permission.mjs | node-sidecar/src/lib/remote-secrets.mjs
Release and deployment gates: .github/workflows/release-tag-dispatch.yml | .github/workflows/release.yml

ozgurcd/gograph

A fast, local-only CLI tool to generate repository structures and improve IDE context awareness for Go codebases.

64 stars | Go | Updated May 13, 2026 | agentic-coding, ai-agent, ai-coding-assistant
Coverage: 22%
Grade: C
D1 0/2 | D2 0/2 | D3 0/2 | D4 1/2 | D5 1/2 | D6 0/2 | D7 1/2 | D8 0/2
AI observability instrumentation: internal/search/trace.go
Security policy and secret hygiene: SECURITY.md

SweetSophia/noosphere

A universal memory and wiki knowledge layer for AI agents — structured enough for automation, readable enough for humans.

27 stars | TypeScript | Updated May 13, 2026 | agentic-memory, agentic-rag, agentic-workflow
Coverage: 22%
Grade: C
D1 1/2 | D2 0/2 | D3 0/2 | D4 0/2 | D5 1/2 | D6 1/2 | D7 0/2 | D8 0/2
Human approval gates: .github/workflows/autoreview.yml
Release and deployment gates: .github/workflows/autoreview.yml | .github/workflows/docker-publish.yml | .github/workflows/npm-publish.yml

voidly-ai/voidly-pay

Off-chain credit ledger + hire marketplace for AI agents. Ed25519-signed envelopes, atomic settlement, hire-and-release escrow. https://voidly.ai/pay

9 stars | JavaScript | Updated May 13, 2026 | a2a, agent-payments, agent-to-agent
Coverage: 22%
Grade: C
D1 0/2 | D2 0/2 | D3 0/2 | D4 0/2 | D5 1/2 | D6 0/2 | D7 1/2 | D8 1/2
Security policy and secret hygiene: SECURITY.md
Provider fallback or degraded mode: adapters/openai-compat | adapters/openai-compat/README.md | adapters/openai-compat/package.json

Xquik-dev/tweetclaw

Post tweets, reply, like, retweet, follow, DM and more from OpenClaw through structured Xquik endpoints. 99 agent-callable endpoints via Xquik.

37 stars | TypeScript | Updated May 13, 2026 | ai-agent, automation, clawhub
Coverage: 20%
Grade: C
D1 0/2 | D2 0/2 | D3 0/2 | D4 0/2 | D5 2/2 | D6 0/2 | D7 1/2 | D8 0/2
Security policy and secret hygiene: .github/SECURITY.md
Release and deployment gates: .github/workflows/context7-refresh.yml | .github/workflows/publish.yml

skalesapp/skales

Local-first AI desktop agent for Windows, macOS, Linux & Android. Codework, multi-agent teams, desktop automation, 15+ AI providers. No Docker. No terminal. AI Companion. Agent Skills (SKILL.md). Migration-Importer, BYOK, from 6 to 60+. Recurring Autonomous AI Agent Tasks.

929 stars | TypeScript | Updated May 13, 2026 | agentic-ai, ai-agent, ai-assistant
Coverage: 15%
Grade: C
D1 0/2 | D2 0/2 | D3 0/2 | D4 1/2 | D5 0/2 | D6 0/2 | D7 1/2 | D8 0/2
AI observability instrumentation: apps/web/src/app/api/telemetry | apps/web/src/app/api/telemetry/ping | apps/web/src/app/api/telemetry/ping/route.ts
Security policy and secret hygiene: apps/web/src/actions/secrets.ts

JokerJohn/openclaw-autotrader

A 30-day public U.S. stock challenge: follow a 5000 HKD 🦞 claw through live market days.

36 stars | JavaScript | Updated May 13, 2026 | ai-agent, algorithmic-trading, autotrader
Coverage: 12%
Grade: C
D1 0/2 | D2 1/2 | D3 0/2 | D4 1/2 | D5 0/2 | D6 0/2 | D7 0/2 | D8 0/2
Schema or contract validation: xhs-agent/schemas/public-snapshot.schema.json | xhs-agent/schemas/xhs-post-package.schema.json
Incident or drift evidence: docs/incidents | docs/incidents/.gitkeep

Unauthenticated GitHub runs are capped at 20 repositories. Set GITHUB_TOKEN to benchmark up to 100.

Where the repository list comes from

The benchmark uses GitHub's public repository search endpoint and rotates focused queries for AI agent, agentic AI, LLM agent, and MCP server repositories. The run de-duplicates repositories, excludes archived projects and forks when GitHub returns those flags, and sorts the published table by visible PSF evidence coverage.

  • topic:ai-agent archived:false fork:false stars:>=5
  • topic:agentic-ai archived:false fork:false stars:>=5
  • topic:llm-agent archived:false fork:false stars:>=5
  • topic:mcp-server archived:false fork:false stars:>=5
  • "ai agent" in:name,description,readme archived:false fork:false stars:>=5
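
The discovery step can be sketched roughly as follows. The query strings are the ones listed above and the endpoint is GitHub's public repository search; the `search_url` and `merge_results` helpers, the `sort=updated` ordering, and the page size are assumptions for illustration, not the benchmark's actual code.

```python
from urllib.parse import urlencode

# GitHub's public repository search endpoint.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"

# The focused queries listed on this page.
QUERIES = [
    "topic:ai-agent archived:false fork:false stars:>=5",
    "topic:agentic-ai archived:false fork:false stars:>=5",
    "topic:llm-agent archived:false fork:false stars:>=5",
    "topic:mcp-server archived:false fork:false stars:>=5",
    '"ai agent" in:name,description,readme archived:false fork:false stars:>=5',
]


def search_url(query, per_page=20):
    """Build a search request URL. Sorting by most recently updated is an
    assumption; the page does not state which sort order the run uses."""
    params = {"q": query, "sort": "updated", "order": "desc", "per_page": per_page}
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def merge_results(pages):
    """De-duplicate repositories by full_name across query results, skipping
    archived projects and forks when those flags are present."""
    merged = {}
    for items in pages:
        for repo in items:
            if repo.get("archived") or repo.get("fork"):
                continue
            merged.setdefault(repo["full_name"], repo)
    return list(merged.values())


if __name__ == "__main__":
    for query in QUERIES:
        print(search_url(query))
```

Each element of `pages` stands for the `items` array of one search response, so the same repository returned by two queries is kept only once.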

How to benchmark 50 or 100 repositories

Add a server-side GITHUB_TOKEN or GH_TOKEN, then request /api/agent-readiness/benchmark?limit=100. Without a token, the public route is capped at 20 repositories to respect GitHub rate limits. With or without a token, the scanner reads only public repository metadata and file paths.

  • Use the benchmark page for a live public sample.
  • Use the API route for scheduled monthly reports.
  • Use opt-in reports and badges for maintainers who want a public profile.
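
A minimal sketch of the limit behaviour described above, assuming the cap depends only on whether a token is configured. The caps (20 and 100), the environment variable names, and the route path come from this page; the function names and the `https://example.com` base URL are placeholders, not the service's real implementation or host.

```python
import os
from urllib.parse import urlencode

UNAUTHENTICATED_CAP = 20   # cap stated on this page for runs without a token
AUTHENTICATED_CAP = 100    # upper bound stated on this page when a token is set


def effective_limit(requested, env=os.environ):
    """Clamp the requested repository count to the cap for the current setup."""
    has_token = bool(env.get("GITHUB_TOKEN") or env.get("GH_TOKEN"))
    cap = AUTHENTICATED_CAP if has_token else UNAUTHENTICATED_CAP
    return min(requested, cap)


def benchmark_url(base_url, limit):
    """Build the benchmark API request URL for a given deployment base URL."""
    return base_url.rstrip("/") + "/api/agent-readiness/benchmark?" + urlencode({"limit": limit})


if __name__ == "__main__":
    # Placeholder host; substitute the deployment that serves the route.
    print(benchmark_url("https://example.com", effective_limit(100)))
```

Passing a plain dictionary as `env` makes the cap easy to check without touching real credentials, for example `effective_limit(100, {})` versus `effective_limit(100, {"GH_TOKEN": "..."})`.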