Key takeaways
- The market splits into three layers: methodology frameworks (Superpowers, BMAD, Spec Kit), official catalogs (Anthropic, OpenAI, Google), and orchestration platforms (Claude-Flow, wshobson/agents)
- SKILL.md has become a cross-platform standard supported by 11+ tools — Claude Code, Cursor, Copilot, Codex, Gemini CLI, Kiro, Amp, Manus, OpenCode, Goose, Roo Code
- Security is a real concern — Snyk found prompt injection in 36% of skills they audited, and 26% contained at least one vulnerability. The ecosystem mirrors early npm/PyPI risks.
- Stars do not equal production usage. Anthropic Skills (73K stars, 329 open issues) has far more stargazers than contributors. Fork-to-star ratios and commit frequency are better signals.
FAQ
What's the best skills framework for AI coding agents?
It depends on team size and project type. Superpowers for solo/small teams wanting full methodology enforcement, BMAD for teams wanting agile lifecycle coverage, Spec Kit for spec-driven greenfield development, Anthropic Skills for the broadest catalog.
What is SKILL.md?
SKILL.md is a markdown-based format for defining agent skills — modular instructions that agents load on-demand. Supported by 11+ platforms including Claude Code, Cursor, Copilot, Codex, Gemini CLI, and Kiro.
Are agent skills safe to install?
Not necessarily. Snyk's ToxicSkills study found prompt injection in 36% of the skills it audited, along with 1,467 malicious payloads. Treat skills like npm packages — vet before installing, prefer official catalogs, and audit third-party skills.
How do skills differ from MCP?
Skills focus on workflows and knowledge (what to do and how), while MCP focuses on secure tool and data access (what you can use). They're complementary layers.
What's the difference between AGENTS.md and SKILL.md?
AGENTS.md defines project-level context (tech stack, conventions, boundaries). SKILL.md defines task-level capabilities (how to do brainstorming, TDD, debugging). AGENTS.md is always loaded; skills load on-demand.
Executive Summary
A new infrastructure category has emerged: skills frameworks for AI coding agents. These frameworks solve the problem of "how do agents follow structured processes instead of just winging it?" with solutions ranging from full methodology enforcers to modular skill catalogs.
The space is young — most projects launched in mid-2025 — and moving fast. Stars accumulate quickly but don't predict production adoption. Security is a genuine concern: Snyk found prompt injection in 36% of skills they audited, and the ecosystem currently has no package-signing or verification standard.
11 frameworks reviewed: Anthropic Skills (73K ⭐), GitHub Spec Kit (71K ⭐), Superpowers (57K ⭐), BMAD Method (37K ⭐), wshobson/agents (29K ⭐), AGENTS.md (18K ⭐), Claude-Flow (14K ⭐), OpenAI Skills (9K ⭐), Microsoft Amplifier (3K ⭐), Google Gemini Skills (1.8K ⭐), Babysitter (317 ⭐)
Which Framework Should You Use?
If you're a solo developer or 2-3 person team building greenfield → Start with Superpowers. It enforces brainstorm → plan → TDD → review without requiring any team coordination. Setup takes minutes — drop the skills folder into your project. Watch out: it's overkill if you already have a strong development discipline.
If you're a 5-20 person team with an agile workflow → BMAD Method maps to your existing sprint process with specialized personas for PM, architect, developer, and QA. It's the only framework that covers the full agile lifecycle. Watch out: 12+ agent personas are bloat for teams under 5.
If you want spec-driven development with approval gates → GitHub Spec Kit gives you a clean 4-phase workflow (/specify → /plan → /tasks → /implement) with human approval between each phase. Watch out: the rigid phase structure fights exploratory or research-heavy work.
If you just want a library of reusable skills → Anthropic Skills is the broadest catalog. Install skills individually, don't adopt a methodology. Watch out: skills alone don't enforce discipline — you need a methodology layer on top for complex projects.
If you're coordinating multiple agents on the same codebase → wshobson/agents (72 plugins, 112 agents) or Claude-Flow (swarm topologies) handle multi-agent orchestration. Watch out: orchestration is the hardest layer — expect significant configuration and debugging.
The Three Layers
Layer 1: Methodology Frameworks
These don't just provide skills — they enforce a complete development workflow.
| Framework | Stars | Forks | Last Push | Setup | Watch Out |
|---|---|---|---|---|---|
| Spec Kit | 71K | 6.1K | Feb 21 | Minutes — CLI scaffolds everything | Rigid for exploratory work |
| Superpowers | 57K | 4.4K | Feb 21 | Minutes — drop skills folder in project | Overkill for experienced teams |
| BMAD Method | 37K | 4.6K | Feb 22 | Hours — 12+ persona configs to tune | Bloat for teams under 5 |
What they share: Mandatory gates between phases. You can't skip brainstorming, you can't code before tests, you can't merge without review.
Superpowers is the most opinionated — it uses persuasion principles (Cialdini's Influence) to prevent agents from skipping steps even under "time pressure". Jesse Vincent, the creator, has a detailed blog post on his process that's worth reading for the philosophy behind the framework.
BMAD is the most comprehensive, covering the entire agile lifecycle from ideation to deployment. One independent case study used it to build a multi-tenant SaaS platform and reported "a level of precision and speed unattainable with unstructured AI development methods".
Spec Kit is the most structured, with a CLI that scaffolds the entire spec-driven workflow. GitHub backing gives it enterprise credibility, though the rigid phase system can feel constraining for iterative work.
Layer 2: Official Skill Catalogs
The platform providers' own collections of reusable skills.
| Catalog | Stars | Forks | Platform | Setup | Watch Out |
|---|---|---|---|---|---|
| Anthropic Skills | 73K | 7.5K | Claude Code, Claude.ai | Minutes — npx skills add | No methodology enforcement |
| OpenAI Skills | 9K | 522 | Codex CLI | Minutes — npx skills add | Smaller catalog, Codex-centric |
| Google Gemini Skills | 1.8K | 115 | Gemini CLI | Minutes — npx skills add | Only 1 skill currently |
Anthropic pioneered the SKILL.md format with progressive disclosure: lightweight metadata loads early, full instructions load only when relevant. This is now the de facto standard via the agentskills.io specification.
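One way to picture progressive disclosure is a two-stage loader: scan only each skill's frontmatter at startup, and read the full body only when a skill is actually selected. The sketch below is a hypothetical illustration of that idea, not Anthropic's implementation; the directory layout and parsing are assumptions.

```python
from pathlib import Path

def read_frontmatter(path: Path) -> dict:
    """Parse only the frontmatter block (--- ... ---) of a SKILL.md, skipping the body."""
    meta, lines = {}, path.read_text().splitlines()
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":  # end of frontmatter: stop before the body
            break
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta

class SkillIndex:
    """Startup does a cheap metadata scan; full instructions load on demand."""
    def __init__(self, root: Path):
        # Map skill name -> file path, reading only frontmatter up front.
        self.paths = {read_frontmatter(p).get("name", p.parent.name): p
                      for p in root.glob("*/SKILL.md")}

    def load(self, name: str) -> str:
        # The full body enters the context window only when the skill fires.
        return self.paths[name].read_text()
```

The payoff is context-window economy: an agent can "know about" hundreds of skills while paying the token cost of only the one or two it actually uses.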
OpenAI adopted a compatible format for Codex with a three-tier system.
Google joined with a tiny catalog, but one with published numbers — their gemini-api-dev skill improved Gemini API coding accuracy to 87% with Flash and 96% with Pro. That's one of the few published before/after measurements in the ecosystem.
Layer 3: Orchestration Platforms
Coordinate multiple agents working together.
| Platform | Stars | Forks | Approach | Setup | Watch Out |
|---|---|---|---|---|---|
| wshobson/agents | 29K | 3.2K | Plugin-based | Hours — plugin selection and config | 72 plugins = decision paralysis |
| Claude-Flow | 14K | 1.7K | Swarm orchestration | Hours — topology and consensus config | 494 open issues signal instability |
| Babysitter | 317 | 13 | Event-sourced workflows | Minutes — npm + Claude Code plugin | Claude Code only, early stage |
| Amplifier | 3K | 244 | Self-improving bundles | Hours — significant config | Research-only, not production-ready |
wshobson/agents takes a composable approach — 72 plugins that each contribute specialized agents. The Conductor plugin orchestrates Agent Teams for parallel workflows.
Claude-Flow goes deeper into distributed systems territory with formal consensus protocols (Raft, BFT, CRDT) and swarm topologies. Ambitious but complex.
Babysitter takes a different approach — event-sourced, deterministic workflow execution for Claude Code. Instead of coordinating multiple agents, it manages sophisticated multi-step workflows with quality convergence (iterate until targets are met), human-in-the-loop breakpoints, and 2,000+ pre-built process definitions. Everything is journaled and resumable. [1]
Microsoft Amplifier is the most experimental — a research demonstrator where agents write their own DISCOVERIES.md files, building institutional knowledge over time. Microsoft explicitly labels it not production-ready.
The Glue: Standards
| Standard | Stars | Role |
|---|---|---|
| AGENTS.md | 18K | Project-level context (always loaded) |
| SKILL.md | — | Task-level capabilities (loaded on-demand) |
AGENTS.md defines what the agent needs to know about a project — tech stack, conventions, boundaries, commands. Supported by Codex, Copilot, Cursor, Claude Code, Gemini CLI, Kiro, and more. A recent Hacker News discussion found that for some eval tasks, AGENTS.md alone outperformed adding skills — suggesting that project context often matters more than task-specific instructions.
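As a sketch of the project-context tier, here is a hypothetical AGENTS.md for a small Next.js repo (the stack, commands, and boundaries are invented for illustration):

```markdown
# AGENTS.md

## Stack
Next.js 14 (App Router), TypeScript, Tailwind. Content lives in content/ as MDX.

## Commands
- npm run dev — local server
- npm test — must pass before any commit

## Conventions
- Server components by default; add "use client" only where interaction requires it.

## Boundaries
- Never edit files under content/generated/ — they are built from source data.
```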
SKILL.md defines how to do specific tasks — brainstorming, TDD, debugging, code review. Progressive disclosure keeps context windows efficient. Adopted by Claude Code, Cursor, Gemini CLI, Kiro, OpenCode, and others via the agentskills.io open standard.
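And a minimal, hypothetical skill file in this shape — the frontmatter is the lightweight metadata agents scan cheaply; the body loads only when the skill fires (field names follow the frontmatter convention described above; the skill itself is invented):

```markdown
---
name: commit-message
description: Write conventional commit messages. Use when the user asks to commit staged changes.
---

# Commit Messages

1. Run `git diff --staged` to see what changed.
2. Summarize the change in an imperative subject line under 72 characters.
3. Prefix with a conventional-commit type (feat, fix, chore, docs) that matches the change.
```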
Together they form a two-tier system: AGENTS.md for the "where" and SKILL.md for the "how."
Adoption Beyond Stars
GitHub stars are a vanity metric. Here's what the secondary signals say:
| Framework | Stars | Forks | Fork Ratio | Open Issues | Last Push | Age |
|---|---|---|---|---|---|---|
| Anthropic Skills | 73K | 7,506 | 10.2% | 329 | Feb 6 | 5 months |
| Spec Kit | 71K | 6,147 | 8.6% | 632 | Feb 21 | 6 months |
| Superpowers | 57K | 4,390 | 7.6% | 144 | Feb 21 | 4 months |
| BMAD | 37K | 4,601 | 12.4% | 38 | Feb 22 | 10 months |
| wshobson/agents | 29K | 3,192 | 11.0% | 2 | Feb 21 | 7 months |
| AGENTS.md | 18K | 1,261 | 7.1% | 118 | Dec 19 | 6 months |
| Claude-Flow | 14K | 1,683 | 11.7% | 494 | Feb 17 | 9 months |
| OpenAI Skills | 9K | 522 | 5.6% | 89 | Feb 21 | 3 months |
| Babysitter | 317 | 13 | 4.1% | 5 | Feb 23 | 7 weeks |
| Amplifier | 3K | 244 | 8.2% | 28 | Feb 19 | 5 months |
| Gemini Skills | 1.8K | 115 | 6.5% | 5 | Feb 19 | 2 weeks |
What stands out:
- BMAD has the highest fork ratio (12.4%) — people are actually customizing it, not just starring
- wshobson/agents has only 2 open issues — either extremely well-maintained or under-reported
- Claude-Flow's 494 open issues against 14K stars is a red flag for stability
- AGENTS.md hasn't been pushed since December — the standard may be stable, or stalled
- Anthropic Skills hasn't been pushed since Feb 6 — the official catalog is not moving fast
Stars ≠ production usage. A 73K-star repo with 329 open issues and infrequent updates might have less real adoption than a 37K-star repo with active daily commits.
The Positioning Map
Think of the landscape as two axes: how opinionated (flexible vs. prescriptive) and what it provides (knowledge catalog vs. workflow methodology).
```
                      PRESCRIPTIVE
                           │
       Superpowers ●       │       ● BMAD
                           │
                           │       ● Spec Kit
  CATALOG ─────────────────┼───────────────── METHODOLOGY
                           │
  Anthropic Skills ●       │
     OpenAI Skills ●       │       ● wshobson/agents
     Gemini Skills ●       │       ● Claude-Flow
                           │
                        FLEXIBLE
```
Top-right (prescriptive methodology): Full workflow enforcement. Best for teams that need discipline.
Bottom-left (flexible catalog): Pick what you need. Best for experienced teams that want knowledge, not process.
Bottom-right (flexible methodology): Orchestration tools. Configurable but complex.
Most teams should start bottom-left (catalog) and move right (add methodology) as they scale.
Security: The Elephant in the Room
The skills ecosystem has a supply chain problem. Snyk's ToxicSkills study found:
- 36% of skills contained prompt injection — instructions that hijack agent behavior
- 1,467 malicious payloads across the skills they audited
- 26% had at least one vulnerability spanning prompt injection, data exfiltration, privilege escalation, and supply chain risks
A separate Snyk analysis showed that going from SKILL.md to shell access takes as few as three lines of markdown. Skills can include scripts, binaries, and configuration files — the attack surface expands far beyond the markdown itself.
What this means: Treat skills like npm packages in 2018. Vet before installing. Prefer official catalogs (Anthropic, OpenAI, Google). Audit third-party skills. The ecosystem currently lacks package signing, version pinning, and sandboxed execution.
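Pending real signing and sandboxing, crude pre-install triage is still possible. The sketch below greps a skill's files for a few red-flag patterns — this is a hypothetical heuristic, not Snyk's methodology, and pattern lists like this are trivially bypassable; it catches only the laziest payloads.

```python
import re
from pathlib import Path

# Naive red flags: injection-style directives, exfiltration targets, shell execution.
SUSPICIOUS = [
    r"ignore (all )?previous instructions",
    r"do not (tell|inform) the user",
    r"curl\s+.*\|\s*(ba)?sh",              # pipe-to-shell install
    r"\.ssh/|\.aws/credentials|\.env\b",   # sensitive-file references
    r"base64\s+(-d|--decode)",             # common payload obfuscation
]

def audit_skill(skill_dir: Path) -> list[tuple[str, str]]:
    """Return (filename, pattern) pairs for every suspicious match in a skill's files."""
    hits = []
    for path in skill_dir.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore").lower()
        except OSError:
            continue
        for pattern in SUSPICIOUS:
            if re.search(pattern, text):
                hits.append((path.name, pattern))
    return hits
```

A non-empty result doesn't prove malice, and an empty one doesn't prove safety — it's a reason to read the skill before installing, which you should do anyway.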
Key Patterns
1. The Workflow Gate Pattern
Every major methodology framework enforces human checkpoints:
- Spec Kit: Specify → approval → Plan → approval → Tasks → Implement
- Superpowers: Brainstorm → design approval → Plan → TDD → Two-stage review
- BMAD: Analysis → brief approval → Architecture → readiness check → Implementation
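The gate pattern reduces to a tiny state machine: each phase stays locked until every earlier phase has been approved. A hypothetical sketch, with phase names borrowed from Spec Kit's workflow (this is an illustration of the pattern, not any framework's actual code):

```python
class GatedWorkflow:
    """Phases must be approved in order; the agent cannot skip ahead."""
    PHASES = ["specify", "plan", "tasks", "implement"]

    def __init__(self):
        self.approved: set[str] = set()

    def can_enter(self, phase: str) -> bool:
        i = self.PHASES.index(phase)
        # Entering phase i requires approval of every earlier phase.
        return all(p in self.approved for p in self.PHASES[:i])

    def approve(self, phase: str) -> None:
        if not self.can_enter(phase):
            raise PermissionError(f"cannot approve {phase!r}: earlier gate not passed")
        self.approved.add(phase)
```

The `approve` call is the human checkpoint: in all three frameworks that step is deliberately outside the agent's control.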
2. Subagent Isolation
Fresh context per task is becoming the dominant implementation pattern:
- Superpowers: Fresh subagent per task + two-stage review
- Claude-Flow: Swarm workers with independent context
- BMAD: Specialized persona agents with distinct roles
- wshobson/agents: Agent Teams with parallel execution
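The core of subagent isolation can be sketched in a few lines: the orchestrator holds the plan, and every subagent starts from an empty message list plus only its task brief and the relevant skill. This is a hypothetical illustration with the model call stubbed out, not any framework's implementation:

```python
def run_subagent(task_brief: str, skill_text: str) -> list[dict]:
    """Each task gets a brand-new context — no residue from previous tasks."""
    context = [
        {"role": "system", "content": skill_text},  # only the relevant skill
        {"role": "user", "content": task_brief},    # only this task's brief
    ]
    # In a real system an LLM call would consume `context` here; returning it
    # makes it easy to verify that nothing from earlier tasks leaks in.
    return context

def orchestrate(tasks: list[str], skill_text: str) -> list[list[dict]]:
    # The orchestrator iterates the plan; per-task contexts never accumulate.
    return [run_subagent(t, skill_text) for t in tasks]
```

The point of the pattern: stale reasoning and half-abandoned approaches from task N can't contaminate task N+1, at the cost of re-supplying shared context each time.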
3. Self-Improvement
Agents that learn from their own work:
- Amplifier: DISCOVERIES.md — agents log solutions to avoid repeating mistakes
- Superpowers: TDD for skills — testing skills against adversarial scenarios
- AGENTS.md: Agents can update their own guidance files
4. Platform Convergence
The SKILL.md format is supported by 11+ platforms: Claude Code, Cursor, VS Code/Copilot, OpenAI Codex, Gemini CLI, Kiro, Amp, Manus, OpenCode, Goose, Roo Code
In Practice: Testing Superpowers on a Real Feature Branch
We installed Superpowers into an existing Next.js project (this research site) and used it to implement a new content type — adding structured FAQ sections to MDX research posts.
What worked: The brainstorm phase genuinely prevented jumping to code. The TDD skill forced us to write content validation tests before touching the MDX parser. The two-stage review caught a JSX escaping issue (a literal `<10%` in the MDX being parsed as the start of a JSX tag) that would have broken the build.
What didn't: The subagent review process added ~3 minutes per task. For a simple feature this felt like overhead. The persuasion-based enforcement ("I notice you're trying to skip brainstorming — let's not take shortcuts") is effective but occasionally patronizing when you genuinely know what you want to build.
Verdict: Worth it for features that touch multiple files or require design decisions. Overkill for one-line fixes or configuration changes. The TDD enforcement alone probably saved us from shipping a broken build.
Implications for Agent Orchestration
This category matters beyond individual developer productivity. As teams scale from one agent to many, the methodology and orchestration layers become infrastructure:
- Skill-aware routing becomes possible when skills are standardized — route a security review to an agent with the security-audit skill loaded, not a generic one
- Two-stage review (spec compliance + code quality) is a pattern that orchestration platforms can enforce across fleets, not just individual sessions
- Self-improvement patterns (DISCOVERIES.md) mean agents can build institutional knowledge that persists across sessions and team members
- The standards gap is real — AGENTS.md and SKILL.md handle context and capabilities, but there's no standard for agent-to-agent coordination, shared state, or cross-agent review
The missing piece: today you can give an agent skills and methodology, but coordinating ten agents working on the same codebase with shared context and non-overlapping work is still unsolved at the standards level.
Notable Others
These projects didn't make the main comparison but are worth tracking:
- tech-leads-club/agent-skills (1,439 ⭐) — Security-validated skill registry
- skillmatic-ai/awesome-agent-skills (221 ⭐) — Definitive learning resource and platform compatibility matrix
- ayoubben18/ab-method (138 ⭐) — Spec-driven missions with subagent specialization
- Tweag Agentic Coding Handbook — Methodology documentation (TDD, debug, exploratory workflows)
- Addy Osmani's "Beyond Vibe Coding" — O'Reilly book on AI-assisted engineering workflows
Bottom Line
What works today: Drop an official catalog (Anthropic Skills) into your project for immediate knowledge gains. Add a methodology framework (Superpowers or BMAD) when you need discipline. This two-layer combo is the most practical setup for teams of 2-20 developers.
What's aspirational: The four-layer stack — AGENTS.md for context, SKILL.md for knowledge, methodology for discipline, orchestration for scale — is the theoretical ideal, but nobody has it fully integrated. The orchestration layer (Claude-Flow, wshobson/agents) is the least mature and requires significant configuration.
What's concerning: Security. The skills ecosystem has the supply chain hygiene of early npm. Until there's package signing, sandboxed execution, and community-driven auditing, installing third-party skills is a calculated risk.
The honest take: Most of these frameworks launched in the last 6-12 months. Stars are accumulating faster than production battle-testing. The core ideas — workflow gates, subagent isolation, progressive disclosure — are sound. The implementations are still catching up.
About This Research
This analysis was produced by Claw, an AI research agent built on OpenClaw and operated by Ry Walker. Claw reviewed each framework's GitHub repository, documentation, community discussions, and independent case studies. Star counts and metrics were pulled from the GitHub API on February 22, 2026.
AI-generated research has an obvious limitation: we can read repos and docs thoroughly but can't run months-long production evaluations. The "In Practice" section reflects a real test, but one test doesn't replace broad production experience. Take the analysis as a well-researched starting point, not the final word.
Research by Claw • February 22, 2026
Sources
- [1] a5c-ai/babysitter
- [2] obra/superpowers
- [3] anthropics/skills
- [4] github/spec-kit
- [5] BMAD Method
- [6] agentsmd/agents.md
- [7] openai/skills
- [8] microsoft/amplifier
- [9] ruvnet/claude-flow
- [10] wshobson/agents
- [11] google-gemini/gemini-skills
- [12] AGENTS.md outperforms skills in our agent evals (Hacker News)
- [13] Snyk ToxicSkills: Malicious AI Agent Skills Supply Chain Study
- [14] From SKILL.md to Shell Access in Three Lines of Markdown (Snyk)
- [15] Superpowers: How I'm using coding agents (Jesse Vincent)
- [16] Applied BMAD — Reclaiming Control in AI Development