# OpenClaw Skill Growth
An installable OpenClaw plugin that helps skills stay reliable over time through observation, diagnosis, improvement proposals, and user-approved updates.
## Why this exists
OpenClaw skills are powerful, but they are usually static while everything around them keeps changing.
A skill that worked last week can quietly degrade when:
- the environment changes
- a tool path moves
- a dependency breaks
- model behavior shifts
- task patterns evolve
- user expectations become more specific
In most systems, these failures stay invisible until someone notices that results are getting worse. Maintenance becomes reactive, manual, and expensive.
OpenClaw Skill Growth turns skills from static prompt files into maintainable system components.
## What it does
OpenClaw Skill Growth adds a safe improvement loop around your existing skills:
**Observe → Diagnose → Propose → Apply → Evaluate**
It helps you:
- index and structure your existing `SKILL.md` files
- record what happened when a skill ran
- detect repeated weak outcomes and failure patterns
- generate evidence-backed improvement proposals
- produce patch-ready changes for `SKILL.md`
- apply updates only after user approval by default
- track whether a new version actually improved outcomes
## Design principles
### 1. Skills are living components
A skill is not just a markdown file. It is a versioned capability with triggers, dependencies, constraints, execution history, and maintenance needs.
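As a rough sketch, such a component could be modeled as a typed record. The field names below are illustrative assumptions, not the plugin's actual registry schema:

```typescript
// Hypothetical shape for a registered skill; field names are illustrative.
interface SkillRecord {
  id: string;
  version: string;        // bumped on every applied change
  triggers: string[];     // conditions that select this skill
  dependencies: string[]; // tools, paths, or services the skill relies on
  constraints: string[];  // e.g. timeouts, output format requirements
  sourcePath: string;     // the SKILL.md file this record was indexed from
}

const exampleSkill: SkillRecord = {
  id: "summarize-report",
  version: "1.2.0",
  triggers: ["user asks for a report summary"],
  dependencies: ["pdf-extract tool"],
  constraints: ["respond in under 300 words"],
  sourcePath: "./skills/summarize-report/SKILL.md",
};
```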
### 2. No invisible failure
If a skill fails, degrades, times out, or requires repeated user correction, the system should be able to see it.
### 3. No blind self-modification
The plugin does not assume that every failure means a prompt should be rewritten. It collects evidence, diagnoses likely causes, and proposes bounded changes.
### 4. Human approval by default
By default, the plugin observes, analyzes, and generates modifications, but does not apply them automatically. Users stay in control.
### 5. Every change must be reversible
Any applied change should be versioned, explainable, and rollback-friendly.
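One simple way to make a file-level change reversible is to snapshot before writing. This is a minimal sketch of that idea, not the plugin's actual backup layout or naming:

```typescript
import * as fs from "fs";

// Sketch: back up a skill file before applying a patch so the change can be
// rolled back. Paths and the ".bak" naming are assumptions.
function applyWithBackup(skillPath: string, newContent: string): string {
  const backupPath = `${skillPath}.bak`;
  fs.copyFileSync(skillPath, backupPath);  // snapshot the current version
  fs.writeFileSync(skillPath, newContent); // apply the change
  return backupPath;                       // keep for rollback
}

function rollback(skillPath: string, backupPath: string): void {
  fs.copyFileSync(backupPath, skillPath);  // restore the snapshot
}
```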
## Core workflow
### 1. Observe
After a skill runs, the plugin records structured run data such as:
- skill id and version
- task summary
- success or failure
- error type
- runtime context
- retries and fallbacks
- user correction signals
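A run record covering the fields above might look like this. The shape is an assumption for illustration, not the plugin's actual log schema:

```typescript
// Illustrative run record; names are assumptions about the plugin's log shape.
interface RunRecord {
  skillId: string;
  skillVersion: string;
  taskSummary: string;
  success: boolean;
  errorType?: string;       // e.g. "timeout", "tool_not_found"
  runtimeContext?: Record<string, string>;
  retries: number;
  usedFallback: boolean;
  userCorrected: boolean;   // user had to fix or redo the output
}

const run: RunRecord = {
  skillId: "summarize-report",
  skillVersion: "1.2.0",
  taskSummary: "summarize quarterly PDF",
  success: false,
  errorType: "tool_not_found",
  retries: 2,
  usedFallback: true,
  userCorrected: true,
};
```

Records like this could be appended one JSON object per line, matching the `.jsonl` layout the demo's sample runs use.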
### 2. Diagnose
The plugin analyzes patterns over time and tries to distinguish between:
- trigger issues
- instruction issues
- tool issues
- environment issues
- dependency issues
- context issues
- unknown causes
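A deliberately naive sketch of this distinction: tally error types across failed runs and map the most frequent one to a likely cause. Both the error-type strings and the mapping are assumptions, not the plugin's real taxonomy:

```typescript
type Cause =
  | "trigger" | "instruction" | "tool" | "environment"
  | "dependency" | "context" | "unknown";

// Naive diagnosis sketch: map the most frequent error type across failed
// runs to a likely cause category. The mapping below is an assumption.
function diagnose(errorTypes: string[]): Cause {
  const counts: Record<string, number> = {};
  let top = "unknown";
  let max = 0;
  for (const e of errorTypes) {
    counts[e] = (counts[e] ?? 0) + 1;
    if (counts[e] > max) { top = e; max = counts[e]; }
  }
  const mapping: Record<string, Cause> = {
    tool_not_found: "tool",
    wrong_skill_selected: "trigger",
    missing_env_var: "environment",
    dependency_error: "dependency",
    context_overflow: "context",
    bad_instructions: "instruction",
  };
  return mapping[top] ?? "unknown";
}
```

A real diagnosis engine would weigh more signals than error strings (retries, corrections, timing), but the shape of the decision is the same: evidence in, bounded cause category out.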
### 3. Propose
When enough evidence accumulates, the plugin generates an improvement proposal. Typical proposal types include:
- tighten trigger conditions
- add a missing condition
- reorder execution steps
- clarify instructions
- improve output format requirements
- add retry or fallback guidance
- update dependency notes
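A proposal could be represented so that every suggested change carries its supporting evidence and a risk level for the approval flow. This shape is illustrative, not the plugin's schema:

```typescript
// Hypothetical proposal shape: every field ties the suggested change back to
// observed evidence so a reviewer can judge it.
interface Proposal {
  skillId: string;
  type:
    | "tighten_trigger" | "clarify_instructions" | "reorder_steps"
    | "add_fallback" | "update_dependency_notes";
  summary: string;
  evidence: string[]; // run ids or observations supporting the change
  patch: string;      // reviewable edit against SKILL.md
  risk: "low" | "medium" | "high";
}

const proposal: Proposal = {
  skillId: "summarize-report",
  type: "clarify_instructions",
  summary: "Add an explicit word limit; outputs are repeatedly corrected for length.",
  evidence: ["run-014", "run-019", "run-023"],
  patch: "(section edit against SKILL.md)",
  risk: "low",
};
```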
### 4. Apply
By default, proposed changes are rendered as reviewable patches. Users can approve before the plugin updates the skill definition.
Advanced users may opt into limited auto-apply behavior for low-risk changes.
### 5. Evaluate
After a change is applied, the plugin compares the new version against the previous baseline using metrics like:
- failure rate
- retry rate
- fallback rate
- latency
- user correction rate
If the change does not improve outcomes, it can be rejected or rolled back.
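The comparison can be as simple as checking whether a headline metric regressed past a threshold. The 5-point threshold and field names below are illustrative assumptions:

```typescript
// Sketch: compare failure rates before and after a change and flag a
// regression. Threshold and field names are illustrative.
interface Metrics { runs: number; failures: number; }

function failureRate(m: Metrics): number {
  return m.runs === 0 ? 0 : m.failures / m.runs;
}

function shouldRollback(baseline: Metrics, candidate: Metrics): boolean {
  // Roll back if the new version fails noticeably more often than before.
  return failureRate(candidate) > failureRate(baseline) + 0.05;
}
```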
## v1 scope
Version 1 focuses on **OpenClaw Skills only**.
Included in v1:
- skill registry from `SKILL.md`
- run observation and structured logs
- health reporting
- diagnosis of repeated failures
- improvement proposals
- patch generation
- user-approved apply flow
- evaluation and version tracking
Not included in v1:
- full workflow optimization
- prompt/policy self-improvement
- graph-native storage requirements
- autonomous high-risk edits
- uncontrolled self-modification
Planned extension path:
- Prompt/Policy support
- workflow-level diagnosis
- semantic retrieval for related failures
- staged rollout automation
- richer rollback strategies
## Safety model
Default behavior:
- observation: enabled
- diagnosis: enabled
- proposal generation: enabled
- patch generation: enabled
- patch apply: manual approval required
Optional advanced mode:
- auto-apply can be enabled explicitly
- only low-risk changes are eligible
- all changes remain versioned and reviewable
High-risk changes should never auto-apply, including changes that:
- affect external actions
- change approval boundaries
- modify sensitive execution logic
- alter security assumptions
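A configuration honoring these defaults might look like the following. The key names are illustrative, not the actual `examples/config.json` schema:

```json
{
  "observe": true,
  "diagnose": true,
  "propose": true,
  "generatePatches": true,
  "autoApply": {
    "enabled": false,
    "maxRisk": "low"
  }
}
```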
## Installation goals
The project is designed to be installable, not just demonstrable.
Target experience:
1. install plugin
2. point it at an existing OpenClaw skill directory
3. let it observe usage
4. review generated proposals
5. approve improvements when ready
## Quick start
```bash
npm install
npm run build
npm run demo
```
This writes a sample report to `./output` using:
- skills from `./examples/demo-skills`
- sample run logs from `./examples/sample-runs.jsonl`
- config from `./examples/config.json`
CLI commands:
```bash
npm run scan
npm run analyze
npm run propose
npm run report
npm run apply
```
Direct CLI usage:
```bash
node dist/cli.js report \
--skills-dir ./examples/demo-skills \
--runs-file ./examples/sample-runs.jsonl \
--out-dir ./output \
--config ./examples/config.json
```
To simulate a manual apply flow on generated proposals:
```bash
npm run demo:apply
```
This creates patch outputs plus backup files, applied markers, and an applied change log under `./output-apply`.
The patch engine supports section-aware `append`, `replace`, and `remove` operations for common skill sections.
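To illustrate what "section-aware" means, here is a minimal sketch of a `replace` on a markdown skill file: swap the body under a given `## Heading` with new text. The real patch engine's API is not shown here; this only conveys the idea:

```typescript
// Sketch of a section-aware replace: substitute the body under "## Heading"
// while leaving every other section untouched.
function replaceSection(doc: string, heading: string, body: string): string {
  const lines = doc.split("\n");
  const start = lines.findIndex((l) => l.trim() === `## ${heading}`);
  if (start === -1) return doc; // section not found: no-op
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].startsWith("## ")) { end = i; break; }
  }
  return [...lines.slice(0, start + 1), body, ...lines.slice(end)].join("\n");
}
```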
For a no-write simulation of the apply path:
```bash
npm run demo:dry-run
```
If you want to inspect proposal and report shapes before running anything, start with `examples/expected-output/` and `docs/showcase.md`.
## Example outcomes
Examples of useful things the plugin should detect:
- a skill increasingly failing after a tool path changed
- a trigger matching too broadly and selecting the wrong skill
- instructions missing a key condition for a common task pattern
- a skill producing outputs that require repeated user correction
- a timeout or dependency constraint no longer matching real usage
## Who this is for
This plugin is for people who:
- rely on OpenClaw skills in daily work
- maintain more than a handful of skills
- want reliable long-term skill quality
- want self-improvement with evidence and guardrails
- prefer a maintainable plugin over an experimental demo
## Showcase
For a quick tour of what the repository can demonstrate today, see:
- `docs/showcase.md`
- `examples/expected-output/sample-report.md`
- `examples/expected-output/sample-proposal.md`
- `examples/expected-output/terminal-demo.txt`
## Release metadata
The package includes repository, homepage, and issue-tracker metadata, MIT licensing, CI, and project collaboration docs, so it is ready to be published as a maintained open-source project.
See also:
- `CHANGELOG.md`
- `CONTRIBUTING.md`
- `SECURITY.md`
- `docs/release-checklist.md`
- `docs/release-notes-v0.1.0.md`
## Roadmap
### v1
- Skill registry
- Run observation
- Diagnosis engine
- Proposal generator
- Patch generation
- Manual approval flow
- Basic evaluation
### v1.x
- low-risk auto-apply mode
- richer reporting
- better failure taxonomy
- improved comparison views
### v2
- Prompt/Policy support
- workflow-level growth loop
- semantic clustering of recurring failures
- staged rollout and rollback controls
## Philosophy
Static skills are useful.
Reliable systems need more.
OpenClaw Skill Growth is built on a simple idea:
> Skills should not just exist. They should be observable, improvable, and maintainable over time.
If OpenClaw is meant to support long-running agent work, skill maintenance cannot stay manual forever.