# Self Evolve

> A self-evolving OpenClaw plugin that learns from feedback and turns runtime experience into reusable memory.
- [English](#english)
- [中文](#中文)

## English
`self-evolve` is a self-learning plugin for OpenClaw. It learns new skills algorithmically while spending fewer tokens:
- Retrieves episodic memories before answering and prepends them to prompt context.
- Aggregates a task across multiple turns, then learns when feedback is detected.
- Learns over time by updating utility (Q values) and writing new episodic memories.
### Quick Start
> Recommended: upgrade to **openclaw 2026.3.2+** before using this plugin. Older versions may miss hook context and fail to capture tool traces reliably.
1. Install plugin
```bash
git clone https://github.com/longmans/self-evolve
openclaw plugins install ./self-evolve
```
2. Set env var
```bash
export OPENAI_API_KEY=sk-xxx
```
3. Restart and verify
- Restart the gateway.
```bash
openclaw gateway restart
```
- Check logs for:
- `self-evolve: initialized ...`
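If your gateway writes logs to a file, you can watch for this line during the restart. The log path below is an assumption; substitute wherever your gateway actually logs:
```bash
# Log path is an assumption; point this at your gateway's log file
tail -f ~/.openclaw/gateway.log | grep --line-buffered 'self-evolve'
```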
Optional: to override the defaults, run a one-shot config command.
> Keep `embedding` default unchanged for remote consistency.
```bash
openclaw config set plugins.entries.self-evolve '{
  "enabled": true,
  "config": {
    "reward": {"provider": "openai", "apiKey": "${OPENAI_API_KEY}", "model": "gpt-4.1-mini", "temperature": 0},
    "experience": {"summarizer": "openai", "apiKey": "${OPENAI_API_KEY}", "model": "gpt-4.1-mini", "temperature": 0}
  }
}'
```
### Effect Logs

### Feedback Tips
- Praise clearly when it works (for positive reinforcement).
- Point out clearly when it fails (to down-rank bad strategies).
- Explicit feedback (e.g. "the fix works, tests pass") is better than vague messages like "ok".
### How It Works
1. `before_prompt_build`
- Manages a pending task state (`open` / `waiting_feedback`).
- Detects feedback, new-intent switch, idle close, TTL close, and max-turn close.
- Builds embedding and retrieves candidates.
- If candidates exist, injects `<self-evolve-memories>`; if not, still keeps task pending (bootstrap).
2. `agent_end`
- Captures assistant response and moves task to `waiting_feedback`.
3. Later user messages
- If feedback is detected, scores reward and decides learning.
- If reward + mode + intent gates pass, updates Q and appends episodic memory.
- If message looks like a new request, current task can be closed and a new one starts.
### Project Workflow
```mermaid
flowchart TD
A[Receive user message] --> B{Feedback turn?}
B -- Yes --> C[Score reward and check learning gates]
C --> D{Should learn?}
D -- Yes --> E[Local sanitizeMemoryText redaction]
E --> F[LLM summarizes and second redaction]
F --> G[Append local memory triplet]
G --> H[Optional remote ingest by request_key_id]
D -- No --> I[Skip learning]
B -- No --> J[Detect intent and task boundary]
J --> K[Retrieve local + remote candidates]
K --> L[Phase-B rank/select memories]
L --> M[Inject memories and generate reply]
M --> N[Set task to waiting_feedback]
N --> A
H --> A
I --> A
```
### Advanced Settings
Default learning gates:
- `runtime.observeTurns=0`
- `runtime.minAbsReward=0.15`
- `runtime.minRewardConfidence=0.55`
- `runtime.minFeedbackChars` has been removed.
Default retrieval gate:
- `retrieval.tau=0.85` (only inject memories when best similarity is high enough)
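To adjust this gate, the same `openclaw config set` path pattern used throughout this README applies (the value below is illustrative):
```bash
# Raise the similarity bar so only close matches are injected
openclaw config set plugins.entries.self-evolve.config.retrieval.tau 0.9
```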
Learning modes (`runtime.learnMode`):
- `balanced` (default): prefer tool turns; no-tool turns require high reward/confidence.
- `tools_only`: learn only when tools were called (lowest token cost).
- `all`: learn all turns that pass reward gates (highest token cost).
Balanced-mode no-tool thresholds:
- `runtime.noToolMinAbsReward=0.8`
- `runtime.noToolMinRewardConfidence=0.9`
Task boundary defaults:
- `runtime.newIntentSimilarityThreshold=0.35`
- `runtime.idleTurnsToClose=2`
- `runtime.pendingTtlMs=300000` (5 minutes)
- `runtime.maxTurnsPerTask=5`
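Each of these runtime gates can be overridden individually with the same config-path pattern (values below are illustrative, not recommendations):
```bash
# Demand stronger, more confident feedback before learning
openclaw config set plugins.entries.self-evolve.config.runtime.minAbsReward 0.2
openclaw config set plugins.entries.self-evolve.config.runtime.minRewardConfidence 0.6
# Close a pending task after a single idle turn
openclaw config set plugins.entries.self-evolve.config.runtime.idleTurnsToClose 1
```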
Remote shared memory (enabled by default):
- Default `remote.enabled=true`, default `remote.baseUrl=https://self-evolve.club/api/v1`.
- `remote.enabled=true` enables remote register/ingest/search/feedback.
- With remote enabled, you can also leverage high-value experience contributed by others to improve your own self-evolution quality.
- Plugin auto-registers once via `POST /v1/clients/register` and stores `request_key_id` locally.
- On retrieval, local and remote candidates are merged before Phase-B ranking.
- On learning, plugin reports selected remote triplets with reward for attribution.
- Privacy design:
- User intent and conversation traces are sanitized locally before being used as memory payload.
- First redaction: `sanitizeMemoryText` removes conversation metadata, IDs, and sender-like tags.
- Second redaction: the experience summarizer requires the LLM to output transferable strategy and replace sensitive data with `[REDACTED_*]` placeholders.
- Shared remote data is limited to sanitized triplets (`intent` / `experience` / `embedding`) with anonymous attribution via `request_key_id`.
- You can view shared contribution rankings at [https://self-evolve.club/#leaderboard](https://self-evolve.club/#leaderboard).
Remote config example:
```bash
openclaw config set plugins.entries.self-evolve.config.remote '{
"enabled": true,
"baseUrl": "https://self-evolve.club/api/v1",
"timeoutMs": 3000
}'
```
Disable remote sharing:
```bash
openclaw config set plugins.entries.self-evolve.config.remote.enabled false
```
Switch mode:
```bash
openclaw config set plugins.entries.self-evolve.config.runtime.learnMode '"tools_only"'
openclaw config set plugins.entries.self-evolve.config.runtime.learnMode '"all"'
openclaw config set plugins.entries.self-evolve.config.runtime.learnMode '"balanced"'
```
Memory retention:
- Default `memory.maxEntries=200`
- When over the limit, the plugin keeps higher-value memories (scored by Q, success, recency, and selectedCount), dedupes near-duplicates, and reserves a small quota for fresh entries.
```bash
openclaw config set plugins.entries.self-evolve.config.memory.maxEntries 200
```
### FAQ
Q: How do I know `self-evolve` is running normally?
A: Check gateway logs for these signals:
- Startup:
- `self-evolve: initialized ...`
- `self-evolve: loaded <N> episodic memories`
- Hook pipeline:
- `[self-evolve] hook before_prompt_build ...`
- `[self-evolve] agent_end captured ...`
- `[self-evolve] llm_output captured ...`
- Learning pipeline:
- `self-evolve: feedback scored ...`
- `[self-evolve] learning start ...` / `[self-evolve] learning skipped ...`
- `[self-evolve] learning persisted to episodic store`
Q: How do I know the agent actually used evolved skills (episodic memory)?
A: Look for retrieval and injection evidence:
- `[self-evolve] phase-a candidates=<N>` where `N > 0`
- `[self-evolve] phase-b ... selected=<K>` where `K > 0`
- `[self-evolve] pending created ... selectedIds=<not none>`
- `[self-evolve] prependContext preview=<self-evolve-memories>...`
If you only see `selected=0` / `selectedIds=none`, no evolved memory was injected for that turn.
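Assuming your gateway logs are captured in a file (the path here is an assumption), a quick filter for this evidence:
```bash
# Surface retrieval/injection lines from recent turns
grep -E '\[self-evolve\] (phase-[ab]|pending created|prependContext)' ~/.openclaw/gateway.log | tail -n 20
```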
Q: How do I know learning has written new memory?
A: Look for:
- `[self-evolve] memory append ...`
- `[self-evolve] learning persisted to episodic store`
Then verify the state file (`plugins/self-evolve/episodic-memory.json`) has new entries.
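For a quick count, assuming `jq` is installed (a sketch; the file's exact schema is internal to the plugin, but `length` works whether the top level is an array or an object):
```bash
# Compare this count before and after a feedback turn
jq 'length' plugins/self-evolve/episodic-memory.json
```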
## 中文
> Gets stronger with every use; each conversation drives further evolution.
`self-evolve` is a self-learning plugin designed for OpenClaw that learns new skills algorithmically while spending fewer tokens:
- Retrieves episodic memories before answering and injects them into context.
- Aggregates a task across multiple turns, then learns when feedback is detected.
- Continuously updates Q values and writes new memories.
### Quick Start
> Recommended: upgrade to **openclaw 2026.3.2+** first. Older versions may miss hook context, making tool-trace capture unreliable.
1. Install the plugin
```bash
git clone https://github.com/longmans/self-evolve
openclaw plugins install ./self-evolve
```
2. Set the environment variable
```bash
export OPENAI_API_KEY=sk-xxx
```
3. Restart and verify
- Restart the gateway.
```bash
openclaw gateway restart
```
- Check the logs for:
- `self-evolve: initialized ...`
Optional: to override the defaults, run a one-shot config command.
> Keep the `embedding` config at its default for consistency with the remote service.
```bash
openclaw config set plugins.entries.self-evolve '{
  "enabled": true,
  "config": {
    "reward": {"provider": "openai", "apiKey": "${OPENAI_API_KEY}", "model": "gpt-4.1-mini", "temperature": 0},
    "experience": {"summarizer": "openai", "apiKey": "${OPENAI_API_KEY}", "model": "gpt-4.1-mini", "temperature": 0}
  }
}'
```
### Effect Logs

### Feedback Tips
- Praise clearly when it works (to reinforce correct strategies).
- Point out failures clearly (to down-weight wrong strategies).
- Explicit feedback beats vague replies like "ok" or "continue".
### Project Workflow
```mermaid
flowchart TD
A[Receive user message] --> B{Feedback turn?}
B -- Yes --> C[Score reward and check learning gates]
C --> D{Should learn?}
D -- Yes --> E[Local sanitizeMemoryText redaction]
E --> F[LLM summarizes and second redaction]
F --> G[Append local memory triplet]
G --> H[Optional remote ingest with request_key_id attribution]
D -- No --> I[Skip learning]
B -- No --> J[Detect intent and task boundary]
J --> K[Retrieve local + remote candidates]
K --> L[Phase-B rank and select memories]
L --> M[Inject memories and generate reply]
M --> N[Set task to waiting_feedback]
N --> A
H --> A
I --> A
```
### Advanced Settings
Default learning gates:
- `runtime.observeTurns=0`
- `runtime.minAbsReward=0.15`
- `runtime.minRewardConfidence=0.55`
- `runtime.minFeedbackChars` has been removed.
Default retrieval gate:
- `retrieval.tau=0.85` (only inject memories when the best similarity is high enough)
Learning modes (`runtime.learnMode`):
- `balanced` (default): prefers tool turns; no-tool turns require high reward and high confidence.
- `tools_only`: learns only from turns with tool calls (lowest token cost).
- `all`: learns from every turn that passes the gates (highest token cost).
Task boundary defaults:
- `runtime.newIntentSimilarityThreshold=0.35`
- `runtime.idleTurnsToClose=2`
- `runtime.pendingTtlMs=300000` (5 minutes)
- `runtime.maxTurnsPerTask=5`
Remote shared memory (enabled by default):
- Default `remote.enabled=true`, default `remote.baseUrl=https://self-evolve.club/api/v1`.
- `remote.enabled=true` turns on remote register/ingest/search/feedback.
- With remote enabled, you can also absorb high-value experience contributed by others to improve your own self-evolution.
- The plugin registers once via `POST /v1/clients/register` and stores the `request_key_id` locally.
- On retrieval, local and remote candidates are merged before Phase-B ranking.
- On learning, the selected remote triplets and their rewards are reported for server-side attribution and statistics.
- Privacy design:
- User intent and conversation traces are sanitized locally before entering the memory payload.
- First redaction: `sanitizeMemoryText` strips conversation metadata, message_id, and sender/tag identifiers.
- Second redaction: the experience summarization step requires the LLM to output a transferable strategy and replace sensitive data with `[REDACTED_*]` placeholders.
- Remote sharing contains only the sanitized triplet (`intent` / `experience` / `embedding`), with anonymous attribution via `request_key_id`.
- You can view shared contribution rankings at [https://self-evolve.club/#leaderboard](https://self-evolve.club/#leaderboard).
Remote config example:
```bash
openclaw config set plugins.entries.self-evolve.config.remote '{
"enabled": true,
"baseUrl": "h
... (truncated)