# Signal Hunter - OpenClaw Plugin
Market intelligence for AI/ML builders. Monitors GitHub, Reddit, Hacker News, and Stack Overflow for signals: developer pain points, feature requests, and tool comparisons. Everything is managed through a chat interface via [OpenClaw](https://github.com/openclaw).
---
## What it does
You type a keyword ("RAG", "ollama", "LangChain") in chat. Signal Hunter:
1. **Discovers** where the topic is discussed (repos, subreddits, SO tags) via real API calls
2. **Proposes a collection plan** - which repos/queries to monitor
3. **Collects incrementally** (GitHub issues, Reddit posts, HN threads, SO questions) using cursors
4. **Classifies** every signal with a local LLM using your extraction rules (pain points, feature requests, comparisons, adoption...)
5. **Embeds** relevant signals into Qdrant with `bge-m3`
6. **Answers questions** in natural language: "what are the top complaints about RAG retrieval this month?"
7. **Generates change reports** - weekly/monthly deltas with what's new and what grew
Everything runs on your own VPS; nothing leaves your machine except the API calls to GitHub/Reddit/Stack Overflow and the LLM providers you configure.
---
## Stack
| Component | Role |
|---|---|
| **Python 3.11+** | Core skill logic |
| **TypeScript** | OpenClaw plugin adapter (thin wrapper) |
| **PostgreSQL 16** | Structured storage: signals, profiles, cursors, LLM cost log |
| **Qdrant** | Vector search (cosine, 1024 dims) |
| **BAAI/bge-m3** | Cross-lingual embeddings via `sentence-transformers` |
| **Local LLM** | Classification, rule suggestions (OpenAI-compatible endpoint) |
| **Claude (Anthropic)** | Queries, resolution strategy (configurable) |
| **Docker Compose** | PostgreSQL + Qdrant services |
---
## Architecture
```
OpenClaw chat
      │
      ▼
src/index.ts   → OpenClaw plugin entry (register tools + /sh command)
src/tools.ts   → 22 tool definitions (thin TS wrappers)
src/runner.ts  → spawns: python -m skill <command> [args]
      │
      ▼ JSON via stdout
skill/main.py  → CLI dispatcher (22 commands)
      │
      ├── core/resolver.py      → keyword discovery + LLM enrichment
      ├── core/orchestrator.py  → collect → process → embed pipeline
      ├── core/processor.py     → LLM classification (token-aware batching)
      ├── core/embedder.py      → bge-m3 → Qdrant (Outbox pattern)
      ├── core/llm_router.py    → routes ops to local/Claude by config
      │
      ├── collectors/
      │     ├── github.py         → GitHub Issues (repo-scoped, cursor on updated_at)
      │     ├── reddit.py         → Reddit JSON API (no auth for public subs)
      │     ├── hackernews.py     → Algolia HN API (no auth)
      │     └── stackoverflow.py  → Stack Exchange API v2.3
      │
      └── storage/
            ├── postgres.py        → all SQL (raw_signals, processed_signals, cursors...)
            ├── vector.py          → Qdrant wrapper
            └── config_manager.py  → atomic config.json writes (temp file + rename)
```
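The Python side of that IPC boundary is a plain CLI that prints one JSON document to stdout for the TypeScript runner to parse. A minimal sketch of such a dispatcher (the command names and handlers here are illustrative, not the plugin's actual 22 commands):

```python
import json
import sys

# Illustrative handlers -- the real skill exposes 22 commands.
def source_status(args):
    return {"ok": True, "sources": ["github", "reddit", "hackernews", "stackoverflow"]}

def collect(args):
    return {"ok": True, "started": True, "keywords": args}

COMMANDS = {
    "source-status": source_status,
    "collect": collect,
}

def dispatch(argv):
    """Run one command and return its JSON-serializable result."""
    if not argv or argv[0] not in COMMANDS:
        return {"ok": False, "error": "unknown command"}
    return COMMANDS[argv[0]](argv[1:])

if __name__ == "__main__":
    # The TypeScript runner reads exactly one JSON object from stdout.
    print(json.dumps(dispatch(sys.argv[1:])))
```

Keeping the boundary this thin is what lets all business logic stay in Python while TypeScript only handles process spawning and JSON parsing.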
**Design principles:**
- Each collector is a self-contained module implementing `BaseCollector`
- Business logic stays in Python; TypeScript only handles IPC
- Discovery-first: LLM enriches only facts confirmed by API calls, never guesses
- Token-aware batching for LLM classification (validated: ~20K tokens per batch)
- Outbox pattern for embedding queue (PostgreSQL → Qdrant, crash-safe)
- Anti-hallucination gate on query answers: URLs not in source data are stripped
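The last principle, stripping URLs the LLM invents, can be as simple as an allow-list check against the URLs present in the source signals. A sketch (function and placeholder names are ours, not the plugin's):

```python
import re

URL_RE = re.compile(r"https?://\S+")

def strip_unknown_urls(answer: str, source_urls: set) -> str:
    """Replace any URL in the LLM answer that is not in the source data."""
    return URL_RE.sub(
        lambda m: m.group(0) if m.group(0) in source_urls else "[link removed]",
        answer,
    )
```

A gate like this trades a little recall (an oddly formatted real link may be dropped) for the guarantee that the answer never cites a URL that was not in the retrieved signals.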
---
## Prerequisites
- VPS or local machine with Python 3.11+
- Docker + Docker Compose (for PostgreSQL and Qdrant)
- [OpenClaw](https://github.com/openclaw) installed
- Local LLM with OpenAI-compatible API (e.g. [Ollama](https://ollama.com) with Devstral, Mistral, etc.)
- Anthropic API key (for queries and keyword resolution strategy)
---
## Installation
### 1. Clone and place the plugin
```bash
git clone https://github.com/fellis/openclaw-signal-hunter.git
cd openclaw-signal-hunter
```
Place (or symlink) the directory into your OpenClaw extensions folder:
```bash
ln -s $(pwd) ~/.openclaw/extensions/signal-hunter
```
### 2. Configure environment
```bash
cp .env.example .env
```
Edit `.env`:
```env
# PostgreSQL (docker-compose exposes 5433 on host)
DATABASE_URL=postgresql://signal:signal@localhost:5433/signal_hunter
# Qdrant
QDRANT_URL=http://localhost:6333
# GitHub (optional - public rate limit 60 req/hr works for testing)
GITHUB_TOKEN=ghp_your_token_here
# Local LLM (OpenAI-compatible endpoint)
LOCAL_LLM_BASE_URL=http://localhost:11434/v1
LOCAL_LLM_API_KEY=local
LOCAL_LLM_MODEL=devstral
# Anthropic Claude
ANTHROPIC_API_KEY=sk-ant-your_key_here
```
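A quick way to catch a missing variable before the first run is a small preflight check; this helper is a hypothetical convenience, not something shipped with the plugin:

```python
import os

# Settings the plugin cannot start without (GITHUB_TOKEN is optional).
REQUIRED = ["DATABASE_URL", "QDRANT_URL", "LOCAL_LLM_BASE_URL", "ANTHROPIC_API_KEY"]

def missing_vars(env: dict) -> list:
    """Return required settings that are absent or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Check the current environment (after `source .env` or equivalent).
missing = missing_vars(dict(os.environ))
if missing:
    print("missing in .env:", ", ".join(missing))
```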
### 3. Start infrastructure
```bash
docker compose up -d
```
This starts PostgreSQL 16 (port 5433) and Qdrant (port 6333). The schema is applied automatically on first start.
### 4. Install Python dependencies
```bash
pip install -r requirements.txt
```
### 5. Register the plugin in OpenClaw
Add to your `~/.openclaw/openclaw.json` under `plugins`:
```json
{
"id": "signal-hunter",
"path": "~/.openclaw/extensions/signal-hunter",
"config": {
"pythonBin": "python3",
"skillDir": null
}
}
```
Reload OpenClaw. The plugin registers 22 tools and the `/sh` slash command.
---
## Usage walkthrough
Here is a complete scenario: tracking developer pain points around **RAG** and **ollama**.
### Step 1 - Check source readiness
```
You: source status
```
```
ClawBot:
✓ hackernews - no auth required
✗ github - no GITHUB_TOKEN
✓ reddit - public API ready, ~10 req/min
✓ stackoverflow - 300 req/day without key
```
```
You: how do I set up GitHub?
```
```
ClawBot:
1. Go to github.com → Settings → Developer settings → Personal access tokens → Tokens (classic)
2. Click "Generate new token (classic)"
3. Name: signal-hunter, Scopes: public_repo
4. Copy the token and type: github token=<your_token>
```
```
You: github token=ghp_xxxxxxxxxxxxxxxxxxx
```
```
ClawBot: ✓ GitHub ready. Limit: 5000 req/hr
```
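Under the hood, a readiness report like the one above only needs to inspect which credentials are present. A sketch, with statuses and limits taken from the transcript and helper names of our own invention:

```python
def source_status(env: dict) -> dict:
    """Report per-source readiness based on available credentials."""
    return {
        "hackernews": "ready (no auth required)",
        "github": (
            "ready (5000 req/hr)"
            if env.get("GITHUB_TOKEN")
            else "no GITHUB_TOKEN (public limit: 60 req/hr)"
        ),
        "reddit": "ready (public JSON API, ~10 req/min)",
        "stackoverflow": "ready (300 req/day without key)",
    }
```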
---
### Step 2 - Configure LLM providers
```
You: show providers
```
```
ClawBot:
Provider | Type | Model | Operations
---------|----------------|-------------------|---------------------------
local | openai_compat | devstral | process, suggest_rules, resolve_enrich
claude | anthropic | claude-sonnet-4-5 | resolve_strategy, query
```
To route an operation to a different provider:
```
You: use claude for classification
```
```
ClawBot: ✓ llm_routing.process → claude
```
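The routing shown above boils down to a per-operation lookup where user overrides win over defaults. A sketch of how `llm_router` might resolve a provider (the config shape is assumed from the transcript):

```python
# Default operation -> provider table, mirroring the `show providers` output.
DEFAULT_ROUTING = {
    "process": "local",
    "suggest_rules": "local",
    "resolve_enrich": "local",
    "resolve_strategy": "claude",
    "query": "claude",
}

def route(operation: str, overrides: dict = None) -> str:
    """Pick the provider for an operation; user overrides take precedence."""
    routing = {**DEFAULT_ROUTING, **(overrides or {})}
    if operation not in routing:
        raise ValueError(f"unknown operation: {operation}")
    return routing[operation]
```

Saying "use claude for classification" then amounts to persisting `{"process": "claude"}` into the overrides.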
---
### Step 3 - Add keywords and discover
```
You: start monitoring RAG, ollama, LangChain
```
Signal Hunter calls each source API (no LLM guessing) and returns discovered resources:
```
ClawBot:
DISCOVERY: RAG
GitHub: langchain-ai/langchain (88k stars), run-llama/llama_index (35k)
Reddit: r/LocalLLaMA (245k subs), r/LangChain (42k)
HN: 89 threads in last 90 days, avg score 124
SO: 1820 questions with tag [rag]
DISCOVERY: ollama
GitHub: ollama/ollama (85k stars, 2340 open issues)
Reddit: r/ollama (52k subs), r/LocalLLaMA
Proposed collection plan:
RAG: github(langchain-ai/langchain, run-llama/llama_index), reddit(r/LocalLLaMA, r/LangChain, r/MachineLearning), hn(search "RAG"), so(tag:rag)
ollama: github(ollama/ollama), reddit(r/ollama, r/LocalLLaMA), hn, so
Approve all? Or adjust targets?
```
```
You: for RAG remove MachineLearning - too broad. Approve the rest.
```
```
ClawBot: ✓ Plans saved for 3 keywords.
```
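An approved plan is just structured data per keyword. A plausible shape, with field names assumed rather than taken from the plugin's schema, populated with the RAG targets from the transcript:

```python
from dataclasses import dataclass, field

@dataclass
class CollectionPlan:
    """Per-keyword targets approved by the user (illustrative schema)."""
    keyword: str
    github_repos: list = field(default_factory=list)
    subreddits: list = field(default_factory=list)
    hn_query: str = None
    so_tags: list = field(default_factory=list)

rag_plan = CollectionPlan(
    keyword="RAG",
    github_repos=["langchain-ai/langchain", "run-llama/llama_index"],
    subreddits=["LocalLLaMA", "LangChain"],
    hn_query="RAG",
    so_tags=["rag"],
)
```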
---
### Step 4 - Collect
```
You: collect
```
```
ClawBot: Running in background... (first run: 10-20 min for 90 days of history)
```
```
ClawBot:
✓ Collection done:
Keyword | GitHub | Reddit | HN | SO | Total
-----------|--------|--------|-----|-----|------
RAG | 400 | 312 | 89 | 200 | 1001
ollama | 200 | 287 | 43 | 98 | 628
LangChain | 200 | 198 | 67 | 200 | 665
2294 raw signals in database.
```
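Incremental collection means each run only keeps items newer than the stored cursor; for GitHub that cursor is the last `updated_at` seen. A minimal sketch of the filtering step (names ours; it relies on ISO-8601 UTC timestamps comparing correctly as strings):

```python
def newer_than_cursor(items: list, cursor: str = None):
    """Keep items updated after the cursor; return (fresh items, new cursor)."""
    if cursor is not None:
        # ISO-8601 timestamps in the same UTC format sort lexicographically.
        items = [i for i in items if i["updated_at"] > cursor]
    if items:
        cursor = max(i["updated_at"] for i in items)
    return items, cursor
```

On the first run the cursor is `None`, so the whole history window is kept; afterwards only the delta is fetched and stored, and the new cursor is persisted alongside the signals.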
---
### Step 5 - Suggest and approve classification rules
```
You: suggest rules for RAG
```
The LLM analyzes actual posts from the database and proposes rules grounded in real content:
```
ClawBot:
Suggested extraction_rules for "RAG" (from 1001 real posts):
1. retrieval_failure (priority 5)
When retrieval returns irrelevant docs or causes hallucinations
Real examples: "RAG keeps returning wrong documents even with good embeddings"
"chunks are too large and context gets diluted"
2. chunking_strategy (priority 4)
Questions and pain around document splitting strategies
Real examples: "what's the optimal chunk size for technical docs?"
"sliding window chunking loses context between chunks"
3. embedding_model_choice (priority 4)
Comparisons and selection of embedding models
Real examples: "OpenAI ada vs bge-m3 for multilingual RAG"
4. reranking (priority 3)
Requests and pain around the reranking step
Real examples: "cohere reranker doubles cost but worth it?"
5. evaluation_difficulty (priority 3)
Difficulty evaluating RAG pipeline quality
Real examples: "how do you even measure RAG quality without ground truth?"
Approve all?
```
```
You: approve, but rename evaluation_difficulty to rag_eval
```
```
ClawBot: ✓ 5 rules saved to config.json
```
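Approved rules land in `config.json` via the atomic write mentioned in the architecture (temp file + rename). A sketch under those assumptions, with rule fields reduced to the essentials from the suggestions above:

```python
import json
import os
import tempfile

def save_rules_atomically(path: str, keyword: str, rules: list) -> None:
    """Merge rules for one keyword into config.json, crash-safely."""
    config = {}
    if os.path.exists(path):
        with open(path) as f:
            config = json.load(f)
    config.setdefault("extraction_rules", {})[keyword] = rules
    # Write to a temp file in the same directory, then rename over the target:
    # os.replace is atomic on POSIX, so readers never see a half-written file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(config, f, indent=2)
    os.replace(tmp, path)

rules = [
    {"name": "retrieval_failure", "priority": 5},
    {"name": "chunking_strategy", "priority": 4},
    {"name": "rag_eval", "priority": 3},
]
```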
---
### Step 6 - Process (LLM classification)
```
You: process
```
```
ClawBot: Classifying 2294 signals with local LLM... (token-aware batching, ~20K tokens/batch)
✓ Done. 1847 classified (relevant: 934, irrelevant: 913)
```
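Token-aware batching groups signals so that each LLM call stays under the budget (~20K tokens, per the design notes). A sketch with a crude length-based token estimate; the real implementation would count tokens with the model's tokenizer:

```python
def batch_by_tokens(signals: list, budget: int = 20_000) -> list:
    """Greedily pack signal texts into batches under an estimated token budget."""
    def est(text):
        return max(1, len(text) // 4)  # rough heuristic: ~4 chars per token

    batches, current, used = [], [], 0
    for text in signals:
        cost = est(text)
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches
```

An oversized single signal still gets its own batch rather than being dropped, which keeps the pipeline total-coverage at the price of one over-budget call.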
Then embed for semantic search:
```
You: embed
```
```
ClawBot: ✓ 934 vectors indexed into Qdrant (bge-m3, 1024 dims)
```
---
### Step 7 - Query the database
```
You: what are the top pain points with RAG retrieval this month?
```
Signal Hunter embeds the question, searches Qdrant for the most similar signals, re-ranks by `rank_score` (engagement + quality + time decay), then uses Claude to synthesize:
```
ClawBot:
Top RAG retrieval pain points (from 23 signals, last 30 days):
1. **Context dilution with large chunks** (14 signals)
Developers re
... (truncated)
```