# Atlas – Document Navigation for OpenClaw
[Releases](https://github.com/joshuaswarren/openclaw-atlas/releases) · [License](LICENSE) · [OpenClaw](https://github.com/openclaw/openclaw) · [PageIndex](https://github.com/VectifyAI/PageIndex) · [TypeScript](https://www.typescriptlang.org/)
**Atlas** is an OpenClaw plugin that provides intelligent document indexing and navigation using [PageIndex](https://github.com/VectifyAI/PageIndex), a vectorless, reasoning-based RAG system with **production-ready scaling for 5000+ documents**.
## What It Does
Atlas transforms your document collections into navigable knowledge maps:
- **Index documents** – PDFs, Markdown, text files, HTML
- **Reasoning-based search** – No vector embeddings required
- **Precise citations** – Exact page and section references
- **Hierarchical navigation** – Tree-structured document indexes
- **Async indexing** – Non-blocking background processing
- **Incremental updates** – Only index changed documents
- **Smart sharding** – Split large collections automatically
- **Result caching** – Fast repeated queries
- **Local LLM support** – Use Ollama, LM Studio, MLX, or vLLM with automatic cloud fallback
## Scaling Capabilities
Atlas is designed to scale from **10 to 5000+ documents**:
| Document Count | Index Time | Search Time | Strategy |
|----------------|------------|-------------|----------|
| 1-50 | < 5 min | < 5s | Single collection |
| 50-500 | 30-60 min | 5-15s | Async indexing |
| 500-5,000 | ~2 hours | 15-30s | **Sharding + Async** |
| 5,000+ | Varies (hybrid) | 30s+ | **Topic partitioning** |
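The tiers above can be sketched as a simple selector; this is a minimal illustration (the function name and return labels are ours, not the plugin's API):

```typescript
// Illustrative strategy selector mirroring the tiers in the table above.
// Names are hypothetical, not the plugin's internal API.
function chooseStrategy(docCount: number): string {
  if (docCount <= 50) return "single";          // 1-50: single collection
  if (docCount <= 500) return "async";          // 50-500: async indexing
  if (docCount <= 5000) return "shard+async";   // 500-5,000: sharding + async
  return "topic-partition";                     // 5,000+: hybrid approach
}
```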
See **[SCALING.md](SCALING.md)** for comprehensive scaling documentation.
## Why Atlas?
Traditional RAG systems chunk documents into vector embeddings. Atlas uses PageIndex's innovative approach:
- **No chunking** – Documents preserve their structure
- **LLM reasoning** – Traverses document trees intelligently
- **Perfect for** – Financial reports, legal docs, technical manuals, research papers
## Installation
1. Clone into OpenClaw extensions:
```bash
git clone https://github.com/joshuaswarren/openclaw-atlas.git \
  ~/.openclaw/extensions/openclaw-atlas
```
2. Install dependencies:
```bash
cd ~/.openclaw/extensions/openclaw-atlas
npm install
```
3. Build:
```bash
npm run build
```
4. Install PageIndex:
```bash
pip install pageindex
```
5. Enable in OpenClaw config:
```yaml
plugins:
  - id: atlas
    enabled: true
    documentsDir: ~/Documents/atlas
```
## Configuration
### Basic Configuration
```yaml
plugins:
  - id: atlas
    enabled: true
    pageindexPath: /usr/local/bin/pageindex   # optional
    documentsDir: ~/.openclaw/workspace/documents
    indexOnStartup: false
    maxResults: 5
    contextTokens: 1500
    supportedExtensions:
      - .pdf
      - .md
      - .txt
      - .html
    debug: false
```
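For reference, the options above map onto a TypeScript shape roughly like the following. This is an illustrative sketch mirroring the YAML keys and the defaults shown, not the plugin's actual source:

```typescript
// Illustrative config shape; field names mirror the YAML keys above,
// and the defaults are the ones shown in the basic configuration.
interface AtlasConfig {
  pageindexPath?: string;          // optional; otherwise resolved from PATH
  documentsDir: string;
  indexOnStartup: boolean;
  maxResults: number;
  contextTokens: number;
  supportedExtensions: string[];
  debug: boolean;
}

const defaults: AtlasConfig = {
  documentsDir: "~/.openclaw/workspace/documents",
  indexOnStartup: false,
  maxResults: 5,
  contextTokens: 1500,
  supportedExtensions: [".pdf", ".md", ".txt", ".html"],
  debug: false,
};
```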
### Scaling Configuration
```yaml
plugins:
  - id: atlas
    # Async indexing (Phase 1)
    asyncIndexing: true        # Enable non-blocking indexing
    maxConcurrentIndexes: 3    # Parallel job limit

    # Incremental updates (Phase 2): use the --incremental flag with the CLI

    # Sharding (Phase 3)
    shardThreshold: 500        # Auto-shard collections over 500 docs

    # Caching (Phase 5)
    cacheEnabled: true         # Enable result caching
    cacheTtl: 300000           # Cache TTL in ms (5 minutes)
```
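A minimal sketch of how a TTL result cache like the one configured above could behave (names and types are illustrative; the real implementation lives in the plugin):

```typescript
// Hypothetical TTL cache sketch: entries expire after ttlMs and are
// evicted lazily on read. Not the plugin's actual cache class.
interface CacheEntry<T> { value: T; expiresAt: number; }

class ResultCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  constructor(private ttlMs: number) {}

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {   // expired: evict and report a miss
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: Date.now() + this.ttlMs });
  }
}

// With cacheTtl: 300000, a repeated query within 5 minutes hits the cache.
const cache = new ResultCache<string[]>(300_000);
```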
See **[SCALING.md](SCALING.md)** for detailed scaling configuration.
### Local LLM Configuration
Atlas supports local LLM providers (Ollama, LM Studio, MLX, vLLM) with automatic fallback to cloud models:
```yaml
plugins:
  - id: atlas
    # Local LLM (optional)
    localLlmEnabled: true                  # Enable local LLM
    localLlmUrl: http://localhost:1234/v1  # Auto-detects Ollama/LM Studio/MLX/vLLM
    localLlmModel: qwen3-coder-30b-a3b-instruct-mlx@8bit  # Your local model
    localLlmFallback: true                 # Fall back to cloud if unavailable
```
**Supported Local LLM Providers:**
- **Ollama** (`http://localhost:11434`) – Models like `llama3`, `deepseek-r1`
- **LM Studio** (`http://localhost:1234/v1`) – Any OpenAI-compatible model
- **MLX** (`http://localhost:8080`) – Apple Silicon optimized models
- **vLLM** (`http://localhost:8000`) – Fast inference server
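Since each provider listens on a distinctive default port, auto-detection can be sketched roughly like this (illustrative only; the plugin's actual detection may probe endpoints instead of matching ports):

```typescript
// Hypothetical port-based provider detection, using the default ports
// listed above. Not the plugin's actual detection logic.
type Provider = "ollama" | "lmstudio" | "mlx" | "vllm" | "unknown";

function detectProvider(baseUrl: string): Provider {
  const port = new URL(baseUrl).port;
  switch (port) {
    case "11434": return "ollama";   // Ollama default
    case "1234":  return "lmstudio"; // LM Studio default
    case "8080":  return "mlx";      // MLX server default
    case "8000":  return "vllm";     // vLLM default
    default:      return "unknown";
  }
}
```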
**Fallback Chain:**
1. Local LLM (if enabled)
2. Gateway's primary model
3. Gateway's fallback models (full chain)
This means Atlas can use **different models** than your gateway's default – configure it independently.
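The chain above amounts to "try each source in order until one answers"; a hedged sketch (the `ModelSource` shape and all names are ours, not the plugin's API):

```typescript
// Illustrative fallback chain: try each model source in order and return
// the first successful answer. `call` stands in for a real client request.
type ModelSource = { name: string; call: (prompt: string) => Promise<string> };

async function completeWithFallback(
  sources: ModelSource[],
  prompt: string,
): Promise<{ source: string; answer: string }> {
  let lastError: unknown;
  for (const source of sources) {
    try {
      return { source: source.name, answer: await source.call(prompt) };
    } catch (err) {
      lastError = err; // local LLM down, gateway error, etc.: try the next one
    }
  }
  throw new Error(`All model sources failed: ${lastError}`);
}
```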
## Usage
### Agent Tools
Agents can use these tools:
```
atlas_search(query, collection?, maxResults?)
  → Search through indexed documents

atlas_index(path, collection?, background?)
  → Index a new document or directory

atlas_collections()
  → List all document collections

atlas_status()
  → Check indexing status and stats
```
### CLI Commands
```bash
# Search documents
openclaw atlas search "LLM architecture patterns"
# Index with options
openclaw atlas index ~/Documents --background # Async
openclaw atlas index ~/Documents --incremental # Incremental
openclaw atlas index ~/Documents --shard 200 # Force shard
# Job management
openclaw atlas jobs # List all jobs
openclaw atlas job-status <job-id> # Check progress
openclaw atlas job-cancel <job-id> # Cancel job
# Cache management
openclaw atlas cache-stats # View cache stats
openclaw atlas cache-clear # Clear cache
# Collections
openclaw atlas collections # List collections
openclaw atlas status # System status
```
## Architecture
```
~/.openclaw/extensions/openclaw-atlas/
├── src/
│   ├── index.ts             # Plugin entry point
│   ├── types.ts             # TypeScript interfaces
│   ├── config.ts            # Config parsing
│   ├── logger.ts            # Logging wrapper
│   ├── pageindex.ts         # PageIndex API wrapper
│   ├── storage.ts           # Document & job management
│   ├── tools.ts             # Agent tools
│   └── cli.ts               # CLI commands
├── openclaw.plugin.json     # Plugin manifest
├── README.md                # This file
├── SCALING.md               # Scaling guide
└── CLAUDE.md                # Agent guidelines
```
## How PageIndex Works
1. **Tree Construction** – Documents become hierarchical index trees
2. **Reasoning Search** – LLMs navigate the tree to find answers
3. **Citation Preservation** – Exact source references maintained

Read more: [PageIndex GitHub](https://github.com/VectifyAI/PageIndex)
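To make the reasoning-search idea concrete, here is a toy traversal where a keyword match stands in for the LLM's relevance judgment (the node shape and logic are illustrative, not PageIndex's actual schema):

```typescript
// Toy tree search: descend only branches judged relevant, collecting
// matching leaf sections (which carry the page for citation). A real
// reasoning-based system would ask an LLM instead of matching keywords.
interface IndexNode {
  title: string;
  page: number;
  children: IndexNode[];
}

// Stand-in for "ask the LLM whether this node looks relevant".
function relevant(node: IndexNode, query: string): boolean {
  return query
    .toLowerCase()
    .split(/\s+/)
    .some((word) => node.title.toLowerCase().includes(word));
}

function search(root: IndexNode, query: string): IndexNode[] {
  const hits: IndexNode[] = [];
  const walk = (node: IndexNode) => {
    for (const child of node.children) {
      if (!relevant(child, query)) continue;            // prune this branch
      if (child.children.length === 0) hits.push(child); // leaf: citable section
      else walk(child);                                  // descend further
    }
  };
  walk(root);
  return hits;
}
```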
## Performance
### Small Collections (< 100 docs)
- Index time: < 5 minutes
- Search time: < 5 seconds
- Memory: ~50MB
### Medium Collections (100-1000 docs)
- Index time: 30-60 minutes (async)
- Search time: 5-15 seconds
- Memory: ~200MB
### Large Collections (1000-5000 docs)
- Index time: ~2 hours (sharded + async)
- Search time: 15-30 seconds (sharded)
- Memory: ~1GB
See **[SCALING.md](SCALING.md)** for optimization strategies.
## Development
```bash
# Watch mode for development
npm run dev
# Type checking
npm run typecheck
# Build
npm run build
```
## Documentation
- **[README.md](README.md)** – This file
- **[SCALING.md](SCALING.md)** – Comprehensive scaling guide
- **[CLAUDE.md](CLAUDE.md)** – Agent development guidelines
## License
MIT
## Credits
- Built with [PageIndex](https://github.com/VectifyAI/PageIndex) by Vectify AI
- Part of the [OpenClaw](https://github.com/openclaw/openclaw) ecosystem