Atlas

By joshuaswarren

Atlas - Enterprise document indexing plugin for OpenClaw. Vectorless RAG using PageIndex with async indexing, incremental updates, and smart caching. Scales from 10 to 5000+ documents. Perfect for financial reports, legal docs, technical manuals, and research papers.

# Atlas - Document Navigation for OpenClaw

[![Release](https://img.shields.io/github/v/release/joshuaswarren/openclaw-atlas)](https://github.com/joshuaswarren/openclaw-atlas/releases)
[![License: MIT](https://img.shields.io/github/license/joshuaswarren/openclaw-atlas)](LICENSE)
[![OpenClaw Plugin](https://img.shields.io/badge/OpenClaw-plugin-blue)](https://github.com/openclaw/openclaw)
[![PageIndex](https://img.shields.io/badge/PageIndex-integrated-green)](https://github.com/VectifyAI/PageIndex)
[![TypeScript](https://img.shields.io/badge/TypeScript-5.6-blue)](https://www.typescriptlang.org/)

**Atlas** is an OpenClaw plugin that provides document indexing and navigation using [PageIndex](https://github.com/VectifyAI/PageIndex), a vectorless, reasoning-based RAG system. Atlas adds **production-ready scaling for 5000+ documents**.

## ✨ What It Does

Atlas transforms your document collections into navigable knowledge maps:

- 📚 **Index documents** - PDFs, Markdown, text files, HTML
- 🧠 **Reasoning-based search** - No vector embeddings required
- 🎯 **Precise citations** - Exact page and section references
- 🗺️ **Hierarchical navigation** - Tree-structured document indexes
- ⚡ **Async indexing** - Non-blocking background processing
- 🔄 **Incremental updates** - Only index changed documents
- 📦 **Smart sharding** - Split large collections automatically
- 💾 **Result caching** - Lightning-fast repeated queries
- 🏠 **Local LLM support** - Use Ollama, LM Studio, MLX, vLLM with automatic cloud fallback

## 🚀 Scaling Capabilities

Atlas is designed to scale from **10 to 5000+ documents**:

| Document Count | Index Time | Search Time | Strategy |
|----------------|------------|-------------|----------|
| 1-50 | < 5 min | < 5s | Single collection |
| 50-500 | 30-60 min | 5-15s | Async indexing |
| 500-5,000 | ~2 hours | 15-30s | **Sharding + Async** |
| 5,000+ | Use hybrid approach | 30s+ | **Topic partitioning** |

See **[SCALING.md](SCALING.md)** for comprehensive scaling documentation.
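
As a rough illustration of the table above, the sketch below maps a document count to the strategy named in each row and splits a list of paths into fixed-size shards. `pickStrategy` and `shardDocuments` are hypothetical helpers, not part of the Atlas API. The table's ranges share their boundary values (50, 500, 5,000), so assigning those to the smaller tier is an assumption.

```typescript
// Hypothetical helpers illustrating the scaling table; not Atlas's actual code.
type Strategy =
  | "single-collection"
  | "async-indexing"
  | "sharding-async"
  | "topic-partitioning";

function pickStrategy(documentCount: number): Strategy {
  if (!Number.isInteger(documentCount) || documentCount < 1) {
    throw new RangeError("documentCount must be a positive integer");
  }
  // Assumption: boundary values fall into the smaller tier.
  if (documentCount <= 50) return "single-collection";
  if (documentCount <= 500) return "async-indexing";
  if (documentCount <= 5000) return "sharding-async";
  return "topic-partitioning";
}

// Splits document paths into shards holding at most `shardSize` entries each.
function shardDocuments(paths: readonly string[], shardSize: number): string[][] {
  if (!Number.isInteger(shardSize) || shardSize < 1) {
    throw new RangeError("shardSize must be a positive integer");
  }
  const shards: string[][] = [];
  for (let i = 0; i < paths.length; i += shardSize) {
    shards.push(paths.slice(i, i + shardSize));
  }
  return shards;
}
```

For example, `shardDocuments(paths, 200)` on 450 paths would produce three shards of 200, 200, and 50 entries.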

## 🎯 Why Atlas?

Traditional RAG systems chunk documents into vector embeddings. Atlas uses PageIndex's innovative approach:

- **No chunking** - Documents preserve their structure
- **LLM reasoning** - Traverses document trees intelligently
- **Perfect for** - Financial reports, legal docs, technical manuals, research papers

## 📦 Installation

1. Clone into OpenClaw extensions:
   ```bash
   git clone https://github.com/joshuaswarren/openclaw-atlas.git \
     ~/.openclaw/extensions/openclaw-atlas
   ```

2. Install dependencies:
   ```bash
   cd ~/.openclaw/extensions/openclaw-atlas
   npm install
   ```

3. Build:
   ```bash
   npm run build
   ```

4. Install PageIndex:
   ```bash
   pip install pageindex
   ```

5. Enable in OpenClaw config:
   ```yaml
   plugins:
     - id: atlas
       enabled: true
       documentsDir: ~/Documents/atlas
   ```

## ⚙️ Configuration

### Basic Configuration

```yaml
plugins:
  - id: atlas
    enabled: true
    pageindexPath: /usr/local/bin/pageindex  # optional
    documentsDir: ~/.openclaw/workspace/documents
    indexOnStartup: false
    maxResults: 5
    contextTokens: 1500
    supportedExtensions:
      - .pdf
      - .md
      - .txt
      - .html
    debug: false
```
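
A minimal sketch of how the options above might be typed and merged with fallback values. The README does not show how `src/config.ts` parses configuration, so `AtlasConfig`, `EXAMPLE_DEFAULTS`, and `resolveConfig` are hypothetical names, and the fallback values are simply copied from the example rather than confirmed plugin defaults.

```typescript
// Hypothetical shape of the basic options shown in the YAML example.
interface AtlasConfig {
  enabled: boolean;
  pageindexPath?: string; // optional, as noted in the example
  documentsDir: string;
  indexOnStartup: boolean;
  maxResults: number;
  contextTokens: number;
  supportedExtensions: string[];
  debug: boolean;
}

// Assumption: values copied from the example, not confirmed plugin defaults.
const EXAMPLE_DEFAULTS: AtlasConfig = {
  enabled: true,
  documentsDir: "~/.openclaw/workspace/documents",
  indexOnStartup: false,
  maxResults: 5,
  contextTokens: 1500,
  supportedExtensions: [".pdf", ".md", ".txt", ".html"],
  debug: false,
};

// Merges user-supplied options over the example values and validates them.
function resolveConfig(raw: Partial<AtlasConfig>): AtlasConfig {
  const merged: AtlasConfig = { ...EXAMPLE_DEFAULTS, ...raw };
  if (!Number.isInteger(merged.maxResults) || merged.maxResults < 1) {
    throw new RangeError("maxResults must be a positive integer");
  }
  if (!Number.isInteger(merged.contextTokens) || merged.contextTokens < 1) {
    throw new RangeError("contextTokens must be a positive integer");
  }
  // Normalize extensions to lowercase with a leading dot.
  merged.supportedExtensions = merged.supportedExtensions.map((ext) => {
    const lower = ext.toLowerCase();
    return lower.startsWith(".") ? lower : `.${lower}`;
  });
  return merged;
}
```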

### Scaling Configuration

```yaml
plugins:
  - id: atlas
    # Async indexing (Phase 1)
    asyncIndexing: true          # Enable non-blocking indexing
    maxConcurrentIndexes: 3      # Parallel job limit

    # Incremental updates (Phase 2)
    # Use --incremental flag with CLI

    # Sharding (Phase 3)
    shardThreshold: 500          # Auto-shard > 500 docs

    # Caching (Phase 5)
    cacheEnabled: true           # Enable result caching
    cacheTtl: 300000             # Cache TTL in milliseconds (5 minutes)
```

See **[SCALING.md](SCALING.md)** for detailed scaling configuration.
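
To make the `cacheEnabled` / `cacheTtl` settings concrete, here is a minimal sketch of a time-to-live cache for search results. It is not Atlas's implementation: `TtlCache` and `cacheKey` are hypothetical, and keying entries on the query, collection, and result limit is an assumption about what makes two searches equivalent.

```typescript
// Hypothetical TTL cache; the clock is injectable so expiry can be tested.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  clear(): void {
    this.entries.clear();
  }
}

// Assumption: two searches are equivalent when these three inputs match.
function cacheKey(query: string, collection = "default", maxResults = 5): string {
  return JSON.stringify([query.trim().toLowerCase(), collection, maxResults]);
}
```

With `cacheTtl: 300000`, an entry written at time `t` would be served until just before `t + 300000` ms and treated as a miss afterwards.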

### Local LLM Configuration

Atlas supports local LLM providers (Ollama, LM Studio, MLX, vLLM) with automatic fallback to cloud models:

```yaml
plugins:
  - id: atlas
    # Local LLM (optional)
    localLlmEnabled: true                    # Enable local LLM
    localLlmUrl: http://localhost:1234/v1   # Auto-detects Ollama/LM Studio/MLX/vLLM
    localLlmModel: qwen3-coder-30b-a3b-instruct-mlx@8bit  # Your local model
    localLlmFallback: true                  # Fall back to cloud if unavailable
```

**Supported Local LLM Providers:**
- **Ollama** (`http://localhost:11434`) - Models like `llama3`, `deepseek-r1`
- **LM Studio** (`http://localhost:1234/v1`) - Any OpenAI-compatible model
- **MLX** (`http://localhost:8080`) - Apple Silicon optimized models
- **vLLM** (`http://localhost:8000`) - Fast inference server

**Fallback Chain:**
1. Local LLM (if enabled)
2. Gateway's primary model
3. Gateway's fallback models (full chain)

This means Atlas can use **different models** than your gateway's default, and you can configure them independently.
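
The fallback chain above can be sketched as an ordered list of providers tried one after another. This is a simplified illustration, not the plugin's code: `Provider` and `completeWithFallback` are hypothetical, and each `complete` function stands in for whatever request the real provider would make.

```typescript
// Hypothetical provider abstraction: a name plus an async completion call.
interface Provider {
  name: string;
  complete: (prompt: string) => Promise<string>;
}

// Tries each provider in order and returns the first successful result.
async function completeWithFallback(
  providers: readonly Provider[],
  prompt: string,
): Promise<{ provider: string; text: string }> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      const text = await provider.complete(prompt);
      return { provider: provider.name, text };
    } catch (error) {
      // Record the failure and move on to the next provider in the chain.
      const reason = error instanceof Error ? error.message : String(error);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed (${failures.join("; ")})`);
}
```

Under this sketch, the chain would be built as `[localLlm, gatewayPrimary, ...gatewayFallbacks]`, with the local entry omitted when `localLlmEnabled` is false.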

## 🎮 Usage

### Agent Tools

Agents can use these tools:

```
atlas_search(query, collection?, maxResults?)
→ Search through indexed documents

atlas_index(path, collection?, background?)
→ Index a new document or directory

atlas_collections()
→ List all document collections

atlas_status()
→ Check indexing status and stats
```
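
The tool signatures above suggest argument shapes along the following lines. The README does not give parameter types, so the `AtlasSearchArgs` interface and the `parseSearchArgs` validator are assumptions made for illustration: a non-empty string query, an optional collection name, and an optional positive integer limit.

```typescript
// Hypothetical argument shape for atlas_search(query, collection?, maxResults?).
interface AtlasSearchArgs {
  query: string;
  collection?: string;
  maxResults?: number;
}

// Validates untyped tool input and returns a typed argument object.
function parseSearchArgs(raw: unknown): AtlasSearchArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new TypeError("arguments must be an object");
  }
  const { query, collection, maxResults } = raw as Record<string, unknown>;
  if (typeof query !== "string" || query.trim() === "") {
    throw new TypeError("query must be a non-empty string");
  }
  const args: AtlasSearchArgs = { query: query.trim() };
  if (collection !== undefined) {
    if (typeof collection !== "string") {
      throw new TypeError("collection must be a string");
    }
    args.collection = collection;
  }
  if (maxResults !== undefined) {
    if (typeof maxResults !== "number" || !Number.isInteger(maxResults) || maxResults < 1) {
      throw new TypeError("maxResults must be a positive integer");
    }
    args.maxResults = maxResults;
  }
  return args;
}
```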

### CLI Commands

```bash
# Search documents
openclaw atlas search "LLM architecture patterns"

# Index with options
openclaw atlas index ~/Documents --background        # Async
openclaw atlas index ~/Documents --incremental      # Incremental
openclaw atlas index ~/Documents --shard 200        # Force shard

# Job management
openclaw atlas jobs                                    # List all jobs
openclaw atlas job-status <job-id>                     # Check progress
openclaw atlas job-cancel <job-id>                     # Cancel job

# Cache management
openclaw atlas cache-stats                              # View cache stats
openclaw atlas cache-clear                              # Clear cache

# Collections
openclaw atlas collections                              # List collections
openclaw atlas status                                   # System status
```
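
The README does not say how `--incremental` decides which documents have changed. One common approach, sketched below purely as an assumption, is to compare a stored fingerprint (file size and modification time) for each path against the current one. `Fingerprint` and `planIncrementalIndex` are hypothetical names.

```typescript
// Hypothetical per-file fingerprint recorded after each indexing run.
interface Fingerprint {
  size: number;    // file size in bytes
  mtimeMs: number; // last modification time in milliseconds
}

// Compares the previous manifest with the current directory state and
// returns which paths need (re)indexing and which should be dropped.
function planIncrementalIndex(
  previous: ReadonlyMap<string, Fingerprint>,
  current: ReadonlyMap<string, Fingerprint>,
): { toIndex: string[]; toRemove: string[] } {
  const toIndex: string[] = [];
  current.forEach((fingerprint, path) => {
    const old = previous.get(path);
    const changed =
      old === undefined ||
      old.size !== fingerprint.size ||
      old.mtimeMs !== fingerprint.mtimeMs;
    if (changed) toIndex.push(path);
  });

  const toRemove: string[] = [];
  previous.forEach((_fingerprint, path) => {
    if (!current.has(path)) toRemove.push(path);
  });

  return { toIndex, toRemove };
}
```

Size and modification time are cheap to read but can miss edits that preserve both; a content hash would be stricter at the cost of reading every file.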

## 🏗️ Architecture

```
~/.openclaw/extensions/openclaw-atlas/
├── src/
│   ├── index.ts          # Plugin entry point
│   ├── types.ts          # TypeScript interfaces
│   ├── config.ts         # Config parsing
│   ├── logger.ts         # Logging wrapper
│   ├── pageindex.ts      # PageIndex API wrapper
│   ├── storage.ts        # Document & job management
│   ├── tools.ts          # Agent tools
│   └── cli.ts            # CLI commands
├── openclaw.plugin.json  # Plugin manifest
├── README.md             # This file
├── SCALING.md            # Scaling guide
└── CLAUDE.md             # Agent guidelines
```

## 🔍 How PageIndex Works

1. **Tree Construction** - Documents become hierarchical index trees
2. **Reasoning Search** - LLMs navigate the tree to find answers
3. **Citation Preservation** - Exact source references maintained

Read more: [PageIndex GitHub](https://github.com/VectifyAI/PageIndex)
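
The three steps above can be illustrated with a toy tree walk. This sketch does not use PageIndex's actual data structures or API: `TreeNode`, `navigate`, and `keywordChooser` are hypothetical, and the keyword matcher merely stands in for the LLM that would decide which branch to follow.

```typescript
// Hypothetical index node: a titled section with a page range and subsections.
interface TreeNode {
  title: string;
  startPage: number;
  endPage: number;
  children: TreeNode[];
}

// Picks the index of the child to descend into, or -1 to stop at this node.
type Chooser = (query: string, candidates: readonly TreeNode[]) => number;

// Walks from the root to the most specific section and builds a citation.
function navigate(
  root: TreeNode,
  query: string,
  choose: Chooser,
): { path: string[]; citation: string } {
  const path = [root.title];
  let node = root;
  while (node.children.length > 0) {
    const next = node.children[choose(query, node.children)];
    if (next === undefined) break; // chooser declined or index out of range
    node = next;
    path.push(node.title);
  }
  const pages =
    node.startPage === node.endPage
      ? `p. ${node.startPage}`
      : `pp. ${node.startPage}-${node.endPage}`;
  return { path, citation: `${path.join(" > ")} (${pages})` };
}

// Stand-in for LLM reasoning: choose the child whose title shares the most
// words with the query; return -1 when no title matches.
const keywordChooser: Chooser = (query, candidates) => {
  const words = query.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
  let best = -1;
  let bestScore = 0;
  candidates.forEach((candidate, index) => {
    const title = candidate.title.toLowerCase();
    const score = words.filter((w) => title.includes(w)).length;
    if (score > bestScore) {
      bestScore = score;
      best = index;
    }
  });
  return best;
};
```

Because the walk records every section it passes through, the page range of the final node comes with its full path, which is how a tree index can keep exact source references.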

## 📈 Performance

### Small Collections (< 100 docs)
- Index time: < 5 minutes
- Search time: < 5 seconds
- Memory: ~50MB

### Medium Collections (100-1000 docs)
- Index time: 30-60 minutes (async)
- Search time: 5-15 seconds
- Memory: ~200MB

### Large Collections (1000-5000 docs)
- Index time: ~2 hours (sharded + async)
- Search time: 15-30 seconds (sharded)
- Memory: ~1GB

See **[SCALING.md](SCALING.md)** for optimization strategies.

## 🛠️ Development

```bash
# Watch mode for development
npm run dev

# Type checking
npm run typecheck

# Build
npm run build
```

## 📚 Documentation

- **[README.md](README.md)** - This file
- **[SCALING.md](SCALING.md)** - Comprehensive scaling guide
- **[CLAUDE.md](CLAUDE.md)** - Agent development guidelines

## 📄 License

MIT

## 🙏 Credits

- Built with [PageIndex](https://github.com/VectifyAI/PageIndex) by Vectify AI
- Part of the [OpenClaw](https://github.com/openclaw/openclaw) ecosystem
