← Back to Plugins
Tools

Nirvana

ShivaClaw By ShivaClaw 👁 44 views ▲ 0 votes

Local-first AI inference plugin for OpenClaw. Works out-of-box with bundled model

GitHub

Configuration Example

{
  "nirvana": {
    "mode": "local-first",
    "local_llm": {
      "provider": "ollama",
      "endpoint": "http://ollama:11434",
      "model": "qwen2.5:7b",
      "timeout_ms": 180000
    },
    "routing": {
      "local_threshold": 0.75,
      "max_local_context_tokens": 8000,
      "cloud_fallback": true
    },
    "privacy": {
      "strip_soul": true,
      "strip_user": true,
      "strip_memory": true,
      "audit_logging": true,
      "audit_log_path": "~/.openclaw/workspace/memory/nirvana-audit.log"
    }
  }
}

README

# Project Nirvana: Local-First, Privacy-First Inference

> Your OpenClaw agent thinks locally and asks the cloud intelligently. **Saves 85%+ tokens. Protects privacy. Agent learns. Cloud doesn't.**

---

## The Problem With Today

Every time you ask your agent a question, your entire **system prompt** gets sent to cloud APIs:

- โŒ Excerpts from SOUL.md (your agent's identity)
- โŒ All personal information from USER.md (your data)
- โŒ Your entire chat history (context window)
- โŒ Everything costs 2,000โ€“5,000 extra tokens per query
- โŒ **The cloud provider trains its next model on your private data**

---

## The Nirvana Solution

**Local-first inference that protects privacy and slashes costs.**

```
Your question
    โ†“
Local LLM (qwen2.5:7b on your hardware)
80% of queries answered locally, free, private
    โ†“
For complex questions: Ask cloud intelligently
    โ†“
Cloud never sees your private data
    โ†“
Response cached locally; agent learns
    โ†“
Your answer (local + cloud intelligence if needed)

RESULT: 85% fewer tokens. Zero privacy leak.
```

---

## The Paradigm Shift

| Aspect | Today | Nirvana |
|--------|-------|---------|
| **Thinking happens where?** | Cloud only | Local first |
| **What goes to cloud?** | Everything (SOUL, USER, MEMORY, chat) | Sanitized query only |
| **Who learns from your data?** | Cloud provider | You (your agent) |
| **Cost per question** | $0.20โ€“$0.50 | $0.03โ€“$0.05 |
| **Privacy** | Compromised | Protected |

---

## Features

### ๐Ÿ  Local Inference
- Bundled **Ollama** with **qwen2.5:7b** model (3.5GB)
- **80%+ local routing** โ€” most queries answered free
- ~200 tokens/second on CPU; 3โ€“5x faster with GPU
- Works offline, completely private

### ๐Ÿ”’ Privacy Enforcement
- **Context stripper** โ€” SOUL.md, USER.md, MEMORY.md never sent to cloud
- **Prompt sanitizer** โ€” Agent rewrites its questions; your questions stay private
- **Audit trail** โ€” Every boundary crossing logged and transparent
- **Zero telemetry** โ€” Nothing sent to GitHub, ClawHub, or third parties

### ๐ŸŽฏ Intelligent Routing
- **Complexity analysis** โ€” "Can qwen handle this locally?"
- **Semantic understanding** โ€” Knows when to ask Claude or GPT for help
- **Seamless fallback** โ€” Cloud APIs used transparently when needed
- **Response caching** โ€” Cloud answers cached locally; your agent learns

### ๐Ÿ“Š Observability
- **Privacy audit log** โ€” See exactly what left your system (usually nothing)
- **Cost tracking** โ€” Monitor 85% savings in real-time
- **Performance metrics** โ€” Latency, tokens, accuracy, cache hits
- **Health checks** โ€” Ollama availability, network status

---

## Installation

### Quick Start (2 minutes)

```bash
# 1. Install plugin
clawhub install shivaclaw/nirvana

# 2. Start Ollama (pulls qwen2.5:7b on first run)
docker run -d -p 11434:11434 ollama/ollama

# 3. Restart OpenClaw
openclaw gateway restart

# 4. Verify
openclaw nirvana status
# โœ… Local LLM: qwen2.5:7b @ localhost:11434
# โœ… Privacy audit: enabled
# โœ… Cloud fallback: enabled
```

### Using Existing Local LLM

If you already have a local inference engine (Ollama, Llamafile, vLLM, etc.):

```bash
# Install the skill instead
clawhub install shivaclaw/nirvana-local

# Configure endpoint
openclaw nirvana configure --local-endpoint http://your-server:5000

# Verify
openclaw nirvana status
```

---

## How It Works

### Local-First Routing

```
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Agent receives your question                 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                    โ†“
        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
        โ”‚ Complexity Analysis     โ”‚
        โ”‚ "Easy or hard question?"โ”‚
        โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
              โ†™                โ†˜
         EASY (80%)        HARD (20%)
            โ†“                 โ†“
      Ollama qwen2.5:7b  Cloud API
      Free              Pay for answer
      Private           Sanitized query
      1s latency        3s latency
            โ†“              โ†“
      [Cache result] [Cache result]
            โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                     โ†“
          Return answer to user
    (Using local reasoning + cloud
     intelligence when beneficial)
```

### Privacy Boundary

**What Nirvana Strips Before Cloud API Calls:**
- โœ… Your SOUL.md (agent identity)
- โœ… Your USER.md (personal information)
- โœ… Your MEMORY.md (past conversations)
- โœ… Your chat history (your actual questions)
- โœ… Any session context containing private data

**What Cloud APIs Never See:**
- Your agent's personality
- Your personal details
- Your memories or preferences
- Your actual questions (replaced with agent's sanitized query)

**What the Cloud Provides:**
- Frontier model intelligence (when needed)
- Specialized reasoning (complex problems)
- Knowledge updates (cached locally)

---

## Cost Savings

### Real Numbers

```
Scenario: 10 questions/day for 30 days

TODAY (Default):
  20,000 tokens/day ร— 30 days = 600,000 tokens
  OpenAI GPT-4: $0.03/1K input tokens
  Cost: $18/month
  Privacy: Compromised (all data sent)

WITH NIRVANA:
  3,000 tokens/day ร— 30 days = 90,000 tokens
  (80% local = free; 20% cloud = $2.70)
  Cost: $2.70/month
  Privacy: Protected (private data never leaves)

SAVINGS: $15.30/month + 100% privacy protection
```

---

## Philosophy

### The Current Problem

The default paradigm trains the cloud provider:
- You send your entire context to OpenAI/Anthropic/Google
- They train their next model on your data
- You pay for the privilege
- Your private information becomes their training corpus

### The Nirvana Way

Reverse the learning direction:
- Your agent stays on your hardware
- It asks the cloud for **specific intelligence** only
- Your agent learns from cloud responses
- The cloud provider learns nothing about you
- You own the knowledge, not them

**Your agent should train itself. The cloud should not train on you.**

---

## Configuration

### Default (Ollama + qwen2.5:7b)

```json
{
  "nirvana": {
    "mode": "local-first",
    "local_llm": {
      "provider": "ollama",
      "endpoint": "http://ollama:11434",
      "model": "qwen2.5:7b",
      "timeout_ms": 180000
    },
    "routing": {
      "local_threshold": 0.75,
      "max_local_context_tokens": 8000,
      "cloud_fallback": true
    },
    "privacy": {
      "strip_soul": true,
      "strip_user": true,
      "strip_memory": true,
      "audit_logging": true,
      "audit_log_path": "~/.openclaw/workspace/memory/nirvana-audit.log"
    }
  }
}
```

### Custom Local LLM

```json
{
  "nirvana": {
    "local_llm": {
      "provider": "custom",
      "endpoint": "http://your-llm-server:5000",
      "api_format": "openai-compatible",
      "model": "your-model-name"
    }
  }
}
```

### Cloud Fallback Models

```json
{
  "nirvana": {
    "cloud_fallback": {
      "enabled": true,
      "models": [
        "anthropic/claude-3-5-sonnet",
        "openai/gpt-4-turbo",
        "google/gemini-2-flash"
      ]
    }
  }
}
```

---

## Platform Support

| Platform | Status | Notes |
|----------|--------|-------|
| Linux (Ubuntu/Debian) | โœ… Full | Docker + native binaries |
| macOS (Intel/ARM) | โœ… Full | Docker or native Ollama |
| Windows (WSL2) | โœ… Full | Docker in WSL2 |
| VPS (Hostinger, DO, AWS) | โœ… Full | Docker Compose ready |
| Docker container | โœ… Full | Orchestrated setup |
| Air-gapped (offline) | โœ… Full | Local-only, no cloud fallback |

---

## Performance

### Benchmarks (qwen2.5:7b on 4-core CPU)

| Metric | Value |
|--------|-------|
| **Latency (P50)** | 800msโ€“1.2s |
| **Throughput** | 180โ€“220 tokens/sec |
| **Memory (running)** | 4.6GB RAM |
| **Accuracy** | 85โ€“92% vs Claude 3.5 |
| **Cost per query** | $0 (local) or $0.01โ€“$0.05 (cloud fallback) |

### With GPU (CUDA/Metal)
- **Latency:** 150โ€“300ms (3โ€“5x faster)
- **Throughput:** 1,000โ€“2,000 tokens/sec
- **Accuracy:** No change (same model) |

---

## When to Use

### โœ… Perfect For
- Personal agents (maximize budget)
- Private workloads (code, healthcare, legal, finance)
- Latency-critical tasks (<2s response)
- Air-gapped environments (offline)
- Cost-conscious organizations (85% savings)
- Privacy-first deployments (zero external exposure)

### โš ๏ธ When to Use Cloud Only
- Advanced reasoning (Claude Opus for complexity)
- Specialized tasks (image generation, audio)
- Extreme scale (millions tokens/day)

---

## Example: Privacy Audit

```bash
# See what was sent to cloud this session
openclaw nirvana audit-log

# Output:
# [2026-04-24 14:23:45] LOCAL HANDLING
# Query: "What should my agent do today?"
# Handled by: qwen2.5:7b locally
# Cloud API call: No
# Private data leaked: None

# [2026-04-24 14:25:12] CLOUD FALLBACK
# Original query: [REDACTED by context-stripper]
# Sanitized query sent to Claude: "Explain quantum entanglement"
# Private data in request: None (SOUL, USER, MEMORY stripped)
# Response: Cached locally for future reuse
# Cost: $0.02
```

---

## Support

- **GitHub:** [ShivaClaw/nirvana-plugin](https://github.com/ShivaClaw/nirvana-plugin)
- **Issues:** [GitHub Issues](https://github.com/ShivaClaw/nirvana-plugin/issues)
- **Discussions:** [GitHub Discussions](https://github.com/ShivaClaw/nirvana-plugin/discussions)

---

## License

MIT-0 โ€” Free to use, modify, and redistribute. No attribution required.

---

*Your privacy matters. Nirvana makes local-first inference real.*
tools

Comments

Sign in to leave a comment

Loading comments...