Jimbomesh Holler Server
Open source AI inference server with Model Marketplace, Document RAG, and OpenAI-compatible API
Configuration Example
```json
{
  "providers": {
    "holler": {
      "type": "openai",
      "baseUrl": "http://localhost:1920/v1",
      "apiKey": "your-holler-api-key"
    }
  }
}
```
README
# JimboMesh Holler Server
On-prem embedding and LLM inference server for [JimboMesh](https://github.com/IngressTechnology/JimboMesh). Replaces cloud-based embedding calls (OpenRouter/OpenAI) with a local Ollama instance, keeping all data on-premises.
## What This Does
JimboMesh currently uses OpenRouter's `text-embedding-3-small` (1536d) for all embedding operations. This project provides a local Ollama server running `nomic-embed-text` (768d) as a drop-in replacement, eliminating the need for cloud API calls during ingestion.
```
Before: ingest-*.js → embed.sh → OpenRouter API → Qdrant
After:  ingest-*.js → embed.sh → Ollama (local) → Qdrant
```
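Note that the swap changes the embedding dimension from 1536 to 768, so Qdrant collections created for the cloud vectors must be recreated at the new size, and old and new vectors must never be mixed in one collection. A small guard like the following (a hypothetical helper, not part of this repo) can catch mixed-dimension batches before upsert:

```python
EXPECTED_DIM = 768  # nomic-embed-text output size (was 1536 with text-embedding-3-small)

def check_dims(vectors, expected=EXPECTED_DIM):
    """Reject batches that mix old 1536-d cloud vectors with new 768-d local ones."""
    bad = [i for i, v in enumerate(vectors) if len(v) != expected]
    if bad:
        raise ValueError(f"vectors at indices {bad} are not {expected}-dimensional")
    return len(vectors)
```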
## Quick Start
The built-in setup scripts are the fastest way to get running: they handle
prerequisites, `.env` creation, API key generation, image build, and startup,
and they persist your setup selections to `.env` for seamless reinstalls and rebuilds.
### Linux / macOS (one command)
```bash
git clone https://github.com/IngressTechnology/jimbomesh-holler-server.git && cd jimbomesh-holler-server && ./setup.sh
```
### Windows PowerShell (one command)
```powershell
git clone https://github.com/IngressTechnology/jimbomesh-holler-server.git; cd jimbomesh-holler-server; .\setup.ps1
```
### No git? No problem.
**Linux / macOS (curl):**
```bash
curl -fsSL https://github.com/IngressTechnology/jimbomesh-holler-server/archive/refs/heads/main.tar.gz | tar xz && cd jimbomesh-holler-server-main && ./setup.sh
```
**Linux / macOS (wget):**
```bash
wget -qO- https://github.com/IngressTechnology/jimbomesh-holler-server/archive/refs/heads/main.tar.gz | tar xz && cd jimbomesh-holler-server-main && ./setup.sh
```
**Windows PowerShell (no git):**
```powershell
irm https://github.com/IngressTechnology/jimbomesh-holler-server/archive/refs/heads/main.zip -OutFile holler.zip; Expand-Archive holler.zip .; cd jimbomesh-holler-server-main; .\setup.ps1
```
Add `--gpu` / `-WithGpu` for NVIDIA GPU support, `--qdrant` / `-WithQdrant` for a local vector DB.
See the [Quick Start Guide](QUICK_START.md) for all flags and the manual install path.
## Documentation by Audience
Start with the section that matches your role:
- **Users**: [QUICK_START.md](QUICK_START.md), [docs/API_USAGE.md](docs/API_USAGE.md), [docs/TROUBLESHOOTING.md](docs/TROUBLESHOOTING.md), [docs/INTEGRATION.md](docs/INTEGRATION.md)
- **Admins / Operators**: [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md), [docs/CONFIGURATION.md](docs/CONFIGURATION.md), [docs/SECURITY.md](docs/SECURITY.md), [docs/MAC_WINDOWS_SETUP.md](docs/MAC_WINDOWS_SETUP.md)
- **Contributors / Developers**: [CONTRIBUTING.md](CONTRIBUTING.md), [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md), [docs/CURSOR_VS_CODE.md](docs/CURSOR_VS_CODE.md), [docs/DOCKERBUILD.md](docs/DOCKERBUILD.md)
### Manual
```bash
cp .env.example .env
# Edit .env. REQUIRED: set JIMBOMESH_HOLLER_API_KEY (generate with: openssl rand -hex 32)
# Optional: set QDRANT_API_KEY if using --profile qdrant
docker compose build jimbomesh-still
docker compose up -d
```
First startup pulls the configured models (~2-5 minutes depending on network speed).
## Naming Convention
This project uses a hierarchical naming scheme:
- **Repository**: `jimbomesh-holler-server` (the overall project)
- **Compose Project**: `jimbomesh-holler` (Docker Compose namespace)
- **Service**: `jimbomesh-still` (the main Ollama service)
- **Image**: `jimbomesh-still:latest` (Docker image)
See [NAMING.md](NAMING.md) for details on why this structure supports multiple service variants.
## Architecture
```
┌──────────────────────────────────────────────────────────┐
│ Docker Network                                           │
│                                                          │
│  ┌─────────────────────┐      ┌─────────────────────┐    │
│  │ jimbomesh-still     │      │ jimbomesh-qdrant    │    │
│  │                     │      │ (optional profile)  │    │
│  │  ┌───────────────┐  │      │                     │    │
│  │  │ API Gateway   │  │      │ Qdrant v1.13.2      │    │
│  │  │ :1920 (ext)   │  │      │ :6333 REST          │    │
│  │  │ X-API-Key auth│  │      │ :6334 gRPC          │    │
│  │  │ /admin (UI)   │  │      │                     │    │
│  │  └───────┬───────┘  │      │ Collections:        │    │
│  │          ▼          │      │ - knowledge_base    │    │
│  │  Ollama Server      │      │ - memory            │    │
│  │  :11435 (internal)  │      │ - client_research   │    │
│  │  :9090 (health)     │      │                     │    │
│  │  Models:            │      │                     │    │
│  │  - nomic-embed-text │      │                     │    │
│  │  - llama3.1:8b      │      │                     │    │
│  │                     │      │                     │    │
│  │  SQLite (holler.db) │      │                     │    │
│  │  /data/ volume      │      │                     │    │
│  └─────────────────────┘      └─────────────────────┘    │
└──────────────────────────────────────────────────────────┘
```
## Port 1920
JimboMesh runs on port **1920** by default: the year Prohibition started and moonshine went underground.
| Service | URL |
|---------|-----|
| Holler Gateway | http://localhost:1920 |
| Admin UI | http://localhost:1920/admin |
| OpenAI-Compatible API | http://localhost:1920/v1 |
| Ollama (upstream) | http://localhost:11434 |
To use a custom port, set the `PORT` environment variable:
```bash
PORT=8080 docker compose up
```
## Storage
All request logs, runtime settings, and aggregated statistics are stored in a SQLite database (`holler.db`) on a dedicated Docker volume (`holler_data`). Data persists across container restarts and rebuilds. Logs are automatically pruned after 30 days (configurable via `LOG_RETENTION_DAYS`).
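The retention behavior can be pictured as a periodic `DELETE` keyed on row age. The sketch below uses a hypothetical `request_logs` table and an in-memory database, since the actual schema of `holler.db` is internal to the server:

```python
import sqlite3

def prune_logs(conn, retention_days=30):
    """Delete log rows older than the retention window (LOG_RETENTION_DAYS)."""
    cur = conn.execute(
        "DELETE FROM request_logs WHERE created_at < datetime('now', ?)",
        (f"-{retention_days} days",),
    )
    conn.commit()
    return cur.rowcount

# Demo against an in-memory database with one stale row and one fresh row
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_logs (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("INSERT INTO request_logs (created_at) VALUES (datetime('now', '-40 days'))")
conn.execute("INSERT INTO request_logs (created_at) VALUES (datetime('now'))")
deleted = prune_logs(conn)  # removes only the 40-day-old row
```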
## Deployment Modes
| Mode | How to Enable | Services |
|------|--------------|----------|
| CPU (default) | `docker compose up -d` | `jimbomesh-still` |
| NVIDIA GPU | Set `COMPOSE_FILE=docker-compose.yml:docker-compose.gpu.yml` in `.env`, then `docker compose up -d` | `jimbomesh-still` (with GPU) |
| macOS Metal GPU | Run `./setup.sh`, select **[P] Performance Mode** | `jimbomesh-still` (gateway only; Ollama runs natively) |
| + Qdrant | Append `--profile qdrant` to any `docker compose up` | + `jimbomesh-qdrant` |
**NVIDIA GPU mode:** Add to `.env`: `COMPOSE_FILE=docker-compose.yml:docker-compose.gpu.yml` (use `;` separator on Windows). Then all `docker compose` commands (up, down, restart) automatically include GPU support.
**macOS Performance Mode:** Docker cannot pass the Apple Metal GPU through to containers. `setup.sh` detects macOS and offers a mode selection: Performance Mode installs Ollama natively via Homebrew (full Metal GPU), while Docker handles only the API gateway. See [docs/MAC_WINDOWS_SETUP.md](docs/MAC_WINDOWS_SETUP.md) for details.
## Models
| Model | Type | Dimensions | Size | Purpose |
|-------|------|-----------|------|---------|
| `nomic-embed-text` | Embedding | 768 | ~274 MB | Text embeddings for Qdrant |
| `llama3.1:8b` | LLM | n/a | ~4.9 GB | General-purpose inference (128K context) |
See [docs/CONFIGURATION.md](docs/CONFIGURATION.md#alternative-embedding-models) for alternative models.
## API
**Authentication Required:** All API requests must include the `X-API-Key` header.
OpenAPI schemas in this section:
- `GET /api/tags` → response `TagsResponse`
- `POST /api/embed` → request `OllamaEmbedRequest`, response `OllamaEmbedResponse`
```bash
# List models
curl -H "X-API-Key: your_api_key_here" \
http://localhost:1920/api/tags
# Generate embeddings
curl -H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{"model": "nomic-embed-text", "input": "Your text here"}' \
http://localhost:1920/api/embed
```
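The same embed call from Python, using only the documented endpoint and headers (`your_api_key_here` is a placeholder, and `build_embed_request` is an illustrative helper, not part of any shipped client):

```python
import json
import urllib.request

def build_embed_request(text, api_key, base="http://localhost:1920"):
    """Construct the POST /api/embed request; send it with urllib.request.urlopen(req)."""
    body = json.dumps({"model": "nomic-embed-text", "input": text}).encode()
    return urllib.request.Request(
        f"{base}/api/embed",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_embed_request("Your text here", "your_api_key_here")
```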
### OpenAI-Compatible Endpoint
A drop-in `/v1/embeddings` endpoint that speaks the OpenAI format. Just change the base URL:
OpenAPI schemas: request `OpenAIEmbeddingsRequest`, response `OpenAIEmbeddingsResponse`.
```bash
curl -H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{"model": "nomic-embed-text", "input": ["Hello world", "Another text"]}' \
http://localhost:1920/v1/embeddings
```
Returns an OpenAI-format response with `object`, `data[].embedding`, `model`, and `usage`. Batch input (an array of strings) is supported natively.
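To make the response shape concrete, here is a hand-written sample in that format (the values are illustrative, not real model output) and the usual way to pull vectors out in batch order:

```python
import json

# Sample OpenAI-format embeddings response; fields mirror those listed above.
sample = json.loads("""{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 1, "embedding": [0.4, 0.5, 0.6]},
    {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]}
  ],
  "model": "nomic-embed-text",
  "usage": {"prompt_tokens": 6, "total_tokens": 6}
}""")

# Sort by index so vectors line up with the order of the input batch
vectors = [d["embedding"] for d in sorted(sample["data"], key=lambda d: d["index"])]
```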
**Authentication Errors:**
- `401 Unauthorized`: missing API key
- `403 Forbidden`: invalid API key
- `429 Too Many Requests`: rate limit exceeded (60 req/min default)
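A client that may exceed the 60 req/min default should back off and retry on `429`. A generic sketch (the `send` callable stands in for any HTTP call that returns a status code; nothing here is part of the server):

```python
import time

def with_backoff(send, max_retries=3, base_delay=0.01):
    """Call send() until it returns a non-429 status, doubling the delay each retry."""
    delay = base_delay
    for _ in range(max_retries):
        status = send()
        if status != 429:
            return status
        time.sleep(delay)
        delay *= 2
    return send()  # final attempt; the caller handles a persistent 429
```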
### Health Endpoints (port 9090)
| Endpoint | Purpose | 200 when | 503 when |
|----------|---------|----------|----------|
| `/healthz` | Liveness probe | Ollama API responds | API unreachable |
| `/readyz` | Readiness probe | API up + model available | Any check fails |
| `/status` | Info/debug | API responds | API unreachable |
```bash
# Liveness check
curl -s http://localhost:9090/healthz | jq .
# Readiness check (used by Docker healthcheck)
curl -s http://localhost:9090/readyz | jq .
# Detailed status with model list
curl -s http://localhost:9090/status | jq .
```
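The `/readyz` probe is what the Docker healthcheck consumes. Wired into Compose it might look like the sketch below; the service name comes from this repo, but the interval/timeout values are illustrative and may differ from the actual compose file:

```yaml
services:
  jimbomesh-still:
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:9090/readyz"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 60s   # allow time for the first model pull
```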
## Admin UI
A built-in web admin panel is available at `/admin` on the API gateway port (default 1920). No additional ports or processes required.
**Quick connect:** use the auto-login URL printed by the installer:
```
http://localhost:1920/admin#key=YOUR_API_KEY
```
The `#key=` fragment logs you in automatically and is stripped from the URL bar
(hash fragments never leave the browser). Bookmark it for quick access, or
navigate to `http://localhost:1920/admin` and enter the key manually.
**Features:**
| Tab | Description |
|-----|-------------|
| Dashboard | Server health, Ollama latency, model count, uptime (auto-refresh 10s) |
| Models | List, pull (with streaming progress), delete, view details |
| Mesh | Connect/disconnect/cancel Mesh sessions, set coordinator URL and Holler name, view live Mesh connection log and mode |
| Playground | Test embeddings, chat (streaming), generate (streaming) |
| Configuration | Editable runtime settings, API key management (
... (truncated)