Jimbomesh Holler Server
Open source AI inference server with Model Marketplace, Document RAG, and OpenAI-compatible API
Configuration Example
```json
{
  "providers": {
    "holler": {
      "type": "openai",
      "baseUrl": "http://localhost:1920/v1",
      "apiKey": "your-holler-api-key"
    }
  }
}
```
README
# JimboMesh Holler Server
On-prem embedding and LLM inference server for [JimboMesh](https://github.com/IngressTechnology/JimboMesh). Replaces cloud-based embedding calls (OpenRouter/OpenAI) with a local Ollama instance, keeping all data on-premises.
## What This Does
JimboMesh currently uses OpenRouter's `text-embedding-3-small` (1536d) for all embedding operations. This project provides a local Ollama server running `nomic-embed-text` (768d) as a drop-in replacement, eliminating the need for cloud API calls during ingestion.
```
Before: ingest-*.js → embed.sh → OpenRouter API → Qdrant
After:  ingest-*.js → embed.sh → Ollama (local) → Qdrant
```
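Note that the swap changes the embedding dimension from 1536 to 768, so Qdrant collections created for the cloud vectors must be recreated at the new size, and old and new vectors must never be mixed in one collection. A small guard like the following (a hypothetical helper, not part of this repo) can catch mixed-dimension batches before upsert:

```python
EXPECTED_DIM = 768  # nomic-embed-text output size (was 1536 with text-embedding-3-small)

def check_dims(vectors, expected=EXPECTED_DIM):
    """Reject batches that mix old 1536-d cloud vectors with new 768-d local ones."""
    bad = [i for i, v in enumerate(vectors) if len(v) != expected]
    if bad:
        raise ValueError(f"vectors at indices {bad} are not {expected}-dimensional")
    return len(vectors)
```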
## Quick Start
The built-in setup scripts are the fastest way to get running: they handle
prerequisites, `.env` creation, API key generation, image build, and startup,
and they persist your setup selections to `.env` for seamless reinstalls and rebuilds.
### Linux / macOS (one command)
```bash
git clone https://github.com/IngressTechnology/jimbomesh-holler-server.git && cd jimbomesh-holler-server && ./setup.sh
```
### Windows PowerShell (one command)
```powershell
git clone https://github.com/IngressTechnology/jimbomesh-holler-server.git; cd jimbomesh-holler-server; .\setup.ps1
```
### No git? No problem.
**Linux / macOS (curl):**
```bash
curl -fsSL https://github.com/IngressTechnology/jimbomesh-holler-server/archive/refs/heads/main.tar.gz | tar xz && cd jimbomesh-holler-server-main && ./setup.sh
```
**Linux / macOS (wget):**
```bash
wget -qO- https://github.com/IngressTechnology/jimbomesh-holler-server/archive/refs/heads/main.tar.gz | tar xz && cd jimbomesh-holler-server-main && ./setup.sh
```
**Windows PowerShell (no git):**
```powershell
irm https://github.com/IngressTechnology/jimbomesh-holler-server/archive/refs/heads/main.zip -OutFile holler.zip; Expand-Archive holler.zip .; cd jimbomesh-holler-server-main; .\setup.ps1
```
Add `--gpu` / `-WithGpu` for NVIDIA GPU support, `--qdrant` / `-WithQdrant` for a local vector DB.
See the [Quick Start Guide](QUICK_START.md) for all flags and the manual install path.
## Documentation by Audience
Start with the section that matches your role:
- **Users**: [QUICK_START.md](QUICK_START.md), [docs/API_USAGE.md](docs/API_USAGE.md), [docs/TROUBLESHOOTING.md](docs/TROUBLESHOOTING.md), [docs/INTEGRATION.md](docs/INTEGRATION.md)
- **Admins / Operators**: [docs/DEPLOYMENT.md](docs/DEPLOYMENT.md), [docs/CONFIGURATION.md](docs/CONFIGURATION.md), [docs/SECURITY.md](docs/SECURITY.md), [docs/MAC_WINDOWS_SETUP.md](docs/MAC_WINDOWS_SETUP.md)
- **Contributors / Developers**: [CONTRIBUTING.md](CONTRIBUTING.md), [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md), [docs/CURSOR_VS_CODE.md](docs/CURSOR_VS_CODE.md), [docs/DOCKERBUILD.md](docs/DOCKERBUILD.md)
### Manual
```bash
cp .env.example .env
# Edit .env. REQUIRED: set JIMBOMESH_HOLLER_API_KEY (generate with: openssl rand -hex 32)
# Optional: set QDRANT_API_KEY if using --profile qdrant
docker compose build jimbomesh-still
docker compose up -d
```
First startup pulls the configured models (~2-5 minutes depending on network speed).
## Naming Convention
This project uses a hierarchical naming scheme:
- **Repository**: `jimbomesh-holler-server` (the overall project)
- **Compose Project**: `jimbomesh-holler` (Docker Compose namespace)
- **Service**: `jimbomesh-still` (the main Ollama service)
- **Image**: `jimbomesh-still:latest` (Docker image)
See [NAMING.md](NAMING.md) for details on why this structure supports multiple service variants.
## Architecture
```
┌──────────────────────────────────────────────────────────┐
│ Docker Network                                           │
│                                                          │
│  ┌─────────────────────┐      ┌─────────────────────┐    │
│  │ jimbomesh-still     │      │ jimbomesh-qdrant    │    │
│  │                     │      │ (optional profile)  │    │
│  │  ┌───────────────┐  │      │                     │    │
│  │  │ API Gateway   │  │      │ Qdrant v1.13.2      │    │
│  │  │ :1920 (ext)   │  │      │ :6333 REST          │    │
│  │  │ X-API-Key auth│  │      │ :6334 gRPC          │    │
│  │  │ /admin (UI)   │  │      │                     │    │
│  │  └───────┬───────┘  │      │ Collections:        │    │
│  │          ▼          │      │ - knowledge_base    │    │
│  │  Ollama Server      │      │ - memory            │    │
│  │  :11435 (internal)  │      │ - client_research   │    │
│  │  :9090 (health)     │      │                     │    │
│  │  Models:            │      │                     │    │
│  │  - nomic-embed-text │      │                     │    │
│  │  - llama3.1:8b      │      │                     │    │
│  │                     │      │                     │    │
│  │  SQLite (holler.db) │      │                     │    │
│  │  /data/ volume      │      │                     │    │
│  └─────────────────────┘      └─────────────────────┘    │
└──────────────────────────────────────────────────────────┘
```
## Port 1920
JimboMesh runs on port **1920** by default: the year Prohibition started and moonshine went underground.
| Service | URL |
|---------|-----|
| Holler Gateway | http://localhost:1920 |
| Admin UI | http://localhost:1920/admin |
| OpenAI-Compatible API | http://localhost:1920/v1 |
| Ollama (upstream) | http://localhost:11434 |
To use a custom port, set the `PORT` environment variable:
```bash
PORT=8080 docker compose up
```
## Storage
All request logs, runtime settings, and aggregated statistics are stored in a SQLite database (`holler.db`) on a dedicated Docker volume (`holler_data`). Data persists across container restarts and rebuilds. Logs are automatically pruned after 30 days (configurable via `LOG_RETENTION_DAYS`).
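The retention behavior can be pictured as a periodic `DELETE` keyed on row age. The sketch below uses a hypothetical `request_logs` table and an in-memory database, since the actual schema of `holler.db` is internal to the server:

```python
import sqlite3

def prune_logs(conn, retention_days=30):
    """Delete log rows older than the retention window (LOG_RETENTION_DAYS)."""
    cur = conn.execute(
        "DELETE FROM request_logs WHERE created_at < datetime('now', ?)",
        (f"-{retention_days} days",),
    )
    conn.commit()
    return cur.rowcount

# Demo against an in-memory database with one stale row and one fresh row
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE request_logs (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.execute("INSERT INTO request_logs (created_at) VALUES (datetime('now', '-40 days'))")
conn.execute("INSERT INTO request_logs (created_at) VALUES (datetime('now'))")
deleted = prune_logs(conn)  # removes only the 40-day-old row
```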
## Deployment Modes
| Mode | How to Enable | Services |
|------|--------------|----------|
| CPU (default) | `docker compose up -d` | `jimbomesh-still` |
| NVIDIA GPU | Set `COMPOSE_FILE=docker-compose.yml:docker-compose.gpu.yml` in `.env`, then `docker compose up -d` | `jimbomesh-still` (with GPU) |
| macOS Metal GPU | Run `./setup.sh`, select **[P] Performance Mode** | `jimbomesh-still` (gateway only; Ollama runs natively) |
| + Qdrant | Append `--profile qdrant` to any `docker compose up` | + `jimbomesh-qdrant` |
**NVIDIA GPU mode:** Add to `.env`: `COMPOSE_FILE=docker-compose.yml:docker-compose.gpu.yml` (use `;` separator on Windows). Then all `docker compose` commands (up, down, restart) automatically include GPU support.
**macOS Performance Mode:** Docker cannot pass the Apple Metal GPU through to containers. `setup.sh` detects macOS and offers a mode selection: Performance Mode installs Ollama natively via Homebrew (full Metal GPU), while Docker handles only the API gateway. See [docs/MAC_WINDOWS_SETUP.md](docs/MAC_WINDOWS_SETUP.md) for details.
## Models
| Model | Type | Dimensions | Size | Purpose |
|-------|------|-----------|------|---------|
| `nomic-embed-text` | Embedding | 768 | ~274 MB | Text embeddings for Qdrant |
| `llama3.1:8b` | LLM | n/a | ~4.9 GB | General-purpose inference (128K context) |
See [docs/CONFIGURATION.md](docs/CONFIGURATION.md#alternative-embedding-models) for alternative models.
## API
**Authentication Required:** All API requests must include the `X-API-Key` header.
OpenAPI schemas in this section:
- `GET /api/tags` → response `TagsResponse`
- `POST /api/embed` → request `OllamaEmbedRequest`, response `OllamaEmbedResponse`
```bash
# List models
curl -H "X-API-Key: your_api_key_here" \
http://localhost:1920/api/tags
# Generate embeddings
curl -H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{"model": "nomic-embed-text", "input": "Your text here"}' \
http://localhost:1920/api/embed
```
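The same embed call from Python, using only the documented endpoint and headers (`your_api_key_here` is a placeholder, and `build_embed_request` is an illustrative helper, not part of any shipped client):

```python
import json
import urllib.request

def build_embed_request(text, api_key, base="http://localhost:1920"):
    """Construct the POST /api/embed request; send it with urllib.request.urlopen(req)."""
    body = json.dumps({"model": "nomic-embed-text", "input": text}).encode()
    return urllib.request.Request(
        f"{base}/api/embed",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

req = build_embed_request("Your text here", "your_api_key_here")
```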
### OpenAI-Compatible Endpoint
A drop-in `/v1/embeddings` endpoint that speaks the OpenAI format. Just change the base URL:
OpenAPI schemas: request `OpenAIEmbeddingsRequest`, response `OpenAIEmbeddingsResponse`.
```bash
curl -H "X-API-Key: your_api_key_here" \
-H "Content-Type: application/json" \
-d '{"model": "nomic-embed-text", "input": ["Hello world", "Another text"]}' \
http://localhost:1920/v1/embeddings
```
Returns an OpenAI-format response with `object`, `data[].embedding`, `model`, and `usage`. Batch input (an array of strings) is supported natively.
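To make the response shape concrete, here is a hand-written sample in that format (the values are illustrative, not real model output) and the usual way to pull vectors out in batch order:

```python
import json

# Sample OpenAI-format embeddings response; fields mirror those listed above.
sample = json.loads("""{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 1, "embedding": [0.4, 0.5, 0.6]},
    {"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]}
  ],
  "model": "nomic-embed-text",
  "usage": {"prompt_tokens": 6, "total_tokens": 6}
}""")

# Sort by index so vectors line up with the order of the input batch
vectors = [d["embedding"] for d in sorted(sample["data"], key=lambda d: d["index"])]
```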
**Authentication Errors:**
- `401 Unauthorized`: missing API key
- `403 Forbidden`: invalid API key
- `429 Too Many Requests`: rate limit exceeded (60 req/min default)
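A client that may exceed the 60 req/min default should back off and retry on `429`. A generic sketch (the `send` callable stands in for any HTTP call that returns a status code; nothing here is part of the server):

```python
import time

def with_backoff(send, max_retries=3, base_delay=0.01):
    """Call send() until it returns a non-429 status, doubling the delay each retry."""
    delay = base_delay
    for _ in range(max_retries):
        status = send()
        if status != 429:
            return status
        time.sleep(delay)
        delay *= 2
    return send()  # final attempt; the caller handles a persistent 429
```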
### Health Endpoints (port 9090)
| Endpoint | Purpose | 200 when | 503 when |
|----------|---------|----------|----------|
| `/healthz` | Liveness probe | Ollama API responds | API unreachable |
| `/readyz` | Readiness probe | API up + model available | Any check fails |
| `/status` | Info/debug | API responds | API unreachable |
```bash
# Liveness check
curl -s http://localhost:9090/healthz | jq .
# Readiness check (used by Docker healthcheck)
curl -s http://localhost:9090/readyz | jq .
# Detailed status with model list
curl -s http://localhost:9090/status | jq .
```
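The `/readyz` probe is what the Docker healthcheck consumes. Wired into Compose it might look like the sketch below; the service name comes from this repo, but the interval/timeout values are illustrative and may differ from the actual compose file:

```yaml
services:
  jimbomesh-still:
    healthcheck:
      test: ["CMD", "curl", "-sf", "http://localhost:9090/readyz"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 60s   # allow time for the first model pull
```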
## Admin UI
A built-in web admin panel is available at `/admin` on the API gateway port (default 1920). No additional ports or processes required.
**Quick connect:** use the auto-login URL printed by the installer:
```
http://localhost:1920/admin#key=YOUR_API_KEY
```
The `#key=` fragment logs you in automatically and is stripped from the URL bar
(hash fragments never leave the browser). Bookmark it for quick access, or
navigate to `http://localhost:1920/admin` and enter the key manually.
**Features:**
| Tab | Description |
|-----|-------------|
| Dashboard | Server health, Ollama latency, model count, uptime (auto-refresh 10s) |
| Models | List, pull (with streaming progress), delete, view details |
| Mesh | Connect/disconnect/cancel Mesh sessions, set coordinator URL and Holler name, view live Mesh connection log and mode |
| Playground | Test embeddings, chat (streaming), generate (streaming) |
| Configuration | Editable runtime settings, API key management (
... (truncated)