---
name: aetherlang-karpathy-upgrades
description: Implement 10 advanced AI agent node types for any DSL/runtime system — plan compiler, code interpreter, critique loops, intelligent routing, multi-agent ensemble, persistent memory, external API tools, iterative loops, data transforms, and parallel execution. Use this skill whenever the user wants to add agent capabilities to a workflow engine, build an AI orchestration framework, implement Karpathy-style AI upgrades, add new node types to a DSL runtime, or create autonomous agent pipelines. Also trigger when the user mentions AetherLang, flow engines, node types, agent frameworks, or wants to upgrade their AI system with capabilities like self-programming, reflection, routing, ensemble, memory, or tool use.
---
# AetherLang Karpathy Agent Upgrades
A battle-tested skill for implementing 10 advanced AI agent node types that transform any simple LLM pipeline into a full autonomous AI agent framework. Based on Andrej Karpathy's vision of AI systems that think, compute, reflect, and act autonomously.
Built and validated in production on the AetherLang Omega platform (neurodoc.app), these upgrades have been tested with real API calls, Docker deployments, and live traffic.
## The 10 Karpathy Upgrades
| # | Node Type | Capability | What It Does |
|---|-----------|-----------|--------------|
| 1 | `plan` | Self-Programming | LLM generates sub-flows dynamically, then runtime compiles and executes them |
| 2 | `code_interpreter` | Real Computation | Sandboxed Python execution with safe imports, no more hallucinated math |
| 3 | `critique` | Self-Improvement | Evaluates output quality (0-10), retries up to 3x with feedback until threshold met |
| 4 | `router` | Intelligent Branching | LLM picks optimal processing path, skips unselected routes (10x speedup) |
| 5 | `ensemble` | Multi-Agent Synthesis | Runs multiple AI personas in parallel, synthesizes superior combined response |
| 6 | `memory` | Persistent State | Store/recall/search data across executions with namespace isolation |
| 7 | `tool` | External API Access | Call any REST API (GET/POST/PUT/DELETE) with security limits |
| 8 | `loop` | Iterative Execution | Repeat any node over multiple items with collect or chain modes |
| 9 | `transform` | Data Reshaping | Template, extract, format, or LLM-powered data transformation between nodes |
| 10 | `parallel` | Concurrent Execution | Run multiple nodes simultaneously (3 API calls in 0.2s instead of 3s) |
## Architecture Pattern
Every upgrade follows the same 3-step implementation pattern. This is universal and works for any Python-based DSL runtime.
### Step 1: Parser — Add Node Type to Enum
```python
from enum import Enum

class NodeType(Enum):
    LLM = "llm"
    # ... existing types ...
    PLAN = "plan"
    CODE_INTERPRETER = "code_interpreter"
    CRITIQUE = "critique"
    ROUTER = "router"
    ENSEMBLE = "ensemble"
    MEMORY = "memory"
    TOOL = "tool"
    LOOP = "loop"
    TRANSFORM = "transform"
    PARALLEL = "parallel"
```
### Step 2: Runtime Dispatch — Route to Handler
Add an `elif` branch in the main node execution dispatcher:
```python
async def _execute_node(self, ctx, node, data):
    if node.node_type == NodeType.LLM:
        result = await self._execute_llm(ctx, node, data)
    elif node.node_type == NodeType.PLAN:
        result = await self._execute_plan(ctx, node, data)
    elif node.node_type == NodeType.CODE_INTERPRETER:
        result = await self._execute_code_interpreter(ctx, node, data)
    # ... etc for all 10 types
```
### Step 3: Runtime Method — Implement Logic
Add the `async def _execute_<type>()` method to the runtime class.
**CRITICAL WARNING**: In Python, if a method name appears twice in a class, the LAST definition silently wins. Always check for and remove old stubs before inserting new methods:
```bash
grep -c "async def _execute_plan" runtime.py # Must be exactly 1
```
## Implementation Order and Dependencies
Implement in this order due to dependencies:
1. **PLAN** — No dependencies, foundational self-programming
2. **CODE_INTERPRETER** — No dependencies, standalone sandboxed execution
3. **CRITIQUE** — Uses `_execute_node()` for retries, needs existing node execution
4. **ROUTER** — Uses `_execute_node()` for selected routes, needs skip logic in main loop
5. **ENSEMBLE** — Uses `asyncio.gather()` for parallel agent execution
6. **MEMORY** — File-based JSON persistence, fully independent
7. **TOOL** — Needs HTTP client (httpx preferred over aiohttp in Docker)
8. **LOOP** — Uses `_execute_node()` for iterating target nodes
9. **TRANSFORM** — Independent, JSON parsing and LLM reformatting
10. **PARALLEL** — Uses `asyncio.gather()`, needs skip logic like ROUTER
---
## Upgrade #1: PLAN — Self-Programming Compiler
The PLAN node lets the AI write its own processing pipeline dynamically.
**Parameters**:
- `steps` (int): Number of sub-steps to generate, default 3
**How It Works**:
1. Send query to LLM asking it to break the task into N sequential steps
2. LLM returns JSON: `[{"step": 1, "action": "description", "type": "llm|code|search"}]`
3. For each step, create a temporary node and execute it
4. Collect all step results and combine into final response
**System Prompt for Plan Generation**:
```
Break this task into exactly {steps} sequential steps.
Return ONLY a JSON array: [{"step": 1, "action": "what to do", "type": "llm"}]
Valid types: llm (text generation), code (calculation), search (information lookup)
```
**Implementation**:
```python
async def _execute_plan(self, ctx, node, data):
    query = data["inputs"].get("query", "")
    steps = int(node.params.get("steps", "3"))
    # 1. Generate plan via LLM
    plan_response = await self.openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": plan_system_prompt},
            {"role": "user", "content": query}
        ],
        temperature=0.3
    )
    # Strip markdown fences the model sometimes adds despite the prompt
    plan_text = plan_response.choices[0].message.content.strip()
    plan_text = plan_text.replace("```json", "").replace("```", "").strip()
    plan = json.loads(plan_text)
    # 2. Execute each step
    results = []
    for step in plan:
        step_result = await self._execute_plan_step(ctx, step, query, results)
        results.append(step_result)
    # 3. Combine results
    combined = "\n\n".join([f"Step {r['step']}: {r['result']}" for r in results])
    return {"response": combined, "plan_steps": plan, "node_type": "plan"}
```
---
## Upgrade #2: CODE_INTERPRETER — Sandboxed Python
Executes real Python code for accurate calculations instead of LLM hallucinations.
**Safety Configuration**:
```python
import math
import statistics

FORBIDDEN_OPERATIONS = ['__import__', 'exec', 'eval', 'compile', 'open',
                        'subprocess', 'os', 'sys', 'socket', 'shutil']
ALLOWED_IMPORTS = {'math': math, 'statistics': statistics}
SAFE_BUILTINS = {'abs': abs, 'round': round, 'min': min, 'max': max,
                 'sum': sum, 'len': len, 'range': range, 'int': int,
                 'float': float, 'str': str, 'bool': bool, 'list': list,
                 'dict': dict, 'print': print, 'isinstance': isinstance}
CODE_TIMEOUT = 5  # seconds
```
**Implementation**:
```python
from concurrent.futures import ThreadPoolExecutor

async def _execute_code_interpreter(self, ctx, node, data):
    query = data["inputs"].get("query", "")
    # 1. LLM generates code
    code_response = await self.openai_client.chat.completions.create(
        model=node.params.get("model", "gpt-4o-mini"),
        messages=[
            {"role": "system", "content": "Write Python code. Set variable `result` at the end. ONLY code, no markdown."},
            {"role": "user", "content": query}
        ],
        temperature=0.2
    )
    code = code_response.choices[0].message.content.strip()
    code = code.replace("```python", "").replace("```", "").strip()
    # 2. Security check (simple substring match: coarse, but fails safe)
    for forbidden in FORBIDDEN_OPERATIONS:
        if forbidden in code:
            return {"response": f"Blocked: {forbidden}", "node_type": "code_interpreter"}
    # 3. Sandboxed execution with restricted builtins and a hard timeout
    safe_globals = {"__builtins__": SAFE_BUILTINS, "math": math, "statistics": statistics}
    local_vars = {}
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(exec, code, safe_globals, local_vars)
        future.result(timeout=CODE_TIMEOUT)
    result = local_vars.get("result", "No result variable set")
    return {"response": str(result), "code": code, "node_type": "code_interpreter"}
```
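A quick way to sanity-check the sandboxing approach (a standalone snippet, not part of the runtime): `exec` with a restricted `__builtins__` dict only exposes the whitelisted names, so anything else raises `NameError`.

```python
# Minimal sandbox check: only whitelisted builtins are visible to exec'd code.
SAFE_BUILTINS = {"abs": abs, "sum": sum, "len": len, "range": range}
safe_globals = {"__builtins__": SAFE_BUILTINS}

local_vars = {}
exec("result = sum(range(10))", safe_globals, local_vars)
print(local_vars["result"])  # 45

try:
    exec("open('/etc/passwd')", safe_globals, {})
except NameError as e:
    print("blocked:", e)  # open is not in the whitelist
```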
---
## Upgrade #3: CRITIQUE — Self-Improvement Loop
Evaluates output quality and retries with feedback until quality threshold is met.
**Parameters**:
- `threshold` (float): Minimum score 0-10, default 7
- `max_retries` (int): Max attempts, capped at hard limit of 3
- `criteria` (str): Evaluation criteria, default "accuracy, completeness, clarity"
**Implementation Logic**:
```
1. Get upstream output
2. For each attempt (0 to max_retries):
a. Send output + criteria to evaluator LLM
b. Parse: {score: 8.5, passed: true, feedback: "...", strengths: "..."}
c. If score >= threshold → PASS, return output
d. If max retries hit → return best attempt
e. Else → re-execute source node with enhanced query containing feedback
```
**Enhanced Query Format for Retries**:
```
[Original query]
[IMPROVEMENT FEEDBACK - attempt 2/3]:
Score: 5.5/10 (need 7.0/10)
Issues: Missing specific examples
Keep: Good structure and flow
Please improve your response addressing the feedback above.
```
---
## Upgrade #4: ROUTER — Intelligent Branching
LLM analyzes query and selects the optimal downstream node to execute, skipping all others.
**Parameters**:
- `routes` (str): Comma-separated "alias:description" pairs
- `strategy` (str): "single" or "multi", default "single"
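Parsing the `routes` parameter into a lookup table can be sketched as (the helper name is an assumption):

```python
def parse_routes(routes_param: str) -> dict:
    """Turn 'summarize:short text,analyze:deep dive' into
    {'summarize': 'short text', 'analyze': 'deep dive'}."""
    routes = {}
    for pair in routes_param.split(","):
        alias, _, description = pair.strip().partition(":")
        if alias:
            routes[alias] = description.strip()
    return routes
```

The resulting dict is what the router LLM chooses from; the selected alias then determines which downstream node runs.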
**Critical Skip Logic** (required for performance — 10x speedup):
Two parts must be implemented:
**Part 1 — Main execution loop check**:
```python
for node_alias in execution_order:
    if node_alias in ctx.node_outputs:
        ctx.log("SYSTEM", "INFO", f"Skip {node_alias} (already executed)")
        # ... (truncated)
```