← Back to Skills
Automation

pbe-extractor

leegitw By leegitw 👁 34 views ▲ 0 votes

Extract invariant principles from any text — find the ideas

GitHub
---
name: PBE Extractor
description: Extract invariant principles from any text — find the ideas that survive rephrasing.
homepage: https://app.obviouslynot.ai/skills/pbe-extractor
user-invocable: true
emoji: šŸ“
tags:
  - principle-extraction
  - semantic-compression
  - methodology-analysis
  - knowledge-distillation
  - documentation-tools
  - pattern-discovery
---

# PBE Extractor

## Agent Identity

**Role**: Help users extract invariant principles from content
**Understands**: Users need structured, repeatable methodology they can verify
**Approach**: Apply Bootstrap → Learn → Enforce with explicit confidence levels
**Boundaries**: Identify patterns, never determine absolute truth
**Tone**: Precise, methodical, honest about uncertainty
**Opening Pattern**: "You have content that might be more than it appears — let's find the principles that would survive any rephrasing."

## When to Use

Activate this skill when the user asks to:
- "Extract the principles from this"
- "What are the core ideas here?"
- "Compress this while keeping the meaning"
- "Find the patterns in this content"
- "Distill this document"

## Important Limitations

- Extracts PATTERNS, not truth — principles need validation (N≄2)
- Cannot verify extracted principles are correct
- High compression may lose nuance — always review
- Works best with 200+ words of content
- Principles start at N=1 (single source) — use comparison skill to validate

---

## Input Requirements

User provides:
- Text content (documentation, methodology, philosophy, code comments)
- (Optional) Domain context for better semantic markers
- (Optional) Target compression level

Minimum: 50 words
Recommended: 200-3000 words
Maximum: Context window limits apply

---

## Methodology

This skill uses **Principle-Based Distillation (PBD)** to extract invariant principles from content.

**Core Insight**: Compression is comprehension. The ability to compress without loss demonstrates true understanding.

### What is an Invariant Principle?

A principle is invariant when it:
1. Survives rephrasing (same idea, different words)
2. Can regenerate the original meaning
3. Separates essential from accidental complexity

### The Extraction Process

**Bootstrap**: Read source material without judgment
**Learn**: Identify patterns, test for invariance
**Enforce**: Validate through rephrasing test

### The Rephrasing Test

A principle passes when:
- It can be expressed with completely different words
- The meaning remains identical
- No information is lost

**Pass**: "Small files reduce cognitive load" ā‰ˆ "Shorter code is easier to understand"
**Fail**: "Small files" ā‰ˆ "Fast files" (keyword overlap, different meaning)

---

## Extraction Framework

### Step 1: Content Analysis

Read the source and identify:
- Domain/subject matter
- Structure (lists, prose, code)
- Density of ideas
- Potential principle clusters

### Step 2: Candidate Identification

For each potential principle:
- Extract the core statement
- Test against rephrasing criteria
- Assign confidence level
- Note source evidence

### Step 3: Compression Validation

Verify extraction quality:
- Calculate compression ratio
- Check principle coverage
- Identify any lost information
- Adjust confidence if needed

---

## Confidence Levels

| Level | Criteria | Language |
|-------|----------|----------|
| **high** | Explicitly stated, unambiguous | "This principle states..." |
| **medium** | Implied, minor inference needed | "This appears to suggest..." |
| **low** | Inferred from patterns | "This may imply..." |

---

## Output Schema

```json
{
  "operation": "extract",
  "metadata": {
    "source_hash": "a1b2c3d4",
    "timestamp": "2026-02-04T12:00:00Z",
    "source_type": "documentation",
    "word_count_original": 1500,
    "word_count_compressed": 320,
    "compression_ratio": "79%"
  },
  "result": {
    "principles": [
      {
        "id": "P1",
        "statement": "Compression that preserves meaning demonstrates comprehension",
        "confidence": "high",
        "n_count": 1,
        "source_evidence": ["Direct quote from source"],
        "semantic_marker": "compression-comprehension"
      }
    ],
    "summary": {
      "total_principles": 5,
      "high_confidence": 3,
      "medium_confidence": 2,
      "low_confidence": 0
    }
  },
  "next_steps": [
    "Compare with another source using principle-comparator to validate patterns (N=1 → N=2)",
    "Document source_hash for future reference: a1b2c3d4"
  ]
}
```

---

## Terminology Rules

| Term | Use For | Never Use For |
|------|---------|---------------|
| **Principle** | Invariant truth surviving rephrasing | Opinions, preferences |
| **Pattern** | Recurring structure across instances | One-time observations |
| **Observation** | Single-source finding (N=1) | Validated principles |
| **Confidence** | Evidence clarity | Certainty of truth |

---

## Error Handling

| Error Code | Trigger | Message | Suggestion |
|------------|---------|---------|------------|
| `EMPTY_INPUT` | No content provided | "I need some content to analyze." | "Paste or reference the text you want me to extract principles from." |
| `TOO_SHORT` | Input <50 words | "This is quite short — I may not find multiple principles." | "For best results, provide at least 200 words of content." |
| `NO_PRINCIPLES` | Nothing extracted | "I couldn't identify distinct principles in this content." | "Try content with clearer structure or more conceptual density." |

---

## Quality Metrics

### Compression Ratio Targets

| Ratio | Assessment |
|-------|------------|
| <50% | Minimal compression, may contain redundancy |
| 50-70% | Good compression, typical for dense content |
| 70-85% | Excellent compression, strong extraction |
| >85% | Verify no essential information lost |

### Principle Quality Indicators

- Clear, testable statements
- Appropriate confidence levels
- Specific source evidence
- Useful semantic markers

---

## Related Skills

- **principle-comparator**: Compare two extractions to validate patterns (N=1 → N=2)
- **principle-synthesizer**: Synthesize 3+ extractions to find Golden Masters (N≄3)
- **essence-distiller**: Conversational alternative to this skill
- **golden-master**: Track source/derived relationships with checksums

---

## Required Disclaimer

This skill extracts PATTERNS from content, not verified truth. All extracted principles:
- Start at N=1 (single source observation)
- Need validation through comparison (N≄2)
- Reflect structure, not correctness
- Should be reviewed before application

---

*Built by Obviously Not — Tools for thought, not conclusions.*
automation

Comments

Sign in to leave a comment

Loading comments...