← Back to Skills
Productivity

chain-of-density

killerapp By killerapp 👁 22 views ▲ 0 votes

Iteratively densify text summaries using Chain-of-Density technique.

GitHub
---
name: chain-of-density
description: "Iteratively densify text summaries using Chain-of-Density technique. Use when compressing verbose documentation, condensing requirements, or creating executive summaries while preserving information density."
license: Apache-2.0
compatibility: "Python 3.10+ (for text_metrics.py script via uv run)"
metadata:
  author: agentic-insights
  version: "1.2"
  paper: "From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting"
  arxiv: "https://arxiv.org/abs/2309.04269"
---

# Chain-of-Density Summarization

Compress text through iterative entity injection following the CoD paper methodology. Each pass identifies missing entities from the source and incorporates them while maintaining identical length.

## The Method

Chain-of-Density works through multiple iterations:

1. **Iteration 1**: Create sparse, verbose base summary (4-5 sentences at `target_words`)
2. **Subsequent iterations**: Each iteration:
   - Identify 1-3 missing entities from SOURCE (not summary)
   - Rewrite summary to include them
   - Maintain IDENTICAL word count through compression

**Key principle**: Never drop entities - only add and compress.

## Missing Entity Criteria

Each entity added must meet ALL 5 criteria:

| Criterion | Description |
|-----------|-------------|
| **Relevant** | To the main story/topic |
| **Specific** | Descriptive yet concise (≤5 words) |
| **Novel** | Not in the previous summary |
| **Faithful** | Present in the source (no hallucination) |
| **Anywhere** | Can be from anywhere in the source |

## Quick Start

1. User provides text to summarize
2. Orchestrate 5 iterations via `cod-iteration` agent
3. Each iteration reports entities added via `Missing_Entities:` line
4. Return final summary + entity accumulation history

## Orchestration Pattern

```
Iteration 1: Sparse base (target_words, verbose filler)
     ↓ Missing_Entities: (none - establishing base)
Iteration 2: +3 entities, compress filler
     ↓ Missing_Entities: "entity1"; "entity2"; "entity3"
Iteration 3: +3 entities, compress more
     ↓ Missing_Entities: "entity4"; "entity5"; "entity6"
Iteration 4: +2 entities, tighten
     ↓ Missing_Entities: "entity7"; "entity8"
Iteration 5: +1-2 entities, final density
     ↓ Missing_Entities: "entity9"
Final dense summary (same word count, 9+ entities)
```

## How to Orchestrate

**Iteration 1** - Pass source text only:

```
Task(subagent_type="cod-iteration", prompt="""
iteration: 1
target_words: 80
text: [SOURCE TEXT HERE]
""")
```

**Iterations 2-5** - Pass BOTH previous summary AND source:

```
Task(subagent_type="cod-iteration", prompt="""
iteration: 2
target_words: 80
text: [PREVIOUS SUMMARY HERE]
source: [ORIGINAL SOURCE TEXT HERE]
""")
```

**Critical**:
- Invoke serially, not parallel
- Pass SOURCE text in every iteration for entity discovery
- Parse `Missing_Entities:` line to track entity accumulation

## Expected Agent Output Format

The `cod-iteration` agent returns:

```
Missing_Entities: "entity1"; "entity2"; "entity3"

Denser_Summary:
[The densified summary - identical word count to previous]
```

Parse both parts - track entities for history, pass summary to next iteration.

## Measuring Density

Use `scripts/text_metrics.py` for deterministic word counts:

```bash
echo "your summary text" | uv run scripts/text_metrics.py words
# Returns: word count

uv run scripts/text_metrics.py metrics "your summary text"
# Returns: {"words": N, "chars": N, "bytes": N}
```

## Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| iterations | 5 | Number of density passes (paper uses 5) |
| target_words | 80 | Word count maintained across ALL iterations |
| return_history | false | Include intermediate summaries + entities |

Note: `target_words` can be adjusted based on source length and desired output density.

## Output Format

### Minimal (default)
```
[Final dense summary text]
```

### With History (return_history=true)
```yaml
final_summary: |
  [Dense summary at target_words with accumulated entities]
iterations:
  - turn: 1
    missing_entities: "(none - establishing base)"
    words: 80
    summary: |
      [Sparse iteration 1]
  - turn: 2
    missing_entities: "entity1; entity2; entity3"
    words: 80
    summary: |
      [Denser iteration 2]
  # ... etc
total_entities: 9
```

## When to Use

- Verbose documentation exceeding 500 words
- Requirements documents needing condensation
- Creating executive summaries from detailed reports
- Compressing skills that exceed recommended length

## When NOT to Use

- Legal/compliance text (precision required)
- Tutorial content (beginners need explanation)
- Already concise content (<300 words)
- Specifications (don't compress specs)

## Example

**Source** (180 words, verbose skill excerpt):
```
The name field is a required field that must be present in every skill.
The name field identifies the skill and must follow a specific format.
For the name field, you should use lowercase letters and hyphens only.
The name field can be 1 to 64 characters long. The description field
is also required and tells agents when to use your skill...
```

**Iteration 1** (Sparse, 80 words):
```
Missing_Entities: (none - establishing base)

Denser_Summary:
This document discusses the requirements for skill configuration fields in agent systems. It covers various aspects of how fields should be formatted and what values they can contain. The document also mentions validation rules that apply to these fields and provides guidance on best practices for developers. Additionally, it includes information about both optional and required elements that need to be considered when creating new skills for the system.
```

**Iteration 3** (After 2 passes, same 80 words):
```
Missing_Entities: "1-64 characters"; "lowercase alphanumeric-hyphens"; "Use when phrase"

Denser_Summary:
Skills require `name` (1-64 chars, lowercase alphanumeric-hyphens) and `description` fields with validation rules. The name identifies skills; descriptions tell agents when to invoke using "Use when..." phrases. Both fields have format constraints and best practices. Optional metadata fields provide author, version, and compatibility information for cross-platform agent discovery.
```

**Final Iteration 5** (Same 80 words, maximum density):
```
Missing_Entities: "Claude Code"; "Cursor"; "GitHub Copilot"

Denser_Summary:
Required: `name` (1-64 chars, ^[a-z0-9]+(-[a-z0-9]+)*$) and `description` (1-1024 chars) with validation. Description includes "Use when..." + discovery keywords for auto-invocation. Optional: license (SPDX), compatibility, metadata (author, version, tags). Cross-platform: Claude Code, Cursor, GitHub Copilot. Name matches directory. Progressive disclosure via references/, assets/, scripts/ subdirectories.
```

## Architecture Note

This skill implements the [CoD paper](https://arxiv.org/abs/2309.04269) methodology:
- Skill = orchestrator (this file)
- Agent = stateless worker (`cod-iteration`)
- Script = deterministic utility (`text_metrics.py`)

Sub-agents cannot call other sub-agents. Only skills orchestrate via Task tool.

## References

- Paper: [From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting](https://arxiv.org/abs/2309.04269)
- Dataset: [HuggingFace griffin/chain_of_density](https://huggingface.co/datasets/griffin/chain_of_density)
productivity

Comments

Sign in to leave a comment

Loading comments...