image2prompt

Name: image2prompt
Rating: 3.5 (1 reviews)
Author: zhang-shubo

By zhang-shubo 👁 60 views ▲ 0 votes

Analyze images and generate detailed prompts for image

GitHub

---
name: image2prompt
description: Analyze images and generate detailed prompts for image generation. Supports portrait, landscape, product, animal, illustration categories with structured or natural output.
homepage: https://docs.openclaw.ai/tools/image2prompt
user-invocable: true
metadata: {"openclaw":{"emoji":"🖼️","primaryEnv":"OPENAI_API_KEY","requires":{"anyBins":["openclaw"]}}}
---

# Image to Prompt

Analyze images and generate detailed, reproduction-quality prompts for AI image generation.

## Workflow

**Step 1: Category Detection**
First, classify the image into one of these categories:
- `portrait` — People as main subject (photos, artwork, digital art)
- `landscape` — Natural scenery, cityscapes, architecture, outdoor environments
- `product` — Commercial product photos, merchandise
- `animal` — Animals as main subject
- `illustration` — Diagrams, infographics, UI mockups, technical drawings
- `other` — Images that don't fit above categories

**Step 2: Category-Specific Analysis**
Generate a detailed prompt based on the detected category.

## Usage

### Basic Analysis

```bash
# Analyze an image (auto-detect category)
openclaw message send --image /path/to/image.jpg "Analyze this image and generate a detailed prompt for reproduction"
```

### Specify Output Format

**Natural Language (default):**
```
Analyze this image and write a detailed, flowing prompt description (600-1000 words for portraits, 400-600 for others).
```

**Structured JSON:**
```
Analyze this image and output a structured JSON description with all visual elements categorized.
```

### With Dimensions Extraction

Request dimension highlights to get tagged phrases for each visual aspect:
```
Analyze this image with dimension extraction. Tag phrases for: backgrounds, objects, characters, styles, actions, colors, moods, lighting, compositions, themes.
```

## Category-Specific Elements

### Portrait Analysis Covers:
- **Model/Style**: Photography type, quality level, visual style
- **Subject**: Gender, age, ethnicity, skin tone, body type
- **Facial Features**: Eyes, lips, face shape, expression
- **Hair**: Color, length, style, part
- **Pose**: Body position, orientation, leg/hand positions, gaze
- **Clothing**: Type, color, pattern, fit, material, style
- **Accessories**: Jewelry, bags, hats, etc.
- **Environment**: Location, ground, background, atmosphere
- **Lighting**: Type, time of day, shadows, contrast, color temperature
- **Camera**: Angle, height, shot type, lens, depth of field, perspective
- **Technical**: Realism, post-processing, resolution

### Landscape Analysis Covers:
- Terrain and water features
- Sky and atmospheric elements
- Foreground/background composition
- Natural lighting and atmosphere
- Color palette and photography style

### Product Analysis Covers:
- Product features and materials
- Design elements and shape
- Staging and background
- Studio lighting setup
- Commercial photography style

### Animal Analysis Covers:
- Species identification and markings
- Pose and behavior
- Expression and character
- Habitat and setting
- Wildlife/pet photography style

### Illustration Analysis Covers:
- Diagram type (flowchart, infographic, UI, etc.)
- Visual elements (icons, shapes, connectors)
- Layout and hierarchy
- Design style (flat, isometric, etc.)
- Color scheme and meaning

## Output Examples

### Natural Language Output (Portrait)
```json
{
  "prompt": "A stunning photorealistic portrait of a young woman in her mid-20s with fair porcelain skin and warm pink undertones. She has striking emerald green almond-shaped eyes with long dark lashes, full rose-colored lips curved in a subtle confident smile, and an oval face with high cheekbones..."
}
```

### Structured Output (Portrait)
```json
{
  "structured": {
    "model": "photorealistic",
    "quality": "ultra high",
    "style": "cinematic natural light photography",
    "subject": {
      "identity": "young beautiful woman",
      "gender": "female",
      "age": "mid 20s",
      "ethnicity": "European",
      "skin_tone": "fair porcelain with pink undertones",
      "body_type": "slim athletic",
      "facial_features": {
        "eyes": "emerald green, almond-shaped, intense gaze",
        "lips": "full, rose pink, subtle smile",
        "face_shape": "oval with high cheekbones",
        "expression": "confident and serene"
      },
      "hair": {
        "color": "warm honey blonde",
        "length": "long",
        "style": "soft waves",
        "part": "center"
      }
    },
    "pose": {
      "position": "standing",
      "body_orientation": "three-quarter turn to camera",
      "legs": "weight on right leg, relaxed stance",
      "hands": {
        "right_hand": "resting on hip",
        "left_hand": "hanging naturally at side"
      },
      "gaze": "direct eye contact with camera"
    },
    "clothing": {
      "type": "flowing maxi dress",
      "color": "dusty rose",
      "pattern": "solid",
      "details": "V-neckline, cinched waist, silk material",
      "style": "romantic feminine"
    },
    "accessories": ["delicate gold necklace", "small hoop earrings"],
    "environment": {
      "location": "outdoor garden",
      "ground": "cobblestone path",
      "background": "blooming roses, soft bokeh",
      "atmosphere": "dreamy and romantic"
    },
    "lighting": {
      "type": "natural sunlight",
      "time": "golden hour",
      "shadow_quality": "soft diffused shadows",
      "contrast": "medium",
      "color_temperature": "warm"
    },
    "camera": {
      "angle": "slightly below eye level",
      "camera_height": "chest height",
      "shot_type": "medium shot",
      "lens": "85mm",
      "depth_of_field": "shallow",
      "perspective": "slight compression, flattering"
    },
    "mood": "romantic, confident, ethereal",
    "realism": "highly photorealistic",
    "post_processing": "soft color grading, subtle glow",
    "resolution": "8k"
  }
}
```

### With Dimensions
```json
{
  "prompt": "...",
  "dimensions": {
    "backgrounds": ["outdoor garden", "blooming roses", "soft bokeh"],
    "objects": ["delicate gold necklace", "small hoop earrings"],
    "characters": ["young beautiful woman", "mid 20s", "European"],
    "styles": ["photorealistic", "cinematic natural light photography"],
    "actions": ["standing", "three-quarter turn", "direct eye contact"],
    "colors": ["dusty rose", "honey blonde", "emerald green"],
    "moods": ["romantic", "confident", "ethereal", "dreamy"],
    "lighting": ["golden hour", "natural sunlight", "soft diffused shadows"],
    "compositions": ["medium shot", "85mm", "shallow depth of field"],
    "themes": ["romantic feminine", "portrait photography"]
  }
}
```

## Tips for Best Results

1. **High-resolution images** produce more detailed prompts
2. **Clear, well-lit images** yield better category detection
3. **Request structured output** when you need programmatic access to individual elements
4. **Use dimensions extraction** when building prompt databases or training data
5. **Specify word count expectations** for natural language output if needed

## Integration

This skill works with any vision-capable model. For best results, use:
- GPT-4 Vision
- Claude 3 (Opus/Sonnet)
- Gemini Pro Vision

automation