---
name: acestep-songwriting
description: Music songwriting guide for ACE-Step. Provides professional knowledge on writing captions, lyrics, choosing BPM/key/duration, and structuring songs. Use this skill when users want to create, write, or plan a song before generating it with ACE-Step.
allowed-tools: Read
---
# ACE-Step Songwriting Guide
Professional music creation knowledge for writing captions, lyrics, and choosing music parameters for ACE-Step.
## Output Format
After using this guide, produce three things for the acestep skill:
1. **Caption** (`-c`): Style/genre/instruments/emotion description
2. **Lyrics** (`-l`): Complete structured lyrics with tags
3. **Parameters**: `--duration`, `--bpm`, `--key`, `--time-signature`, `--language`
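As a rough illustration, the three outputs can be collected into a single argument list. The flag names are the ones listed above; the `SongPlan` class and the `to_args` helper are hypothetical, and no command name or value format is assumed beyond what this guide states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SongPlan:
    """Hypothetical container for the outputs of this guide."""
    caption: str
    lyrics: str
    duration: Optional[int] = None       # seconds
    bpm: Optional[int] = None
    key: Optional[str] = None
    time_signature: Optional[str] = None
    language: Optional[str] = None


def to_args(plan: SongPlan) -> List[str]:
    """Return flag/value pairs, skipping parameters that were left unset."""
    args = ["-c", plan.caption, "-l", plan.lyrics]
    optional = [
        ("--duration", plan.duration),
        ("--bpm", plan.bpm),
        ("--key", plan.key),
        ("--time-signature", plan.time_signature),
        ("--language", plan.language),
    ]
    for flag, value in optional:
        if value is not None:
            args += [flag, str(value)]
    return args


plan = SongPlan(
    caption="sad piano ballad, female breathy vocal, intimate",
    lyrics="[Verse 1]\nRain on the window, slow and low",
    bpm=72,
    duration=150,
)
print(to_args(plan))
```

Unset parameters are omitted on purpose, so anything not specified stays open for auto-inference.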
---
## Caption: The Most Important Input
**Caption is the most important factor affecting generated music.**
Captions can take several forms: simple style words, comma-separated tags, or longer natural-language descriptions.
### Common Dimensions
| Dimension | Examples |
|-----------|----------|
| **Style/Genre** | pop, rock, jazz, electronic, hip-hop, R&B, folk, classical, lo-fi, synthwave |
| **Emotion/Atmosphere** | melancholic, uplifting, energetic, dreamy, dark, nostalgic, euphoric, intimate |
| **Instruments** | acoustic guitar, piano, synth pads, 808 drums, strings, brass, electric bass |
| **Timbre Texture** | warm, bright, crisp, muddy, airy, punchy, lush, raw, polished |
| **Era Reference** | 80s synth-pop, 90s grunge, 2010s EDM, vintage soul, modern trap |
| **Production Style** | lo-fi, high-fidelity, live recording, studio-polished, bedroom pop |
| **Vocal Characteristics** | female vocal, male vocal, breathy, powerful, falsetto, raspy, choir |
| **Speed/Rhythm** | slow tempo, mid-tempo, fast-paced, groovy, driving, laid-back |
| **Structure Hints** | building intro, catchy chorus, dramatic bridge, fade-out ending |
### Caption Writing Principles
1. **Specific beats vague** — "sad piano ballad with female breathy vocal" > "a sad song"
2. **Combine multiple dimensions** — style+emotion+instruments+timbre anchors direction precisely
3. **Use references well** — "in the style of 80s synthwave" conveys complex aesthetic quickly
4. **Texture words are useful** — warm, crisp, airy, punchy influence mixing and timbre
5. **Don't pursue perfection** — Caption is a starting point, iterate based on results
6. **Granularity determines freedom** — Less detail = more model creativity; more detail = more control
7. **Avoid conflicting words** — "classical strings" + "hardcore metal" degrades output
   - **Fix (repetition reinforcement):** Repeat the elements you want to hear more of
   - **Fix (conflict to evolution):** Describe a progression instead of a clash, e.g. "Start with soft strings, middle becomes metal rock, end turns to hip-hop"
8. **Don't put BPM/key/tempo in Caption** — Use dedicated parameters instead
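A minimal sketch of principles 2 and 8: combine several dimensions into one caption, then check that no BPM or key value slipped in. The helper names and the regular expressions are illustrative assumptions, and the patterns are heuristics that will miss some phrasings.

```python
import re
from typing import List

# Values that belong in dedicated parameters rather than in the caption.
# Heuristic patterns only, not an exhaustive list.
METADATA_PATTERNS = [
    re.compile(r"\b\d{2,3}\s*bpm\b", re.IGNORECASE),           # e.g. "120 bpm"
    re.compile(r"\b[A-G][#b]?\s+(?:[Mm]ajor|[Mm]inor)\b"),     # e.g. "C Major"
]


def build_caption(*parts: str) -> str:
    """Join non-empty dimension phrases into a comma-separated caption."""
    return ", ".join(p.strip() for p in parts if p and p.strip())


def find_metadata_leaks(caption: str) -> List[str]:
    """Return substrings that look like BPM or key values."""
    hits: List[str] = []
    for pattern in METADATA_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(caption))
    return hits


caption = build_caption(
    "80s synth-pop",            # style / era reference
    "nostalgic",                # emotion
    "synth pads, 808 drums",    # instruments
    "warm, punchy",             # timbre texture
    "female vocal",             # vocal characteristics
)
print(caption)
print(find_metadata_leaks(caption))
print(find_metadata_leaks("dreamy pop, 120 bpm, C Major"))
```

Qualitative tempo words such as "slow tempo" are deliberately not flagged, since they appear in the dimensions table as valid caption content.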
---
## Lyrics: The Temporal Script
Lyrics control how the music unfolds over time. They carry:
- Lyric text itself
- **Structure tags** ([Verse], [Chorus], [Bridge]...)
- **Vocal style hints** ([raspy vocal], [whispered]...)
- **Instrumental sections** ([guitar solo], [drum break]...)
- **Energy changes** ([building energy], [explosive drop]...)
### Structure Tags
| Category | Tag | Description |
|----------|-----|-------------|
| **Basic Structure** | `[Intro]` | Opening, establish atmosphere |
| | `[Verse]` / `[Verse 1]` | Verse, narrative progression |
| | `[Pre-Chorus]` | Pre-chorus, build energy |
| | `[Chorus]` | Chorus, emotional climax |
| | `[Bridge]` | Bridge, transition or elevation |
| | `[Outro]` | Ending, conclusion |
| **Dynamic Sections** | `[Build]` | Energy gradually rising |
| | `[Drop]` | Electronic music energy release |
| | `[Breakdown]` | Reduced instrumentation, space |
| **Instrumental** | `[Instrumental]` | Pure instrumental, no vocals |
| | `[Guitar Solo]` | Guitar solo |
| | `[Piano Interlude]` | Piano interlude |
| **Special** | `[Fade Out]` | Fade out ending |
| | `[Silence]` | Silence |
### Combining Tags
Use `-` for finer control, but keep it concise:
```
✅ [Chorus - anthemic]
❌ [Chorus - anthemic - stacked harmonies - high energy - powerful - epic]
```
Put complex style descriptions in Caption, not in tags.
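One way to catch overloaded tags before generation is a small check like the following. It is only a sketch: the limit of one modifier per tag is an assumption drawn from the example above, not a documented rule, and `overloaded_tags` is a hypothetical helper.

```python
import re
from typing import List

TAG_RE = re.compile(r"\[([^\[\]]+)\]")
MAX_MODIFIERS = 1  # assumed limit, based on the "[Chorus - anthemic]" example


def overloaded_tags(lyrics: str, max_modifiers: int = MAX_MODIFIERS) -> List[str]:
    """Return tags that carry more than `max_modifiers` ' - ' separated modifiers."""
    flagged = []
    for match in TAG_RE.finditer(lyrics):
        # Split on " - " (with spaces) so hyphenated names like Pre-Chorus survive.
        parts = [p.strip() for p in match.group(1).split(" - ")]
        if len(parts) - 1 > max_modifiers:
            flagged.append(match.group(0))
    return flagged


lyrics = (
    "[Pre-Chorus - building]\n"
    "Hold on, the night is almost over\n"
    "\n"
    "[Chorus - anthemic - stacked harmonies - high energy]\n"
    "We rise together (together)\n"
)
print(overloaded_tags(lyrics))
```

Any modifiers trimmed from a flagged tag can be moved into the caption.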
### Caption-Lyrics Consistency
**Models are not good at resolving conflicts.** Checklist:
- Instruments in Caption ↔ Instrumental section tags in Lyrics
- Emotion in Caption ↔ Energy tags in Lyrics
- Vocal description in Caption ↔ Vocal control tags in Lyrics
### Vocal Control Tags
| Tag | Effect |
|-----|--------|
| `[raspy vocal]` | Raspy, textured vocals |
| `[whispered]` | Whispered |
| `[falsetto]` | Falsetto |
| `[powerful belting]` | Powerful, high-pitched singing |
| `[spoken word]` | Rap/recitation |
| `[harmonies]` | Layered harmonies |
| `[call and response]` | Call and response |
| `[ad-lib]` | Improvised embellishments |
### Energy and Emotion Tags
| Tag | Effect |
|-----|--------|
| `[high energy]` | High energy, passionate |
| `[low energy]` | Low energy, restrained |
| `[building energy]` | Increasing energy |
| `[explosive]` | Explosive energy |
| `[melancholic]` | Melancholic |
| `[euphoric]` | Euphoric |
| `[dreamy]` | Dreamy |
| `[aggressive]` | Aggressive |
### Lyric Writing Tips
1. **6-10 syllables per line** — Model aligns syllables to beats; keep similar counts for lines in same position (±1-2)
2. **Uppercase = stronger intensity** — `WE ARE THE CHAMPIONS!` (shouting) vs `walking through the streets` (normal)
3. **Parentheses = background vocals** — `We rise together (together)`
4. **Extend vowels** — `Feeeling so aliiive` (use cautiously, effects unstable)
5. **Clear section separation** — Blank lines between sections
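Tip 1 can be checked roughly in code. The sketch below estimates syllables by counting vowel groups, which is a crude English-only heuristic and an assumption of this example, not how the model counts. Both function names are hypothetical.

```python
import re
from typing import List, Tuple

VOWEL_GROUP = re.compile(r"[aeiouy]+")
WORD = re.compile(r"[a-z']+")


def estimate_syllables(line: str) -> int:
    """Rough English estimate: one syllable per vowel group in each word."""
    total = 0
    for word in WORD.findall(line.lower()):
        count = len(VOWEL_GROUP.findall(word))
        # Treat a trailing silent "e" as non-syllabic ("alive" -> 2).
        if word.endswith("e") and not word.endswith("le") and count > 1:
            count -= 1
        total += max(count, 1)
    return total


def lines_outside_range(lyrics: str, low: int = 6, high: int = 10) -> List[Tuple[str, int]]:
    """Return (line, estimate) for lyric lines outside the low..high range."""
    flagged = []
    for line in lyrics.splitlines():
        text = line.strip()
        if not text or text.startswith("["):   # skip blank lines and tag lines
            continue
        n = estimate_syllables(text)
        if not low <= n <= high:
            flagged.append((text, n))
    return flagged


lyrics = "[Verse 1]\nRain on the window, slow and low\nGone\n"
print(lines_outside_range(lyrics))
```

Background vocals in parentheses are counted along with the main line here, and a flagged line is only a prompt to sing it through, not proof that it is wrong.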
### Avoiding "AI-flavored" Lyrics
| Red Flag | Description |
|----------|-------------|
| **Adjective stacking** | "neon skies, electric hearts, endless dreams" — vague imagery filler |
| **Rhyme chaos** | Inconsistent patterns or forced rhymes breaking meaning |
| **Blurred boundaries** | Lyric content crosses structure tags |
| **No breathing room** | Lines too long to sing in one breath |
| **Mixed metaphors** | Water → fire → flying — listeners can't anchor |
**Metaphor discipline**: One core metaphor per song, explore its multiple aspects.
---
## Music Metadata
**Most of the time, let the LM auto-infer these values.** Only set them manually when you have clear requirements.
| Parameter | Range | Description |
|-----------|-------|-------------|
| `bpm` | 30–300 | Slow 60–80, mid 90–120, fast 130–180 |
| `keyscale` | Key | e.g. `C Major`, `Am`. Common keys (C, G, D, Am, Em) most stable |
| `timesignature` | Time sig | `4/4` (most common), `3/4` (waltz), `6/8` (compound meter with a lilting, swing-like feel) |
| `vocal_language` | Language | Usually auto-detected from lyrics |
| `duration` | Seconds | See duration calculation below |
### When to Set Manually
| Scenario | Set |
|----------|-----|
| Daily generation | Let LM auto-infer |
| Clear tempo requirement | `bpm` |
| Specific style (waltz) | `timesignature=3/4` |
| Match other material | `bpm` + `duration` |
| Specific key color | `keyscale` |
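The tables above can be sketched as a small validation step. The parameter names and numeric ranges are taken from the metadata table; how the values are actually passed to ACE-Step is not specified here, so the dictionary returned by the hypothetical `metadata_overrides` helper is only an assumed shape.

```python
from typing import Dict, Optional

BPM_MIN, BPM_MAX = 30, 300
TEMPO_BANDS = {"slow": (60, 80), "mid": (90, 120), "fast": (130, 180)}
ALLOWED = {"bpm", "keyscale", "timesignature", "vocal_language", "duration"}


def tempo_band(bpm: int) -> Optional[str]:
    """Return the band name from the table, or None if bpm falls between bands."""
    if not BPM_MIN <= bpm <= BPM_MAX:
        raise ValueError(f"bpm must be within {BPM_MIN}-{BPM_MAX}, got {bpm}")
    for name, (low, high) in TEMPO_BANDS.items():
        if low <= bpm <= high:
            return name
    return None


def metadata_overrides(**fields: object) -> Dict[str, object]:
    """Keep only explicitly set fields; everything else is left to auto-inference."""
    unknown = set(fields) - ALLOWED
    if unknown:
        raise KeyError(f"unknown metadata fields: {sorted(unknown)}")
    if fields.get("bpm") is not None:
        tempo_band(int(fields["bpm"]))  # raises ValueError if out of range
    return {k: v for k, v in fields.items() if v is not None}


# Waltz scenario: set only the time signature and a tempo.
print(metadata_overrides(bpm=96, timesignature="3/4", keyscale=None))
print(tempo_band(96))
```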
---
## Duration Calculation
### Estimation Method
- **Intro/Outro**: 5-10 seconds each
- **Instrumental sections**: 5-15 seconds each
- **Typical structures**:
  - 2 verses + 2 choruses: 120-150s minimum
  - 2 verses + 2 choruses + bridge: 180-240s minimum
  - Full song with intro/outro: 210-270s (3.5-4.5 min)
### BPM and Duration Relationship
- **Slower BPM (60-80)**: Need MORE duration for same lyrics
- **Medium BPM (100-130)**: Standard duration
- **Faster BPM (150-180)**: Can fit more lyrics, but still need breathing room
**Rule of thumb**: When in doubt, estimate longer. A song too short feels rushed.
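The BPM relationship above can be made concrete with a back-of-the-envelope estimate. This sketch assumes each lyric line spans two bars of 4/4, which is an assumption of the example rather than something the guide or the model guarantees. The intro, outro, and instrumental values are passed in by hand using the ranges listed earlier.

```python
def estimate_duration(
    lyric_lines: int,
    bpm: float,
    bars_per_line: int = 2,      # assumed: one lyric line spans two bars
    beats_per_bar: int = 4,      # 4/4 time
    intro_s: float = 0.0,
    outro_s: float = 0.0,
    instrumental_s: float = 0.0,
) -> float:
    """Estimate total duration in seconds for a given line count and tempo."""
    seconds_per_line = bars_per_line * beats_per_bar * 60.0 / bpm
    return lyric_lines * seconds_per_line + intro_s + outro_s + instrumental_s


# 2 verses + 2 choruses at 8 lines each = 32 lyric lines.
for bpm in (70, 120, 160):
    total = estimate_duration(32, bpm, intro_s=8, outro_s=8)
    print(f"{bpm} bpm: about {round(total)} s")
```

Under these assumptions, 32 lines at 120 bpm take 128 s before intro and outro, which lands inside the 120-150s range listed for two verses and two choruses. The same lyrics at 70 bpm need well over 200 s. Rounding the result up follows the rule of thumb.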
---
Note: Keep lyric tags consistent with the caption. For example, lyric tags such as piano, powerful, and whispered match a caption such as "piano ballad, building to powerful chorus, intimate".