Songsee — Audio spectrograms/features (mel, chroma, MFCC) via CLI
Songsee
Audio spectrograms/features (mel, chroma, MFCC) via CLI.
Skill metadata
| Field | Value |
|---|---|
| Source | Bundled (installed by default) |
| Path | skills/media/songsee |
| Version | 1.0.0 |
| Author | community |
| License | MIT |
| Tags | Audio, Visualization, Spectrogram, Music, Analysis |
Reference: full SKILL.md
The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
songsee
Generate spectrograms and multi-panel audio feature visualizations from audio files.
Prerequisites
Requires Go:

    go install github.com/steipete/songsee/cmd/songsee@latest

Optional: ffmpeg for formats beyond WAV/MP3.
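A quick pre-flight check can confirm that both tools are reachable before any rendering is attempted. The sketch below is only an illustration: the helper names `have` and `check_songsee_env` are hypothetical and not part of songsee.

```shell
#!/bin/sh
# Return success if the named command is on PATH.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Print one status line per tool: songsee is required, ffmpeg is optional.
check_songsee_env() {
    if have songsee; then
        echo "songsee: found"
    else
        echo "songsee: missing (run the go install command above)"
    fi
    if have ffmpeg; then
        echo "ffmpeg: found"
    else
        echo "ffmpeg: missing (only WAV/MP3 input will work)"
    fi
}

check_songsee_env
```

If `songsee` is reported missing right after installation, the Go binary directory (`$GOBIN`, or `$(go env GOPATH)/bin` when `GOBIN` is unset) is probably not on `PATH`.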
Quick Start

    # Basic spectrogram
    songsee track.mp3

    # Save to specific file
    songsee track.mp3 -o spectrogram.png

    # Multi-panel visualization grid
    songsee track.mp3 --viz spectrogram,mel,chroma,hpss,selfsim,loudness,tempogram,mfcc,flux

    # Time slice (start at 12.5s, 8s duration)
    songsee track.mp3 --start 12.5 --duration 8 -o slice.jpg

    # From stdin
    cat track.mp3 | songsee - --format png -o out.png

Visualization Types
Use `--viz` with comma-separated values:
| Type | Description |
|---|---|
| `spectrogram` | Standard frequency spectrogram |
| `mel` | Mel-scaled spectrogram |
| `chroma` | Pitch class distribution |
| `hpss` | Harmonic/percussive separation |
| `selfsim` | Self-similarity matrix |
| `loudness` | Loudness over time |
| `tempogram` | Tempo estimation |
| `mfcc` | Mel-frequency cepstral coefficients |
| `flux` | Spectral flux (onset detection) |
Multiple `--viz` types render as a grid in a single image.
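When the panel list is assembled in a script, it has to reach `--viz` as a single comma-separated string. A minimal sketch, assuming a hypothetical helper `join_viz` and hypothetical file names `track.mp3` and `features.png`:

```shell
#!/bin/sh
# Join all arguments with commas, the list format --viz expects.
# The body runs in a subshell so the IFS change does not leak out.
join_viz() (
    IFS=,
    printf '%s\n' "$*"
)

panels=$(join_viz mel chroma mfcc)
echo "$panels"

# Only call the tool when it and the input file are actually present.
if command -v songsee >/dev/null 2>&1 && [ -f track.mp3 ]; then
    songsee track.mp3 --viz "$panels" -o features.png
fi
```

Passing three types this way should produce one image containing three panels, per the grid behavior described above.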
Common Flags
| Flag | Description |
|---|---|
| `--viz` | Visualization types (comma-separated) |
| `--style` | Color palette: classic, magma, inferno, viridis, gray |
| `--width` / `--height` | Output image dimensions |
| `--window` / `--hop` | FFT window and hop size |
| `--min-freq` / `--max-freq` | Frequency range filter |
| `--start` / `--duration` | Time slice of the audio |
| `--format` | Output format: jpg or png |
| `-o` | Output file path |
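The `--start` and `--duration` flags can be combined in a loop to walk through a long file window by window. The sketch below is a dry run that only prints the commands. The helper `slice_cmds` and the `slice_N.png` naming are hypothetical, and it assumes whole-second values and file names without spaces.

```shell
#!/bin/sh
# Print one songsee command per consecutive time window.
# Usage: slice_cmds FILE TOTAL_SECONDS WINDOW_SECONDS
slice_cmds() {
    file=$1 total=$2 win=$3
    # A zero or negative window would never advance, so refuse it.
    [ "$win" -gt 0 ] || return 1
    start=0
    while [ "$start" -lt "$total" ]; do
        printf 'songsee %s --start %s --duration %s -o slice_%s.png\n' \
            "$file" "$start" "$win" "$start"
        start=$((start + win))
    done
}

slice_cmds track.mp3 24 8
```

For a 24-second file and 8-second windows this prints three commands, starting at 0, 8, and 16 seconds. When the total is not a multiple of the window, the last slice runs past the end of the file; how songsee handles that case is not covered here, so it is worth checking on a short sample first.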
- WAV and MP3 are decoded natively; other formats require `ffmpeg`
- Output images can be inspected with `vision_analyze` for automated audio analysis
- Useful for comparing audio outputs, debugging synthesis, or documenting audio processing pipelines
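For comparing two audio outputs, it helps to render both with identical settings so that differences in the images reflect the audio and not the rendering. A rough sketch, assuming hypothetical input files `before.wav` and `after.wav` and hypothetical helpers `image_name` and `render`:

```shell
#!/bin/sh
# Shared settings so both images use the same panels and palette.
VIZ="mel,chroma,flux"
STYLE="magma"

# Derive the image name from the audio file name: foo/bar.wav -> bar.png
image_name() {
    base=${1##*/}
    printf '%s.png\n' "${base%.*}"
}

# Render one file with the shared settings and print the target image name.
# The songsee call is skipped when the tool or the input file is absent.
render() {
    out=$(image_name "$1")
    if command -v songsee >/dev/null 2>&1 && [ -f "$1" ]; then
        songsee "$1" --viz "$VIZ" --style "$STYLE" --format png -o "$out"
    fi
    echo "$out"
}

render before.wav
render after.wav
```

The two resulting images can then be inspected side by side, for example with `vision_analyze` as noted above.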