Semantic Foragecast Engine

Production-ready pipeline for audio-driven animation in Blender

A configuration-first, modular system demonstrating Blender automation, audio analysis integration, and headless rendering architecture.



What This Is

A fully functional pipeline that transforms audio files into animated videos with synchronized lip movements, beat-reactive gestures, and timed lyrics — all driven by YAML configuration files instead of manual animation.

But more importantly: A technical demonstration of production-ready Blender automation, showcasing:

  • ✅ Configuration-first architecture (no code changes for different outputs)
  • ✅ Headless rendering (cloud/container deployment ready)
  • ✅ Modular 4-phase pipeline with clean separation of concerns
  • ✅ Extensible plugin system (easy to add new animation modes)
  • ✅ Real-world performance benchmarks (tested in cloud environments)

Use Case: Automated music video generation (lyric videos, podcasts, educational content)

Learning Value: Demonstrates Blender Python API patterns, audio analysis integration, and pipeline architecture rarely documented elsewhere.


Quick Start

# 1. Install dependencies
pip install -r requirements.txt

# 2. Install Blender 4.0+ and FFmpeg
# https://www.blender.org/download/
# https://ffmpeg.org/download.html

# 3. Run the pipeline with test config (renders in 4-6 minutes)
python main.py --config config_ultra_fast.yaml

# 4. Find output video
ls outputs/ultra_fast/ultra_fast.mp4

Result: 30-second video with animated mascot, lip sync, and lyrics.


Documentation

For Developers

  • ARCHITECTURE.md - System design, data flow, extension points, deployment patterns
  • DEVELOPER_GUIDE.md - Step-by-step tutorials for adding modes, effects, and audio analysis
  • CASE_STUDIES.md - Real-world benchmarks, cloud rendering, performance optimization

For Users

  • TESTING_GUIDE.md - Quality/speed configuration presets
  • AUTOMATED_LYRICS_GUIDE.md - Automated lyrics workflows (Whisper, Gentle, beat-based)
  • POSITIONING_GUIDE.md - Mascot and lyric text placement


Architecture Overview

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Phase 1   │────▶│   Phase 2    │────▶│   Phase 3   │
│ Audio Prep  │     │  Rendering   │     │   Export    │
│             │     │              │     │             │
│ - Beats     │     │ - 2D/3D Mode │     │ - MP4       │
│ - Phonemes  │     │ - Lip Sync   │     │ - H.264     │
│ - Lyrics    │     │ - Gestures   │     │ - Audio Sync│
└─────────────┘     └──────────────┘     └─────────────┘
      ↓                     ↓                     ↓
  prep_data.json       PNG frames             final.mp4

Key Design Principles:

  • Separation of concerns: Each phase independent, cacheable outputs
  • Configuration over code: YAML drives all behavior
  • Extensibility: Plugin-style animation modes
  • Production-ready: Headless rendering, error handling, validation

See ARCHITECTURE.md for complete system design.
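
For orientation, here is a minimal sketch of a config-first, phase-dispatching orchestrator in the spirit of main.py. It is illustrative only: the run_phase* names and wiring are assumptions, not the actual implementation.

# Orchestrator sketch (not the actual main.py): load a YAML config, then run
# the requested phase(s) in order. Function names are illustrative stubs.
import argparse
import yaml

def run_phase1(cfg): ...  # audio prep -> prep_data.json
def run_phase2(cfg): ...  # Blender render -> PNG frames
def run_phase3(cfg): ...  # FFmpeg export -> final.mp4

PHASES = {1: run_phase1, 2: run_phase2, 3: run_phase3}

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default="config.yaml")
    parser.add_argument("--phase", type=int, choices=[1, 2, 3], default=None)
    args = parser.parse_args()

    with open(args.config) as f:
        cfg = yaml.safe_load(f)

    for phase in ([args.phase] if args.phase else [1, 2, 3]):
        PHASES[phase](cfg)  # each phase reads/writes its own cacheable artifact

if __name__ == "__main__":
    main()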


Features

Core Pipeline (4 Phases - All Complete ✅)

Phase 1: Audio Preprocessing

  • Beat/onset detection (LibROSA)
  • Phoneme extraction (Rhubarb Lip Sync or mock fallback)
  • Lyrics parsing (manual or automated with Whisper)
  • JSON output for downstream processing

Phase 2: Blender Rendering

  • 2D Grease Pencil mode (fast, stylized)
  • 3D mesh mode (planned)
  • Hybrid mode (planned)
  • Automated lip sync from phonemes
  • Beat-synchronized gestures
  • Timed lyric text objects

Phase 3: Video Export

  • FFmpeg integration (H.264, H.265, VP9)
  • Quality presets (low, medium, high, ultra)
  • Preview mode for rapid iteration
  • Audio synchronization

Phase 4: 2D Animation System

  • Image-to-stroke conversion
  • Grease Pencil animation
  • ~2x faster rendering than 3D
  • Stylized artistic output

Technical Highlights

Headless Rendering

  • Tested in Docker containers with Xvfb
  • No GUI required
  • Cloud deployment ready (AWS, GCP)
  • See CASE_STUDIES.md for cloud setup

Performance Optimization

  • Progressive quality configs (180p → 360p → 1080p)
  • Render time: 4 min (ultra-fast) to 50 min (production) for 30s video
  • Benchmarks included in CASE_STUDIES.md

Automated Lyrics

  • Whisper auto-transcription (no manual lyrics file needed)
  • Gentle forced alignment of known lyrics
  • Beat-based lyric distribution
  • See AUTOMATED_LYRICS_GUIDE.md for a comparison

Configuration-Based Workflow

No code changes needed - just swap YAML files:

# config_ultra_fast.yaml (testing - 4 min render)
video:
  resolution: [320, 180]
  fps: 12
  samples: 16

# config_quick_test.yaml (preview - 12 min render)
video:
  resolution: [640, 360]
  fps: 24
  samples: 32

# config.yaml (production - 50 min render)
video:
  resolution: [1920, 1080]
  fps: 24
  samples: 64

Run with: python main.py --config <config_file>
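
As a sketch of the kind of check the --validate flag implies (the required keys below are assumptions drawn from the configuration reference, not the project's actual validation code):

# Hypothetical config validation sketch: verify required keys before a long render.
import yaml

REQUIRED = {
    "video": ["resolution", "fps", "samples"],
    "animation": ["mode"],
}

def validate(path):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    errors = []
    for section, keys in REQUIRED.items():
        if section not in cfg:
            errors.append(f"missing section: {section}")
            continue
        for key in keys:
            if key not in cfg[section]:
                errors.append(f"missing key: {section}.{key}")
    return errors

if __name__ == "__main__":
    problems = validate("config_ultra_fast.yaml")
    print("OK" if not problems else "\n".join(problems))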


Usage Examples

Basic Pipeline

# Run complete pipeline (all 3 phases)
python main.py --config config.yaml

# Run individual phases
python main.py --config config.yaml --phase 1  # Audio prep only
python main.py --config config.yaml --phase 2  # Render only
python main.py --config config.yaml --phase 3  # Export only

# Validate configuration
python main.py --config config.yaml --validate

Automated Lyrics

# Instead of manual lyrics.txt, auto-generate with Whisper
pip install openai-whisper
python auto_lyrics_whisper.py assets/song.wav --output assets/lyrics.txt

# Then run pipeline as normal
python main.py

Quick Testing

# Use ultra-fast config for rapid iteration (4 min for 30s video)
python main.py --config config_ultra_fast.yaml

# Or use the quick test script
python quick_test.py --auto-lyrics --debug

Extension Examples

Adding a New Animation Mode

See DEVELOPER_GUIDE.md for complete tutorials.

Quick example - Add particle system mode:

  1. Create particle_system.py with builder class
  2. Register in blender_script.py dispatcher
  3. Add mode: "particles" to config
  4. Run pipeline - no other code changes needed

Full tutorial with code samples in DEVELOPER_GUIDE.md
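
A minimal sketch of the dispatcher idea, using a hypothetical ParticleSystemBuilder and mode registry; the real blender_script.py may wire this differently:

# Hypothetical mode registry: map the config's animation.mode string to a
# builder class. Names here are illustrative, not the project's API.
class GreasePencilBuilder:
    def build(self, cfg, prep_data): ...

class ParticleSystemBuilder:
    def build(self, cfg, prep_data): ...

MODE_BUILDERS = {
    "2d_grease": GreasePencilBuilder,
    "particles": ParticleSystemBuilder,   # new mode registered here
}

def build_scene(cfg, prep_data):
    builder = MODE_BUILDERS[cfg["animation"]["mode"]]()
    builder.build(cfg, prep_data)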

Adding a New Effect

Example - Camera shake on beats:

# effects.py
import random
import mathutils

class CameraShakeEffect:
    def __init__(self, prep_data, intensity=0.2):
        self.prep_data = prep_data
        self.intensity = intensity

    def apply(self, camera):
        base = camera.location.copy()
        for beat_frame in self.prep_data['beats']['beat_frames']:
            # Nudge the camera by a small random offset on each beat, then keyframe it
            offset = mathutils.Vector(random.uniform(-self.intensity, self.intensity) for _ in range(3))
            camera.location = base + offset
            camera.keyframe_insert(data_path="location", frame=beat_frame)

Add to config:

effects:
  camera_shake:
    enabled: true
    intensity: 0.2

Full implementation in DEVELOPER_GUIDE.md


Project Structure

semantic-foragecast-engine/
├── main.py                      # Orchestrator
├── prep_audio.py                # Phase 1: Audio analysis
├── blender_script.py            # Phase 2: Blender automation
├── grease_pencil.py             # 2D animation mode
├── export_video.py              # Phase 3: FFmpeg export
├── config.yaml                  # Production config
├── config_ultra_fast.yaml       # Fast testing config
├── config_360p_12fps.yaml       # Mid-quality config
├── quick_test.py                # Automated testing script
├── auto_lyrics_whisper.py       # Automated lyrics (Whisper)
├── auto_lyrics_gentle.py        # Automated lyrics (Gentle)
├── auto_lyrics_beats.py         # Beat-based lyrics
├── assets/                      # Sample inputs
│   ├── song.wav                 # 30s test audio
│   ├── fox.png                  # Mascot image
│   └── lyrics.txt               # Timed lyrics
├── outputs/                     # Generated outputs
│   ├── ultra_fast/              # Fast test outputs
│   ├── test_360p/               # Mid-quality outputs
│   └── production/              # High-quality outputs
├── docs/                        # Documentation
│   ├── ARCHITECTURE.md          # System design
│   ├── DEVELOPER_GUIDE.md       # Extension tutorials
│   ├── CASE_STUDIES.md          # Benchmarks & examples
│   ├── TESTING_GUIDE.md         # Quality/speed configs
│   ├── AUTOMATED_LYRICS_GUIDE.md
│   └── POSITIONING_GUIDE.md
└── tests/                       # Unit tests

Performance Benchmarks

30-second video render times (tested in cloud container, CPU only):

Config      Resolution  FPS  Samples  Render Time  File Size  Use Case
Ultra Fast  320x180     12   16       4 min        489 KB     Testing pipeline
360p 12fps  640x360     12   16       6 min        806 KB     Quality check
Quick Test  640x360     24   32       13 min       ~1.5 MB    Preview
Production  1920x1080   24   64       50 min       ~8 MB      Final output

Key finding: 360p @ 12fps is the sweet spot for development (6 min, good quality)

See CASE_STUDIES.md for complete benchmarks and optimization strategies.


Technical Stack

Core:

  • Python 3.11+
  • Blender 4.0+ (Python API)
  • FFmpeg 4.4+

Audio Analysis:

  • LibROSA 0.10.1 (beat detection, tempo)
  • Rhubarb Lip Sync (phoneme extraction)
  • Whisper (optional, auto lyrics)

Rendering:

  • Blender EEVEE engine
  • Grease Pencil for 2D mode
  • Xvfb for headless rendering

Configuration:

  • PyYAML 6.0.1
  • JSON for intermediate data

Platform Support

  • Development: Windows 11, macOS, Linux
  • Production: Ubuntu 22.04/24.04 (tested in Docker)
  • Cloud: AWS EC2, GCP Compute (headless mode)
  • Offline: No cloud dependencies required

See CROSS_PLATFORM_DEV_GUIDE.md for setup instructions.


Real-World Applications

Tested Use Cases:

  1. Music lyric videos - Automated generation for indie musicians
  2. Podcast visualization - Animated host for audio podcasts
  3. Educational content - Narrated lessons with animated teacher
  4. Brand mascot videos - Company mascot delivering announcements

Deployment Scenarios:

  • Local rendering (Windows/Mac development)
  • Docker containers (reproducible builds)
  • Cloud rendering (AWS/GCP for batch processing)
  • CI/CD integration (automated video generation)

See CASE_STUDIES.md for detailed case studies.


Why This Project Exists

Problem: Few production-ready examples exist for Blender automation. Most tutorials show basic concepts but not real-world architecture.

Solution: This project demonstrates:

  • How to structure a multi-phase pipeline
  • Configuration-first design patterns
  • Headless rendering in cloud environments
  • Audio-driven procedural animation
  • Extensible plugin architecture

Target Audience:

  • Developers learning Blender Python API
  • Pipeline engineers building automation tools
  • DevOps teams deploying headless rendering
  • Anyone needing automated video generation

Detailed Usage

Phase 1: Audio Preparation

# Run audio prep manually
python prep_audio.py assets/song.wav --output outputs/prep_data.json

# With lyrics
python prep_audio.py assets/song.wav --lyrics assets/lyrics.txt --output outputs/prep_data.json

# With Rhubarb for real phonemes (not mock)
python prep_audio.py assets/song.wav --rhubarb /path/to/rhubarb --output outputs/prep_data.json

Output: prep_data.json containing beats, phonemes, and lyrics timing
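
For reference, a rough sketch of how beats could be extracted and stored. The LibROSA calls are real; the JSON schema is an assumption beyond prep_data['beats']['beat_frames'], which is the only key referenced elsewhere in this README.

# Sketch of Phase-1-style beat extraction with LibROSA (output schema assumed).
import json
import librosa

def extract_beats(audio_path, fps=24):
    y, sr = librosa.load(audio_path, sr=None)
    tempo, beat_idx = librosa.beat.beat_track(y=y, sr=sr)
    beat_times = librosa.frames_to_time(beat_idx, sr=sr)
    return {
        "tempo": float(tempo),
        "beat_times": [float(t) for t in beat_times],
        # Video frame numbers, as consumed later by the Blender phase
        "beat_frames": [int(round(t * fps)) for t in beat_times],
    }

if __name__ == "__main__":
    data = {"beats": extract_beats("assets/song.wav", fps=24)}
    with open("outputs/prep_data.json", "w") as f:
        json.dump(data, f, indent=2)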

Phase 2: Blender Rendering

# Render with 2D Grease Pencil mode (fastest)
python main.py --config config.yaml --phase 2

# Enable debug visualization (colored position markers)
# Set debug_mode: true in config.yaml, then:
python main.py --config config.yaml --phase 2

Output: PNG frames in outputs/*/frames/
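
For illustration, a stripped-down headless render pass with the Blender Python API (a sketch, not the project's blender_script.py); it would run via blender --background --python <script>:

# Sketch of a headless frame render with bpy; run inside Blender, e.g.:
#   blender --background --python render_sketch.py
import bpy

scene = bpy.context.scene
scene.render.engine = 'BLENDER_EEVEE'          # 'BLENDER_EEVEE_NEXT' on Blender 4.2+
scene.render.resolution_x = 640
scene.render.resolution_y = 360
scene.render.fps = 12
scene.eevee.taa_render_samples = 16            # "samples" in the YAML config
scene.frame_start = 1
scene.frame_end = 360                          # 30 s at 12 fps
scene.render.image_settings.file_format = 'PNG'
scene.render.filepath = "//outputs/frames/frame_"
bpy.ops.render.render(animation=True)          # writes frame_0001.png, frame_0002.png, ...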

Phase 3: Video Export

# Encode frames to video
python main.py --config config.yaml --phase 3

# Or use export_video.py directly
python export_video.py \
  --frames outputs/frames \
  --audio assets/song.wav \
  --output outputs/video.mp4 \
  --quality high

Output: Final MP4 video
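
Under the hood this is plain FFmpeg; a minimal equivalent invocation (a sketch, not necessarily the exact flags export_video.py uses):

# Sketch of an FFmpeg encode matching the documented output (H.264 MP4 with audio).
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-framerate", "24",                         # must match the config fps
    "-i", "outputs/frames/frame_%04d.png",      # PNG frames from Phase 2
    "-i", "assets/song.wav",                    # original audio track
    "-c:v", "libx264", "-pix_fmt", "yuv420p", "-crf", "18",
    "-c:a", "aac", "-shortest",
    "outputs/video.mp4",
], check=True)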

Automated Lyrics

# Method 1: Whisper (auto-transcribe, no lyrics needed)
pip install openai-whisper
python auto_lyrics_whisper.py assets/song.wav --output assets/lyrics.txt

# Method 2: Gentle (align known lyrics to audio)
docker run -p 8765:8765 lowerquality/gentle
python auto_lyrics_gentle.py --audio song.wav --lyrics text.txt --output lyrics.txt

# Method 3: Beat-based (distribute lyrics on beats)
python auto_lyrics_beats.py --prep-data prep_data.json --lyrics-text "Your lyrics here"

See AUTOMATED_LYRICS_GUIDE.md for detailed comparison.
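
For the Whisper route, the core idea is segment-level timestamps; a rough sketch follows (not the actual auto_lyrics_whisper.py, which also writes the project's lyrics.txt format):

# Sketch of Whisper transcription with segment timestamps.
import whisper  # pip install openai-whisper

model = whisper.load_model("base")
result = model.transcribe("assets/song.wav")
for seg in result["segments"]:
    # Each segment carries start/end times in seconds plus the transcribed text
    print(f"{seg['start']:.2f} --> {seg['end']:.2f}  {seg['text'].strip()}")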


Configuration Reference

Video Settings

video:
  resolution: [1920, 1080]  # Output resolution
  fps: 24                   # Frame rate
  render_engine: "EEVEE"    # EEVEE (fast) or CYCLES (quality)
  samples: 64               # Render samples (16-256)
  codec: "libx264"          # Video codec
  quality: "high"           # low, medium, high, ultra

Animation Settings

animation:
  mode: "2d_grease"         # 2d_grease, 3d, or hybrid
  enable_lipsync: true      # Phoneme-based lip sync
  enable_gestures: true     # Beat-synced movement
  enable_lyrics: true       # Timed lyric text
  gesture_intensity: 0.7    # 0.0-1.0

Style Settings

style:
  lighting: "jazzy"         # Lighting preset
  colors:
    primary: [0.8, 0.3, 0.9]
    secondary: [0.3, 0.8, 0.9]
    accent: [0.9, 0.8, 0.3]
  background: "solid"       # solid or hdri

gp_style:                   # 2D mode only
  stroke_thickness: 3
  ink_type: "clean"         # clean, sketchy, wobbly
  enable_wobble: false
  wobble_intensity: 0.5

Advanced Settings

advanced:
  debug_mode: false         # Show position markers
  preview_mode: false       # Low-res preview
  preview_scale: 0.5        # Preview resolution scale
  threads: null             # Render threads (null = auto)
  verbose: true             # Detailed logging

Testing

Unit Tests

# Run all tests
python -m unittest discover tests/

# Test specific phase
python tests/test_prep_audio.py
python tests/test_export_video.py
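
As an example of a cheap schema check (a sketch with an assumed prep_data.json layout, not one of the repository's actual tests):

# tests/test_prep_data_schema.py (hypothetical): sanity-check Phase 1 output.
import json
import unittest

class TestPrepDataSchema(unittest.TestCase):
    def test_beats_present(self):
        with open("outputs/prep_data.json") as f:
            data = json.load(f)
        self.assertIn("beats", data)
        self.assertIn("beat_frames", data["beats"])
        self.assertTrue(all(isinstance(n, int) for n in data["beats"]["beat_frames"]))

if __name__ == "__main__":
    unittest.main()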

Integration Tests

# Test complete pipeline with ultra-fast config
python main.py --config config_ultra_fast.yaml

# Automated testing script
python quick_test.py

Manual Verification

# Enable debug mode to visualize positioning
# In config.yaml: debug_mode: true
python main.py --config config.yaml --phase 2

# Check frame 100 for colored markers
ls outputs/*/frames/frame_0100.png

Troubleshooting

Blender Not Found

# Linux: Install via apt
sudo apt-get install blender

# Mac: Install via Homebrew
brew install --cask blender

# Windows: Download installer
# https://www.blender.org/download/

Headless Rendering Fails

# Install Xvfb virtual display
sudo apt-get install xvfb

# Run with xvfb-run
xvfb-run -a python main.py --config config.yaml --phase 2

FFmpeg Not Found

# Linux
sudo apt-get install ffmpeg

# Mac
brew install ffmpeg

# Windows: Download from https://ffmpeg.org/

Lyrics Behind Mascot

Check the lyric text positioning in your config: the text should sit in front of the mascot, at roughly y=-2.0, z=0.2. See POSITIONING_GUIDE.md for details.


Contributing

How to Contribute

  1. Fork the repository
  2. Create feature branch: git checkout -b feature/my-feature
  3. Make changes with tests
  4. Update documentation
  5. Submit pull request

What We're Looking For

  • New animation modes (3D, particle systems, etc.)
  • Audio analysis improvements (melody extraction, harmony)
  • Effects (camera movements, post-processing)
  • Performance optimizations
  • Bug fixes with tests
  • Documentation improvements

See DEVELOPER_GUIDE.md for extension tutorials.


Roadmap

Completed ✅

  • Phase 1: Audio preprocessing
  • Phase 2: Blender automation
  • Phase 3: Video export
  • Phase 4: 2D Grease Pencil mode
  • Headless rendering support
  • Automated lyrics (Whisper)
  • Debug visualization
  • Comprehensive documentation

Planned 🚧

  • 3D mesh animation mode
  • Hybrid mode (2D + 3D)
  • Advanced effects (fog, particles, camera shake)
  • Melody extraction and pitch-based animation
  • Multi-character support
  • Web UI for configuration
  • Real-time preview

FAQ

Q: Can I use this for commercial projects? A: Yes, MIT licensed. Attribution appreciated.

Q: Why is rendering slow? A: Use config_ultra_fast.yaml for testing (4 min). Production 1080p takes 50 min for 30s video.

Q: Can I run this without Blender installed? A: No, Phase 2 requires Blender. But you can run Phase 1 (audio prep) standalone.

Q: Does this require GPU? A: No, CPU rendering works. GPU recommended for faster production renders.

Q: Can I deploy this in Docker? A: Yes, see CASE_STUDIES.md for cloud deployment example.

Q: Is this AI-generated? A: No, this is procedural animation based on audio analysis, not machine learning.


License

MIT License - See LICENSE file for details


Built with ❤️ for the Blender automation community
