metacore-stack / vocalcanvas-studio Public

Craft expressive speech from text using a streamlined pipeline of voices, styles, and exports tailored for storytellers and developers alike.

metacore-stack.github.io/vocalcanvas-studio/

VocalCanvas Studio

VocalCanvas Studio transforms text, images, and multi-page PDFs into expressive speech entirely in the browser, with no server-side processing.

Highlights

Image OCR: Drop in a photo or scan to extract editable text in seconds.
PDF Pipeline: Render each page, run OCR, and build a unified transcript automatically.
Voice Studio: Preview voices, adjust rate, and queue narration segments before export.
Local Workflow: Runs offline via WebAssembly OCR and built-in browser speech synthesis.
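
As a rough illustration of how narration segments could be queued, the sketch below splits a transcript into sentence-sized chunks and hands them to the browser's speech queue. The helper names splitIntoSegments and queueNarration and the 200-character limit are assumptions made for this example; they are not taken from js/main.js or js/articulate.js.

```javascript
// Hypothetical helper: split a transcript into narration segments.
// Sentences are grouped until adding one more would exceed maxLength.
function splitIntoSegments(transcript, maxLength = 200) {
  // Break after sentence-ending punctuation that is followed by whitespace.
  const sentences = transcript
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  const segments = [];
  let current = '';
  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence;
    if (current === '' || candidate.length <= maxLength) {
      // An over-long single sentence is kept whole rather than cut mid-word.
      current = candidate;
    } else {
      segments.push(current);
      current = sentence;
    }
  }
  if (current) segments.push(current);
  return segments;
}

// Hypothetical helper: queue each segment for playback.
// speechSynthesis.speak() appends to the browser's utterance queue,
// so segments play in order. Returns the number of segments queued.
function queueNarration(segments) {
  if (typeof window === 'undefined' || !('speechSynthesis' in window)) {
    return 0; // Not running in a browser with Web Speech support.
  }
  for (const text of segments) {
    window.speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  }
  return segments.length;
}
```

Grouping by sentence keeps pauses at natural boundaries; a real implementation might also split on paragraph breaks or let the user edit segment boundaries.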

Quick Start

Clone or download this repository.
Serve the project directory with any static web server, for example python -m http.server (Python 3) from the project root, which listens on port 8000 by default.
Open the served URL in a modern desktop browser (Chrome, Edge, or Firefox).

Using the App

Use the image uploader for PNG or JPG assets; use the PDF uploader for multi-page documents.
Confirm each recognition result, edit the transcript inline, and save snapshots as needed.
Select a voice, pitch, and rate, then generate narration for the full transcript or selected passages.
Export audio-ready text or the recognized transcript to plain .txt files via the download button.
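
The voice, pitch, and rate step can be sketched as follows. The Web Speech API defines rate in the range 0.1 to 10 and pitch in the range 0 to 2, so this example clamps user input before building an utterance. The function names normalizeSettings and speak are hypothetical and are not claimed to match the project's own code.

```javascript
// Value ranges defined for SpeechSynthesisUtterance by the Web Speech API.
const LIMITS = { rate: [0.1, 10], pitch: [0, 2] };

// Parse a form value and clamp it to [min, max]; fall back when it is not a number.
function clamp(value, [min, max], fallback) {
  const n = parseFloat(value);
  if (!Number.isFinite(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

// Hypothetical helper: turn raw UI values into safe utterance settings.
function normalizeSettings({ rate, pitch, voiceName } = {}) {
  return {
    rate: clamp(rate, LIMITS.rate, 1),
    pitch: clamp(pitch, LIMITS.pitch, 1),
    voiceName: voiceName || null,
  };
}

// Hypothetical helper: speak text with the chosen settings.
// Returns false when the Web Speech API is unavailable.
function speak(text, settings) {
  if (typeof window === 'undefined' || !('speechSynthesis' in window)) {
    return false;
  }
  const { rate, pitch, voiceName } = normalizeSettings(settings);
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = rate;
  utterance.pitch = pitch;
  // Match the voice by name; leave the browser default if it is not found.
  const voice = window.speechSynthesis
    .getVoices()
    .find((v) => v.name === voiceName);
  if (voice) utterance.voice = voice;
  window.speechSynthesis.speak(utterance);
  return true;
}
```

For example, speak('Chapter one.', { rate: '1.2', pitch: 1 }) would narrate at slightly above normal speed with the browser's default voice.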

Project Layout

index.html hosts the interface shell and orchestrates the main workflow.
js/main.js coordinates OCR, PDF rendering, and speech synthesis controls.
js/articulate.js focuses on speech synthesis helpers and voice selection logic.
css/ contains layout and component styling, including Bootstrap overrides.
inputs/ and outputs/ provide sample documents and generated transcripts for quick testing.

Technology Stack

Tesseract.js WebAssembly build for client-side OCR.
Mozilla PDF.js for rendering PDF pages into canvas elements.
Web Speech API for voice playback and narration controls.
Vanilla JavaScript enhanced with Bootstrap 4 utilities for layout.
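
To show how these pieces might fit together, here is a minimal sketch of a page-by-page PDF-to-transcript pipeline. The function names (joinPages, buildTranscript, transcribePdf) and the render scale of 2 are assumptions for illustration. The browser wiring assumes pdfjsLib and Tesseract are available as globals loaded by script tags, that the PDF.js worker is already configured, and that the bundled library versions expose the calls used here; check this against the versions shipped in js/.

```javascript
// Hypothetical helper: merge per-page OCR text into one transcript,
// dropping pages where nothing was recognized.
function joinPages(pageTexts) {
  return pageTexts
    .map((text) => text.trim())
    .filter((text) => text.length > 0)
    .join('\n\n');
}

// Hypothetical orchestrator: handle one page at a time so that only a
// single rendered page is held in memory. renderPage(n) returns an image
// (for example a canvas) and recognize(image) returns its text.
async function buildTranscript(pageCount, renderPage, recognize) {
  const pageTexts = [];
  for (let n = 1; n <= pageCount; n += 1) {
    const image = await renderPage(n);
    pageTexts.push(await recognize(image));
  }
  return joinPages(pageTexts);
}

// Browser wiring (assumes the pdfjsLib and Tesseract globals exist).
async function transcribePdf(url, lang = 'eng') {
  const pdf = await pdfjsLib.getDocument(url).promise;

  const renderPage = async (n) => {
    const page = await pdf.getPage(n);
    // Scale 2 is an assumed value; larger scales usually help OCR but cost time.
    const viewport = page.getViewport({ scale: 2 });
    const canvas = document.createElement('canvas');
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    await page.render({ canvasContext: canvas.getContext('2d'), viewport }).promise;
    return canvas;
  };

  const recognize = async (canvas) => {
    const { data } = await Tesseract.recognize(canvas, lang);
    return data.text;
  };

  return buildTranscript(pdf.numPages, renderPage, recognize);
}
```

Passing renderPage and recognize in as functions keeps the orchestration testable without a browser, since both can be replaced with stubs.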

Development Notes

Web Speech API voice availability varies by browser and operating system; confirm support before demos.
Large PDFs are processed page by page; keep the browser tab in the foreground, since browsers throttle background tabs and OCR slows down as a result.
To customize styling, extend the utility classes in css/main.css instead of editing Bootstrap directly.
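
One way to confirm voice support before a demo is sketched below. Some browsers return an empty list from getVoices() until the voiceschanged event fires, so the example waits for it with a timeout. The helper names loadVoices and pickVoice, the 2000 ms timeout, and the fallback order are assumptions for this example rather than the project's actual logic.

```javascript
// Hypothetical helper: choose a voice for a language tag such as 'en-US'.
// Order: exact match, same base language, browser default, first voice.
function pickVoice(voices, preferredLang = 'en-US') {
  if (!voices || voices.length === 0) return null;
  const wanted = preferredLang.toLowerCase();
  const base = wanted.split('-')[0];
  return (
    voices.find((v) => v.lang.toLowerCase() === wanted) ||
    voices.find((v) => v.lang.toLowerCase().split(/[-_]/)[0] === base) ||
    voices.find((v) => v.default) ||
    voices[0]
  );
}

// Hypothetical helper: resolve with the voice list once it is populated,
// or with whatever is available after timeoutMs.
function loadVoices(synth, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const available = synth.getVoices();
    if (available.length > 0) {
      resolve(available);
      return;
    }
    const timer = setTimeout(() => resolve(synth.getVoices()), timeoutMs);
    if (typeof synth.addEventListener === 'function') {
      synth.addEventListener(
        'voiceschanged',
        () => {
          clearTimeout(timer);
          resolve(synth.getVoices());
        },
        { once: true }
      );
    }
  });
}
```

In the browser this could be used as loadVoices(window.speechSynthesis).then((voices) => pickVoice(voices, 'en-US')); a null result means no voices are installed and narration should be disabled or the user warned.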

License

This project is distributed under the Apache License 2.0. See LICENSE-2.0.txt for full details.
