This guide covers the first-time setup path for running TextToSpeechPython on a new machine. It focuses on local requirements, runtime folders, OCR setup, and provider configuration.
You can open this same guide inside the application with Help > Setup Guide.
Install these before launching the app:
- Python 3.11 or newer, but still lower than Python 4
- Poetry
- Git, if you are working from a cloned repository
- Tesseract OCR, if you want scanned PDFs and image documents to produce text
- At least one text-to-speech provider configuration
Poetry installs the Python packages used by the app, including parser libraries
and the pytesseract bridge. It does not install the native Tesseract OCR
executable or create cloud-provider resources.
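The prerequisite list above can be checked with a small script before the first launch. This is a minimal sketch, not part of the app: the helper names `python_version_ok` and `missing_tools` are hypothetical, and it only checks that each command is on PATH, not that it works.

```python
import shutil
import sys


def python_version_ok(version=None):
    """Return True for Python 3.11 or newer, but lower than Python 4."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 11) <= (major, minor) < (4, 0)


def missing_tools(tools=("poetry", "git", "tesseract")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("Not found on PATH:", missing_tools() or "none")
```

Git and Tesseract are only needed in the situations listed above, so a missing entry for either is not necessarily a problem.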
From the project root, run:
poetry install
This installs the application dependencies and the development dependencies used by the test suite.
If document import reports that parser packages are unavailable, rerun
poetry install from the project root with the same Python environment you use
to launch the app.
Run either command from the project root:
poetry run python -m app.main
or:
poetry run tts-app
The app can open before a provider is configured, but generation and export actions remain unavailable until the selected provider has valid settings.
The repository includes a PyInstaller spec and a PowerShell build script for creating a Windows executable folder build:
.\scripts\build_windows_exe.ps1 -Clean
PyQt packaging can take several minutes while PyInstaller analyzes and collects
dependencies. The build script also writes a log to
data/dynamic/tmp/pyinstaller_build.log.
The built app is created at:
dist/TextToSpeech/TextToSpeech.exe
The executable build includes the app assets and docs used by the GUI, including the in-app setup guide. It does not bundle local credential files or cloud secrets.
OCR support in the executable still requires the native Tesseract OCR executable
to be installed on the user's machine and available on PATH.
If the executable reports a missing Azure Speech DLL, rebuild with -Clean so
PyInstaller recollects the native SDK libraries.
The app writes user-specific runtime files under data/dynamic/.
Common runtime paths include:
- data/dynamic/app_settings.json
- data/dynamic/audio_history.json
- data/dynamic/audio/
- data/dynamic/logs/
- data/dynamic/tmp/
These files are machine-specific and should not be committed.
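To see which of these runtime paths exist on a given machine, a short helper can report on each one. This is a sketch under the assumption that it runs from the project root. The function name `runtime_status` is hypothetical, and it only inspects the paths without creating anything.

```python
from pathlib import Path

# Runtime paths listed in this guide, relative to data/dynamic/.
RUNTIME_FILES = ("app_settings.json", "audio_history.json")
RUNTIME_DIRS = ("audio", "logs", "tmp")


def runtime_status(project_root="."):
    """Map each documented runtime path to whether it currently exists."""
    base = Path(project_root) / "data" / "dynamic"
    status = {}
    for name in RUNTIME_FILES:
        status[f"data/dynamic/{name}"] = (base / name).is_file()
    for name in RUNTIME_DIRS:
        status[f"data/dynamic/{name}/"] = (base / name).is_dir()
    return status


if __name__ == "__main__":
    for path, exists in runtime_status().items():
        print(f"{path}: {'present' if exists else 'not created yet'}")
```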
OCR support is needed for scanned PDFs, screenshots, photos of documents, and image-only imports.
To enable OCR:
- Install the Tesseract OCR executable for your operating system.
- Make sure tesseract is available on PATH.
- Verify the install from a terminal:
tesseract --version
If this command is not found, scanned-document imports will not be able to use OCR even though the Python dependency is installed.
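The same check can be run from Python, which helps confirm that the environment launching the app sees the same PATH as your terminal. This is a minimal sketch using only the standard library. The function name `tesseract_version` is hypothetical, and the app may detect Tesseract differently.

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `<executable> --version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        return None  # not on PATH for this Python process
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # Some Tesseract builds print the version banner to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version() or "tesseract was not found on PATH")
```

Run it with poetry run python so that it uses the same environment as the app.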
Open Tools > Settings in the app, choose the provider, fill in the provider
settings, and use the provider test button where available.
Azure Speech requires a Speech resource key and region.
You can configure Azure in the settings sidebar or with a local .env file:
[API]
key = YOUR_AZURE_SPEECH_KEY
region = YOUR_AZURE_REGION
For more detail, see Azure configuration.
Amazon Polly requires AWS credentials and a region in a dedicated Polly config file:
[POLLY]
aws_access_key_id = YOUR_AWS_ACCESS_KEY_ID
aws_secret_access_key = YOUR_AWS_SECRET_ACCESS_KEY
aws_session_token = OPTIONAL_SESSION_TOKEN
region = us-east-1
For more detail, see Amazon Polly configuration.
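Before testing Polly in the app, it can help to confirm that the config file has the required keys filled in. The sketch below assumes the file is INI-style and readable with Python's configparser, which matches the layout shown above but is not confirmed to be how the app loads it. The function name `missing_polly_keys` is hypothetical, and the session token is treated as optional.

```python
import configparser

# Keys this guide describes as required; aws_session_token is optional.
REQUIRED_POLLY_KEYS = ("aws_access_key_id", "aws_secret_access_key", "region")


def missing_polly_keys(config_text):
    """Return the required [POLLY] keys that are absent or empty."""
    # interpolation=None keeps any '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("POLLY"):
        return list(REQUIRED_POLLY_KEYS)
    section = parser["POLLY"]
    return [key for key in REQUIRED_POLLY_KEYS if not section.get(key, "").strip()]


if __name__ == "__main__":
    sample = "[POLLY]\naws_access_key_id = EXAMPLE\nregion = us-east-1\n"
    print(missing_polly_keys(sample))
```

This only checks that values are present. It does not contact AWS, so the provider test button in the app is still the way to confirm that the credentials work.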
Gemini TTS requires a Google Cloud project and a service account JSON file with access to the needed text-to-speech APIs.
Use a dedicated Gemini config file:
[GEMINI]
project_id = YOUR_GOOGLE_CLOUD_PROJECT_ID
service_account_json = C:\path\to\service-account.json
region = global
For more detail, see Gemini TTS configuration.
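A common Gemini setup mistake is a service_account_json path that does not exist on the current machine. The sketch below checks the [GEMINI] section and the file it points to. It assumes the config is INI-style and readable with configparser, and the function name `check_gemini_config` is hypothetical. It does not verify that the account has access to the needed APIs.

```python
import configparser
import json
from pathlib import Path


def check_gemini_config(config_text):
    """Return a list of problems in a [GEMINI] config; an empty list means none found."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if not parser.has_section("GEMINI"):
        return ["missing [GEMINI] section"]
    section = parser["GEMINI"]
    problems = [
        f"missing {key}"
        for key in ("project_id", "service_account_json")
        if not section.get(key, "").strip()
    ]
    json_path = section.get("service_account_json", "").strip()
    if json_path:
        path = Path(json_path)
        if not path.is_file():
            problems.append(f"file not found: {json_path}")
        else:
            try:
                json.loads(path.read_text(encoding="utf-8"))
            except (OSError, ValueError):  # unreadable file or invalid JSON
                problems.append(f"not readable as JSON: {json_path}")
    return problems


if __name__ == "__main__":
    text = "[GEMINI]\nproject_id = demo\nservice_account_json = missing.json\n"
    print(check_gemini_config(text))
```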
Offline Python TTS uses pyttsx3 and can run without cloud credentials. It may
still depend on the local speech engine installed on the operating system.
An optional local config file can be used for driver troubleshooting:
[LOCAL_TTS]
driver_name = auto
For more detail, see Offline Python TTS configuration.
After installing dependencies and configuring a provider:
- Launch the app.
- Open Tools > Settings.
- Select a provider and verify the provider settings.
- Type a short sentence in the editor.
- Use Preview SSML for SSML-capable providers.
- Use Generate & Play or Generate File.
To run the automated tests:
poetry run pytest
Troubleshooting:
- If the app cannot import a supported document format, run poetry install.
- If scanned PDFs or images do not extract text, run tesseract --version to verify the Tesseract install.
- If generation buttons stay disabled, add text to the editor and configure a valid provider.
- If Azure, Polly, or Gemini generation fails, retest the provider from Tools > Settings and confirm the provider-specific config file paths.
- If offline TTS fails, try a different local driver in the settings sidebar.
- If preview playback is unavailable, generate a file instead; some environments do not expose multimedia playback support to PyQt.