Clustering Spoken Audio Samples by Delivery Style using Acoustic Features

Portland State University – Data Clustering Course Final Project

This project clusters 200 spoken samples of the word “backward” from the Google Speech Commands dataset. Using acoustic features such as MFCCs, pitch, tempo, and energy, it groups samples by how the word is spoken, focusing on delivery style rather than content. Keeping the word constant isolates stylistic differences in speech.

Project Setup

Note: This project uses a Hugging Face token to authenticate and stream data from the Google Speech Commands dataset. To reproduce the code, you'll need to create a Hugging Face account and store your token in a .env file as described in the auth.py script.
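As a rough illustration of the token handling described in the note, the sketch below reads a value from a simple KEY=VALUE .env file using only the standard library, then passes it to `huggingface_hub.login`. The variable name `HF_TOKEN` and the helper names `read_env_value` and `authenticate` are assumptions made for this example; the actual auth.py may use a different name or rely on a package such as python-dotenv.

```python
import os
from pathlib import Path


def read_env_value(key, env_path=".env"):
    """Return `key` from the process environment or a simple KEY=VALUE .env file.

    Hypothetical helper for illustration; not taken from auth.py.
    """
    if key in os.environ:
        return os.environ[key]
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip('"').strip("'")
    return None


def authenticate(env_path=".env", key="HF_TOKEN"):
    """Log in to the Hugging Face Hub with the stored token.

    `HF_TOKEN` is an assumed variable name; check .env.example for the real one.
    """
    token = read_env_value(key, env_path)
    if token is None:
        raise RuntimeError(f"{key} not found in environment or {env_path}")
    # Imported here so the parsing helper works without the package installed.
    from huggingface_hub import login

    login(token=token)
```

Calling `authenticate()` once before streaming the dataset should be enough. Keep the .env file out of version control, since it holds a secret.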

Step 1: Install dependencies

pip install -r requirements.txt

Step 2: Run the dataset loading script

python load_dataset.py

This script:

- Authenticates with the Hugging Face Hub using a token stored in your .env file
- Streams the Google Speech Commands dataset
- Filters and loads 200 samples labeled "backward"
- Extracts acoustic features for each audio sample:
  - MFCCs (Mel-frequency cepstral coefficients)
  - Pitch
  - Tempo
  - Energy
- Saves the extracted features to backward_features.csv for clustering and analysis in the notebook
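To make the feature extraction step more concrete, here is a minimal NumPy-only sketch of two of the four features: energy (as root-mean-square amplitude) and pitch (estimated from the autocorrelation peak). It is not the project's extract.py. The function names, the 65–400 Hz search range, and the 16 kHz test tone are assumptions chosen for illustration. MFCCs and tempo are left out because they are usually computed with an audio library such as librosa.

```python
import numpy as np


def rms_energy(y):
    """Root-mean-square amplitude of the signal."""
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean(y ** 2)))


def autocorr_pitch(y, sr, fmin=65.0, fmax=400.0):
    """Estimate the fundamental frequency in Hz from the autocorrelation peak.

    fmin/fmax bound the search range (assumed values covering typical speech).
    Returns 0.0 for a silent signal.
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)
    if n == 0 or not np.any(y):
        return 0.0
    # Linear autocorrelation via FFT; zero-padding to 2n avoids wrap-around.
    spectrum = np.fft.rfft(y, n=2 * n)
    ac = np.fft.irfft(spectrum * np.conj(spectrum), n=2 * n)[:n]
    min_lag = max(1, int(sr / fmax))
    max_lag = min(int(sr / fmin), n - 1)
    if max_lag <= min_lag:
        return 0.0
    # The strongest peak in the allowed lag range gives the period in samples.
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return float(sr / lag)


# Synthetic stand-in for one audio clip: a 1-second, 220 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)

features = {"energy": rms_energy(tone), "pitch": autocorr_pitch(tone, sr)}
print(features)
```

Real speech is noisier than a pure tone, so a frame-wise estimate summarized by its median is usually more robust than a single whole-clip estimate like this one.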

Step 3: Run the notebook

Open and run the cells in demo-notebook.ipynb to:

- Perform KMeans, DBSCAN, and Agglomerative clustering
- Analyze cluster patterns
- Visualize results
- Play example audio clips from each cluster
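The three clustering methods named above can be sketched with scikit-learn as follows. This is an illustrative outline, not the notebook's code. The helper `cluster_features`, the choice of three clusters, and the DBSCAN parameters are assumptions, and synthetic blobs stand in for backward_features.csv because its column layout is not documented here.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


def cluster_features(X, n_clusters=3, eps=0.5, min_samples=5, seed=0):
    """Standardize the features, then label them with three clustering methods.

    n_clusters, eps, and min_samples are placeholder values and need tuning.
    """
    # Scale first so features with large ranges do not dominate the distances.
    X_scaled = StandardScaler().fit_transform(X)
    return {
        "kmeans": KMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed
        ).fit_predict(X_scaled),
        # DBSCAN picks the number of clusters itself; label -1 marks noise.
        "dbscan": DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled),
        "agglomerative": AgglomerativeClustering(
            n_clusters=n_clusters
        ).fit_predict(X_scaled),
    }


# Synthetic stand-in for the 200-row feature table.
X_demo, _ = make_blobs(n_samples=200, centers=3, n_features=5, random_state=0)
labels = cluster_features(X_demo)

for name, assigned in labels.items():
    n_found = len(set(assigned.tolist()) - {-1})
    print(f"{name}: {n_found} clusters")
```

KMeans and Agglomerative clustering need the number of clusters up front, while DBSCAN infers it from density. On real features, DBSCAN's `eps` is sensitive to scaling and dimensionality and is worth tuning before comparing its results with the other two.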

lkhellah/speech-clustering
