Commit d971120

Add pre-trained models
1 parent 3883e0e commit d971120

2 files changed: 45 additions, 0 deletions

paper/paper.bib

Lines changed: 7 additions & 0 deletions
@@ -82,3 +82,10 @@ @inbook{packt
 isbn = {9781789133967},
 }
 
+@misc{tzanetakis_essl_cook_2001,
+author = "Tzanetakis, George and Essl, Georg and Cook, Perry",
+title = "Automatic Musical Genre Classification Of Audio Signals",
+url = "http://ismir2001.ismir.net/pdf/tzanetakis.pdf",
+publisher = "The International Society for Music Information Retrieval",
+year = "2001"
+}

paper/paper.md

Lines changed: 38 additions & 0 deletions
@@ -49,6 +49,8 @@ This software aims to provide machine learning engineers, data scientists, resea
 
 # Statement of need
 
+The motivation behind this software is the popularity of Python for Machine Learning and the need for solutions that compute complex audio features in Python. This implies a need not only for resources that guide audio processing solutions, but also for Python guides and implementations that solve audio and speech classification tasks. The classifier implementation examples that are part of this software and the README aim to give users a sample solution to audio classification problems and help build the foundation to tackle new and unseen problems.
+
 PyAudioProcessing is a Python-based library for processing audio data into features and building Machine Learning models. Audio processing and feature extraction research is popular in MATLAB. There are comparatively fewer resources for audio processing and classification in Python. This tool contains implementations of popular and varied audio feature extraction methods that can be used in combination with most scikit-learn classifiers. Classifiers can be trained, tested and applied to audio data using pyAudioProcessing. The output consists of cross-validation scores and results of testing on custom audio files.
 
 The library lets the user extract aggregated data features calculated per audio file. Unique feature extractions such as Mel Frequency Cepstral Coefficients (MFCC) [@6921394], Gammatone Frequency Cepstral Coefficients (GFCC) [@inbook], spectral coefficients, chroma features and others are available to extract and use in combination with different backend classifiers. While MFCC features find use in most commonly encountered audio processing tasks such as audio type classification and speech classification, GFCC features have been found to have application in speaker identification or speaker diarization. Many such applications, comparisons and uses can be found in this IEEE paper [@6639061]. All these features are also helpful for a variety of other audio classification tasks.
@@ -59,10 +61,46 @@ PyAudioProcessing adds multiple additional features. The library includes the im
 
 The library further provides some pre-built audio classification models such as the `speechVSmusic` and `speechVSmusicVSbirds` sound classifiers and the `music genre` classifier to give users a baseline of pre-trained models for common audio classification tasks. The user can use the library to build custom classifiers with the help of the instructions in the README.
 
+An additional function lets users convert their audio files to "wav" format so that the files are compatible with the library's analysis and feature extraction.
+
 The use of this software in the community today motivates its continued growth. It is referenced in a textbook titled `Artificial Intelligence with Python Cookbook` published by Packt Publishing in October 2020 [@packt]. Additionally, pyAudioProcessing is part of a specific admissions requirement for a funded PhD project at University of Portsmouth <sup id="portsmouth">[1](#footnote_portsmouth)</sup>. It is further referenced in a thesis titled "Master Thesis AI Methodologies for Processing Acoustic Signals AI Usage for Processing Acoustic Signals" [@phdthesis].
 
 <b id="footnote_portsmouth">1</b> https://www.port.ac.uk/study/postgraduate-research/research-degrees/phd/explore-our-projects/detection-of-emotional-states-from-speech-and-text [](#portsmouth)
 
+
+# Pre-trained models
+
+This software offers pre-trained models. This is an evolving feature as new datasets and classification problems gain prominence in research. The pre-trained models include the following.
+
+1. Audio type classifier to determine speech versus music: Trained SVM classifier for classifying audio into two possible classes - music, speech. This classifier was trained using MFCC, spectral and chroma features. The cross-validation confusion matrix is as follows.
+
+| music | speech
+music | 48.80 | 1.20
+speech | 0.60 | 49.40
+
+2. Audio type classifier to determine speech versus music versus bird sounds: Trained SVM classifier that classifies audio into three possible classes - music, speech and birds. This classifier was trained using MFCC, spectral and chroma features.
+
+| music | speech | birds
+music | 31.53 | 0.73 | 1.07
+speech | 1.00 | 32.33 | 0.00
+birds | 0.00 | 0.00 | 33.33
+
+3. Music genre classifier using the GTZAN [@tzanetakis_essl_cook_2001] dataset: Trained SVM classifier using GFCC, MFCC, spectral and chroma features to classify music into 10 genre classes - blues, classical, country, disco, hiphop, jazz, metal, pop, reggae, rock.
+
+| pop | met | dis | blu | reg | cla | rock | hip | cou | jazz
+pop | 7.25 | 0.00 | 0.74 | 0.38 | 0.09 | 0.09 | 0.33 | 0.60 | 0.50 | 0.04
+met | 0.03 | 8.74 | 0.66 | 0.09 | 0.00 | 0.00 | 0.45 | 0.00 | 0.04 | 0.00
+dis | 0.69 | 0.08 | 6.29 | 0.00 | 0.74 | 0.11 | 0.90 | 0.51 | 0.69 | 0.00
+blu | 0.00 | 0.20 | 0.00 | 8.31 | 0.25 | 0.08 | 0.44 | 0.09 | 0.30 | 0.34
+reg | 0.11 | 0.00 | 0.26 | 0.58 | 7.99 | 0.00 | 0.28 | 0.59 | 0.09 | 0.11
+cla | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.07 | 0.23 | 0.00 | 0.23 | 0.48
+rock | 0.14 | 0.90 | 1.10 | 0.80 | 0.35 | 0.29 | 5.31 | 0.01 | 1.09 | 0.01
+hip | 0.71 | 0.14 | 0.56 | 0.18 | 1.96 | 0.00 | 0.19 | 6.10 | 0.03 | 0.14
+cou | 0.25 | 0.15 | 0.84 | 0.64 | 0.08 | 0.10 | 1.87 | 0.00 | 5.84 | 0.24
+jazz | 0.04 | 0.01 | 0.13 | 0.41 | 0.00 | 0.76 | 0.31 | 0.00 | 0.53 | 7.81
+
+These baseline models aim to demonstrate the capability of audio feature generation algorithms to extract meaningful numeric patterns from audio data. One can train custom classifiers using similar features and a different machine learning backend to research and explore improvements.
+
 # Audio features
 
 Information about getting started with audio processing is described in [@opensource].
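
A note on reading the confusion matrices added in this commit: the cells of each matrix sum to roughly 100, which suggests they are percentages of all cross-validation samples. The plain-Python sketch below derives overall accuracy and per-class recall from the two smaller matrices under that reading. It assumes that rows are true classes and columns are predicted classes, which the diff does not state. The helper names `overall_accuracy` and `per_class_recall` are illustrative and are not part of pyAudioProcessing.

```python
# Confusion matrices copied from the "Pre-trained models" section of the diff.
# Assumptions (not stated in the commit): rows are true classes, columns are
# predicted classes, and each cell is a percentage of all samples.
SPEECH_VS_MUSIC = {
    "labels": ["music", "speech"],
    "matrix": [
        [48.80, 1.20],
        [0.60, 49.40],
    ],
}

SPEECH_VS_MUSIC_VS_BIRDS = {
    "labels": ["music", "speech", "birds"],
    "matrix": [
        [31.53, 0.73, 1.07],
        [1.00, 32.33, 0.00],
        [0.00, 0.00, 33.33],
    ],
}


def overall_accuracy(matrix):
    """Return the share of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(labels, matrix):
    """Return diagonal cell divided by row sum for each class."""
    return {
        label: matrix[i][i] / sum(matrix[i])
        for i, label in enumerate(labels)
    }


for name, cm in [
    ("speech vs music", SPEECH_VS_MUSIC),
    ("speech vs music vs birds", SPEECH_VS_MUSIC_VS_BIRDS),
]:
    accuracy = overall_accuracy(cm["matrix"])
    recall = per_class_recall(cm["labels"], cm["matrix"])
    print(f"{name}: accuracy={accuracy:.3f}")
    for label, value in recall.items():
        print(f"  recall[{label}]={value:.3f}")
```

Under these assumptions the speech versus music model reaches about 98.2% overall accuracy, with recall of about 97.6% for music and 98.8% for speech. If the rows were instead predicted classes, the same per-row ratios would be precision rather than recall.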
