Commit 6a48aff

Merge branch 'master' into develop
2 parents 8bcc8ca + 6adf006 commit 6a48aff

77 files changed, with 3287 additions and 330 deletions

Makefile

Lines changed: 4 additions & 0 deletions
@@ -51,3 +51,7 @@ docker_latest_release:

 	docker tag botium/botium-speech-dictate:$(VERSION) botium/botium-speech-dictate:latest
 	docker push botium/botium-speech-dictate:latest
+
+develop: docker_build_develop docker_publish_develop
+
+release: docker_build_release docker_publish_release docker_latest_release

README.md

Lines changed: 54 additions & 10 deletions
@@ -26,20 +26,20 @@ Some examples what you can do with this:
 * Build voice-enabled chatbot services (for example, IVR systems)
 * see the [Rasa Custom Voice Channel](./connectors/rasa)
 * Classification of audio file transcriptions
-* [Automated Testing](https://chatbotslife.com/testing-alexa-skills-with-avs-mocha-and-botium-f6c22549f66e) of Voice services with [Botium](https://medium.com/@floriantreml/botium-in-a-nutshell-part-1-overview-f8d0ceaf8fb4)
+* [Automated Testing](https://wiki.botiumbox.com/how-to-guides/voice-app-testing/) of Voice services with [Botium](https://botium.ai)

 ## Installation

 ### Software and Hardware Requirements

-* 8GB of RAM (accessible for Docker) and 40GB free HD space
+* 8GB of RAM (accessible for Docker) and 40GB free HD space (for full installation)
 * Internet connectivity
 * [docker](https://docs.docker.com/)
 * [docker-compose](https://docs.docker.com/compose/)

-_Note: memory usage can be reduced if only one language is required - default configuration comes with two languages._
+_Note: memory usage can be reduced if only one language for Kaldi is required - default configuration comes with two languages._

-### Use Prebuilt Docker Images
+### Full Installation (Prebuilt Docker Images)

 Clone or download this repository and start with docker-compose:

@@ -51,6 +51,12 @@ This will download the latest released prebuilt images from Dockerhub. To downlo

 Point your browser to http://127.0.0.1 to open the [Swagger UI](https://swagger.io/tools/swagger-ui/) and browse/use the API definition.

+### Slim Cloud-Specific Installation (Prebuilt Docker Images)
+
+For the major cloud providers there are additional docker-compose files. If using those, the installation is more slim, as there is only the *frontend*-service required. For instance, add your Azure subscription key and Azure region key to the file *docker-compose-azure.yml* and start the services:
+
+> docker-compose -f docker-compose-azure.yml up -d
+
 ### Optional: Build Docker Images

 You can optionally built your own docker images (if you made any changes in this repository, for instance to download the latest version of a model). Clone or download this repository and run docker-compose:
@@ -74,6 +80,15 @@ Configuration changes with [environment variables](./frontend/resources/.env). S

 **Recommendation:** Do not change the _.env_ file but create a _.env.local_ file to overwrite the default settings. This will prevent troubles on future _git pull_

+### Request-Specific Configuration
+
+If there is a JSON-formatted request body, or a multipart request body, certain sections are considered:
+
+* **credentials** to override the server default credentials for cloud services
+* **config** to override the server default settings for the cloud API calls
+
+*See samples below*
+
 ### Securing the API

 The environment variable _BOTIUM_API_TOKENS_ contains a list of valid API Tokens accepted by the server (separated by whitespace or comma). The HTTP Header _BOTIUM_API_TOKEN_ is validated on each call to the API.
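
The token rule in the context line above (a list separated by whitespace or comma, checked against a request header) could be sketched roughly as follows. This is not the server's code: `parseApiTokens` and `isTokenAccepted` are hypothetical names, and rejecting every call when the list is empty is an assumption, since the README text does not say how an empty list is treated.

```javascript
// Sketch only: parseApiTokens and isTokenAccepted are hypothetical helpers,
// not functions taken from the repository.

// Split a BOTIUM_API_TOKENS-style value on whitespace and/or commas.
function parseApiTokens (raw) {
  return (raw || '').split(/[\s,]+/).filter(token => token.length > 0)
}

// Assumption: a missing header or an empty token list means "not accepted".
function isTokenAccepted (acceptedTokens, headerValue) {
  return typeof headerValue === 'string' && acceptedTokens.includes(headerValue)
}

const accepted = parseApiTokens('token-a, token-b\ttoken-c')
console.log(accepted)
console.log(isTokenAccepted(accepted, 'token-b'))
```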
@@ -96,14 +111,12 @@ _Attention: in Google Chrome this only works with services published as HTTPS, y
 Point your browser to http://127.0.0.1/tts to open a MaryTTS interface for testing speech synthesis.

 ### Real Time API
-_Available for Kaldi only_
-
-There are Websocket endpoints exposed for real-time audio decoding. Find the API description in the [Kaldi GStreamer Server documentation](https://github.com/alumae/kaldi-gstreamer-server#websocket-based-client-server-protocol).

-The Websocket endpoints are:
+It is possible to stream audio from real-time audio decoding: Call the **/api/sttstream/{language}** endpoint to open a websocket stream, it will return three urls:

-* English: ws://127.0.0.1/stt-en/client/ws/speech
-* German: ws://127.0.0.1/stt-de/client/ws/speech
+* wsUri - the Websocket uri to stream your audio to. By default, it accepts wav-formatted audio-chunks
+* statusUri - check if the stream is still open
+* endUri - end audio streaming and close websocket

 ## File System Watcher
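
The three URLs introduced in this hunk suggest a simple lifecycle: request the stream, send audio to `wsUri`, optionally poll `statusUri`, then call `endUri`. The sketch below walks through that order with injected transport functions so it runs without a server. `streamOnce`, `httpGet`, `connect` and the `example.invalid` URLs are all hypothetical, and the shape of the status response is not described in the diff, so it is passed through untouched.

```javascript
// Sketch only: streamOnce and the injected httpGet/connect functions are
// hypothetical; a real client would back them with axios and ws.
async function streamOnce ({ httpGet, connect }, sttStreamUrl, audioChunks) {
  // The endpoint is described as returning wsUri, statusUri and endUri.
  const { wsUri, statusUri, endUri } = await httpGet(sttStreamUrl)
  const socket = await connect(wsUri)
  for (const chunk of audioChunks) socket.send(chunk)
  const status = await httpGet(statusUri) // check whether the stream is still open
  await httpGet(endUri) // end audio streaming
  socket.close()
  return status
}

// In-memory stand-ins that only record the order of operations.
const calls = []
const fakeTransport = {
  httpGet: async (url) => {
    calls.push(`GET ${url}`)
    return {
      wsUri: 'ws://example.invalid/ws',
      statusUri: 'http://example.invalid/status',
      endUri: 'http://example.invalid/end'
    }
  },
  connect: async (uri) => {
    calls.push(`CONNECT ${uri}`)
    return {
      send: (chunk) => calls.push(`SEND ${chunk.length}`),
      close: () => calls.push('CLOSE')
    }
  }
}

const demo = streamOnce(fakeTransport, 'http://example.invalid/api/sttstream/en', [Buffer.from('abc')])
demo.then(() => console.log(calls.join('\n')))
```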

@@ -125,6 +138,18 @@ See [swagger.json](./frontend/src/swagger.json):

 > curl -X POST "http://127.0.0.1/api/stt/en" -H "Content-Type: audio/wav" -T sample.wav

+* HTTP POST to **/api/stt/{language}** for Speech-To-Text with Google, including credentials
+
+> curl -X POST "http://127.0.0.1/api/stt/en-US?stt=google" -F "google={\"credentials\": {\"private_key\": \"xxx\", \"client_email\": \"xxx\"}}" -F content=@sample.wav
+
+* HTTP POST to **/api/stt/{language}** for Speech-To-Text with Google, including switch to MP3 encoding
+
+> curl -X POST "http://127.0.0.1/api/stt/en-US?stt=google" -F "google={\"config\": {\"encoding\": \"MP3\"}}" -F content=@sample.mp3
+
+* HTTP POST to **/api/stt/{language}** for Speech-To-Text with IBM, including credentials
+
+> curl -X POST "http://127.0.0.1/api/stt/en-US?stt=ibm" -F "google={\"credentials\": {\"apikey\": \"xxx\", \"serviceUrl\": \"xxx\"}}" -F content=@sample.wav
+
 * HTTP GET to **/api/tts/{language}?text=...** for Text-To-Speech

 > curl -X GET "http://127.0.0.1/api/tts/en?text=hello%20world" -o tts.wav
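
The curl samples added in this hunk pair a `stt=` query parameter with a multipart field holding JSON `credentials` or `config`. A rough sketch of assembling those two pieces programmatically is shown below. `buildSttRequest` is a hypothetical helper, and naming the form field after the provider is an assumption taken from the Google samples; the IBM sample in this hunk uses `google` as the field name, so the expected name should be checked against the server.

```javascript
// Sketch only: buildSttRequest is a hypothetical helper, not project code.
// Assumption: the multipart field carrying overrides is named after the provider.
function buildSttRequest ({ baseUrl, language, provider, credentials, config }) {
  const url = new URL(`/api/stt/${encodeURIComponent(language)}`, baseUrl)
  url.searchParams.set('stt', provider)

  // Only include the sections the caller actually wants to override.
  const overrides = {}
  if (credentials) overrides.credentials = credentials
  if (config) overrides.config = config

  return {
    url: url.toString(),
    fieldName: provider,
    fieldValue: JSON.stringify(overrides)
  }
}

const request = buildSttRequest({
  baseUrl: 'http://127.0.0.1',
  language: 'en-US',
  provider: 'google',
  config: { encoding: 'MP3' }
})
console.log(request)
```

The returned `fieldName` and `fieldValue` would go into one multipart field, next to the audio file in the `content` field used by the samples.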
@@ -155,6 +180,25 @@ This project is standing on the shoulders of giants.

 ## Changelog

+### 2022-03-06
+* Voice effects to consider audio file length
+
+### 2022-02-28
+
+* Applied Security Best Practices (not run as root user)
+
+### 2022-01-12
+
+* Added support for Azure Speech Services
+
+### 2021-12-07
+
+* Added endpoints for streaming audio and responses
+
+### 2021-12-01
+
+* Added option to hand over cloud credentials in request body
+
 ### 2021-01-26

 * Added several profiles for adding noise or other audio artifacts to your files

connectors/sapcai/server/index.js

Lines changed: 3 additions & 2 deletions
@@ -6,7 +6,7 @@ const app = require('express')()
 const http = require('http').Server(app)
 const io = require('socket.io')(http)

-const SAPCAI_TOKEN = '5ee1b84709db76f5bbff8ea14dc9ad85'
+const SAPCAI_TOKEN = process.env.SAPCAI_TOKEN

 app.use(cors())

@@ -94,7 +94,8 @@ io.on('connection', (socket) => {
   method: 'GET',
   url: 'https://speech.botiumbox.com/api/tts/en',
   params: {
-    text: message.content
+    text: message.content,
+    voice: 'dfki-poppy-hsmm'
   },
   responseType: 'arraybuffer'
 }
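
With the token now read from `process.env.SAPCAI_TOKEN`, an unset variable leaves the constant `undefined`. One possible way to fail early is sketched below; `requireEnv` is a hypothetical helper that does not appear in this commit, and the token value in the usage line is a placeholder.

```javascript
// Sketch only: requireEnv is a hypothetical helper, not part of this commit.
// The env parameter defaults to process.env but can be swapped for testing.
function requireEnv (name, env = process.env) {
  const value = env[name]
  if (typeof value !== 'string' || value.trim() === '') {
    throw new Error(`Missing required environment variable ${name}`)
  }
  return value
}

// Usage with an explicit environment object (placeholder value).
console.log(requireEnv('SAPCAI_TOKEN', { SAPCAI_TOKEN: 'example-token' }))
```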

connectors/ws/.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+node_modules
+package-lock.json
+test.js

connectors/ws/file.js

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+const fs = require('fs')
+const _ = require('lodash')
+const axios = require('axios').default
+const { WebSocket } = require('ws')
+
+const sampleBuffer = fs.readFileSync('sample.raw')
+const playCount = 1
+const showInterim = false
+
+const main = async () => {
+
+  const { data } = await axios.get('http://localhost:56000/api/sttstream/en?stt=kaldi')
+  const ws = new WebSocket(data.wsUri)
+
+  ws.on('open', () => {
+    for (let i = 0; i < playCount; i++) {
+      setTimeout(() => _.chunk(sampleBuffer, 10000).forEach(c => ws.send(Buffer.from(c))), i * 1000)
+    }
+    setTimeout(() => axios.get(data.endUri), 3000 + playCount * 1000)
+    setTimeout(() => ws.close(), 5000 + playCount * 1000)
+  })
+
+  ws.on('message', (data) => {
+    try {
+      const dj = JSON.parse(data)
+      if (showInterim || dj.final) console.log('received: %s', dj.text)
+    } catch (err) {
+    }
+  })
+}
+main()
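
The sample above splits the audio with `_.chunk`, which produces plain arrays of byte values that are then copied back into Buffers. An alternative that slices the Buffer directly is sketched below. `chunkBuffer` is a hypothetical helper and is not what the commit uses; the 10000-byte size simply mirrors the sample.

```javascript
// Sketch only: chunkBuffer is a hypothetical alternative to _.chunk + Buffer.from.
// Buffer.subarray returns views on the same memory, so no bytes are copied.
function chunkBuffer (buffer, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer')
  }
  const chunks = []
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    chunks.push(buffer.subarray(offset, offset + chunkSize))
  }
  return chunks
}

const chunks = chunkBuffer(Buffer.alloc(25000), 10000)
console.log(chunks.map(c => c.length))
```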

connectors/ws/package.json

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+{
+  "name": "ws-sample",
+  "version": "1.0.0",
+  "scripts": {
+    "file": "node file.js",
+    "record": "node record.js"
+  },
+  "dependencies": {
+    "axios": "^0.24.0",
+    "lodash": "^4.17.21",
+    "node-record-lpcm16": "^1.0.1",
+    "ws": "^8.3.0"
+  }
+}

connectors/ws/record.js

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+const recorder = require('node-record-lpcm16')
+const axios = require('axios').default
+const { WebSocket } = require('ws')
+
+const main = async () => {
+
+  const { data } = await axios.get('http://localhost:56000/api/sttstream/en?stt=kaldi')
+  const ws = new WebSocket(data.wsUri)
+
+  ws.on('open', () => {
+    recorder
+      .record({
+        sampleRateHertz: 16000,
+        threshold: 0, //silence threshold
+        recordProgram: 'rec', // Try also "arecord" or "sox"
+        silence: '5.0', //seconds of silence before ending
+      })
+      .stream()
+      .on('error', console.error)
+      .on('data', (data) => ws.send(data))
+  })
+
+  ws.on('message', (data) => {
+    console.log('received: %s', data);
+  })
+}
+main()

connectors/ws/sample.raw

344 KB
Binary file not shown.

connectors/ws/sample.wav

345 KB
Binary file not shown.

connectors/ws/simple.js

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+const fs = require('fs')
+const _ = require('lodash')
+const axios = require('axios').default
+const { WebSocket } = require('ws')
+
+const sampleBuffer = fs.readFileSync('sample.wav')
+
+const main = async () => {
+
+  const { data } = await axios.get('http://localhost:56000/api/sttstream/en-US?stt=google')
+  const ws = new WebSocket(data.wsUri)
+
+  ws.on('open', () => {
+    ws.send(sampleBuffer)
+    setTimeout(() => axios.get(data.endUri), 3000)
+    setTimeout(() => ws.close(), 5000)
+  })
+
+  ws.on('message', (data) => {
+    try {
+      const dj = JSON.parse(data)
+      if (dj.final) console.log('received %s-%s: %s ', dj.start, dj.end, dj.text)
+    } catch (err) {
+    }
+  })
+}
+main().catch(err => console.error(err.message))
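
The message handlers in these samples parse each websocket frame as JSON and silently ignore anything that fails to parse. The same handling can be pulled into a small pure function, sketched below. `parseTranscriptMessage` is a hypothetical name; the fields `final`, `start`, `end` and `text` are the ones the samples read, and nothing further is assumed about the payload.

```javascript
// Sketch only: parseTranscriptMessage is a hypothetical helper.
// It returns null for frames that are not JSON objects instead of throwing.
function parseTranscriptMessage (raw) {
  let message
  try {
    message = JSON.parse(raw.toString())
  } catch (err) {
    return null // non-JSON frame, ignored just as the samples do
  }
  if (!message || typeof message !== 'object') return null
  return {
    final: Boolean(message.final),
    start: message.start,
    end: message.end,
    text: message.text
  }
}

const parsed = parseTranscriptMessage(Buffer.from('{"final":true,"start":0,"end":1.5,"text":"hello"}'))
console.log(parsed)
```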
