Technology Blogs by Members
Voice platforms like Alexa and Google Assistant make it easy to provide a custom voice experience to your clients, even without going deep into audio processing: everything is part of the platform. But what if you have already invested considerable effort into building a chatbot on SAP Conversational AI? You certainly don't want to switch to a completely new platform now.


This tutorial is part of the SAP Conversational AI Tutorial Challenge 2021. Its goal is to show how you can build your own voice platform using SAP Conversational AI and the Open Source tool Botium Speech Processing.

After completing this tutorial, you will have a working sample voice interface for your chatbot, which you can use as a starting point for your own custom implementation.

Botium Speech Processing is a unified, developer-friendly API to the best available free and Open Source Speech-To-Text and Text-To-Speech services. Before combining it with SAP Conversational AI, let’s take a quick look at the architecture:

  1. User speaks into a microphone

  2. A Speech-To-Text service transcribes the audio into text (Botium Speech Processing)

  3. A chatbot platform extracts information from the text and builds a text response (SAP Conversational AI)

  4. A Text-To-Speech service converts the text response into speech (Botium Speech Processing)

  5. User listens to the audio file
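The five steps form a simple request/response pipeline. The following is a minimal sketch of one conversational turn, assuming three hypothetical stage functions (`speechToText`, `askChatbot`, `textToSpeech`) that in a real setup would wrap Botium Speech Processing and SAP Conversational AI; the stubs exist only so the flow can run without any services.

```javascript
// One voice turn through the pipeline: audio in, transcribed text,
// chatbot responses, synthesized audio out.
// The stage functions are hypothetical placeholders: in a real setup,
// speechToText and textToSpeech would call Botium Speech Processing and
// askChatbot would call SAP Conversational AI.
async function handleVoiceTurn(audio, { speechToText, askChatbot, textToSpeech }) {
  // Step 2: transcribe the recorded audio into text
  const userText = await speechToText(audio)
  // Step 3: let the chatbot build its text responses (may be several messages)
  const botTexts = await askChatbot(userText)
  // Step 4: synthesize one audio clip per text response
  const replies = []
  for (const text of botTexts) {
    replies.push({ text, audio: await textToSpeech(text) })
  }
  // Step 5 happens on the client, which plays back each reply's audio
  return { userText, replies }
}

// Stub stages so the flow can be followed without any running service
const stubs = {
  speechToText: async (audio) => `heard ${audio.length} bytes`,
  askChatbot: async (text) => [`You said: ${text}`],
  textToSpeech: async (text) => Buffer.from(text, 'utf8')
}

handleVoiceTurn(Buffer.alloc(4), stubs).then((result) => {
  console.log(result.replies[0].text) // You said: heard 4 bytes
})
```

Passing the stages in as parameters keeps the conversation flow independent of the concrete Speech-To-Text and Text-To-Speech engines, so either side can be swapped without touching the rest.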

Now for the fun part.


Here is what you need to have available on your workstation:

Launch Botium Speech Processing Service

Botium Speech Processing comes with a reasonable default configuration for a voice platform.

Both are free and Open Source, which makes them a good match for getting started with voice technologies, and they are among the best free voice tools available.

You can launch it with a few commands:

$ git clone
$ cd botium-speech-processing
$ docker-compose up -d

Depending on your network speed and hardware, this step can take a while.

Pointing your browser to http://localhost will show the API explorer for Botium Speech Processing.

Botium Speech Processing API Explorer
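Because the first start can take a while, a script that depends on the service may want to wait until http://localhost responds. A minimal sketch, assuming Node.js 18 or later for the built-in `fetch`; `waitForSpeechService`, the retry count, and the delay are illustrative choices, not part of Botium Speech Processing.

```javascript
// Poll the Botium Speech Processing root URL until it answers with a 2xx status.
// Assumes Node.js 18+ (global fetch); retries and delayMs are arbitrary example values.
// fetchImpl is injectable so the function can be tested without a running server.
async function waitForSpeechService(baseUrl = 'http://localhost', { retries = 30, delayMs = 2000, fetchImpl = globalThis.fetch } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetchImpl(baseUrl)
      if (res.ok) return true
    } catch (err) {
      // connection refused while the containers are still starting
    }
    if (attempt < retries) {
      // wait before the next attempt
      await new Promise((resolve) => setTimeout(resolve, delayMs))
    }
  }
  return false
}
```

Call it as `await waitForSpeechService()` before sending the first request; it resolves to `false` if the service is still unreachable after all attempts.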


Add Voice Capabilities to SAP Conversational AI

The Botium Speech Processing GitHub repository also includes sample webservice code that adds Speech-To-Text and Text-To-Speech capabilities to SAP Conversational AI.

First, clone the repository (if you have not done so already) and install the prerequisites:

$ git clone
$ cd botium-speech-processing/connectors/sapcai/server
$ npm install

Now launch the webservice with another command, replacing my-sap-cai-token with your bot token:

$ SAPCAI_TOKEN=my-sap-cai-token npm start
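The webservice picks up the token from the `SAPCAI_TOKEN` environment variable and sends it to SAP Conversational AI as an `Authorization: Token …` header. If you write your own connector, failing fast when the variable is missing avoids a confusing authentication error later. A minimal sketch; `sapCaiAuthHeaders` is a hypothetical helper, not part of the repository.

```javascript
// Build the Authorization header from the SAPCAI_TOKEN environment variable,
// in the same "Token <token>" form the webservice code uses.
// Throws early if the variable was not set on the command line.
function sapCaiAuthHeaders(env = process.env) {
  const token = env.SAPCAI_TOKEN
  if (!token) {
    throw new Error('SAPCAI_TOKEN is not set, e.g. run: SAPCAI_TOKEN=my-sap-cai-token npm start')
  }
  return { Authorization: `Token ${token}` }
}

console.log(sapCaiAuthHeaders({ SAPCAI_TOKEN: 'my-sap-cai-token' }))
// { Authorization: 'Token my-sap-cai-token' }
```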

Point your browser to http://localhost:5005 to open a minimal text-only chat interface and check whether the connection to your SAP Conversational AI bot is working.

Simple Text Interface

A simple web-based voice interface, botium-voice-interface, is also available. You can launch it with:
$ git clone
$ cd botium-voice-interface
$ npm install
$ npm run serve

Point your browser to http://localhost:8080. Now it is time to turn on your microphone and speakers and chat with your SAP Conversational AI chatbot!

Simple Voice Interface


This tutorial showed how to add basic voice capabilities to your SAP Conversational AI chatbot, a foundation you can build on to provide a voice experience to your clients.

Thanks for reading; I hope you enjoyed this tutorial. Feel free to ask any questions in the comments!

Appendix: Code Walkthrough

For those who are interested, here is the relevant portion of the webservice code.

  1. If audio data is received, extract it from the webservice request

  2. Convert the audio data to a canonical audio format, in this case mono-channel WAV audio

  3. Apply Speech-To-Text to extract the spoken text from the audio

  4. Send the text to SAP Conversational AI to get the response text

  5. Apply Text-To-Speech to generate audio from the response text

  6. Attach the audio data to the webservice response

  socket.on('user_uttered', async (msg) => {
    if (msg && msg.message) {
      let textInput = msg.message

      // 1. Audio arrives as a data URL: strip the "data:...;base64," prefix and decode
      if (msg.message.startsWith('data:')) {
        const base64Data = msg.message.substring(msg.message.indexOf(',') + 1)
        const audioData = Buffer.from(base64Data, 'base64')

        // 2. Convert the recording to mono-channel WAV
        const wavToMonoWavRequestOptions = {
          method: 'POST',
          url: '',
          data: audioData,
          headers: {
            'content-type': 'audio/wav'
          },
          responseType: 'arraybuffer'
        }
        const wavToMonoWavResponse = await axios(wavToMonoWavRequestOptions)

        // 3. Speech-To-Text: transcribe the converted audio
        const sttRequestOptions = {
          method: 'POST',
          url: '',
          headers: {
            'content-type': 'audio/wav'
          },
          responseType: 'json'
        }
        const sttResponse = await axios(sttRequestOptions)

        textInput =
      }

      // 4. Send the (transcribed or typed) text to SAP Conversational AI
      const requestOptions = {
        method: 'POST',
        url: '',
        headers: {
          Authorization: `Token ${SAPCAI_TOKEN}`
        },
        data: {
          message: {
            type: 'text',
            content: textInput
          },
          // reuse the client's session ID, otherwise start a new conversation
          conversation_id: msg.session_id || nanoid()
        }
      }
      try {
        const response = await axios(requestOptions)
        for (const message of => t.type === 'text')) {
          const botUttered = {
            text: message.content
          }

          // 5. Text-To-Speech for each text message of the bot
          const ttsRequestOptions = {
            method: 'GET',
            url: '',
            params: {
              text: message.content,
              voice: 'dfki-poppy-hsmm'
            },
            responseType: 'arraybuffer'
          }
          const ttsResponse = await axios(ttsRequestOptions)
          // 6. Attach the synthesized audio as a base64 data URL
          = 'data:audio/wav;base64,' + Buffer.from(, 'binary').toString('base64')

          socket.emit('bot_uttered', botUttered)
        }
      } catch (err) {
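The data-URL handling in steps 1 and 6, the request body of step 4, and the mono-channel requirement of step 2 can also be tried in isolation. A minimal sketch of hypothetical helpers (none of these names come from the repository): `textMessages` takes the bot's message array as its argument and applies the same `type === 'text'` filter as the loop in the webservice code, and the WAV check assumes a canonical header in which the `fmt ` chunk directly follows the RIFF header, which is common but not guaranteed.

```javascript
// Step 1: 'data:audio/wav;base64,...' -> Buffer with the raw audio bytes
// Returns null for plain text messages, which need no Speech-To-Text.
function decodeAudioDataUrl(message) {
  if (!message.startsWith('data:')) return null
  const base64Data = message.substring(message.indexOf(',') + 1)
  return Buffer.from(base64Data, 'base64')
}

// Step 2 context: read the channel count from a canonical WAV header
// (NumChannels is a little-endian 16-bit value at byte offset 22).
function wavChannelCount(wav) {
  if (wav.length < 24 || wav.toString('ascii', 0, 4) !== 'RIFF' || wav.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file')
  }
  if (wav.toString('ascii', 12, 16) !== 'fmt ') {
    throw new Error('fmt chunk is not at the canonical position')
  }
  return wav.readUInt16LE(22)
}

// Step 4: request body for SAP Conversational AI, mirroring the fields in the webservice code
function buildDialogBody(textInput, conversationId) {
  return {
    message: { type: 'text', content: textInput },
    conversation_id: conversationId
  }
}

// Keep only the text messages of the bot reply
function textMessages(messages) {
  return messages.filter((t) => t.type === 'text')
}

// Step 6: raw WAV bytes -> data URL that a browser audio element can play
function encodeWavDataUrl(audioBytes) {
  return 'data:audio/wav;base64,' + Buffer.from(audioBytes).toString('base64')
}

// Round trip: encode some bytes as a data URL and decode them again
console.log(decodeAudioDataUrl(encodeWavDataUrl(Buffer.from('RIFF'))).toString()) // RIFF
```

A `wavChannelCount` result other than 1 is exactly the case the conversion in step 2 takes care of before the audio reaches Speech-To-Text.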