Add Siri-like Speech-to-Text to Your Chatbot

Dan_Wroblewski · ‎04-20-2022

In my blog post 2 weeks ago, I talked about how to use the SAP Conversational AI speech-to-text features with IBM's speech-to-text service to add speech to text to your chatbot. Now I want to show another fun way to implement speech to text, this time with a 3rd-party JavaScript library that mimics the way Siri and Alexa work.

The speech-to-text documentation is available on GitHub (it has useful information, including on other Web Client APIs).

Siri-like chatbot

For the Siri-like chatbot, I used a library called annyang, which uses the browser's speech recognition service and lets you set up callbacks that run when certain words are recognized.

It was very easy to set up.
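Because annyang relies on the browser's speech recognition service, it only works where that service exists. Here is a minimal sketch of a feature check you could run before showing any voice controls. The helper names getSpeechRecognition and canListen are made up for this example, and the window object is passed in as a parameter only so the functions can be tried outside a browser. It assumes that window.annyang is missing or null when the annyang script did not load or the browser has no speech recognition.

```javascript
// Returns the browser's speech recognition constructor, or null if there is none.
// Chrome-based browsers expose it with the webkit prefix.
function getSpeechRecognition(win) {
    if (!win) return null;
    return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// Hypothetical helper: true only if the browser supports speech recognition
// AND the annyang object is present on the window.
function canListen(win) {
    return getSpeechRecognition(win) !== null && Boolean(win.annyang);
}
```

In the page itself you would call canListen(window) and, for example, hide the Listen and Stop buttons when it returns false.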

For this example, I used a skeleton React project and only had to modify 2 files. The first was the index.html file, where I loaded:

The Web Client script file for my chatbot – I used my sample shipping customer bot

The annyang JavaScript library

<script src="https://cdnjs.cloudflare.com/ajax/libs/annyang/2.6.0/annyang.min.js"></script>

<script
    src="https://cdn.cai.tools.sap/webclient/bootstrap.js"
    data-channel-id="3fbd2b3d-064c-499c-926f-123456789012"
    data-token="be39e893c7c9ee64dbd123456789012"
    data-expander-type="CAI"
    data-expander-preferences="JTdCJTIyZXhwYX5kZXJMb2dvJTIyJTNBJTIyaHR0cHMlM0ElMkYlMkZjZG4uY2FpLnRvb2xzLnNhcCUyRndlYmNoYXQlMkZ3ZWJjaGF0LWxvZ28uc3ZnJTIyJTJDJTIyZXhwYW5kZXJUaXRsZSUyMiUzQSUyMkNsaWNrJTIwb24lMjBtZSElMjIlMkMlMjJvbmJvYXJkaW5nTWVzc2FnZSUyMiUzQSUyMkNoYXQlMjB3aXRoJTIwbWUhJTIyJTJDJTIydGhlbWUlMjIlM0ElMjJERUZBVUxUJTIyJTdE"
    id="cai-webclient-custom">
</script>

App.js

In App.js (the default controller for my UI), I did 2 things. First, I added buttons to start and stop the listening:

<div className="App" >
    <header className="App-header">Express Shipping Company
        <p><button id='closebutton' onClick={myToggle}>Open/Close</button>&nbsp;&nbsp;&nbsp;
            <button id='startlistening' onClick={startlistening}>Listen</button>&nbsp;
            <button id='stoplistening' onClick={stoplistening}>Stop</button></p>
    </header>
</div>

Then I created listener methods to handle starting the speech recognition service and to create callbacks for certain words, including the name of the chatbot.

I called my chatbot Betty – am I going all Norman Bates from Psycho, talking to my chatbot 😲? And here's how I designed the conversation:

Click Listen to start speech recognition.

At first, I can talk and the speech is recognized, but it is ignored unless I say "Betty" at the start.

If I start an utterance with "Betty", everything afterward is sent to the chatbot.

If I don't want to keep saying "Betty", I can say "Betty stay", and from then on whatever I say is sent to the chatbot.

If I want to turn off the speech recognition, I can say "stop". After this, I have to click Listen to restart speech recognition.

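function stoplistening(event) {
    if (!window.annyang) return;
    // Stop recognition and clear all callbacks and commands, so that a later
    // click on Listen does not register them a second time.
    window.annyang.abort();
    window.annyang.removeCallback();
    window.annyang.removeCommands();
}

function startlistening(event) {
    // When true, every utterance is sent to the chatbot, not only those starting with "Betty".
    let stayOn = false;

    if (window.annyang) {
        // Utterances that match no command are forwarded only in "stay" mode.
        window.annyang.addCallback('resultNoMatch', function (phrases) {
            // phrases holds the possible transcriptions; use the first one.
            const utterance = phrases[0];
            if (stayOn) {
                window.sap.cai.webclient.sendMessage(utterance);
            }
        });

        // Handles "Betty <rest>": "Betty stay" turns on stay mode, anything else is sent.
        const betty = function (rest) {
            if (rest.trim().toLowerCase() === 'stay') {
                stayOn = true;
            } else {
                window.sap.cai.webclient.sendMessage(rest);
            }
        };

        const stop = function () {
            stoplistening(null);
        };

        const commands = {
            'Betty *rest': betty,
            'stop': stop,
        };
        window.annyang.addCommands(commands);

        // Start listening only after the commands and callbacks are registered.
        window.annyang.start();
    }
}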
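The conversation rules listed earlier can also be written as a small pure function, separate from annyang, which makes them easy to try out without a microphone. This is only a sketch of the rules, not the code that runs in the demo. The function name routeUtterance and the shape of the state object are made up for this example, and it assumes a comma may follow the wake word in the transcription.

```javascript
// state: { listening: boolean, stayOn: boolean }
// Returns { state, send }, where send is the text to forward to the chatbot,
// or null if nothing should be sent.
function routeUtterance(state, utterance) {
    const text = utterance.trim();

    // Nothing is processed until Listen has been clicked.
    if (!state.listening) return { state, send: null };

    // "stop" turns speech recognition off and resets stay mode.
    if (text.toLowerCase() === "stop") {
        return { state: { listening: false, stayOn: false }, send: null };
    }

    // The wake word only counts at the very start of the utterance.
    const match = text.match(/^betty[\s,]+(.+)$/i);
    if (match) {
        const rest = match[1].trim();
        if (rest.toLowerCase() === "stay") {
            return { state: { ...state, stayOn: true }, send: null };
        }
        return { state, send: rest };
    }

    // Without the wake word, text is forwarded only in stay mode.
    return { state, send: state.stayOn ? text : null };
}
```

Starting from { listening: true, stayOn: false }, you can feed in utterances one at a time and pass the returned state into the next call to follow a whole conversation.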

The key chatbot API is window.sap.cai.webclient.sendMessage(utterance), which simply sends the text as an utterance from the user. The chatbot then responds as it would to any typed utterance.
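Since sendMessage only exists once the Web Client script has loaded, a small guard around the call can avoid errors if speech arrives too early. The following is a sketch under that assumption. The wrapper name sendToChatbot is made up for this example, and the window object is passed in as a parameter so the function can be exercised with a stand-in object.

```javascript
// Sends `text` to the chatbot if the Web Client API is available.
// Returns true if the message was handed to the Web Client, false otherwise.
function sendToChatbot(win, text) {
    const client = win && win.sap && win.sap.cai && win.sap.cai.webclient;
    const trimmed = typeof text === "string" ? text.trim() : "";

    // Skip empty text and the case where the Web Client is not ready yet.
    if (!client || typeof client.sendMessage !== "function" || trimmed === "") {
        return false;
    }
    client.sendMessage(trimmed);
    return true;
}
```

In the listener code, calls to window.sap.cai.webclient.sendMessage(rest) could then become sendToChatbot(window, rest).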

IMPORTANT: The callbacks that recognize specific words/phrases work on the entire utterance, which the speech recognition service only knows in full once the user stops speaking. So, for example, because our command expects the word "Betty" at the start, the utterance "Go Betty Go" is not recognized by the callback we defined. You must therefore finish one utterance, and pause, before starting the next one.
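To see why the position of "Betty" matters, it can help to think of a command like 'Betty *rest' as a regular expression anchored at both ends of the utterance. The pattern below is a rough stand-in written for this example. annyang builds its own expressions internally, and they may differ in detail.

```javascript
// Rough stand-in for the command 'Betty *rest': anchored at the start and end,
// case-insensitive, with the splat capturing everything after the wake word.
const bettyPattern = /^Betty\s+(.*)$/i;

// Returns the text after "Betty", or null if the utterance does not start with it.
function matchBetty(utterance) {
    const m = bettyPattern.exec(utterance.trim());
    return m ? m[1] : null;
}

matchBetty("Betty hello");                  // "hello"
matchBetty("Go Betty Go");                  // null
matchBetty("betty what is the price stop"); // "what is the price stop"
```

The last call also shows that a word such as "stop" inside a longer utterance is just part of the captured text, because the 'stop' command has to match the whole utterance too.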

Result

I start with a simulated site for a shipping company (sorry I didn't spend more time on the graphics 😀). I have a button that opens and closes the chatbot, and then I click on Listen.

The first time, you will have to allow the website to use the microphone.

Here's the web site:

I say "How's it going", and nothing happens.

I say "Betty, hello" and the "hello" is sent to the chatbot.

I say "Betty stay" so I don't have to say "Betty" all the time. Nothing is sent to the chatbot.

I say "I want to price a package" and it is sent to the chatbot. From here on, I can speak the rest of the conversation for pricing a package.

Special thanks to kevin.changela for the tips on annyang.