[HOWTO] How To Use Wit.ai, Windows Speech Recognition, Google Cloud Voice Recognition, or Recognissimo in Conversations



Post by Tony Li »

(Links to the Wit.ai and Windows Speech Recognition example scenes are at the end of this post; Google Cloud and Recognissimo examples are in the replies below.)

A Dialogue System user developed an interesting way to run conversations. I wanted to post a description here in case it can help others.

Subtitles can be sent to a text-to-speech plugin such as RT-Voice; the Dialogue System includes an easy integration for it.
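
If you just want to see the shape of that direction, here's a minimal sketch. The SpeakSubtitles component and its SpeakText() method are placeholders for illustration, not part of the actual RT-Voice integration:

Code:

using UnityEngine;
using PixelCrushers.DialogueSystem;

// Minimal sketch of the text-to-speech direction. SpeakText() is a
// hypothetical stand-in for whatever speak call your TTS plugin provides.
public class SpeakSubtitles : MonoBehaviour // Add to the Dialogue Manager.
{
    // The Dialogue System sends OnConversationLine for each line of dialogue.
    void OnConversationLine(Subtitle subtitle)
    {
        SpeakText(subtitle.formattedText.text);
    }

    void SpeakText(string text)
    {
        Debug.Log("TTS would speak: " + text); // Replace with your TTS plugin's call.
    }
}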

This post covers the other direction: allowing the player to make responses simply by speaking.

In place of a traditional response menu UI, his solution listens for keywords, which he refers to as "intents." When editing a response dialogue entry, he puts the intent text into the Menu Text field. When the player speaks a keyword associated with a response, the Dialogue System chooses that response.

To accomplish this, he used wit.ai, an online speech recognition service. He started with the Unity / wit.ai integration at https://github.com/afauch/wit3d

He used the parts of wit3D that initiate recording, save the recording to a file, send it to wit.ai, get back the JSON, and parse the JSON for the intent. He then passes the intent to a simple subclass of UnityUIDialogueUI that he named WitDialogueUI. (When input comes back from wit.ai, the handler calls a static method on WitDialogueUI.) The basic code is here:

WitDialogueUI.cs

Code:

using UnityEngine;
using System.Collections;
using PixelCrushers.DialogueSystem;

public class WitDialogueUI : UnityUIDialogueUI {

    //Use a singleton to allow access from static methods
    public static WitDialogueUI Instance;

    public override void Awake() {
        base.Awake();
        Instance = this;
    }

    public bool showMenu = true; // Show response menu. Useful for debugging.

    public bool listening = false; // True when listening for a Wit.AI voice command.

    public Response[] responses;

    public override void ShowResponses(Subtitle subtitle, Response[] responses, float timeout) {
        base.ShowResponses(subtitle, responses, timeout);
        this.responses = responses; // Remember the responses to check when wit.ai returns an intent.
    }

    public override void HideResponses() {
        base.HideResponses();
        responses = null; // Response menu is done, so no responses to check.
    }

    // Called by wit3D's _Handler script when wit.ai returns an intent.
    public static void getIntentFromWit3d(string Wit3Dintent) {
        if (Instance == null || Instance.responses == null) return; // No response menu is currently active.
        if (string.IsNullOrEmpty(Wit3Dintent)) return;
        foreach (var response in Instance.responses) {
            if (string.Equals(Wit3Dintent, response.formattedText.text)) {
                // We have a match, select the choice:
                Instance.OnClick(response); // Simulate a click on a response button.
                return;
            }
        }
    }
}
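
For context, the wit3D side only needs to hand the recognized intent string to that static method. Below is a minimal sketch of that glue code; it is not part of wit3D, and the component and method names are just placeholders. Wire it up however your copy of wit3D's _Handler script exposes the parsed intent:

Code:

using UnityEngine;

public class WitIntentRelay : MonoBehaviour
{
    // Hypothetical glue method: call this after wit3D's _Handler has
    // parsed the intent out of wit.ai's JSON response.
    public void OnIntentParsed(string intent)
    {
        WitDialogueUI.getIntentFromWit3d(intent);
    }
}
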
So using this technique, he could create a game like The Wayne Investigation on Amazon Echo, or a mobile game that he could play in the car without having to look at a screen, or an accessible game for visually impaired players. Pretty neat!

Example scene (Wit.ai): WitAI_Example_2017-01-09.unitypackage

Example scene (Windows Speech Recognition): DS_WindowsSpeechRecognitionExample_2020-08-20.unitypackage

Re: Using Wit.ai or Windows Speech Recognition in Conversations

Post by Tony Li »

Here's an example dialogue UI integration script for Google Cloud Speech-To-Text.

Code:

using Google.Cloud.Speech.V1;
using PixelCrushers.DialogueSystem;
using System.Threading;
using System.Threading.Tasks;
using UnityEngine;

public class GoogleSpeechDialogueUI : StandardDialogueUI
{
    public bool showMenu = true; //Show response menu. Useful for debugging

    public static bool listening = false; //True when listening for a Google STT voice command
    public static bool foundMatch = false;

    private Response[] responses = null;
    private Response matchingResponse = null;

    private SpeechClient speech = null;
    private SpeechClient.StreamingRecognizeStream streamingCall = null;
    private Task task;
    private object writeLock;
    private bool writeMore;
    private NAudio.Wave.WaveInEvent waveIn;
    private bool didNotUnderstand = false;
    private string s = string.Empty;

    public override void ShowResponses(Subtitle subtitle, Response[] responses, float timeout)
    {
        this.responses = responses;
        if (NAudio.Wave.WaveIn.DeviceCount < 1)
        {
            Debug.Log("No microphone! Using basic response menu.");
            base.ShowResponses(subtitle, responses, timeout);
            return;
        }
        if (showMenu)
        {
            base.ShowResponses(subtitle, responses, timeout);
        }
        task = StartListening();
    }

    private Task StartListening()
    {
        listening = true;
        foundMatch = false;
        speech = SpeechClient.Create();
        streamingCall = speech.StreamingRecognize();
        matchingResponse = null;

        // Write the initial request with the config.
        streamingCall.WriteAsync(
        new StreamingRecognizeRequest()
        {
            StreamingConfig = new StreamingRecognitionConfig()
            {
                Config = new RecognitionConfig()
                {
                    Encoding =
                    RecognitionConfig.Types.AudioEncoding.Linear16,
                    SampleRateHertz = 16000,
                    LanguageCode = "en",
                },
                InterimResults = false,
            }
        });

        // Process responses as they arrive.
        Task processResponses = Task.Run(async () =>
        {
            while (await streamingCall.ResponseStream.MoveNext(
                default(CancellationToken)))
            {
                foreach (var result in streamingCall.ResponseStream
                    .Current.Results)
                {
                    foreach (var alternative in result.Alternatives)
                    {
                        string text = alternative.Transcript;
                        Debug.Log("Heard: " + alternative.Transcript);
                        CheckResponses(text.Trim());
                    }
                }
            }
        });

        // Read from the microphone and stream to API.
        writeLock = new object();
        writeMore = true;
        waveIn = new NAudio.Wave.WaveInEvent();
        waveIn.DeviceNumber = 0;
        waveIn.WaveFormat = new NAudio.Wave.WaveFormat(16000, 1);
        waveIn.DataAvailable +=
            (object sender, NAudio.Wave.WaveInEventArgs args) =>
            {
                lock (writeLock)
                {
                    if (!writeMore) return;
                    streamingCall.WriteAsync(
                        new StreamingRecognizeRequest()
                        {
                            AudioContent = Google.Protobuf.ByteString
                                .CopyFrom(args.Buffer, 0, args.BytesRecorded)
                        }).Wait();
                }
            };
        waveIn.StartRecording();
        Debug.Log("Speak now.");

        return processResponses;
    }

    public override void Update()
    {
        base.Update();
        if (listening)
        {
            if (!string.IsNullOrEmpty(s))
            {
                Debug.Log(s);
                s = string.Empty;
            }
            if (foundMatch)
            {
                listening = false;
                Debug.Log("Stopping listening.");
                waveIn.StopRecording();
                lock (writeLock) writeMore = false;
                streamingCall.WriteCompleteAsync();
                OnClick(matchingResponse); // Simulate a click on a response button
            }
            else if (didNotUnderstand)
            {
                DialogueManager.ShowAlert("Sorry, I don't understand. Say something else.");
                didNotUnderstand = false;
            }
        }
    }

    public void CheckResponses(string text)
    {
        if (string.IsNullOrEmpty(text)) Debug.Log("text is blank");
        if (!string.IsNullOrEmpty(text))
        {
            foreach (var response in responses)
            {
                var menuOption = response.formattedText.text;
                s += "Checking option: " + menuOption + " against " + text + "\n";
                if (string.Equals(text, menuOption, System.StringComparison.OrdinalIgnoreCase))
                {
                    s += "Found a match for '" + menuOption + "': " + text + "\n";
                    //we have a match, select the choice:
                    foundMatch = true;
                    matchingResponse = response;
                    didNotUnderstand = false;
                    return;
                }
            }
        }
        didNotUnderstand = true;
    }
}
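
A few usage notes: the script falls back to the regular response menu if NAudio doesn't detect a microphone, streams 16 kHz mono audio to Google's streaming recognizer, and compares each transcript against the responses' Menu Text with a case-insensitive match. It assumes the Google.Cloud.Speech.V1 and NAudio libraries are already set up in your project.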

Re: [HOWTO] How To Use Wit.ai, Windows Speech Recognition, Google Cloud Voice Recognition, or Recognissimo in Conversations

Post by Tony Li »

Recognissimo integration, contributed by Matias Gesche:

Code:

using System.Collections.Generic;
using UnityEngine;
using Recognissimo.Components;
using PixelCrushers.DialogueSystem;
using PixelCrushers;

public class SpeechRecognissimo : StandardDialogueUI
{
    public float speechRecognitionTimeout = 5;

    private VoiceControl m_voiceControl;
    private Response[] m_responses;
    private Response m_timeoutResponse;
    private float m_timeLeft;
    private bool m_isWaitingForResponse;

    public override void ShowResponses(Subtitle subtitle, Response[] responses, float timeout)
    {
        // Remember the responses to check when we recognize a keyword:
        if (responses.Length > 1)
        {
            // If we have more than one, assume the last response is the "I don't understand"/timeout special response.
            // Record it and remove it from the array of regular responses:
            m_timeoutResponse = responses[responses.Length - 1];
            var responsesExceptLast = new Response[responses.Length - 1];
            for (int i = 0; i < responses.Length - 1; i++)
            {
                responsesExceptLast[i] = responses[i];
            }
            responses = responsesExceptLast;
        }
        else
        {
            m_timeoutResponse = null;
        }
        m_responses = responses;
        m_timeLeft = speechRecognitionTimeout;
        m_isWaitingForResponse = true;

        // Show the responses:
        base.ShowResponses(subtitle, responses, timeout);

        // Identify the keywords to recognize:
        // (Each response's menu text can have keywords separated by pipe characters.)
        var allKeywords = new List<string>();
        foreach (var response in responses)
        {
            var responseKeywords = response.formattedText.text.Split('|');
            allKeywords.AddRange(responseKeywords);
        }

        // Set up the voice control recognizer:
        m_voiceControl = GetComponent<VoiceControl>();
        if (m_voiceControl == null)
        {
            m_voiceControl = gameObject.AddComponent<VoiceControl>();
        }
        m_voiceControl.AsapMode = true;

        foreach (var keyword in allKeywords)
        {
            m_voiceControl.Commands.Add(new VoiceControlCommand(keyword, () => OnPhraseRecognized(keyword)));
        }

        m_voiceControl.InitializationFailed.AddListener(e => Debug.LogError("Voice Control initialization failed: " + e.Message));
        m_voiceControl.StartProcessing();
    }

    public override void HideResponses()
    {
        base.HideResponses();

        // Stop speech recognition when we hide the menu:
        if (m_voiceControl != null)
        {
            m_voiceControl.StopProcessing();
            m_voiceControl.Commands.Clear();
        }
        m_timeoutResponse = null;
        m_isWaitingForResponse = false;
    }

    public override void Update()
    {
        base.Update();

        // Update speech recognition timer:
        if (m_isWaitingForResponse && m_timeoutResponse != null)
        {
            m_timeLeft -= Time.deltaTime;
            if (m_timeLeft <= 0)
            {
                // If time runs out, use the timeout response:
                OnClick(m_timeoutResponse);
                m_isWaitingForResponse = false;
            }
        }
    }

    private void OnPhraseRecognized(string phrase)
    {
        Debug.Log("Recognized: '" + phrase + "'");

        // Match the user's spoken phrase with one of the responses:
        foreach (var response in m_responses)
        {
            // Keywords were registered from the responses' menu text above, so match against menu text here too:
            var responseKeywords = response.formattedText.text.Split('|');
            foreach (var responseKeyword in responseKeywords)
            {
                if (string.Equals(phrase, responseKeyword, System.StringComparison.OrdinalIgnoreCase))
                {
                    OnClick(response);
                    m_isWaitingForResponse = false;
                    return; // Exit the loop once the response is found and clicked.
                }
            }
        }
    }
}
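
With this integration, each response's Menu Text holds the keywords to listen for, separated by pipe characters. For example, a response whose Menu Text is "open the door|open it|go through" (just an illustrative phrase) is selected when the player speaks any of those three variants; if nothing is recognized before the timeout, the last response in the menu is used as the fallback.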

Re: [HOWTO] How To Use Wit.ai, Windows Speech Recognition, Google Cloud Voice Recognition, or Recognissimo in Conversations

Post by Tony Li »

Here's another Recognissimo integration with an example scene, including an alternative scene that uses overhead bubble panels.

DS_RecognissimoMenuPanelWithBubble_2024-04-10.unitypackage