[HOWTO] How To Use Wit.ai, Windows Speech Recognition, Google Cloud Voice Recognition, or Recognissimo in Conversations

Posted: Wed Dec 21, 2016 8:55 pm
by Tony Li
(Link to example scenes at end of post. Also includes Windows Speech Recognition, Google, & Recognissimo examples.)

A Dialogue System user developed an interesting way to run conversations. I wanted to post a description here in case it can help others.

Subtitles can be sent to a text-to-speech plugin such as RT-Voice, and the Dialogue System includes an easy-to-use integration for it.

This post covers the other direction: allowing the player to make responses simply by speaking.

In place of a traditional response menu UI, his solution listens for keywords, which he refers to as "intents." When editing a response dialogue entry, he puts the intent text into the Menu Text field. When the player speaks a keyword associated with a response, the Dialogue System chooses that response.

To accomplish this, he used Wit.ai, an online speech recognition service. He started with wit3D, a Unity / Wit.ai integration.

He used the parts of wit3D that start recording, save the recording to a file, send it to Wit.ai, get back the JSON, and parse the JSON for the intent. He then passes the intent to a simple subclass of UnityUIDialogueUI that he named WitDialogueUI. (When he receives input from Wit.ai, he calls a static method in WitDialogueUI.) The basic code is here:

using UnityEngine;
using System.Collections;
using PixelCrushers.DialogueSystem;

public class WitDialogueUI : UnityUIDialogueUI {

    // Use a singleton to allow access from static methods.
    public static WitDialogueUI Instance;

    public override void Awake() {
        base.Awake(); // Let the dialogue UI initialize itself.
        Instance = this;
    }

    public bool showMenu = true; // Show response menu. Useful for debugging.

    public bool listening = false; // True when listening for a Wit.ai voice command.

    public Response[] responses;

    public override void ShowResponses(Subtitle subtitle, Response[] responses, float timeout) {
        base.ShowResponses(subtitle, responses, timeout);
        this.responses = responses; // Remember the responses to check when Wit.ai returns an intent.
    }

    public override void HideResponses() {
        responses = null; // Response menu is done, so no responses to check.
        base.HideResponses(); // Actually hide the menu.
    }

    // Get the intent from the _Handler script (part of wit3D).
    public static void getIntentFromWit3d(string Wit3Dintent) {
        if (Instance == null || Instance.responses == null) return; // No response menu is showing.
        if (!string.IsNullOrEmpty(Wit3Dintent)) {
            foreach (var response in Instance.responses) {
                if (string.Equals(Wit3Dintent, response.formattedText.text)) {
                    // We have a match, select the choice:
                    Instance.OnClick(response); // Simulate a click on a response button.
                    return;
                }
            }
        }
    }
}
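
wit3D takes care of the "parse the JSON for the intent" step before getIntentFromWit3d is called. If you're wiring up Wit.ai without wit3D, here's a minimal, hypothetical sketch of that step in plain C#. It assumes the response shape of the current Wit.ai /message HTTP API, where intents arrive in an "intents" array of objects with "name" and "confidence" fields; the 2016-era API that wit3D was written against structured this differently, so check what your version actually returns. It also uses System.Text.Json from modern .NET, which Unity doesn't include by default, so inside Unity you'd swap in whatever JSON parser your project already uses. WitIntentParser, GetTopIntent, and the minConfidence threshold are made-up names for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: not part of wit3D, Wit.ai's SDKs, or the Dialogue System.
public static class WitIntentParser
{
    // Returns the name of the highest-confidence intent, or null if there is
    // no "intents" array or no intent reaches minConfidence.
    // ASSUMPTION: response JSON looks like
    //   { "text": "...", "intents": [ { "name": "...", "confidence": 0.93 } ] }
    public static string GetTopIntent(string json, double minConfidence = 0.5)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        if (!doc.RootElement.TryGetProperty("intents", out JsonElement intents) ||
            intents.ValueKind != JsonValueKind.Array)
        {
            return null;
        }

        string best = null;
        double bestConfidence = minConfidence;
        foreach (JsonElement intent in intents.EnumerateArray())
        {
            // Throws if an intent object lacks "confidence" or "name".
            double confidence = intent.GetProperty("confidence").GetDouble();
            if (confidence >= bestConfidence)
            {
                bestConfidence = confidence;
                best = intent.GetProperty("name").GetString();
            }
        }
        return best;
    }
}
```

The returned name is what you'd pass to WitDialogueUI.getIntentFromWit3d, so it has to match a response's Menu Text exactly.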
So using this technique, he could create a game like The Wayne Investigation on Amazon Echo, or a mobile game that he could play in the car without having to look at a screen, or an accessible game for visually impaired players. Pretty neat!

Example scene (Wit.ai): WitAI_Example_2017-01-09.unitypackage

Example scene (Windows Speech Recognition): DS_WindowsSpeechRecognitionExample_2020-08-20.unitypackage

Re: Using Wit.ai or Windows Speech Recognition in Conversations

Posted: Fri Mar 11, 2022 8:38 am
by Tony Li
Here's an example dialogue UI integration script for Google Cloud Speech-To-Text.

using Google.Cloud.Speech.V1;
using PixelCrushers.DialogueSystem;
using System.Threading;
using System.Threading.Tasks;
using UnityEngine;

public class GoogleSpeechDialogueUI : StandardDialogueUI
{
    public bool showMenu = true; // Show response menu. Useful for debugging.

    public static bool listening = false; // True when listening for a Google STT voice command.
    public static bool foundMatch = false;

    private Response[] responses = null;
    private Response matchingResponse = null;

    private SpeechClient speech = null;
    private SpeechClient.StreamingRecognizeStream streamingCall = null;
    private Task task;
    private object writeLock;
    private bool writeMore;
    private NAudio.Wave.WaveInEvent waveIn;
    private bool didNotUnderstand = false;
    private string s = string.Empty; // Debug text built on the background thread, logged in Update.

    public override void ShowResponses(Subtitle subtitle, Response[] responses, float timeout)
    {
        this.responses = responses;
        if (NAudio.Wave.WaveIn.DeviceCount < 1)
        {
            Debug.Log("No microphone! Using basic response menu.");
            base.ShowResponses(subtitle, responses, timeout);
            return;
        }
        if (showMenu)
        {
            base.ShowResponses(subtitle, responses, timeout);
        }
        task = StartListening();
    }

    private Task StartListening()
    {
        listening = true;
        foundMatch = false;
        speech = SpeechClient.Create();
        streamingCall = speech.StreamingRecognize();
        matchingResponse = null;

        // Write the initial request with the config.
        streamingCall.WriteAsync(
            new StreamingRecognizeRequest()
            {
                StreamingConfig = new StreamingRecognitionConfig()
                {
                    Config = new RecognitionConfig()
                    {
                        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
                        SampleRateHertz = 16000,
                        LanguageCode = "en",
                    },
                    InterimResults = false,
                }
            }).Wait();

        // Process responses as they arrive (runs on a background thread).
        Task processResponses = Task.Run(async () =>
        {
            while (await streamingCall.ResponseStream.MoveNext(default(CancellationToken)))
            {
                foreach (var result in streamingCall.ResponseStream.Current.Results)
                {
                    foreach (var alternative in result.Alternatives)
                    {
                        string text = alternative.Transcript;
                        Debug.Log("Heard: " + text);
                        CheckResponses(text);
                    }
                }
            }
        });

        // Read from the microphone and stream to API.
        writeLock = new object();
        writeMore = true;
        waveIn = new NAudio.Wave.WaveInEvent();
        waveIn.DeviceNumber = 0;
        waveIn.WaveFormat = new NAudio.Wave.WaveFormat(16000, 1); // 16 kHz, mono, 16-bit PCM.
        waveIn.DataAvailable +=
            (object sender, NAudio.Wave.WaveInEventArgs args) =>
            {
                lock (writeLock)
                {
                    if (!writeMore) return;
                    streamingCall.WriteAsync(
                        new StreamingRecognizeRequest()
                        {
                            AudioContent = Google.Protobuf.ByteString
                                .CopyFrom(args.Buffer, 0, args.BytesRecorded)
                        }).Wait();
                }
            };
        waveIn.StartRecording();
        Debug.Log("Speak now.");

        return processResponses;
    }

    public override void Update()
    {
        base.Update();
        if (listening)
        {
            if (!string.IsNullOrEmpty(s))
            {
                Debug.Log(s);
                s = string.Empty;
            }
            if (foundMatch)
            {
                listening = false;
                Debug.Log("Stopping listening.");
                lock (writeLock) writeMore = false;
                waveIn.StopRecording(); // Stop capturing microphone audio.
                streamingCall.WriteCompleteAsync(); // Tell the API no more audio is coming.
                OnClick(matchingResponse); // Simulate a click on a response button.
            }
            else if (didNotUnderstand)
            {
                DialogueManager.ShowAlert("Sorry, I don't understand. Say something else.");
                didNotUnderstand = false;
            }
        }
    }

    // Called from the background response task for each transcript alternative.
    public void CheckResponses(string text)
    {
        if (string.IsNullOrEmpty(text)) Debug.Log("text is blank");
        if (!string.IsNullOrEmpty(text))
        {
            foreach (var response in responses)
            {
                var menuOption = response.formattedText.text;
                s += "Checking option: " + menuOption + " against " + text + "\n";
                if (string.Equals(text.Trim(), menuOption, System.StringComparison.OrdinalIgnoreCase))
                {
                    s += "Found a match for '" + menuOption + "': " + text + "\n";
                    // We have a match, select the choice.
                    // Set matchingResponse before foundMatch so Update never sees a match without a response.
                    matchingResponse = response;
                    didNotUnderstand = false;
                    foundMatch = true;
                    return;
                }
            }
        }
        didNotUnderstand = true;
    }
}
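
CheckResponses compares each transcript to the menu text with a case-insensitive exact match, so a transcript with stray punctuation, or a longer sentence such as "I'll take the sword", won't select a response whose Menu Text is "sword". If you want looser matching, one option is to normalize both strings and accept a menu option that appears as a whole-word phrase inside the transcript. The sketch below is plain C# with no Unity or Google dependencies; TranscriptMatcher, Normalize, and FindMatch are hypothetical names, and the normalization rules (lowercase, keep only letters, digits, and apostrophes) are an assumption you'd tune for your content.

```csharp
using System;
using System.Text;

// Hypothetical helper: not part of the Dialogue System or Google's client library.
public static class TranscriptMatcher
{
    // Lowercases, turns everything except letters, digits, and apostrophes
    // into spaces, and collapses runs of whitespace into single spaces.
    public static string Normalize(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return string.Empty;
        var sb = new StringBuilder(text.Length);
        foreach (char c in text.ToLowerInvariant())
        {
            sb.Append(char.IsLetterOrDigit(c) || c == '\'' ? c : ' ');
        }
        return string.Join(" ", sb.ToString().Split(' ', StringSplitOptions.RemoveEmptyEntries));
    }

    // Returns the index of the first menu option whose normalized text appears
    // in the normalized transcript as a whole-word phrase (an exact match also
    // counts), or -1 if no option matches.
    public static int FindMatch(string transcript, string[] menuOptions)
    {
        // Pad with spaces so " option " only matches on word boundaries.
        string heard = " " + Normalize(transcript) + " ";
        for (int i = 0; i < menuOptions.Length; i++)
        {
            string option = Normalize(menuOptions[i]);
            if (option.Length > 0 && heard.Contains(" " + option + " "))
            {
                return i;
            }
        }
        return -1;
    }
}
```

Inside CheckResponses, you could pass the transcript and the responses' menu texts to FindMatch and use the returned index to pick matchingResponse. Because the first matching option wins, list longer, more specific options before shorter ones they contain (for example, "open the door" before "door").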

Re: [HOWTO] How To Use Wit.ai, Windows Speech Recognition, Google Cloud Voice Recognition, or Recognissimo in Conversations

Posted: Sun Jul 16, 2023 8:22 pm
by Tony Li
Recognissimo integration:
Contributed by Matias Gesche:

using System.Collections.Generic;
using UnityEngine;
using Recognissimo.Components;
using PixelCrushers.DialogueSystem;
using PixelCrushers;

public class SpeechRecognissimo : StandardDialogueUI
{
    public float speechRecognitionTimeout = 5;

    private VoiceControl m_voiceControl;
    private Response[] m_responses;
    private Response m_timeoutResponse;
    private float m_timeLeft;
    private bool m_isWaitingForResponse;

    public override void ShowResponses(Subtitle subtitle, Response[] responses, float timeout)
    {
        // Remember the responses to check when we recognize a keyword:
        if (responses.Length > 1)
        {
            // If we have more than one, assume the last response is the "I don't understand"/timeout special response.
            // Record it and remove it from the array of regular responses:
            m_timeoutResponse = responses[responses.Length - 1];
            var responsesExceptLast = new Response[responses.Length - 1];
            for (int i = 0; i < responses.Length - 1; i++)
            {
                responsesExceptLast[i] = responses[i];
            }
            responses = responsesExceptLast;
        }
        else
        {
            m_timeoutResponse = null;
        }
        m_responses = responses;
        m_timeLeft = speechRecognitionTimeout;
        m_isWaitingForResponse = true;

        // Show the responses:
        base.ShowResponses(subtitle, responses, timeout);

        // Identify the keywords to recognize:
        // (Each response's menu text can have keywords separated by pipe characters.)
        var allKeywords = new List<string>();
        foreach (var response in responses)
        {
            var responseKeywords = response.formattedText.text.Split('|');
            allKeywords.AddRange(responseKeywords);
        }

        // Set up the voice control recognizer:
        m_voiceControl = GetComponent<VoiceControl>();
        if (m_voiceControl == null)
        {
            m_voiceControl = gameObject.AddComponent<VoiceControl>();
        }
        m_voiceControl.AsapMode = true;

        foreach (var keyword in allKeywords)
        {
            m_voiceControl.Commands.Add(new VoiceControlCommand(keyword, () => OnPhraseRecognized(keyword)));
        }

        m_voiceControl.InitializationFailed.AddListener(e => Debug.LogError("Voice Control initialization failed: " + e.Message));

        // Start listening:
        m_voiceControl.StartProcessing();
    }

    public override void HideResponses()
    {
        base.HideResponses();

        // Stop speech recognition when we hide the menu:
        if (m_voiceControl != null)
        {
            m_voiceControl.StopProcessing();
        }
        m_timeoutResponse = null;
        m_isWaitingForResponse = false;
    }

    public override void Update()
    {
        base.Update();

        // Update speech recognition timer:
        if (m_isWaitingForResponse && m_timeoutResponse != null)
        {
            m_timeLeft -= Time.deltaTime;
            if (m_timeLeft <= 0)
            {
                // If time runs out, use the timeout response:
                m_isWaitingForResponse = false;
                OnClick(m_timeoutResponse);
            }
        }
    }

    private void OnPhraseRecognized(string phrase)
    {
        Debug.Log("Recognized: '" + phrase + "'");

        // Match the user's spoken phrase with one of the responses.
        // (Check the same menu text that ShowResponses took the keywords from.)
        foreach (var response in m_responses)
        {
            var responseKeywords = response.formattedText.text.Split('|');
            foreach (var responseKeyword in responseKeywords)
            {
                if (string.Equals(phrase, responseKeyword, System.StringComparison.OrdinalIgnoreCase))
                {
                    m_isWaitingForResponse = false;
                    OnClick(response); // Simulate a click on a response button.
                    return; // Exit the loop once the response is found and clicked.
                }
            }
        }
    }
}
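
The script above relies on two conventions: each response's Menu Text holds pipe-separated keywords, and when there's more than one response, the last one is reserved as the timeout / "I don't understand" response. Here's a hypothetical plain C# sketch of just that logic, so it can be tested outside Unity. MenuOption is a made-up stand-in for the Dialogue System's Response (only the menu text matters here), and unlike the script above it trims spaces around keywords so that "yes | yeah" also works; both are assumptions, not part of the contributed integration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Dialogue System's Response class.
public sealed class MenuOption
{
    public string MenuText { get; }
    public MenuOption(string menuText) { MenuText = menuText; }
}

public static class KeywordMenu
{
    // With more than one option, treat the last one as the timeout response and
    // return the rest as regular options; otherwise there is no timeout response.
    public static (List<MenuOption> regular, MenuOption timeout) SplitTimeout(IReadOnlyList<MenuOption> options)
    {
        var regular = new List<MenuOption>(options);
        if (regular.Count <= 1) return (regular, null);
        MenuOption timeout = regular[regular.Count - 1];
        regular.RemoveAt(regular.Count - 1);
        return (regular, timeout);
    }

    // Splits each option's menu text on '|' and returns the first option with a
    // keyword equal to the phrase (ignoring case and surrounding spaces), or null.
    public static MenuOption Match(string phrase, IEnumerable<MenuOption> options)
    {
        foreach (MenuOption option in options)
        {
            foreach (string keyword in option.MenuText.Split('|'))
            {
                if (string.Equals(phrase.Trim(), keyword.Trim(), StringComparison.OrdinalIgnoreCase))
                {
                    return option;
                }
            }
        }
        return null;
    }
}
```

For example, with menu texts "yes|yeah|sure", "no|nope", and a final timeout entry, SplitTimeout leaves two regular options, and Match("Yeah", regular) returns the first one.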

Re: [HOWTO] How To Use Wit.ai, Windows Speech Recognition, Google Cloud Voice Recognition, or Recognissimo in Conversations

Posted: Fri Apr 05, 2024 8:29 pm
by Tony Li
Here's another Recognissimo integration with an example scene, plus an alternative scene that uses overhead bubble panels.
