Dictation With The Mixed Reality Toolkit

What you’ll need

If you haven’t done so already, make sure you’ve set up your development environment and imported the Mixed Reality Toolkit into your project. You’ll also need to be familiar with the Unity Editor and its interface controls. If you are not, there is a great tutorial series to get you started.

Getting Started

Note: Dictation is only available for the Windows Standalone and UWP build targets.

  1. Create a new scene
  2. Run the MRTK scene wizard via:
    MixedRealityToolkit/Configure/Apply Scene Settings
  3. Create an empty GameObject
  4. Rename the new GameObject to DictationHandler
  5. Create a new script named DictationHandler
  6. Attach the new DictationHandler script to your DictationHandler GameObject
  7. Open the new script in any text editor
  8. Implement the IInputClickHandler and IDictationHandler interfaces
  9. Add fields for the initial silence timeout, auto silence timeout, and total allowable recording time
  10. Add fields for the text output
  11. Add a flag for recording

using UnityEngine;
using HoloToolkit.Unity.InputModule;

public class DictationHandler : MonoBehaviour, IInputClickHandler, IDictationHandler
{
    [SerializeField]
    [Range(0.1f, 5f)]
    [Tooltip("The time length in seconds before dictation recognizer session ends due to lack of audio input in case there was no audio heard in the current session.")]
    private float initialSilenceTimeout = 5f;

    [SerializeField]
    [Range(5f, 60f)]
    [Tooltip("The time length in seconds before dictation recognizer session ends due to lack of audio input.")]
    private float autoSilenceTimeout = 20f;

    [SerializeField]
    [Range(1, 60)]
    [Tooltip("Length in seconds for the manager to listen.")]
    private int recordingTime = 10;

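    // Last value written to the log; initialized so the comparison in Update() never sees null.
    private string lastOutput = string.Empty;

    // Most recent text received from the dictation recognizer.
    private string speechToTextOutput = string.Empty;
    public string SpeechToTextOutput { get { return speechToTextOutput; } }

    // True while a dictation session is active.
    private bool isRecording;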
}

  12. Add logic for toggling recording when the DictationHandler GameObject is clicked

public void OnInputClicked(InputClickedEventData eventData)
{
    ToggleRecording();
}

private void ToggleRecording()
{
    if (isRecording)
    {
        isRecording = false;
        StartCoroutine(DictationInputManager.StopRecording());
    }
    else
    {
        isRecording = true;
        StartCoroutine(DictationInputManager.StartRecording(initialSilenceTimeout, autoSilenceTimeout, recordingTime));
    }
}

  13. Add logic for handling dictation results

void IDictationHandler.OnDictationHypothesis(DictationEventData eventData)
{
    speechToTextOutput = eventData.DictationResult;
}

void IDictationHandler.OnDictationResult(DictationEventData eventData)
{
    speechToTextOutput = eventData.DictationResult;
}

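void IDictationHandler.OnDictationComplete(DictationEventData eventData)
{
    // The session has ended (for example on a timeout), so clear the flag to keep the click toggle in sync.
    isRecording = false;
    speechToTextOutput = eventData.DictationResult;
}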

void IDictationHandler.OnDictationError(DictationEventData eventData)
{
    isRecording = false;
    speechToTextOutput = eventData.DictationResult;
    Debug.LogError(eventData.DictationResult);
    StartCoroutine(DictationInputManager.StopRecording());
}

  14. Add logic for displaying the results

private void Update()
{
    // Log only when the text has changed; != is null-safe, unlike calling lastOutput.Equals(...).
    if (!string.IsNullOrEmpty(speechToTextOutput) && speechToTextOutput != lastOutput)
    {
        Debug.Log(speechToTextOutput);
        lastOutput = speechToTextOutput;
    }
}
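
The "only log when the text changes" check in Update can also be pulled out into a small plain C# class, which makes it testable outside of Unity. The sketch below is only an illustration of that idea: OutputChangeTracker and TryUpdate are hypothetical names, not part of the Mixed Reality Toolkit, and the class assumes an ordinal string comparison is what you want.

```csharp
using System;

// Hypothetical helper (not part of the toolkit): reports a value only when it
// is non-empty and differs from the last value that was reported.
public sealed class OutputChangeTracker
{
    // Starts as an empty string so the first comparison never sees null.
    private string lastOutput = string.Empty;

    // Returns true and remembers the value when it is new, non-empty text.
    public bool TryUpdate(string current)
    {
        if (string.IsNullOrEmpty(current) ||
            string.Equals(lastOutput, current, StringComparison.Ordinal))
        {
            return false;
        }

        lastOutput = current;
        return true;
    }
}
```

With a tracker field on the handler, the body of Update would shrink to a single call: if TryUpdate(speechToTextOutput) returns true, log the text. Whether the extra class is worth it depends on how much display logic you expect to add later.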

Next, we’ll take a look at handling the Hold, Navigation, and Manipulation Inputs.
