Dictation With The Mixed Reality Toolkit
What you’ll need
If you haven’t done so already, be sure you’ve properly set up your development environment and imported the Mixed Reality Toolkit into your project. You’ll also need to be familiar with the Unity Editor and its interface controls; if you’re not, there is a great tutorial series to get you started.
Getting Started
Note: Dictation is only available for the Windows Standalone and UWP build targets.
- Create a new scene
- Run the MRTK scene wizard via `MixedRealityToolkit/Configure/Apply Scene Settings`
- Create an empty GameObject
- Rename the new GameObject to `DictationHandler`
- Create a new script named `DictationHandler`
- Attach the new `DictationHandler` script to your `DictationHandler` GameObject
- Open the new script in any text editor
- Implement the `IInputClickHandler` and `IDictationHandler` interfaces
- Add fields for the initial silence timeout, auto silence timeout, and total allowable recording time
- Add fields for the text output
- Add a flag for recording
```csharp
using UnityEngine;
using HoloToolkit.Unity.InputModule;

public class DictationHandler : MonoBehaviour, IInputClickHandler, IDictationHandler
{
    [SerializeField]
    [Range(0.1f, 5f)]
    [Tooltip("The time length in seconds before dictation recognizer session ends due to lack of audio input in case there was no audio heard in the current session.")]
    private float initialSilenceTimeout = 5f;

    [SerializeField]
    [Range(5f, 60f)]
    [Tooltip("The time length in seconds before dictation recognizer session ends due to lack of audio input.")]
    private float autoSilenceTimeout = 20f;

    [SerializeField]
    [Range(1, 60)]
    [Tooltip("Length in seconds for the manager to listen.")]
    private int recordingTime = 10;

    // Initialized to string.Empty so the comparison in Update never hits a null reference.
    private string lastOutput = string.Empty;

    private string speechToTextOutput = string.Empty;
    public string SpeechToTextOutput { get { return speechToTextOutput; } }

    private bool isRecording;
}
```
- Add logic for toggling recording when the `DictationHandler` GameObject is clicked
```csharp
public void OnInputClicked(InputClickedEventData eventData)
{
    ToggleRecording();
}

private void ToggleRecording()
{
    if (isRecording)
    {
        isRecording = false;
        StartCoroutine(DictationInputManager.StopRecording());
    }
    else
    {
        isRecording = true;
        StartCoroutine(DictationInputManager.StartRecording(initialSilenceTimeout, autoSilenceTimeout, recordingTime));
    }
}
```
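Note that `DictationInputManager.StartRecording` and `DictationInputManager.StopRecording` return coroutines, which is why they are wrapped in `StartCoroutine` rather than called directly; the toolkit spreads the work of starting and tearing down a dictation session across frames.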
- Add logic for handling dictation results
```csharp
// Called as the recognizer forms a hypothesis while the user is still speaking.
void IDictationHandler.OnDictationHypothesis(DictationEventData eventData)
{
    speechToTextOutput = eventData.DictationResult;
}

// Called when a complete phrase has been recognized.
void IDictationHandler.OnDictationResult(DictationEventData eventData)
{
    speechToTextOutput = eventData.DictationResult;
}

// Called when the dictation session ends.
void IDictationHandler.OnDictationComplete(DictationEventData eventData)
{
    speechToTextOutput = eventData.DictationResult;
}

// Called when the recognizer reports an error; stop recording and log the message.
void IDictationHandler.OnDictationError(DictationEventData eventData)
{
    isRecording = false;
    speechToTextOutput = eventData.DictationResult;
    Debug.LogError(eventData.DictationResult);
    StartCoroutine(DictationInputManager.StopRecording());
}
```
- Add logic for displaying the results
```csharp
private void Update()
{
    if (!string.IsNullOrEmpty(speechToTextOutput) && !lastOutput.Equals(speechToTextOutput))
    {
        Debug.Log(speechToTextOutput);
        lastOutput = speechToTextOutput;
    }
}
```
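If you’d rather see the transcription in the scene than in the Console, a minimal sketch like the one below polls the public `SpeechToTextOutput` property and copies it into a 3D Text (TextMesh) object. The `DictationOutputDisplay` component and its field names are illustrative, not part of the toolkit; assign the references in the Inspector.

```csharp
using UnityEngine;

// Illustrative helper (not part of the toolkit): mirrors the DictationHandler's
// output into a 3D Text (TextMesh) object so the transcription is visible in the scene.
public class DictationOutputDisplay : MonoBehaviour
{
    [SerializeField]
    [Tooltip("The DictationHandler whose output should be displayed.")]
    private DictationHandler dictationHandler;

    [SerializeField]
    [Tooltip("The TextMesh that shows the latest transcription.")]
    private TextMesh outputText;

    private void Update()
    {
        if (dictationHandler != null && outputText != null)
        {
            outputText.text = dictationHandler.SpeechToTextOutput;
        }
    }
}
```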
Next, we’ll take a look at handling the Hold, Navigation, and Manipulation Inputs.