A Speech Dialog Box for Universal Windows Phone apps

In this article I present a Universal Control to assist in text-to-speech and speech-to-text use cases on Windows Phone 8.1. The Speech Dialog Box is a templatable custom control in a shared universal project. Its default look is inspired by the Cortana UI: a text block with a microphone button attached to it:

[Images: the Cortana UI and the Speech Dialog Box in its default state]

The Speech Dialog Box is an evolution of the Speech Input Box that I used in the previous two blog posts. This newer version of the control has a more ambitious purpose: it is designed not only to recognize and repeat speech input, but also to engage in a full conversation. Here’s a class diagram:

[Image: class diagram]

Style

The Speech Dialog Box comes as a localizable and templatable Custom Control. Its default style lives in the generic.xaml file. This style embeds a MediaElement, which is necessary for speaking, and a separate grid for each relevant state. The control’s states are:

  • Default: the control displays its current content (Question or Text) in a TextBlock, together with a Button to start listening.
  • Listening: the control shows in a TextBlock that it is listening to you, together with a Button to cancel.
  • Typing: the control accepts typed input through a TextBox.
  • Thinking: the control shows in a TextBlock that it is trying to recognize the input, together with a Button to cancel.
  • Speaking: this state has no particular UI.
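
The Speaking state can do without a UI of its own because the audio is played through the MediaElement in the template. As a rough illustration (this is not the control's actual source), here is how synthesized speech is typically routed through a MediaElement on Windows Phone 8.1. The class name SpeechPlayerSketch and the way the MediaElement is handed in are assumptions made for the example.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml.Controls;

// Hypothetical helper, not part of the Speech Dialog Box source.
public sealed class SpeechPlayerSketch
{
    private readonly SpeechSynthesizer synthesizer = new SpeechSynthesizer();
    private readonly MediaElement mediaElement;

    // The MediaElement has to live in the visual tree,
    // for instance inside the control template.
    public SpeechPlayerSketch(MediaElement mediaElement)
    {
        this.mediaElement = mediaElement;
    }

    // Call this on the UI thread. The task completes when playback
    // has been started, not when the sentence has been spoken.
    public async Task SpeakAsync(string text)
    {
        // Turn the text into an audio stream.
        SpeechSynthesisStream stream =
            await this.synthesizer.SynthesizeTextToStreamAsync(text);

        // Hand the stream to the MediaElement and start playback.
        this.mediaElement.SetSource(stream, stream.ContentType);
        this.mediaElement.Play();
    }
}
```

To know when the sentence has finished, a caller could listen to the MediaElement's MediaEnded event.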

You don’t need to override this template to make the control blend into your design. The following properties are available to tweak its look:

  • Foreground: the color of the text in default and typing modes; it becomes the background color in the other modes
  • Background: the background color in default and typing modes
  • Highlight: the secondary color, used for the border in typing mode and for the text and icons in listening mode
  • ButtonBackground: the background color for the button in default mode
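
These properties can also be set from code-behind, for example to switch to a dark scheme at run time. The snippet below is a small sketch under a few assumptions: the page class is called MainPage, the control is named SpeechDialogBox as in the XAML of the sample, and Highlight and ButtonBackground are Brush-typed like the standard Foreground and Background properties.

```csharp
using Windows.UI;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Media;

// Assumption: MainPage is the page that hosts the control.
public sealed partial class MainPage : Page
{
    private void ApplyDarkScheme()
    {
        // Standard Control properties.
        this.SpeechDialogBox.Background = new SolidColorBrush(Colors.Black);
        this.SpeechDialogBox.Foreground = new SolidColorBrush(Colors.White);

        // Properties specific to the Speech Dialog Box
        // (assumed here to take a Brush).
        this.SpeechDialogBox.Highlight = new SolidColorBrush(Colors.DarkOrange);
        this.SpeechDialogBox.ButtonBackground = new SolidColorBrush(Colors.DimGray);
    }
}
```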

Here’s an overview of the control in the different states, using the default color scheme:

[Images: the control in its Question, Typing, Listening, and Thinking states]

Managing a Conversation

The Speech Dialog Box comes with the following public members that help you to set up a two-way conversation:

  • VoiceGender: sets the gender to use for the voice (note: the language is taken from the UI culture).
  • Question: sets the question that the control will ask you.
  • Constraints: the list of constraints for speech recognition, e.g. a Semantic Interpretation for Speech Recognition (SISR) – there’s a nice example of this here.
  • StartListening: sets the control to listening mode.
  • Text: the recognized text.
  • TextChanged: this event is raised when the control has finished the text recognition.
  • ResponsePattern: the format string that specifies how the control repeats the recognized text back to you, e.g. “I understood {0}”.
  • Speak: lets the control repeat the text.
  • Speak(string text): lets the control speak the specified text in its current voice.
  • SpeakSsml(string ssml): lets the control speak the specified Speech Synthesis Markup Language (SSML) text in its current voice.
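
To give an idea of how some of these members combine, here is a small sketch that asks a question in SSML and then listens for one of three fixed answers. It rests on a few assumptions: the page class is called MainPage, the button handler SsmlButton_Click is made up for the example, and SpeakSsml is awaitable in the same way as Speak(string). SpeechRecognitionListConstraint is a standard Windows Runtime constraint type.

```csharp
using System;
using Windows.Media.SpeechRecognition;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;

// Assumption: MainPage hosts a control named SpeechDialogBox.
public sealed partial class MainPage : Page
{
    // Hypothetical handler, not part of the sample app.
    private async void SsmlButton_Click(object sender, RoutedEventArgs e)
    {
        // Restrict recognition to a fixed list of phrases.
        var answers = new SpeechRecognitionListConstraint(
            new[] { "yes", "no", "maybe" }, "answers");
        this.SpeechDialogBox.Constraints.Clear();
        this.SpeechDialogBox.Constraints.Add(answers);

        // SSML allows emphasis and pauses that plain text cannot express.
        string ssml =
            "<speak version='1.0' " +
            "xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'>" +
            "Do you <emphasis>really</emphasis> like blue? " +
            "<break time='300ms'/> Yes, no, or maybe?" +
            "</speak>";

        // Assumption: SpeakSsml returns an awaitable, like Speak(string).
        await this.SpeechDialogBox.SpeakSsml(ssml);

        this.SpeechDialogBox.StartListening();
    }
}
```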

An example

The sample app contains buttons that demonstrate all of the control’s features. Here’s how the Speech Dialog Box is defined in XAML:

<controls:SpeechDialogBox x:Name="SpeechDialogBox"
                          Background="White"
                          Foreground="Black"
                          ButtonBackground="DimGray"
                          Highlight="DarkOrange" />

Here’s the code behind the ‘conversation’ button:

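// Namespaces used below: System, Windows.Storage,
// Windows.Media.SpeechRecognition, and Windows.UI.Xaml.
private async void ConversationButton_Click(object sender, RoutedEventArgs e)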
{
    // Set the question.
    this.SpeechDialogBox.Question = "What's your favorite color?";

    // Let the control ask the question out loud.
    await this.SpeechDialogBox.Speak("What is your favorite color?");

    // Reset the control after it has answered (optional).
    this.SpeechDialogBox.TextChanged += this.SpeechInputBox_TextChanged;

    // Teach the control to recognize the colors of the rainbow in a random text.
    var storageFile = await StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///Assets/ColorRecognizer.xml"));
    var grammarFileConstraint = new SpeechRecognitionGrammarFileConstraint(storageFile, "colors");
    this.SpeechDialogBox.Constraints.Clear();
    this.SpeechDialogBox.Constraints.Add(grammarFileConstraint);
            
    // Format the spoken response.
    this.SpeechDialogBox.ResponsePattern = "What a coincidence. {0} is my favorite color too.";
            
    // Start listening.
    this.SpeechDialogBox.StartListening();
}

private async void SpeechInputBox_TextChanged(object sender, EventArgs e)
{
    this.SpeechDialogBox.TextChanged -= this.SpeechInputBox_TextChanged;
    await this.SpeechDialogBox.Reset();
}

Here’s what the conversation may look like:

  • Phone: “What is your favorite color?”
  • Person: “I think it’s blue today.”
  • Phone: “What a coincidence: blue is my favorite color too.”

At this moment (at least before the Reset) the Text property of the control has the value “blue”, so you can continue the conversation with it. Cool, isn’t it?
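
The reason a whole sentence boils down to “blue” is the semantic interpretation in the grammar file. The sketch below shows how that value can be read with the plain Windows Runtime speech API, outside of the control. It is an illustration under assumptions, not the control's implementation: the class ColorRecognitionSketch is made up, and the key "color" only works if the SISR tags in ColorRecognizer.xml publish the result under that name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Windows.Media.SpeechRecognition;
using Windows.Storage;

// Hypothetical helper, not part of the Speech Dialog Box source.
public static class ColorRecognitionSketch
{
    // Returns the recognized color, or null when nothing usable was heard.
    public static async Task<string> RecognizeColorAsync()
    {
        var file = await StorageFile.GetFileFromApplicationUriAsync(
            new Uri("ms-appx:///Assets/ColorRecognizer.xml"));

        using (var recognizer = new SpeechRecognizer())
        {
            recognizer.Constraints.Add(
                new SpeechRecognitionGrammarFileConstraint(file, "colors"));

            // Constraints must be compiled before recognition starts.
            var compilation = await recognizer.CompileConstraintsAsync();
            if (compilation.Status != SpeechRecognitionResultStatus.Success)
            {
                return null;
            }

            SpeechRecognitionResult result = await recognizer.RecognizeAsync();
            if (result.Status != SpeechRecognitionResultStatus.Success)
            {
                return null;
            }

            // Assumption: the grammar exposes the color under the key "color".
            IReadOnlyList<string> values;
            if (result.SemanticInterpretation.Properties.TryGetValue("color", out values)
                && values.Count > 0)
            {
                return values[0];
            }

            // Fall back to the raw recognized text.
            return result.Text;
        }
    }
}
```

Speech recognition requires the Microphone capability in the app manifest.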

Source Code

After many years, I decided to stop attaching ZIP files to my blog posts. All the newer stuff will be shared, and updated, through GitHub. This solution was created with Visual Studio 2013 Update 4.

Enjoy!

Xaml Brewer