Speech Recognition
Promptu applies proprietary, industry-leading technology to recognize and understand spoken commands and phrases. Promptu's competitive advantage derives from the company's innovations in four key areas: core speech technology, enhancement of commercial speech recognition software, user interface design, and user education.
Core Speech Technology
Speech recognition technologies fall into two basic categories: grammar-based and open dictation. Grammar-based recognizers recognize only those phrases that match a predetermined list or pattern, called a grammar. Open dictation recognizers operate without this constraint: users may combine words in any natural order to express their meaning. (As with human listeners, though, the system can recognize only words that are already in its vocabulary.)
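A grammar can be pictured as a pattern over word sequences. The minimal sketch below assumes a hypothetical contact-dialing grammar, written as a Python regular expression and applied to already-transcribed words, to show the accept-or-reject behavior of a grammar. Real recognizers use grammar formats such as SRGS or JSGF and apply the constraint during decoding rather than afterward; `CALL_GRAMMAR` and `grammar_match` are illustrative names only.

```python
import re

# Hypothetical grammar: "call <contact> [at home | on mobile]".
CONTACTS = ["alice", "bob", "carol"]
CALL_GRAMMAR = re.compile(
    r"^call (?P<contact>" + "|".join(CONTACTS) + r")"
    r"(?: (?P<where>at home|on mobile))?$"
)

def grammar_match(words: str):
    """Return the filled slots if the phrase fits the grammar, else None."""
    match = CALL_GRAMMAR.match(words.lower().strip())
    return match.groupdict() if match else None

print(grammar_match("Call Bob on mobile"))  # {'contact': 'bob', 'where': 'on mobile'}
print(grammar_match("please ring Bob"))     # None
```

An open dictation recognizer has no such pattern: "please ring Bob" would be transcribed as long as every word is in its vocabulary.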
Promptu's products use both technologies, as appropriate to the task at hand. Grammar-based recognizers generally achieve the highest recognition accuracy and the lowest latency, and are ideal for recognizing place names, dates, times, street addresses and the like. Their weakness is their rigidity: if a spoken phrase does not match the grammar, it cannot be recognized.
Open dictation systems are more flexible. Since they will recognize words spoken in any natural order, they are appropriate for text or email dictation systems. However, they are more sensitive to variations in accent, and more susceptible to background noise.
Promptu has developed key inventions that improve both grammar-based recognition and open dictation.
Grammar-Based Recognition
Promptu has perfected techniques for creating highly flexible grammars. These techniques track actual and anticipated user speech behavior, so that the system's grammars contain many plausible variations of the phrases that a user is likely to say.
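One simple way to picture a "flexible" grammar is as a sequence of slots, each listing alternative wordings, that expands into every phrase it accepts. The sketch below uses a hypothetical navigation grammar; Promptu's actual grammar-generation technique is not described here, and the slot format and `expand` helper are assumptions for illustration.

```python
from itertools import product

# Hypothetical grammar: each slot lists alternative wordings; "" marks an optional slot.
NAVIGATE_GRAMMAR = [
    ["", "please"],
    ["take me to", "navigate to", "directions to", "go to"],
    ["the airport", "the nearest gas station"],
]

def expand(grammar):
    """Enumerate every phrase the grammar accepts."""
    phrases = []
    for choice in product(*grammar):
        # Drop empty optional slots before joining the words.
        phrases.append(" ".join(part for part in choice if part))
    return phrases

phrases = expand(NAVIGATE_GRAMMAR)
print(len(phrases))  # 16
```

Observed or anticipated user wordings would be added as further alternatives in the relevant slot, multiplying the number of accepted variations without changing the structure.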
The same Promptu technology that creates flexible grammars can also identify and resolve semantic ambiguity: cases where a single spoken phrase can have several distinct meanings. In such instances, Promptu's products automatically engage the user in a dialog to resolve the ambiguity and determine the user's intention.
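The decision to open a clarifying dialog can be sketched as a lookup that returns more than one meaning for a phrase. The place-name table and the `interpret` function below are hypothetical; they illustrate the idea, not Promptu's implementation.

```python
# Hypothetical place-name table: one spoken phrase can denote several places.
PLACES = {
    "springfield": ["Springfield, Illinois", "Springfield, Massachusetts", "Springfield, Missouri"],
    "boston": ["Boston, Massachusetts"],
}

def interpret(phrase: str) -> dict:
    """Accept a single meaning, ask the user to choose among several, or reject."""
    meanings = PLACES.get(phrase.lower(), [])
    if len(meanings) == 1:
        return {"action": "accept", "meaning": meanings[0]}
    if len(meanings) > 1:
        # Semantic ambiguity: start a dialog listing the candidate meanings.
        return {"action": "clarify", "prompt": "Which one did you mean?", "choices": meanings}
    return {"action": "reject"}

print(interpret("Springfield")["choices"][0])  # Springfield, Illinois
```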
Promptu has developed methods for the acoustic analysis of grammars, which automatically determine which listed phrases (if any) are acoustically confusable and therefore likely to cause trouble in practice. This allows Promptu to refine an application's grammars before live deployment, tuning them for the highest possible recognition accuracy.
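A common, simple proxy for acoustic confusability is the edit distance between two phrases' phoneme sequences: pairs that differ by only a phoneme or two are flagged for review. The sketch below assumes hypothetical, simplified ARPAbet-style pronunciations and unit edit costs; a production analysis would more likely weight substitutions by phonetic similarity, and Promptu's own method is not specified in the text.

```python
from itertools import combinations

# Hypothetical, simplified pronunciations for a small grammar.
PRONUNCIATIONS = {
    "austin": ["AO", "S", "T", "AH", "N"],
    "boston": ["B", "AO", "S", "T", "AH", "N"],
    "houston": ["HH", "Y", "UW", "S", "T", "AH", "N"],
    "denver": ["D", "EH", "N", "V", "ER"],
}

def phoneme_distance(a, b):
    """Levenshtein distance between two phoneme sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (pa != pb)))   # substitution
        prev = cur
    return prev[-1]

def confusable_pairs(prons, max_distance=1):
    """Flag phrase pairs whose pronunciations differ by at most max_distance phonemes."""
    return [
        (w1, w2)
        for (w1, p1), (w2, p2) in combinations(prons.items(), 2)
        if phoneme_distance(p1, p2) <= max_distance
    ]

print(confusable_pairs(PRONUNCIATIONS))  # [('austin', 'boston')]
```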
Open Dictation
Promptu's open dictation systems are unique in their use of enrollment. Before dictating any actual messages, users are prompted to repeat a few short phrases. The Promptu recognizer processes these phrases, called enrollment utterances, to adapt to the speaker's accent and speech patterns. This yields both higher accuracy and lower latency.
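The text does not say how Promptu uses enrollment utterances. As a stand-in, the sketch below shows one of the simplest speaker-adaptation ideas, a per-speaker feature offset similar in spirit to cepstral mean normalization: estimate how the speaker's average features differ from the model's average, then shift later input by that amount. Production systems typically use stronger methods such as MLLR or MAP adaptation; the function names and plain-list feature format are assumptions.

```python
# Feature vectors are plain lists of floats (e.g. cepstral coefficients per frame).

def enrollment_offset(enrollment_frames, model_mean):
    """Estimate a per-dimension offset between this speaker and the model's average."""
    dims = len(model_mean)
    n = len(enrollment_frames)
    speaker_mean = [sum(frame[d] for frame in enrollment_frames) / n for d in range(dims)]
    return [speaker_mean[d] - model_mean[d] for d in range(dims)]

def adapt(frames, offset):
    """Shift later frames so they better match the speaker-independent model."""
    return [[x - o for x, o in zip(frame, offset)] for frame in frames]

offset = enrollment_offset([[1.0, 2.0], [3.0, 4.0]], model_mean=[0.0, 1.0])
print(offset)                      # [2.0, 2.0]
print(adapt([[2.0, 3.0]], offset)) # [[0.0, 1.0]]
```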
Promptu Enhancements
Promptu purchases its core speech recognition technology from a commercial software vendor; this software is then enhanced according to Promptu's specifications.
Promptu's enhancements and modifications are of two kinds.
First, Promptu has developed custom acoustic models for selected languages. (An acoustic model is a mathematical template of speech sounds; the recognizer uses it to judge which word or words most closely match a user's spoken input.) These custom models are built both from scripted audio recordings targeting particular regions or accents and from utterances captured by Promptu's live, deployed products.
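Modern acoustic models are statistical (hidden Markov models or neural networks) rather than literal templates, but the "closest match" idea is easiest to see with classic template matching. The sketch below uses dynamic time warping (DTW), a swapped-in illustrative technique and not Promptu's model, to compare an utterance's feature frames with stored word templates of different lengths; the template data are hypothetical one-dimensional features.

```python
def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frame i of A and frame j of B.
            d = sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1])) ** 0.5
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def best_match(utterance, templates):
    """Return the word whose template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Hypothetical templates; the utterance is a time-stretched "yes".
TEMPLATES = {"yes": [[1.0], [2.0], [3.0]], "no": [[5.0], [5.0]]}
print(best_match([[1.0], [1.0], [2.0], [3.0], [3.0]], TEMPLATES))  # yes
```

DTW's warping lets a slowly or quickly spoken word still align with its template, which is the same tolerance to timing variation that statistical acoustic models provide in a more general way.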
Likewise, Promptu has collected a large corpus of text messages representing typical SMS content, and used it to develop custom language models. (A language model is a mathematical template of likely word sequences; the recognizer uses it to steer its output toward meaningful sentences.)
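A minimal illustration of a language model is a bigram model with add-one smoothing, trained on a few sentences. The three-message corpus below is hypothetical, and real SMS language models are far larger and use better smoothing; the point is that the model prefers "see you soon" over the acoustically similar "sea ewe soon".

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count history words and word pairs, with <s> and </s> boundary markers."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        histories.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    vocab = {w for s in sentences for w in s.lower().split()} | {"</s>"}
    return histories, bigrams, len(vocab)

def log_prob(sentence, model):
    """Add-one smoothed log probability of a sentence under the bigram model."""
    histories, bigrams, v = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(w1, w2)] + 1) / (histories[w1] + v))
        for w1, w2 in zip(words, words[1:])
    )

model = train_bigram(["see you at home", "see you soon", "on my way home"])
print(log_prob("see you soon", model) > log_prob("sea ewe soon", model))  # True
```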
Together, these custom model pairs (an acoustic model and a language model) yield higher recognition accuracy through improved handling of accents, adaptation to the exceptionally clean Promptu audio channel, and closer agreement with typical message content.
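The reason the two models are tuned as a pair follows from the standard statistical formulation of speech recognition, which is general and not specific to Promptu. Given acoustic observations A, the recognizer seeks the word sequence W that is most probable; P(A | W) is supplied by the acoustic model and P(W) by the language model. In practice decoders usually work with log probabilities and scale the language-model term by an empirically tuned weight λ.

```latex
\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\, P(W)

\hat{W} = \arg\max_{W} \bigl[ \log P(A \mid W) + \lambda \log P(W) \bigr]
```

Improving either factor (better accent coverage in P(A | W), more realistic message statistics in P(W)) changes which hypothesis wins the arg max.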
Second, Promptu has modified and augmented the vendor-supplied dictionaries, correcting or adding tens of thousands of word pronunciations.
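A pronunciation dictionary maps each word to one or more phoneme sequences, and corrections or additions can be layered over a vendor dictionary. The sketch below is a hypothetical illustration: the entries, the ARPAbet-style phonemes, and the rule that a custom entry fully replaces the vendor entry for the same word are all assumptions, not a description of Promptu's tooling.

```python
# Hypothetical vendor dictionary: word -> list of pronunciations.
VENDOR_LEXICON = {
    "route": [["R", "UW", "T"]],
    "tomato": [["T", "AH", "M", "EY", "T", "OW"]],
}

# Hypothetical custom entries: corrections, extra variants, and new words.
CUSTOM_LEXICON = {
    "route": [["R", "UW", "T"], ["R", "AW", "T"]],           # add a regional variant
    "lol": [["L", "AA", "L"], ["EH", "L", "OW", "EH", "L"]],  # new texting word
}

def merged_lexicon(vendor, custom):
    """Custom entries replace vendor entries for the same word; other words are kept."""
    merged = {word: [list(p) for p in prons] for word, prons in vendor.items()}
    merged.update({word: [list(p) for p in prons] for word, prons in custom.items()})
    return merged

lexicon = merged_lexicon(VENDOR_LEXICON, CUSTOM_LEXICON)
print(len(lexicon["route"]))  # 2
```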
Controlled studies demonstrate that these proprietary enhancements can cut the number of recognition errors in half.
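The text does not name the metric used in these studies; the usual one for dictation is word error rate (WER), the minimum number of word substitutions, insertions, and deletions divided by the number of reference words. Assuming WER, "cutting errors in half" would mean a relative reduction of 0.5. The transcripts below are hypothetical and are not Promptu data.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + insertions + deletions between two word strings."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = cur
    return prev[-1]

def word_error_rate(references, hypotheses):
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    return errors / sum(len(r.split()) for r in references)

def relative_reduction(baseline_wer, new_wer):
    """Fraction of baseline errors eliminated (0.5 means errors were halved)."""
    return (baseline_wer - new_wer) / baseline_wer

refs = ["running late be there soon", "call me when you land"]
hyps = ["running late bee their soon", "call me when you and"]
print(word_error_rate(refs, hyps))  # 0.3
```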
User Interface Design
Promptu's products feature innovative, best-in-class UI design to ensure a smooth, successful user experience.
All Promptu designs provide disambiguation. If the system is unsure of the contents of an utterance, or if the utterance was clearly recognized but could have several different meanings, the system asks the user for clarification. The former case is acoustic disambiguation; the latter is semantic disambiguation. Both are intrinsic elements of Promptu's UI technology and are available in every product. The clarification mechanism is a clickable graphic display that enables quick, decisive resolution of uncertainties.
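The choice between accepting a result, asking "what did you say?", and asking "which did you mean?" can be sketched from a recognizer's ranked hypotheses (an n-best list with confidence scores). The thresholds, the `Hypothesis` type, and the `next_step` function below are hypothetical; the returned candidate list is what a clickable clarification display would show.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    confidence: float  # recognizer confidence in [0, 1]

# Hypothetical thresholds, for illustration only.
MIN_CONFIDENCE = 0.6
MIN_MARGIN = 0.15

def next_step(nbest, meanings):
    """Return ("accept" | "acoustic" | "semantic", list of choices).

    nbest: hypotheses sorted by descending confidence.
    meanings: maps recognized text to its possible interpretations.
    """
    top = nbest[0]
    runner_up = nbest[1].confidence if len(nbest) > 1 else 0.0
    if top.confidence < MIN_CONFIDENCE or top.confidence - runner_up < MIN_MARGIN:
        # Acoustic disambiguation: unsure what was said.
        return ("acoustic", [h.text for h in nbest[:3]])
    senses = meanings.get(top.text, [top.text])
    if len(senses) > 1:
        # Semantic disambiguation: sure what was said, unsure what was meant.
        return ("semantic", senses)
    return ("accept", senses)

print(next_step([Hypothesis("call bob", 0.9), Hypothesis("call rob", 0.3)], {}))
# ('accept', ['call bob'])
```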
Promptu's products are truly multi-modal: spoken, button-press and touch commands may be freely intermixed to control the application. The user may choose whichever entry or control mode is most convenient. In particular, direct text input, by button or touch, is always available as an alternative to speech.
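One way to make input modes freely interchangeable is to normalize every event, whatever produced it, into a shared command vocabulary and process events in arrival order. The `InputEvent` type and the "text:", "clear", and "send" commands below are hypothetical, chosen only to illustrate that architecture.

```python
from dataclasses import dataclass
from typing import Literal

@dataclass
class InputEvent:
    mode: Literal["speech", "button", "touch"]
    command: str  # normalized command, e.g. "send" or "text:on my way"

def handle(events):
    """Apply events in arrival order; the input mode never changes the behavior."""
    draft, sent = [], []
    for event in events:
        if event.command.startswith("text:"):
            draft.append(event.command[len("text:"):])
        elif event.command == "clear":
            draft.clear()
        elif event.command == "send":
            sent.append(" ".join(draft))
            draft.clear()
    return sent

print(handle([
    InputEvent("speech", "text:running late"),  # dictated
    InputEvent("button", "text:be there at 6"), # typed
    InputEvent("touch", "send"),                # tapped
]))  # ['running late be there at 6']
```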
Promptu has also introduced techniques to improve the quality of the user's spoken audio. The Promptu speech capture interface provides synchronized visual and audio cues when activated, so the user knows just when the system is listening.
User Education
Every utterance presented to a Promptu server is analyzed in real time for common user speech input errors. If the system determines that the audio is faulty, the user is shown a helpful message, with advice on how to correct the problem. These "signal quality indicators," or SQIs, help users rapidly increase their competence in operating the product, and ensure a successful user experience.
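The specific checks behind Promptu's SQIs are not listed in the text. The sketch below assumes three common capture problems (clipping from speaking too loudly or too close, audio that is too quiet, and speech that starts before the system is listening) and maps each to advice. The thresholds, messages, and 16-bit, 16 kHz sample format are all hypothetical.

```python
import math

# Hypothetical thresholds for 16-bit PCM at 16 kHz; real values would be tuned.
CLIP_LEVEL = 32000
CLIP_FRACTION = 0.01
QUIET_RMS = 300.0
LEAD_IN_SAMPLES = 1600  # first 100 ms

def rms(samples):
    """Root-mean-square amplitude (0.0 for an empty list)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0

def signal_quality_messages(samples):
    """Return user-facing advice for common capture problems (empty list if none)."""
    if not samples:
        return ["No audio was received. Please try again."]
    messages = []
    clipped = sum(1 for s in samples if abs(s) >= CLIP_LEVEL)
    if clipped / len(samples) > CLIP_FRACTION:
        messages.append("You're too close to the microphone or speaking too loudly.")
    if rms(samples) < QUIET_RMS:
        messages.append("We could barely hear you. Please speak up.")
    if rms(samples[:LEAD_IN_SAMPLES]) >= QUIET_RMS:
        # Energy right at the start suggests the user began talking too early.
        messages.append("You started speaking too soon. Wait for the tone before talking.")
    return messages

print(signal_quality_messages([0] * 16000))  # ['We could barely hear you. Please speak up.']
```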
In addition, Promptu's message dictation products include brief, entertaining animated videos, which teach the user how to avoid common mistakes.