{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Device ASR with Emformer RNN-T\n\n**Author**: [Moto Hira](moto@meta.com), [Jeff Hwang](jeffhwang@meta.com)\n\nThis tutorial shows how to use Emformer RNN-T and the streaming API\nto perform speech recognition on streaming device input, such as the\nmicrophone of a laptop.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
This tutorial requires FFmpeg libraries.\n Please refer to `FFmpeg dependency
This tutorial was tested on a MacBook Pro and on a Dynabook running Windows 10.\n\n This tutorial does NOT work on Google Colab because the server running\n the notebook does not have a microphone that you can talk to.
The proper value of ``backoff`` depends on the system configuration.\n One way to check whether the ``backoff`` value is appropriate is to save the\n series of acquired chunks as one continuous audio file and listen to it.\n If the ``backoff`` value is too large, the data stream becomes discontinuous\n and the resulting audio sounds sped up.\n If the ``backoff`` value is too small or zero, the audio stream is fine,\n but the data acquisition process enters a busy-waiting state, which\n increases CPU usage.
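The listening check described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the tutorial's own code: the helper name `save_chunks_as_wav` is hypothetical, each chunk is assumed to be a plain sequence of mono float samples in [-1, 1], and the 16 kHz sample rate is an assumed default that should be replaced with the rate your device actually delivers.

```python
import os
import struct
import tempfile
import wave


def save_chunks_as_wav(chunks, path, sample_rate=16000):
    """Write mono float chunks (values in [-1, 1]) back to back as 16-bit PCM.

    Hypothetical helper: the name, the chunk layout and the default
    sample rate are assumptions made for this sketch.
    """
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # Clip to [-1, 1], then scale to the signed 16-bit range.
            pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in chunk]
            wav.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Usage with two dummy chunks of four samples each.
demo_chunks = [[0.0, 0.5, -0.5, 1.0], [0.25, -0.25, 0.0, -1.0]]
demo_path = os.path.join(tempfile.mkdtemp(), "acquired.wav")
save_chunks_as_wav(demo_chunks, demo_path)

# Read the file back to confirm that the chunks were concatenated.
with wave.open(demo_path, "rb") as wav:
    n_frames = wav.getnframes()
    frame_rate = wav.getframerate()
```

With real data, pass the chunks collected from the acquisition process instead of `demo_chunks` and play the resulting file. Audio that sounds sped up suggests that ``backoff`` is too large.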
As the data acquisition subprocess will be launched with the `\"spawn\"`\n method, all code at global scope is executed in the subprocess\n as well.\n\n We want to instantiate the pipeline only in the main process,\n so we put it in a function and invoke it within an\n `if __name__ == \"__main__\"` guard.
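The guard pattern described above can be illustrated with a small standalone script. This is only a sketch of the general technique, assuming nothing about the tutorial's actual pipeline: `produce_chunks` is a hypothetical stand-in for the data acquisition subprocess, and `main` is a hypothetical stand-in for the function that builds the pipeline.

```python
import multiprocessing as mp


def produce_chunks(queue, num_chunks):
    # Hypothetical stand-in for the data acquisition subprocess.
    for i in range(num_chunks):
        queue.put(i)
    queue.put(None)  # sentinel: no more chunks


def main(num_chunks=3):
    # Everything that should exist only in the main process lives here.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    worker = ctx.Process(target=produce_chunks, args=(queue, num_chunks))
    worker.start()

    received = []
    while True:
        item = queue.get()
        if item is None:
            break
        received.append(item)

    worker.join()
    return received


# With "spawn", the child process re-imports this module, so any
# unguarded call to main() here would run again inside the child.
if __name__ == "__main__":
    print(main())
```

Save the sketch as a script and run it directly. Without the guard, the child process would call `main()` again while it imports the module, and `multiprocessing` would raise a `RuntimeError` instead of starting the worker.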