{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# StreamReader Advanced Usages\n\n**Author**: [Moto Hira](moto@meta.com)_\n\nThis tutorial is the continuation of\n[StreamReader Basic Usages](./streamreader_basic_tutorial.html)_.\n\nThis shows how to use :py:class:`~torchaudio.io.StreamReader` for\n\n- Device inputs, such as microphone, webcam and screen recording\n- Generating synthetic audio / video\n- Applying preprocessing with custom filter expressions\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\nimport torchaudio\n\nprint(torch.__version__)\nprint(torchaudio.__version__)\n\nimport IPython\nimport matplotlib.pyplot as plt\nfrom torchaudio.io import StreamReader\n\nbase_url = \"https://download.pytorch.org/torchaudio/tutorial-assets\"\nAUDIO_URL = f\"{base_url}/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav\"\nVIDEO_URL = f\"{base_url}/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4.mp4\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Audio / Video device input\n\n.. seealso::\n\n - [Accelerated Video Decoding with NVDEC](../hw_acceleration_tutorial.html)_.\n - [Online ASR with Emformer RNN-T](./online_asr_tutorial.html)_.\n - [Device ASR with Emformer RNN-T](./device_asr.html)_.\n\nGiven that the system has proper media devices and libavdevice is\nconfigured to use the devices, the streaming API can\npull media streams from these devices.\n\nTo do this, we pass additional parameters ``format`` and ``option``\nto the constructor. ``format`` specifies the device component and\n``option`` dictionary is specific to the specified component.\n\nThe exact arguments to be passed depend on the system configuration.\nPlease refer to https://ffmpeg.org/ffmpeg-devices.html for the detail.\n\nThe following example illustrates how one can do this on MacBook Pro.\n\nFirst, we need to check the available devices.\n\n.. code::\n\n $ ffmpeg -f avfoundation -list_devices true -i \"\"\n [AVFoundation indev @ 0x143f04e50] AVFoundation video devices:\n [AVFoundation indev @ 0x143f04e50] [0] FaceTime HD Camera\n [AVFoundation indev @ 0x143f04e50] [1] Capture screen 0\n [AVFoundation indev @ 0x143f04e50] AVFoundation audio devices:\n [AVFoundation indev @ 0x143f04e50] [0] MacBook Pro Microphone\n\nWe use `FaceTime HD Camera` as video device (index 0) and\n`MacBook Pro Microphone` as audio device (index 0).\n\nIf we do not pass any ``option``, the device uses its default\nconfiguration. The decoder might not support the configuration.\n\n.. code::\n\n >>> StreamReader(\n ... src=\"0:0\", # The first 0 means `FaceTime HD Camera`, and\n ... # the second 0 indicates `MacBook Pro Microphone`.\n ... format=\"avfoundation\",\n ... )\n [avfoundation @ 0x125d4fe00] Selected framerate (29.970030) is not supported by the device.\n [avfoundation @ 0x125d4fe00] Supported modes:\n [avfoundation @ 0x125d4fe00] 1280x720@[1.000000 30.000000]fps\n [avfoundation @ 0x125d4fe00] 640x480@[1.000000 30.000000]fps\n Traceback (most recent call last):\n File \"\", line 1, in \n ...\n RuntimeError: Failed to open the input: 0:0\n\nBy providing ``option``, we can change the format that the device\nstreams to a format supported by decoder.\n\n.. code::\n\n >>> streamer = StreamReader(\n ... src=\"0:0\",\n ... format=\"avfoundation\",\n ... option={\"framerate\": \"30\", \"pixel_format\": \"bgr0\"},\n ... )\n >>> for i in range(streamer.num_src_streams):\n ... 
{ "cell_type": "markdown", "metadata": {}, "source": [ "\n## Synthetic source streams\n\nAs a part of its device integration, ffmpeg provides a \"virtual device\"\ninterface, which generates synthetic audio / video data\nusing libavfilter.\n\nTo use this, we set ``format=lavfi`` and provide a filter description\nto ``src``.\n\nThe details of the filter description syntax can be found at\nhttps://ffmpeg.org/ffmpeg-filters.html\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Audio Examples\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Sine wave\nhttps://ffmpeg.org/ffmpeg-filters.html#sine\n\n.. code::\n\n    StreamReader(src=\"sine=sample_rate=8000:frequency=360\", format=\"lavfi\")\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Signal with arbitrary expression\n\nhttps://ffmpeg.org/ffmpeg-filters.html#aevalsrc\n\n.. code::\n\n    # 5 Hz binaural beats on a 360 Hz carrier\n    StreamReader(\n        src=(\n            'aevalsrc='\n            'sample_rate=8000:'\n            'exprs=0.1*sin(2*PI*(360-5/2)*t)|0.1*sin(2*PI*(360+5/2)*t)'\n        ),\n        format='lavfi',\n    )\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Noise\nhttps://ffmpeg.org/ffmpeg-filters.html#anoisesrc\n\n.. code::\n\n    StreamReader(src=\"anoisesrc=color=pink:sample_rate=8000:amplitude=0.5\", format=\"lavfi\")\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video Examples\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Cellular automaton\nhttps://ffmpeg.org/ffmpeg-filters.html#cellauto\n\n.. code::\n\n    StreamReader(src=\"cellauto\", format=\"lavfi\")\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Mandelbrot\nhttps://ffmpeg.org/ffmpeg-filters.html#mandelbrot\n\n.. code::\n\n    StreamReader(src=\"mandelbrot\", format=\"lavfi\")\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### MPlayer Test patterns\nhttps://ffmpeg.org/ffmpeg-filters.html#mptestsrc\n\n.. code::\n\n    StreamReader(src=\"mptestsrc\", format=\"lavfi\")\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### John Conway's Game of Life\nhttps://ffmpeg.org/ffmpeg-filters.html#life\n\n.. code::\n\n    StreamReader(src=\"life\", format=\"lavfi\")\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Sierpinski carpet/triangle fractal\nhttps://ffmpeg.org/ffmpeg-filters.html#sierpinski\n\n.. code::\n\n    StreamReader(src=\"sierpinski\", format=\"lavfi\")\n\n\n" ] },
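{ "cell_type": "markdown", "metadata": {}, "source": [ "The snippets above only construct the readers. As a minimal sketch of\nactually pulling data from a synthetic source (assuming the local ffmpeg\nbuild includes the ``sine`` filter), one can stream from it like from any\nother source:\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# A minimal sketch: pull one chunk from a synthetic sine source.\n# Assumes the local ffmpeg build provides the lavfi \"sine\" source.\nstreamer = StreamReader(src=\"sine=sample_rate=8000:frequency=360\", format=\"lavfi\")\nstreamer.add_basic_audio_stream(frames_per_chunk=8000)  # one-second chunks\n(chunk,) = next(streamer.stream())\nprint(chunk.shape)  # frames x channels, e.g. torch.Size([8000, 1])" ] },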
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Custom filters\n\nWhen defining an output stream, you can use the\n:py:meth:`~torchaudio.io.StreamReader.add_audio_stream` and\n:py:meth:`~torchaudio.io.StreamReader.add_video_stream` methods.\n\nThese methods take a ``filter_desc`` argument, which is a string\nformatted according to ffmpeg's\n[filter expression](https://ffmpeg.org/ffmpeg-filters.html) syntax.\n\nThe difference between ``add_basic_(audio|video)_stream`` and\n``add_(audio|video)_stream`` is that ``add_basic_(audio|video)_stream``\nconstructs the filter expression for you and passes it to the same underlying\nimplementation. Everything ``add_basic_(audio|video)_stream`` does can be\nachieved with ``add_(audio|video)_stream``.\n\n
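For example, the following two calls are roughly equivalent (a sketch;\nthe exact expression that ``add_basic_audio_stream`` generates internally\nmay differ slightly):\n\n.. code::\n\n    # The helper builds the filter expression internally\n    streamer.add_basic_audio_stream(frames_per_chunk=8000, sample_rate=8000)\n\n    # Spelling out a comparable filter expression manually\n    streamer.add_audio_stream(\n        frames_per_chunk=8000,\n        filter_desc=\"aresample=8000,aformat=sample_fmts=fltp\",\n    )\n\n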
.. note::\n\n   - When applying custom filters, the client code must convert\n     the audio/video stream to one of the formats that torchaudio\n     can convert to tensor format.\n     This can be achieved, for example, by applying\n     ``format=pix_fmts=rgb24`` to the video stream and\n     ``aformat=sample_fmts=fltp`` to the audio stream.\n   - Each output stream has a separate filter graph. Therefore, it is\n     not possible to use different input/output streams in a single\n     filter expression. However, it is possible to split one input\n     stream into multiple streams, and merge them later.
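\n\nAs an illustration of the first point, the video examples later in this\nsection always end the filter chain with a pixel-format conversion.\nSpelled out for one of them:\n\n.. code::\n\n    streamer.add_video_stream(\n        frames_per_chunk=30,\n        filter_desc=\"fps=10,edgedetect=mode=canny,format=pix_fmts=rgb24\",\n    )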
\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Audio Examples\n\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# fmt: off\ndescs = [\n # No filtering\n \"anull\",\n # Apply a highpass filter then a lowpass filter\n \"highpass=f=200,lowpass=f=1000\",\n # Manipulate spectrogram\n (\n \"afftfilt=\"\n \"real='hypot(re,im)*sin(0)':\"\n \"imag='hypot(re,im)*cos(0)':\"\n \"win_size=512:\"\n \"overlap=0.75\"\n ),\n # Manipulate spectrogram\n (\n \"afftfilt=\"\n \"real='hypot(re,im)*cos((random(0)*2-1)*2*3.14)':\"\n \"imag='hypot(re,im)*sin((random(1)*2-1)*2*3.14)':\"\n \"win_size=128:\"\n \"overlap=0.8\"\n ),\n]\n# fmt: on" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "sample_rate = 8000\n\nstreamer = StreamReader(AUDIO_URL)\nfor desc in descs:\n streamer.add_audio_stream(\n frames_per_chunk=40000,\n filter_desc=f\"aresample={sample_rate},{desc},aformat=sample_fmts=fltp\",\n )\n\nchunks = next(streamer.stream())\n\n\ndef _display(i):\n print(\"filter_desc:\", streamer.get_out_stream_info(i).filter_description)\n fig, axs = plt.subplots(2, 1)\n waveform = chunks[i][:, 0]\n axs[0].plot(waveform)\n axs[0].grid(True)\n axs[0].set_ylim([-1, 1])\n plt.setp(axs[0].get_xticklabels(), visible=False)\n axs[1].specgram(waveform, Fs=sample_rate)\n fig.tight_layout()\n return IPython.display.Audio(chunks[i].T, rate=sample_rate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Original\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_display(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Highpass / lowpass filter\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_display(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### FFT filter - Robot \ud83e\udd16\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_display(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### FFT filter - Whisper\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_display(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Video Examples\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# fmt: off\ndescs = [\n # No effect\n \"null\",\n # Split the input stream and apply horizontal flip to the right half.\n (\n \"split [main][tmp];\"\n \"[tmp] crop=iw/2:ih:0:0, hflip [flip];\"\n \"[main][flip] overlay=W/2:0\"\n ),\n # Edge detection\n \"edgedetect=mode=canny\",\n # Rotate image by randomly and fill the background with brown\n \"rotate=angle=-random(1)*PI:fillcolor=brown\",\n # Manipulate pixel values based on the coordinate\n \"geq=r='X/W*r(X,Y)':g='(1-X/W)*g(X,Y)':b='(H-Y)/H*b(X,Y)'\"\n]\n# fmt: on" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "streamer = StreamReader(VIDEO_URL)\nfor desc in descs:\n streamer.add_video_stream(\n frames_per_chunk=30,\n filter_desc=f\"fps=10,{desc},format=pix_fmts=rgb24\",\n )\n\nstreamer.seek(12)\n\nchunks = next(streamer.stream())\n\n\ndef _display(i):\n print(\"filter_desc:\", 
    print(\"filter_desc:\", streamer.get_out_stream_info(i).filter_description)\n    _, axs = plt.subplots(1, 3, figsize=(8, 1.9))\n    chunk = chunks[i]\n    for j in range(3):\n        axs[j].imshow(chunk[10 * j + 1].permute(1, 2, 0))\n        axs[j].set_axis_off()\n    plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Original\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_display(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Mirror\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_display(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Edge detection\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_display(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Random rotation\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_display(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Pixel manipulation\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "_display(4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tag: :obj:`torchaudio.io`\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 0 }