{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Oscillator and ADSR envelope\n\n**Author**: [Moto Hira](moto@meta.com)_\n\nThis tutorial shows how to synthesize various waveforms using\n:py:func:`~torchaudio.prototype.functional.oscillator_bank` and\n:py:func:`~torchaudio.prototype.functional.adsr_envelope`.\n\n

Warning

This tutorial requires prototype DSP features, which are\n available in nightly builds.\n\n Please refer to https://pytorch.org/get-started/locally\n for instructions for installing a nightly build.

\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\nimport torchaudio\n\nprint(torch.__version__)\nprint(torchaudio.__version__)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "try:\n from torchaudio.prototype.functional import adsr_envelope, oscillator_bank\nexcept ModuleNotFoundError:\n print(\n \"Failed to import prototype DSP features. \"\n \"Please install torchaudio nightly builds. \"\n \"Please refer to https://pytorch.org/get-started/locally \"\n \"for instructions to install a nightly build.\"\n )\n raise\n\nimport math\n\nimport matplotlib.pyplot as plt\nfrom IPython.display import Audio\n\nPI = torch.pi\nPI2 = 2 * torch.pi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Oscillator Bank\n\nSinusoidal oscillator generates sinusoidal waveforms from given\namplitudes and frequencies.\n\n\\begin{align}x_t = A_t \\sin \\theta_t\\end{align}\n\nWhere the phase $\\theta_t$ is found by integrating the instantaneous\nfrequency $f_t$.\n\n\\begin{align}\\theta_t = \\sum_{k=1}^{t} f_k\\end{align}\n\n

Note

Why integrate the frequencies? Instantaneous frequency represents the velocity\n of oscillation at given time. So integrating the instantaneous frequency gives\n the displacement of the phase of the oscillation, since the start.\n In discrete-time signal processing, integration becomes accumulation.\n In PyTorch, accumulation can be computed using :py:func:`torch.cumsum`.

\n\n:py:func:`torchaudio.prototype.functional.oscillator_bank` generates a bank of\nsinsuoidal waveforms from amplitude envelopes and instantaneous frequencies.\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simple Sine Wave\n\nLet's start with simple case.\n\nFirst, we generate sinusoidal wave that has constant frequency and\namplitude everywhere, that is, a regular sine wave.\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We define some constants and helper function that we use for\nthe rest of the tutorial.\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "F0 = 344.0 # fundamental frequency\nDURATION = 1.1 # [seconds]\nSAMPLE_RATE = 16_000 # [Hz]\n\nNUM_FRAMES = int(DURATION * SAMPLE_RATE)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "def show(freq, amp, waveform, sample_rate, zoom=None, vol=0.3):\n t = (torch.arange(waveform.size(0)) / sample_rate).numpy()\n\n fig, axes = plt.subplots(4, 1, sharex=True)\n axes[0].plot(t, freq.numpy())\n axes[0].set(title=f\"Oscillator bank (bank size: {amp.size(-1)})\", ylabel=\"Frequency [Hz]\", ylim=[-0.03, None])\n axes[1].plot(t, amp.numpy())\n axes[1].set(ylabel=\"Amplitude\", ylim=[-0.03 if torch.all(amp >= 0.0) else None, None])\n axes[2].plot(t, waveform.numpy())\n axes[2].set(ylabel=\"Waveform\")\n axes[3].specgram(waveform, Fs=sample_rate)\n axes[3].set(ylabel=\"Spectrogram\", xlabel=\"Time [s]\", xlim=[-0.01, t[-1] + 0.01])\n\n for i in range(4):\n axes[i].grid(True)\n pos = axes[2].get_position()\n plt.tight_layout()\n\n if zoom is not None:\n ax = fig.add_axes([pos.x0 + 0.01, pos.y0 + 0.03, pos.width / 2.5, pos.height / 2.0])\n ax.plot(t, waveform)\n ax.set(xlim=zoom, xticks=[], yticks=[])\n\n waveform /= waveform.abs().max()\n return Audio(vol * waveform, rate=sample_rate, normalize=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we synthesize the audio with constant frequency and amplitude\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "freq = torch.full((NUM_FRAMES, 1), F0)\namp = torch.ones((NUM_FRAMES, 1))\n\nwaveform = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)\n\nshow(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Combining multiple sine waves\n\n:py:func:`~torchaudio.prototype.functional.oscillator_bank` can\ncombine an arbitrary number of sinusoids to generate a waveform.\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "freq = torch.empty((NUM_FRAMES, 3))\nfreq[:, 0] = F0\nfreq[:, 1] = 3 * F0\nfreq[:, 2] = 5 * F0\n\namp = torch.ones((NUM_FRAMES, 3)) / 3\n\nwaveform = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)\n\nshow(freq, amp, waveform, SAMPLE_RATE, zoom=(1 / F0, 3 / F0))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing Frequencies across time\n\nLet's change the frequency over time. Here, we change the frequency\nfrom zero to the Nyquist frequency (half of the sample rate) in\nlog-scale so that it is easy to see the change in waveform.\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "nyquist_freq = SAMPLE_RATE / 2\nfreq = torch.logspace(0, math.log(0.99 * nyquist_freq, 10), NUM_FRAMES).unsqueeze(-1)\namp = torch.ones((NUM_FRAMES, 1))\n\nwaveform = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)\n\nshow(freq, amp, waveform, SAMPLE_RATE, vol=0.2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also oscillate frequency.\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "fm = 2.5 # rate at which the frequency oscillates\nf_dev = 0.9 * F0 # the degree of frequency oscillation\n\nfreq = F0 + f_dev * torch.sin(torch.linspace(0, fm * PI2 * DURATION, NUM_FRAMES))\nfreq = freq.unsqueeze(-1)\n\namp = torch.ones((NUM_FRAMES, 1))\n\nwaveform = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)\n\nshow(freq, amp, waveform, SAMPLE_RATE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ADSR Envelope\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we change the amplitude over time. A common technique to model\namplitude is ADSR Envelope.\n\nADSR stands for Attack, Decay, Sustain, and Release.\n\n - `Attack` is the time it takes to reach from zero to the top level.\n - `Decay` is the time it takes from the top to reach sustain level.\n - `Sustain` is the level at which the level stays constant.\n - `Release` is the time it takes to drop to zero from sustain level.\n\nThere are many variants of ADSR model, additionally, some models have\nthe following properties\n\n - `Hold`: The time the level stays at the top level after attack.\n - non-linear decay/release: The decay and release take non-linear change.\n\n:py:class:`~torchaudio.prototype.functional.adsr_envelope` supports\nhold and polynomial decay.\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "freq = torch.full((SAMPLE_RATE, 1), F0)\namp = adsr_envelope(\n SAMPLE_RATE,\n attack=0.2,\n hold=0.2,\n decay=0.2,\n sustain=0.5,\n release=0.2,\n n_decay=1,\n)\namp = amp.unsqueeze(-1)\n\nwaveform = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)\n\naudio = show(freq, amp, waveform, SAMPLE_RATE)\nax = plt.gcf().axes[1]\nax.annotate(\"Attack\", xy=(0.05, 0.7))\nax.annotate(\"Hold\", xy=(0.28, 0.65))\nax.annotate(\"Decay\", xy=(0.45, 0.5))\nax.annotate(\"Sustain\", xy=(0.65, 0.3))\nax.annotate(\"Release\", xy=(0.88, 0.35))\naudio" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's look into some examples of how ADSR envelope can be used\nto create different sounds.\n\nThe following examples are inspired by\n[this article](https://www.edmprod.com/adsr-envelopes/)_.\n\n\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Drum Beats\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "unit = NUM_FRAMES // 3\nrepeat = 9\n\nfreq = torch.empty((unit * repeat, 2))\nfreq[:, 0] = F0 / 9\nfreq[:, 1] = F0 / 5\n\namp = torch.stack(\n (\n adsr_envelope(unit, attack=0.01, hold=0.125, decay=0.12, sustain=0.05, release=0),\n adsr_envelope(unit, attack=0.01, hold=0.25, decay=0.08, sustain=0, release=0),\n ),\n dim=-1,\n)\namp = amp.repeat(repeat, 1) / 2\n\nbass = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)\n\nshow(freq, amp, bass, SAMPLE_RATE, vol=0.5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Pluck\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "tones = [\n 513.74, # do\n 576.65, # re\n 647.27, # mi\n 685.76, # fa\n 769.74, # so\n 685.76, # fa\n 647.27, # mi\n 576.65, # re\n 513.74, # do\n]\n\nfreq = torch.cat([torch.full((unit, 1), tone) for tone in tones], dim=0)\namp = adsr_envelope(unit, attack=0, decay=0.7, sustain=0.28, release=0.29)\namp = amp.repeat(9).unsqueeze(-1)\n\ndoremi = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)\n\nshow(freq, amp, doremi, SAMPLE_RATE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Riser\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "env = adsr_envelope(NUM_FRAMES * 6, attack=0.98, decay=0.0, sustain=1, release=0.02)\n\ntones = [\n 484.90, # B4\n 513.74, # C5\n 576.65, # D5\n 1221.88, # D#6/Eb6\n 3661.50, # A#7/Bb7\n 6157.89, # G8\n]\nfreq = torch.stack([f * env for f in tones], dim=-1)\n\namp = env.unsqueeze(-1).expand(freq.shape) / len(tones)\n\nwaveform = oscillator_bank(freq, amp, sample_rate=SAMPLE_RATE)\n\nshow(freq, amp, waveform, SAMPLE_RATE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n\n- https://www.edmprod.com/adsr-envelopes/\n- https://pages.mtu.edu/~suits/notefreq432.html\n- https://alijamieson.co.uk/2021/12/19/forgive-me-lord-for-i-have-synth-a-guide-to-subtractive-synthesis/\n\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.14" } }, "nbformat": 4, "nbformat_minor": 0 }