{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Audio Resampling\n\n**Author**: [Caroline Chen](carolinechen@meta.com), [Moto Hira](moto@meta.com)\n\nThis tutorial shows how to use torchaudio's resampling API.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import torch\nimport torchaudio\nimport torchaudio.functional as F\nimport torchaudio.transforms as T\n\nprint(torch.__version__)\nprint(torchaudio.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Preparation\n\nFirst, we import the modules and define the helper functions.\n\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import math\nimport timeit\n\nimport librosa\nimport matplotlib.colors as mcolors\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport resampy\nfrom IPython.display import Audio\n\npd.set_option(\"display.max_rows\", None)\npd.set_option(\"display.max_columns\", None)\n\nDEFAULT_OFFSET = 201\n\n\ndef _get_log_freq(sample_rate, max_sweep_rate, offset):\n    \"\"\"Get freqs evenly spaced out in log-scale, between [0, max_sweep_rate // 2]\n\n    offset is added to avoid negative infinity in `log(offset + x)`.\n\n    \"\"\"\n    start, stop = math.log(offset), math.log(offset + max_sweep_rate // 2)\n    return torch.exp(torch.linspace(start, stop, sample_rate, dtype=torch.double)) - offset\n\n\ndef _get_inverse_log_freq(freq, sample_rate, offset):\n    \"\"\"Find the time at which _get_log_freq yields the given frequency\"\"\"\n    half = sample_rate // 2\n    return sample_rate * (math.log(1 + freq / offset) / math.log(1 + half / offset))\n\n\ndef _get_freq_ticks(sample_rate, offset, f_max):\n    # Given the original sample rate used for generating the sweep,\n    # find the x-axis value where the log-scale major frequency values fall in\n    times, freq = [], []\n    for exp in range(2, 5):\n        for v in range(1, 10):\n            f = v * 
10**exp\n            if f < sample_rate // 2:\n                t = _get_inverse_log_freq(f, sample_rate, offset) / sample_rate\n                times.append(t)\n                freq.append(f)\n    t_max = _get_inverse_log_freq(f_max, sample_rate, offset) / sample_rate\n    times.append(t_max)\n    freq.append(f_max)\n    return times, freq\n\n\ndef get_sine_sweep(sample_rate, offset=DEFAULT_OFFSET):\n    max_sweep_rate = sample_rate\n    freq = _get_log_freq(sample_rate, max_sweep_rate, offset)\n    delta = 2 * math.pi * freq / sample_rate\n    cumulative = torch.cumsum(delta, dim=0)\n    signal = torch.sin(cumulative).unsqueeze(dim=0)\n    return signal\n\n\ndef plot_sweep(\n    waveform,\n    sample_rate,\n    title,\n    max_sweep_rate=48000,\n    offset=DEFAULT_OFFSET,\n):\n    x_ticks = [100, 500, 1000, 5000, 10000, 20000, max_sweep_rate // 2]\n    y_ticks = [1000, 5000, 10000, 20000, sample_rate // 2]\n\n    time, freq = _get_freq_ticks(max_sweep_rate, offset, sample_rate // 2)\n    freq_x = [f if f in x_ticks and f <= max_sweep_rate // 2 else None for f in freq]\n    freq_y = [f for f in freq if f in y_ticks and 1000 <= f <= sample_rate // 2]\n\n    figure, axis = plt.subplots(1, 1)\n    _, _, _, cax = axis.specgram(waveform[0].numpy(), Fs=sample_rate)\n    plt.xticks(time, freq_x)\n    plt.yticks(freq_y, freq_y)\n    axis.set_xlabel(\"Original Signal Frequency (Hz, log scale)\")\n    axis.set_ylabel(\"Waveform Frequency (Hz)\")\n    axis.xaxis.grid(True, alpha=0.67)\n    axis.yaxis.grid(True, alpha=0.67)\n    figure.suptitle(f\"{title} (sample rate: {sample_rate} Hz)\")\n    plt.colorbar(cax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Resampling Overview\n\nTo resample an audio waveform from one sampling frequency to another, you can use\n``torchaudio.transforms.Resample`` or\n``torchaudio.functional.resample``.\n``transforms.Resample`` precomputes and caches the kernel used for resampling,\nwhile ``functional.resample`` computes it on the fly, so\n``torchaudio.transforms.Resample`` is faster when resampling\nmultiple waveforms with the same 
parameters (see the Benchmarking section).\n\nBoth resampling methods use [bandlimited sinc\ninterpolation](https://ccrma.stanford.edu/~jos/resample/) to compute\nsignal values at arbitrary time steps. The implementation involves\nconvolution, so we can take advantage of GPU / multithreading for\nperformance improvements.\n\n
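To make the idea of bandlimited sinc interpolation concrete, here is a minimal pure-Python sketch. It is not torchaudio's implementation: the function names `sinc` and `resample_sinc`, the Hann window, and the `zero_crossings` parameter are assumptions chosen for illustration. Each output sample is a weighted sum of nearby input samples, which is why the operation can be expressed as a convolution.

```python
import math


def sinc(x):
    # Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1.
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def resample_sinc(samples, orig_freq, new_freq, zero_crossings=16):
    # Naive Hann-windowed sinc resampler (hypothetical helper, not optimized).
    # Low-pass at the lower of the two Nyquist rates to limit aliasing
    # when downsampling; scale is 1.0 when upsampling.
    scale = min(1.0, new_freq / orig_freq)
    # Half-width of the kernel, measured in input samples.
    support = zero_crossings / scale
    num_out = math.ceil(len(samples) * new_freq / orig_freq)
    out = []
    for m in range(num_out):
        t = m * orig_freq / new_freq  # output instant, in input-sample units
        lo = max(0, math.ceil(t - support))
        hi = min(len(samples) - 1, math.floor(t + support))
        acc = 0.0
        for n in range(lo, hi + 1):
            d = t - n
            window = 0.5 * (1.0 + math.cos(math.pi * d / support))  # Hann
            acc += samples[n] * scale * sinc(scale * d) * window
        out.append(acc)
    return out


# Usage: upsample a 100 Hz tone from 1 kHz to 1.5 kHz.
orig_freq, new_freq = 1000, 1500
tone = [math.sin(2 * math.pi * 100 * n / orig_freq) for n in range(200)]
resampled = resample_sinc(tone, orig_freq, new_freq)
print(len(tone), '->', len(resampled))
```

The double loop is written for readability only; a practical implementation would precompute the kernel and apply it as a strided convolution.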
When using resampling in multiple subprocesses, such as data loading\n with multiple worker processes, your application might create more\n threads than your system can handle efficiently.\n Setting ``torch.set_num_threads(1)`` might help in this case.