{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n# Audio Feature Extractions\n\n**Author**: [Moto Hira](moto@meta.com)\n\n``torchaudio`` implements feature extractions commonly used in the audio\ndomain. They are available in ``torchaudio.functional`` and\n``torchaudio.transforms``.\n\n``functional`` implements features as standalone, stateless functions.\n\n``transforms`` implements features as objects,\nbuilding on the implementations in ``functional`` and on ``torch.nn.Module``.\nThey can be serialized using TorchScript.\n"
]
},
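{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch of the difference (the ``n_fft`` value and the dummy waveform below are illustrative, not part of this tutorial): a ``transforms`` object is a ``torch.nn.Module``, so it is instantiated once with its parameters, can be reused like a function, and can be serialized with TorchScript.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nimport torchaudio.transforms as T\n\n# Sketch only: parameter values here are illustrative.\nwaveform = torch.randn(1, 16000)  # dummy 1-second signal at 16 kHz\n\n# A transform is a stateful nn.Module: configure once, call many times.\nspec_transform = T.Spectrogram(n_fft=400)\nspec = spec_transform(waveform)\nprint(spec.shape)  # (channel, n_fft // 2 + 1, time_frames)\n\n# Because it is an nn.Module, it can be serialized via TorchScript.\nscripted = torch.jit.script(spec_transform)\nprint(type(scripted))"
]
},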
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import torch\nimport torchaudio\nimport torchaudio.functional as F\nimport torchaudio.transforms as T\n\nprint(torch.__version__)\nprint(torchaudio.__version__)\n\nimport librosa\nimport matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview of audio features\n\nThe following diagram shows the relationship between common audio features\nand the torchaudio APIs used to generate them.\n\nFor the complete list of available features, please refer to the\ndocumentation.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Preparation\n\nWhen running this tutorial in Google Colab, install the required packages with:\n\n```\n!pip install librosa\n```\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"``hop_length`` determines the time axis resolution.\nBy default (i.e., ``hop_length=None`` and ``win_length=None``),\nthe value of ``n_fft // 4`` is used.\nHere we use the same ``hop_length`` value across different ``n_fft`` values\nso that the results have the same number of elements along the time axis.