fast-wavenet.pytorch - A PyTorch implementation of fast-wavenet


This repo is currently incomplete, although I do hope to get back to working on it. Notably, it does not yet have a fast autoregressive forward (generation) function. I created a similar repo for ByteNet, a model closely related to WaveNet, and that repo does have an autoregressive forward function.
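Fast WaveNet generation rests on one idea: cache each dilated layer's past inputs in a queue instead of recomputing the whole receptive field for every new sample. The sketch below is a hypothetical, scalar pure-Python illustration, not code from this repo. It builds a stack of kernel-size-2 dilated causal layers, gives each a FIFO whose length equals its dilation, and checks that the cached, sample-by-sample version matches naive recomputation. The weights and the `tanh` layer function are arbitrary assumptions.

```python
from collections import deque
import math

def layer_fn(w_cur, w_past, cur, past):
    # One kernel-size-2 dilated causal "conv" (two taps) followed by tanh.
    return math.tanh(w_cur * cur + w_past * past)

def naive_outputs(xs, dilations, weights):
    """Recompute every layer over the full history (zero-padded on the left)."""
    h = list(xs)
    for d, (wc, wp) in zip(dilations, weights):
        h = [layer_fn(wc, wp, h[t], h[t - d] if t >= d else 0.0) for t in range(len(h))]
    return h

def fast_outputs(xs, dilations, weights):
    """Process one sample at a time, caching each layer's inputs in a queue."""
    # The queue for a layer with dilation d holds its last d inputs, starting as zeros.
    queues = [deque([0.0] * d, maxlen=d) for d in dilations]
    out = []
    for x in xs:
        h = x
        for q, (wc, wp) in zip(queues, weights):
            past = q[0]      # this layer's input from exactly d steps ago
            q.append(h)      # push the current input; maxlen drops the oldest
            h = layer_fn(wc, wp, h, past)
        out.append(h)
    return out
```

In autoregressive generation, naively producing each new sample means re-running the stack over the whole receptive field. With the queues, each new sample costs one update per layer.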



Related Projects

waveglow - A Flow-based Generative Network for Speech Synthesis

  •    Python

In our recent paper, we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Our PyTorch implementation produces audio samples at a rate of 2750 kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation.
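WaveGlow's invertibility comes from Glow-style affine coupling layers. Half of the channels pass through unchanged and parameterise a scale and shift for the other half, so both the inverse and the log-determinant are cheap to compute. The toy scalar sketch below assumes a hypothetical stand-in function, `coupling_net`, in place of WaveGlow's WaveNet-like network. It omits the invertible 1x1 convolutions and the conditioning on mel-spectrograms.

```python
import math

def coupling_net(xa):
    # Hypothetical stand-in for the WaveNet-like network that predicts (log_s, t).
    s = sum(xa)
    log_s = [0.1 * math.tanh(s + i) for i in range(len(xa))]
    t = [0.5 * s - i for i in range(len(xa))]
    return log_s, t

def coupling_forward(x):
    """Affine coupling: returns (y, log|det J|). Assumes len(x) is even."""
    half = len(x) // 2
    xa, xb = x[:half], x[half:]
    log_s, t = coupling_net(xa)
    yb = [b * math.exp(ls) + ti for b, ls, ti in zip(xb, log_s, t)]
    log_det = sum(log_s)  # triangular Jacobian: log|det| is the sum of log scales
    return xa + yb, log_det

def coupling_inverse(y):
    half = len(y) // 2
    ya, yb = y[:half], y[half:]
    log_s, t = coupling_net(ya)  # ya equals xa, so the same parameters are recovered
    xb = [(b - ti) * math.exp(-ls) for b, ls, ti in zip(yb, log_s, t)]
    return ya + xb
```

During training, the `log_det` terms are added to the Gaussian log-density of the final latent to get the exact likelihood. Synthesis runs the inverses on sampled noise, which is why no autoregression is needed.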

wavenet_vocoder - WaveNet vocoder

  •    Python

The goal of the repository is to provide an implementation of the WaveNet vocoder, which can generate high quality raw speech samples conditioned on linguistic or acoustic features. Audio samples are available at
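A WaveNet vocoder's configuration (kernel size, number of layers, and how the dilations cycle) determines how much past audio each output sample can see. The sketch below assumes the common scheme where the dilation doubles within each stack and resets at the start of the next one. `receptive_field` is a hypothetical helper, not this repository's API.

```python
def receptive_field(layers, stacks, kernel_size=2):
    """Receptive field (in samples) of stacked dilated causal convolutions."""
    assert layers % stacks == 0, "layers must split evenly into stacks"
    per_stack = layers // stacks
    # Dilations 1, 2, 4, ... within each stack, restarting at 1 for the next stack.
    dilations = [2 ** (i % per_stack) for i in range(layers)]
    # A layer with dilation d extends the field by (kernel_size - 1) * d samples.
    return (kernel_size - 1) * sum(dilations) + 1
```

For example, 30 layers in 3 stacks with kernel size 2 gives 3070 samples, roughly 0.14 s of audio at 22.05 kHz.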

tensorflow-wavenet - A TensorFlow implementation of DeepMind's WaveNet paper

  •    Python

This is a TensorFlow implementation of the WaveNet generative neural network architecture for audio generation. The WaveNet neural network architecture directly generates a raw audio waveform, showing excellent results in text-to-speech and general audio generation (see the DeepMind blog post and paper for details).
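The WaveNet paper models raw audio as a 256-way categorical distribution per sample, after μ-law companding. Below is a minimal sketch of that encode/decode step. The helper names are illustrative and are not tensorflow-wavenet's functions.

```python
import math

def mu_law_encode(x, quantization_channels=256):
    """Map a sample in [-1, 1] to an integer class in [0, channels - 1]."""
    mu = quantization_channels - 1
    x = max(-1.0, min(1.0, x))  # clamp out-of-range input
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((y + 1) / 2 * mu + 0.5)  # round to the nearest class

def mu_law_decode(q, quantization_channels=256):
    """Inverse mapping from a class index back to an approximate sample."""
    mu = quantization_channels - 1
    y = 2 * q / mu - 1
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)
```

The logarithmic curve allocates finer quantization steps to small amplitudes. This is why 8 bits suffice where linear quantization would sound noticeably worse.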

tacotron2 - Tacotron 2 - PyTorch implementation with faster-than-realtime inference

  •    Jupyter

Tacotron 2 PyTorch implementation of Natural TTS Synthesis By Conditioning Wavenet On Mel Spectrogram Predictions. This implementation includes distributed and fp16 support and uses the LJSpeech dataset.
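Tacotron 2 predicts mel spectrograms, whose frequency axis is warped by the mel scale. The sketch below uses the widely used HTK-style conversion. This is an assumption: implementations differ, and librosa's default, for instance, is the Slaney variant. The code also places evenly spaced mel band edges for triangular filters; the band count and frequency range in the example are illustrative choices.

```python
import math

def hz_to_mel(f):
    # HTK-style mel scale; other variants (e.g. Slaney) use a different formula.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(n_mels, fmin, fmax):
    """n_mels + 2 frequencies (Hz), evenly spaced in mel, bounding triangular filters."""
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    return [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(n_mels + 2)]
```

For example, `mel_band_edges(80, 0, 8000)` returns 82 edges. Adjacent edges sit close together at low frequencies and spread out toward 8 kHz.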

nv-wavenet - Reference implementation of real-time autoregressive wavenet inference

  •    Cuda

nv-wavenet is a CUDA reference implementation of autoregressive WaveNet inference. In particular, it implements the WaveNet variant described by Deep Voice. nv-wavenet only implements the autoregressive portion of the network; conditioning vectors must be provided externally. More details about the implementation and performance can be found on the NVIDIA Developer Blog. In all three implementations, a single kernel runs inference for potentially many samples.
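Each WaveNet layer evaluated during inference applies the gated activation unit z = tanh(W_f * x) ⊙ σ(W_g * x), followed by residual and skip projections. The following is a scalar sketch of one such layer step. All weight names are hypothetical. Conditioning enters as a precomputed value, mirroring the note above that conditioning vectors are supplied externally.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_layer_step(x_cur, x_past, cond, w):
    """One dilated layer (kernel size 2) on scalar channels.

    x_cur, x_past: layer input now and d steps ago; cond: externally supplied
    conditioning value; w: dict of hypothetical scalar weights.
    Returns (residual output for the next layer, skip contribution).
    """
    f = w["f_cur"] * x_cur + w["f_past"] * x_past + w["f_cond"] * cond
    g = w["g_cur"] * x_cur + w["g_past"] * x_past + w["g_cond"] * cond
    z = math.tanh(f) * sigmoid(g)      # gated activation unit
    residual = x_cur + w["res"] * z    # residual connection to the next layer
    skip = w["skip"] * z               # skip connection to the output stack
    return residual, skip
```

The sigmoid branch acts as a learned gate in [0, 1] on the tanh "content" branch. The skip outputs from all layers are summed before the final output layers.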

wavenet - Keras WaveNet implementation

  •    Python

Based on and
Former usage (no longer works): $ KERAS_BACKEND=theano python2 predict with models/run_20160920_120916/config.json predict_seconds=1
EDIT: The pretrained model had to be removed from the repository as it wasn't compatible with recent changes.

music-translation - A UNIVERSAL MUSIC TRANSLATION NETWORK - a method for translating music across musical instruments and styles

  •    Cuda

PyTorch implementation of the method described in A Universal Music Translation Network. We present a method for translating music across musical instruments and styles. This method is based on unsupervised training of a multi-domain WaveNet autoencoder, with a shared encoder and a domain-independent latent space that is trained end-to-end on waveforms.

nnAudio - Audio processing by using pytorch 1D convolution network

  •    Python

nnAudio is an audio processing toolbox that uses PyTorch convolutional neural networks as its backend. As a result, spectrograms can be generated from audio on the fly during neural network training, and the Fourier kernels (e.g., STFT or CQT kernels) can be trained. Kapre follows a similar concept, using 1D convolutional neural networks to extract spectrograms in Keras.
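The idea behind this approach is that an STFT is a strided 1D convolution with fixed cosine and sine kernels, which a framework can then make trainable. Below is a pure-Python sketch of that equivalence. The helper names are hypothetical, and frames are taken without a window function for brevity.

```python
import math

def fourier_kernels(n_fft):
    """Cosine and (negated) sine kernels for DFT bins 0..n_fft//2."""
    bins = range(n_fft // 2 + 1)
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)] for k in bins]
    sin_k = [[-math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)] for k in bins]
    return cos_k, sin_k

def conv1d_stft_magnitude(signal, n_fft, hop):
    """Strided 'valid' 1D convolution (cross-correlation, as in deep-learning
    conv layers) of the signal with every kernel, giving |STFT| frames."""
    cos_k, sin_k = fourier_kernels(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = signal[start:start + n_fft]
        frame = []
        for ck, sk in zip(cos_k, sin_k):
            re = sum(a * b for a, b in zip(seg, ck))  # real part of the DFT bin
            im = sum(a * b for a, b in zip(seg, sk))  # imaginary part of the DFT bin
            frame.append(math.hypot(re, im))
        frames.append(frame)
    return frames  # shape: [num_frames][n_fft // 2 + 1]
```

Storing `cos_k` and `sin_k` as the weights of a convolution layer on the GPU, rather than as constants, is what allows the Fourier kernels to be trained along with the rest of the network.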

fairseq-py - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

  •    Python

This is a PyTorch version of fairseq, a sequence-to-sequence learning toolkit from Facebook AI Research. The original authors of this reimplementation are (in no particular order) Sergey Edunov, Myle Ott, and Sam Gross. The toolkit implements the fully convolutional model described in Convolutional Sequence to Sequence Learning and features multi-GPU training on a single machine as well as fast beam search generation on both CPU and GPU. We provide pre-trained models for English to French and English to German translation. Currently fairseq-py requires PyTorch version >= 0.3.0. Please follow the instructions here:
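Beam search keeps only the k best partial hypotheses at each decoding step rather than committing greedily to one token at a time. The framework-free sketch below uses a hypothetical next-token scorer, `toy_log_probs`; neither function is fairseq's API. Real decoders usually add length normalization, which this sketch omits.

```python
import math

def beam_search(next_log_probs, beam_size, max_len, eos):
    """Return the (sequence, score) with the highest summed log-probability."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + (tok,), score + lp))
        # Keep the best `beam_size` extensions; move completed ones aside.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that reached max_len unfinished
    return max(finished, key=lambda c: c[1])

def toy_log_probs(prefix):
    # Hypothetical model: "a" looks best at first, but "b" leads to a stronger ending.
    table = {
        (): {"a": math.log(0.6), "b": math.log(0.4)},
        ("a",): {"x": math.log(0.5), "</s>": math.log(0.5)},
        ("b",): {"</s>": math.log(0.95), "x": math.log(0.05)},
    }
    return table.get(prefix, {"</s>": 0.0})
```

With `beam_size=1` (greedy search), the decoder commits to "a" and ends with probability 0.3. With `beam_size=2`, it keeps "b" alive and finds the sequence with probability 0.38.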

grokking-pytorch - The Hitchhiker's Guide to PyTorch


PyTorch is a flexible deep learning framework that allows automatic differentiation through dynamic neural networks (i.e., networks that utilise dynamic control flow like if statements and while loops). It supports GPU acceleration, distributed training, various optimisations, and plenty more neat features. These are some notes on how I think about using PyTorch; they don't encompass all parts of the library or every best practice, but may be helpful to others.

Neural networks are a subclass of computation graphs. Computation graphs receive input data, and data is routed to and possibly transformed by nodes which perform processing on the data. In deep learning, the neurons (nodes) in neural networks typically transform data with parameters and differentiable functions, such that the parameters can be optimised to minimise a loss via gradient descent. More broadly, the functions can be stochastic, and the structure of the graph can be dynamic.

So while neural networks may be a good fit for dataflow programming, PyTorch's API has instead centred around imperative programming, which is a more common way of thinking about programs. This makes it easier to read code and reason about complex programs, without necessarily sacrificing much performance; PyTorch is actually pretty fast, with plenty of optimisations that you can safely forget about as an end user (but you can dig in if you really want to).
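Automatic differentiation through dynamic control flow can be illustrated without PyTorch. The sketch below is a tiny scalar reverse-mode autodiff in which the graph is simply whichever operations actually ran, including those chosen by `if` and `while`. This is a hypothetical teaching sketch (`Value` and `f` are invented names), not how PyTorch's autograd is implemented internally.

```python
class Value:
    """Scalar that records the operations producing it (a dynamic graph)."""
    def __init__(self, data, parents=(), grad_fns=()):
        self.data, self.grad = data, 0.0
        self._parents, self._grad_fns = parents, grad_fns

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (lambda g: g, lambda g: g))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        # Topologically order the recorded graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, fn in zip(v._parents, v._grad_fns):
                p.grad += fn(v.grad)  # accumulate: a node may be used several times

def f(x):
    # Ordinary Python control flow decides which operations enter the graph.
    y = x
    while y.data < 10:
        y = y * x
    if y.data > 20:
        y = y + y
    return y
```

For x = 2 the loop builds y = x⁴ = 16, so dy/dx = 4x³ = 32. For x = 3 it builds y = 2x³ = 54, so dy/dx = 6x² = 54. The graph differs per input, yet `backward` handles both cases.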

PyTorch-Tutorial - Build your neural network easy and fast

  •    Jupyter

In these PyTorch tutorials, we will build our first neural network and then try some advanced neural network architectures developed in recent years. Thanks to liufuyang's notebook files, which are a great contribution to this tutorial.

pytorch-segmentation - Pytorch for Segmentation

  •    Python

This repo is deprecated and I will not maintain it. Instead, I strongly recommend referring to my new repo, TorchSeg, which offers fast, modular reference implementations and easy training of semantic segmentation algorithms in PyTorch. This repository contains some existing networks and some experimental networks for semantic segmentation.
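Semantic segmentation networks like these are commonly evaluated with mean intersection-over-union (mIoU). The following sketch computes mIoU from a confusion matrix over flat label lists. The helper name is hypothetical. Classes absent from both prediction and ground truth are skipped, which is one common convention among several.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over classes that appear in the prediction or the ground truth."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1  # rows: ground truth, columns: prediction
    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # predicted c, was not c
        fn = sum(conf[c]) - tp                                 # was c, predicted otherwise
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)
```

For example, `mean_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], 3)` averages per-class IoUs of 1/3, 2/3 and 1/2, giving 0.5.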

open-unmix-pytorch - Open-Unmix - Music Source Separation for PyTorch

  •    Python

This repository contains the PyTorch (1.8+) implementation of Open-Unmix, a deep neural network reference implementation for music source separation, intended for researchers, audio engineers and artists. Open-Unmix provides ready-to-use models that allow users to separate pop music into four stems: vocals, drums, bass and the remaining other instruments. The models were pre-trained on the freely available MUSDB18 dataset. See the details under "apply pre-trained model". 03/07/2021: We added umxl, a model trained on extra data, which significantly improves performance, especially generalization.
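Separating a mixture into four stems requires turning several per-stem magnitude estimates into signals that add back up to the mixture. One simple way is a ratio (soft) mask that splits each mixture time-frequency bin among the stems. The sketch below illustrates that general technique for a single bin and is not Open-Unmix's exact post-processing. The power-ratio form and the helper names are assumptions.

```python
def ratio_masks(estimates, eps=1e-10):
    """Soft masks from per-stem magnitude estimates at one time-frequency bin.

    estimates: dict stem -> non-negative estimated magnitude. Uses power
    (squared-magnitude) ratios, Wiener-style; eps avoids division by zero.
    """
    powers = {k: v * v for k, v in estimates.items()}
    total = sum(powers.values()) + eps
    return {k: p / total for k, p in powers.items()}

def separate_bin(mixture_value, estimates):
    """Split one complex mixture STFT value among the stems."""
    masks = ratio_masks(estimates)
    return {k: m * mixture_value for k, m in masks.items()}
```

Because the masks sum to (almost) one, the separated stems add back up to the mixture. Each stem also inherits the mixture's phase.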

PyTorch-YOLOv3 - Minimal PyTorch implementation of YOLOv3

  •    Python

Minimal implementation of YOLOv3 in PyTorch. Abstract We present some updates to YOLO! We made a bunch of little design changes to make it better. We also trained this new network that’s pretty swell. It’s a little bigger than last time but more accurate. It’s still fast though, don’t worry. At 320 × 320 YOLOv3 runs in 22 ms at 28.2 mAP, as accurate as SSD but three times faster. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. It achieves 57.9 AP50 in 51 ms on a Titan X, compared to 57.5 AP50 in 198 ms by RetinaNet, similar performance but 3.8× faster. As always, all the code is online at
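The accuracy figures above rest on intersection over union (IoU) between predicted and ground-truth boxes. AP50 counts a detection as correct when its IoU with a ground-truth box is at least 0.5, while COCO-style mAP averages over thresholds from 0.5 to 0.95. Below is a minimal IoU sketch for axis-aligned boxes in (x1, y1, x2, y2) format, a format chosen here for illustration. It also shows the arithmetic behind the quoted speed-up.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2) with x2 > x1, y2 > y1."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero if the boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Speed-up quoted in the abstract: RetinaNet at 198 ms vs. YOLOv3 at 51 ms.
speedup = 198 / 51
```

Since 198 / 51 ≈ 3.88, the abstract's "3.8×" is a slight rounding down.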

kaolin - A PyTorch Library for Accelerating 3D Deep Learning Research

  •    Python

NVIDIA Kaolin library provides a PyTorch API for working with a variety of 3D representations and includes a growing collection of GPU-optimized operations such as modular differentiable rendering, fast conversions between representations, data loading, 3D checkpoints and more. Kaolin library is part of a larger suite of tools for 3D deep learning research. For example, the Omniverse Kaolin App will allow interactive visualization of 3D checkpoints. To find out more about the Kaolin ecosystem, visit the NVIDIA Kaolin Dev Zone page.

svoice - We provide a PyTorch implementation of the paper Voice Separation with an Unknown Number of Multiple Speakers In which, we present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously

  •    Python

We provide a PyTorch implementation of our speaker voice separation research work. In Voice Separation with an Unknown Number of Multiple Speakers, we present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers. Please note that this implementation does not contain the "IDloss" as described in the paper. First, install Python 3.7 (recommended with Anaconda).
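When a network separates several voices, the order of its output channels is arbitrary. A standard remedy is a permutation-invariant loss: score every assignment of outputs to reference speakers and keep the best one. The text above does not spell out svoice's training objective, so treat this as an illustration of the general technique. It uses scale-invariant SNR (SI-SNR) as the per-pair metric, and the helper names are hypothetical.

```python
import itertools
import math

def _zero_mean(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length signals."""
    est, ref = _zero_mean(est), _zero_mean(ref)
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    target = [dot / ref_energy * r for r in ref]  # projection of est onto ref
    noise = [e - t for e, t in zip(est, target)]
    return 10 * math.log10((sum(t * t for t in target) + eps) / (sum(n * n for n in noise) + eps))

def pit_si_snr(estimates, references):
    """Best mean SI-SNR over all output-to-speaker permutations, with that permutation."""
    best = None
    for perm in itertools.permutations(range(len(references))):
        score = sum(si_snr(estimates[i], references[p]) for i, p in enumerate(perm)) / len(perm)
        if best is None or score > best[0]:
            best = (score, perm)
    return best
```

Training would maximize the returned score, equivalently minimizing its negative. The number of permutations grows factorially with the number of speakers, which is tolerable for the small speaker counts discussed above.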

pytorch-tutorial - PyTorch Tutorial for Deep Learning Researchers

  •    Python

This repository provides tutorial code for deep learning researchers to learn PyTorch. In the tutorial, most of the models are implemented in fewer than 30 lines of code. Before starting this tutorial, it is recommended to finish the official PyTorch tutorial.

pytorch-caffe-darknet-convert - convert between pytorch, caffe prototxt/weights and darknet cfg/weights

  •    Python

This repository is specially designed for pytorch-yolo2, to convert PyTorch-trained models to other platforms. It can also be used as a general model converter between PyTorch, Caffe and Darknet. MIT License (see LICENSE file).

pytorch-cpp - Pytorch C++ Library

  •    C++

Pytorch-C++ is a simple C++11 library which provides a PyTorch-like interface for building neural networks and running inference (so far only the forward pass is supported). The library respects the semantics of PyTorch's torch.nn module. Models from pytorch/vision are supported and can be easily converted. We also support all the models from our image segmentation repository (scroll down for the gif with example output of one of our segmentation models). The library relies heavily on the excellent ATen library and was inspired by cunnproduction.
