# LPCNet

Low complexity implementation of the WaveRNN-based LPCNet algorithm, as described in:

- J.-M. Valin, J. Skoglund, [LPCNet: Improving Neural Speech Synthesis Through Linear Prediction](https://jmvalin.ca/papers/lpcnet_icassp2019.pdf), *Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, arXiv:1810.11846, 2019.
- J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, [Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet](https://jmvalin.ca/papers/improved_lpcnet.pdf), *Proc. ICASSP*, arXiv:2106.04129, 2022.
- K. Subramani, J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, [End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation](https://jmvalin.ca/papers/lpcnet_end2end.pdf), *Proc. INTERSPEECH*, arXiv:2202.11301, 2022.

For coding/PLC applications of LPCNet, see:

- J.-M. Valin, J. Skoglund, [A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet](https://jmvalin.ca/papers/lpcnet_codec.pdf), *Proc. INTERSPEECH*, arXiv:1903.12087, 2019.
- J. Skoglund, J.-M. Valin, [Improving Opus Low Bit Rate Quality with Neural Speech Synthesis](https://jmvalin.ca/papers/opusnet.pdf), *Proc. INTERSPEECH*, arXiv:1905.04628, 2020.
- J.-M. Valin, A. Mustafa, C. Montgomery, T.B. Terriberry, M. Klingbeil, P. Smaragdis, A. Krishnaswamy, [Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model](https://jmvalin.ca/papers/lpcnet_plc.pdf), *Proc. INTERSPEECH*, arXiv:2205.05785, 2022.
- J.-M. Valin, J. Büthe, A. Mustafa, [Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder](https://jmvalin.ca/papers/valin_dred.pdf), *Proc. ICASSP*, arXiv:2212.04453, 2023. ([blog post](https://www.amazon.science/blog/neural-encoding-enables-more-efficient-recovery-of-lost-audio-packets))

# Introduction

Work-in-progress software for researching low CPU complexity algorithms for speech synthesis and compression by applying linear prediction techniques to WaveRNN. High quality speech can be synthesised on regular CPUs (around 3 GFLOP) with SIMD support (SSE2, SSSE3, AVX, AVX2/FMA, and NEON are currently supported). The code also supports very low bitrate compression at 1.6 kb/s.

The BSD-licensed software is written in C and Python/Keras. For training, a GTX 1080 Ti or better is recommended.

This software is an open source starting point for LPCNet/WaveRNN-based speech synthesis and coding.

# Using the existing software

You can build the code using:

```
./autogen.sh
./configure
make
```
Note that the autogen.sh script is used when building from Git and will automatically download the latest model
(models are too large to put in Git). By default, LPCNet will attempt to use 8-bit dot product instructions on AVX\*/Neon to
speed up inference. To disable that (e.g. to avoid quantization effects when retraining), add --disable-dot-product to the
configure script. LPCNet does not yet have a complete implementation for some of the integer operations on the ARMv7
architecture, so for now you will also need --disable-dot-product to successfully compile on 32-bit ARM.

It is highly recommended to set the CFLAGS environment variable to enable AVX or NEON *prior* to running configure; otherwise
no vectorization will take place and the code will be very slow. On a recent x86 CPU, something like
```
export CFLAGS='-Ofast -g -march=native'
```
should work. On ARM, you can enable Neon with:
```
export CFLAGS='-Ofast -g -mfpu=neon'
```
While not strictly required, the -Ofast flag will help with auto-vectorization, especially for dot products that
cannot be optimized without -ffast-math (which -Ofast enables). Additionally, -falign-loops=32 has been shown to
help on x86.

You can test the capabilities of LPCNet using the lpcnet\_demo application. To encode a file:
```
./lpcnet_demo -encode input.pcm compressed.bin
```
where input.pcm is a 16-bit (machine endian) PCM file sampled at 16 kHz. The raw compressed data (no header)
is written to compressed.bin and consists of 8 bytes per 40-ms packet.

To decode:
```
./lpcnet_demo -decode compressed.bin output.pcm
```
where output.pcm is also 16-bit, 16 kHz PCM.

Alternatively, you can run the uncompressed analysis/synthesis using -features
instead of -encode and -synthesis instead of -decode.
The same functionality is available in the form of a library. See include/lpcnet.h for the API.
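As a rough sketch of what using the library looks like, the program below runs a file through an encode/decode round trip, one packet at a time. The names used here (LPCNetEncState, lpcnet\_encode(), LPCNET\_PACKET\_SAMPLES, LPCNET\_COMPRESSED\_SIZE, and so on) are assumed from include/lpcnet.h; check the header you actually build against, since the API can differ between versions.

```
/* Minimal round-trip sketch with the LPCNet library API.
   Assumes lpcnet_encode()/lpcnet_decode() consume/produce one
   40-ms packet: LPCNET_PACKET_SAMPLES 16-bit samples in,
   LPCNET_COMPRESSED_SIZE bytes out (see include/lpcnet.h). */
#include <stdio.h>
#include "lpcnet.h"

int main(void) {
    LPCNetEncState *enc = lpcnet_encoder_create();
    LPCNetDecState *dec = lpcnet_decoder_create();
    short pcm_in[LPCNET_PACKET_SAMPLES];    /* 16-bit, 16 kHz input */
    short pcm_out[LPCNET_PACKET_SAMPLES];   /* decoded output */
    unsigned char packet[LPCNET_COMPRESSED_SIZE];

    FILE *fin = fopen("input.pcm", "rb");
    FILE *fout = fopen("output.pcm", "wb");
    if (fin == NULL || fout == NULL) return 1;
    /* Encode and immediately decode, one 40-ms packet at a time. */
    while (fread(pcm_in, sizeof(short), LPCNET_PACKET_SAMPLES, fin)
           == LPCNET_PACKET_SAMPLES) {
        lpcnet_encode(enc, pcm_in, packet);
        lpcnet_decode(dec, packet, pcm_out);
        fwrite(pcm_out, sizeof(short), LPCNET_PACKET_SAMPLES, fout);
    }
    fclose(fin);
    fclose(fout);
    lpcnet_encoder_destroy(enc);
    lpcnet_decoder_destroy(dec);
    return 0;
}
```
A program like this can be compiled against the library produced by the build above; the exact include and link flags depend on where you installed it.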
To try packet loss concealment (PLC), you first need a PLC model, which you can get with:
```
./download_model.sh plc-3b1eab4
```
or (for the PLC challenge submission):
```
./download_model.sh plc_challenge
```
PLC can be tested with:
```
./lpcnet_demo -plc_file noncausal_dc error_pattern.txt input.pcm output.pcm
```
where error_pattern.txt is a text file with one entry per 20-ms packet, with 1 meaning "packet lost" and 0 meaning "packet not lost".
noncausal_dc is the non-causal variant (5-ms look-ahead) with special handling for DC offsets. It's also possible to use "noncausal", "causal",
or "causal_dc".
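If you don't have a real loss trace, a small helper can generate a synthetic one. The sketch below is a hypothetical generator (the packet count, fixed seed, and 10% loss rate are arbitrary choices, and one-entry-per-line is an assumed layout consistent with the per-packet format described above).

```
/* Hypothetical helper: write a synthetic error pattern for testing PLC.
   One entry per 20-ms packet; 1 = packet lost, 0 = packet received. */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    FILE *f = fopen("error_pattern.txt", "w");
    int num_packets = 1500;   /* 30 s of audio at 20 ms per packet */
    if (f == NULL) return 1;
    srand(42);                /* fixed seed for a reproducible pattern */
    for (int i = 0; i < num_packets; i++) {
        fprintf(f, "%d\n", rand() % 10 == 0);  /* ~10% random loss */
    }
    fclose(f);
    return 0;
}
```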
# Training a new model

This codebase is also meant for research, and it is possible to train new models. These are the steps to do that:

1. Set up a Keras system with GPU.

1. Generate training data:
   ```
   ./dump_data -train input.s16 features.f32 data.s16
   ```
   where the first file contains 16 kHz 16-bit raw PCM audio (no header) and the other files are output files. This program makes several passes over the data with different filters to generate a large amount of training data.

1. Now that you have your files, train with:
   ```
   python3 training_tf2/train_lpcnet.py features.f32 data.s16 model_name
   ```
   and it will generate an h5 file for each iteration, with model\_name as prefix. If it stops with a
   "Failed to allocate RNN reserve space" message, try specifying a smaller --batch-size for train\_lpcnet.py.

1. You can synthesise speech with Python and your GPU card (very slow):
   ```
   ./dump_data -test test_input.s16 test_features.f32
   ./training_tf2/test_lpcnet.py lpcnet_model_name.h5 test_features.f32 test.s16
   ```

1. Or with C on a CPU (C inference is much faster). First extract the model files nnet\_data.h and nnet\_data.c:
   ```
   ./training_tf2/dump_lpcnet.py lpcnet_model_name.h5
   ```
   and move the generated nnet\_data.\* files to the src/ directory.
   Then you just need to rebuild the software and use lpcnet\_demo as explained above.

# Speech Material for Training

Suitable training material can be obtained from [Open Speech and Language Resources](https://www.openslr.org/). See the datasets.txt file for details on suitable training data.

# Reading Further

1. [LPCNet: DSP-Boosted Neural Speech Synthesis](https://people.xiph.org/~jm/demo/lpcnet/)
1. [A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet](https://people.xiph.org/~jm/demo/lpcnet_codec/)
1. Sample model files (check compatibility): https://media.xiph.org/lpcnet/data/