| Name | Date | Size | #Lines | LOC |
|------|------|------|--------|-----|
| arm/ | 25-Apr-2025 | - | 272 | 120 |
| torch/ | 25-Apr-2025 | - | 25,109 | 18,209 |
| training_tf2/ | 25-Apr-2025 | - | 4,459 | 3,302 |
| x86/ | 25-Apr-2025 | - | 329 | 146 |
| LPCNet.yml | 25-Apr-2025 | 335 B | 25 | 11 |
| README | 25-Apr-2025 | 14 B | 2 | 1 |
| README.md | 25-Apr-2025 | 6.8 KiB | 127 | 99 |
| adaconvtest.c | 25-Apr-2025 | 11.9 KiB | 449 | 389 |
| burg.c | 25-Apr-2025 | 9.1 KiB | 247 | 169 |
| burg.h | 25-Apr-2025 | 2.4 KiB | 42 | 11 |
| common.h | 25-Apr-2025 | 1 KiB | 57 | 46 |
| datasets.txt | 25-Apr-2025 | 8.8 KiB | 174 | 162 |
| download_model.bat | 25-Apr-2025 | 258 B | 10 | 7 |
| download_model.sh | 25-Apr-2025 | 185 B | 11 | 7 |
| dred_coding.c | 25-Apr-2025 | 1.7 KiB | 45 | 14 |
| dred_coding.h | 25-Apr-2025 | 1.5 KiB | 37 | 6 |
| dred_config.h | 25-Apr-2025 | 2.1 KiB | 55 | 16 |
| dred_decoder.c | 25-Apr-2025 | 4.5 KiB | 130 | 82 |
| dred_decoder.h | 25-Apr-2025 | 1.9 KiB | 50 | 17 |
| dred_encoder.c | 25-Apr-2025 | 13.4 KiB | 364 | 291 |
| dred_encoder.h | 25-Apr-2025 | 2.6 KiB | 72 | 35 |
| dred_rdovae.h | 25-Apr-2025 | 1.6 KiB | 43 | 9 |
| dred_rdovae_dec.c | 25-Apr-2025 | 6.6 KiB | 140 | 98 |
| dred_rdovae_dec.h | 25-Apr-2025 | 2.3 KiB | 54 | 22 |
| dred_rdovae_enc.c | 25-Apr-2025 | 5.5 KiB | 111 | 68 |
| dred_rdovae_enc.h | 25-Apr-2025 | 2.1 KiB | 53 | 19 |
| dump_data.c | 25-Apr-2025 | 8.7 KiB | 281 | 233 |
| dump_lpcnet_tables.c | 25-Apr-2025 | 3.7 KiB | 105 | 62 |
| fargan.c | 25-Apr-2025 | 9.3 KiB | 226 | 167 |
| fargan.h | 25-Apr-2025 | 2.6 KiB | 69 | 33 |
| freq.c | 25-Apr-2025 | 8.7 KiB | 329 | 267 |
| freq.h | 25-Apr-2025 | 2.2 KiB | 62 | 25 |
| fwgan.c | 25-Apr-2025 | 12.7 KiB | 323 | 238 |
| fwgan.h | 25-Apr-2025 | 2.9 KiB | 84 | 47 |
| kiss99.c | 25-Apr-2025 | 3 KiB | 82 | 43 |
| kiss99.h | 25-Apr-2025 | 1.8 KiB | 47 | 13 |
| lossgen.c | 25-Apr-2025 | 6 KiB | 197 | 123 |
| lossgen.h | 25-Apr-2025 | 1.8 KiB | 56 | 19 |
| lossgen_demo.c | 25-Apr-2025 | 524 B | 23 | 21 |
| lpcnet.c | 25-Apr-2025 | 9.7 KiB | 284 | 231 |
| lpcnet.h | 25-Apr-2025 | 7 KiB | 184 | 38 |
| lpcnet_demo.c | 25-Apr-2025 | 6.7 KiB | 218 | 177 |
| lpcnet_enc.c | 25-Apr-2025 | 8.4 KiB | 231 | 176 |
| lpcnet_plc.c | 25-Apr-2025 | 7.7 KiB | 212 | 166 |
| lpcnet_private.h | 25-Apr-2025 | 2.5 KiB | 91 | 72 |
| lpcnet_tables.c | 25-Apr-2025 | 18.6 KiB | 308 | 299 |
| meson.build | 25-Apr-2025 | 1.6 KiB | 65 | 51 |
| nndsp.c | 25-Apr-2025 | 14.7 KiB | 417 | 327 |
| nndsp.h | 25-Apr-2025 | 4 KiB | 144 | 94 |
| nnet.c | 25-Apr-2025 | 5.6 KiB | 150 | 107 |
| nnet.h | 25-Apr-2025 | 5.4 KiB | 164 | 107 |
| nnet_arch.h | 25-Apr-2025 | 9 KiB | 248 | 187 |
| nnet_default.c | 25-Apr-2025 | 1.4 KiB | 36 | 5 |
| osce.c | 25-Apr-2025 | 44.3 KiB | 1,420 | 1,161 |
| osce.h | 25-Apr-2025 | 2.9 KiB | 85 | 41 |
| osce_config.h | 25-Apr-2025 | 1.9 KiB | 61 | 21 |
| osce_features.c | 25-Apr-2025 | 19.6 KiB | 455 | 338 |
| osce_features.h | 25-Apr-2025 | 2.4 KiB | 51 | 16 |
| osce_structs.h | 25-Apr-2025 | 3.4 KiB | 126 | 80 |
| parse_lpcnet_weights.c | 25-Apr-2025 | 7.4 KiB | 239 | 196 |
| pitchdnn.c | 25-Apr-2025 | 2.4 KiB | 80 | 68 |
| pitchdnn.h | 25-Apr-2025 | 710 B | 35 | 23 |
| tansig_table.h | 25-Apr-2025 | 2.3 KiB | 51 | 46 |
| test_vec.c | 25-Apr-2025 | 2.7 KiB | 129 | 106 |
| vec.h | 25-Apr-2025 | 10.9 KiB | 390 | 337 |
| vec_avx.h | 25-Apr-2025 | 24.5 KiB | 885 | 722 |
| vec_neon.h | 25-Apr-2025 | 13.2 KiB | 474 | 380 |
| write_lpcnet_weights.c | 25-Apr-2025 | 3.2 KiB | 98 | 65 |

README

See README.md

README.md

# LPCNet

Low complexity implementation of the WaveRNN-based LPCNet algorithm, as described in:

- J.-M. Valin, J. Skoglund, [LPCNet: Improving Neural Speech Synthesis Through Linear Prediction](https://jmvalin.ca/papers/lpcnet_icassp2019.pdf), *Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP)*, arXiv:1810.11846, 2019.
- J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, [Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet](https://jmvalin.ca/papers/improved_lpcnet.pdf), *Proc. ICASSP*, arXiv:2106.04129, 2022.
- K. Subramani, J.-M. Valin, U. Isik, P. Smaragdis, A. Krishnaswamy, [End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation](https://jmvalin.ca/papers/lpcnet_end2end.pdf), *Proc. INTERSPEECH*, arXiv:2202.11301, 2022.

For coding/PLC applications of LPCNet, see:

- J.-M. Valin, J. Skoglund, [A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet](https://jmvalin.ca/papers/lpcnet_codec.pdf), *Proc. INTERSPEECH*, arXiv:1903.12087, 2019.
- J. Skoglund, J.-M. Valin, [Improving Opus Low Bit Rate Quality with Neural Speech Synthesis](https://jmvalin.ca/papers/opusnet.pdf), *Proc. INTERSPEECH*, arXiv:1905.04628, 2020.
- J.-M. Valin, A. Mustafa, C. Montgomery, T.B. Terriberry, M. Klingbeil, P. Smaragdis, A. Krishnaswamy, [Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model](https://jmvalin.ca/papers/lpcnet_plc.pdf), *Proc. INTERSPEECH*, arXiv:2205.05785, 2022.
- J.-M. Valin, J. Büthe, A. Mustafa, [Low-Bitrate Redundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder](https://jmvalin.ca/papers/valin_dred.pdf), *Proc. ICASSP*, arXiv:2212.04453, 2023. ([blog post](https://www.amazon.science/blog/neural-encoding-enables-more-efficient-recovery-of-lost-audio-packets))
15
# Introduction

Work-in-progress software for researching low-CPU-complexity algorithms for speech synthesis and compression by applying Linear Prediction techniques to WaveRNN. High quality speech can be synthesised on regular CPUs (around 3 GFLOP) with SIMD support (SSE2, SSSE3, AVX, AVX2/FMA, and NEON are currently supported). The code also supports very low bitrate compression at 1.6 kb/s.

The BSD-licensed software is written in C and Python/Keras. For training, a GTX 1080 Ti or better is recommended.

This software is an open-source starting point for LPCNet/WaveRNN-based speech synthesis and coding.
23
# Using the existing software

You can build the code using:

```
./autogen.sh
./configure
make
```
Note that the autogen.sh script is used when building from Git and will automatically download the latest model
(models are too large to put in Git). By default, LPCNet will attempt to use 8-bit dot product instructions on AVX\*/Neon to
speed up inference. To disable that (e.g. to avoid quantization effects when retraining), add --disable-dot-product to the
configure script. LPCNet does not yet have a complete implementation of some of the integer operations on the ARMv7
architecture, so for now you will also need --disable-dot-product to compile successfully on 32-bit ARM.
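For example, to configure a build without the dot-product optimizations:
```
./configure --disable-dot-product
```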
38
It is highly recommended to set the CFLAGS environment variable to enable AVX or NEON *prior* to running configure, otherwise
no vectorization will take place and the code will be very slow. On a recent x86 CPU, something like
```
export CFLAGS='-Ofast -g -march=native'
```
should work. On ARM, you can enable Neon with:
```
export CFLAGS='-Ofast -g -mfpu=neon'
```
While not strictly required, the -Ofast flag will help with auto-vectorization, especially for dot products that
cannot be optimized without -ffast-math (which -Ofast enables). Additionally, -falign-loops=32 has been shown to
help on x86.
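As a sketch combining the flags suggested above for an x86 build (adjust for your compiler and CPU):
```
export CFLAGS='-Ofast -g -march=native -falign-loops=32'
./configure
make
```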
51
You can test the capabilities of LPCNet using the lpcnet\_demo application. To encode a file:
```
./lpcnet_demo -encode input.pcm compressed.bin
```
where input.pcm is a 16-bit (machine endian) PCM file sampled at 16 kHz. The raw compressed data (no header)
is written to compressed.bin and consists of 8 bytes per 40-ms packet; that works out to 8 × 8 bits / 0.040 s = 1.6 kb/s.
58
To decode:
```
./lpcnet_demo -decode compressed.bin output.pcm
```
where output.pcm is also 16-bit, 16 kHz PCM.

Alternatively, you can run the uncompressed analysis/synthesis using -features
instead of -encode and -synthesis instead of -decode.
The same functionality is available in the form of a library. See include/lpcnet.h for the API.
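As a minimal sketch of what the decoder side of the library might look like in use (the names lpcnet_decoder_create(), lpcnet_decode(), lpcnet_decoder_destroy(), LPCNET_COMPRESSED_SIZE, and LPCNET_PACKET_SAMPLES are assumed from include/lpcnet.h; check the header for the exact signatures):
```c
/* Sketch only: decode 8-byte LPCNet packets to raw 16 kHz PCM.
 * API names are assumed from include/lpcnet.h; verify against the header.
 * Error handling is omitted for brevity. */
#include <stdio.h>
#include "lpcnet.h"

int main(void) {
    FILE *fin = fopen("compressed.bin", "rb");
    FILE *fout = fopen("output.pcm", "wb");
    LPCNetDecState *dec = lpcnet_decoder_create();
    unsigned char buf[LPCNET_COMPRESSED_SIZE];  /* 8 bytes per 40-ms packet */
    short pcm[LPCNET_PACKET_SAMPLES];           /* 640 samples per packet at 16 kHz */
    while (fread(buf, sizeof(buf), 1, fin) == 1) {
        lpcnet_decode(dec, buf, pcm);           /* synthesize one 40-ms packet */
        fwrite(pcm, sizeof(pcm[0]), LPCNET_PACKET_SAMPLES, fout);
    }
    lpcnet_decoder_destroy(dec);
    fclose(fin);
    fclose(fout);
    return 0;
}
```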
68
To try packet loss concealment (PLC), you first need a PLC model, which you can get with:
```
./download_model.sh plc-3b1eab4
```
or (for the PLC challenge submission):
```
./download_model.sh plc_challenge
```
PLC can be tested with:
```
./lpcnet_demo -plc_file noncausal_dc error_pattern.txt input.pcm output.pcm
```
where error_pattern.txt is a text file with one entry per 20-ms packet, with 1 meaning "packet lost" and 0 meaning "packet not lost".
noncausal_dc is the non-causal model (5-ms look-ahead) with special handling for DC offsets. It's also possible to use "noncausal", "causal",
or "causal_dc".
84
# Training a new model

This codebase is also meant for research, and it is possible to train new models. These are the steps to do that:

1. Set up a Keras system with GPU.

1. Generate training data:
   ```
   ./dump_data -train input.s16 features.f32 data.s16
   ```
   where the first file contains 16 kHz, 16-bit raw PCM audio (no header) and the other files are output files. This program makes several passes over the data with different filters to generate a large amount of training data.

1. Now that you have your files, train with:
   ```
   python3 training_tf2/train_lpcnet.py features.f32 data.s16 model_name
   ```
   and it will generate an h5 file for each iteration, with model\_name as prefix. If it stops with a
   "Failed to allocate RNN reserve space" message, try specifying a smaller --batch-size for train\_lpcnet.py.

1. You can synthesise speech with Python and your GPU card (very slow):
   ```
   ./dump_data -test test_input.s16 test_features.f32
   ./training_tf2/test_lpcnet.py lpcnet_model_name.h5 test_features.f32 test.s16
   ```

1. Or with C on a CPU (C inference is much faster):
   First extract the model files nnet\_data.h and nnet\_data.c
   ```
   ./training_tf2/dump_lpcnet.py lpcnet_model_name.h5
   ```
   and move the generated nnet\_data.\* files to the src/ directory.
   Then you just need to rebuild the software and use lpcnet\_demo as explained above; a combined command sketch follows this list.
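Putting those steps together, a hypothetical end-to-end session might look like this (the checkpoint name lpcnet_model_name.h5 is illustrative; training writes one .h5 file per iteration with your model name as prefix):
```
./dump_data -train input.s16 features.f32 data.s16
python3 training_tf2/train_lpcnet.py features.f32 data.s16 model_name
./training_tf2/dump_lpcnet.py lpcnet_model_name.h5
mv nnet_data.c nnet_data.h src/
make
```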
117
# Speech Material for Training

Suitable training material can be obtained from [Open Speech and Language Resources](https://www.openslr.org/). See the datasets.txt file for details on suitable training data.
121
# Reading Further

1. [LPCNet: DSP-Boosted Neural Speech Synthesis](https://people.xiph.org/~jm/demo/lpcnet/)
1. [A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet](https://people.xiph.org/~jm/demo/lpcnet_codec/)
1. Sample model files (check compatibility): https://media.xiph.org/lpcnet/data/
127