
# Conversational Speech generator tool

A tool that generates multiple-end audio tracks to simulate conversational
speech with two or more participants.

The input to the tool is a directory containing a number of audio tracks and
a text file indicating how to time the sequence of speech turns (see the Example
section).

Since the timing of the speaking turns is specified by the user, the generated
tracks may not be suitable for testing scenarios in which there is unpredictable
network delay (e.g., end-to-end RTC assessment).

Instead, the generated tracks can be used when the delay is constant, including
the case in which there is no delay.
For instance, echo cancellation in the APM module can be evaluated using two-end
audio tracks as input and reverse input.

By indicating negative and positive time offsets, one can reproduce cross-talk
(aka double-talk) and silence in the conversation.

### Example

For each end, there is a set of audio tracks, e.g., a1, a2, a3 and a4
(speaker A) and b1, b2 (speaker B).
The text file with the timing information may look like this:

```
A a1 0
B b1 0
A a2 100
B b2 -200
A a3 0
A a4 0
```

The first column indicates the speaker name, the second contains the audio track
file name, and the third the offset (in milliseconds) used to concatenate the
chunks. An optional fourth column contains a positive or negative integer gain
in dB that will be applied to the track. It's possible to specify the gain for
some turns but not for others. If the gain is left out, no gain is applied.
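
The column layout described above can be sketched as a small parser. This is
only an illustration, not the tool's own code: the `Turn` record and the
`parse_timing` and `gain_db_to_linear` helpers are hypothetical names, skipping
blank lines is an assumption, and the dB-to-amplitude conversion is the standard
`10^(dB/20)` formula rather than something confirmed by the tool's sources.

```python
from typing import List, NamedTuple, Optional


class Turn(NamedTuple):
    speaker: str            # first column: speaker name
    track: str              # second column: audio track file name
    offset_ms: int          # third column: offset in ms, may be negative
    gain_db: Optional[int]  # optional fourth column; None means no gain


def parse_timing(text: str) -> List[Turn]:
    """Parse whitespace-separated timing lines into Turn records."""
    turns = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) not in (3, 4):
            raise ValueError(f"line {line_number}: expected 3 or 4 fields")
        gain_db = int(fields[3]) if len(fields) == 4 else None
        turns.append(Turn(fields[0], fields[1], int(fields[2]), gain_db))
    return turns


def gain_db_to_linear(gain_db: Optional[int]) -> float:
    """Convert a dB gain to an amplitude factor (1.0 when the gain is omitted)."""
    return 1.0 if gain_db is None else 10.0 ** (gain_db / 20.0)


# A gain is given for the second turn only; the other turns are left unscaled.
for turn in parse_timing("A a1 0\nB b1 0 -6\nA a2 100\n"):
    print(turn, gain_db_to_linear(turn.gain_db))
```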

Assume that all the audio tracks in the example above are 1000 ms long.
The tool will then generate two tracks (A and B) that look like this:

**Track A**
```
  a1 (1000 ms)
  silence (1100 ms)
  a2 (1000 ms)
  silence (800 ms)
  a3 (1000 ms)
  a4 (1000 ms)
```

**Track B**
```
  silence (1000 ms)
  b1 (1000 ms)
  silence (900 ms)
  b2 (1000 ms)
  silence (2000 ms)
```
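
The two layouts above can be reproduced with a short sketch. It assumes that
each turn begins where the previous turn ends, shifted by its offset, which is
the rule the example implies. The `schedule` and `layout` functions are
hypothetical helpers, not part of the tool, and they do not handle turns of the
same speaker overlapping each other.

```python
# (speaker, track, offset_ms) for the example timing file.
TURNS = [("A", "a1", 0), ("B", "b1", 0), ("A", "a2", 100),
         ("B", "b2", -200), ("A", "a3", 0), ("A", "a4", 0)]
TRACK_MS = 1000  # the example assumes every track lasts 1000 ms


def schedule(turns, track_ms=TRACK_MS):
    """Return (speaker, track, begin_ms, end_ms) for each turn."""
    placed = []
    previous_end = 0
    for speaker, track, offset_ms in turns:
        begin = previous_end + offset_ms  # a negative offset overlaps the previous turn
        end = begin + track_ms
        placed.append((speaker, track, begin, end))
        previous_end = end
    return placed


def layout(placed, speaker):
    """List (label, duration_ms) segments for one speaker, padded with silence."""
    total = max(end for _, _, _, end in placed)
    segments = []
    cursor = 0
    for who, track, begin, end in placed:
        if who != speaker:
            continue
        if begin > cursor:
            segments.append(("silence", begin - cursor))
        segments.append((track, end - begin))
        cursor = end
    if cursor < total:
        # Pad so that all generated tracks have the same length.
        segments.append(("silence", total - cursor))
    return segments


placed = schedule(TURNS)
for speaker in ("A", "B"):
    print(f"Track {speaker}")
    for label, duration_ms in layout(placed, speaker):
        print(f"  {label} ({duration_ms} ms)")
```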

The two tracks can also be visualized as follows (one character represents
100 ms, "." is silence and "*" is speech).

```
t: 0         1         2         3         4         5         6 (s)
A: **********...........**********........********************
B: ..........**********.........**********....................
                                ^ 200 ms cross-talk
        100 ms silence ^
```
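
As a rough check of the diagram, the sketch below redraws both tracks from the
turn intervals of the example and measures the cross-talk and the shared
silence. The interval list is written out by hand for 1000 ms tracks, the
`render` and `overlap_ms` helpers are hypothetical, and all times are assumed
to be multiples of 100 ms.

```python
STEP_MS = 100  # one character per 100 ms, as in the diagram

# (speaker, begin_ms, end_ms) for each turn of the example.
INTERVALS = [("A", 0, 1000), ("B", 1000, 2000), ("A", 2100, 3100),
             ("B", 2900, 3900), ("A", 3900, 4900), ("A", 4900, 5900)]


def render(intervals, speaker, step_ms=STEP_MS):
    """Draw one speaker's track: '*' for speech, '.' for silence."""
    total = max(end for _, _, end in intervals)
    cells = ["."] * (total // step_ms)
    for who, begin, end in intervals:
        if who == speaker:
            for i in range(begin // step_ms, end // step_ms):
                cells[i] = "*"
    return "".join(cells)


def overlap_ms(track_a, track_b, symbol, step_ms=STEP_MS):
    """Total time during which both tracks show the same symbol."""
    return step_ms * sum(a == b == symbol for a, b in zip(track_a, track_b))


track_a = render(INTERVALS, "A")
track_b = render(INTERVALS, "B")
print("A:", track_a)
print("B:", track_b)
print("cross-talk (ms):", overlap_ms(track_a, track_b, "*"))
print("shared silence (ms):", overlap_ms(track_a, track_b, "."))
```

With the example intervals, the two measurements come out to 200 ms of
cross-talk and 100 ms of shared silence, matching the annotations in the
diagram.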