# Distributed RPC Reinforcement Learning Benchmark

This tool measures `torch.distributed.rpc` throughput and latency for reinforcement learning workloads.

The benchmark spawns one *agent* process and a configurable number of *observer* processes. Because the benchmark focuses on RPC throughput and latency, the agent uses a dummy policy and the observers use randomly generated states and rewards. In each iteration, every observer sends its state to the agent through `torch.distributed.rpc` and waits for the agent to respond with an action. If `batch=False`, the agent processes and responds to one observer request at a time. Otherwise, the agent accumulates requests from multiple observers and runs them through the policy in a single pass. A separate *coordinator* process manages the agent and the observers.
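The batching behavior described above can be illustrated with a single-process sketch. It is not the benchmark's code: threads stand in for observer processes, a stdlib `concurrent.futures.Future` stands in for the RPC future, and `BatchingAgent`, `select_action`, `run`, and the sum-based dummy policy are hypothetical. It also assumes that the agent's batch size equals the number of observers. With batching on, the agent holds requests until it has one from every observer and then runs the policy once. With batching off, it answers each request as soon as it arrives.

```python
import threading
from concurrent.futures import Future


class BatchingAgent:
    """Hypothetical in-process stand-in for the agent; the RPC layer is omitted."""

    def __init__(self, batch_size, batch=True):
        # Without batching, every request forms its own batch of size 1.
        self.batch_size = batch_size if batch else 1
        self.lock = threading.Lock()
        self.pending = []  # (state, future) pairs waiting for a full batch
        self.batches_run = 0

    def policy(self, states):
        # Dummy policy: the "action" for a state is simply the sum of its entries.
        return [sum(state) for state in states]

    def select_action(self, state):
        future = Future()
        with self.lock:
            self.pending.append((state, future))
            if len(self.pending) < self.batch_size:
                return future  # keep accumulating; a later request completes this one
            batch, self.pending = self.pending, []
            self.batches_run += 1
        # Run the policy once over the whole batch, then answer every waiting observer.
        actions = self.policy([state for state, _ in batch])
        for (_, waiting), action in zip(batch, actions):
            waiting.set_result(action)
        return future


def run(n_observers, n_iters, batch):
    """Let n_observers threads each request n_iters actions; return (batches_run, actions)."""
    # Assumption: the agent's batch size equals the number of observers.
    agent = BatchingAgent(batch_size=n_observers, batch=batch)
    actions = {}

    def observer(observer_id):
        received = []
        for step in range(n_iters):
            state = [observer_id, step]  # deterministic stand-in for a random state
            received.append(agent.select_action(state).result(timeout=5))
        actions[observer_id] = received

    threads = [threading.Thread(target=observer, args=(i,)) for i in range(n_observers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return agent.batches_run, actions
```

For example, `run(n_observers=4, n_iters=3, batch=True)` runs the policy 3 times (one batch per iteration), while `batch=False` runs it 12 times. In both modes observer `i` receives the actions `[i, i + 1, i + 2]`.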

In addition to printing measurements, this benchmark writes a JSON file. You may give a single argument multiple comma-separated values (e.g., `--world-size="10,50,100"`). In that case the benchmark produces one set of results per value, and the JSON file can be passed to the plotting repo to compare them visually, with each value of that argument placed on the x axis.
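A minimal sketch of how such a sweep can be expanded, assuming arguments arrive as strings keyed by name. `expand_sweep` is a hypothetical helper, not the launcher's actual parser; it enforces the single-swept-argument rule and returns the swept argument's name as the x-axis label.

```python
def expand_sweep(args):
    """Hypothetical helper: turn one comma-separated argument into per-value configs.

    `args` maps argument names to string values, e.g. {"world_size": "10,50,100"}.
    Returns (x_axis_name, configs); x_axis_name is None when nothing is swept.
    """
    swept = [name for name, value in args.items() if "," in value]
    if len(swept) > 1:
        raise ValueError(f"only one argument may list multiple values, got {swept}")
    if not swept:
        return None, [dict(args)]
    x_axis_name = swept[0]
    configs = []
    for value in args[x_axis_name].split(","):
        config = dict(args)  # every other argument is shared by all runs
        config[x_axis_name] = value.strip()
        configs.append(config)
    return x_axis_name, configs
```

With `world_size="10,20"`, this yields two configurations and `world_size` as the x-axis name, which lines up with the `x_axis_name : world_size` line in the example output.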

The benchmark reports four key metrics:
1. _Agent Latency_ - The time from when the agent receives the first action request of a batch from an observer until it has selected an action for every request in that batch. If `batch=False`, treat this as `batch_size=1`.
2. _Agent Throughput_ - The number of requests processed per second for a given batch, computed as `batch_size / agent_latency`. If `batch=False`, treat this as `batch_size=1`.
3. _Observer Latency_ - The time from when a single observer requests an action until it receives the agent's response. If `batch=False`, observer latency is therefore the agent latency plus the transit time of the request from the observer to the agent plus the transit time of the response from the agent back to the observer. If `batch=True`, observer latency varies more, because requests that arrive early in a batch wait longer than those that arrive late.
4. _Observer Throughput_ - The number of requests processed per second for a single observer, computed as `1 / observer_latency`.
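The formulas in the list above can be sketched as a small aggregation step. This is a minimal sketch under stated assumptions, not the benchmark's own code: `percentile` and `summarize` are hypothetical names, the inputs are raw samples in seconds (one agent sample per batch, one observer sample per request), and percentiles use linear interpolation, the same rule as NumPy's default.

```python
PERCENTILES = (50, 75, 90, 95)


def percentile(samples, q):
    """q-th percentile (0-100) with linear interpolation, like NumPy's default method."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)  # floor, since position is non-negative
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def summarize(agent_latencies, batch_size, observer_latencies):
    """Summarize raw latency samples (seconds) into the four metrics at each percentile.

    agent_latencies: one sample per batch; observer_latencies: one sample per request.
    Pass batch_size=1 when batching is off.
    """
    metrics = {
        "agent latency": list(agent_latencies),
        # Agent throughput per batch = batch_size / agent_latency
        "agent throughput": [batch_size / t for t in agent_latencies],
        "observer latency": list(observer_latencies),
        # Observer throughput per request = 1 / observer_latency
        "observer throughput": [1.0 / t for t in observer_latencies],
    }
    return {
        name: {f"p{q}": percentile(values, q) for q in PERCENTILES}
        for name, values in metrics.items()
    }
```

In this sketch throughput is computed per sample before the percentile is taken, so a high throughput percentile (p95) reflects the fast end of the latency distribution rather than `batch_size / p95_latency`. That reading would be consistent with the example output below, where throughput rises from p50 to p95.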

## Requirements

This benchmark depends on PyTorch.

## How to run

For each configuration you want to measure, pass the corresponding arguments to `python launcher.py`, for example:

```
python launcher.py --world-size="10,20" --master-addr="127.0.0.1" --master-port="29501" --batch="True" --state-size="10-20-10" --nlayers="5" --out-features="10" --output-file-path="benchmark_report.json"
```
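Because only one argument can take a comma-separated list per run, sweeping `--world-size` for both batch modes takes two invocations. The Python sketch below builds one command per batch mode from the flags in the example command; `build_command`, `run_both_modes`, and the report file names are hypothetical.

```python
import subprocess
import sys

# Flags from the README example; --world-size is the single comma-separated (swept) flag.
BASE_FLAGS = {
    "--world-size": "10,20",
    "--master-addr": "127.0.0.1",
    "--master-port": "29501",
    "--state-size": "10-20-10",
    "--nlayers": "5",
    "--out-features": "10",
}


def build_command(batch, output_file_path):
    """Return the argv list for one launcher.py run (hypothetical helper)."""
    flags = {**BASE_FLAGS, "--batch": str(batch), "--output-file-path": output_file_path}
    # No shell is involved, so values need no quoting.
    return [sys.executable, "launcher.py"] + [f"{flag}={value}" for flag, value in flags.items()]


def run_both_modes():
    """Run the world-size sweep once with batching and once without (hypothetical file names)."""
    for batch in (True, False):
        report = f"benchmark_report_batch_{batch}.json"
        subprocess.run(build_command(batch, report), check=True)
```

Call `run_both_modes()` from the directory that contains `launcher.py`; each run writes its own JSON report.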

Example Output:

```
--------------------------------------------------------------
PyTorch distributed rpc benchmark reinforcement learning suite
--------------------------------------------------------------
master_addr : 127.0.0.1
master_port : 29501
batch : True
state_size : 10-20-10
nlayers : 5
out_features : 10
output_file_path : benchmark_report.json
x_axis_name : world_size
world_size | agent latency (seconds)     agent throughput            observer latency (seconds)  observer throughput
            p50    p75    p90    p95    p50    p75    p90    p95    p50    p75    p90    p95    p50    p75    p90    p95
10          0.002  0.002  0.002  0.002  4432   4706   4948   5128   0.002  0.003  0.003  0.003  407    422    434    443
20          0.004  0.005  0.005  0.005  4244   4620   4884   5014   0.005  0.005  0.006  0.006  191    207    215    220
```