# Distributed RPC Reinforcement Learning Benchmark

This tool measures `torch.distributed.rpc` throughput and latency for reinforcement learning.

The benchmark spawns one *agent* process and a configurable number of *observer* processes. Because the benchmark focuses on RPC throughput and latency, the agent uses a dummy policy and the observers all use randomly generated states and rewards. In each iteration, observers pass their state to the agent through `torch.distributed.rpc` and wait for the agent to respond with an action. If `batch=False`, the agent processes and responds to a single observer request at a time. Otherwise, the agent accumulates requests from multiple observers and runs them through the policy in one shot. A separate *coordinator* process manages the *agent* and *observers*.

In addition to printing measurements, this benchmark produces a JSON file. Users may pick a single argument and give it multiple comma-separated values (e.g. `world_size="10,50,100"`), in which case the JSON file produced can be passed to the plotting repo to visualize how results differ. Each value of that variable argument is placed on the x axis.

The benchmark results comprise four key metrics:
1. _Agent Latency_ - The time from when the first action request in a batch is received from an observer to when the agent has selected an action for every request in that batch. If `batch=False`, think of it as `batch_size=1`.
2. _Agent Throughput_ - The number of requests processed per second for a given batch, computed as `batch_size / agent_latency`. If not using batch, think of it as `batch_size=1`.
3. _Observer Latency_ - The time from when a single observer requests an action to when it receives the agent's response (see the sketch after this list). With `batch=False`, observer latency is the agent latency plus the transit time for the request to reach the agent plus the transit time for the response to return to the observer. With `batch=True` there is more variation, because some observer requests sit queued in a batch longer than others, depending on the order in which they arrived.
4. _Observer Throughput_ - The number of requests processed per second for a single observer, computed as `1 / observer_latency`.
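
The round trip that _Observer Latency_ measures can be made concrete with a short sketch. This is not the benchmark itself: the two-process layout, the `select_action` helper, and the reading of `state_size` as a tensor shape are illustrative assumptions; the real benchmark adds batching, many observers, and a coordinator process.

```
# Minimal sketch (illustrative only) of the observer -> agent round trip
# that observer latency times: one synchronous RPC carrying a random
# state, answered by a dummy policy.
import os
import time

import torch
import torch.distributed.rpc as rpc
import torch.multiprocessing as mp


def select_action(state):
    # Stand-in policy: the benchmark's agent also uses a dummy policy,
    # since the goal is to measure RPC overhead, not learning quality.
    return torch.randn(10)


def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"
    name = "agent" if rank == 0 else "observer"
    rpc.init_rpc(name, rank=rank, world_size=world_size)

    if rank == 1:
        # Observer: time one request/response round trip to the agent.
        state = torch.rand(10, 20, 10)  # assumes state_size "10-20-10" is a shape
        start = time.perf_counter()
        rpc.rpc_sync("agent", select_action, args=(state,))
        print(f"observer latency: {time.perf_counter() - start:.6f}s")

    rpc.shutdown()  # blocks until all outstanding RPCs have completed


if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2, join=True)
```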

## Requirements

This benchmark depends on PyTorch.

## How to run

For any environments you are interested in, pass the corresponding arguments to `python launcher.py`.

```
python launcher.py --world-size="10,20" --master-addr="127.0.0.1" --master-port="29501" --batch="True" --state-size="10-20-10" --nlayers="5" --out-features="10" --output-file-path="benchmark_report.json"
```

Example Output:

```
--------------------------------------------------------------
PyTorch distributed rpc benchmark reinforcement learning suite
--------------------------------------------------------------
master_addr : 127.0.0.1
master_port : 29501
batch : True
state_size : 10-20-10
nlayers : 5
out_features : 10
output_file_path : benchmark_report.json
x_axis_name : world_size

world_size | agent latency (seconds)  | agent throughput       | observer latency (seconds) | observer throughput
           | p50   p75   p90   p95    | p50   p75   p90   p95  | p50    p75    p90    p95   | p50   p75   p90   p95
10         | 0.002 0.002 0.002 0.002  | 4432  4706  4948  5128 | 0.002  0.003  0.003  0.003 | 407   422   434   443
20         | 0.004 0.005 0.005 0.005  | 4244  4620  4884  5014 | 0.005  0.005  0.006  0.006 | 191   207   215   220
```
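
The throughput columns above follow from the latency columns via the formulas in the metrics section. Here is a small sketch of that relationship using made-up latency samples; the benchmark derives its percentiles from measured per-batch timings.

```
# Illustrative only: relate latency to throughput using the formulas
# above. The latency samples here are invented for demonstration.
import torch

batch_size = 10  # with batch=True, one request per observer in a batch
agent_latencies = torch.tensor([0.0019, 0.0021, 0.0022, 0.0023, 0.0025])
throughputs = batch_size / agent_latencies  # requests/second, per batch

for p in (0.50, 0.75, 0.90, 0.95):
    lat = torch.quantile(agent_latencies, p).item()
    thr = torch.quantile(throughputs, p).item()
    print(f"p{int(p * 100)}: latency={lat:.3f}s throughput={thr:.0f} req/s")
```

Note that the sketch takes percentiles over the per-batch throughput samples themselves rather than dividing by a latency percentile, which is consistent with the example output above, where both latency and throughput increase with the percentile.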