# clpeak

[Build status (Travis CI)](https://app.travis-ci.com/github/krrishnarraj/clpeak)
[Snap package](https://snapcraft.io/clpeak)

A synthetic benchmarking tool that measures the peak capabilities of OpenCL devices. It reports only the peak figures achievable with vector operations and does not represent a real-world workload.

## Building

```console
git submodule update --init --recursive --remote
mkdir build
cd build
cmake ..
cmake --build .
```

## Sample

```text
Platform: NVIDIA CUDA
  Device: Tesla V100-SXM2-16GB
    Driver version  : 390.77 (Linux x64)
    Compute units   : 80
    Clock frequency : 1530 MHz

    Global memory bandwidth (GBPS)
      float   : 767.48
      float2  : 810.81
      float4  : 843.06
      float8  : 726.12
      float16 : 735.98

    Single-precision compute (GFLOPS)
      float   : 15680.96
      float2  : 15674.50
      float4  : 15645.58
      float8  : 15583.27
      float16 : 15466.50

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 7859.49
      double2  : 7849.96
      double4  : 7832.96
      double8  : 7799.82
      double16 : 7740.88

    Integer compute (GIOPS)
      int   : 15653.47
      int2  : 15654.40
      int4  : 15655.21
      int8  : 15659.04
      int16 : 15608.65

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 10.64
      enqueueReadBuffer          : 11.92
      enqueueMapBuffer(for read) : 9.97
        memcpy from mapped ptr   : 8.62
      enqueueUnmap(after write)  : 11.04
        memcpy to mapped ptr     : 9.16

    Kernel launch latency : 7.22 us
```
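
One way to sanity-check the compute figures is to compare them with a theoretical peak derived from the reported compute units and clock frequency. The sketch below is not part of clpeak: `peak_gflops` is a hypothetical helper, and the lane counts (64 FP32 and 32 FP64 lanes per compute unit on a V100) come from the hardware's published specifications, not from the output above. It also assumes each lane retires one fused multiply-add (2 FLOPs) per cycle.

```python
def peak_gflops(compute_units, lanes_per_unit, clock_mhz, flops_per_lane_per_cycle=2):
    """Theoretical peak in GFLOPS (hypothetical helper, not part of clpeak).

    The default of 2 FLOPs per lane per cycle assumes one fused multiply-add
    per lane per cycle.
    """
    return compute_units * lanes_per_unit * flops_per_lane_per_cycle * clock_mhz / 1000.0


# From the sample output: 80 compute units at 1530 MHz.
COMPUTE_UNITS = 80
CLOCK_MHZ = 1530

# Assumption (not reported by clpeak): lanes per compute unit on a V100.
FP32_LANES = 64
FP64_LANES = 32

sp_peak = peak_gflops(COMPUTE_UNITS, FP32_LANES, CLOCK_MHZ)
dp_peak = peak_gflops(COMPUTE_UNITS, FP64_LANES, CLOCK_MHZ)

# Best measured scalar results from the sample output.
measured = {"float": 15680.96, "double": 7859.49}

print(f"single precision: theoretical {sp_peak:.1f}, measured {measured['float']:.2f} "
      f"({measured['float'] / sp_peak:.1%} of theoretical)")
print(f"double precision: theoretical {dp_peak:.1f}, measured {measured['double']:.2f} "
      f"({measured['double'] / dp_peak:.1%} of theoretical)")
```

Under these assumptions the theoretical peaks are 15667.2 and 7833.6 GFLOPS, so the measured values sit within about half a percent of them. A measured value slightly above the estimate is plausible when the device runs above the clock frequency the driver reports.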