Deskside box with lotsa GPUs
By joe
Testing this for a partner: a Pegasus deskside supercomputer with 12 Xeon X5690 CPU cores, 48 GB RAM, a 500 MB/s IO channel (soon to be 1 GB/s), and a GTX 260 graphics card, connected to an XCT a-Brix 2U unit with 4x NVidia Fermi C2050s (normally we'd use a JackRabbit unit, but they are all busy with customer projects right now). First, let's see what's there:
[root@pegasus C]# lspci | grep nVidia | grep VGA
06:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
0b:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
84:00.0 VGA compatible controller: nVidia Corporation GT200 [GeForce GTX 260] (rev a1)
89:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
8e:00.0 VGA compatible controller: nVidia Corporation Unknown device 06d1 (rev a3)
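(The "Unknown device 06d1" entries are presumably the four C2050s; the pci.ids database is just too old to know their name.) The SDK's deviceQuery sample reports the same inventory from the CUDA side; a stripped-down sketch of the idea, assuming the CUDA runtime headers are installed:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int n = 0;
    cudaGetDeviceCount(&n);                 // how many CUDA devices?
    printf("Found %d CUDA device(s)\n", n);
    for (int i = 0; i < n; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);     // name, SM count, memory
        printf("  %d: %s, %d multiprocessors, %.0f MB\n",
               i, p.name, p.multiProcessorCount,
               p.totalGlobalMem / (1024.0 * 1024.0));
    }
    return 0;
}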
Ahhh … nice! And yes, you can order units like this now from the day job. Now let's have a little fun:
[root@pegasus C]# bin/linux/release/MonteCarloMultiGPU
main(): generating input data...
main(): starting 5 host threads...
main(): waiting for GPU results...
main(): GPU statistics
GPU #0
Options : 52
Simulation paths: 262144
GPU #1
Options : 51
Simulation paths: 262144
GPU #2
Options : 51
Simulation paths: 262144
GPU #3
Options : 51
Simulation paths: 262144
GPU #4
Options : 51
Simulation paths: 262144
Total time (ms.): 0.073000
Options per sec.: 3506849.366609
main(): comparing Monte Carlo and Black-Scholes results...
L1 norm : 2.995473E-06
Average reserve: 382.126091
PASSED
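The PASSED line means the Monte Carlo prices agree with the closed-form Black-Scholes values to within the L1 tolerance. Each host thread drives one GPU, simulating terminal prices under geometric Brownian motion and averaging discounted payoffs. A minimal single-GPU sketch of that estimator, here written with cuRAND (the SDK sample ships its own RNG; all names below are illustrative, not the sample's internals):

#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Each thread simulates pathsPerThread terminal prices of a European call
// under geometric Brownian motion and writes its partial payoff sum.
__global__ void mcCallKernel(float *partial, int pathsPerThread,
                             float S, float K, float r, float sigma, float T,
                             unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState rng;
    curand_init(seed, tid, 0, &rng);          // independent stream per thread

    float drift = (r - 0.5f * sigma * sigma) * T;
    float vol   = sigma * sqrtf(T);
    float sum   = 0.0f;
    for (int i = 0; i < pathsPerThread; ++i) {
        float z  = curand_normal(&rng);       // standard normal draw
        float ST = S * expf(drift + vol * z); // terminal price
        sum += fmaxf(ST - K, 0.0f);           // call payoff
    }
    partial[tid] = sum;
}

int main()
{
    const int blocks = 256, threads = 256, paths = 64;
    const int n = blocks * threads;

    float *d_partial;
    cudaMalloc(&d_partial, n * sizeof(float));
    mcCallKernel<<<blocks, threads>>>(d_partial, paths,
                                      100.0f, 100.0f, 0.05f, 0.2f, 1.0f,
                                      1234ULL);

    float *h = (float *)malloc(n * sizeof(float));
    cudaMemcpy(h, d_partial, n * sizeof(float), cudaMemcpyDeviceToHost);

    double total = 0.0;
    for (int i = 0; i < n; ++i) total += h[i];
    double price = exp(-0.05 * 1.0) * total / ((double)n * paths);
    printf("MC call price: %f\n", price);   // ~10.45 for these parameters

    free(h);
    cudaFree(d_partial);
    return 0;
}

One RNG state per thread keeps the draws independent across threads, which is what lets each GPU chew through its own slice of the option list.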
Yeah … baby! It even ran on the less capable GTX 260. Let's try some random numbers:
[root@pegasus C]# bin/linux/release/MersenneTwister
bin/linux/release/MersenneTwister Starting...
Initializing data for 24000000 samples...
Loading CPU and GPU twisters configurations...
Generating random numbers on GPU...
MersenneTwister, Throughput = 2.4075 GNumbers/s, Time = 0.00997 s, Size = 24002560 Numbers, NumDevsUsed = 1, Workgroup = 128
Reading back the results...
Checking GPU results...
...generating random numbers on CPU using reference generator
...applying Box-Muller transformation on CPU
...comparing the results
Max absolute error: 2.324581E-06
L1 norm: 1.713886E-07
PASSED
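The "Box-Muller transformation" step is the standard trick for turning the twister's uniform draws into Gaussians; the CPU reference presumably does something along these lines:

#include <math.h>

/* Box-Muller: two uniforms u1, u2 in (0,1] become two independent
 * standard normals. */
static void boxMuller(float u1, float u2, float *z0, float *z1)
{
    float r   = sqrtf(-2.0f * logf(u1));  /* radius from u1 */
    float phi = 2.0f * (float)M_PI * u2;  /* angle from u2  */
    *z0 = r * cosf(phi);
    *z1 = r * sinf(phi);
}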
Not bad, but it used only one device. Let's use all of them:
[root@pegasus C]# bin/linux/release/simpleMultiGPU
CUDA-capable device count: 5
Generating input data...
Computing with 5 GPU's...
GPU Processing time: 29965.884766 (ms)
Computing with Host CPU...
Comparing GPU and Host CPU results...
GPU sum: 16777304.000000
CPU sum: 16777294.395033
Relative difference: 5.724980E-07
PASSED
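The pattern behind simpleMultiGPU is straightforward: give each GPU its own context and its own slice of the input, launch everywhere, then collect. A rough skeleton using one host thread and cudaSetDevice (fine on a modern runtime; the SDK sample of this era used one host thread per GPU instead, and the kernel below is my own trivial stand-in):

#include <cstdio>
#include <cuda_runtime.h>

// Trivial per-device kernel: scale a vector in place.
__global__ void scaleKernel(float *x, int n, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main()
{
    int devCount = 0;
    cudaGetDeviceCount(&devCount);
    printf("CUDA-capable device count: %d\n", devCount);

    const int maxDev = 16, n = 1 << 20;
    float *buf[maxDev];
    if (devCount > maxDev) devCount = maxDev;

    // Launch work on every device; kernel launches are asynchronous,
    // so all the GPUs run concurrently.
    for (int d = 0; d < devCount; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&buf[d], n * sizeof(float));
        cudaMemset(buf[d], 0, n * sizeof(float));
        scaleKernel<<<(n + 255) / 256, 256>>>(buf[d], n, 2.0f);
    }

    // Then wait for each device in turn and clean up.
    for (int d = 0; d < devCount; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();
        cudaFree(buf[d]);
    }
    return 0;
}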
Been meaning to get a set of more invasive tests going on these units. Will see if I can get my Riemann code ported this weekend, then try a few other things as well.