Are GPU tasks less power efficient than CPU tasks for this project?


WuJJ

Joined: 5 Aug 16
Posts: 2
Credit: 34,873,655
RAC: 38,878
Message 7650 - Posted: 31 Dec 2022, 21:32:16 UTC
I just recently started crunching this project and I found it a bit puzzling that GPU tasks seem to be far less efficient.

Take two examples, which are fairly typical of my system:
GPU: https://asteroidsathome.net/boinc/result.php?resultid=351295608
CPU: https://asteroidsathome.net/boinc/result.php?resultid=349927002

Both have 1380023.0000 GFLOPs (rsc_fpops_est) from what I see, which should generally mean they are solving similarly sized problems, unless the number is bogus. The similar credits (hopefully not bogus either) lend more evidence to that, though I am not sure whether they are a direct result of the same rsc_fpops_est.

The peak GFLOPs did reflect the power of the GPU relative to a single CPU core: hundreds of times higher. However, the runtime of the GPU task is not dramatically shorter. I monitored the load a bit and the SMs were 100% busy, so it's not as if the workload was obviously badly optimized, frequently blocked on PCIe transfers, etc.

If the assumption that both tasks are the same size is correct, this would make the GPU dramatically less power efficient than the CPU. The GPU was consuming 100 W but only cutting the runtime in half, while my CPU was consuming 200 W but has 32 such threads to spare. Sure, my GTX 1080 is quite a bit older than my new CPU, but I didn't expect it to be beaten by the CPU this badly. After all, GPUs dedicate so much area to compute and have a much higher FP throughput, as even the task stats show.
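
The comparison above can be worked through as energy per task. This is only a rough sketch: the 100 W, 200 W, 32 threads and "half the runtime" figures come from the post, while the 1800-second CPU runtime is a placeholder assumption (the ratio does not depend on it). It also assumes the GPU runs one task at a time and that the CPU's 200 W is shared evenly by 32 concurrent tasks.

```python
def joules_per_task(power_watts, runtime_s, concurrent_tasks=1):
    """Energy attributed to one task when several tasks share the same power draw."""
    return power_watts * runtime_s / concurrent_tasks


# Assumed placeholder: runtime of one CPU task on one thread, in seconds.
cpu_runtime_s = 1800.0
# From the post: the GPU only cuts the runtime roughly in half.
gpu_runtime_s = cpu_runtime_s / 2

# GPU: 100 W, one task at a time (assumption).
gpu_joules = joules_per_task(100.0, gpu_runtime_s)
# CPU: 200 W shared by 32 threads, one task per thread.
cpu_joules = joules_per_task(200.0, cpu_runtime_s, concurrent_tasks=32)

ratio = gpu_joules / cpu_joules
print(f"GPU: {gpu_joules:.0f} J per task")
print(f"CPU: {cpu_joules:.0f} J per task")
print(f"GPU uses {ratio:.1f}x the energy per task")
```

Under these assumptions the GPU spends about 8 times as much energy per task as the CPU, which is the gap the post is describing.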

Any idea if the tasks are of the same size? If they are actually very different and rsc_fpops_est is bogus, what's the ratio between the compute required to solve each?

Keith Myers

Joined: 16 Nov 22
Posts: 131
Credit: 144,196,377
RAC: 485,923
Message 7651 - Posted: 1 Jan 2023, 0:16:26 UTC - in response to Message 7650.  
You can ignore the GFLOPS for the gpu, as BOINC has never computed or displayed it correctly.

That part of the code hasn't changed since BOINC first enabled gpu usage, and its algorithm for computing GFLOPS equated the cards of that time with the cpus of that time.

The 1080 is a good card for its period. Your 7950X is the current cutting edge. I don't think the gpu application is very well optimized; its parallelization is poor, even if it shows the card at 100% utilization.

My 3080s at 2X aren't that much faster than your 1080: they do them in ~180-200 seconds, compared to your 1080's 900 seconds (assumed 1X).

I'm sure that if I can persuade my team's wizard developer to look at the application, he could wrangle much better performance out of the code. He has performed similar wizardry on the Seti and Einstein applications.

If you are concerned about energy efficiency, then I believe your analysis is correct. I would compare daily RAC, or just look at the APRs for that host for both apps, where you can see the cpu APR is greater.
https://asteroidsathome.net/boinc/host_app_versions.php?hostid=729645

You might reconfigure to just run the cpu tasks and move the gpu to another project where the gpu is better utilized.

A proud member of the OFA (Old Farts Association)

WuJJ

Joined: 5 Aug 16
Posts: 2
Credit: 34,873,655
RAC: 38,878
Message 7652 - Posted: 1 Jan 2023, 0:35:40 UTC - in response to Message 7651.  
Thanks for the details and additional data points. Yes, that's only one 1080. I will ignore whatever GFLOPs BOINC is reporting from now on. Looks like I was also naive to assume that just because the SMs are fully utilized, the app must be decently optimized.

You might reconfigure to just run the cpu tasks and move the gpu to another project where the gpu is better utilized.

Yes, this is exactly why I was asking. I run multiple projects with the same few hosts and thus a fixed amount of compute/power. I am trying to distribute the projects to maximize overall contribution.

petri33
Volunteer developer
Volunteer tester

Joined: 22 Nov 22
Posts: 26
Credit: 64,294,803
RAC: 7
Message 7668 - Posted: 7 Jan 2023, 22:47:46 UTC - in response to Message 7651.  
Hi!

Some wizardry is being done here too. I'm running three at a time and I'm working with the source administrators. It will take time (a lot) to get fully acquainted and to get a deep knowledge and understanding of what is going on in, at, on, with, etc. ... where the computing is being done (GPU).

I hope we will face a good new year here too!

You can probably find my results and host following my username - account - host - ...

Petri

Ian&Steve C.
Volunteer developer
Volunteer tester

Joined: 23 Apr 21
Posts: 85
Credit: 115,471,479
RAC: 203,842
Message 7692 - Posted: 12 Jan 2023, 23:10:38 UTC - in response to Message 7668.  
Hi!

Some wizardry is being done here too. I'm running three at a time and I'm working with the source administrators. It will take time (a lot) to get fully acquainted and to get a deep knowledge and understanding of what is going on in, at, on, with, etc. ... where the computing is being done (GPU).

I hope we will face a good new year here too!

You can probably find my results and host following my username - account - host - ...

Petri


wizardry is your forte :)

so far, heavy reliance on double. FP64 cards perform quite well.

can you get the same precision with float + more wizardry?

petri33
Volunteer developer
Volunteer tester

Joined: 22 Nov 22
Posts: 26
Credit: 64,294,803
RAC: 7
Message 7712 - Posted: 16 Jan 2023, 14:31:58 UTC
Hi Keith!

For now I'm concentrating on memory access patterns. My Titan V (double master) does not do any better than an RTX 2080 Ti. That implies there are problems like strided access, i.e. consecutive threads accessing non-consecutive memory locations.

This is BAD:
T1 T1 T1 T1 .. (50 x T1) .. T1 T2 T2 T2 .. (50 x T2) .. T2 T3 T3 T3 .. (50 x T3) .. T3 ...

A good access pattern would be:
T1 T2 T3 T4 ... TN-1 TN T1 T2 T3 T4 ... TN-1 TN T1 T2 T3 ...

The bad one achieves only about 0.25% of mem throughput.
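
The two layouts above can be sketched as index calculations. This is a simplified model, not the project's code: it assumes a warp of 32 threads, 8-byte doubles, a 128-byte memory transaction size and 1024 threads in total, and all function names are made up for illustration. It counts how many separate memory segments one warp touches when every thread reads element k of its own 50-element block.

```python
# Simplified model of one warp-wide load. All sizes are illustrative assumptions.
WARP_SIZE = 32          # threads that issue a load together
ELEMS_PER_THREAD = 50   # matches the "50 x T1" picture above
NUM_THREADS = 1024      # assumed total number of threads
ELEM_BYTES = 8          # one double
SEGMENT_BYTES = 128     # assumed memory transaction size


def bad_index(thread, k):
    """Thread-major layout: each thread's 50 values are stored together."""
    return thread * ELEMS_PER_THREAD + k


def good_index(thread, k):
    """Interleaved layout: element k of every thread is stored together."""
    return k * NUM_THREADS + thread


def segments_touched(index_fn, k):
    """Distinct memory segments hit when threads 0..31 each read element k."""
    return len({index_fn(t, k) * ELEM_BYTES // SEGMENT_BYTES
                for t in range(WARP_SIZE)})


bad_segments = segments_touched(bad_index, 0)
good_segments = segments_touched(good_index, 0)
print("thread-major layout:", bad_segments, "segments per warp load")
print("interleaved layout: ", good_segments, "segments per warp load")
```

In this model the thread-major layout touches 32 segments per warp load and the interleaved layout touches 2, so the bad pattern moves far more memory for the same useful data. Real hardware has caches and smaller sectors, so the actual penalty will differ.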

I tried with float, but accuracy is lost. Results do not validate.
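
As a generic illustration of why float can lose too much accuracy (this is not the project's computation, just a minimal sketch), the code below adds 0.1 one hundred thousand times, once in double precision and once with every intermediate value rounded to single precision. The helper names are hypothetical, and single precision is emulated with the standard `struct` module.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, count, single_precision):
    """Add `step` to a running total `count` times."""
    if single_precision:
        step = to_f32(step)
    total = 0.0
    for _ in range(count):
        total += step
        if single_precision:
            total = to_f32(total)  # emulate keeping the sum in a float
    return total


COUNT = 100_000
EXACT = 10000.0  # mathematically exact result of 0.1 * 100000

err64 = abs(accumulate(0.1, COUNT, single_precision=False) - EXACT)
err32 = abs(accumulate(0.1, COUNT, single_precision=True) - EXACT)
print(f"double error: {err64:.3e}")
print(f"float error:  {err32:.3e}")
```

The single-precision error comes out orders of magnitude larger than the double-precision one. A validator that compares results within a tight tolerance could reject output that drifts like this, which would be consistent with the float results not validating.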
--
Petri

petri33
Volunteer developer
Volunteer tester

Joined: 22 Nov 22
Posts: 26
Credit: 64,294,803
RAC: 7
Message 7743 - Posted: 17 Feb 2023, 21:43:35 UTC - in response to Message 7652.  

Last modified: 17 Feb 2023, 21:45:47 UTC
https://asteroidsathome.net/boinc/workunit.php?wuid=155519775

3090 vs 3 x 2080 Ti

Or what?

I'm still developing.

Ian&Steve C.
Volunteer developer
Volunteer tester

Joined: 23 Apr 21
Posts: 85
Credit: 115,471,479
RAC: 203,842
Message 7744 - Posted: 18 Feb 2023, 13:50:27 UTC - in response to Message 7743.  
https://asteroidsathome.net/boinc/workunit.php?wuid=155519775

3090 vs 3 x 2080 Ti

Or what?

I'm still developing.


:)

tito

Joined: 22 Jul 13
Posts: 5
Credit: 31,218,891
RAC: 57,371
Message 7865 - Posted: 19 Aug 2023, 15:12:25 UTC - in response to Message 7743.  
https://asteroidsathome.net/boinc/workunit.php?wuid=155519775

3090 vs 3 x 2080 Ti

Or what?

I'm still developing.

Any progress? I would like to move my CPU to Asteroids, but I don't know whether that would be wise if the GPU app gets a boost soon.

petri33
Volunteer developer
Volunteer tester

Joined: 22 Nov 22
Posts: 26
Credit: 64,294,803
RAC: 7
Message 7880 - Posted: 30 Aug 2023, 16:38:24 UTC - in response to Message 7865.  
Hi!

Some progress, but not so much. Tweaking every day. Just wait and do other projects in the meantime if you wish.

Petri

tito

Joined: 22 Jul 13
Posts: 5
Credit: 31,218,891
RAC: 57,371
Message 7888 - Posted: 3 Sep 2023, 11:55:18 UTC - in response to Message 7880.  
Thanks for the update.