Are GPU tasks less power efficient than CPU tasks for this project?
Message boards :
Number crunching :
Are GPU tasks less power efficient than CPU tasks for this project?
Joined: 5 Aug 16 | Posts: 2 | Credit: 34,460,639 | RAC: 60,963
I just recently started crunching this project and found it a bit puzzling that GPU tasks seem to be far less efficient. Taking two examples, which are fairly typical for my system:

GPU: https://asteroidsathome.net/boinc/result.php?resultid=351295608
CPU: https://asteroidsathome.net/boinc/result.php?resultid=349927002

Both have 1380023.0000 GFLOPs (rsc_fpops_est), which should generally mean they are solving similarly sized problems, unless that number is bogus. The similar credits (hopefully not bogus either) lend more evidence to that, though I am not sure whether credit is a direct result of the same rsc_fpops_est. The peak GFLOPS indeed reflect the power of the GPU relative to a single CPU core: hundreds of times better. However, the runtime of the GPU task is not dramatically shorter. I monitored the load a bit and the SMs were 100% busy, so it's not as if the workload was badly optimized or frequently blocked on PCIe transfers.

If the assumption that both tasks are the same size is correct, this would make the GPU dramatically less power efficient than the CPU. The GPU was consuming 100 W but only cutting runtime in half, while my CPU consumes 200 W and has 32 such threads to spare. Sure, my GTX 1080 is quite old compared to my new CPU, but I didn't expect it to be beaten by a CPU this badly. After all, GPUs dedicate so much die area to compute and have much higher FP throughput, as even the task stats show.

Any idea whether the tasks are the same size? If they are actually very different and rsc_fpops_est is bogus, what's the ratio between the compute required to solve each?
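The power-efficiency argument above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses the round numbers from the post (100 W GPU, 200 W CPU with 32 threads, GPU halving the runtime); the CPU runtime itself is a hypothetical placeholder, since it cancels out of the ratio.

```python
# Energy-per-task comparison using the poster's round numbers.
# The absolute runtime is a placeholder; only the ratio matters.

CPU_RUNTIME_S = 10_000              # hypothetical CPU task runtime
GPU_RUNTIME_S = CPU_RUNTIME_S / 2   # "only cutting runtime by half"

GPU_POWER_W = 100                   # whole-card draw while crunching
CPU_POWER_W = 200                   # whole-package draw
CPU_THREADS = 32                    # concurrent CPU tasks sharing that power

# The GPU runs one task at full power; each CPU task "owns" 1/32 of
# the package power.
gpu_joules_per_task = GPU_POWER_W * GPU_RUNTIME_S
cpu_joules_per_task = (CPU_POWER_W / CPU_THREADS) * CPU_RUNTIME_S

print(gpu_joules_per_task / cpu_joules_per_task)  # → 8.0
```

Under those assumptions the GPU spends about 8x the energy per task, which matches the poster's intuition that the card is dramatically less power efficient on this project.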
Joined: 16 Nov 22 | Posts: 131 | Credit: 138,029,924 | RAC: 488,729
You can ignore the GFLOPS figure for the GPU; BOINC has never computed and displayed it correctly. That part of the code hasn't changed since BOINC first enabled GPU usage, and the algorithm for computing GFLOPS effectively equated the cards of that era with the CPUs of that era.

The 1080 is a good card for its period; your 7950X is current cutting edge. I don't think the GPU application is very well optimized: the parallelization looks poor even though it shows the card at 100% utilization. My 3080s running 2X aren't dramatically faster than your 1080, finishing tasks in ~180-200 seconds compared to your 1080's (assumed 1X) 900 seconds. I'm sure that if I can persuade my team's wizard developer to look at the application, he could wrangle much better performance out of the code. He has performed similar wizardry on the Seti and Einstein applications.

If you are concerned about energy efficiency, then I believe your analysis is correct. I would compare daily RAC, or just look at the APRs for that host for both apps; you can see the CPU APR is greater: https://asteroidsathome.net/boinc/host_app_versions.php?hostid=729645

You might reconfigure to run just the CPU tasks and move the GPU to another project where it is better utilized.

A proud member of the OFA (Old Farts Association)
Joined: 5 Aug 16 | Posts: 2 | Credit: 34,460,639 | RAC: 60,963
Thanks for the details and additional data points. Yes, that's only one 1080. I will ignore whatever GFLOPS BOINC reports from now on. Looks like I was also naive to assume that just because the SMs are fully utilized, the app must be decently optimized.

"You might reconfigure to just run the cpu tasks and move the gpu to another project where the gpu is better utilized."

Yes, this is exactly why I was asking. I run multiple projects on the same few hosts, and thus have a fixed amount of compute/power. I am trying to distribute the projects to maximize overall contribution.
Joined: 22 Nov 22 | Posts: 26 | Credit: 64,294,803 | RAC: 23
Hi! Some wizardry is being done here too. I'm running three at a time and I'm working with the source administrators. It will take time (a lot) to get fully acquainted and build a deep understanding of what is going on in, at, on, and around the place where the computing is being done (the GPU). I hope we will have a good new year here too! You can probably find my results and host by following my username - account - host - ... Petri
Joined: 23 Apr 21 | Posts: 85 | Credit: 112,891,332 | RAC: 204,248
Hi! Wizardry is your forte :) So far there is heavy reliance on double precision, so FP64-capable cards perform quite well. Can you get the same precision with float plus more wizardry?
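For what it's worth, one classic piece of "float + wizardry" is compensated (Kahan) summation, which carries each addition's rounding error forward so a lower-precision accumulator loses far less accuracy. Whether it could rescue these particular kernels is pure speculation; the sketch below only demonstrates the technique itself, using plain Python doubles:

```python
def naive_sum(xs):
    """Straightforward left-to-right accumulation."""
    s = 0.0
    for x in xs:
        s += x
    return s

def kahan_sum(xs):
    """Kahan compensated summation: carry the rounding error forward."""
    s = 0.0
    c = 0.0                # running compensation for lost low-order bits
    for x in xs:
        y = x - c          # re-apply the error lost on the previous add
        t = s + y
        c = (t - s) - y    # what got rounded away in s + y
        s = t
    return s

xs = [0.1] * 1_000_000     # exact decimal sum would be 100,000
naive_err = abs(naive_sum(xs) - 100_000)
kahan_err = abs(kahan_sum(xs) - 100_000)
print(naive_err, kahan_err)  # the compensated sum is orders of magnitude closer
```

The same trick applies to float32 accumulators on a GPU, at the cost of a few extra FP32 operations per addition, which is usually far cheaper than falling back to FP64 on consumer cards.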
Joined: 22 Nov 22 | Posts: 26 | Credit: 64,294,803 | RAC: 23
Hi Keith! For now I'm concentrating on memory access patterns. My Titan V (a double-precision master) does no better than an RTX 2080 Ti. That implies there are problems like strided access, i.e. consecutive threads accessing non-consecutive memory locations.

This is BAD:
T1 T1 T1 T1 .. (50 x T1) .. T1 T2 T2 T2 .. (50 x T2) .. T2 T3 T3 T3 .. (50 x T3) .. T3 ...

A good access pattern would be:
T1 T2 T3 T4 ... TN-1 TN T1 T2 T3 T4 ... TN-1 TN T1 T2 T3 ...

The bad one achieves only about 0.25% of memory throughput. I tried with float, but accuracy is lost and the results do not validate.

-- Petri
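To see why the "bad" pattern is so costly, you can count memory transactions per warp. The sketch below is an illustration, not the project's kernel: it assumes a 32-thread warp, 4-byte elements, 128-byte memory transactions, and the stride of 50 from the diagram above.

```python
# Count how many 128-byte memory transactions a 32-thread warp needs
# per access under the two patterns described above (4-byte elements).

WARP = 32
ELEM = 4          # bytes per float
SEGMENT = 128     # bytes per memory transaction
STRIDE = 50       # the "50 x T1" layout: thread t reads a[t * 50]

def transactions(addresses):
    """Distinct 128-byte segments touched by one warp-wide access."""
    return len({addr // SEGMENT for addr in addresses})

# Bad (strided): consecutive threads hit locations 50 elements apart.
strided = [t * STRIDE * ELEM for t in range(WARP)]
# Good (coalesced): consecutive threads hit consecutive elements.
coalesced = [t * ELEM for t in range(WARP)]

print(transactions(strided))    # → 32 (one transaction per thread)
print(transactions(coalesced))  # → 1  (the whole warp shares one)
```

In the strided layout every thread lands in its own 128-byte segment, so the warp needs 32 transactions to deliver 128 useful bytes; coalesced, one transaction serves the whole warp. That 32x gap in effective bandwidth lines up with the tiny fraction of peak throughput observed above.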
Joined: 22 Nov 22 | Posts: 26 | Credit: 64,294,803 | RAC: 23
Last modified: 17 Feb 2023, 21:45:47 UTC

https://asteroidsathome.net/boinc/workunit.php?wuid=155519775

3090 vs 3 x 2080 Ti. Or what? I'm still developing.
Joined: 23 Apr 21 | Posts: 85 | Credit: 112,891,332 | RAC: 204,248
https://asteroidsathome.net/boinc/workunit.php?wuid=155519775 :) |
Joined: 22 Jul 13 | Posts: 5 | Credit: 30,495,138 | RAC: 58,434
https://asteroidsathome.net/boinc/workunit.php?wuid=155519775

Any progress? I would like to move my CPUs to Asteroids, but I don't know whether that would be wise if the GPU app is about to get a boost.
Joined: 22 Nov 22 | Posts: 26 | Credit: 64,294,803 | RAC: 23
|
Joined: 22 Jul 13 | Posts: 5 | Credit: 30,495,138 | RAC: 58,434
|