Posts by WuJJ

1) (Message 7652)
Posted 1 Jan 2023 by WuJJ
Post:
Thanks for the details and additional data points. Yes, that's only one 1080. I will ignore whatever GFLOPs BOINC is reporting from now on. Looks like I was also naive to assume that just because the SMs are fully utilized, the app must be decently optimized.

You might reconfigure to just run the cpu tasks and move the gpu to another project where the gpu is better utilized.

Yes, this is exactly why I was asking. I run multiple projects on the same few hosts and thus with a fixed amount of compute/power. I am trying to distribute the projects to maximize overall contribution.
2) (Message 7650)
Posted 31 Dec 2022 by WuJJ
Post:
I just recently started crunching this project and I found it a bit puzzling that GPU tasks seem to be far less efficient.

Take two examples, which are fairly typical of my system:
GPU: https://asteroidsathome.net/boinc/result.php?resultid=351295608
CPU: https://asteroidsathome.net/boinc/result.php?resultid=349927002

Both show 1380023.0000 GFLOPs (rsc_fpops_est) from what I can see, which should generally mean they are solving similar-sized problems unless the number is bogus. The similar credits (hopefully not bogus either) lend more evidence to that, though I am not sure whether they are a direct result of the identical rsc_fpops_est.

The peak GFLOPs do reflect the power of the GPU relative to a single CPU core: hundreds of times higher. However, the runtime of the GPU task is not dramatically shorter. I did monitor the load a bit and the SMs were 100% busy, so it's not as if the workload was obviously badly optimized or frequently blocked on PCIe transfers, etc.

If the assumption that both tasks are the same size is correct, this would make the GPU dramatically less power efficient than the CPU. The GPU was consuming 100W but only cutting the runtime in half. My CPU was consuming 200W, but it has 32 such threads to spare. Sure, my GTX 1080 is quite a bit older than my new CPU, but I didn't expect it to be beaten by the CPU this badly. After all, GPUs dedicate so much die area to compute and have much higher FP throughput, as even the task stats show.
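
To put rough numbers on that comparison, here is a back-of-envelope sketch. It rests on several assumptions that are not measured: the CPU runs 32 tasks at once with each taking the same single-thread runtime T, the 200W figure is the package power with all 32 threads loaded, the GPU runs one task at a time in T/2 at 100W, and the CPU thread that feeds the GPU is ignored. T is normalized to 1, so the results are only ratios, not real joules.

```python
def energy_per_task(power_watts, runtime, concurrent_tasks):
    """Energy spent per finished task, in watt * runtime units."""
    return power_watts * runtime / concurrent_tasks


def tasks_per_unit_time(runtime, concurrent_tasks):
    """How many tasks finish per unit of time."""
    return concurrent_tasks / runtime


T = 1.0  # assumed single-thread CPU runtime, normalized (hypothetical)

# CPU: 200 W shared by 32 concurrent tasks, each taking T (assumption).
cpu_energy = energy_per_task(200.0, T, 32)
cpu_throughput = tasks_per_unit_time(T, 32)

# GPU: 100 W for one task that finishes in about half the time (assumption).
gpu_energy = energy_per_task(100.0, T / 2, 1)
gpu_throughput = tasks_per_unit_time(T / 2, 1)

energy_ratio = gpu_energy / cpu_energy
throughput_ratio = cpu_throughput / gpu_throughput

print(f"CPU energy per task: {cpu_energy:.2f}")
print(f"GPU energy per task: {gpu_energy:.2f}")
print(f"GPU uses {energy_ratio:.0f}x the energy per task")
print(f"CPU finishes {throughput_ratio:.0f}x as many tasks per unit time")
```

Under those assumptions the GPU spends about 8 times the energy per task and the CPU finishes about 16 times as many tasks in the same time. If the two task types are not actually the same size, these ratios do not hold.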

Any idea if the tasks are of the same size? If they are actually very different and rsc_fpops_est is bogus, what's the ratio between the compute required to solve each?