GPU FP64, AVX-512, AffinityWatcher and the Lasso process.
Author | Message |
---|---|
Send message Joined: 1 Jan 13 Posts: 90 Credit: 10,400,477 RAC: 8,079 |
With 65 W ECO mode, my clock speeds are: Zen 5: 2662 MHz, Zen 4: 3417 MHz. Zen 5 has a lower clock speed but can process more work per clock thanks to 1x512b vs 2x256b. Power is the limiting factor; without that restriction, Zen 5 is faster since it can execute 1x512b operations while still maintaining relatively high frequencies. In theory, a vector that is 2 times wider should let the CPU process tasks 2 times faster, but that is achievable only in very synthetic benchmarks. With real apps, more factors come into play, such as data structures, memory, and the type of work. This is why some applications gain significant performance from AVX-512, while others see no improvement even when they are optimized for it. If the clock speed drop outweighs the gain from AVX-512, performance may even be worse. BTW, there's an upcoming patch for GCC 15/14 (https://www.phoronix.com/news/GCC-15-Lands-More-Zen-5-Tuning) that could help extract slightly more performance from Zen 5 compared to the stock app. |
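To put rough numbers on the clock-versus-width trade-off, here is a back-of-the-envelope sketch. It assumes a single vector pipe that retires 512 bits per cycle on Zen 5 and 256 bits per cycle on Zen 4 (the "1x512b vs 2x256b" point above), and it ignores memory, pipe count, and everything else a real app runs into. The function name is made up for illustration.

```python
# Idealized peak vector throughput: clock (MHz) times bits retired per cycle.
# Assumption: one pipe, 512 bits/cycle on Zen 5 and 256 bits/cycle on Zen 4.
def peak_bits_per_second(clock_mhz: float, bits_per_cycle: int) -> float:
    return clock_mhz * 1e6 * bits_per_cycle

zen5 = peak_bits_per_second(2662, 512)  # 65 W ECO mode clock from the post
zen4 = peak_bits_per_second(3417, 256)  # 65 W ECO mode clock from the post

ratio = zen5 / zen4
print(f"Idealized Zen 5 / Zen 4 ratio: {ratio:.2f}")
```

Under those assumptions the ceiling is about 1.56x rather than 2x, because the lower clock eats part of the width advantage. Real applications will usually land below that.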
Send message Joined: 24 Jun 24 Posts: 12 Credit: 34,542 RAC: 20 |
|
Send message Joined: 1 Jan 13 Posts: 90 Credit: 10,400,477 RAC: 8,079 |
Block dimension is a fixed value (currently hardcoded to 128) that determines how tasks are allocated across the GPU's resources; it has nothing to do with Direct GMA. Direct GMA might enhance performance, but the code would have to be optimized for it, and it appears to be limited to professional cards like AMD FirePro, which limits the number of users who could benefit from it (and I don't have one). |
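For readers unfamiliar with what a block dimension does, here is a generic sketch of how a block size splits work in the usual GPU launch model. It is not taken from the project's code: the work-item count of 10,000 and the helper names are hypothetical, chosen only to show the arithmetic.

```python
# Generic GPU launch arithmetic: work is split into blocks of block_dim threads.
# Assumption: n_items is a hypothetical work-item count, not a project value.
def blocks_needed(n_items: int, block_dim: int) -> int:
    # Ceiling division: the last block may be only partly filled.
    return (n_items + block_dim - 1) // block_dim

def idle_threads(n_items: int, block_dim: int) -> int:
    # Threads launched in the last block that have no work item.
    return blocks_needed(n_items, block_dim) * block_dim - n_items

n_items = 10_000
for block_dim in (64, 128):
    print(block_dim, blocks_needed(n_items, block_dim), idle_threads(n_items, block_dim))
```

A smaller block dimension means more, smaller blocks for the same amount of work. Whether that helps or hurts depends on the GPU, which is why it has to be measured.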
Send message Joined: 24 Jun 24 Posts: 12 Credit: 34,542 RAC: 20 |
I have one like that, but setting the value to 64 makes the task calculate the fastest. https://asteroidsathome.net/boinc/show_host_detail.php?hostid=783761 FirePro W8100 8GB: you can buy a used one on eBay for $130-350. |
Send message Joined: 1 Jan 13 Posts: 90 Credit: 10,400,477 RAC: 8,079 |
Hmm, how much faster? I've tested the same WU on a Radeon 7900 XTX (@ 2 GHz), but the difference is within the margin of error. Block dim 64: 7m 27s. Block dim 128 (default): 7m 28s. Note that asteroid tasks aren't all the same, so to get a valid comparison you have to test with the same WU. Indeed, tuning the parameter could have an effect on certain GPUs (better or worse), but I don't expect it to lead to any substantial improvement. |
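For scale, the two timings above can be turned into a relative difference. This is a minimal sketch that uses only the two run times quoted in the post; the helper name is made up for illustration.

```python
# Convert the two quoted run times to seconds and compare them.
def to_seconds(minutes: int, seconds: int) -> int:
    return minutes * 60 + seconds

t_block_64 = to_seconds(7, 27)   # block dim 64
t_block_128 = to_seconds(7, 28)  # block dim 128 (default)

# Relative difference as a percentage of the default run time.
rel_diff_percent = (t_block_128 - t_block_64) / t_block_128 * 100
print(f"{t_block_64} s vs {t_block_128} s, difference {rel_diff_percent:.2f}%")
```

A one-second gap on a run of roughly 7.5 minutes is about 0.2%, which a single pair of runs cannot separate from normal run-to-run variation.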