GPU FP64, AVX-512, AffinityWatcher and the Lasso process.

Message boards : Number crunching : GPU FP64, AVX-512, AffinityWatcher and the Lasso process.

Author	Message
ahorek's team Volunteer developer Volunteer tester Joined: 1 Jan 13 Posts: 180 Credit: 16,027,662 RAC: 2,182	Message 8567 - Posted: 6 Sep 2024, 17:07:32 UTC

With 65 W ECO mode, my clock speeds are 2662 MHz on Zen 5 and 3417 MHz on Zen 4.

Zen 5 has a lower clock speed but can process more work per clock, because it executes a 512-bit operation in a single pass (1x512b) instead of splitting it into two 256-bit halves (2x256b) as Zen 4 does. Power is the limiting factor here: without the ECO restriction, Zen 5 is faster, since it can execute 512-bit operations in one pass while still sustaining relatively high frequencies.

In theory, a vector twice as wide should let the CPU process tasks twice as fast, but that is achievable only in very synthetic benchmarks. In real applications more factors come into play, such as data structures, memory access and the type of work. This is why some applications gain significant performance from AVX-512 while others see no improvement, even when they are optimized for it. If the loss in clock speed outweighs the benefit of AVX-512, performance may even be worse.

BTW, there's an upcoming patch for GCC 15/14 (https://www.phoronix.com/news/GCC-15-Lands-More-Zen-5-Tuning) that could help extract slightly more performance from Zen 5 compared to the stock app.
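A rough back-of-the-envelope model of the clock-versus-width trade-off, using the two ECO-mode clocks above. This is a hypothetical sketch, not a measurement: it assumes Zen 5 needs exactly half the cycles of Zen 4 for the vectorized share of the work (`vector_fraction`, an assumed parameter) and the same number of cycles for everything else, which ignores Zen 5's other IPC changes, memory effects and so on.

```python
# Hypothetical Amdahl-style model of "narrower clocks vs wider vectors".
# NOT measured data: only the two clock speeds come from the post.
ZEN4_MHZ = 3417  # Zen 4 clock reported in the post (65 W ECO mode)
ZEN5_MHZ = 2662  # Zen 5 clock reported in the post (65 W ECO mode)


def relative_speed(vector_fraction: float, width_gain: float = 2.0,
                   new_mhz: float = ZEN5_MHZ, old_mhz: float = ZEN4_MHZ) -> float:
    """Speed of the new core relative to the old one (>1 means faster).

    vector_fraction: assumed share of the old core's cycles spent in 512-bit code.
    width_gain: assumed cycle reduction for that code (2.0 = ideal 1x512b vs 2x256b).
    All other work is assumed to take the same number of cycles on both cores.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be in [0, 1]")
    # Cycles on the new core, as a fraction of the old core's cycles.
    cycles_ratio = (1.0 - vector_fraction) + vector_fraction / width_gain
    # Time = cycles / clock, so speed ratio = clock ratio / cycles ratio.
    return (new_mhz / old_mhz) / cycles_ratio


def break_even_fraction(width_gain: float = 2.0,
                        new_mhz: float = ZEN5_MHZ, old_mhz: float = ZEN4_MHZ) -> float:
    """Vector share at which the lower clock and the wider vectors cancel out."""
    # Solve (1 - f) + f / g = new_mhz / old_mhz for f.
    clock_ratio = new_mhz / old_mhz
    return (1.0 - clock_ratio) / (1.0 - 1.0 / width_gain)


for f in (0.0, 0.25, 0.5, 1.0):
    print(f"vector share {f:.2f}: relative speed {relative_speed(f):.2f}")
print(f"break-even vector share: {break_even_fraction():.2f}")
```

With these clocks the break-even share works out to about 0.44: under this model, if less than roughly 44% of Zen 4's cycles are spent in code that benefits from the wider vectors, the slower-clocked Zen 5 finishes later.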

chr80 Joined: 24 Jun 24 Posts: 12 Credit: 34,542 RAC: 0	Message 8601 - Posted: 18 Oct 2024, 23:52:14 UTC

What is the optimal block size ("Block dim: 128") for Direct GMA on the GPU?

ahorek's team Volunteer developer Volunteer tester Joined: 1 Jan 13 Posts: 180 Credit: 16,027,662 RAC: 2,182	Message 8602 - Posted: 19 Oct 2024, 16:05:36 UTC - in response to Message 8601.

Block dimension is a fixed value (currently hardcoded to 128) that determines how tasks are distributed across the GPU's resources; it has nothing to do with Direct GMA.

Direct GMA might enhance performance, but the code would have to be optimized for it, and it appears to be limited to professional cards like the AMD FirePro, which limits the number of users who could benefit from it (and I don't have one).
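For readers unfamiliar with the term, here is a generic sketch of what a block dimension (CUDA block size / OpenCL work-group size) controls: the work is split into fixed-size blocks, the global size is rounded up to a multiple of the block size, and each work-item recovers its block and local index from its global id. The thread doesn't show how the app actually launches its kernels, so `launch_config`, `locate` and the item count `N` are hypothetical, and the 64-wide wavefront is an assumption about GCN-class hardware such as the FirePro W8100.

```python
from dataclasses import dataclass


@dataclass
class LaunchConfig:
    block_dim: int    # work-items per block / work-group (e.g. 64 or 128)
    num_blocks: int   # blocks needed to cover every work item
    padded_size: int  # global size rounded up to a multiple of block_dim
    idle_items: int   # padding lanes in the last block that do no useful work


def launch_config(n_items: int, block_dim: int) -> LaunchConfig:
    """Hypothetical helper: split n_items into blocks of block_dim work-items."""
    if n_items <= 0 or block_dim <= 0:
        raise ValueError("n_items and block_dim must be positive")
    num_blocks = (n_items + block_dim - 1) // block_dim  # integer ceiling division
    padded = num_blocks * block_dim
    return LaunchConfig(block_dim, num_blocks, padded, padded - n_items)


def locate(global_id: int, block_dim: int) -> tuple[int, int]:
    """Inverse mapping a kernel would use: global id -> (block index, local index)."""
    return divmod(global_id, block_dim)


WAVEFRONT = 64  # assumed hardware SIMD width (64 on GCN; RDNA also supports 32)
N = 10_000      # hypothetical number of independent work items

for b in (64, 128):
    cfg = launch_config(N, b)
    print(cfg, "wavefronts per block:", b // WAVEFRONT)
```

For N = 10,000 items, a block dim of 64 gives 157 blocks with 48 padding lanes, and 128 gives 79 blocks with 112 padding lanes; whether either choice is actually faster depends on the kernel and the GPU.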

chr80 Joined: 24 Jun 24 Posts: 12 Credit: 34,542 RAC: 0	Message 8603 - Posted: 20 Oct 2024, 19:43:02 UTC - in response to Message 8602.

I have one of those, and setting the value to 64 makes tasks compute the fastest: https://asteroidsathome.net/boinc/show_host_detail.php?hostid=783761

FirePro W8100 8GB: you can buy a used one on eBay for $130-350.

ahorek's team Volunteer developer Volunteer tester Joined: 1 Jan 13 Posts: 180 Credit: 16,027,662 RAC: 2,182	Message 8604 - Posted: 20 Oct 2024, 21:22:19 UTC - in response to Message 8603.

Hmm, how much faster? I've tested the same WU on a Radeon 7900 XTX (@2 GHz), but the difference is within the margin of error:
Block dim: 64 - 7m 27s
Block dim: 128 - 7m 28s (default)

Note that asteroid tasks aren't all the same, so to get a valid comparison you have to test with the same WU. Indeed, tuning the parameter could have an effect on certain GPUs (better or worse), but I don't expect it to lead to any substantial improvement.
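A small helper for the kind of comparison described above: convert both timings of the same WU to seconds and express the difference as a relative change. The 1% noise threshold (`NOISE`) is an assumed placeholder, not a measured value; in practice you would estimate it by re-running the same WU with the same settings.

```python
import re


def to_seconds(text: str) -> int:
    """Parse a duration written like '7m 27s' (the format used in the post)."""
    m = re.fullmatch(r"\s*(?:(\d+)m)?\s*(?:(\d+)s)?\s*", text)
    if not m or not any(m.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return minutes * 60 + seconds


def relative_change(baseline_s: float, candidate_s: float) -> float:
    """Fractional runtime change; negative means the candidate is faster."""
    return (candidate_s - baseline_s) / baseline_s


NOISE = 0.01  # ASSUMED run-to-run noise (1%); measure it by repeating the same WU

base = to_seconds("7m 28s")  # Block dim 128 (default), same WU
cand = to_seconds("7m 27s")  # Block dim 64, same WU
change = relative_change(base, cand)
print(f"{change:+.2%}", "significant" if abs(change) > NOISE else "within noise")
```

For the 7900 XTX timings above this gives a change of about -0.22%, well inside the assumed noise band.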

