Double precision?

Message boards : Number crunching : Double precision?

Lamberto Vitali · Joined: 14 Jun 23 · Posts: 101 · Credit: 5,914 · RAC: 0
Message 9351 - Posted: 2 Jan 2026, 19:19:09 UTC

Is Gemini correct here?

The Proportion of FP64: In Asteroids@Home, nearly 100% of the core compute-heavy tasks utilize FP64. The project uses the Convex-Inversion method to derive the shapes and spin states of asteroids from photometric data. Because orbital mechanics and gravitational simulations require high numerical stability to avoid "drift" over long simulated timeframes, FP32 is insufficient.

The Calculation:
Raw Compute Factor: The Radeon VII has a native FP64 rate that is 4.5x higher than the WX 9100 (3.5 / 0.77 ≈ 4.54).
Memory Bandwidth Factor: Asteroids@Home is also notoriously memory-starved. The Radeon VII provides 2.1x more bandwidth, ensuring the compute units are fed data much faster.

Estimated Performance Gain: The Radeon VII is roughly 4 to 5 times faster than the WX 9100 Pro for Asteroids@Home. Because the MI50 you are setting up is a "pure" compute version of the Radeon VII with an even better 1:2 FP64 rate (~6.7 TFLOPS), it will likely double the performance of the Radeon VII again. You are essentially moving from a "workstation" level to a "supercomputer" level for this specific project.
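
For anyone who wants to check the arithmetic in the quoted answer, here is a minimal Python sketch. It uses only the peak FP64 figures given in the post (0.77, 3.5 and 6.7 TFLOPS); the dictionary and the `peak_ratio` helper are illustrative names, and the results are ratios of paper specs, not predictions of Asteroids@home run times.

```python
# Peak FP64 figures as quoted in the post, in TFLOPS.
# These are theoretical specs, not measurements on any BOINC project.
PEAK_FP64_TFLOPS = {
    "WX 9100": 0.77,
    "Radeon VII": 3.5,
    "MI50": 6.7,
}


def peak_ratio(faster: str, slower: str) -> float:
    """Ratio of quoted peak FP64 throughput between two cards."""
    return PEAK_FP64_TFLOPS[faster] / PEAK_FP64_TFLOPS[slower]


if __name__ == "__main__":
    for faster, slower in [
        ("Radeon VII", "WX 9100"),
        ("MI50", "Radeon VII"),
        ("MI50", "WX 9100"),
    ]:
        print(f"{faster} vs {slower}: {peak_ratio(faster, slower):.2f}x")
```

The ratios only bound what is possible if the app were purely FP64-limited, which is the assumption the rest of the thread goes on to test.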

Keith Myers · Joined: 16 Nov 22 · Posts: 191 · Credit: 215,619,803 · RAC: 147,724
Message 9352 - Posted: 2 Jan 2026, 23:17:41 UTC

I don't put much trust in "theoreticals". Better data is achieved by actually deploying the hardware in an actual BOINC cruncher and seeing whether the applications can actually get any advantage from the new crunching hardware.

A proud member of the OFA (Old Farts Association)

Lamberto Vitali · Joined: 14 Jun 23 · Posts: 101 · Credit: 5,914 · RAC: 0
Message 9356 - Posted: 3 Jan 2026, 15:26:44 UTC - in response to Message 9352.

They're in the post, I'll let you know... I thought someone might know how DP and memory hungry these tasks are.

Just for fun, here's a rig of old 280X cards, all refurbished by me to run cool.

Lamberto Vitali · Joined: 14 Jun 23 · Posts: 101 · Credit: 5,914 · RAC: 0
Message 9385 - Posted: 23 Jan 2026, 8:40:02 UTC

Well, I've got the Radeon Instinct MI50 card (the FP64 1:2 industrial version of a Radeon 7, and cheaper as nobody knows what it is) and it's actually slower than the Radeon WX9100 Pro with Asteroids. Technically it's supposed to be 13 teraflops instead of 12 on FP32, and it also has High Bandwidth Memory. But Gemini reckons I'm throttling it with one lane of PCI Express 2 on a riser.

So I've ordered a rather fancy Oculink riser which uses some cables from SAS drives so it can have four lanes of PCI Express 5 (actually PCI Express 3 because it's on a mining motherboard). Those cards will take an x16 slot and split it into four x4 links at up to PCI Express 5 to run four cards fast on 1 metre Oculink cables. Not too expensive either: if your motherboard supports bifurcation, they're only 20 quid to run 4 cards, or 50 quid if the card has to bifurcate itself.

When I get them I'll post if it's faster on Asteroids with lots of VRAM transfers and FP64 use.

ahorek's team · Volunteer developer · Volunteer tester · Joined: 1 Jan 13 · Posts: 216 · Credit: 16,972,275 · RAC: 48,926
Message 9386 - Posted: 23 Jan 2026, 16:07:27 UTC

Radeon 7 is still a GCN architecture. The problem is wave64 mode: it's only effective when the app applies identical operations to multiple data, so if the algorithm is simple (just multiplying numbers, FFT, etc.) it works great. However, real-world compute applications (and games) are not perfectly optimized and perform many different types of calculations. In those cases, GCN struggles because it's hard to feed all those units.

RDNA introduces a wave32 mode, which is less efficient in ideal scenarios but performs better in real-world applications. In other words, even if RDNA appears weaker on paper, it can often make much better use of its theoretical hardware performance with less power than GCN. Larger, faster caches and higher core frequencies also contribute to better performance.

Higher memory bandwidth only matters when bandwidth is the limiting factor. The Asteroids app is constrained by memory access patterns rather than raw bandwidth.

Testing PCIe speed could be interesting, but it will likely have little to no impact. Similar to mining, compute tasks are rarely limited by it.
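
One way to picture the wave64 versus wave32 point is a toy model of branch divergence, sketched below in Python. It is an illustration under stated assumptions, not a description of the Asteroids app or of real GCN/RDNA scheduling: every lane independently takes one of two equally expensive code paths, and a wavefront has to run a path once if any of its lanes needs it. The function names and the probability `p` are hypothetical.

```python
def expected_passes(width: int, p: float) -> float:
    """Expected number of code paths one wavefront executes serially.

    Toy assumption: each lane independently takes path A with
    probability p, otherwise path B. A path costs one pass if at
    least one lane in the wavefront needs it.
    """
    needs_a = 1.0 - (1.0 - p) ** width  # some lane takes path A
    needs_b = 1.0 - p ** width          # some lane takes path B
    return needs_a + needs_b


def lane_utilization(width: int, p: float) -> float:
    """Fraction of lane-time doing useful work in the toy model.

    Each lane needs exactly one path, but the wavefront spends
    expected_passes() passes, so utilization is the reciprocal.
    """
    return 1.0 / expected_passes(width, p)


if __name__ == "__main__":
    for p in (0.0, 0.01, 0.05, 0.5):
        u32 = lane_utilization(32, p)
        u64 = lane_utilization(64, p)
        print(f"p={p:<4}  wave32: {u32:.2f}  wave64: {u64:.2f}")
```

In this model both widths are fully used when every lane does the same thing (`p = 0`), and the narrower wavefront loses less when only a few lanes diverge. It ignores caches, clocks and memory access patterns, which the post names as the other factors.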

Ian&Steve C. · Volunteer developer · Volunteer tester · Joined: 23 Apr 21 · Posts: 129 · Credit: 143,466,106 · RAC: 209
Message 9387 - Posted: 23 Jan 2026, 16:23:38 UTC - in response to Message 9385.

Quoting Message 9385:
"Well, I've got the Radeon Instinct MI50 card (the FP64 1:2 industrial version of a Radeon 7, and cheaper as nobody knows what it is) and it's actually slower than the Radeon WX9100 Pro with Asteroids. Technically it's supposed to be 13 teraflops instead of 12 on FP32, and it also has High Bandwidth Memory. But Gemini reckons I'm throttling it with one lane of PCI Express 2 on a riser. So I've ordered a rather fancy Oculink riser which uses some cables from SAS drives so it can have four lanes of PCI Express 5 (actually PCI Express 3 because it's on a mining motherboard). Those cards will take an x16 slot and split it into four x4 links at up to PCI Express 5 to run four cards fast on 1 metre Oculink cables. Not too expensive either: if your motherboard supports bifurcation, they're only 20 quid to run 4 cards, or 50 quid if the card has to bifurcate itself. When I get them I'll post if it's faster on Asteroids with lots of VRAM transfers and FP64 use."

most people know or could know what an MI50 is. it's a Google search away. that's not why it's cheap.

it's cheap because it's a 7-8 year old AMD GPU with only 16GB VRAM (not great for AI), and limited use cases where it will perform well. It should do great at Einstein though. plus it's harder for most people to get going since it doesn't have an active cooler.

the Radeon VII is usually slightly more expensive, even though it performs similarly, just because it includes a fan and is more plug-n-play with standard PCs.

check out prices of the MI50 32GB. exactly the same performance specs, just 2x the VRAM, and prices are now ~450-500 USD simply because having more VRAM makes it more viable for AI and drives up demand.

Lamberto Vitali · Joined: 14 Jun 23 · Posts: 101 · Credit: 5,914 · RAC: 0
Message 9388 - Posted: 24 Jan 2026, 0:17:41 UTC - in response to Message 9387.
Last modified: 24 Jan 2026, 0:19:37 UTC

"most people know or could know what an MI50 is. it's a Google search away. that's not why it's cheap."
But it is cheap, it cost a fair bit less than the slower Radeon 7.

"it's cheap because it's a 7-8 year old AMD GPU with only 16GB VRAM (not great for AI), and limited use cases where it will perform well."
It wasn't long ago people on here were boasting about having Radeon 7s! And it's the fastest card I've got. I will never spend more than £150 on a card!

"It should do great at Einstein though. plus it's harder for most people to get going since it doesn't have an active cooler."
People keep telling me this, but it does, it's an axial blower on the end. They also keep telling me I can't connect a monitor to it, but it has Mini DisplayPort.

"the Radeon VII is usually slightly more expensive, even though it performs similarly, just because it includes a fan and is more plug-n-play with standard PCs."
It was very plug and play actually, because Windows 11 just immediately installed Radeon VII drivers, which I changed to the professional drivers in case it was not using the extra 64-bit parts.

"check out prices of the MI50 32GB. exactly the same performance specs, just 2x the VRAM, and prices are now ~450-500 USD simply because having more VRAM makes it more viable for AI and drives up demand."
16GB is plentiful for BOINC and games. I only struggle with the 3GB cards.

I'll see in a few weeks what happens with the Oculink connectors, as I'm sure 1 lane of PCIe 2 is throttling it. I'll be moving to 4 lanes of PCIe 3, roughly an 8x increase in link bandwidth.
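
The bandwidth comparison between the two links can be worked out from the published per-lane signalling rates and line encodings (PCIe 2.0: 5 GT/s with 8b/10b; PCIe 3.0: 8 GT/s with 128b/130b). The Python sketch below does that; the table and the `link_bandwidth_mb_s` helper are illustrative names, and the figures are raw link rates before protocol overhead, so real transfers will be somewhat slower.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency.
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),     # PCIe 2.0 uses 8b/10b encoding
    3: (8.0, 128 / 130),  # PCIe 3.0 uses 128b/130b encoding
}


def link_bandwidth_mb_s(generation: int, lanes: int) -> float:
    """Raw one-direction link bandwidth in MB/s, ignoring packet overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    bits_per_second = rate_gt_s * 1e9 * efficiency
    return bits_per_second / 8 / 1e6 * lanes


if __name__ == "__main__":
    old = link_bandwidth_mb_s(2, 1)  # one PCIe 2.0 lane on the riser
    new = link_bandwidth_mb_s(3, 4)  # four PCIe 3.0 lanes over Oculink
    print(f"PCIe 2.0 x1: {old:.0f} MB/s")
    print(f"PCIe 3.0 x4: {new:.0f} MB/s")
    print(f"Ratio: {new / old:.2f}x")
```

This gives the link ratio only. Whether a task speeds up depends on how much data it actually moves across the bus, which these numbers cannot show.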

Lamberto Vitali · Joined: 14 Jun 23 · Posts: 101 · Credit: 5,914 · RAC: 0
Message 9397 - Posted: 5 Feb 2026, 0:25:51 UTC - in response to Message 9352.

Quoting Message 9352:
"I don't put much trust in "theoreticals". Better data is achieved by actually deploying the hardware in an actual BOINC cruncher and seeing whether the applications can actually get any advantage from the new crunching hardware."

No discernible difference between the cards. I want that FP64 Milkyway back!

As for switching to Oculink, it seems that for any project, four PCIe 3.0 lanes are no better than one PCIe 2.0 lane. The exception is if I run two Genefer Extreme tasks from PrimeGrid at once: then I get collisions on the single lane and productivity drops to 20%. I've been told it helps Folding@home though.
