Double precision?


Lamberto Vitali

Joined: 14 Jun 23
Posts: 99
Credit: 5,914
RAC: 0
Message 9351 - Posted: 2 Jan 2026, 19:19:09 UTC
Is Gemini correct here?

The Proportion of FP64:
In Asteroids@Home, nearly 100% of the core compute-heavy tasks utilize FP64.
The project uses the Convex-Inversion method to derive the shapes and spin states of asteroids from photometric data. Because orbital mechanics and gravitational simulations require high numerical stability to avoid "drift" over long simulated timeframes, FP32 is insufficient.

The Calculation:
Raw Compute Factor: The Radeon VII has a native FP64 rate that is 4.5x higher than the WX 9100 (3.5 / 0.77 ≈ 4.54).
Memory Bandwidth Factor: Asteroids@Home is also notoriously memory-starved. The Radeon VII provides 2.1x more bandwidth, ensuring the compute units are fed data much faster.

Estimated Performance Gain:
The Radeon VII is roughly 4 to 5 times faster than the WX 9100 Pro for Asteroids@Home.
Because the MI50 you are setting up is a "pure" compute version of the Radeon VII with an even better 1:2 FP64 rate (~6.7 TFLOPS), it will likely double the performance of the Radeon VII again. You are essentially moving from a "workstation" level to a "supercomputer" level for this specific project.
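
As a quick check on the arithmetic in the quoted answer, here is a minimal Python sketch that recomputes the ratios from the peak FP64 figures given in the post itself (0.77, 3.5 and 6.7 TFLOPS). The dictionary and the `peak_ratio` helper are illustrative names, not part of any project code, and the ratios are theoretical upper bounds on paper, not predictions of real Asteroids@Home run times.

```python
# Peak FP64 throughput in TFLOPS, exactly as quoted in the post (not measured).
PEAK_FP64_TFLOPS = {
    "WX 9100": 0.77,
    "Radeon VII": 3.5,
    "MI50": 6.7,
}


def peak_ratio(faster: str, slower: str) -> float:
    """Return the ratio of the quoted peak FP64 rates of two cards.

    This is a paper figure only: it ignores memory access patterns,
    occupancy and everything else that limits a real application.
    """
    return PEAK_FP64_TFLOPS[faster] / PEAK_FP64_TFLOPS[slower]


if __name__ == "__main__":
    pairs = [
        ("Radeon VII", "WX 9100"),
        ("MI50", "Radeon VII"),
        ("MI50", "WX 9100"),
    ]
    for faster, slower in pairs:
        print(f"{faster} vs {slower}: {peak_ratio(faster, slower):.2f}x")
```

On these figures the MI50 comes out at a little under 2x the Radeon VII on paper, so "double" is a rounded-up peak number rather than an expected speedup.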
Keith Myers
Joined: 16 Nov 22
Posts: 188
Credit: 202,940,159
RAC: 170,628
Message 9352 - Posted: 2 Jan 2026, 23:17:41 UTC
I don't put much trust in "theoreticals". Better data is obtained by actually deploying the hardware in a real BOINC cruncher and seeing whether the applications can actually get any advantage from the new crunching hardware.

A proud member of the OFA (Old Farts Association)
Lamberto Vitali

Joined: 14 Jun 23
Posts: 99
Credit: 5,914
RAC: 0
Message 9356 - Posted: 3 Jan 2026, 15:26:44 UTC - in response to Message 9352.  
They're in the post, I'll let you know....

I thought someone might know how DP-hungry and memory-hungry these tasks are.

Just for fun, here's a rig of old 280X cards, all refurbished by me to run cool.

Lamberto Vitali

Joined: 14 Jun 23
Posts: 99
Credit: 5,914
RAC: 0
Message 9385 - Posted: 23 Jan 2026, 8:40:02 UTC
Well, I've got the Radeon Instinct MI50 card (the FP64 1:2 industrial version of a Radeon VII, and cheaper as nobody knows what it is), and it's actually slower than the Radeon WX 9100 Pro with Asteroids. Technically it's supposed to be 13 teraflops instead of 12 at FP32, and it also has High Bandwidth Memory. But Gemini reckons I'm throttling it with one lane of PCI Express 2 on a riser. So I've ordered a rather fancy Oculink riser which uses some cables from SAS drives, so it can have four lanes of PCI Express 5 (actually PCI Express 3, because it's on a mining motherboard). Those cards will take an x16 slot and split it into four x4 links at up to PCI Express 5, to run four cards fast on 1 metre Oculink cables. Not too expensive either: if your motherboard supports bifurcation, they're only 20 quid to run 4 cards, or 50 quid if the card has to bifurcate itself.

When I get them I'll post whether it's faster on Asteroids with lots of VRAM transfers and FP64 use.
ahorek's team
Volunteer developer
Volunteer tester

Joined: 1 Jan 13
Posts: 180
Credit: 16,029,851
RAC: 1,807
Message 9386 - Posted: 23 Jan 2026, 16:07:27 UTC
The Radeon VII is still a GCN architecture. The problem is wave64 mode: it's only effective when the app applies identical operations to multiple data items, so if the algorithm is simple (just multiplying numbers, FFT, etc.) it works great. However, real-world compute applications (and games) are not perfectly optimized and perform many different types of calculations. In those cases GCN struggles, because it's hard to keep all those units fed.
RDNA introduces a wave32 mode, which is less efficient in ideal scenarios but performs better in real-world applications. In other words, even if RDNA appears weaker on paper, it can often make much better use of its theoretical hardware performance, with less power, than GCN. Larger, faster caches and higher core frequencies also contribute to better performance.
Higher memory bandwidth only matters when bandwidth is the limiting factor. The Asteroids app is constrained by memory access patterns rather than raw bandwidth.
Testing PCIe speed could be interesting, but it will likely have little to no impact. As with mining, compute tasks are rarely limited by it.
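
To make the wave64 versus wave32 point above concrete, here is a toy Python model of lane utilization. It is only a sketch under a strong assumption: the sole cost it models is branch divergence, where a wave must run once per distinct code path taken by its threads while the other lanes sit idle. The function name `lane_utilization` and the workload are made up for illustration; real GCN and RDNA hardware also differs in caches, clocks and instruction issue, none of which is modelled here.

```python
def lane_utilization(branch_ids, wave_size):
    """Fraction of issued lane-slots that do useful work in a toy SIMD model.

    branch_ids: one integer per thread, naming the code path that thread takes.
    wave_size:  number of consecutive threads executed in lockstep.

    Assumption: every code path costs the same, and a wave is issued once
    per distinct path among its threads, with non-participating lanes idle.
    """
    if wave_size <= 0 or len(branch_ids) % wave_size != 0:
        raise ValueError("thread count must be a positive multiple of wave_size")

    useful_slots = len(branch_ids)  # each thread needs exactly one pass
    issued_slots = 0
    for start in range(0, len(branch_ids), wave_size):
        wave = branch_ids[start:start + wave_size]
        passes = len(set(wave))     # one pass per distinct path in this wave
        issued_slots += wave_size * passes
    return useful_slots / issued_slots


if __name__ == "__main__":
    # Hypothetical workload: the code path alternates every 32 threads.
    divergent = ([0] * 32 + [1] * 32) * 4
    uniform = [0] * 256

    for name, work in [("uniform", uniform), ("divergent", divergent)]:
        for wave_size in (32, 64):
            util = lane_utilization(work, wave_size)
            print(f"{name:9s} wave{wave_size}: {util:.2f}")
```

In this model a uniform workload keeps both wave sizes fully busy, while the alternating workload halves the utilization of wave64 and leaves wave32 unaffected, which is the qualitative effect described in the post.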
Ian&Steve C.
Volunteer developer
Volunteer tester
Joined: 23 Apr 21
Posts: 127
Credit: 142,543,584
RAC: 31,412
Message 9387 - Posted: 23 Jan 2026, 16:23:38 UTC - in response to Message 9385.  
most people know or could know what an MI50 is. it's a Google search away. that's not why it's cheap.

it's cheap because it's a 7-8 year old AMD GPU with only 16GB VRAM (not great for AI), and limited use cases where it will perform well. It should do great at Einstein though. plus it's harder for most people to get going since it doesn't have an active cooler. the Radeon VII is usually slightly more expensive, even though it performs similarly, just because it includes a fan and is more plug-n-play with standard PCs.

check out prices of the MI50 32GB. exactly the same performance specs, just 2x the VRAM, and prices are now ~450-500 USD simply because having more VRAM makes it more viable for AI and drives up demand.

Lamberto Vitali

Joined: 14 Jun 23
Posts: 99
Credit: 5,914
RAC: 0
Message 9388 - Posted: 24 Jan 2026, 0:17:41 UTC - in response to Message 9387.  

Last modified: 24 Jan 2026, 0:19:37 UTC
> most people know or could know what an MI50 is. it's a Google search away. that's not why it's cheap.

But it is cheap; it cost a fair bit less than the slower Radeon VII.

> it's cheap because it's a 7-8 year old AMD GPU with only 16GB VRAM (not great for AI), and limited use cases where it will perform well.

It wasn't long ago that people on here were boasting about having Radeon VIIs! And it's the fastest card I've got. I will never spend more than £150 on a card!

> It should do great at Einstein though. plus it's harder for most people to get going since it doesn't have an active cooler.

People keep telling me this, but it does: it's an axial blower on the end.

They also keep telling me I can't connect a monitor to it, but it has a Mini DisplayPort.

> the Radeon VII is usually slightly more expensive, even though it performs similarly, just because it includes a fan and is more plug-n-play with standard PCs.

It was very plug and play actually, because Windows 11 immediately installed the Radeon VII drivers, which I changed to the professional drivers in case it was not using the extra 64-bit parts.

> check out prices of the MI50 32GB. exactly the same performance specs, just 2x the VRAM, and prices are now ~450-500 USD simply because having more VRAM makes it more viable for AI and drives up demand.

16GB is plentiful for BOINC and games. I only struggle with the 3GB cards.

I'll see in a few weeks what happens with the Oculink connectors, as I'm sure 1 lane of PCIe 2 is throttling it. I'll be moving to 4 lanes of PCIe 3, roughly an 8x speed increase.
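
For reference, the "8x" figure can be checked with a short Python sketch using the published per-lane signalling rates and line encodings for PCIe 2.0 (5 GT/s, 8b/10b) and PCIe 3.0 (8 GT/s, 128b/130b). The table and the `link_bandwidth_gb_s` helper are illustrative names, and the result is raw link bandwidth before packet overhead, so real transfer rates will be somewhat lower.

```python
# generation -> (transfer rate in GT/s per lane, payload bits, encoded bits)
PCIE_GENERATIONS = {
    2: (5.0, 8, 10),      # PCIe 2.0 uses 8b/10b line encoding
    3: (8.0, 128, 130),   # PCIe 3.0 uses 128b/130b line encoding
}


def link_bandwidth_gb_s(generation: int, lanes: int) -> float:
    """Raw one-direction link bandwidth in GB/s, before protocol overhead."""
    gt_per_s, payload_bits, encoded_bits = PCIE_GENERATIONS[generation]
    usable_gbit_per_lane = gt_per_s * payload_bits / encoded_bits
    return usable_gbit_per_lane * lanes / 8  # convert bits to bytes


if __name__ == "__main__":
    old_link = link_bandwidth_gb_s(generation=2, lanes=1)
    new_link = link_bandwidth_gb_s(generation=3, lanes=4)
    print(f"PCIe 2.0 x1: {old_link:.2f} GB/s")
    print(f"PCIe 3.0 x4: {new_link:.2f} GB/s")
    print(f"Ratio: {new_link / old_link:.2f}x")
```

The ratio comes out just under 8x. It only describes how much faster the link can move data, which matters only if the application is actually waiting on PCIe transfers.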
Lamberto Vitali

Joined: 14 Jun 23
Posts: 99
Credit: 5,914
RAC: 0
Message 9397 - Posted: 5 Feb 2026, 0:25:51 UTC - in response to Message 9352.  
> I don't put much trust in "theoreticals". Better data is obtained by actually deploying the hardware in a real BOINC cruncher and seeing whether the applications can actually get any advantage from the new crunching hardware.

No discernible difference between the cards. I want that FP64 Milkyway back!

As for switching to Oculink, it seems that for any project, four PCIe 3.0 lanes are no better than one PCIe 2.0 lane. The exception is when I run two Genefer Extreme tasks from PrimeGrid at once: then I get collisions on the single lane and productivity drops to 20%.

I've been told it helps Folding@home though.