nVidia GPU comparison

KLiK (Joined: 3 Apr 14 · Posts: 29 · Credit: 20,565,661 · RAC: 877)
Message 8764 - Posted: 4 Feb 2025, 15:13:23 UTC

I have recently been considering buying a newer card, so I put together a table to keep my mind busy. The data is imported from Asteroids@home and one other site.

GPUs table: [image not preserved]

From all of those, here is the comparison (note that the rows match the table above for each card generation). You would expect GPU speed to match the marketing figures, shown here: [image not preserved]

But actually, GPU speed on Asteroids@home is not the same; some cards differ quite a lot from the previous comparison graph: [image not preserved]

Note that the speed of the 4090 is almost the same as that of the 4080. So probably either the CPU is bottlenecking the GPU or the algorithm itself is bottlenecked. What do you think?

non-profit org. Play4Life in Zagreb, Croatia, EU

ahorek's team (Volunteer developer, Volunteer tester · Joined: 1 Jan 13 · Posts: 146 · Credit: 13,543,094 · RAC: 29,605)
Message 8765 - Posted: 4 Feb 2025, 17:14:26 UTC

Interesting! Where did you obtain the data?

FP32 isn't a valid comparison for Asteroids. The app relies on FP64, and peak performance can vary between generations and vendors. Each new generation usually has a worse FP32/FP64 ratio because FP64 isn't very useful for games and all vendors want to sell (much more expensive) professional cards for computing purposes.
https://www.techpowerup.com/gpu-specs/geforce-rtx-4090.c3889

By the way, INT performance usually isn't in marketing performance charts, but you can measure it with tools like clpeak.

Gaming performance can be very different from computing performance, and it also depends on what the app is doing. For instance:
- PrimeGrid apps typically use INT32 calculations (where NVIDIA excels)
- Einstein, GPUGRID: mainly FP32 with some FP64 (usually favors AMD)
- Asteroids, MilkyWay (the old separation app): FP64

Regarding Asteroids, the current app isn't limited by the CPU or raw computational power but rather by memory access, so it doesn't scale or run as efficiently as it could, but we're working on optimizations.
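As a rough illustration of the FP32/FP64 point above, the sketch below derives a peak FP64 figure from an FP32 figure and a 1:N ratio. The function name and both example cards are hypothetical and the numbers are made up for illustration; real values should be taken from a spec sheet or measured with a tool such as clpeak.

```python
def fp64_peak_tflops(fp32_tflops: float, fp64_ratio: int) -> float:
    """Peak FP64 throughput for a card whose FP64 rate is 1:fp64_ratio of FP32."""
    if fp64_ratio <= 0:
        raise ValueError("fp64_ratio must be a positive integer")
    return fp32_tflops / fp64_ratio


# Hypothetical cards (illustrative numbers, not real specifications).
cards = {
    "consumer card, FP64 at 1:64": (80.0, 64),
    "compute card, FP64 at 1:2": (20.0, 2),
}

for name, (fp32, ratio) in cards.items():
    print(f"{name}: {fp64_peak_tflops(fp32, ratio):.2f} TFLOPS FP64")
```

Under these assumed numbers the card with a quarter of the FP32 throughput still has the higher FP64 peak, which is why an FP32 chart may rank cards differently from an FP64 workload.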

Ian&Steve C. (Volunteer developer, Volunteer tester · Joined: 23 Apr 21 · Posts: 112 · Credit: 124,277,576 · RAC: 5,841)
Message 8766 - Posted: 4 Feb 2025, 19:10:20 UTC

Also, you should check ACTUAL power use, not the top TDP spec, because the card does not use all of its TDP when running Asteroids, since performance is so limited by memory access.
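One possible way to get actual power draw rather than TDP is to poll `nvidia-smi` while a task is running. The sketch below is only an assumption about how one might do that: the helper names are hypothetical, and it relies on the `--query-gpu=power.draw` query with `--format=csv,noheader,nounits`, which prints one reading in watts per GPU.

```python
import subprocess


def parse_power_watts(smi_output: str) -> list[float]:
    """Parse one power reading (watts) per line; skip non-numeric lines such as '[N/A]'."""
    readings = []
    for line in smi_output.splitlines():
        try:
            readings.append(float(line.strip()))
        except ValueError:
            continue  # blank line or an unsupported/unavailable reading
    return readings


def read_gpu_power_watts() -> list[float]:
    """Query the current power draw of every NVIDIA GPU visible to nvidia-smi."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_power_watts(result.stdout)

# Usage (requires an NVIDIA driver installation):
#   print(read_gpu_power_watts())
```

A single reading is noisy, so averaging several samples taken during a work unit would likely give a more useful figure.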

Ian&Steve C. (Volunteer developer, Volunteer tester · Joined: 23 Apr 21 · Posts: 112 · Credit: 124,277,576 · RAC: 5,841)
Message 8767 - Posted: 4 Feb 2025, 19:25:34 UTC - in response to Message 8765.

Quote: "einstein, gpugrid is mainly FP32 with some FP64 (usually favors AMD)"

Einstein is sensitive to memory bandwidth: lots of data flying back and forth between VRAM and GPU. Both the O3AS and BRP7 apps usually see improvements proportional to improvements in memory bandwidth. A good example is the Titan V vs the V100: the exact same core and architecture, and very similar FP32/FP64/etc. metrics, varying only slightly due to small clock speed differences.

Titan V (3x active HBM stacks): 3072-bit bus, 651 GB/s bandwidth
V100 (4x active HBM stacks): 4096-bit bus, 897 GB/s bandwidth

The V100 sees an increase in bandwidth of 37%, and also sees a performance improvement over the Titan V of about 36% under the same conditions on the O3AS application.

I don't think Einstein favors AMD. That used to be the case, but the old OpenCL apps were unoptimized and had flaws in the OpenCL code which artificially limited Nvidia performance (issues with serialization of compute, fixed by Petri). Since the advent of CUDA code at Einstein, the apps, even the stock ones, pretty well favor Nvidia now.

As for GPUGRID, I'm not sure you can say it favors AMD when their project only has CUDA apps. You have to use ZLUDA to even contribute lol
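The Titan V vs V100 arithmetic above can be checked with a few lines. This is just a sketch using the two bandwidth figures and the 36% speed-up quoted in the post; the helper name is made up for illustration.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new / old - 1.0) * 100.0


titan_v_gbs = 651.0   # GB/s, figure quoted in the post
v100_gbs = 897.0      # GB/s, figure quoted in the post
observed_speedup_pct = 36.0  # O3AS improvement quoted in the post

bw_gain = percent_increase(titan_v_gbs, v100_gbs)
print(f"Bandwidth gain: {bw_gain:.1f}%")
print(f"Speed-up as a fraction of bandwidth gain: {observed_speedup_pct / bw_gain:.2f}")
```

The bandwidth gain comes out just under 38%, in line with the roughly 37% quoted, so the observed speed-up tracks the bandwidth gain almost one to one.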

ahorek's team (Volunteer developer, Volunteer tester · Joined: 1 Jan 13 · Posts: 146 · Credit: 13,543,094 · RAC: 29,605)
Message 8769 - Posted: 4 Feb 2025, 23:56:42 UTC

The FP32 marketing numbers don't tell the whole story. Actual compute performance depends on how an application utilizes the GPU; each app may behave differently, and memory bandwidth can also be a limiting factor.

I recall when AMD had a specialized rotate instruction, while NVIDIA had to emulate it with four instructions to achieve the same result. In rare cases, such as SHA-256 calculations, AMD was significantly faster, but that feature was completely irrelevant for gaming.

I agree that the situation has changed, and NVIDIA is now far ahead of the competition. While other vendors can still compete in gaming, NVIDIA typically holds a significant advantage when it comes to general computing performance.
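A simple way to picture memory bandwidth as a limiting factor is a roofline-style estimate: attainable throughput is the smaller of the compute peak and bandwidth times arithmetic intensity. The sketch below is not how the Asteroids app is modeled by its developers; the function name, both cards, and the 0.25 FLOP/byte intensity are assumptions chosen for illustration, and real kernels can be limited further by latency or access patterns.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline-style bound: min(compute peak, memory bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical cards: the second has twice the compute peak but only ~7% more bandwidth.
intensity = 0.25  # FLOP per byte moved (assumed, memory-bound kernel)
small = attainable_gflops(peak_gflops=600.0, bandwidth_gbs=700.0, flops_per_byte=intensity)
big = attainable_gflops(peak_gflops=1200.0, bandwidth_gbs=750.0, flops_per_byte=intensity)

print(f"small card: {small} GFLOPS, big card: {big} GFLOPS")
print(f"speed-up: {big / small:.2f}x despite 2x the compute peak")
```

With these assumed numbers, doubling the compute peak yields only about a 7% gain, because both cards sit on the bandwidth-limited part of the bound.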

KLiK (Joined: 3 Apr 14 · Posts: 29 · Credit: 20,565,661 · RAC: 877)
Message 8770 - Posted: 6 Feb 2025, 7:08:05 UTC - in response to Message 8766.

Quote: "also you should check ACTUAL power use, not the top TDP spec, because it does not use all of the TDP when running asteroids since the performance is so limited by the memory access."

Yes, that would also be right, but that data is not available! And that data tells you more about the "efficiency of the algorithm used" than anything else.

KLiK (Joined: 3 Apr 14 · Posts: 29 · Credit: 20,565,661 · RAC: 877)
Message 8771 - Posted: 6 Feb 2025, 8:19:04 UTC - in response to Message 8765.

I also made an FP64 GFLOPS / W graph, using the speeds mentioned for these calculations: [image not preserved]

The same goes for the % / W (of TDP) graph: [image not preserved]

So all in all, some cards are either bottlenecked or severely slower than some of their counterparts! The second graph (% / W) is the one to look at, as it shows the effectiveness of the GPU app algorithm, which might have an issue on some cards, like the 4090, which has no computational gain over the 4080 or 4080Ti!
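To show why the choice of wattage matters for a per-watt comparison, the sketch below computes GFLOPS / W once against TDP and once against a measured draw. All three input numbers are hypothetical and are not taken from the graphs in this thread.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Throughput per watt; watts must be positive."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return gflops / watts


# Hypothetical card (illustrative numbers only).
fp64_gflops = 1000.0     # sustained FP64 throughput on the workload
tdp_watts = 400.0        # spec-sheet TDP
measured_watts = 250.0   # assumed actual draw while running the app

at_tdp = gflops_per_watt(fp64_gflops, tdp_watts)
at_measured = gflops_per_watt(fp64_gflops, measured_watts)
print(f"GFLOPS/W using TDP:      {at_tdp:.2f}")
print(f"GFLOPS/W using measured: {at_measured:.2f}")
```

If a memory-bound app leaves a card well below its TDP, a per-TDP figure would understate that card's real efficiency, which is the concern raised in Message 8766.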
