AMD Cards



Georgi Vidinski
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 22 Nov 17
Posts: 159
Credit: 13,180,466
RAC: 13
Message 7917 - Posted: 10 Sep 2023, 17:15:35 UTC - in response to Message 7912.  

Last modified: 10 Sep 2023, 17:16:18 UTC
Georgi,

what environment did you build the Linux apps from? GLIBC is a base package that's not easy to upgrade and well integrated to the OS. doing apt update and apt upgrade will not change this package. only a full OS upgrade will change it. having a 2.38 dependency is very bleeding edge, and that's not even available in Ubuntu yet. the latest release (23.04) is still on 2.37, and 2.38 is still in active development for their upcoming 23.10 release.

since 22.04 is still the primary LTS version, i think it would make sense if you use an older build environment (recommend something older like ubuntu 20.04 era with older GLIBC) to avoid these compatibility issues. otherwise people on linux will only be able to run the apps with the very latest OS, which not everyone wants to do.


Ian&Steve C. you're right,

I was working on Arch Linux 6.1.49-1-MANJARO. And yes, except for GCC, which I had to downgrade to v12, everything else was at its latest version (GLIBC 2.38 included). And I didn't realize there could be dependency issues. Well, as I love to say, it's a constant learning curve, and thank you for pointing that out!

I'll see what I can do and will recompile the Linux app.
I'll change the Grid dim to a dynamic value from clGetDeviceInfo() as well.
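
For anyone who wants to check whether a Linux binary will load on their system before running it, here is a minimal Python sketch of the GLIBC comparison discussed above. It is not part of the project's tooling: the function names are made up for illustration, and scanning the raw bytes for `GLIBC_x.y` strings is only a rough stand-in for inspecting the symbol table with a tool such as `objdump -T`.

```python
import platform
import re


def parse_version(text):
    """Turn '2.38' into (2, 38) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def required_glibc(binary_bytes):
    """Return the highest GLIBC_x.y version string found in the bytes, or None.

    Assumption: the symbol version names appear as plain ASCII in the file,
    which is how they are normally stored in an ELF string table.
    """
    found = re.findall(rb"GLIBC_(\d+(?:\.\d+)+)", binary_bytes)
    if not found:
        return None
    return max(parse_version(v.decode("ascii")) for v in found)


def host_glibc():
    """Return the glibc version Python reports for this host, or None."""
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        return None
    return parse_version(version)


def can_run(required, available):
    """True if the available glibc is at least as new as the required one."""
    return required is None or (available is not None and available >= required)


# Synthetic bytes standing in for a real binary (hypothetical example data).
sample = b"\x00GLIBC_2.2.5\x00GLIBC_2.38\x00GLIBC_2.34\x00"
needed = required_glibc(sample)
print("binary needs glibc", needed)
print("loads on a 2.37 system:", can_run(needed, (2, 37)))
print("loads on this host:", can_run(needed, host_glibc()))
```

On a real file you would pass `open(path, "rb").read()` to `required_glibc`. Building in an older environment lowers the highest version the binary asks for, which is why the older build environment was suggested.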
“The good thing about science is that it's true whether or not you believe in it.” ― Neil deGrasse Tyson
ID: 7917 · Rating: 0
Ian&Steve C.
Volunteer developer
Volunteer tester
Joined: 23 Apr 21
Posts: 70
Credit: 57,734,682
RAC: 529,615
Message 7918 - Posted: 10 Sep 2023, 17:20:43 UTC - in response to Message 7917.  
sounds great. glad to help :)

ID: 7918 · Rating: 0
THX
Joined: 17 Feb 13
Posts: 3
Credit: 25,262,359
RAC: 1,889
Message 7919 - Posted: 10 Sep 2023, 20:02:56 UTC
Sometimes it just leaks memory and gets stuck for something like 18 hours, with a "Peak swap size: 73.27 GB". I spotted that buggy one when I saw that my SSD was slowly being eaten by the virtual memory file.
ID: 7919 · Rating: 0
Lamberto Vitali
Joined: 14 Jun 23
Posts: 76
Credit: 5,914
RAC: 0
Message 7920 - Posted: 10 Sep 2023, 21:03:47 UTC - in response to Message 7919.  
Sometimes it just leaks memory and gets stuck for something like 18 hours, with a "Peak swap size: 73.27 GB". I spotted that buggy one when I saw that my SSD was slowly being eaten by the virtual memory file.
Should programs not have a memory limiter?
ID: 7920 · Rating: 0
Steve Gaber
Joined: 7 Mar 14
Posts: 65
Credit: 6,253,292
RAC: 3,651
Message 7921 - Posted: 11 Sep 2023, 7:28:40 UTC - in response to Message 7916.  

Last modified: 11 Sep 2023, 7:29:10 UTC
I get those, they either work and complete in an hour on an 8000Gflop GPU, with linear progress on the Boinc counter, or they stick at 0.01% forever and eventually consume all the GPU's VRAM. On mine, they start moving from 0.01% in 3 minutes. If they haven't, they're never going to. I think I've had 10 work ok and 3 break.

It seems not all tasks are the same.


Yes. That task now shows 4:22:59 elapsed with 02:29:40 remaining. But it still shows .010 progress.

I'm gonna abort it.

Steven Gaber
Oldsmar, FL
ID: 7921 · Rating: 0
Jonathan Brier
Joined: 19 Jun 12
Posts: 2
Credit: 4,448,268
RAC: 8
Message 7922 - Posted: 11 Sep 2023, 13:18:26 UTC

Last modified: 11 Sep 2023, 13:18:49 UTC
In terms of any GPU app failures on Pop!_OS 22.04, right now Mesa's rusticl is disabled per https://github.com/pop-os/mesa/commit/75774150750b059f8de74e0e2895d1e74238a23d until the OS maintainers update the meson package to 0.61.4. Newer cards may not support OpenCL when using the Mesa drivers. Example error logs would be seen if https://asteroidsathome.net/boinc/show_host_detail.php?hostid=728794 receives GPU tasks for the RX 6600XT.

Additionally, Einstein@home is a software test case for Mesa; Asteroids@home could possibly be one too if it adds anything different. See: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7420
ID: 7922 · Rating: 0
Weber462
Joined: 25 Oct 22
Posts: 15
Credit: 9,560,150
RAC: 20,425
Message 7923 - Posted: 11 Sep 2023, 16:07:18 UTC - in response to Message 7921.  
Mine were erroring out the same way. I have a 7900xtx and a 6800x on that host. It's weird: a few would work, but most would get hung up like you described. This is on Win11.
ID: 7923 · Rating: 0
FritzB
Joined: 9 May 13
Posts: 3
Credit: 7,612,245
RAC: 4,645
Message 7924 - Posted: 11 Sep 2023, 19:10:38 UTC - in response to Message 7923.  
This one https://asteroidsathome.net/boinc/result.php?resultid=400435687 has been running for 7.5 hours, stuck at 0.01% and using >7GB RAM.
ID: 7924 · Rating: 0
Georgi Vidinski
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 22 Nov 17
Posts: 159
Credit: 13,180,466
RAC: 13
Message 7925 - Posted: 11 Sep 2023, 21:43:21 UTC
New build with fixes is on its way. But I still have to run some tests on it first.

It still may have issues with integrated AMD graphics though. Those CPU-based graphics need a different memory alignment. Unfortunately there is no way for us to distinguish them from discrete GPUs at project level.
So, those of you who have such systems may want to restrict their use for now using cc_config.xml and the <exclude_gpu> tag (Client configuration).
At least until we handle their specs in the code.
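
As a rough illustration of the `<exclude_gpu>` suggestion above, the Python sketch below writes out a minimal cc_config.xml fragment. The element names (`cc_config`, `options`, `exclude_gpu`, `url`, `device_num`, `type`) follow the BOINC client configuration format; the project URL, the device number 0, and the helper name `build_cc_config` are assumptions for the example, so check the URL and device number against what your own BOINC client reports.

```python
import xml.etree.ElementTree as ET


def build_cc_config(project_url, device_num=None, gpu_type="ATI"):
    """Build a cc_config.xml document that excludes a GPU for one project.

    device_num and gpu_type are optional; leaving device_num out excludes
    every GPU of the given type for that project.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    if device_num is not None:
        ET.SubElement(exclude, "device_num").text = str(device_num)
    if gpu_type is not None:
        ET.SubElement(exclude, "type").text = gpu_type
    ET.indent(root)  # pretty-print; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")


# Assumed values: adjust the URL and device number to match your own client.
config_text = build_cc_config("https://asteroidsathome.net/boinc/", device_num=0)
print(config_text)
```

The printed text would go into cc_config.xml in the BOINC data directory, after which the client needs to re-read its config files or be restarted. If you already have a cc_config.xml, merge the `<exclude_gpu>` element into the existing `<options>` section rather than overwriting the file.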

Georgi
“The good thing about science is that it's true whether or not you believe in it.” ― Neil deGrasse Tyson
ID: 7925 · Rating: 0
Lamberto Vitali
Joined: 14 Jun 23
Posts: 76
Credit: 5,914
RAC: 0
Message 7926 - Posted: 11 Sep 2023, 22:12:08 UTC

Last modified: 11 Sep 2023, 22:12:24 UTC
Einstein distinguishes GPUs to know which ones can handle certain apps. But they might just be looking at the reported OpenCL version. Not sure what else you can see on the server end.

Will my old R9 280X cards work with the new version, or should I switch them off? They're failing every task in seconds.
ID: 7926 · Rating: 0
Lamberto Vitali
Joined: 14 Jun 23
Posts: 76
Credit: 5,914
RAC: 0
Message 7927 - Posted: 12 Sep 2023, 21:19:31 UTC

Last modified: 12 Sep 2023, 21:25:43 UTC
The new version gets done 3 times faster, and the GPU is producing more heat. Still not as hot as for other projects, and the tasks still take longer than a CPU (20 minutes on an R9 Nano vs 24 at once in 1 hour (so 1 every 2.5 minutes) on a Ryzen 9 3900XT), but it's a big improvement.
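
The comparison in the paragraph above can be worked through with a few lines of Python. This is only the arithmetic from the post (20 minutes per task on the R9 Nano, 24 tasks per hour on the Ryzen 9 3900XT); the helper name is made up for the example.

```python
def tasks_per_hour(minutes_per_task, concurrent=1):
    """Throughput when `concurrent` tasks each take `minutes_per_task` minutes."""
    return concurrent * 60 / minutes_per_task


gpu_rate = tasks_per_hour(20)                  # one GPU task every 20 minutes
cpu_rate = tasks_per_hour(60, concurrent=24)   # 24 CPU tasks finishing per hour
cpu_minutes_per_task = 60 / cpu_rate           # effective minutes per CPU task
speedup_needed = cpu_rate / gpu_rate           # factor the GPU app must gain to tie

print(gpu_rate, cpu_rate, cpu_minutes_per_task, speedup_needed)
```

With these figures the GPU finishes 3 tasks per hour against the CPU's 24, so the GPU app would have to run about 8 times faster just to match the CPU.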

I was going to try two at a time, but I notice these take about 2GB on the card, so two wouldn't fit easily in 4GB. Have you deliberately set it to use about half the GPU RAM?

How come they need so much memory compared with the CPU tasks which are about 13MB?

Still not working on older cards like the OpenCL 1.2 R9 280X.
ID: 7927 · Rating: 0
FritzB
Joined: 9 May 13
Posts: 3
Credit: 7,612,245
RAC: 4,645
Message 7928 - Posted: 12 Sep 2023, 21:52:40 UTC - in response to Message 7925.  

Last modified: 12 Sep 2023, 21:53:11 UTC
Do you use something like this to run CUDA code on AMD GPUs?
ID: 7928 · Rating: 0
Pavel_Kirpichenko
Joined: 19 Oct 12
Posts: 2
Credit: 2,848,979
RAC: 452
Message 7929 - Posted: 13 Sep 2023, 6:37:15 UTC
On my Radeon RX 5500 XT, tasks are completed in about 20 minutes. That works out to about 150 points per hour, which of course is not very pleasing.
ID: 7929 · Rating: 0
Lamberto Vitali
Joined: 14 Jun 23
Posts: 76
Credit: 5,914
RAC: 0
Message 7930 - Posted: 13 Sep 2023, 6:44:55 UTC
The earlier ones gave a huge number of points, making them equivalent to a CPU per time. I guess that was a bonus for being a tester.

The coding is going to have to go 10 times faster to be worth using. GPUs are supposed to outpace CPUs.
ID: 7930 · Rating: 0
Skip Da Shu
Joined: 6 Mar 23
Posts: 3
Credit: 6,349,711
RAC: 6,910
Message 7932 - Posted: 13 Sep 2023, 11:36:15 UTC - in response to Message 7925.  

Last modified: 13 Sep 2023, 11:36:51 UTC
Thank you Georgi.
ID: 7932 · Rating: 0
Ian&Steve C.
Volunteer developer
Volunteer tester
Joined: 23 Apr 21
Posts: 70
Credit: 57,734,682
RAC: 529,615
Message 7933 - Posted: 13 Sep 2023, 12:27:03 UTC - in response to Message 7930.  

Last modified: 13 Sep 2023, 12:28:16 UTC
The earlier ones gave a huge number of points, making them equivalent to a CPU per time. I guess that was a bonus for being a tester


I found only one instance of the high credit reward in your tasks.
https://asteroidsathome.net/boinc/workunit.php?wuid=174901689

It’s an artifact of CreditNew, not due to being a “tester”. On that one task, you were matched up with another AMD GPU task. Since both of your devices reported a really high flops value (relative to the CPUs), the credit reward got scaled up a lot.

Though I’m not quite sure why the same thing doesn’t happen with the CUDA app wingmen, who generally have much higher reported flops values (but should be taking your flops value as baseline). Maybe that only happens when the exact same app is used.

This kind of idiosyncrasy is why CreditNew is not ideal, and why a static reward is better IMO. The “value” of a task shouldn’t change depending on what device runs it. A CPU task is the same as a GPU task. And two hosts shouldn’t receive wildly more credit just because they ran on GPU and happened to match with each other.

ID: 7933 · Rating: 0
Krümel
Joined: 12 May 13
Posts: 5
Credit: 943,335
RAC: 70
Message 7934 - Posted: 13 Sep 2023, 14:07:01 UTC
Tasks won't work on my machine.
https://asteroidsathome.net/boinc/result.php?resultid=400860889

GLIBC problem.
ID: 7934 · Rating: 0
Ian&Steve C.
Volunteer developer
Volunteer tester
Joined: 23 Apr 21
Posts: 70
Credit: 57,734,682
RAC: 529,615
Message 7935 - Posted: 13 Sep 2023, 14:20:59 UTC - in response to Message 7934.  
Tasks won't work on my machine.
https://asteroidsathome.net/boinc/result.php?resultid=400860889

GLIBC problem.


Looks like Georgi will have to recompile this again with an older environment. Strange that this was put up yesterday. Maybe he accidentally put up the same app again.

ID: 7935 · Rating: 0
Lamberto Vitali
Joined: 14 Jun 23
Posts: 76
Credit: 5,914
RAC: 0
Message 7936 - Posted: 13 Sep 2023, 19:12:26 UTC - in response to Message 7933.  

Last modified: 13 Sep 2023, 19:22:31 UTC
I found only one instance of the high credit reward in your tasks.
https://asteroidsathome.net/boinc/workunit.php?wuid=174901689
I thought there were two. Might have imagined the other, maybe I was looking at the other guy's score.

It’s an artifact of CreditNew, not due to being a “tester”. On that one task, you were matched up with another AMD GPU task. Since both of your devices reported a really high flops value (relative to the CPUs), the credit reward got scaled up a lot.

Though I’m not quite sure why the same thing doesn’t happen with the CUDA app wingmen, who generally have much higher reported flops values (but should be taking your flops value as baseline). Maybe that only happens when the exact same app is used.

This kind of idiosyncrasy is why CreditNew is not ideal, and why a static reward is better IMO. The “value” of a task shouldn’t change depending on what device runs it. A CPU task is the same as a GPU task. And two hosts shouldn’t receive wildly more credit just because they ran on GPU and happened to match with each other.
Agreed, you should get "paid" for work done, not how long you take. If you and I worked on a building site and you built 10 walls in a day and I built one, with CreditNew we'd get the same pay! People who buy expensive equipment should be rewarded for it.

Assuming by flops you mean floating point operations PER SECOND (BOINC lists a task as an expected task SIZE of so many flops, although it does write FLOPs with the s smaller, plural of FLOP?):

If you do an Einstein task on your GPUs, which are much faster than mine, and you get matched with one of my cards, do you really get less credit? I'm lost as to how CreditNew is working out the credit. If it's flops (speed) x time taken, this should be equal for both our cards. You're 5 times faster but take a fifth of the time.
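
The reasoning in the last paragraph can be checked with a tiny Python example. It models only the simplified "speed x time" idea from the post, not the actual CreditNew algorithm, and the flops and runtime figures are invented for illustration.

```python
def estimated_operations(flops_per_second, seconds):
    """Work estimate under the post's simplified model: speed multiplied by time."""
    return flops_per_second * seconds


# Hypothetical hosts: the fast card is 5x quicker and takes a fifth of the time.
slow_card = estimated_operations(1_000_000_000_000, 5_000)
fast_card = estimated_operations(5_000_000_000_000, 1_000)

print(slow_card, fast_card, slow_card == fast_card)
```

Under that model both hosts come out with the same work estimate, which is the poster's point. Any credit difference between matched hosts would therefore have to come from something beyond this simple product, such as the reported flops values described earlier in the thread.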
ID: 7936 · Rating: 0
Lamberto Vitali
Joined: 14 Jun 23
Posts: 76
Credit: 5,914
RAC: 0
Message 7937 - Posted: 13 Sep 2023, 19:14:53 UTC
"Our situation has not improved" -- Sean Connery, Indiana Jones and the Last Crusade.

The new version 102.18 is worse than 102.17.

Newer card R9 Nano:
Never starts processing on GPU, 10GB system RAM used, uses CPU time only.

Older card R9 280X:
No longer aborts at 5 seconds, but 5GB system RAM used, and runs for several hours without completing or using any GPU/CPU time.
ID: 7937 · Rating: 0