AMD OpenCL issues



ahorek's team
Volunteer developer
Volunteer tester

Joined: 1 Jan 13
Posts: 98
Credit: 10,437,435
RAC: 1,582
Message 8028 - Posted: 18 Sep 2023, 23:08:14 UTC - in response to Message 8026.  
even if it's a mobile "power-efficient" GPU, it uses 4-5x more energy in total than more recent cards to complete the same amount of work (PrimeGrid, Einstein). It's hard to make a fair comparison, so take it as an estimate of what the real difference could be.

just because some chip like this is capable of running the code, it doesn't mean it makes sense to torture it 24/7. At least if you care about power... for challenges to squeeze a few points or as a simple heater it's fine.
Lamberto Vitali

Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8029 - Posted: 19 Sep 2023, 5:55:25 UTC - in response to Message 8028.  
just because some chip like this is capable of running the code, it doesn't mean it makes sense to torture it 24/7. At least if you care about power... for challenges to squeeze a few points or as a simple heater it's fine.
Yeah I'm wasting power [1], but I'm also not buying expensive kit. My GPUs I got for £30 to £50 each and I'll run them until I can't repair them any more. Might not be most cost effective, but it feels like it's the right thing to do. I spent more on repairing my 21 year old car than was sensible, would have been better to get another one, but then I inherit unknown problems. Once I've put money into this one, I don't want to scrap it.

[1] In winter it's not wasted, as otherwise I'd turn on a heater. And no, I won't get heat pumps: they cost a fortune, they're loud, they break down, and they're not efficient when it's cold outside anyway. I'll stick to resistive heating.

If you use solar, why not use up the power which would go to waste anyway when the batteries are fully charged? No point selling it to the power company for a tenth of its real value.
Lamberto Vitali

Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8030 - Posted: 19 Sep 2023, 5:57:17 UTC

Last modified: 19 Sep 2023, 5:59:15 UTC
Back on topic - I'm getting a number of stuck tasks on AMD GPUs. I'm wondering if there's a way to force a task to try again from the beginning, to see if it's random or if certain ones will never work on a GPU? I've not seen them get up to the maximum of _6, so I assume the task is OK when it's retried, probably on a CPU. I can't force it from here to go onto another GPU of my own or someone else's, but I could at least try starting it again - by deleting a checkpoint and restarting BOINC or something?
Dirk Broer

Joined: 11 Sep 12
Posts: 21
Credit: 6,159,814
RAC: 1,384
Message 8031 - Posted: 19 Sep 2023, 13:18:00 UTC - in response to Message 7970.  
The IGP of my Athlon-A12 9800E wants to do another 2234 days, processing the AMD GPU WU...having done 0.019% in 10 hours
I'm aborting it
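
For what it's worth, a straight-line extrapolation from those two figures lands in the same range. This is only a sketch: it assumes progress stays linear, and `remaining_days` is a made-up helper, not how the BOINC client computes its own estimate.

```python
def remaining_days(percent_done: float, hours_elapsed: float) -> float:
    """Linearly extrapolate the remaining run time, in days."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    # If percent_done took hours_elapsed, 100% takes proportionally longer
    total_hours = hours_elapsed * 100.0 / percent_done
    return (total_hours - hours_elapsed) / 24.0


# Figures from the post: 0.019% done after 10 hours
estimate = remaining_days(percent_done=0.019, hours_elapsed=10)
print(f"about {estimate:.0f} days remaining")
```

That comes out at roughly 2,190 days, close to the 2234 the client reported.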

ahorek's team
Volunteer developer
Volunteer tester

Joined: 1 Jan 13
Posts: 98
Credit: 10,437,435
RAC: 1,582
Message 8032 - Posted: 19 Sep 2023, 13:57:48 UTC
if you're on Windows, try this trick:

run "regedit"
go to HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
change or add a DWORD value named TdrLevel
set it to 0 (disabled)
restart
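
The same change can also be written down as a .reg file, which makes it easier to review before applying. A minimal sketch, assuming the key path given above; `build_reg_file` is a hypothetical helper that only generates text and never touches the registry itself. Keep in mind that 0 switches the timeout detection off entirely, so Windows will no longer reset a GPU that is genuinely hung.

```python
GRAPHICS_DRIVERS_KEY = (
    r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers"
)


def build_reg_file(values: dict[str, int]) -> str:
    """Return .reg file text that sets DWORD values under GraphicsDrivers."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{GRAPHICS_DRIVERS_KEY}]"]
    for name, number in values.items():
        if not 0 <= number <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a DWORD: {number}")
        # .reg files write DWORDs as eight hexadecimal digits
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\n".join(lines) + "\n"


# TdrLevel = 0 is the "disabled" setting from the steps above
print(build_reg_file({"TdrLevel": 0}))
```

Printing the text first, rather than applying it straight away, leaves a chance to check it and to note the old value in case it needs restoring.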


it took my poor iGPU 2.5 hours to finish :) https://asteroidsathome.net/boinc/result.php?resultid=401933890

Asteroid developers could lower the intensity for low-end GPUs, so they'll be able to process tasks without timing out the driver. It needs some experimenting, but it shouldn't be hard to fix (if this is the only issue).

also (as already mentioned before), a watchdog could help to prevent stuck tasks in case something like this happens...
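
To make the watchdog idea concrete, here is a rough sketch of the stall check at its core. Everything in it is an assumption for illustration: `ProgressSample`, `is_stalled` and the one-hour threshold are invented names and values, not anything taken from the Asteroids app or from BOINC.

```python
from dataclasses import dataclass


@dataclass
class ProgressSample:
    seconds: float        # time since the task started
    fraction_done: float  # 0.0 to 1.0, as reported by the task


def is_stalled(samples: list[ProgressSample], stall_seconds: float) -> bool:
    """True if fraction_done has not increased for at least stall_seconds."""
    if not samples:
        return False
    latest = samples[-1]
    # Walk backwards to find how long progress has sat at its current value
    unchanged_since = latest.seconds
    for earlier in reversed(samples[:-1]):
        if earlier.fraction_done < latest.fraction_done:
            break
        unchanged_since = earlier.seconds
    return latest.seconds - unchanged_since >= stall_seconds


history = [
    ProgressSample(0, 0.10),
    ProgressSample(600, 0.20),
    ProgressSample(1200, 0.20),
    ProgressSample(4200, 0.20),
]
print(is_stalled(history, stall_seconds=3600))
```

A real watchdog would still have to decide what to do on a stall (restart from the last checkpoint, exit with an error, and so on), which is a separate question.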
Georgi Vidinski
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 22 Nov 17
Posts: 159
Credit: 13,180,518
RAC: 0
Message 8033 - Posted: 19 Sep 2023, 14:35:17 UTC - in response to Message 8032.  
Thanks ahorek's team

Good point. Both suggestions are taken into consideration. Will look into it at first chance.

As for the idea of disabling TDR through TdrLevel, that didn't work for me. All I achieved was a total freeze of the interface until reboot. The only thing that worked (up to a point) was TdrDelay. But it had side effects on the behavior of the UI, locking you out for the specified time whenever the issue was raised. So be careful with that. In my experience, if the issue persists, it's a problem with the code. Another sharp edge that needs to be polished at first chance.

Georgi
“The good thing about science is that it's true whether or not you believe in it.” ― Neil deGrasse Tyson
Lamberto Vitali

Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8034 - Posted: 19 Sep 2023, 15:03:26 UTC - in response to Message 8032.  

Last modified: 19 Sep 2023, 15:06:38 UTC
I looked up what this TDR option does, and am I right in thinking if it was tripping the driver (like a circuit breaker), I'd be getting an error message? I don't get error messages with Asteroids. I do very occasionally get them with more demanding projects on old tired GPUs.

I'll not change it after reading Georgi's post.

I've got a diamond tipped polisher here for engraving and stuff, not sure if it would help with sharp programming.
ahorek's team
Volunteer developer
Volunteer tester

Joined: 1 Jan 13
Posts: 98
Credit: 10,437,435
RAC: 1,582
Message 8035 - Posted: 19 Sep 2023, 15:19:43 UTC
yes, this will work only if the card is "just slow", as a workaround for testing... If there's a different problem, it could freeze the whole system, and it won't recover automatically without a full restart; that's what this protection is for. Either way, it should be fixed in the app.
Lamberto Vitali

Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8036 - Posted: 19 Sep 2023, 15:25:03 UTC
So is there any way to get a BOINC task to start again from the beginning? It would be interesting to see if one which sticks will always stick.
ahorek's team
Volunteer developer
Volunteer tester

Joined: 1 Jan 13
Posts: 98
Credit: 10,437,435
RAC: 1,582
Message 8037 - Posted: 19 Sep 2023, 15:29:33 UTC

Last modified: 19 Sep 2023, 15:32:47 UTC
@Lamberto Vitali is there something relevant in your logs? is it just some tasks or does it always fail?

without any (relevant) error, access to a failing system, or the source code, I don't think I can help...
Lamberto Vitali

Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8038 - Posted: 19 Sep 2023, 16:00:49 UTC

Last modified: 19 Sep 2023, 16:01:46 UTC
This is the last one which failed (I aborted it when it ran too long not making progress):
https://asteroidsathome.net/boinc/result.php?resultid=401833384

Running 8 cards 24/7, each taking 20 minutes to complete a task successfully, I find 10 a day that need aborting (although I'm often not here to notice until they've been stuck for 5 hours).
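
Put as a rate, that is small but steady. A quick sketch of the arithmetic, assuming every card runs back-to-back 20-minute tasks all day (an upper bound, since a stuck card finishes nothing while it sits there); the function names are just for illustration.

```python
def daily_capacity(cards: int, minutes_per_task: float) -> float:
    """Tasks per day if every card runs tasks back to back."""
    return cards * (24 * 60) / minutes_per_task


def stuck_fraction(stuck_per_day: int, cards: int, minutes_per_task: float) -> float:
    """Share of the daily capacity that ends up stuck."""
    return stuck_per_day / daily_capacity(cards, minutes_per_task)


capacity = daily_capacity(cards=8, minutes_per_task=20)
share = stuck_fraction(stuck_per_day=10, cards=8, minutes_per_task=20)
print(f"{capacity:.0f} tasks/day at most, {share:.1%} of them stuck")
```

So 10 stuck tasks a day is somewhere under 2% of what 8 cards could get through, which fits with a problem that only shows up on some tasks.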
ahorek's team
Volunteer developer
Volunteer tester

Joined: 1 Jan 13
Posts: 98
Credit: 10,437,435
RAC: 1,582
Message 8039 - Posted: 19 Sep 2023, 16:29:50 UTC

Last modified: 19 Sep 2023, 16:30:12 UTC
ok, thanks, this is definitely a different problem...

there should be more recent drivers for this card - 2841.19 is even older than mine... Try this one:
https://www.amd.com/en/support/graphics/amd-radeon-r9-series/amd-radeon-r9-200-series/amd-radeon-r9-280x
but I'm not sure they'll work, since Windows 11 isn't officially supported :)
Lamberto Vitali

Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8040 - Posted: 19 Sep 2023, 16:55:16 UTC - in response to Message 8039.  

Last modified: 19 Sep 2023, 16:59:24 UTC
ok, thanks, this is definitely a different problem...

there should be more recent drivers for this card - 2841.19 is even older than mine... Try this one:
https://www.amd.com/en/support/graphics/amd-radeon-r9-series/amd-radeon-r9-200-series/amd-radeon-r9-280x
but I'm not sure they'll work, since Windows 11 isn't officially supported :)
Their numbers don't make sense. You said I'm using 2841.19, yet your link goes to 22.6.1. Those are in different measurements.

Actually, that **is** the driver I'm using. You linked to the very page I got mine from.

When I look in Windows, it tells me it's 26.20.1208.2! That's three different systems to measure the driver version. I'm completely lost. Anyway, I have always installed (since it came out) the one you linked to.
ahorek's team
Volunteer developer
Volunteer tester

Joined: 1 Jan 13
Posts: 98
Credit: 10,437,435
RAC: 1,582
Message 8041 - Posted: 19 Sep 2023, 17:41:45 UTC
yeah, it's misleading... 22.6.1 (year, month, revision) is the main driver version.

OpenCL device C version: OpenCL C 1.2 | OpenCL 1.2 AMD-APP (3240.7)
and one of the many parts of the bundle, each with its own versioning, is the OpenCL driver - this is the second number.

I have the same legacy drivers (22.6.1), but my OpenCL drivers are for some reason more recent than yours. It's the same architecture... weird.
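
To compare the OpenCL builds without eyeballing them, the number in brackets can be pulled out of the version string. A small sketch, assuming the string always has the `AMD-APP (x.y)` shape shown above; `amd_app_build` is a hypothetical helper, and the second string is reconstructed by analogy from the 2841.19 figure rather than copied from a log.

```python
import re


def amd_app_build(version_string: str) -> tuple[int, ...]:
    """Extract the AMD-APP build number, e.g. (3240, 7), from an OpenCL version string."""
    match = re.search(r"AMD-APP \((\d+(?:\.\d+)*)\)", version_string)
    if match is None:
        raise ValueError(f"no AMD-APP build found in: {version_string!r}")
    # Compare as integers so that 2841.19 sorts by its parts, not as text
    return tuple(int(part) for part in match.group(1).split("."))


mine = amd_app_build("OpenCL C 1.2 | OpenCL 1.2 AMD-APP (3240.7)")
yours = amd_app_build("OpenCL 1.2 AMD-APP (2841.19)")  # assumed format
print(mine, yours, "older" if yours < mine else "not older")
```

Tuples of integers compare part by part, so (2841, 19) correctly sorts before (3240, 7).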
Georgi Vidinski
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 22 Nov 17
Posts: 159
Credit: 13,180,518
RAC: 0
Message 8042 - Posted: 19 Sep 2023, 17:50:19 UTC - in response to Message 8031.  
Pushing this down, as it was moved from News and ended up quite far back.

The IGP of my Athlon-A12 9800E wants to do another 2234 days, processing the AMD GPU WU...having done 0.019% in 10 hours
I'm aborting it

Lamberto Vitali

Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8043 - Posted: 19 Sep 2023, 18:00:47 UTC - in response to Message 8041.  
I have the same legacy drivers (22.6.1), but my OpenCL drivers are for some reason more recent than yours. It's the same architecture... weird.
Where do I obtain the newer OpenCL? Or did we both install the same thing and it put something different on yours? Do you have a 280X as well, or a different Tahiti? For all the Tahitis I've ever used I used the same download, but I never checked if they ended up with different OpenCL versions.
ahorek's team
Volunteer developer
Volunteer tester

Joined: 1 Jan 13
Posts: 98
Credit: 10,437,435
RAC: 1,582
Message 8044 - Posted: 19 Sep 2023, 18:26:12 UTC - in response to Message 8043.  
the AMD bundle includes drivers for many cards, but the exact opencl/display driver version always depends on your specific card. Unfortunately, you can't upgrade it separately.
Lamberto Vitali

Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8045 - Posted: 19 Sep 2023, 18:41:17 UTC - in response to Message 8044.  
the AMD bundle includes drivers for many cards, but the exact opencl/display driver version always depends on your specific card. Unfortunately, you can't upgrade it separately.
It would probably be a bad idea to do so. I've had problems in the past with two similar cards on one machine, and the driver doesn't get on with one of them. You'd think it could have two drivers installed....
Georgi Vidinski
Volunteer moderator
Project administrator
Project developer
Project tester
Joined: 22 Nov 17
Posts: 159
Credit: 13,180,518
RAC: 0
Message 8057 - Posted: 21 Sep 2023, 4:29:54 UTC - in response to Message 8045.  
I suggest using the most stable enterprise-grade driver, Radeon™ Pro Software for Enterprise 21.Q1.2, from the AMD Radeon™ R9 280X Drivers & Support page. It is listed right below the Adrenalin Edition driver.

When "Pro" drivers from AMD are available, it is highly recommended to use them rather than the Adrenalin drivers, especially when we talk about science.
Lamberto Vitali

Joined: 14 Jun 23
Posts: 85
Credit: 5,914
RAC: 0
Message 8058 - Posted: 21 Sep 2023, 10:57:23 UTC - in response to Message 8057.  
I think I tried those before and they weren't as good, because in that case they're 1 year more out of date.