Many failed GPU workunits in a row


Message boards : Number crunching : Many failed GPU workunits in a row

robertmiles

Joined: 17 Aug 14
Posts: 49
Credit: 5,225,280
RAC: 0
Message 4850 - Posted: 16 Apr 2016, 19:21:07 UTC
A workunit that appears to be missing one of the files required to run on a GTX 560:

http://asteroidsathome.net/boinc/result.php?resultid=112153989

Or possibly the compute capability (CC) detection did not work properly.

It was part of a large cluster of GPU workunits from various BOINC projects that failed at about the same time. At least one of them apparently left BOINC and the driver in a state where no other GPU workunit could start properly.

Period Search Application v101.12 (cuda55)
robertmiles

Joined: 17 Aug 14
Posts: 49
Credit: 5,225,280
RAC: 0
Message 4851 - Posted: 17 Apr 2016, 1:40:43 UTC - in response to Message 4850.  
The problem appears to be in the 364.* series of Nvidia drivers and is most often triggered by a POEM@home task. Once the problem is triggered, a few dozen OpenCL tasks (not necessarily all from the same BOINC project) will fail quickly with a Compute Error, or the whole computer can lock up.

Threads on the problem:

https://www.primegrid.com/forum_thread.php?id=6769#94223

http://boinc.fzk.de/poem/forum_thread.php?id=1205#10896

The 362.00 driver does not appear to cause this problem.
Ralf02061973
Joined: 12 Mar 15
Posts: 1
Credit: 2,370,720
RAC: 0
Message 4854 - Posted: 24 Apr 2016, 14:56:04 UTC

Last modified: 24 Apr 2016, 15:39:47 UTC
I also had this problem.

At first I thought it was a hardware problem on my side, but after reading this thread I downgraded the driver to 362.00.

It seems it is not only a problem with POEM@home, because POEM@home is not installed here.

So far all GPU tasks have been running normally without errors.

Thanks for the info about that driver problem ;)
BOINC runs here on:
Intel i7-3770K + IntelHD4000
Android-Stick-ARM-Cortex-A17
Sony-Z5C-ARM-Cortex-A53/A57
Nvidia GT-630 / Nvidia GTX-750Ti
robertmiles

Joined: 17 Aug 14
Posts: 49
Credit: 5,225,280
RAC: 0
Message 4856 - Posted: 3 May 2016, 1:34:06 UTC - in response to Message 4851.  

Last modified: 3 May 2016, 1:35:32 UTC
The 365.10 driver is now available, but does not fix this problem.

The problem has also been seen on PrimeGrid.
Viking69

Joined: 18 Jan 13
Posts: 5
Credit: 7,411,584
RAC: 3,757
Message 4860 - Posted: 8 May 2016, 21:33:33 UTC

Last modified: 8 May 2016, 21:43:25 UTC
I am guessing that my PC is also suffering from this malady.

http://asteroidsathome.net/boinc/result.php?resultid=114760634

What is going wrong? I am up to date with my BOINC and Nvidia software (or so I thought; there is a newer Nvidia driver, but from what I have read here it is not a fix). From reading these posts, it seems I need to downgrade my driver. That causes other programs to fail, because they know what the latest driver for my boards should be and will refuse to start without it. SETI@home_BETA_Test was also having intermittent errors, but not as consistently as Asteroids.
