Computation errors


Message boards : Problems and bug reports : Computation errors

James Lee*
Joined: 28 Sep 13
Posts: 29
Credit: 117,718,096
RAC: 19,378
Message 4981 - Posted: 21 Sep 2016, 14:44:28 UTC
I have 10 machines running Asteroids, and ALL of them have just started producing computation errors. I've had to open my machines to other projects until this is fixed.
Melvin Bobo Slacke
Joined: 17 Nov 13
Posts: 3
Credit: 8,178,235
RAC: 671
Message 4982 - Posted: 21 Sep 2016, 15:43:46 UTC
Heh, something odd happened a few hours ago: my i5 on Linux started completing WUs in 1.5 minutes instead of the normal 1.2 hours. I guess that's related; they get validated, so I guess they are OK.

These are with the app "Period Search Application v102.10 (avx)" and WUs ps_160915_input..

Very weird.. ;-P
-Alex-
Joined: 30 Aug 16
Posts: 2
Credit: 31,691,520
RAC: 0
Message 4983 - Posted: 21 Sep 2016, 16:45:05 UTC
Yesterday I added six computers to my account, and today I was very pleasantly surprised by my statistics. I was then pleasantly shocked, and later saddened, to see the rate at which my stats kept increasing...

There are two problems (maybe two different symptoms of the same issue):

A. Many computation errors. On one computer I watched, tasks barely got started before being marked as failures. Not all tasks fail, but I've had more than 200 tasks labeled Error in just one day.

B. Extremely fast completion times. Several of my computers are running through tasks in 150-300 s, when they should take more like 4000-8000 s. I've seen this happen with the AVX tasks. They do get validated, though...!

I'm guessing the results are rubbish? What will happen to the stats? Will someone eventually clean out these short-runtime tasks?
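As a rough illustration of how a volunteer might spot the suspiciously short runs described above in their own task list, here is a minimal Python sketch. It assumes a "normal" run takes at least 4000 s (the lower end of the range quoted in the post) and treats anything under 25% of that as suspicious; the 25% cut-off and the helper name `is_suspiciously_fast` are hypothetical choices for this example, not anything the project uses to validate results.

```python
# Hypothetical sketch: flag results whose run time is far below what is normal.
# The 4000-8000 s "normal" range and the 150-300 s fast runs are the figures
# quoted in the post; the 25% cut-off is an arbitrary assumption.
EXPECTED_MIN_S = 4000.0
CUTOFF_FRACTION = 0.25


def is_suspiciously_fast(runtime_s: float, expected_min_s: float = EXPECTED_MIN_S) -> bool:
    """Return True if runtime_s is below CUTOFF_FRACTION of the expected minimum."""
    return runtime_s < CUTOFF_FRACTION * expected_min_s


if __name__ == "__main__":
    for runtime in (150.0, 300.0, 4000.0, 8000.0):
        label = "suspicious" if is_suspiciously_fast(runtime) else "ok"
        print(f"{runtime:6.0f} s -> {label}")
```

With these assumptions, the 150 s and 300 s runs are flagged and the 4000 s and 8000 s runs are not. A fixed fraction keeps the check simple, at the cost of needing a sensible baseline for each application (SSE2, SSE3, AVX, CUDA).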
James Lee*
Joined: 28 Sep 13
Posts: 29
Credit: 117,718,096
RAC: 19,378
Message 4986 - Posted: 22 Sep 2016, 0:49:26 UTC
Melvin and Alex, same issues here. I hope the moderators check this and post a response. In the meantime, I've had to switch to other projects. I hate doing that; Asteroids is what started my BOINC "career", and I want to get back to it.
Michael
Joined: 29 Aug 16
Posts: 3
Credit: 18,166,835
RAC: 1,634
Message 4987 - Posted: 23 Sep 2016, 1:12:39 UTC
Yep, a bunch of failures here too, some with SSE2, others with SSE3. But these are only a handful; most runs complete successfully.
As for the short run times, I do get those occasionally. I even had one with CUDA that completed in 172 seconds! http://asteroidsathome.net/boinc/workunit.php?wuid=52894026
It was validated against an AVX result from someone else, who did it in ~300 s. Maybe it's just a short WU, and not a bug.
James Lee*
Joined: 28 Sep 13
Posts: 29
Credit: 117,718,096
RAC: 19,378
Message 4990 - Posted: 24 Sep 2016, 9:30:34 UTC
The errors have stopped. I don't have ANY work buffer set up, so those who do have buffers may still see errors for WUs received from 9/21 through 9/23.
sublimemm
Joined: 19 Sep 16
Posts: 3
Credit: 81,600
RAC: 0
Message 4991 - Posted: 24 Sep 2016, 23:23:36 UTC - in response to Message 4990.  

Last modified: 24 Sep 2016, 23:32:22 UTC
I'm still getting computation errors when using a Pascal GPU (GTX 1080), with freshly downloaded tasks, by the way.
Michael
Joined: 29 Aug 16
Posts: 3
Credit: 18,166,835
RAC: 1,634
Message 4994 - Posted: 26 Sep 2016, 0:20:25 UTC - in response to Message 4991.  
Have a look: were those work units originally started in that 21-23 period? They may be new to you, but not new overall.
sublimemm
Joined: 19 Sep 16
Posts: 3
Credit: 81,600
RAC: 0
Message 4995 - Posted: 26 Sep 2016, 2:33:40 UTC - in response to Message 4994.  

Last modified: 26 Sep 2016, 2:40:48 UTC
I guess they're old. Here's one that failed:

http://asteroidsathome.net/boinc/workunit.php?wuid=52226661

I'll just let them error out and see if I start getting some good ones.

The problem with that, though, is that I tried it before, and one task, instead of erroring out immediately (no big deal), actually kept my GPU at 100% for over 14 hours... that is some SERIOUS electricity wasted.
sublimemm
Joined: 19 Sep 16
Posts: 3
Credit: 81,600
RAC: 0
Message 4996 - Posted: 26 Sep 2016, 2:44:29 UTC - in response to Message 4995.  
I just errored out over 500 tasks... I don't think I'm ever going to get a good one.
James Lee*
Joined: 28 Sep 13
Posts: 29
Credit: 117,718,096
RAC: 19,378
Message 5000 - Posted: 28 Sep 2016, 3:16:23 UTC
Errors just came back on all machines.
Jade Kintsugi
Joined: 10 Apr 15
Posts: 1
Credit: 17,308,320
RAC: 0
Message 5001 - Posted: 28 Sep 2016, 6:30:58 UTC
Happening to me as well. Opened my machine up to tasks and all I get are computation errors.

This has clearly been going on for a while; has no one moderating the project noticed?
ChinookFoehn
Joined: 1 Oct 14
Posts: 1
Credit: 8,228,849
RAC: 0
Message 5002 - Posted: 28 Sep 2016, 6:52:26 UTC

Last modified: 28 Sep 2016, 7:09:32 UTC
I am getting 24 errors at a time, likely falsely reported as computation errors. The error occurs the moment the work unit finishes downloading, not when actual computation starts: this laptop can only process 4 work units at a time and is currently processing the last 2 error-free work units plus 2 World Community Grid work units. (I suppose it is possible that BOINC suspends the WCG units, given the high priority I have given Asteroids@Home, and that the errors occur immediately at that point.)
Richie
Joined: 25 Jul 14
Posts: 64
Credit: 100,582,080
RAC: 0
Message 5003 - Posted: 29 Sep 2016, 20:20:31 UTC
I attached one fresh "Microsoft Windows running on an AMD x86_64 or Intel EM64T CPU" host to see how CPU tasks are doing at the moment.

All tasks with SSE2 and SSE3 application finish with immediate computation error.
Tasks with AVX application complete unnaturally fast, but without error. I can't see yet if they will validate successfully.
Richie
Joined: 25 Jul 14
Posts: 64
Credit: 100,582,080
RAC: 0
Message 5004 - Posted: 29 Sep 2016, 21:53:32 UTC - in response to Message 5003.  
I can't see yet if they will validate successfully.


Yes, they seem to validate successfully. A task that now takes a couple of minutes to complete is treated the same as the old tasks that used to take hours.

So the current situation is basically damaging the long-standing "badge system" by heavily distorting how much work is needed to earn each badge.
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 5008 - Posted: 3 Oct 2016, 8:14:20 UTC
Hi, I am going to look into this. I am very sorry; I have had some unexpected issues, so I couldn't check on the project.
User
Joined: 31 Aug 16
Posts: 1
Credit: 237,120
RAC: 0
Message 5014 - Posted: 6 Oct 2016, 3:15:52 UTC

Last modified: 6 Oct 2016, 3:17:13 UTC
I thought replacing my old graphics card with a new one caused it, but it seems this has been happening to me since I started back on Asteroids@home a few days ago, before the new card arrived.

Went from an nVidia GTX 460 to an nVidia GTX 1060.

GPU GTX 1060 wrote:
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
The system cannot find the file specified.
(0x2) - exit code 2 (0x2)
</message>
<stderr_txt>
CUDA RC12!!!!!!!!!!
CUDA Device number: 0
CUDA Device: GeForce GTX 1060 6GB
Compute capability: 6.1
Multiprocessors: 10
Unsupported CC detected (CC2.0 and better supported only).

</stderr_txt>
]]>


GPU GTX 460 wrote:
<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
The system cannot find the file specified.
(0x2) - exit code 2 (0x2)
</message>
<stderr_txt>
CUDA RC12!!!!!!!!!!
CUDA Device number: 0
CUDA Device: GeForce GTX 460
Compute capability: 2.1
Multiprocessors: 7
Grid dim: 56 = 7*8
Block dim: 128
Unsupported CC detected (CC2.0 and better supported only).

</stderr_txt>
]]>


I've also had a few CPU access violations or unknown errors, but I'm primarily concerned about the GPU for now, since it's rejecting every task outright. As far as I can tell, CPU tasks still run fine.

Historically, the GTX 460 worked fine. Or rather, stderr said roughly the same thing but the exit status was success. The only difference was that instead of "Unsupported CC detected (CC2.0 and better supported only)" it said "05:15:13 (8272): called boinc_finish" or a similar string.

Please let me know if there's anything I can do to help.
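The stderr message above is puzzling because both reported devices (compute capability 6.1 and 2.1) meet "CC2.0 and better". A minimal Python sketch of a plain "at least 2.0" comparison, under the assumption that the check compares (major, minor) pairs, shows both cards passing, so the message on its own does not explain the rejection. The application's real check is not shown in this thread; `cc_supported` and `MIN_CC` are hypothetical names. On a real system the major and minor numbers would come from the CUDA runtime, for example the `major` and `minor` fields that `cudaGetDeviceProperties` fills in a `cudaDeviceProp`.

```python
# Hypothetical sketch of a "CC 2.0 and better" check. The real application's
# check is not shown in this thread; the names and the tuple comparison are
# assumptions for illustration only.
MIN_CC = (2, 0)


def cc_supported(major: int, minor: int, min_cc: tuple = MIN_CC) -> bool:
    """Return True if compute capability major.minor is at least min_cc."""
    # Tuples compare major first, then minor: (6, 1) > (2, 1) > (2, 0) > (1, 3).
    return (major, minor) >= min_cc


if __name__ == "__main__":
    # Compute capabilities reported in the stderr logs above.
    for name, cc in (("GeForce GTX 1060 6GB", (6, 1)), ("GeForce GTX 460", (2, 1))):
        print(name, "CC %d.%d" % cc, "passes" if cc_supported(*cc) else "fails")
```

Under this assumed check both devices print "passes", which is consistent with the message being misleading rather than a true capability limit.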
mmonnin
Joined: 3 Aug 16
Posts: 19
Credit: 51,517,421
RAC: 8,531
Message 5015 - Posted: 9 Oct 2016, 20:48:25 UTC
I don't believe Asteroids supports Pascal GPUs yet.
ncubes
Joined: 16 Jan 23
Posts: 2
Credit: 1,758,364
RAC: 5,928
Message 7808 - Posted: 23 Apr 2023, 14:16:38 UTC
I appear to be causing computation errors when I run and debug a C program. Asteroids seems to run fine when left alone: I recently looked at all sixteen Asteroids tasks running on my computer, and all of them were doing fine. After running my C program, several of the tasks had computation errors. Am I causing those errors? If so, what can I do to fix the problem? Thank you.
Keith Myers
Joined: 16 Nov 22
Posts: 99
Credit: 55,707,147
RAC: 390,462
Message 7809 - Posted: 24 Apr 2023, 1:56:54 UTC - in response to Message 7808.  
Pretty obvious. Don't run and debug your C program at the same time as crunching Asteroids.

A proud member of the OFA (Old Farts Association)
