Long running job.


adrianxw

Joined: 5 Dec 12
Posts: 46
Credit: 10,094,615
RAC: 623
Message 6043 - Posted: 18 Nov 2018, 7:55:27 UTC
This machine normally runs an Asteroids work unit for a little over an hour. I have a job on here now which shows 100.000% progress and 00:00:00 remaining, but it has so far "run" for 17:37:55. There is nothing on the transfers page. Suspending and resuming just stopped and restarted the job from where it was; it keeps running, and my CPU monitor shows all cores and threads at 100%. It would seem to me that it is stuck in a loop of some kind from which it has no way out.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
bloodrain

Joined: 19 May 13
Posts: 14
Credit: 5,934,840
RAC: 0
Message 6045 - Posted: 19 Nov 2018, 8:11:07 UTC - in response to Message 6043.  
Delete it. It's a bad WU.
adrianxw

Joined: 5 Dec 12
Posts: 46
Credit: 10,094,615
RAC: 623
Message 6047 - Posted: 19 Nov 2018, 14:19:05 UTC
It is certainly possible that it is a bad work unit. If I delete it, it will be sent out again, and someone else might lose hours of compute time on it; this could happen several times. It has, so far, been sent out 5 times. One ended with a download error, so I discount that, but two more show "Timed out - no response", which is probably what would happen to it here, so it has wasted a load of crunching time. I'd prefer a project member look at this and remove the task. Its name is ps_181106_input_59651_5.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Alessandro Freda

Joined: 13 Jan 13
Posts: 14
Credit: 149,268,934
RAC: 572
Message 6049 - Posted: 19 Nov 2018, 20:28:58 UTC - in response to Message 6047.  
Same problem on at least 2 of my PCs (the first running AVX, the 2nd SSE2).
These are some of the "long" WUs:

ps_181113_input_107029_2_0
ps_181113_input_107029_1_0
ps_181113_input_107029_3_0
ps_181108_input_31762_6_1
ps_181108_input_31762_5_1
ps_181102_input_10465_1_2
ps_181020_input_169988_4_3

It seems to me that this is an old problem, seen some years ago.
The question is: is it happening again, or are these a new kind of WU?
I'm deleting them for now ...
adrianxw

Joined: 5 Dec 12
Posts: 46
Credit: 10,094,615
RAC: 623
Message 6050 - Posted: 20 Nov 2018, 6:30:08 UTC

Last modified: 20 Nov 2018, 6:32:18 UTC
I sent him a private message yesterday, to which I've had no reply yet. I've not deleted mine; I don't want them going out again and wasting someone else's CPU time. These tasks are wasting weeks of crunching.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
Alessandro Freda

Joined: 13 Jan 13
Posts: 14
Credit: 149,268,934
RAC: 572
Message 6051 - Posted: 20 Nov 2018, 8:58:06 UTC - in response to Message 6050.  

OK, so you suggest suspending these WUs (to avoid wasting my CPU time as well)? Either way, the deadline will pass and they will be rescheduled.
adrianxw

Joined: 5 Dec 12
Posts: 46
Credit: 10,094,615
RAC: 623
Message 6053 - Posted: 20 Nov 2018, 12:57:54 UTC
How long does a work unit usually take on your machine, and how long have these been running? If they have run 50% longer than normal, yes, I'd suspend them for now. I am kind of hoping the admin will get onto this case and do something about it. As it is, they appear to be burning an awful lot of CPU time for nothing. Aborting them means they just get sent to someone else and the cycle repeats.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
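The "50% longer than normal" rule of thumb from the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Task` record, the `looks_stuck` helper, the second task name, and the 1.1-hour typical runtime are made up for the example and are not part of BOINC or the Asteroids application. The first task uses the name and elapsed time (17:37:55, roughly 17.6 hours) reported earlier in the thread.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical record of a running task (not a BOINC data structure)."""
    name: str
    elapsed_hours: float


def looks_stuck(task: Task, typical_hours: float, factor: float = 1.5) -> bool:
    """Return True if the task has run longer than factor * typical runtime.

    factor=1.5 corresponds to "50% longer than normal".
    """
    return task.elapsed_hours > factor * typical_hours


# Assumed typical runtime: "a little over an hour" taken as 1.1 hours.
TYPICAL_HOURS = 1.1

tasks = [
    Task("ps_181106_input_59651_5", elapsed_hours=17.6),
    Task("hypothetical_normal_task", elapsed_hours=0.8),  # made-up name
]

# Names of tasks that are candidates for suspending.
suspects = [t.name for t in tasks if looks_stuck(t, TYPICAL_HOURS)]
print(suspects)
```

The check only flags candidates; whether to suspend, abort, or wait for the project staff remains the judgment call discussed in this thread.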
adrianxw

Joined: 5 Dec 12
Posts: 46
Credit: 10,094,615
RAC: 623
Message 6055 - Posted: 21 Nov 2018, 7:53:44 UTC
I've still had no input from the admin or crew here. The BOINC people have said there is little to be done apart from resetting the project, which I did, and that killed and sent back the damaged job, so it can go and waste someone else's time.

Dumping the project.
Wave upon wave of demented avengers march cheerfully out of obscurity into the dream.
bloodrain

Joined: 19 May 13
Posts: 14
Credit: 5,934,840
RAC: 0
Message 6057 - Posted: 21 Nov 2018, 10:12:03 UTC - in response to Message 6055.  
For me a WU takes 2 hours. I'm running 32 at a time.
Mike Thomson

Joined: 3 Apr 13
Posts: 3
Credit: 45,976,851
RAC: 29
Message 6076 - Posted: 10 Dec 2018, 16:52:11 UTC
I've been running WUs on two computers for a few years, but since the last BOINC update I have run into the problem of WUs stalling at anywhere from 98 to 100%. If I shut down BOINC and restart, the offending WUs disappear and new ones start. WUs normally take between 1 and 2 hours, but I've seen them at 17 hours before I caught them and restarted BOINC. I'm not sure whether the last update to BOINC has anything to do with the problem, but it has only happened since the update.
Madbovine
Joined: 21 Dec 15
Posts: 9
Credit: 1,095,360
RAC: 0
Message 6084 - Posted: 12 Dec 2018, 6:16:48 UTC
Have not had any WUs for 6 days now.
Not sure what's happening.
Steve Gaber

Joined: 7 Mar 14
Posts: 78
Credit: 6,848,052
RAC: 2,514
Message 6088 - Posted: 12 Dec 2018, 18:21:14 UTC - in response to Message 6084.  
Me neither. It's frustrating.

I have a certain amount of loyalty and persistence, but I'm considering bailing out of this one and trying another project. Not sure which ones are worthy of my computer's efforts, provide enough work, don't tie up my system's resources and are adequately supported by project managers.

Any suggestions?

Steve Gaber
Oldsmar, FL
Richie

Joined: 25 Jul 14
Posts: 64
Credit: 100,582,080
RAC: 0
Message 6090 - Posted: 13 Dec 2018, 5:28:56 UTC - in response to Message 6088.  
Well, let's assume you're looking for a "spare project" and if you're into astrophysics then I think Einstein and their 'Gravitational Wave All-sky search on LIGO O1 Open Data' is scientifically quite interesting. Those tasks require about 150MB of RAM. Based on the specs of your computer I guess a task could finish in about 10 hours. There's also 'Gamma-ray pulsar search #5' which uses more RAM (a few hundred MBs) and tasks run a little bit faster. Both are CPU applications.
Steve Gaber

Joined: 7 Mar 14
Posts: 78
Credit: 6,848,052
RAC: 2,514
Message 6091 - Posted: 13 Dec 2018, 5:59:44 UTC - in response to Message 6090.  
Richie:

Thanks for the reply and for those suggestions. I'll give Asteroids a little while before I decide.

Cheers.

Steve Gaber
Oldsmar, FL
Madbovine
Joined: 21 Dec 15
Posts: 9
Credit: 1,095,360
RAC: 0
Message 6095 - Posted: 15 Dec 2018, 2:37:24 UTC - in response to Message 6088.  
I have gone to SETI for work.
Steve Gaber

Joined: 7 Mar 14
Posts: 78
Credit: 6,848,052
RAC: 2,514
Message 6096 - Posted: 16 Dec 2018, 8:05:07 UTC - in response to Message 6095.  

Last modified: 16 Dec 2018, 8:07:32 UTC
Still haven't gotten any work from Asteroids after nearly a week.

I started Rosetta today. Considered Einstein, but thought I'd go for a change of disciplines. We'll see how that works out.

Steve Gaber
Oldsmar, FL
Steve Gaber

Joined: 7 Mar 14
Posts: 78
Credit: 6,848,052
RAC: 2,514
Message 6109 - Posted: 21 Dec 2018, 18:54:41 UTC - in response to Message 6096.  

Last modified: 21 Dec 2018, 18:55:39 UTC
I joined Rosetta a week ago and ran it for a few days, but there are now no tasks available.

On a whim, I updated Asteroids. Lo and behold, I got three tasks.

So we'll see how my computer likes running SETI@Home, Rosetta and Asteroids concurrently.

Steve Gaber
Oldsmar, FL
bloodrain

Joined: 19 May 13
Posts: 14
Credit: 5,934,840
RAC: 0
Message 6110 - Posted: 22 Dec 2018, 10:00:25 UTC - in response to Message 6109.  
Yeah, sorry for eating some of those WUs. My main PC loves to eat those things every 2 hours.
alexander

Joined: 28 Apr 13
Posts: 87
Credit: 26,717,693
RAC: 130
Message 6111 - Posted: 24 Dec 2018, 11:56:37 UTC
I too have problems with Asteroids WUs. They start with an estimated runtime of ~100 minutes, but after a runtime of a day or so they are still running at 100%. When BOINC is restarted, they start again at 0%.
For the moment I will delete all WUs and wait for new ones.

Alexander
Jaroslav Cerny

Joined: 15 Apr 14
Posts: 1
Credit: 13,556,566
RAC: 0
Message 6112 - Posted: 24 Dec 2018, 13:58:14 UTC

Last modified: 24 Dec 2018, 13:59:35 UTC
A similar issue on my PC: jobs finish without errors, but the actual runtime is now more than 25 hours (sometimes > 30) instead of 6-7. I am sorry, but PrimeGrid CPU tasks (especially GCW) are more acceptable for me.