Long running job.
Author | Message |
---|---|
Send message Joined: 5 Dec 12 Posts: 46 Credit: 10,092,769 RAC: 703 |
This machine normally runs an Asteroids work unit for a little over an hour. I have a job on here now which shows 100.000% progress and 00:00:00 remaining, but it has so far "run" for 17:37:55. There is nothing on the transfers page. Suspending and resuming just stopped and restarted the job from where it was; it continues, and my CPU monitor shows all cores and threads running at 100%. It seems to me that it is stuck in a loop of some kind from which it has no way out. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Send message Joined: 19 May 13 Posts: 14 Credit: 5,934,840 RAC: 0 |
|
Send message Joined: 5 Dec 12 Posts: 46 Credit: 10,092,769 RAC: 703 |
It is certainly possible that it is a bad work unit. If I delete it, it will be sent out again, and someone else might lose hours of compute time on it; this could happen several times. It has, so far, been sent out 5 times. One ended with a download error, so I discount that, but two more show "Timed out - no response", which is probably what would happen to it here, so it has wasted a load of crunching time. I'd prefer a project member look at this and remove the task. Its name is ps_181106_input_59651_5. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Send message Joined: 13 Jan 13 Posts: 14 Credit: 149,266,448 RAC: 527 |
Same problem on at least 2 of my PCs (the first running AVX, the 2nd SSE2). These are some of the "long" WUs: ps_181113_input_107029_2_0, ps_181113_input_107029_1_0, ps_181113_input_107029_3_0, ps_181108_input_31762_6_1, ps_181108_input_31762_5_1, ps_181102_input_10465_1_2, ps_181020_input_169988_4_3. It seems to me that this was an old problem, seen some years ago. The question is: is it happening again, or are they a new kind of WU? I'm deleting them so far ... |
Send message Joined: 5 Dec 12 Posts: 46 Credit: 10,092,769 RAC: 703 |
Last modified: 20 Nov 2018, 6:32:18 UTC I sent him a private message yesterday, to which I've had no reply yet. I've not deleted mine; I don't want them going out again and wasting someone else's CPU time. These tasks are wasting weeks of crunching. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Send message Joined: 13 Jan 13 Posts: 14 Credit: 149,266,448 RAC: 527 |
"I sent him a private message yesterday, to which I've had no reply yet. I've not deleted mine; I don't want them going out again and wasting someone else's CPU time. These tasks are wasting weeks of crunching." OK, so you suggest suspending these WUs (to avoid wasting my CPU time as well)? Anyway, the deadline and reschedule will come. |
Send message Joined: 5 Dec 12 Posts: 46 Credit: 10,092,769 RAC: 703 |
How long does a work unit usually take on your machine, and how long have these been running? If they have run 50% longer than normal, yes, I'd suspend them for now. I am kind of hoping the admin will get onto this case and do something about it. As it is, they appear to be burning an awful lot of CPU time for nothing. Aborting them means they just get sent to someone else and the cycle repeats. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Send message Joined: 5 Dec 12 Posts: 46 Credit: 10,092,769 RAC: 703 |
I've still had no input from the admin or crew here. The BOINC people have said there is little to be done apart from resetting the project, which I did, and that killed and sent back the damaged job, so it can go and waste someone else's time. Dumping the project. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Send message Joined: 19 May 13 Posts: 14 Credit: 5,934,840 RAC: 0 |
|
Send message Joined: 3 Apr 13 Posts: 3 Credit: 45,976,851 RAC: 43 |
Been running WUs on two computers for a few years, but since the last BOINC update I have run into the problem of WUs stalling at anywhere from 98 to 100%. If I shut down BOINC and restart, the offending WUs disappear and new ones start. WUs normally take between 1 and 2 hours, but I've seen them at 17 hours before I caught them and restarted BOINC. Not sure whether the last update to BOINC has anything to do with the problem, but it has only happened since the update.
|
Send message Joined: 21 Dec 15 Posts: 9 Credit: 1,095,360 RAC: 0 |
|
Send message Joined: 7 Mar 14 Posts: 78 Credit: 6,840,398 RAC: 2,811 |
"Have not had any WU's for 6 days now" Me neither. It's frustrating. I have a certain amount of loyalty and persistence, but I'm considering bailing out of this one and trying another project. Not sure which ones are worthy of my computer's efforts, provide enough work, don't tie up my system's resources and are adequately supported by project managers. Any suggestions? Steve Gaber Oldsmar, FL |
Send message Joined: 25 Jul 14 Posts: 64 Credit: 100,582,080 RAC: 0 |
"Any suggestions?" Well, let's assume you're looking for a "spare project". If you're into astrophysics, then I think Einstein and their 'Gravitational Wave All-sky search on LIGO O1 Open Data' is scientifically quite interesting. Those tasks require about 150 MB of RAM. Based on the specs of your computer, I guess a task could finish in about 10 hours. There's also 'Gamma-ray pulsar search #5', which uses more RAM (a few hundred MB), and its tasks run a little bit faster. Both are CPU applications. |
Send message Joined: 7 Mar 14 Posts: 78 Credit: 6,840,398 RAC: 2,811 |
"Any suggestions?" Ritchie: Thanks for the reply and for those suggestions. I'll give Asteroids a little while before I decide. Cheers. Steve Gaber Oldsmar, FL |
Send message Joined: 21 Dec 15 Posts: 9 Credit: 1,095,360 RAC: 0 |
|
Send message Joined: 7 Mar 14 Posts: 78 Credit: 6,840,398 RAC: 2,811 |
Last modified: 16 Dec 2018, 8:07:32 UTC |
Send message Joined: 7 Mar 14 Posts: 78 Credit: 6,840,398 RAC: 2,811 |
Last modified: 21 Dec 2018, 18:55:39 UTC "i have gone to SETI for work" Since I joined Rosetta a week ago and ran it for a few days, there are now no tasks available. On a whim, I updated Asteroids. Lo and behold, I got three tasks. So we'll see how my computer likes running SETI@Home, Rosetta and Asteroids concurrently. Steve Gaber Oldsmar, FL |
Send message Joined: 19 May 13 Posts: 14 Credit: 5,934,840 RAC: 0 |
|
Send message Joined: 28 Apr 13 Posts: 87 Credit: 26,716,176 RAC: 0 |
I too have problems with Asteroids WUs. They start with an estimated runtime of ~100 minutes, and at a runtime of a day or so they are still running at 100%. When restarting BOINC they start again at 0%. For the moment I will delete all WUs, waiting for new ones. Alexander |
Send message Joined: 15 Apr 14 Posts: 1 Credit: 13,556,566 RAC: 0 |
Last modified: 24 Dec 2018, 13:59:35 UTC |
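
The rule of thumb suggested earlier in the thread (suspend a task once it has run about 50% longer than normal) can be sketched in a few lines of Python. This is only an illustration, not part of BOINC or of the project's tooling: the function names `parse_hms` and `is_overrunning` are hypothetical, the 1.5 factor is the "50% longer" figure from the thread, and the typical run time of 01:10:00 is an assumed stand-in for "a little over an hour".

```python
def parse_hms(text: str) -> int:
    """Convert an 'HH:MM:SS' elapsed-time string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def is_overrunning(elapsed: str, typical: str, factor: float = 1.5) -> bool:
    """Return True when elapsed time exceeds `factor` times the typical run time.

    factor=1.5 mirrors the "50% longer than normal" suggestion in the thread.
    """
    return parse_hms(elapsed) > factor * parse_hms(typical)


if __name__ == "__main__":
    # 17:37:55 is the elapsed time reported in the first post;
    # 01:10:00 is an assumed value for "a little over an hour".
    print(is_overrunning("17:37:55", "01:10:00"))
    # A task only slightly over its usual time is not flagged.
    print(is_overrunning("01:30:00", "01:10:00"))
```

The elapsed and typical times would have to be read off the BOINC Manager task list by hand; the sketch only does the comparison and does not suspend or abort anything.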