UL/DL-Server are down


Message boards : Number crunching : UL/DL-Server are down

Saenger
Joined: 18 Jun 12
Posts: 23
Credit: 5,455,486
RAC: 0
Message 2 - Posted: 18 Jun 2012, 15:23:53 UTC
I just attached here, and BOINC tried to get work, but only got this as an answer:
Mo 18 Jun 2012 17:20:19 CEST | Asteroids@home | [sched_op] Starting scheduler request
Mo 18 Jun 2012 17:20:19 CEST | Asteroids@home | [sched_op] Fetching master file
Mo 18 Jun 2012 17:20:20 CEST | Asteroids@home | [sched_op] Got master file; parsing
Mo 18 Jun 2012 17:20:20 CEST | Asteroids@home | [sched_op] Found 1 scheduler URLs in master file
Mo 18 Jun 2012 17:20:20 CEST | Asteroids@home | Master file download succeeded
Mo 18 Jun 2012 17:20:25 CEST | Asteroids@home | [sched_op] Starting scheduler request
Mo 18 Jun 2012 17:20:25 CEST | Asteroids@home | Sending scheduler request: Project initialization.
Mo 18 Jun 2012 17:20:25 CEST | Asteroids@home | Requesting new tasks for CPU and NVIDIA GPU
Mo 18 Jun 2012 17:20:25 CEST | Asteroids@home | [sched_op] CPU work request: 1.00 seconds; 0.00 CPUs
Mo 18 Jun 2012 17:20:25 CEST | Asteroids@home | [sched_op] NVIDIA GPU work request: 1.00 seconds; 0.00 GPUs
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | Scheduler request completed: got 1 new tasks
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [sched_op] Server version 701
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | Project requested delay of 7 seconds
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [task] result state=NEW for ps_170612_22890_1 from handle_scheduler_reply
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [sched_op] estimated total CPU task duration: 182331 seconds
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [sched_op] Deferring communication for 7 sec
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [sched_op] Reason: requested by project
Mo 18 Jun 2012 17:20:28 CEST | Asteroids@home | [task] result state=FILES_DOWNLOADING for ps_170612_22890_1 from CS::update_results
Mo 18 Jun 2012 17:20:29 CEST | Asteroids@home | Started download of period_search_10000_i686-pc-linux-gnu
Mo 18 Jun 2012 17:20:29 CEST | Asteroids@home | [file_xfer] URL: http://asteroidsathome.net/boinc/download/period_search_10000_i686-pc-linux-gnu
Mo 18 Jun 2012 17:20:29 CEST | Asteroids@home | Started download of period_search_in_22890
Mo 18 Jun 2012 17:20:29 CEST | Asteroids@home | [file_xfer] URL: http://asteroidsathome.net/boinc/download/145/period_search_in_22890
Mo 18 Jun 2012 17:20:30 CEST | Asteroids@home | [file_xfer] http op done; retval -224 (file not found)
Mo 18 Jun 2012 17:20:30 CEST | Asteroids@home | [file_xfer] file transfer status -224 (file not found)
Mo 18 Jun 2012 17:20:30 CEST | Asteroids@home | Giving up on download of period_search_in_22890: file not found
Mo 18 Jun 2012 17:20:33 CEST | Asteroids@home | [file_xfer] http op done; retval 0 (Success)
Mo 18 Jun 2012 17:20:33 CEST | Asteroids@home | [file_xfer] file transfer status 0 (Success)
Mo 18 Jun 2012 17:20:33 CEST | Asteroids@home | Finished download of period_search_10000_i686-pc-linux-gnu
Mo 18 Jun 2012 17:20:33 CEST | Asteroids@home | [file_xfer] Throughput 300017 bytes/sec
Mo 18 Jun 2012 17:20:40 CEST | Asteroids@home | [sched_op] Deferring communication for 1 min 4 sec
Mo 18 Jun 2012 17:20:40 CEST | Asteroids@home | [sched_op] Reason: Unrecoverable error for task ps_170612_22890_1 (WU download error: couldn't get input files:<file_xfer_error>  <file_name>period_search_in_22890</file_name>  <error_code>-224</error_code>  <error_message>file not found</error_message></file_xfer_error>)


The reason is probably that the UL/DL server is down.
Greetings from Sänger
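When an event log like the one above gets long, the lines that matter for this problem are the "Giving up on download of …" entries, which name the input file the client could not fetch. The following is a minimal sketch that pulls those entries out of a saved log. It assumes the log has been copied into a list of plain-text lines, and the regular expression is based only on the message wording visible in this thread, not on the BOINC client source. `failed_downloads` is a hypothetical helper, not part of BOINC.

```python
import re
from typing import List, Tuple

# Hypothetical pattern, derived only from the log lines posted in this thread, e.g.
# "... | Asteroids@home | Giving up on download of period_search_in_22890: file not found"
GIVE_UP = re.compile(r"Giving up on download of (?P<file>\S+): (?P<reason>.+)$")


def failed_downloads(log_lines: List[str]) -> List[Tuple[str, str]]:
    """Return (file name, reason) for every abandoned download in an event log."""
    failures = []
    for line in log_lines:
        # Strip trailing whitespace/newlines so the reason is captured cleanly.
        match = GIVE_UP.search(line.rstrip())
        if match:
            failures.append((match.group("file"), match.group("reason")))
    return failures


if __name__ == "__main__":
    # Lines copied from the logs in this thread (both the " | " and "|" separator styles).
    sample = [
        "Mo 18 Jun 2012 17:20:29 CEST | Asteroids@home | Started download of period_search_in_22890",
        "Mo 18 Jun 2012 17:20:30 CEST | Asteroids@home | Giving up on download of period_search_in_22890: file not found",
        "Mon 18 Jun 2012 19:01:45 BST|Asteroids@home|Giving up on download of period_search_in_35517: file not found",
        "Mo 18 Jun 2012 17:20:33 CEST | Asteroids@home | Finished download of period_search_10000_i686-pc-linux-gnu",
    ]
    for name, reason in failed_downloads(sample):
        print(f"{name}: {reason}")
```

Searching for the "Giving up" message rather than the numeric `retval -224` keeps the sketch working for logs captured without the `file_xfer` debug flag, as in the shorter log posted later in this thread.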
rilian
Joined: 18 Jun 12
Posts: 8
Credit: 83,844
RAC: 0
Message 3 - Posted: 18 Jun 2012, 17:25:50 UTC
<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>period_search_in_2263</file_name>
<error_code>-224</error_code>
<error_message>file not found</error_message>
</file_xfer_error>

</message>
]]>
ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 5 - Posted: 18 Jun 2012, 18:03:26 UTC
Mon 18 Jun 2012 19:01:32 BST|Asteroids@home|Master file download succeeded
Mon 18 Jun 2012 19:01:37 BST|Asteroids@home|Sending scheduler request: Project initialization. Requesting 1 seconds of work, reporting 0 completed tasks
Mon 18 Jun 2012 19:01:42 BST|Asteroids@home|Scheduler request succeeded: got 1 new tasks
Mon 18 Jun 2012 19:01:44 BST|Asteroids@home|Started download of period_search_10000_i686-pc-linux-gnu
Mon 18 Jun 2012 19:01:44 BST|Asteroids@home|Started download of period_search_in_35517
Mon 18 Jun 2012 19:01:45 BST|Asteroids@home|Giving up on download of period_search_in_35517: file not found
Mon 18 Jun 2012 19:01:49 BST|Asteroids@home|Finished download of period_search_10000_i686-pc-linux-gnu

:(

Cheers,

Al.
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 6 - Posted: 18 Jun 2012, 18:14:48 UTC
Hello everyone, I am sorry for the trouble. We are debugging the server this week, so there may be some errors, and I thank you for reporting any of them.
This download problem is caused by a small mistake in the work generator script. The download/upload server is down on purpose.
Sirius B
Joined: 20 Jun 12
Posts: 3
Credit: 1,002,906
RAC: 0
Message 15 - Posted: 20 Jun 2012, 10:24:12 UTC
Thanks for the info. Looking forward to crunching.
ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 17 - Posted: 20 Jun 2012, 18:22:21 UTC
I've managed to download a WU on each of my linux hosts now. Let's just hope they crunch OK :)

Pity there's no Windows application yet, but I've posted that in the Wish List forum :)

Cheers,

Al.
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 19 - Posted: 20 Jun 2012, 18:41:52 UTC
Downloading of the WUs should work fine now. I suppose that tomorrow we could release many more WUs for crunching.
Saenger
Joined: 18 Jun 12
Posts: 23
Credit: 5,455,486
RAC: 0
Message 24 - Posted: 20 Jun 2012, 20:51:29 UTC - in response to Message 19.  
Downloading of the WUs should work fine now. I suppose that tomorrow we could release many more WUs for crunching.

Got some as well, I'm only waiting for my wingman to crunch 'em, some strange guy called Kyong ;)

Server status still says the UL/DL server is deactivated and the work generator is not started.
Greetings from Sänger
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 27 - Posted: 21 Jun 2012, 10:23:58 UTC
I downloaded the first group of test WUs to two computers that don't run 24/7. :-D I keep these two computers by the bed, so they aren't running at night.
The UL/DL server and the work generator aren't needed; I will correct the status. There is a script for adding new work, so the default work generator and UL/DL server aren't needed.
Saenger
Joined: 18 Jun 12
Posts: 23
Credit: 5,455,486
RAC: 0
Message 31 - Posted: 21 Jun 2012, 20:45:40 UTC
What happened to these two, 519 and 505? Both were crunched by the two of us and both were marked as "WU cancelled" errors, so presumably it was neither your machine's fault nor mine?
Greetings from Sänger
ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 33 - Posted: 22 Jun 2012, 10:47:16 UTC - in response to Message 31.  
My WU http://asteroidsathome.net/boinc/workunit.php?wuid=534 suffered a similar fate when my wingnut's replication was cancelled. It was Bok this time, so it looks like that last batch just got cancelled wholesale.

Cheers,

Al.
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 34 - Posted: 22 Jun 2012, 10:52:11 UTC
There was a problem with the minimum quorum setting, so units had to be cancelled. But everything should now be corrected, so there should be no more cancelling, and many more units will be released soon.
ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 35 - Posted: 22 Jun 2012, 11:09:59 UTC - in response to Message 34.  
Oh, the MQ (minimum quorum) was 2 on mine, and max errors etc. was 20. Must have got caught in the crossfire :)

Cheers,

Al.
Saenger
Joined: 18 Jun 12
Posts: 23
Credit: 5,455,486
RAC: 0
Message 39 - Posted: 22 Jun 2012, 13:45:17 UTC
Ah, the pleasures of crunching Alpha projects :D

Go on with your work, stay as informative as you are, and everything will straighten itself out.
Greetings from Sänger
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 40 - Posted: 22 Jun 2012, 15:20:03 UTC

Last modified: 22 Jun 2012, 15:21:53 UTC
I hope this was a short alpha-beta test. All reported bugs are fixed. There are now about 7600 units for crunching, so let's start crunching, and please keep reporting any newly discovered bugs. :-D
ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 43 - Posted: 22 Jun 2012, 17:33:21 UTC - in response to Message 40.  
Are these 'real' rather than test WUs now? They seem likely to take much longer than before.

Which is OK. But ...

This WU http://asteroidsathome.net/boinc/workunit.php?wuid=4446 has a failed download, leaving me running the only live replication, as the replacement hasn't been sent for some reason. I don't mind burning up cycles on short WUs that ultimately fail to validate, but long ones, I'd rather not :)

Cheers,

Al.
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 45 - Posted: 22 Jun 2012, 17:50:49 UTC
These units should take about 20 hours of computing; test units take about 1 hour. And the WU is waiting to be sent now.
ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 46 - Posted: 22 Jun 2012, 18:06:39 UTC - in response to Message 45.  
Ok :)

My linux hosts are really old and slow, so those times will be about three times as long for me, but that's OK. I'll let them run :)

Cheers,

Al.
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 47 - Posted: 22 Jun 2012, 21:05:15 UTC - in response to Message 46.  
I hope that everything is okay now. The maximum delay is about 4800 hours of computing, so you can take it easy. :-D
ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 57 - Posted: 24 Jun 2012, 17:40:55 UTC - in response to Message 45.  
And the WU is waiting to be sent now.


Will it ever be sent though? I've completed my replication, but the replacement for the failed download remains unsent http://asteroidsathome.net/boinc/workunit.php?wuid=4446

Similarly, this WU http://asteroidsathome.net/boinc/workunit.php?wuid=5579 that I'm running had its partner WU die with a computation error, and the replacement WU has not been sent out. I've suspended it for now, as 20 hours would be a lot to lose on one WU.

I ask because I've lost so much computing time at other projects because of similar problems (WUs being cancelled before the 3rd replication went out, etc.). I've only got 2 linux hosts and I don't really want to tie them up with unproductive work. Once the Windows app is available I won't be so worried, as I've got a dozen of those waiting to go :)

Cheers,

Al.

p.s. Yes, I know this is really still an alpha project so I shouldn't expect anything to work :)

p.p.s. Based on the one WU that I've had validated, the credit is way too low compared to other projects. Not much worried about it, but maybe something is wrong with the validator.
