UL/DL-Server are down

Profile Saenger
Joined: 18 Jun 12
Posts: 23
Credit: 4,090,595
RAC: 3,270
Message 2 - Posted: 18 Jun 2012, 15:23:53 UTC

I just attached here, and BOINC tried to get work, but only got this as an answer:

Mo 18 Jun 2012 17:20:19 CEST | Asteroids@home | [sched_op] Starting scheduler request
Mo 18 Jun 2012 17:20:19 CEST | Asteroids@home | [sched_op] Fetching master file
Mo 18 Jun 2012 17:20:20 CEST | Asteroids@home | [sched_op] Got master file; parsing
Mo 18 Jun 2012 17:20:20 CEST | Asteroids@home | [sched_op] Found 1 scheduler URLs in master file
Mo 18 Jun 2012 17:20:20 CEST | Asteroids@home | Master file download succeeded
Mo 18 Jun 2012 17:20:25 CEST | Asteroids@home | [sched_op] Starting scheduler request
Mo 18 Jun 2012 17:20:25 CEST | Asteroids@home | Sending scheduler request: Project initialization.
Mo 18 Jun 2012 17:20:25 CEST | Asteroids@home | Requesting new tasks for CPU and NVIDIA GPU
Mo 18 Jun 2012 17:20:25 CEST | Asteroids@home | [sched_op] CPU work request: 1.00 seconds; 0.00 CPUs
Mo 18 Jun 2012 17:20:25 CEST | Asteroids@home | [sched_op] NVIDIA GPU work request: 1.00 seconds; 0.00 GPUs
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | Scheduler request completed: got 1 new tasks
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [sched_op] Server version 701
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | Project requested delay of 7 seconds
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [task] result state=NEW for ps_170612_22890_1 from handle_scheduler_reply
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [sched_op] estimated total CPU task duration: 182331 seconds
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [sched_op] estimated total NVIDIA GPU task duration: 0 seconds
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [sched_op] Deferring communication for 7 sec
Mo 18 Jun 2012 17:20:27 CEST | Asteroids@home | [sched_op] Reason: requested by project
Mo 18 Jun 2012 17:20:28 CEST | Asteroids@home | [task] result state=FILES_DOWNLOADING for ps_170612_22890_1 from CS::update_results
Mo 18 Jun 2012 17:20:29 CEST | Asteroids@home | Started download of period_search_10000_i686-pc-linux-gnu
Mo 18 Jun 2012 17:20:29 CEST | Asteroids@home | [file_xfer] URL: http://asteroidsathome.net/boinc/download/period_search_10000_i686-pc-linux-gnu
Mo 18 Jun 2012 17:20:29 CEST | Asteroids@home | Started download of period_search_in_22890
Mo 18 Jun 2012 17:20:29 CEST | Asteroids@home | [file_xfer] URL: http://asteroidsathome.net/boinc/download/145/period_search_in_22890
Mo 18 Jun 2012 17:20:30 CEST | Asteroids@home | [file_xfer] http op done; retval -224 (file not found)
Mo 18 Jun 2012 17:20:30 CEST | Asteroids@home | [file_xfer] file transfer status -224 (file not found)
Mo 18 Jun 2012 17:20:30 CEST | Asteroids@home | Giving up on download of period_search_in_22890: file not found
Mo 18 Jun 2012 17:20:33 CEST | Asteroids@home | [file_xfer] http op done; retval 0 (Success)
Mo 18 Jun 2012 17:20:33 CEST | Asteroids@home | [file_xfer] file transfer status 0 (Success)
Mo 18 Jun 2012 17:20:33 CEST | Asteroids@home | Finished download of period_search_10000_i686-pc-linux-gnu
Mo 18 Jun 2012 17:20:33 CEST | Asteroids@home | [file_xfer] Throughput 300017 bytes/sec
Mo 18 Jun 2012 17:20:40 CEST | Asteroids@home | [sched_op] Deferring communication for 1 min 4 sec
Mo 18 Jun 2012 17:20:40 CEST | Asteroids@home | [sched_op] Reason: Unrecoverable error for task ps_170612_22890_1 (WU download error: couldn't get input files:<file_xfer_error> <file_name>period_search_in_22890</file_name> <error_code>-224</error_code> <error_message>file not found</error_message></file_xfer_error>)
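
For anyone who wants to pull the failed downloads out of a long client log like this one, here is a minimal Python sketch. It assumes the pipe-separated message layout shown in this thread and the "Giving up on download of ..." wording; the function name `failed_downloads` is made up for illustration and is not part of BOINC.

```python
import re

# Matches client messages such as
# "Giving up on download of period_search_in_22890: file not found".
GIVE_UP = re.compile(r"Giving up on download of (?P<name>\S+): (?P<reason>.+)$")


def failed_downloads(lines):
    """Return (file name, reason) pairs for every abandoned download."""
    failures = []
    for line in lines:
        # Fields are separated by "|", with or without surrounding spaces;
        # the message text is the last field.
        message = line.split("|")[-1].strip()
        match = GIVE_UP.search(message)
        if match:
            failures.append((match.group("name"), match.group("reason")))
    return failures


sample = [
    "Mo 18 Jun 2012 17:20:29 CEST | Asteroids@home | Started download of period_search_in_22890",
    "Mo 18 Jun 2012 17:20:30 CEST | Asteroids@home | Giving up on download of period_search_in_22890: file not found",
    "Mo 18 Jun 2012 17:20:33 CEST | Asteroids@home | Finished download of period_search_10000_i686-pc-linux-gnu",
]
print(failed_downloads(sample))
```

The same function should also cope with logs that have no spaces around the `|` separators, since it only looks at the last field.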


The reason is probably that the UL/DL server is down.
____________
Grüße vom Sänger

Profile rilian
Joined: 18 Jun 12
Posts: 8
Credit: 83,844
RAC: 0
Message 3 - Posted: 18 Jun 2012, 17:25:50 UTC

<core_client_version>6.12.33</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>period_search_in_2263</file_name>
<error_code>-224</error_code>
<error_message>file not found</error_message>
</file_xfer_error>

</message>
]]>
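
The `<file_xfer_error>` element in a report like this is well-formed XML on its own, so it can be read with a standard parser. Below is a rough Python sketch under that assumption; the helper name `parse_xfer_error` is hypothetical, and the surrounding free text ("WU download error: ...") would have to be cut away first, which this sketch does not do.

```python
import xml.etree.ElementTree as ET

# Only the <file_xfer_error> element itself, copied from the report above.
fragment = """<file_xfer_error>
<file_name>period_search_in_2263</file_name>
<error_code>-224</error_code>
<error_message>file not found</error_message>
</file_xfer_error>"""


def parse_xfer_error(xml_text):
    """Turn one <file_xfer_error> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "file_name": root.findtext("file_name"),
        "error_code": int(root.findtext("error_code")),
        "error_message": root.findtext("error_message"),
    }


print(parse_xfer_error(fragment))
```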

ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 5 - Posted: 18 Jun 2012, 18:03:26 UTC

Mon 18 Jun 2012 19:01:32 BST|Asteroids@home|Master file download succeeded
Mon 18 Jun 2012 19:01:37 BST|Asteroids@home|Sending scheduler request: Project initialization. Requesting 1 seconds of work, reporting 0 completed tasks
Mon 18 Jun 2012 19:01:42 BST|Asteroids@home|Scheduler request succeeded: got 1 new tasks
Mon 18 Jun 2012 19:01:44 BST|Asteroids@home|Started download of period_search_10000_i686-pc-linux-gnu
Mon 18 Jun 2012 19:01:44 BST|Asteroids@home|Started download of period_search_in_35517
Mon 18 Jun 2012 19:01:45 BST|Asteroids@home|Giving up on download of period_search_in_35517: file not found
Mon 18 Jun 2012 19:01:49 BST|Asteroids@home|Finished download of period_search_10000_i686-pc-linux-gnu

:(

Cheers,

Al.

Profile Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 6 - Posted: 18 Jun 2012, 18:14:48 UTC

Hello everyone, I am sorry for the trouble. We are debugging the server this week, so there may be some errors, and I thank you for any reports of them.
The download problem was caused by a small mistake in the work generator script. The download/upload server is down on purpose.

Profile Sirius B
Joined: 20 Jun 12
Posts: 3
Credit: 1,002,840
RAC: 0
Message 15 - Posted: 20 Jun 2012, 10:24:12 UTC

Thanks for the info. Looking forward to crunching.

ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 17 - Posted: 20 Jun 2012, 18:22:21 UTC

I've managed to download a WU on each of my Linux hosts now. Let's just hope they crunch OK :)

Pity there's no Windows application yet, but I've posted that in the Wish List forum :)

Cheers,

Al.

Profile Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 19 - Posted: 20 Jun 2012, 18:41:52 UTC

Downloading of the WUs should work fine now. I suppose that tomorrow we could release many more WUs for crunching.

Profile Saenger
Joined: 18 Jun 12
Posts: 23
Credit: 4,090,595
RAC: 3,270
Message 24 - Posted: 20 Jun 2012, 20:51:29 UTC - in response to Message 19.

Downloading of the WUs should work fine now. I suppose that tomorrow we could release many more WUs for crunching.

Got some as well, I'm only waiting for my wingman to crunch 'em, some strange guy called Kyong ;)

Server status still says the UL/DL is deactivated and the work generator is not started.
____________
Grüße vom Sänger

Profile Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 27 - Posted: 21 Jun 2012, 10:23:58 UTC

I downloaded the first group of WUs for testing onto two computers that don't run 24/7. :-D I keep these two computers by my bed, so they aren't running at night.
The UL/DL server and work generator aren't needed; I will correct that. There is a script for adding new work, so the default work generator and UL/DL server aren't needed.

Profile Saenger
Joined: 18 Jun 12
Posts: 23
Credit: 4,090,595
RAC: 3,270
Message 31 - Posted: 21 Jun 2012, 20:45:40 UTC

What happened to these two, 519 and 505? Both were crunched by the two of us and both were declared "WU cancelled" as an error, so it's probably neither your machine's fault nor mine?
____________
Grüße vom Sänger

ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 33 - Posted: 22 Jun 2012, 10:47:16 UTC - in response to Message 31.

My WU http://asteroidsathome.net/boinc/workunit.php?wuid=534 suffered a similar fate when my wingnut's replication was cancelled. It was Bok this time, so looks like that last batch just got wholesale cancelled.

Cheers,

Al.

Profile Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 34 - Posted: 22 Jun 2012, 10:52:11 UTC

There was a problem with the minimum quorum setting, so units had to be canceled. But everything should be corrected now, so there should be no more canceling, and many more units will be released soon.

ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 35 - Posted: 22 Jun 2012, 11:09:59 UTC - in response to Message 34.

Oh, the MQ was 2 on mine, and max error etc was 20. Must have got caught in the crossfire :)

Cheers,

Al.
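
For readers wondering how the minimum quorum (MQ) and max error values mentioned here interact, this is a simplified Python sketch of the idea. It is an illustration only, not BOINC's actual server code: the `WorkUnit` class and `workunit_status` function are made up, and the field names are borrowed from the usual workunit parameters `min_quorum` and `max_error_results`.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    min_quorum: int         # successful results needed before validation can run
    max_error_results: int  # error results tolerated before the WU is given up


def workunit_status(wu, successes, errors):
    """Classify a workunit from its counts of successful and failed results."""
    if errors > wu.max_error_results:
        return "error"
    if successes >= wu.min_quorum:
        return "ready to validate"
    return "waiting for more results"


# The values quoted above: MQ of 2, max error of 20.
wu = WorkUnit(min_quorum=2, max_error_results=20)
print(workunit_status(wu, successes=1, errors=1))
print(workunit_status(wu, successes=2, errors=1))
```

With a quorum of 2, one good result plus one failed download leaves the unit waiting until a replacement replication is sent out and returned.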

Profile Saenger
Joined: 18 Jun 12
Posts: 23
Credit: 4,090,595
RAC: 3,270
Message 39 - Posted: 22 Jun 2012, 13:45:17 UTC

Ah, the pleasures of crunching Alpha projects :D

Go on with your work, stay as informative as you are, and everything will straighten itself out.
____________
Grüße vom Sänger

Profile Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 40 - Posted: 22 Jun 2012, 15:20:03 UTC
Last modified: 22 Jun 2012, 15:21:53 UTC

I hope this was a short alpha-beta test. All reported bugs are fixed. There are now about 7600 units for crunching, so let's start crunching and keep reporting any newly discovered bugs. :-D

ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 43 - Posted: 22 Jun 2012, 17:33:21 UTC - in response to Message 40.

Are these 'real' rather than test WUs now? They seem likely to take much longer than before.

Which is OK. But ...

This WU http://asteroidsathome.net/boinc/workunit.php?wuid=4446 has a failed download, leaving me running the only live replication, as the replacement hasn't been sent for some reason. I don't mind burning up cycles on short WUs that ultimately fail to validate, but long ones, I'd rather not :)

Cheers,

Al.

Profile Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 45 - Posted: 22 Jun 2012, 17:50:49 UTC

These units should take about 20 hours of computing. Test units are about 1 hour. And the WU is waiting to be sent now.

ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 46 - Posted: 22 Jun 2012, 18:06:39 UTC - in response to Message 45.

Ok :)

My Linux hosts are really old and slow, so those times will be 3 times as long for me, but that's OK. I'll let them run :)

Cheers,

Al.

Profile Kyong
Project administrator
Project developer
Project tester
Project scientist
Avatar
Send message
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 47 - Posted: 22 Jun 2012, 21:05:15 UTC - in response to Message 46.

I hope that everything is okay now. And the maximum delay is about 4800 hours of computing, so you can take it easy. :-D
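
To put the numbers from this exchange side by side, here is a quick back-of-the-envelope calculation in Python. It assumes the 4800 hours can be read as a wall-clock deadline and that a slow host runs continuously; the variable names are only for illustration.

```python
unit_hours = 20         # typical run time quoted for a real unit
slowdown = 3            # "3 times as long" on the old Linux hosts
max_delay_hours = 4800  # maximum delay quoted above

# Estimated run time on the slow host, in hours.
estimated_hours = unit_hours * slowdown

# Time left over before the maximum delay runs out, in days.
margin_days = (max_delay_hours - estimated_hours) / 24

print(estimated_hours, "hours needed,", margin_days, "days to spare")
```

Under those assumptions, even a host three times slower than average uses only a small part of the allowed time.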

ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 57 - Posted: 24 Jun 2012, 17:40:55 UTC - in response to Message 45.

And the WU is waiting to be sent now.


Will it ever be sent though? I've completed my replication, but the replacement for the failed download remains unsent http://asteroidsathome.net/boinc/workunit.php?wuid=4446

Similarly, this WU http://asteroidsathome.net/boinc/workunit.php?wuid=5579 that I'm running had its partner WU die with a computation error, and the replacement WU has not been sent out. I've suspended it for now, as 20 hours would be a lot to lose on one WU.

I ask because I've lost so much computing time at other projects because of similar problems (WUs being cancelled before the 3rd replication went out etc). I've only got 2 Linux hosts and I don't really want to tie them up with unproductive work. Once the Windows app is available I won't be so worried, as I've got a dozen of those waiting to go :)

Cheers,

Al.

p.s. Yes, I know this is really still an alpha project so I shouldn't expect anything to work :)

p.p.s. Based on the one WU that I've had validated, the credit is way too low compared to other projects. Not much worried about it, but maybe something is wrong with the validator.


Copyright © 2020 Asteroids@home