Resends

trigggl
Joined: 23 Jun 12
Posts: 2
Credit: 417,431
RAC: 0
Message 71 - Posted: 28 Jun 2012, 11:19:58 UTC

Do tasks with a wing man having an error ever get re-sent? So far, all of my completed tasks, but one, have a wing man with an error. None of them have been re-sent, yet.

Bok
Joined: 19 Jun 12
Posts: 4
Credit: 1,485,675
RAC: 0
Message 72 - Posted: 28 Jun 2012, 13:15:50 UTC

+1, I've got a ton of workunits pending waiting on resends right now... :(

BobCat13
Joined: 18 Jun 12
Posts: 5
Credit: 1,000,099
RAC: 0
Message 73 - Posted: 28 Jun 2012, 14:43:43 UTC

They will get sent, eventually. It seems on a lot of new projects resends are added to the end of the queue instead of being added to the beginning. There is a configuration option for the server about resends.

Here is the page about resends: Project Options
Looks like it is under the section Accelerating retries, option <reliable_priority_on_over>X</reliable_priority_on_over>
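
For anyone who wants to check whether the option is actually present in a project's config.xml, here is a minimal Python sketch. It assumes the usual layout with a <config> element inside <boinc>. The sample fragment and the value 10 are placeholders for illustration only, not a recommended setting, and read_int_option is a hypothetical helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Hypothetical config.xml fragment. The value 10 is a placeholder standing in
# for the "X" in the documentation, not a recommended setting.
SAMPLE_CONFIG = """
<boinc>
  <config>
    <reliable_priority_on_over>10</reliable_priority_on_over>
  </config>
</boinc>
"""


def read_int_option(xml_text, name):
    """Return the integer value of <name> inside <config>, or None if absent."""
    root = ET.fromstring(xml_text.strip())
    node = root.find(f"./config/{name}")
    if node is None or node.text is None:
        return None
    return int(node.text.strip())


boost = read_int_option(SAMPLE_CONFIG, "reliable_priority_on_over")
print("reliable_priority_on_over =", boost)
```

If the function returns None, the option is not set in the file that was read.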

ChertseyAl
Joined: 18 Jun 12
Posts: 34
Credit: 1,537,551
RAC: 0
Message 74 - Posted: 28 Jun 2012, 16:13:02 UTC - in response to Message 73.

It's a very risky option for a new project to add resends to the end of the queue. I can think of a couple of projects where I lost a lot of completed WUs when the batch was abandoned, or the server was wiped, or the database was corrupted.

So I'm limiting my exposure and only running WUs where my wingnut has already completed. If the wingnut has failed for any reason, I abort the WU. As a result, I'm not getting much done :(

Cheers,

Al.

Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 76 - Posted: 30 Jun 2012, 20:49:34 UTC
Last modified: 30 Jun 2012, 20:58:34 UTC

I was puzzled why you had aborted the results. Thanks everyone for noticing. I added the option to config.xml, so now any error units should be added to the beginning of the queue.

AMDave
Joined: 19 Jun 12
Posts: 11
Credit: 100,197
RAC: 0
Message 77 - Posted: 2 Jul 2012, 10:50:01 UTC

Good reporting & deduction guys.
I was still trying to figure out what was going on.

Good change, Kyong.

I have re-enabled my clients.

Saenger
Joined: 18 Jun 12
Posts: 23
Credit: 4,090,595
RAC: 3,270
Message 79 - Posted: 4 Jul 2012, 14:42:11 UTC - in response to Message 71.

Do tasks with a wing man having an error ever get re-sent? So far, all of my completed tasks, but one, have a wing man with an error. None of them have been re-sent, yet.

It's the same here for WUs I crunched successfully.
And here for those I smashed, and someone else is waiting for a resend.

You said something about "solving", I can't see anything in that direction.

____________
Greetings from Sänger

Bok
Joined: 19 Jun 12
Posts: 4
Credit: 1,485,675
RAC: 0
Message 80 - Posted: 4 Jul 2012, 15:34:46 UTC

Agreed, I now have 112 pending results all of which I've checked are waiting on resends.

Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 81 - Posted: 4 Jul 2012, 19:10:28 UTC

The BOINC server applies configuration changes only to what happens from then on. WUs that are already ready to send already have their place in the queue, so nothing changes for them. Only new work units go to the beginning of the queue.
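
A small Python sketch of the behaviour described here, under stated assumptions: the send order is modelled as "higher priority first, then oldest first", and a resend only receives the boost if it was created after the option was switched on. The Result class, the send_order function and the BOOST value are all hypothetical; this is a toy model, not the actual scheduler logic.

```python
from dataclasses import dataclass


@dataclass
class Result:
    result_id: int   # lower id = created earlier
    priority: int    # higher value = sent sooner (assumed ordering rule)
    is_resend: bool


def send_order(results):
    """Higher priority first; ties are broken by creation order."""
    return sorted(results, key=lambda r: (-r.priority, r.result_id))


BOOST = 10  # hypothetical priority boost for resends created after the change

queue = [
    Result(1, 0, False),
    Result(2, 0, False),
    Result(3, 0, False),
    Result(4, 0, True),      # resend created before the option was enabled
    Result(5, BOOST, True),  # resend created after the option was enabled
]

order = [r.result_id for r in send_order(queue)]
print(order)  # [5, 1, 2, 3, 4]
```

In this model the old resend (id 4) keeps its place at the back, while the new resend (id 5) jumps to the front.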

Saenger
Joined: 18 Jun 12
Posts: 23
Credit: 4,090,595
RAC: 3,270
Message 90 - Posted: 6 Jul 2012, 19:15:34 UTC - in response to Message 81.

The BOINC server applies configuration changes only to what happens from then on. WUs that are already ready to send already have their place in the queue, so nothing changes for them. Only new work units go to the beginning of the queue.


So you have to let the queue dry out until all waiting resends are gone, correct?
____________
Greetings from Sänger

Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 98 - Posted: 10 Jul 2012, 7:46:22 UTC

Correct

x3mEn
Joined: 19 Jun 12
Posts: 1
Credit: 10,030
RAC: 0
Message 311 - Posted: 14 Oct 2012, 22:07:31 UTC
Last modified: 14 Oct 2012, 22:08:15 UTC

What happened with resends again?
I have 2 WUs that have been waiting for a resend for more than a month:
http://asteroidsathome.net/boinc/workunit.php?wuid=16655
http://asteroidsathome.net/boinc/workunit.php?wuid=10365

Conan
Joined: 19 Jun 12
Posts: 30
Credit: 3,995,475
RAC: 821
Message 315 - Posted: 16 Oct 2012, 21:36:28 UTC

I also have a number of work units over a month old.

The allocated Task Numbers on these older WUs are in the 84,000 range.

We are currently up to the 82,000 range, so it won't be long now before these work units are sent out to be processed.

Conan

Pollux_P3D
Joined: 8 Dec 12
Posts: 4
Credit: 25,510,080
RAC: 0
Message 497 - Posted: 21 Dec 2012, 19:45:55 UTC - in response to Message 315.

Google Translate:
It would be very nice if, after an error by a wingman, the WU were sent out again immediately.


Have a Nice Day
Pollux

Bryan
Joined: 11 Dec 12
Posts: 7
Credit: 106,113,480
RAC: 0
Message 499 - Posted: 21 Dec 2012, 22:03:44 UTC

I just checked some of my 460 pending WU and found that when a wingman aborts a WU it isn't resent (9 days plus and counting on multiple WU).

There is a Boinc Stats Team Challenge on Asteroids that ends sometime today. Typically when those challenges end there are bunches and bunches of WU that get aborted. If the aborted WU don't get resent it is going to get frustrating real quick!

Dagorath
Joined: 16 Aug 12
Posts: 293
Credit: 1,116,280
RAC: 0
Message 500 - Posted: 21 Dec 2012, 22:17:05 UTC
Last modified: 21 Dec 2012, 22:24:34 UTC

This issue was investigated and discussed thoroughly at Sixtrack recently. I am definitely not an expert on the server code, but from the events and discussions at Sixtrack I am convinced that it is not enough to just turn on the "priority resends" option Kyong and others have referred to. It seems that option alone will not put resends at the front of the queue. All it does is reduce a resend's deadline and send it to a host that has proven itself: one whose average turnaround time meets the admin-defined criterion and whose results verify reliably.
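
A rough Python sketch of what that paragraph describes: pick a host that meets a turnaround and error-rate criterion, and give the resend a shortened deadline. Every name and threshold below (Host, MAX_AVG_TURNAROUND_HOURS, MAX_ERROR_RATE, DEADLINE_FACTOR and the functions) is made up for illustration and is not taken from the BOINC server code.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    avg_turnaround_hours: float
    error_rate: float  # fraction of returned results that failed validation


# Hypothetical admin-defined criteria, chosen only for this example.
MAX_AVG_TURNAROUND_HOURS = 48.0
MAX_ERROR_RATE = 0.05
DEADLINE_FACTOR = 0.5  # resends get half the normal deadline in this sketch


def is_reliable(host):
    """A host counts as reliable if it is both fast enough and accurate enough."""
    return (host.avg_turnaround_hours <= MAX_AVG_TURNAROUND_HOURS
            and host.error_rate <= MAX_ERROR_RATE)


def resend_deadline_hours(normal_deadline_hours):
    """Shorten the deadline for a resend."""
    return normal_deadline_hours * DEADLINE_FACTOR


def pick_host_for_resend(hosts):
    """Return the first reliable host, or None if there is none."""
    for host in hosts:
        if is_reliable(host):
            return host
    return None


hosts = [Host("slow", 200.0, 0.01), Host("flaky", 10.0, 0.30), Host("fast", 12.0, 0.01)]
chosen = pick_host_for_resend(hosts)
print(chosen.name, resend_deadline_hours(240.0))
```

Note that nothing in this sketch changes where the resend sits in the queue, which is the point being made above.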

From the discussions and experiments at Sixtrack it seems obvious to me that the stock server code does not have a mechanism for putting resends at the front of the queue. That mechanism is added to the server by custom script(s) that use a high and low watermark system. An admin can do it manually too.

An example is in order. Let's say a project has 20,000 results it needs crunched. If they dump all 20,000 tasks into the hopper at once, then the order in which they get sent cannot be changed by the stock server code. If there are resends, they will go to the tail of the queue regardless of the "priority resends" option.

The trick to getting resends sent quickly, as far as I can tell (and do read this more than once, because I don't explain it very well), is to dump just 500 of the 20,000 tasks into the queue. When the number of ready-to-send tasks dwindles to 100, you replenish: you add to the queue whatever resends are waiting, plus enough unsent tasks to top the queue back up to 500 tasks ready to send. Because the ready-to-send number is 100 at that point, the resends are only 100 tasks back from the front of the queue, and behind the resends is a big chunk of unsent tasks.

Thus in this example the high water mark is 500 and the low water mark is 100. The ready-to-send number swings between 100 and 500, eventually all 20,000 get their first go, and at the end you have just a few (relatively speaking) resends, which you put in the queue and perhaps follow with the first 500 of your next batch of 20,000.

In practice the high and low water marks are whatever numbers work best for the admin; the 100/500 I used are just example numbers. Some projects replenish the queue manually, other projects have a script that monitors the number of ready-to-send tasks and replenishes the queue automatically.

That's what came out of the discussions and tests at Sixtrack and in my opinion it has been confirmed by other observations and remarks I've run across. There were suggestions in the Sixtrack discussions that the admin take a look at the feeder and work generator (are those the proper names?) scripts in use at projects that send resends quickly and perhaps borrow some code or get some ideas on how he can integrate fast resends into his own server.

High water and low water... it makes beautiful sense to me. Not a database expert here, not a BOINC server expert, just saying what I heard and saying it makes some sense to me. YMMV.
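
The high/low water mark idea from this post can be tried out with a toy simulation in Python. This is only a sketch under simplifying assumptions: every 50th first-time task fails the moment it is sent, its resend becomes available at once, and resends never fail. The replenish and simulate functions are hypothetical and are not BOINC code. "Wait" is measured as the number of tasks sent between a failure and the sending of its resend.

```python
from collections import deque


def replenish(ready, waiting_resends, unsent, high_water):
    """Top the ready queue up to high_water: waiting resends first, then unsent tasks."""
    while waiting_resends and len(ready) < high_water:
        ready.append(waiting_resends.popleft())
    while unsent and len(ready) < high_water:
        ready.append(unsent.popleft())


def simulate(total_tasks, high_water, low_water, fail_every=50):
    """Return (tasks sent, longest wait of any resend, counted in sends)."""
    unsent = deque(("new", i) for i in range(total_tasks))
    waiting_resends = deque()
    ready = deque()
    replenish(ready, waiting_resends, unsent, high_water)

    sent = 0
    max_wait = 0
    failed_at = {}  # task id -> send count at which the first copy failed
    while ready:
        kind, task_id = ready.popleft()
        sent += 1
        if kind == "resend":
            max_wait = max(max_wait, sent - failed_at[task_id])
        elif task_id % fail_every == 0:
            # Assumption: this copy fails and its resend is created immediately.
            failed_at[task_id] = sent
            waiting_resends.append(("resend", task_id))
        if len(ready) <= low_water:
            replenish(ready, waiting_resends, unsent, high_water)
    return sent, max_wait


# The 500/100 water marks from the example above.
wm_sent, wm_wait = simulate(20000, high_water=500, low_water=100)
# Dumping all 20,000 tasks into the queue at once.
dump_sent, dump_wait = simulate(20000, high_water=20000, low_water=0)

print("water marks: sent", wm_sent, "longest resend wait", wm_wait)
print("all at once: sent", dump_sent, "longest resend wait", dump_wait)
```

In this model the all-at-once run leaves the first failed task waiting behind the entire batch, while the water mark run keeps every resend within a few hundred sends of the front.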

Bryan
Joined: 11 Dec 12
Posts: 7
Credit: 106,113,480
RAC: 0
Message 501 - Posted: 21 Dec 2012, 23:45:42 UTC - in response to Message 500.

I've seen both techniques used by multiple projects. GPUGrid for example will reissue an aborted or "errored" WU within minutes. The reissued WU carries the standard deadline.

NumberFields on the other hand reissues the WU immediately with a "high priority" tag and gives an accelerated 3 day deadline.

In any case the resends should take priority over "new" WU. I've been working the project 10 days now and keep expecting my pending WU to stabilize as on other projects. But now knowing that aborted, errored, or timed out WU aren't resent I can see why my pending keeps growing on a daily basis :(

Dagorath
Joined: 16 Aug 12
Posts: 293
Credit: 1,116,280
RAC: 0
Message 502 - Posted: 22 Dec 2012, 4:50:12 UTC - in response to Message 501.

I've seen both techniques used by multiple projects.


Indeed, you can use the "priority resends" mechanism, which does not move the resend to the queue head, in combination with the high and low watermarks technique, which does move the resend ahead in the queue.

GPUGrid for example will reissue an aborted or "errored" WU within minutes. The reissued WU carries the standard deadline.


I've noticed that too, it takes them only a few minutes to issue the resend. If they are using the high water and low water marks method then I would guess they have a low high water mark which depletes very quickly down to the low water mark. Or they may be using some technique I am totally unfamiliar with. There's usually more than one way to get the job done.

NumberFields on the other hand reissues the WU immediately with a "high priority" tag and gives an accelerated 3 day deadline.


Indeed they do and I believe they have one of the shortest "pending verification" delays in the community. I wonder how they get them resent immediately. I know the high priority mechanism doesn't cause the immediate resend. Methinks they don't have a queue at all and they just generate tasks on the fly. I suspect the way it works there is that if there is a resend waiting when they receive a request for a task then they just send the resend. If there isn't a resend waiting then they generate a new task and send it. Lots of ways to skin a cat.
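
The guess in this post (no standing queue, resends served first, new tasks generated on request) could look something like the following Python sketch. It is purely speculative, just like the guess itself: OnTheFlyDispatcher and its methods are invented for illustration and say nothing about how NumberFields actually works.

```python
from collections import deque
from itertools import count


class OnTheFlyDispatcher:
    """Hands out a waiting resend if there is one, otherwise generates a new task."""

    def __init__(self):
        self._resends = deque()
        self._next_id = count(1)

    def report_failure(self, task_id):
        """Record that a task errored, was aborted or timed out."""
        self._resends.append(task_id)

    def next_task(self):
        """Answer one work request."""
        if self._resends:
            return ("resend", self._resends.popleft())
        return ("new", next(self._next_id))


dispatcher = OnTheFlyDispatcher()
first = dispatcher.next_task()        # no resend waiting, so a new task is generated
dispatcher.report_failure(first[1])   # the host aborts it
second = dispatcher.next_task()       # the very next request receives the resend
third = dispatcher.next_task()        # back to generating new tasks
print(first, second, third)
```

With this design a resend never waits behind more than the resends that failed before it.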

Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 576
Credit: 52,667,664
RAC: 0
Message 503 - Posted: 22 Dec 2012, 10:05:33 UTC

I made some changes; please keep watching to see whether they helped. They should have an effect on newly aborted tasks.

Dagorath
Joined: 16 Aug 12
Posts: 293
Credit: 1,116,280
RAC: 0
Message 504 - Posted: 23 Dec 2012, 3:08:56 UTC - in response to Message 503.

I aborted task 368022 from WU 166350 at 13/12, 02:55:33 UTC. The resend was created 4 seconds later. We shall see how long it takes to be sent.

Copyright © 2020 Asteroids@home