Download failed



Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 1675 - Posted: 5 Sep 2013, 17:48:20 UTC
OK, 7 may be better. But the server uses special subfolders which still aren't the same, so if I add it on another server and copy it back, it won't work and there will be a mess with the files.
Highlander

Joined: 16 Aug 13
Posts: 4
Credit: 3,234,196
RAC: 0
Message 1676 - Posted: 5 Sep 2013, 20:00:19 UTC
That is a pity; I had thought about something like that too. OK, then I hope the network connection on your side and the server stay stable.
HA-SOFT, s.r.o.
Project developer
Project tester
Joined: 21 Dec 12
Posts: 176
Credit: 135,168,558
RAC: 12,135
Message 1678 - Posted: 5 Sep 2013, 20:35:38 UTC - in response to Message 1676.  

Last modified: 5 Sep 2013, 22:23:21 UTC
For everyone who wants to help us flush invalid tasks, or who does not want to click Update every second:

create a .cmd (batch) file like this one:

@echo off
:loop
rem ask the local BOINC client to contact the project scheduler
boinccmd --project http://asteroidsathome.net/boinc update
rem wait 60 seconds, then repeat forever
timeout 60
goto loop


and run it. It updates the project once a minute. Add the full path before boinccmd if required.
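
If you are on Linux or OS X, where a .cmd file won't run, the same loop can be written in Python. This is a minimal sketch under the same assumption that boinccmd is reachable; prepend its full path if required:

#!/usr/bin/env python
# Same idea as the .cmd file above: poke the project once a minute.
# Prepend the full path to boinccmd if it is not on your PATH.
import subprocess
import time

while True:
    subprocess.call(["boinccmd", "--project",
                     "http://asteroidsathome.net/boinc", "update"])
    time.sleep(60)  # wait one minute between scheduler requests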

Thanks a lot for your patience.
Dagorath

Joined: 16 Aug 12
Posts: 293
Credit: 1,116,280
RAC: 0
Message 1681 - Posted: 6 Sep 2013, 1:16:21 UTC - in response to Message 1678.  
For everyone who wants to help us flush invalid tasks, or who does not want to click Update every second:

create a .cmd (batch) file like this one:

@echo off
:loop
boinccmd --project http://asteroidsathome.net/boinc update
timeout 60
goto loop


Good idea, but this strategy has been attempted at other projects and it's not as effective as one might think. The problem is that it flushes invalid tasks only until the host's task cache fills. At that point the host won't request more tasks until it crunches enough tasks to drain the cache down to the host's minimum cache setting. Then the host will request more tasks, but soon its cache is full again. In the end the flushing is severely constrained by how fast the host crunches tasks. It's better than nothing, but it may take a long time to flush the tasks from the server unless many hosts run the script.

What the script needs to do before every update is abort any task the host has not started crunching. Yes, that will abort tasks that downloaded properly, but those will be sent to another host, and you can be fairly sure that host will not be running the flush script and will crunch the task. Remember, the bad tasks still have a max-errors setting of 20. Only by aborting all tasks that have not started will the host be forced to request more tasks on each and every update.
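
To make the idea concrete, here is a rough Python sketch of such a flusher. It is not the actual script I'm running, and it makes assumptions you should verify against your client: that boinccmd is on the PATH, that --get_tasks prints "name:", "project URL:" and "active_task_state:" lines for each task, and that a never-started task shows active_task_state UNINITIALIZED.

#!/usr/bin/env python
# Sketch: abort this project's tasks that never started, then update, forever.
# The field names parsed below are assumptions about boinccmd --get_tasks output.
import subprocess
import time

PROJECT = "http://asteroidsathome.net/boinc"

def unstarted_tasks():
    """Yield names of this project's tasks that have never started running."""
    out = subprocess.check_output(["boinccmd", "--get_tasks"]).decode()
    name = url = None
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("project URL:"):
            url = line.split(":", 1)[1].strip()
        elif line.startswith("active_task_state:"):
            if "UNINITIALIZED" in line and name and url and url.rstrip("/") == PROJECT:
                yield name

while True:
    for task in unstarted_tasks():
        # Aborting a task that never ran frees the host to ask for fresh work.
        subprocess.call(["boinccmd", "--task", PROJECT, task, "abort"])
    subprocess.call(["boinccmd", "--project", PROJECT, "update"])
    time.sleep(120)  # update every 2 minutes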

If there are only 1,000 more bad tasks, it's probably not worth putting any effort into improving the flush script. If there are 10,000 more, then maybe it is. How many bad tasks remain, and how long will they take to flush at the current rate?
Sonoraguy

Joined: 11 Jun 13
Posts: 8
Credit: 15,481,080
RAC: 0
Message 1684 - Posted: 6 Sep 2013, 4:38:04 UTC
I hope you guys are on the right track. I've recorded about 8,000 failed downloads myself in the last couple of days. This is a little crazy.
Dagorath

Joined: 16 Aug 12
Posts: 293
Credit: 1,116,280
RAC: 0
Message 1686 - Posted: 6 Sep 2013, 11:50:02 UTC - in response to Message 1684.  

Last modified: 6 Sep 2013, 11:53:25 UTC
Doan be worry, Sonoraguy (Sonora, California?). It's only computers and we are still their masters for at least another year. No guarantees after that.

I dug up a script I used a few months ago to flush bad tasks at another project. I'm pretty sure I can adapt it to work here too. If so it should flush a few thousand bad tasks per day but hmmmm... with 20 errors required for every task it might take a while. Wanna help?
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 1688 - Posted: 6 Sep 2013, 12:43:35 UTC
I have decreased max_errors to 3 in the database, so cleaning out the bad results should be fast now.
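
For the curious: in a stock BOINC database this knob is the max_error_results column of the workunit table, so the change amounts to something like the sketch below. It assumes the standard schema and the pymysql driver; the credentials and the name pattern are placeholders, not the project's real values.

#!/usr/bin/env python
# Sketch of the server-side change described above, assuming the stock
# BOINC schema (workunit.max_error_results) and the pymysql driver.
import pymysql

db = pymysql.connect(host="localhost", user="boincadm",
                     password="...", database="boinc")  # placeholder credentials
with db.cursor() as cur:
    # Cap retries so bad workunits error out after 3 failed results.
    # The name pattern is illustrative, not the project's actual filter.
    cur.execute("UPDATE workunit SET max_error_results = 3 "
                "WHERE name LIKE 'ps_%' AND max_error_results > 3")
db.commit()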
Sonoraguy

Joined: 11 Jun 13
Posts: 8
Credit: 15,481,080
RAC: 0
Message 1689 - Posted: 6 Sep 2013, 14:55:54 UTC
:-) Well... the attack of the killer Failed Downloads seems to be over, and humanity is once again safe to continue with its own nefarious deeds! My count came in at over 9,600 errors!
frankhagen

Joined: 18 Jun 12
Posts: 15
Credit: 5,027,400
RAC: 0
Message 1690 - Posted: 6 Sep 2013, 17:03:47 UTC - in response to Message 1688.  
I have decreased max_errors to 3 in the database, so cleaning out the bad results should be fast now.


well done!
Dagorath

Joined: 16 Aug 12
Posts: 293
Credit: 1,116,280
RAC: 0
Message 1692 - Posted: 6 Sep 2013, 18:56:22 UTC - in response to Message 1690.  
I think my flusher script aborted about 1,000 tasks. That might be just a drop in the bucket. I shut the script down because I reached my daily task limit and can't get any more tasks today. Of the last batch of tasks I received, about 50% were failed downloads. Before I hit the daily limit I received batches with no failed downloads, followed by batches with ~75% download failures, so I'm not sure this is over yet. That was my Linux host. I think I'll put the script on my Windows host later and see how many failed downloads I get there before hitting the daily limit.

Also, I've been wondering... what harm do these failed downloads really do? You get some failed downloads, but when your host needs more tasks it requests more, and keeps requesting until it receives some good tasks and goes back to work. Is anybody experiencing any real problems because of that? Any major dead time, or hosts sitting idle for more than 10 minutes? Sorry, I'm tied up with other things and unable to keep a close watch on my machines, so I could be missing the obvious here.
Tomaseq

Joined: 23 Feb 13
Posts: 3
Credit: 5,057,040
RAC: 0
Message 1705 - Posted: 7 Sep 2013, 21:40:31 UTC
There are problems with downloads again...
Sonoraguy

Joined: 11 Jun 13
Posts: 8
Credit: 15,481,080
RAC: 0
Message 1708 - Posted: 8 Sep 2013, 0:05:37 UTC
Question: has anyone noticed that the Download Failed issue seems to have returned? After a day of relatively clean downloads, we're back to seeing about 60%-80% of downloads fail. I've had about 1,000 fail in the last day or so.

I don't expect this happens often, but I've also had workunits declared invalid: I processed them, but 16 or 18 wingmen got "Error while downloading", so the WU was dropped when it hit 20 failures. Examples:
http://asteroidsathome.net/boinc/workunit.php?wuid=4842346
http://asteroidsathome.net/boinc/workunit.php?wuid=4842338

Just a thought, but it might be an idea to clear the database of all those download failures.
nanoprobe

Joined: 15 Jan 13
Posts: 12
Credit: 904,320
RAC: 0
Message 1710 - Posted: 8 Sep 2013, 1:08:34 UTC - in response to Message 1708.  
Question: has anyone noticed that the Download Failed issue seems to have returned? After a day of relatively clean downloads, we're back to seeing about 60%-80% of downloads fail. I've had about 1,000 fail in the last day or so.

I don't expect this happens often, but I've also had workunits declared invalid: I processed them, but 16 or 18 wingmen got "Error while downloading", so the WU was dropped when it hit 20 failures. Examples:
http://asteroidsathome.net/boinc/workunit.php?wuid=4842346
http://asteroidsathome.net/boinc/workunit.php?wuid=4842338

Just a thought, but it might be an idea to clear the database of all those download failures.

I can't get any tasks. They all fail to download.
MarkJ
Joined: 27 Jun 12
Posts: 129
Credit: 62,716,409
RAC: 90
Message 1713 - Posted: 8 Sep 2013, 3:30:15 UTC - in response to Message 1708.  
Just a thought, but it might be an idea to clear the database of all those download failures.

They should be able to identify the failed-download workunits by their error count being > 5, and then mark them all as cancelled to avoid further download failures. Just a thought Kyong might want to consider.
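
Spelled out, the idea might look like the sketch below. It assumes the stock BOINC schema and its usual status codes (result outcome 3 = client error, server_state 2 = unsent, server_state 5 / outcome 5 = over / didn't need, workunit error_mask bit 16 = cancelled); treat all of those as assumptions to verify, and the credentials as placeholders.

#!/usr/bin/env python
# Sketch of the suggestion above; NOT tested against a live project database.
# The status codes below are assumptions from the stock BOINC schema.
import pymysql

db = pymysql.connect(host="localhost", user="boincadm",
                     password="...", database="boinc")  # placeholder credentials
with db.cursor() as cur:
    # Find workunits that have already accumulated more than 5 errored results.
    cur.execute("SELECT workunitid FROM result "
                "WHERE outcome = 3 GROUP BY workunitid HAVING COUNT(*) > 5")
    for (wuid,) in cur.fetchall():
        # Retire the unsent copies so nobody tries to download them again...
        cur.execute("UPDATE result SET server_state = 5, outcome = 5 "
                    "WHERE workunitid = %s AND server_state = 2", (wuid,))
        # ...and mark the workunit itself as cancelled.
        cur.execute("UPDATE workunit SET error_mask = error_mask | 16 "
                    "WHERE id = %s", (wuid,))
db.commit()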

I am still getting attempted downloads of replications up to _19, so I don't think setting them to a max of 3 errors worked.
BOINC blog
Dagorath

Joined: 16 Aug 12
Posts: 293
Credit: 1,116,280
RAC: 0
Message 1714 - Posted: 8 Sep 2013, 3:50:20 UTC - in response to Message 1708.  

Last modified: 8 Sep 2013, 3:57:29 UTC
Question: has anyone noticed that the Download Failed issue seems to have returned? After a day of relatively clean downloads, we're back to seeing about 60%-80% of downloads fail. I've had about 1,000 fail in the last day or so.


Kyong knows the reason for this recurring problem best but my hunch is the bad downloaders are not sprinkled evenly throughout the database. Instead they were injected in relatively large blocks. We worked through a bad block 2 days ago then hit a good block but now we're into another bad block. Or something like that. I've seen similar problems at a few other projects too. It'll work out fairly soon.

I don't expect this happens often, but I've also had workunits declared invalid: I processed them, but 16 or 18 wingmen got "Error while downloading", so the WU was dropped when it hit 20 failures.


It's possible that as we continue to purge these bad downloaders we're going to see that happen more often. It almost happened to me too, except I was running a script that automatically requests an Asteroids project update every 2 minutes, so the server spotted the task and asked my host to abort it. My host complied with the abort request, and I lost only 2 minutes of crunch time on that task.

If you want those kinds of tasks aborted on your host too, check the post above from HA-Soft in which he provides a small, easily installed Windows batch file (script) that makes your host auto-update the Asteroids project as well. Batch files are not something the average computer user needs to learn, so you may not know how to set one up. The thing is, BOINCing isn't exactly an "average sort of computer activity", as you well know, right? If you don't know how but would like to learn, ask and someone will guide you through it. It's a very handy thing to know, because batch files and scripts like HA-Soft's can frequently correct the various problems BOINCers run into.

If you don't mind installing Python on your host, you can run a script I'm using that flushes the bad downloaders from the server even faster than HA-Soft's script, and that causes tasks to auto-abort just as his does; the sketch I posted earlier gives the general idea. Python is a very powerful scripting language. As for safety and security: if you or anybody else is interested in using it, I will publish the script here in a post where others can vet it. If it poses any security risk to anyone, they'll say so and the post will be deleted very quickly.

Just a thought, but it might be an idea to clear the database of all those download failures.


I'm not so sure the bad downloaders are causing us volunteers any real grief that we cannot avoid with minimal effort. Weigh that against the fact that messing around with the database has, in the past, caused some projects major grief. The likelihood of major grief happening depends on a number of factors I won't bother going into but let me assure you that I know Kyong and HA-Soft are not a couple of inexperienced amateurs bumbling their way through this. I am quite sure they've weighed their options and think the present course of action is the best one for the project. I trust them and I hope we will all trust them. Remember this... nobody is going to criticize you or anybody else if you just suspend Asteroids for however long it takes to work this out. Probably nobody will even know if you decide to do that.
Kyong
Project administrator
Project developer
Project tester
Project scientist
Joined: 9 Jun 12
Posts: 584
Credit: 52,667,664
RAC: 0
Message 1715 - Posted: 8 Sep 2013, 7:53:00 UTC
I have again decreased max_errors to 3. I thought the bad WUs were 17001 - 18000, not all the way up to 19000.
MarkJ
Joined: 27 Jun 12
Posts: 129
Credit: 62,716,409
RAC: 90
Message 1719 - Posted: 8 Sep 2013, 10:53:51 UTC - in response to Message 1715.  

Last modified: 8 Sep 2013, 10:59:48 UTC
I have again decreased max_errors to 3. I thought the bad WUs were 17001 - 18000, not all the way up to 19000.


I have computed some that have max errors set to 3 but where the wingman got a compute error. Hopefully we won't waste these workunits. Example:

ps_130831_18208_165 Compute error

As Dagorath says, it's probably best not to fiddle with the database and just let them fail naturally now.
BOINC blog
Wettermann
Joined: 18 Jun 12
Posts: 8
Credit: 5,274,530
RAC: 0
Message 1722 - Posted: 8 Sep 2013, 14:58:04 UTC
Of 59 WUs, 39 failed with a permanent HTTP error.

08.09.2013 16:50:15 | Asteroids@home | Scheduler request completed: got 59 new tasks
08.09.2013 16:50:16 | Asteroids@home | work fetch suspended by user
08.09.2013 16:50:17 | Asteroids@home | Started download of period_search_10100_windows_intelx86__sse2.exe
08.09.2013 16:50:17 | Asteroids@home | Started download of input_18357_95
08.09.2013 16:50:19 | Asteroids@home | Finished download of input_18357_95
08.09.2013 16:50:19 | Asteroids@home | Started download of input_18357_65
08.09.2013 16:50:20 | Asteroids@home | Finished download of period_search_10100_windows_intelx86__sse2.exe
08.09.2013 16:50:20 | Asteroids@home | Finished download of input_18357_65
08.09.2013 16:50:20 | Asteroids@home | Started download of input_18356_195
08.09.2013 16:50:20 | Asteroids@home | Started download of input_18355_149
08.09.2013 16:50:21 | Asteroids@home | Giving up on download of input_18356_195: permanent HTTP error
08.09.2013 16:50:21 | Asteroids@home | Giving up on download of input_18355_149: permanent HTTP error
08.09.2013 16:50:21 | Asteroids@home | Started download of input_18355_156
08.09.2013 16:50:21 | Asteroids@home | Started download of input_18355_172
08.09.2013 16:50:22 | Asteroids@home | Giving up on download of input_18355_156: permanent HTTP error
08.09.2013 16:50:22 | Asteroids@home | Giving up on download of input_18355_172: permanent HTTP error
08.09.2013 16:50:22 | Asteroids@home | Started download of input_18356_17
08.09.2013 16:50:22 | Asteroids@home | Started download of input_18357_101
08.09.2013 16:50:24 | Asteroids@home | Giving up on download of input_18356_17: permanent HTTP error
08.09.2013 16:50:24 | Asteroids@home | Started download of input_18356_123
08.09.2013 16:50:25 | Asteroids@home | Finished download of input_18357_101
08.09.2013 16:50:25 | Asteroids@home | Giving up on download of input_18356_123: permanent HTTP error

says Tommy the Wettermann
Sonoraguy

Joined: 11 Jun 13
Posts: 8
Credit: 15,481,080
RAC: 0
Message 1723 - Posted: 8 Sep 2013, 17:41:18 UTC
Thanks Dagorath. I implemented the cmd file you suggested on two systems and we are, once again, fully loaded with WUs to run.
Dagorath

Joined: 16 Aug 12
Posts: 293
Credit: 1,116,280
RAC: 0
Message 1724 - Posted: 9 Sep 2013, 0:18:36 UTC - in response to Message 1723.  
Thanks Dagorath. I implemented the cmd file you suggested on two systems and we are, once again, fully loaded with WUs to run.


Wonderful!