Download failed
Message boards : Problems and bug reports : Download failed
Joined: 21 Dec 12 | Posts: 176 | Credit: 136,462,135 | RAC: 0
Last modified: 5 Sep 2013, 22:23:21 UTC

For everyone who wants to help us flush the invalid tasks, or does not want to click Update every second: create a cmd file like the one below and run it. It updates the project every minute. Add the full path before boinccmd if required. Thanks a lot for your patience.
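A minimal sketch of a cmd file along those lines (not necessarily HA-Soft's exact script). It assumes boinccmd is on the PATH, otherwise prepend its full path as noted above, that the project URL is http://asteroidsathome.net/boinc/, and it uses the Windows timeout command for the one-minute pause; any other delay command would do:

```bat
:1
rem Ask the local BOINC client to contact the Asteroids@home scheduler now
boinccmd --project http://asteroidsathome.net/boinc/ update

rem Wait roughly one minute before the next update request
timeout /t 60 /nobreak >nul

goto :1
```

Stop it with Ctrl+C once the bad work units have been flushed. If boinccmd reports an authorization error, run the file from the BOINC data directory (where gui_rpc_auth.cfg lives) or pass the GUI RPC password with --passwd.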
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0
> For every one who wants to help us flush invalid tasks or do not want to click update every second:

Good idea, but this strategy has been attempted at other projects and it's not as effective as one might think. The problem is that it flushes invalid tasks only until the host's task cache fills. At that point the host won't request more tasks until it crunches enough tasks to drain the cache to the host's minimum cache setting. Then the host will request more tasks, but soon its cache is full again. In the end the flushing is severely constrained by how fast the host crunches tasks. It's better than nothing, but it may take a long time to flush the tasks from the server unless many hosts run the script.

What the script needs to do before every update is abort any task the host has not started crunching. Yes, that will abort tasks that downloaded properly, but those will be sent to another host, and you can be fairly sure that host will not be running the flush script and that it will crunch the task. Remember the bad tasks still have a max errors setting of 20. Only by aborting all tasks that have not started will the host be forced to request more tasks on each and every update.

If there are only 1,000 more bad tasks then it's probably not worth putting any effort into improving the flush script. If there are 10,000 more bad tasks then maybe it's worth improving it. How many more bad tasks remain, and how long will it take at the current flush rate?
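The abort-then-update idea described above boils down to two boinccmd calls per task. The following is only a hedged sketch of those building blocks, in the same Windows cmd style as HA-Soft's file, not a finished flush script: deciding which tasks have not started yet is left to the reader (for example by checking BOINC Manager or the listing from boinccmd --get_tasks), and TASK_NAME_HERE is a made-up placeholder, not a real task name.

```bat
rem TASK_NAME_HERE is a placeholder; substitute a task that has not started yet,
rem taken from BOINC Manager or from "boinccmd --get_tasks"
set TASKNAME=TASK_NAME_HERE

rem Abort that task, then immediately ask the scheduler for replacement work
boinccmd --task http://asteroidsathome.net/boinc/ %TASKNAME% abort
boinccmd --project http://asteroidsathome.net/boinc/ update
```

Each abort frees a slot in the cache, so the following update can fetch fresh tasks instead of waiting for the cache to drain on its own.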
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0
Last modified: 6 Sep 2013, 11:53:25 UTC

Doan be worry, Sonoraguy (Sonora, California?). It's only computers, and we are still their masters for at least another year. No guarantees after that.

I dug up a script I used a few months ago to flush bad tasks at another project. I'm pretty sure I can adapt it to work here too. If so, it should flush a few thousand bad tasks per day, but hmmmm... with 20 errors required for every task it might take a while. Wanna help?
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0
I think my flusher script aborted about 1,000 tasks. That might be just a drop in the bucket. I shut the script down because I reached my daily task limit and can't get any more tasks today.

Of the last bunch of tasks I received, about 50% were failed downloads. Before I hit the daily limit I received batches that had no failed downloads, only to be followed by batches with roughly 75% download failures, so I'm not so sure this is over yet.

That was my Linux host. I think I'll put the script on my Windows host later and see how many failed downloads I get on that one before it hits the daily limit.

Also, I've been wondering... what harm do these failed downloads really do? You get some failed downloads, but when your host needs more tasks it requests more, and it keeps requesting until it receives some good tasks, then it goes back to work. Is anybody experiencing any real problems because of that? Any major dead time, or hosts languishing with nothing to do for more than 10 minutes? Sorry, I'm tied up with other stuff and am unable to keep a close watch on my machines, so I could be missing the obvious here.
Joined: 11 Jun 13 | Posts: 8 | Credit: 15,481,080 | RAC: 0
Question: Has anyone taken notice of the fact that the Download Failed issue seems to have returned? After a day of relatively clean downloads, we're back to seeing about 60% - 80% of them fail again. I've had about 1,000 downloads fail in the last day or so.

I don't expect this happens often, but I've also had work units declared invalid even though I processed my copy, because there were 16 or 18 "Error while downloading" results and the WU was dropped when it reached the 20-failure limit. Examples:

http://asteroidsathome.net/boinc/workunit.php?wuid=4842346
http://asteroidsathome.net/boinc/workunit.php?wuid=4842338

Just a thought, but it might be an idea to clear the database of all those download failures.
Joined: 15 Jan 13 | Posts: 12 | Credit: 904,320 | RAC: 0
> Question: Has anyone taken notice of the fact that the Download Failed issue seems to have returned? After a day of relatively clean downloads, we're back to seeing about 60% - 80% of them fail again. I've had about 1,000 downloads fail in the last day or so.

I can't get any tasks. They all fail to download.
Joined: 27 Jun 12 | Posts: 129 | Credit: 62,725,780 | RAC: 0
> Just a thought, but it might be an idea to clear the database of all those download failures.

They should be able to identify the failed downloads by their error count being > 5 and then mark them all as cancelled, to spare everyone the download failures. Just a thought Kyong might want to consider.

I am still getting attempted downloads of ones up to _19, so I don't think setting them to a max of 3 errors worked.

BOINC blog
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0
Last modified: 8 Sep 2013, 3:57:29 UTC

> Question: Has anyone taken notice of the fact that the Download Failed issue seems to have returned? After a day of relatively clean downloads, we're back to seeing about 60% - 80% of them fail again. I've had about 1,000 downloads fail in the last day or so.

Kyong knows the reason for this recurring problem best, but my hunch is the bad downloaders are not sprinkled evenly throughout the database. Instead they were injected in relatively large blocks. We worked through a bad block 2 days ago, then hit a good block, but now we're into another bad block. Or something like that. I've seen similar problems at a few other projects too. It'll work out fairly soon.

> I don't expect this happens often, but I've also had work units declared invalid even though I processed my copy, because there were 16 or 18 "Error while downloading" results and the WU was dropped when it reached the 20-failure limit.

It's possible that as we continue to purge these bad downloaders we're going to see that happen more often. It almost happened to me too, except I was running a script that automatically requests an Asteroids project update every 2 minutes, so the server spotted the task and asked my host to abort it. My host complied with the abort request and I lost only 2 minutes of crunch time on that task.

If you want those kinds of tasks aborted on your host too, then check this post from HA-Soft, in which he provides a small and easily installed Windows batch file (script) that makes your host auto-update the Asteroids project. Batch files are not something the average computer user needs to learn, so it's possible you don't know how to implement it. The thing is, BOINCing isn't exactly "average sort of computer activity", as you well know, right? If you don't know but would like to learn, then ask and someone will guide you through it. It's a very handy thing to know how to do, because frequently we can use batch files and scripts like HA-Soft's to correct various problems BOINCers run into.

If you don't mind installing Python on your host, you can implement a script I'm using that flushes the bad downloaders from the server even faster than HA-Soft's script and causes tasks to auto-abort just as his script does. Python is a very powerful scripting language. It's safe and secure in this case because, if you or anybody else is interested in using it, I will publish the script here in a post where others will vet it. If it poses any security risk to anyone, they'll say so and the post will be deleted very quickly.

> Just a thought, but it might be an idea to clear the database of all those download failures.

I'm not so sure the bad downloaders are causing us volunteers any real grief that we cannot avoid with minimal effort. Weigh that against the fact that messing around with the database has, in the past, caused some projects major grief. The likelihood of major grief depends on a number of factors I won't bother going into, but let me assure you that Kyong and HA-Soft are not a couple of inexperienced amateurs bumbling their way through this. I am quite sure they've weighed their options and think the present course of action is the best one for the project. I trust them and I hope we will all trust them.

Remember this... nobody is going to criticize you or anybody else if you just suspend Asteroids for however long it takes to work this out. Probably nobody will even know if you decide to do that.
Joined: 27 Jun 12 | Posts: 129 | Credit: 62,725,780 | RAC: 0
Last modified: 8 Sep 2013, 10:59:48 UTC

> I have again decreased max_error to 3. I thought that the bad WUs were 17001 - 18000, not even up to 19000.

I have computed some that have max errors set to 3, but the wingman got a computing error. Hopefully we won't waste these work units. Example: ps_130831_18208_165 (Compute error).

As Dagorath says, it's probably best not to fiddle with the database and just let them fail naturally now.

BOINC blog
Joined: 18 Jun 12 | Posts: 8 | Credit: 5,274,731 | RAC: 0
Of 59 WUs, 39 failed with a permanent HTTP error.

08.09.2013 16:50:15 | Asteroids@home | Scheduler request completed: got 59 new tasks
08.09.2013 16:50:16 | Asteroids@home | work fetch suspended by user
08.09.2013 16:50:17 | Asteroids@home | Started download of period_search_10100_windows_intelx86__sse2.exe
08.09.2013 16:50:17 | Asteroids@home | Started download of input_18357_95
08.09.2013 16:50:19 | Asteroids@home | Finished download of input_18357_95
08.09.2013 16:50:19 | Asteroids@home | Started download of input_18357_65
08.09.2013 16:50:20 | Asteroids@home | Finished download of period_search_10100_windows_intelx86__sse2.exe
08.09.2013 16:50:20 | Asteroids@home | Finished download of input_18357_65
08.09.2013 16:50:20 | Asteroids@home | Started download of input_18356_195
08.09.2013 16:50:20 | Asteroids@home | Started download of input_18355_149
08.09.2013 16:50:21 | Asteroids@home | Giving up on download of input_18356_195: permanent HTTP error
08.09.2013 16:50:21 | Asteroids@home | Giving up on download of input_18355_149: permanent HTTP error
08.09.2013 16:50:21 | Asteroids@home | Started download of input_18355_156
08.09.2013 16:50:21 | Asteroids@home | Started download of input_18355_172
08.09.2013 16:50:22 | Asteroids@home | Giving up on download of input_18355_156: permanent HTTP error
08.09.2013 16:50:22 | Asteroids@home | Giving up on download of input_18355_172: permanent HTTP error
08.09.2013 16:50:22 | Asteroids@home | Started download of input_18356_17
08.09.2013 16:50:22 | Asteroids@home | Started download of input_18357_101
08.09.2013 16:50:24 | Asteroids@home | Giving up on download of input_18356_17: permanent HTTP error
08.09.2013 16:50:24 | Asteroids@home | Started download of input_18356_123
08.09.2013 16:50:25 | Asteroids@home | Finished download of input_18357_101
08.09.2013 16:50:25 | Asteroids@home | Giving up on download of input_18356_123: permanent HTTP error

says Tommy the Wettermann