Asteroids downloading 6+ tasks at once ... disrupts balance with other BOINC projects
Author | Message |
---|---|
Joined: 30 Dec 12 Posts: 1 Credit: 76,560 RAC: 0 |
I started up with the Asteroids project at the beginning of Jan 2013. Recently, in the last 2 weeks, when Asteroids requests new tasks it asks for 6 tasks at once. This disrupts my balancing of work for other BOINC projects. Asteroids did not behave like this in the first weeks; if one core was free it asked for 1 task. I don't believe I changed a preference (if there is one for this). Now it has to work through most of the Asteroids task queue before it requests something from Einstein or such. Let me know if there is a setting I can adjust for this. Thanks |
Joined: 16 Aug 12 Posts: 293 Credit: 1,116,280 RAC: 0 |
That is the way BOINC is intended to work. It will focus on one project for a while then shift focus to a different project. It did not act that way when you first began Asteroids because it needs time to adjust and to learn how long Asteroids tasks require to complete. Your balance of work will be honored and maintained over the long term. You must think long term rather than short term. What happens in the space of 24 hours may not appear to be balanced but over the period of a few weeks it will be balanced. |
Joined: 31 Oct 12 Posts: 7 Credit: 4,381,920 RAC: 0 |
Has the way the units are compressed been changed? Previous downloads reported as taking 3 1/2 hours to run on my laptop, which is correct. New units appear to be 1/10th the size and now report as taking 14 mins to run, but they still take 3 1/2 hrs to run. Obviously this has resulted in BOINC downloading more units than I require. In fact I might be struggling to get through them before they timeout. |
Joined: 16 Aug 12 Posts: 293 Credit: 1,116,280 RAC: 0 |
Last modified: 27 Jan 2013, 17:40:24 UTC
"In fact I might be struggling to get through them before they timeout."
That's why you need to keep a small cache. What has happened is they screwed up the numbers that allow BOINC to estimate how long the tasks take. I'm getting the same thing... estimate of 12 minutes but they end up needing 3 hours. However I keep a very small cache so I won't have any trouble meeting deadlines. These guys screwed up this week, next week it will be some other project, it's a never ending cycle. Either you plan ahead for the screw ups and have it easy by setting a small cache or you run into deadline trouble and fight with it all the time. Take your pick, whatever creams yer twinky. |
Joined: 31 Oct 12 Posts: 7 Credit: 4,381,920 RAC: 0 |
Last modified: 27 Jan 2013, 19:27:41 UTC
"In fact I might be struggling to get through them before they timeout."
Eh, cache was only 0.25 / day. It's my laptop that travels with me, not connected to internet all the time so I need a buffer. At only 14 mins estimate, 20 units were downloaded for every 1 that should have been. At least I know that it's happened to everyone, not something at my end. |
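To put rough numbers on the over-fetching described above: if the client fills its buffer by dividing the buffer length by the estimated run time, a 14 minute estimate on a task that really needs 3 1/2 hours inflates the request by about the same ratio. The Python sketch below is a deliberate simplification and not BOINC's actual work-fetch logic; the function names and the single-core assumption are made up for illustration.

```python
import math


def tasks_to_fill_buffer(buffer_days, cores, estimate_minutes):
    """Tasks needed to fill the buffer if every task ran for its estimate.

    Hypothetical helper: assumes the buffer is simply
    buffer_days * cores worth of estimated run time.
    """
    buffer_minutes = buffer_days * 24 * 60 * cores
    return math.ceil(buffer_minutes / estimate_minutes)


def hours_of_real_work(tasks, actual_hours, cores):
    """Wall-clock hours needed to finish the fetched tasks."""
    return tasks * actual_hours / cores


# Figures from this thread: 0.25 day buffer, 14 minute estimate,
# 3.5 hour (210 minute) real run time, one core assumed.
with_bad_estimate = tasks_to_fill_buffer(0.25, 1, 14)
with_good_estimate = tasks_to_fill_buffer(0.25, 1, 210)
backlog_hours = hours_of_real_work(with_bad_estimate, 3.5, 1)

print(with_bad_estimate, with_good_estimate, backlog_hours)
```

Under these assumptions a 0.25 day buffer on one core asks for 26 tasks instead of 2, which works out to about 91 hours of real crunching.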
Joined: 9 Jun 12 Posts: 584 Credit: 52,667,664 RAC: 0 |
|
Joined: 31 Oct 12 Posts: 7 Credit: 4,381,920 RAC: 0 |
|
Joined: 16 Aug 12 Posts: 293 Credit: 1,116,280 RAC: 0 |
|
Joined: 25 Dec 12 Posts: 6 Credit: 1,028,160 RAC: 0 |
Dagorath said:
"That's why you need to keep a small cache. What has happened is they screwed up the numbers that allow BOINC to estimate how long the tasks take. I'm getting the same thing... estimate of 12 minutes but they end up needing 3 hours. However I keep a very small cache so I won't have any trouble meeting deadlines. These guys screwed up this week, next week it will be some other project, it's a never ending cycle. Either you plan ahead for the screw ups and have it easy by setting a small cache or you run into deadline trouble and fight with it all the time. Take your pick, whatever creams yer twinky."
OK, so what are you using for min and max work buffer size? Between having an almost non-existent buffer, and having <dont_use_dcf/> in the scheduler reply, I can think of several potential problems. I'm using the currently recommended BOINC version on Vista 32 bit - BOINC client 7.0.28. From what I can tell, <dont_use_dcf/> was originally intended to prevent going into EDF mode. FYI, I had never noticed the <fetch_minimal_work>1</fetch_minimal_work> flag until I had written the parts below based on using a very short work queue. It's been a while since I had to get into cc_config.xml.
1. Since it's being told to ignore DCF in the project scheduler reply, it's not just ignoring that project for EDF mode decisions, it's never incrementing the DCF at all for that project. On my machines, I'm seeing estimates from 17 - 20 minutes and actual run times in the 5 hour range. With time estimates like that, even with very small buffer sizes, the quads are getting something like 40+ WUs. Also, what does this do to other projects which don't return <dont_use_dcf/> in their scheduler reply?
2. Some projects like RNA have work units with estimated runtimes of almost 60 hours. Will it fetch work for these if I shrink the cache size on the quads so that it will only ask for 20 minutes * 4 cores?
3. What about busy projects like Seti where it can take many tries to actually get through to the scheduler and the BOINC client will apparently move on to another project?
4. What about projects like POEM where people have multi-day buffers and snatch up all the GPU WUs within a very short period of time and the rest of the day there aren't any GPU WUs available unless someone aborts one?
5. Some projects are so busy that it makes the admins mad if you keep contacting them over and over because your cache isn't big enough or you have the <report_results_immediately>1</report_results_immediately> flag set or have the <max_tasks_reported>N</max_tasks_reported> flag set to a small value.
I'm not sure if it's because some projects are returning <dont_use_dcf/> or that I have a small work queue but I'm seeing other projects (LHC for example) where they cut it really close on going into priority mode on some WUs of about 10 hours even though that is the only project with WUs due in the next several days and there are no asteroids WUs on the machine.
On a side note, this could actually help projects like EON where the WUs really do take only 20 minutes to run. With a short work queue and EON limiting the number of WUs allowed to 2 * core count, I'm already having problems getting any work done on EON unless I set everything else to not fetch work from any other projects on one of my machines. I'm not positive but, from what I can tell, once a quad gets 8 WUs and is refused further WUs, the BOINC client penalizes the project's priority and moves on to another project.
FYI, of the 22 projects on the machine I'm writing this on, 5 of them have <dont_use_dcf/> set in their sched replies: Asteroids, POEM, NFS, Milkyway and WCG.
It's late and I'm exhausted so I hope this makes sense. I'm sure I left some other things out but this should get some discussion started.
David Ball |
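For anyone who has not edited cc_config.xml in a while, the three client flags named in the post above all live inside the <options> section of that file. The Python sketch below just builds such a file as text so the layout is visible; the helper name build_cc_config and the values chosen (including 4 for max_tasks_reported) are illustrative assumptions, not recommendations, and the result would still have to be saved into the BOINC data directory by hand.

```python
import xml.etree.ElementTree as ET


def build_cc_config(options):
    """Return cc_config.xml text with the given <options> entries.

    Hypothetical helper: it only formats XML and does not check that
    the option names are ones the installed client understands.
    """
    root = ET.Element("cc_config")
    opts = ET.SubElement(root, "options")
    for name, value in options.items():
        ET.SubElement(opts, name).text = str(value)
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Flag names are the ones quoted in the post; values are examples only.
config_xml = build_cc_config({
    "fetch_minimal_work": 1,
    "report_results_immediately": 1,
    "max_tasks_reported": 4,
})
print(config_xml)
```

Whether it makes sense to combine these flags depends on the projects involved; the post above raises the concern that frequent scheduler contacts may annoy busy projects.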
Joined: 16 Aug 12 Posts: 293 Credit: 1,116,280 RAC: 0 |
Last modified: 30 Jan 2013, 21:40:37 UTC
"OK, so what are you using for min and max work buffer size? Between having an almost non-existent buffer, and having <dont_use_dcf/> in the scheduler reply, I can think of several potential problems. I'm using the currently recommended BOINC version on Vista 32 bit - BOINC client 7.0.28. From what I can tell, <dont_use_dcf/> was originally intended to prevent going into EDF mode."
I use 0.1 and 0: "connect about every" is 0.1 and "additional" is 0. Been using that for years and wouldn't dream of going higher.
"FYI, I had never noticed the <fetch_minimal_work>1</fetch_minimal_work> flag until I had written the parts below based on using a very short work queue. It's been a while since I had to get into cc_config.xml."
That's a very useful setting. With that you get one task, crunch it, then get another one. You cache nothing so there's little risk of deadline trouble. If one has 24/7 connectivity and crunches several projects one really doesn't need a cache.
"1. Since it's being told to ignore DCF in the project scheduler reply, it's not just ignoring that project for EDF mode decisions, it's never incrementing the DCF at all for that project. On my machines, I'm seeing estimates from 17 - 20 minutes and actual run times in the 5 hour range. With time estimates like that, even with very small buffer sizes, the quads are getting something like 40+ WUs."
Seriously? I didn't know a project could put that in the scheduler reply. I thought it was only for us to put in cc_config.xml and I could never figure out why anybody would use it. Wow! If that's true then I would simply refuse to crunch those projects. Or maybe use <fetch_minimal_work>. IMHO, BOINC absolutely needs DCF to make informed scheduling decisions and any project that would circumvent that can go pound sand.
"Also, what does this do to other projects which don't return <dont_use_dcf/> in their scheduler reply."
I can't imagine it would affect those projects but ya never know.
Like I said, this "ignore dcf" in the scheduler reply thing is news to me so I'm really not up to speed on it. BTW thanks for mentioning that info.
"2. Some projects like RNA have work units with estimated runtimes of almost 60 hours. Will it fetch work for these if I shrink the cache size on the quads so that it will only ask for 20 minutes * 4 cores?"
Lol! I've learned to never say "yah, that will work" with BOINC because it seems like there is always some catch that I'm unaware of that causes it to fail. Put it this way.... ClimatePrediction tasks have an even longer duration estimate than RNA tasks and I get them no problem (when they have work, they don't always anymore) with 0.1 and 0 cache settings so I see no reason why you would not get 60 hour RNA tasks. Why not try it and see? What's the worst that could happen? BTW, you might have to wait a few days or even a week to get one so don't try it for 20 minutes then declare it a failure if you don't get one in that 20 minutes.
"3. What about busy projects like Seti where it can take many tries to actually get through to the scheduler and the BOINC client will apparently move on to another project."
Well, if you're going to crunch SETI then you have to consider that with all of its problems it's going to throw a monkey wrench into the works. I mean scheduling is all about predicting the future and if a project server can't be relied upon to be online then it's going to cause problems that BOINC may not ever be able to deal with effectively. I am generally an optimist but I've finally learned that some tasks are impossible and will forever be impossible. Wise men can tell which are impossible and they quickly stop banging their heads against the wall trying to accomplish the impossible. Scheduling SETI is one of those impossible tasks, IMHO.
And since SETI's chances of ever finding an alien are so close to zero we may as well just call it zero and since there are so many other projects that have a high chance of achieving their goals and do not cause scheduling problems, why bother putting SETI on your list? All it does is screw up the works for nothing. Hey don't get me wrong, I believe in aliens, for sure, but I'm not dumb enough to believe we're going to find them with SETI's methodology. In the end you have to get informed about how the tech works (and I believe you are informed) then make a choice about how you're going to use it, how much failure you're willing to put up with, etc.
See, the way it's supposed to work is that even if SETI goes down for 2/3 days your host should get work from other projects and those will accrue debt that SETI will collect when it comes back online. You know that. The problem is when they do come back they don't have the bandwidth or whatever it takes to dole out the tasks fast enough for SETI to collect its debt before the next time it goes down. BOINC can't deal with that. Nothing can deal with that in any reliable way. If you try to deal with it by hoarding as many SETI tasks as you can then the price you will pay is that other entropy and chaos in the system, entropy and chaos that will NEVER go away, such as the mistaken <fpops_est> from Asteroids, will cause deadline problems from time to time.
Now I do believe that we/you could deal better with the chaos if we had a watchdog program that runs separately from BOINC and sort of peeks over BOINC's shoulder and watches for "crap" and then forces BOINC to take corrective action. For example, BOINC will start a task that is so close to deadline it hasn't a hope in hell of finishing before deadline. David Anderson refuses to abort such tasks because they might have a chance of returning a result before the task gets resent or before the resend returns a result. To me that is so frickin' stupid!
Why not just abort the task? Why not just play it safe, simple and smart? Nope, Dave's got this habit of introducing ever more chaos and uncertainty into a system that is already rife with chaos and uncertainty and that's exactly what that deadline extension crap is all about. A watchdog program would just abort the task. Hey! They create new tasks every frickin minute of every frickin hour of every frickin day all year long including Christmas and Good Friday! Why bother holding onto a task that the best estimates you have indicate cannot be completed on time?
"4. What about projects like POEM where people have multi-day buffers and snatch up all the GPU WUs within a very short period of time and the rest of the day there aren't any GPU WU available unless someone aborts one."
Listen to Mick Jagger... you can't always get what you want. You can knock yourself out trying if that creams yer twinky or you can relax and remember that nobody pays us money for doing this so it's not worth driving oneself nuts over it. One of the things you can always have is the wisdom to know the difference between what you can have and what you cannot have.
"5. Some projects are so busy that it makes the admins mad if you keep contacting them over and over because your cache isn't big enough"
I think you are just imagining that and are driven to do so because you want to be a nice guy. If they don't want you contacting them too often they can set up options that will prevent your host from doing so. If they don't then IMHO it means they don't mind being contacted often.
"or you have the <report_results_immediately>1</report_results_immediately> flag set or have the <max_tasks_reported>N</max_tasks_reported> flag set to a small value."
I hear you and again I think those 2 flags are ignored if the server specifies contact limits.
"I'm not sure if it's because some projects are returning <dont_use_dcf/> or that I have a small work queue but I'm seeing other projects (LHC for example) where they cut it really close on going into priority mode on some WUs of about 10 hours even though that is the only project with WUs due in the next several days and there are no asteroids WUs on the machine."
That is a problem with LHC, their deadlines are far too short. The reason they do that is to solve another problem they have. And the reason they don't solve that problem the proper way is some combination of laziness, apathy, ignorance or lack of time. It's THEIR problem and there is a solution for their problem which will not cause problems for us. If you allow them to solve their problem by creating a problem for you then you have nobody to blame but yourself. Then turn up Mick's Can't Get No Satisfaction and see how much pity you can get for punishing yourself unnecessarily.
I guess what action you take depends on how you see your relationship with "the projects" in general. If you feel that you are privileged to be able to give them your electricity and a good chunk of your paycheck every month plus hours of work keeping your host running smoothly then there is maybe no limit to the abuse you will take from them. If you feel that they, not you, are the privileged ones then you act differently. Depends how much brown stuff you can stand to have on your nose before you tell yourself you can't take the stink anymore and you pull out and find a properly run project that does things right and doesn't cause you unnecessary and avoidable grief.
"On a side note, this could actually help projects like EON where the WUs really do take only 20 minutes to run. With a short work queue and EON limiting the number of WUs allowed to 2 * core count, I'm already having problems getting any work done on EON unless I set everything else to not fetch work from any other projects on one of my machines. I'm not positive but, from what I can tell, once a quad gets 8 WUs and is refused further WUs, the BOINC client penalizes the project's priority and moves on to another project."
Yes, I think that would benefit EON too. I'm not sure about the penalty you mention. If there is such a penalty then I would think it would be temporary. Always remember that the primary goals of the scheduler are to avoid missed deadlines, honor the resource shares and always keep the host busy. A project might have its priority penalized but I should think if it's fallen too far behind in receiving its share then BOINC will take corrective action. I say "I should think" because I have never verified it by looking at the code or doing other tests of behavior.
"FYI, of the 22 projects on the machine I'm writing this on, 5 of them have <dont_use_dcf/> set in their sched replies: Asteroids, POEM, NFS, Milkyway and WCG."
I'm just not sure about this <dont_use_dcf/> in the scheduler reply. I can hardly believe it's an option but I admit there may be a reason for it that I don't understand. The best advice I can give is if it causes a problem for you then try a smaller cache. If you must have a bigger cache so you can get all the SETI you want then consider ditching those projects because IMHO they're only going to cause you grief.
"It's late and I'm exhausted so I hope this makes sense. I'm sure I left some other things out but this should get some discussion started."
It makes a lot of sense, good stuff all of it and I hope my response makes as much sense. Again, thanks for the info about <dont_use_dcf/>, I'll have to look into that.
Edit added: I should add that you and I have a different perspective on these matters because you run 22 projects at once. My perspective is different because I've never run more than 8 at once, usually just 4.
I can't imagine 22 because I simply don't have the time or interest to check that many projects on a regular basis to see if my results are verifying or whether I'm returning compute errors or whatever. I can't just add a project then ignore it and not check it every other day. I mentioned an independent watchdog program to watch for "crap" and force adjustments. Maybe another thing such a program could do is check your results for you either by scanning stdoutdae.txt for errors and warnings or by going to the project sites and examining your results and then giving you a summary of what's going on. It's doable. |
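As a rough illustration of the watchdog idea floated above, the Python sketch below does two things: it flags a task whose remaining run time no longer fits before its deadline, and it counts log lines per project that mention an error or warning. Everything here is an assumption made for the sketch: the function names, the keyword list, and in particular the log layout (date, time, project name in square brackets, then the message), which should be checked against a real stdoutdae.txt before relying on it. The sample lines are invented, and the sketch only reports; it does not tell BOINC to abort anything.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed layout: "<date> <time> [<project>] <message>"
LOG_LINE = re.compile(r"^\S+ \S+ \[(?P<project>[^\]]*)\] (?P<msg>.*)$")
KEYWORDS = ("error", "warning")  # illustrative keyword list


def cannot_finish(remaining_seconds, deadline, now):
    """True if the estimated remaining run time overshoots the deadline."""
    return now + timedelta(seconds=remaining_seconds) > deadline


def summarize_log(lines):
    """Count lines per project whose message contains a keyword."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip anything that does not fit the assumed layout
        message = match.group("msg").lower()
        if any(word in message for word in KEYWORDS):
            counts[match.group("project") or "client"] += 1
    return counts


# Invented sample data, not real client output.
sample = [
    "27-Jan-2013 17:40:24 [ProjectA] task_1 finished",
    "27-Jan-2013 17:41:02 [ProjectA] task_2 exited with error",
    "27-Jan-2013 17:45:10 [ProjectB] warning: disk almost full",
    "not a log line",
]
summary = summarize_log(sample)
now = datetime(2013, 1, 27, 18, 0, 0)
hopeless = cannot_finish(3 * 3600, now + timedelta(hours=1), now)
print(dict(summary), hopeless)
```

A real watchdog would need reliable run-time estimates to make the deadline check meaningful, which is exactly what the bad estimates discussed in this thread undermine.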
Joined: 16 Aug 12 Posts: 293 Credit: 1,116,280 RAC: 0 |
"4. What about projects like POEM where people have multi-day buffers and snatch up all the GPU WUs within a very short period of time and the rest of the day there aren't any GPU WU available unless someone aborts one."
This was once a problem at LHC years ago. I would never have worried about it except that it was providing them some rationale for keeping their old, ludicrous and extremely wasteful initial replication of 3 and quorum of 2 policy. So I suggested to them that they configure the option to limit how many tasks a host can have. They refused. So I made a script that polled their server every 30 seconds to see if it had work and if it did the script would boost my cache to the max, grab a big pail full of tasks then reduce the cache. When I was sure the script worked properly I let the rest of the crunchers know what I was doing. Some were already running their own similar script and were not surprised while others were thoroughly miffed, lol! Eventually they (other volunteers) begged me for a copy of the script so they too could get tasks. Well, I milked that for all it was worth, warned the admins of what was going to happen, they refused, pretended it wasn't happening, lies, laziness, and more lies. So I unleashed the script by giving it to others, slowly, a few at a time, depending on how hard they groveled and begged. Eventually the load of all the scripts hammering the server caused problems. Then the admins begged and groveled but I told them there was no bargaining, they had to put the task download limit in place or the scripts would continue to hammer the server. So they did. And then the work started getting spread out more evenly amongst all the volunteers and production increased.
So that's what you can do to POEM too though I hesitate to recommend it as a moral kind of action because the lack of a task limit isn't really causing a greater problem like it was at LHC.
If their lack of a limit isn't really hurting anyone then IMHO you/we need to learn to live with it. On the other hand I can almost guarantee that other people are running such a script so why not you too? And if you too then why not everybody? They'll decide whether they want their server hammered or not.
You can't always git what you want
But if you try
Sometimes you just might find
You can twist their arm until they give in
...to paraphrase Mick a little |
Joined: 25 Dec 12 Posts: 6 Credit: 1,028,160 RAC: 0 |
My buffers only run about .3 or .4 days total. I have different configs for duals and quads. I like to keep it where the BOINC manager tasks tab will fit on one page if possible. That's one reason the Seti WUs that have finished and can't report in bother me: they clutter up the screen when I'm trying to get a complete picture of what's going on in that machine. FYI, Seti sends me an email if I don't participate for a while and they were so instrumental in getting BOINC going that I run about 1% Seti as sort of a BOINC tax. I long ago gave up on Seti finding anything. The projects that really interest me are the health related projects. Being in my late 50's, I have an interest in health projects making progress :-)
If you're wondering about the <dont_use_dcf/> in the scheduler reply, go to the main BOINC data directory on your machine and look at the first few lines in a file called sched_reply_asteroidsathome.net_boinc.xml . You'll find that it starts with something like <scheduler_reply>
When I saw the <dont_use_dcf/>, I searched the BOINC website for that phrase to find out what it did. I even found where David Anderson checked in a related changeset. See http://boinc.berkeley.edu/trac/changeset/7df3c15fc2ebccb472a0216b0a604f590880a460/boinc which begins with Changeset 7df3c15 in boinc
BTW, I got the C2Q Q6600 crunchers cheap and use them for heaters in the winter. They get shut down in the warm months. When I start them up after a several month absence, the BOINC client aborts all the WUs that are past deadline but have never started. I don't remember what it does with WUs that are months past the deadline but have already done some work. I might have had to abort those manually.
The machine that has 22 projects is sort of my master control where I see what's going on with all projects that are still active. It uses the work profile and some projects on it have the resource share set to 0.001 which means they never get work.
The health related projects can go as high as a resource share of 35. I also run a small resource share like 3 on some newly started projects to help them get going and work the bugs out before they go into production. I have points in many projects that no longer exist. I guess I should put my project list in my sig on asteroids. :-) One of these days, I'm going to download the BOINC client source and build it. I started as a programmer by writing programs and firmware in the 8080/8085/Z-80 days on CP/M and MP/M. I've written a lot of microprocessor assembler, C, and C++. When I started writing C++, we had to use "compilers" that output C code instead of compiling to machine level. Most lines of C++ produced about 6 to 8 lines of C code which mostly consisted of the characters '(' and ')'. Hope this answers some of your remaining questions. David Ball |
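The manual check described above (opening each scheduler reply file and looking for the tag) can be scripted. The Python sketch below looks through a directory for files named sched_reply_*.xml and reports which ones contain <dont_use_dcf/>. The function name is made up, the plain substring test is a shortcut rather than real XML parsing, and the demo runs against throwaway files in a temporary directory; point it at your own BOINC data directory, whose location varies by platform, to try it for real.

```python
import tempfile
from pathlib import Path


def replies_with_dont_use_dcf(data_dir):
    """Names of sched_reply_*.xml files that contain <dont_use_dcf/>."""
    hits = []
    for path in sorted(Path(data_dir).glob("sched_reply_*.xml")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if "<dont_use_dcf/>" in text:
            hits.append(path.name)
    return hits


# Demo on invented files; the second file name is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sched_reply_asteroidsathome.net_boinc.xml").write_text(
        "<scheduler_reply>\n<dont_use_dcf/>\n</scheduler_reply>\n",
        encoding="utf-8",
    )
    Path(tmp, "sched_reply_example.org.xml").write_text(
        "<scheduler_reply>\n</scheduler_reply>\n",
        encoding="utf-8",
    )
    found = replies_with_dont_use_dcf(tmp)

print(found)
```

Since the reply file is rewritten on each scheduler contact, the result only reflects what the project sent most recently.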
Joined: 16 Aug 12 Posts: 293 Credit: 1,116,280 RAC: 0 |
.3 to .4 isn't very high, you shouldn't run into much trouble with buffers that small. But if you do then the only answers are to find which project(s) are sending too much work and correct the problem there at the project level or shrink your buffers even further.
SETI has indeed been instrumental but we could have the same development occurring and at the same pace without wasting all that CPU power on a hopeless cause. I ran SETI classic when there was nothing else to do with spare CPU cycles but after BOINC and projects like Einstein, Rosetta and LHC came out I dropped SETI like a hot potato, never again, I don't owe them or any project anything. I'm grateful for their work but it doesn't make sense to throw away CPU cycles just to show my gratitude, "Thanks, cya" is sufficient IMHO.
Tasks that are started and expired will run to completion no matter how overdue they are.
Thanks for the additional info on <dont_use_dcf/>. It sounds like it applies only to 7.0.28 and newer clients but not older clients. I wonder why. Must be due to some recent changes in the scheduler.
Sounds like you've been programming a loooong time, heh, CP/M, that was a good OS. I'm 59 and my first one was a "Black Apple" a friend and I built from a Heath Kit. I learned Pascal and Basic09 which ran on the CoCo (Radio Shack Color Computer) in 512 KB RAM, dabbled in a little 6809 assembler, learned C++ in the 90's on Borland Builder but quit that when I could no longer stomach Windows. (I ran OS/2 for years before Windows so for me Win was an abomination from the beginning). Eventually learned 8051 assembler.
Compiling your own BOINC is a worthwhile thing to learn. A lot could be done to improve the client and the manager. |
Joined: 19 Jun 12 Posts: 221 Credit: 623,640 RAC: 0 |
Last modified: 2 Feb 2013, 4:14:12 UTC
"I'm just not sure about this <dont_use_dcf/> in the scheduler reply. I can hardly believe it's an option but I admit there may be a reason for it that I don't understand."
I'm also not sure why this was implemented but it was invented maybe a year ago. Some threads about 'crazy' DCF and 'desire for' dont_use_dcf:
http://setiathome.berkeley.edu/forum_thread.php?id=67673
http://setiathome.berkeley.edu/forum_thread.php?id=67273
http://setiathome.berkeley.edu/forum_thread.php?id=67095
('red-ray' is Ray Hinchliffe, the author of SIV - System Information Viewer: http://setiathome.berkeley.edu/view_profile.php?userid=9653891 http://rh-software.com/ )
"It sounds like it applies only to 7.0.28 and newer clients but not older clients. I wonder why."
Because older clients will not understand this tag (they have no code in them to ignore/not-calculate DCF), and maybe they will complain in the messages about a bad tag in sched_reply_*.xml.
- ALF - "Find out what you don't do well ..... then don't do it!" :) |