GPU and Asteroids@home
Joined: 27 Jun 12 | Posts: 129 | Credit: 62,725,780 | RAC: 0
Last modified: 10 Nov 2013, 2:24:04 UTC

> My development environment is a cheap MSI notebook with W7 and a GTX 660M. Yes, a mobile GPU. It is horribly slow, but enough for profiling and debugging for now. I'm using VS2010 + CUDA 5.0 + the latest NVIDIA Nsight.

As you probably know from other projects, older CUDA versions tend to work better on older cards, so you might end up having to make CUDA 3.2 (pre-Fermi), 4.2 (Fermi), and maybe 5.5 (Kepler) versions. Not sure about CUDA 6.0, as that's still under NDA until it's officially released. You'd have to use the compute capability to determine which app to give out.

Also, a lot of the other projects have tried using single precision where they can, because double precision is crippled on everything except Teslas and the Titan. Not sure if that's possible given you're starting with a CPU app.

BOINC blog
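For illustration, a minimal sketch of the compute-capability check mentioned above, using the standard CUDA runtime API. The mapping of capabilities to CUDA builds here is hypothetical; in a real BOINC project the scheduler makes this choice server-side via plan classes, not in client code.

[code]
#include <stdio.h>
#include <cuda_runtime.h>

int main(void)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no CUDA device found\n");
        return 1;
    }
    /* prop.major/prop.minor is the compute capability, e.g. 3.0 = Kepler. */
    if (prop.major >= 3)
        printf("cc %d.%d -> CUDA 5.5 (Kepler) build\n", prop.major, prop.minor);
    else if (prop.major == 2)
        printf("cc %d.%d -> CUDA 4.2 (Fermi) build\n", prop.major, prop.minor);
    else
        printf("cc %d.%d -> CUDA 3.2 (pre-Fermi) build\n", prop.major, prop.minor);
    return 0;
}
[/code]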
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0

> My development environment is a cheap MSI notebook with W7 and a GTX 660M. Yes, a mobile GPU. It is horribly slow,

The chart I linked in my previous post gives SP performance for the 660M as 641.3 GFlops. 641 / 24 = 27 GFlops DP, which is indeed very slow. Compare that to 103 GFlops DP for the 660Ti.

> I plan to test app first on my Titan on Linux for speed. I think 5xx should be ok and GK110 7xx with DP enabled.

The Titan is the only GK110 7xx that does not have DP crippled. The 780 and 780Ti are GK110, but their DP is crippled to 1/24 of SP (4.2% of SP).

BOINC FAQ Service | Official BOINC wiki | Installing BOINC on Linux
Joined: 19 Jun 12 | Posts: 221 | Credit: 623,640 | RAC: 0
Last modified: 12 Nov 2013, 10:39:11 UTC

> Titan is the only GK110 7xx that does not have DP crippled. 780 and 780Ti are GK110 but their DP is crippled to 1/24 of SP (4.2% of SP).

And of course NVIDIA does this not for technical but for economic reasons:

"GeForce GTX 780 Ti, on the other hand, gets neutered in the same way Nvidia handicapped its GTX 780. The card's driver deliberately operates GK110's FP64 units at 1/8 of the GPU's clock rate. When you multiply that by the 3:1 ratio of single- to double-precision CUDA cores, you get a 1/24 rate. The math on that adds up to 5 TFLOPS of single- and 210 GFLOPS of double-precision compute performance."
http://www.tomshardware.com/reviews/geforce-gtx-780-ti-review-benchmarks,3663.html

(The arithmetic checks out: 1/8 clock x 1/3 units = 1/24, and 5,000 GFLOPS / 24 ≈ 208 GFLOPS, which rounds to the quoted 210.)

If you are prepared for hard mods (or just want to see what people do), here is GeForce GTX 780 to Tesla K20 (with links to other, older mods):
http://www.guztech.nl/wordpress/index.php/2013/11/researching-nvidia-gpus-geforce-gtx780-and-gtx-titan-to-tesla-k20-and-tesla-k20x/

- ALF - "Find out what you don't do well ..... then don't do it!" :)
Joined: 19 Jun 12 | Posts: 221 | Credit: 623,640 | RAC: 0
Last modified: 12 Nov 2013, 12:10:13 UTC

> I think the CUDA version does not affect speed on older cards. It's more related to code than to CUDA version.

You could discuss this with jason_gee (Jason Groothuis, the programmer of the optimized CUDA apps for SETI):
http://setiathome.berkeley.edu/show_user.php?userid=8534984
http://jgopt.org/

If you want to get the binary versions, the 2 links to 'Lunatics Installers v0.41' are posted here:
http://setiathome.berkeley.edu/forum_thread.php?id=71867&postid=1375943#1375943

The 'Installers' are NSIS, so they can be unpacked by 7-Zip. They contain 4 versions of the CUDA apps aimed at different GPU generations/drivers:
Lunatics_x41zc_win32_cuda23.exe
Lunatics_x41zc_win32_cuda32.exe
Lunatics_x41zc_win32_cuda42.exe
Lunatics_x41zc_win32_cuda50.exe
(more detailed info in the ReadMe files)

-------

For DP vs SP: is it possible to simulate DP using several SP operations/instructions? (That way the app would run on a wider range of GPUs; see the sketch after this post.)
http://www.mersenneforum.org/showpost.php?p=359046&postcount=14

- ALF - "Find out what you don't do well ..... then don't do it!" :)
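A minimal, hypothetical sketch of the "double-single" trick the mersenneforum post describes: a DP-like value is stored as two floats (hi + lo) and manipulated with SP operations only. This is the standard Dekker/Knuth construction, not anything from the A@H code, and it yields roughly 44-48 bits of precision rather than the 53 bits of a true double.

[code]
/* "Double-single": emulate near-double precision with pairs of floats.
   Sketch of the standard Dekker/Knuth construction, not A@H code.     */
typedef struct { float hi, lo; } dsfloat;

__device__ dsfloat ds_add(dsfloat a, dsfloat b)
{
    /* Knuth's two-sum: s + e equals a.hi + b.hi exactly. */
    float s = a.hi + b.hi;
    float v = s - a.hi;
    float e = (a.hi - (s - v)) + (b.hi - v);
    e += a.lo + b.lo;              /* fold in the low-order parts */
    dsfloat r;
    r.hi = s + e;                  /* renormalize (fast two-sum)  */
    r.lo = e - (r.hi - s);
    return r;
}

__device__ dsfloat ds_mul(dsfloat a, dsfloat b)
{
    /* The fused multiply-add recovers the exact SP product error. */
    float p = __fmul_rn(a.hi, b.hi);
    float e = __fmaf_rn(a.hi, b.hi, -p);     /* exact error of p  */
    e += a.hi * b.lo + a.lo * b.hi;          /* cross terms       */
    dsfloat r;
    r.hi = p + e;
    r.lo = e - (r.hi - p);
    return r;
}
[/code]

Note that __fmaf_rn needs a Fermi-or-later GPU (compute capability 2.0+); on older cards the product error has to be recovered by splitting the operands instead.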
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0

> GeForce GTX780 to Tesla K20 (have some links to other older mods)

:)

BOINC FAQ Service | Official BOINC wiki | Installing BOINC on Linux
Joined: 19 Jun 12 | Posts: 21 | Credit: 107,293,560 | RAC: 0
Last modified: 18 Nov 2013, 20:50:53 UTC

Thanks for pointing out the 780Ti DP speed. I thought it was like the Titan, and it is not.

OK. So let's be honest... if we're porting Asteroids to GPU, the only way to go is OpenCL! That of course takes into account that all NVIDIA GPUs except the Titan suck at DP... So there is no need to waste energy porting the app to CUDA when we already know that Asteroids will be dominated by ATI/AMD GPUs, because those are less crippled regarding DP performance.

Face the facts: Asteroids will be a "second" Milkyway@home regarding GPU dominance. AMD/ATI GPUs will dominate and NVIDIA will be left biting the dust.

Join BOINC United now!
Joined: 26 Jan 13 | Posts: 31 | Credit: 1,546,412 | RAC: 224

> Ok. So let's be honest... If we're porting asteroids to gpu, the only way to go is opencl! That will of course take into account that all nvidia gpus except the titan suck at DP...

You are porting Asteroids to OpenCL? Great; first the CUDA app by HA-Soft, now your OpenCL app...
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0

Take your time on the OpenCL app, Crunch3r, we won't need it until next week ;)

In the meantime, here is a comparison chart to show just how badly NVIDIA sucks at DP compared to AMD. I did not include 8xxx models from AMD because they apparently have crippled DP performance.

 brand  | model   | price   | peak DP   | TDP     | GFlops/$  | GFlops/watt
        |         | ($ CDN) | (GFlops)  | (watts) |           |
--------+---------+---------+-----------+---------+-----------+-------------
 NVIDIA | 560Ti   |     189 | 104 - 162 | 170-210 | .55 - .86 | .61 - .77
        | 660Ti   |     265 | 103       | 150     | .39       | .69
        | 780Ti   |     760 | 230       | 250     | .30       | .92
        | Titan   |   1,030 | 1300-1500 | 250     | 1.26-1.46 | 4.12 - 6.0
        | Tesla   |   3,499 | 1173      | 225     | .34       | 5.2
--------+---------+---------+-----------+---------+-----------+-------------
 AMD    | HD 7730 |       ? | 44.8      | 47      | ?         | .95
        | HD 7750 |     125 | 57.6      | 75      | .46       | .77
        | HD 7790 |     150 | 128       | 85      | .85       | 1.5
        | HD 7850 |     200 | 110       | 130     | .55       | .85
        | HD 7950 |     300 | 717       | 200     | 2.39      | 3.6
        | HD 7970 |     400 | 947       | 250     | 2.37      | 3.8
        | HD 7990 |     800 | 1894      | 375     | 2.37      | 5.1

Performance data for both brands is excerpted from Wikipedia, which claims the figures come directly from manufacturer literature. See http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units#Southern_Islands_.28HD_7xxx.29_Series for their AMD data, which includes many more models than I have listed as well as more data for each model (e.g. die size, fab size). Their data for NVIDIA is at the link I gave in my previous post in this thread.

Retail prices are from newegg.ca. You will find that prices vary considerably between manufacturers; the prices in the chart are roughly halfway between the highest and the lowest price at newegg.ca for any given GPU.

It's pretty obvious AMD kicks NVIDIA butt severely on DP performance. I will be happy to run a few CUDA tasks for A@H for initial testing and debugging, but over the long run my NVIDIA GPU will stay at GPUgrid where it is a good tool for the job. The right tool for the job A@H wants to do is clearly AMD. If A@H comes up with an OpenCL app I will definitely invest in a good AMD card, probably a 7970 or 7990, and devote it to A@H exclusively.

BOINC FAQ Service | Official BOINC wiki | Installing BOINC on Linux
Joined: 1 Apr 13 | Posts: 37 | Credit: 153,496,537 | RAC: 0
Last modified: 23 Nov 2013, 21:10:04 UTC

> Thanks for pointing out the 780Ti DP speed. I thought it was like the Titan, and it is not.

Yes. The best DP/SP ratio (1/4) is provided only by the relatively cheap Tahiti GPUs from AMD (professional cards excluded). The Hawaii chip of the new R200 series is crippled again (1/8)!

My 7870 Boost Edition (Tahiti LE) is best utilized by the Milkyway OpenCL apps (100% usage, especially under Linux!). The strength of the double-precision Tahiti GPUs shows up in the first pages (!) of the top hosts lists here:
http://milkyway.cs.rpi.edu/milkyway/top_hosts.php
http://einstein.phys.uwm.edu/top_hosts.php

Accordingly, it is good to hear that the A@H GPU apps are making progress.
Joined: 24 Aug 13 | Posts: 111 | Credit: 31,709,843 | RAC: 3,000
Last modified: 16 Mar 2014, 17:25:22 UTC

And for older second-hand cards, the 5800s and 6900s perform very well for DP tasks too. The 5800s are relatively cheap on eBay as well :).

The HD 5850's DP is 417 GFLOPS and the 5870's is 544 GFLOPS; the 6950's is 563 GFLOPS and the 6970's is 675 GFLOPS.

Team AnandTech - SETI@H, Muon1 DPAD, Folding@H, MilkyWay@H, Asteroids@H, LHC@H, POGS, Rosetta@H, Einstein@H, DHPE & CPDN
Main rig - Ryzen 3600, 32GB DDR4 3200, RX 580 8GB, Win10
2nd rig - i7 4930k @4.1 GHz, 16GB DDR3 1866, HD 7870 XT 3GB(DS), Win7
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0

Interesting. The older cards might be the most cost-effective at A@H if you consider only purchase price. If you include the cost of electricity to operate them, you might get a different picture. If you operate one long enough you'll reach a point beyond which you end up paying more per task than if you had just saved your money until you could afford a newer, more efficient model.

I don't know how long it takes to reach that point, but it's a pretty simple system-of-simultaneous-equations type of problem, basically high school math. Anyone care to have a go at it? The simple scammy-hash way or Gauss-Jordan elimination?

BOINC FAQ Service | Official BOINC wiki | Installing BOINC on Linux
Joined: 24 Aug 13 | Posts: 111 | Credit: 31,709,843 | RAC: 3,000
Last modified: 17 Mar 2014, 18:54:16 UTC

Yeah, I did some maths like that for running MW: I compared my then 4870 to the 5850, 5870 and 6950.

Vs the 4870, I estimated the 5850 to be ~75% faster (based on DP GFLOPS; that was about right) and cheaper to run! Saving me an estimated £36.50/yr (based on 6 months GPU crunching, 6 months idle). eBay prices were £60-70 (14/12/12).

The 5870 I estimated would have been ~127% faster and would have cost about the same to run as the 4870. eBay prices were £75-100, mostly about £90 ish AFAIR.

And the 6950 would have been about 135% faster than the 4870 and slightly cheaper to run, largely due to its lower idle power; loaded power was a little bit more. Again based on 6 months on, 6 off. Saving about £16.50/yr. eBay prices were £100-120.

I did also compare the 5870 to the 6950 and estimated it would have cost me ~£13/yr more to run the 5870. So it would have been pointless to buy that one after ~2 yrs of crunching. Electricity prices based on 14p/unit (Southern Electric).

So the 6950 would have been the most efficient choice, but I just couldn't justify spending that much on a graphics card at the time, so I bought the 5850. With the longer-term aim of putting that card in my 2nd rig and then getting a 6950, or maybe a 6970. Oh, and the 5850 cost me £65, so not a bad price :) ..... I keep forgetting to sell the 4870! lol.

Team AnandTech - SETI@H, Muon1 DPAD, Folding@H, MilkyWay@H, Asteroids@H, LHC@H, POGS, Rosetta@H, Einstein@H, DHPE & CPDN
Main rig - Ryzen 3600, 32GB DDR4 3200, RX 580 8GB, Win10
2nd rig - i7 4930k @4.1 GHz, 16GB DDR3 1866, HD 7870 XT 3GB(DS), Win7
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0
Last modified: 18 Mar 2014, 1:01:14 UTC

Those numbers look like they might be right, but since you didn't show your work you get only half marks ;-)

It's easy to see that you are at the point (or soon will be) where, if you add the purchase price of the cheaper card to the cost of the electricity you've purchased to operate it, you've spent the equivalent of the purchase price of the more expensive card. In other words, you spent (or soon will have spent) the money but don't own the faster and more efficient card. Furthermore, if you happened to have used a credit card to buy the GPU and paid interest, then you're losing money faster than a drunken sailor on shore leave in Dublin.

It goes something like this... Define TCO (total cost of owning) to be purchase price + cost of operating. Define * to mean multiplication.

You buy an old video card for $40 and a new one for $265. Let's call the old one A and the new one B. A requires 100 watts to operate, B requires 65. You plug them both in and start crunching with both cards. At that point in time TCO_B (the TCO of B) is much higher than TCO_A. However, since A costs more to operate per hour than B, it follows that at some point in the future TCO_A will catch up to TCO_B. In other words, at some point in the future we will have the condition TCO_A = TCO_B.

Obviously,

TCO_A = p_A + (oc_A * t)

where p_A is A's purchase price, (oc_A * t) is the cost of operating over t amount of time, and oc_A is the operating cost of A per unit of time. Similarly,

TCO_B = p_B + (oc_B * t)

So we can write

p_A + (oc_A * t) = p_B + (oc_B * t)

then solve for t to get the operating time at which TCO_A = TCO_B. If you continue to operate A beyond that amount of time, then TCO_A will become larger than TCO_B in spite of A's initially lower purchase price.

Of course there is at least one deficiency in the above model: it doesn't tell us which one, A or B, is the cheaper bang for the buck with respect to total cost per task over time. Obviously, if one wanted to crunch just 1 task then A is the less expensive option. Same if we wanted to crunch just 2 tasks, or maybe even 4, but what about 100 tasks or 500 tasks? When does the total cost (purchase price + electricity) of A per task become equal to the cost of B per task? Can anyone develop the model further to answer that question? (One attempt is sketched after this post.)

BOINC FAQ Service | Official BOINC wiki | Installing BOINC on Linux
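A minimal sketch of the model above, plus one way to answer the closing question. Solving TCO_A = TCO_B gives t = (p_B - p_A) / (oc_A - oc_B). For cost per task, write cost_per_task = (p + oc*t) / (r*t), where r is tasks per hour; setting A's and B's expressions equal and solving gives t = (r_A*p_B - r_B*p_A) / (r_B*oc_A - r_A*oc_B). The electricity price ($0.14/kWh) and the task rates r_A, r_B below are made-up illustrative values, not measurements.

[code]
#include <stdio.h>

int main(void)
{
    /* Cards A and B from the post: $40 at 100 W vs $265 at 65 W.  */
    /* Operating cost per hour = kW * assumed $0.14/kWh.           */
    double p_A = 40.0,  oc_A = 0.100 * 0.14;
    double p_B = 265.0, oc_B = 0.065 * 0.14;

    /* TCO_A = TCO_B  =>  t = (p_B - p_A) / (oc_A - oc_B)          */
    double t_tco = (p_B - p_A) / (oc_A - oc_B);
    printf("equal TCO after %.0f hours (%.1f years)\n",
           t_tco, t_tco / 8766.0);

    /* Cost per task after t hours: (p + oc*t) / (r*t).            */
    /* Setting A's expression equal to B's and solving for t:      */
    double r_A = 2.0, r_B = 5.0;   /* hypothetical tasks per hour  */
    double t_cpt = (r_A * p_B - r_B * p_A) / (r_B * oc_A - r_A * oc_B);
    printf("equal cost per task after %.0f hours (%.1f years)\n",
           t_cpt, t_cpt / 8766.0);
    return 0;
}
[/code]

With these made-up rates, the faster card reaches cost-per-task parity long before raw TCO parity, which is the intuition the post is driving at: throughput belongs in the model, not just watts and purchase price.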
Joined: 24 Aug 13 | Posts: 111 | Credit: 31,709,843 | RAC: 3,000
Last modified: 18 Mar 2014, 19:00:30 UTC

Not me, I just managed to keep up with you! ;)

> It's easy to see that you are at the point (or soon will be) where if you add the purchase price of the cheaper card to the cost of the electricity you've purchased to operate it, you've spent the equivalent of the purchase price of the more expensive card

Err, no, I guess you've misread my post, as out of the 4 cards I was comparing (4870, 5850, 5870 and 6950) I bought the 5850, which was the cheapest card to run as well as the cheapest to buy of the 3 newer cards I was looking at. Oh, and lol @ 1/2 marks, is that still enough for a silver star? ;)

I didn't want to fill out my post with too many numbers, but here are the ones for the 4870 vs the 5850. For power estimates I used AnandTech's 'Bench' and their various reviews of said cards. Loaded, the 5850 uses 14 W less than the 4870 in Metro, 28 W less in Crysis, and 23 W less in OCCT, giving an average loaded power usage of ~21 W less. The 5850's idle power is 40 W less (I think I got that from the Bench too, but I didn't record it).

So 6 months with the GPU idling (ignoring the occasional game play :P) means 40 W less power draw. At 14p/kWh: 14(p) x 0.04 = 0.56p/hr; x 24(hrs) = 13.44p/day; x 365/2 = £24.53 saving per 6 months. (I originally rounded the daily saving to 13p/day, which gave me £23.73.)

With the GPU crunching for 6 months using 21 W less: 7.056p/day* = £12.88 saving per 6 months. (*I rounded to 7p originally.)

Total yearly saving £37.41, approximately, of course ;).

These are the actual power figures I got:

PC as per sig but with the 4870 (stock speeds):
Running F@H (all 4 cores) & MW on GPU: 265-272 W
F@H CPU only: 218-220 W
MW on GPU, CPU 'idle': 209-211 W
Idle: 150 W

Same rig with the 5850 (stock speeds):
Running F@H (all 4 cores) & MW on GPU: 255-259 W
F@H CPU only: 175 W
MW on GPU, CPU 'idle': 192-194 W
Idle: 105-106 W
(F@H 3 cores, MW on GPU: 250 W)

Power measured at the wall. So loaded power is a bit more than I estimated and idle is a bit less! lol. I've not worked out the actual cost difference, but it's much faster (246 s per long 213.76 WU vs 444 s for the 4870, times for GPU near 100% load) and does indeed use less power :).

Team AnandTech - SETI@H, Muon1 DPAD, Folding@H, MilkyWay@H, Asteroids@H, LHC@H, POGS, Rosetta@H, Einstein@H, DHPE & CPDN
Main rig - Ryzen 3600, 32GB DDR4 3200, RX 580 8GB, Win10
2nd rig - i7 4930k @4.1 GHz, 16GB DDR3 1866, HD 7870 XT 3GB(DS), Win7
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0

Ahh, there's the work; your gold star is in the mail ;-) Yes, I see now what you meant. I misinterpreted, probably read too fast or something.

Actually, there is another deficiency in the model I presented. In the model the newer card, B, uses less power than the older, cheaper card. That's rarely the case in the real world. What is true is that newer cards based on smaller lithography use less power to produce the same amount of work. Hence the need for a better model. I'm working on it, anybody else close?

BOINC FAQ Service | Official BOINC wiki | Installing BOINC on Linux
Joined: 24 Aug 13 | Posts: 111 | Credit: 31,709,843 | RAC: 3,000

Lol. Re working out cost per credit, wouldn't my MW benchmark thread help with that?

Team AnandTech - SETI@H, Muon1 DPAD, Folding@H, MilkyWay@H, Asteroids@H, LHC@H, POGS, Rosetta@H, Einstein@H, DHPE & CPDN
Main rig - Ryzen 3600, 32GB DDR4 3200, RX 580 8GB, Win10
2nd rig - i7 4930k @4.1 GHz, 16GB DDR3 1866, HD 7870 XT 3GB(DS), Win7
Joined: 16 Aug 12 | Posts: 293 | Credit: 1,116,280 | RAC: 0
Last modified: 20 Mar 2014, 0:43:21 UTC

I dunno, I've never read that thread, and if it's at the Milkyway forums I likely never will (I have no use for their gong show, skank admins and mods, and if they have a screensaver it sucks too). Maybe it helps for the cards mentioned. Maybe it applies to Milkyway only, I dunno. I would like a model/formula that applies to every GPU and every project. Why? Well, why not?

BOINC FAQ Service | Official BOINC wiki | Installing BOINC on Linux