AMD Bulldozer FMA4 app
Author | Message |
---|---|
Send message Joined: 28 Apr 13 Posts: 87 Credit: 26,739,719 RAC: 597 |
First fma4 wu validated. A10-7700 8,764.78, i7-3770 11,695.72 https://asteroidsathome.net/boinc//workunit.php?wuid=16225158 Looks like the app has no checkpointing; after an nVidia lockup I had to restart the PC, and the wu started from 0% but with the time already used. 5 more waiting for validation. |
Send message Joined: 23 Oct 12 Posts: 18 Credit: 152,074 RAC: 963 |
"First fma4 wu validated." That is strange. I am fairly sure that mine checkpointed... |
Send message Joined: 28 Apr 13 Posts: 87 Credit: 26,739,719 RAC: 597 |
Some more validated (A10-7700 time, wingman time, wingman host): 8,728 vs. 5,830 (i5 @ 3.4 GHz, Win7, sse3); 8,285 vs. 8,817 (i7 2600 @ 3.4 GHz, sse2); 7,629 vs. 16,054 (E5-2650 @ 2 GHz, sse2). Earlier wu's were running together with onboard-gpu wu's. My earlier avx wu's: 9,231; 11,216; 10,582; 10,292; 4,062; 8,310. My earlier sse2 wu's: 9,154; 10,370; 9,117. |
Send message Joined: 23 Oct 12 Posts: 18 Credit: 152,074 RAC: 963 |
|
Send message Joined: 28 Apr 13 Posts: 87 Credit: 26,739,719 RAC: 597 |
"Not bad. Your tasks are running a little faster than the i7's." Not really. If I compare the results against my i7 (hostid=88982), I see the avx app finishing faster there (5,671 .. 6,332), but fma4 seems to be faster than sse or avx on FM2+ APUs. And this is the smaller of the two available A10s. All in all I would say it's an advance. Thanks to Crunch3r! |
Send message Joined: 23 Oct 12 Posts: 18 Credit: 152,074 RAC: 963 |
"Not bad. Your tasks are running a little faster than the i7's." I see. By the way, the app is checkpointing for me. |
Send message Joined: 28 Apr 13 Posts: 87 Credit: 26,739,719 RAC: 597 |
Yes, you are right, the problem I had was caused by a faulty GTX430, which made my system shut down without saving anything. I have now switched between avx and fma4 wu's under the same circumstances (one Einstein GPU wu also running); the fma4 wu's seem to be faster. Alexander |
Send message Joined: 19 Jun 12 Posts: 21 Credit: 107,293,560 RAC: 0 |
OK, for anyone who wants to try the BD fma4 app, here is the link. http://www.boincunited.org/period_search_10210_windows_x86_64_bd_fma4_gcc.zip It uses the anonymous platform, and the only thing to do is to copy it to the project directory. I won't go into specifics on how to install the app, since only experienced BOINC users should have a go at it. Join BOINC United now! |
Send message Joined: 19 Jun 12 Posts: 221 Credit: 623,640 RAC: 0 |
For me the link gives no_hotlink.gif http://www.boincunited.org/period_search_10210_windows_x86_64_bd_fma4_gcc.zip - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Send message Joined: 23 Oct 12 Posts: 18 Credit: 152,074 RAC: 963 |
|
Send message Joined: 19 Jun 12 Posts: 221 Credit: 623,640 RAC: 0 |
I wanted to get this app to include it in a Benchmark package: http://asteroidsathome.net/boinc/forum_thread.php?id=306 Use it to determine the relative speed of different applications on the same WUs. (I have neither a CPU with FMA4 nor 64-bit Windows to test it, but it's included in the Benchmark package.) I noticed people here look at the CPU time of completed tasks to 'measure' the speed of the app, but this is hard, as different WUs can have very different CPU times on the same hardware using the same app: http://asteroidsathome.net/boinc/results.php?hostid=110&offset=0&show_names=0&state=4&appid= - ALF - "Find out what you don't do well ..... then don't do it!" :) |
Send message Joined: 28 Apr 13 Posts: 87 Credit: 26,739,719 RAC: 597 |
What one can do is compare his fma4 results against the wingmen; this gives a better impression of the performance. My computers are not hidden; feel free to check the results. |
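A minimal sketch of such a wingman comparison, assuming the figures posted earlier in this thread for the A10-7700 are run times in seconds and that "7.629" means 7,629 s. The function name and the pairing of values are illustrative, not part of any BOINC tool.

```python
# Run times as quoted earlier in the thread: (wingman host, own time, wingman time).
# Assumption: values are seconds; "7.629" is read as 7,629 s.
PAIRS = [
    ("i5 @ 3.4 GHz, sse3", 8728.0, 5830.0),
    ("i7 2600 @ 3.4 GHz, sse2", 8285.0, 8817.0),
    ("E5-2650 @ 2 GHz, sse2", 7629.0, 16054.0),
]


def wingman_ratios(pairs):
    """Return (label, own_time / wingman_time) for each shared workunit."""
    return [(label, own / wingman) for label, own, wingman in pairs]


if __name__ == "__main__":
    for label, ratio in wingman_ratios(PAIRS):
        verdict = "faster" if ratio < 1.0 else "slower"
        print(f"vs {label}: ratio {ratio:.2f} ({verdict} than wingman)")
```

Because both hosts crunch the same workunit, the ratio removes the WU-to-WU variation, but it still mixes in the wingman's clock speed, app version and load.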
Send message Joined: 3 Jan 13 Posts: 30 Credit: 1,705,200 RAC: 0 |
After receiving the link to the FMA4 app from Crunch3r at the beginning of last week by PM - thank you! - and after the end of the 2014 Pentathlon :) I have run a few dozen workunits on my FX-8350 and they all validated ok. Compared to wingmen there is some indication of speedup, but as BilBg already pointed out it's hard to compare due to the differences between WUs and due to the fact that you don't know the exact settings of the wingman's computer (such as clock speed, number of threads used per CPU, throttling, hyperthreading on/off, other programs running etc.). Hence I'll also try to get some results with BilBg's bench package posted yesterday. One question concerning AVX and FMA4 on the Bulldozer: Do these instruction sets benefit from using both 128bit FPUs of one module exclusively? In that case there should be a difference between running one thread per core (i.e. 8 threads on an FX-8xxx) and one thread per module (i.e. 4 threads on an FX-8xxx), right? |
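One way to frame the per-core versus per-module question is by throughput rather than by time per task. The sketch below is only arithmetic under assumed timings; the numbers in the usage section are hypothetical, not measurements from this thread.

```python
def tasks_per_hour(threads: int, seconds_per_task: float) -> float:
    """Throughput of a host running `threads` tasks concurrently."""
    return threads * 3600.0 / seconds_per_task


def better_setting(t_per_core: float, t_per_module: float,
                   cores: int = 8, modules: int = 4) -> str:
    """Compare one thread per core against one thread per module.

    t_per_core:   seconds per task with `cores` tasks running at once
    t_per_module: seconds per task with `modules` tasks running at once
    """
    per_core = tasks_per_hour(cores, t_per_core)
    per_module = tasks_per_hour(modules, t_per_module)
    return "one thread per module" if per_module > per_core else "one thread per core"


if __name__ == "__main__":
    # Hypothetical timings: 9,000 s per task with 8 threads, 4,000 s with 4 threads.
    print(better_setting(9000.0, 4000.0))
```

With 8 cores in 4 modules, running one thread per module only pays off if each task finishes in less than half the time it needs with all 8 threads busy.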
Send message Joined: 3 Jan 13 Posts: 30 Credit: 1,705,200 RAC: 0 |
Last modified: 22 May 2014, 9:13:06 UTC |
Send message Joined: 1 Jan 14 Posts: 302 Credit: 32,739,410 RAC: 3,727 |
Last modified: 22 May 2014, 10:09:49 UTC "After receiving the link to the FMA4 app from Crunch3r at the beginning of last week by PM - thank you! - and after the end of the 2014 Pentathlon :) I have run a few dozen workunits on my FX-8350 and they all validated ok. Compared to wingmen there is some indication of speedup, but as BilBg already pointed out it's hard to compare due to the differences between WUs and due to the fact that you don't know the exact settings of the wingman's computer (such as clock speed, number of threads used per CPU, throttling, hyperthreading on/off, other programs running etc.). Hence I'll also try to get some results with BilBg's bench package posted yesterday." The idea of benchmark units has been around for a very long time. The idea was to get everyone to run one the first time they sign up, but since it would not give any credits, or minimal ones at best, it never caught on. If it did give credits, people could cheat by just returning it over and over again, like they used to do in the bad old days when cheating was rampant! All that being said, I think BilBg's idea is a good one, as it is only for those interested in running it, not mandatory for everyone. It can provide valuable data from those who wish to run it. |
Send message Joined: 3 Jan 13 Posts: 30 Credit: 1,705,200 RAC: 0 |
A first run of benchmarks is finished.
- I used two of the not shortened WUs from BilBg's bench package (input_22147_73.wu and input_22152_83.wu) and calculated the average of the two elapsed-time speedups.
- The CPU is an AMD FX-8350 (Piledriver) running at 4.0 GHz (no turbo, no throttling).
- No other CPU-intensive tasks were running.
- The reference app (baseline) was period_search_10210_windows_intelx86__sse2.exe.
Results:
- 32bit plain: -99.8%
- 32bit SSE2: +2.8% (same as reference, only for control)
- 32bit SSE3: +8.4%
- 32bit AVX: -105.0%
- 64bit SSE2: +16.6%
- 64bit SSE3: +16.0%
- 64bit AVX: -19.3%
- 64bit FMA4: +22.9%
This confirms again that the AVX app is not suited at all for the AMD FX and that the SSE3 app has little or no advantage over SSE2 for that processor. But it shows that Crunch3r's FMA4 app has a significant speedup over the fastest stock app (64bit SSE2). Quite surprising to me is the result for the 32bit AVX app. It's as slow as the plain app and much slower than the 64bit variant. Can anybody confirm this? |
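The post does not state how the percentages were computed. A plausible reading, given that some values fall below -100%, is speedup = (t_ref - t_app) / t_ref, so an app needing about twice the reference time scores about -100%. The sketch below assumes that definition; the function names and the example timings are hypothetical.

```python
from statistics import mean


def speedup_percent(t_ref: float, t_app: float) -> float:
    """Elapsed-time speedup of an app relative to the reference app, in percent.

    Assumed definition: (t_ref - t_app) / t_ref * 100.
    Positive means faster than the reference, negative means slower.
    """
    return (t_ref - t_app) / t_ref * 100.0


def average_speedup(times_ref, times_app) -> float:
    """Average the per-workunit speedups, as described for the two test WUs."""
    return mean(speedup_percent(r, a) for r, a in zip(times_ref, times_app))


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds for two workunits.
    ref = [10000.0, 12000.0]
    app = [7700.0, 9264.0]
    print(f"average speedup: {average_speedup(ref, app):+.1f}%")
```

Under this definition, a result like the 32bit AVX figure would mean the app takes roughly 2.05 times as long as the reference.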
Send message Joined: 3 Jan 13 Posts: 30 Credit: 1,705,200 RAC: 0 |
During the last week I made another benchmark, this time under a more realistic setting:
- I used ten WUs from the current batch (well, last week's batch to be more specific: 150893_1, 150893_12, 150893_2, 150893_28, 150893_29, 150893_3, 150893_30, 150893_31, 150894_4 and 150894_5) and calculated the average of the elapsed-time speedups again. The minimum and maximum speedups are also included below.
- The FX-8350 was again running at 4.0 GHz (no turbo, no throttling).
- Within BOINC another three tasks with Crunch3r's FMA4 app were running concurrently (using an app_config.xml and the 'mode noBS' switch of the benchmark package).
- This time the reference app was period_search_10210_windows_x86_64__sse2.exe, the fastest stock app from the first run, so the baseline was 'higher' than in the first benchmark run.
By using ten test WUs and running three BOINC WUs concurrently, I guess I got some more realistic figures here. One thing to note is that some workunits tend to run a bit faster with the SSE3 app while others are faster with the SSE2 app. I noticed the same during another benchmark with one of my Intels (Ivy Bridge i7). However, in both cases the differences are minimal, so it doesn't matter much if you run 64bit SSE2 or SSE3. YMMV.
Results:
- 32bit plain: -130.33% avg. (max. -124.79%; min. -135.50%)
- 64bit SSE3: +0.13% avg. (max. +1.82%; min. -1.56%)
- 64bit AVX: -32.97% avg. (max. -31.17%; min. -34.81%)
- 64bit FMA4: +10.91% avg. (max. +12.35%; min. +9.68%)
Again a significant speedup of approx. 10% with the FMA4 app and no big difference between 64bit SSE2 and 64bit SSE3. AVX is out of the game again. |
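Since the two runs use different baselines, their FMA4 figures are not directly comparable. A rough sketch of how a first-run figure could be re-expressed against the 64bit SSE2 app, assuming the speedup is defined as (t_ref - t_app) / t_ref (the thread does not state the formula) and treating the averaged percentages as if they came from a single workunit; `rebase_speedup` is a hypothetical helper.

```python
def rebase_speedup(app_pct: float, new_ref_pct: float) -> float:
    """Re-express a speedup measured against an old baseline relative to a new one.

    app_pct:     speedup of the app versus the old reference, in percent
    new_ref_pct: speedup of the new reference app versus the old reference, in percent
    Assumes speedup = (t_ref - t_app) / t_ref, so time = t_ref * (1 - pct / 100).
    """
    t_app = 1.0 - app_pct / 100.0          # app time as a fraction of the old reference time
    t_new_ref = 1.0 - new_ref_pct / 100.0  # new reference time as the same kind of fraction
    return (1.0 - t_app / t_new_ref) * 100.0


if __name__ == "__main__":
    # First-run figures from this thread: 64bit FMA4 +22.9%, 64bit SSE2 +16.6%,
    # both measured against the 32bit SSE2 app.
    print(f"FMA4 vs 64bit SSE2: {rebase_speedup(22.9, 16.6):+.1f}%")
```

Under these assumptions the first run corresponds to roughly +7.6% for FMA4 over 64bit SSE2, which is in the same range as the +10.91% measured in the second run with different WUs and three concurrent tasks.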
Send message Joined: 23 Oct 12 Posts: 18 Credit: 152,074 RAC: 963 |
"During the last week I made another benchmark, this time under a more realistic setting:" Nice, I hope the project's admins test the app and make it official. |
Send message Joined: 23 Oct 12 Posts: 18 Credit: 152,074 RAC: 963 |
|