Could you compile an AVX2-128bit application for AMD's Ryzen?

Message boards : Problems and bug reports : Could you compile an AVX2-128bit application for AMD's Ryzen?

Opteron
Joined: 13 May 13
Posts: 3
Credit: 27,120
RAC: 0
Message 5173 - Posted: 6 Mar 2017, 17:20:29 UTC
Hi all,

First, I have to thank everybody for offering so many applications for different CPU instruction sets. That is a very good service!

I just wonder if you could improve that service even more ;)

AMD's new CPU architecture is out, and the big difference from Intel's is the handling of AVX instructions. AMD's FPUs are only 128 bits wide, unlike Intel's 256-bit units.

So the decoder has to crack each 256-bit instruction into two internal 128-bit ones, which costs additional time and produces overhead. Especially when running 2 threads on one core (AMD has SMT now, too), the front end is heavily loaded.
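To make the splitting argument concrete, here is a minimal counting sketch. It assumes a deliberately simplified model in which an instruction wider than the FPU datapath is split into ceil(instruction width / datapath width) internal operations. The function names are made up for illustration, and the model ignores decode cost, latency, and everything else about the real core.

```python
# Simplified counting model (an assumption, not a description of real Zen
# internals): a vector instruction wider than the FPU datapath is split into
# ceil(instr_bits / datapath_bits) internal operations.


def instructions_needed(total_bits: int, instr_bits: int) -> int:
    """Number of vector instructions needed to process total_bits of data."""
    return -(-total_bits // instr_bits)  # ceiling division


def internal_ops(total_bits: int, instr_bits: int, datapath_bits: int) -> int:
    """Number of internal operations after wide instructions are split."""
    ops_per_instruction = -(-instr_bits // datapath_bits)
    return instructions_needed(total_bits, instr_bits) * ops_per_instruction


if __name__ == "__main__":
    data_bits = 1024 * 64  # 1024 double-precision values
    for width in (128, 256):
        print(
            f"{width}-bit instructions:",
            instructions_needed(data_bits, width), "instructions,",
            internal_ops(data_bits, width, datapath_bits=128), "internal ops",
        )
```

In this model, 256-bit code needs half as many instructions but ends up as the same number of internal operations on a 128-bit datapath. Whether the extra splitting work in the decoder outweighs the smaller instruction count is exactly the open question here, and the sketch cannot answer it.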

AMD's previous architecture - Bulldozer - had the same problem, but the Bulldozer application is not usable, because it also uses FMA4 instructions, which Zen no longer supports :(

I guess you use GCC as the compiler. It has a switch to limit the AVX width to 128 bits:

"-mprefer-avx128
This option instructs GCC to use 128-bit AVX instructions instead of 256-bit AVX instructions in the auto-vectorizer."

If you would also like to use Zen's other supported extensions, then please add these, too:

-mavx2
-mcx16
-mmovbe
-mf16c
-mpopcnt
-mbmi
-mbmi2
-mclzero
-mclflushopt
-mprefer-avx128

(I hope I found them all).
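For anyone who wants to try this locally, the flag list above can be turned into a full GCC command line with a small helper. This is only a sketch: the `-O3` level and the file names `app.c` and `app_avx128` are placeholders chosen for illustration, not the project's real build settings.

```python
import shlex
from typing import List

# Flags copied from the list above. The optimization level and the file
# names used below are placeholders, not the project's actual build setup.
ZEN_FLAGS = [
    "-mavx2",
    "-mcx16",
    "-mmovbe",
    "-mf16c",
    "-mpopcnt",
    "-mbmi",
    "-mbmi2",
    "-mclzero",
    "-mclflushopt",
    "-mprefer-avx128",
]


def gcc_command(source: str, output: str, opt_level: str = "-O3") -> List[str]:
    """Build a GCC argument list that enables the Zen extensions listed above."""
    return ["gcc", opt_level, *ZEN_FLAGS, "-o", output, source]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(gcc_command("app.c", "app_avx128")))
```

Keeping the flags in one list makes it easy to drop a single extension and rebuild if one of them turns out to cause trouble.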

There is also a generic "znver1" option (-march=znver1), but in some tests, builds made with it performed worse. Thus I would use only the individual instruction-extension flags.

It might cost a tiny bit of IPC, but since the core uses SMT, it wouldn't be wasted.

Thanks a lot

Alex
ID: 5173 · Rating: 0
Crunch3r
Joined: 19 Jun 12
Posts: 21
Credit: 107,293,560
RAC: 0
Message 5209 - Posted: 9 Mar 2017, 16:11:00 UTC - in response to Message 5173.  

Last modified: 9 Mar 2017, 16:11:24 UTC
Hi,

This one is for 64-bit Linux.

http://www.boincunited.org/opt_apps/period_search_10210_x86_64-pc-linux-gnu__avx128.tar.bz2

If that one works, I'll compile one for Windows too.

Join BOINC United now!
ID: 5209 · Rating: 0
Opteron
Joined: 13 May 13
Posts: 3
Credit: 27,120
RAC: 0
Message 5227 - Posted: 10 Mar 2017, 19:15:50 UTC

Last modified: 10 Mar 2017, 19:20:17 UTC
Thanks, but the link does not work, I get a "no hotlink" warning :(

http://www.boincunited.org/images/no_hotlink.gif

Edit: Works with MS IE, no clue why.
ID: 5227 · Rating: 0
Crunch3r
Joined: 19 Jun 12
Posts: 21
Credit: 107,293,560
RAC: 0
Message 5236 - Posted: 12 Mar 2017, 16:48:41 UTC - in response to Message 5227.  
Thanks, but the link does not work, I get a "no hotlink" warning :(

http://www.boincunited.org/images/no_hotlink.gif

Edit: Works with MS IE, no clue why.


https://de.wikipedia.org/wiki/Hotlinking

Join BOINC United now!
ID: 5236 · Rating: 0
koschi
Joined: 23 Apr 16
Posts: 5
Credit: 14,169,651
RAC: 61,132
Message 5238 - Posted: 13 Mar 2017, 22:32:23 UTC
Thanks for the compile, unfortunately it seems not to work yet :-(

http://asteroidsathome.net/boinc/result.php?resultid=148531759

btw, the link works fine with wget or Chromium, no need for MS to be involved ;-)
ID: 5238 · Rating: 0
Crunch3r
Joined: 19 Jun 12
Posts: 21
Credit: 107,293,560
RAC: 0
Message 5246 - Posted: 14 Mar 2017, 19:44:24 UTC - in response to Message 5238.  
Sorry about that.

Here's another one to try.

http://www.boincunited.org/opt_apps/period_search_10210_x86_64-pc-linux-gnu__avx128_v2.tar.bz2

Join BOINC United now!
ID: 5246 · Rating: 0
koschi
Joined: 23 Apr 16
Posts: 5
Credit: 14,169,651
RAC: 61,132
Message 5247 - Posted: 14 Mar 2017, 21:17:54 UTC
Thanks! That one is working. WUs have been running for 2 minutes already, results in 2-3 hours...

What is the difference? Which flags did you use or skip in this one?
ID: 5247 · Rating: 0
koschi
Joined: 23 Apr 16
Posts: 5
Credit: 14,169,651
RAC: 61,132
Message 5248 - Posted: 14 Mar 2017, 23:21:55 UTC
http://asteroidsathome.net/boinc/results.php?hostid=312297

Might have to wait for some more WUs to complete, but so far it doesn't seem like an improvement over SSE3.
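Once more results are in, a rough way to put a number on "improvement" is to compare the mean run times of the two builds. The helper below is a minimal sketch. The function name is made up, and the values in the demo are placeholders, not real results from this host.

```python
from statistics import mean


def runtime_reduction(baseline_secs, candidate_secs):
    """Fraction by which the candidate's mean run time is below the baseline's.

    0.10 means the candidate needs 10% less time on average; a negative
    value means the candidate is slower.
    """
    base = mean(baseline_secs)
    cand = mean(candidate_secs)
    return (base - cand) / base


if __name__ == "__main__":
    # Made-up numbers purely to show the call, not measured results.
    sse3_runs = [9000.0, 9200.0, 8800.0]
    avx128_runs = [8900.0, 9100.0, 9000.0]
    print(f"{runtime_reduction(sse3_runs, avx128_runs):+.1%}")
```

With only a handful of results the mean is noisy, so a small difference either way probably does not mean much yet.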
ID: 5248 · Rating: 0
HA-SOFT, s.r.o.
Project developer
Project tester
Joined: 21 Dec 12
Posts: 176
Credit: 136,462,135
RAC: 0
Message 5254 - Posted: 15 Mar 2017, 21:50:42 UTC - in response to Message 5248.  
http://asteroidsathome.net/boinc/results.php?hostid=312297

Might have to wait for some more WUs to complete, but so far it doesn't seem like an improvement over SSE3.


And what speedup did you expect from the app?
ID: 5254 · Rating: 0
koschi
Joined: 23 Apr 16
Posts: 5
Credit: 14,169,651
RAC: 61,132
Message 5255 - Posted: 15 Mar 2017, 22:38:39 UTC
AMD powered miracles, nothing less :-D
ID: 5255 · Rating: 0
HA-SOFT, s.r.o.
Project developer
Project tester
Joined: 21 Dec 12
Posts: 176
Credit: 136,462,135
RAC: 0
Message 5258 - Posted: 16 Mar 2017, 8:00:01 UTC - in response to Message 5255.  
AMD powered miracles, nothing less :-D


So it's ok :-)

The compiler alone never brings any speedup, except when going from the 32-bit to the 64-bit version without SSE, AVX, or FMA.
ID: 5258 · Rating: 0
Opteron
Joined: 13 May 13
Posts: 3
Credit: 27,120
RAC: 0
Message 5263 - Posted: 16 Mar 2017, 18:51:01 UTC - in response to Message 5258.  
I had expected maybe around 10-20%. Compared to SSE, AVX uses a denser encoding scheme (the VEX prefix), which saves some bytes in the first decode stage. Especially with SMT, the front end is under load, so shorter instructions could help in that case.

However, it seems it doesn't change much. Maybe the µop buffer reduces the decoder's load significantly.

And then there's AMD's specialty: 128-bit AVX. The decoder has to split up 256-bit AVX instructions, which decreases performance. But even in that context, the µop buffer might help.

Anyway, it is certainly not a bad idea to spoon-feed the cores with appropriately compiled code. At least it will not decrease performance ;)
ID: 5263 · Rating: 0
koschi
Joined: 23 Apr 16
Posts: 5
Credit: 14,169,651
RAC: 61,132
Message 5264 - Posted: 17 Mar 2017, 0:31:41 UTC

Last modified: 17 Mar 2017, 0:44:41 UTC
Btw, not exactly AVX related, but since we have everyone on board already.
The news on http://www.numberworld.org/y-cruncher/ says Ryzen can run FMA4, even though the FMA4 feature flag is not set. I tried to run www.boincunited.org/opt_apps/period_search_10210_x86_64-pc-linux-gnu__fma4.tar.bz2, but it failed within a few seconds with SIGILL: illegal instruction.

Any other way to confirm this?
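One partial check on Linux is to look at the feature flags the kernel reports in /proc/cpuinfo. The sketch below parses that text and reports whether avx, avx2, fma and fma4 are advertised. The function names are made up for illustration. Note the limit of this approach: it only shows what the CPU advertises, so it cannot tell whether an unadvertised FMA4 instruction would still execute.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags, wanted=("avx", "avx2", "fma", "fma4")):
    """Map each wanted flag name to True if it is advertised, else False."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    # Linux only: the kernel exposes the advertised CPU features here.
    with open("/proc/cpuinfo") as f:
        print(report(cpu_flags(f.read())))
```

If fma4 shows up as False, that matches the "feature flag is not set" part of the claim. Confirming the rest would need a test program that actually executes an FMA4 instruction.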

I've seen you using GCC earlier somewhere, can you give 6.3 a try, maybe with -march=znver1 and -O3?
Have you tried to build with Clang?
As per the Phoronix results, it can make a substantial difference in some apps.
http://www.phoronix.com/scan.php?page=article&item=gcc-clang-ryzen
ID: 5264 · Rating: 0