Could you compile an AVX2-128bit application for AMDs Ryzen?
Author | Message |
---|---|
Send message Joined: 13 May 13 Posts: 3 Credit: 27,120 RAC: 0 |
Hi all, first I have to thank everybody for offering so many applications for different CPU instruction sets. That is a very good service! I just wonder whether you could improve that service even more ;) AMD's new CPU architecture is out, and the big difference from Intel's is the handling of AVX instructions. AMD's FPUs are just 128 bits wide, unlike Intel's 256-bit ones. So the decoder has to split each 256-bit instruction into two internal 128-bit ones, which costs additional time and produces overhead. Especially when running 2 threads on one core (AMD has SMT now, too), the front end is heavily loaded. AMD's previous architecture, Bulldozer, had the same problem, but the Bulldozer application is not usable because it also uses FMA4 instructions, which are no longer supported by Zen :( I guess you use GCC as the compiler; it has a switch to limit the AVX width to 128 bits: "-mprefer-avx128 This option instructs GCC to use 128-bit AVX instructions instead of 256-bit AVX instructions in the auto-vectorizer." If you would also like to use Zen's other supported extensions, then please add these, too: -mavx2 -mcx16 -mmovbe -mf16c -mpopcnt -mbmi -mbmi2 -mclzero -mclflushopt -mprefer-avx128 (I hope I found them all). There is also a generic "znver1" option (-march=znver1), but in some tests the builds made with it performed worse. Thus I would use only the individual instruction-set extensions. It might cost a tiny bit of IPC, but as the core uses SMT, that wouldn't be wasted. Thanks a lot, Alex |
Send message Joined: 19 Jun 12 Posts: 21 Credit: 107,293,560 RAC: 0 |
Last modified: 9 Mar 2017, 16:11:24 UTC Hi, this one's for Linux 64-bit: http://www.boincunited.org/opt_apps/period_search_10210_x86_64-pc-linux-gnu__avx128.tar.bz2 If that one works, I'll compile one for Windows too. Join BOINC United now! |
Send message Joined: 13 May 13 Posts: 3 Credit: 27,120 RAC: 0 |
Last modified: 10 Mar 2017, 19:20:17 UTC Thanks, but the link does not work, I get a "no hotlink" warning :( http://www.boincunited.org/images/no_hotlink.gif Edit: Works with MS IE, no clue why. |
Send message Joined: 19 Jun 12 Posts: 21 Credit: 107,293,560 RAC: 0 |
Thanks, but the link does not work, I get a "no hotlink" warning :( https://de.wikipedia.org/wiki/Hotlinking Join BOINC United now! |
Send message Joined: 23 Apr 16 Posts: 5 Credit: 9,563,861 RAC: 0 |
|
Send message Joined: 19 Jun 12 Posts: 21 Credit: 107,293,560 RAC: 0 |
Sorry about that. Here's another one to try: http://www.boincunited.org/opt_apps/period_search_10210_x86_64-pc-linux-gnu__avx128_v2.tar.bz2 Join BOINC United now! |
Send message Joined: 23 Apr 16 Posts: 5 Credit: 9,563,861 RAC: 0 |
|
Send message Joined: 23 Apr 16 Posts: 5 Credit: 9,563,861 RAC: 0 |
|
Send message Joined: 21 Dec 12 Posts: 176 Credit: 136,433,672 RAC: 10,512 |
|
Send message Joined: 23 Apr 16 Posts: 5 Credit: 9,563,861 RAC: 0 |
|
Send message Joined: 21 Dec 12 Posts: 176 Credit: 136,433,672 RAC: 10,512 |
|
Send message Joined: 13 May 13 Posts: 3 Credit: 27,120 RAC: 0 |
I had expected maybe around 10%-20%. Compared to SSE, AVX uses a denser encoding scheme (the VEX prefix), which saves some bytes in the first decode stage. Especially when using SMT, the front end is under load, so shorter instructions could help in that case. However, it seems it doesn't change much. Maybe the µOp buffer decreases the decoder's load significantly. And then there's AMD's specialty: 128-bit AVX. The decoder has to split up 256-bit AVX instructions, which decreases performance. But even in that context, the µOp buffer might help. Anyway, it is certainly not a bad idea to spoon-feed the cores with appropriately compiled code. At least it will not decrease performance ;) |
Send message Joined: 23 Apr 16 Posts: 5 Credit: 9,563,861 RAC: 0 |
Last modified: 17 Mar 2017, 0:44:41 UTC Btw, not exactly AVX related, but since we have everyone on board already: the news on http://www.numberworld.org/y-cruncher/ says Ryzen can run FMA4, even though the FMA4 feature flag is not set. I tried to run www.boincunited.org/opt_apps/period_search_10210_x86_64-pc-linux-gnu__fma4.tar.bz2, but it failed within a few seconds with SIGILL (illegal instruction). Is there any other way to confirm this? I've seen you using GCC somewhere earlier; can you give 6.3 a try, maybe with -march=znver1 and -O3? Have you tried building with Clang? According to the Phoronix results, it can make a substantial difference in some apps. http://www.phoronix.com/scan.php?page=article&item=gcc-clang-ryzen |
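
For anyone who wants to see what the -mprefer-avx128 switch discussed in this thread does to auto-vectorized code, here is a minimal sketch, assuming GCC on x86-64. The function name `saxpy` and the file name `saxpy.c` are made up for illustration and have nothing to do with the actual period_search sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example loop: y[i] = a * x[i] + y[i].
   `restrict` promises the compiler that x and y do not overlap,
   which makes the loop an easy target for the auto-vectorizer. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

To compare, one could generate assembly twice, once with `gcc -O3 -mavx2 -S saxpy.c` and once with `gcc -O3 -mavx2 -mprefer-avx128 -S saxpy.c`, and look at the vector registers in the loop body. The first build would be expected to use 256-bit `ymm` registers and the second 128-bit `xmm` registers, though the exact output depends on the GCC version. Whether the narrower code is actually faster on Ryzen still has to be measured on real work units.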