Posts by Opteron

1) Message boards : Problems and bug reports : Could you compile an AVX2-128bit application for AMDs Ryzen? (Message 5263) Posted 16 Mar 2017 by Opteron Post: I had expected maybe around 10%-20%. Compared to SSE, AVX uses a denser encoding (the VEX prefix), which saves some bytes in the first decode stage. The front end is under load especially when using SMT, so shorter instructions could help in that case. However, it seems it doesn't change much. Maybe the µOp buffer reduces the decoder's load significantly. And then there's AMD's specialty: 128-bit AVX. The decoder has to split up 256-bit AVX instructions, which decreases performance. But even in that context, the µOp buffer might help. Anyway, it is certainly not a bad idea to spoon-feed the cores with appropriately compiled code. At least it will not decrease performance ;)
2) Message boards : Problems and bug reports : Could you compile an AVX2-128bit application for AMDs Ryzen? (Message 5227) Posted 10 Mar 2017 by Opteron Post: Thanks, but the link does not work; I get a "no hotlink" warning :( http://www.boincunited.org/images/no_hotlink.gif Edit: It works with MS IE, no clue why.
3) Message boards : Problems and bug reports : Could you compile an AVX2-128bit application for AMDs Ryzen? (Message 5173) Posted 6 Mar 2017 by Opteron Post:

Hi all,

First I have to thank everybody for offering so many applications for different CPU instruction sets. That is a very good service! I just wonder whether you could improve that service even more ;)

AMD's new CPU architecture is out, and the big difference from Intel's is the handling of AVX instructions. AMD's FPUs are just 128 bits wide, unlike Intel's 256-bit units. So the decoder has to crack each 256-bit instruction into two internal 128-bit ones, which costs additional time and produces overhead. The front end is heavily loaded, especially when running 2 threads on one core (AMD has SMT now, too). AMD's previous architecture - Bulldozer - had the same problem, but the Bulldozer application is not usable, because it also uses FMA4 instructions, which are not supported by Zen any more :(

I guess you use GCC as the compiler. It has a switch to limit the AVX width to 128 bits:

"-mprefer-avx128 This option instructs GCC to use 128-bit AVX instructions instead of 256-bit AVX instructions in the auto-vectorizer."

If you would also like to use Zen's other supported extensions, then please add these, too: -mavx2 -mcx16 -mmovbe -mf16c -mpopcnt -mbmi -mbmi2 -mclzero -mclflushopt -mprefer-avx128 (I hope I found them all). There is also a generic Zen option (-march=znver1), but in some tests the builds made with it performed worse. Thus I would use the instruction-set extension flags only. This might cost a tiny bit of IPC, but as the core uses SMT, it wouldn't be wasted.

Thanks a lot
Alex
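
To illustrate what the -mprefer-avx128 switch affects, here is a minimal sketch in C. The saxpy kernel and the file name saxpy.c are hypothetical and not taken from the project's code; they just stand in for any simple loop the GCC auto-vectorizer can handle. Assuming a GCC release that accepts the flag, compiling once with "gcc -O3 -mavx2 -S saxpy.c" and once with "gcc -O3 -mavx2 -mprefer-avx128 -S saxpy.c" should let you compare the assembly: the first build would be expected to use 256-bit ymm registers in the loop, the second 128-bit xmm registers with the same VEX encoding.

```c
#include <stddef.h>

/*
 * Hypothetical example kernel (an assumption for illustration only):
 * computes y[i] = a * x[i] + y[i] for i in [0, n).
 *
 * The restrict qualifiers tell the compiler that x and y do not overlap,
 * which makes it easier for the auto-vectorizer to vectorize the loop.
 * Whether it emits 128-bit or 256-bit AVX instructions then depends on
 * the compiler flags, not on the source code.
 */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The source stays the same for both builds, so the width is purely a build-time choice. As far as I know, newer GCC releases also offer -mprefer-vector-width=128 for the same purpose, but please check the documentation of the GCC version actually in use.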