OK, I will add the DLL with the next check-in.
I am not trying to be annoying or obnoxious, I am simply informing you of some things you might not know. So this is my last comment.
Instructions like xchg ecx, eax are also bad. Not as bad as when they block the bus (xchg with a memory operand implies a bus lock), but they can also stall the pipeline.
They are difficult to use in algorithmic optimizations because they act on more than one CPU resource.
This fact of optimization technique is what has made CPU makers move away from those instructions, in favor of ones that can be implemented with a small set of micro-ops that can be issued out of order
and without register dependencies. This works because the CPU internally has more registers than you see.
xchg ecx, eax is a CISC instruction, and one of the nastiest at that. The last thing you want to use are instructions like
xchg, inc, dec, movs, xadd, adc, sbb, and so on.
Let me give you an example.
Say you have some code like this:
Code:
a = b + c;
c = b + f;
The assembly output may be:
Code:
mov eax, b
add eax, c
mov a, eax
mov eax, b
add eax, f
mov c, eax
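It may look like that code is poorly issued by the compiler, but the compiler makers know that the CPU will not use the one register eax. Instead, each write to eax gets its own alias (an internal physical register),
so the two three-instruction groups can execute side by side, as long as there are no real data dependencies between them and the register (eax, in this case) can be renamed. Each group is still a dependent chain (load, add, store), so in an idealized one-tick-per-instruction view the six instructions finish in about 3 ticks, not 6.
If you rearrange the instructions, or you use instructions that change the value of more than one register, you can create dependencies the CPU cannot rename away. Then it cannot issue the instructions out of order and ends up serializing them, taking about 6 ticks instead of 3.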
That's the kind of knowledge big compiler makers have that we do not, and this is why you may think that compiler-generated code is inefficient, but it is not.
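
To make the renaming point concrete, here is a rough C sketch of a toy model. It is my own illustration, not how any real CPU is built. It assumes every instruction takes one tick, that any number of instructions can run in the same tick, and that memory locations behave like registers. The names Insn, example_prog and critical_path are made up for this sketch. It counts how many ticks the six instructions from the listing need when only true (read-after-write) dependencies matter, versus when every reuse of eax also forces ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operand ids for this sketch (not real x86 encodings). */
enum { NONE = -1, EAX, A, B, C, F, NUM_LOCS };

typedef struct {
    int dst;   /* location written */
    int src1;  /* first location read, or NONE */
    int src2;  /* second location read, or NONE */
} Insn;

/* The six instructions from the listing above. */
static const Insn example_prog[6] = {
    { EAX, B,   NONE },  /* mov eax, b */
    { EAX, EAX, C    },  /* add eax, c */
    { A,   EAX, NONE },  /* mov a, eax */
    { EAX, B,   NONE },  /* mov eax, b */
    { EAX, EAX, F    },  /* add eax, f */
    { C,   EAX, NONE },  /* mov c, eax */
};

static int max_int(int x, int y) { return x > y ? x : y; }

/* Returns the number of ticks needed in this toy model.
 * renaming != 0: only read-after-write dependencies delay an instruction.
 * renaming == 0: a write must also wait for the previous write and for
 *                all earlier reads of the same location (WAW and WAR). */
int critical_path(const Insn *prog, size_t n, int renaming)
{
    int last_write[NUM_LOCS] = {0};  /* tick of the latest write per location */
    int last_read[NUM_LOCS]  = {0};  /* tick of the latest read per location */
    int longest = 0;

    for (size_t i = 0; i < n; i++) {
        int srcs[2] = { prog[i].src1, prog[i].src2 };
        int dst = prog[i].dst;
        int ready = 0;

        assert(dst >= 0 && dst < NUM_LOCS);

        /* True dependency: wait for the producer of each source. */
        for (int k = 0; k < 2; k++)
            if (srcs[k] != NONE)
                ready = max_int(ready, last_write[srcs[k]]);

        /* False dependencies: these only count when there is no renaming. */
        if (!renaming) {
            ready = max_int(ready, last_write[dst]);
            ready = max_int(ready, last_read[dst]);
        }

        int done = ready + 1;  /* the instruction runs in tick 'done' */
        for (int k = 0; k < 2; k++)
            if (srcs[k] != NONE)
                last_read[srcs[k]] = max_int(last_read[srcs[k]], done);
        last_write[dst] = done;
        longest = max_int(longest, done);
    }
    return longest;
}
```

In this model, critical_path(example_prog, 6, 1) gives 3 and critical_path(example_prog, 6, 0) gives 6. Real hardware has load latencies, limited issue width and store buffers, so treat the numbers as an illustration of the idea only.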