a = a.AndNot((f > upperFrictionForce) | (f < lowerFrictionForce));
f = f.GetMax(lowerFrictionForce).GetMin(upperFrictionForce);
Plain SSE (before SSE4.1 added blendvps) does not have a single select instruction, so the select has to be built manually from bitwise operations.
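For readers without the dgVector source, the two lines above can be sketched with raw SSE intrinsics. This is a hypothetical stand-alone function (ClampFriction is not part of the library), and it assumes that `a.AndNot(mask)` means `a & ~mask` and that `f > upper` maps to `_mm_cmpgt_ps`. Zeroing the out-of-range lanes of `a` is itself a manual select against zero, done here with andnot.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>

// Hypothetical stand-alone version of:
//   a = a.AndNot((f > upper) | (f < lower));
//   f = f.GetMax(lower).GetMin(upper);
// Assumes AndNot(mask) == a & ~mask (zero the lanes where mask is set).
inline void ClampFriction(__m128& a, __m128& f, __m128 lower, __m128 upper)
{
    // all-ones lanes where f lies outside [lower, upper]
    const __m128 outside = _mm_or_ps(_mm_cmpgt_ps(f, upper), _mm_cmplt_ps(f, lower));
    // _mm_andnot_ps(x, y) computes (~x) & y: keep a only where f was in range
    a = _mm_andnot_ps(outside, a);
    // clamp f into [lower, upper]
    f = _mm_min_ps(_mm_max_ps(f, lower), upper);
}
```

With `f = (-5, 0.5, 5, 1)` and bounds `[-1, 1]`, the first and third lanes of `a` become zero and `f` becomes `(-1, 0.5, 1, 1)`.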
Then I read on a website that the problem is the andnot instruction: it does not commute, which forces the register allocator to clobber registers. If the select is implemented with xor instead, then when it is mixed with other instructions the register allocator can reuse registers, so I added this function:
DG_INLINE dgVector Select(const dgVector& data, const dgVector& mask) const
{
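// xor-based select: ((data ^ this) & mask) ^ this
// lanes where the mask bits are set take data, the other lanes keep this
// old version: (mask & data) | (~mask & this)
//return _mm_or_ps (_mm_and_ps (mask.m_type, data.m_type), _mm_andnot_ps(mask.m_type, m_type));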
return _mm_xor_ps(m_type, _mm_and_ps (mask.m_type, _mm_xor_ps(m_type, data.m_type)));
}
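A minimal sketch comparing the two select formulations on plain `__m128` values, outside of dgVector (the function names here are hypothetical). Both produce bit-identical results; the difference is only that the xor form uses commutative operations (xorps, andps) instead of andnps, whose operands cannot be swapped.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cstring>

// Classic select: (mask & b) | (~mask & a), built around andnps.
inline __m128 SelectAndNot(__m128 a, __m128 b, __m128 mask)
{
    return _mm_or_ps(_mm_and_ps(mask, b), _mm_andnot_ps(mask, a));
}

// XOR select: ((b ^ a) & mask) ^ a. Where a mask bit is 1 the two xors
// cancel a and leave b; where it is 0 the result is just a.
inline __m128 SelectXor(__m128 a, __m128 b, __m128 mask)
{
    return _mm_xor_ps(a, _mm_and_ps(mask, _mm_xor_ps(a, b)));
}

// Bitwise comparison of two vectors (also exact for -0.0 and NaN payloads).
inline bool SameBits(__m128 x, __m128 y)
{
    float bx[4], by[4];
    _mm_storeu_ps(bx, x);
    _mm_storeu_ps(by, y);
    return std::memcmp(bx, by, sizeof(bx)) == 0;
}
```

In the post's member function, `this` plays the role of `a` and `data` the role of `b`.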
And what do you know, it's true: the new code uses fewer instructions and only 7 registers,
while the original uses 8 registers and still spills to memory.
Here are the instruction sequences.
Old select: 8 registers, spills to memory
a = a.Select(zero, f > upperFrictionForce).Select(zero, f < lowerFrictionForce);
f = f.GetMax(lowerFrictionForce).GetMin(upperFrictionForce);
00B7F82A cmpltps xmm1,xmm7
00B7F82E movaps xmm2,xmm7
00B7F831 cmpltps xmm2,xmm5
00B7F835 maxps xmm7,xmm5
00B7F838 movaps xmm0,xmm1
00B7F83B movaps xmm3,xmm2
00B7F83E andps xmm1,xmmword ptr [zero]
00B7F845 andnps xmm0,xmm6
00B7F848 andps xmm2,xmmword ptr [zero]
00B7F84F orps xmm0,xmm1
00B7F852 minps xmm7,xmm4
00B7F855 andnps xmm3,xmm0
New select: 7 registers, no spill
a = a.Select(zero, f > upperFrictionForce).Select(zero, f < lowerFrictionForce);
f = f.GetMax(lowerFrictionForce).GetMin(upperFrictionForce);
001AF82C cmpltps xmm1,xmm5
001AF830 andps xmm1,xmm0
001AF833 xorps xmm1,xmm2
001AF836 movaps xmm2,xmm5
001AF839 cmpltps xmm2,xmm4
001AF83D movaps xmm0,xmm1
001AF840 maxps xmm5,xmm4
001AF843 xorps xmm0,xmm7
001AF846 andps xmm2,xmm0
001AF849 xorps xmm2,xmm1
001AF84F minps xmm5,xmm3
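Both listings compile two chained Select calls against zero in place of the single AndNot from the start of the post. A sketch checking that the two forms agree bit for bit, assuming `f > upper` maps to `_mm_cmpgt_ps`, `AndNot(m)` means `a & ~m`, and `Select(zero, m)` is the xor select; the function names are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE intrinsics
#include <cassert>
#include <cstring>

// XOR select: lanes where mask is set take b, the others keep a.
inline __m128 SelectXor(__m128 a, __m128 b, __m128 mask)
{
    return _mm_xor_ps(a, _mm_and_ps(mask, _mm_xor_ps(a, b)));
}

// a.AndNot((f > upper) | (f < lower))
inline __m128 ZeroOutOfRangeAndNot(__m128 a, __m128 f, __m128 lower, __m128 upper)
{
    return _mm_andnot_ps(_mm_or_ps(_mm_cmpgt_ps(f, upper), _mm_cmplt_ps(f, lower)), a);
}

// a.Select(zero, f > upper).Select(zero, f < lower)
// Each step reduces to a & ~mask, so the chain equals a & ~(m1 | m2).
inline __m128 ZeroOutOfRangeSelect(__m128 a, __m128 f, __m128 lower, __m128 upper)
{
    const __m128 zero = _mm_setzero_ps();
    a = SelectXor(a, zero, _mm_cmpgt_ps(f, upper));
    return SelectXor(a, zero, _mm_cmplt_ps(f, lower));
}
```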