Others InvSqrt implementations
A month ago – April 2009 – I was talking with a friend about computational methods, efficiency of algorithms, and as common in these areas, implementations of Newton-Raphson method. He commented on the Fast InvSqrt function, used on Quake III’s code, which used to be faster than using the inverse of sqrt (the native library). At the same day, I did a research in google. Several interesting links – ([1] gamedev, [2] LOMONTE CHRIS, [3] betterexplained.com) – which taught me about the function:
float InvSqrt (float x) {
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}
Then I compiled the function in some small programs to test the efficiency between invSqrt and the native implementation (float) (1.0/sqrt (x)).
The result was a little disappointing. The native implementation was faster than using the InvSqrt, possibly because architectures and compilers should have improved much in floating point arithmetics since 1999, when Quake III was released. And then I modify the implementation of the function using C union
/**
* UnionInvSqrt.c
* Author: Pedro Garcia
*
* http://www.sawp.com.br
* Licensed by Creative Commons: http://creativecommons.org/licenses/by-nc-nd/2.5/en/
**/
union casting {
int i;
float f;
};
float UnionInvSqrt (float x) {
union casting cast;
float xhalf = 0.5f*x;
cast.f = x;
cast.i = 0x5f3759df - ((cast.i)>>1); // x = *(float*)&i; and i = 0x5f3759df - (i>>1);
return cast.f*(1.5f - xhalf*cast.f*cast.f); // x = x*(1.5f - xhalf*x*x);
}
Certainly my function didn’t improve the efficiency in relation to the native implementation. However, using the same numerical value of invsqrt, we can obtain a small improvement in performance using it.
The way used to compare the performance of each function was very simple: in a pre-defined time for execution, which function would generate more terms? In 5 minutes, how many times the function (1/sqrt, UnionSqrt or InvSqrt) would run the code and return?
By running in a pre-determined time, the average speed of the generation of terms can be found as:
Using InvSqrt (x):: Terms [1543696484], Serie [380334.639206]
Using 1/sqrt (x):: Terms [1623751167], Serie [400518.164524]
Using UnionInvSqrt (x):: Terms [1569004170], Serie [386502.806938]
((5000000-4650000)*100%)/(5000000+4650000) = 3,626943005%
http://www.sawp.com.br/sources/InvSqrtC/
http://www.sawp.com.br/sources/BitsInJava/
/**
* Java InvSqrt
* Author: Pedro Garcia
* sawp@sawp.com.br
* http://www.sawp.com.br
* Licensed by Creative Commons: http://creativecommons.org/licenses/by-nc-nd/2.5/en/
**/
strictfp static float InvSqrt(float x){
float xhalf = 0.5f * x;
int i = Float.floatToRawIntBits(x); // convert integer to keep the representation IEEE 754
i = 0x5f3759df - (i >> 1);
x = Float.intBitsToFloat(i);
x = x * (1.5f - xhalf * x * x);
return x;
}
</p>
I applied the same tests as with C, testing error and the difference in efficiency between the InvSqrt and using the native library (float) 1.f/Math.sqrt (x).
In the graph, we can observe that there is no advantage in a matter of efficiency to use the InvSqrt:
float InvSqrt(float x)
{
float xhalf = 0.5f * x;
int i = BitConverter.ToInt32(BitConverter.GetBytes(x), 0);
i = 0x5f3759df - (i >> 1);
x = BitConverter.ToSingle(BitConverter.GetBytes(i), 0);
x = x * (1.5f - xhalf * x * x);
return x;
}
I don’t tested the efficiency of the function or characteristics using .NET . To learn more see the link: http://stackoverflow.com/questions/268853/is-it-possible-to-write-quakes-fast-invsqrt-function-in-c
That’s it! Unfortunately I do not have an old computer to test the implementation using union or InvSqrt in Java. However, if you want to see the results of my tests, the sources or test on other architectures and older machines, you can download the files at:
real function InvSqrt(x)
implicit none
type casting
real :: x
end type casting
real, intent(in) :: x
! castinger
type(casting), target :: pointerTo
! Encode data as an array of integers
integer, dimension(:), allocatable :: enc
integer :: length
real :: xhalf
xhalf = 0.5 * x
! transfer to heap ("promiscuous" operation)
pointerTo%x = x
! encode a memory section from a type to other
length = size(transfer(pointerTo, enc))
allocate(enc(length))
! encoded to integer, now, operate
enc = transfer(pointerTo, enc)
enc(1) = 1597463007 - rshift(enc(1),1)
! decode
pointerTo = transfer(enc, pointerTo)
! dealloc
deallocate(enc)
InvSqrt = pointerTo%x*(1.5 - xhalf*pointerTo%x*pointerTo%x)
end function InvSqrt
## References
[1] www.gamedev.net/community/forums/topic.asp?topic id=139956
[2] Chris Lomont, www.math.purdue.edu/~clomont, “FAST INVERT SQUARE ROOT”, http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf.
[3] http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/