The Newton engine is now Open source with a zlib license.

by **aqnuep** » Tue Feb 22, 2011 7:30 am

Actually they do already. I have a Samsung Galaxy S, which in fact has stronger hardware than the iPhone 4, and the new upcoming Android phones will be equipped with dual-core CPUs.

by **Aphex** » Wed Feb 23, 2011 9:51 am

It may be a good move to port it over to the Airplay SDK: http://www.airplaysdk.com/

by **Nodrev** » Thu Mar 03, 2011 9:47 am

This open-source release is really excellent news!

by **martinsm** » Sat Mar 05, 2011 6:39 pm

I want to thank Julio for open-sourcing his awesome Newton physics engine. Thanks to that I've been able to port my little unfinished game to Android, and to PSP! It was a really fun porting project.

Performance on high-end Android devices is more or less the same as on iOS devices. On the PSP the situation is more dramatic: depending on the scene, NewtonUpdate takes up to 200 ms. That's too much. But I guess it's expected, since the PSP has only a 333 MHz MIPS CPU.
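For a sense of scale, assuming a 60 Hz target (the post does not say which rate the game aims for), the per-frame budget and the worst-case overrun work out as:

```latex
\frac{1000\ \text{ms}}{60} \approx 16.7\ \text{ms per frame},
\qquad
\frac{200\ \text{ms}}{16.7\ \text{ms}} \approx 12
```

So in the worst case a single NewtonUpdate takes about as long as twelve frames, which caps the simulation at roughly 1000 / 200 = 5 updates per second.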

Oh, and by the way, during porting I found a small bug in Newton. When building a TreeCollision, NewtonTreeCollisionEndBuild was ignoring the optimize parameter. I needed to pass false there (so as not to optimize), because Newton was crashing when optimize=true.

The needed change was here: dgAABBPolygonSoup.cpp, line 739.
http://code.google.com/p/newton-dynamic ... up.cpp#739
I commented out "optimizedBuild = true;" and my game stopped crashing.

Strangely, this happened only on Android, not on PC or iOS. I guess it's because of differences in architecture/floating-point operations.

by **Julio Jerez** » Sat Mar 05, 2011 8:02 pm

Woow, woow, wooow, you gave a ton of information there in just a few lines. :mrgreen:

Let us take it one line at a time.

martinsm wrote:I want to thank Julio for open-sourcing his awesome Newton physics engine. Thanks to that I've been able to port my little unfinished game to Android, and to PSP! It was a really fun porting project.

That is awesome, tell us more.
I was under the impression that Android can only be programmed in Java. How did you do it so fast?

martinsm wrote:Performance on high-end Android devices is more or less the same as on iOS devices. On the PSP the situation is more dramatic: depending on the scene, NewtonUpdate takes up to 200 ms. That's too much. But I guess it's expected, since the PSP has only a 333 MHz MIPS CPU.

As far as I know, the PSP's CPU comes from the same MIPS family as the PS2's, so there is a vector math coprocessor that can be used to speed up calculations. This is one of the main reasons I am moving to Newton SDK 3.00.
The SIMD implementation in SDK 2.00 is very messy and hard to debug, because I used macros to wrap the intrinsic functions.
In SDK 3.00 the SIMD implementation is a class that can easily be reimplemented for the SIMD instructions of different CPUs.
My next step will be to write the float emulation, the AltiVec implementation, and the Intel AVX one.

This is what the class looks like so far:

```cpp
#include <xmmintrin.h>   // SSE: __m128 and the _mm_*_ps intrinsics
#include <emmintrin.h>   // SSE2: __m128i and the integer shifts used by Abs()

// DG_INLINE, dgFloat32, dgInt32, simd_type (an __m128), PURMUT_MASK, dgFloor
// and _ASSERTE come from the Newton 3.00 headers. PURMUT_MASK is assumed to
// follow the same lane order as _MM_SHUFFLE.
class simd_128
{
	public:
	DG_INLINE simd_128 () {}
	DG_INLINE simd_128 (simd_type type): m_type(type) {}
	DG_INLINE simd_128 (dgFloat32 a): m_type(_mm_set_ps1(a)) {}
	DG_INLINE simd_128 (const simd_128& data): m_type(data.m_type) {}
	// Broadcasts the bit pattern of an integer into all four lanes.
	DG_INLINE simd_128 (dgInt32 a): m_type (_mm_set_ps1 (*(dgFloat32*)&a)) {}
	DG_INLINE simd_128 (const dgFloat32* const ptr): m_type(_mm_loadu_ps (ptr)) {}
	DG_INLINE simd_128 (dgFloat32 x, dgFloat32 y, dgFloat32 z, dgFloat32 w): m_type(_mm_set_ps(w, z, y, x)) {}
	DG_INLINE simd_128 (dgInt32 ix, dgInt32 iy, dgInt32 iz, dgInt32 iw)
		:m_type(_mm_set_ps(*(dgFloat32*)&iw, *(dgFloat32*)&iz, *(dgFloat32*)&iy, *(dgFloat32*)&ix))
	{
	}

	DG_INLINE dgInt32 GetInt () const { return _mm_cvtss_si32(m_type); }
	DG_INLINE void StoreScalar(float* const scalar) const { _mm_store_ss (scalar, m_type); }
	DG_INLINE void StoreVector(float* const array) const { _mm_storeu_ps (array, m_type); }

	DG_INLINE simd_128& operator= (const simd_128& data) { m_type = data.m_type; return (*this); }

	DG_INLINE simd_128 operator+ (const simd_128& data) const { return _mm_add_ps (m_type, data.m_type); }
	DG_INLINE simd_128 operator- (const simd_128& data) const { return _mm_sub_ps (m_type, data.m_type); }
	DG_INLINE simd_128 operator* (const simd_128& data) const { return _mm_mul_ps (m_type, data.m_type); }
	DG_INLINE simd_128 operator/ (const simd_128& data) const { return _mm_div_ps (m_type, data.m_type); }

	// Comparisons return a per-lane mask: all bits set where true, zero where false.
	DG_INLINE simd_128 operator<= (const simd_128& data) const { return _mm_cmple_ps (m_type, data.m_type); }
	DG_INLINE simd_128 operator>= (const simd_128& data) const { return _mm_cmpge_ps (m_type, data.m_type); }
	DG_INLINE simd_128 operator< (const simd_128& data) const { return _mm_cmplt_ps (m_type, data.m_type); }
	DG_INLINE simd_128 operator> (const simd_128& data) const { return _mm_cmpgt_ps (m_type, data.m_type); }

	DG_INLINE simd_128 operator& (const simd_128& data) const { return _mm_and_ps (m_type, data.m_type); }
	DG_INLINE simd_128 operator| (const simd_128& data) const { return _mm_or_ps (m_type, data.m_type); }
	// Computes this & ~data (_mm_andnot_ps negates its first operand).
	DG_INLINE simd_128 AndNot (const simd_128& data) const { return _mm_andnot_ps (data.m_type, m_type); }

	// Sum of the four lanes, broadcast to every lane.
	DG_INLINE simd_128 AddHorizontal () const
	{
		simd_128 tmp (_mm_add_ps (m_type, _mm_shuffle_ps(m_type, m_type, PURMUT_MASK(2, 3, 0, 1))));
		return _mm_add_ps (tmp.m_type, _mm_shuffle_ps(tmp.m_type, tmp.m_type, PURMUT_MASK(1, 0, 3, 2)));
	}

	DG_INLINE simd_128 DotProduct (const simd_128& data) const
	{
		simd_128 dot ((*this) * data);
		return dot.AddHorizontal();
	}

	// 3D cross product of the x, y, z lanes; the w lane is w*w' - w*w' (zero for finite inputs).
	DG_INLINE simd_128 CrossProduct (const simd_128& data) const
	{
		return _mm_sub_ps (_mm_mul_ps (_mm_shuffle_ps (m_type, m_type, PURMUT_MASK(3, 0, 2, 1)), _mm_shuffle_ps (data.m_type, data.m_type, PURMUT_MASK(3, 1, 0, 2))),
		                   _mm_mul_ps (_mm_shuffle_ps (m_type, m_type, PURMUT_MASK(3, 1, 0, 2)), _mm_shuffle_ps (data.m_type, data.m_type, PURMUT_MASK(3, 0, 2, 1))));
	}

	// Clears the sign bit of each lane by shifting it out and back in as zero.
	DG_INLINE simd_128 Abs () const
	{
		__m128i shiftSign = _mm_srli_epi32 (_mm_slli_epi32 (*((__m128i*) &m_type), 1), 1);
		return *(__m128*)&shiftSign;
	}

	// Floor via the 1.5 * 2^23 magic-number trick (valid for |x| < 2^22).
	DG_INLINE simd_128 Floor () const
	{
		const dgFloat32 magicConst = (dgFloat32 (1.5f) * dgFloat32 (1<<23));
		simd_128 mask (magicConst, magicConst, magicConst, magicConst);
		// Adding and subtracting the constant rounds each lane to the nearest integer...
		simd_128 ret (_mm_sub_ps(_mm_add_ps(m_type, mask.m_type), mask.m_type));
		// ...then 1 is subtracted in the lanes where that rounding went up.
		simd_128 adjust (_mm_cmplt_ps (m_type, ret.m_type));
		ret = _mm_sub_ps (ret.m_type, _mm_and_ps(_mm_set_ps1(1.0f), adjust.m_type));
		// Debug checks against the scalar floor (m128_f32 is MSVC-specific).
		_ASSERTE (ret.m_type.m128_f32[0] == dgFloor(m_type.m128_f32[0]));
		_ASSERTE (ret.m_type.m128_f32[1] == dgFloor(m_type.m128_f32[1]));
		_ASSERTE (ret.m_type.m128_f32[2] == dgFloor(m_type.m128_f32[2]));
		_ASSERTE (ret.m_type.m128_f32[3] == dgFloor(m_type.m128_f32[3]));
		return ret;
	}

	// Packs the sign bits of the four lanes into the low 4 bits of an int.
	DG_INLINE dgInt32 GetSignMask() const { return _mm_movemask_ps(m_type); }

	// Reciprocal square root: the _mm_rsqrt_ps estimate refined by one Newton-Raphson step.
	DG_INLINE simd_128 InvRqrt () const
	{
		simd_128 half (dgFloat32 (0.5f));
		simd_128 three (dgFloat32 (3.0f));
		simd_128 tmp0 (_mm_rsqrt_ps(m_type));
		return half * tmp0 * (three - (*this) * tmp0 * tmp0);
	}

	DG_INLINE simd_128 GetMin (const simd_128& data) const { return _mm_min_ps (m_type, data.m_type); }
	DG_INLINE simd_128 GetMax (const simd_128& data) const { return _mm_max_ps (m_type, data.m_type); }

	// Maximum of the four lanes, returned in lane 0.
	DG_INLINE simd_128 MaximunValue() const
	{
		simd_128 tmp (GetMax (_mm_movehl_ps (m_type, m_type)));
		return tmp.GetMax (_mm_shuffle_ps(tmp.m_type, tmp.m_type, PURMUT_MASK(0, 0, 0, 1)));
	}

	DG_INLINE simd_128 MoveHighToLow (const simd_128& data) const { return _mm_movehl_ps (m_type, data.m_type); }
	DG_INLINE simd_128 MoveLowToHigh (const simd_128& data) const { return _mm_movelh_ps (m_type, data.m_type); }
	DG_INLINE simd_128 PackLow (const simd_128& data) const { return _mm_unpacklo_ps (m_type, data.m_type); }
	DG_INLINE simd_128 PackHigh (const simd_128& data) const { return _mm_unpackhi_ps (m_type, data.m_type); }

	simd_type m_type;
};
```

The good thing is that it generates the same or better binary code than using the intrinsics directly, plus we get the added bonus that the code looks almost like the dVector class version.
Remarkably, in many cases the new class generates better and faster code than the macros, because it is very easy to try different flavors of the intrinsics that implement each function.
In no case is the binary generated using the class inferior to the macros.

So when 3.00 gets to the point where it is usable, we can easily write a version of the class that uses the PSP's SIMD instructions and see how that works.
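Julio lists a float-emulation backend among the next steps. As a sketch of what such a backend could look like, here is a subset of the same interface written with plain floats. The class name simd_128_scalar, its members, and GetScalar are hypothetical, not Newton 3.00 code. Because the operators mirror the SSE class, algorithm code such as DotProduct reads the same against either one.

```cpp
#include <cmath>

// Hypothetical scalar ("float emulation") version of part of the simd_128
// interface. Each lane is a plain float, so it builds on any CPU; the SSE
// shuffles become ordinary member accesses. A sketch, not Newton code.
class simd_128_scalar
{
	public:
	simd_128_scalar () : m_x(0.0f), m_y(0.0f), m_z(0.0f), m_w(0.0f) {}
	explicit simd_128_scalar (float a) : m_x(a), m_y(a), m_z(a), m_w(a) {}
	simd_128_scalar (float x, float y, float z, float w) : m_x(x), m_y(y), m_z(z), m_w(w) {}

	simd_128_scalar operator+ (simd_128_scalar b) const { return simd_128_scalar (m_x + b.m_x, m_y + b.m_y, m_z + b.m_z, m_w + b.m_w); }
	simd_128_scalar operator- (simd_128_scalar b) const { return simd_128_scalar (m_x - b.m_x, m_y - b.m_y, m_z - b.m_z, m_w - b.m_w); }
	simd_128_scalar operator* (simd_128_scalar b) const { return simd_128_scalar (m_x * b.m_x, m_y * b.m_y, m_z * b.m_z, m_w * b.m_w); }

	// Sum of the four lanes broadcast to every lane; the pairing mirrors the
	// two shuffle-and-add steps of the SSE AddHorizontal.
	simd_128_scalar AddHorizontal () const { return simd_128_scalar ((m_x + m_y) + (m_z + m_w)); }

	simd_128_scalar DotProduct (simd_128_scalar b) const { return ((*this) * b).AddHorizontal (); }

	// 3D cross product of x, y, z; w is set to zero.
	simd_128_scalar CrossProduct (simd_128_scalar b) const
	{
		return simd_128_scalar (m_y * b.m_z - m_z * b.m_y,
		                        m_z * b.m_x - m_x * b.m_z,
		                        m_x * b.m_y - m_y * b.m_x,
		                        0.0f);
	}

	simd_128_scalar Floor () const
	{
		return simd_128_scalar (std::floor (m_x), std::floor (m_y), std::floor (m_z), std::floor (m_w));
	}

	// Lane 0 as a float (what StoreScalar writes in the SSE class).
	float GetScalar () const { return m_x; }

	float m_x, m_y, m_z, m_w;
};
```

The design choice is the one described in the post: keep the interface fixed and select the implementation per platform (for example with an #ifdef), so the physics code itself does not change.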
Do Android devices support SIMD instructions?

martinsm wrote:Oh, and by the way, during porting I found a small bug in Newton. When building a TreeCollision, NewtonTreeCollisionEndBuild was ignoring the optimize parameter. I needed to pass false there (so as not to optimize), because Newton was crashing when optimize=true.

The needed change was here: dgAABBPolygonSoup.cpp, line 739.
http://code.google.com/p/newton-dynamic ... up.cpp#739
I commented out "optimizedBuild = true;" and my game stopped crashing.

Strangely, this happened only on Android, not on PC or iOS. I guess it's because of differences in architecture/floating-point operations.

This is very strange. It is important that tree collision calls the Improve Fitness function, because that is the function that makes the tree very efficient.

If worst comes to worst, you can serialize the collision trees and just load them as assets.
Some of the utility functions in Newton use very, very heavy float math: in some cases I use double precision, and in others I even use exact floating-point arithmetic (googols), which also uses double precision.
So my guess is that if those devices do not support double-precision floats, tree collision optimization and convex hull creation may fail.
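To illustrate why such geometric utilities can need more than single precision, here is a sketch (not Newton code; Orient2dFloat and Orient2dDouble are made-up names) of the 2D orientation test that convex hull algorithms branch on. With coordinates around 8192, the exact products need 26 significant bits, more than the 24 a float holds, so the float version can get the sign wrong.

```cpp
// Sign of the 2D orientation determinant for points a, b, c:
// > 0 counter-clockwise, < 0 clockwise, 0 collinear.

// Single precision. The products go through volatile temporaries so each
// one is rounded to float and the compiler cannot fuse the multiply and
// subtract into an FMA, which would hide the rounding error.
float Orient2dFloat (float ax, float ay, float bx, float by, float cx, float cy)
{
	volatile float p0 = (bx - ax) * (cy - ay);
	volatile float p1 = (by - ay) * (cx - ax);
	return p0 - p1;
}

// Double precision: the same formula, with enough bits for these inputs.
double Orient2dDouble (double ax, double ay, double bx, double by, double cx, double cy)
{
	return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}
```

With a = (0, 0), b = (8193, 8192) and c = (8192, 8191), the exact determinant is 8193 * 8191 - 8192 * 8192 = -1 (clockwise). In float, 67108863 rounds to 67108864, so Orient2dFloat returns 0 and reports the points as collinear, while Orient2dDouble returns -1.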
In any case, those can also be serialized and loaded as assets.

The good thing is that Newton 3.00 will implement an exact googol class based on 32-bit floats to support the unlimited-size world broad phase. When I have it ready, maybe I can replace the tree optimization and other algorithms so that they all use 32-bit floats when compiling for float.
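For readers wondering how exact arithmetic can be built on 32-bit floats: one standard building block is an error-free transformation such as Knuth's TwoSum, which recovers the rounding error of a float addition exactly; chaining such steps (floating-point expansions) gives exact sums. The sketch below shows only that building block, and whether Newton's googol class works this way is not stated in the thread. It assumes strict IEEE single-precision arithmetic (no -ffast-math, no x87 excess precision).

```cpp
// Knuth's TwoSum: returns s = fl(a + b) and stores the rounding error in err,
// so that a + b == s + err exactly (barring overflow), in any operand order.
float TwoSum (float a, float b, float& err)
{
	float s = a + b;
	float bVirtual = s - a;         // the part of b that made it into s
	float aVirtual = s - bVirtual;  // the part of a that made it into s
	err = (a - aVirtual) + (b - bVirtual);
	return s;
}
```

For example, 1e8f + 1.0f rounds to 1e8f because the float spacing near 1e8 is 8; TwoSum returns s = 1e8f and recovers err = 1.0f, so no information is lost.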

by **Aphex** » Sun Mar 06, 2011 5:21 am

Julio Jerez wrote:I was under the impression that Android can only be programmed in Java.

The Airplay SDK allows you to write and debug C++ in Visual Studio, then export to almost all mobile devices.

by **JernejL** » Sun Mar 06, 2011 11:46 am

You can use natively compiled code on Android, from what I remember.

by **martinsm** » Sun Mar 06, 2011 12:13 pm

Julio Jerez wrote:Let us take it one line at a time.

There's not much to tell really. I had a simple game on PC/Mac OS X/Linux that used OpenGL 1.5 for rendering and Newton 1.53 for physics. Later I ported the game to iPad using OpenGL ES 1.x and the closed-source Newton 2.x.

A few weeks ago I saw that Newton had been open-sourced. I thought: cool, now I can port my game to some other platforms as it is, and I don't need to replace the physics engine with some other open-source engine (I was thinking of replacing it with Bullet).

So first I ported the iPad version to Android. It was a very similar experience to the iPad. I created the window in Java (on iPad I used Cocoa), and all the physics updating, game logic and rendering happened in C++ (for both iPad and Android). It went very smoothly.

Last week I started porting to the PSP. I used the homebrew PSP SDK (called Minimalist PSPSDK) to compile my game with GCC. That was more of a challenge. The PSPGL API, which implements OpenGL on top of the PSP's GU, was very picky about some things, but I managed to get it rendering. With Newton there were a few tiny problems: I had some issues with the pthread port in PSPSDK, so I needed to modify the Newton source a bit to disable the pthread calls (the PSP has only one CPU anyway, so there is no advantage to multithreading). Unfortunately, as I wrote above, performance was disappointing, but I think this is not Newton's fault. It's more the fault of my game's architecture and of how my game uses Newton. The game was not originally intended for mobile/handheld platforms, so it does some quite suboptimal things in its physics and game structure. And I'm too lazy to rewrite everything.

I just had fun seeing Newton run on PSP and Android.

Oh, and for the audio system I used OpenAL. On PC/Mac OS X/Linux/iPad that wasn't a big deal. For Android I wrote a special backend for OpenAL-Soft: http://repo.or.cz/w/openal-soft/android.git and for PSP I found that somebody had also written an OpenAL-Soft backend: http://forums.ps2dev.org/viewtopic.php?t=11769

Julio Jerez wrote:I was under the impression that Android can only be programmed in Java. How did you do it so fast?

There are two SDKs available for Android. The first SDK is for compiling Java code; I used it to set up the OpenGL ES window for rendering and to receive input touches from the touchscreen. The second SDK is called the NDK (Native Development Kit). It consists of a GCC toolchain that lets you create a shared library .so file (what a .dll is on Windows), and functions in this .so file can be called from the Java world. Inside this .so file I used Newton for physics simulation and OpenGL ES functions for rendering. I basically created a simple makefile (similar to what you have for Linux) with all the Newton source files, and it compiled fine with the Android NDK.

You can even use the NDK to compile a static library (libNewton.a) to distribute releases as binaries, similar to iPhone, if you want.

Julio Jerez wrote:This is what the class looks like so far:

I suggest dropping the reference in the const simd_128& data function arguments. GCC is very picky about that: if it sees a reference/pointer to a SIMD type, it will skip some pretty significant optimizations. Pass this class by value and GCC will automatically inline everything without redundant copies.
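A minimal sketch of the suggestion, using a hypothetical vec4 wrapper rather than the full simd_128 class: the operators take their operand by value, so after inlining the compiler can keep both operands in XMM registers instead of reloading one through a pointer. It assumes an x86/x86-64 target with SSE; the actual code-generation benefit depends on the compiler and optimization level.

```cpp
#include <xmmintrin.h>

// Minimal SSE wrapper whose operators take their argument by value.
class vec4
{
	public:
	vec4 () : m_type(_mm_setzero_ps()) {}
	vec4 (__m128 type) : m_type(type) {}
	vec4 (float x, float y, float z, float w) : m_type(_mm_set_ps(w, z, y, x)) {}

	// By-value operand: there is no pointer to 'data', so nothing forces it
	// to live in memory once the call is inlined.
	vec4 operator- (vec4 data) const { return _mm_sub_ps (m_type, data.m_type); }
	vec4 operator* (vec4 data) const { return _mm_mul_ps (m_type, data.m_type); }

	// Writes the four lanes (x, y, z, w) to an unaligned float array.
	void Store (float* const out) const { _mm_storeu_ps (out, m_type); }

	__m128 m_type;
};
```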

Julio Jerez wrote:Do Android devices support SIMD instructions?

Yes. Newer Android devices have NEON, the same thing as on iPhone/iPad. You can code it using inline assembly, or using intrinsics similar to the SSE ones. Android devices and the iPhone/iPad use the same ARM architecture, so basically they run exactly the same instruction set. The Android NDK includes some sample projects showing how to use NEON in native C/C++ code.
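As a sketch of what NEON intrinsics look like next to the SSE ones (the function name Dot4 is made up for the example), here is a 4-lane dot product that uses the intrinsics from <arm_neon.h> when the compiler reports NEON support and falls back to plain floats elsewhere:

```cpp
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// 4-lane dot product of two float[4] arrays.
float Dot4 (const float* a, const float* b)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
	// Lane-wise multiply, the NEON counterpart of _mm_mul_ps.
	float32x4_t prod = vmulq_f32 (vld1q_f32 (a), vld1q_f32 (b));
	// Add the low and high halves, then add the two remaining lanes pairwise.
	float32x2_t sum = vadd_f32 (vget_low_f32 (prod), vget_high_f32 (prod));
	sum = vpadd_f32 (sum, sum);
	return vget_lane_f32 (sum, 0);
#else
	// Portable fallback for targets without NEON.
	return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```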

If anybody has questions for me regarding Android/PSP, feel free to ask.

by **Julio Jerez** » Sun Mar 06, 2011 1:06 pm

martinsm wrote:I suggest dropping the reference in the const simd_128& data function arguments. GCC is very picky about that: if it sees a reference/pointer to a SIMD type, it will skip some pretty significant optimizations. Pass this class by value and GCC will automatically inline everything without redundant copies.

Oh, very interesting. Do you mean changing the prototypes from this:
DG_INLINE simd_128 DotProduct (const simd_128& data) const;
to this:
DG_INLINE simd_128 DotProduct (const simd_128 data) const;

martinsm wrote:
Do Android devices support SIMD instructions?
Yes. Newer Android devices have NEON, the same thing as on iPhone/iPad. You can code it using inline assembly, or using intrinsics similar to the SSE ones.

Ha, very good info. Last year, when I made the iPod/iPhone project, I asked on the Mac forum whether the iPhone SDK supported SIMD using intrinsics, and they replied that although it supported SIMD, it had to be coded using inline assembly.
It is good to know that we can also use intrinsics for ARM CPUs. But even if there weren't any, GCC inline assembly closely resembles intrinsics in Visual Studio, in the sense that the compiler does its own register allocation.
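To make the point about GCC inline assembly concrete, here is a sketch on x86 (other architectures use their own constraint letters): the asm statement names the instruction, but its operands are C++ variables, and the compiler picks the registers, just as it does for an intrinsic. The function name AddPsAsm is hypothetical, and the code assumes GCC or Clang targeting x86/x86-64 with AT&T syntax.

```cpp
#include <xmmintrin.h>

// Adds two 4-float vectors with an explicit addps written as GCC extended
// inline assembly. Register allocation is left to the compiler.
static inline __m128 AddPsAsm (__m128 a, __m128 b)
{
	// "+x"(a): a is read and written in some SSE register, referenced as %0.
	// "x"(b):  b is read from some SSE register, referenced as %1.
	// AT&T operand order: addps src, dst  ->  dst = dst + src.
	__asm__ ("addps %1, %0" : "+x" (a) : "x" (b));
	return a;
}
```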
So when I finish converting all the SSE intrinsic-style code to use the simd_128 class, I will start the conversions to GCC inline assembly, AltiVec intrinsics, Xbox 360 intrinsics and PS3 intrinsics.
So we will have all bases covered.

by **martinsm** » Sun Mar 06, 2011 1:56 pm

Julio Jerez wrote:Oh, very interesting. Do you mean changing the prototypes from this:
DG_INLINE simd_128 DotProduct (const simd_128& data) const;
to this:
DG_INLINE simd_128 DotProduct (const simd_128 data) const;

Yes, exactly!

by **Julio Jerez** » Sun Mar 06, 2011 2:02 pm

I already did it, and you are correct: using the new prototype leads to better binary code generation in Visual Studio as well.
The code is almost identical, except that in many places there are redundant move instructions that are omitted when passing the argument by value:

```asm
; using const simd& data
mov    eax, DWORD PTR _data$[ebp]
movaps xmm0, XMMWORD PTR [eax]
movaps xmm1, XMMWORD PTR [ecx]
mov    eax, DWORD PTR ___$ReturnUdt$[ebp]
subps  xmm1, xmm0
movaps XMMWORD PTR [eax], xmm1

; using const simd data
movaps xmm0, XMMWORD PTR [ecx]
movaps xmm1, XMMWORD PTR _data$[ebp]
mov    eax, DWORD PTR ___$ReturnUdt$[ebp]
subps  xmm0, xmm1
movaps XMMWORD PTR [eax], xmm0
```

Plus, in many places instructions like movaps are replaced with movss, and in general each function has an equal or smaller number of instructions.
Yes, I believe it is better to pass arguments by value in the class. Thanks for that tip.
I think this is what Microsoft was talking about when they said VS 2010 does better register allocation, so it should be even better with VS 2010.

by **Julio Jerez** » Sun Mar 06, 2011 4:22 pm

Oh, I got too happy too soon: in Visual Studio it is not reliable.
Somehow it works fine in Release, but in Debug it crashes on misaligned data.

I will use an #ifdef: one version for Visual Studio using const simd_type&,
and the other passing arguments by value for GCC.
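The #ifdef plan can be kept to a single switch point. The sketch below uses a plain 16-byte-aligned struct so it compiles without SSE headers; the names vec4f, vec4f_arg and Sub are hypothetical. Functions are written once against an argument typedef, and only the typedef changes per compiler.

```cpp
// Hypothetical 16-byte-aligned 4-float type standing in for simd_128.
struct alignas(16) vec4f
{
	float x, y, z, w;
};

// One switch point: MSVC gets a const reference, because 32-bit MSVC rejects
// by-value parameters of over-aligned types (error C2719); other compilers
// get pass-by-value.
#if defined(_MSC_VER)
typedef const vec4f& vec4f_arg;
#else
typedef vec4f vec4f_arg;
#endif

// Written once; the parameter-passing style follows the typedef.
inline vec4f Sub (vec4f_arg a, vec4f_arg b)
{
	vec4f r = { a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w };
	return r;
}
```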

by **martinsm** » Sun Mar 06, 2011 4:43 pm

Have you instructed Visual Studio to align your simd_128 class?
Like this:

```cpp
__declspec(align(16)) class simd_128 { ... };
```

Also, all the classes that contain this simd_128 class must be aligned. That means:
1) those classes must be aligned with __declspec(align(16));
2) memory allocations must return 16-byte-aligned pointers if these classes are allocated on the heap.
There is no other way around this.

Not aligning the simd_128 class will get you a crash sooner or later, in both Debug and Release configurations, whether or not you pass arguments by reference. It's just that you got lucky with code generation that never tried to access a misaligned argument with an aligned load.
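The same two requirements can be expressed portably. This sketch uses C++11 alignas instead of __declspec(align(16)) and assumes a C++17 compiler, where plain new and std::allocator honour over-aligned types; with older compilers you would still need an aligned allocator such as _mm_malloc or posix_memalign. The type names aligned_vec4 and RigidBodyState and the helper IsAligned16 are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// alignas(16) is the standard spelling of __declspec(align(16)).
struct alignas(16) aligned_vec4
{
	float lanes[4];
};

// Requirement 1: a class containing an aligned member inherits its alignment.
struct RigidBodyState
{
	aligned_vec4 position;
	aligned_vec4 velocity;
	int id;
};

static_assert (alignof (RigidBodyState) == 16, "containing class must be 16-byte aligned too");

// Requirement 2 can be checked at run time on any pointer.
inline bool IsAligned16 (const void* ptr)
{
	return (reinterpret_cast<std::uintptr_t> (ptr) % 16) == 0;
}
```

Usage: in C++17, `new RigidBodyState ()` and `std::vector<RigidBodyState>` both return 16-byte-aligned storage without a custom allocator.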

I somehow think that this universal simd_128 class will not work across such a broad range of different hardware SIMD types. Sooner or later there will be some "universal" operations that will be either impossible to implement or very inefficient on one or more platforms. I believe this hardware specialization must be done at a higher level, not at such a low level. For example, imagine how impossible your simd_128 class would be to use for splitting work across the PS3's SPUs (by the way, that's my next target for running Newton in the near future: the PS3).

by **Julio Jerez** » Sun Mar 06, 2011 5:07 pm

Yes, I have the alignment.
I did not put alignment on the simd_128 class because it only contains one member, but I believe you are right that it needs alignment too.
Anyway, I will try that again later.

On the universal SIMD class, I do believe it will work.
If you look at the code in 2.00, I tried to do that before for PowerPC and Intel, and I had both working.
The problem is that I did it with macros, and macros are very difficult to debug and also lead to unreadable code.
A class with operator overloading leads to code that looks almost identical to the code using floats.

I do not think we will have major problems with consoles like the PS3 and the Xbox 360. They are both based on RISC PowerPC CPUs with AltiVec.
Once you have an AltiVec class, it should work without changes on both consoles.
In fact, since AltiVec is more flexible than SSE, once we have the class for Intel CPUs, moving to AltiVec is very easy.

The biggest problem will be the SPU part, but that is more a threading problem than a SIMD problem, and for that I also have some solutions.
But it is too early to start thinking about that now.

by **Julio Jerez** » Mon Mar 07, 2011 12:23 pm

Ha, OK, I see why Visual Studio does not like aligned arguments passed by value:
http://msdn.microsoft.com/en-gb/library/ms864731.aspx

It is a shortcoming of Visual Studio. I get this error when I change to pass by value:
2>c:\newton-dynamics_300\corelibrary_300\source\core\dgSimd_Instrutions.h(350) : error C2719: 'src0': formal parameter with __declspec(align('16')) won't be aligned

Not in every function, and I do not know why. But the fact is that passing arguments by value is what makes Visual Studio work in Release and fail in Debug.
In Release mode the functions are inlined, so the arguments are aligned anyway.
In Debug they are passed on the stack, and according to the doc in the link, that fails because VS does not pass aligned values on the stack.
You would think that would be so easy to fix, but they do not support it.

Anyway, it is good to know, and if worst comes to worst I can just make a Debug and a Release class.
For GCC I will definitely use the pass-by-value mode.
