Platform architecture and micro threads

by **Bird** » Sun Dec 11, 2011 6:17 pm

If I set the NewtonSetPlatformArchitecture mode to anything other than 0 and use more than one micro thread, my program hits an _ASSERTE (p1.m_w == dgFloat32 (0.0f)) on line 183 of dgIntersections.h. It doesn't hit the assert if there's only one body in the scene.

I tried the MultiRayCasting SDK demo scene and it happens there too, so it's not just me.

steps to reproduce:
---------------------
-run the MultiRayCasting demo
-set Options->Use simd
-set Number of physics micro threads to 4
---------------------

Could you please explain what is going on there?

BTW, the access to the Thread pool is a great addition. I'm sure I'll find many uses for it.

-Bird

by **Julio Jerez** » Sun Dec 11, 2011 7:19 pm

Oops, my mistake. It was a rounding-mode problem in a SIMD operation in the function void dgBroadPhaseNode::SetAABBSimd (const dgVector& minBox, const dgVector& maxBox).

It calculates the min and max of an AABB here:

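```cpp
{
    simd_128 scale (DG_BROADPHASE_AABB_SCALE);
    simd_128 invScale (DG_BROADPHASE_AABB_INV_SCALE);
    // quantize the box corners to the broadphase grid
    simd_128 p0 ((simd_128&)minBox * scale);
    simd_128 p1 ((simd_128&)maxBox * scale);
    p0 = p0.Floor() * invScale;
    // p1.w is 0 here, but Floor (0 + 1) is 1, so w ends up non-zero
    p1 = ((p1 + dgCollisionConvex::m_one).Floor() * invScale);
    // ... (rest of SetAABBSimd not shown)
}
```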

p1.w is zero; however, I add one and then round down, and rounding 1 down is still 1, so I need to AND the result with a mask for the triplex (x, y, z) components, like this:

```cpp
p1 = ((p1 + dgCollisionConvex::m_one).Floor() * invScale) & dgCollisionConvex::m_triplexMask;
```
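To make the fix concrete, here is a scalar sketch of the same idea. It assumes a hypothetical four-lane `Vec4` struct standing in for `simd_128`, and made-up constants `kScale`/`kInvScale` in place of DG_BROADPHASE_AABB_SCALE and DG_BROADPHASE_AABB_INV_SCALE; `quantizeMax` and `andMask` are illustrative names, not engine functions. The max corner is pushed up to the next grid cell, and then a triplex mask (all bits set in x, y, z, zero in w) is ANDed in so that w comes back to exactly zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a 4-lane SIMD register: x, y, z, w.
struct Vec4 {
    float v[4];
};

// Made-up quantization constants; the engine uses DG_BROADPHASE_AABB_SCALE
// and DG_BROADPHASE_AABB_INV_SCALE instead.
constexpr float kScale = 4.0f;
constexpr float kInvScale = 1.0f / kScale;

// Lane-wise bitwise AND with a 32-bit mask, like an ANDPS on a SIMD register.
Vec4 andMask(const Vec4& a, const std::uint32_t (&mask)[4]) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) {
        std::uint32_t bits;
        std::memcpy(&bits, &a.v[i], sizeof bits);
        bits &= mask[i];
        std::memcpy(&r.v[i], &bits, sizeof bits);
    }
    return r;
}

// Rounds the max corner up to the next grid cell and then forces w back to 0.
Vec4 quantizeMax(const Vec4& maxBox) {
    assert(maxBox.v[3] == 0.0f);  // w is expected to be 0 on input
    // All bits set in x, y, z and none in w: the "triplex" mask.
    static const std::uint32_t triplexMask[4] = {0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0u};
    Vec4 p1{};
    for (int i = 0; i < 4; ++i) {
        // For w this computes floor(0 + 1) * kInvScale, which is not 0.
        p1.v[i] = std::floor(maxBox.v[i] * kScale + 1.0f) * kInvScale;
    }
    return andMask(p1, triplexMask);  // clears w, leaves x, y, z untouched
}
```

For example, `quantizeMax(Vec4{{1.1f, 2.0f, -0.3f, 0.0f}})` yields (1.25, 2.25, -0.25, 0); without the mask, w would be 0.25.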

Wow, it was good to find it, because those are very hard bugs to find in a multithreading system.
It looks like some of the threads do not have the proper rounding mode. It is strange, because it is set when the micro threads are created, but maybe Qt is messing with the CPU state.
Anyway, I do not like to rely on the rounding mode in the CPU flags, so that is fixed now.
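The rounding-mode dependence can be illustrated in portable C++ with `<cfenv>`; this is a sketch, not the engine's `Floor()`. `std::nearbyint` rounds according to the current floating-point rounding mode, much like a SIMD float-to-int conversion that follows the CPU's rounding flags, whereas `std::floor` always rounds toward negative infinity. The floating-point environment is per thread, so a thread whose mode differs gives different answers from the same code. The helper names `roundWithCpuMode` and `roundDownExplicit` are hypothetical.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Some compilers ignore this pragma; the volatile accesses below are what keep
// the rounding call from being constant-folded or moved across fesetround().
#pragma STDC FENV_ACCESS ON

// Rounds x to an integral value using the given rounding mode, the way code
// that relies on the CPU's current rounding flags would behave in that mode.
double roundWithCpuMode(int mode, double x) {
    const int saved = std::fegetround();
    std::fesetround(mode);
    volatile double in = x;                    // read after the mode change
    volatile double out = std::nearbyint(in);  // result depends on the mode
    std::fesetround(saved);                    // restore the caller's mode
    return out;
}

// Rounds down no matter which rounding mode the calling thread is in.
double roundDownExplicit(double x) {
    assert(std::isfinite(x));
    return std::floor(x);
}
```

For example, `roundWithCpuMode(FE_DOWNWARD, 2.7)` gives 2.0 but `roundWithCpuMode(FE_UPWARD, 2.2)` gives 3.0, while `roundDownExplicit` returns 2.0 for both inputs regardless of the thread's mode.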

Try again, please.

Bird wrote: BTW, the access to the Thread pool is a great addition. I'm sure I'll find many uses for it.

Aren't the micro threads great? They really make a big difference.
This is the biggest difference between core 200 and core 300.
In core 200 I tried to make everything data parallel, but that requires a tremendous amount of housekeeping, and I could never keep the load balanced across the threads.
Now, with asynchronous threads, one thread can have a big load and another a small load, and it does not matter, because when the thread with the small
load completes its task, it can always take another task from the pending job list.
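A minimal sketch of the pending-job-list idea, assuming nothing about Newton's actual dispatcher: `runJobs` is a hypothetical helper built on `std::thread` and `std::atomic`. Rather than giving each thread a fixed slice of the work, every worker keeps claiming the next unclaimed job from a shared counter, so a thread that draws cheap jobs simply takes more of them.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Runs every job on `threadCount` worker threads. Each worker repeatedly claims
// the next pending job from a shared atomic index, so a worker that finishes a
// cheap job early just picks up more work instead of sitting idle.
void runJobs(const std::vector<std::function<void()>>& jobs, unsigned threadCount) {
    assert(threadCount > 0);
    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
            if (i >= jobs.size()) return;  // no pending jobs left
            jobs[i]();
        }
    };
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threadCount; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();  // join makes the jobs' writes visible
}
```

Because idle workers only claim the next index, running more workers than cores costs some extra scheduling but never requires repartitioning the work.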

This even yields better performance when there are more micro threads than cores in the system, and it can even take advantage of hyper-threaded systems, which in core 200 ran slower.
It also does not slow down on systems that do not have the hardware, so it is safe to set 8, 12, or 16 threads, and even a single-core CPU will do as well as or better than with a single thread.

There are two restrictions on the thread pool. The first is that you cannot use it from inside a Newton update, but you can use it for anything else, anywhere, and it lets you take advantage of the multiple cores in your system.
The other is that the default Newton memory manager is a chunk-based manager, and it is very high performance. By design, chunk allocations from the same pool are not thread safe,
so calling Newton tools like the mesh tools is not thread safe at this time.
I could solve that in two ways. The first is by making a thread-safe allocator (I do not like this option, because it slows the engine down a lot, since I use allocations for everything, and a lot of them),
or I can go over the MeshEffect tool and explicitly insert locks around each allocation. That sounds like an attractive option, but it will take some time to do.
So for now you cannot call the MeshEffect tool functions from a thread pool kernel.
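The allocator trade-off can be sketched like this. `ChunkPool` is a hypothetical fixed-size free-list pool, not Newton's memory manager: `alloc` and `release` update a shared list head with no locking, which is what makes it fast and also what makes concurrent use a data race. `allocFromJob` and `releaseFromJob` (also hypothetical) show the second option from the post, where only code that can run inside a thread pool job takes a lock, so the normal allocation path stays lock-free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <vector>

// Fixed-size chunk pool with an intrusive free list: fast, but NOT thread safe,
// because alloc() and release() modify head_ without any synchronization.
class ChunkPool {
public:
    ChunkPool(std::size_t chunkSize, std::size_t chunkCount)
        : chunkSize_(chunkSize < sizeof(void*) ? sizeof(void*) : chunkSize),
          storage_(chunkSize_ * chunkCount) {
        for (std::size_t i = 0; i < chunkCount; ++i) push(storage_.data() + i * chunkSize_);
    }
    void* alloc() {
        if (head_ == nullptr) return nullptr;  // pool exhausted
        void* chunk = head_;
        std::memcpy(&head_, chunk, sizeof head_);  // pop: next pointer lives in the chunk
        return chunk;
    }
    void release(void* chunk) {
        assert(chunk != nullptr);
        push(chunk);
    }
private:
    void push(void* chunk) {
        std::memcpy(chunk, &head_, sizeof head_);  // link the chunk in front of the list
        head_ = chunk;
    }
    std::size_t chunkSize_;
    std::vector<unsigned char> storage_;
    void* head_ = nullptr;
};

// Hypothetical call-site locking: only allocations made from a job take the lock.
void* allocFromJob(ChunkPool& pool, std::mutex& poolLock) {
    std::lock_guard<std::mutex> guard(poolLock);
    return pool.alloc();
}

void releaseFromJob(ChunkPool& pool, std::mutex& poolLock, void* chunk) {
    std::lock_guard<std::mutex> guard(poolLock);
    pool.release(chunk);
}
```

The first option, a thread-safe allocator, would instead move the lock inside `alloc` and `release`, so every allocation in the engine pays for it, which is the slowdown described above.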

by **Bird** » Sun Dec 11, 2011 8:05 pm

Julio Jerez wrote: Wow, it was good to find it, because those are very hard bugs to find in a multithreading system.
It looks like some of the threads do not have the proper rounding mode. It is strange, because it is set when the micro threads are created, but maybe Qt is messing with the CPU state.
Anyway, I do not like to rely on the rounding mode in the CPU flags, so that is fixed now.

No problems now. I tried both my project and the NewtonSDK demo. I did hit another assert: _ASSERTE (parameter <= 1.0f) on line 97 of MultiRayCasting.cpp. I tried again several times, but it didn't happen again. I also had the "Use parallel solver" option on when it asserted.

Julio Jerez wrote: So for now you cannot call the MeshEffect tool functions from a thread pool kernel.

Hehe, one of the things I wanted to try was doing a bunch of Boolean operations in parallel.

-Bird

by **Julio Jerez** » Sun Dec 11, 2011 8:54 pm

The cast may fail if the shape is in some weird position. This is inevitable with floats, but it should be rare.
As for doing Booleans in parallel, you will be able to after I make the MeshEffect thread safe, but that is not a high priority now.

I want to cover some more important bugs and features before I get to that.

How is the performance on your system?

by **Bird** » Sun Dec 11, 2011 10:08 pm

I get a maximum of about 90 fps using 8 micro threads: http://www.hurleyworks.com/downloads/raycast_demo.jpg

My system is old:
--------------------
Intel Core 2 Quad 2.5 GHz, 8 GB RAM
GeForce 9800 GT 512 MB

-Bird

by **Julio Jerez** » Mon Dec 12, 2011 6:10 am

I think that's not bad. There you have more than 500 generic bodies (box, capsule, convex, cone, sphere, and collision tree) plus 1000 ray casts each frame,
more than any typical game would do for physics, and the physics is taking under 3.0 milliseconds.
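As a rough budget check using the numbers in this thread (about 90 fps on Bird's machine, with physics at roughly 3.0 ms per frame), the physics step is only a fraction of each frame:

$$
t_{\text{frame}} = \frac{1000\ \text{ms}}{90} \approx 11.1\ \text{ms},
\qquad
\frac{t_{\text{physics}}}{t_{\text{frame}}} \approx \frac{3.0\ \text{ms}}{11.1\ \text{ms}} \approx 0.27
$$

So roughly 8 ms of every frame is spent outside the physics update.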

The fps is choppy because Qt, while a great GUI, is the worst for a game loop.
It imposes some severe restrictions using mutexes, and it also takes full control of the keyboard and the refresh rate.
Also, Qt has a random crash on initialization that has started to bother me a lot, and compiling the Qt source is harder than an MIT or Caltech rocket science experiment.

So I will switch to my own dGLW, based on X Window for Unix/Linux systems and Win32 for Windows. That way I will have direct access to what I need.
I am sick and tired of dealing with GUIs. They are also a nightmare to debug.
I am going to do that this week, because I am going to start making demos like the player controller and vehicles, and they require user input at the refresh rate and more than one key at a time, and Qt does not allow that.

by **Bird** » Mon Dec 12, 2011 11:09 am

Julio Jerez wrote: I think that's not bad. There you have more than 500 generic bodies (box, capsule, convex, cone, sphere, and collision tree) plus 1000 ray casts each frame,
more than any typical game would do for physics, and the physics is taking under 3.0 milliseconds.

I don't know much about how games work, but I think the physics speed is pretty amazing when you think about how much work it's doing.

Julio Jerez wrote: The fps is choppy because Qt, while a great GUI, is the worst for a game loop.

Yeah, that's an annoying limitation of Qt. Have you ever looked at JUCE? It's very well done, and the author is quite responsive to user requests too.
http://www.rawmaterialsoftware.com/juce.php

-Bird

by **Julio Jerez** » Mon Dec 12, 2011 1:56 pm

Since I exposed the micro thread job dispatcher, I believe I have no choice but to make all the containers thread safe too.
Otherwise it will open a can of worms if someone uses any of the containers.

I will also experiment with the effect of making the pool memory manager thread safe as well.
If it does not have a big impact on the run time, I will make it thread safe too. However, I do not place too much hope there, because the last time I tried, the engine slowed down to a complete crawl.
But that was back in core 200, and I wonder if I was making a mistake somewhere else.
Anyway, I will make everything that can be used in parallel thread safe.

For example, say you want to make a compound collision: if I iterate over the list, there will be race conditions everywhere.
And even if the container were thread safe, it would still crash, because the internal containers are not thread safe either.
It would be nice if, after the mesh is created, we could create the compound in parallel from a kernel.
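A sketch of what a thread-safe container could look like, assuming a hypothetical `LockedList` wrapper and a placeholder `SubShape` type rather than any Newton container: each append takes a mutex, so several jobs can add sub-shapes to one compound list without corrupting it.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Every access goes through one mutex, so concurrent appends cannot race on
// the underlying std::vector's size and capacity bookkeeping.
template <typename T>
class LockedList {
public:
    void append(T value) {
        std::lock_guard<std::mutex> guard(lock_);
        items_.push_back(std::move(value));
    }
    std::vector<T> snapshot() const {
        std::lock_guard<std::mutex> guard(lock_);
        return items_;
    }
private:
    mutable std::mutex lock_;
    std::vector<T> items_;
};

// Placeholder for one piece of a compound collision (hypothetical).
struct SubShape {
    int id;
};

// Builds `count` sub-shapes on `threadCount` threads and collects them in one list.
std::vector<SubShape> buildCompoundParts(int count, int threadCount) {
    assert(threadCount > 0);
    LockedList<SubShape> parts;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&parts, t, count, threadCount] {
            for (int i = t; i < count; i += threadCount) parts.append(SubShape{i});
        });
    }
    for (auto& w : workers) w.join();
    return parts.snapshot();
}
```

The lock only protects the list itself; if each `SubShape` owned internal containers shared between jobs, those would need their own protection, which is the second problem described above.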

But that is just the beginning. Do you remember the Voronoi for complex concave shapes? After the first conversion it needs to run a series of Booleans to clip the excess tetrahedra,
and we can launch all those Booleans in parallel too. Basically, the goal this week is to make everything that can be launched from a kernel run in parallel,
after I fix the concave bug, of course.

Plus, one of the reasons to expose the micro thread pool is a secret feature that will make its entrance in the next week or two.
We have multicores, and Newton will make sure that everyone with a multicore CPU who uses the engine can use those cores.
