[newton 300] Multi Thread Island solver bug

by **PJani** » Sat Feb 25, 2012 6:23 pm

Hi, I want to report a problem with the multithreaded island solver.
I enable this solver via NewtonSetMultiThreadSolverOnSingleIsland, and when I get a "large" island of boxes my project hangs and runs very, VERY slowly; the main loop has no chance to execute smoothly.

I don't have an example that shows the problem, but I think it is reproducible in any case.

by **Julio Jerez** » Sat Feb 25, 2012 6:41 pm

Is this the latest core 300 in SVN?

I ran a test with 3200 spheres dropping from the sky, and I do get much better performance using multithreading.
I also ran a test of a single 40 x 40 pyramid, with auto sleep off, and I also get better performance.
The parallel solver on a single island has an overhead for preconditioning the island to minimize thread contention.
It should be very rare for multithreading on a single island to be slower than the normal one-island-per-core solver; in fact, if it were not for the extra cost of the preconditioning, and if everyone had a Core i7, I would make it the default.
Making a multithreaded solver that is stable and accurate on a single island is very hard. I put a lot of effort into this one, and it does yield the same quality as the non-multithreaded solver.
The one thing that makes that solver slower is using a legacy quad core CPU.
A legacy quad core is two CPUs on the same die that are like two ships at sea that do not see each other.
If you run an island on one core, that thread is assigned to one CPU, and it will run until the thread's time slice expires, or until the island yields its time, or until the thread finishes its task.
What that means is that each time that thread accesses memory, the memory is in the local cache of the core, as long as the thread is running that task.
In other words, a core will do better executing an island if the island is not too big and the data is already in its cache.
If you run the same island on a quad core that does not share the level one and level two caches, then each time one thread accesses data that was processed by another core, the data has to be copied to system memory and reloaded into the cache of that core.
On my Core i7 I can run that solver with 16 threads on one island and I get better results.
I notice that the best result is when the number of threads is equal to the number of cores.
If I run the same test at work on my Xeon quad core, the parallel solver is slower if I set 4, 8, or more threads; however, if I set two threads it is equal or better.
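
The thread-count observations above can be sketched as a small heuristic. This is only an illustration: `ChooseSolverThreadCount` and `DetectCoreCount` are hypothetical helpers, not part of Newton, and the `coresShareCache` flag is an assumption the caller has to supply, since the standard library cannot report cache topology.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper, not part of Newton: pick a solver thread count
// from the number of cores and whether the cores share their cache.
int ChooseSolverThreadCount(int coreCount, bool coresShareCache)
{
    coreCount = std::max(coreCount, 1);
    if (coresShareCache) {
        // Best result reported in the post: one thread per core.
        return coreCount;
    }
    // Legacy quad core case from the post: two threads were equal or
    // better, four or more were slower.
    return std::min(coreCount, 2);
}

// Queries the core count through the standard library. Falls back to 1
// because hardware_concurrency() may return 0 when it cannot tell.
int DetectCoreCount()
{
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : static_cast<int>(n);
}
```

Note that `std::thread::hardware_concurrency()` usually reports logical processors, so on a CPU with hyperthreading it may return twice the number of physical cores.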

Can you tell me more about the kind of scene you are running?

by **PJani** » Sat Feb 25, 2012 8:27 pm

My OS is Win XP SP3 and I have an Intel dual core at 1.8 GHz.
Yep, that's the latest 300 from SVN (checked out 2 hours ago)... I made some changes to the sources because I had a problem with GetThreadId.

this is my edit of dgWorkerThread.cpp at line 116

Code:

```cpp
unsigned thread_id;
m_childThreadHandle = _beginthreadex( NULL, 0, ThreadSystemCallback, this, 0, &thread_id);
char name[256];
sprintf (name, "physics_%d", id);
SetThreadName (thread_id, name);
```

and this is my second edit of dgThread.cpp

Code:

```cpp
unsigned thread_id;
m_childThreadHandle = _beginthreadex( NULL, 0, ThreadSystemCallback, this, 0, &thread_id);
SetThreadName (thread_id, "newton main thread");
```

I am not sure if these "hacks" are right, so maybe I did something wrong...

I am using my own stripped-down version of OgreNewt ported to Newton 300.

I shoot lots of boxes (NewtonCreateBox) at an interval of 100 ms. All boxes are 0.1 m in size and have 1 kg mass. I use 2 threads, and my ground is a 20x1x20 box with a mass of 0 kg... This was working normally under 2.35.

EDIT: And the project ALMOST hangs... I get maybe 1 frame per 30 seconds.

by **Julio Jerez** » Sat Feb 25, 2012 8:53 pm

How many objects until it starts to slow to a crawl?

Also, does it run better when not using multithreading on a single island?

Ah, one more thing: does it slow down in the SDK demos too? There are some stacking demos with many boxes; try those and see if they are equally slow, please.

by **Julio Jerez** » Sat Feb 25, 2012 8:58 pm

In file: ..\core\dgWorkerThread.h

there are two defines that I use to debug the threading code:

//#define DG_SOFTWARE_THREAD_EMULATION
//#define DG_USE_NORMAL_PRIORITY_THREAD

Try uncommenting DG_USE_NORMAL_PRIORITY_THREAD
and see what happens.

Then try uncommenting DG_SOFTWARE_THREAD_EMULATION.

This will emulate the threading code, and you can step through it in the debugger. It is a long time since I used it, so it may crash,
but see if that makes it better.
Then we will know if it is the thread strategy or the code.

by **Julio Jerez** » Sat Feb 25, 2012 9:04 pm

I just downloaded the SVN build and ran the pyramid demo; in debug I get literally 4x performance in the pyramid stack.

I wonder if there is some difference in the threading between Windows 7 and Windows XP.
Uncomment the thread priority define and let us see if that makes it better.

by **PJani** » Sat Feb 25, 2012 9:31 pm

If I don't use the multithreaded island solver, the app doesn't crawl.
Each box's lifetime is 10 seconds, so if one is shot every 0.1 s (100 ms) I have a maximum of 100 boxes, but they are not necessarily in a pile.

Hmm, I haven't tested the SDK demos; I haven't even built them.

Anyway, I did uncomment DG_USE_NORMAL_PRIORITY_THREAD and now I can "normally" browse other apps, but it still hangs, so I made these three screenshots of the threads:

I am not sure which thread is which, but I can see that the main thread is completely stopped, "dead". I hope the stack traces at the bottom right will help.

Damn, I forgot to say that I have a compound of boxes (here the boxes are actual convex hulls from 4 points) in a grid.

EDIT: Either way, even if I don't spawn the compound it is the same.

by **Julio Jerez** » Sat Feb 25, 2012 10:05 pm

But how did you run single island multithreaded before if it was disabled? I think I disabled it because I was not satisfied with the quality of the solution.

I'd like to see why your test is so slow. Could you integrate the dScene library?
That library allows you to export a Newton world.
Basically something like this will export a Newton world:

Code:

```cpp
class MakeViualMesh: public dScene::dSceneExportCallback
{
    public:
    MakeViualMesh (NewtonWorld* const world)
        :m_world (world)
    {
    }

    NewtonMesh* CreateVisualMesh (NewtonBody* const body, char* const name, int maxNameSize) const
    {
        // here the user should take the user data from the body, create a NewtonMesh from it, and return that back
        NewtonCollision* collision = NewtonBodyGetCollision(body);
        NewtonMesh* const mesh = NewtonMeshCreateFromCollision(collision);
        sprintf (name, "visual Mesh");
        return mesh;
    }

    NewtonWorld* m_world;
};

void ExportScene (NewtonWorld* const world, const char* const fileName)
{
    MakeViualMesh context (world);
    dScene testScene (world);
    testScene.NewtonWorldToScene (world, &context);
    testScene.Serialize (fileName);
}
```

I did not get your last comment. Are you saying that it is slow with simple box primitives too?

by **Julio Jerez** » Sat Feb 25, 2012 10:08 pm

PJani wrote:
I am not sure which thread is which, but I can see that the main thread is completely stopped, "dead". I hope the stack traces at the bottom right will help.

This suggests that somehow XP treats thread priority differently than Windows 7.
Did you uncomment the //#define DG_USE_NORMAL_PRIORITY_THREAD ?

by **PJani** » Sat Feb 25, 2012 10:22 pm

When I was using 2.35 I had the MT island solver enabled and it was working normally.

I will try to export my scene in the morning (I don't know how I will export this, because the projectile boxes are spawned dynamically)... It is currently 3 am here and I am a little fuzzy.

.

Hmm, I was just testing with some compounds with convex hull (box-shaped) collisions in them and I got the same result.

by **PJani** » Sat Feb 25, 2012 10:23 pm

Julio Jerez wrote:
I am not sure which thread is which, but I can see that the main thread is completely stopped, "dead". I hope the stack traces at the bottom right will help.

This suggests that somehow XP treats thread priority differently than Windows 7.
Did you uncomment the //#define DG_USE_NORMAL_PRIORITY_THREAD ?

Yep, I did uncomment it, and before that it was impossible to use any program other than Task Manager.

by **Julio Jerez** » Sat Feb 25, 2012 10:41 pm

But what happened when you uncommented it?

by **Julio Jerez** » Sat Feb 25, 2012 10:49 pm

PJani wrote:When I was using 2.35 I had the MT island solver enabled and it was working normally.

I do not remember, but I believe it is ignored, because the quality is not as good as the quality of the single island solver.

I do not think the problem is with the solver; I believe it is with the way the threads are handled.
Try using the parallel solver but with only one micro thread; that will clear up whether the bug is because the solver is slow or because thread contention makes it slow.

by **PJani** » Sun Feb 26, 2012 6:47 am

Julio Jerez wrote:But what happened when you uncommented it?

It was the same (those 3 screenshots in the previous post show what happened: the main thread is dead)! The only difference was that I could Alt-Tab.

Julio Jerez wrote:
PJani wrote:When I was using 2.35 I had the MT island solver enabled and it was working normally.

I do not remember, but I believe it is ignored, because the quality is not as good as the quality of the single island solver.

I do not think the problem is with the solver; I believe it is with the way the threads are handled.
Try using the parallel solver but with only one micro thread; that will clear up whether the bug is because the solver is slow or because thread contention makes it slow.

Hmm, I didn't know it was ignored 0_0. Now I understand.
I tried with only one thread and the same thing happened (my task info shows only one thread running at full load). To me it smells like a deadlock.

Sometimes I get an access violation in dgBroadphaseCollision.cpp at line 1379.
This is the line:

Code:

```cpp
if (contact->m_broadphaseLru != lru) {
```

This is not the only access violation I get; there are others in dgMemory and in dgBroadphaseCollision, but I didn't write them down.

I will try to run the exporter in a secondary thread, so that when the main thread stops I can try to dump the physics world. I hope it will work...

EDIT: Anyway, when do I need to lock a critical section? I don't really know when.

by **Julio Jerez** » Sun Feb 26, 2012 11:01 am

Wow, that sounds like more bugs than usual.
About the main thread being completely shut down: yes, that is correct, because previously I was using sleep or yield when the Newton update was called.
Now it is using WaitForMultipleObjects, which in effect completely stops the calling thread from running idle while the update completes.
This is the normal mode, where the engine blocks the calling application on each update.
There is also a new mode that the engine can run in.
This one is concurrent with the application.
It is similar to D3D and OpenGL, where the update function returns immediately
and runs concurrently with the application until another update is called, in which case it waits until the previous call finishes.
In this mode, say you have a system with two or more cores, the best setup is to have the system run all your logic concurrently with the main thread, which will be doing all the graphics.
For this, what you do is call
NewtonUpdate (timestep, 1);
The update returns immediately and runs in parallel with the main loop.
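
The "return immediately, wait on the next call" behavior described above can be sketched in standard C++. This is an assumption-laden illustration, not Newton code: `AsyncStepper` is a hypothetical wrapper, and the step function passed to it stands in for the engine update.

```cpp
#include <functional>
#include <future>
#include <utility>

// Hypothetical wrapper, not Newton code: Update() returns right away and
// the step runs on another thread; the next Update() first waits for the
// previous step to finish.
class AsyncStepper
{
public:
    explicit AsyncStepper(std::function<void(float)> step)
        : m_step(std::move(step)) {}

    // Make sure no step is still running when the object goes away.
    ~AsyncStepper() { Sync(); }

    void Update(float timestep)
    {
        Sync();  // blocks only if the previous step is still running
        m_pending = std::async(std::launch::async, m_step, timestep);
    }

    void Sync()
    {
        if (m_pending.valid()) {
            m_pending.get();
        }
    }

private:
    std::function<void(float)> m_step;
    std::future<void> m_pending;
};
```

The caller can do its rendering between two `Update()` calls and call `Sync()` before reading any state the step function writes.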
Now you can set jobs to be executed on separate cores from within the Newton update and organize them to run at the appropriate time.
For this there is a PreUpdateListener and a PostUpdateListener.
Anything you place there will run on a separate core from your graphics, and also in order.
For example, your AI, your input system, your camera control, the sound, etc. are all game subsystems that run at simulation time,
therefore the most appropriate place to put them is in sync with the main game logic, without having to worry about locking or unlocking.
For that you can hook them on a PreUpdateListener or PostUpdateListener, and the subsystem will run in parallel, at the correct time, and it can even use micro threads to maximize the use of all the cores in the system.
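
As a rough illustration of the ordering idea, here is a minimal listener registry. `StepListeners` and its methods are hypothetical names chosen for this sketch; it does not reproduce Newton's listener API, and it runs everything on whichever thread calls `Step()`.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical registry, not Newton's listener API: subsystems register
// callbacks that run before and after each simulation step, in the order
// they were added.
class StepListeners
{
public:
    using Callback = std::function<void(float)>;

    void AddPreUpdate(Callback cb)  { m_pre.push_back(std::move(cb)); }
    void AddPostUpdate(Callback cb) { m_post.push_back(std::move(cb)); }

    void Step(float timestep, const Callback& simulate)
    {
        for (const Callback& cb : m_pre) {
            cb(timestep);      // e.g. input, AI
        }
        simulate(timestep);    // the physics step itself
        for (const Callback& cb : m_post) {
            cb(timestep);      // e.g. camera, sound
        }
    }

private:
    std::vector<Callback> m_pre;
    std::vector<Callback> m_post;
};
```

Because every callback runs inside the step, in a fixed order, the subsystems see a consistent simulation state without taking locks of their own.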
Then all you need is a smooth interpolation function that interpolates by that fraction between the current and next matrix.
The idea is to organize the system in two main threads: a game logic thread running all the logic at its own rate, and the graphics logic, which uses the main thread at its own fps,
managing GPU graphics and processing any procedural graphics data.
The coordination happens with a smooth matrix interpolation on each entity.
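
A minimal sketch of that interpolation, for position only. `Vec3`, `InterpolatePosition`, and `StepFraction` are hypothetical names for this example; a full matrix interpolation would also need to blend the rotation (for instance with quaternions), which is left out here.

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };

// Linear interpolation between two simulated positions.
// t is clamped to [0, 1] so the renderer never extrapolates.
Vec3 InterpolatePosition(const Vec3& from, const Vec3& to, float t)
{
    t = std::min(std::max(t, 0.0f), 1.0f);
    return { from.x + (to.x - from.x) * t,
             from.y + (to.y - from.y) * t,
             from.z + (to.z - from.z) * t };
}

// Fraction of the simulation step that has elapsed at render time.
float StepFraction(float timeSinceLastStep, float timestep)
{
    return timestep > 0.0f ? timeSinceLastStep / timestep : 1.0f;
}
```

The render thread would call `StepFraction` once per frame and feed the result to `InterpolatePosition` for each entity, using the two most recent states written by the logic thread.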

I can see how people will be reluctant to use this mode, so I will add an option to the engine that runs just like before, in the main thread.
But the mode that is set now is the most advanced, and the one that can best make use of the multiple cores in the system in a way that is natural.
With the new mode, if you have more than one core, you get multiple tracks, each one having a full frame time, and no hiccup when one system is updated.
With the other method all systems run in the same main thread, and the only way to make them efficient is to make each subsystem very fast.
I will check that stager mode then and I will make it the default. Basically this mode is the equivalent of making the emulation mode permanent.
