by Julio Jerez » Sun Nov 20, 2022 11:45 am
The threads in Newton are not sophisticated.
It has been a long time since I gave up on trying to make threads efficient.
The problem is that the synchronization objects are the same no matter who implements them.
And at the end of the day a sync object will execute an assembly instruction named syscall.
That is an operating system call into the kernel.
It is the only way to synchronize threads on x86 hardware.
That call can cost anywhere from a few thousand clock ticks to several million if a thread switches.
There is no way around that.
So after years of dealing with thread synchronization, my solution is to simply not use any synchronization objects.
Instead, use algorithmic solutions. In general that consists of taking your data, sorted by some criterion, in a flat array, dividing the array by the thread count, and having each thread run its own subset of jobs.
This actually works very well, but it has the problem that the sub-arrays are of equal size, so if the jobs are of variable complexity, the workload per thread is uneven.
There are a lot more subtleties than that, but that is the thrust of the method.
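As a minimal sketch of the static-partition idea (this is not Newton's actual code; the job type and function name here are hypothetical, with squaring an integer standing in for real work):

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical example: the "jobs" are just integers, and "processing"
// a job means squaring it. In a real engine a job would be a body pair,
// a constraint row, etc.
static void runPartitioned(const std::vector<int>& jobs,
                           std::vector<long long>& results,
                           unsigned threadCount)
{
    results.assign(jobs.size(), 0);
    const size_t chunk = (jobs.size() + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const size_t begin = t * chunk;
        const size_t end = std::min(jobs.size(), begin + chunk);
        if (begin >= end) break;
        // Each thread owns a disjoint slice of the flat array, so no
        // mutex or atomic is needed while the work is running.
        workers.emplace_back([&jobs, &results, begin, end] {
            for (size_t i = begin; i < end; ++i)
                results[i] = static_cast<long long>(jobs[i]) * jobs[i];
        });
    }
    for (auto& w : workers) w.join();  // the only synchronization point
}
```

The only place threads meet is the final join, which is why the slowest slice dictates when the whole batch finishes.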
Here is the big problem: even assuming the workload is even, each CPU core runs at a different speed, so the method has the effect that the thread pool is throttled by the slowest thread.
So I tried to fix that by using an atomic counter instead of fixed-size sub-arrays. The workload looks beautifully even, but overall it is slower.
I will post some profile traces so you can see what I mean.
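For comparison, here is a sketch of the atomic-counter variant (again with hypothetical names, not the committed code). Every thread pulls its next job index from one shared counter, so every job dispatch hammers the same cache line:

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical example: same squaring "work" as before, but jobs are
// handed out one at a time through a shared atomic counter.
static void runAtomicCounter(const std::vector<int>& jobs,
                             std::vector<long long>& results,
                             unsigned threadCount)
{
    results.assign(jobs.size(), 0);
    std::atomic<size_t> next(0);  // shared dispatch point for all threads

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        workers.emplace_back([&jobs, &results, &next] {
            for (;;) {
                // Every thread contends on this one cache line; when the
                // jobs are many and short, this fetch_add dominates.
                const size_t i = next.fetch_add(1, std::memory_order_relaxed);
                if (i >= jobs.size()) break;
                results[i] = static_cast<long long>(jobs[i]) * jobs[i];
            }
        });
    }
    for (auto& w : workers) w.join();
}
```

The load balancing is perfect, but the per-job cost of the contended fetch_add is what shows up in the traces.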
A way to explain it is by thinking of a multilane highway with a toll booth.
If the highway is not too busy, then not many vehicles pile up at the booth.
But if the highway is very busy, then you always see a pile of vehicles, sometimes several hundred meters long.
And the wider and busier the highway, the more severe the problem.
That is very much what happens in a physics engine. The tasks are many, short, and of variable size,
so any sync object becomes the bottleneck.
I put a lot of work into getting Newton 4 to be lock and atomic free, and it is the first time it can achieve almost linear scaling with thread count, core count, and SIMD instructions.
But since last week I had that idea of variable-size workloads, and forgot about the disastrous blocking effect of atomics, mutexes and all the other sync objects.
I will commit the change just to have it as a reference, but disabled under an ifdef.
Then I will just remove it, since it is not worth pursuing.
Here is a little story.
Back in 2004 and 2005
I registered to be a tester of CUDA with the 8800 GPU.
The biggest complaint I and almost everyone else had was that there were no atomic operations, so CUDA kernels were limited to very simple data-parallel operations. The people at NVIDIA thought there was no need for atomics, so they considered us idiots, and the animosity started to build up.
I concluded that it was not worth continuing, and I abandoned it.
Now, 15 years later, I am actually doing what they were saying: using sorting and parallel reduction to avoid atomics at all cost.
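A tiny sketch of that reduction idea, with names that are mine (not CUDA or Newton code): each thread accumulates into a private slot, and the slots are combined after the join, so the hot loop never touches a shared atomic.

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Hypothetical example: sum an array with per-thread partial sums.
// Each thread writes only its own slot in 'partial', so no atomic
// or lock is needed on the hot path.
static long long parallelSum(const std::vector<int>& data,
                             unsigned threadCount)
{
    std::vector<long long> partial(threadCount, 0);
    const size_t chunk = (data.size() + threadCount - 1) / threadCount;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const size_t begin = t * chunk;
        const size_t end = std::min(data.size(), begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&data, &partial, t, begin, end] {
            long long s = 0;
            for (size_t i = begin; i < end; ++i) s += data[i];
            partial[t] = s;  // private slot, no contention
        });
    }
    for (auto& w : workers) w.join();

    // Final sequential reduction of one value per thread.
    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```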