Development of self balancing biped with inverse Dynamics

A place to discuss everything related to Newton Dynamics.

Moderators: Sascha Willems, walaber

Re: Development of self balancing biped with inverse Dynamic

Postby Julio Jerez » Mon Aug 04, 2025 9:46 am

Okay Joe, I made the walls visible, and you were right: there is no overlapping.
Actually, there is an unrelated issue.
In my version, when I set the first two walls, the next ones shared an edge with the ones already in the map, so they were placed higher. That's just in my modified version; I made the sides of the last two walls 1.99 instead of the full width. That's how good the convex cast is. :shock: :shock:

Ok, with a velocity of -0.1, I waited for about 5 minutes.
It didn’t stop, but it was taking too long; it didn’t even get 10% closer to the wall.
I increased the velocity to -1.0 to speed things up.
The planks move straight without any issues until they hit the walls and stop.
I don’t see any problems.

If I pick them up and let them fall again, they still sink in and don't move. So the velocity seems ignored, and the contacts become very soft.

I haven’t seen this behavior on my end. Can you tell me more about how you reproduce it?

Here’s a quick overview of how it’s supposed to work:

Contacts are solved by the iterative Gauss-Seidel solver; therefore, they react softly to large, abrupt impulses, but they should still be stiff enough to handle regular collisions.
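
For intuition, that contact pass can be pictured as a projected Gauss-Seidel sweep over the contact equations, where each impulse is clamped to be non-negative (contacts can only push). This is a generic, self-contained sketch, not the engine's actual solver:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Projected Gauss-Seidel: solve A * lambda = b subject to lambda >= 0.
// Each sweep updates one impulse at a time using the latest values of the others.
std::vector<double> SolvePGS(const std::vector<std::vector<double>>& A,
                             const std::vector<double>& b, int iterations)
{
    const size_t n = b.size();
    std::vector<double> lambda(n, 0.0);
    for (int it = 0; it < iterations; ++it)
    {
        for (size_t i = 0; i < n; ++i)
        {
            double r = b[i];
            for (size_t j = 0; j < n; ++j)
            {
                if (j != i) r -= A[i][j] * lambda[j];
            }
            // clamp: a contact impulse can push bodies apart but never pull them together
            lambda[i] = std::max(0.0, r / A[i][i]);
        }
    }
    return lambda;
}
```

Because the sweep relaxes one impulse at a time, a single large, abrupt impulse gets smeared over several iterations, which is exactly the soft response described above.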

When a cluster of connected bodies comes into contact with rigid bodies, those rigid bodies, and any others connected to them up to five levels deep, are collected from the scene, and a duplicate is attached to the articulation's skeleton. Static bodies serve as the end of the search.
This allows the skeleton solver to compute forces using the direct joint solver.
There will be a discrepancy at each duplicated boundary, but the more layers scanned, the smaller the discrepancy. In theory, going all the way down, the discrepancy is zero, but that is too slow.
By empirical experimentation, at five levels deep the discrepancies are small enough that the Gauss-Seidel pass smooths them all out.
This is in fact the method that was used in Newton 1.xx, but that version scanned all levels, making one giant island.
My main focus was articulated bodies, and in my experience scenes were small at the time.
You know the history.
I resurrected the method, but this time it works per skeleton island, and there is a rename step that duplicates the bodies to resolve dependencies.
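
The collection step described above can be sketched as a depth-limited breadth-first search over the body connectivity graph. The toy types here are illustrative, not Newton's actual classes:

```cpp
#include <cassert>
#include <queue>
#include <set>
#include <utility>
#include <vector>

// Toy stand-in for a rigid body: a static flag plus joint/contact links to other bodies.
struct Body
{
    bool isStatic = false;
    std::vector<int> links;
};

// Collect every body reachable from the roots up to maxDepth levels deep.
// Static bodies are collected but never expanded: they end the search.
std::set<int> CollectIsland(const std::vector<Body>& bodies,
                            const std::vector<int>& roots, int maxDepth)
{
    std::set<int> visited(roots.begin(), roots.end());
    std::queue<std::pair<int, int>> frontier; // (body index, depth)
    for (int r : roots) frontier.push({r, 0});
    while (!frontier.empty())
    {
        auto [index, depth] = frontier.front();
        frontier.pop();
        if (bodies[index].isStatic || depth == maxDepth) continue;
        for (int n : bodies[index].links)
        {
            if (visited.insert(n).second) frontier.push({n, depth + 1});
        }
    }
    return visited;
}
```

In the engine, the collected bodies would then be duplicated and attached to the skeleton; the duplicates at the cut boundary carry the residual discrepancy mentioned above.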

This only applies to hard bilateral joints, which are the default type.

Some joints are intentionally soft. A good example is the picking joint in the scene.
It’s added to the skeleton as a temporary joint.
That’s what allows you to pick up a rigid body and sink it into the ground.

I used to think this was a bug, but eventually I realized the picking joint is a very special case.
It gets its input in screen space and projects it back into world space.

The thing is, a small movement in screen pixels can translate into very large and unpredictable changes in world space units.
After trying to “fix” it for a long time, I realized it was a fool’s errand.
Honestly, I’m surprised it works as well as it does.
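
A back-of-the-envelope helper shows why: for a perspective camera, the world-space size of one pixel grows linearly with the depth of the picked body (an illustrative formula, not the engine's picking code):

```cpp
#include <cassert>
#include <cmath>

// World-space size of one screen pixel at a given view depth, for a
// perspective camera with vertical field of view fovY (radians).
double WorldUnitsPerPixel(double depth, double fovY, int screenHeightPixels)
{
    const double viewHeight = 2.0 * depth * std::tan(fovY * 0.5); // visible height at that depth
    return viewHeight / screenHeightPixels;
}
```

At a 60-degree field of view and a 1080-pixel-tall viewport, a body 100 meters away moves roughly 10 centimeters per pixel, so a small mouse jitter becomes a large world-space displacement.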

That said, the kinematic joint is extremely useful for other tasks, especially when you're working directly in world space, where errors are more linear and do not generate those very high impulses.

anyway, see if that is better now.
Julio Jerez
Moderator
 
Posts: 12425
Joined: Sun Sep 14, 2003 2:18 pm
Location: Los Angeles

Re: Development of self balancing biped with inverse Dynamic

Postby JoeJ » Mon Aug 04, 2025 11:59 am

Oh, are you saying the other four bodies are fences, not an overlapping extra floor?
What happens when the other bodies hit the fences?
Shouldn't they stop moving?

Yeah, the 4 bodies are fences around the edges of the big floor box.
They are meant to stop bodies from falling over the edge, so I do not get out-of-world problems.
All bodies overlap by the 'width' of a fence, so I thought this caused the problem, which we solved by avoiding overlaps.

That's fine, but ofc. it would be better if the overlap caused no problems.
In this sense the strange issue with my code snippet is: although the fence bodies are all very far away, why does their existence cause the bodies to stop, even though they are in contact only with the ground box? Why does it initially work, but after 10 sec. both bodies stop at the same time, and their contacts are soft from then on?

The planks move straight without any issues until they hit the walls and stop.
I don’t see any problems.


Yeah, that's how I expect it. But in my demo the bodies stop long before they would even get close to any wall.
I'm assuming the overlap with the wall bodies can confuse the kinetic contact velocity calculation, which then becomes zero by accident.
But even then, why do the contacts also become soft, causing the bodies to sink into the floor?

I haven’t seen this behavior on my end. can you tell more how you reproduce that?

Use my original code and switch to solid or wireframe view.
Observe the falling and moving boxes. At the moment they stop, they also sink into the floor a bit.
Then I used the mouse joint to lift them up a little and release them in the air. As they come into contact with the floor, the contacts remain soft (having much larger penetration than before, while still moving and behaving normally).
I did not try to push them into the floor using the mouse joint. Gravity is enough to show the softness.

I hope this helps to clarify. All problems go away if I remove the 4 overlapping walls.
I don't know why only my demo shows those issues. I think your floor is even larger, which maybe makes a difference for some reason.
I also apply some simulation settings, like setting solver iterations etc.; maybe some of that matters.
But if you change your app to start in wireframe view, you can debug my demo easily, I guess.
JoeJ
 
Posts: 1489
Joined: Tue Dec 21, 2010 6:18 pm

Re: Development of self balancing biped with inverse Dynamic

Postby Julio Jerez » Mon Aug 04, 2025 12:18 pm

Oh, I see.

I pasted your demo literally, with no change to any of your script that makes the bodies.
And yes, you are correct in all your observations.
It behaves very badly.
The contact filter seems correct, but it appears there are still other bugs.

I am checking it out.

Re: Development of self balancing biped with inverse Dynamic

Postby Julio Jerez » Mon Aug 04, 2025 12:49 pm

Ah, I see the issue now, this might be my fault due to some past advice I’ve given.

I've often told people that if they want a kinematic body to move around in the scene, they should assign it a mass.
However, what I meant was that this only applies when the rigid body approximation is still valid, typically for small or moderately sized bodies.

The rigid body model doesn't work well for very large static objects because real world materials aren’t truly rigid, they have elasticity.
When you assign a large mass to a massive static body (like a ground plane), the resulting inertia becomes enormous. In such a case, any small body trying to interact with it is essentially pushing against an immovable wall.

On the flip side, if you have a small kinematic body with mass interacting with a large static body (which has zero mass and inertia in the contact solver), the solver assigns all the reaction force and torque to the small kinematic body. This is more stable and physically consistent.

There’s no hard rule here, just common sense. You usually don’t want to assign mass to something like the ground. If you remove the mass and treat the ground as a true kinematic body with zero mass, everything should work as expected.

Suggested Change
Just modify the ground creation like this:

Code: Select all
ndBodyKinematic* bodyFloor =
    BuildKinematicBox(world, groundXF, 0.0f, ndVector(size * 2.0f, 1.0f, size * 2.0f, 0.0f));


And in your BuildKinematicBox function, update it like so:

Code: Select all
static ndBodyKinematic* BuildKinematicBox(ndWorld& world, const ndMatrix& xform, ndFloat32 mass, const ndVector& dim)
{
    ndShapeInstance box(new ndShapeBox(dim[0], dim[1], dim[2]));
    ndBodyKinematic* const body = new ndBodyKinematic();

    body->SetNotifyCallback(new BodyNotify);
    body->SetMatrix(xform);
    body->SetCollisionShape(box);
    body->SetMassMatrix(mass, box); // Will work even if mass is zero

    ndSharedPtr<ndBody> bodyPtr(body);
    world.AddBody(bodyPtr);
    return body;
}


What was going wrong: the contacts had penetration, but the massive ground object had a nonzero mass matrix and was therefore being slightly nudged downward by gravity at each step.
Since the solver doesn’t integrate kinematic bodies with mass, and you weren't updating its position manually, the penetration just kept increasing over time.
Eventually, after a few steps, this led to unstable contact responses and incorrect behavior.

In summary
A kinematic body with mass must be integrated explicitly by your application.
The solver won’t move them automatically.
They are meant for game play stuff like elevators, doors, player capsules and stuff like that.

Now you might ask: how can I then simulate a large movable object like a ship, a plane, or a space station?
The answer is that, for such bodies, we break the law of conservation of momentum;
instead, we use a pseudo conservation-of-velocity law.
That is, the body has zero momentum because its inverse mass is zero, but its velocity is not;
therefore, contacts only get the velocity effect, with no momentum exchange.

Say you have a ship at sea: the app moves the ship every frame and sets its linear and angular velocity.
Then the engine takes that info and updates the bodies to match the velocity of each point on the ship.
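
That pattern can be sketched with toy types. In Newton you would set the body's velocity each frame (e.g. via ndBodyKinematic::SetVelocity); the struct below just illustrates the division of labor between the app and the solver:

```cpp
#include <cassert>
#include <cmath>

// Toy sketch of the pseudo conservation-of-velocity pattern: the application
// integrates the pose itself; the solver only ever sees the velocity, because
// the body's inverse mass is zero and it carries no momentum.
struct KinematicBody
{
    double pos = 0.0;
    double vel = 0.0;
};

void AppDriveBody(KinematicBody& body, double targetVel, double dt)
{
    body.vel = targetVel;      // contacts pick up this velocity, with no momentum exchange
    body.pos += body.vel * dt; // the app, not the solver, integrates the pose
}
```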

Please make these changes, then sync and test again.

Re: Development of self balancing biped with inverse Dynamic

Postby JoeJ » Mon Aug 04, 2025 1:10 pm

Oh yes, makes sense. Thanks!
Tried it and works as expected. : )

Re: Development of self balancing biped with inverse Dynamic

Postby Julio Jerez » Mon Aug 04, 2025 2:33 pm

I am grateful; without this bug report,
I wouldn't have found that big bug in the contact filter. :mrgreen:
Now back to training.

edit:
I cleaned up your demo and committed it as part of the SDK demo suite.

Looking through it, I can really tell that you understand the core philosophy of the engine.
You’ve set up all the hooks properly and didn’t rely on the demo scaffolding, very impressive work.

In the demos, I go out of my way to make things clear and easy to follow, but they’re meant to be illustrative pseudo-code, not something to be copied and pasted into production.
Unfortunately, most people do just that, trying to apply them directly in different contexts with mixed results.
You clearly get it.
Honestly, you probably know how to use the engine better than I do. :D :D

Re: Development of self balancing biped with inverse Dynamic

Postby Julio Jerez » Wed Aug 06, 2025 11:57 am

I’ve just completed the unified architecture for running machine learning training on both GPU and CPU. As part of this update, I’ve also implemented my first reinforcement learning (RL) algorithm: Soft Actor-Critic (SAC).

After studying a wide range of RL algorithms, it became clear to me that only two are really worth using in practice:

Proximal Policy Optimization (PPO): great for discrete action spaces, but less reliable when dealing with continuous ones.

Soft Actor-Critic (SAC): far more stable and mathematically sound for continuous action spaces.

Many of the other popular RL algorithms (like DQN, DDPG, TD3, A2C, VPG, etc.) are, frankly, a collection of clever hacks, often not much more effective than tricks used by game developers.
What surprises me is how often academic papers showcase cherry-picked performance graphs from a few runs of the same test case, which could easily be outliers rather than representative results.

Going forward, I’ve decided to streamline my focus entirely around SAC and discrete PPO. All other agents will be removed from my training stack.

SAC Performance: A Huge Leap Forward
The results with SAC have been incredibly promising.
In the past, using PPO, DDPG, or TD3, I typically needed around 250,000 training steps for TD3 or DDPG, and tens of millions for PPO, just to get halfway decent performance; and even then, the learning curve was erratic, full of spikes and regressions.

With SAC, I’m seeing much higher scores in just 25,000 steps, even without any hyperparameter tuning.
And if I train longer, SAC consistently reaches a higher plateau without collapsing, something that commonly happens with other algorithms.

The secret behind SAC's performance lies in its entropy regularization. Here’s why that matters:

When an RL agent finds a good solution for a given state, it tends to overfit. The network becomes extremely confident about that specific action, but in doing so it also becomes extremely sensitive to small changes in the input that produces that action, and any slight deviation leads to wildly incorrect outputs.

Entropy regularization counters that by slightly weakening the overconfident Q-values, encouraging the agent to explore similar, but not wildly different actions.
Instead of locking into a hyper-specific solution, SAC trades a "perfect action for one state" for a "very good action for a range of similar states." This leads to better generalization and robustness.
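
Concretely, the entropy term enters the critic's TD target. This is the standard SAC soft target (with the usual twin-Q minimum), shown as a sketch rather than my exact implementation; alpha is the entropy temperature:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Entropy-regularized TD target: the -alpha * logProb term rewards uncertainty,
// slightly weakening overconfident Q-values exactly as described above.
double SoftTarget(double reward, double gamma,
                  double q1Next, double q2Next,
                  double logProbNext, double alpha)
{
    const double softValue = std::min(q1Next, q2Next) - alpha * logProbNext;
    return reward + gamma * softValue;
}
```

A confident (low-entropy) policy has a large log-probability, so its target value is discounted; an exploratory policy gets a small bonus.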

It might seem counterintuitive at first, but it works brilliantly.
Honestly, sharing this insight just makes me feel good.

A Big Milestone: The Spider Walks
For the first time ever, I managed to get the spider demo to train successfully overnight.

Until now, I could never get the spider to learn proper movement. Its environment generates a lot of contradictory states, making it a tough challenge for RL algorithms. Simpler networks always failed, and more complex ones were too slow on the CPU. This is what prompted me to look at GPU solutions. :D :D

With SAC running on the GPU with a big network, I finally saw it: the spider started shifting its body to maintain balance, a behavior I had never achieved before.
That alone is a huge testament to SAC’s power and stability.

Now that the training pipeline is in place and I’ve settled on the right algorithms, I can focus on improving the environment itself and pushing for more complex behaviors.

Re: Development of self balancing biped with inverse Dynamic

Postby Julio Jerez » Thu Aug 07, 2025 11:39 am

ok Joe, here it is.

https://youtu.be/mmM020ZIzeI

After many failed experiments, I was ready to believe that it would be impossible to learn that controller, but the SAC actor-critic has done it twice, in less than half a million steps, while all the other agents failed even at 1 billion steps.

I still have one huge problem to solve:
for some reason, when I run on the GPU, after a few hours the engine slows to a crawl and issues an OUT_OF_RESOURCES error.
This was why I couldn't save the first model.
I made changes so that it saves the agent after the score reaches a certain high;
last night that happened at 85%, and after that the output was flooded with that error message.

I have to investigate why, since the library allocates zero memory during training.

Lots of people have different opinions as to what causes that:
some blame Windows, some blame drivers, and some have very strange reasons.

The problem is that this happens after 6 or 7 hours of training, so once again I am trapped in precisely what I was trying to avoid by moving to the GPU.

Re: Development of self balancing biped with inverse Dynamic

Postby Julio Jerez » Fri Aug 08, 2025 10:11 am

Well, to start, as a test I added a synchronization point after each training update.
It turns out that OpenCL does not flush the command queue unless it's done explicitly with:

error = m_queue->finish();

This seems to contradict what many people claim. Most self-appointed experts say that the more commands you submit, the faster the CPU–GPU interaction, and some even suggest you never need to call finish() at all.

In my case, adding that call at the end of each batch of commands allowed training to progress past 300k steps.
That led me to another well-known neural network crash problem: exploding gradients.
After applying a simple fix for that, changing form RELU activation to Leaky Relue, and using
He initialization (which in my opinion is just a hack, that only work because it uses a much smaller standard deviation to initialize the weight, but I use anyway), now the training completed in about 8.5 hours:

    Saving to file: c:\development\newton-dynamics\build\applications/media/ndQuadruped_2-sac.dnn
    Training complete
    Training time: 30605.3 seconds
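
The two training fixes mentioned above can be sketched as follows (illustrative helpers, not the library's actual layer code):

```cpp
#include <cassert>
#include <cmath>

// Leaky ReLU keeps a small slope for negative inputs, so a unit pushed into the
// negative region still receives a gradient and cannot permanently die.
double LeakyRelu(double x, double slope = 0.01)
{
    return (x >= 0.0) ? x : slope * x;
}

// He initialization: draw weights with standard deviation sqrt(2 / fanIn),
// which shrinks the initial weights as the layer gets wider.
double HeStdDev(int fanIn)
{
    return std::sqrt(2.0 / fanIn);
}
```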
This is by far the best result I’ve had so far, and here’s the learning curve:
[attachment: Untitled.png (learning curve)]


It’s still not as smooth as I’d like, but this is the only method that hasn’t hit a local minimum early on, completely collapsed, and then failed entirely afterward.

To be fair, I’ve only trained this new deep-and-wide architecture with DDPG and TD3, not with PPO. It’s possible PPO could also train this model successfully and produce a good or even better controller, but I’ll revisit that experiment another time.


This will be my last attempt at training using the Zero Moment Point (ZMP) approach.
ZMP as an objective function is not only expensive to compute, it’s also too erratic for stable neural network training.
Now that I have a more powerful trainer, I can try simpler surrogate objective functions that relate body momentum and leg momentum individually, without computing the total linear and angular momentum of the whole model.
Basically, the neural net will build a complex function from information about the individual parts and generate the necessary response.
That seems to be the method most people are using.

For now, I’m sticking with inverse dynamics, at least until training proves me wrong again.

Re: Development of self balancing biped with inverse Dynamic

Postby Julio Jerez » Thu Aug 14, 2025 11:51 am

Wow. While testing, debugging, and developing this system, I hit a few big and hard-to-find bugs.
For some reason, training would slow down to a ridiculous crawl, one frame every few seconds,
right after passing the half-million training steps mark.

When that happened, the output became complete garbage. To make matters worse, Windows has a built-in watchdog in its graphics system:
if a GPU kernel takes more than a few seconds, the OS assumes the application is hung and kills it.

Debugging it was not a realistic option: it takes about 9 hours on the GPU to reach that failure point in release mode, and in debug mode it would take weeks.
After digging into the code, I discovered the cause: an out-of-bounds memory read.

OpenCL’s standard library is extremely limited, no rand(), no built-in distributions, and very few of the utilities you take for granted in C++, CUDA, or DirectX.
You have to implement almost everything yourself.

To cut corners, I didn’t implement GPU-side random number generation.
Instead, I generated random numbers on the CPU, stored them in an array the size of the replay buffer, and loaded that array to the GPU. The array would get shuffled and reloaded repeatedly.

That worked fine for training runs shorter than the replay buffer size. But as soon as the training steps exceeded that size, a modulo function reset the index, and the GPU started reading past the end of the array. This caused two problems.
First, the crash showed up after only half the replay buffer had been consumed, which seemed rather odd until I remembered that the update adds two steps per simulation step.
Second, the kernels started producing NaNs from the uninitialized data returned by the out-of-bounds reads.
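
The fix amounts to wrapping the raw transition index at the buffer capacity, remembering that each update writes two transitions per step. A toy sketch with hypothetical names:

```cpp
#include <cassert>
#include <cstddef>

// Map a training step to a slot in the replay buffer. The raw index grows
// stepsPerUpdate times faster than the step count, so it must wrap at the
// capacity; without the modulo, the GPU reads past the end of the array.
std::size_t ReplayIndex(std::size_t step, std::size_t stepsPerUpdate, std::size_t capacity)
{
    return (step * stepsPerUpdate) % capacity;
}
```

With two transitions per step, the raw index reaches the capacity at only half as many steps, which is why the failure showed up at half the replay buffer.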

After fixing that, I was finally able to complete a training run — but the results still weren’t as good as I expected. So there’s probably another bug lurking somewhere.

Moral of the story: building a GPU library isn’t as simple as it looks. If you have to write everything from scratch, it’s a lot of work — and if you don’t, your GPU version may not actually outperform the CPU, and might even be slower.

Next on my to-do list:
-Implement uniformly distributed random number generation on the GPU
https://en.wikipedia.org/wiki/Mersenne_Twister
-Implement a proper random shuffle, which sounds trivial but is surprisingly hard to do efficiently on a GPU
https://arxiv.org/pdf/1508.03167

Re: Development of self balancing biped with inverse Dynamic

Postby Julio Jerez » Tue Aug 26, 2025 11:06 am

I just committed the best controller I’ve managed so far using SAC.

That said, I’m still not happy with it. Strangely, the code produces almost identical results on CPU, CPU OpenCL emulation, and Nvidia GPUs, but on AMD the results are terrible.
The trainer barely makes any progress, and in many runs it doesn’t even get off the ground.
In fact, I’ve even seen cases of negative progress.
This could be because I may still have a bug in the OpenCL implementation, but that does not explain why in 100% of cases Nvidia GPUs generate results similar to the CPU implementations while AMD fails.
My suspicion is that somehow AMD's arithmetic logic units are less accurate than Nvidia's, but that's a really big claim that I can't demonstrate.
It does seem consistent with reports from various people who have seen similar results on AMD hardware.
The errors seem subtle enough that rendering or other small workloads show no significant difference, but it looks like AMD cuts enough corners in float operations that the error accumulated after a few tens of thousands of iterations is larger than any correction in the step.

I’ve gone out of my way to make everything as deterministic as possible, but comparing results is still very difficult. It’s honestly quite frustrating.

At this point, the only clear benefit of using the GPU is faster execution, not better outcomes.

As for SAC itself, it seems like a decent algorithm, but it’s not the slam dunk that some papers make it out to be.
I’m seeing a lot of variance in performance, which is the opposite of what the literature suggests.

Since SAC isn’t delivering as strong results as I’d hoped, I’ve reimplemented Proximal Policy Optimization (PPO). That should give me a better basis for comparing algorithms.
The last time I tried PPO, it wasn’t very impressive, but I’ve fixed a lot of bugs since then, so I’m curious to see how it does now.
