Matrix error improvement

by **Lax** » Fri Nov 28, 2025 10:31 am

Hi,

I'm trying to port the path follow joint and have run into strange crashes caused by a division by zero in ndMatrix when a body moves at high speed. I fixed it by adding a guard against division by zero. Maybe you could apply this code:

```cpp
ndVector ndMatrix::SolveByGaussianElimination(const ndVector& v) const
{
    ndMatrix tmp(*this);
    ndVector ret(v);
    for (ndInt32 i = 0; i < 4; ++i)
    {
        ndFloat32 pivot = ndAbs(tmp[i][i]);
        if (pivot < ndFloat32(0.01f))
        {
            ndInt32 permute = i;
            for (ndInt32 j = i + 1; j < 4; ++j)
            {
                ndFloat32 pivot1 = ndAbs(tmp[j][i]);
                if (pivot1 > pivot)
                {
                    permute = j;
                    pivot = pivot1;
                }
            }
            if (permute != i)
            {
                ndAssert(pivot > ndFloat32(1.0e-6f));
                ndSwap(ret[i], ret[permute]);
                ndSwap(tmp[i], tmp[permute]);
            }
        }

        // *** NEW: guard against singular / near-singular pivot ***
        ndFloat32 diag = tmp[i][i];
        if (!ndCheckFloat(diag) || ndAbs(diag) < ndFloat32(1.0e-6f))
        {
            // Matrix is effectively singular for this system.
            // Fallback: zero-out remainder of solution (no angular accel contribution).
            for (ndInt32 k = i; k < 4; ++k)
            {
                ret[k] = ndFloat32(0.0f);
            }
            return ret;
        }
        // *** END NEW ***

        for (ndInt32 j = i + 1; j < 4; ++j)
        {
            const ndVector scale(tmp[j][i] / tmp[i][i]);
            tmp[j] -= tmp[i] * scale;
            ret[j] -= ret[i] * scale.GetScalar();
            tmp[j][i] = ndFloat32(0.0f);
        }
    }

    for (ndInt32 i = 3; i >= 0; --i)
    {
        const ndVector pivot(tmp[i] * ret);
        // We know tmp[i][i] is valid and non-zero here due to the guard above
        ret[i] = (ret[i] - pivot.AddHorizontal().GetScalar() + tmp[i][i] * ret[i]) / tmp[i][i];
    }
    return ret;
}
```

Best Regards
Lax

by **Julio Jerez** » Fri Nov 28, 2025 11:34 am

Oh wow, that’s actually a pretty good trick. This is basically a rank reduction of the matrix. :shock:

Basically, if an n x n matrix is singular, it means one or more rows are linear combinations of others.
In matrix language this is called a rank-deficient matrix.
When that happens, the solution is no longer unique: the dependent rows leave some variables free, and any values for them, paired with the matching values of the complementary variables, form a valid solution.

In that case, setting the solution for one of the dependent rows to zero, and so reducing the system from n to n-1 unknowns, is a valid approach.
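A small example, constructed here for illustration rather than taken from the engine: the second row is twice the first, so this 2 x 2 matrix has rank 1.

$$
\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
=
\begin{pmatrix} 3 \\ 6 \end{pmatrix}
\quad\Longleftrightarrow\quad
x + 2y = 3
$$

Any pair with x = 3 - 2y solves it. Setting the dependent variable y = 0 leaves the 1 x 1 system x = 3, which picks out one valid solution, (x, y) = (3, 0).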

My current solver simply assumes the matrix is not rank deficient, so it only searches the following rows for a better pivot.
But if the matrix is rank deficient, it will crash.
Your approach is actually much more robust, and I'll add it to the SDK.

My only concern is that this could hide bad implementations. For example, if someone writes their own joint and submits incorrect rows, this logic would mask the mistake. In general, any rigid body has six independent degrees of freedom. The engine removes DOFs, but it can't detect when a removed DOF is a linear combination of others in the joint implementation.
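To illustrate the kind of check the engine does not perform, here is a minimal sketch that flags constraint rows which are (nearly) linear combinations of earlier rows. It assumes each row is a plain six-component array (three linear and three angular terms); `JacobianRow` and `FindDependentRows` are hypothetical names, not Newton API, and the relative tolerance is an arbitrary choice.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using JacobianRow = std::array<double, 6>;  // hypothetical: 3 linear + 3 angular terms

static double Dot(const JacobianRow& a, const JacobianRow& b)
{
    double s = 0.0;
    for (int i = 0; i < 6; ++i) s += a[i] * b[i];
    return s;
}

// Returns the indices of rows that are (nearly) linear combinations of earlier rows.
// Modified Gram-Schmidt: each row is projected against an orthonormal basis built
// from the independent rows seen so far; a tiny remainder means "dependent".
// An all-zero row is also reported, since it constrains nothing.
std::vector<int> FindDependentRows(const std::vector<JacobianRow>& rows, double relTol = 1.0e-6)
{
    assert(relTol > 0.0);
    std::vector<JacobianRow> basis;
    std::vector<int> dependent;
    for (int r = 0; r < static_cast<int>(rows.size()); ++r) {
        JacobianRow v = rows[r];
        const double originalNorm = std::sqrt(Dot(v, v));
        for (const JacobianRow& q : basis) {
            const double proj = Dot(v, q);
            for (int i = 0; i < 6; ++i) v[i] -= proj * q[i];
        }
        const double remainder = std::sqrt(Dot(v, v));
        if (remainder <= relTol * originalNorm) {
            dependent.push_back(r);  // adds no new degree of freedom
            continue;
        }
        for (int i = 0; i < 6; ++i) v[i] /= remainder;  // normalize and keep
        basis.push_back(v);
    }
    return dependent;
}
```

A debug build could run such a check over a joint's rows and print a warning for every reported index; note that in six dimensions a seventh row is always reported.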

Still, you're right: a fallback solution is better than a crash.
I'll add it with a warning so users know there may be a problem.

Thanks. :mrgreen:

I edited your post to make the code readable.
Also, can you show me the code you are using? I suspect there may be a mistake in it.
It is perfectly possible for a matrix to become singular by losing a degree of freedom in a transformation, but that's no justification for accommodating a bad design.

by **Julio Jerez** » Fri Nov 28, 2025 11:52 am

Oh I see, your objects are moving really fast, and the code can't handle the non-linearity.

The path follow joint is one of the highly nonlinear joints that generate extreme gyroscopic and Coriolis forces and torques. In those cases, a shorter time step is usually the only solution.

There is the sub-step solver parameter, which effectively reduces the solver time step while the update is still called with the application time step.
How many sub-steps are you using? Can you try setting it to a higher number?
I use two by default, which gives an effective 120 updates per second at 60 Hz.
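A minimal sketch of how sub-stepping divides one fixed application tick, combined with a fixed-timestep accumulator. `StepWithSubsteps`, `AdvanceFixed` and the `integrate` callback are hypothetical stand-ins, not Newton API; the callback represents whatever advances the simulation by `dt` seconds.

```cpp
#include <cassert>
#include <functional>

// One fixed application tick is split into 'subSteps' equal solver steps.
// 'integrate' is a hypothetical callback that advances the simulation by dt.
void StepWithSubsteps(double appTimestep, int subSteps, const std::function<void(double)>& integrate)
{
    assert(appTimestep > 0.0 && subSteps > 0);
    const double dt = appTimestep / subSteps;  // e.g. 1/60 s with 2 sub-steps -> 1/120 s
    for (int i = 0; i < subSteps; ++i) {
        integrate(dt);
    }
}

// Fixed-timestep driver: accumulates real frame time and consumes it in whole
// application ticks, so the solver always sees the same dt. Returns ticks run.
int AdvanceFixed(double frameTime, double& accumulator, double appTimestep, int subSteps,
                 const std::function<void(double)>& integrate)
{
    accumulator += frameTime;
    int ticks = 0;
    while (accumulator >= appTimestep) {
        StepWithSubsteps(appTimestep, subSteps, integrate);
        accumulator -= appTimestep;
        ++ticks;
    }
    return ticks;
}
```

With a 1/60 s tick and two sub-steps, a frame lasting two ticks runs two ticks and four solver steps of 1/120 s each.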

The modification you made is the right idea, but it is not quite correct:
it just returns 0 for all the remaining rows.
Say, for example, the second row is the one with the problem: this modification returns a nonzero first row and zero for the second row and every row after it (and that first value is not the solved value either, since the early return skips the back-substitution).
What we want is zero only for the rows with the problem,
and the correct solution for the other rows.

But your post offers a very good insight.
I will modify it to make it a proper rank-reduction fallback solver.
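A minimal sketch of what such a rank-reduction fallback could look like, written as standalone C++ with plain arrays instead of `ndMatrix`/`ndVector`. It is not the modification that went into the SDK (the thread does not show it). Columns with no usable pivot are treated as free variables and set to zero, while every other variable still gets its back-substituted value. `SolveRankReduced` is a hypothetical name, and the 1.0e-6 threshold simply mirrors the one in the posted code.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <utility>

using Mat4 = std::array<std::array<double, 4>, 4>;
using Vec4 = std::array<double, 4>;

// Solves A x = b with partial pivoting. A column without a usable pivot becomes a
// free variable fixed at zero (rank reduction); all other variables are solved
// normally. Returns the numerical rank so the caller can warn about bad input.
int SolveRankReduced(Mat4 a, Vec4 b, Vec4& x, double eps = 1.0e-6)
{
    assert(eps > 0.0);
    std::array<int, 4> pivotCol{};  // pivotCol[r] = column whose pivot sits in row r
    int row = 0;                    // next row to receive a pivot
    for (int col = 0; col < 4 && row < 4; ++col) {
        // largest entry in this column among the rows not yet used as pivots
        int best = row;
        for (int j = row + 1; j < 4; ++j) {
            if (std::fabs(a[j][col]) > std::fabs(a[best][col])) best = j;
        }
        if (std::fabs(a[best][col]) < eps) {
            continue;  // no usable pivot: 'col' is a free variable, row is not consumed
        }
        std::swap(a[row], a[best]);
        std::swap(b[row], b[best]);
        for (int j = row + 1; j < 4; ++j) {
            const double scale = a[j][col] / a[row][col];
            for (int k = col; k < 4; ++k) a[j][k] -= scale * a[row][k];
            b[j] -= scale * b[row];
        }
        pivotCol[row] = col;
        ++row;
    }
    const int rank = row;
    x.fill(0.0);  // free variables stay at zero
    for (int r = rank - 1; r >= 0; --r) {
        const int c = pivotCol[r];
        double sum = b[r];
        for (int k = c + 1; k < 4; ++k) sum -= a[r][k] * x[k];
        x[c] = sum / a[r][c];
    }
    return rank;
}
```

For a system whose first two rows are (1, 2, 0, 0) and (2, 4, 0, 0), only the second variable is zeroed; the other three are still solved exactly. Returning the rank gives the caller a natural place to emit the warning Julio mentions.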

Edit:
Alright, I added the modification.
Thanks for that insight. :mrgreen:

by **Lax** » Fri Nov 28, 2025 4:15 pm

Hi Julio,

I'm glad to help. I'm getting deeper and deeper into Newton Dynamics as I port lots of stuff from OgreNewt3 to OgreNewt4.

Indeed, I only use substeps(1). So what would be a good number? I also have issues with the ball-and-socket joint becoming so unstable that I get 0 fps and the whole engine freezes.

I was able to improve stability somewhat by using:

`joint->SetSolverModel(m_jointIterativeSoft);`

Best Regards
Lax

by **Julio Jerez** » Fri Nov 28, 2025 4:48 pm

Try two sub-steps; that's the default I use for the demos.
The error shrinks with the square of the step size,
so two sub-steps make the time-step error at least four times smaller.
Newton's solver is RK4, so the accuracy is even better, but to be conservative,
let's say the error scales with the square of the step size. That is, the integration error is O(dt^2).
Believe me, it will save you a lot of aggravation.
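A quick way to see the factor of four on a toy problem rather than on Newton itself: the sketch integrates the oscillator x'' = -x with the explicit midpoint method, a second-order Runge-Kutta scheme chosen here as a stand-in (it is not Newton's integrator), and measures the error at T = 1. `MidpointError` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Integrates x'' = -x from t = 0 to t = T with the explicit midpoint method
// (second-order Runge-Kutta) and returns |x(T) - cos(T)|, the exact solution
// for x(0) = 1, v(0) = 0.
double MidpointError(double T, int steps)
{
    assert(steps > 0);
    const double dt = T / steps;
    double x = 1.0;
    double v = 0.0;
    for (int i = 0; i < steps; ++i) {
        // state at the midpoint of the step
        const double xm = x + 0.5 * dt * v;
        const double vm = v - 0.5 * dt * x;
        // full step using the midpoint derivatives (x' = v, v' = -x)
        x += dt * vm;
        v -= dt * xm;
    }
    return std::fabs(x - std::cos(T));
}
```

Comparing `MidpointError(1.0, 50)` with `MidpointError(1.0, 100)` gives a ratio close to 4, as the O(dt^2) estimate predicts. For a genuinely fourth-order method, the same halving would shrink the error by roughly 16 times.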

The performance cost is not that big, since the sub-steps share a lot of cached intermediate data.

by **Julio Jerez** » Fri Nov 28, 2025 5:09 pm

I assume you are running with a fixed time step, aren't you?

Lax wrote:
> `joint->SetSolverModel(m_jointIterativeSoft);`

What that option does is exclude the joint from the islands that are sent to the direct solver.
You probably do not want that for most joints.
Just try two sub-steps and use the joints' default settings.

It may sound counterintuitive, but with more sub-steps the performance efficiency goes up.
This is because the joints are primed by an iterative solver before they are sent to the direct solver.
The cost of the iterative solver goes down because in most cases the error is four times smaller, so it doesn't run all of the iterations; and even when it does, the error is smaller, so the direct solver finds the solution in fewer iterations as well.
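To make the "fewer iterations when the error is smaller" argument concrete, here is a minimal Gauss-Seidel sketch with an early exit. It is not Newton's solver; the 3 x 3 test system, the tolerance and the name `GaussSeidel` are assumptions for illustration. A starting guess that is already close, as the joints are after being primed, ends the loop in fewer sweeps than a start from zero, and the early exit is also why an iteration count acts as an upper bound rather than a fixed cost.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

// Gauss-Seidel sweeps on A x = b, starting from the caller's guess in x.
// Stops as soon as the largest change in a sweep drops below 'tol', so
// 'maxIterations' is only an upper bound. Returns the sweeps performed.
int GaussSeidel(const Mat3& a, const Vec3& b, Vec3& x, int maxIterations, double tol)
{
    assert(maxIterations > 0 && tol > 0.0);
    for (int iter = 0; iter < maxIterations; ++iter) {
        double maxDelta = 0.0;
        for (int i = 0; i < 3; ++i) {
            double sum = b[i];
            for (int j = 0; j < 3; ++j) {
                if (j != i) sum -= a[i][j] * x[j];  // uses already-updated x[j] for j < i
            }
            const double updated = sum / a[i][i];
            maxDelta = std::fmax(maxDelta, std::fabs(updated - x[i]));
            x[i] = updated;
        }
        if (maxDelta < tol) return iter + 1;
    }
    return maxIterations;
}
```

For example, with rows (4, 1, 0), (1, 4, 1), (0, 1, 4) and right-hand side (5, 6, 5), whose solution is (1, 1, 1), a start at (1.001, 0.999, 1) reaches a 1e-10 tolerance in fewer sweeps than a start at zero.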

by **Lax** » Sat Nov 29, 2025 8:16 am

Oh, that's good information. I increased it to 2 substeps and it's better now. I will try 4.

And yes, I use fixed timestep.

by **Julio Jerez** » Sat Nov 29, 2025 8:34 am

Just leave it at two and see how that works;
you can always try a higher number.
A higher number would be for something like an accurate simulation or a high-speed game.
For something stiff like a racing car, or a sports sim game, you might want to go with 3.

There is another parameter that you might adjust,
which is the number of iterations.
This is the number of times the iterative solver passes over each joint.
The default value is 4, but in the demos I use 6.

This value is adaptive: setting it to 6 doesn't necessarily run 6 iterations, but it does when needed.
This parameter controls the accuracy of individual contacts in the simulation,
which affects things like jitter and bodies going to sleep.

Basically, the number of iterations controls the force calculation from the time derivative,
and the number of sub-steps controls over how large a time step the calculated forces are integrated.

The SDK demos use 2 sub-steps and 6 iterations per joint.

by **Lax** » Mon Dec 01, 2025 9:04 am

OK, I now use SubSteps(2), because with 4 I got massive frame drops when some joints were involved, Newton Dynamics stalled, and the whole app hung indefinitely. I also use solver model 6 like you do in the demos. I will test that.

by **Julio Jerez** » Mon Dec 01, 2025 11:26 am

Yes, 4 sub-steps might be overkill.
I have an Intel 13th gen (8 big cores and 17 small cores).
What kind of CPU are you running?

If I set 4 threads, I can even run the box-stacking demos in debug mode at a reasonable speed.
How big are your scenes, and are you running single-threaded?

In later versions of Newton, I changed the thread pool from a fixed-size workload per thread to an adaptive model.
Using fixed workloads seemed like a good idea at the time, but as technology has evolved, it is clear that it no longer is. Modern systems have many cores, and not all cores run equally or receive the same amount of time from the operating system scheduler.

On a PC, the main thread almost always ends up doing most of the work, and the OS doesn’t assign equal time slices to every thread. All of this points toward a “first-come, first-served” multithreading system. In this model, whenever a core finishes a job, it immediately grabs the next available job instead of waiting for a new workload to be assigned.

The result is an adaptive system: if there’s work to do, the main thread breaks it into jobs and sends them out to the worker threads. But if the workers finish before the main thread has generated more jobs, then no extra threads are activated unnecessarily.

Previously, no matter how small the workload was, the system would wake up all threads. That made performance worse for many threads, not better, because thread activation is far more expensive than executing a small task.
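The "first-come, first-served" model can be sketched with a shared atomic job counter. This is a minimal illustration in standard C++, not Newton's thread pool: for brevity it starts and joins `std::thread`s on every call, whereas a real pool keeps its workers alive and only wakes them when jobs exist. `ParallelFor` and `maxHelpers` are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// First-come, first-served dispatch: every participating thread repeatedly claims
// the next unclaimed job index from a shared atomic counter, so faster threads
// simply end up running more jobs. Nothing is pre-assigned per thread.
void ParallelFor(int jobCount, int maxHelpers, const std::function<void(int)>& job)
{
    assert(jobCount >= 0 && maxHelpers >= 0);
    std::atomic<int> next{0};
    auto drain = [&]() {
        for (int i = next.fetch_add(1); i < jobCount; i = next.fetch_add(1)) {
            job(i);
        }
    };
    // Never start more helpers than there are jobs beyond the one the caller
    // takes, so a single small job never starts another thread.
    const int helpers = std::min(maxHelpers, std::max(0, jobCount - 1));
    std::vector<std::thread> threads;
    threads.reserve(helpers);
    for (int h = 0; h < helpers; ++h) {
        threads.emplace_back(drain);
    }
    drain();  // the calling (main) thread works too instead of idling
    for (std::thread& t : threads) {
        t.join();
    }
}
```

Because `fetch_add` hands out each index exactly once, jobs that write only to their own slot need no further locking.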

Try using around four threads if you're on a multicore system;
you'll often find that this gives a very good balance, provided your system has four or more cores.
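A minimal sketch of that rule of thumb; `ChooseThreadCount` is a hypothetical helper. Note that `std::thread::hardware_concurrency()` reports hardware threads (which may include hyper-threads) rather than physical cores, and may return 0 when the value is unknown.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper for the "about four threads" rule of thumb.
// hardwareThreads would normally come from std::thread::hardware_concurrency(),
// which counts hardware threads (not necessarily physical cores) and may be 0.
unsigned ChooseThreadCount(unsigned hardwareThreads, unsigned preferred = 4)
{
    assert(preferred > 0);
    if (hardwareThreads == 0) {
        return 1;  // unknown hardware: fall back to a single thread
    }
    return std::min(preferred, hardwareThreads);
}
```

Typical use would be `ChooseThreadCount(std::thread::hardware_concurrency())`, with the result passed to whatever thread-count setting your wrapper exposes.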

by **Lax** » Wed Dec 03, 2025 8:25 am

OK, I will test with more threads. Thanks for the information!
