Collision free instance painting

by **JoeJ** » Sun Jan 02, 2022 4:21 pm

Bird wrote:Yeah, they couldn't make it more confusing.
VC2019 is actually Visual Studio 16 so VC16 is what you want

Oops - first time i fell into this trap

works now, thanks!

Julio Jerez wrote:the my method does generate manifolds, which mean it can be used for real time procedural mesh.

I see. So i could provide density volume data, and Newton does collisions on that? That will make my procedural stuff easier. E.g. i could use fluid sim density of earth quake to get my destructed city...

Some timings for Ryzen 2700, set to 16 threads otherwise defaults:
basic gpu rigid body:
sse soa: 23 ms
avx2: 21ms
Then i have deleted everything, used CMake again, checked OpenCL as well, but can't build:

Severity Description File
Error Cannot open include file: 'dCoreStdafx.h': No such file or directory C:\dev\newton-dynamics-master\newton-4.00\sdk\dExtensions\dOpencl\ndDynamicsUpdateOpencl.cpp

I would like to try that just for curiosity. What could it be?

Having lost belief in monster dGPUs, i currently guess iGPU could become the new mainstream to keep the platform afloat. Seems upcoming AMD APUs will have 3.7 tf, which would be enough for me to deliver next gen gfx. Thus i don't plan for thousands of rigid bodies on GPU. Some robotic ragdolls should give us more anyway. :mrgreen:

by **Bird** » Sun Jan 02, 2022 4:39 pm

Julio Jerez wrote:unfortunately, by my break is up, so I am back to weekend sit downs again.
but I will try that next weekend.

Looking forward to it.

by **JoeJ** » Sun Jan 02, 2022 4:41 pm

Julio Jerez wrote:this code seems very old and try to be very clever by making a vert list index list, the requires
a way to cache vertex duplication and index hash. the original code was using hash map, I am using the engine redback tree. however, this make is quite slow, as is stan now those are the slower part.

I can't remember what i've done, and looking up my code would need some time to understand it fully again.
However, Marching Cubes seems more complex, mainly because it generates variable number of triangles per cell.
For a Minecraft alike quadmesh that's not the case, and probably because of that i had no need for any mapping.
But it seems i use data structures holding all potential quads for the whole volume, so a lot of memory. After that i have some compaction step over the whole volume again. My stuff is definitively brute force and not clever. But i remember i had a port to GPU i mind, and it would work well with that.
Then there is a lot of complexity to guarantee manifold, and to merge per thread blocks to a final whole mesh.
Notice that Marching Cubes does not guarantee manifolds, which requires to duplicate / join vertices and edges. Various algorithms added this, but there is no single convention telling us which solution is preferred. We either want to join or separate shared edges. At this point i was not even sure what i want. Probably a main reason why so many variants of iso surface extraction algorithms exist. Can be quite hairy and confusing, but i think the fastest option to get done is to reinvent the wheel.

For realtime, i'd definitively want to go without manifold guarantees. No issue for rendering (just normal bay be zero), but not sure about collision detection.

by **Julio Jerez** » Sun Jan 02, 2022 4:45 pm

The OpenCL solve does not work, I disabled in favor of making teh basics solve more friendl to conver to gpu.

wow, 23 ms, I would expect better than that. but I guess that cpu is like mine where the lack of cache makes the multhread code actually run slower.

anyway, I was looking the original code for marching cube, and it is indeed my version that is slow.
basically, I have an optimization that make generate unique vertices.
I do not remember where I got it from, but I do remember I spend lots for time on it.

In the end the clever trick of making each vertex unique was good at the time, but now with so many cores, is better to make the flat triangle array, the converted to a vertex list index list.
the way I see this could make at list three of four time faster and can be GPU friendly.
I think I will try that before moving on to the Physics simulation.

by **JoeJ** » Sun Jan 02, 2022 5:09 pm

Julio Jerez wrote:wow, 23 ms, I would expect better than that. but I guess that cpu is like mine where the lack of cache makes the multhread code actually run slower.

Well, it's a 65 Watt CPU, and not the newest.
I noticed 1-2 cores can already saturate memory bandwidth. Not sure if that's normal. Maybe i should have used Dual Channel Ram instead Single Channel.
Otherwise i'm happy with it. MT often gives me speedups of 10.

Just tried to switch to Clang compiler, which did not made it faster. Usually it does. So you're probably right and it's about cache.

by **Julio Jerez** » Mon Jan 03, 2022 10:56 pm

Bird wrote:Wow, the fluid surface reconstruction is looking pretty fast. On my machine physics time is about 11 ms.

I tried creating the same fluid volume mesh you made with 32,768 particles using OpenVDB and it's taking about 450 ms.

ok Bird, since you ran the test, maybe you can try again.
I am now raising the bar.

I made the base code that generate tringle list instead on index list.
this removes all of the O(n log n) form the inner loops. and it is one average about twice as fast.
is not really faster because like I said before it trades memory for code speed, so now is bandwidth bounded.

I am now generation a 40 x 40 x 40 solid block that 64k particles, in single tread in CPU.
maybe you can compare it with the OpenVDB and see where we are now.
I was targeting 32 k particle for real time, but I think I can now target 64k or a least is feels comfortable with a value on the range, which make it possible to be practical

the secund point is that the new algorithm is 100% parallelizable, so for high end system, we can problably go higher than 64k.

there are few optimizations that I can apply but these will only yield marginal speed up.
but I will apply then so that will get it over with plus it is very close to 16 ms, and I would like to be under that.

any way if you want to see block the render thread you can open file
...\newton-dynamics\newton-4.00\sdk\dCore\ndThreadBackgroundWorker.cpp

and comment out //#define D_EXECUTE_IMMIDIATE

this will execute the task immediately and is good for debugging and see how much is can delay the main thread. remember by hiding the latency in a background thread, it is desirable that the task complete in one simulation frame.

anyway that the state now, one more round of optimization and move to the physics.

One of the good of having a the fluid is that is opens the door for a huge range of effect.
just using the navier and stoke equation we can get for free stiff like water and gasses.
and when using the Reynolds number there are ton of effects that can be approximated.
https://www.youtube.com/watch?v=wtIhVwPruwY

also if you feel brave, you can try setting ndInt32 particleCountPerAxis = 64;
262k particles in cpu. hikes !!!! :shock:

,

by **Julio Jerez** » Tue Jan 04, 2022 1:07 am

I just realized that, the usage of this can extend to a lot.

For example if we combine the iso surface with an octree,
Now we do not have to pass solid geometries, we can use the octree to jus scan the skin of the surface and pass that, reducing the number of point by o e dimension.

Fir exame, say we have a huge sphere, say 200 unit in radio.
If the was solid. That will be about 10 million points,
Volume proportional to the radios ^ 3

but using and octree is possible to carve huge interior solid block so in theory the number of point is now proportional to the radius ^ 2 which will be in the vesinity of 40000,

Now imagine this shape not a spheres but any arbitrary volume, like a hight field, if wi tile it, at say 128 x 128 now we have a procedural terrain that we can modify at will.
We can make gave any kind d of overhang, cave and can be modify in real time.

The possibities are scary.

by **JoeJ** » Tue Jan 04, 2022 8:30 am

Julio Jerez wrote:The possibities are scary.

You surely know the Atomontage engine? It's built around that vision, and allows to poke out holes of the models at runtime. Some random video where they use this to deform terrain: https://www.youtube.com/watch?v=Gshc8GMTa1Y

I also use similar ideas for my volumetric editor, which can use regular meshes, voxelize them, then do CSG or geometry blending. Or i create geometry from SDF primitives per fluid particle. Stuff like that. Pretty interesting for procedural content generation, which i think should be explored to reduce content creation costs.
I have no volumetric representation of the whole world, but i generate this only temporally to generate the final surface mesh. The surface pipeline is isosurface extraction -> curvature aligned quad remesh. Such remeshing isn't realtime, but curvature alignment gives good quality. Still, it's resampled geometry and can't preserve all details of input meshes ofc.

Contrary, voxels or isosurface is aligned to the global grid, so it's quality is not good for visuals. Except if we would use insane high resolutions, leading to a waste of memory. Thus i'm no real voxels believer, and things like Atomontage surley are not the future despite their claims.
However, for realtime dynamic geometry it's probably our only option, so that's interesting.

But than i need to ask again: Why a surface mesh at all? You could do collisions against the volume data itself much faster? I mean, ofc. you need a mesh for easy visualization, but you don't need it for the actual engine?

Supporting volume data would be quite interesting. Having some experience with this, here are some important points i've learned / found out:

Density needs less memory than signed distance (SDF), because we can do easy compression based on sparsity.
DAG achieves much higher compression ratios over SVO on big data. It's easy to convert from SVO to DAG, but not realtime. Interesting for storage at least. Random paper: https://www.cse.chalmers.se/~uffe/HighResolutionSparseVoxelDAGs.pdf

But SDF allows faster processing than density. E.g. efficient sphere tracing for collision detection. (This is part of how Unreal Engine 5 realtime GI works - they trace SDF models.)
An approximate conversion from density to SDF is possible in realtime, by making an exponential sum of density mip maps. It's not perfect, but good enough for my needs so far.

Voxelization of mesh assets is hard. Because they are usually not manifold and are often just partial, defining surface only for visual purposes. So they lack a robust volumetric definition.
Early solution was raytracing many rays from a given point. If the rays hit mostly back faces, they point is probably in solid space, otherwise in empty space. This is super slow.
A better solution is to integrate the surface to the point. This paper does so by calculating a winding number: https://www.dgp.toronto.edu/projects/fast-winding-numbers/fast-winding-numbers-for-soups-and-clouds-siggraph-2018-barill-et-al.pdf
The authors implementation showed they store some coefficients and do some taylor series, but this is a wast of memory and calculations. As i knew from my work on GI, surface area and normal direction ('surfel') is enough to integrate surface surface form factor with better accuracy. If we are close to the surface, we can switch to exact triangles for precise accuracy where needed.
Here is code i did to research this (can switch between both methods):

Code: Select all: static bool visInsideOutsideTest = 0; ImGui::Checkbox("visInsideOutsideTest", &visInsideOutsideTest); if (visInsideOutsideTest) { static int init = 1; static HEMesh mesh; if (init) { //MeshConverter::TestTrianglesToHE (mesh, mesh); //((HEMesh_Serialization&)mesh).AutoLoadModel ("C:\\dev\\pengII\\mod\\bunny closed.lxo"); // hangs on tessellation (delauney) ((HEMesh_Serialization&)mesh).AutoLoadModel ("C:\\dev\\pengII\\mod\\Armadillo_input.obj"); //((HEMesh_Serialization&)mesh).AutoLoadModel ("C:\\dev\\pengII\\mod\\Gear.lxo"); //((HEMesh_Serialization&)mesh).AutoLoadModel ("C:\\dev\\pengII\\mod\\MaxPlanck_input.obj"); //((HEMesh_Serialization&)mesh).AutoLoadModel ("C:\\dev\\pengII\\mod\\Beetle_input.obj"); // hangs on tessellation (delauney) //((HEMesh_Serialization&)mesh).AutoLoadModel ("C:\\dev\\pengII\\mod\\Buddha_input.obj"); // hangs on tessellation (delauney) } static vec query(0.46f, 0, 0); ImGui::DragFloat3("query", (float*)&query, 0.001f); static bool surfels = 0; ImGui::Checkbox("surfels", &surfels); float integral = 0; if (surfels) for (int i=0; i<mesh.GetPolyCount(); i++) // surfels { float area; vec center = mesh.CalcPolyCenterAndArea(area, i); vec norm = mesh.CalcPolyNormal(i); vec diff = query - center; float sql = diff.SqL(); if (sql > FP_EPSILON2) { float solidAngle = diff.Dot(norm * area) / (sqrt(sql) * sql); integral += solidAngle; } } if (!surfels) for (int i=0; i<mesh.GetPolyCount(); i++) // triangles { vec triVD[3]; bool onSurface = false; int eI = mesh.GetPolyEdge(i); for (int j=0; j<3; j++) { vec diff = mesh.GetEdgeVertexPos(eI) - query; float dist = diff.Length(); if (dist < FP_EPSILON2) { onSurface = true; break; } triVD[j] = diff / dist; eI = mesh.GetEdgeNext(eI); } if (!onSurface) { float numer = triVD[0].Dot( vec(triVD[1]-triVD[0]).Cross(triVD[2]-triVD[0])); float denom = 1 + triVD[0].Dot(triVD[1]) + triVD[0].Dot(triVD[2]) + triVD[1].Dot(triVD[2]); float solidAngle = 2 * atan2(numer, denom); integral += solidAngle; } } float winding = 1 / float(4*PI) * integral; bool inside = winding > 0.5f; RenderCircle(0.01f, query, vec(0), !inside, inside, 0); ImGui::Text("winding %f", winding); ((HEMesh_Vis&)mesh).RenderMesh(0,0,0); init = 0; }

That's simple and nice, but still slow.
To make it fast, we can use something like an octree and calculate average surfel per cell. If the query point is distant from the actial node, we can use that average instead traversing the subtree.
Then we are at the point where above paper is. Still slow

They proposed fast multi pole method, but did not try.
I implemented it (or something similar) by processing blocks of queried regions in subdividing top down fashion (which requires to give traversal stack to child nodes as parameter). Now we can use a distant surface patch integral for a whole block of query region, and the algorithm is finally paractically fast. Iirc, i can voxelize a meshes at 2k^3 resolution in one minute.

But it's still not perfect. Imagine a model of a column on a floor. The modeler will create a big flat quad for the floor, and a cylinder without caps for the column. He will not cut out a hole from the floor polygon where the column is, as this would increase triangles and work for no win.
In a point inside the column but close to the floor, the algorithm 'sees' more frontface surface than backfaces, so it will generate a bubble of empty space at this region, which we do not want.
I solve this by a heuristic filtering out bubbles with small surface area, bun manual interaction to decide over such bubbles remains necessary in very rare cases, i guess.
A nice worst case model is CryTeks Sponza.

Now you now. And you can decide how deep you want to go into the rabbit hole of volumetric data :mrgreen:

It's not really hard problems, but quite time consuming. As usual

And sorry if i wrote all this once before already - i somehow remember as if i did.

by **Julio Jerez** » Tue Jan 04, 2022 1:27 pm

yes the bottom line is that, as long as you can generate surfaces in linear time,
then using CPU, is you have a fast enough algorithm you can do relativity cool stuff since the process become sub O(n^2) is you deal with surface (number of cores a subset of the number of items)

this explains the proliferation of some many clever voxel terrain engines out there, basically an octree let you scan the surface of a solid volume and term into a mesh.

if the number of processing units, is comparable to the number maximum number of item in one dimension (line GPUs, which has n cores) then you process is reduce by one dimension O(k n)
where k is a high constant.

this is why making fluid is a much harder process, because the data is volumetric.
At best in GPU you end with O(k n^2), which is pretty good with today hardware.

I am content with the fast O(n^3) for solid volumes and is why I am keeping the target from 32 to 64 k particles. and O(n ^2) for surface algorithm using clever octree for cool graphic and collision meshes.

I now added the phase two, of the optimization and I can generate manifolds index list for up 32k in about 6.5 ms.

: Untitled.png (59.85 KiB) Viewed 3176 times

you can see that the average time is 6.5 ms, and I still have the last optimization, which I think will make under 6 ms. and then this will be completed.

then I will add a feature to making open manifold meshes which I think is need it for adaptive reconstruction, and this will allow for make stuff like open terrains.

anyway we are now closer.

by **Julio Jerez** » Tue Jan 04, 2022 2:27 pm

as a test if the power of the O(n^2) for surfaces vs O(n^3) for volume rconstruction

I made to produce a hollow box one cell thick, and the whole thing is abput 2.5 ms, that icluse all the engine overhead, the surface generation is under 1 ms.

hikes, :mrgreen:

by **Bird** » Tue Jan 04, 2022 2:39 pm

I am now generation a 40 x 40 x 40 solid block that 64k particles, in single tread in CPU.

That runs at about 3ms on my machine.

The OpenVDB time I first reported was wrong. Turns out I was testing while my app was just starting up and many threads were being used for loading resources.

Now the OpenVDB times look like this: Still much slower than what you're getting
32768 particles = 90 milliseconds
64000 particles = 147 milliseconds
262144 particles = 301 milliseconds

by **Julio Jerez** » Tue Jan 04, 2022 3:04 pm

Bird wrote:The OpenVDB time I first reported was wrong. Turns out I was testing while my app was just starting up and many threads were being used for loading resources.

Now the OpenVDB times look like this: Still much slower than what you're getting
32768 particles = 90 milliseconds
64000 particles = 147 milliseconds
262144 particles = 301 milliseconds

I thought you might have a miss reading when you said 450 ms, I thought you were running a debug build.
but still, those are very good figures.

I am testing some curve shapes, and I see the first problem, which is that

Mu ison surface reports a value 0.5 for any point inside a cell this is fast but generate too course mesh since the angle can only be 45, 30, and 60 base of the configuration.

but this can be firs easily by making the class a template, and pass a iso interpolator that can calculate a poison.

my guess is that OpenVDB is so slow because they are probably using a global poison interpolator, which is translate to a sparse linear system that need to be solves to calculate the optimal density
I will add that later, and that will produce high quality mesh,
My guess is that OpenVDB mesh are very high quality.

another easy method is to just do a trilinear interpolation of the point insides grid. that will generate a smother mesh.

and yet there is another graphics solution with is using the surface smoothing of the new graphic apis, so the will so the mesh smoothing.
but that will no work for physics. but is any case way are on our way.

anyway, if you sync you will see what I am talking about.

if you sync, you will see what I am talking abput now.

by **Julio Jerez** » Tue Jan 04, 2022 3:24 pm

actually, I made to render a solid color and all in all is not too bad. considering that the gird size is 0.25 unit (half meter) this mean that a trivial trilinear interpolator for the iso value would do almost a perfect job, no need for dreadful Poisson, calculation hessians, gradients and all that junk.

trilinear interpolation works for interpolating texture in graphics so it should work for us.

I figure that some as simple are say for level of densities. 0.0, 0.25, 0.5, 0.75 and 1.0
should be sufficient to yield a very convincing perfect smooth mesh.
I will try that after I do the last optimization.

please sync again so that you can see the solid mesh. and tell me what you think,
this will be our base line control for improve quality.

ha when you sync and run the test, you can navigate to inside the meshes,

you will see that the box is hollow, is has and inside surface, but the sphere is solid.
So you will not see any interior surface.
these are just stress test for the mesh generation quality.

by **Bird** » Tue Jan 04, 2022 3:55 pm

Those look good. The box looks similar to what OpenVDB is spitting out. I haven't been able to make an OpenVDB version of the sphere yet.

Physics time on my machine is 5.8 ms

by **Julio Jerez** » Tue Jan 04, 2022 4:38 pm

Bird wrote:Physics time on my machine is 5.8 ms

oh that the brute force generation, all happen in the main thread.
as I said before this is useful for determine how much we can do, before we reach a 1/60 of a frame.
because at that point the physics will block the main thread.
so as long as it is under 16 ms we are good, so far is show that we can aim for 64k particles.

I now enabled the part that hide the reconstruction, and I get under 2.5 ms.
give is a try, you should get under 2 ms.

Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Re: Collision free instance painting

Who is online