Leadwerks wrote:What is so special about your algorithm, other than the fact you put everything into a 3D grid to reduce the number of particle combinations you have iterate through? Is there any other major optimization that can be done?
As said, the only difference to state of the art methods from papers is that i trade a need for atomics or spin locks against a coarse division of work so it can be parallelized lock free *). But that's nothing special and does not explain why i seemingly beat mGPU paper results.
My 3D grid is certainly not special - instead i only adopted this idea from said papers. But it's not about particle combinations at all, as there are no direct particle interactions in MPM (unlike SPH).
Traditionally, MPM is simply like so:
'G2P': Particles scatter their velocity and mass to a 3^2 neighborhood of grid cells.
'P2G': Particles gather updated velocity from the grid in the same way.
(Latest paper proposes to fuse both loops into one, halfing particle BW and also halfing the need for expensive SVD matrix decompositions. Very promising, but i have no tried this yet.)
So you see the grid acts to smooth out velocities which we want for fluid.
To make this realistic, they use 3x3 matrices per particle to model things like deformation of space around them, and those matrices 'skew' the particle velocity as injected to adjacent grid cells, allowing to model elastic collisions, or any kind of shrinking / expanding of the surroundings, e.g. to preserve volume. This way a particle can also track state over time, e.g. 'I'm compressed to 140 % of density, and i want to expand to get back to 100 %.
Beside that, a performance relation of the grid (or sparse grid, etc.) has motivations like this:
Make groups for parallel processing and coherent memory access.
Limit memory hazards (particles may write to the same cells concurrently)
Allow larger domains.
*) I just came here to point out that this is no good idea for games either.
I have disabled MT for debugging and noticed on a single core the 10k particle fluid experiment goes from 6ms to only 10ms. So, for a low particle count my method performs poorly, even if i would use a simple regular grid. The workloads become too small, and threads spend too much time on waiting on the last one.
This raises the question if particle binning and reordering in memory to grid is worth it at all. Without that, particles memory access to the momentum grid becomes random over time, but i can imagine that's the lesser evil.
If we adopt the fused 'P2G2P' idea from the paper, we will access each particle only once during simulation. No need to reorder stuff just for that, i guess.