Using Newton for raytracing

by **JoeJ** » Sat May 10, 2014 7:54 am

I need two functions for lighting purposes:

Line / AABB Intersection with boolean result.
Line / Polygon Intersection with boolean result.

I do not want intersection point or distance to be calculated, and i use a line from 2 points, not an infinite ray.
I'd be fine if 4 point polygon is supported.

Most papers focus on infinite rays, so i'll ask before implementing my own stuff.
There seems to be what i want in dgIntersections, but it's a bit hard to read due to optimizations...

by **JoeJ** » Sat May 10, 2014 3:27 pm

That's what i came up with for the box (not tested for special cases etc. yet)

Code: Select all: bool IntersectRayAABox (const Ray& ray, const AABox& box) { qVec3 t0 = qVec3(box.minmax[0] - ray.origin).MulPerElem (ray.invDirection); qVec3 t1 = qVec3(box.minmax[1] - ray.origin).MulPerElem (ray.invDirection); qVec3 tMin = t0.MinPerElem (t1); qVec3 tMax = t0.MaxPerElem (t1); qScalar ffd = tMin.MaxElem(); // front face distance (may be behind origin) qScalar bfd = tMax.MinElem(); // back face distance return (ffd <= bfd) & (bfd >= 0.0f) & (ffd <= ray.length); }

Looks much simpler and also faster than anything i've read in papers, but i did not do any comparisions.
There's no early termination, but due to simd theres no point to miss it, and i'll need it for GPU finally.

Let me know if anyone knows a faster way... this will still put my lighting performence down pretty much

by **Bird** » Sat May 10, 2014 5:54 pm

http://www.gamedev.net/topic/338987-aabb---line-segment-intersection-test/ Here's another way. I have no idea if it's faster than what you came up with.

-Bird

Code: Select all: bool Intersection::Intersect(const Vector3& p1, const Vector3& p2, const Vector3& min, const Vector3& max) { Vector3 d = (p2 - p1) * 0.5f; Vector3 e = (max - min) * 0.5f; Vector3 c = p1 + d - (min + max) * 0.5f; Vector3 ad = d.Absolute(); // Returns same vector with all components positive if (fabsf(c[0]) > e[0] + ad[0]) return false; if (fabsf(c[1]) > e[1] + ad[1]) return false; if (fabsf(c[2]) > e[2] + ad[2]) return false; if (fabsf(d[1] * c[2] - d[2] * c[1]) > e[1] * ad[2] + e[2] * ad[1] + EPSILON) return false; if (fabsf(d[2] * c[0] - d[0] * c[2]) > e[2] * ad[0] + e[0] * ad[2] + EPSILON) return false; if (fabsf(d[0] * c[1] - d[1] * c[0]) > e[0] * ad[1] + e[1] * ad[0] + EPSILON) return false; return true; }

by **JoeJ** » Sun May 11, 2014 11:05 am

Hmm, i don't get why everything is multiplied by 0.5 there, but it's surely slower. More math and branches. (Most vector functions i use are just one instruction)
I think the only way to speed it up might be to calculate backface distance first and early terminate - so doing only half the work in lucky cases.

There are some intersteing ideas - "Pluecker coordinates" or testing slopes - Authors claim 100% speed up against traditional methods like mine, but their code is huge, full of branches and differs depending on classifications, so it's a bad choice i think.

Thanks anyways

by **Julio Jerez** » Mon May 12, 2014 9:42 am

I think you version is similar to the version in Netwon by use different simd API.

Code: Select all: DG_INLINE dgFloat32 BoxIntersect (const dgVector& minBox, const dgVector& maxBox) const { dgVector test (((m_p0 <= minBox) | (m_p0 >= maxBox)) & m_isParallel); if (test.GetSignMask() & 0x07) { return dgFloat32 (1.2f); } dgVector tt0 ((minBox - m_p0).CompProduct4(m_dpInv)); dgVector tt1 ((maxBox - m_p0).CompProduct4(m_dpInv)); dgVector t0 (m_minT.GetMax(tt0.GetMin(tt1))); dgVector t1 (m_maxT.GetMin(tt0.GetMax(tt1))); t0 = t0.GetMax(t0.ShiftTripleRight()); t1 = t1.GetMin(t1.ShiftTripleRight()); t0 = t0.GetMax(t0.ShiftTripleRight()); t1 = t1.GetMin(t1.ShiftTripleRight()); dgVector mask (t0 < t1); dgVector maxDist (dgFloat32 (1.2f)); t0 = (t0 & mask) | maxDist.AndNot(mask); dgAssert ((mask.GetSignMask() & 1) == (t0.m_x < dgFloat32 (1.0f))); return t0.m_x; }

The one in Newton does uses an early out part at the beginning that makes it about 30% slower than your version
Although that part can be predicated in the code, I decided not to because it will no make the code fasters.
The reasons are these.
1- the early out at the beginning of the function and the branch will cost less that the code body the followed
2- when you study patterns of line an box intersections, it is rare that the line Is parallel to the box,
this mean that CPUs with branch prediction will cache the predict the branch as not taken and will take a hit actionably,
as oppose to always take the take the long pass with more instructions. this of course should be test it

I do not see how you version it deal parallel lines to one face or with origin inside the box.

on the Plucker bull *, I will no go but what people say in video game papers,
every person who reinvent the wheel thinks his version is more round than the wheel,
this seem to be a specially traits of the self pointed expert in Gamedevelop.net

The truth is that the plucker non sense is twice slower, the reason is that the code does no a line intersection if does a line test, if you need the intersection param, you still have to do the long version If the ray test

The version that Bird posted look very fast, I bet that in simd can be about twice as fast as our because all the conditionals can be simdsice to have no early out, plus almost all of the calculation at the beginning can be recomputed.
so maybe is I a good candidate to try out.

Code: Select all: Vector3 d = (p2 - p1) * 0.5f; // this is the center of the line can be precomputed as line.center, I do not know what it mean foeth code Vector3 e = (max - min) * 0.5f; // this is box size and can be precomputed as box.size Vector3 c = p1 + d - (min + max) * 0.5f; // (min + max) * 0.5f is the box orgin can be precomputed as box.center Vector3 ad = d.Absolute(); // this can also be precompyte it will reduce to Vector3 c (p1 + line.center - box.size);

after looking at it I think the one bird post can be about 2 to three time faster, I wonder if it can be modified to calculate the intersection.

by **Bird** » Mon May 12, 2014 11:38 am

after looking at it I think the one bird post can be about 2 to three time faster, I wonder if it can be modified to calculate the intersection.

The author provides a version the finds the intersection point too in post #4 in the link above.

-Bird

Code: Select all: //And a version that returns points of intersection (just reject t's outside [0,1] for a segment): template <class T> bool IntersectLineAABB( const Vector3<T>& O, const Vector3<T>& D, const Vector3<T>& min, const Vector3<T>& max, T t[], T epsilon) { Vector3<T> C = (min + max) * (T)0.5; Vector3<T> e = (max - min) * (T)0.5; int parallel = 0; bool found = false; Vector3<T> d = C - O; for (int i = 0; i < 3; ++i) { if (Math<T>::Fabs(D[i]) < epsilon) parallel |= 1 << i; else { T es = (D[i] > (T)0.0) ? e[i] : -e[i]; T invDi = (T)1.0 / D[i]; if (!found) { t[0] = (d[i] - es) * invDi; t[1] = (d[i] + es) * invDi; found = true; } else { T s = (d[i] - es) * invDi; if (s > t[0]) t[0] = s; s = (d[i] + es) * invDi; if (s < t[1]) t[1] = s; if (t[0] > t[1]) return false; } } } if (parallel) for (int i = 0; i < 3; ++i) if (parallel & (1 << i)) if (Math<T>::Fabs(d[i] - t[0] * D[i]) > e[i] || Math<T>::Fabs(d[i] - t[1] * D[i]) > e[i]) return false; return true; }

by **Julio Jerez** » Mon May 12, 2014 11:55 am

yes that is what I was saying in my post that and instersection test, using the pluck nonse is not really any better than the generic test, because
it ususally need the parameter and no just if you intersect or not.

the function I posted is not faster than the test but is like five to ten time fsater that the intestion using the loop, and can be use as both intesetion test
and a intersection function, for intersection test the return valie is negative. else the thre retun value is the parametic intertion.

by **JoeJ** » Mon May 12, 2014 12:59 pm

I tried a simd implementation of the gamedev code.
I still do not well understand how it works (would need to add some debug rendering), but i'm sure it is much slower:

1. It uses 2 cross products. In my math lib (ontop sony simd lib came with bullet) this is still a slow operation.
I don't know if it can be made efficient with more moderen instrunction set.

2. Even if shuffling is less expensive in GPU shader, i count more than twice the raw math and more registers necessary against my version.
But let me know if you still think i could miss something. I did not do any real tests.

Both versions work for cases like ray inside box or ray aligned with axis.
Both versions return false if ray is parallel to a box face (EDIT: shares the same plane), so i need to add an epsilon to the boxes. Thx for pointing out

EDIT: Mine and Julios version seem to be basically the same.

Code: Select all: bool IntersectRayAABox2 (const Ray& ray, const AABox& box) { // code from 'scgames' // precompute: qVec3 d = (ray.direction) * ray.length * 0.5f; // ray center qVec3 ad = d.AbsPerElem(); qVec3 e = (box.minmax[1] - box.minmax[0]) * 0.5f; // box half size qVec3 bcent = (box.minmax[0] + box.minmax[1]) * 0.5f; // box center // runtime: qVec3 c = ray.origin + d - bcent; qVec3 ac = c.AbsPerElem(); qVec3 ePad = e + ad; if (qVec3(ac - ePad).MaxElem() > 0) return false; qVec3 dCc = d.Cross(c).AbsPerElem(); qVec3 eCad = e.Cross(ad); if (qVec3(dCc - eCad).MaxElem() > FP_EPSILON) return false; return true; }

EDIT3: G3D Innovation Engine has implementation of the Eisemann slope based algo, (which is claimed to be a bit faster than the Pluecker thing) with and without intersection. I copied this initially but than thought that the simple thing i've had in mind initially should be better.

EDITEDIT: I need to edit this later again. The code i posted is messed up (i forgot to change function name and called my own old version during tests)
Also i missunderstood the parallel term...

Ok, i found the bug i've introduced:
qVec3 eCad = e.Cross(ad) this should not be a regular cross product - the multiplied numbers should be added, not subtracted (instruction count stays the same).

For the parallel term i mean:
My method works if the ray is parallel to world axis - infinite numbers still yield correct results. Julio, maybe you could remove the parallel test as well.
It fails however if the ray has zero length AND is inside the box (reporting false instead true).

Using Newton for raytracing

Using Newton for raytracing

Re: Using Newton for raytracing

Re: Using Newton for raytracing

Re: Using Newton for raytracing

Re: Using Newton for raytracing

Re: Using Newton for raytracing

Re: Using Newton for raytracing

Re: Using Newton for raytracing

Who is online