Mario Marengo

Fast Gi Anyone?

Hi all,

I recently came across a cool paper by Michael Bunnell from nVidia (one of the chapters in the GPU Gems 2 book) that deals with speeding up GI calculations using surface elements. Complementary to that paper, and along similar lines but broader, is this one by Arikan et al., which will appear in the SIGGRAPH 2005 proceedings. A lot of yummy stuff in there! :P

Here are some early results from an HDK implementation I'm working on for the ambient occlusion portion of the nVidia paper (sorry, won't be able to share code on this one, but I thought you'd be interested in the viability of the method nonetheless):

Test Geometry: Car Wheel, 45,384 points, 44,457 polys, rendered as subdiv surface.

post-148-1118442629_thumb.jpg

Reference:

Here's the reference image, with 200 samples per shade point.

Calculation time: 2 hrs. 48 mins. 12.26 secs.

post-148-1118442675_thumb.jpg

(the little black artifacts are probably due to the wrong tracing bias, but I didn't feel like waiting another 3 hours for a re-render).

Solid Angle:

And this is a test using two passes of the "solid angle" element-to-point idea described in the paper.

Calculation time: 6.20609 secs. (!!!)

post-148-1118442649_thumb.jpg

A little softer than the reference, but... not bad! :)

The implementation is in its infancy right now, and I know there's room for more optimization, so it should go faster.

Point Cloud Method

More comparison fun...

Here is the "occlusion using point clouds" method with the point cloud matching the geometry points, and 200 samples for each computed point (note that about half of all points were probably actually visited since, at render time, we only touch the ones visible to camera plus a small neighborhood).

Calculation time: 38 mins. 19.482 secs

post-148-1118442664_thumb.jpg

The artifacts are due to a too-small search radius, but if I were to increase it, then everything would go too soft, and since it's a global setting I left it at a small enough size to catch the high-frequency stuff. In any case, I was more interested in the timing than the look of it in this case.

Methinks the days of sloooow GI are numbered.... yay! :)

Cheers!


Hey Mario,

sorry, won't be able to share code on this one

How could you leave us wanting more?!?! :(

Anyhoo, I hope you'll continue to update us on your progress. :)

Great stuff!

Cheers!

steven

How could you leave us wanting more?!?! :(

Hey Steven,

Sorry. Really don't mean to be a tease... it's just too damned exciting not to mention it at all... and... well... I'm willing to help anyone trying to implement it... how's that? :unsure:

Trying out the element-to-element radiance transfer now... jaw-on-floor... :ph34r:

Cheers!

I'm willing to help anyone trying to implement it... how's that? :unsure:

Okay, let's rock?! :D

Mario, I have some questions to you:

1. Does the algorithm described in the GPU Gems 2 book always require at least two passes? I.e., in the first pass we create a map with the element areas and the normals at the points, and in the second pass we do the shading? It seems to me that this avoids recalculating the areas for every shading sample, though it's probably not the most transparent approach for the user.

2. Where is it better to store the precalculated element areas: in a bitmap, or in a .geo/.bgeo file as point attributes?

3. If I've understood correctly, the emitter is always a point, while the receiver is always a shading sample?

4. It seems to me that equation 14-1 in the GPU Gems 2 book should be a sum, i.e. not

1 - (r * cos(Teta_E) * max(1, 4 * cos(Teta_R))) / sqrt(A / pi + r * r)

but

1 - sum((r * cos(Teta_E) * max(1, 4 * cos(Teta_R))) / sqrt(A / pi + r * r))

Am I right?


Hey Hoknamahn,

... man!... nothing like a large birthday party to get you fixing all the things around the house that you've been putting off for "later" -- and of course we're having record heat here in Toronto (not that I'm complaining, I love it, but it's not the best weather for chasing after 14 kids) -- ooofff... I'm beat...anyway...Where was I?... Oh yeah,

1. Does the algorithm described in the GPU Gems 2 book always require at least two passes? I.e., in the first pass we create a map with the element areas and the normals at the points, and in the second pass we do the shading? It seems to me that this avoids recalculating the areas for every shading sample, though it's probably not the most transparent approach for the user.

Yes, I suppose you could consider the step of building the data structures as a separate "pass" if you like, but then you'll need a minimum of two "calculation" passes beyond that (for the reasons explained in the paper), for a total of three passes then. This is just semantics though -- it doesn't provide any new insight into what needs to be done.

2. Where is it better to store the precalculated element areas: in a bitmap, or in a .geo/.bgeo file as point attributes?

For the SOP version, I'm storing it as point attributes, not unlike what I did for the SSS algorithm. But it can also be calculated "on the fly" in a similar way to how the ScatterSOP does it. But this has to do with implementation and it really doesn't matter -- what I mean is: "use whatever makes most sense in the environment that you're adapting it to" -- for HLSL/CG this might be a texture map, in SOPs a point attribute, etc.

3. If I've understood correctly, the emitter is always a point, while the receiver is always a shading sample?

The other way around: The emitter is a disk and the receiver is a point (or the center of a unit sphere if you like).

4. It seems to me that equation 14-1 in the GPU Gems 2 book should be a sum, i.e. not

1 - (r * cos(Teta_E) * max(1, 4 * cos(Teta_R))) / sqrt(A / pi + r * r)

but

1 - sum((r * cos(Teta_E) * max(1, 4 * cos(Teta_R))) / sqrt(A / pi + r * r))

Am I right?

Yes, it's the sum of contributions from all elements that occlude the receiver. Each one of those contributions is, according to the paper, calculated using that formula.... according to the paper... but you may be a little shocked at the results when you apply that formula as given...

When trying to implement something like this (or any other algorithm), there are always two separate and distinct steps: The theory (or "the math" in this case), and the implementation itself. Try not to mix them up. And always make sure you fully understand the theory first (and you don't need to write a single line of code for that), then the implementation comes naturally after that. In this case, things like how to calculate or store each element's surface area, are all things that belong to the implementation. Forget that for now. Also forget the fact that the basic theory can be sped up, in practice, through the use of that MIPMAP-like tree structure mentioned in the paper. Just assume that all necessary data is magically available to you and think about the theory...

For example, does that formula make sense to you?

Well; it doesn't to me. What the text *says* makes a lot of sense (in fact, it's one of those "DUH! How come I didn't think of that?!?" kind'a things). I'm no mathematician, but in my limited opinion, that formula doesn't do what the text says it's supposed to be doing. But there might be some Cg-related angle that I'm missing which makes it correct... I dunno... can't say for sure, but both Woolfwood and I tossed it around for a couple of days and we just couldn't reconcile it with the text.

The text says that the formula is meant to represent the solid angle covered by the emitter element (an oriented disk) from the point of view of the receiver (an oriented point). In other words (and this makes sense intuitively): How much of the total surface area of a unit-sphere surrounding the receiver does the emitter's area (after accounting for their relative position/orientation and projecting to this unit sphere) use up?
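To make that intuition concrete, here's a small illustrative Python sketch (my own, not Mario's HDK code or the paper's; the function name and the clamped cosines are assumptions). It implements a common point-to-oriented-disk coverage approximation, F = A * cos(Teta_E) * cos(Teta_R) / (pi * r^2 + A), which for a disk of radius a facing the receiver head-on reduces exactly to the classic point-to-disk form factor a^2 / (a^2 + r^2):

```python
import math
import numpy as np

def disk_coverage(p_r, n_r, p_e, n_e, area):
    """Approximate cosine-weighted coverage (form factor) of an oriented
    emitter disk (position p_e, normal n_e, area) as seen from an oriented
    receiver point (p_r, n_r). Illustrative sketch only -- the names and
    the back-face clamping are my assumptions, not the paper's code."""
    v = p_e - p_r
    r2 = float(np.dot(v, v))                   # squared distance
    d = v / math.sqrt(r2)                      # unit direction to emitter
    cos_e = max(float(np.dot(n_e, -d)), 0.0)   # emitter tilt toward receiver
    cos_r = max(float(np.dot(n_r, d)), 0.0)    # receiver tilt toward emitter
    return area * cos_e * cos_r / (math.pi * r2 + area)
```

For the coaxial facing case this matches the exact analytic result, so the approximation only degrades off-axis, which is the kind of "solid angle coverage" behavior the text describes.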

Hope that helps somewhat,

Cheers!


Thank you for the long-long answer, Mario! :)

Now I shall torment you a little; maybe it will be useful to someone else too. :lol:

For the SOP version, I'm storing it as point attributes, not unlike what I did for the SSS algorithm. But it can also be calculated "on the fly" in a similar way to how the ScatterSOP does it. But this has to do with implementation and it really doesn't matter -- what I mean is: "use whatever makes most sense in the environment that you're adapting it to" -- for HLSL/CG this might be a texture map, in SOPs a point attribute, etc.

As I understand it, a piece of geometry goes into our Bla-bla SOP, which writes out the calculated area/normal attributes. Then some questions arise:

1. Can we only account for the influence of points belonging to the object that uses this megafast shader? I.e., if we've precalculated the areas for several objects, and the megafast shader is applied to only one of them (the others use other shaders), then the area information from the points of the other objects is inaccessible to us, right? Or can we take *ANY* information from *ANY* piece of geometry using the HDK?

2. Can you say how the occlusion quality depends on the point count? For example, comparing the picture quality for an object with 5,000 points versus the same object subdivided to 100,000 points? Or should we speak not of the absolute number of points, but of the ratio of the object's surface area to the number of points?

3. Do you think something like the Scatter SOP (but in 2D) could make sense in this case?

The other way around: The emitter is a disk and the receiver is a point (or the center of a unit sphere if you like).

I'm not sure I've understood you correctly... Is this idea right: when the occlusion shader does its work, the current shading sample is the receiver, and the emitter is a *POINT* (with area/normal attributes) from a piece of geometry? I.e. number of receivers == number of shading samples, number of emitters == number of geometry points. Right?

And always make sure you fully understand the theory first (and you don't need to write a single line of code for that)

Yeaaa... My first step is to make the theory clear to myself.

For example, does that formula make sense to you?

I can't say that the formula is clear to me.

r * cos(Teta_E) looks like a cathetus (a leg) if r is the hypotenuse. But what does this cathetus mean? I don't know.

A/pi is the radius of the emitter. So sqrt(A / pi + r * r) gives us another hypotenuse.

But the formula is still not clear to me... And why *4* * cos(Teta_R)?

Well; it doesn't to me.

So you decided to write your own formula to calculate the "amount of shadow"?

In any case, the general idea is clear. The rest can be worked out.

The text says that the formula is meant to represent the solid angle covered by the emitter element (an oriented disk) from the point of view of the receiver (an oriented point). In other words (and this makes sense intuitively): How much of the total surface area of a unit-sphere surrounding the receiver does the emitter's area (after accounting for their relative position/orientation and projecting to this unit sphere) use up?

Yes, this idea is crystal clear :)

post-312-1118585542_thumb.jpg


Hi Mario,

This all looks very interesting. I scanned through this paper when Frank posted the link to the SIGGRAPH papers, but didn't think much more about it until your post. At the moment, though, I'm still struggling with some of the new terminology. But it looks like a very simple algorithm, and for the quality and speed it's well worth some time to understand...

So,

Is the form factor calculation mentioned in the siggraph paper the same one that they present the calculation for in the nvidia paper?

In the sig paper they just seem to say they used this calculation for the near term, but they don't give any implementation details. But to me it seems like the most useful part of the whole method.

And also, I think I have the same question as hoknamahn: are you doing this in the HDK so that you can access geometry from the whole scene in a way that you can't within a "normal" shader, or is it just because an HDK implementation will be quicker than any other method?

Hoknamahn, I see how your annotations relate to the formula in the nvidia paper, but how do they or the formula relate to calculating the solid angle?

Hoknamahn, I see how your annotations relate to the formula in the nvidia paper, but how do they or the formula relate to calculating the solid angle?

If I'm not mistaken, the solid angle of an element is the angle between its normal and the vector r. So for the emitter it is the angle Teta_E.

A similar interesting technique: http://www.tml.hut.fi/~janne/aofields/aofields.pdf

If I'm not mistaken, the solid angle of an element is the angle between its normal and the vector r. So for the emitter it is the angle Teta_E.

Yeah that's what I thought, so why do they need all those other terms in there, what do they do?

Yeah that's what I thought, so why do they need all those other terms in there, what do they do?

They need these terms to calculate the amount of occlusion the emitter produces. If you know the area of the emitter and its orientation (i.e. solid angle) relative to the receiver, it is possible to calculate the shadow the emitter casts on the receiver.


Sorry, I understand what they are doing, but I can't find any explanation of how they derived their formulation of that process.

So my understanding so far is

1. The solid angle of the emitter represents how much a receiver is obscured by an emitter.

2. The equation for the solid angle is simply the area of the emitter projected onto a unit sphere.

post-509-1118622882_thumb.jpg

3. They perform this calculation with the equation

(r * cos(Teta_E) * max(1, 4 * cos(Teta_R))) / sqrt(A / pi + r * r)

But what I don't understand is how this equation is derived. I've done some searches on Google, but I can't find anything that explains how to project a circle onto a sphere in a way that relates to that equation.

It looked like you knew how it worked from the annotations you added to that diagram, but I can't follow how they help derive the area of the circle projected onto a sphere. Could you elaborate more?


> Hoknamahn wrote:

> Or we can take *ANY* information from *ANY* piece of geometry using HDK?

Well... if "ANY" can be defined by the user with the standard Houdini tools then yes, it can. But, unless I missed those classes, the HDK doesn't yet read minds :P.

As you know, a SOP can only access what is given to it directly through inputs, or via object-reference parameters. On the other hand, nothing stops you from merging all objects into one, and feeding the entire mess to a single SOP. Similarly, one could scatter an unlimited number of points to create a point-cloud (dissociated from topology) to represent the world's surfaces. And finally, one could strive to tailor the algorithm to work on generic point soups (lifting the need for topology/connectivity information) in the hopes of arriving at a "lowest common denominator method" that could (in theory) work for all possible input types (allowing the user to mix point clouds and structured geometry freely). Such an approach, if possible, would be "callable" from either a rendering context (Mantra/VEX) or a geometry context (SOPs), or from anywhere really, as long as the caller can pass it a "bunch'o points" to compute. In the end, that's the minimum requirement... a "bunch'o points" with some attributes, and yes, you might need a way to tag subsections of said cloud as belonging to this or that object, and there are bound to be many more details, but that, roughly, is your range of options.

> Hoknaman wrote:

> Can you say, how depends the quality of occlusion from points count?

Generally speaking, the more points, the more accurate the result. But more importantly, better results are achieved when the point density varies in close proportion to surface curvature.

> Hoknaman wrote:

> I think something like Scatter SOP (but in 2D) can make sense in this case?

It doesn't make any more or less sense than using the original geometry points. Actually, the original vertices are more likely to keep point density in step with surface detail than a generic cloud. None of it is made "better" by casting it in 2D though....

> Hoknaman wrote:

> I.e. number of receivers == number of shading samples, number of emitters == number of points of geometry.

> Right?

Right.

> Hoknaman wrote:

> So you decided to write your own formula for calculate "amount of shadows"?

Yes. Based, as the text mentions, on the solid angle coverage of the emitter.

> Simon Barrick wrote:

> Is the form factor calculation mentioned in the siggraph paper the same one that they present

> the calculation for in the nvidia paper?

The sigg paper makes reference to a "polygon-to-point" form factor for the occlusion calculation. The nVidia paper uses a "disk-to-disk" form factor but only for the radiance transfer portion, not for occlusion. What the nVidia paper uses for occlusion is not a form factor at all -- it's a simple "solid angle coverage" deal (at least in my opinion, but I could be wrong of course).

> Simon Barrick wrote:

> ... are you doing this in the HDK so that you can access geometry from the whole scene in a way

> that you can't within a "normal" shader, or is it just because an HDK implementation will be

> quicker than any other method?

The need for the HDK arises when you get to the point of implementing the hierarchical tree structure they mention. My first "proof of method" tests were all done as a straightforward point-cloud shader though. At that stage I was just testing the formula itself, but its performance is comparatively quite slow since you're doing N^2 comparisons per pass. The MipMap-like tree structure is what completes the "wow" factor ;)
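To illustrate what one level of that hierarchy might look like, here's a plain-Python sketch (not the paper's GPU code; the aggregation rules below -- summed area, area-weighted centroid and normal -- are my own assumptions about a reasonable "filtered parent", not a quote from the paper):

```python
import numpy as np

def build_parent(children):
    """Aggregate child elements into one parent element, MipMap-style,
    so far-away receivers can test one coarse disk instead of many fine
    ones. Each child is (position, normal, area). Illustrative sketch:
    the area-weighted averaging is an assumption, not the paper's code."""
    pos = np.array([c[0] for c in children], dtype=float)
    nml = np.array([c[1] for c in children], dtype=float)
    area = np.array([c[2] for c in children], dtype=float)
    total = area.sum()                                 # parent keeps all the area
    p = (pos * area[:, None]).sum(axis=0) / total      # area-weighted centroid
    n = (nml * area[:, None]).sum(axis=0)
    n /= np.linalg.norm(n)                             # renormalized average normal
    return p, n, total
```

Applying this recursively gives the tree: each level is a "filtered" version of the one below, which is what makes sub-quadratic traversal possible.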

> Hoknaman wrote:

> If I am not mistaken, solid angle of element it is an angle between its normal and a vector r. So

> for emitter it is Teta_E angle.

No. That's just the angular measure in the plane of the two vectors (2D). A solid angle is the 3D equivalent of the more familiar 2D planar angle, and it is expressed in terms of area (of a sphere) instead of arc length (of a circle). See this entry in the wiki for a brief description of solid angle.

> Hoknaman wrote:

post-312-1118585542_thumb.jpg

The line you marked as "sqrt(A/PI + r^2)" would have to be "sqrt(A^2 / PI^2 + r^2)" if you were going for a "Pythagorean type of thing", but it would only apply to a right-angled triangle, which that one isn't (necessarily). The "A/PI" term in the denominator of the formula in the paper (except without the square root) may have originally come from a "point-to-area" form factor (see, eg, this reference), but then we're not strictly talking about a "solid angle formula" any more...

> Simon Barrick wrote:

post-509-1118622882_thumb.jpg

This is the correct visualization. Stick to this image :)

Cheers!

>  The "A/PI" term in the denominator of the formula in the paper (except without the square root) may have originally come from a "point-to-area" form factor (see, eg, this reference), but then we're not strictly talking about a "solid angle formula" any more...

Cheers Mario, that was the link I couldn't find. :D

I'm thinking of doing a point cloud implementation just to play with.

I think even a slow method would have some usefulness, even if it were just for doing nice renders of models for reels. It also seems that even if it were slow, it gives very nice stable results without the need for too much fiddling, and could still be faster than the other options.

I'm not sure how practical the whole approach is for large scenes, though. We have scenes with up to 4,000 objects in them, all probably falling within the "near field" -- that's one hell of a point cloud soup. How big a point cloud is practical? Does anyone know the limits on it?

Also, if you just made a big old point cloud that fills the whole scene, wouldn't it contain lots of emitters that shouldn't be there? In fact, wouldn't everything be occluded by rogue emitters?

I wonder if a "geometryloop" that could be used in a vex context would be the answer? That way in combination with a point cloud for each object it could all be done on the fly.

Anyway in the meantime I think I'll try a point cloud only approach.

If it works I'll be happy to post it up as it's just for fun.


Nice pic, Simon! I will use it :)

Thanks for the good explanation, Mario! And for reflectance functions too.

Solid angle is not a very intuitive term :angry: But, guys, your help is much appreciated :D

The MipMap-like tree structure is what completes the "wow" factor

I know what MipMap levels of texture maps are. But what is the idea behind the MipMap-like tree structures?

On the other hand, nothing stops you from merging all objects into one, and feeding the entire mess to a single SOP.

So the most logical thing is to pass point clouds into the shader? Or to read attributes from the geometry the shader is applied to? I think the first...

I'm thinking of doing a point cloud implementation just to play with.

I think even a slow method would have some usefulness, even if it were just for doing nice renders of models for reels. It also seems that even if it were slow, it gives very nice stable results without the need for too much fiddling, and could still be faster than the other options.

Yeah, I think that's the best idea to start with. If nothing else, it gives you an indication of whether it's worth spending the extra effort to speed it up -- which you can always do later.

I'm not sure how practical the whole approach is for large scenes, though. We have scenes with up to 4,000 objects in them, all probably falling within the "near field" -- that's one hell of a point cloud soup. How big a point cloud is practical? Does anyone know the limits on it?

Also, if you just made a big old point cloud that fills the whole scene, wouldn't it contain lots of emitters that shouldn't be there? In fact, wouldn't everything be occluded by rogue emitters?

Intuitively, it would certainly seem that way, but it's easy to forget that "near field" is proportional to the area of the emitter and not some global quantity. In other words, two points that appear to be "close together" may be considered (by the algorithm) to be "far enough away" if the emitter's area is "very small" relative to the distance (squared) between them. So in practice, it seems that one never really needs to go beyond two passes to remove spurious multiple shadowing.
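A toy sketch of that idea (illustrative Python; the constant k = 4 mirrors the d2 < 4*eA test quoted from Simon's code later in the thread, but the function itself is hypothetical):

```python
def is_near_field(d2, area, k=4.0):
    """Hypothetical near-field test: an emitter counts as 'near' only when
    the squared distance d2 is small relative to its own area (threshold
    k * area). 'Near' is thus per-emitter, not a global scene distance."""
    return d2 < k * area

# Two emitters at the same squared distance from a receiver can be
# classified differently depending on their areas:
near_big = is_near_field(1.0, 0.5)    # big emitter: 1.0 < 2.0 -> near
near_small = is_near_field(1.0, 0.1)  # small emitter: 1.0 < 0.4 -> far
```

So a "big old point cloud" doesn't automatically drown everything in rogue shadowing: tiny distant emitters simply fall below the threshold.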

I know what is it MipMap levels of the texture maps. But what the idea of the MipMap-like tree structures?

It's reminiscent of MipMaps in the sense that each higher level is a "filtered" version of the preceding lower level, that's all. Look at the tree picture in the paper and you'll see what I mean.

So the most logical thing is to pass point clouds into the shader? Or to read attributes from the geometry the shader is applied to? I think the first...

No matter what you do, you're going to have to get the area information into the shader somehow, so "attributes" are part of the picture no matter what. And since, from the point of view of a shader, there's no native way to access geometry points other than through point clouds, you really have no choice but to use them (assuming you're thinking of solving this inside a shader -- which is what I did initially and what Simon is thinking of doing as well).

Cheers!


The NVidia paper says that the area of an element at the current point is equal to the sum of the areas of the triangles or quads (or any other polygons) that share that point. But if we use point clouds to find the points that make up an element, one question arises: how do we determine the order in which points form triangles, quads, etc.? The areas depend on it. Not an easy task.

But the pictures in GPU Gems 2 suggest that it's enough to take the radius from the current point to its nearest neighbor and calculate the element area from that.

Which is it? The text and the pictures suggest different things.

Can I find the nearest point, calculate the distance, and then calculate the element area as pi * r ^ 2?

Or must I sum the areas of all the adjacent triangles and quads?
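For what it's worth, the "sum of adjacent face areas" reading can be sketched like this (my own illustrative Python; the 1/3-per-vertex weighting is an assumption I've added so that the per-point areas sum to the total mesh area -- the chapter doesn't spell out a weighting):

```python
import numpy as np

def point_areas(points, triangles):
    """Per-point element areas from the triangles sharing each point.
    Illustrative sketch: each vertex receives 1/3 of every incident
    triangle's area (an assumed convention), so the point areas are
    guaranteed to sum to the total surface area of the mesh."""
    areas = np.zeros(len(points))
    for i, j, k in triangles:
        e1 = np.asarray(points[j]) - np.asarray(points[i])
        e2 = np.asarray(points[k]) - np.asarray(points[i])
        tri_area = 0.5 * np.linalg.norm(np.cross(e1, e2))  # triangle area
        for v in (i, j, k):
            areas[v] += tri_area / 3.0                     # split among vertices
    return areas
```

The pi * r^2 nearest-neighbor estimate avoids connectivity entirely, but unlike this version it doesn't conserve total area, which may matter for the later passes.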


I was thinking of using the scatter sop to create the point cloud this then allows you to add the area attribute automatically. Same thing Mario did for his SSS shader.

Alternative would be to create the area attribute first with the measure sop and then transfer it to a point at the center of each poly. Then use those points as the point cloud.

All of this is off the top of my head, I haven't had time to try it yet.


Ok, as a first stab at this, here is a hip file with a VEX implementation of the occlusion stuff (very rough still). I ended up using the formula in the nvidia paper, which seemed to give the best results, but for speed I've combined it with the idea of a user-defined near-term distance. So rather than calculating every point against every other, I just do the close ones. I couldn't find a way to nest point cloud calls (the help says you can't with a single pcopen command, but I didn't think that would prohibit doing it with two; anyway, it doesn't seem to work), so I also ended up doing the first pass with a SOP and the second pass in the shader.

Also in the file you will find a second quicker method that does both passes in sops using a second more refined point cloud, the trouble with this is that the solution is much much softer.

Anyway here it is for your perusal...

Fast_GI.zip


Way to go Simon!! :D

The puzzling thing about this is that you seem to be getting better results from the NVidia formula than I was. This might have to do with the fact that you're using the ScatterSOP's ptarea (which is closer to 2*r than the area, and which ends up being used as 2*r/PI in the end) whereas I'm using an actual area. Or perhaps there was something else throwing me off... hmmm... I'll have to have another look at that...

The only thing that jumped out from the code was the line:

eA = (d2 < 4*eA?0:eA);

This is what they use to determine whether they should move down the tree hierarchy, but you are already limiting the search via the user-defined search radius, so in your case, this ends up arbitrarily ignoring elements that are further than eA*4 units away, which is not related to the user-defined radius.

Another little thing is that when you attenuate the contributions from already-shadowed elements (for the second pass), you're using occ *= occ1; which I believe should be occ *= 1.-occ1; since you want to weight it by the amount that is unoccluded (i.e. the occlusion amount that *wasn't* already accounted for).
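The effect of that second change can be sketched numerically (illustrative Python/NumPy, not VEX; contrib is a hypothetical N x N matrix where entry (i, j) is the raw shadow contribution of emitter j onto receiver i, and emitters and receivers are the same set of points, as established earlier in the thread):

```python
import numpy as np

def two_pass_occlusion(contrib):
    """Two-pass occlusion accumulation, illustrative sketch only.
    Pass 1 sums raw contributions (over-darkening where emitters shadow
    each other); pass 2 weighs each emitter j by its own *unoccluded*
    fraction 1 - occ1[j] -- the occ *= 1.-occ1 idea, not occ *= occ1."""
    contrib = np.asarray(contrib, dtype=float)
    occ1 = np.clip(contrib.sum(axis=1), 0.0, 1.0)     # pass 1: raw sums
    occ2 = np.clip(contrib @ (1.0 - occ1), 0.0, 1.0)  # pass 2: attenuated
    return occ1, occ2
```

Because each weight 1 - occ1[j] is at most 1, the second pass can only reduce occlusion, which is exactly the double-shadowing correction the extra pass exists for.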

Here are a few quick tests:

post-148-1119283394_thumb.jpg

Left: Reference (200 samps / shade point).

Middle: the point cloud solution as you posted it,

Right: your point cloud method but with the changes I mentioned above.

I'll have another look at the printed formula when I get a chance, and let you know what I find...

Cheers!
