
Fast GI Anyone?



Never mind. Just checked NVIDIA's paper - average normals should be okay.

Hey Hok,

Just thought I'd mention that there is now an update to the original method, which is published in the latest GPUGems3 book: "High Quality Ambient Occlusion" [pg257]. Worth a look if you're revisiting this.

I haven't had a chance to read it yet, but it claims to be a much more robust version of the original.

Cheers!



Thank you sir! Yesterday I was checking the features of a new build of Blender and found some information about its point-based occlusion:

Implementation

The implementation is based on the concept of point-based occlusion. It brings together ideas from various papers on the subject (a small sketch of the core approximation follows the links below):

Dynamic Ambient Occlusion and Indirect Lighting [pdf]

GPU Gems 2, Chapter 14, Michael Bunnell.

High Quality Ambient Occlusion

GPU Gems 3, Chapter 12, Jared Hoberock and Yuntao Jia.

Point Clouds and Brick Maps for Movie Production

Point-Based Graphics, Chapter 8.4, Per H. Christensen.

If someone is interested:

http://download.nvidia.com/developer/GPU_G..._Gems2_ch14.pdf

http://developer.nvidia.com/object/gpu-gems-3.html

http://129.35.76.177/wps/find/bookdescript...117/description
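
For later readers, here is a minimal C++ sketch of the disc-to-disc occlusion approximation those references build on, using the same geometry terms that come up later in this thread. The function and type names are mine, not from any of the papers, and the exact constants vary between them:

#include <cmath>
#include <algorithm>

struct Vec3 { float x, y, z; };

static float dot(const Vec3 &a, const Vec3 &b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Approximate occlusion of a receiver disc by an emitter disc
// (after Bunnell, GPU Gems 2, ch. 14). "v" is the unit vector from
// receiver to emitter, d2 the squared distance between the discs,
// eArea the emitter's area.
float discOcclusion(const Vec3 &v, float d2,
                    const Vec3 &receiverN, const Vec3 &emitterN,
                    float eArea)
{
    const float cosR = std::max(dot(receiverN, v), 0.0f); // emitter above receiver's horizon
    const float cosE = std::max(-dot(emitterN, v), 0.0f); // emitter facing the receiver
    return cosE * cosR * (1.0f - 1.0f / std::sqrt(1.0f + eArea / d2));
}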


  • 2 years later...
  • 2 months later...
  • 2 weeks later...

Hi

I am trying to learn this method, and right now I am creating a VEX version of it. Then I'll begin to implement it using the HDK.

In the article I saw this equation: "rsqrt(emitterArea/rSquared + 1) ... bla bla bla"

So the first question is: what is "rsqrt"? I know what sqrt is, but what is rsqrt?

The second question is: what is a bent normal, and what is the difference between a regular normal and a bent normal? (I can't figure it out.)

Thanks
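
In case it helps later readers: rsqrt is just the reciprocal square root, and a bent normal is the surface normal bent toward the least-occluded part of the hemisphere. A minimal C++ sketch of both ideas; the bent-normal accumulation shown in comments is a common formulation, not necessarily the paper's exact one:

#include <cmath>

// rsqrt(x) = 1 / sqrt(x); GPU shading languages expose it as a
// (fast) intrinsic.
float rsqrt(float x) { return 1.0f / std::sqrt(x); }

// A bent normal is the surface normal pushed away from occluded
// directions, so it ends up pointing toward the open part of the
// hemisphere. One common accumulation, sketched in comments:
//
//     bentN = N;
//     for each occluding element e:
//         bentN -= occlusion(e) * directionTo(e);
//     bentN = normalize(bentN);
//
// It then replaces N when sampling environment lighting.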


These are my latest results. Almost ideal. But I don't like the shadow on the floor. In the raytraced variant the shadow looks sharper; even the shadow from the T-Rex's head is visible.

I use this formula:

value = max(eTheta, 0.0f) * max(rTheta, 0.0f) * (1.0f - (1.0f / SYSsqrt(1.0f + *eArea / d2)));

Since I only use emitters that satisfy this condition

if(rTheta > 0.0f && eTheta > 0.0f) // the emitter is in front of the receiver and faces it

then I don't need the max() calls, so:

value = eTheta * rTheta * (1.0f - (1.0f / SYSsqrt(1.0f + *eArea / d2)));

Are there any ideas on how to make the picture more correct?

There is one more thing that isn't clear to me. In its own shader, NVIDIA doesn't simply multiply the value of the second pass by the value of the first pass, but uses this artful formula:

if (PASS == 1)  // only need bent normal for last pass
   result = saturate(1 - total);   // return accessibility only
else
   result = saturate(1 - total) * 0.6 + texRECT(lastResultMap, receiverIndex.xy).x * 0.4;

What is the reasoning behind this formula, Mario?
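
For what it's worth, one reading of that blend: successive passes of this iterative solver tend to oscillate (one pass over-shadows, the next compensates and under-shadows), so mixing the new result with the previous pass's result damps the oscillation. A hedged C++ sketch of the idea; the 0.6/0.4 weights are from the shader above, the names are mine:

#include <algorithm>

// Clamp to [0, 1], like Cg's saturate().
static float saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// One receiver's result per pass. "total" is the occlusion accumulated
// this pass; "last" is the previous pass's stored result.
float blendPass(float total, float last, bool finalPass)
{
    if (finalPass)
        return saturate(1.0f - total);                  // accessibility only
    return saturate(1.0f - total) * 0.6f + last * 0.4f; // damp oscillation
}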

Hello, again.

Why does your formula use max(eTheta, 0.0f) and not max(cos(eTheta), 0.0f)? The same goes for rTheta.

And what is SYSsqrt in your formula?

Thanks


Why does your formula use max(eTheta, 0.0f) and not max(cos(eTheta), 0.0f)? The same goes for rTheta.

I'm guessing that, despite their names, the variables eTheta and rTheta actually represent cosines in the code (did you check?) -- i.e., they could be the result of a dot product, for example. (Well... let's put it this way: they would *have* to be, given the way they're used in the formula, no?)
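
In other words, for unit vectors the dot product already *is* the cosine of the angle between them, so no explicit cos() call is needed. A tiny C++ illustration (the names are mine):

#include <cmath>

// For unit vectors, dot(a, b) == cos(angle between a and b), so a
// variable named "theta" that comes from a dot product is really a
// cosine already.
float cosBetween(float ax, float ay, float az,
                 float bx, float by, float bz)
{
    const float la = std::sqrt(ax * ax + ay * ay + az * az);
    const float lb = std::sqrt(bx * bx + by * by + bz * bz);
    return (ax * bx + ay * by + az * bz) / (la * lb); // normalize, then dot
}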

And what is SYSsqrt in your formula?

I imagine that's the HDK's sqrt() function (overloaded and cross-platform, IIRC) -- defined in the header $HT/include/SYS/SYS_Math.h

With the exception of Simon's VEX-based implementation, I think the rest of the approaches were HDK-based (so a lot of functions would come from there).

HTH.


  • 4 weeks later...
  • 2 months later...

Hi

This is my first pass with about 400 points in the model

[image: pass1.jpg]

This is the same first pass, but the model was subdivided once (about 1600 points)

[image: pass2.jpg]

This is the same first pass, but the model was subdivided twice (about 4600 points)

[image: pass3.jpg]

As you can see, the shadows become darker and darker with more points. At some points I get values below zero, and at others above one. Is that OK for the first pass, or must I stay within 0 and 1 in both passes?

I use this formula for the first pass:

cos_Qr = dot(Nr,normalize(RE));       // cos between Nr and RE
cos_Qe = dot(Ne,-normalize(RE));      // cos between Ne and ER

if (cos_Qe > 0 && cos_Qr > 0)
{
    access += cos_Qe * cos_Qr * (1.0 - 1.0 / sqrt(1.0 + Ae / (RE2 * 3.1415926535)));
}
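
On values escaping [0, 1]: the NVIDIA shader quoted earlier clamps the final accessibility with saturate(), so some overshoot in the accumulated total is expected before clamping. A guess at what's intended, sketched in C++ with my own names:

#include <algorithm>

static float saturate(float x) { return std::min(std::max(x, 0.0f), 1.0f); }

// After accumulating "access" over all emitters, clamp the final
// accessibility the way the NVIDIA shader does; totals above 1 (e.g.
// from double-counted overlapping emitters) then simply clamp to black.
float accessibility(float access)
{
    return saturate(1.0f - access);
}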


I doubt it; this topic is interesting and flares up from time to time. Have a look at how much time there is in between previous posts -- sometimes there are a few weeks/months between replies.

Generally, people working at this level are just busy, so be patient; someone will come around, have a look, and give some feedback. Could you provide a bit more info on the above renders, with regard to time and memory?


Looks like this topic died ((

Well; it's been a looong time since I've looked at (and used) this stuff -- almost 6 years now, going by my first post -- so all I have now is a foggy memory of the overall algorithm. The gory implementation details have long ago evaporated from my mind I'm afraid.

Not long after writing this, Mantra speeds improved enough to no longer justify (for us, that is) the code maintenance (upgrading with every new HDK and compiling for all the architectures in our shop) and usage overhead when setting up a shot. Which is not to say that the technique is not still valid or useful. It is.

So, with that in mind, Ivan, it *seems* like your problem may have to do with normalization (or lack thereof). The "more samples generate more amplitude" symptom is typically an indication that all samples are being given a constant weight. In the case of this algorithm, each element's contribution should be proportional to the *area* of the element -- and this, on a hunch, is what I'm thinking you may have forgotten to do.

That is: if I have 10 evenly distributed elements over an area of size 10 (meaning each element has area=1) with each contributing a value of 1 per unit area and I do a box filter of all of them (an average) ignoring each element's area, I end up with (num_elem*amp_per_elem)/total_area = (10*1)/10=1, as expected, except there's a problem... which will show up as soon as you change the number of elements.

If I now double the number of elements in the same total area and keep everything else the same, I end up with (20*1)/10 = 2, which is wrong (you still expect it to be 1, not 2), because it fails to take into account that now each element's area is half of what it was before (since they still all live in a total area of 10, 20 evenly distributed elements would each have an area of 0.5). So that should instead be (num_elem*amp_per_elem*area_per_elem)/total_area = (20*1*0.5)/10=1, which is what you expect. And note that "amp_per_elem" and "area_per_elem" in this case are both constant just for illustration -- it wouldn't be the case in reality (where you would instead have a loop accumulating each area-weighted sample). But my point is that now no matter how many samples you throw at it, your reconstruction (box filter in this example) will always produce the expected value (in this toy case: 1), because it now accounts for the element areas.
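
Mario's toy example, written out as a runnable C++ sketch (the numbers are from the paragraph above; the code shape is mine):

#include <cstdio>

// Reconstruct the average of a constant signal (amplitude 1 per unit
// area) from numElems evenly distributed elements covering totalArea.
// Weighting each sample by its element's area makes the result
// independent of the element count.
float boxFilter(int numElems, float totalArea, bool weightByArea)
{
    const float ampPerElem  = 1.0f;
    const float areaPerElem = totalArea / numElems;
    float sum = 0.0f;
    for (int i = 0; i < numElems; ++i)
        sum += ampPerElem * (weightByArea ? areaPerElem : 1.0f);
    return sum / totalArea;
}

int main()
{
    std::printf("10 elems, unweighted: %g\n", boxFilter(10, 10.0f, false)); // 1
    std::printf("20 elems, unweighted: %g\n", boxFilter(20, 10.0f, false)); // 2 -- wrong
    std::printf("20 elems, weighted:   %g\n", boxFilter(20, 10.0f, true));  // 1 again
    return 0;
}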

But... that's just a hunch. It could just as easily be something completely different.

HTH.


As I understand it, you wrote your shader using the Houdini C++ API? And does your shader work at the vertex level or per shading point (micropolygons)?

Not a shader, but a SOP. And yes, in C++ using the HDK.

It works on points, baking the AO into a point attribute.

Out of curiosity: are you trying to implement this as an exercise, or for some situation where letting Mantra trace the AO is not an option for some reason?



Well, it began as an exercise, but I think it can be a good approach in some production cases. Mantra's AO is slow, like AO in any other application, and there's a lot of noise. To get good results I have to raise the pixel samples and the min ray samples on the Mantra ROP. So I was looking for other, much faster solutions.

So as I understand it, the only way to improve the quality is to add more points, for example by subdividing the geometry.

Is there any way to perform this at the micropolygon level? I mean, to create some shader which would do the same thing, but with micropolygons.



You could always bake onto a point cloud (from a scatter SOP) and then do a filtered lookup (of the pre-baked point cloud) at render time. You're not limited to working strictly with the points in the geometry you're piping to the IFD. So, for example, you'd only need to subdivide for the purpose of scattering the points, but you can then render the un-subdivided surface thereafter.
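
A minimal C++ sketch of that kind of filtered lookup -- a naive inverse-distance-weighted gather over the baked points. The names are mine, and a real implementation would use a spatial acceleration structure rather than a linear scan:

#include <cmath>
#include <vector>

struct BakedPoint { float px, py, pz; float occlusion; };

// Filtered lookup of pre-baked occlusion: gather baked points within
// "radius" of the shade position and blend them by inverse distance.
float lookupOcclusion(const std::vector<BakedPoint> &cloud,
                      float x, float y, float z, float radius)
{
    float sum = 0.0f, wsum = 0.0f;
    for (const BakedPoint &p : cloud) {
        const float dx = p.px - x, dy = p.py - y, dz = p.pz - z;
        const float d = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (d > radius) continue;
        const float w = 1.0f / (d + 1e-6f); // inverse-distance weight
        sum  += w * p.occlusion;
        wsum += w;
    }
    return wsum > 0.0f ? sum / wsum : 1.0f; // no points: fully accessible
}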

But no, this is not something that can be implemented at the shader level, because at that point you don't have access to the entire scene geometry (though some limited functionality with file-bound geometry exists) -- and even if you did, you wouldn't want to be calculating this thing for every shade point (it would be many orders of magnitude slower than tracing). The whole idea here is that you compute it once, at a much coarser scale than what normally results from a renderer's dicing step (or the even larger sampling density while raytracing), and then interpolate -- the assumption being that AO is sufficiently low-frequency an effect to survive that treatment (though that's not always a safe assumption, as the later updates to the original paper confirm).

Also keep in mind that this method, even when working properly, has its own set of limitations: displacements and motion-blurred or transparent shadows come to mind. And... well, AO is just *one* aspect of reflectance -- a noisy AO may be noisy on its own, but not after you add 15 other aspects of light transport. So, yes, tracing may be slower, but you're likely not just tracing AO for a real material in a real scene... just be careful how you judge those speeds.

