Fast Gi Anyone?



Cool, thanks for that. I threw it all together pretty quickly and was a bit disappointed by how flat it looked. I have a dimmer parameter on my shader set to 0.8; it would compare better with your test if it was set to 1.

eA = (d2 < 4*eA?0:eA);

Yeah I was kinda wondering about that..... ta for clearing it up.

Did you spot that they used rsqrt and not just sqrt? I figured by looking at their calculation of a normalised vector that it meant 1/sqrt.

I wonder if this made the difference?

you're using occ *= occ1;, which I believe should be occ *= 1.-occ1

Thanks for spotting that little blooper too!

I'll see if I can do anything with adding a light transfer term.

And at minimum use the occlusion in conjunction with perhaps an environment map for the far term, or with a normal bounce light as they do in the siggraph paper.

I didn't get round to testing it against a fully ray-traced approach as you have in those images; how did the render times compare?


Did you spot that they used rsqrt and not just sqrt? I figured by looking at their calculation of a normalised vector that it meant 1/sqrt.

I wonder if this made the difference?


No, that wasn't my problem. I suspect it had more to do with some of the inputs: area, normals, etc.

It also doesn't help that the paper itself has *two* different versions of the formula, and that there's even a third variation in a preprint version of the paper (which is much more complex than the other two). None of them gave convincing results when I tested them (which may have been due to some flaw in my implementation at the time -- that's unclear now). The end result was that, in frustration, I threw the whole lot out the window and started constructing my own, based on what I understood to be the concept behind it -- and luckily it produced the expected results (yay!). Nevertheless, I've decided to go back and study the printed versions again to see if I can fathom what they're doing (which I can do more relaxed now since I already have one that works :P) -- so far, the symbolic version is easy to understand geometrically, but the one in the code snippet... mmmmnot-so-much. And the one in the preprint is a weekend project! :D

Here's a little anthology of the published equations:

In all cases I'll use d for the distance between elements

From the GPU Gems book (chapter 14) we get:

1. In symbolic form:

1 - [ d * cos(theta_e) * max(1, 4*cos(theta_r)) / sqrt(Ae/PI + d^2) ]

2. In the code snippet:

clamp(cos(theta_e),0,1) * clamp(4*cos(theta_r),0,1) * [1 - 1/sqrt(1 + Ae/(d^2*PI))]

And from the preprint we get:

3. In symbolic form:

1 - [d * cos(theta_e) * max(1, 4*cos(theta_r)) / sqrt(Ae/PI + d^2)] * [1 - d/sqrt(Ae/2 + d^2)] * cos(theta_e) * min(1, 4*cos(theta_r))

None of these are simple rearrangements or simplifications of each other.

A little perplexing... :blink:
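
Just so we're all reading (2) the same way, here's a little VEX sketch of it -- purely illustrative; the names and the sign conventions (v pointing from receiver to emitter, back-facing emitters doing the occluding) are my assumptions rather than something the paper spells out:

#include <math.h>

// Sketch only: approximate occlusion contributed by a single emitter disk,
// following form (2) above.
float disk_occlusion(vector rP, rN;   // receiver position and unit normal
                     vector eP, eN;   // emitter position and unit normal
                     float  eA)       // emitter area
{
    vector v  = eP - rP;                          // receiver -> emitter
    float  d2 = dot(v, v);                        // squared distance d^2
    v = normalize(v);
    float cos_e = clamp(dot(eN, v), 0.0, 1.0);    // emitter's back faces the receiver
    float cos_r = clamp(4.0 * dot(rN, v), 0.0, 1.0);
    // [1 - 1/sqrt(1 + Ae/(d^2*PI))] is the approximate solid-angle term
    return (1.0 - 1.0 / sqrt(1.0 + eA / (d2 * M_PI))) * cos_e * cos_r;
}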


OH... MY... GOD!!! :o

You know when you think you wrote something a certain way and you thereafter always see it that way regardless of how many thousands of times you read it?

Well, that happened to me. I was writing one of the terms wrong... how embarrassing...

I take it back: The formula in the code snippet works as advertised (I haven't tried the others). Sorry about that.

OK. Now I've got to figure out exactly what it's doing. If I manage to figure it out, I'll post it here.


Yup, I must confess I gave up trying to figure out how they had arrived at the formula; it was close to the one in the link you posted. I didn't try all that hard to rearrange it. I just plugged their code snippet in, and the results were encouraging enough to convince me it was roughly working, so at that point I stopped.

The thing I'm most unhappy about is the area around the feet: despite their two-pass approach, the shadow on the ground plane is still rather dark (which may be because of the way I'm ignoring most of the points, but I think it's more than that). Also, where the feet meet the ground they are rather too light. From the papers I'm thinking perhaps this is due to the nature of the solution, but are you getting the same effects?

You know when you think you wrote something a certain way and you thereafter always see it that way regardless of how many thousands of times you read it?

Only every single time I do anything :P


From the papers I'm thinking perhaps this is due to the nature of the solution, but are you getting the same effects?

No, I don't get that problem, but I think it's probably related to the fact that you're always filtering based on proximity, and in a tight area like the foot-floor contact region, it might be picking up the wrong points.

In my case there are two important differences: the calculation is run for every geometry point (same as the paper) and subdivision rendering smooths out the interpolation. I also add an ID attribute (essentially a ConnectivitySOP's ID) that I use in order to keep the hierarchy from including points from unconnected meshes, which could lead to a certain amount of "smudging" if you only base the groupings on proximity.

Have you tried using the geo vertices themselves as the point cloud?

Here's a quick OTL that generates point clouds from either the incoming vertices or a ScatterSOP distribution. It also tries to compute accurate areas for the disks.

PointCloud.zip


No, I don't get that problem, but I think it's probably related to the fact that you're always filtering based on proximity, and in a tight area like the foot-floor contact region, it might be picking up the wrong points.

PointCloud.zip


Thing is, I'm not doing any filtering: for the second pass the calculation is done from every shader point to every emitter disk -- those within range at least (no filtering, just a straightforward summation). But I think I should probably be calculating the first pass as a disk-to-disk calculation?

Or, in fact, this way of doing things is introducing errors and slowing things down, and it would be better to just do the whole thing directly on the geometry by pre-dividing it all to an appropriate level.

I found out, though, that if I make the floor into a box rather than just a grid then I get much better results. I think it is because disks at floor level, seen from the foot (which is perpendicular to them), appear too thin and so either get ignored or don't accurately cover the amount of the hemisphere that they should, due to all the approximations going on. Relative to the floor, the bunny doesn't present the same problem, since relatively few disks fall into this category.

I did try building a cloud directly from the geometry, but I figured that since the ground plane only has a small number of polys in it, I'd need to chop it up rather a lot to get enough points into it. Once I'd done that the whole cloud would be too dense, so I went with a sparser distribution of points using the ScatterSOP to speed things up.

Anyway, with the corrections to the code plus the scene itself, here are my latest renders. Still need to test if this is actually any quicker than current methods :P It's clearly not as sharp, but if it's quicker then it could still have its uses. I'll try doing some tests directly on the geometry next though, now I know the code is roughly right.

post-509-1119395138_thumb.jpg

Any chance you could render the same scene with your method so I can see if it should look any better?


Hey Simon,

Here's what I get:

post-148-1119419683_thumb.jpg

LEFT: This is what I get if I just feed it the scene untouched.

MIDDLE: Reference.

RIGHT: After modifying the scene (see below).

This technique defines an element as a disk that has both a front and a back, and it adopts the convention that radiance is emitted/reflected from the front side and transmitted/occluded from the back side (where "front" is the side facing in the normal's direction). So in the case of ambient occlusion, only the back side of an emitter is taken into account.

This means that when the floor is just a plane with its normals pointing up, it does not occlude the bunny (except maybe some crevices here and there). It does however receive shadows (from any part of the bunny's surface which is facing toward the upper hemisphere). This is why the image on the left has contact shadows but a washed-out bunny. And it is also the reason why you get better results when the floor is a box (since then there are a bunch of elements "mooning" the bunny :P).

One way to deal with this problem is to ensure all surfaces are closed (the floor becomes a box), but that's not very practical. What I do instead is tag a surface (via an attribute) as being "double-sided". Any surface with this tag will then emit from both sides, and still receive from one, though which side becomes the receiving side depends on the context: when evaluating inside a shader, the receiving side is the front-facing version of the normal, and in SOPs there's no choice but to keep it as the element's original normal.

The difference between the left and right images above is that the right one has the floor tagged as double-sided.
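
In terms of the math it really only touches the emitter-side cosine -- something along these lines (just a sketch of the idea, not the actual code; "doublesided" is whatever flag attribute you use for the tag):

// Sketch of the idea only: the emitter-side cosine with an optional
// "doublesided" flag; v is the normalized receiver->emitter direction.
float emitter_cosine(vector eN, v; int doublesided)
{
    float cos_e = dot(eN, v);
    if (doublesided)
        cos_e = abs(cos_e);      // tagged surfaces emit/occlude from both faces
    return clamp(cos_e, 0.0, 1.0);
}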

> Thing is I'm not doing any filtering, for the second pass the calculation

> is done for every shader point to every emitter disk, those within range

> at least (no filtering, just a straight forward summation).

Right. I just noticed you're using pciterate() instead of pcunshaded() for the second pass. I had assumed the second pass was being done in the same manner as the first, but for the PC points near the shade point, and then transferred through pcfilter(). IOW, I thought the first pass was being done in SOPs just to get around the no-nested-pcopen()-calls problem.

Any reason why you don't want to do the second pass a-la SSS and calc/store then filter a small neighborhood? It would definitely speed things up quite a bit. (or have you tried that and found it too soft?)
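
In case the pattern I mean isn't clear, it would look roughly like this (sketch only -- "cloud.pc", the "occ" channel, and the body of the per-point calculation are all placeholders):

// Sketch of the calc/store-then-filter pattern (a la the point-cloud SSS trick).
surface pc_occlusion(string cloud = "cloud.pc"; float maxdist = 1; int maxpts = 50)
{
    int handle = pcopen(cloud, "P", P, maxdist, maxpts);

    // First, solve each cloud point in the neighborhood only once...
    while (pcunshaded(handle, "occ"))
    {
        vector cloudP, cloudN;
        pcimport(handle, "P", cloudP);
        pcimport(handle, "N", cloudN);
        float o = 0;   // ...accumulate the disk-to-disk occlusion for this point here
        pcexport(handle, "occ", o);
    }

    // ...then a cheap filtered lookup for the current shade point.
    float occ = pcfilter(handle, "occ");
    pcclose(handle);

    Cf = 1 - occ;
}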

Cheers!


Any reason why you don't want to do the second pass a-la SSS and calc/store then filter a small neighborhood? It would definitely speed things up quite a bit. (or have you tried that and found it too soft?)

Cheers!


Yup, too soft; that is essentially what you get in the far-right image of the last strip I posted. The second pass is done in SOPs too and the result filtered straight into the render -- it's really quick but very smudgy. I tried making the point cloud from the geometry, but it's just way too slow for VEX to handle: too many points. I'll try doing a render without ignoring any points and see if that is any better. I know it will be slooooow, but I'm concerned that my results don't look anywhere near as nice as yours or the reference. So either it's the number of samples, the fact that I ignore so many, or the nvidia formula is still slightly screwy. Is the render you posted using the nvidia form or your own?

I'll add the tagging system you mention -- that sounds good. At the moment I assume all surfaces in the hemisphere can occlude (which is what I thought they were saying in the siggraph paper; I couldn't see anything about it in the nvidia one), so maybe just making that change will help some.

I notice even in your render there is some light spill around the foot in contact with the floor.


So either it's the number of samples, the fact that I ignore so many, or the nvidia formula is still slightly screwy. Is the render you posted using the nvidia form or your own?

That was the nvidia formula -- I was playing around with it last night. Here they are side-by-side:

post-148-1119457746_thumb.jpg

LEFT: nvidia, MIDDLE: mine, RIGHT: reference

The nvidia one is a little bit "smudgier"... when I get a chance to dissect it, I may be able to come up with an explanation for that extra softness. Then again, mine is a little bit too "contrasty"... maybe another pass... Oh, by the way, all these tests are with just two passes.

I'll add the tagging system you mention -- that sounds good. At the moment I assume all surfaces in the hemisphere can occlude (which is what I thought they were saying in the siggraph paper; I couldn't see anything about it in the nvidia one)

All elements in the hemisphere are potential occluders, but only the ones with their backs to the receiver actually contribute -- look at the clamped cosine terms.

I notice even in your render there is some light spill around the foot in contact with the floor.

Yes. That region had some artifacts because the bunny was intersecting the floor. I lifted/rotated it a teeny-tiny bit for the images above.


Hey Mario,

That was the nvidia formula -- I was playing around with it last night. Here they are side-by-side:

post-148-1119457746_thumb.jpg

LEFT: nvidia, MIDDLE: mine, RIGHT: reference

May I know what the render times are for each of the renders?

Yes. That region had some artifacts because the bunny was intersecting the floor. I lifted/rotated it a teeny-tiny bit for the images above.

You can use DOPs to have the bunny sit squarely on your ground geometry. You know you want to. :D

Cheers!

steven


Hey Steven,

May I know what the render times are for each of the renders?

I didn't time the reference image, but it was probably around 1 hour, maybe less (Simon?). The other two are about the same (except I don't have the nvidia version optimized by pre-calculating some stuff, but it would end up being the same as my version in the end). The timings for my version are 1.2 seconds to build the tree and 4.23 seconds to actually calculate the occlusion. I'm still working on the tree bit -- it's designed to work on unstructured points so that's why it takes so long, but I have some ideas for speeding it up...

You can use DOPs to have the bunny sit squarely on your ground geometry. You know you want to. :D

Hehehe. Maybe I should pass it on to JC or Woolfwood, they are the "DOP boys" -- I still haven't had a chance to play with them much :(

Cheers!


Hehehe. Maybe I should pass it on to JC or Woolfwood, they are the "DOP boys" -- I still haven't had a chance to play with them much :(

Cheers!


Bah I was just JC's DOP sidekick...but the mighty JC was defeated by Battlefield 2....so that leaves me alone to battle the forces of DOPs. Wish me well.


OK. I managed to find some quiet time to study the nvidia formula and attempt to arrive at a plausible explanation for the steps they took to get there. It finally makes sense... phew! (I'm talking about the version in the code snippet; the other ones still puzzle me.) So here goes:

First, the easy stuff: the two cosine terms.

The term cos θe is the amount of the emitter's area that is visible from the receiving point. When the emitter is perpendicular to the projection axis v = Pe - Pr, the full area is showing, since θe = 0 and therefore cos θe = 1. When the emitter is rotated, say, 45 degrees away from the axis v, then less of it is showing from the POV of the receiver (it is "foreshortened" under that projection), and only cos 45° ≈ 0.707 of its area contributes.

post-148-1119563078.jpg

post-148-1119563086.jpg

post-148-1119563092.jpg

post-148-1119563099.jpg

post-148-1119563106_thumb.jpg

post-148-1119563113_thumb.jpg

post-148-1119565521_thumb.jpg


Hello everybody,

This looks like an extremely promising technique, especially due to the speed-up, but one thing has me rather concerned: if I'm not mistaken - and please do correct me if I'm wrong - point clouds don't work in conjunction with displacement maps, so you're limited to the resolution of the geometry mesh itself as far as the occlusion info. is concerned. Correct?

On the other hand, the pixel shader approach as described in the nVidia paper relies on texture maps for point, normal and area info. I'd be very interested in implementing this as a shading/compositing solution, but coming up with a way of rendering a shaded area map has me stumped. Any ideas?

Ta,

Roy

-----

Roy Stelzer

Stelzer Productions

Switzerland


Hi Roy,

one thing has me rather concerned: if I'm not mistaken - and please do correct me if I'm wrong - point clouds don't work in conjunction with displacement maps, so you're limited to the resolution of the geometry mesh itself as far as the occlusion info. is concerned. Correct?

Correct.

Well.... you can still displace point clouds in SOPs as you would in a shader (minus derivative information) either procedurally or using a map, but yes, your sampling frequency is given by the number of points per unit area in the cloud. So if your pointcloud points were just the ones in your geometry, you'd need a very dense tessellation to capture small-scale detail.
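
For example, a throwaway VEX SOP along these lines would push the cloud points out along their normals using a map (just a sketch; "dispmap" and "scale" are made-up parameter names, and the N and uv attributes are assumed to already exist on the points):

// Sketch: displace point-cloud points in SOPs the way a displacement map would.
sop displace_cloud(string dispmap = "";      // made-up parameters
                   float  scale   = 0.1;
                   vector uv = 0;            // bound from the "uv" point attribute
                   vector N  = {0, 1, 0})    // bound from the "N" point attribute
{
    if (dispmap != "")
    {
        float amount = luminance(colormap(dispmap, uv.x, uv.y));
        P += normalize(N) * amount * scale;
    }
}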

On the other hand, there's nothing stopping you from walking the tree for each shade point, though this would naturally be slower than doing it for just the vertices. In fact, this is the next thing I want to try, but I haven't written the shadeop versions yet. I imagine it should still be considerably faster than 200+ samples per shadepoint, though it's one of those things that really has to be tested to know for sure.

This is without a doubt the biggest limitation of the method. On the other hand, it has some pretty sexy things going for it, for example:

- Those few seconds (or minutes if you have millions of points) of calculation is what it takes to compute *all* the vertices (for a single frame if things are moving), not just the ones facing camera -- i.e: it is view-independent. This means I can tumble my scene, change the camera, adjust lights, etc. incurring no further cost for the ambient occlusion portion (for that frame).

- And then of course there is that evil word "motion blur"... this makes it so you don't feel like running out of the room screaming :P

On the other hand, the pixel shader approach as described in the nVidia paper relies on texture maps for point, normal and area info. I'd be very interested in implementing this as a shading/compositing solution, but coming up with a way of rendering a shaded area map has me stumped. Any ideas?

To be honest, I've never written shaders in OGL or HLSL, or Cg, or any of those things, so I can't say with any certainty whether that function they print is meant to be called from a pixel shader, vertex shader, fragment shader... actually, I don't even know what the heck a fragment shader is supposed to be?!

When I read it, I simply thought of these maps as data structures -- and assumed the need for them to be represented as "texture maps" which get filled by "shaders" was just a peculiarity of working with "hardware rendering".

With respect to rendering a "shaded area map"... I'm not sure you need such a thing, do you? But in case you do, here's a shader that approximates the area under a pixel (and exports it in case you want it in a deep raster):

// Write the (approximate) area under each shading sample into color/alpha,
// and export it as "areaP" in case you want it in a deep raster.
surface Area (export float areaP = 0;) {
   Cf = Of = Af = areaP = area(P);
}

Of course, you'd need to render as floating point.

Cheers!


post-509-1119714701_thumb.jpg

Ok, I've played around with this some more and have some pretty decent results now, all done with point clouds; mine is on the right, reference on the left. It took 85 secs to calculate and 12 seconds to render on a PIII 900 MHz proc.

I basically had to change the way I was calculating the areas for the point cloud (this could be improved some more, but it's now working at least), and I tweaked a couple of other things so that points didn't occlude themselves. I also added a near and a far cloud at different resolutions to improve performance.

Fast_GI.zip

One thing I did notice was that if I removed the 4x from the rtheta term and made it just rtheta, as in Mario's proof, then the solution comes out too washed out.... <_<

So that term must be doing something clever -- is it something to do with the way they remove double shadowing?

If I drop the near radius down to 0.25 then the calculation time is halved but the results look nearly identical.

post-509-1119715699_thumb.jpg

