NAN values and Arnold - a quick fix if this plagues your renders


A problem I've now had twice in production is NAN-valued points in hi-res meshes generated from VDBs. With meshes of roughly 10 million points per frame across 100 frames, I'll get one or two meshes with 200ish bad primitives (prims with garbage points; so far NANs, but I bet infinite values wouldn't play nice either). These meshes look fine and render fine in Houdini, and the .bgeo files are the same size as those of the other frames. If you finish in Houdini, you'll never know there is a problem. The bad prims are tiny and simply aren't drawn, as they have no position.

 

However, caching out .ass files results in silently truncated frames. The in-house tool reports no errors when writing these .ass files. When the corrupt frame is rendered in Maya, either no mesh shows up or it throws an error and I get an email about issues with my mesh, which, again, looks fine in the viewport. The files are about half the size they should be, so truncation is clearly happening.
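
Since the truncated files come out at about half the expected size, a size check across the cached sequence is one cheap way to catch them before they reach the farm. The following is a minimal Python sketch of that idea, not the in-house tool: the function name `find_truncated`, the 0.75 threshold, and the example sizes are all assumptions made for illustration.

```python
import statistics

def find_truncated(sizes, ratio=0.75):
    """Return the frames whose file size is below `ratio` times the median.

    `sizes` maps frame number -> file size in bytes. The 0.75 threshold is
    an assumption; the post only says bad files were about half size.
    """
    if not sizes:
        return []
    median = statistics.median(sizes.values())
    return sorted(frame for frame, size in sizes.items() if size < ratio * median)

# Hypothetical sizes; in practice they could be gathered with os.path.getsize().
example_sizes = {1: 1000, 2: 1010, 3: 495, 4: 990, 5: 1005}
suspect_frames = find_truncated(example_sizes)
```

Comparing against the median rather than the mean keeps one or two tiny files from dragging the reference size down. This only works when frames are expected to be similar in size, as they were here.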

 

If this sounds like something you're experiencing, open the geometry spreadsheet and sort your position values by any channel. If you have not-a-number values, it'll look like this:

 

post-926-0-88361200-1439841436_thumb.png
 

If you're familiar with VEX, you may know there is a NAN-testing function called isnan(). It works on a float, so we can test each position channel, and if any channel fails the test we have found our bad geo. A word of advice though: don't just delete the bad point, delete the bad primitive! For whatever reason, Arnold doesn't always consider the geo well formed after points have been culled in Houdini. You'll often get a vertex mismatch error during rendering, and another email. So the safe solution is to walk over primitives in a wrangle and test the points of the current primitive. If any fail the NAN test, delete the entire prim:

// PrimitiveWrangle code
int cull_prim = 0;
int pt;
vector pos;

for (int i=0; i<primvertexcount(0, @primnum); i++) {

    // convert the prim vertex -> linear vertex -> point number
    pt = vertexpoint(0, vertexindex(0, @primnum, i));

    pos = point(0, "P", pt);
    if (isnan(pos.x) || isnan(pos.y) || isnan(pos.z)) {
        cull_prim = 1;
        break;
    }
}

if (cull_prim)
    removeprim(0, @primnum, 1);
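
The wrangle above only tests for NANs, while the first paragraph suspects that infinite values would cause trouble too. As a rough illustration of the same per-primitive logic extended to both cases, here is a small standalone Python sketch that runs outside Houdini on plain lists. The function name `cull_bad_prims` and the sample data are hypothetical; `math.isfinite()` rejects both NAN and INF.

```python
import math

def cull_bad_prims(points, prims):
    """Keep only the primitives whose points all have finite coordinates.

    `points` is a list of (x, y, z) tuples and `prims` is a list of tuples
    of point indices. A single NAN or INF coordinate on any point drops
    the whole primitive, mirroring the wrangle's "delete the prim" rule.
    """
    def is_bad(pt):
        return not all(math.isfinite(c) for c in points[pt])

    return [prim for prim in prims if not any(is_bad(pt) for pt in prim)]

nan = float("nan")
inf = float("inf")
points = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (nan, 0.0, 0.0),   # NAN point
    (0.0, inf, 0.0),   # infinite point
]
prims = [(0, 1, 2), (1, 2, 3), (0, 2, 4)]
clean = cull_bad_prims(points, prims)
```

In the wrangle itself, VEX's isfinite() could presumably replace the three isnan() calls to the same effect, though that variant is not something the thread confirms.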

If anyone has a better solution, I'd love to hear it. I'm a bit disappointed by how brittle the Arnold/.ass combination is compared to Houdini's handling of floating point values. Vertex mismatches and NAN errors have wasted too much of my time. On the other hand, Arnold renders look great!

 

Cheers

Shawn

Edited by yourdaftpunk

Anim, I do intend to. I see it as three bugs to report:

 

1) Arnold's handling of NANs.

2) Arnold's handling of certain topologies without NANs which seem valid in Houdini (the point delete issue I mentioned).

3) Houdini's/VDB's issues with certain particle-to-VDB operations and/or SDF smoothing operations.

 

For the third bug, if I can find the time I need to:

 

1) Remove custom otls and slim down the network to the problem area.

2) Transfer the 1.2GB particle cache frame going into the mesh nodes so they can diagnose the issue.

3) Write up some additional observations which I think will help.

 

It's my last week on the job, so I would put all this well below finishing :) I'm also curious what would happen if I wrote that mesh out as an Alembic or OBJ file. Would it be loadable in other apps? Would Houdini gracefully bring it back in?

 

 

I hope this post helps some future TD banging her head against a monitor. NANs are part of the floating point specification, along with INF values, and software needs to take this into consideration. Much like non-manifold geometry, this stuff will crop up from time to time, or it will come into Houdini through the external pipeline. I remember educating compositors about the issue years ago, when they first started moving to Nuke 6 / exr-half and couldn't understand why some renders had black pixels that couldn't be easily fixed and that grossly contaminated neighboring areas when blurred. (Protip: Houdini has a builtin COP node called illegalPixel, with a cute icon, for handling this.) The solution then was a simple expression, much like the VEX code above.
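
To make the compositing case concrete, here is a toy Python sketch of why a single NAN pixel contaminates its neighbors under a blur, and how replacing illegal values first avoids it. It uses a 1D row and a simple box blur for brevity. The function names and the choice of 0.0 as the replacement value are assumptions; a real tool might fill from neighboring pixels instead.

```python
import math

def box_blur(row, radius=1):
    """Average each value with its neighbors within `radius` (clamped at edges)."""
    out = []
    for i in range(len(row)):
        window = row[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out

def fix_illegal(row, replacement=0.0):
    """Replace NAN and INF values with `replacement` (0.0 is an assumption)."""
    return [v if math.isfinite(v) else replacement for v in row]

row = [1.0, 1.0, float("nan"), 1.0, 1.0]
blurred_raw = box_blur(row)                 # NAN spreads into every window that touches it
blurred_fixed = box_blur(fix_illegal(row))  # all values stay finite
```

Any arithmetic involving a NAN yields a NAN, so the one bad pixel becomes three after a radius-1 blur, and the damage grows with the blur radius.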


  • 1 month later...

NANs are never good, and Mantra is not completely bulletproof either. For example, SSS breaks with NANs in normals (it usually looks like tons of super bright speckles all over), and I bet you can find a lot of other cases.

So it's smarter to prevent them or get rid of them, even for Mantra.


  • 4 years later...

Very annoying indeed.

The Clean SOP has this built in if you want a quick fix.

For scenes with complex (even changing) geometry that are rendered with HtoA, it seems to be good practice to add the Clean SOP with "Remove NANs" and "Manifold Topology Only" enabled.

Internally it uses a similar VEXpression:

isnan(@P.x) || isnan(@P.y) || isnan(@P.z)

 


To make this thread easier to find in the future, this is the type of error message you mostly get from Arnold when this happens:

 

* CRASHED in AiIsFinite at 00:00:11, pixel (1952, 664)
* signal caught: error C0000005 -- access violation
*
* backtrace:
*  0 0x00007ff9f47b61ce [ai        ]
*  1 0x00007ff9f47b546f [ai        ]
*  2 0x00007ffa32f2f67a [KERNELBASE] UnhandledExceptionFilter
*  3 0x00007ffa35f44af2 [ntdll     ] memset
03:16:25  9997MB WARNING |   [kick] render aborted due to earlier errors
*  4 0x00007ffa35f2c6d6 [ntdll     ] _C_specific_handler
*  5 0x00007ffa35f411ff [ntdll     ] _chkstk
*  6 0x00007ffa35f0a289 [ntdll     ] RtlRaiseException
*  7 0x00007ffa35f3fe6e [ntdll     ] KiUserExceptionDispatcher
>> 8 0x00007ff9f4da35cb [ai        ] AiIsFinite
*  9 0x00007ff9f4da2e48 [ai        ] AiIsFinite
* 10 0x00007ff9f4da18a2 [ai        ] AiIsFinite
* 11 0x00007ff9f4481bc4 [ai        ] AiUniverseGetSceneBounds
* 12 0x00007ff9f48071bc [ai        ] AiTextureParamsSetDefaults
* 13 0x00007ff9f4770277 [ai        ] AiUniverseGetAOVIterator
* 14 0x00007ff9f476f28d [ai        ] AiUniverseGetAOVIterator
* 15 0x00007ff9f4f83f5d [ai        ] AiIsFinite
* 16 0x00007ff9f4809af0 [ai        ] AiLightsTrace
* 17 0x00007ff9f48113a9 [ai        ] AiTrace
* 18 0x00007ff9f4f982c1 [ai        ] AiIsFinite
* 19 0x00007ff9f4766923 [ai        ] AiUniverseGetAOVIterator
* 20 0x00007ff9f4f6662a [ai        ] AiIsFinite
* 21 0x00007ff9f476f358 [ai        ] AiUniverseGetAOVIterator
* 22 0x00007ff9f4f83f5d [ai        ] AiIsFinite
* 23 0x00007ff9f4809af0 [ai        ] AiLightsTrace
* 24 0x00007ff9f48113a9 [ai        ] AiTrace
* 25 0x00007ff9f4f982c1 [ai        ] AiIsFinite


In some cases though, for some random frames, neither the Clean SOP nor @yourdaftpunk's VEX wrangle helped.

Here I additionally had to subdivide the geo at render time with the Arnold subdivision OBJ-level parms (even though I would not actually have wanted to subdivide it).

Then it rendered fine. Well yeah, Arnold seems to be very picky about this stuff.

