Farmfield

Main GPU v. Dedicated GPU for OpenCL?

Currently I get about 3x the speed running something like Pyro or the grain microsolver on OpenCL, but since my CPUs still spike and my GPU (seemingly) doesn't, I'm getting kinda confused... So before ripping a card out of another workstation and benchmarking OpenCL on a dedicated GPU, I thought I might as well ask whether anyone here knows how Houdini handles OpenCL on the display GPU compared to a dedicated one.

 

My specs are an i7-4790K @ 4.5 GHz, 32 GB RAM, 1x GTX 970 4 GB...

Edited by Farmfield


The GPU will handle only some of the computing workload, as not everything the Pyro solver does can be done with OpenCL. Adding another GPU to handle the OpenCL workload won't reduce the CPU usage any further than your existing GPU already does.

Thanks for the quick reply, Luke. So. What I've read is that Houdini cannot take full advantage of the GPU for OpenCL when running the display on the same card, so the question is if you get (noticeably) better OpenCL performance with a dedicated card or not.

Using a dedicated GPU for OpenCL will improve performance, but probably not enough to be cost-effective compared with running OpenCL on the GPU that also drives the display. I have a Quadro K5000 I could pillage to test with if things get slow at work. If you beat me to it, please share the results.

I put a little description about the OpenCL additions in H14 in this post, but briefly:

 

Only the constraint solving part of the grain solver is done on the GPU; fortunately it's generally the most expensive part.  The other expensive part is finding all the neighbor points for each particle, which is done on the CPU for the moment. Collision detection takes some processing as well and is also purely CPU.  So you'll generally see a big speedup from the GPU since constraint solving is really expensive.  Also, increasing the Constraint Iterations (often necessary for production-quality results) isn't as expensive as you might think since all the data has already been transferred to the GPU by the time of the first iteration.
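To put the Constraint Iterations point in numbers, here is a toy cost model in Python. The timings are made-up placeholders, not measurements from Houdini. The only thing taken from the explanation above is the shape of the cost: the data is uploaded to the GPU once, and each extra iteration only adds GPU compute time.

```python
def solve_time_ms(iterations, transfer_ms=8.0, per_iteration_ms=0.2):
    """Estimated time for one grain constraint solve.

    transfer_ms and per_iteration_ms are hypothetical numbers chosen
    only for illustration; measure your own scene to get real values.
    """
    # The data is uploaded once, then every iteration runs on the GPU.
    return transfer_ms + iterations * per_iteration_ms


for iterations in (10, 50, 100):
    print(iterations, "iterations ->", solve_time_ms(iterations), "ms")
```

In this toy model, going from 10 to 100 iterations takes less than three times as long rather than ten times, because the fixed transfer cost dominates at low iteration counts.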

 

For best results with OpenCL and the Pyro solver you generally need to disable caching of the Pyro object (turn off Enable Caching on the Pyro object's Creation tab).  Most of the Pyro pipeline is OpenCL-optimized these days; the big remaining holdout is turbulence, which will force a transfer of the velocity field back to the CPU for VEX-based turbulence calculations.  But on a good card you should still see a big speedup with caching off, at least while your simulation fits in GPU memory.  The memory limitation, by the way, is the big argument for an additional GPU, not speed.  The display can easily take a gigabyte or more of GPU memory, leaving less for simulation.
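A back-of-envelope sketch of the memory argument, in Python. It assumes dense 32-bit float voxel fields and a guessed count of eight scalar fields (a vector field such as velocity counts as three); the real number of fields the Pyro solver keeps on the GPU is not stated in this thread. The 1 GB display reservation is the rough figure mentioned above.

```python
def fields_bytes(res_x, res_y, res_z, scalar_fields=8, bytes_per_voxel=4):
    """Rough VRAM needed for dense voxel fields.

    scalar_fields=8 is a guess for illustration (a vector field counts
    as three scalars); bytes_per_voxel=4 assumes 32-bit floats.
    """
    return res_x * res_y * res_z * scalar_fields * bytes_per_voxel


def fits_on_card(sim_bytes, card_gb, display_gb=1.0):
    """True if the simulation fits in what the display leaves free."""
    free_bytes = (card_gb - display_gb) * 1024**3
    return sim_bytes <= free_bytes


sim = fields_bytes(500, 500, 500)
print(round(sim / 1024**3, 2), "GB needed")
print("shared 4 GB card:", fits_on_card(sim, 4.0))
print("dedicated 4 GB card:", fits_on_card(sim, 4.0, display_gb=0.0))
```

Under these assumptions a 500^3 container is too big for a 4 GB card that also drives the display, but fits on the same card when nothing else is using its memory.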

 

P.S.  On a different note, I happened to find this while trolling Vimeo, I think it's yours?:

 

The main problem you're running into is you don't have a mass attribute on your grain particles, so it's treating them as if they have a mass of 1.  If you enable Compute Mass on the Grain Source and set the density to 100 to match the rigid body, you'll see each particle has mass around 0.05.  So your grain particles are about 20 times too heavy and are causing instabilities.  (I'm making a bug database entry to get this into the documentation - another user ran into it recently)
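The numbers above can be checked with a quick calculation. This is only a sketch: it assumes Compute Mass multiplies the density by the volume of a sphere, and it assumes a particle radius of 0.05, which is not stated in the thread but reproduces the quoted mass of around 0.05.

```python
import math


def grain_mass(density, radius):
    """Mass of a solid sphere: density times (4/3) * pi * r^3."""
    return density * (4.0 / 3.0) * math.pi * radius**3


# density = 100 comes from the post; radius = 0.05 is an assumed
# particle radius, picked because it reproduces the quoted ~0.05 mass.
mass = grain_mass(density=100.0, radius=0.05)
print("computed mass:", round(mass, 4))
print("a default mass of 1 is", round(1.0 / mass, 1), "times heavier")
```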

 

Also:

- The RBD Solver is more stable than Bullet for inter-solver coupled interactions like this.

- You had Rotational Stiffness really low (0.3); if anything you want that really high (e.g. 4), which will damp spurious rotations from particle collisions.

- Consider substepping at the DOPNet level as well, since that makes the RBD / grain coupled interactions solve at higher frequency.  So for example decrease the grain POP Solver substeps to 2, but increase the DOPNet substeps to 5 (note this will take more DOPNet cache memory).

- You might need to increase grain Constraint Iterations even higher than the 100 I set here to avoid stretching on the first ball / net interaction.

- If you really just want sphere / grain interactions, you might even use really big grains instead and make it an entirely grain simulation.  See the Variable Radius grains helpcard example.
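The substepping suggestion in the list above can be sketched as simple arithmetic. This assumes the POP Solver substeps run inside each DOPNet substep, so the two counts multiply; that nesting is an assumption here, not something confirmed in this thread.

```python
def substep_counts(dop_substeps, pop_substeps):
    """Return (coupled RBD/grain solves, grain solves) per frame.

    Assumes POP Solver substeps are nested inside each DOPNet substep.
    """
    coupled_per_frame = dop_substeps
    grain_per_frame = dop_substeps * pop_substeps
    return coupled_per_frame, grain_per_frame


# Suggested setup: DOPNet substeps 5, grain POP Solver substeps 2.
print("DOP 5 / POP 2:", substep_counts(5, 2))
# Hypothetical alternative with all substeps on the POP Solver.
print("DOP 1 / POP 10:", substep_counts(1, 10))
```

Both setups give the grains the same number of solves per frame under this assumption, but only the first one lets the RBD and grain solvers exchange forces more than once per frame.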

 

I attached a more stable version of your test.

stable.net.hiplc

First off, tnx for the info on the GPU computational stuff. The VRAM question is interesting, especially since the GTX970 has this weird 3.5+0.5GB setup; I guess I'll notice when that becomes an issue. And btw, what is the normal Houdini out-of-VRAM behaviour? Error popup or BSOD, hehe? Had a couple of those while rendering with H14, though never sim'ing.

 

And the sim. Well, first off, it's just a 5-minute shelf tool setup, and I haven't really had time to go through the grain solver for more than a couple of "for fun" tests, though the missing mass seems, as you say, like something that should have been ticked on the Grain Source node added by the shelf tool script. But note that I only posted it for entertainment; it's just about the creepiest, most organic behaviour I ever got out of a sim. :D

 

As for rotational stiffness, checking my orig file, I had only increased the weight for the internal collisions and lessened the stiffness; you set them back to default and increased scale kinetic from .1 to .5...

 

Finally the substeps, yeah, I didn't want to increase the DOP ones over 2, that's why I tried with the drag to calm it down, but that was before the thing came alive and started to crawl around, I think that came with my increased weight.

 

Well, I'm going to get way more into this down the line. I only fell into this test after watching Alvaro's FEM breakdown today - real RnD I'd prefer to do on paid time. Learning Houdini during my transition has so far been way more focused on the foundational stuff: SOPs, attributes, VEX, the math (!) etc, really getting to know how it ticks... :)

 

And tnx for this great info and great advice.

I just can't get enough of these fake subtitle videos. Especially when they're about Nvidia.

 

http://youtu.be/spZJrsssPA0

LOL - that's absolutely hilarious. xD (and refreshing from the 1.4 billion Hitler-variants out there)

 

And to be honest, if I were a gamer I'd be pissed, but as the most advanced gaming I do is Freecell, this really has no impact for me as a comper and FX guy. Well, unless it has, but I won't stress out over that until it bites.

 

BTW - I wonder what the heck the video is really about, they are seriously dying... :D
