magneto Posted October 31, 2015

Hi, I mean without using the HDK, and even if certain features won't be available in the solver. I tried the OpenCL support for PBD and while it was much faster, it wasn't as fast as I hoped. I am using 20k points, so not exactly high res. What's holding Houdini back in this area? Is there any advantage to buying GFX cards with a large amount of memory, like the Titan X, for Houdini if we can't fully utilize them? On a side note, I read that the next-gen Pascal cards will have up to 32GB of RAM. Any thoughts on this? It seems like being able to fully use cards like that would let us iterate much faster. Or are they impractical for Houdini? Thanks
Guest tar Posted October 31, 2015

Pascal seems like a very good beast, but the compute is mixed precision: store and compute at 16 bit while trying to keep close to 32-bit accuracy. What I'm reading is that the order of operations is important to make this work. Not sure if that's correct, or if code needs to be reimplemented to take advantage of it.

Pure GPU compute AFAIK needs every bit of the process to run on the GPU, so every part of Houdini that interacts with the GPU solver would need to be on the GPU too. That works for isolated bits, but in Houdini everything talks to everything else. The ultimate reason it can't all be on the GPU is that the dev team needs to sleep, and they don't have the time to reimplement all of the relevant parts of Houdini on the GPU.

The GPU development environment has also become more friendly in the last few years, IIRC, and I think the V-Ray team built a program on the CPU that emulates the GPU so they can debug their code there without the GPU crashing.
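To show what I mean by mixed precision and order of operations, here's a rough, untested OpenCL sketch of the usual pattern (my own illustration, nothing from Houdini): keep the data in 16-bit half to save memory and bandwidth, but promote to 32-bit float for the arithmetic, so the order you add things in doesn't eat all your accuracy.

```c
/* Untested sketch: store in half, compute in float.
 * vload_half/vstore_half are core OpenCL, so no fp16 extension is needed
 * as long as you never do arithmetic directly on half values. */
__kernel void blur3(__global const half *src,
                    __global half *dst,
                    const int n)
{
    int i = get_global_id(0);
    if (i <= 0 || i >= n - 1)
        return;

    /* promote to 32-bit float before summing */
    float sum = vload_half(i - 1, src)
              + vload_half(i,     src)
              + vload_half(i + 1, src);

    /* round back down to 16 bits only when storing */
    vstore_half(sum / 3.0f, i, dst);
}
```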
Farmfield Posted October 31, 2015

I think this is in part about OpenCL being a mess while CUDA is unpopular to use. And I think it was Jeff Lait who wrote here a year or two back that there really isn't much in the current solvers that can be GPU accelerated, and that even the current OpenCL acceleration only helps with a small part of the calculations in FLIP and grains...
anim Posted October 31, 2015

Jeff explains a bit here: https://www.sidefx.com/index.php?option=com_content&task=view&id=3149&Itemid=412

You can build your own OpenCL solvers to modify fields and geometry attributes, and you should see quite a big speedup. However, if you throw non-OpenCL modifiers into the mix, or want to see the geo in the viewport, you will get some slowdown, as the data has to be constantly copied back and forth between CPU and GPU memory. Jeff shows the example of a Gas Repeat Solver repeating some OpenCL code, which takes full advantage of OpenCL because the data stays on the GPU for all iterations. So overall it's not the small and simple sims where you will see the advantage, but sims where the data stays on the GPU for a big chunk of work before it needs to be copied back (see the sketch below).
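Just to illustrate why that kind of loop is cheap: a rough host-side sketch of the pattern (untested, error checks omitted; the buffer and kernel names are made up for illustration, this is not the actual Houdini implementation). There is exactly one upload and one readback, no matter how many iterations run on the GPU in between.

```c
/* One upload of the field to GPU memory... */
cl_mem field = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
clEnqueueWriteBuffer(queue, field, CL_TRUE, 0, bytes, host_data, 0, NULL, NULL);

/* ...many kernel launches while the data stays on the GPU... */
clSetKernelArg(diffuse_kernel, 0, sizeof(cl_mem), &field);
for (int it = 0; it < num_iterations; ++it)
    clEnqueueNDRangeKernel(queue, diffuse_kernel, 1, NULL,
                           &global_size, NULL, 0, NULL, NULL);

/* ...and only now does the data come back to the CPU. This readback is the
 * expensive part you pay every frame if a non-OpenCL node or the viewport
 * needs the geometry. */
clEnqueueReadBuffer(queue, field, CL_TRUE, 0, bytes, host_data, 0, NULL, NULL);
```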
magneto Posted November 1, 2015

How do they do those real-time GPU sim demos? Do they cache to disk every frame? Can you do that from the GPU? If not, you would see a major slowdown, because you can't just keep the last frame of your sim, right?
anim Posted November 1, 2015 Share Posted November 1, 2015 (edited) How do they do those real time GPU sim demos? Do they cache it to disk at every frame? Can you do that using the GPU? If not, you will see a major slowdown because you can't just store the last frame of your sim, right? I don't know technical details, but I would assume that they are directly streamed to OpenGL as it's already on GPU memory, in Houdini that would mean that Houdini will not get the data back after every frame, which could be fine for preview, but then you would need to run it again properly if you want a cache There was a thread about OpenCL in pyro with test scene where all caching was off just to get maximum speed, don't know where it is exactly and I personally haven't tried it as my card is super weak. So I should rather let someone more experienced shed some light into this EDIT: found that thread, maybe you can find some useful info there https://www.sidefx.com/index.php?option=com_forum&Itemid=172&page=viewtopic&p=116929 Edited November 1, 2015 by anim 1 Quote Link to comment Share on other sites More sharing options...
Guest tar Posted November 1, 2015 Share Posted November 1, 2015 (edited) How do they do those real time GPU sim demos? Do they cache it to disk at every frame? Can you do that using the GPU? If not, you will see a major slowdown because you can't just store the last frame of your sim, right? Which GPU demos? All the papers are uploaded, i.e: Real-time GI is possible http://on-demand.gputechconf.com/gtc/2014/presentations/S4552-rt-voxel-based-global-illumination-gpus.pdf There's nothing too magical about GPUs, there are optimisations done, less effort to correct errors as long as it's all visualising appealing. Also nothing is stored on disk/CPU ram, and, you can recreate this too on Houdini. In H15 I can get ~26fps on the SmokeCL default test, GTX 980 Windows, once you turn on Ram caching it drops to ~20fps, turn on disk caching and it's 12fps. Edited November 1, 2015 by tar Quote Link to comment Share on other sites More sharing options...
jkunz07 Posted November 1, 2015

With Houdini, displaying the sim in the viewport involves copying the data to the CPU and then back to the GPU. This is done to support simulating on a card that isn't your display card. Most demos just display the data without copying it anywhere or saving it, so they are faster.
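Roughly, that round trip looks like this (untested and illustrative only, not Houdini's actual code path): the result is pulled from the compute device into host memory, then re-uploaded to the display GPU through OpenGL.

```c
/* GPU (compute device) -> CPU */
clEnqueueReadBuffer(queue, sim_positions, CL_TRUE, 0, bytes, host_buf,
                    0, NULL, NULL);

/* CPU -> GPU (display device) */
glBindBuffer(GL_ARRAY_BUFFER, display_vbo);
glBufferData(GL_ARRAY_BUFFER, bytes, host_buf, GL_DYNAMIC_DRAW);
```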
magneto Posted November 1, 2015

So can Houdini be configured to do the same? Because it seems like Houdini is doing more than those demos do.
Guest tar Posted November 1, 2015

I'd like to see you make this into a project of yours, like my organic modelling asset - I have no idea if it'll work, but it's a good side project that pushes one's knowledge of the software. Are you game to take up the challenge?
Farmfield Posted November 1, 2015

Another issue with GPU acceleration is VRAM. In my experience (running a GTX 970, 4GB), you really don't need to run that big a FLIP sim before you get out-of-memory issues, and the same goes for grains...

But all this talk might be redundant anyway: a few weeks back I tweeted GridMarkets asking about supporting the H15 distributed simulations, and I got a message back saying it was an awesome idea and that they would look into it directly. That would be a game changer for freelancers like me - having the possibility to do distributed simulations in the cloud. I've used cloud rendering for years and nowadays I couldn't live without it.
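Just as a back-of-the-envelope example of how fast 4GB disappears (my own rough numbers and attribute counts, not SESI's actual data layout):

```c
#include <stdio.h>

int main(void)
{
    /* 50M FLIP particles, each with P (3 floats), v (3 floats),
       plus roughly 4 misc float attributes */
    long long particles = 50000000LL;
    int floats_per_pt   = 3 + 3 + 4;
    double pt_gb = particles * floats_per_pt * 4.0 / (1024.0 * 1024.0 * 1024.0);

    /* a 500^3 grid: velocity (3 floats/voxel) plus ~3 more scalar fields */
    double grid_gb = 500.0 * 500.0 * 500.0 * 6 * 4.0 / (1024.0 * 1024.0 * 1024.0);

    /* prints roughly 1.9 GB + 2.8 GB = 4.7 GB -- already past a 4GB card,
       before collision volumes, OpenCL scratch buffers, or the display */
    printf("particles: %.1f GB, grid: %.1f GB, total: %.1f GB\n",
           pt_gb, grid_gb, pt_gb + grid_gb);
    return 0;
}
```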
magneto Posted November 2, 2015

@tar: I would love to do that, as I'm not afraid to get my hands dirty, but I would need SESI to provide the pieces that are missing in Houdini to accomplish this. I don't know what they are, but even Jeff Lait needed tons of additions to VEX, etc. to be able to implement PBD in Houdini, so a fully GPU-based PBD would very likely require a lot more foundation to be added to Houdini by SESI. Someone correct me if I'm far off.

@Farmfield: That's true, but the next-gen NVIDIA cards of 2016 promise 32GB of RAM, which I assume will be a Titan, so memory won't be an issue anymore. If we could max out that kind of card in Houdini, I would definitely get one. But as it is, unless you are using a GPU renderer, there isn't much advantage to getting one for Houdini IMO. http://www.pcper.com/news/Graphics-Cards/Rumor-NVIDIA-Pascal-17-Billion-Transistors-32GB-HBM2
Guest tar Posted November 2, 2015

You have to dig a bit first - find out as much as possible about how OpenCL is implemented in Houdini, then come back with questions.