
Is it possible to implement full GPU based solvers inside Houdini?


magneto


Hi,

 

I mean without using the HDK, even if certain features wouldn't be available in the solver. I tried the OpenCL support for PBD and, while it was much faster, it wasn't as fast as I hoped. I am using 20k points, so it's not exactly high res.

 

What's holding Houdini back in this area?

 

Is there any advantage to buying graphics cards with a large amount of memory, like the Titan X, for Houdini if we can't fully utilize them?

 

On a side note, I read that the next-gen Pascal cards will have up to 32 GB of RAM. Any thoughts on this? Being able to make full use of such cards and iterate much faster seems like it would be a huge advantage. Or are they impractical for Houdini?

 

 

Thanks :)


Pascal seems like a very good beast, but the compute is mixed precision: computing at 16 bit with the claim of better-than-32-bit accuracy. From what I'm reading, the order of operations is important to make this work. Not sure if that's correct, or whether code needs to be reimplemented to do it.

 

Pure GPU compute, AFAIK, needs every bit of the process to work on the GPU, so everything in Houdini that interacts with the GPU solver needs to be on the GPU as well. That works for isolated bits, but in Houdini everything talks to everything else.

 

The ultimate reason it can't all be on the GPU is that the dev team needs to sleep, and they don't have the time to implement all of the relevant parts of Houdini on the GPU :)  The GPU development environment has also become friendlier over the last few years, IIRC, and I think the V-Ray team built a program on the CPU that emulates the GPU so they can debug their code there without crashing the GPU.


I think this is partly about OpenCL being a mess while CUDA is unpopular to use. And I think it was Jeff Lait who wrote here a year or two back that there really wasn't much in the current solvers that could be GPU accelerated, and that even the current OpenCL acceleration only helps with a small part of the calculations in FLIP and grains...


Jeff explains a bit here

https://www.sidefx.com/index.php?option=com_content&task=view&id=3149&Itemid=412

 

You can build your OpenCL solvers to modify fields and geometry attributes, and you should see quite a big speedup. However, if you throw non-OpenCL modifiers into the mix, or want to see the geo in the viewport, you will get some slowdown, as the data has to be constantly copied back and forth between CPU and GPU memory. Jeff shows the example of a Gas Repeat Solver repeating some OpenCL code, which takes full advantage of OpenCL because the data stays on the GPU for all the iterations.

So overall it's not the small and simple sims where you will see the advantage, but sims where the data stays on the GPU for a big chunk of work before it needs to be copied back.
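
For the curious, here is a minimal, hypothetical kernel in the spirit of what a Gas OpenCL-style microsolver runs: one work item per voxel, modifying a field bound as a flat buffer. The parameter names and bindings are illustrative assumptions, not the exact signature Houdini generates for a given node setup.

// Hypothetical kernel in the spirit of a Gas OpenCL microsolver:
// one work item scales one voxel of a density field bound as a buffer.
// Parameter names and bindings here are assumptions for illustration.
kernel void scale_density(
    const float scale,        // user parameter bound on the node (assumed)
    const int nvoxels,        // total number of voxels (assumed binding)
    global float *density)    // density field bound as a writeable buffer
{
    int idx = get_global_id(0);
    if (idx >= nvoxels)
        return;

    // The field is modified in place in GPU memory; nothing is copied back
    // to the CPU until a non-OpenCL node (or the viewport) asks for it.
    density[idx] *= scale;
}

Chain several steps like this (for example inside a Gas Repeat Solver) and the field never leaves GPU memory between iterations, which is where the speedup described above comes from.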


How do they do those real-time GPU sim demos? Do they cache to disk every frame? Can you do that using the GPU? If not, you will see a major slowdown, because you can't just store the last frame of your sim, right?

I don't know the technical details, but I would assume the results are streamed directly to OpenGL, since they are already in GPU memory. In Houdini that would mean Houdini never gets the data back after each frame, which could be fine for a preview, but then you would need to run the sim again properly if you want a cache.
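
As a rough illustration of what "streamed directly to OpenGL" can mean, here is a hypothetical CL/GL interop sketch (this is not how Houdini is implemented): the solver kernel writes straight into a shared OpenGL vertex buffer, so the renderer draws the result without any CPU readback. The handles (context, queue, kernel, gl_vbo) are assumed to be created elsewhere, and in a real demo the shared buffer would be created once rather than every frame.

#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/gl.h>

/* Hypothetical sketch: display simulation data without a CPU round trip.
 * The OpenGL vertex buffer is shared with OpenCL, so the solver kernel
 * writes into the exact memory the renderer draws from. */
void step_and_draw(cl_context context, cl_command_queue queue,
                   cl_kernel kernel, GLuint gl_vbo, size_t npoints)
{
    cl_int err;

    /* Wrap the existing OpenGL buffer as an OpenCL memory object. */
    cl_mem shared = clCreateFromGLBuffer(context, CL_MEM_READ_WRITE,
                                         gl_vbo, &err);

    glFinish();                                  /* GL must be done with it */
    clEnqueueAcquireGLObjects(queue, 1, &shared, 0, NULL, NULL);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &shared);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &npoints, NULL,
                           0, NULL, NULL);

    clEnqueueReleaseGLObjects(queue, 1, &shared, 0, NULL, NULL);
    clFinish(queue);                             /* CL must be done with it */

    /* OpenGL now draws gl_vbo directly; the data never visits the CPU. */
    clReleaseMemObject(shared);
}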

There was a thread about OpenCL in pyro with a test scene where all caching was turned off just to get maximum speed. I don't know where it is exactly, and I personally haven't tried it as my card is super weak, so I should let someone more experienced shed some light on this.

 

EDIT: found that thread, maybe you can find some useful info there https://www.sidefx.com/index.php?option=com_forum&Itemid=172&page=viewtopic&p=116929


How do they do those real-time GPU sim demos? Do they cache to disk every frame? Can you do that using the GPU? If not, you will see a major slowdown, because you can't just store the last frame of your sim, right?

 

 

Which GPU demos? All the papers are uploaded, e.g. real-time GI is possible: http://on-demand.gputechconf.com/gtc/2014/presentations/S4552-rt-voxel-based-global-illumination-gpus.pdf

 

There's nothing too magical about GPUs; optimisations are made, and less effort goes into correcting errors as long as the result is visually appealing.

 

Also, nothing is stored on disk or in CPU RAM, and you can recreate this in Houdini too. In H15 I can get ~26 fps on the default SmokeCL test (GTX 980, Windows); once you turn on RAM caching it drops to ~20 fps, and with disk caching it's 12 fps.


With Houdini, displaying the sim in the viewport involves copying the data to the CPU and then back to the GPU. This is done to support simulating on a card that isn't your display card. Most demos just display the data without copying it anywhere or saving it, so they are faster.
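
A hypothetical host-side sketch of that round trip, just to show where the cost goes (this is not Houdini's code; the queue, buffers and sizes are assumed to be set up elsewhere):

#include <CL/cl.h>
#include <stddef.h>

/* Hypothetical per-frame round trip: what it costs when the sim runs on one
 * device but the result has to pass through the CPU, e.g. so a different
 * display card, the viewport, or other nodes can see it. */
void round_trip_field(cl_command_queue queue, cl_mem gpu_field,
                      void *cpu_field, size_t nbytes)
{
    /* 1. Blocking read, GPU -> CPU: stalls until the transfer completes. */
    clEnqueueReadBuffer(queue, gpu_field, CL_TRUE, 0, nbytes,
                        cpu_field, 0, NULL, NULL);

    /* ...CPU-side work happens here: viewport upload, caching, other nodes... */

    /* 2. Blocking write, CPU -> GPU: re-upload before the next solve step. */
    clEnqueueWriteBuffer(queue, gpu_field, CL_TRUE, 0, nbytes,
                         cpu_field, 0, NULL, NULL);
}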


 

With Houdini, displaying the sim in the viewport involves copying the data to the CPU and then back to the GPU. This is done to support simulating on a card that isn't your display card. Most demos just display the data without copying it anywhere or saving it, so they are faster.

 

So can Houdini be configured to do the same? Because it seems like Houdini is doing more.


 

So can Houdini be configured to do the same? Because it seems like Houdini is doing more.

 

I'd like to see you make this into a project of yours :)  Like my organic modelling asset - I've no idea if it'll work, but it's a good side project that pushes one's knowledge of the software. Are you game to take up the challenge?


Another issue with GPU acceleration is VRAM. In my experience (running a GTX 970, 4 GB), you really don't need to run that big a FLIP sim before you get out-of-memory issues, and the same goes for grains...

 

But all this talk might be redundant anyway. A few weeks back I tweeted GridMarkets asking them about supporting the H15 distributed simulations, and I got a message back saying it was an awesome idea and that they would look into it directly. That would be a game changer for freelancers like me - having the possibility to do distributed simulations in the cloud. I've used cloud rendering for years, and nowadays I couldn't live without it.


I'd like to see you make this into a project of yours :)  Like my organic modelling asset - I've no idea if it'll work, but it's a good side project that pushes one's knowledge of the software. Are you game to take up the challenge?

 

I would love to do that, as I am not afraid to get my hands dirty, but I would need SESI to provide the things that are missing in Houdini to accomplish this. I don't know what they are, but even Jeff Lait needed that to implement PBD in Houdini, along with tons of additions to VEX, etc. So a fully GPU-based PBD solver would very likely require a lot more foundations to be added to Houdini by SESI. Someone correct me if I am far off :)

 

Another issue with GPU acceleration is VRAM. In my experience (running a GTX 970, 4 GB), you really don't need to run that big a FLIP sim before you get out-of-memory issues, and the same goes for grains...

 

But all this talk might be redundant anyway. A few weeks back I tweeted GridMarkets asking them about supporting the H15 distributed simulations, and I got a message back saying it was an awesome idea and that they would look into it directly. That would be a game changer for freelancers like me - having the possibility to do distributed simulations in the cloud. I've used cloud rendering for years, and nowadays I couldn't live without it.

 

That's true, but the next-gen NVIDIA cards of 2016 promise 32 GB of RAM, which I assume will be a Titan, so memory won't be an issue anymore. If we could "max out" that kind of card in Houdini, I would definitely get one. But as is, unless you are using a GPU renderer, there isn't much advantage in getting one for Houdini, IMO.

 

http://www.pcper.com/news/Graphics-Cards/Rumor-NVIDIA-Pascal-17-Billion-Transistors-32GB-HBM2

