magneto Posted October 31, 2015

Hi, I mean without using the HDK, and even if certain features won't be available in the solver. I tried the OpenCL support for PBD and while it was much faster, it wasn't as fast as I hoped. I am using 20k points, so not exactly high res. What's holding Houdini back in this area? Is there any advantage to buying GFX cards with a large amount of memory, like the Titan X, for Houdini if we can't fully utilize them? On a side note, I read that the next-gen Pascal cards will have up to 32GB of RAM. Any thoughts on this? It seems like being able to fully use cards like that would let us iterate much faster. Or are they impractical for Houdini? Thanks
Guest tar Posted October 31, 2015

Pascal seems like a very good beast, but the compute is mixed precision: store and compute at 16 bit while trying to keep close to 32-bit accuracy. What I'm reading is that the order of operations is important to make this work. Not sure if that's correct, or if code needs to be reimplemented to take advantage of it.

Pure GPU compute AFAIK needs every bit of the process to run on the GPU, so every part of Houdini that interacts with the GPU solver would need to be on the GPU too. That works for isolated bits, but in Houdini everything talks to everything else. The ultimate reason it can't all be on the GPU is that the dev team needs to sleep, and they don't have the time to reimplement all of the relevant parts of Houdini on the GPU.

The GPU development environment has also become more friendly in the last few years, IIRC, and I think the V-Ray team built a program on the CPU that emulates the GPU so they can debug their code there without the GPU crashing.
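To show what I mean by mixed precision and order of operations, here's a rough, untested OpenCL sketch of the usual pattern (my own illustration, nothing from Houdini): keep the data in 16-bit half to save memory and bandwidth, but promote to 32-bit float for the arithmetic, so the order you add things in doesn't eat all your accuracy.

```c
/* Untested sketch: store in half, compute in float.
 * vload_half/vstore_half are core OpenCL, so no fp16 extension is needed
 * as long as you never do arithmetic directly on half values. */
__kernel void blur3(__global const half *src,
                    __global half *dst,
                    const int n)
{
    int i = get_global_id(0);
    if (i <= 0 || i >= n - 1)
        return;

    /* promote to 32-bit float before summing */
    float sum = vload_half(i - 1, src)
              + vload_half(i,     src)
              + vload_half(i + 1, src);

    /* round back down to 16 bits only when storing */
    vstore_half(sum / 3.0f, i, dst);
}
```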
Farmfield Posted October 31, 2015

I think this is in part about OpenCL being a mess while CUDA is unpopular to use. And I think it was Jeff Lait who wrote here a year or two back that there really isn't much in the current solvers that can be GPU accelerated, and that even the current OpenCL acceleration only helps with a small part of the calculations in FLIP and grains...
anim Posted October 31, 2015

Jeff explains a bit here: https://www.sidefx.com/index.php?option=com_content&task=view&id=3149&Itemid=412

You can build your own OpenCL solvers to modify fields and geometry attributes, and you should see quite a big speedup. However, if you throw non-OpenCL modifiers into the mix, or want to see the geo in the viewport, you will get some slowdown, as the data has to be constantly copied back and forth between CPU and GPU memory. Jeff shows the example of a Gas Repeat Solver repeating some OpenCL code, which takes full advantage of OpenCL because the data stays on the GPU for all iterations. So overall it's not the small and simple sims where you will see the advantage, but sims where the data stays on the GPU for a big chunk of work before it needs to be copied back (see the sketch below).
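Just to illustrate why that kind of loop is cheap: a rough host-side sketch of the pattern (untested, error checks omitted; the buffer and kernel names are made up for illustration, this is not the actual Houdini implementation). There is exactly one upload and one readback, no matter how many iterations run on the GPU in between.

```c
/* One upload of the field to GPU memory... */
cl_mem field = clCreateBuffer(ctx, CL_MEM_READ_WRITE, bytes, NULL, &err);
clEnqueueWriteBuffer(queue, field, CL_TRUE, 0, bytes, host_data, 0, NULL, NULL);

/* ...many kernel launches while the data stays on the GPU... */
clSetKernelArg(diffuse_kernel, 0, sizeof(cl_mem), &field);
for (int it = 0; it < num_iterations; ++it)
    clEnqueueNDRangeKernel(queue, diffuse_kernel, 1, NULL,
                           &global_size, NULL, 0, NULL, NULL);

/* ...and only now does the data come back to the CPU. This readback is the
 * expensive part you pay every frame if a non-OpenCL node or the viewport
 * needs the geometry. */
clEnqueueReadBuffer(queue, field, CL_TRUE, 0, bytes, host_data, 0, NULL, NULL);
```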
magneto Posted November 1, 2015

How do they do those real-time GPU sim demos? Do they cache to disk every frame? Can you do that from the GPU? If not, you would see a major slowdown, because you can't just keep the last frame of your sim, right?
anim Posted November 1, 2015 Share Posted November 1, 2015 (edited) How do they do those real time GPU sim demos? Do they cache it to disk at every frame? Can you do that using the GPU? If not, you will see a major slowdown because you can't just store the last frame of your sim, right? I don't know technical details, but I would assume that they are directly streamed to OpenGL as it's already on GPU memory, in Houdini that would mean that Houdini will not get the data back after every frame, which could be fine for preview, but then you would need to run it again properly if you want a cache There was a thread about OpenCL in pyro with test scene where all caching was off just to get maximum speed, don't know where it is exactly and I personally haven't tried it as my card is super weak. So I should rather let someone more experienced shed some light into this EDIT: found that thread, maybe you can find some useful info there https://www.sidefx.com/index.php?option=com_forum&Itemid=172&page=viewtopic&p=116929 Edited November 1, 2015 by anim 1 Quote Link to comment Share on other sites More sharing options...
Guest tar Posted November 1, 2015 Share Posted November 1, 2015 (edited) How do they do those real time GPU sim demos? Do they cache it to disk at every frame? Can you do that using the GPU? If not, you will see a major slowdown because you can't just store the last frame of your sim, right? Which GPU demos? All the papers are uploaded, i.e: Real-time GI is possible http://on-demand.gputechconf.com/gtc/2014/presentations/S4552-rt-voxel-based-global-illumination-gpus.pdf There's nothing too magical about GPUs, there are optimisations done, less effort to correct errors as long as it's all visualising appealing. Also nothing is stored on disk/CPU ram, and, you can recreate this too on Houdini. In H15 I can get ~26fps on the SmokeCL default test, GTX 980 Windows, once you turn on Ram caching it drops to ~20fps, turn on disk caching and it's 12fps. Edited November 1, 2015 by tar Quote Link to comment Share on other sites More sharing options...
jkunz07 Posted November 1, 2015

With Houdini, displaying the sim in the viewport involves copying the data to the CPU and then back to the GPU. This is done to support simulating on a card that isn't your display card. Most demos just display the data without copying it anywhere or saving it, so they are faster.
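Roughly, that round trip looks like this (untested and illustrative only, not Houdini's actual code path): the result is pulled from the compute device into host memory, then re-uploaded to the display GPU through OpenGL.

```c
/* GPU (compute device) -> CPU */
clEnqueueReadBuffer(queue, sim_positions, CL_TRUE, 0, bytes, host_buf,
                    0, NULL, NULL);

/* CPU -> GPU (display device) */
glBindBuffer(GL_ARRAY_BUFFER, display_vbo);
glBufferData(GL_ARRAY_BUFFER, bytes, host_buf, GL_DYNAMIC_DRAW);
```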
magneto Posted November 1, 2015

So can Houdini be configured to do the same? Because it seems like Houdini is doing more than those demos do.
Guest tar Posted November 1, 2015

I'd like to see you make this into a project of yours, like my organic modelling asset - I have no idea if it'll work, but it's a good side project that pushes one's knowledge of the software. Are you game to take up the challenge?
Farmfield Posted November 1, 2015

Another issue with GPU acceleration is VRAM. In my experience (running a GTX 970, 4GB), you really don't need to run that big a FLIP sim before you get out-of-memory issues, and the same goes for grains...

But all this talk might be redundant anyway: a few weeks back I tweeted GridMarkets asking about supporting the H15 distributed simulations, and I got a message back saying it was an awesome idea and that they would look into it directly. That would be a game changer for freelancers like me - having the possibility to do distributed simulations in the cloud. I've used cloud rendering for years and nowadays I couldn't live without it.
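Just as a back-of-the-envelope example of how fast 4GB disappears (my own rough numbers and attribute counts, not SESI's actual data layout):

```c
#include <stdio.h>

int main(void)
{
    /* 50M FLIP particles, each with P (3 floats), v (3 floats),
       plus roughly 4 misc float attributes */
    long long particles = 50000000LL;
    int floats_per_pt   = 3 + 3 + 4;
    double pt_gb = particles * floats_per_pt * 4.0 / (1024.0 * 1024.0 * 1024.0);

    /* a 500^3 grid: velocity (3 floats/voxel) plus ~3 more scalar fields */
    double grid_gb = 500.0 * 500.0 * 500.0 * 6 * 4.0 / (1024.0 * 1024.0 * 1024.0);

    /* prints roughly 1.9 GB + 2.8 GB = 4.7 GB -- already past a 4GB card,
       before collision volumes, OpenCL scratch buffers, or the display */
    printf("particles: %.1f GB, grid: %.1f GB, total: %.1f GB\n",
           pt_gb, grid_gb, pt_gb + grid_gb);
    return 0;
}
```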
magneto Posted November 2, 2015

@tar: I would love to do that, as I'm not afraid to get my hands dirty, but I would need SESI to provide the pieces that are missing in Houdini to accomplish this. I don't know what they are, but even Jeff Lait needed tons of additions to VEX, etc. to be able to implement PBD in Houdini, so a fully GPU-based PBD would very likely require a lot more foundation to be added to Houdini by SESI. Someone correct me if I'm far off.

@Farmfield: That's true, but the next-gen NVIDIA cards of 2016 promise 32GB of RAM, which I assume will be a Titan, so memory won't be an issue anymore. If we could max out that kind of card in Houdini, I would definitely get one. But as it is, unless you are using a GPU renderer, there isn't much advantage to getting one for Houdini IMO. http://www.pcper.com/news/Graphics-Cards/Rumor-NVIDIA-Pascal-17-Billion-Transistors-32GB-HBM2
Guest tar Posted November 2, 2015

You have to dig a bit first - find out as much as possible about how OpenCL is implemented in Houdini, then come back with questions.