SLI for OpenCL


cojoMan


Nvidia SLI or AMD Crossfire can improve 3D rendering performance in applications that are written to take advantage of it (mostly games). As far as OpenCL goes, Houdini will use only one compute device per instance, but multiple cards can be used for multiple simulations. Houdini uses an environment variable to pick which OpenCL compute device a particular instance should use. The short answer is no, it's not "worth it" for most users in my opinion.
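For anyone wanting to script that device selection, here is a minimal Python sketch of launching two headless sims pinned to different cards. The HOUDINI_OCL_DEVICETYPE / HOUDINI_OCL_DEVICENUMBER names are the documented Houdini OpenCL variables as far as I know (verify with hconfig on your build); the .hip files, the ROP path, and hython being on the PATH are assumptions for illustration.

# Minimal sketch: run two headless Houdini sims, each pinned to a different
# OpenCL device via HOUDINI_OCL_DEVICENUMBER. The .hip files and the ROP
# path are placeholders for your own scene.
import os
import subprocess

jobs = [
    ("sim_a.hip", "0"),  # first GPU
    ("sim_b.hip", "1"),  # second GPU
]

procs = []
for hipfile, device in jobs:
    env = os.environ.copy()
    env["HOUDINI_OCL_DEVICETYPE"] = "GPU"     # use a GPU device...
    env["HOUDINI_OCL_DEVICENUMBER"] = device  # ...and pick which one
    cmd = ("import hou; "
           f"hou.hipFile.load({hipfile!r}); "
           "hou.node('/out/sim_rop').render()")  # hypothetical ROP that writes the sim
    procs.append(subprocess.Popen(["hython", "-c", cmd], env=env))

for p in procs:
    p.wait()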


Or do you want one card with as many cores and as high a clock speed as possible? What is the consideration for "more power" here?

 

In general, core count and clock speed are the key factors. As an example, Apple's trash can Mac Pro is meant to run its GPU cards at ~80% power to save on heat and add stability, so core count isn't the only factor. The generation of the card is also important for OpenCL on Nvidia; the latest ones show very good performance in reviews.


Don't forget that Intel's OpenCL driver can also be used to run Houdini's OpenCL code path on the CPU.
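If you want to try that, a rough sketch of forcing a CPU OpenCL device at launch time is below; the HOUDINI_OCL_* variable names should match the Houdini docs, but the exact Intel vendor string varies by platform, so treat it as a placeholder.

# Minimal sketch: point Houdini's OpenCL at a CPU device (e.g. Intel's
# runtime) instead of the GPU. These must be set before Houdini starts.
import os
import subprocess

env = os.environ.copy()
env["HOUDINI_OCL_DEVICETYPE"] = "CPU"               # ask for a CPU OpenCL device
env["HOUDINI_OCL_VENDOR"] = "Intel(R) Corporation"  # placeholder vendor string

subprocess.Popen(["houdini"], env=env)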

 

 

But how does a CPU core compare to a GPU one?

Because I am guessing it's not one-to-one... and I see GPUs out there with something crazy like 512 cores, and those would just laugh at my 6-core machine :))


MUCH appreciated!

 

I got a new machine some months ago, and seeing that the video card had the least impact on Houdini performance and output, I went big on the RAM, SSD, and CPU, and left the GPU to deal with at some other time...


I'm working on a machine with 16 cores at 3.4GHz (Xeon E5-2687W v2) and a Quadro K5000. Using OpenCL speeds up Pyro simulations a little bit, but with only 4GB of memory on the GPU it can't handle simulations at as high a resolution as the CPU working from system memory, which is 64GB in my case. If it would be helpful I can run a comparison and time them when the machine is idle (probably not today...).


I ran a test with a simple Pyro setup between render jobs. It had roughly 3 million voxels in the simulation. Results will differ on other configurations, but it looks like a pair of Xeon E5-2687W v2 processors and a Quadro K5000 deliver pretty similar Pyro solver performance. With OpenCL disabled it took 4 minutes and 36 seconds. With OpenCL enabled it took 4 minutes and 2 seconds (roughly a 14% improvement).
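If anyone wants to reproduce this kind of comparison, the idea is just to toggle the solver's OpenCL parameter and time a fixed frame range. A rough hython sketch is below; the node paths and the parm name "opencl" are assumptions based on a standard Pyro setup, so check the actual names in your scene.

# Rough sketch (run in hython): time the same Pyro sim with and without OpenCL.
# Paths and the "opencl" parm name are placeholders for your own scene.
import time
import hou

for use_opencl in (0, 1):
    # Reload the file each pass so no cached sim data carries over.
    hou.hipFile.load("pyro_test.hip", suppress_save_prompt=True)
    hou.node("/obj/pyro_sim/dopnet/pyrosolver1").parm("opencl").set(use_opencl)
    dopnet = hou.node("/obj/pyro_sim/dopnet")

    t0 = time.perf_counter()
    for frame in range(1, 241):
        hou.setFrame(frame)
        dopnet.cook(force=True)  # force the DOP network to simulate this frame
    print("OpenCL", bool(use_opencl), "->", round(time.perf_counter() - t0, 1), "s")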


Using OpenCL is not only a matter of checking that checkbox on the smoke solver. There are DOP nodes you shouldn't use because they run on the CPU instead of the GPU, and transferring data back and forth slows things down.
You should also save to disk in the background so the sim isn't left waiting for the files to be written.
Most of this is covered in the Houdini help files anyway.
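On the background-saving point, if you cache the sim through a Geometry ROP you can kick the write off in the background from Python too. A tiny sketch below, assuming a ROP at /out/cache_sim and that its "Save to Disk in Background" button is the standard executebackground parm (worth verifying on your node).

# Tiny sketch: start a background cache write so the session isn't blocked
# while frames go to disk. The node path and the "executebackground" parm
# name are assumptions to verify on your own ROP.
import hou

rop = hou.node("/out/cache_sim")
rop.parm("executebackground").pressButton()  # same as clicking "Save to Disk in Background"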

But I've also noticed that SideFX has changed some settings in the smoke solver when OpenCL is used. For example, the project multigrid stage runs with different substeps, which biases the results...


...it looks like a pair of Xeon E5-2687W v2 processors and a Quadro K5000 are pretty similar in performance...

 

I tried a simple FLIP tank with roughly 700,000 particles filled half way side to side (rather than top to bottom) so it sloshes around. I used the same machine as the last post I made.

 

That is in H14?

Try it with FLIP as well; there it seems to have a bigger impact on performance.

 

With OpenCL disabled it took 4 minutes 44 seconds and with OpenCL enabled it took 4 minutes 56 seconds. So it was about 4% slower with OpenCL on this particular machine; of course, YMMV with other configurations and scenes.

 

I think the biggest performance differences will be on machines with a single processor socket and a high-end gaming card. The Quadro K5000 isn't that fast a card compared to something like a Radeon R9 290X or GeForce GTX 980, and the processors in this particular machine are on the higher end, making the difference with OpenCL smaller (or non-existent).

 

I like that OpenCL is there as an option but in production I don't see it being all that useful given the memory limitations of the GPU compared to the CPU. The one place it really makes a difference is with lower end hardware where the GPU can be much faster than the CPU without spending a ton of cash.


I tried a simple FLIP tank with roughly 700,000 particles filled half way side to side (rather than top to bottom) so it sloshes around. I used the same machine as the last post I made.

 

 

Unfortunately that's not really a big enough sim to see a speedup from FLIP's OpenCL support. You're correct that you need quite a fast GPU, but you also need a good-sized sim to offset the overhead of transferring data to and from the GPU.

 

The two major OpenCL additions for H14 are in the Position Based Dynamics (grains) solver and FLIP. Both are a bit different from the Pyro implementation in that they are used just for the expensive, iterative parts of a larger CPU-based solve, i.e.  we transfer data over to the GPU and iterate a bunch of times to get an answer that is used in an otherwise CPU-based sim.  

 

Pyro, on the other hand, has its entire solve pipeline on the GPU, meaning we store the entire sim representation on the GPU. While this can give really good speedups, it also means you have to be careful about how often you're transferring data back and forth to the GPU when working with it (no DOPnet caching of the smoke; infrequent viewport display), and there are obvious memory limitations. The PBD and FLIP OpenCL additions have no such workflow limitations since they only accelerate a small, but expensive, part of the solve pipeline. They also have lower GPU memory requirements since they only store enough data to solve one expensive part of the sim.
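To make that transfer-and-iterate pattern concrete, here is a toy sketch outside Houdini using pyopencl (nothing SideFX ships, just an illustration): the data is uploaded once, the many cheap iterations run on the device, and the result comes back once into an otherwise CPU-side computation.

# Toy pyopencl sketch of the "offload only the iterative part" idea described
# above: one transfer in, many kernel iterations on the device, one transfer
# out. The kernel itself is a trivial stand-in for a real solver iteration.
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

src = """
__kernel void relax_step(__global float *x, const float relax)
{
    int i = get_global_id(0);
    x[i] = x[i] * (1.0f - relax);   // stand-in for one solver iteration
}
"""
program = cl.Program(ctx, src).build()

x_host = np.random.rand(1_000_000).astype(np.float32)

# One transfer in...
mf = cl.mem_flags
x_dev = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=x_host)

# ...many iterations on the device (the part worth offloading)...
for _ in range(200):
    program.relax_step(queue, x_host.shape, None, x_dev, np.float32(0.05))

# ...and one transfer back into the otherwise CPU-side sim step.
cl.enqueue_copy(queue, x_host, x_dev)
queue.finish()
print(x_host[:5])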

 

I think the biggest performance differences will be on machines with a single processor socket and a high-end gaming card. The Quadro K5000 isn't that fast a card compared to something like a Radeon R9 290X or GeForce GTX 980, and the processors in this particular machine are on the higher end, making the difference with OpenCL smaller (or non-existent).

 

I like that OpenCL is there as an option but in production I don't see it being all that useful given the memory limitations of the GPU compared to the CPU. The one place it really makes a difference is with lower end hardware where the GPU can be much faster than the CPU without spending a ton of cash.

 

 

For PBD the speedup can often be 5-6X on a good card, and I suspect it will be used a good deal even in production.

 

For FLIP the viscosity solver actually benefits a good bit more than the pressure solve, although the pressure solve improves as well. I included some internal timings for a viscous sim benchmark from H13 to H14 at the end of this message.

 

Otherwise agreed about the benefits of "consumer" GPUs on lower end machines (and that the GTX 980 is a really good OpenCL card for the money right now).

 

---------------------------------------------------------------------------------
I put together a few preliminary numbers on the GasViscosity work. These are early results so they might change, but just to give some data points, I increased the resolution on a variable viscosity test I had lying around to about 6.8M particles. In the OpenCL case the linear system for both pressure and viscosity is solved on the GPU (a K6000 in this case, so really fast).

Not exactly eye candy, but a flipbook is here:

Overall sim times for 240 frames (all times in minutes):
H13: 232
H14 CPU: 146 - speedup vs H13 = 1.6x
H14 OpenCL: 74 - speedup vs H13 = 3.1x, speedup vs H14 CPU = 2x

We're limited in our overall speedup due to the amount of other solving that goes into FLIP. To drill down a bit, the solve times for just the GasViscosity DOP:
H13: 165
H14 CPU: 82 - speedup vs H13 = 2x
H14 OpenCL: 22 - speedup vs H13 = 7.5x, speedup vs H14 CPU = 3.7x

So about a 2X speedup over H13 in the regular CPU case. This includes things like computing surface weights and building the matrix and such. For just the viscosity linear system solve itself:
H13: 132
H14 CPU: 72 - speedup vs H13 = 1.8x
H14 OpenCL: 12.3 - speedup vs H13 = 10.7x, speedup vs H14 CPU = 5.9x

Pressure projection improves too, but not as much, since it has even more going on besides solving the linear system. Overall GasProjectNonDivergentVariational times:
H13: 30
H14 CPU: 28
H14 OpenCL: 14

Just the pressure linear system solve:
H13: 16
H14 CPU: 16
H14 OpenCL: 3.25
Edited by johner