SLI for openCL

cojoMan · January 19, 2015

Seeing that there is increased performance when running with the OpenCL option, I assume having an SLI or Crossfire setup would greatly improve sim times?

Or will only one card be used at a time?

Any ideas? Any advice on SLI?

Skybar · January 19, 2015

I think Houdini can only use one device for OpenCL at the moment. However, I can't remember where I read that, and I can't find it again.

cojoMan · January 19, 2015

This is exactly what I wanted to find out: is it worth SLI-ing 2 or more cards, or do you want one card with as many cores and GHz as possible? Or what is the consideration for "more power" here?

lukeiamyourfather · January 19, 2015

Nvidia SLI or AMD Crossfire can improve 3D rendering performance in applications that are written to take advantage of it (mostly games). As far as OpenCL goes Houdini will use only one compute device per instance but multiple cards can be used for multiple simulations. Houdini uses an environment variable to pick which OpenCL compute device to use for a particular instance. The short answer is no, it's not "worth it" for most users in my opinion.
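As a minimal sketch of the one-device-per-instance setup described above, the snippet below builds a separate environment for each Houdini instance. The variable names `HOUDINI_OCL_DEVICETYPE` and `HOUDINI_OCL_DEVICENUMBER` are given as recalled from the Houdini environment variable documentation, so check them against your version; the `hython` launch line and the `run_sim.py` script name are hypothetical.

```python
import os

def opencl_env(device_number, device_type="GPU", base_env=None):
    """Copy an environment and add the OpenCL device selection for one instance."""
    env = dict(os.environ if base_env is None else base_env)
    env["HOUDINI_OCL_DEVICETYPE"] = device_type            # "GPU" or "CPU"
    env["HOUDINI_OCL_DEVICENUMBER"] = str(device_number)   # index among matching devices
    return env

# One environment per card, so two simulations can run side by side.
envs = [opencl_env(n) for n in range(2)]
for env in envs:
    print(env["HOUDINI_OCL_DEVICETYPE"], env["HOUDINI_OCL_DEVICENUMBER"])
    # Hypothetical launch of a batch sim script with this environment:
    # subprocess.Popen(["hython", "run_sim.py"], env=env)
```

Each process only ever sees one compute device, so this does not speed up a single simulation; it only lets several simulations run at once.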

January 19, 2015

or do you want one card with as many cores and GHz as possible? Or what is the consideration for "more power" here?

In general, core count and clock speed (MHz) are the key. As an example, Apple's "trash can" Mac Pro is meant to run its GPU cards at ~80% power to reduce heat and improve stability, so core count isn't the only factor. For Nvidia, the generation of the card is also important for OpenCL. The latest ones perform very well in the reviews.

edward · January 19, 2015

Don't forget that Intel's OpenCL driver can also be used to run Houdini's OpenCL code path on the CPU.

cojoMan · January 20, 2015

Don't forget that Intel's OpenCL driver can also be used to run Houdini's OpenCL code path on the CPU.

But how does a CPU core compare to a GPU one?

I am guessing it's not one-to-one... and I see GPUs out there with something crazy like 512 cores, which would just laugh at my 6-core machine :)

January 20, 2015

GPU cores are super weak compared to CPU cores. The main advantage of running OpenCL on the CPU rather than the GPU is the amount of available RAM.

cojoMan · January 20, 2015

Care to explain a bit?

I have 64GB of RAM.

Would I be better off with a 6-core CPU (up to 12 threads with multithreading) or with 512 GPU cores working for me on a FLIP sim?

January 20, 2015

Just run a test.

cojoMan · January 20, 2015

I would, but I don't have such a card. That's why I am asking, so I could get one if the difference is significant.

I'll do some more research. Thanks.

January 20, 2015

I'm currently setting up the machine to test a GTX 980 vs dual 6-core CPUs @ 3.33GHz. I'll post the results when I can.

cojoMan · January 20, 2015

MUCH appreciated!

I got a new machine some months ago, and seeing that the video card had the least impact on Houdini performance and output, I overdid it on the RAM, SSD, and CPU, and left the GPU to deal with some other time...

lukeiamyourfather · January 20, 2015

I'm working on a machine with 16 cores at 3.4GHz (Xeon E5-2687W v2) and a Quadro K5000. Using OpenCL speeds up the Pyro simulations a little bit, but with only 4GB of memory on the GPU it can't handle simulations at as high a resolution as the CPU can with the system memory, which is 64GB in my case. If it would be helpful I can do a comparison and time them when the machine is idle (probably not today...).
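To put the 4GB versus 64GB point in rough numbers, here is a back-of-the-envelope sketch of how many voxels fit in a given amount of memory. It assumes 32-bit float fields, and the field count (`ASSUMED_FIELDS`) is a made-up placeholder, not a figure from Houdini; the real footprint depends on the solver setup, temporary buffers, and whatever else is using the card.

```python
BYTES_PER_FLOAT = 4           # assumes 32-bit float voxel fields
GIB = 1024 ** 3
ASSUMED_FIELDS = 12           # hypothetical count of scalar fields kept per voxel

def grid_bytes(voxels, scalar_fields=ASSUMED_FIELDS):
    """Memory needed to hold the given number of voxels."""
    return voxels * scalar_fields * BYTES_PER_FLOAT

def max_voxels(memory_bytes, scalar_fields=ASSUMED_FIELDS):
    """Largest voxel count that fits in the given memory under the same assumptions."""
    return memory_bytes // (scalar_fields * BYTES_PER_FLOAT)

gpu_limit = max_voxels(4 * GIB)    # 4GB card
cpu_limit = max_voxels(64 * GIB)   # 64GB of system memory
print(f"GPU: ~{gpu_limit / 1e6:.0f}M voxels, CPU: ~{cpu_limit / 1e6:.0f}M voxels")
print(f"ratio: {cpu_limit / gpu_limit:.1f}x")
```

Whatever the true per-voxel cost is, the ceiling scales linearly with memory, so under these assumptions the 64GB machine can hold about 16 times as many voxels as the 4GB card.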

lukeiamyourfather · January 20, 2015

I ran a test with a simple Pyro setup between render jobs. It had roughly 3 million voxels in the simulation. Results will be different on different configurations, but it looks like a pair of Xeon E5-2687W v2 processors and a Quadro K5000 are pretty similar in performance for the Pyro solver. With OpenCL disabled it took 4 minutes and 36 seconds. With OpenCL enabled it took 4 minutes and 2 seconds (roughly a 14% improvement).

freaq · January 23, 2015

Is that in H14?
Try it with FLIP as well; there it seems to have a bigger impact on performance.

Shinjipierre · January 23, 2015

Using OpenCL is not only a matter of checking that checkbox on the smoke solver. There are DOP nodes you shouldn't use, as they run on the CPU instead of the GPU, and transferring data back and forth slows things down.
One should also save in the background so the sim isn't waiting for files to be written to disk.
Most of this is covered in the Houdini help files anyway.

But I've also noticed that SideFX has changed some settings in the smoke solver when using OpenCL, for example multigrid projection with different substeps, which biases the results...

lukeiamyourfather · January 23, 2015

...it looks like a pair of Xeon E5-2687W v2 processors and a Quadro K5000 are pretty similar in performance...

I tried a simple FLIP tank with roughly 700,000 particles filled half way side to side (rather than top to bottom) so it sloshes around. I used the same machine as the last post I made.

Is that in H14?

Try it with FLIP as well; there it seems to have a bigger impact on performance.

With OpenCL disabled it took 4 minutes 44 seconds and with OpenCL enabled it took 4 minutes 56 seconds. So it was about 4% slower with OpenCL on this particular machine. Of course, YMMV with other configurations and scenes.
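For reference, the two timing comparisons from this machine can be checked with a few lines of arithmetic. The sketch below uses only the times quoted in the thread; the helper names are made up for illustration. Note that the Pyro figure of roughly 14% matches the ratio of the two times (the speedup), while the reduction in wall time is closer to 12%.

```python
def seconds(minutes, secs):
    """Convert a minutes/seconds pair to seconds."""
    return minutes * 60 + secs

def compare(cpu_s, ocl_s):
    """Return (speedup factor, percent change in wall time) for OpenCL vs CPU."""
    speedup = cpu_s / ocl_s
    time_change_pct = (ocl_s - cpu_s) / cpu_s * 100
    return speedup, time_change_pct

pyro = compare(seconds(4, 36), seconds(4, 2))    # ~3 million voxels
flip = compare(seconds(4, 44), seconds(4, 56))   # ~700,000 particles

print(f"Pyro: {pyro[0]:.2f}x, {pyro[1]:+.1f}% wall time")   # Pyro: 1.14x, -12.3% wall time
print(f"FLIP: {flip[0]:.2f}x, {flip[1]:+.1f}% wall time")   # FLIP: 0.96x, +4.2% wall time
```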

I think the biggest performance differences will be on machines with a single processor socket and a high-end gaming card. The Quadro K5000 isn't that fast a card compared to something like a Radeon R9 290X or GeForce GTX 980, and the processors in this particular machine are on the higher end, making the difference with OpenCL smaller (or non-existent).

I like that OpenCL is there as an option but in production I don't see it being all that useful given the memory limitations of the GPU compared to the CPU. The one place it really makes a difference is with lower end hardware where the GPU can be much faster than the CPU without spending a ton of cash.

johner · January 24, 2015

I tried a simple FLIP tank with roughly 700,000 particles filled half way side to side (rather than top to bottom) so it sloshes around. I used the same machine as the last post I made.

Unfortunately that's not really a big enough sim to see a speedup from FLIP's OpenCL support. You're correct that you need quite a fast GPU, but you also need a good-sized sim to offset the overhead of transferring data to and from the GPU.
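The trade-off between transfer overhead and sim size can be sketched with a very simple cost model: the GPU path pays a fixed transfer cost and then runs the solve some factor faster. This is only an illustrative model, not how Houdini measures anything, and the speedup and overhead values below are hypothetical.

```python
def gpu_time(cpu_time, kernel_speedup, transfer_overhead):
    """Modelled GPU time: fixed transfer cost plus the accelerated compute."""
    return transfer_overhead + cpu_time / kernel_speedup

def break_even_cpu_time(kernel_speedup, transfer_overhead):
    """CPU time at which the GPU path costs exactly the same as the CPU path."""
    return transfer_overhead * kernel_speedup / (kernel_speedup - 1)

# Hypothetical numbers, in seconds per frame, for illustration only.
SPEEDUP, OVERHEAD = 5.0, 0.2
threshold = break_even_cpu_time(SPEEDUP, OVERHEAD)
for cpu_time in (0.1, threshold, 2.0):
    print(cpu_time, gpu_time(cpu_time, SPEEDUP, OVERHEAD))
```

In this model, a solve that is cheaper on the CPU than the break-even time gets slower on the GPU, and anything larger gets faster, which is consistent with a small FLIP tank showing no gain.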

The two major OpenCL additions for H14 are in the Position Based Dynamics (grains) solver and FLIP. Both are a bit different from the Pyro implementation in that they are used just for the expensive, iterative parts of a larger CPU-based solve, i.e. we transfer data over to the GPU and iterate a bunch of times to get an answer that is used in an otherwise CPU-based sim.

Pyro, on the other hand, has its entire solve pipeline on the GPU, meaning we store the entire sim representation on the GPU. While this can give really good speedups, it also means you have to be careful about how often you're transferring data back and forth to the GPU when working with it (no DOPnet caching of the smoke; infrequent viewport display), and there are obvious memory limitations. The PBD and FLIP OpenCL additions have no such workflow limitations since they only accelerate a small, but expensive, part of the solve pipeline. They also have lower GPU memory requirements since they only store enough data to solve one expensive part of the sim.

I think the biggest performance differences will be on machines with a single processor socket and a high-end gaming card. The Quadro K5000 isn't that fast a card compared to something like a Radeon R9 290X or GeForce GTX 980, and the processors in this particular machine are on the higher end, making the difference with OpenCL smaller (or non-existent).

I like that OpenCL is there as an option but in production I don't see it being all that useful given the memory limitations of the GPU compared to the CPU. The one place it really makes a difference is with lower end hardware where the GPU can be much faster than the CPU without spending a ton of cash.

For PBD the speedup can often be 5-6X on a good card, and I suspect it will be used a good deal even in production.

For FLIP the viscosity solver actually benefits a good bit more than the pressure solve, although the pressure solve benefits as well. I included some internal timings for a viscous sim benchmark from H13 to H14 at the end of this message.

Otherwise agreed about the benefits of "consumer" GPUs on lower end machines (and that the GTX 980 is a really good OpenCL card for the money, right now.)

---------------------------------------------------------------------------------

I put together a few preliminary numbers on the GasViscosity work. These are early results so might change, but just to give some data points, I increased the resolution on a variable viscosity test I had lying around to about 6.8M particles. In the OpenCL case the linear system for both pressure and viscosity is solved on the GPU (K6000 in this case, so really fast).

Not exactly eye candy, but a flipbook is here:

https://s3.amazonaws.com/vfx/var_viscosity_benchmark.mp4

Overall sim times for 240 frames (all times in minutes):

H13: 232
H14 CPU: 146 - speedup vs H13 = 1.6x
H14 OpenCL: 74 - speedup vs H13 = 3.1x, speedup vs H14 CPU = 2x

We're limited in our overall speedup due to the amount of other solving that goes into FLIP. To drill down a bit, the solve times for just the GasViscosity DOP:

H13: 165
H14 CPU: 82 - speedup vs H13 = 2x
H14 OpenCL: 22 - speedup vs H13 = 7.5x, speedup vs H14 CPU = 3.7x

So about a 2X speedup over H13 in the regular CPU case. This includes things like computing surface weights and building the matrix and such. For just the viscosity linear system solve itself:

H13: 132
H14 CPU: 72 - speedup vs H13 = 1.8x
H14 OpenCL: 12.3 - speedup vs H13 = 10.7x, speedup vs H14 CPU = 5.9x

Pressure projection improves, but not as much as it has even more going on besides solving the linear system. Overall GasProjectNonDivergentVariational times:

H13: 30
H14 CPU: 28
H14 OpenCL: 14

Just the pressure linear system solve:

H13: 16
H14 CPU: 16
H14 OpenCL: 3.25
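The point about the overall speedup being limited by the rest of the FLIP solve is Amdahl's law, and the timings above are consistent with it. The sketch below uses only the numbers from this post and assumes that the two linear system solves are the only parts that differ between the H14 CPU and H14 OpenCL runs; the variable names are made up for illustration.

```python
# All times in minutes, taken from the benchmark above.
total_cpu = 146.0
linear_solves = {            # name: (H14 CPU, H14 OpenCL)
    "viscosity": (72.0, 12.3),
    "pressure": (16.0, 3.25),
}

accelerated_cpu = sum(cpu for cpu, _ in linear_solves.values())
accelerated_ocl = sum(ocl for _, ocl in linear_solves.values())

# Everything outside the linear solves is assumed to stay on the CPU unchanged.
predicted_total = total_cpu - accelerated_cpu + accelerated_ocl

# The same prediction written as Amdahl's law.
p = accelerated_cpu / total_cpu          # fraction of the run that is accelerated
s = accelerated_cpu / accelerated_ocl    # speedup of that fraction
amdahl_speedup = 1.0 / ((1.0 - p) + p / s)

print(f"predicted total: {predicted_total:.2f} min (measured: 74)")   # 73.55
print(f"predicted overall speedup: {amdahl_speedup:.2f}x")            # 1.99x
```

Even though the linear solves themselves run about 5.7x faster combined, they make up only about 60% of the CPU run, so the whole sim ends up roughly 2x faster.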

Edited January 24, 2015 by johner

lukeiamyourfather · January 26, 2015

What processors were used in the workstation for the tests? That would give us a good idea of what the CPU times mean. Thanks for sharing the results and taking the time!
