Atom

[SOLVED] Cooking Use All Cores?


Hi All,

 

I have been reviewing some more complicated example files and I notice that some of them have a long cooking time. When I examine my Activity Monitor, I notice that not all of my CPU cores are being used.

 

 

Does Houdini 13 really only cook with one core?

 

How does one get work done on complicated scenes?

 

I was under the impression that Houdini was used in "big studio" Hollywood productions, but if the big studios can only cook on one core, how do they do it? Is there some way to network the cooking, or to tell Houdini to use all my cores?

 

I am starting to spend more time waiting on my computer rather than using it.


Guest tar

Hey Atomic - you're the winner of the seemingly monthly 'use all my cores' question :)

 

Simply put, you need to pre-render the bits you're happy with. Also check out the distribution options for the sim tools.


Funny ^_^ ,

 

I guess I have noob egg on my face.

 

So new, that I have not heard of pre-render. Is that a node, kind of like a subnet?

Guest tar

Pre-rendering is the common term for writing out files to disk. i.e. caching

 

Some examples: you can write .sim files, which contain all the information needed to continue a simulation if it's interrupted - the files are very big, though. You can also write out .bgeo files with a File SOP, or perhaps a Cache SOP.

 

If you use the shelf tools like Particle Fluid / Emit Particles, a pre-built node called 'surface cache' is placed inside the 'particle_fluid' SOP, where you can write and read files very easily.

 

The general idea is to process the files to disk and then read them back in - just like rendering :)
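Outside of Houdini, the write-once/read-back pattern described above can be sketched in plain Python. This is only an illustration of the caching idea - the `expensive_cook` function, the pickle format, and the `geo.####.pkl` naming are all hypothetical stand-ins, not Houdini's actual .bgeo workflow:

```python
import os
import pickle
import tempfile

def expensive_cook(frame):
    """Stand-in for an expensive per-frame cook (hypothetical)."""
    return [frame * i for i in range(5)]

def cached_cook(frame, cache_dir):
    """Cook once and write the result to disk; read it back on later requests."""
    path = os.path.join(cache_dir, f"geo.{frame:04d}.pkl")
    if os.path.exists(path):           # cache hit: just read the file
        with open(path, "rb") as f:
            return pickle.load(f)
    result = expensive_cook(frame)     # cache miss: cook once...
    with open(path, "wb") as f:        # ...and persist it for next time
        pickle.dump(result, f)
    return result

cache = tempfile.mkdtemp()
first = cached_cook(3, cache)    # cooks and writes geo.0003.pkl
second = cached_cook(3, cache)   # reads the file instead of re-cooking
```

The second call never touches `expensive_cook` - that is the whole point of pre-rendering: the cost is paid once, and every downstream read is just file I/O.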


In a general sense, the whole threading issue is far more complex than just a case of it does or it doesn't.

Houdini is multi-threaded on a node-by-node basis.  A fair few older nodes are still single-threaded, although that number tends to decrease with each new release as old code gets rewritten.

Most new additions are written with a strong multi-threading focus, but in certain rare cases the particular task a node is doing just doesn't lend itself to parallelization.

 

You shouldn't typically be seeing an entire scene cooking single-threaded, unless someone's gone out of their way to build it badly.

Things like the VOPSOP and AttribVOP nodes are very often the best way to implement things in SOPs, as they will always auto-thread over all points on a surface.

 

Basically, there's no definitive answer to this issue... you will get a feel for what parts of Houdini work in what ways with experience.  I'd say that usually, if you're finding you're hitting a single-threaded bottleneck, there is almost certainly a better approach.  At its core, Houdini's closer to a programming language than a 3D package - there are always multiple ways to achieve just about anything, and no defined "right way".

 

 

 

If, however, your problem stems from something more like what Marty is suggesting:

The idea of pre-rendering data applies if you're looking to render a simulation - a sim of any kind is time-dependent, while a render is not, so if you attempt to directly render the output of a sim, Houdini would re-cook the entire sim end-to-end for each frame.  If you have any kind of sim, you need to cook the geometry it creates to disk, and then render the cached geometry instead.  Basically, treat it strictly as a two-stage process - a sim creates geometry, you write it to disk, then read it back in and render it.
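The re-cook cost is easy to see in a toy model - a hypothetical one-number "sim" where each frame depends on the previous one, counting cooks when rendering straight off the sim versus rendering from a cache:

```python
def step(state):
    """One hypothetical sim step: the next state depends on the previous one."""
    return state + 1

def render_without_cache(frames):
    """Rendering straight off the sim: each frame re-cooks the history from 0."""
    cooks = 0
    for f in frames:
        state = 0
        for _ in range(f):       # re-run the whole sim for this one frame
            state = step(state)
            cooks += 1
    return cooks

def render_with_cache(frames):
    """Two-stage process: sim once end-to-end to 'disk', then render reads it."""
    cooks = 0
    cache = {0: 0}                           # stand-in for cached files on disk
    for f in range(1, max(frames) + 1):      # simulation pass
        cache[f] = step(cache[f - 1])
        cooks += 1
    [cache[f] for f in frames]               # render pass only reads the cache
    return cooks

print(render_without_cache(range(1, 6)))  # 15 cooks for 5 frames
print(render_with_cache(range(1, 6)))     # 5 cooks for the same 5 frames
```

Without the cache the cost grows quadratically with the frame range; with it, each frame is cooked exactly once.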



And remember, there can also be a perception bias about how well threaded a piece of code is. If you have 12 cores and are running code in which 90% of the work is threaded, the single-threaded portion takes longer to execute than the multithreaded portion. In that case it's easy to mistakenly think: "damn, less than half of this is multithreaded - the single-threaded portion looks longer in the processor usage curve"...
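That perception effect is just Amdahl's law, and it can be checked with a few lines of arithmetic using the numbers from the paragraph above (90% threaded, 12 cores):

```python
def wall_clock(serial_frac, cores):
    """Wall-clock time of the serial and parallel parts, for unit total work."""
    parallel_frac = 1.0 - serial_frac
    return serial_frac, parallel_frac / cores   # Amdahl's law, piece by piece

s, p = wall_clock(0.10, 12)       # 90% threaded on 12 cores
print(s, p)                       # 0.1 vs 0.075: the 10% serial part runs LONGER
print(round(s / (s + p), 3))      # serial share of wall-clock time: ~57%
print(round(1 / (s + p), 2))      # yet overall speedup is still ~5.71x
```

So the serial stretch dominates the usage graph even though only a tenth of the work is single-threaded.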


It's a lot of tiresome work to get everything properly multithreaded, but the cynic in me is thinking: hbatch licenses...


There are parts of Houdini that are single threaded, and will always be. Some things just don't thread well enough to justify the effort. However, we're continually expanding what is threaded in Houdini.

 

That said, part of the problem is that CPU usage monitors haven't really kept up with all the new CPU features, such as dynamic overclocking (turbo) and SMT (aka hyperthreading, also the only time I've ever heard of "2" referred to as "hyper"). If you have a 6-core SMT processor, many CPU monitors will show 12 graphs, or display a single graph where 1 thread @100% shows as 8% usage. That would be fine if SMT actually gave you a +100% boost, but more often it's less than 20%. Turbo kicking in and boosting your CPU clocks by upwards of 20% is not reflected in the usage graph either.

 

So, when Intel's TBB lib decides to only allocate 4 threads to your task on a 6-core SMT processor and run at +10% of the base clock, your CPU monitor shows you 33% usage. That looks pretty dismal. But if we assume that running a SMT core gives you 120% of the performance of that core running single-threaded (I think I'm being generous here), instead of having 12 cores (6x2), you have more like 7.2 (6x1.2). Since these 4 threads are very likely running on separate cores, and at +10% of their base clock, your actual usage should look more like 4*1.1/7.2, or 61%. Similarly, if the CPU is running single threaded at a lower idle clockspeed, it should show less than one running all out.
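The estimate above can be packaged into a tiny helper. The +20% SMT gain and +10% turbo figures are the same assumptions made in the paragraph, not measured values:

```python
def effective_usage(threads, clock_boost, cores, smt_gain):
    """Rough effective CPU utilization on an SMT processor.

    threads:     busy hardware threads, assumed on separate physical cores
    clock_boost: clock relative to base (1.1 = +10% turbo)
    cores:       physical core count
    smt_gain:    throughput of one core with SMT vs. without (1.2 = +20%)
    """
    effective_cores = cores * smt_gain        # e.g. 6 * 1.2 = 7.2, not 12
    return threads * clock_boost / effective_cores

# 4 threads at +10% clock on a 6-core SMT chip:
print(round(effective_usage(4, 1.1, 6, 1.2), 3))   # ~0.611, not the 33% a monitor shows
```

The same 4-thread workload that a naive monitor reports as 33% busy is, by this accounting, using about 61% of the machine's realistic throughput.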

 

Longer tasks will likely be assigned more cores by TBB if they're partitioned right. But for a series of short threaded tasks, the CPU usage often looks very low. I can only trust that Intel, who designed both the CPU and the task scheduling in their TBB lib, knows what they're doing :) I suspect that it's better to run fewer threads faster, than run into higher thread and resource contention at high thread levels at a lower clockspeed on an SMT CPU.

 



