Multithreaded HOT


  • 1 month later...

Hi Christian,

Sorry I haven't had time to think seriously about this patch; I've been off doing non-Houdini things of late. I had a quick look at the code and I'm not sure it's worth making the build more complicated than it already is. Also, I have some doubts that removing the locking around the FFT startup would work reliably on multi-CPU machines. Last time I checked, it definitely didn't work reliably under multithreaded mantra rendering, crashing every few hundred to a thousand frames. It would be nice if SideFX had some HDK functionality that achieved the multithreaded geometry update that you have put into the SOP with OpenMP. I'm guessing that HDK routines are being made thread-safe these days; are there any docs on this? I guess you see a pretty much linear speedup in the SOP playback speed?

-Drew


Hey Drew,

The FFT part is the critical part, that's right. In this version the HOT patch is only safe for the SOP.

The reason is that I made the multithreading changes for a Maya and mental ray adaptation of the HOT. I should mention that multithreading in the mental ray shader works differently from the VEX version. A thread-safe VEX version needs some more changes.

For example, the mental ray shader has a non-threaded initialization part where the ocean can be prepared. The shading process (and so the ocean evaluation) can then run multithreaded on the same ocean object. This works fine, and the full multithreading of the renderer can be used.

AFAIK, the VEX commands have an initialize function, but it is called for each thread and without the function parameters available. So the ocean cannot be prepared once; instead the setup is run by each thread. Each thread creates its own ocean, and they do so in parallel. That's why the lock around the FFTW is necessary, I agree.
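To illustrate the locking being discussed, here is a minimal sketch, not the HOT code. It assumes a hypothetical `Ocean` type and a hypothetical `make_ocean` function standing in for the setup step (in the HOT this would be the FFTW plan creation, which the FFTW documentation describes as not thread-safe). Every thread builds its own ocean, but a mutex makes sure only one thread is inside the setup at a time. The two counters exist only to check that.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an ocean whose setup is not thread-safe
// (in the HOT, that step is the FFTW plan creation).
struct Ocean {
    std::vector<float> heights;
};

static std::mutex g_setup_mutex;  // serializes the setup step only
static int g_in_setup = 0;        // threads currently inside the setup
static int g_max_in_setup = 0;    // highest value g_in_setup ever reached

// Builds one ocean; only one thread at a time runs the body.
Ocean make_ocean(int resolution) {
    assert(resolution > 0);
    std::lock_guard<std::mutex> lock(g_setup_mutex);
    ++g_in_setup;
    if (g_in_setup > g_max_in_setup) g_max_in_setup = g_in_setup;
    Ocean ocean;
    ocean.heights.assign(static_cast<std::size_t>(resolution) * resolution, 0.0f);
    --g_in_setup;
    return ocean;
}

// Each thread creates its own ocean, as described for the VEX function.
// Returns how many oceans were actually built.
int build_per_thread(int n_threads, int resolution) {
    std::vector<Ocean> oceans(n_threads);
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t)
        threads.emplace_back([&oceans, t, resolution] {
            oceans[t] = make_ocean(resolution);  // each thread writes its own slot
        });
    for (std::thread& th : threads) th.join();
    int built = 0;
    for (const Ocean& o : oceans)
        if (!o.heights.empty()) ++built;
    return built;
}
```

The cost of this scheme is that the setup runs once per thread and the threads queue up behind the lock while it does.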

I will stabilize my patch so that the VEX function can be used.

A good multithreaded approach: the whole FFTW part should NOT be done by each thread that calls the VEX function, but once at the beginning. The evaluation itself can then be threaded on the same ocean object. This would be a good way to multithread it, and fewer ocean initializations are needed.
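The "prepare once, evaluate from many threads" idea could look roughly like the sketch below. It is not the HOT or VEX API: `Ocean`, `shared_ocean` and `evaluate_in_parallel` are hypothetical names, and a sine fill stands in for the FFT result. `std::call_once` guarantees the setup body runs a single time no matter how many threads ask for the ocean; after that the object is only read.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical ocean: heights are filled once and only read afterwards.
struct Ocean {
    std::size_t resolution;
    std::vector<float> heights;
    // Read-only lookup with wrap-around, safe to call from many threads.
    float eval(std::size_t x, std::size_t y) const {
        return heights[(y % resolution) * resolution + (x % resolution)];
    }
};

static std::once_flag g_ocean_once;
static std::unique_ptr<const Ocean> g_ocean;
static std::atomic<int> g_init_count{0};  // counts how often the setup ran

// Returns the shared ocean, building it on the first call only.
const Ocean& shared_ocean(std::size_t resolution) {
    std::call_once(g_ocean_once, [resolution] {
        assert(resolution > 0);
        std::unique_ptr<Ocean> ocean(new Ocean);
        ocean->resolution = resolution;
        ocean->heights.resize(resolution * resolution);
        for (std::size_t i = 0; i < ocean->heights.size(); ++i)
            ocean->heights[i] = std::sin(0.1f * static_cast<float>(i));  // stand-in for the FFT output
        g_ocean = std::move(ocean);
        ++g_init_count;
    });
    return *g_ocean;
}

// All threads evaluate the same ocean object; returns the setup count.
int evaluate_in_parallel(int n_threads, std::size_t resolution) {
    std::vector<float> sums(n_threads, 0.0f);
    std::vector<std::thread> threads;
    for (int t = 0; t < n_threads; ++t)
        threads.emplace_back([&sums, t, resolution] {
            const Ocean& ocean = shared_ocean(resolution);
            for (std::size_t y = 0; y < resolution; ++y)
                for (std::size_t x = 0; x < resolution; ++x)
                    sums[t] += ocean.eval(x, y);  // each thread writes its own slot
        });
    for (std::thread& th : threads) th.join();
    return g_init_count.load();
}
```

Compared with one ocean per thread, the setup cost is paid once and no lock is held during evaluation.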

I will also check if a multithreading optimization can be done there.

Anyway, I threaded the FFT startup for each component. I never had any problems or crashes.

@drew: That's right, the playback speedup on the SOP is much better, especially for big grids.

I agree, the best way would be to use the HDK threading mechanism. I will also take a look at this.

@claude: These are the correct links, thanks.

I hope I remember the facts right... I am currently on holiday and I am not able to check the code and details.

I will take a look at these things when I am back.

Cheers,

christian


  • 2 months later...

Hey,

Unfortunately, I didn't find the time to work on it intensively.

But I changed the threading mechanism of the ocean SOP deformer to use Houdini's internal threading provided by the HDK.

My first impression is that it is not as fast as the OpenMP version, though maybe some more speedup could be reached with some optimizations.

I get the best speedup on large grids, where many points have to be processed and evaluated. On my 8-core machine I get a speedup factor of up to 2.

On very small grids, no big improvement can be seen.
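As a rough sketch of why grid size matters, the loop below threads the per-point work with an OpenMP pragma but falls back to a serial loop for small grids, where starting threads costs more than it saves. This is not the hocean source: `displace`, `deform_points` and the `min_points` threshold of 4096 are made up for illustration, and it uses OpenMP rather than the HDK threading. Without OpenMP enabled at compile time the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-point displacement, standing in for the ocean evaluation.
inline float displace(float x) { return 0.1f * std::sin(x); }

// Writes y[i] = displace(x[i]) for every point. Threads are used only when
// 'threading' is on and the grid has at least 'min_points' points.
void deform_points(const std::vector<float>& x, std::vector<float>& y,
                   bool threading, long min_points = 4096) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    const bool use_threads = threading && n >= min_points;
    // Each iteration touches only its own element, so no locking is needed.
    #pragma omp parallel for if(use_threads)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        y[k] = displace(x[k]);
    }
}
```

The `threading` flag plays the same role as an on/off toggle for comparing threaded and serial results, which should be identical.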

I made a 64-bit build against Houdini 11.0.504, if somebody wants to test it.

The DLL only contains the SOP deformer, called "hocean". I added a "threading" checkbox where you can switch the threading on and off to compare the results.

cheers,

christian

HOcean_x64_H11.0.504.zip

