StaffordN

Distributed Simulation with Deadline


Hey all,

I have a large-scale FLIP sim that I want to split into slices and submit to a render farm using Deadline. Following the documentation from Thinkbox and SideFX, I have set up a sliced sim ready for render. However, when submitting to Deadline, only 4 files are submitted (1 for each slice), and they usually fail to complete, with no file being output.

[Attached images: rop.PNG, deadline.PNG]

The documentation from Thinkbox reads as follows:

"The Houdini submitter allows you to submit a job that will run a distributed simulation.

In order to submit a simulation job you will have to first set up your fluid simulation. Once you have your simulation set up, click the Distribute tool found on the Wire, Cloth, or Particle fluids tab, select the item you wish to distribute and then press enter. This will create 3 new nodes, the specific node that controls the simulation will be sent to Deadline as the new distributedsim node in the /out tree. This node defines how many slices your simulation will have.

Once you have the distributed sim node set up, submit it to Deadline using the regular Deadline in-app submitter. Additional information for setting up distributed submissions and the properties defined in the nodes can be found in the Houdini Documentation."

This implies that no additional setup is needed.

Any help would be greatly appreciated.

 

Thanks,

Stafford


UPDATE: I now have the slices rendering through Deadline. It turns out that the Deadline Monitor shows the slices as frames, but actually renders the whole sequence per slice.

The files weren't showing up because they were being saved to the farm's temp storage: the default file location is $HIP/geo, and the Houdini file itself was being submitted to the farm. To fix this I changed the output location to be local, $JOB/<scenename>/geo.
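To illustrate why the same output path can land in different places, here is a minimal sketch in plain Python (not Houdini's own expansion code). The directory names are hypothetical; the only assumption is that $HIP points at wherever the submitted .hip file ends up on the farm machine, while $JOB keeps pointing at the project location.

```python
from string import Template


def resolve(path_template, variables):
    """Expand $VAR references in a path, leaving unknown variables untouched."""
    return Template(path_template).safe_substitute(variables)


# Hypothetical locations, for illustration only.
workstation = {"HIP": "/projects/flip_shot/hip", "JOB": "/projects/flip_shot"}
farm_machine = {"HIP": "/tmp/submitted_scene", "JOB": "/projects/flip_shot"}

for name, env in (("workstation", workstation), ("farm machine", farm_machine)):
    # $HIP/geo follows the scene file; $JOB/geo stays with the project.
    print(name, resolve("$HIP/geo", env), resolve("$JOB/geo", env))
```

With these made-up values, the $HIP-based path differs between the two machines while the $JOB-based path is the same on both, which matches the behaviour described above.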

I am now having the issue that the slices are rendering their frames independently of one another. Rather than slice.1.001, slice.2.001, slice.1.002, slice.2.002, they are rendering as slice.1.001, slice.1.002, slice.2.001, slice.2.002, slice.1.003, slice.2.003, etc. On top of this, the sim is very chaotic, which suggests that the different slices are not communicating with one another.

[Attached image: chaos.PNG]
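One rough way to spot this from the output files is to compare how far each slice has got. The sketch below assumes file names contain slice.<slice>.<frame>, as written above; the real naming on a given setup may differ, and the helper names are made up for illustration. It reports, for each slice, the frames it is missing relative to the furthest frame any slice has written.

```python
import re
from collections import defaultdict

# Assumed naming: slice.<slice number>.<frame number>, as in the post above.
NAME = re.compile(r"slice\.(\d+)\.(\d+)")


def frames_by_slice(filenames):
    """Group the frame numbers found in the file names by slice number."""
    done = defaultdict(set)
    for name in filenames:
        match = NAME.search(name)
        if match:
            done[int(match.group(1))].add(int(match.group(2)))
    return dict(done)


def out_of_step(filenames):
    """Return {slice: missing frames} measured against the furthest frame written."""
    done = frames_by_slice(filenames)
    if not done:
        return {}
    first = min(min(frames) for frames in done.values())
    last = max(max(frames) for frames in done.values())
    expected = set(range(first, last + 1))
    return {s: sorted(expected - f) for s, f in sorted(done.items()) if expected - f}


# Slice 1 has written two frames while slice 2 has written only one.
print(out_of_step(["slice.1.001", "slice.1.002", "slice.2.001"]))
```

Slices that exchange data every step should stay close together, so one slice running several frames ahead of the others is a hint that they are simulating independently.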

I have checked the FLIP Solver, and "Distributed Pressure Solve" is enabled.

[Attached image: pressure.PNG]

 

Thanks,

Stafford


Do you happen to know whether it is possible or practical to set up Deadline so that it launches one instance of Houdini for each GPU installed in a server and runs the distributed simulation across those cards?

We have a render server with multiple GPUs in the same physical machine, and it would be nice to make use of them in Houdini. It doesn't necessarily need to be a dynamic solution...

Edited by shawn_kearney


UPDATE: I rebuilt the scene and all seems to be well now. The slices are working well together, and I have started increasing the number of slices and the sim resolution with good results. I am going to talk with the farm technicians today or tomorrow to better understand all this, and will update with anything interesting.

I will also post a reply summarising my experience here and what I learnt to help someone that sees this post in the future. :)


Ciao! I am trying to set up a distributed sliced sim on Deadline.

The job is submitted to the farm correctly, but after generating the first frame the sim does not proceed.

No errors are raised. I suspect the slaves are not communicating (check in progress).

 

Checklist

HIPFILE:

Flip tank, slice along line and distribute shelf tools.

 

DISTRIBUTE CONTROLS:

-Tracker Address: IP of the machine I am dispatching the job from

-Tracker Port: 8000 (default)

 

HQUEUE SIM ROP

-hQueue Server: set to our dedicated IP address (where Hqueue server is installed)

-Target HFS: $HFS

 

SLAVES

-firewalls among fx render blades are disabled

-substeps on the flipsolver are kept constant (min/max at 2)

-machines on the farm have similar specs

-checked tips given by Jeff Lait on this thread

https://www.sidefx.com/forum/topic/37431/#post-189862

[Attached image]

 

Any help on how to debug it would be much appreciated. Thanks!

 

I ran this command in a terminal

python /.apl/apps/Houdini/hfs17.5.460/houdini/python2.7libs/simtracker.py 8000 8001

And checked the slave response at the address:

<tracker address>:8001

[Attached image]

Seems like the machines are not connecting.
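A quick way to narrow this down is to test, from each farm machine, whether the tracker ports can be reached at all. The following is a minimal sketch using only Python's standard socket module; the function name is made up for illustration, and the ports are the 8000 and 8001 used in the command above.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Running can_connect("<tracker address>", 8000) and can_connect("<tracker address>", 8001) on a farm machine, with the real tracker address filled in, shows whether the problem is basic network reachability. A False result points at routing, firewall or address issues rather than at the sim setup; a True result only shows that the port is open, not that the slices are exchanging data.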


Try to run a sim with "Disable Plugin Sandboxing" checked under 'Tools'->'Configure Repository Options'->'Slave Settings'


Hi Dennis,

 

What should I expect to get by changing that parameter?

Tools > Configure Repository Options > Worker Settings > Run Plugin Sandbox in Job's Environment (default on)

"If enable the plugin sandbox will run in the same environment as the rendering job. The job's environment variables will be available to Plugin scripts."

 

Update: the same setup works fine on Hqueue.

(hip attached: smr_sc029_0065_fx_simSlicing.v016.hip)

