
Distributed Simulation with Deadline


StaffordN


Hey all,

I have a large-scale FLIP sim that I want to distribute into slices and submit to a render farm using Deadline. Following the documentation from Thinkbox and SideFX, I have set up a sliced sim ready for render. However, when submitting to Deadline only 4 files are submitted (1 for each slice), and they usually fail to complete with no file being output.

[Attached images: rop.PNG, deadline.PNG]

The documentation from Thinkbox reads as follows:

"The Houdini submitter allows you to submit a job that will run a distributed simulation.

In order to submit a simulation job you will have to first set up your fluid simulation. Once you have your simulation set up, click the Distribute tool found on the Wire, Cloth, or Particle fluids tab, select the item you wish to distribute and then press enter. This will create 3 new nodes, the specific node that controls the simulation will be sent to Deadline as the new distributedsim node in the /out tree. This node defines how many slices your simulation will have.

Once you have the distributed sim node set up, submit it to Deadline using the regular Deadline in-app submitter. Additional information for setting up distributed submissions and the properties defined in the nodes can be found in the Houdini Documentation."

This implies that no additional setup is needed.

Any help would be greatly appreciated.

 

Thanks,

Stafford


UPDATE: I now have the slices rendering through Deadline. It turns out the Deadline Monitor shows the slices as frames, but actually renders the whole sequence per slice.

The files weren't showing up because they were being saved to the farm's temp storage: the default file location is $HIP/geo, and the Houdini file was being submitted to the farm with the job. To fix this I changed the output location to be local: $JOB/<scenename>/geo.
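To illustrate why the default path ended up in temp storage, here is a minimal sketch of how a $HIP-relative path resolves differently from a $JOB-relative one once the scene file has been copied to the farm. The directory values are hypothetical, and the expansion helper is a plain-Python stand-in, not Houdini's own variable expansion.

```python
import re

def expand_vars(path, variables):
    """Replace $NAME tokens using the mapping; unknown names are left untouched."""
    def substitute(match):
        return variables.get(match.group(1), match.group(0))
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", substitute, path)

# Hypothetical values. When the .hip file is submitted with the job, it is
# copied to a temporary job folder on the worker, so $HIP points there.
# $JOB is assumed to be set to project storage that outlives the job.
farm_env = {
    "HIP": "/tmp/deadline/jobsData/abc123",
    "JOB": "/mnt/projects/flip_tank",
}

print(expand_vars("$HIP/geo", farm_env))          # lands in the temp job folder
print(expand_vars("$JOB/myscene/geo", farm_env))  # lands in project storage
```

Whether $JOB is defined on the farm workers depends on how the job environment is set up, so it is worth confirming that before relying on it.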

I am now having the issue that the frames are rendering separately for each slice. Rather than slice.1.001, slice.2.001, slice.1.002, slice.2.002, they are rendering independently of one another: slice.1.001, slice.1.002, slice.2.001, slice.2.001, slice.1.003, slice.2.003 etc. On top of this, the sim is very chaotic, which suggests that the different slices are not communicating with one another.

[Attached image: chaos.PNG]
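One rough way to see whether the slices are advancing together is to group the output files by slice and compare the latest frame each slice has written. The sketch below assumes the slice.<slice>.<frame> naming used in this post, followed by any extension; the file names in the example are made up.

```python
import re
from collections import defaultdict

# Assumed naming, taken from the post: slice.<slice>.<frame> plus any extension.
NAME_RE = re.compile(r"^slice\.(\d+)\.(\d+)")

def frames_by_slice(filenames):
    """Group frame numbers by slice number for names matching NAME_RE."""
    found = defaultdict(set)
    for name in filenames:
        match = NAME_RE.match(name)
        if match:
            found[int(match.group(1))].add(int(match.group(2)))
    return {s: sorted(frames) for s, frames in found.items()}

def lagging_slices(filenames):
    """Return {slice: latest_frame} for slices behind the furthest slice."""
    grouped = frames_by_slice(filenames)
    if not grouped:
        return {}
    newest = max(frames[-1] for frames in grouped.values())
    return {s: frames[-1] for s, frames in grouped.items() if frames[-1] < newest}

if __name__ == "__main__":
    # Made-up listing; in practice pass os.listdir() of the geo folder.
    names = ["slice.1.001.bgeo.sc", "slice.1.002.bgeo.sc", "slice.2.001.bgeo.sc"]
    print(lagging_slices(names))
```

A slice being one frame behind at any instant is probably normal, since files are not written at exactly the same moment. A slice that runs many frames ahead of the others is a stronger hint that the slices are not synchronising.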

I have checked the FLIP Solver, and "Distributed Pressure Solve" is enabled.

[Attached image: pressure.PNG]

 

Thanks,

Stafford


Do you happen to know if it is possible/practical to set up Deadline so that it starts an instance of Houdini for each card installed on a server and runs the distributed simulation on each card?

We have a render server that has multiple GPUs in the same physical machine, and it would be nice to utilize it in Houdini. It doesn't necessarily need to be a dynamic solution...

Edited by shawn_kearney

UPDATE: I re-made the scene and all seems to be well now. The slices are working well together, and I have started increasing the number of slices and the sim resolution with good results. I am going to talk with the farm technicians today or tomorrow to try to better understand all this, and will update with anything interesting.

I will also post a reply summarising my experience here and what I learnt, to help anyone who sees this post in the future. :)


  • 1 year later...

Ciao! I am trying to set up a distributed sliced sim on Deadline.

The job is submitted to the farm correctly, but after generating the first frame the sim does not proceed.

No errors are raised. I suspect the slaves are not communicating (I am still checking).

 

Checklist

HIPFILE:

Flip tank, slice along line and distribute shelf tools.

 

DISTRIBUTE CONTROLS:

-Tracker Address: IP of the machine I am dispatching the job from

-Tracker Port: 8000 (default)

 

HQUEUE SIM ROP

-hQueue Server: set to our dedicated IP address (where Hqueue server is installed)

-Target HFS: $HFS

 

SLAVES

-firewalls among fx render blades are disabled

-substeps on the flipsolver are kept constant (min/max at 2)

-machines on the farm have similar specs

-checked tips given by Jeff Lait on this thread

https://www.sidefx.com/forum/topic/37431/#post-189862

[Attached image: image.png]

 

Any help on how to debug it would be much appreciated. Thanks!

 

I ran this command in a terminal:

python /.apl/apps/Houdini/hfs17.5.460/houdini/python2.7libs/simtracker.py 8000 8001

And checked the slave response at the address:

<tracker address>:8001

[Attached image: image.png]

Seems like the machines are not connecting.
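As a quick check that is independent of Houdini and Deadline, the two tracker ports from the command above (8000 and 8001) can be probed from each render blade with a plain TCP connection. This is only a sketch using Python's standard socket module; the tracker address in the usage comment is a placeholder, and a successful connection only shows the port is reachable, not that the slices are registering with the tracker.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

def check_tracker(host, ports=(8000, 8001)):
    """Probe the tracker port and its web port; returns {port: reachable}."""
    return {port: port_is_open(host, port) for port in ports}

# Usage, run on a render blade (replace the placeholder with the tracker address):
# print(check_tracker("<tracker address>"))
```

If a port shows as unreachable from a blade but reachable from the machine running simtracker.py, a firewall or routing problem between the machines is the more likely cause.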


Hi Dennis,

 

What should I expect to get by changing that parameter?

Tools > Configure Repository Options > Worker Settings > Run Plugin Sandbox in Job's Environment (default on)

"If enabled, the plugin sandbox will run in the same environment as the rendering job. The job's environment variables will be available to Plugin scripts."

 

Update: the same setup works fine on Hqueue.

(hip attached)

 

 

 

 

smr_sc029_0065_fx_simSlicing.v016.hip


  • 1 month later...

Solved:

Path to simtracker.py was not set correctly in the Deadline Repo.

 

Deadline Monitor (super user mode)

Tools > Configure Plugins > Houdini

Double check that the path to the Render Executables (Hython) and the Hqueue Simulation Job Option (Sim tracker file) are pointing to the correct folder / file.
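A small sketch for checking those two settings: run it on a farm machine with the same paths that are entered in the plugin configuration, and it lists any that do not exist as files there. The simtracker.py path is the one from the earlier post in this thread; the hython path is an assumption based on the same install location.

```python
import os

def missing_paths(paths):
    """Return the labels whose path does not exist as a file on this machine."""
    return [label for label, path in paths.items() if not os.path.isfile(path)]

if __name__ == "__main__":
    configured = {
        # Assumed location of hython under the same HFS install (not from the thread).
        "hython": "/.apl/apps/Houdini/hfs17.5.460/bin/hython",
        # Path used earlier in this thread to start the tracker by hand.
        "simtracker": "/.apl/apps/Houdini/hfs17.5.460/houdini/python2.7libs/simtracker.py",
    }
    for label in missing_paths(configured):
        print("Not found on this machine:", label, configured[label])
```

The paths have to resolve on the workers, not only on the machine where Deadline Monitor is open, so the check is most useful when run on a render blade.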

 

 


1 hour ago, WLVL said:

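Solved:

Path to simtracker.py was not set correctly in the Deadline Repo.

 

Deadline Monitor (super user mode)

Tools > Configure Plugins > Houdini

Double check that the path to the Render Executables (Hython) and the Hqueue Simulation Job Option (Sim tracker file) are pointing to the correct folder / file.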

 

 

I checked that the path is correct, but it still fails.

[Attached images: IMG_20210327_005957.jpg, IMG_20210327_010829.jpg]
