
Clustered simulation with H13


borisb2


I'm setting up a farm for clustered simulation with simtracker.py and RenderPal as my render manager.

Job distribution works so far: on the simtracker web page I can see the peers arriving (a simple test with a pyro sim, split into 2 slices on 2 machines). The problem is that nothing is calculating. The jobs are just frozen (and waiting?), and simtracker shows "pending" for sync.

When I do the simple test shown in the old tutorials from SESI (xxx), i.e. opening 2 Houdini sessions on the same machine, setting up distributed sims, launching simtracker.py and moving 1 frame forward in both sessions, the first simulated frame works: simtracker shows "done" for all entries and both Houdini sessions return from their frozen state. But then 1 Houdini session usually crashes on the next frame, and simtracker shows "pending" again.

So I guess my problem is not related to the render manager but is a more basic simtracker/syncing issue? What could I be doing wrong? The tracker address is the name of the machine, and $SLICE is set correctly (I can see #PEER 0 and #PEER 1 in simtracker). What else could be messed up?
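
Since the tracker address is given as a machine name, one thing worth ruling out is how that name resolves on each machine. Below is a minimal sketch of such a check, meant to be run as a standalone script outside Houdini. The function name resolve_ipv4 is made up for this example and is not part of simtracker.py.

```python
import socket


def resolve_ipv4(name):
    """Return the sorted IPv4 addresses that `name` resolves to, or [] on failure.

    `name` is whatever was typed in as the tracker address (hypothetical input).
    """
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.error:
        # Covers lookup failures (socket.gaierror) on Python 2.7 and 3.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted(set(info[4][0] for info in infos))
```

Calling resolve_ipv4 with the machine name on every slice machine and comparing the results shows whether they all agree on the tracker's address. An empty list means the name does not resolve on that machine at all.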

 

Thanks for any help,

Boris


As suspected, it seems that something is going wrong in the simtracker. Whenever I move one frame forward on both machines, I get "Socket broken" in the simtracker log:

Y:\_scripts\houdini>python Y:\_scripts\houdini\simtracker.py 8000 9000
Exception in thread Thread-4:
Traceback (most recent call last):
  File "C:\Program Files\Side Effects Software\Houdini 13.0.447\python27\lib\threading.py", line 808, in __bootstrap_inner
    self.run()
  File "Y:\_scripts\houdini\simtracker.py", line 308, in run
    sendmessage(peer, message)
  File "Y:\_scripts\houdini\simtracker.py", line 374, in sendmessage
    rs.connect( (peer['address'], peer['port']) )
  File "C:\Program Files\Side Effects Software\Houdini 13.0.447\python27\lib\socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 10061] No connection could be made because the target machine actively refused it

 

Initial syncing (first frame) always works... hmm, any idea?
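
Errno 10061 in the traceback above means the connect() call to (peer['address'], peer['port']) was refused, i.e. nothing was accepting connections on that address and port at that moment. A minimal sketch for testing this from outside Houdini follows. The function name can_connect and the 2-second timeout are assumptions made for this example and do not come from simtracker.py.

```python
import socket


def can_connect(address, port, timeout=2.0):
    """Return True if a TCP connection to (address, port) is accepted.

    Returns False if the connection is refused, times out, or the
    address cannot be resolved.
    """
    try:
        sock = socket.create_connection((address, port), timeout)
    except socket.error:
        # Refused connections (errno 10061 on Windows), timeouts and
        # lookup failures all end up here on Python 2.7 and 3.
        return False
    sock.close()
    return True
```

For example, can_connect("computerName", 8000) tests the first port passed to simtracker.py in the command above. Testing a peer requires the address and port the peer itself registered with, which this sketch does not look up.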


So I tried it again on a laptop (firewall disabled), with the same result:

- simple pyro scene, set up as distributed with 2 slices

- opening 2 Houdini sessions, setting $SLICE to 0 and 1 respectively

- starting simtracker successfully, opening it in the browser

- as soon as I type the tracker address (computerName) into the distribute control node, Houdini starts calculating (which differs from the tutorial video... is that maybe the problem?)

- when I type the tracker address into the second Houdini instance, both sync with each other successfully, there is no more freeze, and the simtracker browser UI shows the sync as done

- when I move 1 frame forward in both Houdini sessions, 1 Houdini always crashes with a fatal error, the simtracker UI shows "pending", and the simtracker shell shows "Runtime-error: Socket broken"

So what's wrong there? Any ideas, anybody? It would be great to get this going.

