Search the Community
Showing results for tags 'hq'.
-
Hi all,

I have a problem with HQnode. After a restart, HQnode starts properly and talks to the server, but I get this error when a job runs:

hqnode 2016-01-21 13:27:43,717 INFO Starting HQueue client…
hqnode 2016-01-21 13:27:43,717 INFO Running as pid 4510
hqnode 2016-01-21 13:27:43,717 INFO Hostname is HDrf001
hqnode 2016-01-21 13:27:43,717 INFO Listening on port 5001
hqnode 2016-01-21 13:27:44,746 INFO Loaded DiskTool b15ddc31b0953b2b6f368f494205349201c49cba
hqnode 2016-01-21 13:27:44,746 INFO Loaded GpuTool 08675048800785eb531143c626c606c0fe5d88d4
hqnode 2016-01-21 13:27:44,746 INFO Loaded HoudiniTool 1eb2c4733ee11cb420fc724b10b54193cbb93007
hqnode 2016-01-21 13:27:44,746 INFO Loaded InterfacesTool b857afb4e210b1017ebf9d6d4b2eccb156de6d0f
hqnode 2016-01-21 13:28:30,618 INFO Running job 2
hqnode 2016-01-21 13:28:41,398 INFO Finished job 2
hqnode 2016-01-21 13:28:41,398 ERROR Could not connect for 5.08 seconds
None
hqnode 2016-01-21 13:32:01,687 INFO Running job 2
hqnode 2016-01-21 13:32:12,482 INFO Finished job 2
hqnode 2016-01-21 13:32:12,482 ERROR Could not connect for 5.08 seconds
None

Do you have any idea? The server is Windows 7 x64, the client is also Windows 7 x64, and the firewall is turned off on both computers.

Thanks,
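One thing that may help narrow this down: the error only appears right after a job finishes, when the node reports its result back to the server, so a quick two-way TCP check between the two machines can rule out basic connectivity problems. Below is a minimal sketch (not part of HQueue itself); the host name and port are placeholders and should be replaced with the "server" and "port" values from your own hqnode.ini / hqserver.ini.

# Minimal TCP connectivity check from the client machine back to the
# HQueue server. SERVER_HOST and SERVER_PORT are assumptions -- take
# them from hqnode.ini ("server =", "port =") before running.
import socket

SERVER_HOST = "hq-server"   # hypothetical name; use your real server host
SERVER_PORT = 5000          # default hqserver.port

try:
    sock = socket.create_connection((SERVER_HOST, SERVER_PORT), timeout=5)
    print("TCP connection to %s:%d succeeded" % (SERVER_HOST, SERVER_PORT))
    sock.close()
except (socket.error, socket.timeout) as exc:
    print("Could not connect to %s:%d: %s" % (SERVER_HOST, SERVER_PORT, exc))

Running the same check in the other direction (from the server toward the client, using port 5001 that the log shows hqnode listening on) covers the reverse path as well.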
-
Hey guys, I am having a heck of a time getting my farm set up for my school finals. I have three machines I need to run sliced fluid sims on. Right now I am just trying to get the main workstation to complete an HQueue job.

I have the HQueue client and server installed on this machine on the C drive. Both services run fine under another admin account I created called HQueue. I have used the shelf tool for creating a sliced sim (a pyro sim in this case), as recommended in the HQueue documentation. The shared folder with a Houdini install in it is on another disk (F:) in the workstation; that folder is shared as hq to all other machines and is mounted on all of them as H:. I have no problems accessing it from any machine.

The server's .ini file has been set up with the IP of the PC it is running on (the workstation), and these lines have been set:

hqserver.sharedNetwork.path.windows = \\KYLE-PC\hq
hqserver.sharedNetwork.mount.windows = H:

Everything else in there is vanilla. The problem seems to be in writing the slice files to the mounted H: drive, as I get this error when I submit the Houdini file I have attached:

  hqlib.callFunctionWithHQParms(hqlib.simulateSlice)
  File "\\KYLE-PC\hq\houdini_distros\hfs.windows-x86_64\houdini\scripts\hqueue\hqlib.py", line 1864, in callFunctionWithHQParms
    return function(**kwargs)
  File "\\KYLE-PC\hq\houdini_distros\hfs.windows-x86_64\houdini\scripts\hqueue\hqlib.py", line 1532, in simulateSlice
    _renderRop(rop)
  File "\\KYLE-PC\hq\houdini_distros\hfs.windows-x86_64\houdini\scripts\hqueue\hqlib.py", line 1869, in _renderRop
    rop.render(*args, **kwargs)
  File "//KYLE-PC/hq/houdini_distros/hfs.windows-x86_64/houdini/python2.7libs\hou.py", line 32411, in render
    return _hou.RopNode_render(*args, **kwargs)
hou.OperationFailed: The attempted operation failed.
Error: Failed to save output to file "H:/projects/geo/untitled.loadslices.1.bgeo.sc".
Error: Failed to save output to file "H:/projects/geo/untitled.loadslices.2.bgeo.sc".

I am really not sure why this is happening, as I think I have all the relevant permissions. Any suggestions, peeps?

-Kyle

Here is the diagnostics output too:

Diagnostic Information for Job 75:
==================================

Job Name: Simulate -> HIP: untitled.hip ROP: save_slices (Slice 0)
Submitted By: Kyle
Job ID: 75
Parent Job ID(s): 73, 76
Number of Clients Assigned: 1
Job Status: failed
Report Generated On: December 12, 2015 01:52:08 AM

Job Properties:
===============

Description: None
Tries Left: 0
Priority: 5
Minimum Number of Hosts: 1
Maximum Number of Hosts: 1
Tags: single
Queue Time: December 12, 2015 01:15:04 AM
Runnable Time: December 12, 2015 01:46:19 AM
Command Start Time: December 12, 2015 01:50:04 AM
Command End Time:
Start Time: December 12, 2015 01:50:04 AM
End Time: December 12, 2015 01:50:18 AM
Time to Complete: 13s
Time in Queue: 35m 00s

Job Environment Variables:
==========================

HQCOMMANDS: {
    "hythonCommandsLinux": "export HOUDINI_PYTHON_VERSION=2.7 && export HFS=\"$HQROOT/houdini_distros/hfs.$HQCLIENTARCH\" && cd $HFS && source ./houdini_setup && hython -u",
    "pythonCommandsMacosx": "export HFS=\"$HQROOT/houdini_distros/hfs.$HQCLIENTARCH\" && $HFS/Frameworks/Python.framework/Versions/2.7/bin/python",
    "pythonCommandsLinux": "export HFS=\"$HQROOT/houdini_distros/hfs.$HQCLIENTARCH\" && $HFS/python/bin/python2.7",
    "pythonCommandsWindows": "(set HFS=!HQROOT!\\houdini_distros\\hfs.!HQCLIENTARCH!) && \"!HFS!\\python27\\python2.7.exe\"",
    "mantraCommandsLinux": "export HFS=\"$HQROOT/houdini_distros/hfs.$HQCLIENTARCH\" && cd $HFS && source ./houdini_setup && $HFS/python/bin/python2.7 $HFS/houdini/scripts/hqueue/hq_mantra.py",
    "mantraCommandsMacosx": "export HFS=\"$HQROOT/houdini_distros/hfs.$HQCLIENTARCH\" && cd $HFS && source ./houdini_setup && $HFS/Frameworks/Python.framework/Versions/2.7/bin/python $HFS/houdini/scripts/hqueue/hq_mantra.py",
    "hythonCommandsMacosx": "export HOUDINI_PYTHON_VERSION=2.7 && export HFS=\"$HQROOT/houdini_distros/hfs.$HQCLIENTARCH\" && cd $HFS && source ./houdini_setup && hython -u",
    "hythonCommandsWindows": "(set HOUDINI_PYTHON_VERSION=2.7) && (set HFS=!HQROOT!\\houdini_distros\\hfs.!HQCLIENTARCH!) && (set PATH=!HQROOT!\\houdini_distros\\hfs.!HQCLIENTARCH!\\bin;!PATH!) && \"!HFS!\\bin\\hython\" -u",
    "mantraCommandsWindows": "(set HFS=!HQROOT!\\houdini_distros\\hfs.!HQCLIENTARCH!) && \"!HFS!\\python27\\python2.7.exe\" \"!HFS!\\houdini\\scripts\\hqueue\\hq_mantra.py\""
}
HQPARMS: {
    "controls_node": "/obj/pyro_sim/DISTRIBUTE_pyro_CONTROLS",
    "dirs_to_create": [ "$HIP/geo" ],
    "tracker_port": 54534,
    "hip_file": "$HQROOT/projects/untitled.hip",
    "output_driver": "/obj/distribute_pyro/save_slices",
    "enable_perf_mon": 0,
    "slice_divs": [ 1, 1, 1 ],
    "tracker_host": "KYLE-PC",
    "slice_num": 0,
    "slice_type": "volume"
}
HQHOSTS: KYLE-PC

Job Conditions and Requirements:
================================

hostname any KYLE-PC

Executed Client Job Commands:
=============================

Windows Command:
(set HOUDINI_PYTHON_VERSION=2.7) && (set HFS=!HQROOT!\houdini_distros\hfs.!HQCLIENTARCH!) && (set PATH=!HQROOT!\houdini_distros\hfs.!HQCLIENTARCH!\bin;!PATH!) && "!HFS!\bin\hython" -u "!HFS!\houdini\scripts\hqueue\hq_sim_slice.py"

Client Machine Specification (KYLE-PC):
=======================================

DNS Name: KYLE-PC
Client ID: 1
Operating System: windows
Architecture: x86_64
Number of CPUs: 24
CPU Speed: 4000.0
Memory: 25156780

Client Machine Configuration File Contents (KYLE-PC):
=====================================================

[main]
server = KYLE-PC
port = 5000
sharedNetwork.mount = \\KYLE-PC\hq

[job_environment]

HQueue Server Configuration File Contents:
==========================================

#
# hqserver - Pylons configuration
#
# The %(here)s variable will be replaced with the parent directory of this file
#
[DEFAULT]
email_to = you@yourdomain.com
smtp_server = localhost
error_email_from = paste@localhost

[server:main]
use = egg:Paste#http
host = 0.0.0.0
port = 5000

[app:main]
# The shared network.
hqserver.sharedNetwork.host = KYLE-PC
hqserver.sharedNetwork.path.linux = %(here)s/shared
hqserver.sharedNetwork.path.windows = \\KYLE-PC\hq
hqserver.sharedNetwork.path.macosx = %(here)s/HQShared
hqserver.sharedNetwork.mount.linux = /mnt/hq
hqserver.sharedNetwork.mount.windows = H:
hqserver.sharedNetwork.mount.macosx = /Volumes/HQShared

# Server port number.
hqserver.port = 5000

# Where to save job output
job_logs_dir = %(here)s/job_logs

# Specify the database for SQLAlchemy to use
sqlalchemy.default.url = sqlite:///%(here)s/db/hqserver.db

# This is required if using mysql
sqlalchemy.default.pool_recycle = 3600

# This will force a thread to reuse connections.
sqlalchemy.default.strategy = threadlocal

#########################################################################
# Uncomment these configuration values if you are using a MySQL database.
#########################################################################

# The maximum number of database connections available in the
# connection pool. If you see "QueuePool limit of size" messages
# in the errors.log, then you should increase the value of pool_size.
# This is typically done for farms with a large number of client machines.
#sqlalchemy.default.pool_size = 30
#sqlalchemy.default.max_overflow = 20

# Where to publish myself in avahi
# hqnode will use this to connect
publish_url = http://hostname.domain.com:5000

# How many minutes before a client is considered inactive
hqserver.activeTimeout = 3

# How many days before jobs are deleted
hqserver.expireJobsDays = 10

# The maximum number of jobs (under the same root parent job) that can fail on
# a single client before a condition is dynamically added to that root parent
# job (and recursively all its children) that excludes the client from ever
# running this job/these jobs again. This value should be a postive integer
# greater than zero. To disable this feature, set this value to zero.
hqserver.maxFailsAllowed = 5

# The priority that the 'upgrade' job gets.
hqserver.upgradePriority = 100

use = egg:hqserver
full_stack = True
cache_dir = %(here)s/data
beaker.session.key = hqserver
beaker.session.secret = somesecret
app_instance_uuid = {fa64a6d1-ae3f-43c1-8141-9c29fdd9d418}

# Logging Setup
[loggers]
keys = root

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
# Change to "level = DEBUG" to see debug messages in the log.
level = INFO
handlers = console

# This handler backs up the log when it reaches 10Mb
# and keeps at most 5 backup copies.
[handler_console]
class = handlers.RotatingFileHandler
args = ("hqserver.log", "a", 10485760, 5)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(asctime)s %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %B %d, %Y %H:%M:%S

Job Status Log:
===============

December 12, 2015 01:15:04 AM: Assigned to KYLE-PC (master)
December 12, 2015 01:15:10 AM: setting status to running
December 12, 2015 01:15:23 AM: setting status to failed
December 12, 2015 01:18:28 AM: Rescheduling...
December 12, 2015 01:18:28 AM: setting status to runnable
December 12, 2015 01:18:28 AM: Assigned to KYLE-PC (master)
December 12, 2015 01:18:35 AM: setting status to running
December 12, 2015 01:18:47 AM: setting status to failed
December 12, 2015 01:23:18 AM: setting status to runnable
December 12, 2015 01:23:19 AM: Assigned to KYLE-PC (master)
December 12, 2015 01:23:20 AM: setting status to running
December 12, 2015 01:23:33 AM: setting status to failed
December 12, 2015 01:29:44 AM: setting status to runnable
December 12, 2015 01:29:44 AM: Assigned to KYLE-PC (master)
December 12, 2015 01:29:44 AM: setting status to running
December 12, 2015 01:29:57 AM: setting status to failed
December 12, 2015 01:34:17 AM: setting status to runnable
December 12, 2015 01:34:17 AM: Assigned to KYLE-PC (master)
December 12, 2015 01:38:17 AM: setting status to abandoned
December 12, 2015 01:46:19 AM: setting status to runnable
December 12, 2015 01:50:04 AM: Assigned to KYLE-PC (master)
December 12, 2015 01:50:04 AM: setting status to running
December 12, 2015 01:50:18 AM: setting status to failed

UPDATE: I just did a system restart to see if it would help, and instead of the regular write error I received this:

0x00000000577CDE78 (0x000000000000002B 0x000000AD63AEF840 0x000000AD453FEEB0 0x0000000000000000), ?thread_sleep_v3@internal@tbb@@YAXAEBVinterval_t@tick_count@2@@Z() + 0x8C8 bytes(s)
0x00000000577CDD2B (0x000000AD45381F90 0x000000AD45381F90 0x000000AD453FEEB0 0x0000000000000000), ?thread_sleep_v3@internal@tbb@@YAXAEBVinterval_t@tick_count@2@@Z() + 0x77B bytes(s)
0x00007FFF29E43FEF (0x00007FFF29EE1DB0 0x0000000000000000 0x0000000000000000 0x0000000000000000), _beginthreadex() + 0x107 bytes(s)
0x00007FFF29E44196 (0x00007FFF29E44094 0x000000AD453FEEB0 0x0000000000000000 0x0000000000000000), _endthreadex() + 0x192 bytes(s)
0x00007FFF36582D92 (0x00007FFF36582D70 0x0000000000000000 0x0000000000000000 0x0000000000000000), BaseThreadInitThunk() + 0x22 bytes(s)
0x00007FFF36C29F64 (0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000), RtlUserThreadStart() + 0x34 bytes(s)

After resubmission, it went back to the usual error mentioned above.

untitled.hip
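Since the failure is specifically "Failed to save output to file H:/projects/geo/...", one way to narrow it down is to test whether H: (and the underlying UNC share) is actually writable in the same context the HQueue client service runs in; mapped drive letters on Windows are per-logon-session, so a drive that is visible in an interactive session is not necessarily visible to a service account. The sketch below only mirrors the failing path from the log above; the exact paths are assumptions and should be adjusted.

# Quick writability check for the shared location, intended to be run
# under the same account the HQueue client service uses (a sketch, not
# an HQueue tool). Paths mirror the job log and are otherwise assumptions.
import os

mount_dir = r"H:\projects\geo"             # where the slices fail to write
unc_dir   = r"\\KYLE-PC\hq\projects\geo"   # same location via the UNC path

for path in (mount_dir, unc_dir):
    try:
        if not os.path.isdir(path):
            os.makedirs(path)              # the job also has to create $HIP/geo
        test_file = os.path.join(path, "hq_write_test.tmp")
        with open(test_file, "w") as f:
            f.write("ok")
        os.remove(test_file)
        print("writable: %s" % path)
    except (OSError, IOError) as exc:
        print("NOT writable: %s (%s)" % (path, exc))

If only the UNC path turns out to be writable from the service context, that would also line up with the client configuration shown in the diagnostics, which uses sharedNetwork.mount = \\KYLE-PC\hq rather than the H: letter that the server config and the failing output path expect.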