I'm not entirely sure why, but render farm managers are a constant source of misunderstanding among artists. Probably because artists often don't directly set up or operate the render farms they use. They are end users. As a consequence, when you ask about some CG technique or application you usually get a bunch of reasonable answers on the internet. If you ask about render managers you mostly get rumors...
Let's note for starters that render managers are, practically speaking, a subset of a wider topic called task scheduling, which has itself been an area of active development in the high-performance computing and scientific community for decades. Task schedulers (a.k.a. workload managers or cluster middleware) are pretty advanced systems capable of handling hundreds of thousands of tasks per second with full statistics, accounting, priorities, very deep hardware integration (like choosing which cores or even RAM slots are to be used by my process), running tasks in complete separation from each other with a private virtual file system etc. (or, on the contrary, in parallel sharing RAM, or even a GPU).
Technically speaking, most render managers are very basic task schedulers with lots of additional features, hardly any of which focuses on core functionality. You can rather safely expect that the best and most expensive render managers get somewhat closer to an average task scheduler in terms of core technology (dispatching performance, reliability, prioritizing policies), while most if not all task schedulers lack the general yet non-essential usability provided by render managers. Aside from these considerations, the main problem of pretty much all task schedulers is a strong Linux bias. Some do provide a Windows / OSX toolset for interacting with clusters, but none of them is truly multi-platform. This totally makes sense considering how deeply they integrate with the host and that most of their users run clusters with thousands of CPUs on some sort of Linux.
The above comments are here to provide perspective on render managers, so one could come up with some metrics for that sort of software. I won't do that, but someone could. I personally would avoid putting DrQueue and Qube! in the same list. It's a bit confusing, as the former hasn't even been developed for the last 10 years, and even in its good years it was buggy and very basic. Qube! is pretty much first class in its bucket.
My list would start with the most popular commercial products like:
followed by the more exotic ones:
Rush (?) is it alive?
(both can be used with third party applications so we can consider them as general purpose).
then open source:
finally, I would put task schedulers into a different category:
SGE (a.k.a. Grid Scheduler, Son of Grid Engine)
OpenLava (LSF-compatible)
(interestingly enough all three above are free and open source).
The first and perhaps most important distinction, aside from the OS you're going to run the manager on and the budget you have to spend on it, is how the artists in your company are organized. The thing is, running a series of jobs one after another submitted by a single artist or a couple of generalists, and running multiple jobs from different departments which have to share render farm resources, are completely different usability scenarios. Funnily enough, most artists I talk to don't even realize that in a multi-department scenario a FIFO policy (first in, first out) is perhaps the worst one. Why is that? Perhaps because this is the only policy provided by the render managers they have ever used. Without a fair-share policy (or similar), you have to either divide the farm into parts (wasteful) or tolerate preemption / re-scheduling of lower-priority jobs (even more wasteful).
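To make the FIFO versus fair-share difference concrete, here is a minimal sketch in Python. It is a toy model, not how any particular manager implements it: the department names and task counts are made up, every task is assumed to cost the same, and real fair-share policies usually also weight shares and decay past usage over time.

```python
def fifo_order(jobs):
    """Dispatch tasks strictly in job submission order."""
    order = []
    for owner, n_tasks in jobs:
        order.extend([owner] * n_tasks)
    return order


def fair_share_order(jobs):
    """Always dispatch a task of the owner who has used the farm least so far."""
    pending = {}
    for owner, n_tasks in jobs:
        pending[owner] = pending.get(owner, 0) + n_tasks
    used = {owner: 0 for owner in pending}
    order = []
    while any(pending.values()):
        waiting = [o for o in pending if pending[o] > 0]
        # Ties go to whoever submitted first (dict keeps insertion order).
        owner = min(waiting, key=lambda o: used[o])
        pending[owner] -= 1
        used[owner] += 1
        order.append(owner)
    return order


# Hypothetical submissions: (department, number of tasks), in submission order.
jobs = [("fx", 6), ("lighting", 2), ("comp", 2)]
print("FIFO:      ", fifo_order(jobs))
print("fair-share:", fair_share_order(jobs))
```

With FIFO the two small jobs wait behind the whole fx job; with fair-share they are interleaved with it, and nothing had to be preempted or partitioned to get there.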
Also, at the risk of being trivial, I would value core functionality over extra features, making sure the manager operates flawlessly in stress scenarios, when loads of sequences scheduled simultaneously by many users leave the manager with 100% farm usage for an entire night or weekend. Now, while this shouldn't be a problem, the manager should also handle cases when this constant stream of work is disrupted by changing priorities per job, re-queuing failed jobs, hosts failing and restarting, out-of-RAM errors, and similar. From my experience this apparently basic test isn't easy to pass for most popular applications, for example because they lose contact with the mothership when a scheduled job goes crazy.
Some managers do decent work on 20 machines and fail miserably on 30+, especially if they are used by many users at the same time for different types of tasks. This alone is a source of confusion, because a manager which performed great on an indie project with 15 PCs rendering Maya batches of 10 frames each may die if you flood it with single-frame jobs from 2 apps and users with different requirements and dependencies on a slightly bigger cluster. You will see the interesting effect of death from exponential growth...
An interesting example was Deadline prior to version 6, which by its design of operating without a database couldn't handle anything beyond x* (* - some rather low number determined by your network/storage conditions). Also its multi-platform support based on Mono (.NET) was an almost certain source of endless pain on every OS but Windows. Deadline has been rewritten from scratch since then, so I don't know its current state, but the lesson was learned. Know your environment and don't choose software based on general usability but rather on precision of applicability.
I would definitely discourage anyone from making decisions based on the number of features like automatic mp4 creation, partial frame view, or the number of supported client applications. Anyone, I mean anyone with minimal scripting knowledge, can these days add support for a new application to any render manager or task scheduler in hours.
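As a rough illustration of how little "application support" usually amounts to, the sketch below splits a frame range into chunks and builds one command line per chunk. The renderer name `myrender` and its `--start` / `--end` flags are hypothetical placeholders, not a real CLI; a real integration would swap in the actual batch renderer's arguments and hand the resulting commands to whatever manager or scheduler you use.

```python
import shlex


def chunk_frames(first, last, chunk_size):
    """Split an inclusive frame range into (start, end) chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [(start, min(start + chunk_size - 1, last))
            for start in range(first, last + 1, chunk_size)]


def build_tasks(scene, first, last, chunk_size,
                template="myrender --start {start} --end {end} {scene}"):
    """Return one shell command per chunk ('myrender' is a made-up renderer)."""
    return [template.format(start=start, end=end, scene=shlex.quote(scene))
            for start, end in chunk_frames(first, last, chunk_size)]


for command in build_tasks("shot 010.scene", 1, 25, 10):
    print(command)
```

Everything application-specific lives in the template string, which is why adding another application is a matter of hours rather than a reason to pick one manager over another.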
Some interesting (imo) questions about the render manager you choose:
- Can it run jobs as the submitting user?
- Can it share a single machine between many tasks with different specifications?
- Does it support fair-share or similar policy?
- Tickets? Deadline policy? (that is dynamic change of priorities based on conditions)
- Does it allow for arbitrary resource definition per host and per queue (which can be used for license or resource management with a great deal of flexibility)?
- Does it support pre and post run scripts?
- Does it support custom health checks?
- How does it handle dependencies? Array dependencies? Multi-stage (graph-like) dependencies?
- How does it handle a dynamic environment and per-job configuration?
- Does it keep persistent accounting/statistics?
- Does it pass environment variables? Can it overwrite them? How is it managed?
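The questions above about sharing one machine between tasks and about arbitrary per-host resources boil down to something like the following sketch. It is a simplified first-fit model under my own assumptions: the host names, resource names (`cores`, `ram_gb`, `gpu`, `nuke_license`) and task requests are invented for illustration, and real schedulers add queues, priorities and release of resources when tasks finish.

```python
def fits(free, request):
    """True if every requested resource is available in sufficient amount."""
    return all(free.get(name, 0) >= amount for name, amount in request.items())


def place(hosts, tasks):
    """First-fit placement: several tasks may end up sharing one host."""
    free = {host: dict(resources) for host, resources in hosts.items()}
    placement = {}
    for task_name, request in tasks:
        placement[task_name] = None  # stays None if nothing fits right now
        for host, resources in free.items():
            if fits(resources, request):
                for name, amount in request.items():
                    resources[name] -= amount  # consume the resources
                placement[task_name] = host
                break
    return placement


# Hypothetical farm: resources are arbitrary name -> amount pairs per host.
hosts = {
    "node01": {"cores": 16, "ram_gb": 64, "gpu": 1},
    "node02": {"cores": 8, "ram_gb": 32, "nuke_license": 1},
}
tasks = [
    ("sim", {"cores": 12, "ram_gb": 48}),
    ("comp", {"cores": 4, "ram_gb": 8, "nuke_license": 1}),
    ("gpu_render", {"cores": 2, "gpu": 1}),
    ("big", {"cores": 16}),
]
print(place(hosts, tasks))
```

Because a resource is just a named counter, the same mechanism covers cores, RAM, GPUs and licenses without the scheduler knowing anything about the applications involved.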
Less important, but still:
- Python API? RESTful?
- Command line toolset (stats, manager status, user control, not only submits)?
- Web UI?
- Failover managers?
- User-defined conditions to kill a job?
- Separate manager and accountant?
In the commercial category my likes would probably go to Qube!, which is unfortunately also very expensive (for a reason?), but as I said I don't have any experience with the new Deadline and lots of bad experience with the old one. I had brief contact with Smedge and Muster, and I don't expect them to be anything better than average, but I can't argue either way. I don't know Butterfly or Royal Render. I assume they are targeted towards small studios and freelancers. As they are products of small teams and driven by Windows users, I don't have much faith in them, but it's obviously not a solid opinion. Tractor is (again obviously) a very good piece of software, in the spirit of big render farms and user friendliness... for people who can afford it. I know mid-sized companies running Backburner, which basically works in its minimalist fashion. I don't have good experience with HQueue though.
Among open source projects I'd definitely recommend Afanasy, which is very solid and constantly developed. Lots of core usability, and it gets better and better. I would simply kill the guy who designed its GUI. But that's minor. Lose one life, save hundreds... Arsenal used to be on my radar too, but it's not actively developed and doesn't seem to offer anything beyond Afanasy. Dead horse.
I won't hide my personal bias here though. If you feel adventurous and have the comfort of working in a Linux environment, I highly recommend trying Slurm, SGE, or OpenLava. SGE has been used for years in the movie industry (even by the facility which created a commercialized render manager from our list... (sic!)), and has a great record of managing both CPU and GPU clusters. Slurm is especially interesting, because it's apparently the most popular open source task scheduler on supercomputers these days. It's actively developed and has a major community. Contrary to popular opinion, building a cluster based on SGE, OpenLava, or Slurm isn't hard at all. An average Linux user can make it happen in two hours. What makes it a little tedious is customizing queues (partitions in Slurm). Basically you need to specify a number of conditions to make your jobs run manageably.
Once you get into it (schedulers / cluster software), you can forget about a fancy GUI (any GUI for that matter) and functionality directly related to VFX/CGI, but you gain quite a lot in exchange. Both the GUI and the client submission front-end can easily be made by your own staff, so the trade-off is acceptable assuming you are in desperate need of render farm performance and reliability, and you run an elaborate pipeline which requires handcrafting anyway. Basically, if you think big and work on Linux, it is hard to find a better option.
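For a feel of what a home-made submission front-end for Slurm might look like, here is a small sketch that only builds an `sbatch` argument list for a job array with one array task per frame. It does not submit anything. The partition name `render`, the renderer `myrender` and its `--frame` flag are assumptions made for the example; the `sbatch` options used (`--job-name`, `--partition`, `--array`, `--cpus-per-task`, `--mem`, `--dependency=afterok:...`, `--wrap`) and the `SLURM_ARRAY_TASK_ID` variable are standard Slurm.

```python
def sbatch_args(name, command, first, last, partition,
                cpus=1, mem_gb=None, after_ok=None):
    """Build an sbatch command line for a job array, one array task per frame."""
    args = [
        "sbatch",
        f"--job-name={name}",
        f"--partition={partition}",
        f"--array={first}-{last}",
        f"--cpus-per-task={cpus}",
    ]
    if mem_gb is not None:
        args.append(f"--mem={mem_gb}G")
    if after_ok:
        # Start only after the listed job IDs have finished successfully.
        job_ids = ":".join(str(job_id) for job_id in after_ok)
        args.append(f"--dependency=afterok:{job_ids}")
    # --wrap puts the command into a small shell script, so the array index
    # variable is expanded on the compute node, not on the submitting machine.
    args.append(f"--wrap={command}")
    return args


# 'myrender' and the 'render' partition are hypothetical placeholders.
render = sbatch_args(
    name="shot010_beauty",
    command="myrender --frame $SLURM_ARRAY_TASK_ID shot010.scene",
    first=1, last=100, partition="render", cpus=8, mem_gb=32,
    after_ok=[4242],
)
print(" ".join(render))
```

Passing the list to `subprocess.run` on a machine with Slurm installed would submit the array; the dependency option is also the simplest way to chain stages such as simulation, render and comp.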