animatrix Posted August 22, 2016
Volume convolution on the GPU using OpenCL. For 27M voxels and 100 iterations, OpenCL is 650 times faster than C++ and 12525 times faster than VEX.
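For readers unfamiliar with the technique being benchmarked, a kernel along these lines illustrates one iteration of a simple 6-point box blur over a dense voxel grid. This is a generic sketch, not the author's actual code: the buffer layout, the `resx`/`resy`/`resz` arguments, and the boundary clamping are all assumptions about how the host code binds the volume.

```c
// OpenCL sketch (hypothetical): one iteration of a 6-point box blur
// over a dense float volume stored as a flat buffer.
// resx/resy/resz and the src/dst buffers are assumed to be bound by the host.
kernel void blur(global const float *src,
                 global float *dst,
                 int resx, int resy, int resz)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    int z = get_global_id(2);
    if (x >= resx || y >= resy || z >= resz)
        return;

    // Flatten a 3D index into the linear buffer, clamping at the boundary.
    #define IDX(i, j, k) ((clamp((k), 0, resz - 1) * resy + \
                           clamp((j), 0, resy - 1)) * resx + \
                           clamp((i), 0, resx - 1))

    float sum = src[IDX(x, y, z)]
              + src[IDX(x - 1, y, z)] + src[IDX(x + 1, y, z)]
              + src[IDX(x, y - 1, z)] + src[IDX(x, y + 1, z)]
              + src[IDX(x, y, z - 1)] + src[IDX(x, y, z + 1)];

    dst[IDX(x, y, z)] = sum / 7.0f;
    #undef IDX
}
```

Running many iterations would mean enqueueing this kernel repeatedly while ping-ponging the `src` and `dst` buffers, which keeps all the data on the GPU between passes.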
johner Posted August 22, 2016
Hi Yunus, just a minor point: for your Volume Wrangle approach you can do

sum += volumeindex(0, "density", set(@ix-1, @iy, @iz));

which skips the position calculation and linear interpolation, and is more similar to the OpenCL code (though still way slower!).
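Expanded into a full wrangle, johner's suggestion looks something like the sketch below: a 6-point blur in a Volume Wrangle, assuming the volume is a float volume named "density" on input 0 (the name and the 7-point averaging weights are illustrative, not from the thread).

```c
// VEX Volume Wrangle sketch: 6-point box blur via volumeindex(),
// which reads voxels by integer index (@ix, @iy, @iz) and avoids
// the position lookup and trilinear interpolation of volumesample().
float sum = @density;
sum += volumeindex(0, "density", set(@ix - 1, @iy, @iz));
sum += volumeindex(0, "density", set(@ix + 1, @iy, @iz));
sum += volumeindex(0, "density", set(@ix, @iy - 1, @iz));
sum += volumeindex(0, "density", set(@ix, @iy + 1, @iz));
sum += volumeindex(0, "density", set(@ix, @iy, @iz - 1));
sum += volumeindex(0, "density", set(@ix, @iy, @iz + 1));
@density = sum / 7.0;
```

Because volumeindex() reads from input 0 while @density writes to the output volume, each voxel sees only unmodified neighbour values within a single pass; iterating the blur means chaining or looping the wrangle.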
animatrix Posted August 22, 2016
Thanks John, good idea. I will try to update the code and video. Hopefully this will improve the VEX benchmark a bit.
symek Posted August 22, 2016
Also, "faster than C++" is a bit misleading. You don't actually benchmark C++ code against OpenCL code, but the Volume Blur SOP against your OpenCL implementation of convolution (which is impressive, btw, but still not a proper C++ vs. OpenCL comparison).
animatrix Posted August 22, 2016
I disagree. The Volume Blur SOP is written in C++, so it is nonetheless comparing a C++ implementation of a volume convolution (by SESI) to a possible OpenCL version. From SESI regarding the Volume Convolve SOP: "There pretty much is zero cost for the voxels with a 0 multiplier in them. Because volume convolve 3x3x3 has a known stencil we can stream perfectly, avoiding any random access. VEX will always be slower than it."
Edited August 22, 2016 by pusat
symek Posted August 22, 2016
The title suggests you're comparing C++ to OpenCL to VEX, but in fact you're comparing a SESI C++ node (not code) to your code. It's not apples to apples.
animatrix Posted August 22, 2016
Not really, it's just semantics. You could also wrap the AttribWrangle node in an HDA, black-box it, and compare it just the same. By your terms that too would not be comparing VEX code, but it is.
symek Posted August 22, 2016
Quote: "Not really (...)"
Well, really. Sorry, it's not very important after all, but this is the ABC of benchmarking and comparison studies. It is simply not technically possible for the same algorithm expressed in OpenCL to be 650 times faster than its implementation in C++ on the same hardware. If the algorithm differs, or the hardware differs, or one is multi-threaded and the other is not, you are not entitled to say "OpenCL is X times faster than C++", because it isn't: something else also matters, like the hardware or implementation details. Plain and simple. You would rather say "my OpenCL code is 650 times faster than Houdini's own VolumeBlurSOP". Period. That is still a great result, but it refers to VolumeBlur, not C++.
EDIT: Are you running OpenCL on CPU or GPU?
EDIT2: Oh sorry, I see now, it's a GPU...
animatrix Posted August 22, 2016
Sorry, but I don't have time to argue back and forth with you on this. If my tests do not meet your standards of benchmarking, then simply disregard them and move on. I will continue using them in production and get shots done on time, and not worry about whether I am comparing code or a node, etc.
symek Posted August 22, 2016
Sorry if you feel offended, that wasn't my intention. My remark clearly referred to the expression you used ("it's a bit misleading"), not the essence of your tool or its usefulness. It might be useful in production and I've never doubted that. Peace!
Guest tar Posted August 22, 2016
Very nice! Would be cool to see the OpenCL version run on the CPU too.
animatrix Posted August 22, 2016
Quote (marty): "Very nice! Would be cool to see the OpenCL version run on the CPU too."
I want to give that a try sometime. I hope H16 ships with Intel drivers, with the ability to run OpenCL code on the CPU on a per-node basis.
Guest tar Posted August 22, 2016
Would be super cool if you could be on the H16 beta! Hopefully @johner is showing your tests to the dev team there, saying "we should hire Yunus"!
Edited August 22, 2016 by tar
animatrix Posted August 22, 2016
I would love to help SESI if I can.
kiko Posted October 12, 2016
I agree with symek that it is not a valid comparison, because the algorithm implementations are different.
prashantcgi Posted March 30, 2017
Is there an example file available? Thanks.