teresuac
Posted January 15, 2020 (edited)

Hello,

I'm new to the HDK and C++. For now this is purely a learning exercise. I've successfully made a first node that advects data inside a volume using a velocity volume. I then made a version that uses THREADED_METHOD2 to multithread the calculation over the voxels, but inside Houdini that node is slower than the first version. (I've checked: it uses all my processors, while the first version uses only one.) I'd expect threading to gain some performance on a volume, so it seems I'm not doing it the right way.

Here are some samples of the code.

Single-threaded version:

    for (int idx_z = 0; idx_z < voxel_size[2]; ++idx_z)
    {
        for (int idx_y = 0; idx_y < voxel_size[1]; ++idx_y)
        {
            for (int idx_x = 0; idx_x < voxel_size[0]; ++idx_x)
            {
                UT_Vector3 pos;
                first_volume->indexToPos(idx_x, idx_y, idx_z, pos); // first_volume = GEO_PrimVolume*
                for (int i = 0; i < substeps; i++)
                {
                    float velxv = vvelx->getValue(pos); // vvelx = GEO_PrimVolume*
                    float velyv = vvely->getValue(pos);
                    float velzv = vvelz->getValue(pos);
                    UT_Vector3 advect(velxv, velyv, velzv);
                    pos += advect * amplitude / substeps;
                }
                float val = first_volume->getValue(pos);
                volume_handleW->setValue(idx_x, idx_y, idx_z, val); // volume_handleW = UT_VoxelArrayWriteHandleF
            }
        }
    }

Multithreaded version:

    class VOLADVECT
    {
    public:
        THREADED_METHOD2(      // Construct two parameter threaded method
            VOLADVECT,         // Name of class
            true,              // Evaluated to see if we should multithread.
            advect,            // Name of function
            int, substep,      // An integer parameter named substep
            float, amp)        // A float parameter named amp
        void advectPartial(int substep, float amp, const UT_JobInfo &info);

        int sizex, sizey, sizez;
        UT_VoxelArrayWriteHandleF volume_handleW;
        GEO_PrimVolume* volume_data;
        GEO_PrimVolume* volume_velx;
        GEO_PrimVolume* volume_vely;
        GEO_PrimVolume* volume_velz;
        int myLength;
    };

    void VOLADVECT::advectPartial(int substep, float amp, const UT_JobInfo &info)
    {
        int i, n;
        UT_Vector3 voxel_size(volume_handleW->getRes(0), volume_handleW->getRes(1), volume_handleW->getRes(2));
        sizex = (int)(voxel_size[0]);
        sizey = (int)(voxel_size[1]);
        sizez = (int)(voxel_size[2]);
        myLength = sizex * sizey * sizez;

        for (info.divideWork(myLength, i, n); i < n; i++)
        {
            int idx_x = i % sizex;
            int idx_y = (int)(i / sizex) % sizey;
            int idx_z = (int)(i / (sizex * sizey));
            UT_Vector3 pos;
            volume_data->indexToPos(idx_x, idx_y, idx_z, pos);
            for (int k = 0; k < substep; k++)
            {
                float velxv = volume_velx->getValue(pos);
                float velyv = volume_vely->getValue(pos);
                float velzv = volume_velz->getValue(pos);
                UT_Vector3 advect(velxv, velyv, velzv);
                pos += advect * amp / substep;
            }
            float val = volume_data->getValue(pos);
            volume_handleW->setValue(idx_x, idx_y, idx_z, val);
        }
    }

They both work, but since I don't get better performance from the multithreaded version, I guess I'm not doing it the right way. I know a VEX or OpenCL SOP would be easier and at least as fast, or faster, in this example. If someone knows a better way or has any advice, I'd appreciate it.

Thanks!

Edited January 15, 2020 by teresuac
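
For reference, the flat-index bookkeeping in advectPartial can be checked outside Houdini. The following is a minimal standalone sketch, not HDK code: it assumes plain std::thread in place of THREADED_METHOD2, and the helper names flatToXYZ, chunkRange and fillParallel are hypothetical. It splits the flat voxel range into one contiguous chunk per job, similar in spirit to how info.divideWork is used above, and decodes each flat index with x varying fastest. It only illustrates the indexing and the partitioning, so it does not by itself explain the slowdown.

```cpp
#include <array>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: decode a flat index into (x, y, z), x varying fastest.
// Same arithmetic as idx_x / idx_y / idx_z in advectPartial.
std::array<int, 3> flatToXYZ(int i, int sizex, int sizey)
{
    return { i % sizex, (i / sizex) % sizey, i / (sizex * sizey) };
}

// Hypothetical helper: half-open range [begin, end) owned by `job`
// when `total` items are split across `njobs` jobs.
std::array<int, 2> chunkRange(int total, int job, int njobs)
{
    assert(njobs > 0 && job >= 0 && job < njobs);
    const long long t = total; // widen to avoid overflow in t * job
    return { static_cast<int>(t * job / njobs),
             static_cast<int>(t * (job + 1) / njobs) };
}

// Fill a sizex * sizey * sizez grid in parallel.
// Each voxel stores x + 10*y + 100*z so the decoding is easy to check.
std::vector<int> fillParallel(int sizex, int sizey, int sizez, int njobs)
{
    const int total = sizex * sizey * sizez;
    std::vector<int> grid(total, -1);
    std::vector<std::thread> workers;
    for (int job = 0; job < njobs; ++job)
    {
        // Sizes are captured by value, so no thread writes shared state
        // other than its own disjoint slice of `grid`.
        workers.emplace_back([&grid, sizex, sizey, total, job, njobs] {
            const auto r = chunkRange(total, job, njobs);
            for (int i = r[0]; i < r[1]; ++i)
            {
                const auto p = flatToXYZ(i, sizex, sizey);
                grid[i] = p[0] + 10 * p[1] + 100 * p[2];
            }
        });
    }
    for (auto &w : workers)
        w.join();
    return grid;
}
```

Calling fillParallel(4, 3, 2, 3) gives each of the three jobs a disjoint run of 8 voxels. One difference from the class above is that the grid sizes are passed by value into each job instead of being written to shared members from every thread.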