Guest tar Posted April 5, 2016 Pascal's 16-bit compute looks super nice at 22 TFLOPS. Wondering if this might be usable in H15+ as a faster OpenCL path? Assuming the tradeoff is some loss of stability and accuracy, but errors might be mitigated with 32- or 64-bit buffers.
malexander Posted April 5, 2016 Possibly, if you don't mind added jitter. It'd have to be restricted to values with [0,1] or [-1,1] ranges, like colors or normalized direction vectors. The problem is you're back to 32b as soon as you do a matrix multiply. I think fp16 is mostly for image processing and machine learning (which Nvidia is pumping hype for right now).
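The fp16 "jitter" is just quantization: half floats keep only a 10-bit mantissa, so roughly 3 decimal digits survive. A quick Python sketch (using the standard `struct` module's `'e'` half-precision format, not actual GPU hardware) shows why [0,1]-range data like colors is fine while world-space positions are not:

```python
import struct

def to_fp16(x):
    # Round-trip a float through IEEE 754 half precision ('e' format).
    return struct.unpack('e', struct.pack('e', x))[0]

# A color channel in [0,1]: the spacing between representable fp16
# values near 0.72 is about 0.0005, invisible on an 8-bit display.
print(to_fp16(0.7215))

# A world-space position: near 1234 the fp16 spacing is a full unit,
# so the value snaps to the nearest integer - visible jitter.
print(to_fp16(1234.567))   # 1235.0
```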
Guest tar Posted April 5, 2016 Haha - jitter wouldn't be too much fun but it does sound good for Nuke / COPs acceleration!
malexander Posted April 6, 2016 Yeah, only $129K for a fast compositing station. Where fp16 really shines is in data movement: it effectively doubles your bandwidth when fetching data compared to fp32, and often that data transfer is the bottleneck in these massive compute engines. That's probably more advantageous than the double-rate compute, and it's been possible to pack/unpack fp16 data since almost the dawn of shaders. Reading a bit more about Pascal, it achieves this rate by processing 2 fp16 ops with one fp32 ALU (SIMD), which probably means you need vec2 or vec4 fp16 data to truly take advantage of it. Scalar data likely wouldn't see a boost, and vec3 data only a moderate one.
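That packing is simple enough to sketch in Python with the standard `struct` module: two half floats fit in one 32-bit word, which is conceptually what GLSL's `packHalf2x16` does on the GPU (OpenCL's `vstore_half` family converts per element):

```python
import struct

def pack_half2(a, b):
    # Store two floats as consecutive fp16 values and reinterpret the
    # 4 bytes as a single 32-bit unsigned word.
    return struct.unpack('I', struct.pack('ee', a, b))[0]

def unpack_half2(word):
    # Inverse: reinterpret the 32-bit word as two fp16 values.
    return struct.unpack('ee', struct.pack('I', word))

w = pack_half2(0.25, -1.0)      # both values are exact in fp16
print(unpack_half2(w))          # (0.25, -1.0)
```

This is also why vec2/vec4 layouts matter for Pascal's double-rate path: the hardware consumes fp16 values in pairs, so data already packed two-to-a-word maps straight onto the vec2 fp16 ALU.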
Guest tar Posted April 6, 2016 Share Posted April 6, 2016 (edited) Nice investigations! Reading around, most people seem disappointed with Pascal; only 2x the performance instead of the fabled 10x. Moving forward, hopefully it will be possible to process sims across OpenCL devices soon, especially if we NVLink the cards. Foundry now allow two equal cards to do compute in NukeX 10. Edited April 6, 2016 by tar Quote Link to comment Share on other sites More sharing options...
malexander Posted April 7, 2016 I believe Nvidia stated in their earlier roadmap slides that it was a 10x improvement in performance per watt, not just performance. And I believe that was against either Kepler or Fermi, not Maxwell. The best thing about Pascal, IMO, is true virtual memory with page faulting. That means it should no longer run into "Out of memory!" issues with large datasets - it should be able to page bits of them out to main memory and pull other pages in. As anyone who's tried serious OpenCL sims probably knows, not being able to finish the compute is really bad. This should allow huge sims to finish, though slightly slower due to the PCI-Ex paging transfers.
Guest tar Posted April 7, 2016 That's a killer feature! HBM2 capacity appears more limited currently - 16GB on the Tesla - hopefully we get more! Pascal is good in my book. Cheers!
malexander Posted April 9, 2016 16GB used to be restrictive for large sims, but now it's just "fast memory" - and that's a lot.
malexander Posted May 7, 2016 Consumer versions out at the end of May!
GeForce 1080, 8GB GDDR5X, May 27 ($600)
GeForce 1070, 8GB GDDR5, Jun 10 ($380)
Significantly boosted clock speeds compared to the 900 series (double, to 2.1GHz) but with similar power draw. Dropping to a 16nm process (from 28nm) really helps with power consumption. Performance claims by Nvidia put the 1080 just above Titan X levels. Ars Technica article
Guest tar Posted May 7, 2016 (edited) Looks nice! It's a good upgrade from the 980/970 - 8GB is also much better for 4K monitor(s), sims and 4K comping. Hoping the Titan and Ti versions with lots more RAM will be out in the following months too. As GPUs should be hitting the same manufacturing limitations as CPUs over the next few years, future work that could help Houdini for GPU compute would be using fewer particles for deep water FLIP (narrow band), adaptive grids that add detail only where it's required (SpaceX), and heterogeneous compute (Foundry HPC). That should be a good 5 years of R&D! Along these lines, I just put in an RFE to have OpenCL GPU sims that fill up the GPU memory automatically overflow to CPU RAM instead of failing. Ref #75317. Edited May 9, 2016 by tar
malexander Posted July 25, 2016 Just a quick update on the non-Tesla Pascal cards: the GeForce 1080, 1070, 1060, the new Pascal-based Titan X (no "GTX", to differentiate it from the Maxwell-based GTX Titan X), and the new Quadro P series. FP16 compute on these cards is horribly crippled, even worse than FP64. For every shader module of 128 FP32 units, there are 4 FP64 units (1/32 rate) and a single vec2 FP16 unit (1/64 rate) on the GP102 and GP104 chips that power the GeForces, Quadros, and Titan X. Basically, the fp16 units are only there for debugging programs intended to run on the Pascal-based Teslas.
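The per-shader-module numbers above work out like this (a trivial sanity check, assuming the 128/4/1 unit counts are right):

```python
# Throughput per shader module on GP102/GP104, relative to fp32:
fp32_units = 128          # fp32 ops per clock
fp64_units = 4            # fp64 ops per clock
fp16_ops_per_clock = 2    # one vec2 fp16 unit retires 2 ops per clock

fp64_rate = fp64_units / fp32_units          # 4/128
fp16_rate = fp16_ops_per_clock / fp32_units  # 2/128
print(fp64_rate)   # 0.03125  -> 1/32
print(fp16_rate)   # 0.015625 -> 1/64
```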
Guest tar Posted July 26, 2016 ...that's pretty terrible, pity they didn't exploit it more. Thanks for the update!
pezetko Posted July 26, 2016 So the new GP102 and GP104 "Pascal" cards are worse than their Kepler predecessors for OpenCL sims? What about the (new) Radeon Pro WX and Radeon Pro Duo cards? http://wccftech.com/amd-radeon-pro-wx-7100-workstation-card/ These could offer better $/performance than Nvidia. As Tesla P100 prices haven't been revealed yet, even the Radeon Pro SSG looks interesting.
malexander Posted July 26, 2016 Maxwell and Kepler didn't have fp16 compute; the new Pascal-based Tesla card (GP100) was the first to introduce it. The reason cited for fp16 compute was "deep learning" applications - it basically allows twice the throughput of FP32 operations. The Pascal-based Quadros and GeForces have the same 1/32 FP64 rate as their Maxwell predecessors, so at least in that aspect they're the same. As for fp16 compute, you wouldn't be able to run it on Maxwell or Kepler cards - or at least it'd be FP32-emulated. It would have been nice to have, since HDR color is a good candidate for fp16, but fp32 will do I guess (just not at the double rate that full fp16 support would have had). The Radeon Pros are based on the same Polaris 10 architecture as the Radeon 480, putting them right about at the GeForce 970-980 mark in terms of CL performance.
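On the HDR point: half floats have ample range for radiance values, which is why they're the basis of HDR image formats like OpenEXR. A small Python illustration of fp16 range and precision (again via the standard `struct` module's `'e'` format, as a stand-in for hardware fp16):

```python
import struct

def to_fp16(x):
    # Round-trip a float through IEEE 754 half precision.
    return struct.unpack('e', struct.pack('e', x))[0]

print(to_fp16(65504.0))  # 65504.0 - the largest finite fp16 value
print(to_fp16(65000.0))  # snaps to a multiple of 32 at this magnitude
print(to_fp16(0.18))     # mid-grey keeps ~3 decimal digits of precision
```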