Hi,

Thoughts on the new Threadrippers for Houdini? 32 cores @ 3.0/4.2 GHz or 16 cores @ 3.5/4.4 GHz? Which one is better for heavy FX work in Houdini?

I would go with the 32-core Threadripper. You get nearly double the performance. It won't help with every function, but for rendering and big simulations it can.

Also, the license cost is lower when you use Houdini Engine. If you are an Indie user, it's a NO BRAINER.

 

On 9/8/2018 at 9:47 PM, Mandrake0 said:

I would go with the 32-core Threadripper. You get nearly double the performance. It won't help with every function, but for rendering and big simulations it can.

Also, the license cost is lower when you use Houdini Engine. If you are an Indie user, it's a NO BRAINER.

 

Don't simulations benefit from higher single-core speed? Wouldn't it be better to have fewer cores and higher clock speeds? That's what I have been reading in some FX forums, but I'm fairly new to Houdini.

The 24/32 core parts are odd beasts. They're made of four 8 core modules (the 24 core has 2 cores disabled per module). Unlike the server Epyc processor, only 2 of those modules have access to main memory (each has access to half of it via dual memory controllers), and the other two modules must hop through one of the mem-attached modules to get at main memory. So they'll scale well for low-bandwidth, high-compute workloads (rendering), but start to suffer in cases where memory bandwidth is important (massive sims). The 16 core part has 2 modules and each has access to half the memory.

AMD's designed the scheduling such that the modules connected to memory are populated with threads first, then the mem-isolated modules. I'm curious how that works for SMT (fill 16 threads on the mem modules, then 16 threads on the other modules, then populate the 33rd+ thread on the loaded cores; or load up the mem-modules to 32 threads first).
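The two candidate fill orders can be sketched with a toy Python model. This is only an illustration: the die numbering, the names `MEM_DIES` and `ISOLATED_DIES`, and the assumption that SMT siblings on the mem-attached modules are loaded before those on the isolated modules are all made up for the sketch, not taken from AMD or any OS scheduler.

```python
# Toy model of a 32-core/64-thread part: 4 dies x 8 cores x 2 SMT threads.
# Die numbering is hypothetical; dies 0 and 2 stand in for the two
# modules that have memory controllers attached.
MEM_DIES = (0, 2)
ISOLATED_DIES = (1, 3)
CORES_PER_DIE = 8


def slots(dies, smt_index):
    """All (die, core, smt_index) thread slots on the given dies."""
    return [(die, core, smt_index)
            for die in dies
            for core in range(CORES_PER_DIE)]


def order_cores_first():
    """One thread per core everywhere, then the SMT siblings."""
    return (slots(MEM_DIES, 0) + slots(ISOLATED_DIES, 0)
            + slots(MEM_DIES, 1) + slots(ISOLATED_DIES, 1))


def order_mem_dies_first():
    """Load the mem-attached dies to 32 threads before touching the rest."""
    return (slots(MEM_DIES, 0) + slots(MEM_DIES, 1)
            + slots(ISOLATED_DIES, 0) + slots(ISOLATED_DIES, 1))


def threads_on_mem_dies(order, n_threads):
    """How many of the first n_threads land on a mem-attached die."""
    return sum(1 for die, _, _ in order[:n_threads] if die in MEM_DIES)


if __name__ == "__main__":
    for n in (16, 32, 48, 64):
        a = threads_on_mem_dies(order_cores_first(), n)
        b = threads_on_mem_dies(order_mem_dies_first(), n)
        print(f"{n} threads: cores-first={a}, mem-dies-first={b} on mem dies")
```

The two orders only differ between 17 and 63 threads, which is the range where it matters whether a job shares a core through SMT or takes the extra hop to memory.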

I'd be more tempted to go for the 16 core version (2950X) myself. Thread efficiency starts to drop off at high core counts as well, so you really don't want to be losing even more performance in your top 16 cores. The memory bandwidth issue would also make me think twice about launching multiple processes using a lower thread count too.

Pretty in-depth analysis here: https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review

2 hours ago, malexander said:

The 24/32 core parts are odd beasts. They're made of four 8 core modules (the 24 core has 2 cores disabled per module). Unlike the server Epyc processor, only 2 of those modules have access to main memory (each has access to half of it via dual memory controllers), and the other two modules must hop through one of the mem-attached modules to get at main memory. So they'll scale well for low-bandwidth, high-compute workloads (rendering), but start to suffer in cases where memory bandwidth is important (massive sims). The 16 core part has 2 modules and each has access to half the memory.

AMD's designed the scheduling such that the modules connected to memory are populated with threads first, then the mem-isolated modules. I'm curious how that works for SMT (fill 16 threads on the mem modules, then 16 threads on the other modules, then populate the 33rd+ thread on the loaded cores; or load up the mem-modules to 32 threads first).

I'd be more tempted to go for the 16 core version (2950X) myself. Thread efficiency starts to drop off at high core counts as well, so you really don't want to be losing even more performance in your top 16 cores. The memory bandwidth issue would also make me think twice about launching multiple processes using a lower thread count too.

Pretty in-depth analysis here: https://www.anandtech.com/show/13124/the-amd-threadripper-2990wx-and-2950x-review

Thank you for all the in-depth info, Mark. I didn't know any of this about memory access. I will go with an overclocked 2950X. Thanks :)

I'm not saying the 2990WX is bad, just that you should temper your expectations of a 32x speedup in rendering :)
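A quick way to see why is Amdahl's law, a textbook model rather than anything measured on this chip: if a fraction p of the work runs in parallel on n cores, the speedup is capped at 1 / ((1 - p) + p / n). The parallel fractions below are made-up examples, and the model ignores the memory bandwidth issue entirely, so real numbers would likely be lower.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work parallelises."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only for illustration.
    for p in (0.90, 0.95, 0.99):
        s16 = amdahl_speedup(p, 16)
        s32 = amdahl_speedup(p, 32)
        print(f"p={p:.2f}: 16 cores -> {s16:.1f}x, 32 cores -> {s32:.1f}x")
```

Even at 95% parallel work the 32-core bound stays below 13x, so doubling the cores buys a lot less than double the speed.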

It makes me wonder how much data throughput affects each calculation cycle when you run a simulation. I have the feeling the most challenging part will be getting the setup right for so many cores.

I just read today that nearly all benchmark tests could be wrong, because there is a bug in the NVIDIA driver that slows things down massively when you have 64 threads or more, especially in games.

https://translate.google.com/translate?sl=de&tl=en&js=y&prev=_t&hl=de&ie=UTF-8&u=https%3A%2F%2Fwww.golem.de%2Fnews%2F32-kern-cpu-threadripper-2990wx-laeuft-mit-radeons-besser-1808-136016.html&edit-text=&act=url

https://translate.google.com/translate?hl=de&sl=de&tl=en&u=https%3A%2F%2Fwww.forum-3dcenter.org%2Fvbulletin%2Fshowpost.php%3Fp%3D11769097%26postcount%3D60

 

The 2990WX is surely a good CPU, but the biggest challenge is the software and the setup. There is also an older article that shows how much the tile size can affect rendering in Blender Cycles:

https://www.blenderguru.com/articles/4-easy-ways-to-speed-up-cycles (under: 3. Change the Tile Size)
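As a rough sketch of why tile size interacts with core count, the Python below counts tiles for a frame and compares that to the thread count. It assumes each CPU thread works on one tile at a time, which is a simplification, and the 1920x1080 frame, tile sizes, and function names are just examples, not taken from the article.

```python
import math


def tile_count(width, height, tile_size):
    """Number of square tiles needed to cover a width x height frame."""
    return math.ceil(width / tile_size) * math.ceil(height / tile_size)


def idle_threads(width, height, tile_size, threads):
    """Threads with no tile to work on, assuming one tile per thread."""
    return max(0, threads - tile_count(width, height, tile_size))


if __name__ == "__main__":
    # Example frame and tile sizes, picked only for illustration.
    for size in (256, 128, 64, 32):
        tiles = tile_count(1920, 1080, size)
        idle = idle_threads(1920, 1080, size, threads=64)
        print(f"{size}px tiles: {tiles} tiles, {idle} of 64 threads idle")
```

With large tiles there can be fewer tiles than threads, so part of a 64-thread CPU would sit idle no matter how fast it is.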

 

 

 


The Infinity Fabric has lower latency than socket to socket. That said, it's still very sensitive to memory timings and speeds. Unlike Intel processors, the Ryzen processors must have the timings dialed in, and they respond better to higher memory frequencies (the less memory you have, the faster it needs to be). The 32 core Threadripper can only be run in NUMA mode, which Windows is absolutely terrible at handling. Slow, improperly timed memory + bad NUMA + bad scheduling = performance regressions with a 32 core processor in Windows. Having slow, improperly timed memory will also hurt your performance in Linux, but at least you will have superior NUMA and scheduling.
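To put rough numbers on the memory speed point, here is a back-of-the-envelope Python sketch of theoretical peak DRAM bandwidth. It assumes a quad-channel platform with 64-bit (8-byte) channels and ignores timings and real-world efficiency, so treat the results as ceilings, not benchmarks. As far as I know the fabric clock on these parts follows the memory clock (half the DDR transfer rate), which is why faster RAM helps twice.

```python
BYTES_PER_TRANSFER = 8   # one 64-bit DDR4 channel moves 8 bytes per transfer
CHANNELS = 4             # assumption: quad-channel Threadripper platform


def peak_bandwidth_gbs(mega_transfers_per_s, channels=CHANNELS):
    """Theoretical peak bandwidth in GB/s for a DDR4 speed grade in MT/s."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000


def memclk_mhz(mega_transfers_per_s):
    """Memory clock in MHz; DDR makes two transfers per clock."""
    return mega_transfers_per_s / 2


if __name__ == "__main__":
    for speed in (2400, 3200):
        bw = peak_bandwidth_gbs(speed)
        print(f"DDR4-{speed}: {bw:.1f} GB/s peak, memclk {memclk_mhz(speed):.0f} MHz, "
              f"{bw / 16:.1f} GB/s per core at 16 cores, "
              f"{bw / 32:.1f} GB/s per core at 32 cores")
```

The per-core share halves when going from 16 to 32 cores on the same memory, which lines up with the bandwidth concern raised earlier in the thread.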

 

 

 

On 11/10/2018 at 1:09 AM, AaronAb said:

The Infinity Fabric has lower latency than socket to socket. That said, it's still very sensitive to memory timings and speeds. Unlike Intel processors, the Ryzen processors must have the timings dialed in, and they respond better to higher memory frequencies (the less memory you have, the faster it needs to be). The 32 core Threadripper can only be run in NUMA mode, which Windows is absolutely terrible at handling. Slow, improperly timed memory + bad NUMA + bad scheduling = performance regressions with a 32 core processor in Windows. Having slow, improperly timed memory will also hurt your performance in Linux, but at least you will have superior NUMA and scheduling.

 

 

Would that mean that the difference in RAM speed between DDR4-2400 and DDR4-3200 would have a big impact?
