When designing our PL-Science workstation, we started with a decent idea of what would be best for this market, but after some interesting conversations with our Intel partners, we decided we might be on the wrong path.
The idea was simple: maximum cores and maximum GPU compute. Surely that would be everything you needed for scientific workloads at the desk? After all, that's what we tend to see in the data centre space. It turns out we were being a little simplistic.
Intel’s resident data science expert, David Liu, shared with us some key benchmarks and data points, as well as his own personal experience as a data analyst, and this really helped shape our scientific workstation into a system that excels in the data science workflow.
We realised that there were three key areas we needed to get right: memory, CPU, and expandability. Let's explore what these mean and how they aren't always as simple as you'd expect.
Data scientists are working with increasingly large sets of data. Being able to hold this data in memory in full is a key requirement: it speeds up the workflow by letting you test against the entire data set before cleaning and compressing it. We've been able to fit up to 3TB of RAM in this workstation, which is impressive in its own right, and we also ensured Intel Optane DCPMM compatibility. This means you can reach 3TB of RAM with a DDR4/Optane mix, significantly cutting the cost of such a large memory pool whilst minimising performance losses.
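To illustrate why in-memory capacity matters, here is a rough back-of-the-envelope sizing sketch (the table shape is a hypothetical example, not a real benchmark) showing how quickly a dense numeric data set consumes RAM:

```python
# Hypothetical dense float64 table: 500 million rows x 100 columns.
rows, cols = 500_000_000, 100
bytes_per_value = 8  # float64

# Raw value storage only; real in-memory frames carry extra overhead.
total_bytes = rows * cols * bytes_per_value
total_tib = total_bytes / 1024**4

print(f"In-memory size: {total_tib:.2f} TiB")  # ~0.36 TiB for the raw values alone
```

Even this modest example approaches half a terabyte before any intermediate copies are made during cleaning, which is why a 3TB ceiling is far from excessive.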
More cores are not always better. Various benchmarking data shows that the sweet spot for most scientific workloads sits around the 18-core Xeon W and Xeon Scalable SKUs. This is due to how AVX-512 frequency scales with core count: more cores mean more power and heat, resulting in lower AVX frequencies. Combine this with numerous deep learning hardware accelerations and the various open-source frameworks accelerated by Intel, such as Intel oneAPI and the Intel MKL, and you have the perfect CPUs for data scientists.
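In practice, much of this acceleration is transparent: NumPy dispatches heavy linear algebra to whatever BLAS/LAPACK backend it was built against, which on Intel-optimised Python distributions is typically MKL. A minimal sketch of checking and exercising that backend:

```python
import numpy as np

# Print the BLAS/LAPACK backend NumPy was compiled against; on an
# Intel-optimised distribution this typically reports MKL.
np.show_config()

# Heavy linear algebra such as this matmul is handed to the underlying
# BLAS library, where MKL's vectorised (e.g. AVX-512) kernels apply.
a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)
c = a @ b
print(c.shape)  # (1000, 1000)
```

The takeaway is that no code changes are needed to benefit: the same NumPy script runs faster simply by being linked against an accelerated backend on capable hardware.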
All that memory needs data to fill it, and as this data is usually provided on an HDD for security and locality reasons, simple hot-swap capability is a must. Once you have your data and some models to test against, you need to be able to install the relevant GPU or FPGA for your use case before you upload to your production server. What if a new project has different requirements and needs a different GPU? Being able to easily swap cards in and out drastically speeds up workflow and deployment.
Ready to accelerate your data science workflow?
Researching and designing this workstation was an exciting journey that has led to a product we are really proud of. Combining all the needs of a data scientist into a single powerful and quiet workstation was not as simple as we first thought - but nothing good is ever simple.
Get in touch today to find out more about how a PL-Science workstation can help you accelerate your data science workflow.