HPC Scale Out I/O (2018-07-24)

Holodeck HPC Scale Out I/O — Video with Transcript

Shiv Sikand, Founder & EVP, IC Manage (edited transcript – DAC 6.25.18)

Extreme File Performance through P2P Caching

We introduce Holodeck scale-out I/O that delivers extreme file performance.

There is a standard Intel strategy to blur the lines between file I/O and memory I/O. So, we use NVME, but instead of building file systems out of NVME, we make your local filesystem look like a cache.

And we provide a shared storage model for that cache using peer-to-peer. This enables high-performance compute in the cloud for EDA tools, and by using peer-to-peer between multiple readers and writers, we can now do everything in parallel.

Most filesystems today are serialized. There are alternative parallel filesystems that have been used in HPC compute for a while, but they’ve been optimized for a different kind of workload, typically highly concurrent I/O, which is not the way EDA works.

So what we do is use flash as a cache, not as storage, and it slashes your costs because with a small amount of NVME per node, 1 or 2 terabytes, you can now get extreme file performance.

3 Steps to Hybrid Cloud Bursting

We offer three simple steps to hybrid cloud bursting. Some people have said to me, ‘Wow, I don’t believe this.’ I tell them: if you don’t believe it, come upstairs to booth 2618; we’ve got a nice car there for you to look at, and we’ll show you how this occurs and how quickly it occurs.

  1. On one of your compute nodes, you simply register your filer mount points and your namespaces with a little agent.
  2. That agent then allows you to project your entire corporate data into the cloud. You type “ls” and every mount that exists, everything that you’ve registered on the left-hand side, is suddenly available. It doesn’t take very long because we only transfer the data on demand: the first “ls” is just the top level, and the second “ls” is a little bit deeper. You could do a recursive listing, and it just pulls it all on demand.
  3. Once the virtual environment has been created, you run the application against it. The reason you create the virtual environment is so the application sees everything that it needs: its includes, its configuration files, all the things it needs to see. But they’re all virtualized, and as they’re accessed we pull them over the wire.
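
The on-demand behavior described in the steps above can be illustrated with a toy model. This is only a sketch under stated assumptions, not how Holodeck is implemented: the `LazyNamespace` class, its method names, and the in-memory dictionary standing in for the on-premises filer are all hypothetical. It shows two things from the talk: an “ls” fetches only one directory level of metadata, and file contents cross the wire only when a file is first read.

```python
class LazyNamespace:
    """Toy model of an on-demand projected namespace (illustrative only)."""

    def __init__(self, remote_files):
        # remote_files maps absolute paths to bytes and stands in for the
        # on-premises filer; a real system would talk to it over the network.
        self._remote = remote_files
        self._listings = {}          # directory -> sorted child names, filled on first ls
        self._contents = {}          # file path -> bytes, filled on first read
        self.bytes_transferred = 0   # file data pulled so far

    def ls(self, directory):
        """List one directory level; only that level's names are fetched."""
        directory = directory.rstrip("/") or "/"
        if directory not in self._listings:
            prefix = "/" if directory == "/" else directory + "/"
            children = {
                path[len(prefix):].split("/", 1)[0]
                for path in self._remote
                if path.startswith(prefix)
            }
            self._listings[directory] = sorted(children)
        return self._listings[directory]

    def read(self, path):
        """Return file contents, pulling them over the 'wire' on first access."""
        if path not in self._contents:
            data = self._remote[path]
            self._contents[path] = data
            self.bytes_transferred += len(data)
        return self._contents[path]


# Hypothetical corporate data registered with the agent.
filer = {
    "/proj/rtl/top.v": b"module top; endmodule",
    "/proj/cfg/run.cfg": b"x=1",
    "/tools/bin/sim": b"\x00" * 10,
}
namespace = LazyNamespace(filer)
top_level = namespace.ls("/")        # metadata only, no file data moved yet
```

Listing directories leaves `bytes_transferred` at zero in this model; the counter only grows when `read` is called on a path for the first time.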

Holodeck Scale Out I/O Benefits

What we’re able to do is really provide a scale-out solution for I/O, because we’re bringing files to the local compute node. We are not going over the network to create a remote file system. We are bringing those files locally through this on-demand caching onto the local NVME.

This gives you a lot of I/O; in fact, more than you know what to do with. A standard Amazon instance with two NVMEs, for example, can give you 2 GB/sec of I/O, and something like a c5d.18xlarge can give you 10 GB/sec. This is per node, so aggregate throughput is effectively unlimited because it grows with every node you add: this is true scale-out.
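
As a back-of-the-envelope illustration of the per-node claim, the arithmetic can be written out as below. The per-node figures are the ones quoted in the talk; the fleet size of 100 nodes and the function name are made-up assumptions, and the model assumes every node reads from its own local cache with no shared bottleneck.

```python
def aggregate_throughput_gb_per_sec(node_count, per_node_gb_per_sec):
    """Sum of local-cache bandwidth across nodes, assuming no shared bottleneck."""
    if node_count < 0 or per_node_gb_per_sec < 0:
        raise ValueError("inputs must be non-negative")
    return node_count * per_node_gb_per_sec


# Per-node figures quoted in the talk; the node count of 100 is hypothetical.
small_instances = aggregate_throughput_gb_per_sec(100, 2)    # 100 nodes at 2 GB/sec each
large_instances = aggregate_throughput_gb_per_sec(100, 10)   # 100 nodes at 10 GB/sec each
```

Under these assumptions the fleet totals are 200 GB/sec and 1000 GB/sec respectively, and doubling the node count doubles the total.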

But more importantly, it’s very, very cheap because we’re not storing your data there; it’s transient. We have the job, we run the job, and we get another job in there.

And storage happens in the background. We’re very heavily focused on job throughput, job execution, and speed of completing these jobs. Storage is something that happens in the background asynchronously and we can write it back to wherever you want; we don’t care, we’re totally agnostic to it.


Q: In my specific use case we have about ten thousand jobs running out of a single volume, and they also depend on 50 other NFS volumes. I don’t see how a local cache can help in this case, so how would you address this situation?

Shiv Sikand:  The way the cache works is that it creates a virtual filesystem that conceptually has all the files you need. If you need to access those files, they are simply loaded on demand.

Most applications will do a whole bunch of reads, and once those reads are finished, you can expunge the cache and bring in newer objects. That works because most applications tend to do their reads, load a memory representation, and then do a bunch of writes.
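
A minimal sketch of that read-then-expunge pattern follows. It is an illustration, not the product's cache: the `NodeCache` class, its byte-capacity limit, and the least-recently-used eviction policy are assumptions chosen for the example, and `fetch` is a hypothetical callable that stands in for loading a file on demand.

```python
from collections import OrderedDict


class NodeCache:
    """Bounded per-node file cache with explicit expunge (illustrative only)."""

    def __init__(self, capacity_bytes, fetch):
        self.capacity_bytes = capacity_bytes
        self._fetch = fetch              # hypothetical callable: path -> bytes
        self._entries = OrderedDict()    # path -> bytes, oldest first
        self.used_bytes = 0

    def read(self, path):
        """Serve from cache, or load on demand and evict old entries if needed."""
        if path in self._entries:
            self._entries.move_to_end(path)   # mark as most recently used
            return self._entries[path]
        data = self._fetch(path)
        self._entries[path] = data
        self.used_bytes += len(data)
        # Assumed policy: drop least recently used entries until we fit,
        # always keeping the entry that was just loaded.
        while self.used_bytes > self.capacity_bytes and len(self._entries) > 1:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)
        return data

    def expunge(self, paths):
        """Drop files whose reads are finished, freeing room for newer objects."""
        for path in paths:
            data = self._entries.pop(path, None)
            if data is not None:
                self.used_bytes -= len(data)

    def cached_paths(self):
        return list(self._entries)


# Hypothetical inputs: three 4-byte files and an 8-byte cache.
origin = {"a": b"1234", "b": b"5678", "c": b"90ab"}
cache = NodeCache(capacity_bytes=8, fetch=origin.__getitem__)
for name in ("a", "b", "c"):
    cache.read(name)                     # reading "c" pushes "a" out
```

Once a job has built its in-memory representation, calling `cache.expunge(["b", "c"])` would release the remaining space for the next job's inputs.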

[With IC Manage Holodeck] All of those are available because we support a full peer-to-peer architecture to bring those files in wherever they may be distributed across your network.
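
One way to picture a peer-to-peer fetch is the sketch below. The lookup order (local cache, then peers, then the shared filer) is an assumption made for illustration, as are the `Node` class and its fields; the talk only states that files are brought in from wherever they are distributed across the network.

```python
class Node:
    """Compute node with a local cache and a list of peers (illustrative only)."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin     # dict standing in for the shared filer
        self.cache = {}          # path -> bytes held on this node
        self.peers = []          # other Node objects
        self.origin_reads = 0    # how often this node had to go to the filer

    def read(self, path):
        # 1. Local cache hit: no network traffic at all.
        if path in self.cache:
            return self.cache[path]
        # 2. Assumed policy: ask peers before touching the filer.
        for peer in self.peers:
            if path in peer.cache:
                data = peer.cache[path]
                break
        else:
            # 3. No peer has it, so fall back to the filer.
            data = self.origin[path]
            self.origin_reads += 1
        self.cache[path] = data
        return data


# Hypothetical two-node cluster sharing one library file.
filer = {"/lib/cells.lib": b"cell data"}
node_a = Node("a", filer)
node_b = Node("b", filer)
node_a.peers = [node_b]
node_b.peers = [node_a]

node_a.read("/lib/cells.lib")    # first reader goes to the filer
node_b.read("/lib/cells.lib")    # second reader is served by its peer
```

In this model only the first reader touches the filer, which is why many jobs that depend on the same volumes need not all hit the same storage.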