Understanding and explaining really new things is hard, and I think that is why I’ve struggled to put this article together. What I saw from Plexistor at TFD11 is that kind of new. I saw Plexistor at a Tech Field Day event, so please refer to my TFD disclosure. Now let me try to lay it out as simply as possible.
What Does Plexistor Do?
It makes lots of persistent memory available to a Linux server, which is great for any application that needs really fast storage. Plexistor produces the Linux kernel module that presents the persistent memory. This is memory that is available to applications but is not lost when the server crashes or reboots. Plexistor can also mirror this persistent memory between servers, or simply back it up to a remote server’s disk or SSD storage. They also have a tiering model where flash storage can pretend to be persistent RAM, delivering a huge capacity of persistent memory.
This large amount of persistent storage can be used by in-memory databases or analytics tools that require very fast storage. For disk based applications, the application’s data files can be mapped into the persistent memory using the standard POSIX memory mapping call (mmap). RAM based applications will need to be modified to accept persistent memory. The physical server also needs to support persistent memory.
This is not something that you will simply buy off the shelf and deploy in a week. My guess is that some mega-scale (not quite hyper-scale) cloud business will use this kind of heavily engineered system.
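To make the file mapping idea concrete, here is a minimal sketch in Python of what mapping a data file looks like from the application side. It maps an ordinary temporary file and overwrites one byte in place. Nothing in it is Plexistor-specific: the `overwrite_byte` and `demo` names are made up for illustration, and the assumption is only that a file on a persistent memory filesystem would be mapped the same way as any other file.

```python
import mmap
import os
import tempfile


def overwrite_byte(path, offset, value):
    """Map a file into memory and change a single byte in place."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the default access mode is read/write.
        with mmap.mmap(f.fileno(), 0) as mem:
            mem[offset] = value  # a plain memory store, no write() call
            mem.flush()          # ask the OS to push the change to the backing file


def demo():
    """Create a 4096 byte scratch file, patch one byte, and read it back."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\x00" * 4096)
        os.close(fd)
        overwrite_byte(path, 100, 0x41)
        with open(path, "rb") as f:
            data = f.read()
        return data[100], len(data)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the application updates its data with a memory store rather than a read-modify-write of a whole block. On ordinary storage the operating system still has to write a full page behind the scenes; with byte addressable persistent memory that extra step is what goes away.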
One other interesting architecture that Plexistor discussed was placing the persistent memory and flash storage in a separate box called a brick, then using their software to mirror a physical server’s RAM to the box. This way the physical server running the application doesn’t need to support NVDIMMs; it just needs the Plexistor kernel module to repopulate the local RAM before the application is started. My guess is that this model will win. A custom box with lots of NVDIMMs and NVMe flash will be easier to support. Limiting your application server hardware choice to just systems that support NVDIMM will be an issue, especially if NVDIMM sizes don’t increase to match standard DIMM sizes. The challenge will be reducing the latency for the replication to the brick.
Here ends the Plexistor discussion. What follows is a bit of background on the underlying technologies and concepts.
What is Persistent Memory?
Persistent Memory is RAM that doesn’t lose its contents when the physical server reboots or crashes. Generically the physical devices are called NVDIMMs. The basic form, NVDIMM-N, is a RAM DIMM that also has some flash and capacitors. When the system is shut down or loses power, the RAM contents are copied to the flash. When the system starts up, the RAM is refilled from the flash. Data written to this type of RAM is persistent, while data written to normal RAM is not. This allows RAM to be used as a tier of storage rather than as a cache, so application writes can complete much faster. The big limitation with these NVDIMM-N devices is capacity. Each DIMM must have both flash and RAM on the DIMM, plus a capacitor to power the transfer from RAM to flash. All this hardware takes up space, so capacity is limited; most vendors have products in the 8GB range.
The other kind of persistent memory is the flash based NVDIMM-F, like the Diablo Memory-1 DIMM. This is a 400GB DIMM that has only flash, so reads and writes go directly to this flash. Since it’s a flash device I expect it to be block addressable, not byte addressable like the NVDIMM-N. The latency on the flash is also higher than on the RAM based NVDIMM-N devices.
There is a new type of persistent memory on the way: Intel’s 3D XPoint. This technology is supposed to be a little slower than RAM, but cost a lot less. I suspect that Plexistor is mostly about learning how to use this technology before it has been released. Then when Intel does release it, Plexistor will be the leader in a new segment.
Bytes vs Blocks
One of the other characteristics of memory is that it is “byte addressable.” This means that it is easy to overwrite just a single byte of data. In reality it is word addressable: on a 64-bit system a word is 8 bytes, and CPUs typically move data to and from memory in 64 byte cache lines. The next size up is block addressable disks that use a 512 byte sector. On a block addressable system the entire sector must be overwritten, even to change a single byte. By the way, newer disks use 4096 byte sectors. A RAID array will use a larger unit again; anywhere from 64KB to 4MB is common. Using flash doesn’t get away from larger writes. Efficient storage systems will write entire flash blocks at a time, usually 16KB in size. Writing smaller units to flash storage reduces the lifespan of the device, or forces you to use a more expensive flash device with more endurance. All of these units are much larger than a word, so part of the higher write latency comes simply from moving more data around.
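The effect of the write unit size is easy to put numbers on. The sketch below, using only the unit sizes quoted above, works out how many bytes a device has to rewrite to change a small piece of data when it can only write whole units. The function name and the table labels are mine, and the RAID entry uses the low end of the 64KB to 4MB range.

```python
def bytes_rewritten(change_offset, change_len, unit_size):
    """Bytes that must be rewritten to change `change_len` bytes starting at
    `change_offset` when the device can only write whole `unit_size` units."""
    first_unit = change_offset // unit_size
    last_unit = (change_offset + change_len - 1) // unit_size
    return (last_unit - first_unit + 1) * unit_size


# Write unit sizes taken from the text above (illustrative labels).
UNITS = {
    "memory (64 bytes)": 64,
    "legacy disk sector": 512,
    "newer disk sector": 4096,
    "flash block": 16 * 1024,
    "RAID stripe (low end)": 64 * 1024,
}

if __name__ == "__main__":
    # Cost of changing a single byte at offset 0 on each kind of device.
    for name, size in UNITS.items():
        print(f"{name}: {bytes_rewritten(0, 1, size)} bytes rewritten")
```

A change that straddles a unit boundary costs two units: for example, `bytes_rewritten(510, 4, 512)` is 1024, because the four bytes span two 512 byte sectors.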
Having byte (word) addressable persistent storage is very unusual. It is also far faster than the alternatives. Typical access latencies are:
- RAM is usually tens of nanoseconds, under 0.1µs.
- NVMe flash is usually around 100 microseconds, 100µs.
- SSD flash is usually a few hundred microseconds, 500µs.
- Hard disk is usually a few milliseconds, 5,000µs.
It is obvious that using persistent RAM for storage is going to speed up applications a lot, at least the applications that are limited by storage performance. With these sub-microsecond latencies, the storage must be inside the server. Any kind of network access will add tens of microseconds, possibly hundreds of microseconds.
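To put those latencies side by side, here is a small back-of-the-envelope calculation using the round figures from the list above. It is only a sketch: the 10µs network round trip is an assumed value picked from the "tens of microseconds" range, and the function names are mine. Everything is kept in whole nanoseconds so the arithmetic stays exact.

```python
# Round-number latencies from the list above, in nanoseconds.
LATENCY_NS = {
    "RAM": 100,              # under 0.1 microseconds
    "NVMe flash": 100_000,   # about 100 microseconds
    "SSD flash": 500_000,    # about 500 microseconds
    "Hard disk": 5_000_000,  # about 5,000 microseconds
}

ASSUMED_NETWORK_NS = 10_000  # assumption: a 10 microsecond network round trip


def slowdown_vs_ram(device):
    """How many times slower one access is than a RAM access."""
    return LATENCY_NS[device] // LATENCY_NS["RAM"]


def serial_ops_per_second(device):
    """Accesses per second if each one must finish before the next starts."""
    return 1_000_000_000 // LATENCY_NS[device]


def ram_over_network_slowdown(network_ns=ASSUMED_NETWORK_NS):
    """Slowdown of a RAM access once a network round trip is added to it."""
    return (LATENCY_NS["RAM"] + network_ns) // LATENCY_NS["RAM"]


if __name__ == "__main__":
    for device in LATENCY_NS:
        print(device, slowdown_vs_ram(device), serial_ops_per_second(device))
    print("RAM over network:", ram_over_network_slowdown())
```

With these figures NVMe flash is 1,000 times slower than RAM per access, and adding even a 10µs network hop makes a RAM access about 100 times slower, which is why the persistent memory needs to sit inside the server.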
© 2016, Alastair. All rights reserved.