Tuesday, October 11, 2016

Notes on SNIA Storage Developer Conference 2016

Chosen bits from this year's SNIA Storage Developer Conference:
  • MarFS: a scalable near-POSIX file system built on object storage. What is really impressive is that MarFS is part of a five-tier storage system for the Trinity project. Yes, FIVE tiers: RAM -> Burst Buffer -> Lustre -> MarFS -> Tape. MarFS sits just above tape for long-term archival and aims to provide persistence spanning a year or more; by comparison, Lustre, the tier just above it, aims to keep data for only weeks. What puzzles me is the logic behind this approach, as most supercomputers have a 5-6 year lifespan, which implies that a project's data will span multiple generations of systems. [Github]
  • Hyperconverged Cache: It seems that Intel is starting to realize what we discovered years ago in the Hecatonchire project: once you have near-RAM performance, disaggregating and pooling your resources becomes the natural next step for efficiency. That is what they aim to achieve with a distributed storage cache that aggregates their 3D XPoint devices across a cluster to deliver a fast, coherent cache layer. However, without RDMA this approach seems a little pointless. The only thing that seems to save them is that the cloud storage backend (Ceph) has a large enough latency gap for them to exploit.
  • Erasure Codes: a very good overview of modern erasure codes and their trade-offs. As always, no two codes are equal, and not all use cases are the same.
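To make the trade-off concrete, here is a toy sketch (my own illustration, not from any specific talk) of the simplest erasure code, single XOR parity: k data blocks plus one parity block tolerate any one erasure at a storage overhead of 1/k, versus the k-fold overhead of full replication.

```python
# Toy single-parity erasure code (RAID-5 style): k data blocks + 1 parity.
# Tolerates the loss of any ONE block; real deployments use Reed-Solomon
# or LRC variants to tolerate more erasures at higher encode/decode cost.

def encode(blocks):
    """Return the XOR parity block for a list of equal-size data blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

def reconstruct(surviving, parity):
    """Rebuild the single missing data block from the survivors + parity."""
    missing = bytearray(parity)
    for block in surviving:
        for i, b in enumerate(block):
            missing[i] ^= b
    return bytes(missing)

data = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(data)
# Lose data[1]; rebuild it from the remaining blocks plus parity.
rebuilt = reconstruct([data[0], data[2]], parity)
assert rebuilt == b"BBBB"
```

The "no two codes are equal" point falls out directly: XOR parity is cheap but fragile, while wider codes buy fault tolerance with CPU and repair-bandwidth cost.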

Persistent Memory: As storage shifts away from HDD toward persistent memory, the number of talks on the topic exploded this year. The main focus has moved from pure NVM consumption to remote access models.
  • NVMe over Fabrics: two talks on recent progress. Nothing really new there; it just seems it will become the standard for remote storage access in the near future. [Mellanox] [Linux NVMf]
  • RDMA: It seems that Intel and others are aiming for direct persistent-memory access using RDMA, bypassing the NVMe stack entirely to eliminate its latency. However, this requires some changes in the RDMA stack in order to guarantee data persistence.
    • IOPMEM: interesting work in which the authors propose bypassing the CPU for transfers between PCIe devices, essentially enabling DMA between NVM and other devices. This allows an RDMA NIC to talk directly to an NVM device on the same PCIe switch. However, it doesn't really explain which persistence guarantees are associated with the different operations.
    • RDMA verbs extension: Mellanox proposes adding an RDMA flush verb that mimics the CPU's cache-flush instructions. This operation would guarantee consistency and persistence of remote data.
    • PMoF: addresses the really difficult problem of guaranteeing persistence and consistency when accessing persistent memory over a fabric. This talk described all the nitty-gritty details needed to avoid losing or corrupting data during remote access. This is what the RDMA flush verb will eventually handle, but for the moment it requires a lot of manual operations.
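The core hazard behind these three talks can be sketched with a toy model (the class and method names below are illustrative, not a real RDMA API): an RDMA write may complete once data reaches a volatile cache on the target, so without an explicit flush it can still be lost on power failure.

```python
# Toy model of the remote-persistence problem: an RDMA write completes
# when data reaches the target's volatile cache; it only becomes durable
# after an explicit flush -- the gap the proposed RDMA flush verb closes.
# Hypothetical names for illustration, not an actual verbs interface.

class RemotePmemTarget:
    def __init__(self):
        self.volatile_cache = {}    # data received but not yet durable
        self.persistent_media = {}  # survives power loss

    def rdma_write(self, addr, data):
        # Completes as soon as the data lands in the target's cache.
        self.volatile_cache[addr] = data

    def rdma_flush(self, addr):
        # Proposed verb: push cached data into the persistence domain.
        if addr in self.volatile_cache:
            self.persistent_media[addr] = self.volatile_cache.pop(addr)

    def power_failure(self):
        # Anything not yet flushed is gone.
        self.volatile_cache.clear()

target = RemotePmemTarget()
target.rdma_write(0x1000, b"committed record")
target.rdma_flush(0x1000)             # made durable
target.rdma_write(0x2000, b"in flight")
target.power_failure()                # crash before flushing 0x2000
assert target.persistent_media.get(0x1000) == b"committed record"
assert 0x2000 not in target.persistent_media  # lost: why the flush verb matters
```

Today applications approximate that flush with manual tricks (e.g. an extra RDMA read to force write completion, plus CPU cache flushes on the target), which is exactly the manual-operation burden the PMoF talk walks through.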
Last but not least, we can see references here and there to Intel's 3D XPoint, but it seems the company has toned down its marketing machine, probably fearing backlash over its continuous walking back of claimed performance.