Reflections Of The Void

Wednesday, March 30, 2016

[Links of the day] 30/03/2016 : Supercomputer facilities and ops, gpu compiler, escher etch a sketch

GPUcc : open-source GPGPU compiler by Google, built on LLVM [slides] [code]
Trinity Facilities and Operations : what it takes to plan and operate a supercomputer such as Trinity at LANL
Escher Sketch : ever wanted to do tessellation-based wallpaper or just a pretty picture? Now you can!

Tuesday, March 29, 2016

[Links of the day] 29/03/2016: cache strategy and write avoiding algorithms + Stats intro ebook

Write-Avoiding Algorithms : when you have to deal with the CAP theorem, sometimes the best strategy is to avoid confrontation. In this case, avoid operations that trigger consistency transactions. This paper looks into algorithms that try to minimise write operations in order to minimise distributed-coherence-related operations, and into the associated benefits.
FairRide : paper looking into whether shared cache allocation can deliver isolation guarantee, strategy-proofness and Pareto efficiency all at once. It turns out that it is actually not possible, but you can get close enough.
Intro Stat with Randomization and Simulation : free statistics intro ebook

Monday, March 28, 2016

[Links of the day] 28/03/2016: Hierarchy of engagement, Latency measurement, TAO consistency at Facebook

The Hierarchy of Engagement : another excellent Greylock Partners slide deck on how to leverage the Hierarchy of Engagement to fuel the growth of your company. The proposed hierarchy model has three levels: 1) growing engaged users, 2) retaining users, and 3) self-perpetuating.
Measuring and Understanding Consistency at Facebook : paper summary of Facebook's highly consistent DB, TAO. The interesting thing is that they have a hierarchical consistency model with synchronous cache consistency and an asynchronous cache and DB/storage invalidation model.
How NOT to Measure Latency : in-depth overview of latency and response time characterization, including proven methodologies for measuring, reporting, and investigating latencies, and an overview of some common pitfalls encountered (far too often) in the field
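
To illustrate the kind of reporting pitfall the latency link is about, here is a minimal sketch of my own (not taken from the talk) showing how an average can hide a slow tail that percentiles expose. The `percentile` and `summarize` helpers and the sample numbers are hypothetical, chosen only for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Report the mean next to the median, the 99th percentile and the maximum."""
    return {
        "mean": sum(samples) / len(samples),
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


# Hypothetical workload: 98% of requests take 1 ms, 2% stall for 500 ms.
latencies_ms = [1] * 980 + [500] * 20
summary = summarize(latencies_ms)
print(summary)
```

With this made-up workload the mean sits around 11 ms, a value that no request actually experienced, while the 99th percentile and the maximum both show the 500 ms stalls.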

Saturday, March 26, 2016

PureStorage brings us one step closer to micro storage architecture

Pure Storage just released its FlashBlade product. It is a fabric-connected object storage solution. It is a modular solution composed of a large number of blades, each of which is made of :

8TB to 52TB raw NAND storage capacity : a lot, but it still takes less than half of the real estate on each blade.
NV-RAM + supercapacitor write buffer : when your NAND is still too slow, you want a persistent NVRAM buffer to handle the bursts
ARM CPU + FPGA : to deal with the “low level” operations such as erasure coding, etc.
8-core Xeon system on chip : for moving the computation to where the data is located; it handles pretty much all the high-level operations such as NFS, S3, object storage, etc.
40 Gbit Ethernet : that’s where the data gets out
PCIe fabric networking : in-chassis solution linking compute and storage cards via a proprietary protocol. What’s interesting is that the system is self-contained, and scaling out to other boxes goes through the 10 Gb/s connectivity and not a proprietary fabric link. This implies that it doesn’t need an exotic solution once you go past the box boundaries. This is great as it makes it easy (and cheap) to scale; however, I wonder what the performance implications are once you start crossing boundaries.

What is interesting is that, when you look at the Pure Storage solution, they decided to integrate the high-level compute aspect of storage directly with the low-level one in a single blade. They ended up with a hybrid solution combining an ARM CPU and an FPGA for low-level aspects such as deduplication and erasure coding, and the Xeon for the object storage and file system layers.

One can assume that the decision behind such an architecture was driven by customer requirements, which tend to call for a high-performance jack-of-all-trades solution. I can picture the product manager arguing for supporting every scale-out storage protocol popular at the moment. However, Jack always ends up master of none, and to compensate Pure Storage had to pump up the compute capabilities.

While this seems like a good choice, it is also counterproductive in terms of watts per GB, and a lot of real estate ends up wasted or duplicated. Don’t get me wrong, what Pure achieved with the FlashBlade is impressive, but I can’t stop thinking that they should have taken it a step further.

This type of high-performance, high-cost and high-power architecture is a step in the right direction, toward a micro storage architecture that delivers low cost, low power, high performance and scalability. Now it is all about trimming down the system while maintaining scalability by dividing the blade system into a much larger number of smaller nodes, literally offering the flash equivalent of HGST’s Ethernet-connected drives.

However, this might also imply that you won’t be able to support every single storage interface out there (NFS, S3, block, etc.) without having to rely on either client-side processing or a frontend. This should be achievable while maintaining excellent performance; the key will lie in the details of the core storage API employed.

Friday, March 25, 2016

[Links of the day] 25/03/2016: Scheduling with queuing theory, LLVM Assembler Framework, Erasure Coding at Azure

Efficient Queue Management for Cluster Scheduling : MS researchers look into introducing queue management techniques, such as appropriate queue sizing, prioritization of task execution via queue reordering, starvation freedom, and careful placement of tasks to queues, for big data cluster scheduling.
Keystone : an Indiegogo project to refactor LLVM into a multi-architecture, multi-platform, open source assembler framework.
Erasure Coding in Windows Azure Storage : MS Azure uses Local Reconstruction Codes (LRC) for its storage. LRC greatly reduces the number of erasure coding fragments needed for reconstruction when data is lost or offline. The key benefit is a drastic reduction in the I/O and bandwidth required for repairs while keeping the storage overhead low.
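
The local-reconstruction idea in the Azure link can be sketched roughly as follows. This is a simplified illustration of my own, not the code from the paper: it only builds XOR local parities per group and leaves out the global parities, which need Galois-field arithmetic. The function names, the 12-fragment layout and the group size of 6 are assumptions made for the example.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def local_parities(fragments, group_size):
    """Split the data fragments into groups and compute one XOR parity per group."""
    groups = [fragments[i:i + group_size] for i in range(0, len(fragments), group_size)]
    return [reduce(xor_bytes, group) for group in groups]


def reconstruct(fragments, parities, lost, group_size):
    """Rebuild one lost data fragment from its own group only.

    Returns the rebuilt fragment and the number of fragments that had to be read.
    """
    group = lost // group_size
    start = group * group_size
    end = min(start + group_size, len(fragments))
    reads = [fragments[i] for i in range(start, end) if i != lost]
    reads.append(parities[group])
    return reduce(xor_bytes, reads), len(reads)


# Hypothetical layout: 12 data fragments in 2 local groups of 6.
fragments = [bytes([i] * 4) for i in range(1, 13)]
parities = local_parities(fragments, group_size=6)
rebuilt, reads = reconstruct(fragments, parities, lost=7, group_size=6)
print(rebuilt == fragments[7], reads)
```

In this layout a single lost fragment is rebuilt from the 6 fragments of its local group (5 data plus 1 local parity), where a classic Reed-Solomon code over the same 12 data fragments would have to read 12. That gap is where the I/O and bandwidth saving for repairs comes from.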

Thursday, March 24, 2016

[Links of the day] 24/03/2016: Testing distributed systems, SDN OS, HW/SW for Storage Class Memory

Technologies for Testing Distributed Systems : testing distributed systems is hard, and unit testing does not really cut it when it comes to Byzantine faults.
ONOS : Open Network Operating System (ONOS) is a software defined networking (SDN) OS
WrAP : Hardware and Software Support for Atomic Persistence in Storage Class Memory

Wednesday, March 23, 2016

[Links of the day] 23/03/2016: containers patterns, delta CRDTs, probabilistic DB

Container Patterns : WiP but promising documentation of container patterns. Check the v1.0 branch
Efficient State-based CRDTs by Delta-Mutation : instead of shipping the full state of a CRDT, the authors propose to use delta-based messages in order to reduce storage and network overhead.
BlinkDB : allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. Really cool; we are starting to see probabilistic programming emerge everywhere. We just have to get used to the idea that, as in real life, computer programs can be more efficient when not everything is certain.
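
The accuracy-for-latency trade described for BlinkDB can be sketched in a few lines. This is a toy illustration under my own assumptions, not BlinkDB's actual mechanism or API: it draws a uniform random sample and reports a mean with a normal-approximation 95% error bar. The `approx_mean` helper and its parameters are hypothetical.

```python
import math
import random
import statistics


def approx_mean(population, sample_size, z=1.96, seed=0):
    """Estimate the mean from a random sample and return (estimate, error_bar).

    The error bar is z times the standard error of the sample mean
    (z=1.96 gives roughly a 95% interval under a normal approximation).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(population, sample_size)
    estimate = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_err


# Hypothetical "table": 100,000 values; the exact answer needs a full scan.
population = list(range(100_000))
exact = statistics.mean(population)

for size in (100, 10_000):
    estimate, error = approx_mean(population, size)
    print(f"sample={size}: {estimate:.1f} +/- {error:.1f} (exact {exact:.1f})")
```

The larger sample costs more work and yields a tighter error bar. Letting the user pick that point on the curve is the trade the link describes.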