Wednesday, March 30, 2016

Tuesday, March 29, 2016

[Links of the day] 29/03/2016: cache strategy and write avoiding algorithms + Stats intro ebook

  • Write-Avoiding Algorithms : when you have to deal with the CAP theorem, sometimes the best strategy is to avoid confrontation. In this case, avoid operations that trigger consistency transactions. This paper looks into algorithms that try to minimise write operations in order to minimise distributed-coherence-related operations, and into the associated benefits.
  • FairRide : paper looking into whether a shared cache can deliver isolation guarantee, strategy-proofness and Pareto efficiency at the same time. It turns out that it is actually not possible, but you can get close enough.
  • Intro Stat with Randomization and Simulation : free statistics intro ebook 

Monday, March 28, 2016

[Links of the day] 28/03/2016: Hierarchy of engagement, Latency measurement, TAO consistency at Facebook

  • The Hierarchy of Engagement : another excellent Greylock Partners slide deck on how to leverage the Hierarchy of Engagement to fuel the growth of your company. The proposed hierarchy model has three levels: 1) Growing engaged users, 2) Retaining users, and 3) Self-perpetuating.
  • Measuring and Understanding Consistency at Facebook : summary of the paper on Facebook's highly consistent DB, TAO. The interesting thing is that they have a hierarchical consistency model with synchronous cache consistency and an asynchronous cache and DB/storage invalidation model.
  • How NOT to Measure Latency : in-depth overview of latency and response time characterization, including proven methodologies for measuring, reporting, and investigating latencies, and an overview of some common pitfalls encountered (far too often) in the field.

Saturday, March 26, 2016

PureStorage brings us one step closer to micro storage architecture

Pure Storage just released its FlashBlade product. It is a fabric-connected object storage solution, built as a modular system composed of a large number of blades, each made of:
  • 8TB to 52TB raw NAND storage capacity : a lot, but it still takes less than half the real estate on each blade.
  • NV-RAM + supercapacitor write buffer : when your NAND is still too slow, you want a persistent NVRAM buffer to handle the bursts.
  • ARM CPU + FPGA : to deal with the “low level” operations such as erasure coding, etc.
  • 8 core Xeon System on chip : for moving the computation to where the data is located, handling pretty much all the high level operations such as NFS, S3, object storage, etc.
  • 40 Gbit ethernet : that’s where the data gets out.
  • PCIe fabric networking : an in-chassis solution linking the compute and storage cards via a proprietary protocol. What’s interesting is that the system is self-contained, and scaling out to other boxes goes through the 10 Gb/s connectivity and not a proprietary fabric link. This implies that it doesn’t need an exotic solution once you go past the box boundaries. This is great as it makes it easy (and cheap) to scale; however, I wonder what the implications are in terms of performance once you start crossing those boundaries.

What is interesting is that, when you look at the Pure Storage solution, they decided to integrate the high level compute aspect of storage directly with the low level one in a single blade. They ended up with a hybrid solution combining an ARM CPU and FPGA for low level aspects such as deduplication and erasure coding, and a Xeon for the object storage and file system layers.

One can assume that the decision behind such an architecture was driven by customer requirements, which tend to call for a high performance jack-of-all-trades solution. I can picture the product manager arguing for supporting every scale-out storage protocol popular at the moment. However, Jack always ends up master of none, and to compensate Pure Storage had to pump up the compute capabilities.

While this seems like a good choice, it is also counterproductive in terms of Watts per GB, with a lot of real estate wasted or duplicated. Don’t get me wrong, what Pure achieved with the FlashBlade is impressive, but I can’t stop thinking that they should have taken it a step further.

This type of high-performance, high-cost and high-power architecture is a step in the right direction toward a micro storage architecture that delivers low cost, low power, high performance and scalability. Now it is all about trimming down the system while maintaining scalability by dividing the blade system into a much larger number of smaller nodes, literally offering the equivalent of HGST's Ethernet-connected drives, but with flash.

However, this might also imply that you won’t be able to support every single storage protocol out there (NFS, S3, block, etc.) without relying on either client-side processing or a frontend. This should be achievable while maintaining excellent performance; the key will lie in the details of the core storage API employed.

Friday, March 25, 2016

[Links of the day] 25/03/2016: Scheduling with queuing theory, LLVM Assembler Framework, Erasure Coding at Azure

  • Efficient Queue Management for Cluster Scheduling : MS researchers look into introducing queue management techniques, such as appropriate queue sizing, prioritization of task execution via queue reordering, starvation freedom, and careful placement of tasks to queues, for big data cluster scheduling.
  • Keystone : an Indiegogo project to refactor LLVM to build a multi-architecture, multi-platform, open source assembler framework.
  • Erasure Coding in Windows Azure Storage : MS Azure uses Local Reconstruction Codes (LRC) for its storage. LRC greatly reduces the number of erasure coding fragments that need to be read for reconstruction when data is lost or offline. The key benefit is a drastic reduction in the I/O and bandwidth required for repairs while keeping the storage overhead low.
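
To make the local reconstruction idea concrete, here is a minimal toy sketch. It assumes plain XOR as the local parity and leaves out the global parities a real LRC also carries, so it is not the Azure implementation; the function names and the 6-fragment layout are made up for illustration. The point it tries to show is that repairing one lost fragment only reads its local group, not every fragment.

```python
from functools import reduce


def xor_bytes(*blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_local_groups(fragments, group_size):
    """Return one XOR parity per local group of `group_size` data fragments."""
    return [
        xor_bytes(*fragments[i:i + group_size])
        for i in range(0, len(fragments), group_size)
    ]


def repair(fragments, parities, lost, group_size):
    """Rebuild fragment `lost` from its local group only.

    Returns the rebuilt fragment and the number of fragments read.
    """
    group = lost // group_size
    start = group * group_size
    peers = [fragments[i] for i in range(start, start + group_size) if i != lost]
    return xor_bytes(*peers, parities[group]), len(peers) + 1


# Toy layout (assumption): 6 data fragments split into 2 local groups of 3.
data = [bytes([i]) * 4 for i in range(1, 7)]
parities = encode_local_groups(data, group_size=3)

# Pretend fragment 4 is offline: only 2 peers + 1 local parity are read.
rebuilt, reads = repair(data, parities, lost=4, group_size=3)
```

With a single parity spanning all 6 fragments, the same repair would have to read 6 fragments instead of 3, which is the I/O and bandwidth saving the bullet above refers to.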

Thursday, March 24, 2016

[Links of the day] 24/03/2016: Testing distributed systems, SDN OS, HW/SW for Storage Class Memory

  • Technologies for Testing Distributed Systems : testing distributed systems is hard, and unit testing does not really cut it when it comes to Byzantine faults.
  • ONOS : Open Network Operating System (ONOS) is a software defined networking (SDN) OS
  • WrAP : Hardware and Software Support for Atomic Persistence in Storage Class Memory

Wednesday, March 23, 2016

[Links of the day] 23/03/2016: containers patterns, delta CRDTs, probabilistic DB

  • Container Patterns : WiP but promising documentation of container patterns. Check the v1.0 branch.
  • Efficient State-based CRDTs by Delta-Mutation : instead of shipping the full state of a CRDT, the authors propose to use delta-based messages in order to reduce storage and network overhead.
  • BlinkDB : allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars. Really cool, we are starting to see the emergence of probabilistic programming everywhere. We just have to get used to the idea that, like real life, computer programs can be more efficient when not everything is certain.
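
As a rough illustration of the delta-mutation idea from the CRDT paper above, here is a minimal sketch of a grow-only counter whose mutator returns only the entry that changed, rather than the whole state. The class and method names are hypothetical and not taken from the paper; causal delivery and delta buffering, which a real system needs, are left out.

```python
class DeltaGCounter:
    """Grow-only counter whose mutations return small deltas instead of full state."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> highest count seen for that replica

    def increment(self, amount=1):
        """Apply a local increment and return the delta to ship to peers."""
        new_value = self.counts.get(self.replica_id, 0) + amount
        self.counts[self.replica_id] = new_value
        return {self.replica_id: new_value}  # only the entry that changed

    def merge(self, delta):
        """Join a delta (or a full state) using a per-replica max."""
        for replica, count in delta.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self):
        """Total count across all replicas seen so far."""
        return sum(self.counts.values())


a, b = DeltaGCounter("a"), DeltaGCounter("b")
delta_a = a.increment(3)
delta_b = b.increment(2)

a.merge(delta_b)
b.merge(delta_a)
b.merge(delta_a)  # duplicate delivery is harmless because merge is idempotent
```

The delta has the same shape as the state, so the usual state-based merge works unchanged; the saving comes from sending one entry instead of the full map.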

Tuesday, March 22, 2016

[Links of the day] 22/03/2016: VPN chart, Hyperloglog in real life, NLP

  • VPN comparison chart : if you need a VPN with certain characteristics, this chart is for you.
  • HyperLogLog in practice : Google looked at HyperLogLog (one of my favorite probabilistic data structures), optimized it slightly, and came up with an improved version. Great paper with good insight on how to use HLL in the real world and make it work even better.
  • Deep or Shallow, NLP is Breaking Out : natural language processing is getting a boost from deep and shallow learning. We already saw the results with Skype real-time translation, Siri and Cortana. It's just a matter of time before we start to see more and more connected devices with NLP capabilities.
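
For readers who have not played with HyperLogLog, here is a minimal sketch of the classic estimator with a 64-bit hash. It is not the improved version from the Google paper: the empirical bias correction and sparse representation are left out, and the choice of SHA-1 as the hash and of `p=12` are assumptions made for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Classic HyperLogLog estimator with m = 2**p registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias constant from the original HLL analysis, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        index = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Position of the leftmost 1-bit in `rest` (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small cardinalities: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog()
for _ in range(2):  # adding the same items twice must not change the estimate
    for i in range(50000):
        hll.add(f"user-{i}")
estimate = hll.count()
```

With 4096 registers the expected relative error is around 1.6%, for a fixed memory footprint of a few kilobytes regardless of how many items are added.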

Monday, March 21, 2016

[Links of the day] 21/03/2016: Twitter distributed file system, mechanical computer, broadband access

Friday, March 18, 2016

[Links of the day] 18/03/2016: RethinkDB FS and NSDI16 - Load balancer, reconfigurable fabric

  • Google Load Balancer : Google uses an N+1 model for its load balancer instead of the classical active/passive model. Really nice low level network load balancing solution.
  • XFabric : Reconfigurable In-Rack Network for Rack-Scale Computers
  • Regrid : a method of storing large files inside a RethinkDB database. Each file is stored as a series of binary chunks inside RethinkDB
  • NSDI16 : this year's harvest of Usenix networking papers.
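
The chunked-file idea behind Regrid can be sketched without any database at all. The snippet below is only an illustration under stated assumptions: the field names, the chunk size and the checksum are hypothetical and not taken from Regrid, and plain dictionaries stand in for the documents that would be stored in RethinkDB.

```python
import hashlib

CHUNK_SIZE = 255 * 1024  # hypothetical chunk size, not taken from Regrid


def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Return a file record plus the list of chunk documents for `data`."""
    chunks = [
        {"file_id": file_id, "num": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    record = {
        "id": file_id,
        "length": len(data),
        "chunk_size": chunk_size,
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    return record, chunks


def reassemble(record, chunks):
    """Rebuild the file from its chunks, in any order, and verify the checksum."""
    ordered = sorted(
        (c for c in chunks if c["file_id"] == record["id"]),
        key=lambda c: c["num"],
    )
    data = b"".join(c["data"] for c in ordered)
    if hashlib.sha256(data).hexdigest() != record["sha256"]:
        raise ValueError("checksum mismatch")
    return data


payload = bytes(range(256)) * 10  # 2560 bytes of sample data
record, chunks = split_into_chunks("demo-file", payload, chunk_size=1000)
restored = reassemble(record, list(reversed(chunks)))
```

Keeping the chunk number on each document means the chunks can be fetched in any order and still be put back together.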

Wednesday, March 16, 2016

[Links of the day] 16/03/2016: userspace TCP stack, pebcak, neural net

  • mTCP : Highly Scalable User-level TCP Stack for Multicore Systems, really cool project released back in 2014. The really interesting bit is that they now support DPDK natively, which allows for massive performance improvements while bypassing the kernel stack altogether. [github]
  • The problem is between the chair and the keyboard : how do you patch users? A Taxonomy of Attacks and a Survey of Defence Mechanisms for Semantic Social Engineering Attacks
  • Neural Networks Demystified : series of videos explaining neural networks

Tuesday, March 15, 2016

[Links of the day] 15/03/2016: AWS Cross region fault tolerance, words of wisdom from AWS's CTO, #NetflixEverywhere Global Architecture

  • Build Fault Tolerant Cross-Region AWS VPC : how Rackspace deploys a fault tolerant solution on top of AWS multi-region using VPC.
  • 10 Lessons from 10 Years of AWS : words of wisdom from Werner Vogels, CTO of AWS.
  • #NetflixEverywhere Global Architecture : QCon presentation from Netflix Director of Operations Josh Evans. The interesting bit is the focus on data replication across data centers (or availability zones in this case). It seems pretty obvious that Netflix went the right way in scaling the resiliency of their product: start with the primitives, then the data, not the other way around. If the data is not available or consistent, there is always a chance to fall back, at a cost. Whereas if the services are down, having the data available won't help.

Monday, March 14, 2016

[Links of the day] 14/03/2016 : Maths for social computing, Stats for engineers and SSH key distribution

Tuesday, March 08, 2016

[Links of the day] 08/03/2016 : Deeplearning google tech talk, NVMW 2016 workshop

Monday, March 07, 2016

[Links of the day] 07/03/2016 : Unikernel single address spaces, Breaking stuff in production, 10 years of workload scheduling

  • Single address spaces : the unikernel approach of using a single address space can provide a great deal of speed-up; however, it requires rethinking the way code works. It's quite attractive but, as the article explains, there is a trade-off. To be honest, I think that unikernels will be great for the whole synchronous programming model (spdk-dpdk-pmem, etc.), where we have a swarm of small, highly optimized services tied to specific hardware and consumed by other services on more generic compute infrastructure.
  • Breaking Things On Purpose : if you are not ready to break stuff in production, your product is not ready.
  • Borg, Omega, and Kubernetes - ACM Queue : 10 years of evolution of workload scheduling. 

Tuesday, March 01, 2016

[Links of the day] 01/03/2016 : DSSD , Datacenter design [book] and latent faults [paper]

  • DSSD : EMC released the DSSD product (acquired last year) into the wild. Quite a beast: all flash, 10 M IOPS, 100μs latency, 100GB/s BW, 144TB/5U. It uses a PCIe fabric to connect the storage to the compute nodes; however, I expect them to move soon to an InfiniBand / Omni-Path fabric, based on the talk they recently gave.
  • Datacenter Design and Management: book that surveys datacenter research from a computer architect's perspective, addressing challenges in applications, design, management, server simulation, and system simulation.
  • Unsupervised Latent Faults Detection in Data Centers : talk and paper that look at automatically enabling early detection and handling of performance problems, or latent faults. These faults "fly under the radar" of existing detection systems because they are not acute enough, or were not anticipated by maintenance engineers.