Friday, April 29, 2016

[Links of the day] 29/04/2016 : End of numerical error, 504 Eth Drive Ceph Cluster, Modern Storage architecture

  • End of Numerical Error : Really cool concept for encoding floating point numbers. It seems really promising: faster, lighter, and with a significantly reduced error rate. Unums should probably be the next default encoding of the future [Julia implementation]. PS: the non-associative property of floats is really scary.
  • Ceph cluster with 504 ethernet drives : Well, it's starting to happen, and it might quickly take over all these pesky storage clusters out there.
  • Modern Storage Architectures : Intel Developer Forum slide deck looking at the future of storage class memory and its impact on storage architecture.
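
For anyone who has not been bitten by the non-associativity of floats mentioned above, here is a minimal sketch in plain Python (standard IEEE 754 doubles, nothing Unum-specific; the variable names are just for illustration). The same three numbers added in a different grouping give different answers:

```python
# Floating point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.1 + 0.2 is rounded first, then 0.3 is added
right = a + (b + c)   # 0.2 + 0.3 is rounded first, then 0.1 is added
print(left, right, left == right)   # 0.6000000000000001 0.6 False

# The effect is far more dramatic when magnitudes differ:
# near 1e16 the spacing between doubles is 2, so adding 1.0 is lost.
big = 1e16
absorbed = (1.0 + big) + (-big)    # the 1.0 is swallowed by rounding
preserved = 1.0 + (big + (-big))   # the big terms cancel first
print(absorbed, preserved)          # 0.0 1.0
```

This is why the order of a parallel reduction can change a numerical result from one run to the next.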

Wednesday, April 27, 2016

[Links of the day] 27/04/2016 : Containers at CERN, Nvidia Tesla and Usenix Hotcloud16

  • Containers and Orchestration in the CERN Cloud : interesting talk looking at how CERN embraces containers on its cloud platform, and how they leverage Magnum, which relies heavily on Heat for the orchestration part of the container clusters (called bays). I would be curious to find out how they scaled Heat and what pitfalls come with this approach.
  • NVIDIA Tesla P100 : Nvidia whitepaper on the next generation accelerator. Impressive specs; however, I always feel that there is still significant room for making this type of resource easier to consume.
  • HotCloud16 : Usenix program is out, with some interesting papers, especially these two:
    • Cloud Spot Markets are Not Sustainable: The Case for Transient Guarantees
    • Mlcached: Multi-level DRAM-NAND Key-value Cache

Tuesday, April 26, 2016

[Links of the day] 26/04/2016 : Quantum Bitcoin, Non-transactional distributed systems consistency and Flame Graph

  • Consistency in Non-Transactional Distributed Storage Systems : This paper tries to provide a comprehensive overview of the different consistency notions that have appeared in distributed systems, with a focus on storage systems. The authors define 50 different consistency notions, ranging from linearizability to eventual and weak consistency, which are subsequently mapped to different practical systems and research prototypes.
  • Flame Graph : ACM Queue article on the necessity of good visualization tools for profiling, debugging, and optimizing computing systems.
  • Quantum Bitcoin : In this paper the authors describe a Bitcoin-like currency that runs on a quantum computer and relies on the quantum no-cloning property. While not everybody has (yet) the luxury of owning a quantum computer, the solution presented offers several advantages over classical Bitcoin, including immediate local verification of transactions with full anonymity for the users, no transaction fees, and the ability to scale to any transaction volume. Now you just need a quantum computer :)

Monday, April 25, 2016

[Links of the day] 25/04/2016: Approximate computing benchmark suite, Awesome Network analysis and complex network book

Network of U.S. political blogs by Adamic and Glance (2004) (preprint)

Tuesday, April 19, 2016

[Links of the day] 19/04/2016: All CPU docs, Linux scheduling waste and probabilistic programming

  • Decade of Wasted Cores : the main victims of the forever war of Linux scheduler optimisation are your CPU cores. Well, not really, but as always the jack-of-all-trades default config is a master of none, which implies that as soon as you have a specific workload you need to spend the time to optimise for it, and sometimes that optimisation doesn't exist. This paper looks into the impact of the Linux scheduler's policies and design.
  • Pamela : Probabilistic Advanced Modeling and Execution Learning Architecture
  • Awesome CPU : All CPU and MCU documentation in one place

Monday, April 18, 2016

[Links of the day] 18/04/2016 : Persistent Memory file system and kernel, CS education

  • pmFS : persistent memory file system, no longer in development, but for anybody curious about persistent and storage class memory this would be a good place to start
  • Kernel persistent memory : instructions for working with persistent memory code in Linux
  • CS education : compiled resources for computer science classes by Google

Wednesday, April 13, 2016

[Links of the day] 13/04/2016: distributed system debugging tool, SRE book note, resilient ad serving at scale

  • Shiviz : tool for debugging distributed systems at scale by visualizing the logs generated throughout the cluster [repo]
  • Notes on Google's Site Reliability Engineering book : this provides a good overview of the content of the book, with enough detail to research each aspect / chapter independently.
  • Resilient ad serving at Twitter-scale : it's interesting to see that there is a correlation between query latency and revenue for ad serving. This stems from the fact that the latency for answering an ad query depends on the number of participants in the auction, and obviously the more participants, the higher the revenue. However, the higher the latency, the higher the risk of a timeout and hence of lost revenue. Twitter uses an adaptive system in order to maximise revenue while maintaining resiliency (availability), scalability and resource utilization.
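
To make the latency versus revenue trade-off in the ad serving item concrete, here is a toy sketch. Everything in it is an assumption of mine for illustration and not Twitter's actual model: bids are taken as uniform on (0, 1), the chance of answering before the timeout is assumed to fall linearly with the number of participants, and `expected_revenue` and `capacity` are hypothetical names.

```python
def expected_revenue(n, capacity=100):
    """Toy expected revenue for an auction with n participants.

    Hypothetical model, for illustration only:
    - the winning bid is the expected maximum of n uniform(0, 1) bids,
      which is n / (n + 1), so more participants means a higher bid;
    - the probability of answering before the timeout drops linearly
      with n and reaches zero at `capacity` participants.
    """
    winning_bid = n / (n + 1)
    p_on_time = max(0.0, 1 - n / capacity)
    return winning_bid * p_on_time

# Revenue rises with the first few participants, then timeouts dominate.
best_n = max(range(1, 100), key=expected_revenue)
print(best_n, round(expected_revenue(best_n), 3))  # 9 0.819
```

Under these made-up assumptions the sweet spot sits well below the maximum number of participants, which is the kind of balance an adaptive system would have to keep tracking as load changes.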

Tuesday, April 12, 2016

[Links of the day] 12/04/2016: distributed systems class, docker for HPC and nice monitoring system

  • netdata : nice monitoring system for a single node
  • distributed systems class : a must-read distributed systems class by Aphyr (Kyle Kingsbury). It pretty much covers everything you need to know as an introductory class and gives you enough pointers to then dig deeper into the wonderful world of distributed systems. But beware, like Alice in Wonderland, you never know how deep the rabbit hole goes.
  • Shifter : Docker containers seem to have some limitations when it comes to running in HPC environments. For example, the inability to run on diskless nodes is a significant limiting factor. Shifter was created using Docker technology, with HPC as the main deployment environment in mind, to specifically address these shortcomings.

Monday, April 11, 2016

[Links of the day] 11/04/2016: Rust Distributed K/V store, Consensus in Cloud and interactive service tail latency

After a short hiatus, the links of the day are back:

  • Target-Driven Parallelism : Microsoft researchers look into using prediction and correction to reduce tail latency in interactive services.
  • Consensus in the Cloud : a very good technical report on systems using Paxos and the advantages and disadvantages associated with its use.
  • tikv : Distributed key-value store written in Rust. It uses Raft to deliver consistency and scalability, coupled with a nice geo-replication capability. Cherry on top: distributed transactions are supported, similar to Google Spanner.