Friday, October 31, 2014

Links of the day 31 - 10 - 2014

Today's links 31/10/2014: TCP, kernel, NVMe, network fabric

Thursday, October 30, 2014

Links of the day 30 - 10 - 2014

Today's links 30/10/2014 : userfault, transaction, cloud frontend, virtkick

  • Phaser : Phase Reconciliation for Contended In-Memory Transactions by Neha Narula at MIT [slides]
  • Scaling Address-Space Operations on Linux with TSX : thesis by Christopher Ryan Johnson on using transactional memory to scale address-space operations on multicore systems.
  • VirtKick : A simple orchestrator. Manage virtual machines or #docker [github]
  • Userfault : Andrea Arcangeli released the first RFC for page-fault resolution in userspace. The interesting bit is the ability to handle write and read faults differently. I can foresee some promising spin-offs from this project.

Wednesday, October 29, 2014

Links of the day 29 - 10 - 2014

Today's links 29/10/2014: Ceph, storage, shingle and erasure code

Tuesday, October 28, 2014

Links of the day 28 - 10 - 2014

Today's links 28/10/2014: cloud trading, CERN, snapshot isolation, cloud infrastructure, RDMA to remote HW/device

  • Clock-SI : [paper summary] Snapshot Isolation for Partitioned Data Stores Using Loosely Synchronized Clocks
  • Cloud trading : 6fusion blog post on their approach to trading cloud resources as commodities.
  • CERN Cloud Infrastructure Report : lots of OpenStack; it looks like they are trying to double capacity every year. They are really at the bleeding edge of OpenStack in production and are stress-testing what can be done with it.
  • Peer-Direct support : allows RDMA operations to directly target memory in external hardware devices, such as GPU cards, SSD based storage, dedicated ASIC accelerators, etc.
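To make the Clock-SI idea a little more concrete, here is a minimal, hypothetical Python sketch of its read rule as we understand it from the paper: a transaction takes its snapshot timestamp from the local clock of the partition where it starts, and a read on another partition whose clock lags behind that timestamp is delayed until the lagging clock catches up. Time is simulated with integers, and the `Partition` class, its method names and the clock offsets are assumptions for illustration only. The paper also delays reads of keys that have an in-flight commit, which this sketch deliberately omits.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One partition of a multi-version key-value store (hypothetical sketch)."""
    clock_offset: int  # this partition's clock skew relative to "true" time
    versions: dict = field(default_factory=dict)  # key -> [(commit_ts, value), ...]

    def local_clock(self, true_time: int) -> int:
        return true_time + self.clock_offset

    def write(self, key: str, value: str, commit_ts: int) -> None:
        self.versions.setdefault(key, []).append((commit_ts, value))

    def snapshot_read(self, key: str, snapshot_ts: int, true_time: int):
        """Return (value, true time at which the read completes)."""
        # Delay rule: if this partition's clock has not reached snapshot_ts yet,
        # wait until it does, since a transaction committing here later could
        # still be assigned a commit_ts <= snapshot_ts.
        delay = max(0, snapshot_ts - self.local_clock(true_time))
        # Return the newest version committed at or before the snapshot.
        visible = [v for v in self.versions.get(key, []) if v[0] <= snapshot_ts]
        value = max(visible, key=lambda v: v[0])[1] if visible else None
        return value, true_time + delay


origin = Partition(clock_offset=0)
remote = Partition(clock_offset=-5)  # remote clock lags true time by 5 units
remote.write("x", "v0", commit_ts=1)
remote.write("x", "v1", commit_ts=4)
snapshot_ts = origin.local_clock(true_time=10)  # transaction starts on origin
value, done_at = remote.snapshot_read("x", snapshot_ts, true_time=10)
print(value, done_at)  # v1 15
```

The read waits 5 units because the remote clock (5) is behind the snapshot timestamp (10); loosely synchronized clocks therefore cost some latency but never break the snapshot.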

Monday, October 27, 2014

Links of the day 27 - 10 - 2014

Today's links 27/10/2014: #latex, #twitter anomaly detection toolkit, #git

Friday, October 24, 2014

Links of the Day 24 - 10 - 2014

Today's links 24/10/2014: Failure injection, Virtual Core, Distributed analytic engine, biological circuits
  • Kylin : eBay has released to the open-source community its distributed analytic engine designed to accelerate analytics on Hadoop and allow the use of SQL-compatible tools [eBay blog post]
  • FIT : how Netflix uses Failure Injection Testing to validate its code, architecture and everything else
  • Virtual core : after RISC and CISC, here comes VISC. The VISC architecture is based on the concept of “virtual cores” and “virtual hardware threads.” This new approach enables dynamic allocation and sharing of resources across cores.
  • Rich Test Results by Google: aims to represent most of the data that can be produced by static and dynamic program analysis.
  • Precise and programmable biological circuits : one step closer to biological computers, with the aim of designing small circuits made from biological material

Thursday, October 23, 2014

Links of the day 23 - 10 - 2014

Today's links 23/10/2014: fast TCP socket, HFT, cuckoo filter and sidechains for the Bitcoin blockchain
  • Fastsocket : a highly scalable rewrite of the Linux kernel's socket layer and underlying networking implementation. Nice performance results; however, it's going to be hard to push into the upstream kernel. It might take a while, if it ever happens.
  • 5th Annual Modeling High Frequency Data in Finance Conference : slides from last year's HFT conference
  • Cuckoo Filter - Practically Better Than Bloom : Performance-wise, at the same level of space efficiency, insert speeds are better than a standard Bloom filter's when the hash table is mostly empty (low load), but significantly worse at high load, when the table fills up and many 'evictions' are necessary for each insert. But if a tradeoff in space is acceptable (staying at ~50% load), the cuckoo filter inserts much faster than its Bloom counterpart. Lookup speeds for existing keys are 2-3x those of a standard Bloom filter, and about 1.5x those of the "Blocked" Bloom filter variant. Negative lookups (key not present) are about 2x faster than Bloom at high load, but about 3/4 the speed at low load. (full summary on HN)
  • Pegged Sidechains : in theory, sidechains could address many of Bitcoin's scalability issues and accelerate the rollout of new features
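To make the 'evictions' in the cuckoo filter summary concrete, here is a minimal Python sketch of a cuckoo filter using partial-key cuckoo hashing: each item stores a short fingerprint in one of two candidate buckets, and the alternate bucket is the current bucket XOR a hash of the fingerprint, so a stored fingerprint can be moved without knowing the original item. The bucket count, 4-slot buckets, 8-bit fingerprints, `max_kicks=500`, the use of `blake2b` and the single "victim" slot are illustrative assumptions, not the paper's tuned implementation; the sketch shows the mechanics, not the benchmark numbers.

```python
import hashlib
import random


class CuckooFilter:
    """Minimal cuckoo filter sketch (hypothetical parameters)."""

    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=8, max_kicks=500, seed=0):
        # The XOR trick only round-trips if num_buckets is a power of two.
        assert num_buckets & (num_buckets - 1) == 0, "num_buckets must be a power of two"
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.fp_mask = (1 << fp_bits) - 1
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.victim = None  # (bucket, fingerprint) that could not be placed
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")

    def _alt_index(self, index: int, fp: int) -> int:
        # Alternate bucket depends only on (bucket, fingerprint).
        return (index ^ self._hash(fp.to_bytes(4, "little"))) & (self.num_buckets - 1)

    def _locate(self, item: str):
        """Return the item's fingerprint and its two candidate buckets."""
        fp = self._hash(b"fp:" + item.encode()) & self.fp_mask
        i1 = self._hash(item.encode()) & (self.num_buckets - 1)
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: str) -> bool:
        if self.victim is not None:
            return False  # filter is already full
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random fingerprint and move it to its other bucket.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Park the homeless fingerprint so no inserted item is lost; later inserts fail.
        self.victim = (i, fp)
        return True

    def __contains__(self, item: str) -> bool:
        fp, i1, i2 = self._locate(item)
        if fp in self.buckets[i1] or fp in self.buckets[i2]:
            return True
        return self.victim is not None and self.victim[1] == fp and self.victim[0] in (i1, i2)

    def delete(self, item: str) -> bool:
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


demo = CuckooFilter()
demo.insert("alice")
print("alice" in demo)  # True: no false negatives for inserted items
```

A lookup touches at most two buckets, while an insert at high load may walk a long chain in the kick loop, which is why insert speed drops as the table fills. Unlike a standard Bloom filter, `delete` is possible, provided it is only called for items that were actually inserted.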

Wednesday, October 22, 2014

Links of the day 22 - 10 - 2014

Today's links 22/10/2014: Twitter #NLP, Neural Turing Machine, Mesos, FB architecture

  • Tweet NLP : Twitter natural language processing. It will help you understand and translate your teenage daughter's tweets (by Carnegie Mellon University)
  • Neural Turing Machines : couples a neural network with external memory, by analogy with a Turing machine or von Neumann architecture, by Google
  • Mesos Deep Dive : architecture deep dive into Apache Mesos
  • FB architecture : Facebook and the KISS approach to datacenter

Tuesday, October 21, 2014

Links of the day 21 - 10 - 2014

Today's links 21/10/2014: all about #Linux #networking with a little bit of #HPC distributed #storage

  • State of Linux network stack : what's new and interesting in the latest kernel release, especially the low-latency device polling
  • KVM Forum : all videos from this year's KVM Forum. Some interesting talks, especially on the HPC front, and an interesting quote from Vincent Jardin: "if you want to have high performance networking or NFV solution, don't use virtualization, use container"
  • RDMA and ARM : Mellanox brings its RoCE adapter to HP's Moonshot project. It will be interesting to see what types of application leverage such an architecture combination: lots of small processors with a fast fabric.
  • IX : a solution that comes close to achieving the holy grail of networking: low latency with high throughput (line rate)
  • (Fast Forward) Storage and I/O : Distributed Application Object Storage (DAOS) by Intel for HPC. Lots of flash and burst buffers with Lustre for supercomputers. A very interesting approach to the challenges of future exascale computing platforms.

Monday, October 20, 2014

(Big) Data is a double-edged sword

Previously, we looked at how not to fall for the mirage of unicorn hunting in your “big data” and why you should not delay adopting data science techniques in your business operations. In this post we will look at why data can be both your best friend and your worst enemy.

Data is a double-edged sword.

The enterprise with the best data gains a significant advantage over its competitors; consequently, enterprises should seek to amass as much data as possible. As we previously learned, leveraging its own data allows an enterprise to gain a competitive edge on the chessboard. However, more often than not, enterprises face a big dilemma: who generates, and consequently who owns, this precious information? Quite often most of it originates from the customer, and in order to address this issue and repatriate those precious data points back to the mothership, enterprises leverage the XaaS model.

These "X" (anything) as a service products benefit consuming companies by lowering the cost of operations and reducing or eliminating CAPEX. To a certain extent, they also provide data aggregation, market comparisons and a range of other useful capabilities. While these services help the deployer lower costs and ease product implementation / service delivery, the real beneficiary is in fact the XaaS provider.
The provider can then leverage this information by monitoring consumer behaviour and usage of its product in order to identify the spread of successful new innovations. This is basically what Amazon and others have been applying quite successfully over the past decade; it is known as the Innovate-Leverage-Commoditise (ILC) model. In certain extreme cases, they even enter a market not to make money but simply to collect more data to drive other parts of their business.

As you can see, you have to control which data you need to keep and which you can leak or generate for a third party. Without this understanding, your enterprise might end up being exploited, becoming a mere puppet within a bigger ecosystem that you do not own. In fact, more often than not, the service provider is a wolf in sheep’s clothing: he presents himself as wanting to ‘help out’, but unfortunately his intentions are driven less by collaboration than by exploitation.
Enterprises therefore face a dilemma: they have to adopt and consume XaaS in order to stay competitive, while trying to avoid leaking their innovation by feeding the ecosystem with more information. One efficient way to counter the latter is to form their own ecosystem and leverage data from it, which in turn enables them to partially work around the enterprise’s inherent innovation limitations. However, this is often easier said than done.

The data gathered is as important as the data generated, as either can make or break an enterprise. Creating one’s own ecosystem to draw information from will quickly become critical, as an enterprise cannot rely solely on a single source of information to stay competitive.
Maybe what we will begin to see in the near future is the emergence of information exchange or even data collectivism among enterprises (a behaviour triggered by collective prisoner's dilemma) in order to counterbalance the mastodons of data vacuuming, such as Google or Amazon. 

Links of the day 20 - 10 - 2014

Today's links 20/10/2014 : Deep learning, Distributed System Model Checking, Man-in-the-middle SSL attack, Bufferbloat Benchmark
  • Stanford Unsupervised Feature Learning and Deep Learning Tutorial : everything is in the title :) 
  • TLA : domain-specific language for specifying and verifying complex concurrent systems, by Leslie Lamport.
  • SSLsplit : tool for man-in-the-middle attacks against SSL/TLS encrypted network connections. 
  • RRUL test suite : bufferbloat benchmark suite for analyzing network performance under the heavy workloads that typically induce bufferbloat and other networking problems.

Wednesday, October 15, 2014

Links of the day 15 - 10 - 2014

Today's links 15/10/2014: Resiliency, Ceph, storage, RDMA, NVM

Tuesday, October 14, 2014

Links of the day 14 - 10 - 2014

Today's links 14/10/2014: Amazon monopoly question, Cloud, Velocity conf, OSv - CloudOS
  • It's not my problem, I'm renting them : a very good introduction to the cloud, and to how the cloud shields you from low-level issues. For example, what about SSD wear? Don't care, I'm renting them. Scott Hanselman: "Virtual Machines, JavaScript and Assembler" keynote, Velocity Santa Clara 2014
  • Velocity conference : all videos of the excellent Velocity conference.
  • Amazon Is Not a Monopoly vs Amazon Must Be Stopped : compelling arguments on both sides, but both completely ignore Amazon's AWS and cloud strategy, which suggests they probably misunderstand a large portion of Amazon's model.
  • OSv : Optimizing the Operating System for Virtual Machines using the OSv cloud OS - USENIX 2014

Friday, October 10, 2014

Links of the day 10 - 10 - 2014

Today's links 10/10/2014: #MachineLearning, #BigData, Distributed Computation
  • Recommender Problem : under the hood of Netflix's recommendation system. Apparently they spend more than $150M on their recommendation system. [slides]
  • ONYX : a cloud scale, fault tolerant, distributed computation system written in Clojure, for Clojure
  • Understanding Random Forests: a good introduction to starting from a set of measurements and learning a model to predict and understand a phenomenon

Thursday, October 09, 2014

Links of the day 09 - 10 - 2014

Today's links 09/10/2014: Scaling machine learning, bitcoin and bribes
  • Parameter Server : the parameter server makes it easy to scale machine learning algorithms by separating the problem of processing data from the problem of communicating and synchronizing model parameters between machines. [USENIX paper]
  • Bribes and Bitcoin : how to accept bribes with Bitcoin, and why countries with a high corruption index embrace it faster than others.
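The separation the parameter server bullet describes can be illustrated with a toy, single-process Python sketch: hypothetical "workers" each compute a least-squares gradient on their own data shard and push it to a `ParameterServer` object, which owns the weights, averages the pushed gradients and lets workers pull the updated model. The class and function names, the synchronous update rule and the noise-free linear-regression task (y = 2 + 3t) are assumptions for illustration; the real system distributes the server itself across machines and discusses relaxed consistency models, none of which is attempted here.

```python
class ParameterServer:
    """Holds the shared model; workers interact with it only via push/pull."""

    def __init__(self, dim: int, lr: float):
        self.weights = [0.0] * dim
        self.lr = lr
        self.pending = []  # gradients pushed since the last update

    def pull(self):
        return list(self.weights)  # workers get a copy of the current model

    def push(self, grad):
        self.pending.append(grad)

    def apply_updates(self):
        # Synchronous step: average all pushed gradients, then take one SGD step.
        n = len(self.pending)
        for j in range(len(self.weights)):
            self.weights[j] -= self.lr * sum(g[j] for g in self.pending) / n
        self.pending.clear()


def worker_gradient(weights, shard):
    """Mean-squared-error gradient computed locally on one worker's shard."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2 * err * xi / len(shard)
    return grad


# Noise-free data for y = 2 + 3t, with a constant 1.0 feature for the intercept.
data = [((1.0, i / 19), 2.0 + 3.0 * i / 19) for i in range(20)]
shards = [data[k::4] for k in range(4)]  # 4 hypothetical workers, 5 points each
server = ParameterServer(dim=2, lr=0.5)
for step in range(300):
    for shard in shards:  # in a real deployment these would run in parallel
        server.push(worker_gradient(server.pull(), shard))
    server.apply_updates()
print([round(w, 3) for w in server.weights])  # [2.0, 3.0]
```

Workers never talk to each other: all communication and synchronization goes through push/pull, which is the part a real parameter server distributes and optimizes.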

Wednesday, October 08, 2014

Links of the day 08 - 10 - 2014

Today's links 08/10/2014: Power 8, NUMA, VM flash cloning, state of bitcoin
  • Project Fargo / VMfork : VMware's flash cloning solution, allowing very fast VM cloning (seconds vs. minutes). VMware needs to push that if it wants to keep its hypervisor relevant compared to Docker.
  • IBM Power 8 server : an impressive beast with a proprietary NUMA interconnect that can reach up to 192 cores and 16 TB of RAM by linking 4 nodes together.
  • State of Bitcoin Q3 2014 : adoption, usage, ATMs and investment are up, but market cap is down (probably due to the recent massive sell-off)

Tuesday, October 07, 2014

Links of the day 07 - 10 - 2014

Today's links 07/10/2014: #bigdata, RDMA, in-memory everything, log, #Openstack storage, unreliable network coding

Monday, October 06, 2014

Links of the day 06 - 10 - 2014

Today's links 06/10/2014: storage and docker, automation, and human interaction.
  • Storage Scalability in Docker : a look into how choice of storage affects scalability, container start up time, stability, supportability 
  • Boxen : automates the deployment of development environments (mainly for Mac OS X), by the GitHub crowd.
  • Talking to Humans : (you can download the book for free here)

Friday, October 03, 2014

Links of the day 03 - 10 - 2014

Today's links 03/10/2014: Cloud regulations, OO Linux kernel, scaling SSL, DBMS architecture principles
  • Cloud computing/security regulations : by country, mashed up on a Google Map
  • BOSS-MOOL : Minimalistic Object Oriented Linux, a redesign of the kernel with object-oriented abstractions and C++ driver support, aiming to increase maintainability while reducing the kernel's complexity.
  • Scaling Universal : Cloudflare is able to reduce the CPU usage of Universal SSL to almost nothing.
  • Architecture of a Database System : paper presenting an architectural discussion of DBMS design principles, including process models, parallel architecture, storage system design, transaction system implementation, query processor and optimizer architectures, and typical shared components and utilities.
Evolving database landscape 

Thursday, October 02, 2014

Links of the day 02 - 10 - 2014

Today's links 02/10/2014: Binary Analysis Platform, scheduling, Thesis

Wednesday, October 01, 2014

Links of the day 01 - 10 - 2014

Today's links 01/10/2014 : consensus, #startup, #bigdata, #machinelearning

Intelligence cannot be commoditized

In the first posting, we saw how the enterprise world needs to be realistic in its expectations of data science tools. In this second posting, we will look at why it is still essential to embrace them sooner rather than later, or else risk suffering dire consequences.

Companies must understand that they need to rapidly embrace data analytics methods, as these are essentially the next stage of evolution in the enterprise toolkit for understanding and leveraging information to gain a competitive advantage. While enterprises should not just jump in and sprinkle “big data” everywhere, they also should not wait too long to embrace these technologies, because intelligence cannot be commoditized.

Almost everything else commoditizes over time, albeit at different paces, evolving from custom solution to product and finally to utility. However, for the intelligence derived from data science tools, the opposite actually happens: data and knowledge improve through use and the accumulation of information, and an enterprise cannot hope to simply obtain its own as a commodity at some point.

In short, whoever has the best data (and uses it) wins. If you wait too long to collect it and learn from it, you are giving your competitors an advantage that you may never recover from. By adopting such techniques early on, an enterprise gains an edge over the rest of the actors within its ecosystem, as they will then have to catch up. Moreover, big data, machine learning tools and analytic models will very quickly become widely available as well as affordable, and these same tools will rapidly devolve into just another weapon in the market’s Red Queen race.

As you can see, given this intrinsic nature of data and intelligence, it can be rather dangerous for an enterprise to delay adopting these novel methods. In the following posting, I will look into the double-edged sword that is the data itself.