Reflections Of The Void: 2014

Tuesday, December 23, 2014

Links of the day 23 - 12 - 2014

Today's links 23/12/2014: Erasure code library and #cloud #storage that use them ( #swift , #ceph ), warehouse scale computing and scheduling

Erasure code library : if you need a refresher on the subject [ erasure code for storage ]

liberasurecode : Erasure Code API library written in C with pluggable Erasure Code backends.
Jerasure : Too bad that Streamscale decided to shut down this project using patent coercion methods [project homepage with announcement]. But the GitHub repository is still up :)

Cloud storage using Erasure code:

Swift : interesting that they chose AP over C ( CAP theorem ) and use liberasurecode
Ceph : uses Jerasure. I wonder what the implications of the partial amputation of the project will be for Ceph performance.

Next Gen Erasure code : erasure codes are mapped deterministically using a global namespace scheme, eliminating the need to access a metadata repository and hence speeding things up
A few control issues in warehouse-scale computing : John Wilkes (Google), #cloud Control Workshop, London, UK, December 2014
Scheduling for Large Scale Systems : 9th Scheduling for Large Scale Systems Workshop - July 1-4, 2014 - Lyon, France
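
As a refresher on the idea behind the erasure code libraries above, here is a minimal sketch of the simplest possible erasure code: k data fragments plus a single XOR parity fragment, which survives the loss of any one fragment. This is not the Reed-Solomon style coding that liberasurecode or Jerasure provide, and the function names (`encode`, `recover`, `decode`) are made up for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    """Split data into k zero-padded fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]


def recover(frags: list) -> list:
    """Rebuild at most one missing fragment (marked as None) from the survivors."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    frags = list(frags)
    if missing:
        # XOR of all surviving fragments equals the lost one
        frags[missing[0]] = reduce(xor_bytes, [f for f in frags if f is not None])
    return frags


def decode(frags: list, size: int) -> bytes:
    """Repair if needed, drop the parity fragment and strip the padding."""
    return b"".join(recover(frags)[:-1])[:size]
```

Real storage systems use codes that tolerate several simultaneous losses; the single-parity case only shows why a lost fragment can be recomputed from the others.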

Monday, December 22, 2014

Links of the day 22 - 12 - 2014

Today's links 22/12/2014: #azure postmortem, #machinelearning and programmer resources list, #robot knowledge engine

Root Cause Analysis : Nov 18 Azure Storage Service Interruption detailed postmortem.
10 Technical Papers Every Programmer Should Read (At Least Twice) : some good papers, basically the minimum reading list that any CS student should have gone through by the end of their university cycle.
200 machine learning and data science resources : looks like everybody loves to make lists when the year comes to an end.
Robot brain : Knowledge Engine For Robots

Thursday, December 18, 2014

Links of the day 18 - 12 - 2014

Today's links 18/12/2014: #devops infra provisioning model, #blockchain Distributed apps stack, anonymous group communication

7 Layer DIP : 7-layer, OSI-like model that decomposes the DevOps Infrastructure Provisioning environment.
Eris : blockchain-based Distributed Application Software Stack that aims at enabling the design, test, and deployment of distributed applications which are as flexible, user-friendly, and legally compliant as they are secure.
Dissent : practical, accountable anonymous group communication system offering strong, provable security guarantees with reasonable efficiency

Wednesday, December 17, 2014

Links of the Day 17 - 12 - 2014

Today's links 17/12/2014: HP's machine, #datacenter, #bigdata, stream computing, #bitcoin dev guide

Tigon : open-source, real-time, low-latency, high-throughput stream processing framework from Cask Data, Inc. and AT&T. Interesting to see how companies are now all releasing open source components. Open source projects are now just another economic weapon in the corporate arsenal.
Bitcoin Developer Guide : detailed information about the Bitcoin protocol and related specifications
The Machine: HP datacenter scale computer based on memristor technology
The HP Memristor Solution for Computing Big Data : Stanley Williams lecture on HP "machine"

Tuesday, December 16, 2014

Links of the day 16 - 12 - 2014

Today's links 16/12/2014: VRAM fs, Mesh Network, #NuPic , free courses, #Machinelearning

VRAMfs : video RAM file system. Now you know what to do with all your old graphics cards lying around.
Freedom Layer : quest for a scalable, secure and distributed mesh network.
NuPic : Numenta is ramping up its community investment

2014 Fall Hackathon
NuPIC Studio : all-in-one tool that allows users to create an HTM network from scratch, train it, collect statistics
Jeff Hawkins Hackathon Kickoff video

A Master List of 1,100 Free Courses From Top Universities : 1100 free online courses from the leading universities

Monday, December 15, 2014

Links of the day 15 - 12 - 2014

Today's links 15/12/2014 : Code Monkeys, Cache Monitoring, Distributed system design and Elliptic curve crypto for Rust

Not Just Code Monkeys : Martin Fowler keynotes on the importance of building a healthy social environment where software development can thrive.
Intel Cache Monitoring : enables threads, applications, VMs or any combination to be tracked simultaneously in a flexible manner to suit a wide variety of software usage models. Comes with some really nice tools and, especially, KVM support
Introduction to Distributed System Design : Google Code University distributed system lecture and notes.
Elliptic Curve Crypto : pure Rust implementation; now you can have an identity-based crypto solution in pure Rust.

Friday, December 12, 2014

Links of the day 12 - 12 - 2014

Today's links 12/12/2014 : #python document DB and pipelines / batch jobs orchestration, RAM #Cloud

BitzDB : document-oriented database for Python
Luigi : Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in. From spotify
ARC : Analysis of Raft Consensus, in depth technical report analyzing the raft consensus by Heidi Howard
RamCloud : this is a really interesting project that I have reported on a couple of times already. They seem to be moving to the next stage of the project, which is probably more about productization than research.

Thursday, December 11, 2014

Links of the day 11 - 12 - 2014

Today's links 11/12/2014: OS conf, #Unikernel : #Rump , #jitsu , atlas #devops, BPG image format

New Directions in Operating Systems conference: videos, slides and audio of the 2014 conference. Selected talks :

Rump : Rump kernels and {why,how} we got here
Rumprun : unikernel for Posix application on top of rump
jitsu : Just-in-Time Summoning of Unikernels
Userland Networking and Userspace networking : when the network stack moves out of the kernel

BPG : Fabrice Bellard's new image format, based on HEVC and supported by most browsers through a small JavaScript decoder. You can see the comparison with other formats here
Atlas : bridges the gap between the two sides of the devops split by bundling all 5 HashiCorp tools into a single SaaS solution. Really nice toolkit.

Wednesday, December 10, 2014

Links of the day 10 - 12 - 2014

Today's links 10/12/2014: #openstack , VM placement , #Cloud , #HPC and transactional app delivery and update from #Ubuntu

Snappy : Ubuntu Core, with a new mechanism for transactional app delivery and system updates ( "snappy" ). This is in direct competition with rpm-ostree from Project Atomic or NixOS
Ironic + Crowbar for openstack : another step toward making bare metal provisioning as easy as provisioning a VM
Job packing in warehouse-scale computing : more VMs or jobs in less space.
HPC aware VM placement : VMs packing for HPC use case

Tuesday, December 09, 2014

Links of the day 9 - 12 - 2014

Today's links 9/12/2014: #quantum computing, #LLVM conf, #HPC fault tolerance

Future Of Quantum Computing : Vern Brownell (CEO, D-Wave) insights in quantum computing and where it is headed.
LLVM Developers' Meeting : video and slides of the 2014 conference
Fault-tolerant Techniques for HPC : tutorial on the techniques and practical use of fault tolerance approaches for large scale HPC systems.
Failures at Petascale : Lessons Learned From the Analysis of the peta scale Blue Waters HPC system

Monday, December 08, 2014

Links of the day 8 - 12 - 2014

Today's links 8/12/2014: stream processing, documentation, #machinelearning , #bigdata

Write the docs : a place where the art and science of documentation can be practiced and appreciated
Mantis : Netflix's Event Stream Processing System
ACID Stream Processing : transactional stream processing system that supports full ACID properties without compromising scalability and high throughput
Machine learning :

when machine learning goes bad : when using ML to do preemptive maintenance ends up being counterproductive and really expensive.
large scale deep learning : lecture deck from Jeff Dean of Google fame

Friday, December 05, 2014

Links of the day 5 - 12 - 2014

Today's links 5/12/2014 : #hotchips , code performance, #SGI UV, #HPC , #KVMGT

Hotchips : video and slides for 2014 edition.
KVMGT : the implementation of #Intel GVT-g(full GPU virtualization) for #kvm
Coding for Performance : how aligning of data within structures can help you achieve optimal performance
SGI UV 300 : A single UV 300 chassis (5U) provides 4 Xeon E7 processors, up to 96 DIMMs, and 12 PCIe slots. Connecting 8 of these units together in a single rack, using NUMAlink7 interconnect, then creates a huge pool of resources, up to 480 cores, 24TB DRAM.
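
To make the "Coding for Performance" point about data alignment within structures concrete, here is a small sketch using Python's `ctypes`, which follows the platform's C layout rules. The two structures and their field names are hypothetical; they hold the same four fields, and only the declaration order differs.

```python
import ctypes


class Unordered(ctypes.Structure):
    # Narrow and wide fields interleaved: the compiler-style layout inserts
    # padding before each wider field to keep it naturally aligned.
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("count", ctypes.c_uint64),
        ("kind", ctypes.c_uint8),
        ("size", ctypes.c_uint32),
    ]


class Ordered(ctypes.Structure):
    # Same fields sorted from widest to narrowest: padding only at the tail.
    _fields_ = [
        ("count", ctypes.c_uint64),
        ("size", ctypes.c_uint32),
        ("flag", ctypes.c_uint8),
        ("kind", ctypes.c_uint8),
    ]


def report() -> dict:
    """Return the size in bytes of both layouts on the current platform."""
    return {
        "unordered": ctypes.sizeof(Unordered),
        "ordered": ctypes.sizeof(Ordered),
    }
```

On a typical 64-bit platform the reordered structure is noticeably smaller, so more elements fit in each cache line; exact sizes depend on the platform ABI.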

Thursday, December 04, 2014

Links of the day 4 - 12 - 2014

Links of the day 4/12/2014: #devops tools, Distributed systems and Micron Automata processors

Serverspec : framework using RSpec tests to validate server configuration
Diamond : python daemon that collects system metrics and publishes them to Graphite (and others).
Automata : Micron processor is designed to have complex, unstructured data written to it and to perform graph processing on that data, and provide analytic results back to the user or to the host system.
Distributed systems for fun and profit : distributed programming and systems concepts you'll need to understand commercial systems in the data center.

Wednesday, December 03, 2014

Links of the day 3 - 12 - 2014

Today's links 3/12/2014: #container management , choosing a #nosql platform, future programming , #ARM v8.1 , #Apache drill for #Hadoop

Giant Swarm : Kubernetes alternative
Visual Guide to NoSQL Systems : picking one is not as trivial as what they want you to believe.
Future Programming Workshop 2014 : Video from the workshop
Arm v8 architecture : An overview of the ARMv8.1 architecture extensions (enhancements over current ARMv8). An important feature is the possibility to run a host OS kernel directly in EL2 (Hypervisor mode) saving some extra context switches for KVM.
Drill : new top-level project from Apache, aiming to deliver a schema-free SQL query engine for Hadoop and NoSQL

Tuesday, December 02, 2014

Links of the day 2 - 12 - 2014

Today's links 2/12/2014: #CoreOS #docker Rocket container runtime, #Go Probabilistic data-structure, software define stuff.

Rocket: after lxd and nspawn, CoreOS brings its own container runtime. As long as the API stays stable across the products, who cares? Competition is good, even if I feel that the departure from LXC is premature and that more effort should be poured into it rather than going for the "it's doomed, let's rewrite it" attitude.
Probabilistic Data Structures for Go : When you do not always know what you are getting, trade off between speed/ memory and flexibility.
Software-Defined Application Delivery : Software defined everything.. but more seriously the fragmentation of the software engineering landscape is interesting.
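
To illustrate the speed/memory versus exactness trade-off mentioned for probabilistic data structures, here is a minimal Bloom filter sketch. It is written in Python rather than Go and does not reproduce the linked library's API; the class name, default sizes and the double-hashing scheme are assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Set membership with no false negatives and a tunable false positive rate."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Double hashing: derive all bit positions from two 64-bit values
        # taken from a single SHA-256 digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True means "probably present", False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

The filter never stores the items themselves, which is where the memory saving comes from; the price is that a positive answer may occasionally be wrong.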

Monday, December 01, 2014

Links of the day 1 - 12 - 2014

Today's links 1/12/2014: functional programming pattern, virtual supercomputer on demand and VMware software defined datacenter architecture.

Functional Programming Patterns : nice look into design pattern for Functional programming
Virtual Supercomputer: Massive Solutions announced a beta version of their computational service platform. It provides secure Internet access to high-performance parallel cluster resources and applications on demand.
VMware Software-Defined Data Center : reference architecture providing an overview of the solution and the logical architecture as well as results of the tested physical implementation

Friday, November 28, 2014

Links of the day 28 - 11 - 2014

Today's links 28/11/2014: #Cloud Resource Marketplace, Stream / Event processing, #bigdata benchmark, Facebook Fabric, Software Defined #IaaS

Deutsche Börse Cloud Exchange : market place for trading Compute, Memory and Storage. Interesting but the market place size looks really tiny, the available resources advertised could be added in just one day of daily operation of AWS.
Flafka : Apache Flume Meets Apache Kafka for Event Processing
Terraform : common configuration to launch infrastructure — from physical and virtual servers to email
Facebook fabric: in depth analysis of Facebook data-center network fabric and its implication.
Big Bench : industry standard benchmark for big data analytics.

Thursday, November 27, 2014

Links of the day 27 - 11 - 2014

Today's links 27/11/2014: #statistical #programming, SMP #python, Stumbleupon #bigdata infra

Statistical Programming : Statistical Program Analysis and Synthesis, using GitHub and other massive codebases to probabilistically fill in the blanks in NEW code. Really cool [video]

JSNICE : statistical renaming, type inference and deobfuscation derived from the work above

SMP python : augment new or existing (Python) serial scripts for scalability across parallel hardware
Stumbleupon BigData infrastructure : lot of open source and moving pieces

Wednesday, November 26, 2014

Links of the day 26 - 11 - 2014

Today's links 26/11/2014: #linkedin Goblin, HPC with SR-IOV, #startup hotness

Comet: Realizing High-Performance Virtualized Clusters using SR-IOV Technology
Goblin : LinkedIn tool for simplifying big data ingestion for Hadoop-based warehouses
Hot Startup: startup investment trends using crunchbase data.

Tuesday, November 25, 2014

Links of the day 25 - 11 - 2014

Today's links 25/11/2014: #AWS reinvent video , Opensource #CPU, messaging.

AWS Re:Invent video : 215 videos of 2014 Amazon event.
Open Source Parallel CPU : feel deja vu with some hints of connection machine
Kayos : fast, low cost, fault tolerant messaging and durable queueing system that offers predictable performance and can take advantage of high end dedicated hardware as well as unreliable, commodity infrastructure like EC2. Wow, even if they achieve just half of that it would be great :)

Monday, November 24, 2014

Links of the day 24 - 11 - 2014

Today's links 24/11/2014: algorithmic design, storage startup and cc-numa

Algorithmic design manual : book on how to design algorithms, and analyze their efficacy and efficiency.
Primary Data demo : data virtualization solutions that improve enterprise efficiency and agility.
Bull Mesca BCS Systems : An expandable SMP node for memory-hungry applications using CC-NUMA. Up to 128 cores - 4 modules with IB - 16 NUMA nodes - 3 NUMA levels - RAM 2 TB

Friday, November 21, 2014

Links of the day 21 - 11 - 2014

Today's links 21/11/2014: Shared memory system, Debunking handbook, #IaaS price war

Large shared memory system : Numascale, Supermicro, and AMD have introduced the world's largest shared memory system to date. The system features 5184 CPU cores and 20.7 TBytes of shared memory
debunking handbook : a guide to debunking misinformation, offers practical guidelines on the most effective ways of reducing the influence of myths.
IaaS price war: how Amazon uses low entry-level IaaS prices to attract customers and transition them to higher-value, stickier services in order to increase revenue and retention.

Thursday, November 20, 2014

Links of the day 20 - 11 - 2014

Today's links 20/11/2014: tracing, messaging, queue, cache is the new ram

Tracing summit : Tracing Summit 2014, held in Düsseldorf, Germany, on October 13, 2014; videos are also available here
Operating Apache Samza at Scale : how to leverage Samza with Kafka to scale your messaging infrastructure.
Cache is the new ram : with the rise of in-memory databases, CPU cache becomes the next frontier ( but what's next ? CPU registers are the new cache ?? )
Queues Don't Fix Overload : why queues are great but not a silver bullet, and what their limitations are.
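
The "Queues Don't Fix Overload" argument can be sketched in a few lines: an unbounded queue only hides overload, whereas a bounded queue forces the producer to notice it and react (shed load or apply back-pressure). The helper name `submit` is hypothetical; `queue.Queue` and `put_nowait` are from the Python standard library.

```python
import queue


def submit(q: queue.Queue, job) -> bool:
    """Try to enqueue a job; return False instead of buffering without limit."""
    try:
        q.put_nowait(job)
        return True
    except queue.Full:
        # The consumer is not keeping up: the caller must drop the job,
        # retry later, or slow down. The queue itself cannot absorb it.
        return False
```

With `queue.Queue(maxsize=2)` and no consumer running, the third and later submissions are rejected immediately, which makes the overload visible at the edge instead of letting latency grow silently.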

Wednesday, November 19, 2014

Links of the day 19 - 11 - 2014

Today's links 19/11/2014: #resiliency, #cloud , distributed system, #stream processing

Resilient by Design : React SF 2014 talk by Ben Christensen on how Netflix applies reactive design principles to guarantee resiliency.
The Distributed Developer Stack : open source Field Guide by the O'Reilly crowd
Kafka and Samza : Distributed stream processing slide deck

Tuesday, November 18, 2014

Links of the day 18 - 11 - 2014

Today's links 18/11/2014: #microservices, #cloud , oracle CPU, automated deployment, #RDMA, Intel Omni-path

Colossus : a lightweight framework for building high-performance applications in Scala that require non-blocking network I/O. In particular, Colossus is focused on low-latency stateless microservices. By Tumblr
M7: Next Generation Oracle Processor [ slides ]
Apollo : shared internal deployment service at Amazon and its public counterpart, AWS CodeDeploy
RDMA in the cloud : VMware demonstrates RDMA for virtual machines in cloud environments, but as usual with VMware presentations the graph scales are poorly chosen...
Omnipath : Intel is gunning for InfiniBand with its new 100 Gbps fabric.

Monday, November 17, 2014

Virtual Cores tech and partitioning hypervisor

A couple of weeks ago Soft Machines came out of stealth mode and introduced the Variable Instruction Set Computing (VISC). VISC attempts to avoid the difficulties of scaling multiple threads in hardware by providing a framework in which workloads that appear sequential to the operating system are then scheduled across a set of virtual cores in hardware.

What is interesting here is that virtualization has most often been leveraged to deliver fine-grained (over)subscription of your system. The disadvantage is that such technology often comes with a performance penalty. It is on average between 5 and 10%, but it can be better or worse depending on the workload and over-subscription scenario. One alternative for separating critical applications while guaranteeing performance and isolation exists: a partitioning hypervisor such as Jailhouse.

Jailhouse can create asymmetric multiprocessing (AMP) setups on Linux-based systems. What it effectively does is partition the physical hardware into cells and guarantee the isolation between them. It uses some cores to run a Linux-based hypervisor that manages and partitions the rest of the hardware. Each OS or virtual machine runs only on its dedicated cores and hardware. The typical workloads we expect to see in non-Linux cells are applications with highly demanding real-time, safety or security requirements. The trade-off, however, is that you are under-subscribing your system in order to guarantee high performance and isolation.

Now enter VISC: with such technology you could in theory eliminate the under-subscription problem while retaining the performance and isolation characteristics of a partitioning hypervisor. It will be interesting to see whether this CPU architecture will be able to leverage this type of hypervisor technology to gain momentum within the cloud. Workloads such as network function virtualization or virtual network systems would greatly benefit from such an approach, as the overhead of classic virtualization is a significant limiting factor.

Links of the day 17 - 11 - 2014

Today's links 17/11/2014: distributed transaction, data center fabric, SSD friendly alternative to bloom filter.

Granola: Paper Summary: Granola, Low overhead distributed transaction coordination
Consistency and coordination : you do not always need coordination to be consistent while retaining ACID property [paper]
Quotient filter : SSD optimized alternative to bloom filter.
Facebook datacenter fabric : Facebook broke the network up into small identical units (server pods) and created uniform high-performance connectivity between all pods in the data center in order to maintain performance, scalability and reliability.

Friday, November 14, 2014

Links of the day 14 - 11 - 2014

Today's links 14/11/2014: NUMA, distributed transaction, NVMe, Bimodal IT

Scale out Numa : an architecture, programming model, and communication protocol for low-latency, distributed in-memory processing by layering an RDMA-inspired programming model directly on top of a NUMA memory fabric via a stateless messaging protocol.
Calvin : Paper Summary - Calvin, Distributed transactions for database systems
NVMe and Fabric : performance impact of fabric with NVMe
Bimodal IT : Simon Wardley explains why Gartner's bimodal IT concept is a half-baked old concept in new clothes

Thursday, November 13, 2014

Links of the day 13 - 11 - 2014

Today's links 13/11/2014: Mellanox ConnectX4, Immutable infrastructure and ARM server

ConnectX4 : EDR 100Gb/s InfiniBand and 100Gb/s Ethernet, 150M messages/second, impressive numbers from Mellanox.
Fugue: immutable infrastructure that automates the creation and operation of cloud infrastructure, with short-lived and simplified compute instances
Custom Cloud Arm Server : online lab designs its own ARM-based server for its cloud infrastructure.

Wednesday, November 12, 2014

Links of the day 12 - 11 - 2014

Today's links 12/11/2014: all about high performance distributed networking and communication

Aeron : Open-source high-performance communication protocol, really nice fast and efficient communication system, reliable, ordered, low latency and high throughput!! [github]
SBE : Simple Binary Encoding - better performance than Protocol Buffers [github]
Why Is Exactly-Once Messaging Not Possible In A Distributed Queue? : ever wondered why ? here is the answer.
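
A common practical answer to the exactly-once problem is to accept at-least-once delivery and make the consumer idempotent, so that redelivered messages have no further effect. The sketch below assumes each message carries a unique identifier; the class and field names are hypothetical, and a real system would persist the set of seen identifiers together with the state it protects.

```python
class IdempotentConsumer:
    """Applies each message at most once, however many times it is delivered."""

    def __init__(self):
        self.seen = set()   # identifiers of messages already applied
        self.total = 0      # example state mutated by the messages

    def handle(self, msg_id: str, amount: int) -> bool:
        """Apply the message unless it is a duplicate; return True if applied."""
        if msg_id in self.seen:
            return False    # redelivery after a lost ack: ignore it
        self.seen.add(msg_id)
        self.total += amount
        return True
```

The broker may then redeliver freely after timeouts or lost acknowledgements: the observable effect is the same as if each message had been processed exactly once.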

Tuesday, November 11, 2014

Links of the day 11 - 11 - 2014

Today's links 11/11/2014: Kafka and infrastructure, Product vision building, and software performance techniques

Infrastructure for Data Streams : how to persist incoming data stream with guaranteed data availability and redundancy using Kafka at Chartbeat
Idea stack : Idea Stack exercises and use case studies, illustrating how to build a vision for a product before developing it.
Below an image of the various software improvement techniques :

Monday, November 10, 2014

Links of the day 10 - 11 - 2014

Today's links 10/11/2014: cluster CI , DHT, Multiqueues, blockchain ssh key pki

Cluster Runner: fast, easy test feedback for your continuous integration system. It's always a challenge to retain speed and agility as your tests and team grow. Cluster Runner helps solve that.
DHT routing table maintenance: performance improvements of the DHT by the BitTorrent crowd
MultiQueues : multiple sequential priority queues that outperform previous more complicated data structures
emcssh : blockchain based secure, decentralized management of PKI.
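
The MultiQueues idea from the list above can be sketched in a single-threaded form: keep several ordinary sequential priority queues, insert into a random one, and on removal compare the minima of two randomly chosen queues and pop the smaller. The class below is an illustration only; it omits the per-queue locking of a concurrent implementation, samples only among non-empty queues for simplicity, and its names and defaults are assumptions.

```python
import heapq
import random


class MultiQueue:
    """Relaxed priority queue built from several sequential binary heaps."""

    def __init__(self, num_queues: int = 8, seed=None):
        self.queues = [[] for _ in range(num_queues)]
        self.rng = random.Random(seed)

    def insert(self, item) -> None:
        # Spread insertions uniformly at random over the heaps.
        heapq.heappush(self.rng.choice(self.queues), item)

    def delete_min(self):
        non_empty = [q for q in self.queues if q]
        if not non_empty:
            raise IndexError("delete_min from an empty MultiQueue")
        # Two random choices: take the heap whose minimum is smaller.
        a = self.rng.choice(non_empty)
        b = self.rng.choice(non_empty)
        return heapq.heappop(a if a[0] <= b[0] else b)
```

The element returned is not guaranteed to be the global minimum, only a small one with high probability; that relaxation is what lets the concurrent version avoid a single contended structure.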

Thursday, November 06, 2014

On the emergence of hardware level API for dis-aggregated datacenter resources

The technologies enabling the modular dis-aggregated data-center concept are reaching maturity, as demonstrated by the latest technology showcases: RSA from Intel or, to a lesser extent, FusionCube / FusionSphere from Huawei. The need for such technologies arises from the fact that current cloud and data-center technology does not and cannot fulfill all the demands of cloud users, for multiple reasons. On one hand, as the number of cores and amount of memory on servers continues to increase (over the next few years, we expect to see servers with hundreds of cores and terabytes of memory per server commonly used), an entire server may be too large for many customers' needs, leaving leased resources wasted. On the other hand, with the emergence of a broad class of high-end applications for analytics, data mining, etc., the amount of memory and compute power available on a single server may be insufficient.

Moreover, leasing cloud infrastructure resources in a fixed combination of CPU, memory, etc. is only efficient when the customer load requirements are both known in advance and remain constant over time. As neither of these conditions is met for a majority of customers, the ability to dynamically mix and match different amounts of compute, memory, and I/O resources is the natural evolutionary step after the hyper-converged solutions.
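
A back-of-the-envelope sketch of why fixed combinations waste resources: with whole servers as the leasing unit, the scarcest dimension dictates how many servers are needed, and everything else is stranded. The server shape and workload figures in the usage note are made-up numbers, and the function names are hypothetical.

```python
import math


def servers_needed(cpu: int, mem_gb: int, server_cpu: int, server_mem_gb: int) -> int:
    """Whole servers required when CPU and memory only come in fixed bundles."""
    return max(math.ceil(cpu / server_cpu), math.ceil(mem_gb / server_mem_gb))


def stranded(cpu: int, mem_gb: int, server_cpu: int, server_mem_gb: int) -> dict:
    """Resources leased but unused because of the fixed bundle shape."""
    n = servers_needed(cpu, mem_gb, server_cpu, server_mem_gb)
    return {
        "servers": n,
        "idle_cores": n * server_cpu - cpu,
        "idle_mem_gb": n * server_mem_gb - mem_gb,
    }
```

For example, a memory-heavy workload needing 8 cores and 512 GB on hypothetical 32-core / 128 GB servers requires 4 servers and leaves 120 cores idle; with pooled, independently provisioned resources only the 8 cores and 512 GB would be consumed.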

The objective here is to close the gaps so that we can go beyond the boundaries of the traditional server, effectively breaking the barrier of using a single physical server for resources. In other words, we will be able to provision compute, memory, and I/O resources across multiple hosts within the same rack, and have them consumed dynamically in varying quantities at run-time instead of in fixed bundles. This will effectively enable a fluid transformation of current cloud infrastructures, from targeting fixed commodity-sized physical nodes to a very large pool of resources that can be tapped into and consumed without classical server limitations.

Intel has been advertising its RSA stack for a while and it is finally becoming reality; however, the really interesting part is not the technology. Indeed, to a certain extent, a lot of technology already exists and enables us to implement resource pooling. We demonstrated that it was already feasible to deliver cloud memory pooling in the Hecatonchire Project, and other vendors such as TidalScale or ScaleMP already offer compute aggregation. However, the last two solutions are monolithic and lack the flexibility needed to be used with the cloud consumption model and, as a result, they are confined to a niche market.

What can really kick the dis-aggregated model into top gear is that Intel has now teamed up with a couple of vendors and has already created a draft hardware API specification called Redfish. Such an API can be leveraged by a higher level of the stack, allowing more intelligent, flexible and predictable decisions about how, where, and when workloads (VMs, containers, standard processes/threads) get scheduled onto that hardware. In a certain way, this enables Mesos / Kubernetes to deliver enhanced scheduling for every hardware aspect.

This brings some interesting capabilities to existing cloud technologies: cores and memory can be dynamically reallocated across workloads and, arguably, this greatly reduces the need for load balancing via live migration. You would dynamically re-allocate the resources underneath (cores, memory) rather than move the whole system, making the process more robust and less error prone.

On the container side, it would solve a lot of the security headaches the community is now facing. Rather than going the physical->virtual->container route, you could simply run physical->container with fine-grained per-core allocation using RSA / Redfish. Effectively you would provide fine-grained subscription from your system in order to get maximal separation and performance guarantees. One can use this to separate critical applications while guaranteeing performance and isolation, which is something we can already do now with jailhouse, at the cost of under-subscribing your system.

If Intel is successful in disseminating its hardware API (or having the other vendors standardize around it), it would allow the technology to leap forward, as its biggest enemy is the difficulty of porting management APIs from one fabric, compute, I/O, or storage model to another.

Wednesday, November 05, 2014

Links of the day 05 - 11 - 2014

Today's links 05/11/2014: #Docker and #LXD, scalability rule, Google #containers engine

LXD : Shuttleworth announces LXD, a secure container technology designed to address the isolation and security concerns of existing solutions.
Scalable commutativity rule : Whenever interface operations commute, they can be implemented in a way that scales
Container engine: interesting that Google offers management of multiple containers per VM (1:M) while all its competitors only offer a 1:1 mapping.

Tuesday, November 04, 2014

Links of the day 4 - 11 - 2014

Today's links 04/11/2014 : infra conf mgmt tools, distributed DB sharding and replication, smart scheduling

ICMT eMAG : Infrastructure Configuration Management Tools eMag by infoq
Dynomite : a sharding and replication layer for database inspired by Cassandra and Dynamo paper from the Netflix crowd.
Smart Scheduler : oVirt smart scheduler for VM placement and migration in order to maximize occupation. The nice part is that it uses a migration planner that plans out how to reach the optimal placement in multiple steps via migration, a little bit like the Tower of Hanoi problem but NP-hard.
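
VM placement of this kind is at its core a bin packing problem. As a rough illustration (this is the classic first-fit-decreasing heuristic, not the oVirt scheduler's algorithm, and it ignores the migration planning step entirely), the sketch below packs VMs of a single resource dimension onto as few identical hosts as the heuristic manages. Names and units are hypothetical.

```python
def first_fit_decreasing(vm_sizes, host_capacity):
    """Place VMs on hosts, largest first, each on the first host with room."""
    hosts = []  # each host is the list of VM sizes placed on it
    for size in sorted(vm_sizes, reverse=True):
        if size > host_capacity:
            raise ValueError(f"VM of size {size} does not fit on any host")
        for host in hosts:
            if sum(host) + size <= host_capacity:
                host.append(size)
                break
        else:
            # No existing host has room: power on a new one.
            hosts.append([size])
    return hosts
```

For VM sizes `[4, 8, 1, 4, 2, 1]` and a host capacity of 10, this yields two fully occupied hosts. Getting from an existing placement to such a target through a sequence of live migrations is the harder planning problem the link describes.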

Monday, November 03, 2014

Links of the day 03 - 11 - 2014

Today's links 03/11/2014: Intel Netburst, Fig and Crypto banking

Replay : well hidden feature of the Intel Netburst architecture. Interesting to see how much impact this non advertised feature had on the performance of the system.
FIG : Fast, isolated development environments using Docker [screen cast]
Crypto bank : they are trying to build the first cryptocurrency bank. A step up from bitcoin wallet.

Friday, October 31, 2014

Links of the day 31 - 10 - 2014

Today's links 31/10/2014: TCP, kernel, NVMe, network fabric

lwIP on BareMetal : a lightweight TCP/IP stack running on an ultra-lightweight kernel ( coded in x86 assembly )
NVMe over Fabrics : a look at the performance of NVMe over various fabric and what it implies for the future of storage.

Thursday, October 30, 2014

Links of the day 30 - 10 - 2014

Today's links 30/10/2014 : userfault, transaction, cloud frontend , virtkick

Phaser : phase Reconciliation for Contended In-Memory Transactions by Neha Narula at MIT [slides]
Scaling Address-Space Operations on Linux with TSX : thesis by Christopher Ryan Johnson on transactional memory and how these operations can be scaled within multicore systems.
VirtKick : A simple orchestrator. Manage virtual machines or #docker [github]
Userfault : Andrea Arcangeli releases the first RFC for page fault resolution in userspace. The interesting bit is the possibility to treat write and read faults differently. I can foresee some promising spin-offs from this project

Wednesday, October 29, 2014

Links of the day 29 - 10 - 2014

Today's links 29/10/2014: Ceph, storage, shingle and erasure code

Ceph Developer Summit : Hammer release, Schedule - video and slides
Shingled Erasure Code : mixing shingled parity with erasure code to increase resilience and speed up recovery.

Tuesday, October 28, 2014

Links of the day 28 - 10 - 2014

Today's links 28/10/2014: cloud trading, CERN, snapshot isolation, cloud infrastructure, RDMA to remote HW/device

Clock-SI : [paper summary] Snapshot Isolation for Partitioned Data Stores Using Loosely Synchronized Clocks
Cloud trading : 6 fusion blog post on their approach on trading cloud resources as commodities.
CERN Cloud Infrastructure Report : a lot of OpenStack; looks like they are trying to double the capacity every year. They are really at the bleeding edge of OpenStack production and stressing what can be done with it.
Peer-Direct support : allows RDMA operations to directly target memory in external hardware devices, such as GPU cards, SSD based storage, dedicated ASIC accelerators, etc.

Monday, October 27, 2014

Links of the day 27 - 10 - 2014

Today's links 27/10/2014: #latex, #twitter anomaly detection toolkit, #git

Breakout detection in the wild : R package for anomaly detection by twitter [ git repo ]
Pro #Git book, written by Scott Chacon and Ben Straub
latex-templates : lightweight Community-driven minimalist LaTeX

Friday, October 24, 2014

Links of the Day 24 - 10 - 2014

Today's links 24/10/2014: Failure injection, Virtual Core, Distributed analytic engine , biological circuits

Kylin : eBay has released to the open-source community its distributed analytic engine designed to accelerate analytics on Hadoop and allow the use of SQL-compatible tools [ ebay blog post ]
FIT : how netflix use Failure Injection Testing to validate its code, architecture and everything else
Virtual core : after risc, cisc here comes VISC. The VISC architecture is based on the concept of “virtual cores” and “virtual hardware threads.” This new approach enables dynamic allocation and sharing of resources across cores.
Rich Test Results by Google: aims to represent most of the data that can be produced by static and dynamic program analysis.
Precise and programmable biological circuits : one step closer to biological computers, with the aim of designing small circuits made from biological material

Thursday, October 23, 2014

Links of the day 23 - 10 - 2014

Today's links 23/10/2014: fast TCP socket, HFT , cuckoo filter and sidechain for Bitcoin blockchain

Fastsocket : a rewrite of the Linux kernel's socket and underlying networking implementation for high scalability. Nice performance results; however, it is going to be hard to push into the upstream kernel. It might take a while, if it ever happens.
5th Annual Modeling High Frequency Data in Finance Conference : last year's HFT conference slides
Cuckoo Filter - Practically Better Than Bloom : Performance-wise, at the same level of space efficiency, insert speeds are better than a standard Bloom filter when the hash table is mostly empty (low load), but significantly worse than a Bloom filter at high load, when the table fills up and many 'evictions' are necessary for each insert. But if a tradeoff for space is acceptable (to stay at ~50% load), the cuckoo filter inserts much faster than its Bloom counterpart. Lookup speeds for existing keys are 2-3x the speed of a standard Bloom filter, and about 1.5x that of the "Blocked" Bloom filter variant. Negative lookups (key not present) are about 2x faster than Bloom at high load, but about 3/4 the speed at low load. ( full summary on HN )
Pegged Sidechains : Sidechains in theory could address lots of the scalability issues of Bitcoin and could accelerate the roll out of new features
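
For readers who want to see why cuckoo filter inserts slow down at high load, here is a minimal Python sketch of the general idea: short fingerprints stored in small buckets, two candidate buckets per key, and evictions when both are full. It is not the paper's implementation. The class name `CuckooFilter`, the 8-bit fingerprints, the SHA-256 based hashing and the default values of `bucket_size` and `max_kicks` are all illustrative assumptions.

```python
import hashlib
import random


class CuckooFilter:
    """Toy cuckoo filter (illustrative sketch, not the paper's code)."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=0):
        # A power-of-two bucket count keeps the XOR trick in _alt_index
        # in range and makes it its own inverse.
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data):
        # Assumption: any well-mixed hash would do; SHA-256 is just convenient.
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, key):
        # 8-bit fingerprint in the range 1..255.
        return self._hash(b"fp:" + key.encode()) % 255 + 1

    def _index(self, key):
        return self._hash(key.encode()) % self.num_buckets

    def _alt_index(self, index, fp):
        # Partial-key cuckoo hashing: the other bucket is derived from the
        # current bucket and the fingerprint only, never from the full key.
        return index ^ (self._hash(bytes([fp])) % self.num_buckets)

    def insert(self, key):
        fp = self._fingerprint(key)
        i1 = self._index(key)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a random resident fingerprint and
        # move it to its alternate bucket, repeating up to max_kicks times.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Filter considered full; the last displaced fingerprint is dropped
        # in this toy version.
        return False

    def contains(self, key):
        fp = self._fingerprint(key)
        i1 = self._index(key)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def delete(self, key):
        fp = self._fingerprint(key)
        i1 = self._index(key)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

The eviction loop in `insert` is where the cost at high load comes from: the fuller the buckets, the more fingerprints have to be shuffled around before a free slot turns up. Unlike a plain Bloom filter, the structure also supports `delete`, because each key leaves a removable fingerprint behind.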

Wednesday, October 22, 2014

Links of the day 22 - 10 - 2014

Today's links 22/10/2014: Twitter #NLP, Neural Turing Machine , Mesos, FB architecture

Tweet NLP : Twitter natural language processing. It will help you understand and translate your teenage daughter's tweets (by Carnegie Mellon University)
Neural Turing Machines : combination of neural network approach with Turing Machine or Von Neumann architecture by Google
Mesos Deep Dive : architecture deep dive into Apache Mesos
FB architecture : Facebook and the KISS approach to datacenter

Tuesday, October 21, 2014

Links of the day 21 - 10 - 2014

Today's links 21/10/2014: all about #Linux #networking with a little bit of #HPC distributed #storage

State of Linux network stack : what's new and interesting in the latest kernel release, especially the low-latency device polling
KVM Forum : all the videos from this year's KVM Forum. Some interesting talks, especially on the HPC front, and an interesting quote from Vincent Jardin: "if you want to have high performance networking or an NFV solution, don't use virtualization, use containers"
RDMA and ARM : Mellanox brings its RoCE adapter to the Moonshot project. Interesting to see what type of application would leverage such an architecture combination: a lot of small processors with a fast fabric.
IX : a solution that is close to achieving the holy grail of networking - low latency with high throughput (line rate)
(Fast Forward) Storage and I/O : Distributed Application Object Storage (DAOS) by Intel for HPC. A lot of flash and burst buffers with Lustre for supercomputers. A very interesting approach to addressing the challenges of future exascale computing platforms.

Monday, October 20, 2014

(Big) Data is a double edged sword

Previously, we looked at how not to fall for the mirage of unicorn hunting in your “big data” and why you should not delay too much in adopting data science techniques into your business operations. In this post we will look at why data can be both your best friend and your worst enemy.

Data is a double edged sword.

The enterprise with the best data gains a significant advantage over its competitors; consequently, enterprises should seek to amass as much data as possible. As we previously learned, leveraging its own data allows an enterprise to gain a competitive edge on the chessboard. However, more often than not, enterprises face a big dilemma: who generates this precious information, and consequently who owns it? Quite often most of it originates from the customer, and in order to alleviate this issue and repatriate the precious data points back to the mothership, enterprises leverage the XaaS model.

These “X” (anything) as a service products benefit consuming companies by lowering the cost of operations and reducing or eliminating CAPEX. They also, to a certain extent, provide data aggregation, market comparisons and a range of other useful capabilities. Whilst this lowers the cost of product implementation and service delivery for the deployer, the real beneficiary is in fact the XaaS provider.

The provider can then leverage this information by monitoring consumer behaviour and usage of its product in order to identify the spread of new successful innovations. This is basically what Amazon and others have been applying quite successfully over the past decade, and it is known as the Innovate - Leverage - Commoditise (ILC) model. In certain extreme cases, they enter a market not to make money but simply to collect more data to drive other parts of their business.

As you can see, you have to control which data you need to keep and which you can leak or generate for a third party. Without this understanding, your enterprise might end up being exploited, becoming a mere puppet within a bigger ecosystem which you do not own. In fact, more often than not, the service provider is a wolf in sheep’s clothing: he presents himself as wanting to ‘help out’ but, unfortunately, there is less collaboration and more exploitation driving his intentions.

Enterprises are therefore facing a dilemma: they have to adopt and consume XaaS in order to stay competitive, while trying to avoid leaking their innovation by feeding the ecosystem with more information. One efficient way to counter the latter is to form their own ecosystem and leverage data from it, which in turn enables them to partially work around the enterprise’s inherent innovation limitations. However, this is often easier said than done.

The data gathered is as important as the data generated as this can either make or break an enterprise. Creating one’s own ecosystem to draw information from will quickly become critical as an enterprise cannot solely rely on a single source of information to stay competitive.

Maybe what we will begin to see in the near future is the emergence of information exchanges or even data collectivism among enterprises (a behaviour triggered by a collective prisoner's dilemma) in order to counterbalance the mastodons of data vacuuming, such as Google or Amazon.
