Tuesday, September 29, 2015

Links of the day 29/09/2015 : Time series in SQL, Chaos engineering, Distributed Systems Architecture


Monday, September 28, 2015

Links of the day 28/09/2015: SPDK , CAP for durability, NVM and In memory DB


  • SPDK : Intel's counterpart to its DPDK toolkit, aimed specifically at storage and NVMe. As usual you can do some interesting things with it, but it is also highly restrictive in terms of hardware and behavior: the device needs to be dedicated entirely to the library and cannot be shared with the rest of the system. Moreover, it may pose as big a security risk as the DPDK libs (memory is accessible across DPDK consumers..) [more user friendly overview here]
  • Impact of new Persistent Memory Tech on In memory DB: Nice overview of the impact of the new NVM tech. Too bad it still tries to push the classic DB model instead of proposing a from-scratch approach that would fully exploit the potential of these new technologies. 
  • CAP theorem for Durability : an interesting effort to formalize the limitations of durability in distributed environments. However, it doesn't help curtail the spread of the acronym soup. 



Monday, September 21, 2015

Links of the day 21/09/2015 : Encrypted DB, Curl and No/SQL

CryptDB : database system that can process SQL queries over encrypted data.
Curl Cheat Sheet : for the ones out there curling away.
How to do SQL with K/V : how to map SQL style tables to a key / value store. Also known as SQL in NoSQL :)
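A minimal sketch of the mapping idea, using a plain in-memory dict in place of a real K/V store; the table, column, and key-layout names are illustrative, not taken from the linked article:

```python
import json

# In-memory dict stands in for a real key/value store (e.g. RocksDB, LevelDB).
kv = {}

def put_row(table, pk, row):
    """Store a row under a composite key: '<table>/<primary key>'."""
    kv[f"{table}/{pk}"] = json.dumps(row)
    # Secondary index: one extra key per indexed column, pointing back at the pk.
    for col in ("email",):  # indexed columns, hardcoded here for illustration
        if col in row:
            kv[f"{table}.idx.{col}/{row[col]}"] = pk

def get_row(table, pk):
    raw = kv.get(f"{table}/{pk}")
    return json.loads(raw) if raw is not None else None

def find_by(table, col, value):
    """SELECT * FROM table WHERE col = value, via the secondary-index key."""
    pk = kv.get(f"{table}.idx.{col}/{value}")
    return get_row(table, pk) if pk is not None else None

put_row("users", 42, {"name": "Ada", "email": "ada@example.com"})
print(find_by("users", "email", "ada@example.com"))
# {'name': 'Ada', 'email': 'ada@example.com'}
```

Range scans and joins are where the mapping gets interesting (and expensive), which is exactly what the linked article digs into.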

Thursday, September 17, 2015

Links of the day 17/09/2015 : philosophy and donuts, GDB dashboard, Dweet API

  • GDB dashboard : while I love DDD, this is still really good when you don't have access to an X environment.
  • Dweet : twitter style messaging for IoT. Reminded me of the time when I was using XMPP to do something similar with Chef. 
  • Philosophy Explained With Donuts : enough said...

Wednesday, September 16, 2015

The upcoming Storage API battle

There is an interesting trend within the storage ecosystem: a polarization of the offering. On one side, we are seeing the rise of high-performance rack-scale solutions (DSSD, NVMe over fabric solutions, etc.). On the other side we have the object storage solutions, which operate more at datacenter scale. While both leverage non-volatile memory heavily, they play different roles within the ecosystem.

The rack-scale storage targets "very" high-performance use cases, delivering very low latency and high bandwidth, often in the 100s of microseconds or less. However, these solutions often come at a higher financial cost due to more expensive hardware (custom NVM), network fabric (IB+NVMe, PCIe+NVMe, Omnipath+NVMe, pure PCIe, etc.) and significant power consumption (>2000W/5U for DSSD). Finally, they offer access via a specialized API that needs to be either used natively or adapted to more standard ones.
On the other side we have the object storage solutions. Users access object storage through applications that typically use a REST API. This makes object storage ideal for online, cloud environments. Moreover, they tend to be a lot more cost efficient, especially with the rise of Ethernet-connected drives (up to 50% lower TCO).
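To make the whole-object semantics concrete, here is a tiny in-memory sketch of a REST-style object interface (PUT/GET/DELETE on bucket/key pairs); the class and names are illustrative, not any particular vendor's API:

```python
# Minimal in-memory sketch of a REST-style object interface. Objects are
# opaque blobs written whole: no in-place updates, no byte-range writes,
# which is precisely the simplicity/performance trade-off discussed here.
class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, bucket, key, data: bytes):
        # PUT /<bucket>/<key> -- replaces the whole object atomically.
        self._objects[(bucket, key)] = bytes(data)
        return 200

    def get(self, bucket, key):
        # GET /<bucket>/<key> -- returns (status, body).
        obj = self._objects.get((bucket, key))
        return (200, obj) if obj is not None else (404, None)

    def delete(self, bucket, key):
        # DELETE /<bucket>/<key>
        return 204 if self._objects.pop((bucket, key), None) is not None else 404

store = ObjectStore()
store.put("backups", "2015/09/16/db.tar", b"payload")
status, data = store.get("backups", "2015/09/16/db.tar")
```

The flat bucket/key namespace is what keeps the model cheap to scale, and it is also why POSIX-style byte-range access on top of it is such a hard (and appealing) target.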
Stuck in the middle are the classic filer / POSIX-compliant solutions, which seem to be slowly dwindling away. To a certain extent, the rack-scale solutions should have a bright future in the niche (but still significant) market of enterprises that consider their application to require a custom solution for what they think is a custom problem. On the other side, object storage is gaining momentum by riding the unstoppable cloud tide.



While both technologies can and should co-evolve, they both suffer from software limitations and, to a certain extent, hardware ones, as bandwidth and latency get dangerously close to what CPUs are capable of handling. This requires a drastic shift in how applications are developed if users want to actually get any benefit from these solutions. However, few companies are willing to risk specializing their code against an API that may become obsolete when the next generation of storage solutions pops up.
Storage startups and companies are starting to discover that the API plays a significant role in the success of their product, while performance, though still important, is losing some of its weight. The API will either make rewriting applications to access your storage an infinitely easier task, or transform it into a painful experience by forcing users to jump through hoops and/or adaptation layers with the associated performance cost. 
The fight for the next-generation storage API has only started. There is, and will be, more push toward standardization, fueled by customers growing tired of ever-revolving siloed point solutions. People who use object storage want it to behave more like POSIX storage, while keeping costs at the object-storage level and improving performance. On the other hand, people using rack-scale storage want to retain the performance but increase its simplicity, and they also want the price to come down. It is going to be extremely hard to deliver both, but hopefully we might finally see a rationalization of the storage market, as an object storage system that allows byte-range access is very appealing. 

Links of the day 16/09/2015 : neuroinformatics conference , power8 , server forum

  • Neuroinformatics : congress 2015 videos of presentation
  • Power 8 : inception, power optimization of the power 8 cpu 
  • Server Forum : 2014 conference presentations. The interesting one is the Facebook DIMM interconnect presentation. It seems they are looking at a way to let servers share DIMM banks across servers. Potentially, I guess, for their massive K/V store or memcache, in order to reduce access time. However, I would be really curious to see how it works in practice.

Tuesday, September 15, 2015

Links of the day 15/09/2015 : NVDIMM, Silicon Quantum computer , concurrency kit


Monday, September 14, 2015

Links of the day 14/09/2015 : devops cert, Byte-Addressable NVM, Kernel Bypass

  • WrAP : paper on Managing Byte-Addressable Persistent Memory
  • Kernel bypass : as network and storage get faster, generic solutions start to be seen as a limitation of the current software stack. As a result we are seeing more and more bypass libraries. However, the end result always depends on how efficient the consuming software is. 
  • Devops League : There are plenty of DevOps certifications out there of varying quality. This one is the best. It is wonderful and you'll love it, too. You'll love it so much that you'll print out your certification and even put it on your résumé. You'll tell all your friends about it and even ask your loved ones to mention it at your funeral. RIP, by the way.

Friday, September 11, 2015

Links of the day 11/09/2015: how to rock @ SXSW , Transputer and Async load balancer

  • ADLB : Asynchronous Dynamic Load Balancing software library designed to help rapidly build scalable parallel programs. ADLB does not achieve scalability solely by load balancing; it also includes features that exploit work stealing. [paper]
  • OpenTransputer : an implementation of the Transputer architecture designed at Inmos during the 1980s. It is designed to work hand in hand with the occam language (but others can work with it too). What I really miss from the presentation is a CLEAR advantage of this approach over more generic or purely software ones leveraging the x86 arch.
  • Tim Ferriss' guide on how to rock at SXSW : nice breakdown of how to exploit a big gathering opportunity to its maximum potential [video]
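The work-stealing idea ADLB builds on can be sketched in a few lines: each worker services its own deque LIFO and steals from a victim's head when idle. This is a toy single-threaded illustration of the scheduling policy, not ADLB's actual (MPI-based) implementation:

```python
import collections
import random

# Toy work-stealing scheduler: each worker owns a deque, pops work from its
# own tail (LIFO, good for cache locality), and steals from another worker's
# head (FIFO, grabbing the oldest/largest chunks) when it runs dry.
class Worker:
    def __init__(self):
        self.tasks = collections.deque()

    def push(self, task):
        self.tasks.append(task)

    def next_task(self, others):
        if self.tasks:
            return self.tasks.pop()           # local work: newest first
        victims = list(others)
        random.shuffle(victims)               # pick a victim at random
        for victim in victims:
            if victim.tasks:
                return victim.tasks.popleft() # steal: oldest first
        return None

workers = [Worker() for _ in range(4)]
for i in range(20):
    workers[0].push(i)  # deliberately imbalanced start: all work on worker 0

done = []
while True:
    progressed = False
    for w in workers:
        t = w.next_task([o for o in workers if o is not w])
        if t is not None:
            done.append(t)
            progressed = True
    if not progressed:
        break
print(len(done))  # 20
```

Despite all twenty tasks starting on one worker, every worker ends up contributing, which is the whole point of the technique.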


Thursday, September 10, 2015

Links of the day 10/09/2015: Scale + fault tolerance, and web is dead long live the web

  • IPFS : a lot like the webtorrent project and others; anyway, interesting to check out, even if some inherent characteristics make it impractical for the dynamic web. Think of Reddit: this would generate so many diffs of the same page... Their manifesto can be found here 
  • Scalable High Performance Systems : nice presentation addressing the interesting new challenges that emerge in operating the datacentres that form the infrastructure of cloud services, and in supporting the dynamic workloads of demanding users.
  • FTS workshop : funny that it's the first workshop on fault tolerance, while HPC systems have a long history of addressing these issues.. 


Wednesday, September 09, 2015

No, "you weren't ahead of time", you just were riding the wrong diffusion curve

“Launched ahead of their time” - a claim a lot of startups (and indeed more established companies) use to explain their product's failure. In some rare cases, a product is truly ahead of its time: there is no market for it at all, and no supporting component within the supply chain enabling it to be commercially and economically viable. But in most cases, these claims boil down to a lack of traction for their offering. 

In this blog post, I will focus on the “prematurely interrupted” hockey stick growth curve that some companies experience, and the misunderstanding surrounding it. It looks and feels like exponential growth, but the ride terminates far earlier than the potential market research predicted. Incomprehension, surprise and denial are common when the sales flat-line occurs, because customer feedback was great. As a consequence, companies use the “ahead of their time” excuse to explain their failure. However, the truth is that the market for the product they built simply dried up.
Often these companies misunderstood the true reality of the diffusion of innovations curve presented below. With successive groups of consumers adopting the new technology (shown in blue), its market share (yellow) eventually reaches saturation. The naive interpretation is that technology adoption implies the same product consumption across all consumer groups. 
In this graph, each phase of adoption is represented by a different customer group that requires a tailored product in order to adopt it. While the concept, and to a certain extent the technology, is similar across consumer groups, the actual product may vary drastically in shape and form. As a result, the technology, product, and consumption model evolve with each phase at a different pace. In the graphic below, I have overlaid the actual diffusion curve of each sub-group on top of the diffusion of innovation curve, in order to make this clearer. Note that this concept is derived from Wardley's mapping technique tying diffusion and evolution within a single map.
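A quick way to see the overlay effect is to model each customer sub-group as its own logistic curve and sum them; the weights, midpoints, and steepness values below are purely illustrative, not fitted to any data:

```python
import math

def market_share(t, midpoint=0.0, steepness=1.0):
    """Cumulative adoption: the logistic S-curve, saturating at 1.0."""
    return 1.0 / (1.0 + math.exp(-steepness * (t - midpoint)))

def adoption_rate(t, midpoint=0.0, steepness=1.0):
    """Bell-shaped new-adopter curve: the derivative of the S-curve."""
    s = market_share(t, midpoint, steepness)
    return steepness * s * (1.0 - s)

# Overlay two sub-markets: a small, fast early niche and a later, larger
# mainstream wave. Each has its own midpoint (timing) and steepness (pace).
def overall_share(t):
    return (0.2 * market_share(t, midpoint=-2.0, steepness=2.0)
          + 0.8 * market_share(t, midpoint=3.0, steepness=1.0))
```

Sampling `overall_share` over time shows the trap described above: growth looks exponential while the niche curve is climbing, then flattens in the gap before the mainstream wave picks up.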

As you can see, each customer type represents an independent sub-market with its own characteristics and inertia. It can be extremely easy to become trapped within a sub-customer ecosystem. Often companies validate their products within such a subspace and show impressive stats along a number of dimensions, such as high engagement, viral coefficient, or long-term retention. However, what is important to understand is how big the customer market you validated your product in actually is, and whether it belongs to a bigger ecosystem. Without this information, a company can quickly end up trapped in a local maximum. As a result, companies get boxed into one line of creative design thinking, making tiny incremental improvements but never looking beyond that one solution. They become addicted to the positive reinforcement created by their customer feedback, which prevents them from exploring innovative solutions along different creative lines of thinking. That's how you end up with HipChat vs Slack: the only difference between the two is the packaging of the technology, and it allows one to thrive along a bigger diffusion curve while the other seems stuck.

As mentioned, the technology evolves over time and with each diffusion wave, quite often from genesis to custom built, to product, and finally to utility. However, there are many chasms to cross, as a multitude of competing versions are created, evolve (and die). Crossing from one stage to another requires understanding not only the technological requirements of the new consumption model of the diffusion curve, but also the economic imperatives associated with it, as shown in the graphic below. The reality is that the market fabric is a fractal tissue made of a multitude of diffusion curves. You have the actual technology evolution, as shown in the graph below; for each of these curves you have similar sub-curves representing the various adoption rates. These sub-curves are then subdivided and overlapped with smaller ones created by each company's products and services competing within the space.


This overall complex fabric creates a difficult environment for determining the correct strategy to apply. Identifying the current state of the ecosystem, its direction, and when to adapt is a daunting task with a multitude of variables to take into consideration (which I might take a stab at in a future post). The lucky, or the visionaries who spot the trend early enough, may then attempt to sell early or pivot their strategy. Pivoting is a rather difficult operation to execute correctly, or even at the right time: too early or too late and you can lose the momentum of the current diffusion wave while the next one has not picked up yet. In that case, your capacity to wait it out depends ruthlessly on your burn rate. Many companies fail at that stage simply because of bad timing.

To conclude, when a product, company or startup claims to have failed because it was “ahead of its time”, this is often a misconception. In reality, and unfortunately in the majority of cases, they simply did not understand the ecosystem they evolved in and got stuck in a local maximum. For some, it turned into a kiss of death; for others, a curse of zombification.

Tuesday, September 08, 2015

Links of the day 08/09/2015 : Kinetic ethernet storage drives, Silicon Valley Show and Time Maps

  • Silicon Valley : If you can't wait for the new season, here is the hilarious script written by the founder of Firefox. 
  • Kinetic Open Storage : now a collaborative project under the Linux Foundation for Ethernet drives. I'm really excited about this project.
  • Time Maps : technique for visualizing many events across multiple timescales in a single image
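The core of the time-map technique is simple: for every event, plot the gap to the previous event against the gap to the next one. A minimal sketch (the event stream below is made up):

```python
def time_map(timestamps):
    """For each interior event, pair the gap to the previous event with the
    gap to the next one; plotting these pairs (often on log axes) is the
    'time map' of the event stream."""
    ts = sorted(timestamps)
    return [(ts[i] - ts[i - 1], ts[i + 1] - ts[i])
            for i in range(1, len(ts) - 1)]

# A burst of events followed by quiet: bursty points land near the origin,
# isolated points land far from it, and log axes make every timescale
# visible in a single picture.
events = [0.0, 1.0, 1.1, 1.2, 5.0, 10.0]
pairs = time_map(events)
```

Feeding `pairs` to a scatter plot reproduces the visualization the linked article describes; points below the diagonal are events followed by a shorter gap than the one preceding them.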

Monday, September 07, 2015

Links of the day 07/09/2015 : #bitcoin, computer science paper, and micro datacenter

  • Papers We Love : a lot of good talks, mainly introduction level: 
    • Bitcoin : overview of the bitcoin Peer-to-Peer Electronic Cash System for those living under a rock for the past 2 years.
    • Propositions as Types : Michael Bernstein talks about Philip Wadler’s paper Propositions as Types, which starts out with the following sentence: "Powerful insights arise from linking two fields of study previously thought separate." And just keeps on going from there. In less than 9 full pages, Wadler assembles an exuberant, hilarious take on the deep, meaningful connections between mathematics, philosophy, and computer science.
  • Micro Datacenter : Microsoft makes the argument that for the future of mobile computing, edge computation and the dotting of the physical landscape will be the only way to solve the latency issue. I agree to a certain extent; however, as usual, for that to happen many other technologies need to bridge some chasms in terms of maturity. Surprisingly, AOL experimented with that idea ~3 years ago.


Thursday, September 03, 2015

Links of the day 03/09/2015 : vNVDIMM, user-space tcp/ip stack, Cern openday

  • Event-driven, user-space and highly-scalable TCP/IP stack : the framework got a 2.6× performance improvement compared to the same application layer running on the new reusable TCP sockets introduced in Linux 3.9. The web server was able to deliver static HTML pages at a rate of 12 Gbps on a Tilera TILE-Gx36 36-core processor. Userspace vs kernel space..
  • Cern Openlab Day : research and industrial teams from CERN openlab present their projects
  • vNVDIMM : the NVDIMM driver has been merged into the upstream Linux kernel, and there are attempts to enable it in the virtualization field: virtual machines with persistent memory.

Wednesday, September 02, 2015

Links of the day 02/09/2015 : Seagate Kinetic @CERN , KVM HA and Intel Omnipath perf

  • KVM HA : Update on the COLO VM replication project. What's interesting is that they only look at the network side of the external state to confirm the coherence of the system. This limits what the VM can do and access, but on the other hand simplifies the overall setup.
  • Kinetic @ CERN : In a previous blog post I mentioned how this type of approach is a game changer for cloud storage solutions. Here you can already see some numbers: ~1/3 of the power, ~1/2 of the cost, no more storage servers (simplified maintenance) [video]
  • Omnipath : the numbers are really interesting: (a) a 4.6x improvement in small message throughput over the previous generation InfiniBand fabric technology, (b) a 70ns decrease in switch latency, (c) a single ASIC that can deliver 50 GB/s of dual-channel bidirectional bandwidth, or 12.5 GB/s single-channel unidirectional bandwidth. In short, Omni-Path delivers all this and more as an integrated component inside the processor package, or via other form factors like PCIe cards or custom mezzanine cards.

Tuesday, September 01, 2015

Links of the day 01/09/2015 : software FTL for SSD, Decryption , startup

  • Open Channel : It seems there is a current trend of stripping most solutions down to their bare minimum. Open Channel is an effort to strip away the FTL layer, as some estimate that embedded FTLs introduce significant limitations for server compute. Basically, the goal is to allow specialization at the software level for specific workloads, e.g. 90% read, transactional, etc.. [github]
  • Encrypted database case : cryptanalysis of basic encryption using information entropy
  • How to start a startup marathon : somebody decided to watch the whole 16h straight with live commentary. To a certain extent it is a good supplement to the lecture notes, and an excellent way for those who haven't watched the course yet to use the commentary to pick what to watch first.
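The entropy-based analysis mentioned in the encrypted-database item above boils down to measuring how uniform the ciphertext bytes look. A minimal sketch (the single-byte XOR "cipher" is a deliberately weak stand-in, not anything from the linked paper):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte; well-encrypted data approaches 8.0."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

plaintext = b"attack at dawn attack at dawn"
# Toy 'encryption': XOR every byte with a fixed key byte. It only permutes
# symbol values, so the byte-frequency distribution -- and hence the entropy --
# is unchanged, which is exactly what entropy-based cryptanalysis exploits.
xored = bytes(b ^ 0x2A for b in plaintext)

print(round(shannon_entropy(plaintext), 3), round(shannon_entropy(xored), 3))
```

A real cipher in a sane mode would push the ciphertext entropy close to 8 bits/byte regardless of the plaintext; anything that preserves the plaintext's statistics, as above, leaks structure.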