Tuesday, June 19, 2018

[Links of the Day] 19/06/2018 : Facebook network balancer, Open policy agent, Intel NLP libs

  • OPA : an open source policy agent that decouples policy from the actual code logic. This is essential to provide great flexibility with fine-grained control of resources. These kinds of features are a key building block for secure and robust API-based solutions. [github]
  • Katran : Facebook's scalable network load balancer. It relies on eBPF and XDP from the Linux kernel to deliver impressive performance at low cost, thanks to its ability to run on off-the-shelf hardware. [github]
  • NLP Architect : Intel's NLP library and solutions. Sometimes I feel that Intel has some great hardware and software, but the release cycles are rather decoupled. This often leaves the user in an odd situation where the hardware is out but the software is not there yet. And sometimes it's the opposite. I really feel that Intel should work on this. Maybe externalise the software to a separate entity, as the hardware culture might be impeding the software side of the company.

Thursday, June 14, 2018

[Links of the Day] 14/06/2018 : GDPR documentation template, Survey of Vector representation of meanings, Supervised learning by quantum neural networks

  • A Survey on Vector Representations of Meaning : the paper presents an overview of the current state of the word vector model research space. The survey is quite useful when you need to choose a vector model for your NLP application, as each model comes with different tradeoffs.
  • EverLaw GDPR documentation Template: a highly practical and down-to-earth document that helps you classify your current status regarding GDPR and understand what exposure you have to it. To some extent, filling it in is almost a mandatory first step for any company out there that deals with individuals' information.
  • Supervised learning by Quantum Neural Networks:  what's better than neural networks? Quantum neural networks !!! 

Tuesday, June 12, 2018

[Links of the Day] 12/06/2018 : Type checking for Python, Golang Web scraper, Google Style Guide

  • Pyre : Fast type checking for Python by the Facebook crowd. Written in OCaml.
  • Colly : web scraper and crawler framework in Golang. I really like Scrapy, but I think Colly has some good potential, even if speed is often not the main characteristic you want from a scraper. Actually, you really want a good rate limiting mechanism if you want to avoid crashing the website you scrape.
  • Google Style Guides : all the style guides for the different programming languages used at Google.
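
To make the rate limiting point concrete, here is a minimal sketch in Python of a per-domain minimum delay between requests. This is not Colly's or Scrapy's API: the `RateLimiter` class, its `wait` method and the injectable `clock` / `sleep` parameters are made up for illustration.

```python
import time


class RateLimiter:
    """Enforce a minimum delay between consecutive requests to one domain.

    Hypothetical helper for illustration, not part of any scraping library.
    """

    def __init__(self, min_delay_s, clock=time.monotonic, sleep=time.sleep):
        self.min_delay_s = min_delay_s
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = {}          # domain -> time the last request was allowed

    def wait(self, domain):
        """Block until it is polite to hit `domain` again.

        Returns the number of seconds slept.
        """
        now = self._clock()
        last = self._last.get(domain)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_delay_s - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually released.
        self._last[domain] = now + delay
        return delay
```

A crawler would call `limiter.wait(domain)` right before each HTTP request. Each domain has its own timer, so a slow site does not hold back the rest of the crawl.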

Thursday, June 07, 2018

[Links of the Day] 07/06/2018 : Quantum algo for beginners, Dynamic branch prediction and Running Python in Go

Tuesday, June 05, 2018

[Links of the Day] 05/06/2018: All about kubernetes - kops and descheduler

Today is all about k8s

  • Kops : Production Grade K8s Installation, Upgrades, and Management
  • Kops terraform : HA, Private DNS, Private Topology Kops Cluster all via terraform on AWS VPC
  • Descheduler : this aims at solving the issue of overprovisioning nodes with k8s. The descheduler checks for pods and evicts them based on defined policies. Ideally, these policies aim at maximising resource usage without compromising availability.
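
The policy idea can be sketched in a few lines of Python. This is not the descheduler's actual code or configuration format: the `Pod` type, the `pods_to_evict` function, and the threshold and availability parameters are all hypothetical. The sketch evicts the smallest pods from nodes that sit above a CPU utilisation threshold, and never drops an application below a minimum number of replicas.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    app: str
    cpu: float  # requested CPU cores


def pods_to_evict(nodes, capacity, high_threshold, min_available):
    """Pick pods to evict from over-utilised nodes (toy policy).

    nodes: dict mapping node name -> list of Pod
    capacity: CPU cores per node (assumed identical for all nodes)
    high_threshold: utilisation ratio above which a node is drained
    min_available: replicas of each app that must always remain
    """
    # Count replicas per application across the whole cluster.
    replicas = {}
    for pods in nodes.values():
        for pod in pods:
            replicas[pod.app] = replicas.get(pod.app, 0) + 1

    evicted = []
    for pods in nodes.values():
        used = sum(pod.cpu for pod in pods)
        # Try the smallest pods first to limit disruption.
        for pod in sorted(pods, key=lambda p: p.cpu):
            if used / capacity <= high_threshold:
                break
            if replicas[pod.app] - 1 < min_available:
                continue  # evicting this pod would hurt availability
            replicas[pod.app] -= 1
            used -= pod.cpu
            evicted.append(pod.name)
    return evicted
```

The availability check is what separates "maximise resource usage" from "break the service": a pod that is the last replica of its application stays put even on an overloaded node.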

Microsoft aims at undercutting AWS's strategic advantage with its GitHub acquisition

Microsoft acquired the GitHub code sharing platform. This is a brilliant move. It allows Microsoft to offset some of the insane advantages that AWS gained over the last couple of years via its innovate, leverage, commoditise (ILC) strategy.

ILC model by Simon Wardley

ILC relies on the following mechanism: the larger the ecosystem, the higher the economy of scale, the more users, the more products being built on top of it, and the more data gathered. AWS continuously uses this data trove to identify patterns and determine which features it is going to build and commoditise next. The end goal is to offer more industrialised components to make the entire AWS offer even more attractive. It's a virtuous circle, even if AWS sometimes cannibalises existing customers' products and market share on the way. Effectively, AWS customers are AWS's R&D department, feeding information back into the ecosystem.
As a result, AWS methodically eats away at the stack by standardising and industrialising components built on top of its existing offer. This further stabilises the ecosystem and enables AWS to tap further into the higher levels of the IT value chain. AWS can therefore reach more people while organically growing its offer at blazing speed with minimal risk, because all these startups are taking the risks instead of AWS.

How does the Microsoft acquisition play into this? Well, Microsoft, with its Azure platform, is executing a similar play to the one that AWS is delivering. However, Microsoft has a massive gap to bridge to catch up with AWS, and the gap is widening at incredible speed as the economy of scale offers an exponential advantage. AWS has a significant head start in the ILC game, which confers it a massive data collection advantage over its competitors. However, Microsoft can hope to bridge that gap by directly undercutting AWS and instantly tapping into the information pipeline coming from GitHub. By doing so, Microsoft can combine the information coming from its Azure platform with GitHub's, gaining invaluable insight that combines actual component usage with developers' interest and use. Moreover, this will also offer valuable insight into AWS and other cloud platforms, as a majority of the projects (open source or not) deployed onto these are hosted on GitHub.
Cloud Wardley Map with Github position

I quickly drew the Wardley map above to demonstrate how smart the acquisition of GitHub is. You can clearly see how the code sharing platform enables Microsoft to undercut AWS's strategic advantage by gaining ecosystem information straight from the developers and the platforms above. As Ballmer once yelled: Developers, developers, developers!

Thursday, May 31, 2018

[Links of the Day] 31/05/2018 : Testing Distributed Systems, Quantum Supremacy , Togaf Tool

  • Testing distributed systems : curated list of resources on testing distributed systems. There is no silver bullet, just sweat, blood and broken systems.
  • The Question of Quantum Supremacy : Google folks are trying to determine the smallest computational task that is prohibitively hard for today’s classical computers but trivial for a quantum computer. This is the equivalent of hello world for a quantum computer and is critical to validate quantum computer capability and correctness.
  • Archi : open source modelling tool to create ArchiMate models and sketches. If you ever look at TOGAF or use enterprise architecture principles, this tool is for you.

Wednesday, May 30, 2018

The curse of low-tech Scrum

I recently read the following article that describes how scrum disempowers devs. It criticises the "sell books and consulting" aspect that seems to have become the primary driver behind the Agile mantra. Sadly, I strongly agree with the authors' view.

Scrum brings some excellent value to the technical development process, such as:
  1. Sprints offer a better way to organise than waterfall.
  2. It forces teams to ship functional products as frequently as possible to get feedback early and often from the end user.
  3. It requires stopping what you're doing on a regular basis to evaluate progress and problems.

However, Scrum quickly spread within the tech world as a way for companies to be "agile" without too much structural change. First, Scrum does not require technical practices and can be installed in place at existing waterfall companies doing what is effectively mini-waterfall. Second, such deployment generates little disruption to the corporate hierarchy (and this is the crux of the issue). As a result, Scrum allows managers and executives to feel like something is being done without disturbing the power hierarchy.
Even though the method talks about being flexible and adapting when there are real business needs to adjust to, the higher levels of the corporation rarely adjust their approach. This relegates Scrum to letting companies move marginally in the direction of agility and declare "mission accomplished". Agile ends up providing a low-tech placebo solution to an organisational aspiration.
Last but not least, adopting a methodology for the sake of it is often doomed to fail. If you have a customer that needs a new thing built by a specific date, then Scrum is less than ideal, as it requires a flexible date and customer stakeholders who are profoundly involved in the process. The waterfall approach would be a better choice, as it forces you to define the project up front and allows for calling out changes to the plan and thus changes to the scope.

It is often disappointing to see claims by consulting firms that organisations need to adopt agile. It's a piecemeal solution that will only temporarily mask deeper organisational problems without the required structural change. It's not because your dev teams started to use agile or devops that your organisation as a whole suddenly became more agile.

Don't misunderstand this blog post as a complete rejection of the principles of Scrum and agile. It's not. The core ideas are awesome and should be adopted where they suit. Other methodologies such as waterfall, devops, etc. also have their place in an organisation, depending on the lifecycle stage of the products. However, these need to be adopted alongside organisational change beyond the dev teams to improve the overall operations and efficiency of the company. Without this, it's just a low-tech placebo.

Tuesday, May 29, 2018

[Links of the Day] 29/05/2018 : Tracers performance, Testing Terraform , Virtual-kubelet

  • Benchmarking kernel and userspace tracers : a good recap of which tracing toolkits are out there and the performance tradeoffs that come with them.
  • terratest : this is the thing I was looking for, a way to test and validate my Terraform scripts. I think this will really help the adoption of Terraform, as it will significantly increase the confidence in Terraform code before deployment.
  • Virtual-Kubelet : that's a really cool concept, and it introduces a great dose of flexibility in your Kubernetes cluster deployment. There are already some really exciting solutions leveraging it, such as the AWS Fargate integration. With this, you could easily implement bursting and batching solutions, or a real hybrid k8s solution with virtual kubelets hosted in Azure, AWS and on your private cloud.

Thursday, May 24, 2018

[Links of the day] 24/05/2018 : TLA+ video course, Pdf Generator, Quantum Algorithms Overview

  • The TLA+ Video Course : if you ever had to design a distributed system and spent sleepless nights thinking about edge cases, TLA+ is a godsend. You just spec your system & go. It also gives a huge decrease in cognitive load when you're implementing your system against a TLA+ spec. The hard stuff is already done. You can just glance at the spec to see what preconditions must be checked before an action is performed. No pausing halfway through writing a function as you suddenly think of an obscure sequence of events that breaks your code.
  • ReLaXed : generate PDF from HTML. It supports Markdown, LaTeX-style mathematical equations, CSV conversion to HTML tables, plot generation, and diagram generation. Many more features can be added simply by importing an existing JavaScript or CSS framework.
  • Quantum algorithms: an overview : a survey of some known quantum algorithms, with an emphasis on a broad overview of their applications.

Monday, May 21, 2018

[Links of the Day] 21/05/2018 : Automation and Make, FoundationDB, Usenix NSDI18

  • Automation and Make : this is a really good description of best practice for Makefile and automation. 
  • FoundationDB : Apple open sourced its distributed DB system; another contender enters the fray. With Spanner on Google Cloud, CockroachDB and now FoundationDB, highly resilient distributed transactional systems are starting to reach widespread usage. [Github]
  • Usenix NSDI 2018 Notes: a very good overview of NSDI conference, and naturally the morning paper is currently doing a more in-depth analysis of the main papers. [day 2&3]

Friday, May 18, 2018

Hedging GDPR with Edge Computing

Cloud has drastically changed the way companies deal with data as well as compute resources. They are no longer constrained by tedious and long procurement processes, and the cloud offers unparalleled flexibility. The next wave of change is currently taking shape: a combination of serverless solutions, offering ever more flexibility coupled with greater financial control, and a move to the edge, where the amount of data and the complexity of applications are driving requirements for local options.

IoT and AR/VR are the two obvious applications driving enterprises to the edge, because they combine complicated and expensive solutions with humongous performance requirements such as extra low latency with no jitter.

However, other reasons behind edge computing are starting to emerge and will probably attract more traditional enterprises, because of the advantages conferred by the ultra localisation of data and compute solutions.

GDPR has the potential to accelerate edge computing adoption. Edge computing can offer hyper localisation of data storage as well as processing. These features tick many boxes of the regulatory requirements. With the boom in personal data being generated via the ever-increasing number of consumer devices, like smart watches, smart cars and homes, there is an ever-looming potential for a company to expose itself to GDPR infractions. Not to mention that data ownership and responsibility can also be tricky questions to answer. For example, who is responsible for the data: the consumer, the watch provider or the vendor?

One of the solutions delivered by edge computing would be to store and process data onsite, within the local premises or a delimited geographical perimeter. It would not only offer greater access and guaranteed control, but also enable hyper localisation and regulatory compliance.

Hence, there is a significant potential market for future edge computing providers to offer robust regulatory and compliance solutions. Look at gaming servers and underage data protection, HR or healthcare information. There is a vast trove of customers that will now see a way of leveraging cloud-like models while maintaining tight geographical and regulatory constraints. One possibility would be to offer a form of reverse takeover, or merger: edge computing providers would be invited to leverage existing on-premise infrastructure and turn it into a cloud-like serverless solution with strong compliance out of the box. It would allow companies to benefit at low cost from cloud-like flexibility while offering robust regulatory compliance by explicitly exposing and constraining storage and compute operations to specific locations.

Last but not least, edge computing providers will be able to facilitate on-demand access to local data or processing capability for third parties while having the capability to enforce robust compliance, opening an entirely new market for compliant data brokerage. For example, customers can allow access to data, or the extraction of metadata, from the vendor back to the watch provider or their own medical insurance company. All these interactions would be mediated (and charged) by the edge computing provider.

By becoming the custodian of data at the edge, edge computing providers can build a two-sided market. On one side, they serve data generators (customers, individuals, organisations), aka data issuers and issuer processors. On the other side, they serve merchants, advertisement companies, insurers, etc., aka acquirers and acquirer processors. Edge computing providers would facilitate the transactions between issuers and acquirers while enabling hyper-local and compliant solutions. A little bit like Visa, but for data and compute.

Thursday, May 17, 2018

[Links of the Day] 17/05/2018 : Edge Computing and the Red Wedding problem, Vector Embedding utility , Scalability efficiency

  • Towards a Solution to the Red Wedding Problem : an interesting look at how to handle a massive read spike while being able to update the content (write spike) at the same time. The authors propose to leverage edge computing to spread and limit the impact of a write-heavy spike in such networks.
  • Magnitude : this is a really cool project for those out there dabbling with NLP and vector embedding. This package delivers a fast, efficient universal vector embedding utility.
  • Scalability! But at what COST? : the authors of this paper introduce the concept of measuring the scalability performance of a solution by the hardware configuration required before the platform outperforms a competent single-threaded implementation. As is often the case, most systems and companies do not need a monstrous cluster to satisfy their needs. But it's always more glamorous to say: "we used a cluster" rather than: "I upgraded the RAM so the model can fit in memory".
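
As a rough illustration of the COST idea, here is a small Python sketch that takes the runtime of a single-threaded baseline and the runtimes of a scalable system at various core counts, and returns the smallest configuration that beats the baseline. The function name `cost` and every number in the usage note are made up for the example; they are not taken from the paper.

```python
def cost(single_thread_s, scaling):
    """Smallest core count at which the system beats one thread.

    single_thread_s: runtime (seconds) of a competent single-threaded run
    scaling: dict mapping core count -> measured runtime in seconds
    Returns None when no measured configuration beats the baseline,
    i.e. the COST is unbounded for the data at hand.
    """
    for cores in sorted(scaling):
        if scaling[cores] < single_thread_s:
            return cores
    return None
```

With invented measurements such as `cost(300, {1: 900, 16: 400, 64: 250, 128: 200})`, the result is 64: the system scales nicely from 1 to 128 cores, yet it needs 64 of them just to match a laptop-style single-threaded run.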

Monday, May 14, 2018

[Links of the Day] 14/05/2018 : Concurrency and Paxos resources, PostgreSQL + docker streaming replication

  • PostDock : this is an interesting project. It aims at delivering a Postgres streaming replication cluster for any Docker environment. Sprinkle this with Kubernetes config and you would end up with an RDS equivalent. Even so, I still think that in the long run CockroachDB / Spanner solutions are probably better for cloud deployment.
  • awesome-consensus : Awesome list for Paxos and friends
  • Seven Concurrency Models in Seven Weeks : more concurrency stuff. Excellent (free) book looking at all the important stuff: Threads & locks, functional programming, separating identity & state, actors, sequential processes, data parallelism, and the lambda architecture. 

Thursday, May 10, 2018

[Links of the Day] 10/05/2018 : GDPR guide for devs , Gloo function gateway, HA SQL

  • GDPR - a practical guide for developers : if you are wondering why you are getting so many email notifications regarding updates to terms of service, you need to read this. It's a rather simple explanation of what GDPR means and how it impacts developers. The followup discussion on Hacker News is also a must read, as it expands on and nuances the article.
  • Gloo: Gloo is a function (as in serverless) proxy router. It is a Functions Gateway service that allows you to compose legacy and serverless services through a single platform. It's built by solo.io on top of the Envoy proxy. One interesting bit is that it allows function-level routing functionalities that are hard to achieve via a standard API gateway, such as fan out, canary, etc. [github]
  • phxsql : Tencent's high availability MySQL cluster. It aims at guaranteeing data consistency between a master and slaves using the Paxos algorithm. This looks promising; however, I would really like to see how it behaves under the Jepsen verification framework.
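
For readers unfamiliar with canary routing, the kind of function-level routing mentioned for Gloo above, here is a minimal Python sketch of the idea. It is not Gloo's configuration or API: the `pick_backend` function and the "canary" / "stable" labels are hypothetical. A fixed share of requests goes to the new version of a function, and hashing the request ID keeps the choice stable for a given request.

```python
import hashlib


def pick_backend(request_id, canary_percent):
    """Route roughly `canary_percent` % of requests to the canary version.

    Hashing the request ID (instead of drawing a random number) makes the
    decision deterministic, so retries of one request hit the same version.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Raising `canary_percent` step by step (1, 10, 50, 100) is the usual way to roll out a new function version while watching error rates.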

Wednesday, May 09, 2018

A look at Google gVisor OCI runtime

Google released a new OCI container runtime: gVisor. This runtime aims at solving part of the security concerns associated with the current container technology stack. One of the big arguments of the virtualisation crowd has always been that the lack of explicit partitioning and protection of resources can facilitate “leakage” from the containers to the host or other adjacent containers.

This stems from the historical evolution of containers in Linux. Linux has no native concept of containers, unlike BSD with Jails or Solaris with Zones. Containers in Linux are the result of the gradual emergence of a stack of various security and isolation technologies that were introduced in the Linux kernel. As a result, Linux ended up with a very broad technology stack whose parts can be separately turned on / off or tuned. However, there is no such thing as a pure sandbox solution. It's the classic jack of all trades curse: a master of none solution.
The Docker runtime (containerd) packages all of the Linux kernel isolation and security stack (namespaces, cgroups, capabilities, seccomp, AppArmor, SELinux) into a neat solution that is easy to deploy and use.
It allows the user to restrict what the application can do, such as which files it can access with which permissions, or to limit resource consumption such as network, disk I/O or CPU. It allows applications to happily share resources without stepping on each other's toes. It also limits the risk of any of their data being accessed by others while sitting on the same machine.
A correct configuration (the default one is quite reasonable) blocks anything that is not authorised and, in principle, protects from any leak caused by a malicious or badly coded piece of code running in the container.

However, you have to understand that Docker already has some real limitations. It has only limited support for user namespaces. A user namespace allows applications to have UID 0 permissions (aka root) within the container while the container and the user running it have a lower privilege level on the host. With it, each container can run under a different host user ID without stepping on each other's toes.
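
To show what the user namespace mapping amounts to, here is a small Python sketch that translates a UID seen inside a container into the UID the host sees. The mapping entries follow the three-column layout of `/proc/<pid>/uid_map` (ID inside the namespace, ID outside, range length); the function name `to_host_uid` and the concrete ranges in the usage note are assumptions made up for the example.

```python
def to_host_uid(container_uid, uid_map):
    """Translate a UID inside a user namespace to the host UID.

    uid_map: list of (inside_start, outside_start, length) tuples,
    mirroring the lines of /proc/<pid>/uid_map.
    Returns None when the UID is not mapped in this namespace.
    """
    for inside, outside, length in uid_map:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None
```

With a hypothetical map `[(0, 100000, 65536)]`, root inside the container (UID 0) is the unprivileged host UID 100000. Giving a second container the map `[(0, 200000, 65536)]` puts its root at host UID 200000, so the two containers never share a host identity.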

All of these features rely on the reliability and security (as in no bugs) of the Linux kernel. Most of Docker's advanced features rely on kernel features, and getting new features in is a multi-year effort. For example, it took a while for a good resource isolation mechanism to percolate from the first RFC to the stable branch. As a result, Docker and the current container ecosystem are directly dependent on the Linux kernel's update inertia as well as its code quality. While the kernel is excellent, no system is entirely free of bugs, not to mention the eternal race to patch them once they are discovered and fixed.

Hence the idea: rather than sharing one kernel between all the containers, have one kernel per container, explicitly limiting potential leakage and interference and reducing the attack surface. gVisor adopts this approach, which is not new, as Kata Containers already implemented something similar. Kata Containers is the result of the fusion of Clear Containers (Intel) and runV (Hyper). Kata Containers uses KVM to run a minimalistic kernel dedicated to the container runtime. You still need to manage the host machine to ensure fair resource sharing and to secure it. However, this additional layer of indirection limits the attack surface: even if a kernel bug is discovered, you will be hard pressed to exploit it to escape to an adjacent container or to the underlying host, as the kernels are not shared.

gVisor can use KVM as its underlying platform; however, it was initially, and is still primarily, designed around ptrace. User Mode Linux already used the same technique, which is to start a userspace process that acts as the kernel for the subsystem running on top of it, similar to the hypervisor model used by virtual machines. All the system calls are executed with the permissions of that userspace process on behalf of the subsystem, via an interception mechanism.

Now, how do you intercept these system calls, which should be executed by the kernel? UML and gVisor divert ptrace from its primary goal (which is debugging) and use it to stop the traced process at every system call. Once a call is caught, the userspace kernel executes it on behalf of the original process, within userspace. It works well, but as you guessed, there is no free lunch. This method was heavily used by the first virtualisation solutions. But processor vendors rapidly realised that offering hardware-specific acceleration would be highly beneficial (and would sell more processors at the same time).

KVM and other hypervisors leverage such accelerators. Now you even have AWS and Azure deploying completely dedicated coprocessors for handling virtualisation-related acceleration, allowing VMs to run at almost the same speed as a bare metal system.

And like Qemu leveraging KVM, gVisor also offers KVM as an underlying runtime environment. However, there is significant work to be done to enable any container to run on top of it. While ptrace allows gVisor to directly leverage the existing Linux stack, with KVM you need to reimplement a good chunk of the system to make it work. Have a look at the Qemu code to understand the complexity of the task. This is the reason behind the limited set of supported applications, as not all syscalls are implemented yet.

As is, gVisor is probably not ready for production yet. However, it looks like a promising solution, providing a middle ground between the Docker approach and the virtualisation one while taking some of the excellent ideas coming from the unikernel world. I hope that this technology gets picked up and that the KVM runtime becomes the default for gVisor. It would allow the system to benefit from rock-solid hardware acceleration with all the paravirtualisation goodies such as virtio.

Monday, May 07, 2018

[Links of the Day] 07/05/2018 : Cryptocurrency Consensus Algorithms , Fast18 conference, Google 2 real world project translation

  • A Hitchhiker’s Guide to Consensus Algorithms: this post provides a quick and easy way to understand the classification of the various cryptocurrency consensus models. It's a gentle introduction to the concepts of proof of work vs proof of stake vs proof of authority vs ... well, you get it, many many more algorithms.
  • Notes from FAST18 : a very good overview of the storage conference. What is becoming obvious over the years is that a lot of the practical implementations of novel distributed storage solutions are pushed directly into Ceph. Ceph is poised to become the de facto default private storage solution, even if it has a long way to go in terms of manageability and automation. I think this stems from the preconception that a lot of operations need a storage admin person. But projects like Helm are helping it get there.
  • xg2xg : a practical translation table between internal Google tech and similar technology available to those who do not work in the chocolate factory. It is a very good list of production-ready projects that can be leveraged in many devops (and non-devops) environments.

Thursday, May 03, 2018

[Links of the Day] 03/05/2018 : Fundamental Values of Cryptocurrencies, Kafka SQL streaming engine, SaaS pricing

  • Fundamental Values of Cryptocurrencies and Blockchain Technology : paper looking at the fundamental values of cryptocurrencies. The authors propose fundamental models and a framework to study the price of digital assets. It's an interesting approach; however, they limited their analysis to only two cryptocurrencies, Bitcoin and Ethereum, while there are loads of other cryptocurrencies out there that either derive from these two or are independent. By solely focusing on these two, there is a feeling that the authors are missing the bigger picture, and especially the transient aspect of the cryptocurrency market.
  • KSQL : this project offers a streaming SQL engine for Kafka. It basically allows you to use the Kafka stream engine with the common SQL model.
  • SaaS Pricing : a real-life example of why the right SaaS pricing model is SOOO.. important. It can make or break your business. And you rarely have more than one shot at it. 

Wednesday, May 02, 2018

State of Intellectual Property protection, or lack thereof, by cloud services providers

In 2015 AWS started making the news regarding the aggressive “Non-Assert Clause” in its terms of service. This type of clause exists to protect cloud providers, in perpetuity, from being sued by their customers for infringement of patents, copyrighted works or trademarks.

Now, most explanations given for the use of this type of clause tend to revolve around open source and patent troll defence. While patent troll defence is rather obvious, open source defence is slightly more tenuous but still valid. If Amazon uses your open source (in some way that is a violation of the license), and you use AWS, you can't sue them.
However, it becomes slightly more difficult to digest when such a clause is used to fight a patent suit. It indicates that any cloud provider using such a clause could duplicate your products without fear of legal repercussions.

Fast forward a couple of years: AWS started to feel a little more pressure from its competitors and dropped its controversial clause in July 2017. This was perceived as a move to woo the more traditional enterprise market that was starting to adopt cloud deployment. The objective was to reassure more legally astute customers.
A month later AWS also decided to mimic, to some extent, the Microsoft Azure patent protection scheme. While not as extensive as the Azure offering, this is still a start. Microsoft's shared IP protection is far more comprehensive than AWS's: it expands the company’s existing indemnification policy, makes its patent portfolio available to Azure customers to help them defend themselves against possible infringement suits, and pledges to Azure customers that if Microsoft sells patents to an NPE (non-practising entity), they can never be asserted against them.
The shared protection scheme is quite attractive, as well as dangerous for the provider. If one customer falls prey to an IP lawsuit and loses because of an infringement by the cloud provider, this can really quickly become a complete bloodbath, as every customer might become a target. With such a scheme, the cost can easily snowball. As a result, I can only foresee the biggest players in the field offering such protection.

I decided to do a quick check of the current state of cloud provider terms of service regarding IP protection, or lack thereof. I specifically tried to pinpoint which ones use the “Non-Assert Clause” and which ones offer shared protection (to varying degrees) in the table below.

Cloud Provider
Share protection
No Non-Assert Clause
🗸 (since Aug 2017)
🗸 ( since July 2017)
Digital Ocean
Clause 15.2
Clause 10.3
OVH (VmWare)
Alibaba Cloud

Disclaimer: I am not a lawyer, and this is based on documents available at the time of publication. The terms can change at any time and you are free to try to negotiate terms with your cloud providers yourself. In any case, I recommend having a qualified lawyer to give you advice on this matter.

The result: pretty much all cloud providers DO NOT use a Non-Assert clause, except two: Digital Ocean and SAP. I can understand why Digital Ocean might do that, based on its business model and market size. SAP's aggressive stance is a little more surprising. However, its customers tend to be more legally savvy and might negotiate the clause out more easily.
When it comes to shared protection schemes, half of the providers do not mention any. As for the other half, your mileage may vary. Azure offers the most comprehensive one, while most of the others that do offer protection tend to offer legal and/or financial support only.

It is clear that, as more institutional customers move to the cloud, providers will gradually need to offer more legal protection regarding intellectual property risks. However, this might come at a cost and risk that only the biggest ones will be able to bear.

Tuesday, May 01, 2018

[Links of the Day] 01/05/2018 : CD over K8s , SQL query parser, NLP annotation tool

  • Apollo : The logz.io continuous deployment solution over kubernetes
  • Query parser : Open Source Tool for Parsing and Analyzing SQL. This is a really interesting tool for a specific use case: how do you map and understand the usage of databases and tables in an organization that maintains hundreds or thousands of systems without central coordination and architecture?
  • Prodigy : An annotation tool powered by active learning, made by the creators of spaCy. This really cool tool helps you streamline the creation of models and quickly test hypotheses. Not free :( but to be honest, if you use spaCy (like I do) and need to annotate data for your models, the cost is probably worth it.

Tuesday, March 06, 2018

[Links of the Day] 06/03/2018 : #AI legal liability, Economists unhealthy obsession with the top 5 journals, Tulip Mania Wasn't

  • Tulip Mania wasn't : Apparently, the often referenced 1637 tulip mania event wasn't irrational. The authors describe the mechanism behind the events and how the culture and society of the time explain the phenomenon. Moreover, it seems that the story was greatly misrepresented. Anyway, this is a great read that debunks the myth and explains the parallels, or lack thereof, with the recent bitcoin fad.
  • Top5ITIS : Economists only consider papers published in the top 5 journals to have value. Everything else is quickly dismissed. Naturally, this leads to a form of hyper-obsession and resentment between economists. Sadly, this does not only happen in economics. Many other scientific fields have fallen or are falling prey to this "disease".
  • Artificial Intelligence and Legal Liability : a look at legal liability for artificial intelligence. Criminal liability seems to be the big one. However, negligence and warranty might be the real liabilities that come back to haunt #AI system vendors.

Thursday, February 15, 2018

[Links of the Day] 15/02/2018 : #a16z crypto readings, Unlocking network embedding black box, Pouch - Alibaba's OCI implementation

  • The Crypto Canon : a16z crypto reading resources. Really well organised and will bring you up to speed at a gradual pace. It covers a wide range of topics, starting with foundations (& history), key concepts and beginners' guides, followed by specific topics such as governance; development, privacy, and security; scaling; consensus; cryptoeconomics and investing; fundraising and token distribution; decentralized exchanges; stablecoins; and cryptoeconomic primitives (cryptocollectibles, curation markets, games & culture).
  • Unlocking the black box of network embedding : this can have a huge impact on network graph embedding (word2vec, skipgraph, etc.). The authors claim that they can provide a theoretical explanation of popular methods used to automatically map the structure and characteristics of networks.
  • Pouch : container technology open-source project created by Alibaba Group. Basically, the 3000-pound Asian gorilla decided to push the Open Container Initiative and help make container tech a commodity. It's interesting that we are seeing history repeat itself in the container domain the same way it happened, to some extent, with the virtualization stack.

Tuesday, February 13, 2018

[Links of the Day] 13/02/2018 : API aware network & security for containers with BPF, Machine Learning Data linter, Promise Theory & money

  • Cilium : layer 3/4 networking management stack offering API-aware Networking and Security for Containers based on BPF. BPF is becoming the de facto tool for any high performance networking software out there. 
  • data-linter : this is one of the must-have tools for ML. It shows that the machine learning community is finally maturing out of the tinkering age and into a more productive one. This data linter offers lightweight, automated sanity checking for ML datasets.
  • Promise theoretical analysis of money : an interesting look at money from the network technology point of view. It shows that the classic economic model might need to be revisited as monetary flows edge more toward operational research/graph theory.

Tuesday, January 30, 2018

[Links of the Day] 30/01/2018 : Rule of Machine Learning, Dynamic structure of political corruption networks, Bitcoin Price Manipulation

  • Rules of Machine Learning: Best Practices for ML Engineering : this document is intended to help those with a basic knowledge of machine learning get the benefit of best practices in machine learning from around Google. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming.
  • The dynamical structure of political corruption networks : this is a really fascinating paper presenting an analysis of corruption and the network of individuals participating in it. Interestingly enough, corruption runs in small groups that rarely comprise more than eight people, in networks that have hubs and a modular structure that encompasses more than one corruption scandal.
  • Price Manipulation in the Bitcoin Ecosystem : A single actor likely drove the USD/BTC exchange rate from $150 to $1000 in 2 months.

Thursday, January 25, 2018

[Links of the Day] 25/01/2018 : VPS comparison, Scale free network are rare and Data mining OCR PDFs

  • VPS Comparison : this provides a very good overview of the different VPS providers out there. Obviously not complete, but hey, I would love to see a community-driven benchmark website maintaining comparisons for the different providers. Not just VPS, but cloud / PaaS / lambda, etc.
  • Scale-free networks are rare : the ideal scale-free network does not really happen that often in the wild (of the internet). Maybe it's time to go beyond this concept and explore other, more realistic avenues for real-world networks (I'm looking at you, distributed network of microservices).
  • Data Mining OCR PDFs : extracting info from PDFs is a nightmare, and it's even worse when you have to do OCR. I always considered tabulation a no-go territory, but it looks like somebody actually spent the effort to make it work, and it's impressive.

Thursday, January 18, 2018

[Links of the Day] - 18/01/2018 : Stellar Cryptocurrency Consensus protocol, Optimizing linux server for high throughput and low latency, performance impact of meltdown patch on HPC Filesystem

  • Stellar Consensus Protocol : from Ripple fork to full-blown rewrite. Stellar looks like an impressive protocol addressing many of the shortcomings and risks of Ripple. Also, the authors seem to be smart enough to avoid jumping too fast onto the smart contract aspect, as it is a really tough nut to crack. With all the mayhem surrounding cryptocurrency, the Stellar approach seems rather measured. Worth keeping an eye on.
  • Optimizing web servers for high throughput and low latency : very good post on how to optimise your Linux system. A lot of it has already been described many times, but it is never a bad thing to repeat it.
  • The performance impact of Meltdown patches on HPC FS (Lustre) : no surprise here, IO-intensive applications are the ones most heavily impacted. However, I wasn't expecting a 40% performance penalty, and up to 45% for large folders.

Tuesday, January 16, 2018

[Links of the Day] 16/01/2018 : planetary scale DB - AntidoteDB, Benchmarks for Machine Learning and the hardware running the algorithms

  • AntidoteDB : large-scale (planet-scale) distributed DB system, competing with the likes of CockroachDB or Spanner. The core differentiator is that the architecture relies heavily on CRDTs for its core functionality. It is a spin-off from the SyncFree EU research project. Sadly, like a lot of EU or research-driven startup spin-offs, the documentation and website are slightly lacking in polish. The architecture reference link is broken and a lot of stuff seems to be work in progress. Come on guys! If you want to build a community and a product, you really need to pick up the pace. This project has great potential, don't let it go to waste.
  • Machine Learning Benchmarks - Hardware Provider : a very good survey of machine learning benchmarks across the current cloud providers. What is even more useful is that you get a cost overview of running ML applications, which is often a big unknown at the moment.
  • DeepMind Control Suite : benchmark suite for machine learning algorithms using a set of continuous control tasks with a standardised structure and interpretable rewards
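
For readers who have not met CRDTs before, here is a minimal sketch of the simplest one, a grow-only counter (G-Counter). It is purely illustrative: the class and method names are my own and this is not AntidoteDB's API or implementation. The point is that replicas can update independently and still converge, because merging is an element-wise maximum.

```python
class GCounter:
    """Grow-only counter CRDT (illustrative sketch, not AntidoteDB code)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        # One slot per replica; each replica only ever bumps its own slot.
        self.counts = {}

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        current = self.counts.get(self.replica_id, 0)
        self.counts[self.replica_id] = current + amount

    def value(self):
        # The counter value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge whatever the order or repetition of merges.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas accept writes independently, then exchange state.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
```

After the exchange both replicas report the same total, and merging the same state again changes nothing.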

Thursday, January 11, 2018

[Links of the Day] 11/01/2018 : Two machine learning conference NIPS 2017 & Robot Learning CoRL 2017 , CS Paper ML detector can still be fooled too easily

  • NIPS : this conference is considered one of the biggest events in the ML/DNN research community. Here are two sets of notes from the conference, by Olga Liakhovich and by David Abel. These are two fairly long articles but worth a read. Looks like fairness and bias are among the big topics of the moment. Also, I like how ML is compared to alchemy. The current approach is extremely fragile, tailor-made and not fully understood. Too often machine learning tools are considered black boxes where you shove in data at one end and get a result at the other.
  • Conference on Robot Learning (CoRL) : robotics and machine learning are converging at an aggressive pace. It is rather impressive how all these different aspects of computer science are clicking together, with each small improvement in each domain leading to an overall jump in robotic capability.
  • Adversarial Examples that Fool Detectors : last but not least, common machine learning classifiers are still way too fragile and can be easily fooled. With the boom in the use of ML techniques everywhere, this can quickly become a problem in the near future.

Tuesday, January 09, 2018

[Links of the day] 09/01/2018 : Learned index structures, 2 paper on Human behavior : herding and stubbornness in Jury deliberation, overconfidence is universal?

  • The Case for Learned Index Structures : as performance progression for single-core CPUs slows down (not to mention Spectre and Meltdown slowing down existing ones), applications move to a distributed model to scale. As a result, databases and distributed systems are forced to become more data-aware to achieve efficiency and performance. This is a very nice paper that demonstrates that data structures often contain components that are learnable, and that machine learning systems can help optimise those data structures.
  • Evidence of Herding and Stubbornness in Jury Deliberations : humans do not rely on logic for important decisions and try to coerce fellow humans to fit their opinion... While this is widely known, we now have a good hint that this even happens in the judicial system of trial by jury. That, or too many people saw Twelve Angry Men.
  • Overconfidence Is Universal? : interesting paper trying to understand how to identify overconfidence and if this behaviour is more predominant in a certain type of population or gender.
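
To make the learned index idea a bit more concrete, here is a toy sketch under my own assumptions: a single least-squares line predicts where a key sits in a sorted array, and the worst prediction error seen at build time bounds a small binary search. The class name and structure are hypothetical, and it assumes unique numeric keys. The paper's models are far more elaborate than this.

```python
import bisect


class LinearLearnedIndex:
    """Toy learned index over a sorted list of unique numeric keys."""

    def __init__(self, sorted_keys):
        self.keys = list(sorted_keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        # Least-squares fit: position ~= slope * key + intercept.
        mean_key = sum(self.keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in self.keys)
        cov = sum((k - mean_key) * (i - mean_pos)
                  for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Worst-case prediction error over the stored keys bounds the search.
        self.max_err = max(abs(i - self._predict(k))
                           for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        # Binary search only inside the window the model points at.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


keys = [3, 8, 15, 16, 23, 42, 100, 101]
index = LinearLearnedIndex(keys)
positions = [index.lookup(k) for k in keys]
```

The better the model fits the key distribution, the smaller the error bound and the narrower the search window.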