Tuesday, June 30, 2020

Data is the new oil fueling machine learning adoption, but businesses are discovering #AI is no silver bullet

Data is the new oil. However, unlike oil, data scarcity is becoming less of a problem while processing costs are skyrocketing. The business world is waking up to the fact that, although the cost of computing keeps getting cheaper, the cost of training machine learning models is rising faster than compute costs are dropping.

Moreover, businesses are finding #ai challenging to adopt, and the numbers in The Economist's report show how often #machinelearning projects fail in the real business world:
  • Seven out of ten said their #ai projects had generated little impact so far.
  • Two-fifths of those with “significant investments” in AI had yet to report any benefits at all.
Companies are finding that #machinelearning is not the promised silver bullet. Non-tech companies are discovering what tech companies had to learn the hard way: that they are not Google, Facebook, ...

To successfully deploy an AI/ML/DL project, you need a vast amount of data, skilled employees, solid engineering practices, access to infrastructure and, last but not least, a clear understanding of the business problem.

I harbour the (probably false) hope that corporations will abandon silver-bullet thinking, but I would settle for avoiding another #ai winter cycle.

[Links of the Day] 30/06/2020 : Homomorphic encryption for Machine Learning, Neural Network on Silicon, Python Graph visualization library

  • PySyft : Python framework for homomorphic encryption for machine learning. It allows you to train models on encrypted data without needing to decrypt it. It's 40x slower than the normal method, but this means you don't have to deal with the new EU regulation on AI.
  • Neural Networks on Silicon : a collection of papers and works on the Neural Networks on Silicon topic.
  • Pygraphistry : Python visual graph analytics library to extract, transform, and load big graphs into Graphistry.
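
The idea of computing on data you cannot read is easier to grasp with a toy example. The sketch below is a textbook Paillier cryptosystem, an additively homomorphic scheme, with tiny hard-coded primes chosen purely for readability: it is not PySyft's API, not what PySyft necessarily uses internally, and not secure. It shows a server scoring a linear model (hypothetical integer features and weights) on encrypted inputs and never seeing the features. It needs Python 3.9+ for math.lcm and the modular inverse in pow.

```python
import math
import random

# Toy Paillier keypair. Tiny hard-coded primes: illustration only, NOT secure.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1                          # common simple choice of generator
LAM = math.lcm(P - 1, Q - 1)       # Carmichael function lambda(N)
MU = pow(LAM, -1, N)               # valid shortcut because G = N + 1

def encrypt(m: int, rng: random.Random) -> int:
    """Encrypt 0 <= m < N with fresh randomness r coprime to N."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N     # L(x) = (x - 1) / N, then multiply by mu

def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the hidden plaintexts (mod N)."""
    return (c1 * c2) % N2

def scale_encrypted(c: int, k: int) -> int:
    """Raising a ciphertext to a public non-negative integer k multiplies the plaintext by k."""
    return pow(c, k, N2)

if __name__ == "__main__":
    rng = random.Random(42)
    features = [3, 5, 7]                        # client-side plaintext features
    weights = [2, 4, 1]                         # server-side plaintext model weights
    enc = [encrypt(x, rng) for x in features]   # only ciphertexts leave the client
    score = 1                                   # 1 is a valid encryption of 0
    for c, w in zip(enc, weights):
        score = add_encrypted(score, scale_encrypted(c, w))
    print(decrypt(score))                       # 3*2 + 5*4 + 7*1 = 33
```

Only addition and multiplication by public constants are available here; training a real model on encrypted data needs richer schemes or secure multi-party computation, which is part of why such frameworks run much slower than plaintext training.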

Thursday, June 25, 2020

[Links of the Day] 25/06/2020 : Architecture decision record, Database stress test, Rust Network Function Framework

  • Architecture decision record : Methods and tools for capturing software design choices that address a functional or non-functional requirement that is architecturally significant. [template]
  • pstress : Percona database concurrency and crash recovery testing tool.
  • capsule : a framework for network function development. If you want to do fast packet processing in a memory-safe programming language (Rust), this is for you.

Tuesday, June 23, 2020

The rise of Domain Specific Accelerators

Two recent articles point to a pickup in the adoption of domain-specific accelerators. With the end of Moore's Law, domain-specific hardware remains one of the few paths to keep increasing the performance and efficiency of computing hardware.

For a long time, the adoption of domain-specific accelerators was limited by economic factors. Historically, small feature sizes, small batch sizes, and the high cost of fab time (for ASICs) translated into a prohibitive per-unit cost.
However, economic factors have shifted :

  • a move toward standardised open-source tooling,
  • more flexible licensing models,
  • the RISC-V architecture coming of age and maturing rapidly,
  • dropping fab costs,
  • wide availability of FPGAs (e.g. AWS F1),
  • the rise of co-designed high-level programming languages, reducing the learning curve and the design cycle,
  • the power/performance wall of general-purpose compute units.

We are about to see a dramatic shift toward heterogeneous compute infrastructure over the next couple of years.

[Links of the Day] 23/06/2020 : Thinking while moving, AI snake oil, Graph Database

  • Thinking While Moving : too often, current machine learning systems are used in a rigid control loop, leading to saccades. The authors of this paper propose executing the thinking system concurrently with the controlled system, allowing more fluid operation and shorter task execution times.
  • AI snake oil : a lot of AI projects fail to return their initial investment. Too many buzzwords and not enough understanding of the limits of current technology. At least NVIDIA is selling GPUs by the millions: in a gold rush, the ones making a fortune are the ones selling shovels.
  • TerminusDB : in-memory graph database management system. It's interesting to see that 99% of the source code is Prolog and that they use JSON-LD as the definition, storage and interchange format for the query language. The original use case targeted financial data stored as time series but lacking graph correlation.

Monday, June 22, 2020

Stop throwing GPUs at HPC: not all scientific problems are compute-dense

The current race to exascale has put a heavy emphasis on GPU-based acceleration to the detriment of other HPC architectures. However, the Crossroads and Fugaku supercomputers demonstrate that it is not all about GPUs.

The vast majority of (pre-)exascale machines rely heavily on GPU acceleration, targeting scientific problems that can be cast as dense matrix-matrix multiplications.

However, a large number of scientific problems are not compute-dense, and GPU architectures are ill-equipped to accelerate them. Sadly, current trends seem to have relegated this type of scientific challenge to second-class status in the HPC world. Take extreme-scale graph problems, for example: the Graph500 benchmark clearly shows that this type of problem has been orphaned. Four out of ten systems are more than seven years old and nearing their end of life. Moreover, the newer systems show only marginal progress toward accelerating extreme-scale graph traversal.

I understand that the current machine learning hype heavily influences the HPC ecosystem. However, we have to remind ourselves that there is life beyond FLOPS. The Fugaku and Crossroads systems demonstrate that it is possible to achieve strategic compute leadership without sacrificing the architecture on the altar of the exaflop compute-dense gods.

Japan's latest ARM-based supercomputer, Fugaku, demonstrates that it can address both compute-dense, GPU-optimised problems and those that are not reducible to dense linear algebra and are therefore incompatible with GPU technologies. Built around the Armv8.2-A A64FX CPU, it just picked up the number one spot in both the HPC Top500 Green benchmark and the Graph500 BFS benchmark.
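
To see why graph traversal resists dense linear algebra, here is a minimal level-synchronous breadth-first search over an adjacency list, the kind of kernel Graph500 times. Almost every step is a data-dependent memory lookup with next to no arithmetic. The sketch reports a traversed-edges-per-second (TEPS) figure in the spirit of Graph500's metric; the uniform random graph generator is an illustrative assumption (Graph500 uses a Kronecker generator), and Python timings say nothing about real hardware.

```python
import random
import time
from collections import defaultdict

def random_graph(n_vertices: int, n_edges: int, seed: int = 1):
    """Undirected uniform random graph as an adjacency list (not Graph500's Kronecker generator)."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for _ in range(n_edges):
        u, v = rng.randrange(n_vertices), rng.randrange(n_vertices)
        adj[u].append(v)
        adj[v].append(u)
    return adj

def bfs(adj, root):
    """Level-synchronous BFS. Returns the parent map and the number of adjacency entries scanned."""
    parent = {root: root}
    frontier = [root]
    scanned = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, ()):   # irregular, data-dependent memory access
                scanned += 1
                if v not in parent:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent, scanned

if __name__ == "__main__":
    g = random_graph(20_000, 160_000)
    start = time.perf_counter()
    parent, scanned = bfs(g, root=0)
    elapsed = time.perf_counter() - start
    # Each undirected edge in the reached component is scanned from both ends,
    # so scanned / 2 approximates the number of traversed edges.
    print(f"reached {len(parent)} vertices, ~{scanned / 2 / elapsed:.3g} TEPS")
```

There is no matrix multiply to hand to a GPU here: performance is bounded by memory latency and network traffic on the frontier, which is exactly what Graph500 rewards.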

Hopefully, this will be a wake-up call for the HPC community to properly fund R&D efforts orthogonal to compute-dense, exaflops-benchmark-friendly architectures.

Update 22/06/2020 : after this article was published, Fugaku was ranked number one on the Top500 Linpack benchmark with close to half an exaflops (415 petaflops)! Fugaku is now topping pretty much every single HPC ranking :

  • Top500 : Nb 1.
  • Top500-Green : Nb 1.
  • HPCG : Nb 1.
  • HPL-AI: Nb 1.
  • Graph500 : Nb 1.

Friday, June 19, 2020

With enough data and/or fine tuning, simpler models are as good as more complex models

This is an age-old issue that seems to repeat itself in every field. A couple of recently published papers criticise the race to beat SOTA (state of the art).

This recent paper demonstrates that older, simpler models perform as well as newer ones as long as they get enough training data.

This has some interesting implications for production systems: if you already have a good-enough model, throwing more data at it can bring it close to SOTA results.
This means that you won't have to build a new model from scratch to keep up with SOTA in your production system. You just need to collect more data as the system runs and retrain your model once in a while.
Also, less complex models tend to have shorter inference times in production, another crucial factor affected by model complexity.

In another recent paper, the authors look at metric learning papers from the past four years and demonstrate that the claimed performance improvements over older methods (often more than double) are mainly due to a lack of tuning.
Most of the time, the evaluation is lopsided: the authors of the SOTA-beating algorithm fine-tune their own algorithm on the test set and compare it against the previous SOTA algorithm with off-the-shelf tuning.

"Our results show that when hyperparameters are properly tuned via cross-validation, most methods perform similarly to one another"

"...this brings into question the results of other cutting edge papers not covered in our experiments. It also raises doubts about the value of the hand-wavy theoretical explanations in metric learning papers."
This happens time and time again across industry and academia: performance benchmarks of Intel vs AMD CPUs, Nvidia vs ATI GPUs, networking, storage, etc.
This can be due to a lack of knowledge, time, integrity, etc.
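
The fair protocol from the first quote can be sketched as follows: hyperparameters are chosen by k-fold cross-validation on the training split only, and the held-out test split is touched exactly once, at the end. To stay self-contained, the sketch uses a toy k-nearest-neighbour classifier on synthetic data; the model, the data generator, and the candidate grid are illustrative assumptions, not anything taken from the papers above.

```python
import random
from collections import Counter

def make_data(n, seed):
    """Two noisy 2-D Gaussian blobs labelled 0 and 1 (synthetic, for illustration)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        centre = 1.0 if label else -1.0
        data.append(((rng.gauss(centre, 1.5), rng.gauss(centre, 1.5)), label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (odd k avoids ties)."""
    nearest = sorted(train, key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2)[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, evaluation, k):
    return sum(knn_predict(train, x, k) == y for x, y in evaluation) / len(evaluation)

def cross_val_score(data, k, folds=5):
    """Mean validation accuracy over `folds` contiguous folds of the training data."""
    size = len(data) // folds
    scores = []
    for f in range(folds):
        val = data[f * size:(f + 1) * size]
        train = data[:f * size] + data[(f + 1) * size:]
        scores.append(accuracy(train, val, k))
    return sum(scores) / folds

def tune_then_test(train, test, candidates=(1, 3, 5, 9, 15, 25)):
    # Select k using the training split only...
    best_k = max(candidates, key=lambda k: cross_val_score(train, k))
    # ...then report a single score on the untouched test split.
    return best_k, accuracy(train, test, best_k)

if __name__ == "__main__":
    train, test = make_data(300, seed=0), make_data(200, seed=1)
    best_k, test_acc = tune_then_test(train, test)
    print(f"k={best_k}, held-out accuracy={test_acc:.2f}")
```

To compare two methods fairly, both should go through the same `tune_then_test` loop with comparable search budgets; picking the best configuration by looking at the test split inflates whichever method gets that privilege.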

To conclude, be careful: the latest shiny model might not be the best one for your production system. If you spend enough time and data on older models, you might achieve the same performance at a lower inference cost.
Obviously, this assumes that you already follow best practices when it comes to model monitoring in production :)

Thursday, June 11, 2020

[Links of the Day] 11/06/2020 : Metric Time-Series Database, Machine Learning for metrics, Causal Time series Analysis

  • Victoria Metrics : fast, cost-effective and scalable time-series database. If you need a backend for Prometheus, for example, this is the DB for you.
  • Sieve : a platform to derive actionable insights from monitored metrics in distributed systems. The platform is composed of two separate systems: one geared toward trace reduction and selection, with intelligent sampling using a form of zero-positive learning, and a second that extracts correlations between the services generating the traces.
  • Tigramite : causal time series analysis Python package. It efficiently reconstructs causal graphs from high-dimensional time-series datasets and models the obtained causal dependencies for causal mediation and prediction analyses. [github]

Tuesday, June 09, 2020

[Links of the Day] 09/06/2020 : #WASM on #K8S , Fast Anomaly Detection on Graphs, Linux one ring

  • Krustlet : WebAssembly seems to be getting more pervasive: we have kernel WASM, WASM for deep learning, and now Krustlet offers WASM for Kubernetes via a Kubelet.
  • Fast Anomaly Detection in Graphs : really cool real-time anomaly detection on dynamic graphs. The authors claim to be 644 times faster than SOTA with 42-48% higher accuracy. Even more attractive is the constant memory usage, which is fantastic for production deployment. [github]
  • io_uring : this will dominate the future of the Linux interface. It is currently eating up every single IO interface and probably won't stop there.
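
Constant memory on an unbounded edge stream usually comes from summarising counts in fixed-size sketches rather than keeping per-edge state. The sketch below illustrates that general technique; it is a simplified assumption of how such a detector can work, not the authors' implementation. Two count-min sketches track each edge's total count and its count in the current time tick, and a chi-squared-style score flags edges whose current-tick count deviates from their historical mean rate. The width, depth, hash choice, and the assumption that ticks start at 1 and never decrease are all illustrative.

```python
import hashlib

class CountMinSketch:
    """Fixed-size frequency sketch: depth x width counters, whatever the stream length."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key: str):
        # One hash per row, obtained by salting BLAKE2b with the row index.
        for d in range(self.depth):
            h = hashlib.blake2b(key.encode(), digest_size=8, salt=d.to_bytes(16, "little"))
            yield d, int.from_bytes(h.digest(), "little") % self.width

    def add(self, key: str):
        for d, i in self._cells(key):
            self.rows[d][i] += 1

    def estimate(self, key: str) -> int:
        # Collisions can only inflate counters, so the row minimum is the tightest estimate.
        return min(self.rows[d][i] for d, i in self._cells(key))

    def clear(self):
        for row in self.rows:
            row[:] = [0] * self.width


class EdgeStreamScorer:
    """Scores each edge (u, v, tick) by how far its current-tick count deviates from its mean rate."""

    def __init__(self, width=1024, depth=4):
        self.total = CountMinSketch(width, depth)    # counts since the start of the stream
        self.current = CountMinSketch(width, depth)  # counts in the current tick only
        self.tick = 1

    def score(self, u, v, tick: int) -> float:
        if tick > self.tick:                          # a new tick starts: reset per-tick counts
            self.current.clear()
            self.tick = tick
        key = f"{u}->{v}"
        self.total.add(key)
        self.current.add(key)
        t = self.tick
        if t == 1:
            return 0.0                                # no history to compare against yet
        a, s = self.current.estimate(key), self.total.estimate(key)
        # Chi-squared-style statistic: current count a versus the mean rate s / t over t ticks.
        return (a - s / t) ** 2 * t ** 2 / (s * (t - 1))

if __name__ == "__main__":
    scorer = EdgeStreamScorer()
    stream = [("a", "b", t) for t in range(1, 6)] + [("a", "b", 6)] * 50  # steady, then a burst at tick 6
    scores = [scorer.score(u, v, t) for u, v, t in stream]
    print(f"steady tick: {scores[3]:.2f}, end of burst: {scores[-1]:.2f}")
```

Memory stays at `2 * depth * width` counters no matter how many distinct edges appear; the price is that hash collisions can only overestimate counts.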

Saturday, June 06, 2020

Yet another Red Queen Project : Franco-German Gaia-X

For some reason, the EU and especially the French government love moonshot projects. The only problem is that they tend to be launched after the moon has already been colonized.

Gaia-X is not a moonshot, but a Red Queen project. I use this term in reference to the Red Queen hypothesis or Red Queen effect, which is derived from Lewis Carroll's Through the Looking-Glass :
Now, here, you see, it takes all the running you can do, to keep in the same place.

Gaia-X is a Red Queen project because the French and German governments (and the EU to some extent) are trying to forcefully evolve the digital ecosystem just to stay in the same place. Also, because they always launch these initiatives way too late, or without any long-term strategic planning in terms of both funding and impact.

Let's look at Gaia-X and why there is an air of "déjà vu". First, it's not a cloud service; it's a "platform" aggregating cloud hosting services from dozens of companies. Does that remind you of anything? Bingo: the European cloud initiative, which aims to:
"Strengthen Europe's position in data-driven innovation, improve its competitiveness and cohesion, and help create a Digital Single Market in Europe."

This initiative started back in 2012; at the time, I didn't get the strategy and structure of the effort, and unfortunately, I still don't. The EU wanted to regulate and impose EU standards on the industry, hoping to spruce up the EU cloud ecosystem via standards and funding sprinkling. I use the term sprinkling because the EU thought that seeding a constellation of research projects and local initiatives would magically sprout an EU cloud giant.

The "standards effort" side of the program seems to have fizzled out. Judge for yourself: the official final report is here.

Gaia-X seems to be an offshoot of the European Cloud Partnership side of the cloud initiative, aiming to increase trust when using cloud services:
"it's (Gaia-X) conceived as a platform joining up cloud-hosting services from dozens of companies, allowing businesses to move their data freely with all information protected under Europe's tough data processing rules."

Compounding the regulatory compliance spin, the project promoters cannot refrain from using vendor lock-in FUD:
"One important concept underpinning Gaia-X is "reversibility", a principle that would allow users to switch providers quickly. " 

They conveniently forget to mention that by using Gaia-X, you will be trading provider lock-in for platform lock-in.

If you dig a little into the technical side, you find out that this reads more like a program to keep academic research institutes busy, rehashing fantasies of dynamically matching service providers to consumers and policies. Dynamic matching was a hot topic in academia during the SOAP era but isn't used at all in practice. Moreover, it doesn't use any established logic programming paradigm and reinvents an ad-hoc service ontology/taxonomy and query language.

Last but not least, one of the glaring omissions from the platform is the complete lack of specification for common accounting, payment and monetization of services. Where is the processing and payment service? It is conveniently absent.

Providing an accounting and payment platform for dynamically orchestrated services from a multitude of providers is not only hard, it's near impossible. Without this crucial element, the platform is stillborn.

If France and Germany want to avoid turning Gaia-X into another Qwant, maybe they should pivot the platform to a more niche domain, such as a cloud services procurement platform for governments and large companies. This would fit right in with the compliance, sovereignty and interoperability narrative, as well as with the business profile of most of the consortium participants.

Thursday, June 04, 2020

[Links of the Day] 04/06/2020 : XoR filters, SIMD + Json , Online tracking and publisher's revenues

  • Xor Filters : Xor filters are great as they provide a fast and small alternative to Bloom or cuckoo filters. However, there are some key differences. Xor filters require all the members of the set to be provided upfront, Bloom filters allow adding members but not removing them, and cuckoo filters also allow removing members. So just pick what's best for you.
  • SimdJson : nice performance leveraging CPU features. However, the lack of support for null entries feels like cheating (and it would probably crash on the most common real-life payloads).
  • Online Tracking and Publishers' Revenues : the authors demonstrate that the use of cookies only represents a 4% increase in revenue vs non-cookie for an advertiser. This calls into question the differential benefit between ad publishers like Google and Facebook and the advertiser, and why the advertiser should pay for a loss of privacy that only benefits their platform provider.
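
To make the "members upfront" constraint concrete, here is a compact sketch of an 8-bit xor filter roughly following the published construction: each key maps to three slots, one per block, the keys are "peeled" off slots that hold a single key, and fingerprints are then assigned in reverse peeling order so that the three slots of every member XOR to its fingerprint. The BLAKE2b hashing, the table size of about 1.23 slots per key plus a small constant, and the retry limit are choices made for this illustration; it is not the reference implementation and is far slower than it.

```python
import hashlib
from collections import deque

def _digest(key: bytes, seed: int) -> int:
    """128-bit hash of key; each construction seed gives a fresh hash function."""
    h = hashlib.blake2b(key, digest_size=16, salt=seed.to_bytes(16, "little"))
    return int.from_bytes(h.digest(), "little")

class XorFilter8:
    """Static approximate-membership filter with 8-bit fingerprints (~1/256 false positives)."""

    def __init__(self, keys):
        keys = {k.encode() for k in keys}                    # every member must be known now
        self.block = (int(1.23 * len(keys)) + 32) // 3       # ~1.23 slots per key, in 3 blocks
        for seed in range(64):                               # retry with a new seed if peeling fails
            self.seed = seed
            stack = self._peel([_digest(k, seed) for k in keys])
            if stack is not None:
                break
        else:
            raise RuntimeError("construction failed")
        self.table = [0] * (3 * self.block)
        # Reverse peeling order: slot `slot` is still 0 here and no later key touches it.
        for x, slot in reversed(stack):
            a, b, c = self._slots(x)
            self.table[slot] = self._fp(x) ^ self.table[a] ^ self.table[b] ^ self.table[c]

    def _slots(self, x):
        b = self.block
        return ((x & 0xFFFFFFFF) % b,
                b + ((x >> 32) & 0xFFFFFFFF) % b,
                2 * b + ((x >> 64) & 0xFFFFFFFF) % b)

    @staticmethod
    def _fp(x):
        return (x >> 96) & 0xFF

    def _peel(self, ids):
        n = 3 * self.block
        count, mask = [0] * n, [0] * n
        for x in ids:
            for s in self._slots(x):
                count[s] += 1
                mask[s] ^= x                     # XOR of all key hashes landing in slot s
        queue = deque(i for i in range(n) if count[i] == 1)
        stack = []
        while queue:
            i = queue.popleft()
            if count[i] != 1:
                continue                          # slot was emptied after being queued
            x = mask[i]                           # the only key hash left in slot i
            stack.append((x, i))
            for s in self._slots(x):
                count[s] -= 1
                mask[s] ^= x
                if count[s] == 1:
                    queue.append(s)
        return stack if len(stack) == len(ids) else None

    def __contains__(self, key: str) -> bool:
        x = _digest(key.encode(), self.seed)
        a, b, c = self._slots(x)
        return self._fp(x) == self.table[a] ^ self.table[b] ^ self.table[c]
```

Because the table is solved for the whole set at once, there is no cheap way to insert or delete a key afterwards: you rebuild. That is the trade-off against Bloom and cuckoo filters mentioned above, in exchange for about 1.23 bytes per key at 8-bit fingerprints.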

Tuesday, June 02, 2020

[Links of the Day] 02/06/2020 : Real time network topology, Detecting node failure using graph entropy, Monitoring machine learning in production

  • Skydive : open-source real-time network topology and protocol analyzer, providing a comprehensive way of understanding what is happening in your network infrastructure.
  • Vertex : the authors propose using vertex entropy to detect and understand node failures in a network. By computing the entropy in a graph, they are able to circumvent the lack of locality in the available information and pinpoint critical nodes.
  • Monitoring Machine Learning Models in Production : once you have deployed your machine learning model to production, it rapidly becomes apparent that the work is not over.
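
A common first check once a model is live is input drift. A minimal sketch, assuming you keep a reference sample of each feature from training time: the Population Stability Index (PSI) compares the binned distribution of live values against that reference. The equal-width binning, the epsilon smoothing, and the synthetic data are illustrative choices, not something the linked article prescribes; rule-of-thumb thresholds such as treating values above roughly 0.25 as a major shift are industry conventions, so calibrate them on your own data.

```python
import math
import random

def psi(reference, live, bins=10, eps=1e-6):
    """Population Stability Index of `live` against `reference`, using equal-width bins."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0                 # guard against a constant reference sample

    def histogram(values):
        counts = [0] * bins
        for x in values:
            i = int((x - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1   # clamp out-of-range live values to edge bins
        return [c / len(values) + eps for c in counts]  # eps avoids log(0) on empty bins

    expected, actual = histogram(reference), histogram(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))

if __name__ == "__main__":
    rng = random.Random(7)
    train = [rng.gauss(0, 1) for _ in range(5000)]
    same = [rng.gauss(0, 1) for _ in range(5000)]
    shifted = [rng.gauss(0.8, 1) for _ in range(5000)]
    print(f"no drift: {psi(train, same):.3f}, drifted: {psi(train, shifted):.3f}")
```

Running this per feature on a schedule, and alerting when the index crosses your chosen threshold, is a cheap complement to tracking prediction quality once labels arrive.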