Thursday, April 30, 2020

Tuesday, April 28, 2020

[Links of the Day] 28/04/2020 : Distributed Time Series Database, Data lakes, Translating data between formats

  • Modern data lakes : if you think you need a data lake, you probably don't need one and are better off using S3/Athena or GCP/BigQuery. If you know you want a data lake, you might be mature enough to need one and should read this article.
  • M3DB : distributed time series database from Uber. It tries to address the limitations of existing solutions around horizontal scaling of storage and queries, and around long-term storage.
  • ConfBase : a practical tool for inferring and instantiating schemas and translating between data formats. The tool supports JSON, GraphQL, YAML, TOML, and XML. [github]


Thursday, April 23, 2020

[Links of the Day] 23/04/2020 : Machine Learning technical debt, Python and Bayesian Deep Learning perspective of generalization



Tuesday, April 21, 2020

[Links of the Day] 21/04/2020 : Machine Learning for Relational Query processing, Augmenting Language model with latent knowledge retriever, Computer Vision Recipes

  • Extending Relational Query Processing with ML Inference : the authors present advanced cross-optimizations between ML and DB operators in Raven. They demonstrate significant performance improvements: up to 5.5x from the native integration of ML in SQL Server, and up to 24x from cross-optimizations.
  • REALM: Retrieval-Augmented Language Model Pre-Training : the authors propose to leverage Retrieval-Augmented Language Model pre-training for the challenging task of Open-domain Question Answering. By augmenting their model with a latent knowledge retriever, they are able to beat current SOTA models while limiting the growth in model size.
  • Computer Vision Recipes : Microsoft is releasing a lot of really good content, and this time it's for computer vision. In this repository, you will find best practices, code samples, and documentation for Computer Vision.


Thursday, April 16, 2020

[Links of the Day] The future of Machine Learning is DBMS, Fun Exploring Explanations, High performance Regex

  • Hyperscan : high-performance multiple regex matching library
  • Cloudy with a chance of DBMS : A. Colyer reviews the 10-year ML prediction paper. The TL;DR: models, models everywhere in enterprise databases. You have already seen a glimpse of what that means with BigQuery ML.
  • Explorables : awesome website explaining a lot of concepts through play. There is a lot of computer science material in there.
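
As a rough illustration of what "multiple regex matching" means, here is a minimal sketch using only Python's standard `re` module. It does not use Hyperscan or mirror its API; the pattern names and the `scan` helper are made up for the example. It also simplifies the problem: at each position Python picks the first alternative that matches, whereas a dedicated multi-pattern engine can report every pattern that matches.

```python
import re

# Hypothetical pattern set, chosen only for illustration.
PATTERNS = {
    "word_error": r"\berror\b",
    "ipv4": r"\b\d{1,3}(?:\.\d{1,3}){3}\b",
    "hex": r"\b0x[0-9a-fA-F]+\b",
}

# Combine all patterns into one alternation of named groups,
# so the input text is scanned in a single pass.
COMBINED = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in PATTERNS.items())
)


def scan(text):
    """Return (pattern_name, matched_text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in COMBINED.finditer(text)]


hits = scan("error at 10.0.0.1, code 0xFF")
print(hits)
```

Each hit carries the name of the pattern that fired, which is the core of what a multi-pattern matcher hands back to the caller.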

Tuesday, April 14, 2020

[Links of the Day] 14/04/2020 : Time series dynamical attractors Autoencoder, Binarized Neural Network framework, Machine learning and Databases

  • Deep learning of dynamical attractors from time series measurements : the authors propose a general embedding technique for time series, consisting of an autoencoder trained with a novel latent-space loss function. Worth giving it a look if you deal with time series.
  • larq : open-source Python library for training neural networks with extremely low-precision weights and activations, such as Binarized Neural Networks. Basically, this framework targets the deployment of machine learning models on embedded / FPGA / ASIC hardware. A fantastic resource, with a great model zoo on top of that.
  • Cloudy with a chance of DBMS : databases are going to embed more and more machine learning solutions. BigQuery from Google already does that, and it's just a question of time before most mainstream databases offer ML services.
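
To make "binarized" a bit more concrete, here is a minimal sketch in plain Python of a dense layer whose inputs and weights are reduced to +1/-1 before the dot product. It does not use larq or reflect its API; `binarize` and `binary_dense` are hypothetical helpers, and mapping zero to +1 is an assumption made for the example.

```python
def binarize(values):
    """Map each real value to +1.0 or -1.0 by sign (assumption: zero maps to +1.0)."""
    return [1.0 if v >= 0 else -1.0 for v in values]


def binary_dense(inputs, weights):
    """Dense layer that binarizes inputs and weights before the dot product.

    `weights` holds one list of real-valued latent weights per output unit.
    """
    x = binarize(inputs)
    return [
        sum(xi * wi for xi, wi in zip(x, binarize(row)))
        for row in weights
    ]


# Hypothetical latent weights for a layer with 3 inputs and 2 outputs.
latent_weights = [[0.3, -1.2, 0.0], [-0.7, 0.4, -0.1]]
outputs = binary_dense([0.5, -2.0, 1.5], latent_weights)
print(outputs)
```

Because every operand is +1 or -1, the multiply-accumulate can be replaced by XNOR and bit counting on real hardware, which is why this kind of model suits embedded targets.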

Thursday, April 09, 2020

[Links of the Day] 09/04/2020 : TRAX deep Learning library, The next decade in AI, 1:1 questions

  • The Next Decade in AI : Paper by Gary Marcus where he explores the possible future of AI over the next decade
  • 1 on 1 meeting questions : a collection of 1:1 questions. A great list that can help any manager pick the right question for the right context, as long as you are able to read the room/team/person.
  • Trax : advanced Google deep learning library built on top of JAX. It is actively used by the Google Brain team and aims for clear code while providing advanced models like Reformer.


Tuesday, April 07, 2020

[Links of the Day] 07/04/2020 : Incentivizing Innovation, Network Performance analysis, Neural Networks for embedded systems

  • The Effects of Prize Structures on Innovative Performance : how to incentivize innovation? Well, the authors found that a winner-takes-all compensation scheme generates significantly more novel innovation relative to a compensation scheme that offers the same total compensation, but shared across the ten best innovations. However, as with every psychology paper, you have to take it with a grain of salt: reproducibility is always difficult.
  • nfstream : Python package providing fast, flexible, and expressive data structures designed to make working with online or offline network data both easy and intuitive. [github]
  • Neural Networks on embedded systems : a good overview of the challenges and available neural network architectures for running on embedded systems.


Thursday, April 02, 2020

[Links of the Day] 02/04/2020 : Grep all, HealthCare mobile data collection for machine learning, FastAI framework

  • ripgrep-all : ripgrep, but also searching in PDFs, E-Books, Office documents, zip, tar.gz, etc.
  • pymedserver : a server framework for mobile data collection and machine learning in healthcare
  • fastai : fantastic machine learning library trying to abstract away a lot of PyTorch into a simple API and building blocks. Sometimes it abstracts a bit too much, especially when you want to get into the murky details. But all in all, fastai is really a framework you want to look at if you are doing machine learning.