Thursday, May 28, 2020

[Links of the Day] 28/05/2020 : Reverse OAuth proxy, Prometheus time-series backend, Google uses machine learning to improve audio chat

  • OAuth Proxy : A reverse proxy that provides authentication with Google, GitHub or other providers.
  • Zebrium : A Prometheus backend project. I am not sure why you wouldn't just export the data from Prometheus into a distributed column-store data warehouse like ClickHouse, MemSQL or Vertica. That gives you fast SQL analysis across massive datasets, real-time updates regardless of arrival order, and essentially unlimited metadata and cardinality, plus overall flexibility. Maybe they want to focus on the monitoring/reactive side rather than on analytics.
  • Improving Audio Quality with WaveNetEQ : Google uses machine learning to deal with packet loss, jitter and delays. An interesting statistic: "99% of Google Duo calls need to deal with packet losses, excessive jitter or network delays. Of those calls, 20% lose more than 3% of the total audio duration due to network issues, and 10% of calls lose more than 8%."
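For contrast with what WaveNetEQ does, here is a minimal sketch of a naive packet-loss concealment baseline: when a packet is lost, repeat the last received audio frame with a gain that fades on each consecutive loss. This is not Google's method (WaveNetEQ synthesises the missing audio with a generative neural model); the `conceal` function, the list-of-floats frame representation and the `fade` factor are illustrative assumptions.

```python
from typing import List, Optional

Frame = List[float]  # one packet's worth of audio samples (assumed representation)


def conceal(frames: List[Optional[Frame]], frame_len: int, fade: float = 0.5) -> List[Frame]:
    """Replace lost frames (None) with a faded copy of the last good frame."""
    out: List[Frame] = []
    last: Frame = [0.0] * frame_len  # silence until the first packet arrives
    gain = 1.0
    for frame in frames:
        if frame is not None:
            out.append(list(frame))
            last = list(frame)
            gain = 1.0  # a good packet resets the fade
        else:
            gain *= fade  # each consecutive loss gets quieter
            out.append([sample * gain for sample in last])
    return out


print(conceal([[1.0, 1.0], None, None, [2.0, 2.0]], frame_len=2))
# [[1.0, 1.0], [0.5, 0.5], [0.25, 0.25], [2.0, 2.0]]
```

Repeat-and-fade is tolerable for a single short gap, but on longer bursts it either sounds robotic or decays into silence, which is the kind of loss a learned model is meant to cover better.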

Wednesday, May 27, 2020

[Links of the Day] 27/05/2020 : Combining knowledge graphs, Real-world storage resource management, Jepsen black-box transactional safety checker

  • Combining knowledge graphs : The authors describe a new entity alignment technique that factors in information about the graph in the vicinity of the entity name. It delivers 10% higher accuracy while also reducing the computational cost of model generation.
  • Wizard : A project studying real-world storage reliability in order to build a cost-effective data and storage resource management system that enhances reliability.
  • Elle : Jepsen's black-box transactional safety checker, based on cycle detection. You can find more in the arXiv paper by Kyle Kingsbury and Peter Alvaro: "Elle: Inferring Isolation Anomalies from Experimental Observations"
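Elle's core idea is to infer a dependency graph between transactions (write-write, write-read and read-write edges) from an observed history, then look for cycles, which signal isolation anomalies. The sketch below covers only the cycle-finding step, assuming the edges have already been inferred; the `(from, to, kind)` edge format and `find_cycle` are hypothetical, and Elle itself does considerably more, such as classifying cycles by their edge types to name the specific anomaly.

```python
from typing import Dict, List, Optional, Tuple

# (from_txn, to_txn, dependency kind such as "ww", "wr" or "rw"); assumed format
Edge = Tuple[str, str, str]


def find_cycle(edges: List[Edge]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of transaction ids, or None."""
    graph: Dict[str, List[str]] = {}
    for src, dst, _kind in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def dfs(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge: the cycle is the part of the path starting at nxt.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = dfs(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = dfs(node)
            if found:
                return found
    return None


# T1 wrote a value T2 read, but T2 also overwrote something T1 read: a cycle.
print(find_cycle([("T1", "T2", "wr"), ("T2", "T1", "rw")]))
```

An acyclic graph means the history is consistent with some serial order (for the edge types considered); any returned cycle is a concrete witness that it is not.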

Tuesday, May 19, 2020

[Links of the Day] 19/05/2020 : Fat Tails, AutoML Zero, Version Control for GIS

  • Statistical Consequences of Fat Tails : Nassim Nicholas Taleb's book investigates the misapplication of conventional statistical techniques to fat-tailed distributions and looks for remedies.
  • AutoML Zero : Aims to automatically discover computer programs that can solve machine learning tasks, starting from empty or random programs and using only basic math operations. The goal is to search simultaneously over all aspects of an ML algorithm, including the model structure and the learning strategy, while employing minimal human bias.
  • Sno : Distributed version-control for geospatial and tabular data
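To make the book's core concern concrete, a minimal sketch comparing tail probabilities: take a Pareto distribution with tail exponent alpha = 3 (so its mean and variance are finite), fit a Gaussian with the same mean and standard deviation, and ask how likely a value above 10 is under each. The parameters are arbitrary choices for illustration, not taken from the book.

```python
import math


def gaussian_tail(x: float, mean: float, std: float) -> float:
    """P(X > x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2.0)))


def pareto_tail(x: float, alpha: float, x_min: float = 1.0) -> float:
    """P(X > x) for a Pareto (power-law) distribution, valid for x >= x_min."""
    return (x_min / x) ** alpha


# Illustrative parameters: alpha = 3 keeps the Pareto mean and variance finite.
alpha, x_min = 3.0, 1.0
mean = alpha * x_min / (alpha - 1.0)
std = math.sqrt(alpha * x_min ** 2 / ((alpha - 1.0) ** 2 * (alpha - 2.0)))

p_fat = pareto_tail(10.0, alpha, x_min)
p_thin = gaussian_tail(10.0, mean, std)  # Gaussian matched on mean and std
print(f"Pareto   P(X > 10) = {p_fat:.2e}")
print(f"Gaussian P(X > 10) = {p_thin:.2e}")
print(f"ratio              = {p_fat / p_thin:.1e}")
```

Under these assumptions the Pareto tail probability is 0.001, while the matched Gaussian puts it below 1e-22, more than eighteen orders of magnitude lower. A model built on thin-tailed assumptions would treat an event that happens once in a thousand draws as practically impossible.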

Thursday, May 14, 2020

[Links of the Day] 14/05/2020 : Wasm in the Linux Kernel, Knowledge Graphs, Contrastive Machine Learning Model for Software Performance Regressions

  • Kernel Wasm : It looks like people want to run Wasm everywhere. This time the authors propose running Wasm programs inside the kernel. I wonder whether it would be more judicious to try running Wasm on eBPF instead. From the GitHub repo, it seems they might actually be trying to do the opposite. [github]
  • Knowledge Graphs : A comprehensive introduction to knowledge graphs. If you want to learn more about knowledge graphs, I would recommend reading the following paper before the arXiv one.
  • A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions : A really innovative approach from the Intel folks. I have started to see an interesting trend in machine learning: instead of training the ML model on a dataset that covers the whole spectrum of possibilities, authors are turning to contrastive methods. Here, the ML model is trained only on normal (non-anomalous) data in order to identify abnormal behaviour. In performance evaluation, it is much easier to obtain ideal or standard metrics than abnormal scenarios. In this case, the authors use hardware performance counters from ideal runs to train their model to recognise abnormal behaviour. [poster]
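The paper's model is trained only on normal runs (the authors use an autoencoder over hardware performance counter profiles) and flags runs that deviate from what it learned. A minimal sketch of the same zero-positive idea with a much simpler model swapped in: per-counter mean and standard deviation fitted on normal runs, plus a z-score threshold. The counter names, the `NormalOnlyDetector` class and the threshold of 3 are illustrative assumptions, not the paper's design.

```python
import statistics
from typing import Dict, List, Tuple

Profile = Dict[str, float]  # hardware counter name -> value for one run


class NormalOnlyDetector:
    """Fit on normal (non-regressed) runs only; flag runs that deviate.

    Simplified stand-in: per-counter z-scores instead of the autoencoder
    reconstruction error used in the paper.
    """

    def __init__(self, threshold: float = 3.0) -> None:
        self.threshold = threshold
        self.stats: Dict[str, Tuple[float, float]] = {}

    def fit(self, normal_runs: List[Profile]) -> "NormalOnlyDetector":
        for counter in normal_runs[0]:
            values = [run[counter] for run in normal_runs]
            mean = statistics.fmean(values)
            std = statistics.pstdev(values) or 1e-9  # avoid division by zero
            self.stats[counter] = (mean, std)
        return self

    def score(self, run: Profile) -> float:
        """Largest per-counter deviation, in standard deviations."""
        return max(abs(run[c] - m) / s for c, (m, s) in self.stats.items())

    def is_anomalous(self, run: Profile) -> bool:
        return self.score(run) > self.threshold


normal = [
    {"cache_misses": 100, "branch_misses": 10},
    {"cache_misses": 102, "branch_misses": 12},
    {"cache_misses": 98, "branch_misses": 8},
    {"cache_misses": 100, "branch_misses": 10},
]
detector = NormalOnlyDetector().fit(normal)
print(detector.is_anomalous({"cache_misses": 160, "branch_misses": 10}))
```

No regressed run is ever needed for training, which is exactly what makes the approach practical: healthy baselines are cheap to collect, labelled regressions are not.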

Tuesday, May 12, 2020

[Links of the Day] 12/05/2020 : Learning From Unlabeled Data, Fast Dataset Classifier, Azure Bad Rollout Guardian

  • Learning From Unlabeled Data : Slide deck from a talk by Thang Luong of Google Research. Thang presents novel methods for learning from unlabeled data, more specifically semi-supervised learning methods. These methods were used to build the model behind Google's Meena chatbot.
  • Flying Squid : It looks like a super-fast Snorkel with even better performance. Like Snorkel, it is used to quickly build classifiers for datasets that would otherwise be extremely time-consuming (and expensive) to label by hand for training purposes.
  • Gandalf : An Azure machine learning system trained to catch bad rollouts. Its aim is to catch bad deployments before they can ripple across the whole system.
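Snorkel and Flying Squid replace hand labels with many cheap, noisy labeling functions, then combine their votes with a label model that estimates how accurate each function is. A minimal sketch of the idea, using plain majority vote in place of a learned label model (Flying Squid's contribution is fitting that label model much faster); the labeling functions and the complaint/not-complaint task are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional

ABSTAIN = None  # a labeling function may decline to vote
LabelingFunction = Callable[[str], Optional[int]]

# Hypothetical task: 1 = customer complaint, 0 = not a complaint.
def lf_mentions_refund(text: str) -> Optional[int]:
    return 1 if "refund" in text.lower() else ABSTAIN

def lf_mentions_thanks(text: str) -> Optional[int]:
    return 0 if "thank" in text.lower() else ABSTAIN

def lf_has_exclamation(text: str) -> Optional[int]:
    return 1 if "!" in text else ABSTAIN


def majority_vote(texts: List[str], lfs: List[LabelingFunction]) -> List[Optional[int]]:
    """Combine noisy votes per example; abstain on no votes or on a tie."""
    labels: List[Optional[int]] = []
    for text in texts:
        votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
        if not votes:
            labels.append(ABSTAIN)
            continue
        ranked = Counter(votes).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            labels.append(ABSTAIN)  # tie: no confident label
        else:
            labels.append(ranked[0][0])
    return labels


LFS = [lf_mentions_refund, lf_mentions_thanks, lf_has_exclamation]
print(majority_vote(["I want a refund!", "Thanks for the quick reply", "Hello"], LFS))
```

The resulting (partial) labels are then used to train an ordinary classifier. A real label model improves on this by weighting each function by its estimated accuracy rather than counting every vote equally.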

Thursday, May 07, 2020

[Links of the Day] 07/05/2020 : Startup tactical manuals, AutoML pipeline, Thread-Caching Malloc

  • Tactical manuals and guides for startups : An awesome collection of strategic posts, essays and documents for startups. While these are great resources, they don't replace experience.
  • AutoML Pipeline : The power of Julia meets machine learning. However, beware: just feeding data into a system and hoping the best result comes out without any effort is doomed to deliver sub-optimal results. Often you end up with an OK-ish solution that blows up in production down the line.
  • TCMalloc : Google's Thread-Caching Malloc
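The central trick behind thread-caching allocators is to give each thread a private cache of free objects, so most allocations and frees take no lock, and to move objects between those caches and a shared central list only in batches. A toy Python sketch of that structure, handing out integer block ids instead of real memory; the class names, `batch_size` and `max_cached` are made up, and real TCMalloc adds size classes, a page heap and much more (recent versions can also use per-CPU rather than per-thread caches).

```python
import threading
from typing import List


class CentralFreeList:
    """Shared pool of free block ids; every access takes the lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._free: List[int] = []
        self._next_id = 0

    def fetch(self, n: int) -> List[int]:
        with self._lock:
            batch: List[int] = []
            while len(batch) < n:
                if self._free:
                    batch.append(self._free.pop())
                else:
                    batch.append(self._next_id)  # "carve" a brand-new block
                    self._next_id += 1
            return batch

    def release(self, blocks: List[int]) -> None:
        with self._lock:
            self._free.extend(blocks)


class ThreadCachingAllocator:
    def __init__(self, batch_size: int = 8, max_cached: int = 16) -> None:
        self.central = CentralFreeList()
        self.batch_size = batch_size
        self.max_cached = max_cached
        self._local = threading.local()  # one private free list per thread

    def _cache(self) -> List[int]:
        cache = getattr(self._local, "free", None)
        if cache is None:
            cache = self._local.free = []
        return cache

    def allocate(self) -> int:
        cache = self._cache()
        if not cache:
            # Slow path: refill the thread-local cache in one locked batch.
            cache.extend(self.central.fetch(self.batch_size))
        return cache.pop()  # fast path: no lock taken

    def free(self, block: int) -> None:
        cache = self._cache()
        cache.append(block)
        if len(cache) > self.max_cached:
            # Return half to the central list so one thread cannot hoard blocks.
            half = len(cache) // 2
            self.central.release(cache[:half])
            del cache[:half]
```

One simplification to note: this toy never hands a finished thread's cache back to the central list, so those blocks are simply lost, whereas a real allocator reclaims them.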