Thursday, June 04, 2020

[Links of the Day] 04/06/2020 : Xor filters, SIMD + JSON, Online tracking and publishers' revenues

  • Xor Filters : Xor filters are great as they provide a fast and small alternative to Bloom or cuckoo filters. However, there are some key differences. Xor filters require all the members of the set to be provided upfront. Bloom filters allow adding members but not removing them, and cuckoo filters also allow removing members. So just pick what's best for you.
  • SimdJson : nice performance from leveraging the CPU's SIMD features. However, the lack of support for null entries feels like cheating (and it will probably crash on the most common real-life payloads).
  • Online Tracking and Publishers' Revenues : The authors demonstrate that the use of cookies only represents a 4% increase in revenue for a publisher compared with non-cookie ads. This calls into question the differential benefit between ad platforms like Google and Facebook and the publishers, and why publishers should pay for a loss of privacy that only benefits their platform provider.
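
To make the Xor vs Bloom trade-off above a bit more concrete, here is a minimal sketch of a toy Bloom filter in Python. It is not taken from the linked article: the class name, the sizes and the SHA-256 based hashing are all assumptions picked for readability. It shows the property mentioned above: members can be added at any time, but there is no way to remove one.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (illustrative only): add and query, no removal."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary assumptions, not tuned for any error rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Members can be added incrementally, unlike an Xor filter.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
bf.add("alice")
bf.add("bob")
print("alice" in bf)  # True: added members are always found
```

Removing "alice" would mean clearing bits that "bob" may also rely on, which is why a plain Bloom filter cannot delete. An Xor filter avoids the problem differently: it is built once from the full set and never modified.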

Tuesday, June 02, 2020

[Links of the Day] 02/06/2020 : Real-time network topology, Detecting node failure using graph entropy, Monitoring machine learning in production

  • Skydive : open source real-time network topology and protocols analyzer providing a comprehensive way of understanding what is happening in your network infrastructure.
  • Vertex : the authors propose to use vertex entropy for detecting and understanding node failures in a network. By understanding the entropy in a graph they are able to circumvent the lack of locality in the information available and pinpoint critical nodes. 
  • Monitoring Machine Learning Models in Production :  Once you have deployed your machine learning model to production it rapidly becomes apparent that the work is not over.

Thursday, May 28, 2020

[Links of the Day] 28/05/2020 : Reverse OAuth proxy, Prometheus timeseries backend, Google uses machine learning to improve audio chat

  • OAuth Proxy : A reverse proxy that provides authentication with Google, GitHub or other providers.
  • Zebrium : a Prometheus backend project. I'm not sure why you wouldn't just export the data from Prometheus into a distributed column-store data warehouse like ClickHouse, MemSQL or Vertica. That gives you fast SQL analysis across massive datasets, real-time updates regardless of order, and unlimited metadata, cardinality and overall flexibility. Maybe it's because they want to focus on the monitoring / reactive aspect and less on the analytics.
  • Improving Audio Quality with WaveNetEQ : Google uses machine learning to deal with packet loss, jitter, and delays. An interesting bit of info: "99% of Google Duo calls need to deal with packet losses, excessive jitter or network delays. Of those calls, 20% lose more than 3% of the total audio duration due to network issues, and 10% of calls lose more than 8%."

Wednesday, May 27, 2020

[Links of the Day] 27/05/2020 : Combining Knowledge graphs, Real World storage resource management, Jepsen black box transactional safety checker

  • Combining knowledge graphs : The authors describe a new entity alignment technique that factors in information about the graph in the vicinity of the entity name. It provides 10% higher accuracy while reducing the computational cost of model generation.
  • Wizard : a project studying real-world storage reliability, with the aim of building a cost-effective data and storage resource management system that enhances reliability.
  • Elle : Jepsen black-box transactional safety checker based on cycle detection. You can find more in the arXiv paper by Kyle Kingsbury and Peter Alvaro : "Elle: Inferring Isolation Anomalies from Experimental Observations"

Tuesday, May 19, 2020

[Links of the Day] 19/05/2020 : Fat Tails, AutoML Zero , Version Control for GIS

  • Statistical Consequences of Fat Tails : Nassim Nicholas Taleb's book investigates the misapplication of conventional statistical techniques to fat-tailed distributions and looks for remedies.
  • AutoML Zero : aims to automatically discover computer programs that can solve machine learning tasks, starting from empty or random programs and using only basic math operations. The goal is to simultaneously search for all aspects of an ML algorithm—including the model structure and the learning strategy—while employing minimal human bias.
  • Sno : Distributed version-control for geospatial and tabular data

Thursday, May 14, 2020

[Links of the Day] 14/05/2020 : Wasm In Linux Kernel, Knowledge Graphs, Contrastive Machine learning Model for Software Performance Regressions

  • Kernel Wasm : Looks like people want to run Wasm everywhere. This time the authors propose to run Wasm programs in the kernel. I just wonder if it would not be more judicious to try to run Wasm in eBPF. From the GitHub repo, it seems that they might actually try to do the opposite. [github]
  • Knowledge Graphs : A comprehensive introduction to knowledge graphs. If you want to learn more about knowledge graphs, I would recommend reading the following paper before reading the arXiv one.
  • A Zero-Positive Learning Approach for Diagnosing Software Performance Regressions : really innovative approach by the Intel folks. I have started to see an interesting trend in machine learning: instead of trying to train the ML model on a dataset that covers the whole spectrum of possibilities, authors are starting to use contrastive methods. Here, the ML model is trained only on a normal dataset in order to identify abnormal behaviour. In performance evaluation it is much easier to obtain ideal or standard metrics than abnormal scenarios. So the authors use hardware performance counters from ideal runs to train their model, which then flags abnormal behaviour. [poster]
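
The "train only on normal data" idea from the last item can be sketched in a few lines of Python. To be clear, this is not the authors' model: a plain z-score threshold is swapped in purely for illustration, and the function names, the sample values and the threshold of 3 standard deviations are all hypothetical.

```python
import statistics


def fit_baseline(normal_samples):
    """Learn what "normal" looks like from regression-free runs only."""
    mean = statistics.fmean(normal_samples)
    stdev = statistics.stdev(normal_samples)
    return mean, stdev


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a measurement that sits far outside the learned normal range."""
    mean, stdev = baseline
    return abs(value - mean) > threshold * stdev


# Hypothetical counter readings (e.g. cache misses per run) from known-good runs.
normal_runs = [100.0, 102.0, 98.0, 101.0, 99.0, 100.5, 99.5]
baseline = fit_baseline(normal_runs)

print(is_anomalous(100.8, baseline))  # False: within the normal spread
print(is_anomalous(140.0, baseline))  # True: far outside anything seen in training
```

No abnormal example is ever needed for training, which is the appeal: anything that does not look like the known-good runs gets flagged.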