Reflections Of The Void: 01/01/2018

Tuesday, January 30, 2018

Rules of Machine Learning: Best Practices for ML Engineering : This document is intended to help those with a basic knowledge of machine learning get the benefit of best practices in machine learning from around Google. It presents a style for machine learning, similar to the Google C++ Style Guide and other popular guides to practical programming.
The dynamical structure of political corruption networks : a really fascinating paper presenting an analysis of corruption and of the network of individuals participating in it. Interestingly enough, corruption runs in small groups that rarely comprise more than eight people, in networks that have hubs and a modular structure that encompasses more than one corruption scandal.
Price Manipulation in the Bitcoin Ecosystem : A single actor likely drove the USD/BTC exchange rate from $150 to $1000 in 2 months.

VPS Comparison : this provides a very good overview of the different VPS providers out there. Obviously it is not complete, but hey, I would love to see a community-driven benchmark website maintaining comparisons across providers, and not just for VPS but also for cloud / PaaS / lambda etc.
Scale-free networks are rare : the ideal scale-free network does not really occur that often in the wild (of the internet). Maybe it's time to go beyond this concept and explore other, more realistic models for real-world networks (I'm looking at you, distributed networks of microservices).
Data Mining OCR PDFs : extracting information from PDFs is a nightmare; it is even worse when you have to do OCR, and I always considered tabular data a no-go territory. But it looks like somebody actually spent the effort to make it work, and the result is impressive.
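
To make the "ideal" in the scale-free entry concrete: the textbook way to grow a scale-free network is Barabási-Albert preferential attachment, where each new node links to existing nodes with probability proportional to their degree, producing a power-law-like degree tail. Below is a minimal toy generator in plain Python; the function names and parameters (`n`, `m`, `seed`) are my own hypothetical choices, and this is not the paper's methodology, which fits and statistically tests degree distributions of real data sets. Plotting the returned CCDF on log-log axes is the usual quick eyeball check for a straight-line tail.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def preferential_attachment(n: int, m: int, seed: int = 0) -> List[Edge]:
    """Grow an n-node graph in which every new node links to m distinct
    existing nodes, picked with probability proportional to their degree."""
    rng = random.Random(seed)
    # Seed graph: a complete graph on m + 1 nodes, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform draw
    # from this list is a degree-proportional draw over nodes.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degrees(edges: List[Edge]) -> Counter:
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def degree_ccdf(deg: Counter) -> Dict[int, float]:
    """P(degree >= k) for every observed degree k."""
    values = list(deg.values())
    return {k: sum(d >= k for d in values) / len(values) for k in sorted(set(values))}


edges = preferential_attachment(n=2000, m=2, seed=42)
deg = degrees(edges)
ccdf = degree_ccdf(deg)
print(len(edges), max(deg.values()))
```

The point of the paper is precisely that real networks rarely match what this idealised process produces, so a generator like this is better used as a reference shape than as a model of, say, a microservice call graph.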

Stellar Consensus Protocol : from a Ripple fork to a full-blown rewrite. Stellar looks like an impressive protocol addressing many of the shortcomings and risks of Ripple. Also, the authors seem smart enough to avoid jumping too fast onto smart contracts, as they are a really tough nut to crack. With all the mayhem surrounding cryptocurrencies, the Stellar approach seems rather measured. Worth keeping an eye on.
Optimizing web servers for high throughput and low latency : a very good post on how to optimise your Linux system. A lot of it has already been described many times, but it never hurts to repeat it.
The performance impact of Meltdown patches on HPC FS (Lustre) : no surprise here, IO-intensive applications are the most heavily impacted. However, I wasn't expecting a 40% performance penalty, and up to 45% for large folders.
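
Part of what makes the Stellar approach interesting is federated Byzantine agreement: each node chooses its own quorum slices (the sets of nodes it trusts to convince it), and a quorum is a non-empty set of nodes that contains at least one slice of each of its members. Safety relies on quorum intersection, i.e. every pair of quorums sharing at least one node. The sketch below simply brute-forces both definitions for a tiny network. The node names, the `threshold_slices` helper and both configurations are hypothetical, and the exponential enumeration is only an illustration, not how Stellar's software checks a configuration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Node = str
SliceConfig = Dict[Node, List[FrozenSet[Node]]]


def is_quorum(candidate: Set[Node], slices: SliceConfig) -> bool:
    """A non-empty set is a quorum if it contains at least one slice
    of every one of its members."""
    if not candidate:
        return False
    return all(any(s <= candidate for s in slices[v]) for v in candidate)


def all_quorums(slices: SliceConfig) -> List[FrozenSet[Node]]:
    # Brute force over every subset of nodes: fine for a toy, exponential in general.
    nodes = sorted(slices)
    return [
        frozenset(combo)
        for r in range(1, len(nodes) + 1)
        for combo in combinations(nodes, r)
        if is_quorum(set(combo), slices)
    ]


def has_quorum_intersection(slices: SliceConfig) -> bool:
    """True if every pair of quorums shares at least one node."""
    return all(a & b for a, b in combinations(all_quorums(slices), 2))


def threshold_slices(nodes: Iterable[Node], k: int) -> SliceConfig:
    """Hypothetical helper: each node trusts any k-node set that includes itself."""
    nodes = list(nodes)
    return {v: [frozenset(c) for c in combinations(nodes, k) if v in c] for v in nodes}


# Four nodes, each satisfied by any 3 of the 4 (itself included).
healthy = threshold_slices("ABCD", 3)
# Two cliques that only trust themselves: two disjoint quorums can diverge.
split = {
    "A": [frozenset("AB")],
    "B": [frozenset("AB")],
    "C": [frozenset("CD")],
    "D": [frozenset("CD")],
}
print(has_quorum_intersection(healthy), has_quorum_intersection(split))  # True False
```

In the `split` configuration, {A, B} and {C, D} are both quorums with nothing in common, so each half could agree on a different history; this is the kind of failure quorum intersection rules out.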

AntidoteDB : a large-scale (planet-scale) distributed DB system, competing with the likes of CockroachDB or Spanner. The core differentiator is that the architecture relies heavily on CRDTs for its core functionality. It is a spin-off from the SyncFree EU research project. Sadly, like a lot of EU or research-driven startup spin-offs, the documentation and website lack polish. The architecture reference link is broken and a lot of stuff seems to be a work in progress. Come on, guys! If you want to build a community and a product, you really need to pick up the pace. This project has great potential; don't let it go to waste.
Machine Learning Benchmarks - Hardware Provider : a very good survey of machine learning benchmarks across the current cloud providers. Even more useful, the benchmark gives you a cost overview of running ML applications, which is often a big unknown at the moment.
DeepMind Control Suite : a benchmark suite for reinforcement learning agents, using a set of continuous control tasks with a standardised structure and interpretable rewards.
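
On AntidoteDB's reliance on CRDTs: the appeal is that replicas can accept writes independently and still converge, because merging states is commutative, associative and idempotent. The simplest textbook case is the state-based grow-only counter (G-Counter): each replica increments only its own slot, merge takes the per-slot maximum, and the value is the sum of all slots. Here is a minimal sketch of that textbook data type; the class and replica names are hypothetical, and this is not AntidoteDB's API or implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """State-based grow-only counter (G-Counter) CRDT."""

    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        # A replica only ever writes its own slot.
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-slot maximum: commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


# Two replicas take writes concurrently, then exchange state.
paris, tokyo = GCounter("paris"), GCounter("tokyo")
paris.increment(3)
tokyo.increment(2)
tokyo.increment()
paris.merge(tokyo)
tokyo.merge(paris)
print(paris.value(), tokyo.value())  # 6 6
```

Merging the same state again leaves the value unchanged, which is what lets such systems resend state freely over unreliable links. Richer CRDTs (sets, maps, registers) follow the same pattern with more elaborate merge functions.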

NIPS : this conference is considered one of the biggest events in the ML/DNN research community. Here are two sets of notes from the conference, by Olga Liakhovich and by David Abel. These are two fairly long articles, but worth a read. It looks like fairness and bias are among the big topics of the moment. Also, I like how ML is compared to alchemy: the current approach is extremely fragile, tailor-made and not fully understood. Too often, machine learning tools are treated as black boxes where you shove data in at one end and get a result out the other.
Conference on Robot Learning (CoRL) : robotics and machine learning are converging at an aggressive pace. It is rather impressive how all these different areas of computer science are clicking together, with each small improvement in one domain leading to an overall jump in robotic capability.
Adversarial Examples that Fool Detectors : last but not least, common machine learning models are still way too fragile and can easily be fooled. With the boom in the use of ML techniques everywhere, this could quickly become a real problem in the near future.
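
To see why models are so easy to fool, here is the fast gradient sign method (FGSM, introduced by Goodfellow et al.) applied to a toy logistic-regression classifier. This is a swapped-in technique, much simpler than the detector attacks in the paper, and the weights and input are made up for illustration. Each feature moves by at most `eps`, but across many dimensions those small nudges add up to a large change in the score.

```python
import math
from typing import List

Vector = List[float]


def score(w: Vector, b: float, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w: Vector, b: float, x: Vector) -> int:
    return 1 if score(w, b, x) > 0 else 0


def fgsm(w: Vector, b: float, x: Vector, y: int, eps: float) -> Vector:
    """Fast gradient sign method against logistic regression.

    With p = sigmoid(w.x + b) and cross-entropy loss, dLoss/dx_i = (p - y) * w_i,
    so each feature takes a step of size eps in the sign of that gradient.
    """
    p = 1.0 / (1.0 + math.exp(-score(w, b, x)))
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * math.copysign(1.0, g) if g != 0 else xi for xi, g in zip(x, grad)]


# Hypothetical 100-feature model and input, chosen so the arithmetic is easy to follow.
d = 100
w = [0.1 if i % 2 == 0 else -0.1 for i in range(d)]
x = [0.5 if i % 2 == 0 else 0.4 for i in range(d)]
b = 0.0

x_adv = fgsm(w, b, x, y=1, eps=0.1)
print(predict(w, b, x), predict(w, b, x_adv))  # 1 0
print(max(abs(a - o) for a, o in zip(x_adv, x)))  # largest per-feature change
```

For a linear model the attack shifts the score by `eps` times the L1 norm of `w` (here 0.1 × 10 = 1.0, taking the score from 0.5 to -0.5), so the effect grows with the input dimension, which is one intuition for why high-dimensional inputs like images are so exposed.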

The Case for Learned Index Structures : as the performance progression of single-core CPUs slows down (not to mention Spectre and Meltdown patches slowing down existing ones), applications move to a distributed model to scale. As a result, databases and distributed systems are forced to become more data-aware to achieve efficiency and performance. This is a very nice paper that demonstrates that data structures often contain components that are learnable, and that machine learning systems can help optimise those data structures.
Evidence of Herding and Stubbornness in Jury Deliberations : humans do not rely on logic for important decisions, and they try to coerce fellow humans into adopting their opinion... While this is widely known, we now have a good hint that it even happens in the judicial system of trial by jury. That, or too many people saw Twelve Angry Men.
Overconfidence Is Universal? : an interesting paper trying to understand how to identify overconfidence, and whether this behaviour is more prevalent in certain types of population or in one gender.
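
Back to the learned index paper: its core observation is that an index is a model mapping a key to its position in a sorted array, so a learned model plus a bounded local search can stand in for, say, a B-tree lookup. The paper uses a hierarchy of models (the recursive model index); the sketch below is a deliberately simplified, hypothetical single-model version. One least-squares line predicts the position, the worst-case error over the stored keys bounds the search window, and a binary search inside that window finishes the lookup. The class name and structure are my own assumptions, not the paper's code.

```python
import bisect
from typing import List, Optional


class LinearLearnedIndex:
    """Toy learned index: a single linear model plus a recorded error bound."""

    def __init__(self, keys: List[float]):
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # The worst prediction error over the stored keys bounds every lookup window.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key: float) -> int:
        return round(self.slope * key + self.intercept)

    def lookup(self, key: float) -> Optional[int]:
        """Position of key in the sorted array, or None if absent."""
        n = len(self.keys)
        guess = self._predict(key)
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        # Binary search only inside the model's error window.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None


keys = [i * i for i in range(1000)]  # a non-linear key distribution
index = LinearLearnedIndex(keys)
print(index.max_err, index.lookup(144), index.lookup(145))
```

On evenly spaced keys the error bound collapses to almost nothing and a lookup is close to a single arithmetic step, while on skewed data such as the squares above a single line's error grows quickly, which is the motivation for the paper's hierarchy of models.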