- Age partitioned Bloom Filters : Age-Partitioned Blocked Bloom Filter variant
- Open source libraries to deploy, monitor, version and scale your machine learning : A curated list of open source libraries to deploy, monitor, version and scale your machine learning
- Data Sentinel : LinkedIn platform for automatically validating the quality of large-scale data in production environments
A blog about life, Engineering, Business, Research, and everything else (especially everything else)
Showing posts with label data. Show all posts
Thursday, May 21, 2020
[Links of the Day] 21/05/2020 : Aging Bloom Filters, Awesome List of Machine Learning Production Libraries, Large-scale data quality management platform
Labels: bloom filter, clean, data, links of the day, machine learning, open source
Tuesday, November 06, 2018
[Links of the Day] 06/11/2018 : Intro to probabilistic programming, Unit tests for data, Ali Wong stand-up routine analysis
- An Introduction to Probabilistic Programming: a first-year graduate-level introduction to probabilistic programming. It not only provides a thorough background for anyone wishing to use a probabilistic programming system but also introduces the techniques needed to design and build these systems.
- deequ : library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
- Ali Wong structure of stand up comedy : a fantastic, beautifully designed article analysing Ali Wong's stand-up routine and how it is closer to a TV/movie script than to slapstick one-liner comedy.
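To make the "unit tests for data" idea from the deequ link a bit more concrete, here is a minimal sketch in plain Python. It is not deequ's API (deequ runs on Apache Spark); the helper names `is_complete`, `is_unique` and `run_checks` are hypothetical and only illustrate the general shape: declare constraints on columns, run them against a dataset, and get a pass/fail report.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Check = Callable[[List[Row]], bool]


def is_complete(column: str) -> Check:
    """Constraint: no row has a missing (None) value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def is_unique(column: str) -> Check:
    """Constraint: no value in `column` appears more than once."""
    def check(rows: List[Row]) -> bool:
        values = [row.get(column) for row in rows]
        return len(values) == len(set(values))
    return check


def run_checks(rows: List[Row], checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run every named constraint and report whether it passed."""
    return {name: check(rows) for name, check in checks.items()}


# Toy dataset with one duplicated id and one missing price.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": None},
    {"id": 2, "price": 3.0},
]

report = run_checks(rows, {
    "id is complete": is_complete("id"),
    "id is unique": is_unique("id"),
    "price is complete": is_complete("price"),
})

for name, passed in report.items():
    print(f"{name}: {'ok' if passed else 'FAILED'}")
```

A real tool would compute these metrics in a distributed fashion over very large datasets; the sketch only shows why framing data quality as named, repeatable checks is appealing.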
Labels: comedy, data, links of the day, probabilistic, programming, unit test
Tuesday, September 18, 2018
[Links of the Day] 18/09/2018 : Data transfer project, Observability pipeline, and Operating systems Book
- Data Transfer Project : open-source, service-to-service data portability platform. Not really sure who would want to transfer data between Facebook, Google and Microsoft from a privacy point of view... But there is probably a use case.
- Veneur : distributed, fault-tolerant pipeline for observability data. This is a really cool project that aggregates metrics and sends them downstream to one or more supported sinks. It can also act as a global aggregator for histograms, sets and counters. The key advantage of this approach is that you only maintain, store (and pay for) the aggregated data rather than tons of separate data points.
- Operating Systems - Three Easy Pieces : free operating system book centred around three conceptual pieces that are fundamental to operating systems: virtualization, concurrency, and persistence
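The aggregation idea behind the Veneur link can be sketched in a few lines of Python. This is not Veneur's code or API; the `aggregate` function and the sample format are assumptions made up for illustration. It only shows the principle: collapse many raw samples into one value per metric before anything is sent to a sink.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical sample format: (metric name, kind, value).
Sample = Tuple[str, str, object]


def aggregate(samples: Iterable[Sample]) -> Dict[str, Dict[str, int]]:
    """Collapse raw samples into one value per metric.

    Counters are summed; sets keep only the number of distinct members.
    """
    counters: Dict[str, int] = defaultdict(int)
    members: Dict[str, set] = defaultdict(set)
    for name, kind, value in samples:
        if kind == "counter":
            counters[name] += value
        elif kind == "set":
            members[name].add(value)
    return {
        "counters": dict(counters),
        "set_sizes": {name: len(vals) for name, vals in members.items()},
    }


# Five raw data points in...
samples = [
    ("requests", "counter", 1),
    ("requests", "counter", 4),
    ("users", "set", "alice"),
    ("users", "set", "bob"),
    ("users", "set", "alice"),
]

# ...two aggregated values out, which is all a sink would need to store.
summary = aggregate(samples)
print(summary)
```

Histograms are left out to keep the sketch short; merging them across hosts needs a mergeable summary structure rather than a simple sum.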
Labels: book, data, data transfer, distributed system, links of the day, observability, operating system, pipeline, platform
Tuesday, June 26, 2018
[Links of the Day] 26/06/2018 : How economists got Brexit wrong, Driving data set, CRDT @ redis
- How the economics profession got it wrong on Brexit : Economists got the economy wrong... News at 11. Anyway, it's a very good analysis of the pitfalls that the various groups fell into, and a good read for a better understanding of the UK economy and how it reacts to large socio-economic events.
- BDD100K : want data for your driverless car? Berkeley has got you covered. [data][paper]
- CRDT @ redis : I love CRDTs, and this talk covers their use in Redis.
Labels: brexit, car, crdt, data, economics, links of the day, machine learning, redis