Tuesday, February 24, 2015
Today's links 24/02/2015: Memory scalability, erasure codes, #FAST15
- Memory configuration scalability blog series : Part 1, Memory Subsystem Organisation, is out (upcoming: Memory Subsystem Bandwidth, DDR4 Memory, NUMA Architecture)
- FAST 2015 : 13th USENIX Conference on File and Storage Technologies
- Jointly Optimal Erasure Codes for I/O, Storage, and Network-bandwidth : really interesting work where the authors design erasure codes that are simultaneously optimal in terms of I/O, storage, and network bandwidth.
- Tale of Two Erasure Codes in HDFS : paper demonstrating the use of two erasure codes to deliver fast recovery while maintaining low storage overhead. It uses a fast code to optimize recovery performance and a compact code to reduce storage overhead.
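Both erasure-code papers rely on the same basic idea: store coded redundancy so that lost blocks can be rebuilt. The sketch below uses the simplest possible code, one XOR parity block over k data blocks as in RAID-5. It is not one of the codes the papers actually design. It survives the loss of any single block.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the k data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at lost_index by XOR-ing all k surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = encode([b"abcd", b"efgh", b"ijkl"])
print(recover(stripe, 1))  # b'efgh'
```

Repairing one block here reads all k survivors. That repair I/O and network traffic is exactly the cost the codes in the two papers try to cut.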
Monday, February 23, 2015
Today's links 23/02/2015: #Bitcoin K/V store, distributed systems, Job Titles and Latency visualization
- Blockstore : experimental key-value store (for unique, global name registration and secure data storage) on top of Bitcoin.
- Highly Distributed Computations Without Synchronization : this article explores the basic building blocks for crafting deterministic applications that, when operating over data structures that guarantee convergence under concurrent operations, are themselves guaranteed to converge.
- The Problem with Job Titles : an interesting view on what job titles really mean to different people. However, this works both ways: companies misuse job titles just as employees misinterpret them.
- HdrHistogram : a histogram designed for recording value measurements in latency- and performance-sensitive applications. [github]
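The convergence property the distributed-computations article builds on can be illustrated with a grow-only counter (G-Counter), one of the standard convergent replicated data types. The sketch below assumes state-based replication, where replicas exchange and merge their full state. The class is an illustration and does not reproduce the article's own building blocks.

```python
class GCounter:
    """State-based grow-only counter (G-Counter), a classic convergent data type."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments performed at that replica

    def increment(self, n=1):
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent, so
        # replicas converge whatever the order or repetition of merges.
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment()
b.increment(3)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 4 4
```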
Friday, February 20, 2015
Links of the day 20/02/2015: Probabilistic counters, Rack PCIe Fabric, Complex Networks robustness, High performance Network protocol
- Characterizing Storage Workloads with Counter Stacks : a really great approach that uses the HyperLogLog algorithm to characterize workloads. It identifies working set sizes without a large memory overhead. The key advantage is that it can generate accurate miss ratio curves (MRCs) for large workloads without tremendous overhead. These curves can then be used to tune cache sizes and to make online placement decisions.
- Rack Level PCIe fabric : it's nice to see the competition heating up for the next-gen rack-level fabric. There are some really nice features in there: advanced topologies (torus, 3D fat tree) and support for native TCP/IP, RDMA and native PCIe. Last but not least, there is I/O sharing through the assignment of SR-IOV virtual functions (VFs). Not to forget that most operations complete in under a microsecond.
- Improving the Robustness of Complex Networks with Preserving Community Structure : 3-step strategy to improve the robustness of a network, while retaining its community structure, and also its degree distribution.
- Trickles : stateless, high-performance networking that carries transport continuation information inside the packets for the congestion control algorithm, effectively removing per-connection state maintenance from the server. The result is a stateless protocol, at the cost of periodic update signals.
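Counter Stacks approximates miss ratio curves in small space. For intuition, the sketch below shows the exact, memory-hungry computation it avoids. It computes Mattson-style LRU stack distances over a trace and turns them into the miss ratio for every cache size. It assumes an LRU cache and a trace small enough to hold in memory. It is not the Counter Stacks algorithm itself.

```python
from collections import Counter


def lru_miss_ratio_curve(trace, max_size):
    """Exact LRU miss ratio for cache sizes 1..max_size (O(N*M) time)."""
    stack = []             # LRU stack, most recently used item at the end
    distances = Counter()  # stack distance -> number of re-accesses
    for item in trace:
        if item in stack:
            # Distance 1 means the item was the most recently used one.
            distances[len(stack) - stack.index(item)] += 1
            stack.remove(item)
        stack.append(item)  # first-time accesses are cold misses at every size
    total = len(trace)
    curve = []
    for size in range(1, max_size + 1):
        # An access hits in an LRU cache of `size` iff its stack distance <= size.
        hits = sum(n for d, n in distances.items() if d <= size)
        curve.append((total - hits) / total)
    return curve


print(lru_miss_ratio_curve("abcabc", 3))  # [1.0, 1.0, 0.5]
```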
Thursday, February 19, 2015
Today's links 19/02/2015: Technical debt, #Microservices & µ²services, SSD
- Technical debt financial metaphor : using financial terms to describe technical debt. However, I feel that we need to introduce the concepts of refinancing and restructuring technical debt, without forgetting the bankruptcy option when there is really no other choice.
- Microservices - stress-free and without increased heart attack risk : a short explanation of why µservices are always hard (yes, the title of the presentation is sort of an oxymoron) and what architectural trade-offs they represent compared to monoliths.
- What every programmer should know about solid-state drives : Part 6, summarizing the key information (links to the previous five parts are on the page).
- Micro Micro Services : proposes pushing the microservice concept one step further, with a single container per request per service rather than one per service. There are certain advantages. However, until the technology to make this approach seamless (such as flash cloning) matures, it will remain a nice thought exercise (follow-up post here).
Wednesday, February 18, 2015
Today's links 18/02/2015: Packet Sender and #Bigdata: Stream processing, Probabilistic methods, PCA
- Stream Processing and Probabilistic Methods: a great introduction on how you can leverage probabilistic methods (HyperLogLog, Bloom filters, etc.) to handle data at scale.
- Making sense of stream processing : excellent talk by Martin Kleppmann on how event streams can help make your application more scalable, more reliable and more maintainable. [transcript]
- Packet Sender : open source utility to allow sending and receiving TCP and UDP packets. Really practical when you need to debug network protocols.
- PCA : Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It's often used to make data easy to explore and visualize.
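A Bloom filter is one of the probabilistic methods mentioned in the stream-processing introduction. The sketch below assumes string items and derives its k bit positions by double hashing over a SHA-256 digest. It uses the standard sizing formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions.

```python
import hashlib
import math


class BloomFilter:
    """Set membership with no false negatives and a tunable false-positive rate."""

    def __init__(self, capacity, error_rate):
        self.m = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        # Double hashing: k indexes from two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(capacity=1000, error_rate=0.01)
bf.add("alice")
print("alice" in bf)  # True
```

The filter never forgets an added item. An unseen item is reported as present only with roughly the configured probability, once the filter holds about `capacity` items.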
Tuesday, February 17, 2015
Today's links 17/02/2015: #Bigdata statistics theorems, teaching computers to make music, #Microservices simulator
- Central Limit Theorem : a reminder of this crucial statistical theorem, i.e. how to reason about samples and their impact on our results.
- Teaching Computer to Write Its Own Music : a demonstration that computation could complement human creativity
- Bayes Theorem : the benefits of integrating Bayesian techniques into data analysis vs a purely frequentist approach.
- Spigo : Simulate Protocol Interactions in Go using nanoservice actors. Suitable for fairly large scale simulations, runs well up to 100,000 independent nanoservice actors. Two architectures are implemented. One creates a peer to peer social network (fsm and pirates). The other is based on NetflixOSS microservices in a more tree structured model.
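The following worked example illustrates the Bayes Theorem link. It shows the base-rate effect that a Bayesian treatment makes explicit: an accurate test still yields a modest posterior when the prior is small. The numbers are hypothetical: 1% prevalence, 99% sensitivity and a 5% false-positive rate.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    # Law of total probability gives the denominator P(E).
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence


print(round(posterior(0.01, 0.99, 0.05), 3))  # 0.167
```

Only about one positive result in six is a true positive here, despite the "99% accurate" test.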
Monday, February 16, 2015
Today's links 16/02/2015: Memory Analysis, CPU instructions for NVM, SR-IOV and Linux kernel live patching
- ANATOMY: an analytic model of memory system performance able to summarize key workload characteristics, namely row buffer hit rate, bank-level parallelism, and request spread which are used as inputs to the queuing model to estimate memory performance. [slides]
- CLWB and PCOMMIT : a look at the new CPU instructions specific to NVM. The real benefit will start to appear when developers start using them in applications such as in-memory DBs or persistent logging.
- SR-IOV : the Single-Root I/O Virtualization (SR-IOV) standard allows an I/O device to be shared by multiple virtual machines (VMs) without losing runtime performance. A series of videos covers topics for your virtualization environment such as VXLAN Tunnel End Point (VTEP), live VM migration, and HPC clustering.
- Live patching : kGraft and kpatch merged into a single patchset for kernel live patching.
Friday, February 13, 2015
Today's links 13/02/2015: Scalable and resilient websites, CAP theorem, automated log analyzer and Exascale HPC challenges
- Scalable and Resilient website : lessons learned from all the biggest sites on the internet about how to build scalable and resilient architectures.
- Perspectives on the CAP theorem : a paper summary that places the CAP theorem in the broader context of a family of results in distributed computing theory. These results show that it is impossible to guarantee both safety and liveness in an unreliable distributed system.
- Sequence: Automated Analyzer for Reducing 100,000's of Log Messages to 10's of Patterns
- Algorithmic and Software Challenges at Extreme Scales : presentation by Jack Dongarra. It is somewhat redundant, as the same themes are common to all HPC exascale challenges: matrix operation optimizations, power, resilience, scalability, etc.
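Sequence itself is a sequential semantic parser. A much cruder way to see how log messages reduce to patterns is to mask variable tokens and group messages by the resulting template. The variable tokens handled here are IPv4 addresses, hex ids and numbers. The regular expressions below are illustrative assumptions, not Sequence's rules.

```python
import re
from collections import Counter

# Illustrative masks for common variable fields (assumed, not exhaustive).
# Order matters: IPs and hex ids are masked before plain numbers.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<ip>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<hex>"),
    (re.compile(r"\b\d+\b"), "<num>"),
]


def template(message):
    """Replace variable tokens with placeholders to obtain the message's pattern."""
    for regex, placeholder in MASKS:
        message = regex.sub(placeholder, message)
    return message


def reduce_logs(messages):
    """Count how many messages map onto each pattern."""
    return Counter(template(m) for m in messages)


logs = [
    "Accepted connection from 10.0.0.1 port 5322",
    "Accepted connection from 10.0.0.7 port 60121",
    "Worker 3 finished job 0x1f in 120 ms",
    "Worker 12 finished job 0xa0 in 87 ms",
]
for pattern, count in reduce_logs(logs).items():
    print(count, pattern)
```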
[Image: IBM photonic 3D chip]
Thursday, February 12, 2015
Today's links 12/02/2015: #bigdata algorithm, probabilistic programming, Start-up resources and RAM myth
- PPAML : the Probabilistic Programming for Advancing Machine Learning program aims to facilitate the construction of machine learning applications by using probabilistic programming.
- Start-up Resources : a very good list of start-up resources and articles.
- The Myth of RAM : memory access patterns matter... a lot. This is hardly news, but the article highlights that if you rely solely on "classical" complexity analysis to evaluate your code, you are going to be bitten badly. The main idea is that, in practice, a memory access costs O(√N), not O(1).
- HyperLogLog : technique to estimate the cardinality of a set, for cardinalities up to Nmax, using m registers of just loglog(Nmax) + O(1) bits each. Like the linear counter, the HyperLogLog counter allows the designer to specify the desired accuracy tolerance. Very good summary of the paper here.
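The HyperLogLog description above can be turned into a compact sketch. It assumes a 64-bit hash taken from SHA-1 and m = 2^12 registers. It applies the paper's small-range (linear counting) correction and skips the large-range correction, which a 64-bit hash makes unnecessary at these sizes. Each register only ever holds a rank of at most 64 - p + 1, which is why a few bits per register suffice.

```python
import hashlib
import math


class HyperLogLog:
    """Cardinality estimator with m = 2**p small registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant for m >= 128, from the HyperLogLog paper.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        x = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")  # 64-bit hash
        idx = x >> (64 - self.p)                      # first p bits pick a register
        rest = x & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self):
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


h = HyperLogLog(p=12)
for i in range(20000):
    h.add(f"item-{i}")
print(round(h.estimate()))  # close to 20000 (standard error about 1.6% for p=12)
```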
Wednesday, February 11, 2015
Today's links 11/02/2015: Test and optimization articles, scaling product team, PCIe vs Eth, Distributed Sys fallacies
- 100 Must-Read Articles on Testing and Optimization : data-driven, big data, A/B testing, etc. The best articles from 2014.
- Scaling a product team : lessons learned from Intercom on how they scaled a product-building team, and the nitty-gritty involved in getting valuable product out the door as fast as possible.
- Eight Fallacies of Distributed Computing : a very good tech talk with real-life encounters of the fallacies.
- PCIe vs Ethernet : with the rise of Intel’s silicon photonics (SiPh) optical PCIe (OPCIe) and other PCIe fabrics, is it time to fragment your datacenter, using a fast PCIe fabric within the rack and Ethernet across racks? Honestly, time will tell. As you already know, the best technology doesn't always win.
Friday, February 06, 2015
Today's links 06/02/2015: Hashing, Distributed time series / monitoring, MTU, Spray list
- Perfect hashing : when you like your hash without collision
- Prometheus : some interesting concepts, such as sharded, local-only data for reliability and scalability, a time-series-oriented storage and query system, and time series exposed via key-value pairs.
- Path MTU discovery : when you want to know the biggest frame you can send when you do not own 100% of the network path (or if you just forgot).
- Spray list : a nice data structure delivering a scalable relaxed priority queue [github implementation]
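Perfect hashing, from the first link above, can be sketched with the classic two-level (FKS-style) scheme for a static key set. A first-level hash spreads the keys over n buckets. Each bucket of b keys then gets its own table of size b², and a seed is searched for until that table is collision-free, which takes about two tries on average. The seeded SHA-256 hash is an assumption made to keep the example self-contained.

```python
import hashlib


def _h(key, seed, size):
    """Seeded hash of a string key into range(size)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


class PerfectHash:
    """Two-level (FKS-style) perfect hash table for a static set of string keys."""

    def __init__(self, keys):
        keys = list(dict.fromkeys(keys))  # drop duplicates, keep order
        self.n = max(1, len(keys))
        buckets = [[] for _ in range(self.n)]
        for k in keys:
            buckets[_h(k, 0, self.n)].append(k)  # seed 0 is reserved for level one
        self.tables = []
        for bucket in buckets:
            size = len(bucket) ** 2  # quadratic space: a random seed works w.p. >= 1/2
            seed = 1
            while True:
                slots = [None] * size
                for k in bucket:
                    pos = _h(k, seed, size)
                    if slots[pos] is not None:
                        break  # collision: try the next seed
                    slots[pos] = k
                else:
                    break  # every key in the bucket got its own slot
                seed += 1
            self.tables.append((seed, slots))

    def lookup(self, key):
        """Return True if key is in the set, using one probe per level."""
        seed, slots = self.tables[_h(key, 0, self.n)]
        return bool(slots) and slots[_h(key, seed, len(slots))] == key


ph = PerfectHash(["apple", "banana", "cherry", "date", "fig"])
print(ph.lookup("cherry"), ph.lookup("kiwi"))  # True False
```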
Thursday, February 05, 2015
Today's links 05/02/2015: #openstack #neutron with #dpdk, #ARM A72 coherency, Cap'n Proto
- Openstack* Neutron Accelerated by DPDK : DPDK is gaining a lot of momentum. Too bad they decided to adopt their own API model for queues rather than the RDMA one (which is also why they lack critical features in terms of security and multicore support).
- ARM Cortex-A72 chips : coming in 2016, these 64-bit ARM processors can run at clock speeds of up to 2.5 GHz. They can also be paired with lower-power ARM Cortex-A53 cores.
- Coherency : ARM cache coherence for heterogeneous core packages.
- Cap'n Proto : fast data interchange format and capability-based RPC system.
Wednesday, February 04, 2015
Today's links 04/02/2015: Shared Memory API for MPI, #Openstack File Service, Dell Power FX converged hardware
- Remote Memory Access Programming in MPI-3 : interesting proposal and method to bridge the gap between the shared memory model and message passing within MPI, the de facto standard HPC communication library.
- Openstack Manila : shared file services for the cloud, and specifically for OpenStack. Interesting number: 65% of all storage sold is for file-based use cases (IDC 2012) [slides]
- Power Fx : Dell PowerEdge FX architecture, combining Dell networking, servers, and storage into a 2U chassis. An interesting piece of technology, even if the presenter is rather monotone.
Tuesday, February 03, 2015
Today's links 03/02/2015: Log parser, Esoteric languages, Distributed systems, High speed Interconnect
- Sequence: A High Performance Sequential Semantic Log Parser at 175,000 MPS
- Esoteric Programming Languages : if you have ever wondered what brainfuck, befunge and ook are.
- Distributed Systems Seminar reading list : a good list of recent distributed systems papers.
- High-Speed Datacenter Interconnects : this special issue discusses the challenges surrounding high-speed datacenter interconnects and presents five articles with novel approaches to achieving efficient, scalable datacenter designs. [table of contents]
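To give a feel for how small these esoteric languages are, here is a minimal brainfuck interpreter. It assumes a 30,000-cell tape of 8-bit wrapping cells and treats end of input as 0. It also assumes balanced brackets. Dialects differ on all of these details.

```python
def run_brainfuck(program, stdin=""):
    """Interpret a brainfuck program and return its output as a string."""
    # Precompute matching bracket positions so loops jump in O(1).
    jumps, stack = {}, []
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    tape, ptr, pc, out = [0] * 30000, 0, 0, []
    inp = iter(stdin)
    while pc < len(program):
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256  # 8-bit wrapping cells
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(inp, "\0"))  # end of input reads as 0
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip the loop body
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to the start of the loop
        pc += 1  # any other character is a comment
    return "".join(out)


print(run_brainfuck("++++++++[>+++++++++<-]>.<+++++++[>++++<-]>+++++."))  # Hi
```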
Monday, February 02, 2015
Today's links 02/02/2015: #AI , #Debian #Openstack Image, Metrics Driven prioritization and search engine
- Awesome Artificial Intelligence : A curated list of Artificial Intelligence (AI) courses, books, video lectures and papers
- Openstack Debian Image : the official OpenStack Debian image is now generated at the same time as the official Debian CD ISO images. If you are a cloud user, if you use OpenStack on a private cloud, or if you are a public cloud operator, then you may want to download the weekly build of the OpenStack image from here: http://cdimage.debian.org/cdimage/openstack/testing/
- Metrics-Driven Prioritization : integrating business metrics and probabilistic modeling into the prioritization process.
- Hound : an extremely fast source code search engine.
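Fast source code search engines of this kind typically narrow a query with a trigram index before scanning any file. Hound's README credits Russ Cox's write-up on regular expression matching with a trigram index. The sketch below handles only literal-string queries of at least three characters, whereas real engines derive trigram queries from regular expressions. The class and its names are illustrative, not Hound's code.

```python
from collections import defaultdict


def trigrams(text):
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Maps each trigram to the set of document ids containing it."""

    def __init__(self, docs):
        self.docs = docs  # doc id -> file contents
        self.postings = defaultdict(set)
        for doc_id, text in docs.items():
            for t in trigrams(text):
                self.postings[t].add(doc_id)

    def search(self, literal):
        """Return sorted ids of docs containing the literal string."""
        grams = trigrams(literal)
        if not grams:
            raise ValueError("query must be at least 3 characters")
        # Intersect posting lists to get candidates, then verify with a real scan.
        candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        return sorted(d for d in candidates if literal in self.docs[d])


docs = {"a.go": "func main() {}", "b.go": "func helper() {}", "c.py": "def main(): pass"}
index = TrigramIndex(docs)
print(index.search("main"))  # ['a.go', 'c.py']
```

The index only prunes candidates. The final substring check keeps results exact even when a document contains all the query's trigrams in the wrong order.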