A blog about life, Engineering, Business, Research, and everything else (especially everything else)
Showing posts with label storage.
Tuesday, September 25, 2018
[Links of the Day] 25/09/2018 : Investigating opaque algorithms, Good Go Makefile, Uber metric storage platform
- Investigating opaque algorithms : an eye-opening collection of journalism that investigates, audits, or critiques algorithms in society. It is amazing how often the impact of code is overlooked, even though it affects so many lives on a daily basis.
- A Good Makefile for Go : even with Go, you cannot escape the mighty Makefile.
- M3 : Uber's metrics platform, built as a storage backend for Prometheus. It handles compaction, compression and data aggregation so you don't have to pay huge AWS bills at the end of the month because your hyper-scaled infrastructure generates terabytes of metrics... Oh wait, you still do, but at least not because of your monitoring storage costs :) [github]
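The M3 entry mentions data aggregation as one of the things that keeps metric storage manageable. As a rough illustration of the general idea only (not M3's actual implementation or API), here is a minimal Python sketch that downsamples raw (timestamp, value) samples into fixed, aligned windows, keeping count, sum, min and max per window. The `downsample` helper and the 60-second default window are hypothetical choices for this sketch.

```python
from typing import Dict, Iterable, Tuple

Sample = Tuple[int, float]  # (unix timestamp in seconds, value)


def downsample(samples: Iterable[Sample], window_s: int = 60) -> Dict[int, Dict[str, float]]:
    """Aggregate raw samples into fixed, aligned time windows.

    Hypothetical helper for illustration: each window keeps count, sum,
    min and max, so the mean can still be derived after raw points are dropped.
    """
    buckets: Dict[int, Dict[str, float]] = {}
    for ts, value in samples:
        start = ts - ts % window_s  # align the sample to the start of its window
        agg = buckets.get(start)
        if agg is None:
            buckets[start] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    return buckets


if __name__ == "__main__":
    raw = [(0, 1.0), (15, 3.0), (59, 2.0), (60, 10.0), (75, 20.0)]
    for start, agg in sorted(downsample(raw).items()):
        mean = agg["sum"] / agg["count"]
        print(start, agg["count"], mean, agg["min"], agg["max"])
```

Keeping sum and count rather than a precomputed mean lets coarser windows be merged later without losing accuracy.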
Labels: algorithm, investigation, journalism, links of the day, metric, storage, uber
Monday, May 07, 2018
[Links of the Day] 07/05/2018 : Cryptocurrency Consensus Algorithms, Fast18 conference, Google-to-real-world project translation
- A Hitchhiker’s Guide to Consensus Algorithms : this post provides a quick and easy way to understand the classification of the various cryptocurrency consensus models. It's a gentle introduction to proof of work vs. proof of stake vs. proof of authority vs. ... well, you get it, many, many more algorithms.
- Notes from FAST18 : a very good overview of the storage conference. What has become obvious over the years is that a lot of the practical implementation of novel distributed storage solutions is pushed directly into Ceph. Ceph is poised to become the de facto private storage solution, even if it has a long way to go in terms of manageability and automation. I think this stems from the preconception that a lot of operations need a dedicated storage admin. But projects like Helm are helping it get there.
- xg2xg : a practical translation table between internal Google technologies and similar technology available to those who do not work at the chocolate factory. It is a very good list of production-ready projects that can be leveraged in many DevOps (and non-DevOps) environments.
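To make the proof-of-work idea from the consensus guide concrete, here is a toy Python sketch: find a nonce such that SHA-256(data + nonce) starts with a given number of zero hex digits. This is a simplification; real chains such as Bitcoin compare a double SHA-256 hash against a numeric target, and the `mine` / `verify` names are hypothetical.

```python
import hashlib
from typing import Tuple


def mine(data: bytes, difficulty: int) -> Tuple[int, str]:
    """Brute-force a nonce whose hash starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(data + str(nonce).encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(data: bytes, nonce: int, difficulty: int) -> bool:
    """Checking the work costs a single hash, while finding it costs many."""
    digest = hashlib.sha256(data + str(nonce).encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    block = b"block: alice pays bob 1 coin"
    nonce, digest = mine(block, difficulty=4)
    print(nonce, digest)
    print(verify(block, nonce, 4))
```

The asymmetry between `mine` (on average 16^difficulty attempts) and `verify` (one hash) is what makes the scheme work; proof of stake and proof of authority replace that expensive search with other ways of picking who appends the next block.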
Labels: conference, consensus, cryptocurrency, google, links of the day, storage
Tuesday, October 17, 2017
[Links of the Day] 17/10/2017 : Corporate Taxes & wages, Storage with macromolecules, 25 years of MPI
- Do Higher Corporate Taxes Reduce Wages? : an interesting paper in which the authors estimate that 40% of the corporate tax burden is passed on to workers, and that most of the tax variation is directly borne by the workforce.
- Macromolecules as Storage Media : the authors suggest that you can achieve a petabyte per cubic centimetre. Stability and durability are not yet fully addressed compared to non-organic media. However, I doubt this will be a major concern, as it could be an extremely viable solution for short-term transport of digital data.
- MPI symposium : MPI is 25 years old and still improving. The venerable HPC message passing interface is still widely used and underpins a lot of critical non-HPC infrastructure, such as stock markets. A must-read is Jack Dongarra's presentation on the evolution of MPI.
Labels: corporate, links of the day, mpi, storage, taxes
Tuesday, May 16, 2017
[Links of the Day] 16/05/2017 : Exascale Project, Storage as Stream, ServerlessConf
- Exascale Computing Project : an update on the state of the US exascale project. It always amazes me how HPC systems and software are such a different beast from industrial solutions. I wonder whether this difference is artificially maintained because of bias and ego rather than actual requirements. [slides]
- Pravega : where the stream is the new storage. It takes log-based storage to an extreme and combines messaging and persistence in an elegant approach. It is really aimed at tackling modern distributed systems and microservice problems. A concept to watch. [github]
- ServerlessConf : if you couldn't attend, here is the list of key takeaways / summaries you can find online:
Labels: distributed systems, exascale, HPC, links of the day, serverless, storage
Wednesday, March 08, 2017
[Links of the Day] 08/03/2017 : Intel blockchain, Fast17 conference and papers, AWS CloudFormation devops tool
After a small hiatus, here is the return of the links of the day.
- Sawtooth Lake : Intel's distributed ledger system. It uses an interesting security mechanism to deliver secure consensus. Sadly, it relies on Intel's proprietary hardware encryption modules to deliver this feature.
- Fast17 : the USENIX File and Storage Technologies conference happened last month. There were a couple of interesting papers, but one piqued my interest: Redundancy Does Not Imply Fault Tolerance: Analysis of Distributed Storage Reactions to Single Errors and Corruptions. The authors look at the impact of single file system faults on Redis, ZooKeeper, Cassandra, Kafka, RethinkDB, MongoDB, LogCabin, and CockroachDB. It turns out most systems are not able to handle these types of faults very well. A persistence-layer error on a single node can have an adverse ripple effect, as distributed systems seem to put way too much trust in the reliability of this layer. Sadly, they lack tools for recovering from errors or corruption emerging from file systems.
- Stacker : Remind101's tool for creating and updating AWS CloudFormation stacks. Looks like an interesting alternative to Terraform.
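One defensive pattern against the single-node corruption described in the FAST17 paper is to stop trusting the local file system blindly: store a checksum alongside each block at write time, verify it on every read, and fall back to another replica on mismatch. A minimal Python sketch using the standard library's `zlib.crc32`; the in-memory replica list and the `write_block` / `read_block` helpers are hypothetical, and a production system would likely use a stronger checksum and also repair the bad copy.

```python
import zlib
from typing import List, Optional, Tuple

Block = Tuple[bytes, int]  # (payload, crc32 recorded at write time)


def write_block(payload: bytes) -> Block:
    """Record the checksum at write time, next to the data."""
    return payload, zlib.crc32(payload)


def read_block(replicas: List[Block]) -> Optional[bytes]:
    """Return the first replica whose payload still matches its checksum.

    A corrupted copy is skipped instead of being served (or propagated to
    other nodes) as if it were valid; None means every replica failed.
    """
    for payload, crc in replicas:
        if zlib.crc32(payload) == crc:
            return payload
    return None


if __name__ == "__main__":
    good = write_block(b"user:42 balance=100")
    # Simulate silent corruption on one node: bytes changed, stored CRC unchanged.
    corrupted = (b"user:42 balance=900", good[1])
    print(read_block([corrupted, good]))  # b'user:42 balance=100'
    print(read_block([corrupted]))        # None
```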
Labels: automation, aws, blockchain, conference, devops, fault tolerance, intel, links of the day, storage
Monday, December 19, 2016
[Links of the Day] 19/12/2016 : Cloud storage consistency models, heterogeneous memory management and atomic consistency for storage class memory
- Consistency Models for Cloud Storage Services : a must-read for anybody relying on any form of cloud storage. It is imperative to understand the consistency model of these services in order to avoid bad surprises. Sadly, a lot of cloud storage services lack official documentation on the subject, or the documentation is really fuzzy and lacks proof.
- Soft2LM : heterogeneous memory management; it basically optimises memory allocation and migration between tiers in order to minimise power consumption while maximising performance.
- Free atomic consistency in storage class memory with software based write-aside persistence : an interesting article on a software stack that aims to deliver atomic consistency for SCM in write-aside scenarios. I am not sure how often the write-aside pattern is used, though.
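To illustrate why the consistency model matters, here is a toy Python model of an eventually consistent store: writes land on a primary and reach a replica only when replication runs, so a read served by the replica right after a write can violate read-your-writes. The class and its methods are hypothetical and do not correspond to any real cloud storage API.

```python
from typing import Dict, List, Optional, Tuple


class EventuallyConsistentStore:
    """Toy model: one primary, one lagging replica, asynchronous replication."""

    def __init__(self) -> None:
        self.primary: Dict[str, str] = {}
        self.replica: Dict[str, str] = {}
        self.pending: List[Tuple[str, str]] = []  # writes not yet replicated

    def put(self, key: str, value: str) -> None:
        self.primary[key] = value
        self.pending.append((key, value))

    def get(self, key: str, from_replica: bool = True) -> Optional[str]:
        # Reads are often load-balanced to replicas, which may be stale.
        source = self.replica if from_replica else self.primary
        return source.get(key)

    def replicate(self) -> None:
        """Apply pending writes in order; models the background sync."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


if __name__ == "__main__":
    store = EventuallyConsistentStore()
    store.put("config", "v2")
    print(store.get("config"))                      # None: stale read
    print(store.get("config", from_replica=False))  # v2
    store.replicate()
    print(store.get("config"))                      # v2 once replication catches up
```

A service that documents read-after-write consistency rules out the first stale read; one that only promises eventual consistency does not.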
Labels: cloud, consistency, heterogeneous, model, nvm, scm, storage
Monday, August 22, 2016
[Links of the day] 22/08/2016 : Kubernetes the hard way, GopherCon 2016, 3dxpoint DIMM
- Kubernetes The Hard Way : the famed Kelsey Hightower explains how to deploy and run a Kubernetes system on Google Cloud the hard way (no automatic installation cheat).
- GopherCon 2016 : videos from this year's Gopher Academy conference.
- Wicked Fast Storage and Beyond : Intel IDF 2016 talk on the future of storage, looking at 3D XPoint and the Optane SSD. What is really exciting is the upcoming Intel DIMM using 3D XPoint tech. It provides native PMEM capability and 2x the capacity of RAM (but slower, yes). I can seriously see future in-memory databases (SAP HANA ...) bypassing storage altogether and just stacking NVM DIMMs, using slower storage (over NVMe, for example) for backups, snapshots, etc.
Labels: 3dxpoint, cloud, golang, google, intel, Kubernetes, links of the day, nvdimm, nvm, nvme, storage
Friday, July 22, 2016
[Links of the day] 22/07/2016 : Blockchain distributed storage, Docker use cases, Run Go in unikernel
- BlockStack : a distributed storage system that uses the Bitcoin blockchain to securely define a global namespace. The names are all tied to values representing URIs (URLs) of storage systems like AWS S3, but these could be any cloud storage service.
- Use Case track : Videos from the Use Case track at @DockerCon 2016
- Atmanos : this is really cool: this project enables you to compile Go code and run it as a unikernel on top of Xen.
Labels: blockchain, docker, golang, links of the day, storage, unikernel
Tuesday, May 24, 2016
[Links of the day] 24/05/2016 : Hybrid Memory Cube performance, Storage history and Energy computing problem
- Performance Exploration of the Hybrid Memory Cube : a thesis evaluating the performance challenges of the Hybrid Memory Cube (HMC). HMC is an emerging main memory technology that leverages advances in 3D fabrication techniques to create a memory device with several DRAM dies stacked on top of a CMOS logic layer.
- Computing’s Energy Problem : Mark Horowitz's 2014 ISSCC presentation on the computing energy challenge and how applications need to be more energy aware.
- Storage History : a great presentation of storage history, from the 1956 4.4 MB RAMAC to modern-day storage systems, with some great anecdotes thrown in.
Labels: energy, links of the day, memory, storage
Tuesday, May 17, 2016
[Links of the day] 17/05/2016 : CMU DB lectures, Seminal AI papers, Storage noisy neighbors
- Database Systems Lectures : Carnegie Mellon University lectures on database systems. They give a really good overview of the state of the art.
- Intelligence without representation & Intelligence Without Reason : seminal 1991 papers by Rodney A. Brooks from the MIT Artificial Intelligence Lab. In these, the author argues that intelligent behavior can be generated without explicit manipulable internal representations, and also without explicit reasoning systems.
- Noisy Neighbor analysis : a look at the effect of deploying heavy workloads onto modern storage systems and the collateral effect on overall performance for all participants in the cluster.
Labels: Artificial intelligence, database, lecture, links of the day, papers, storage
Friday, May 13, 2016
[Links of the day] 13/05/2016 : NVMesh, NVM file system
- nvmesh : a pure software product using a shared-nothing architecture that leverages NVMe SSDs, SR-IOV and RDMA. Performance is interesting: 4M read and 2.8M write 4k IOPS, 16 GB/s throughput and super-low latency, with 90µs/25µs for read/write from client to server. What is really interesting is the dual mode of operation: shared nothing with direct storage access for really fast access, or a centralized mode that offers more redundancy and serviceability features at the cost of lower (but still fast) performance. [video]
- Fine-grained Metadata Journaling on NVM : the authors propose moving away from the limitations of block-based journaling to a fine-grained approach more suitable for NVM storage. They propose an inode-based transaction and journaling approach, each inode representing 256 bytes. The solution seems cache friendly; however, it begs the question: why do we need to go through the CPU? With DAX and similar systems, it should be more efficient to bypass it completely. [slides]
- Fast and Failure-Consistent Updates of Application Data in Non-Volatile Main Memory File System : being crash consistent is the number one requirement for any storage solution. Current file systems optimized for NVM don't seem to be good enough. The authors propose an alternative file system specifically tailored for consistency and high performance, moving away from FS-level consistency toward an application-level consistency solution. Naturally, this puts a greater burden on the application layer. Then again, researchers really need to move away from the classical FS solution and deliver a new paradigm. [slides]
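As a quick sanity check, the quoted nvmesh numbers are self-consistent. Assuming "4k" means 4096-byte I/Os, the read IOPS figure roughly matches the 16 GB/s throughput figure, and the write figure works out to about 11.5 GB/s:

```latex
\begin{aligned}
4\times10^{6}\,\tfrac{\text{IO}}{\text{s}} \times 4096\,\tfrac{\text{B}}{\text{IO}} &= 1.6384\times10^{10}\,\tfrac{\text{B}}{\text{s}} \approx 16.4\ \text{GB/s} \quad\text{(read)}\\
2.8\times10^{6}\,\tfrac{\text{IO}}{\text{s}} \times 4096\,\tfrac{\text{B}}{\text{IO}} &= 1.14688\times10^{10}\,\tfrac{\text{B}}{\text{s}} \approx 11.5\ \text{GB/s} \quad\text{(write)}
\end{aligned}
```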
Labels: architecture, links of the day, nvm, papers, rdma, storage
Thursday, May 12, 2016
[Links of the day] 12/05/2016: Lustre + Omnipath in Bridges Supercomputer & Storage Media Evolution
- Lustre + Omnipath : the HPC filesystem of choice meets the Intel Omni-Path fabric. Intel was poised to release such a crossover as it continues to push into the HPC domain and toward rack infrastructure domination. Remember that Intel acquired Whamcloud (Lustre) a while back.
- Storage Media Overview : historic perspectives on storage solutions. An interesting snippet of information: all storage media revenue decreased from 2014 to 2015 except for NAND. However, NAND revenue increased by 30% in 2014 but only 3% in 2015, hinting at a plateau of the tech as it enters a commoditization phase with lower margins. [Video]
- Bridges : a supercomputer being built at the Pittsburgh Supercomputing Center (PSC); they have a really cool virtual tour.
Labels: distributed file system, HPC, intel, links of the day, network fabric, storage, supercomputer
Monday, May 09, 2016
[Links of the day] 09/05/2016 : OSS biometric framework, Deep learning framework comparative study & Dropbox Magic Pocket
- OpenBR : an open source biometric framework. I can't wait for the first community-driven mass recognition system to come out. No more secrets...
- Inside the Magic Pocket : a really good case study of the architecture behind the storage system Dropbox designed to replace AWS S3 after moving out of AWS. [HN discussion]
- Comparative Study of Deep Learning Software Frameworks : version 3 of the extensive study of deep learning frameworks. What is interesting is that while TensorFlow is deemed extremely versatile, it seriously lags behind the other frameworks performance-wise.
Labels: aws, bio-metric, deep learning, dropbox, framework, links of the day, paper, s3, storage
Thursday, May 05, 2016
[Links of the day] 05/05/2016 : OVH Kinetic, Go best practices and 9front
- 9front : an excellent book on Plan 9 and 9front; the first chapter is a must-read for anybody interested in distributed systems and operating systems.
- Kinetic : OVH starts deploying Ethernet-connected drives in beta.
- Go best practices : well, the title says it all.
Octave Klaba / Oles (@olesovhcom), May 3, 2016: "Our module "SATA2IP" is ready ! Forget the SATA. Another way to consider the storage: 1 HDD = 1 ip and let's scale !" pic.twitter.com/PqPn9rNq7F
Labels: drive, ethernet, go, links of the day, operating system, ovh, plan9, storage
Tuesday, May 03, 2016
[Links of the day] 03/05/2016: Linux Storage, Filesystem, and Memory-Management Summit 2016
Linux Storage, Filesystem, and Memory-Management Summit : loads of really good talks; here is a selection:
- VM as containers : current efforts focus on solving two main problems: 1. total VM memory consumption exceeds what the application running inside actually needs. 2. Storage access: a lot of the storage work focuses on moving the storage stack back to the host (providing DAX or FUSE). However, all these aspects require careful design in order to avoid compromising the security and isolation features of virtual machines.
- Bulk memory-allocation APIs : What do we want? Loads of memory, fast! When do we want it? N...O...W... :) [slides]
- Persistent memory as remote storage : a look into leveraging RDMA for remote persistent storage access. A really good discussion around the possibility of moving from PULL to PUSH mode for remote access. However, this would require a lot of changes and additions to the RDMA stack, probably too much for it to be a viable option in the short term. Another aspect of the discussion related to the durability guarantees of remote storage protocols. It is interesting to see that there is a consensus regarding the need for an API that hides the different durability behaviors of each fabric / protocol / HW. This is sorely missing, and it is why storage solutions often trap you on a certain path and cannot evolve to adopt new tech, fabrics, and hardware.
Labels: API, containers, links of the day, linux, memory, nvm, storage, vm
Monday, May 02, 2016
[Links of the day] 02/05/2016 : All about storage @ Intel IDF 16 + No More Secrets
- Storage Transition : NVM Express and PCI Express in the Client and Data Center
- Intel Non-Volatile Memory Inside : A look into 3d NAND and next gen SSD
- Modern Storage Architectures : the implications of new fabric and NVM tech for modern storage architectures
- No more secrets : recreates the famous "decrypting text" effect as seen in the 1992 movie Sneakers
Labels: architecture, idf16, intel, links of the day, network fabric, nvm, pcie, SSD, storage
Friday, April 29, 2016
[Links of the day] 29/04/2016 : End of numerical error, 504 Eth Drive Ceph Cluster, Modern Storage architecture
- End of Numerical Error : a really cool concept for encoding floating point numbers. It seems really promising: faster, lighter, and with a significantly reduced error rate. Unums should probably become the new default encoding of the future. [Julia implementation] PS: the non-associative property of floats is really scary.
- Ceph cluster with 504 Ethernet drives : well, it's starting to happen, and it might quickly take over all those pesky storage clusters out there.
- Modern Storage Architectures : Intel Developer Forum slide deck looking at the future of storage class memory and its impact on storage architecture.
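The non-associativity of IEEE 754 floating point mentioned in the unum entry is easy to reproduce. A short Python demonstration (Python floats are IEEE 754 double precision); `math.fsum` is shown only as one standard-library way to get a correctly rounded sum, and has nothing to do with unums themselves.

```python
import math

a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.1 + 0.2 rounds to 0.30000000000000004 first
right = a + (b + c)  # 0.2 + 0.3 happens to round to exactly 0.5

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False: same operands, different grouping

# Accumulation order and rounding compound: math.fsum tracks the lost
# low-order bits and returns the correctly rounded sum of the inputs.
values = [0.1] * 10
print(sum(values))        # 0.9999999999999999
print(math.fsum(values))  # 1.0
```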
Labels: ceph, error, ethernet, links of the day, nvm, programming, storage
Saturday, March 26, 2016
PureStorage brings us one step closer to micro storage architecture
Pure Storage just released its FlashBlade product. It is a fabric-connected object storage solution. It is a modular solution composed of a large number of blades, each made of:
- 8TB to 52TB raw NAND storage capacity : a lot, but it still takes less than half the real estate on each blade.
- NV-RAM + supercapacitor write buffer : when your NAND is still too slow, you want a persistent NVRAM buffer to handle the bursts.
- ARM CPU + FPGA : to deal with the “low level” operations such as erasure coding, etc.
- 8-core Xeon system on chip : for moving the computation to where the data is located; pretty much all the high-level operations such as NFS, S3, object storage, etc.
- 40 Gbit Ethernet : that's where the data gets out.
- PCIe fabric networking : an in-chassis solution linking compute and storage cards via a proprietary protocol. What’s interesting is that the system is self-contained, and scaling with other boxes goes through the 10 Gb/s connectivity rather than a proprietary fabric link. This implies that it doesn’t need exotic solutions once you go past the box boundaries. This is great, as it makes it easy (and cheap) to scale; however, I wonder what the performance implications are once you start crossing boundaries.
What is interesting is that, looking at the Pure Storage solution, they decided to integrate the high-level compute aspects of storage directly with the low-level ones in a single blade. They ended up with a hybrid solution combining ARM and FPGA for low-level aspects such as deduplication and erasure coding, and the Xeon for the object storage and file system layers.
One can assume that the decision behind such an architecture was driven by customer requirements, which tend to call for a high-performance jack-of-all-trades solution. I can picture the product manager arguing for supporting every scale-out storage protocol popular at the moment. However, Jack always ends up master of none, and to overcompensate, Pure Storage had to pump up the compute capabilities.
While this seems like a good choice, it is also counterproductive in terms of watts per GB, coupled with a lot of wasted or duplicated real estate. Don’t get me wrong, what Pure achieved with the FlashBlade is impressive, but I can’t stop thinking that they should have taken it a step further.
This type of high-performance, high-cost and high-power architecture is a step in the right direction toward micro storage architecture, which delivers low cost, low power, high performance and scalability. Now it is all about trimming down the system while maintaining scalability by dividing the blade system into a much larger number of smaller nodes, literally offering a flash equivalent of HGST's Ethernet-connected drives.
However, this might also imply that you won’t be able to support every single storage solution out there (NFS, S3, block, etc.) without relying on either client-side processing or a frontend. This should be achievable while maintaining excellent performance; the key will lie in the details of the core storage API employed.
Labels: arm, bigdata storage, flash, fpga, micro storage, nand, nvram, purestorage, storage
Friday, March 25, 2016
[Links of the day] 25/03/2016: Scheduling with queuing theory, LLVM Assembler Framework, Erasure Coding at Azure
- Efficient Queue Management for Cluster Scheduling : Microsoft researchers look into introducing queue management techniques, such as appropriate queue sizing, prioritization of task execution via queue reordering, starvation freedom, and careful placement of tasks into queues, for big data cluster scheduling.
- Keystone : an Indiegogo project to refactor LLVM into a multi-architecture, multi-platform, open source assembler framework.
- Erasure Coding in Windows Azure Storage : Microsoft Azure uses Local Reconstruction Codes (LRC) for its storage. LRC greatly reduces the number of erasure coding fragments that must be read for reconstruction when data fails or goes offline. The key benefit is a drastic reduction in the I/O and bandwidth required for repairs while keeping the storage overhead low.
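To show where the repair savings come from, here is a simplified Python sketch of the local-group idea behind LRC, using the (12, 2, 2) layout described in the Azure paper: 12 data fragments split into 2 local groups of 6, each with its own local parity, plus 2 global parities. Assumptions for this sketch: local parities are plain XOR and the global parities are not computed at all; the real code uses Galois-field coefficients so that it can also survive multi-fragment failures. Rebuilding one lost data fragment reads only 6 fragments (5 surviving group members plus the local parity) instead of the 12 that a Reed-Solomon (12, 4) code would need.

```python
from functools import reduce
from typing import List

FRAG_SIZE = 8   # bytes per fragment, tiny for illustration
GROUP_SIZE = 6  # data fragments per local group in the (12, 2, 2) layout


def xor_bytes(fragments: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized fragments."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*fragments))


def encode_local_groups(data: List[bytes], group_size: int = GROUP_SIZE) -> List[bytes]:
    """One XOR parity per local group (global parities omitted in this sketch)."""
    return [xor_bytes(data[i:i + group_size]) for i in range(0, len(data), group_size)]


def repair(data: List[bytes], parities: List[bytes], lost: int,
           group_size: int = GROUP_SIZE) -> bytes:
    """Rebuild data[lost] by reading only the other members of its local group."""
    group = lost // group_size
    start = group * group_size
    survivors = [data[i] for i in range(start, start + group_size) if i != lost]
    return xor_bytes(survivors + [parities[group]])


if __name__ == "__main__":
    data = [bytes([i] * FRAG_SIZE) for i in range(12)]
    parities = encode_local_groups(data)
    print(repair(data, parities, lost=7) == data[7])  # True
    # 12 data + 2 local + 2 global parities stored for 12 fragments of data.
    print((len(data) + 2 + 2) / len(data))
```

The overhead stays the same as RS (12, 4) (16 stored fragments per 12 of data), which is why the trade-off is attractive: the extra locality is paid for with slightly weaker worst-case failure tolerance, not with more storage.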
Labels: assembler, azure, bigdata, cluster, erasure code, links of the day, llvm, Queuing Theory, scheduling, storage
Friday, February 26, 2016
[Links of the day] 26/02/2016 : Usenix Fast 16, FPGA liberouter and Event delivery at Spotify
- Fast 2016 : all of the USENIX FAST 2016 goodness available in one place. It is interesting to see the emergence of storage systems optimized for time series [BTrDB]. Also note the always interesting report on failure rates (this time for flash).
- Liberouter : a really cool project using FPGAs to deliver hardware acceleration of network security and monitoring tools.
- Event Delivery at Spotify : part 1 of a series of blog posts on the event monitoring and management system used at Spotify.
Labels: conference, file system, flash, fpga, links of the day, network, storage, time series