
Thursday, November 28, 2019

[Links of the Day] 28/11/2019 : Metrics Stream Processing Framework, SOSP'19, Distributed Transaction K/V

  • Mantis : Netflix's metric stream processing framework. This seems to be a really powerful platform for real-time monitoring, and especially for contextual alerting and alerting on logs. What's impressive is the event throughput it can churn through... [github]
  • SOSP'19 : the SOSP papers are in free open access; a lot of papers on security and formal verification of distributed systems this year. [Murat review Day0, Day1, Day2]
  • TiKV : distributed transactional key-value database. [Github]



Tuesday, September 18, 2018

[Links of the Day] 18/09/2018 : Data transfer project, Observability pipeline, and Operating systems Book

  • Data Transfer Project : open-source, service-to-service data portability platform. Not really sure who would want to transfer data between Facebook, Google, and Microsoft from a privacy point of view... but there is probably a use case.
  • Veneur : distributed, fault-tolerant pipeline for observability data. This is a really cool project that aggregates metrics and forwards them to one or more supported downstream storage sinks. It can also act as a global aggregator for histograms, sets, and counters. The key advantage of this approach is that you only maintain, store (and pay for) the aggregated data rather than tons of separate data points; a minimal sketch of the idea follows this list.
  • Operating Systems - Three Easy Pieces : free operating systems book centred around three conceptual pieces that are fundamental to operating systems: virtualization, concurrency, and persistence.
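To make the payoff of that aggregation concrete, here is a minimal Go sketch (not Veneur's actual code; all names are illustrative) of a process that folds thousands of raw points into one counter total and one percentile summary per metric before flushing:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// aggregator folds raw metric points into per-name summaries so that
// only the aggregates, not every individual data point, get stored
// (and paid for) downstream.
type aggregator struct {
	mu       sync.Mutex
	counters map[string]float64
	samples  map[string][]float64 // raw samples, summarized at flush time
}

func newAggregator() *aggregator {
	return &aggregator{
		counters: map[string]float64{},
		samples:  map[string][]float64{},
	}
}

func (a *aggregator) Count(name string, v float64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.counters[name] += v
}

func (a *aggregator) Sample(name string, v float64) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.samples[name] = append(a.samples[name], v)
}

// Flush emits one summary per metric; a real pipeline would ship these
// to a downstream sink instead of printing them.
func (a *aggregator) Flush() {
	a.mu.Lock()
	defer a.mu.Unlock()
	for name, total := range a.counters {
		fmt.Printf("%s count=%v\n", name, total)
	}
	for name, s := range a.samples {
		if len(s) == 0 {
			continue
		}
		sort.Float64s(s)
		p99 := s[int(float64(len(s)-1)*0.99)]
		fmt.Printf("%s n=%d p99=%v\n", name, len(s), p99)
	}
	a.counters = map[string]float64{}
	a.samples = map[string][]float64{}
}

func main() {
	agg := newAggregator()
	for i := 0; i < 10000; i++ {
		agg.Count("requests", 1)
		agg.Sample("latency_ms", float64(i%250))
	}
	agg.Flush() // two summary lines instead of 20,000 data points
}
```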




Monday, April 11, 2016

[Links of the day] 11/04/2016: Rust Distributed K/V store, Consensus in the Cloud, and interactive service tail latency

After a short hiatus, the links of the day are back:

  • Target-Driven Parallelism : Microsoft researchers look into using prediction and correction to reduce tail latency in interactive services.
  • Consensus in the Cloud : a very good technical report on systems using Paxos and the advantages and disadvantages associated with its use.
  • TiKV : distributed key-value store written in Rust. It uses Raft to deliver consistency and scalability, coupled with a nice geo-replication capability. Cherry on top: distributed transactions are supported, similar to Google Spanner; a sketch of the commit model it uses follows this list.
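TiKV's transactions follow Google's Percolator model: prewrite locks anchored on a designated primary key, followed by an atomic commit of the primary. Below is a toy, single-process Go sketch of that idea under that assumption; it is not TiKV's client API, and timestamps, MVCC versions, and failure recovery are omitted:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// A toy Percolator-style optimistic commit: prewrite locks every key,
// with one key designated as primary; the transaction commits the
// moment the primary commits.
type store struct {
	mu    sync.Mutex
	data  map[string]string
	locks map[string]string // key -> primary key of the locking txn
}

func (s *store) prewrite(keys []string, primary string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, k := range keys {
		if _, locked := s.locks[k]; locked {
			return errors.New("write conflict on " + k) // caller aborts and retries
		}
	}
	for _, k := range keys {
		s.locks[k] = primary
	}
	return nil
}

func (s *store) commit(writes map[string]string, primary string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	// Committing the primary is the atomic commit point; in the real
	// protocol, secondaries can be rolled forward lazily by readers.
	s.data[primary] = writes[primary]
	delete(s.locks, primary)
	for k, v := range writes {
		if k == primary {
			continue
		}
		s.data[k] = v
		delete(s.locks, k)
	}
}

func main() {
	s := &store{data: map[string]string{}, locks: map[string]string{}}
	writes := map[string]string{"a": "1", "b": "2"}
	if err := s.prewrite([]string{"a", "b"}, "a"); err != nil {
		fmt.Println("abort:", err)
		return
	}
	s.commit(writes, "a")
	fmt.Println(s.data) // map[a:1 b:2]
}
```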

Thursday, August 13, 2015

Links of the day 13/08/2015 - Flash failure @ #facebook, tech behind #google voice and its datacenter network provisioning, distributed system #cloud challenges


Monday, June 08, 2015

Links of the day 08 - 06 - 2015

Today's links 08/06/2015: #SSD no panacea for VMs, Hardware X86 #security breach, #Bigdata stream engine

  • SSD no panacea with VM : because of their operational model, SSDs tend to perform poorly in virtualized environments when shared across multiple VMs. The authors suggest isolating and dedicating an SSD to each VM in order to reap the SSD's speed benefits. Micro storage solution, anyone?
  • X86 design flaw : apparently there is a way to escalate from ring 0 to a higher-privileged state. This could be really interesting if it enabled a VM or container to escape and gain access to the bare-metal system.
  • Heron : the descendant of Storm by Twitter. The architecture seems nice and tries to address some of Storm's shortcomings.



Wednesday, May 27, 2015

Links of the day 27 - 05 - 2015

Today's links 27/05/2015: L3 #datacenter #networking, #Linux futex #bug, Distributed system simultaneity problem
  • Calico : pure L3 approach to data center networking. It uses BGP route management rather than an encapsulating overlay network, and thus avoids NAT and port proxies, doesn't require a fancy virtual switch setup, and supports IPv6. Looks like a viable and probably more scalable alternative to the virtual switch approach.
  • Futex bug : the Linux futex_wait call has been broken for about a year (in upstream since 3.14, around Jan 2014) and was only recently fixed (in upstream 3.18, around October 2014). More importantly, the breakage was backported into major distros (e.g. into RHEL 6.6 and its cousins, released in October 2014), and the fix has only recently been backported (e.g. RHEL 6.6.z and cousins have the fix).
  • There is No Now : explores the problem of simultaneity in distributed systems.

Tuesday, May 05, 2015

The rise of micro storage services


Current and emergent storage solutions are composed of sophisticated building blocks: dedicated fabrics, RAID controllers, layered caches, object storage, etc. There is a feeling that storage is going against the current evolution of the overall industry, where complex services are composed of small, independent processes and services, each organized around individual capabilities.

What surprises me is that most storage innovation focuses on very high-level solutions that try to cram as many features as possible into a single package. Insufficient effort is being made to build storage systems from small, independent, low-maintenance and, indeed, low-cost components. In short, what is lacking are nimble, independent modules that can be rearranged to deliver the optimal solution for a customer's needs, without requiring a new storage architecture to be rolled out every time: a "jack of all trades" with few or none of the "master of none" drawbacks. Put another way, modules that extend or mimic what is happening in the container and microservices space.

Ethernet Connected Drives

Despite this, all could change rapidly, and the enabler (or precursor, depending on how you look at it) of this alternative is, surprisingly, emerging from the hard drive vendors: Ethernet Connected Drives [slides][Q&A]. This type of storage technology will enable the next generation of hyperscale cloud storage solutions: massive scale-out potential with better simplicity and maintainability, not to mention lower TCO.

Ethernet Connected Drives are a step in the right direction, as they reduce capital and operating costs by eliminating:
  • the software stack (file system, volume manager, RAID system);
  • the corresponding server infrastructure;
  • connectivity costs and complexity.
Their finer granularity also enables more variable costs per application (e.g. cold storage, archiving, etc.).
Currently, two vendors offer this type of solution: Seagate with Kinetic and HGST with the Open Ethernet Drive. In fact, we are already seeing some rather interesting applications of the technology. Seagate released a port of the Sheepdog project onto its Kinetic product [Kinetic-sheepdog], thereby enabling a distributed object storage system for volume and container services without dedicated server infrastructure. There is also a proof of concept, presented at HEPiX, of HGST drives running Ceph or dCache. While these solutions don't fit every scenario, both demonstrate the versatility of the technology and its scalability potential (not to mention the cost savings). A hypothetical sketch of what talking to such a drive might look like follows below.
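To illustrate the idea, here is a Go sketch of a hypothetical client interface for an Ethernet-connected drive. Seagate's Kinetic protocol is protobuf over TCP in reality; the interface, method names, and hash below are all illustrative, not any vendor's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// DriveClient is a hypothetical interface for an Ethernet-connected
// drive: each disk is a key/value endpoint reachable over plain TCP/IP,
// so storage services talk straight to drives with no filer, RAID head,
// or block/file gateway in between.
type DriveClient interface {
	Put(key, value []byte) error
	Get(key []byte) ([]byte, error)
}

// fakeDrive stands in for a real networked drive in this sketch.
type fakeDrive struct{ kv map[string][]byte }

func (d *fakeDrive) Put(key, value []byte) error { d.kv[string(key)] = value; return nil }
func (d *fakeDrive) Get(key []byte) ([]byte, error) {
	v, ok := d.kv[string(key)]
	if !ok {
		return nil, errors.New("not found")
	}
	return v, nil
}

// objectStore composes an object backend by hashing keys across a set
// of drive endpoints, each of which would be addressed by IP:port.
type objectStore struct{ drives []DriveClient }

func (o *objectStore) driveFor(key []byte) DriveClient {
	var h uint32
	for _, b := range key {
		h = h*31 + uint32(b) // toy hash; real systems use consistent hashing
	}
	return o.drives[h%uint32(len(o.drives))]
}

func main() {
	store := &objectStore{drives: []DriveClient{
		&fakeDrive{kv: map[string][]byte{}},
		&fakeDrive{kv: map[string][]byte{}},
	}}
	key := []byte("photo-123")
	_ = store.driveFor(key).Put(key, []byte("...object bytes..."))
	v, _ := store.driveFor(key).Get(key)
	fmt.Printf("%s -> %s\n", key, v)
}
```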

What these technologies enable is essentially the transformation of the appliances that house masses of HDDs into switches, eliminating the need for a block or file head, since there is now straight IP connectivity to each drive. This makes them ideal for object-based backends.

Emergence of fabric-connected hardware storage:

What we should see over the next couple of years is the emergence of a new form of storage appliance acting as a fabric facilitator for a large number of compute- and network-enabled storage devices. To a certain extent, it would be similar to HP's Moonshot, except with a far greater density.

Rather than just focusing on Ethernet, it is easy to imagine PCIe, Intel silicon photonics, InfiniBand, or more exotic fabrics being used. Obviously, Ethernet remains the preferred solution due to its ubiquity in the datacenter. However, we should not underestimate the appeal of a rack-scale approach, which would deliver greater benefits if designed correctly.
While HGST's Open Ethernet solution is one good step towards the nimble storage device, the drive enclosure form factor is still quite big, and I wouldn't be surprised to see a couple of start-ups come out of stealth mode in the next few months with fabric-connected (PCIe, most likely) flash. This would be the equivalent of the Ethernet-connected drive, interconnected using a switch + backplane fabric, as shown in the crudely designed diagram below.

[Diagram: drives with onboard compute attached directly to a switch + backplane fabric]

Is it all about hardware?

No, indeed quite the opposite. That said, new hardware has a greater chance of penetrating the storage ecosystem than the server market. This is probably where ARM has a better chance of establishing a beachhead within the hyperscale datacenter, as the microserver path seems to have failed.
What this implies is that it is often easier to deliver and sell a new hardware or appliance solution in the storage ecosystem than a pure software one. Software solutions tend to take a lot longer to gain acceptance, but when they pierce through, they quickly take over and replace the hardware solutions. Look at object storage solutions such as Ceph, or the hyper-converged offerings: they are a major threat to the likes of NetApp and EMC.
Coming back to software as a solution, I would predict that history repeats itself to varying degrees of success. As in the microserver story, while hardware micro storage solutions are rising, we are simultaneously seeing the emergence of software solutions that deliver more nimble storage features than before.

In conclusion, I feel we are going to see the emergence of many options for massive scale-out, all variants of the same concept: take the complex storage system and break it down to its bare essential components; expose each single element as its own storage service; and then build the overall offering dynamically from the ground up. Rather than leveraging complex pooled storage services, we would have dynamically deployed storage applications for specific demands, composed of a suite of small services, each running in its own process and communicating via lightweight mechanisms. These services would be built around business capabilities and be independently deployable by fully automated deployment machinery. There would be a bare minimum of centralized management, and the services could be written in different programming languages and use different data storage technologies. This is just the opposite of current offerings, where monolithic storage applications (or appliances) are scaled by replicating across servers.

This type of architecture would enable a true, on-demand, dynamically tiered storage solution. To reuse a current buzzword, this would be a "lambda storage architecture".
But that is better left for another day's post, which would look into such an architecture and the lifecycle management entities associated with it.





Tuesday, April 28, 2015

Links of the day 28 - 04 - 2015

Today's links 28/04/2015: #Rump Kernel Stack, Disque Distributed In Memory MQ, #FusionIO new PCIe product, Power level estimation of VM systems
  • RAMP Stack : Nginx, MySQL, and PHP built on Rump Kernels without re-architecting the applications. Most of the work is in getting the apps to cross-compile correctly (Nginx & MySQL). This implies that unikernel-compatible, unmodified POSIX C and C++ applications "just work" on top of Rump Kernels, provided that they can be cross-compiled.
  • Disque : a distributed, in-memory message broker by the Redis folks. Not production ready, but a promising start; see the usage sketch after this list.
  • PCIe Flash : Fusion-io is still kicking and delivers an interesting solution: up to 350,000 I/O operations per second (IOPS) on random reads and 385,000 IOPS on random writes (on the 3.2 TB model), with 15 µs write latency and 2.8 GB/s of read bandwidth. However, I still don't get why they don't want to use NVMe tech.
  • Process-level Power Estimation in VM-based Systems : the authors describe a fine-grained monitoring middleware providing real-time and accurate power estimation of software processes running at any level of virtualization in a system.
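Since Disque speaks the Redis wire protocol, any Redis client can drive it. Here is a short Go sketch, assuming a local Disque node on its default port 7711 and the redigo client library; ADDJOB/GETJOB/ACKJOB are Disque's own commands, but treat the exact reply handling as illustrative:

```go
package main

import (
	"fmt"
	"log"

	"github.com/gomodule/redigo/redis"
)

func main() {
	// Disque listens on 7711 by default and talks the Redis protocol.
	conn, err := redis.Dial("tcp", "localhost:7711")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Enqueue a job; the trailing 0 is the command timeout in ms.
	jobID, err := redis.String(conn.Do("ADDJOB", "myqueue", "hello world", 0))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("queued:", jobID)

	// Fetch one job: GETJOB returns [queue, id, body] triples.
	replies, err := redis.Values(conn.Do("GETJOB", "FROM", "myqueue"))
	if err != nil || len(replies) == 0 {
		log.Fatal("no job received: ", err)
	}
	job, _ := redis.Values(replies[0], nil)
	id, _ := redis.String(job[1], nil)
	body, _ := redis.String(job[2], nil)
	fmt.Println("got:", body)

	// Acknowledge so the job is not redelivered after the retry period.
	if _, err := conn.Do("ACKJOB", id); err != nil {
		log.Fatal(err)
	}
}
```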




Wednesday, November 19, 2014

Links of the day 19 - 11 - 2014

Today's links 19/11/2014: #resiliency, #cloud , distributed system, #stream processing


Tuesday, October 28, 2014

Links of the day 28 - 10 - 2014

Today's links 28/10/2014: cloud trading, CERN, snapshot isolation, cloud infrastructure, RDMA to remote HW/device

  • Clock-SI : [paper summary] snapshot isolation for partitioned data stores using loosely synchronized clocks; a sketch of its core timestamp rule follows this list.
  • Cloud trading : 6fusion blog posts on their approach to trading cloud resources as commodities.
  • CERN Cloud Infrastructure Report : a lot of OpenStack; it looks like they are trying to double capacity every year. They are really at the bleeding edge of OpenStack in production, stressing what can be done with it.
  • Peer-Direct support : allows RDMA operations to directly target memory in external hardware devices, such as GPU cards, SSD-based storage, dedicated ASIC accelerators, etc.
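Clock-SI's central trick is worth sketching: a transaction takes its snapshot timestamp from its originating partition's local clock, and any partition whose own clock still lags that timestamp simply delays the read until it catches up, so no partition ever serves a snapshot "from its future" and no centralized timestamp oracle is needed. A simplified Go illustration (the real protocol also delays reads over prepared-but-uncommitted writes, omitted here):

```go
package main

import (
	"fmt"
	"time"
)

// partition models one shard of the data store with its own, loosely
// synchronized clock (skew simulates clock drift between partitions).
type partition struct {
	name string
	skew time.Duration
}

func (p *partition) now() time.Time { return time.Now().Add(p.skew) }

// readAt serves a read at the given snapshot timestamp. If this
// partition's clock is behind the snapshot, the read is delayed until
// the clock catches up, keeping the snapshot consistent.
func (p *partition) readAt(snapshotTS time.Time, key string) string {
	if wait := snapshotTS.Sub(p.now()); wait > 0 {
		fmt.Printf("%s delaying read %v until its clock reaches the snapshot\n", p.name, wait)
		time.Sleep(wait)
	}
	return "value-of-" + key // would return the newest version <= snapshotTS
}

func main() {
	origin := &partition{name: "p1", skew: 0}
	lagging := &partition{name: "p2", skew: -20 * time.Millisecond}

	snapshotTS := origin.now() // snapshot timestamp from the local clock
	fmt.Println(lagging.readAt(snapshotTS, "x"))
}
```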

Friday, September 05, 2014

Links of the day 05 - 09 - 2014

Today's links : Spam, Consensus, Distributed system, Raft

  • Raft explained : visual explanation of the Raft consensus algorithm; a minimal sketch of its vote-granting rule follows below.
  • Modern anti-spam and E2E crypto : an interesting and really enlightening email from an ex-Googler describing the past, present, and future of the spam war and the potential effect of cryptography on it.
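As a companion to the visual explanation, here is a minimal Go sketch of Raft's RequestVote handling from the paper: a follower grants its vote only if the candidate's term is current, it hasn't already voted for someone else this term, and the candidate's log is at least as up-to-date as its own. Field names are illustrative:

```go
package main

import "fmt"

// node holds the follower-side state relevant to voting.
type node struct {
	currentTerm  int
	votedFor     string // "" means no vote cast this term
	lastLogTerm  int
	lastLogIndex int
}

type voteRequest struct {
	term         int
	candidateID  string
	lastLogTerm  int
	lastLogIndex int
}

func (n *node) requestVote(req voteRequest) bool {
	if req.term < n.currentTerm {
		return false // stale candidate
	}
	if req.term > n.currentTerm {
		n.currentTerm = req.term // step down into the newer term
		n.votedFor = ""
	}
	if n.votedFor != "" && n.votedFor != req.candidateID {
		return false // already voted this term
	}
	// Election safety: only vote for candidates whose log is at least as
	// up-to-date as ours (higher last term, or same term and >= length),
	// so an elected leader always holds all committed entries.
	upToDate := req.lastLogTerm > n.lastLogTerm ||
		(req.lastLogTerm == n.lastLogTerm && req.lastLogIndex >= n.lastLogIndex)
	if !upToDate {
		return false
	}
	n.votedFor = req.candidateID
	return true
}

func main() {
	follower := &node{currentTerm: 3, lastLogTerm: 3, lastLogIndex: 7}
	granted := follower.requestVote(voteRequest{
		term: 4, candidateID: "n2", lastLogTerm: 3, lastLogIndex: 9,
	})
	fmt.Println("vote granted:", granted) // true
}
```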