A blog about life, Engineering, Business, Research, and everything else (especially everything else)
Showing posts with label In memory Database.
Tuesday, June 23, 2020
[Links of the Day] 23/06/2020 : Thinking while moving, AI snake oil, Graph Database
- Thinking While Moving : too often, current machine learning systems are used in a rigid observe-then-act control loop, leading to saccades. The authors of this paper propose concurrent execution of the inference system and the controlled system, allowing more fluid operation and shorter task execution times.
- AI snake oil : a lot of AI projects fail to return their initial investment. Too many buzzwords and not enough understanding of the limits of the current technology. At least NVIDIA is selling GPUs by the million. When there is a gold rush, the one making a fortune is the one selling shovels.
- TerminusDB : in-memory graph database management system. It's interesting to see that 99% of the source code is Prolog and that they use JSON-LD as the definition, storage and interchange format for the query language. The original use case for this solution targeted financial data stored as time series but lacking graph correlation.
Tuesday, August 23, 2016
[Links of the day] 23/08/2016 : Adapting In memory database architecture for Storage class memory and Datacenter network congestion management
- The implications of Storage Class Memory for in-memory database architecture:
- SOFORT : The authors propose to modify traditional in-memory database architecture in order to optimise its operation for upcoming storage class memory hardware. The idea is quite simple: get rid of the log mechanism and persist all data to NVM, except for the index, which needs to be maintained in RAM for performance reasons. SCM drastically eliminates a lot of boilerplate architecture functionality by delivering fast, byte-addressable persistent storage. However, developers now need to be aware of the transactional model imposed by this new class of persistent memories (a minimal sketch of the idea appears after this list). [Slides]
- Instant Recovery for Main-Memory Databases : This paper builds on top of SOFORT and looks at leveraging NVDIMM or SCM to speed up crash recovery. The idea is not only to speed up normal operation but also to eliminate the recovery cost in case of an application crash. [Slides]
- Note that both of these papers have an author working for SAP, so my guess is that we will start to see new dedicated features in SAP HANA for supporting SCM.
- Flowtune : It seems that we are slowly going to see a return of the ATM model in data-center networking fabrics. In this paper the authors propose to combine a form of MPLS system with a centralized allocator for resource management and congestion avoidance. Basically, the system identifies connection (called flowlet) establishment and termination. Using existing and past information, it derives an optimal path and resource allocation minimizing interference and congestion over the lifetime of the flowlet. Looks like SDN is finally enabling a simplified and more robust ATM model within, and probably across, data centers.
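As an aside, here is a minimal Python sketch of the SOFORT idea as I understand it: records persist in "SCM" (simulated here with a memory-mapped file, no log), while the index is volatile, lives in RAM, and is rebuilt by scanning on restart. All names and the record layout are my own illustration, not the paper's code.

```python
import mmap, os, struct

class ScmStore:
    """Toy SOFORT-style store: records persist in 'SCM' (an mmap'd file),
    the index is volatile and rebuilt by scanning on restart -- no log."""
    SIZE = 1 << 20  # 1 MiB of simulated SCM

    def __init__(self, path):
        if not os.path.exists(path):
            with open(path, "wb") as f:
                f.write(b"\x00" * self.SIZE)   # pre-size the simulated SCM
        self.f = open(path, "r+b")
        self.scm = mmap.mmap(self.f.fileno(), self.SIZE)
        self.index = {}        # volatile, RAM-only: key -> record offset
        self.tail = 0
        self._recover()        # "instant" recovery: rebuild index from SCM

    def _recover(self):
        off = 0
        while True:
            klen, vlen = struct.unpack_from("II", self.scm, off)
            if klen == 0:      # hit unwritten space: end of data
                break
            key = bytes(self.scm[off + 8: off + 8 + klen])
            self.index[key] = off
            off += 8 + klen + vlen
        self.tail = off

    def put(self, key, value):
        off = self.tail
        struct.pack_into("II", self.scm, off, len(key), len(value))
        self.scm[off + 8: off + 8 + len(key)] = key
        self.scm[off + 8 + len(key): off + 8 + len(key) + len(value)] = value
        self.scm.flush()       # stand-in for an SCM persist barrier
        self.index[key] = off  # index update is volatile: no logging needed
        self.tail = off + 8 + len(key) + len(value)

    def get(self, key):
        off = self.index[key]
        klen, vlen = struct.unpack_from("II", self.scm, off)
        return bytes(self.scm[off + 8 + klen: off + 8 + klen + vlen])

store = ScmStore("/tmp/scm.dat")
store.put(b"k1", b"hello")
print(store.get(b"k1"))        # survives a process restart via _recover()
```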
Labels: datacenter, In memory Database, links of the day, network fabric, nvdimm, nvm, SAP HANA, sdn, storage class memory
Monday, November 09, 2015
Links of the day 09/11/2015 : workload resource management and ACID in-memory DB
- Quantiles Threshold : How to determine the best threshold for your alarms using quantiles. Interesting; however, the following paper tends to argue that this approach is fine only until the system gets close to maximum capacity (a toy illustration appears after this list).
- Percentile-Based Approach to Forecasting Workload Growth : this paper explains the issue: when resource utilization gets closer to the workload-carrying capacity of the resource, the upper percentiles level off (a phenomenon colloquially known as flat-topping or clipping), leading to under-prediction of future workload and potentially to undersized resources. They analyse the problem and propose a new approach that can be used for making useful forecasts of workload when the historical data for the forecast are collected from a resource approaching saturation.
- MemDB : ACID-compliant distributed in-memory database. The performance numbers are interesting: they deliver ACID transactions at 25k/s per shard (essentially per core). While not fantastically fast, when you take a "simple" 12-core server you could potentially run 250k transactions/s per system, and you can quickly see how to reach 1M+ transactions/s with a small setup. However, I would still question the need for a pure in-memory ACID database without persistence. I would be curious to see if anybody has a use case for it.
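To make the two points above concrete, here is a toy numpy sketch (my own synthetic data, not the papers'): it derives a p99 alarm threshold from history, then shows how the observed p99 flat-tops once demand saturates a capacity-limited resource.

```python
import numpy as np

rng = np.random.default_rng(42)

# Quantile-based alarm threshold: alert when utilization exceeds
# the 99th percentile of the recent history.
history = rng.gamma(shape=4.0, scale=10.0, size=10_000)  # toy utilization samples
threshold = np.percentile(history, 99)
print(f"p99 alarm threshold: {threshold:.1f}")

# Flat-topping / clipping: the same workload measured on a resource
# with a hard capacity. The observed p99 stops growing once demand
# saturates the resource, so naive percentile forecasts under-predict.
capacity = 150.0
for growth in (1.0, 1.5, 2.0, 3.0):          # demand keeps growing...
    demand = history * growth
    observed = np.minimum(demand, capacity)  # ...but measurements clip at capacity
    print(f"growth x{growth}: true p99 = {np.percentile(demand, 99):6.1f}, "
          f"observed p99 = {np.percentile(observed, 99):6.1f}")
```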
Thursday, October 15, 2015
Links of the day 15/10/2015 : Paxos, NVM for in-memory DB, workload characterization conference
- EPaxos, transactions and the next 700 Paxos systems : well, distributed consensus will always be a hotly discussed topic, and the wheel will be reinvented many times because not everybody likes the size of the existing one.
- Energy-Efficient In-Memory Data Stores on Hybrid Memory Hierarchies : proposes code instrumentation in order to identify the best object placement for performance/power efficiency. Nice idea; however, it lacks practicality for production environments, and it assumes fairly stable system behavior (see the toy sketch after this list).
- IISWC-2015 : 2015 IEEE International Symposium on Workload Characterization
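On the hybrid-memory placement idea, a hedged toy sketch of the general technique (my own construction, not the paper's instrumentation): count per-object accesses, promote the hottest objects to the fast "DRAM" tier and demote the rest to "NVM".

```python
from collections import Counter

class TieredStore:
    """Toy hybrid-memory placement: instrument accesses, keep hot objects
    in a small 'DRAM' dict, demote cold ones to an 'NVM' dict."""

    def __init__(self, dram_slots=2):
        self.dram, self.nvm = {}, {}
        self.hits = Counter()          # instrumentation: per-object access counts
        self.dram_slots = dram_slots

    def get(self, key):
        self.hits[key] += 1
        return self.dram.get(key, self.nvm.get(key))

    def put(self, key, value):
        self.hits[key] += 1
        self.nvm[key] = value          # new objects start in the slow tier
        self.rebalance()

    def rebalance(self):
        """Re-place objects from observed access frequencies. Only sensible
        if behavior is stable -- the very weakness noted above."""
        hot = {k for k, _ in self.hits.most_common(self.dram_slots)}
        for k in list(self.dram):
            if k not in hot:
                self.nvm[k] = self.dram.pop(k)   # demote to NVM
        for k in hot:
            if k in self.nvm:
                self.dram[k] = self.nvm.pop(k)   # promote to DRAM

store = TieredStore()
for key in ["a", "a", "a", "b", "c", "a", "b"]:
    store.put(key, key.encode())
print(sorted(store.dram), sorted(store.nvm))  # hottest keys end up in DRAM
```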
Labels: Distributed systems, In memory Database, links of the day, nvm, paxos
Monday, September 28, 2015
Links of the day 28/09/2015 : SPDK, CAP for durability, NVM and in-memory DB
- SPDK : Intel's counterpart to its DPDK toolkit, specifically aimed at storage and NVMe. As usual, you can do some interesting stuff with it, but it is also highly restrictive in terms of hardware and behavior: the device needs to be purely dedicated to the library and cannot be shared with other systems. Moreover, it might carry as big a security risk as the DPDK libs (memory is accessible across DPDK consumers..) [more user friendly overview here]
- Impact of new Persistent Memory Tech on In memory DB : Nice overview of the impact of the new NVM tech. Too bad it still tries to push the classic DB model instead of proposing an approach from scratch that would fully use the potential of these new technologies.
- CAP theorem for Durability : interesting effort to formalise the limitations of the durability problem in distributed environments. However, it doesn't help curtail the acronym soup from spreading.
Labels: CAP, durability, In memory Database, nvm, nvme, storage
Tuesday, October 07, 2014
Links of the day 07 - 10 - 2014
Today's links 07/10/2014: #bigdata, RDMA, in-memory everything, log, #Openstack storage, unreliable network coding
- ADMS 2014 : Workshop on Accelerating Data Management Systems Using Modern Processor and Storage Architectures
- Accelerating Data Management and Processing on Modern Clusters with RDMA-Enabled Interconnects : how to leverage RDMA for current and next-gen data processing clusters.
- Oracle's In-Memory Data Management Strategy: In-Memory in all Tiers, and for all Workloads : Big push from Oracle to move to an all-memory approach. Not sure that all the pieces of the puzzle are already in place to support such a vision.
- I heart Log : very nice and well-written short book on logs (not just system logs).
- Network Coding in unreliable environments [Paper 1] and [Paper 2] from MIT.
- Evaluation of Openstack storage options : Good overview of the state of OpenStack storage. Gives a good idea of the trade-offs in terms of performance, complexity, cost, etc.
Labels: bigdata, database, In memory Database, links of the day, log, openstack, rdma, storage
Monday, July 28, 2014
Links of the day 28 - 07 - 2014
Today : hash, compression, economics, in-memory DB and anonymity made simple
- xxHash : a fast, 64-bit-wide non-cryptographic hash (code); a quick usage sketch appears after this list.
- History of Lossless Data Compression Algorithms : a very nice overview of the history of lossless data compression algorithms by the IEEE.
- The Sacred Economy : how the money system will have to change, and is already changing. I didn't have time to read it yet, but I'm planning to (or if you want the book directly).
- Aerospike : flash-optimised NoSQL database that has been open sourced. They make some bold claims about the performance of their product.
- Streisand : sets up a new server running L2TP/IPsec, OpenSSH, OpenVPN, Shadowsocks, Stunnel, and a Tor bridge. It also generates custom configuration instructions for all of these services. At the end of the run you are given an HTML file with instructions that can be shared with friends, family members, and fellow activists.
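For the curious, here is xxHash in a few lines via the third-party Python binding (pip install xxhash); it works streaming or one-shot, hashlib-style.

```python
import xxhash  # third-party binding: pip install xxhash

h = xxhash.xxh64()            # fast, non-cryptographic 64-bit hash
h.update(b"hello ")
h.update(b"world")            # streaming updates, like hashlib
print(h.intdigest())          # 64-bit integer digest
print(xxhash.xxh64(b"hello world").hexdigest())  # one-shot form
```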
Labels: anonymity, compression, economics, hash, In memory Database, links of the day
Saturday, August 03, 2013
Hecatonchire Version 0.2 Released!
Version 0.2 of Hecatonchire has been released.
What's New:
- Write-invalidate coherency model added for those who want to use Heca natively in their applications as distributed shared memory (more on that in a subsequent post; a toy sketch of the idea follows this list).
- Significant improvement in page-transfer performance, as well as a number of bugs squashed.
- Specific optimisations for KVM.
- Scale out memory mirroring
- Hybrid Post copy live migration
- Moved to linux Kernel 3.9 Stable
- Moved to Qemu-kvm 1.4 stable
- Added test / proof-of-concept tools (specifically for the new coherency model).
- Improved Documentation
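For readers unfamiliar with write-invalidate, here is a toy Python illustration of the general model (my own sketch, nothing to do with Heca's actual kernel implementation): before a node writes a page, every cached copy on its peers is invalidated, so subsequent reads re-fetch fresh data.

```python
class Node:
    """Toy write-invalidate DSM node: pages are cached locally, and every
    cached copy is invalidated before an owner writes. Illustration only."""

    def __init__(self, name, cluster):
        self.name, self.cluster = name, cluster
        self.cache = {}                    # page -> locally cached value
        cluster.append(self)

    def read(self, memory, page):
        if page not in self.cache:         # miss: fetch from shared memory
            self.cache[page] = memory[page]
        return self.cache[page]

    def write(self, memory, page, value):
        for peer in self.cluster:          # write-invalidate: kill stale copies
            if peer is not self:
                peer.cache.pop(page, None)
        memory[page] = value
        self.cache[page] = value

memory, cluster = {"p0": b"old"}, []
a, b = Node("a", cluster), Node("b", cluster)
print(b.read(memory, "p0"))   # b caches b"old"
a.write(memory, "p0", b"new") # invalidates b's copy first
print(b.read(memory, "p0"))   # miss again -> re-fetches b"new"
```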
We are now focusing on stabilizing the code as well as robustness (we aim to make the code production-ready by 0.4). Also, we are starting significant work to integrate Hecatonchire so it can be transparently leveraged via a cloud stack, more specifically OpenStack.
You can download it here : http://hecatonchire.com/#download.html
You can see the install doc here: https://github.com/hecatonchire/heca-misc/tree/heca-0.2/docs
And finally the changelog there : http://hecatonchire.com/#changelog-0.2.html
Or you can just pull the Master branch on github: https://github.com/hecatonchire
Stay tuned for more in depth blog post on Hecatonchire.
Labels: cloud, data center, datacenter, distributed, distributed computing, hecatonchire, In memory Database, infiniband, iwarp, kernel, kvm, rdma, SoftIwarp, virtualization
Thursday, August 01, 2013
Slide Deck - Project Hecatonchire - The Lego Cloud : Status, Vision, Roadmap update 08/2013
This slide deck provides the vision, status and roadmap update of the Hecatonchire project.
Website: http://hecatonchire.com/
Git: https://github.com/hecatonchire
Labels: cloud, datacenter, distributed, hecatonchire, In memory Database, memory, rdma, SAP, SAP HANA
Monday, January 25, 2010
The network performance within the cloud, a hidden enemy
A lot of people have talked about the latency issue when hosting services in the cloud. Recently, Amazon's latency hiccup revealed a deeper problem, one that seems to be rarely discussed. While most focus on network access and consuming services from the cloud, I realise that there is a big unknown concerning network performance inside the cloud.
Cloud providers don't disclose the real infrastructure underlying their cloud offers. By doing so, they leave cloud customers completely in the dark regarding the network linking their different instances, with the false warm feeling that they are on top of their own flat network.
What does it mean:
- You have no idea of the network or I/O performance of your instance. Your virtual interface shares a physical (sometimes trunked) one with the other tenants collocated on the same physical server, and they compete with you for a share of the network pipe.
- You have no idea of your network performance between multiple instances within the same cloud:
- First, your instances can be located in different branches of the infrastructure, which means more network gear between them.
- Then, virtualized network gear can also be thrown into the mix, which adds virtual switches and routers with suboptimal performance (remember, they are software) but greater flexibility.
- Finally, the network traffic generated by all the tenants makes it very difficult (and expensive) to guarantee QoS throughout the infrastructure. Not to mention that capacity planning, measurement and management become extremely difficult, because it is impossible to predict the (often asymmetric) bandwidth consumption of an instance. A reason why cloud providers dream of hugely dense, multi-terabit, wire-speed L2 switching fabrics.
As a consequence, there is generally no published service level associated with throughput and latency within the cloud. When oversubscription hits you, you often don't see it coming. Maybe the cloud will become similar to home broadband: advertised "unlimited" offers, but with contention ratios.
All this makes it extremely difficult to deploy, and guarantee the performance of, services that rely on low-latency and/or high-bandwidth architectures, such as high performance computing, web and database clusters, storage access, seismic analysis, large-scale data analytics, financial services and algorithmic trading platforms.
I can think of some solutions to these problems, but that will be for another post.
Labels: Algorithmic trading, bandwith, cloud, Event processing, financial information exchange, HPC, In memory Database, latency, network, Quant, real time analytics, Stream Processing
Monday, January 18, 2010
IT Trading systems and Cloud take one
Investment banks, insurance companies, and hedge fund firms run HPC applications to keep their financial services running smoothly. More specifically, algorithmic trading requires a huge amount of processing power as well as fast network capabilities (speed is money).
I recently came across this company: Marketcetera. I think we will soon see the emergence of PaaS, SaaS and IaaS companies specifically dedicated to algo trading solutions or financial computing.
The cost of creating, testing and deploying such a trading platform, as well as of testing new trading algorithms, is rather prohibitive nowadays. With the advent of cloud computing, it becomes possible that in the near future companies will start offering, at a reasonable (and even dynamic) price, resources specifically dedicated and collocated for financial market operations.
However, financial trading applications have requirements that are very different from the classical web ones that run on the cloud. I will just expose some of those affecting IaaS and PaaS:
- IaaS, performance is key:
- Network:
- Speed and location: Financial trading services have high-bandwidth and low-latency requirements. To satisfy such stringent requirements, IaaS providers will have to be as close as possible to the stock exchange. They will need their cloud locations to be physically close to, or better, collocated with the exchange systems (a la EC2 zones). Except that in this case the actual location will matter less than the physical proximity to a certain exchange (especially for high frequency trading).
- Virtual Networking:
- Hardware FPGA, GPGPU, ASIC : Hardware acceleration can easily boost the performance of operations by an order of magnitude. It has been successfully used by financial institutions for things such as XML processing, network routing, algorithmic trading, etc. In the race to be the fastest, such hardware can give a significant edge. However, future financial cloud providers will need to find a way to easily expose such hardware. Creating a pool of hardware resources accessible through I/O virtualization seems a potential solution (by using HyperTransport, QuickPath Interconnect or PCIe over Ethernet).
- Virtualization vs bare metal: Virtualization always comes at a price: you lose performance and I/O speed. But, contrary to popular belief, cloud infrastructure does not preclude the use of non-virtualized resources. Cloud providers can easily offer both virtual and dedicated hardware resources within the same cloud in order to satisfy the various demands of their customers.
- PaaS, how to balance performance, flexibility and accessibility:
- Language: C and C++ still represent a huge portion of the core of financial application frameworks; you can even see bits of ASM! However, the current PaaS languages of predilection are .NET, Java, Python or Ruby on Rails. These are often slower but much easier (and easier to secure) to use to build a cloud platform engine. One way around this problem would be to design a custom language for trading algorithms, or to allow easy integration of external processing components. Which leads to the next aspect.
- Messaging and order routing : A similar problem arises for messaging. Today's cloud messaging APIs are based on public REST, XML and SOAP standards, while current trading platforms prefer fast but proprietary (hence expensive) ESBs or messaging platforms. This makes it rather expensive to integrate with external components. Maybe, if the PaaS vendor is also an IaaS (servers) and a SaaS (the ESB) provider, the integration can be done or provided in a cost-effective way.
- Security : challenges are multiple:
- VM security
- network
- Data (secure storage)
- Traceability
- Audit
- Authentication
- Non-repudiation
- Integration with third parties and customer-owned solutions
- Etc..
- Reliability and high availability requirements are an order of magnitude higher than for typical cloud apps.
This list is not exhaustive and I definitely know I have missed some aspects, but I will try to delve deeper into the problems that trading in the cloud poses. And while providing cloud for customers wanting to run financial apps is relatively more difficult than a "traditional" cloud offer, the benefits can be worth the effort:
- Sharing exchange access point cost.
- Lowering the cost of entering the high-frequency game.
- Eliminating the cost of creating a trading platform.
- Lowering the TCO of trading platforms.
- Lowering the cost of testing and validating trading algorithms.
- Etc...
Labels: Algorithmic trading, cloud, Financial grid, high frequency trading, HPC, In memory Database, Quant, virtual i/o