Monday, January 25, 2010

Network performance within the cloud, a hidden enemy

A lot of people have talked about the latency issue when hosting services in the cloud. Amazon's recent latency hiccup revealed a deeper problem, one that seems to be rarely discussed. While most people focus on the network used to access and consume services from the cloud, I realise that there is a big unknown concerning network performance inside the cloud.

Cloud providers don't disclose the real infrastructure underlying their cloud offers. As a result, cloud customers are left completely in the dark regarding the network linking their different instances, which leaves them with the false, warm feeling that they are sitting on top of their own flat network.

What does this mean?
  • You have no idea of the network or I/O performance of your instance. Your virtual interface shares one or more physical interfaces (sometimes trunked) with other tenants co-located on the same physical server, and they compete with you for a share of the network pipe.
  • You have no idea of the network performance between multiple instances within the same cloud:
    • First, your instances can be located in different branches of the infrastructure, which means more network gear between them.
    • Then, virtualized network gear can also be thrown into the mix. It adds virtual switches and routers that offer greater flexibility but suboptimal performance (remember, they are software).
    • Finally, the network traffic generated by all the tenants makes it very difficult (and expensive) to guarantee QoS throughout the infrastructure. Capacity planning, measurement and management also become extremely difficult, because it is impossible to predict the (often asymmetric) network bandwidth consumption of each instance. This is one reason why cloud providers dream of hugely dense, multi-terabit, wire-speed L2 switching fabrics.
As a consequence, there is generally no published service level for throughput and latency within the cloud. When oversubscription hits you, you often don't see it coming. Maybe the cloud will become similar to home broadband, with advertised "unlimited" offers that come with a contention ratio.
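To put numbers on the oversubscription idea, here is a minimal sketch of the arithmetic. The figures (a 1 Gbit/s physical interface shared by 8 instances) and the function names are purely hypothetical, and the even split is a simplifying assumption; real hypervisors may schedule traffic quite differently.

```python
def worst_case_share_mbps(nic_capacity_mbps: float, active_tenants: int) -> float:
    """Even split of one physical NIC among tenants that all transmit at once.

    Assumption: bandwidth is divided equally, which a real hypervisor
    may or may not do.
    """
    if active_tenants < 1:
        raise ValueError("need at least one active tenant")
    return nic_capacity_mbps / active_tenants


def contention_ratio(advertised_mbps: float, tenants: int,
                     nic_capacity_mbps: float) -> float:
    """Total bandwidth promised to tenants divided by what the NIC can carry."""
    return (advertised_mbps * tenants) / nic_capacity_mbps


if __name__ == "__main__":
    # Hypothetical host: 1 Gbit/s NIC, 8 instances each told "up to 1 Gbit/s".
    print(worst_case_share_mbps(1000, 8))   # 125.0 Mbit/s each when all are busy
    print(contention_ratio(1000, 8, 1000))  # 8.0, i.e. an 8:1 contention ratio
```

The point is that the customer usually knows neither the number of neighbours nor the real capacity, so neither figure can be computed from the outside.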

All this makes it extremely difficult to deploy, and to guarantee the performance of, services that rely on low-latency and/or high-bandwidth architectures, such as high performance computing, web and database clusters, storage access, seismic analysis, large-scale data analytics, financial services and algorithmic trading platforms.
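Since no service level is published, about the only way to know what you get between two instances is to measure it yourself. The following is a rough sketch of a TCP round-trip probe, not a proper benchmark: the helper names are my own, and the demo runs against the loopback address only so that it is self-contained. In practice you would run the echo side on one instance and point the probe at its private address.

```python
import socket
import statistics
import threading
import time
from typing import Dict, List


def start_echo_server(host: str = "127.0.0.1", port: int = 0) -> int:
    """Start a single-connection TCP echo server in a thread; return its port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))  # port 0 lets the OS pick a free port
    srv.listen(1)

    def serve() -> None:
        conn, _ = srv.accept()
        with conn, srv:
            while True:
                data = conn.recv(64)
                if not data:  # client closed the connection
                    break
                conn.sendall(data)

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]


def measure_rtt_ms(host: str, port: int, samples: int = 200) -> List[float]:
    """Send one byte at a time over one connection and time each round trip."""
    rtts = []
    with socket.create_connection((host, port), timeout=5) as sock:
        # Disable Nagle so each tiny payload is sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(samples):
            t0 = time.perf_counter()
            sock.sendall(b"x")
            if not sock.recv(1):
                raise ConnectionError("echo server closed the connection")
            rtts.append((time.perf_counter() - t0) * 1000.0)
    return rtts


def summarize(rtts: List[float]) -> Dict[str, float]:
    """Median, approximate 99th percentile and maximum, in milliseconds."""
    ordered = sorted(rtts)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return {
        "median_ms": statistics.median(ordered),
        "p99_ms": p99,
        "max_ms": ordered[-1],
    }


if __name__ == "__main__":
    # Loopback demo; replace host/port with another instance's address.
    port = start_echo_server()
    print(summarize(measure_rtt_ms("127.0.0.1", port)))
```

Running this at different times of day, and watching how far the 99th percentile drifts from the median, gives a crude picture of how much your neighbours affect you.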

I can think of some solutions to these problems, but that will be for another post.