Thursday, November 05, 2020

The real motivation behind the Matrix engine (GPU/TPU/...) adoption in HPC

There is a current backlash in the HPC community against the adoption of GPUs/TPUs/... aka matrix engine accelerators. Most of the arguments are driven by performance, efficiency, and real HPC workloads.

For example, in a recent paper, Jens Domke et al., colleagues of Dr Matsuoka at RIKEN and AIST in Japan, explore whether the inclusion of specialized matrix engines in general-purpose processors is genuinely motivated and merited, or whether the silicon is better invested in other parts.

I wrote before that a lot of new HPC systems overuse matrix engine hardware in their architecture. In this paper, the authors looked at the broad usefulness of matrix engines. They found that only a small fraction of real-world applications use and benefit from accelerated dense matrix multiplication operations. Moreover, when combining HPC and ML, or when trying to accelerate traditional HPC applications, the inference computation is very lightweight compared to the heavyweight HPC compute.


While I agree with the arguments put forward, some other aspects that go beyond HPC need to be taken into consideration as to why there is such a push for matrix engine adoption. These aspects are mainly market-driven. If you compare markets, there is significantly more money in the young, "hyped" AI market (training + inference) than in the 30-year-old, "mature" HPC market.

In raw numbers, the HPC market is worth $39 billion. In comparison, the AI market is worth $256 billion in hardware alone. Even if you focus on AI semiconductors only, that is still $32 billion! And the growth projections are not in favour of HPC.

If you then look at the N^4 computing complexity for AI versus, at best, N^3 for HPC, or look at where those AI systems are going (institutions, companies, and systems/individuals such as cars, wearables, medical appliances, etc.) versus HPC systems, you quickly understand the significant difference in potential between the two markets.

If you take the ROI of AI-related business into consideration, it now makes more sense why HPC institutes are investing in this type of hardware. Such investment will allow them to tap into a promising and fast-expanding market. The matrix engine movement is simply a market-driven investment to ensure the best ROI for HPC centres.

Friday, January 31, 2014

On avoiding vendor lock-in by leveraging Openstack

One of the main drivers for users to adopt Openstack is to avoid vendor lock-in (see stats here).

Architecture is rapidly becoming a commodity

Arguably, if you develop your own cloud solution, you are locking yourself into yourself. Openstack in its current state requires so much effort, customization, and maintenance that you end up building your own cage. Managing your maintenance and development costs becomes critical in order to get a good ROI. Unless you plan to resell these services or expose them directly to your customers, you won't benefit from the economies of scale.

Often you might be better off with vendor lock-in, as you "should" be able to control your costs and ROI more easily. Or better, contract out your Openstack implementation to a third party and outsource the maintenance and development costs while retaining a certain degree of flexibility.

Different size, different strategy, different risks

SMB customers can be very aggressive about getting into the cloud, and they do not have a legacy to deal with, whereas enterprises tend to be very risk-averse. They have to protect what they have, and they cannot be as aggressive.

As a result, we are seeing a number of mature enterprises looking toward a multicloud strategy. Whether that is through multiple platforms or through deploying on an open cloud platform, the outcome they are trying to achieve is the same. Enterprises are increasingly transitioning from general-purpose tools to point solutions as their IT environments become bigger and more complex.

Sunday, March 29, 2009

Amdahl's law and automation

The theory (and a little bit of practice)

Automation in datacenters, and now in utility computing, is heavily used to drive down TCO. However, it is rather hard to figure out what to automate in order to get the maximum benefit out of it. Some automation that seems obvious often has a low return on investment.

Luckily, we can use (or abuse) Amdahl's law here. It is used to find the maximum expected improvement to an overall system when only part of the system is improved. Interpreted simply, Amdahl's law says: focus on improving the things that make the biggest difference overall.


Pic: Amdahl's law. Overall improvement = 1 / ((1 - P) + P/S), where P is the fraction of the system that is improved and S is the improvement factor of that fraction.

If we adapt this law for TCO reduction, 1 is the cost of running the system (datacenter, service, etc.) for a discrete amount of time. The new cost will be the cost of the unimproved fraction (which is 1 - P) plus the cost of the improved fraction. The cost of the improved part is the automated part's former cost divided by the automation cost factor, making the cost of the improved part P/S. The overall improvement is computed by dividing the old cost by the new cost, which is what the above formula does.
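
As a minimal sketch of this arithmetic (the function name and the example numbers are illustrative assumptions, not figures from any real datacenter):

    def amdahl_tco_improvement(p, s):
        """Overall cost-reduction factor when a fraction p of the total
        cost is reduced by a factor s (Amdahl's law applied to TCO)."""
        return 1.0 / ((1.0 - p) + p / s)

    # Hypothetical example: automating a task that accounts for 30% of
    # the running cost and making it 10x cheaper only improves the
    # overall cost by a factor of ~1.37.
    print(amdahl_tco_improvement(p=0.30, s=10))  # ~1.37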

Applied to datacenters, cloud, or IT operations, this logic suggests that organizations should start with the automation that makes the biggest impact, particularly on IT staff productivity.

The reality

However, if we look at reality, the OPEX cost of running a server and a datacenter represents a very small part of the overall cost. According to a Google paper, it varies between 7% and 9% of the overall cost. Which means that, if we still follow Amdahl's law, automation can have only a very limited impact on the overall cost, while maximising server utilisation guarantees a better return on investment (not to mention being smart with hardware acquisition).
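
Plugging these numbers into the earlier sketch makes the point (again, purely illustrative figures):

    # If automation halves the ~9% OPEX share (p = 0.09, s = 2),
    # the overall cost improves by only about 4.7%.
    print(amdahl_tco_improvement(p=0.09, s=2))  # ~1.047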

But then again, there are no small savings.