Sunday, November 22, 2020

HPC ecosystem - SC20

 This article is a quick summary of SC20 trends and the current state of the HPC ecosystem from a tech and market perspective.

Technology-wise there are three main competing HPC architectures: 

  • Commodity (e.g. Intel)
  • Commodity + accelerator (e.g. GPUs)
  • Lightweight cores (e.g. IBM BG, Xeon Phi, TaihuLight, ARM )

Commodity systems represent the bulk of the systems out there. However, commodity + accelerator are ramping up their presence aggressively. Nvidia dominates this market segment with 142 systems out of 149. With Intel scooping 4 with it's Phi solution. Lightweight cores systems are a minority with only four systems. But with the new A64FX and a renewed appetite for custom chips, this might change rapidly.

Intel is still dominating the ecosystem, with 92% of the shares, followed by AMD with 4%. However, this might change rapidly with AMD EHP technology ramping up. Another aspect is that AMD technology tends to be more open-source friendly, which can make it more attractive long term. Not to mention that their GPU also start to become highly competitive in the AI space.

From a market size, the HPC market was $39.0 billion in 2019, up 8.2% from $36.1 billion in 2018. Predictions show growth to $55.0 billion in 2024. Most of the growth was led by government spending after six years of growth led by industry. The number of system in industry vs public is not equally divided with ~50% each.

One notable change is double-digit growth of cloud HPC related market. Cloud grew 17.8% to $1.4 billion; however, this might only be the tip of the iceberg as many companies might be using HPC like system in the cloud without labelling it as HPC. Cloud solutions are heavily displacing low-end HPC segment. Entry and mid-range level server classes have the slowest growth in years as consumers prefer to buy HPC as a service solution and reduce their CAPEX. 

AI is still heavily influencing the HPC infrastructure market as it represents a considerable opportunity for HPC solution vendors. HyperscaleAI infrastructure by itself is about $8 billion. It seems that for the moment, AI and HPC future are closely intertwined.

Sources: Intersect360 research - Pre-SC20 Market Update & Jack Dongarra - An overview of HPC

Thursday, November 05, 2020

The real motivation behind the Matrix engine (GPU/TPU/...) adoption in HPC

 There is current backlash in the HPC community against GPU/TPU/... aka matrix engine accelerator adoption. Most of the arguments are performance, efficiency and real HPC workload driven. 

Like in a recent paper by Jens Domke et Al., colleagues of Dr Matsuoka at RIKEN and AIST in Japan, explore if the inclusion of specialized matrix engines in general-purpose processors are genuinely motivated and merited, or is the silicon better invested in other parts.

I wrote before in that a lot of new HPC systems overuse matrix engine hardware in their architecture. In this paper, the authors looked at the broad usefulness of matrix engines. They found that there is only a small fraction of real-world application that use and benefit from accelerated dense matrix multiplications operations. Moreover, when combining HPC and ML or when you try to accelerate traditional HPC applications, the inference computation is very lightweight compared to the heavyweight HPC compute. 

While I agree with the argument put forward some other aspects that go beyond HPC need to be taken into consideration as to why there is such a push for matrix engine adoption. And these aspects are mainly market-driven. If you compare markets, there is significantly more money in the "hyped" years old AI market (training + inference) vs the 30 years old "mature" HPC market. 

In raw numbers, the HPC market is worth $39 Billion. In comparison, the AI market is worth $256 Billions in hardware along. If you focus on AI semiconductor only it is still $32 Billion alone! And the growth projections are not in favour of HPC. 

If you then look into the N^4 computing complexity for AI vs at best N^3 for HPC. Or look where (institutions, companies, systems/individuals such as in cars, wearables, medical appliances, etc.) those AI systems are going vs HPC systems. You quickly understand the significant difference and potential between the two markets.

If you take the ROI of AI-related business into consideration, it now makes more sense why HPC institutes are investing in such type of hardware. Such investment will allow them to tap into a promising and fast-expanding market. The matrix engine movement is simply a market-driven investment to ensure the best ROI for HPC centres.

Sunday, November 01, 2020

ARM ecosystem disintegration and the rise of RISC-V

#ARM acquisition by #Nvidia is making people uneasy. 

And the early sign of the unravelling of the #ARM ecosystem start to appear: ThunderX3 general-purpose ARM CPU has been cancelled.

One would ask why spending $$ to build a better product and increase its number of consumers if, for that, it will have to use the Nvidia IP and compete directly against the IP owner.
If you combine this with the difficult viability of putting together a general-purpose #ARM alternative to #Intel / #AMD as #ARM vendors are effectively competing on cost with much lower volumes.

We start to understand why Marvell decided to shift toward the much more trendy IPU/PDU/Smartnic market.

On the other hand, I think we will see an acceleration of RISC-V adoption. Eating away at the traditional #ARM market share. This will be driven by the large scale edge deployment of #riscv sees chips with a RISC-V core and an #NPU (neural processing unit). These chips can be churned out at incredibly cheap cost, less than $10, and these will become ubiquitous really rapidly.

It might take 10-15 years but ultimately this will seal the fate of the ARM franchise.