Sunday, November 22, 2020

HPC ecosystem - SC20

 This article is a quick summary of SC20 trends and the current state of the HPC ecosystem from a tech and market perspective.


Technology-wise there are three main competing HPC architectures: 

  • Commodity (e.g. Intel)
  • Commodity + accelerator (e.g. GPUs)
  • Lightweight cores (e.g. IBM BG, Xeon Phi, TaihuLight, ARM )


Commodity systems represent the bulk of the systems out there. However, commodity + accelerator are ramping up their presence aggressively. Nvidia dominates this market segment with 142 systems out of 149. With Intel scooping 4 with it's Phi solution. Lightweight cores systems are a minority with only four systems. But with the new A64FX and a renewed appetite for custom chips, this might change rapidly.

Intel is still dominating the ecosystem, with 92% of the shares, followed by AMD with 4%. However, this might change rapidly with AMD EHP technology ramping up. Another aspect is that AMD technology tends to be more open-source friendly, which can make it more attractive long term. Not to mention that their GPU also start to become highly competitive in the AI space.





From a market size, the HPC market was $39.0 billion in 2019, up 8.2% from $36.1 billion in 2018. Predictions show growth to $55.0 billion in 2024. Most of the growth was led by government spending after six years of growth led by industry. The number of system in industry vs public is not equally divided with ~50% each.





One notable change is double-digit growth of cloud HPC related market. Cloud grew 17.8% to $1.4 billion; however, this might only be the tip of the iceberg as many companies might be using HPC like system in the cloud without labelling it as HPC. Cloud solutions are heavily displacing low-end HPC segment. Entry and mid-range level server classes have the slowest growth in years as consumers prefer to buy HPC as a service solution and reduce their CAPEX. 







AI is still heavily influencing the HPC infrastructure market as it represents a considerable opportunity for HPC solution vendors. HyperscaleAI infrastructure by itself is about $8 billion. It seems that for the moment, AI and HPC future are closely intertwined.


Sources: Intersect360 research - Pre-SC20 Market Update & Jack Dongarra - An overview of HPC

Thursday, November 05, 2020

The real motivation behind the Matrix engine (GPU/TPU/...) adoption in HPC

 There is current backlash in the HPC community against GPU/TPU/... aka matrix engine accelerator adoption. Most of the arguments are performance, efficiency and real HPC workload driven. 

Like in a recent paper by Jens Domke et Al., colleagues of Dr Matsuoka at RIKEN and AIST in Japan, explore if the inclusion of specialized matrix engines in general-purpose processors are genuinely motivated and merited, or is the silicon better invested in other parts.




I wrote before in that a lot of new HPC systems overuse matrix engine hardware in their architecture. In this paper, the authors looked at the broad usefulness of matrix engines. They found that there is only a small fraction of real-world application that use and benefit from accelerated dense matrix multiplications operations. Moreover, when combining HPC and ML or when you try to accelerate traditional HPC applications, the inference computation is very lightweight compared to the heavyweight HPC compute. 


While I agree with the argument put forward some other aspects that go beyond HPC need to be taken into consideration as to why there is such a push for matrix engine adoption. And these aspects are mainly market-driven. If you compare markets, there is significantly more money in the "hyped" years old AI market (training + inference) vs the 30 years old "mature" HPC market. 




In raw numbers, the HPC market is worth $39 Billion. In comparison, the AI market is worth $256 Billions in hardware along. If you focus on AI semiconductor only it is still $32 Billion alone! And the growth projections are not in favour of HPC. 





If you then look into the N^4 computing complexity for AI vs at best N^3 for HPC. Or look where (institutions, companies, systems/individuals such as in cars, wearables, medical appliances, etc.) those AI systems are going vs HPC systems. You quickly understand the significant difference and potential between the two markets.

If you take the ROI of AI-related business into consideration, it now makes more sense why HPC institutes are investing in such type of hardware. Such investment will allow them to tap into a promising and fast-expanding market. The matrix engine movement is simply a market-driven investment to ensure the best ROI for HPC centres.

Sunday, November 01, 2020

ARM ecosystem disintegration and the rise of RISC-V

#ARM acquisition by #Nvidia is making people uneasy. 

And the early sign of the unravelling of the #ARM ecosystem start to appear: ThunderX3 general-purpose ARM CPU has been cancelled.

One would ask why spending $$ to build a better product and increase its number of consumers if, for that, it will have to use the Nvidia IP and compete directly against the IP owner.
If you combine this with the difficult viability of putting together a general-purpose #ARM alternative to #Intel / #AMD as #ARM vendors are effectively competing on cost with much lower volumes.

We start to understand why Marvell decided to shift toward the much more trendy IPU/PDU/Smartnic market.

On the other hand, I think we will see an acceleration of RISC-V adoption. Eating away at the traditional #ARM market share. This will be driven by the large scale edge deployment of #riscv sees chips with a RISC-V core and an #NPU (neural processing unit). These chips can be churned out at incredibly cheap cost, less than $10, and these will become ubiquitous really rapidly.

It might take 10-15 years but ultimately this will seal the fate of the ARM franchise.




Saturday, October 17, 2020

CryptMPI: A Fast Encrypted MPI Library

 As more #HPC applications move to cloud infrastructure, securing and protecting HPC sensitive data in such an environment becomes critical.

But HPC solution tends to fall short when it comes to security. Security features tend to be perceived as detrimental to the performance of the applications.

By example, encrypted communication has always been seen as incurring very significant overheads when you are aiming for microsecond latency.


The author of the Crypt MPI paper demonstrates that you can ensure the privacy and integrity of sensitive data with minimal performance degradation using an enhance MPI library.

I hope to see this kind of feature integrated as standard in a future version of MPI.


Paper: https://arxiv.org/abs/2010.06471

Code: https://github.com/abu-naser/CryptMPI-A-Fast-Encrypted-MPI-Library




Wednesday, July 08, 2020

Software RDMA revisited : setting up SoftiWARP on Ubuntu 20.04

Almost ten years ago I wrote about installing SoftIwarp on Ubuntu 10.04. Today I will be revisiting the process. First, what is SoftIwarp: Soft-iWARP is a software-based iWARP stack that runs at reasonable performance levels and seamlessly fits into the OFA RDMA environment provides several benefits. SoftiWARP is a software RDMA device that attaches with the active network cards to enable RDMA programming. For anyone starting with RDMA programming, RDMA-enabled hardware might not be at hand. SoftiWARP is a very useful tool to set up the RDMA environment, and code and experiments with.

To install SoftIwarp you have to go through 4 stages: Setting up the environment, Building SoftIwarp, Configuring Softiwarp, Testing.

Setting up RDMA environment

Before you start you should prepare the environment for building a kernel module and userspace library.
Basic building environment

sudo apt-get install build-essential libelf-dev cmake

Installing userspace libraries and tools

sudo apt-get install libibverbs1 libibverbs-dev librdmacm1 \
librdmacm-dev rdmacm-utils ibverbs-utils

Insert common RDMA kernel modules

sudo modprobe ib_core
sudo modprobe rdma_ucm


Check if everything is correctly installed : 

sudo lsmod | grep rdma 

You should see something like this : 

rdma_ucm               28672  0
ib_uverbs             126976  1 rdma_ucm
rdma_cm                61440  1 rdma_ucm
iw_cm                  49152  1 rdma_cm
ib_cm                  57344  1 rdma_cm
ib_core               311296  5 rdma_cm,iw_cm,rdma_ucm,ib_uverbs,ib_cm

Now set up some library for the userspace libs : 

sudo apt-get install build-essential cmake gcc libudev-dev libnl-3-dev \
libnl-route-3-dev ninja-build pkg-config valgrind


Installing SoftiWARP

10 years ago you had to clone the SoftiWARP source code and build it (https://github.com/zrlio/softiwarp.git). Now you are lucky, it is by default in the Linux kernel 5.3 and above!

You just have to type : 

sudo modprobe siw

verify it works : 

sudo lsmod | grep siw
you should see : 
siw                   188416  0
ib_core               311296  6 rdma_cm,iw_cm,rdma_ucm,ib_uverbs,siw,ib_cm
libcrc32c              16384  3 nf_conntrack,nf_nat,siw

moreover, you should check if you have an Infiniband device present : 

ls /dev/infiniband 

Result : 

rdma_cm


You also need to add the following file in your /etc/udev/rules.d/90-ib.rules directory containing the below entries : 

 ####  /etc/udev/rules.d/90-ib.rules  ####
 KERNEL=="umad*", NAME="infiniband/%k"
 KERNEL=="issm*", NAME="infiniband/%k"
 KERNEL=="ucm*", NAME="infiniband/%k", MODE="0666"
 KERNEL=="uverbs*", NAME="infiniband/%k", MODE="0666"
 KERNEL=="uat", NAME="infiniband/%k", MODE="0666"
 KERNEL=="ucma", NAME="infiniband/%k", MODE="0666"
 KERNEL=="rdma_cm", NAME="infiniband/%k", MODE="0666"
 ########


If it doesn't exist you need to create it.


I would suggest you add also the module to the list of modules to load at boot by adding them to /etc/modules file

You need now to reboot your system.


Userspace library

Normally, recent library support softiwarp out of the box. But if you want to compile your own version follow the step bellow. However, do this at your own risk... I recommend to stick with the std libs.

Optional build SIW userland libraries: 

All the userspace library are in a nice single repository. You just have to clone the repo and build all the shared libraries. If you want you can also just build libsiw but it's just easier to build everything at once. 

git clone https://github.com/zrlio/softiwarp-user-for-linux-rdma.git
cd ./softiwarp-user-for-linux-rdma/
./buid.sh

Now we have to setup the $LD_LIBRARY_PATH so that build libraries can be found. 
cd ./softiwarp-user-for-linux-rdma/build/lib/
export LD_LIBRARY_PATH=$(pwd):$LD_LIBRARY_PATH


or you can add the line in your .bashrc profile:
export LD_LIBRARY_PATH=<<PATHTOTHELIBRARIES>>:$LD_LIBRARY_PATH

End of optional section



Setup the SIW interface : 


Now we will be setting up the loopback and a standard eth interface as RDMA device:

sudo rdma link add <NAME OF SIW DEVICE > type siw netdev <NAME OF THE INTERFACE>


In this case for me : 

sudo rdma link add siw0 type siw netdev enp0s31f6
sudo rdma link add siw_loop type siw netdev l0

You can check the two devices have been correctly set up using ivc_devices and ibv_devinfo command
result of ibv_devices  :
    device              node GUID
    ------           ----------------
    siw0             507b9ddd7a170000
    siw_loop         0000000000000000

result of ibv_devinfo :

hca_id: siw0
 transport:   iWARP (1)
 fw_ver:    0.0.0
 node_guid:   507b:9ddd:7a17:0000
 sys_image_guid:   507b:9ddd:7a17:0000
 vendor_id:   0x626d74
 vendor_part_id:   0
 hw_ver:    0x0
 phys_port_cnt:   1
  port: 1
   state:   PORT_ACTIVE (4)
   max_mtu:  1024 (3)
   active_mtu:  invalid MTU (0)
   sm_lid:   0
   port_lid:  0
   port_lmc:  0x00
   link_layer:  Ethernet
hca_id: siw_loop
 transport:   iWARP (1)
 fw_ver:    0.0.0
 node_guid:   0000:0000:0000:0000
 sys_image_guid:   0000:0000:0000:0000
 vendor_id:   0x626d74
 vendor_part_id:   0
 hw_ver:    0x0
 phys_port_cnt:   1
  port: 1
   state:   PORT_ACTIVE (4)
   max_mtu:  4096 (5)
   active_mtu:  invalid MTU (0)
   sm_lid:   0
   port_lid:  0
   port_lmc:  0x00
   link_layer:  Ethernet

Testing with RPING: 

Now we simply test the setup with rping : 

In one shell : 
rping -s -a <serverIP> 

in the other : 

rping -c -a <serverIP> -v 


And you should see the rping working successfully! 

You are now all set to use RDMA without the need for expensive hardware. 


Thursday, July 02, 2020

[Links of the Day] 02/07/2020 : Database query optimization, Deep Learning Anomaly detection survey, Large scale packet capture system

  • event-reduce : accelerate query result after write. Basically if cache part of the write and recalculate the new query result using past query result and the recent write event. The authors observe an up to 12 times faster displaying of new query results after a write occurred.
  • Deep Learning for Anomaly Detection: A Survey : comprehensive survey of anomaly detection techniques out there. 
  • Moloch : Large scale, open-source, indexed packet capture and search.



Tuesday, June 30, 2020

Data is the new oil fueling Machine learning adoption but Businesses are discovering #AI is no silver bullet

Data is the new oil. However, unlike oil, as data scarcity is becoming less of a problem, processing costs are skyrocketing. The business world is waking up to the fact that while the cost of computing keeps getting cheaper all the time. The cost of training machine learning models is outpacing the compute cost drop.

Moreover,  business are finding challenging to adopt #ai, and the economist report numbers are showing how often #machinelearning projects in the real business world fail :
  • Seven out of ten said their #ai projects had generated little impact so far.
  • Two-fifths of those with “significant investments” in ai had yet to report any benefits at all.
Companies are finding that #machinelearning is not the promised silver bullet. The non-tech company are discovering what tech companies had to learn the hard way: that they are no Google, Facebook, ...

To successfully deploy an AI/ML/DL project you need: a vast amount of data, skilled employee, solid engineering practice, access to infrastructure and last but not least, a clear understanding of the business problem.

I have a false hope that corporation will abandon the silver bullet thinking, but I would settle for avoiding another #ai winter cycle.