Reflections Of The Void: fpga


Tuesday, June 23, 2020

The rise of Domain Specific Accelerators

Two recent articles point to a pickup in the adoption of domain-specific accelerators. With the end of Moore's Law, domain-specific hardware remains one of the few paths to continued gains in the performance and efficiency of computing hardware.

For a long time, the adoption of domain-specific accelerators was limited by economic factors. Historically, the small feature sizes, small batch sizes, and high cost of fab time (for ASICs) translated into a prohibitive per-unit cost.
However, the economics have shifted:

Move toward standardised open-source tooling
More flexible licensing models
RISC-V architecture coming of age and maturing rapidly
Falling fab costs
Wide availability of FPGAs (e.g. AWS F1)
Rise of co-designed high-level programming languages that reduce the learning curve and shorten the design cycle
The power/performance wall of general-purpose compute units

We are about to see a dramatic shift toward heterogeneous compute infrastructure over the next couple of years.

Tuesday, September 17, 2019

[Links of the Day] 17/09/2019 : DHCP load balancer, Graph processing on FPGA, Transform data with SQL

Dhcplb : DHCP load balancer. Looks really cool, as it helps scale a physical datacenter rapidly while limiting the hassle of DHCP. It also removes a lot of the network complexity that comes with the DHCP protocol.
Graph Processing on FPGA : a survey and taxonomy of graph computations on FPGAs.
DBT : enables data analysts and engineers to transform their data using SQL.

Thursday, August 09, 2018

[Links of the Day] 09/08/2018 : Consciousness and integrated information, Optical FPGA, Events DB

Making Sense of Consciousness as Integrated Information : in this paper, the authors argue that we currently have a dissociation between cognition and experience, and that this may have an impact in a future hyper-connected world.
Towards an optical FPGA : it looks like programmable silicon photonic circuits are the next frontier in hardware acceleration. Converting light into electrical signals has rapidly become too expensive, and modern CPUs have a hard time keeping up with the pace at which networking capabilities evolve.
TrailDB : tool for storing and querying series of events. Fast, small, and efficient.

Tuesday, October 10, 2017

[Links of the Day] 10/10/2017 : Machine Learning Hardware acceleration , Homomorphic encryption

Tutorial on Hardware Architectures for Deep Neural Networks : How to leverage hardware for accelerating machine learning processes.
A Survey on Homomorphic Encryption Schemes : this paper presents a thorough survey of the state of homomorphic encryption schemes. Homomorphic encryption allows manipulation of encrypted data without the need to decrypt it. Once hardware is fast enough to cope with the complexity of the operations, this will make a truly secure distributed multitenant database possible, since no operation on the hosting side will require decrypting the data to clear text and everything sensitive can be done securely on the client side.
Efficient Methods and Hardware for Deep Learning : Stanford lecture where guest lecturer Song Han presents algorithms and specialized hardware (FPGA, GPU, ASIC, etc.) that can be used to accelerate training and inference of deep learning workloads. [video]
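
To make the homomorphic property mentioned in the survey entry above concrete, here is a minimal sketch. It assumes unpadded "textbook" RSA with tiny primes, which is insecure and only multiplicatively homomorphic; it is not one of the schemes the survey recommends, just the shortest way to show a host computing on ciphertexts it cannot read. All names are illustrative.

```python
# Toy parameters (assumption: tiny textbook primes, completely insecure).
P, Q = 61, 53
N = P * Q                      # public modulus
PHI = (P - 1) * (Q - 1)
E = 17                         # public exponent
D = pow(E, -1, PHI)            # private exponent (requires Python 3.8+)


def encrypt(message: int) -> int:
    """Client side: encrypt with the public key."""
    return pow(message, E, N)


def decrypt(ciphertext: int) -> int:
    """Client side: decrypt with the private key."""
    return pow(ciphertext, D, N)


def multiply_encrypted(c1: int, c2: int) -> int:
    """Host side: multiply two ciphertexts without ever seeing the plaintexts."""
    return (c1 * c2) % N


a, b = 7, 6
encrypted_product = multiply_encrypted(encrypt(a), encrypt(b))
product = decrypt(encrypted_product)   # equals (a * b) % N
```

The host only ever handles `encrypt(a)` and `encrypt(b)`; the client recovers the product after decryption. Fully homomorphic schemes extend this idea to arbitrary computations, at a much higher computational cost.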

Friday, December 23, 2016

[Links of the Day] 23/12/2016 : Microsoft Configurable cloud (with fpga), Open Pilot OSS driving software, Deep learning is all about rigor

Microsoft's Production Configurable Cloud : built-in custom NIC + FPGA for a highly configurable and dynamic network stack in Microsoft's datacenters. The work is really impressive. It demonstrates how pervasive FPGAs and custom hardware will be in future datacenters.
Open Pilot : open source driving agent providing Adaptive Cruise Control (ACC) and Lane Keeping Assist System (LKAS) for Hondas and Acuras. This is a really interesting solution, and I wonder how fast other companies will start to leverage it or open source their own solutions in order to accelerate adoption. However, without extremely strong verification and proof systems (formal methods), it will be extremely hard (and probably illegal) to deploy such software at this stage.
Nuts and Bolts of Building Deep Learning : Andrew Ng reiterated at NIPS2016 that there is no secret AI equation that will let you escape your machine learning woes. All you need is some rigor. [video]

Monday, October 17, 2016

[Links of the day] 17/10/2016 : MIT Tardis 2.0 Cache, TCP/IP FPGA stack , Knowledge Defined Networking

Tardis 2.0 : the MIT people are back with an optimized and extended version of their novel cache system.
FPGA TCP/IP stack : a TCP/IP stack that can be embedded on an FPGA alongside applications. This allows a seamless flow of data without CPU interaction or reliance on other devices. You could do some neat inline processing of data flows using this. It supports 10 Gbps and thousands of concurrent connections. [github]
Knowledge-Defined Networking : merging network analytics and software-defined networking by using machine learning. The objective is to enable automated network control. To some extent, we should replace the "software is eating the world" mantra with "machine learning is eating software". And it is closer than you think, at least for SDN, as there is an effort in OpenDaylight by Cisco et al. to push machine learning into the SDN framework.

Monday, September 19, 2016

[Links of the day] 19/09/2016 : #AI bias, Incremental consistency , Customizable datacenter

Stuck in a Pattern : predictive policing tools are being widely adopted by corporations and public organisations, yet there is little transparency as to how these systems have been configured. It seems that the current set of software being designed and deployed may reinforce discrimination and inequality under a veil of marketing that touts intelligent solutions.
Incremental consistency guarantees : instead of providing a single "hard" consistent answer to a query, the authors propose a system that provides multiple replies with incrementally stronger consistency guarantees, albeit at an incremental latency cost. This allows systems to make decisions based on their consistency requirements as well as their performance needs. This is interesting as it would allow some applications to act on consistent-enough information while still being able to revise their decision, if needed, once they receive a response with a higher level of consistency.
Customizable Computing at Datacenter Scale : NAS 16 keynote. It seems that HPC and exascale systems are slowly converging toward a hybrid model with heterogeneous resources: FPGA, GPGPU, CPU, etc.
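
The incremental consistency idea above can be sketched in a few lines. This is a minimal illustration under my own assumptions, not the paper's API: replicas are plain dictionaries holding hypothetical `(version, value)` pairs, the "weak" reply comes from a single replica, and the "strong" reply is the highest version seen across all replicas.

```python
def incremental_read(key, replicas):
    """Yield (level, value) replies, cheapest and weakest first.

    Assumption: each replica maps key -> (version, value).
    """
    # Preliminary reply: ask one replica only. Fast, but possibly stale.
    _version, value = replicas[0][key]
    yield "weak", value

    # Final reply: consult every replica and keep the newest version.
    # Slower, but reflects the latest write known to the replica set.
    _version, value = max((replica[key] for replica in replicas),
                          key=lambda versioned: versioned[0])
    yield "strong", value


replicas = [
    {"stock": (1, 5)},   # stale replica still reports 5 items
    {"stock": (2, 0)},   # newer write: sold out
    {"stock": (2, 0)},
]

answers = list(incremental_read("stock", replicas))
```

An application can act speculatively on the first reply (for example, render a page) and revise its decision when the second reply disagrees, which is the trade-off between latency and consistency described in the entry above.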

Tuesday, June 28, 2016

[Links of the day] 28/06/2016 : Heterogeneous Data-centers (FPGA - GPU) and User-mode Ethernet verbs

Heterogeneous datacenters : datacenters are slowly evolving, and we are starting to see the emergence of specialized, dedicated hardware to squeeze out the maximum efficiency per watt consumed. As general-purpose CPUs start to hit their limits, users are turning, like the HPC community, to FPGAs or GPUs to break through the current power wall.
A quantitative analysis on microarchitectures of modern CPU-FPGA platforms : if you want to venture into the heterogeneous datacenter, you had better read this paper on the different platforms available out there.
User Mode Ethernet Verbs : probably the only serious contender to Intel DPDK. User-mode verbs expose the Verbs API, giving user-mode applications and ULPs direct access to offload capabilities in the form of raw Ethernet QPs. Basically, you can send and receive raw Ethernet packets using the RDMA verbs API and leverage the offload engine to accelerate the operations [video]

Monday, May 30, 2016

[Links of the day] 30/05/2016 : FPGA market extinction event, Consensus as cloud service , OSCON16

FPGA extinction level event : article looking at the evolution of the FPGA vendor market. It seems that if Xilinx gets acquired, the independent vendors behind 80% of the FPGA market will have vanished (Altera was recently acquired by Intel). This has far-reaching implications for the market, as consolidation occurs and the focus seems to shift toward datacenter solutions to the detriment of the rest of the market.
Filo : consolidated consensus as a cloud service. A really interesting paper looking at the possibility of offering a consensus system as a service within the cloud. This would greatly help anybody out there relying on their own ZooKeeper / Consul / etc. and allow them to focus even more on the business logic.
OsCon : slides of the excellent OsCon are up. Lots of Docker-related stuff, but if you look past it there are also some gems, such as the Netflix SSH Bastion talk or the Build to Lead talk.

Wednesday, May 04, 2016

[Links of the day] 04/05/2016 : Openserver Summit & Fortran OpenCoArray

OpenCoArray : Fortran is not dead, and the work on coarrays with accelerators demonstrates it.
Openserver summit :

PCIe 4.0 : some really nice improvements with the upcoming standard in terms of performance and especially RAS. However, there is no MR-IOV capability yet. This is sorely missing to make PCIe a true contender at the rack-scale fabric level.
Azure SmartNIC : Microsoft uses FPGA-based SmartNICs to shorten the update cycle of their Azure cloud fabric. It's a really impressive solution.
Persistent Memory over Fabrics : Mellanox is pushing for an RDMA-based persistent memory solution, probably trying to corner the market quickly as the 3D XPoint and Omni-Path solutions from Intel are just around the corner. However, what caught my attention is slide 14: HGST PCM Remote Access Demo. What is really interesting is that HGST is probably one step away from merging NVM and RDMA fabric into a single package. With that, they would be able to compete directly with DSSD at a lower cost (following the Eth Drive model).

Saturday, March 26, 2016

PureStorage brings us one step closer to micro storage architecture

Pure Storage just released its FlashBlade product. It is a fabric-connected object storage solution. It is modular and composed of a large number of blades, each of which is made of:

8TB to 52TB raw NAND storage capacity : a lot, but it still takes less than half the real estate on each blade.
NV-RAM + supercapacitor write buffer : when your NAND is still too slow, you want a persistent NVRAM buffer to handle the bursts
ARM CPU + FPGA : to deal with the “low level” operations such as erasure coding, etc.
8-core Xeon system-on-chip : for moving the computation to where the data is located; it handles pretty much all the high-level operations such as NFS, S3, object storage, etc.
40 Gbit Ethernet : that’s where the data gets out
PCIe fabric networking : an in-chassis solution linking compute and storage cards via a proprietary protocol. What’s interesting is that the system is self-contained, and scaling out to other boxes goes through the 10 Gb/s connectivity and not a proprietary fabric link, which implies that it doesn’t need exotic solutions once you go past the box boundaries. This is great as it makes it easy (and cheap) to scale; however, I wonder what the implications are in terms of performance once you start crossing those boundaries.

What is interesting is that, when you look at the Pure Storage solution, they decided to integrate the high-level compute aspect of storage directly with the low-level one in a single blade. They ended up with a hybrid solution combining ARM and FPGA for low-level aspects such as deduplication and erasure coding, and the Xeon for the object storage and file system layers.
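
To give a feel for the kind of low-level work offloaded to the ARM + FPGA pair, here is a minimal erasure-coding sketch. It is an assumption-laden illustration: Pure Storage's actual scheme is not described here, so this uses the simplest possible code, a single XOR parity shard (as in RAID 5), which can rebuild any one lost shard. The helper name `xor_shards` is hypothetical.

```python
from functools import reduce


def xor_shards(shards):
    """XOR equally sized byte shards together, column by column."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*shards))


# Three data shards of equal length, as they might be striped across blades.
data = [b"flash", b"blade", b"store"]

# Encoding: the parity shard is the XOR of all data shards.
parity = xor_shards(data)

# Recovery: pretend shard 1 is lost and rebuild it from the survivors
# plus the parity shard.
rebuilt = xor_shards([data[0], data[2], parity])
```

The operation is a stream of simple bitwise work over large buffers, which is exactly the sort of task that maps well onto an FPGA rather than a general-purpose core. Production systems typically use codes that tolerate more than one failure, such as Reed-Solomon.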

One can assume that the decision behind such an architecture was driven by customer requirements, which tend to call for a high-performance jack-of-all-trades solution. I can picture the product manager arguing for supporting every scale-out storage protocol popular at the moment. However, a jack of all trades always ends up master of none, and to compensate Pure Storage had to pump up the compute capabilities.

While this seems like a good choice, it is also counterproductive in terms of watts per GB, and a lot of real estate ends up wasted or duplicated. Don’t get me wrong, what Pure achieved with the FlashBlade is impressive, but I can’t stop thinking that they should have taken it a step further.

This type of high-performance, high-cost and high-power architecture is a step in the right direction toward a micro storage architecture that delivers low cost, low power, high performance and scalability. Now it is all about trimming down the system while maintaining scalability, by dividing the blade system into a much larger number of smaller nodes, essentially offering the flash equivalent of HGST's Ethernet-connected drives.

However, this might also imply that you won’t be able to support every single storage interface out there (NFS, S3, block, etc.) without relying on either client-side processing or a frontend. This should be achievable while maintaining excellent performance; the key will lie in the details of the core storage API employed.

Friday, February 26, 2016

[Links of the day] 26/02/2016 : Usenix Fast 16 , FPGA liberouter and Event delivery at spotify

Fast 2016 : all of the Usenix Fast 2016 goodness available in one place. Interesting to see the emergence of storage systems optimized for time series [BTrDB]. Also note the always interesting report on failure rates (this time for flash).
Liberouter : really cool project using FPGA to deliver hardware acceleration of network security and monitoring tools.
Event Delivery at Spotify : part 1 of a series of blog posts on the event monitoring and management system used at Spotify.

Monday, January 11, 2016

Links of the day 11/01/2016 : All about NVM papers

A Case for Efficient Hardware/Software Cooperative Management of Storage and Memory : as hardware gets faster, the CPU becomes the bottleneck in the stack. Storage, network, etc. are now LIMITED by the OS/CPU stack. As a result, we need to rethink how to avoid that waste by increasing the coupling between SW and HW (think DPDK, unikernels, etc.) [slide deck]
High Performance Hardware-Accelerated Flash Key-Value Store : new types of storage HW require new logical data structures. A K/V store is a natural fit, and DSSD has proven it with their product. [slide deck]
FPGA-based hardware acceleration for a key-value store database : if the CPU can't keep up, well, maybe throwing dedicated HW at the problem will help :) . This is probably a first step toward fabric-connected object or K/V storage systems, the next step up from the Ethernet-connected drive described in this post.
