Written by Asterfusion
DPU/SmartNIC vendors and product lines include:

As a rising star in the data center, the DPU is bound to become a fierce battleground for giants and startups alike. Through in-house development or mergers and acquisitions, and with FPGA-based, ASIC (application-specific integrated circuit)-based, or SoC (system-on-chip)-based DPUs (SmartNICs), each company is using its unique strengths to seize this market opportunity.

In this article, we systematically summarize the main DPU vendors and their product lines on the market.

Before starting, let's review the following concepts:
Achronix Semiconductor is an American fabless semiconductor company that provides high-end FPGA-based data acceleration solutions designed for high-performance, compute-intensive, real-time processing applications. Achronix offers the Speedster7t FPGA family and Speedcore eFPGA IP, which users can deploy as a standalone product or embed in an ASIC or SoC design. Achronix also offers VectorPath accelerator cards.
The Achronix Speedster7t FPGA series is optimized for high-bandwidth workloads, eliminating the performance bottlenecks associated with traditional FPGAs. Speedster7t FPGAs are built on TSMC's 7nm FinFET process and feature a new two-dimensional network-on-chip (2D NoC), machine learning processors (MLPs) optimized for high-bandwidth artificial intelligence/machine learning (AI/ML) workloads, high-bandwidth GDDR6 interfaces, 400G Ethernet, and PCI Express Gen5 ports. The series delivers ASIC-level performance while retaining the full programmability of an FPGA.
Founded in 1969, Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company specializing in the design and manufacture of CPUs, GPUs, and other microprocessors for the computer, telecommunications, and consumer electronics industries. AMD completed its acquisition of Xilinx in February 2022. The deal, valued at nearly $50 billion, brought AMD Xilinx's FPGA programmable logic, DSP engines, AI accelerators, memory controllers, and other key technologies, strengthening AMD's technology portfolio.
Xilinx's DPU/SmartNIC offering is the FPGA-based Alveo series, which accelerates compute-intensive applications such as machine learning inference, data analytics, and video transcoding.
The Alveo series delivers up to 90 times the performance of a CPU and can be reprogrammed to a user's specific requirements. Since algorithms evolve faster than chip design cycles, programmable hardware that can adapt to changing algorithms is essential. The Xilinx Alveo SN1000 is the industry's first SmartNIC to provide software-defined hardware acceleration for all-function offload in a single platform: it offloads CPU-intensive tasks to optimize network performance, and its architecture can accelerate a variety of custom offloads at wire speed. The SN1000 is based on the Xilinx 16nm UltraScale+™ architecture and is powered by a low-latency Xilinx XCU26 FPGA and a 16-core Arm® processor.
In May 2022, AMD announced its acquisition of Pensando Systems in a deal valued at approximately $1.9 billion.
Pensando's distributed services platform expands AMD's data center portfolio with high-performance data processing units (DPUs) and software stacks.
These products are deployed at scale across cloud and enterprise customers such as Goldman Sachs, IBM Cloud, Microsoft Azure, and Oracle Cloud. Pensando's Elba SoC is a DPU aimed at intelligent network switches, and its earlier Capri DPU was used in the Aruba CX 10000.
Tracing the DPU back to its source, two cloud computing giants achieved large-scale commercial deployment of DPU architectures: Amazon AWS and Alibaba Cloud. The Amazon Nitro system had been in development since 2013 and was officially released in 2017 to maximize performance and security.
The AWS Nitro product family offloads data center overhead (remote resource provisioning, encryption and decryption, fault tracing, and security policy services for VMs) from the CPU to Nitro accelerator cards, freeing roughly 30% of the computing power previously spent paying this "tax", and returning it to upper-layer applications.
The Nitro system consists of three parts: the Nitro Cards, the Nitro Security Chip, and the Nitro Hypervisor.
Azure's Project Catapult launched its first pilot deployment of FPGA-enabled servers in a data center in 2013. The project showed a significant improvement in latency, running a decision-tree algorithm 40 times faster than the CPU alone while reducing the number of servers required.
Azure deployed Catapult v1 in its WCS cloud storage in 2012; Catapult v2 was later deployed on all newly purchased servers within Bing and Azure. By 2015, Microsoft had deployed FPGAs at scale in its Azure public cloud, and within a year its AccelNet initiative made FPGA-based SmartNICs the default hardware for implementing virtual networking in Azure, with FPGAs deployed in more than one million hosts. In 2017, Azure deployed Catapult v3 to accelerate deep neural networks and increase network speeds in Bing to 50 Gb/s.
Broadcom's Stingray combines a powerful network controller, high-performance Arm CPUs, PCI Express 3.0, performance accelerators, and DDR4 RAM to offload compute-intensive applications from the host server's CPU.
Stingray provides high packet rates and low latency. Broadcom based the NetXtreme-S BCM58800 chip at the heart of Stingray on the logic of its NetXtreme E series controllers, then added eight Arm v8 A72 cores clocked at 3 GHz in a cluster configuration. Additionally, Stingray can be configured with 16 GB of DDR4 memory.
Broadcom also employs TruFlow technology, a configurable flow accelerator that offloads common network flow processing into hardware. According to published information, TruFlow can offload tasks such as Open vSwitch (OVS) to hardware, and the company claims that TruFlow implements many classic SDN concepts in hardware, such as classification, matching, and actions. Stingray is therefore equipped with two programmable components: TruFlow and a cluster of four 3 GHz dual-core Arm v8 A72 complexes.
Mount Evans is Intel's first ASIC-based IPU, developed in partnership with Google Cloud and targeting high-end and hyperscale data center servers. Oak Springs Canyon is Intel's second-generation FPGA-based IPU platform, built with Intel Xeon-D processors and Agilex FPGAs.
One of the keys to Intel's IPU technology is a fast programmable packet-processing engine. Whether the product is FPGA- or ASIC-based, customers can program it with P4, supporting operations such as lookup, modification, encryption, and compression.
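The match-action model that P4 expresses can be sketched in ordinary Python. This is an illustration of the programming model only, not Intel's or IPDK's API; the table layout, field names, and actions are all hypothetical.

```python
# Sketch of a P4-style match-action stage: an exact-match lookup on
# the destination IP selects an action (here: forward to a port, or
# drop on a table miss). Real IPU hardware runs this per-packet at
# line rate; this Python version only models the control logic.

def make_table():
    # Control plane populates the table: dst IP -> action parameters.
    return {
        "10.0.0.1": {"action": "forward", "port": 1},
        "10.0.0.2": {"action": "forward", "port": 2},
    }

def match_action(table, packet):
    """Look up the packet's destination IP; drop on a table miss."""
    entry = table.get(packet["dst_ip"])
    if entry is None:
        return {**packet, "action": "drop"}
    return {**packet, "action": entry["action"], "port": entry["port"]}

table = make_table()
result = match_action(table, {"dst_ip": "10.0.0.2", "payload": b"hello"})
print(result["action"], result["port"])  # forward 2
```

In a real P4 program the same structure appears as a `table` with a match key, a set of actions, and a default action for misses; the DPU's job is to execute that lookup in hardware instead of on the host CPU.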
At the "Intel Vision 2022" conference, Intel announced its latest IPU roadmap, showing the overall plan for IPUs from 2022 to 2026. Intel will continue its ASIC + FPGA IPU design; its IPU roadmap is as follows:
In addition, Intel introduced the IPU's open-source development kit, IPDK, which can be used to write applications for x86 chips and Arm chips such as the Marvell Octeon. The toolkit includes functional blocks for customizing and defining workloads, including packet-processing offload.
In 2019, Fungible defined the DPU as a new type of data processing unit. Fungible's flagship F1 DPU is the industry's first 800 Gbps DPU.
Architecturally, the F1 DPU integrates a large number of processor cores: all 52 cores are latest-generation MIPS64 R6 cores, which not only support hardware virtualization but can also be separated into independent control units. The F1 DPU adopts a dual-issue pipeline design with a 64KB L1 instruction cache and an 80KB L1 data cache; the L1 caches support cache-to-cache data transfer, and the total on-chip L2 cache reaches 32MB. For memory, in addition to 8GB of integrated HBM, the F1 DPU supports dual-channel DDR4 with up to 512GB per channel.
Through a unique combination of hardware and software design, the F1 DPU provides maximum functional flexibility without compromising data center energy efficiency. This allows F1 DPUs to be used in high-performance-density, low-latency environments such as storage (NVMe/TCP offload), security, AI/ML (GPU disaggregation), and data analytics servers (OLAP/OLTP big data engines). Taking storage as an example, in an all-flash array (AFA) storage system that requires no x86 CPU, the F1 DPU can achieve 15M IOPS; the bandwidth limitation comes entirely from PCIe itself.
Kalray is a fabless semiconductor company founded in 2008 as a spin-off of the French CEA laboratory.
Kalray's third-generation MPPA® DPU processor, Coolidge, can manage multiple workloads in parallel for smarter, more efficient, and more power-efficient data-intensive applications. Leveraging Kalray's patented MPPA® (Massively Parallel Processor Array) architecture, Coolidge is a scalable 80-core processor designed for intelligent data processing. It provides a unique alternative to GPUs, ASICs, and FPGAs, bringing value to applications from the data center to edge and embedded systems.
Based on its MPPA® processors, Kalray has developed a series of data-centric accelerator cards, the K200/K200-LP, that provide high performance and a high degree of programmability and can be used as:
Founded in 1995, Marvell Technology, Inc. is an American company headquartered in Santa Clara, California, that develops and produces semiconductors and related technology.
Marvell's OCTEON and ARMADA devices are designed for wireless infrastructure and networking equipment, including switches, routers, security gateways, firewalls, network monitoring appliances, and SmartNICs, and are supported by a comprehensive, unified SDK and open-source APIs for a wide range of networking, security, and compute applications.
The Marvell OCTEON 10 DPU family is optimized for hyperscale cloud workloads, 5G wireless transport, the 5G RAN Intelligent Controller (RIC), edge inference, and carrier and enterprise data center applications. It adopts TSMC's 5nm process and Arm's Neoverse N2 CPU cores, alongside many of the functional building blocks of the first-generation OCTEON TX2. It also includes advanced IP and capabilities such as an integrated machine learning inference engine, an inline encryption processor, and a vector packet processor, all of which can run virtualized. As an important complement to the DPU, Marvell is also introducing an in-house machine learning (ML) engine for OCTEON 10.
Napatech is a provider of programmable FPGA-based SmartNIC solutions serving global telecom, cloud, enterprise, cybersecurity and financial applications. Napatech SmartNICs include 1 GbE, 10 GbE, 25 GbE, 40 GbE and 100 GbE options with software support to offload computationally intensive networking and security processing from the server CPU.
The Napatech NT200A02 SmartNIC is based on Xilinx's powerful UltraScale+ VU5P FPGA architecture and supports 2×1/10G, 8×10G, 2×10/25G, 4×10/25G, 2×40G, and 2×100G configurations. The NT200A02 supports GTP, IP-in-IP, NVGRE, VXLAN, and other tunneling protocols.
The NT200A02 SmartNIC can also deduplicate, defragment, and filter packets to reduce the data volume delivered to server applications. Stateful flow processing with support for 140 million flows lets CPU-intensive applications intelligently and accurately choose which flows to process and which to ignore, while the card maintains all flow records and reports them to the application.
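The flow-selection model described above can be sketched in software. This is an illustration of the concept, not Napatech's firmware or API: packets are keyed by their 5-tuple into a flow record, and the application marks whole flows as "process" or "ignore" so unwanted traffic can be dropped before it reaches the CPU.

```python
# Minimal sketch of stateful flow processing: a table of per-flow
# records keyed by the 5-tuple, with a per-flow verdict the
# application can set once and have applied to every later packet.
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "src dst sport dport proto")

class FlowTable:
    def __init__(self):
        self.flows = {}  # 5-tuple -> per-flow record

    def classify(self, key, nbytes):
        """Update the flow's counters and return its current verdict."""
        rec = self.flows.setdefault(
            key, {"packets": 0, "bytes": 0, "verdict": "process"})
        rec["packets"] += 1
        rec["bytes"] += nbytes
        return rec["verdict"]

    def ignore(self, key):
        """Application decision: offload the drop for this whole flow."""
        rec = self.flows.setdefault(
            key, {"packets": 0, "bytes": 0, "verdict": "process"})
        rec["verdict"] = "ignore"

ft = FlowTable()
k = FiveTuple("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
print(ft.classify(k, 1500))  # process
ft.ignore(k)
print(ft.classify(k, 1500))  # ignore
```

On the real hardware the lookup and counter updates happen in the NIC, and only flows with a "process" verdict are delivered to the host; the flow records themselves are what the card reports back to the application.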
Netronome is a company specializing in network flow processor acceleration semiconductors. Its Agilio series SmartNICs provide the performance and programmability that cloud operators and service providers require without consuming a large number of CPU cores.
Nvidia is an American multinational technology company incorporated in Delaware and based in Santa Clara, California. In 1999, Nvidia defined the GPU, greatly boosting the PC gaming market and redefining modern computer graphics. In April 2020, Nvidia completed its acquisition of Mellanox, giving it a product portfolio covering CPU, GPU, and DPU.
The NVIDIA BlueField DPU brings innovation to the modern data center. By offloading, accelerating, and isolating a wide range of advanced networking, storage, and security services, BlueField DPUs provide securely accelerated infrastructure for workloads in cloud, data center, and edge environments. BlueField DPUs combine powerful compute, full on-chip infrastructure programmability, and high-performance networking to support demanding workloads.
Silicom is an industry-leading provider of high-performance networking and data infrastructure solutions.
Founded in 2017, Asterfusion Data Technologies delivers fully open, highly disaggregated networks built on 1G–400G bare-metal switches running a SONiC-based commercial operating system for next-generation data centers, campuses, and service providers. Its cloud network switches, white-box hardware, DPU appliances, SmartNICs, and NPB devices are widely used by carriers, Internet companies, public/private clouds, and other industries.
The Helium SmartNIC is self-developed by Asterfusion and built around the Marvell Octeon CN96XX, a high-performance SoC integrating 24 Arm cores with multiple hardware-acceleration co-processors. It comes in two models over a PCIe x16 Gen4.0 interface: one provides 4×25G ports, the other 2×100G ports. Helium enables customers to build high-performance, intelligent, programmable networks while preserving valuable server compute resources and reducing the total CAPEX of cloud data centers.
The Asterfusion Helium SmartNIC ships with a basic network operating system, FusionNOS-Framework. Network developers can use FusionNOS-Framework as the foundation for their own upper-layer applications, accelerating application porting and development.
SmartNICs have various uses, ranging from network acceleration to security acceleration and storage acceleration:

- Network acceleration: VTEP, OVS offload, TCP offload, GRE/GTP tunnel encapsulation and decapsulation, reliable UDP, etc.
- Security acceleration: IPsec, SSL, XDP/eBPF, vFW/vLB/vNAT, DPI, DDoS defense, etc.
- Storage acceleration: NVMe-oF (TCP), data compression/decompression, etc.
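To make one of the network offloads above concrete, here is a minimal sketch of VXLAN-style tunnel encapsulation, assuming the RFC 7348 header layout. This illustrates what the SmartNIC does in hardware per packet; it is not Asterfusion's or Marvell's API.

```python
# VXLAN encap/decap sketch: the DPU prepends an 8-byte VXLAN header
# carrying a 24-bit VNI (virtual network identifier) to the original
# Ethernet frame, and strips it again on the receive path.
import struct

VXLAN_FLAGS = 0x08000000  # "I" flag set: the VNI field is valid

def vxlan_encap(vni, inner_frame):
    """Prepend the 8-byte VXLAN header to an inner Ethernet frame."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit value")
    header = struct.pack("!II", VXLAN_FLAGS, vni << 8)
    return header + inner_frame

def vxlan_decap(packet):
    """Strip the VXLAN header; return (vni, inner_frame)."""
    _flags, vni_field = struct.unpack("!II", packet[:8])
    return vni_field >> 8, packet[8:]

encapped = vxlan_encap(42, b"\x00" * 14)  # dummy inner Ethernet header
vni, inner = vxlan_decap(encapped)
print(vni, len(inner))  # 42 14
```

In a real deployment the outer UDP/IP/Ethernet headers are added as well; the point of the offload is that this per-packet header work, trivial here but expensive at 100G line rate, never touches the host CPU.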
Based on the Marvell Octeon CN96XX chip, the Asterfusion ET3000A is a 48-core high-performance DPU appliance.
It can be deployed in a variety of network scenarios, such as edge computing; NFV offload; network security devices such as security gateways and network monitoring; Layer 3/edge routers and switches; software-defined networking (SDN); network function virtualization (NFV); artificial intelligence (AI); SSL/IPsec offload processing; LTE/IoT/fog/edge gateways; SD-WAN; 5G UPF applications; and intrusion prevention systems (IPS).
| Vendor | Product Line | Core Processor |
| --- | --- | --- |
| Achronix | Speedster7t FPGA series | FPGA |
| AMD (Xilinx, Pensando) | Alveo series, Elba, Capri | FPGA, SoC |
| Asterfusion | Helium | SoC: ARM + ASIC + dedicated accelerators |
| Fungible | F1 DPU | NP SoC |
| Intel | IPU | FPGA + x86 SoC |
| Kalray | K200/K200-LP | MPPA DPU processor |
| Marvell | OCTEON 10 DPU | SoC: ARM + ASIC |
| Nvidia | BlueField DPU | SoC: ARM + ASIC + dedicated accelerators; GPU |
| Silicom | N5010, N5110A, P425G2SNxlAONIC | FPGA |
Asterfusion Networks is the leading provider of open networking infrastructure solutions. We provide an open, disaggregated, and highly programmable network fabric for next generation data centers and campus with white-box switching.