RDMA RoCE
INCAST
Spectrum-X switches
BlueField DPUs
Flow-level load balancing vs packet-level load balancing (spraying)
Packet spraying can lead to out-of-order delivery
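A minimal sketch of why spraying can reorder packets. The number of paths, the per-path delays, and the flow identifier are made-up values for illustration, not measurements from any real fabric.

```python
import zlib

PATHS = 4  # hypothetical number of equal-cost uplinks


def flow_level_path(flow_id: str) -> int:
    """Hash the flow identifier so every packet of a flow takes the same path."""
    return zlib.crc32(flow_id.encode()) % PATHS


def packet_level_path(seq: int) -> int:
    """Spray packets round-robin across all paths, regardless of flow."""
    return seq % PATHS


def deliver(packets, pick_path, path_delay):
    """Return sequence numbers in arrival order, given a delay per path."""
    arrivals = []
    for seq in packets:
        path = pick_path(seq)
        # The packet is sent at time `seq` and arrives after the path's delay.
        arrivals.append((seq + path_delay[path], seq))
    return [seq for _, seq in sorted(arrivals)]


path_delay = [0, 5, 1, 3]  # illustrative, unequal queueing delay per path
packets = list(range(8))
flow_id = "10.0.0.1->10.0.0.2:4791"  # hypothetical flow identifier

flow_order = deliver(packets, lambda seq: flow_level_path(flow_id), path_delay)
spray_order = deliver(packets, packet_level_path, path_delay)

print("flow-level  :", flow_order)
print("packet-level:", spray_order)
```

With flow-level hashing, every packet sees the same delay, so the order is preserved. With spraying, packets on a slower path arrive after later packets sent on a faster path, so the receiver has to cope with reordering.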
NVIDIA's RoCE (RDMA over Converged Ethernet)
RDMA
HPC - high-performance computing
Need: high-speed, low-latency connections
Everyday internet traffic uses TCP/IP, which is not a good fit for this
Application > OS > TCP/IP stack > NIC (CPU intensive) - adds latency
RDMA Approach
- rNIC offloading (bypasses the OS and TCP/IP stack)
1. InfiniBand (dedicated network switches and NICs)
2. iWARP - Internet Wide Area RDMA Protocol (needs iWARP-capable NICs)
3a. RoCE (v1)
3b. RoCEv2 (UDP + IP packets)
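To make the difference between RoCE v1 and RoCEv2 concrete, here is a small sketch of the encapsulation layers and a UDP header as RoCEv2 would carry it. RoCEv2 uses UDP destination port 4791. The source port and payload length below are arbitrary illustration values, and the checksum is left at zero for simplicity.

```python
import struct

ROCEV2_UDP_PORT = 4791  # UDP destination port assigned to RoCEv2

# Outer-to-inner layers, for comparison only.
ENCAPSULATION = {
    "RoCE v1": ["Ethernet (EtherType 0x8915)", "InfiniBand GRH", "InfiniBand BTH + payload"],
    "RoCEv2": ["Ethernet", "IP", "UDP (dst port 4791)", "InfiniBand BTH + payload"],
}


def udp_header(src_port: int, payload_len: int) -> bytes:
    """Pack an 8-byte UDP header (checksum left at 0 for illustration)."""
    length = 8 + payload_len  # the UDP length field includes the header itself
    return struct.pack("!HHHH", src_port, ROCEV2_UDP_PORT, length, 0)


def is_rocev2(udp: bytes) -> bool:
    """Classify a UDP header as RoCEv2 by its destination port."""
    _, dst_port, _, _ = struct.unpack("!HHHH", udp[:8])
    return dst_port == ROCEV2_UDP_PORT


hdr = udp_header(src_port=49152, payload_len=100)  # hypothetical values
print(is_rocev2(hdr), ENCAPSULATION["RoCEv2"])
```

Because RoCEv2 rides on IP and UDP, it can be routed across Layer 3 networks, while RoCE v1 stays within a single Ethernet Layer 2 domain.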
For RDMA, a lossless network is a must
* Traditional Ethernet is lossy when there is congestion
1. MTU
2. QoS - prioritize RoCE packets using DSCP
3. PFC - Priority Flow Control
The receiving switch sends a pause frame to the sending switch, which makes the sending switch stop sending for some time
This helps keep the network lossless but introduces head-of-line blocking
4. DCQCN - Data Center Quantized Congestion Notification
ECN - Explicit Congestion Notification
The switch sets the ECN mark on packets heading toward the receiver, based on its buffer utilization. The receiver then generates a special Congestion Notification Packet (CNP) and sends it directly back to the sender,
and the sender slows down its sending rate
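The ECN and CNP steps above can be sketched roughly as follows. This is a simplified model of DCQCN, not a full implementation: the thresholds (`k_min`, `k_max`, `p_max`), the gain `g`, and the line rate are illustrative assumptions, and the recovery step only models a single "move halfway back to the target rate" stage.

```python
def ecn_mark_probability(queue_kb, k_min=100, k_max=400, p_max=0.2):
    """Switch side: RED-style ECN marking probability from queue depth."""
    if queue_kb <= k_min:
        return 0.0  # queue is short, no marking
    if queue_kb >= k_max:
        return 1.0  # queue is long, mark every packet
    # Between the thresholds, probability grows linearly up to p_max.
    return p_max * (queue_kb - k_min) / (k_max - k_min)


class Sender:
    """Sender side: cut the rate on a CNP, recover when no CNP arrives."""

    def __init__(self, line_rate_gbps, g=1 / 256):
        self.rate = line_rate_gbps    # current sending rate
        self.target = line_rate_gbps  # rate to recover toward
        self.alpha = 1.0              # running estimate of congestion
        self.g = g

    def on_cnp(self):
        self.target = self.rate              # remember the rate before the cut
        self.rate *= 1 - self.alpha / 2      # multiplicative decrease
        self.alpha = (1 - self.g) * self.alpha + self.g

    def on_quiet_period(self):
        self.alpha = (1 - self.g) * self.alpha        # congestion estimate decays
        self.rate = (self.rate + self.target) / 2     # simplified recovery step


sender = Sender(line_rate_gbps=100)
sender.on_cnp()
print("after CNP:", sender.rate)
sender.on_quiet_period()
print("after quiet period:", sender.rate)
```

The key idea is that the sender reacts to CNPs before switch buffers overflow, so congestion is handled per flow instead of by pausing an entire traffic class.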
Best practice is to use both DCQCN and PFC
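One common way to combine the two is to set the ECN marking threshold well below the PFC pause threshold, so DCQCN slows senders first and PFC only acts as a last resort. The sketch below assumes hypothetical buffer thresholds in KB; real values depend on the switch and the deployment.

```python
ECN_KMIN_KB = 100  # hypothetical: start ECN marking above this queue depth
PFC_XOFF_KB = 800  # hypothetical: send a PFC pause at or above this depth
PFC_XON_KB = 600   # hypothetical: resume once the queue drains to this depth


def switch_action(queue_kb, paused):
    """Return (ecn_marking_active, paused) for the current queue depth."""
    if queue_kb >= PFC_XOFF_KB:
        paused = True   # buffer nearly full: pause the upstream sender
    elif queue_kb <= PFC_XON_KB:
        paused = False  # queue drained enough: allow sending again
    # Between XON and XOFF the previous pause state is kept (hysteresis).
    return queue_kb > ECN_KMIN_KB, paused


paused = False
for depth in (50, 300, 850, 700, 500):
    marking, paused = switch_action(depth, paused)
    print(f"queue={depth}KB ecn_marking={marking} pfc_paused={paused}")
```

In this sketch, ECN marking is already active at 300 KB while PFC stays idle, and the pause only triggers at 850 KB, which is the ordering the combined setup aims for.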