raww
INCAST
Spectrum-X switches
BlueField DPUs
Flow-level load balancing vs packet-level load balancing (packet spraying)
Packet spraying leads to out-of-order delivery
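A minimal sketch of why spraying reorders packets, assuming a toy model where packet i is sent at time i and each path has a fixed delay. The path count, delays, and the flow tuple are made-up values for illustration, not taken from any real switch.

```python
import zlib

NUM_PATHS = 4  # hypothetical number of equal-cost paths

def flow_level_path(flow):
    """Hash the 5-tuple so every packet of a flow takes the same path."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % NUM_PATHS

def packet_level_path(seq):
    """Spray packets round-robin across all paths, ignoring the flow."""
    return seq % NUM_PATHS

def arrival_order(paths, path_delay):
    """Return sequence numbers sorted by arrival time.

    Packet i is sent at time i and arrives at i + delay of its path.
    """
    arrivals = [(seq + path_delay[p], seq) for seq, p in enumerate(paths)]
    return [seq for _, seq in sorted(arrivals)]

flow = ("10.0.0.1", "10.0.0.2", 49152, 4791, "UDP")  # assumed example flow
path_delay = [1, 5, 2, 8]  # assumed per-path latency in time units

flow_paths = [flow_level_path(flow) for _ in range(8)]
spray_paths = [packet_level_path(seq) for seq in range(8)]

flow_order = arrival_order(flow_paths, path_delay)
spray_order = arrival_order(spray_paths, path_delay)

print("flow-level  :", flow_order)   # in order, but only one path is used
print("packet-level:", spray_order)  # all paths used, but reordered
```

Flow-level hashing keeps order at the cost of uneven link use; spraying uses every path but needs the receiver (or NIC) to put packets back in order.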
NVIDIA's RoCE (RDMA over Converged Ethernet)
RDMA - remote direct memory access
HPC - high-performance computing
Needs high-speed, low-latency connections
Everyday internet traffic uses TCP/IP, which is not good for this
Application > OS > TCP/IP stack > NIC (CPU-intensive) - adds latency
RDMA Approach
-rNIC offloading (bypasses the OS and the TCP/IP stack)
1.InfiniBand (dedicated network switches & NICs)
2.iWARP - Internet Wide Area RDMA Protocol (needs iWARP-capable NICs)
3a.RoCE
3b.RoCEv2 (UDP+IP packets)
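A rough sketch comparing how the options above are layered, and what the UDP header of a RoCEv2 packet looks like. The layer lists are simplified, and the helper names (STACKS, is_ip_routable, udp_header) are made up for this note. The UDP destination port 4791 and the RoCE v1 Ethertype 0x8915 are the standard assigned values.

```python
import struct

ROCEV2_UDP_DPORT = 4791    # UDP destination port assigned to RoCEv2
ROCEV1_ETHERTYPE = 0x8915  # Ethertype used by RoCE (v1) frames

# Simplified layering, outermost header first (illustrative only).
STACKS = {
    "InfiniBand": ["IB link (LRH)", "IB transport (BTH)", "payload"],
    "iWARP": ["Ethernet", "IP", "TCP", "iWARP (MPA/DDP/RDMAP)", "payload"],
    "RoCE": ["Ethernet", "IB network (GRH)", "IB transport (BTH)", "payload"],
    "RoCEv2": ["Ethernet", "IP", "UDP", "IB transport (BTH)", "payload"],
}

def is_ip_routable(name):
    """A stack can cross IP routers only if it carries an IP header."""
    return "IP" in STACKS[name]

def udp_header(src_port, payload_len, dst_port=ROCEV2_UDP_DPORT):
    """Pack an 8-byte UDP header (checksum left at 0 for this sketch)."""
    length = 8 + payload_len  # UDP length covers header + payload
    return struct.pack("!HHHH", src_port, dst_port, length, 0)

for name in STACKS:
    print(f"{name:10s} routable over IP: {is_ip_routable(name)}")

hdr = udp_header(src_port=49152, payload_len=100)
print(hdr.hex())
```

The practical difference between 3a and 3b is visible here: RoCE stays inside one Ethernet L2 domain, while RoCEv2 adds IP + UDP so it can be routed.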
For RDMA, a lossless network is a must
*Traditional Ethernet is lossy when there is congestion
1.MTU
2.QoS - prioritise RoCE packets using DSCP
3.PFC - priority flow control
Network switch (the receiving switch sends a pause frame to the sending switch, which makes the sending switch stop sending for some time)
This helps achieve lossless delivery but introduces head-of-line blocking
4.DCQCN - data center quantized congestion notification
ECN - Explicit congestion notification
The receiving switch sets ECN on packets going towards the receiver, based on its buffer utilization. The receiver then generates a special congestion notification packet (CNP) and sends it directly back to the sender,
and the sender slows down its sending rate
Best practice is to use both DCQCN + PFC
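A simplified sketch of how ECN marking, the DCQCN rate cut, and PFC fit together. All thresholds and the gain G are assumed values for illustration. Only the rate-decrease half of DCQCN is modelled (the rate-recovery stages are left out), and the class and function names are made up for this note.

```python
K_MIN, K_MAX, P_MAX = 20, 80, 0.2  # assumed ECN marking thresholds (queue units)
PFC_XOFF = 95                      # assumed PFC pause threshold, set above K_MAX
G = 1 / 16                         # assumed gain for the alpha estimate

def ecn_mark_probability(queue_depth):
    """RED-style marking: 0 below K_MIN, ramp up to P_MAX at K_MAX, then 1."""
    if queue_depth <= K_MIN:
        return 0.0
    if queue_depth >= K_MAX:
        return 1.0
    return P_MAX * (queue_depth - K_MIN) / (K_MAX - K_MIN)

def pfc_pause_needed(queue_depth):
    """PFC is the last resort: pause only when the buffer is nearly full."""
    return queue_depth >= PFC_XOFF

class DcqcnSender:
    """Sender-side reaction to CNPs (rate decrease only)."""

    def __init__(self, line_rate_gbps):
        self.rate = line_rate_gbps
        self.alpha = 1.0  # congestion estimate, starts at 1

    def on_cnp(self):
        """CNP received: cut the rate in proportion to alpha, then raise alpha."""
        self.rate *= 1 - self.alpha / 2
        self.alpha = (1 - G) * self.alpha + G

    def on_quiet_period(self):
        """No CNP during a timer period: decay the congestion estimate."""
        self.alpha = (1 - G) * self.alpha

sender = DcqcnSender(line_rate_gbps=100)
for depth in (10, 50, 90, 96):
    print(depth, ecn_mark_probability(depth), pfc_pause_needed(depth))
sender.on_cnp()
print("rate after first CNP:", sender.rate)
```

Because the ECN thresholds sit below the PFC threshold, DCQCN slows senders down first, and PFC pauses (with their head-of-line blocking) only trigger if that is not enough.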
AI models
1.OpenAI (ChatGPT)
2.Anthropic (Claude)
3.Google (Gemini)
4.Meta (Llama, open-source)
5.DeepSeek
We can use a browser to access these > to test prompts
We can use APIs to access these models > to build apps
Gen AI Framework (chain, flow, pipeline of components)
1.LangChain
2.LlamaIndex
3.Haystack
Building a chain = connecting steps into a pipeline
Type of chains
Simple chain → prompt + LLM
RAG chain → retriever + prompt + LLM
Agent chain → tools + reasoning + LLM
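A toy sketch of the three chain types in plain Python. It does not use the LangChain, LlamaIndex, or Haystack APIs. The LLM is a stub that echoes its prompt, and every name here (chain, fake_llm, retriever, agent_step, DOCS, TOOLS) is made up for illustration.

```python
from typing import Callable

Step = Callable[[dict], dict]

def chain(*steps: Step) -> Step:
    """Connect steps into a pipeline: each step's output feeds the next."""
    def run(state: dict) -> dict:
        for step in steps:
            state = step(state)
        return state
    return run

def fake_llm(state):
    """Stand-in for a real model call: echoes the prompt it was given."""
    return {**state, "answer": f"LLM({state['prompt']})"}

def prompt_step(state):
    """Build the prompt from optional context plus the question."""
    context = state.get("context", "")
    return {**state, "prompt": f"{context} Q: {state['question']}".strip()}

DOCS = {"pfc": "PFC pauses traffic per priority.", "ecn": "ECN marks packets."}

def retriever(state):
    """Toy retrieval: pick documents whose key appears in the question."""
    q = state["question"].lower()
    hits = [text for key, text in DOCS.items() if key in q]
    return {**state, "context": " ".join(hits)}

TOOLS = {"add": lambda a, b: a + b}

def agent_step(state):
    """Toy 'reasoning': if the question names a tool, call it first."""
    q = state["question"]
    if q.startswith("add "):
        _, a, b = q.split()
        result = TOOLS["add"](int(a), int(b))
        return {**state, "context": f"Tool result: {result}."}
    return state

simple_chain = chain(prompt_step, fake_llm)            # prompt + LLM
rag_chain = chain(retriever, prompt_step, fake_llm)    # retriever + prompt + LLM
agent_chain = chain(agent_step, prompt_step, fake_llm) # tools + reasoning + LLM

print(simple_chain({"question": "What is RDMA?"})["answer"])
print(rag_chain({"question": "What is PFC?"})["answer"])
print(agent_chain({"question": "add 2 3"})["answer"])
```

The only difference between the three chains is which steps are placed in front of the same prompt + LLM pair.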
Agentic AI Framework
1.CrewAI
2.AutoGen
3.MetaGPT
RAG
RAG sits between your app/agent and the LLM
User
↓
Agent / App Logic
↓
RAG (data retrieval layer)
↓
LLM (reasoning)
↓
Response
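The flow above can be sketched roughly as follows, assuming a tiny in-memory document list and word-overlap (cosine) scoring in place of a real vector store, plus a stub in place of the LLM. DOCUMENTS, retrieve, stub_llm, and answer are hypothetical names for this sketch only.

```python
import math
import re
from collections import Counter

DOCUMENTS = [  # assumed sample knowledge base
    "PFC sends pause frames so the sender stops for a short time.",
    "DCQCN uses ECN marks and CNP packets to slow the sender down.",
    "RoCEv2 carries RDMA traffic inside UDP and IP packets.",
]

def bag_of_words(text):
    """Lowercase word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, k=1):
    """RAG layer: return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def stub_llm(prompt):
    """Placeholder for the model call; a real app would call an LLM API here."""
    return f"[model answer based on a {len(prompt)}-character prompt]"

def answer(question):
    """App logic: user question -> retrieval -> prompt -> LLM -> response."""
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return stub_llm(prompt)

print(retrieve("How does DCQCN use ECN?"))
print(answer("How does DCQCN use ECN?"))
```

The LLM never searches the data itself; the app retrieves the relevant text first and passes it in as part of the prompt.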