Boost GenAI performance, Improve xPU Efficiency, & Host More Instances

Limited Memory Capacity for GPU-accelerated Chatbot
Without CXL

Challenge:

  • AI assistants rely heavily on GPUs for rapid token generation
  • GPU utilization decreases when data is needed from disk
  • System memory resources become more constrained as user base grows
  • Constrained memory capacity limits the context window an AI service can offer

Expanded Memory Capacity for GPU-accelerated Chatbot
With CXL

Solution:

  • Up to 40% higher CPU utilization, 40% faster insights, and 200% more hosted instances
  • Enlarge the context window with up to 4TB of additional memory per socket
  • Boost token generation with Leo CXL Smart Memory Controllers and the LLM engine
  • Reduce latency and CPU overhead by eliminating storage I/O
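The 4TB-per-socket figure can be put in rough perspective with a back-of-the-envelope KV-cache sizing estimate. The model shape below (an 80-layer model with 8 grouped-query KV heads, 128-dimensional heads, FP16 values) is an illustrative assumption, not a configuration taken from this page:

```python
# Hypothetical model shape (illustrative assumptions, not vendor figures)
LAYERS = 80          # transformer layers
KV_HEADS = 8         # grouped-query KV heads
HEAD_DIM = 128       # dimension per head
DTYPE_BYTES = 2      # FP16

# Each token stores one key and one value vector per KV head, per layer
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * DTYPE_BYTES
print(kv_bytes_per_token)            # 327680 bytes (~320 KiB per token)

# Tokens of KV cache that fit in 4 TiB of CXL-attached memory per socket
cxl_capacity = 4 * 2**40
max_context_tokens = cxl_capacity // kv_bytes_per_token
print(max_context_tokens)            # 13421772 (~13.4M tokens)
```

Under these assumptions, expanding a socket by 4TB of CXL memory leaves room for a KV cache in the multi-million-token range, which is the sense in which added capacity directly enlarges the usable context window.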
