Hands-On LLM Serving and Optimization

Hosting LLMs at Scale

By Peiheng Hu, Chi Wang

E-book
English, 2026

811 kr

Read directly in Bokus Reader, or download to your device

Description

Large language models (LLMs) are the reasoning engines of modern AI. Today, a major inflection point has arrived: as the world races to deploy AI at scale, model inference has moved to the center of the stack. Welcome to the inference era. Without proper optimization, however, LLMs can be expensive and slow to serve. Hands-On LLM Serving and Optimization is a comprehensive guide to the complexities of deploying and optimizing LLMs at scale.

In this hands-on, engineering-focused book, authors Chi Wang and Peiheng Hu combine practical examples, code, and strategies for building robust, performant, and cost-efficient AI token factories. Whether you're building the LLM inference infrastructure or the applications that consume it, a deep understanding of LLM serving will make you a more effective, future-ready engineer as AI transforms how we work and build.

- Learn the foundations of model serving with core concepts, design paradigms, and industry best practices
- Understand the common challenges of hosting LLMs at scale
- Balance latency and throughput to meet the demands of AI applications and business requirements
- Host LLMs cost-effectively with practical, code-backed techniques
