Skip to main content
Version: Next

Welcome to KServe

Deploy and scale AI models effortlessly — from cutting-edge generative AI and large language models to traditional ML models — with enterprise-grade reliability across any cloud or on-premises environment.

CNCF Incubating Project

KServe is a CNCF incubating project and part of the Kubeflow ecosystem.


Why KServe?

KServe eliminates the complexity of productionizing AI models. Whether you're a data scientist, DevOps engineer, or platform architect, KServe provides a unified solution that works across clouds and scales with your needs.

🚀 Minutes to Production

Deploy GenAI services and ML models with simple YAML — no complex infrastructure setup required.

☁️ Cloud-Agnostic

Run anywhere: AWS, Azure, GCP, on-premises, or hybrid environments with consistent behavior.

📈 Enterprise-Scale Ready

Scale to zero when idle, handle traffic spikes automatically, and manage hundreds of models efficiently.


Key Benefits

FeatureDescription
LLM Multi-frameworkDeploy LLMs from Hugging Face, vLLM, and custom generative models
OpenAI-Compatible APIsChat completion, streaming, and embedding endpoints out of the box
LocalModelCacheReduce LLM startup time from 15–20 minutes to ~1 minute
KV Cache OffloadingOptimized memory management for long conversations and large contexts
Multi-node InferenceDistributed LLM serving across multiple nodes
Envoy AI GatewayEnterprise-grade API management and routing for AI workloads
Metric-based AutoscalingScale on token throughput, queue depth, and GPU utilization
Canary DeploymentsA/B testing and canary rollouts for LLM experiments
→ Full Generative AI docs

Architecture Overview

KServe consists of two main planes:

🎛️ Control Plane

📡 Data Plane

KServe extends Kubernetes with custom resources for AI/ML workloads — handling load balancing, autoscaling, canary deployments, and monitoring automatically. Pluggable runtimes let you use the best engine per model type: vLLM for LLMs, TorchServe for PyTorch, or custom containers.


Supported Frameworks


Get Started

Learning path: Tutorial → Core conceptsProduction setupAPI reference


Community & Support

ChannelLink
GitHubgithub.com/kserve/kserve — issues, PRs, releases
SlackCNCF Slack #kserve — questions and discussion
Community MeetingsMonthly calendar — open to all
AdoptersSee who's using KServe