Reshaping Workflows with Dell Pro Precision and NVIDIA RTX PRO GPUs

vLLM, Inference, and the Next Era of Intelligent Workflows

Episode Summary

Get an inside look at vLLM, the open-source engine that makes LLM inference faster, more scalable, and more efficient for local and cloud AI deployments.

Episode Notes

Ever wondered what makes real-world AI applications like chatbots, code assistants, and cloud agents lightning-fast and scalable?

In this episode of Reshaping Workflows with Dell Pro Precision and NVIDIA RTX, host Logan Lawler and vLLM project lead Kaichao You pull back the curtain on the open-source engine powering LLM inference everywhere.

Discover what vLLM actually does, how it serves massive models both in the cloud and locally, the role of the KV cache, and why it's being adopted by tech giants from Amazon to LinkedIn. Learn the difference between throughput-focused and latency-focused deployments, how hardware and model innovations affect AI scaling, and why staying current with vLLM gives you access to the best of modern AI infrastructure. Whether you're running a single agent locally or orchestrating thousands of requests in a data center, this conversation offers practical insights and next steps for your own workflows.

Ready to rethink what's possible with AI? Tune in now to see how vLLM, Dell, and NVIDIA are shaping the future of scalable, efficient AI deployments!


Follow Us

LinkedIn @DellTechnologies
Twitter @DellTech
YouTube @DellTechnologies
LinkedIn @NVIDIA
Twitter @NVIDIA
Instagram @NVIDIA
Facebook @NVIDIA

Presented by Dell and NVIDIA
https://www.dell.com/precisionai

https://www.dell.com/dellpromax

https://www.dell.com/nvidia-ai