Vishesh Jindal on GPU Orchestration with CloudStack 4.21

Apache CloudStack

2,631 followers

2mo

Watch back Vishesh Jindal's #CloudStackCollab session "Orchestrating GPU workloads with CloudStack"! This session dives into the technical design and implementation of native GPU orchestration in Apache CloudStack 4.21 on KVM, including device discovery, capability classification, and inventory synchronization via the KVM agent. By watching this session back, you will learn how GPU‑backed service offerings are defined and consumed by Instances. Vishesh covers operator prerequisites & host setup (IOMMU, vendor vGPU profiles), and lifecycle operations from provisioning to teardown. A live demo walks through GPU discovery on host, offering creation, and end‑to‑end deployment. https://lnkd.in/gwJhzdXn

Orchestrating GPU workloads with CloudStack | Vishesh Jindal

https://www.youtube.com/


More Relevant Posts

dstack

2,884 followers
2mo
dstack 0.20.9 is out 🚀 More improvements around GPU workload provisioning and visibility: easier to see what’s happening during provisioning/scheduling, plus better Kubernetes integration. Release notes 👇 https://lnkd.in/dRYQ2UMR
GOStack

1,479 followers
2mo
CruiseKube is a new Kubernetes controller that keeps an eye on your workloads and automatically right-sizes CPU and memory requests. It goes beyond static limits by looking at real CPU pressure (PSI metrics) and what else is running on the node so resource adjustments are actually context-aware. https://lnkd.in/esqkGzVS

GitHub - truefoundry/CruiseKube: CruiseKube is an intelligent Kubernetes resource optimization controller that automatically monitors, analyzes, and applies resource recommendations to improve cluster efficiency and reduce costs. github.com
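To make the PSI reference concrete, here is a minimal sketch of reading Linux pressure stall information in the text format the kernel exposes under `/proc/pressure/cpu`. It is an illustration only, not CruiseKube's implementation; the function name `parse_psi` and the sample values are assumptions.

```python
def parse_psi(text):
    """Parse PSI text (the format of /proc/pressure/cpu) into a dict.

    Each line looks like:
        some avg10=1.25 avg60=0.50 avg300=0.10 total=123456
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = {
            "avg10": float(values["avg10"]),    # % of time stalled, last 10 s
            "avg60": float(values["avg60"]),    # last 60 s
            "avg300": float(values["avg300"]),  # last 300 s
            "total_us": int(values["total"]),   # cumulative stall time, microseconds
        }
    return result


# Hypothetical sample; on a Linux host with PSI enabled you could read
# the real file with open("/proc/pressure/cpu").read().
sample = (
    "some avg10=1.25 avg60=0.50 avg300=0.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)
psi = parse_psi(sample)
```

A controller could compare `psi["some"]["avg60"]` against a threshold before raising or lowering a CPU request; the threshold itself would be a tuning decision.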
Vikash K.
2mo
🧠 “Cluster has capacity.” Then why are my pods still Pending?

If you’ve run EKS long enough, you’ve seen this:
35% CPU free
40% memory free
Cluster looks healthy
Critical workload stuck in Pending for 30+ minutes.

The problem isn’t capacity. It’s fragmentation. Kubernetes isn’t a perfect bin-packer. Between:
• requests vs allocatable
• anti-affinity rules
• topology spread constraints
• hugepages / GPUs
• node group mix
• uneven AZ distribution
you create invisible holes in the cluster. Plenty of total capacity. Not enough schedulable shape.

So what happens? Teams:
• overprovision nodes
• blame Karpenter
• increase instance size
• add another node group

Expensive band-aids.

Real fix: Design node pools intentionally. Align resource requests with instance shapes. Use descheduler where appropriate. Audit anti-affinity rules regularly.

“Cluster has capacity” is a dashboard illusion. Schedulability is the real metric.
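The gap between total capacity and schedulable shape can be shown with a small sketch. It is a simplification that only checks CPU and memory per node and ignores affinity, taints, and topology spread; the node names, sizes, and the helper `schedulable_nodes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_m: int   # free CPU in millicores (allocatable minus requests)
    free_mem_mi: int  # free memory in MiB


def schedulable_nodes(nodes, cpu_m, mem_mi):
    """Return names of nodes where a single pod with these requests fits."""
    return [
        n.name
        for n in nodes
        if n.free_cpu_m >= cpu_m and n.free_mem_mi >= mem_mi
    ]


# Four nodes, each with 1.5 free cores: 6 cores free in aggregate.
nodes = [Node(f"node-{c}", 1500, 3000) for c in "abcd"]
total_free_cpu = sum(n.free_cpu_m for n in nodes)

# A pod requesting 2 cores fits the cluster total but no individual node.
pod_cpu_m, pod_mem_mi = 2000, 2048
candidates = schedulable_nodes(nodes, pod_cpu_m, pod_mem_mi)
```

Here the dashboard would report 6000m of free CPU while `candidates` is empty, so the pod stays Pending.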
DaeGon Kim
2mo
I deployed a vLLM server on my local network. Then I exposed it to a JupyterHub instance deployed on a Kubernetes cluster in a private network, separate from the local network. As shown in the article, deploying vLLM on a Dell Pro Max with GB10 is straightforward, but I had to modify nftables rules and the network policy for Jupyter Notebook. When I try to do something properly, nothing is ever that simple. https://lnkd.in/gzvFi-qZ

Running vLLM using Nvidia DGX Spark daegonk.medium.com
InstaLOD

1,124 followers
2mo
⚡ Faster turnaround, shared throughput, and clearer visibility into processing. The InstaLOD Grid series is a great place to start if you are exploring on‑prem compute for 3D pipelines. ✅ Download and more: Link in the comments https://lnkd.in/eZYAuKua

InstaLOD Grid: Jumpstart Your Instance youtube.com

llm-d

3,055 followers
2mo
In the latest llm-d release, we explore the newest updates to the GPU Recommendation Tool. This key feature of the Configuration Explorer is specifically designed to help developers and researchers navigate the high costs of hardware resources by evaluating performance before requesting cluster access. Whether you are looking for the highest throughput, the lowest latency, or the most cost-effective setup, this tool provides a data-driven baseline to guide your decision-making.

In the video demo:
⚫️ The Power of the Roofline Algorithm: Learn how we leverage the LM Optimizer roofline algorithm to analyze hardware specs and compute potential inference performance.
⚫️ Performance vs. Cost Visualizations: We walk through our intuitive UI, featuring plots that map throughput against latency.
⚫️ Finding the Sweet Spot: See how we identify optimal configurations, represented by points on the lower right of the graph where the smallest "bubbles" indicate the lowest costs.
⚫️ UI & API Flexibility: See the tool in action via the web interface, or learn how to integrate it directly into your workflows using the config_explorer.recommender API.
⚫️ Beyond the Basics: A quick look at the Capacity Planner for memory planning and parallelism strategies.

https://lnkd.in/eRMDHp-d
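For readers unfamiliar with roofline analysis, the general model bounds attainable performance by the lower of peak compute and memory bandwidth times arithmetic intensity. The sketch below shows only that textbook bound. It is not the tool's implementation and does not use the `config_explorer.recommender` API; the hardware numbers are made up for illustration.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Classic roofline bound.

    peak_flops:            peak compute throughput, FLOP/s
    mem_bandwidth:         peak memory bandwidth, bytes/s
    arithmetic_intensity:  FLOPs performed per byte moved
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


# Hypothetical accelerator: 100 TFLOP/s peak compute, 2 TB/s memory bandwidth.
PEAK = 100e12
BANDWIDTH = 2e12

# Intensity at which the workload shifts from memory-bound to compute-bound.
ridge_point = PEAK / BANDWIDTH

memory_bound = attainable_flops(PEAK, BANDWIDTH, 2.0)     # low-intensity workload
compute_bound = attainable_flops(PEAK, BANDWIDTH, 200.0)  # high-intensity workload
```

With these assumed numbers, a workload at 2 FLOP/byte is capped by memory bandwidth, while one at 200 FLOP/byte reaches the compute ceiling.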

Optimizing LLM Workloads: A Deep Dive into the GPU Recommendation Tool & Configuration Explorer

https://www.youtube.com/
Karen LI
2mo
There has been a lot of talk about the end of Software 2.0. What I am watching instead is where value is concentrating. It feels closer to a memory hierarchy dynamic. The closer you are to compute, the more pricing power you hold. AI server deployments are real. DDR5 is clearly server led now. This is not sentiment. It is build activity. Demand is quiet, but allocation windows are forming. MTC20F208XS1RC48BB1 MTC20F208XS1RC56BB1 MT40A2G8SA-062E IT:F Less narrative. More deployment. Happy to compare notes if you are mapping Q2 or Q3 server builds. #Memory #DDR5 #DRAM #AIInfrastructure #Semiconductor #SupplyChain #DataCenter
Kennet Johansen
2mo
Elastic VM storage and what it means. 🔥🔥 Imagine you run out of disk space and your only option is to purchase more hardware, especially in these days of price hikes on storage. 😳😳 My fellow Nutant Marc Waldrop has dropped a short video explaining how we at Nutanix keep evolving our platform to help customers get the most out of their investment. Yet another piece of flexibility in what is probably the most flexible platform you will find today for your workloads. Jakob, Marcus, Nicolai, Lars

Marc Waldrop
2mo

I wrote a blog about this feature a few weeks back! To put more color to it and actually show it in action, I put together this video highlighting the new "Elastic VM Storage" feature that was released in Nutanix AOS 7.5 and AHV 11. Pretty cool stuff! https://lnkd.in/ezNpZbmm

Elastic VM Storage Explained: NCI 7.5

https://www.youtube.com/
Podili Sravan kumar
2mo
Kubecost is migrating container image hosting from Google Container Registry (gcr.io) to IBM Container Registry (icr.io). On July 30, 2026, Kubecost images will be removed from gcr.io. After this date, Kubecost images will no longer be updated or available from gcr.io.
Gineesh Madapparambath
2mo
Kubernetes just dropped something useful for anyone running non-trivial clusters: the Node Readiness Controller 👇

Traditionally, Kubernetes decides if a node can run workloads using a single “Ready” flag. That’s simple, but in real clusters you often have extra pieces (network agents, storage drivers, GPUs, custom checks) that must be healthy before a node truly should accept pods.

The new Node Readiness Controller lets you define fine-grained, declarative readiness conditions so nodes only accept workloads once all required components are actually ready. This helps with bootstrapping complex node setups, enforcing health requirements, and improving reliability.

In other words:
>> You can have custom readiness gates beyond the built-in Ready condition
>> You get more control over heterogeneous nodes (e.g. GPU vs general workloads)
>> It fits better with real-world infra where services must be healthy before scheduling starts

If you’re working with clusters that need more than the default “ready/not ready”, this controller is worth a look. https://lnkd.in/gVzbeevy

#Kubernetes #cloudnative #containers #devops #infrastructure
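The idea of gating on several conditions instead of a single flag can be sketched conceptually. This is not the Node Readiness Controller's API or resource schema; the function `node_accepts_pods` and the `example.com/...` condition names are hypothetical. The only Kubernetes fact relied on is that node conditions carry a status of "True", "False", or "Unknown".

```python
def node_accepts_pods(node_conditions, required):
    """Return True only if every required condition reports status "True".

    node_conditions: mapping of condition type -> status string
    required:        condition types that must all be "True"
    """
    return all(node_conditions.get(name) == "True" for name in required)


# Hypothetical GPU node: the kubelet reports Ready, but the GPU driver is not up yet.
gpu_node = {
    "Ready": "True",
    "example.com/NetworkAgentReady": "True",
    "example.com/GpuDriverReady": "False",
}
required_for_gpu_pool = [
    "Ready",
    "example.com/NetworkAgentReady",
    "example.com/GpuDriverReady",
]

before = node_accepts_pods(gpu_node, required_for_gpu_pool)

# Once the driver check passes, the node becomes eligible.
gpu_node["example.com/GpuDriverReady"] = "True"
after = node_accepts_pods(gpu_node, required_for_gpu_pool)
```

A missing condition is treated the same as a failing one, which matches the intent that a node should not take workloads until every required component has reported in.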