Watch back Vishesh Jindal's #CloudStackCollab session "Orchestrating GPU workloads with CloudStack"! This session dives into the technical design and implementation of native GPU orchestration in Apache CloudStack 4.21 on KVM, including device discovery, capability classification, and inventory synchronization via the KVM agent. By watching this session back, you will learn how GPU‑backed service offerings are defined and consumed by Instances. Vishesh covers operator prerequisites and host setup (IOMMU, vendor vGPU profiles), and lifecycle operations from provisioning to teardown. A live demo walks through GPU discovery on the host, offering creation, and end‑to‑end deployment. https://lnkd.in/gwJhzdXn
Vishesh Jindal on GPU Orchestration with CloudStack 4.21
More Relevant Posts
-
dstack 0.20.9 is out 🚀 More improvements around GPU workload provisioning and visibility: easier to see what’s happening during provisioning/scheduling, plus better Kubernetes integration. Release notes 👇 https://lnkd.in/dRYQ2UMR
-
CruiseKube is a new Kubernetes controller that keeps an eye on your workloads and automatically right-sizes CPU and memory requests. It goes beyond static limits by looking at real CPU pressure (PSI metrics) and what else is running on the node so resource adjustments are actually context-aware. https://lnkd.in/esqkGzVS
-
🧠 “Cluster has capacity.” Then why are my pods still Pending?
If you’ve run EKS long enough, you’ve seen this:
35% CPU free
40% memory free
Cluster looks healthy
Critical workload stuck in Pending for 30+ minutes.
The problem isn’t capacity. It’s fragmentation. Kubernetes isn’t a perfect bin-packer. Between:
• requests vs allocatable
• anti-affinity rules
• topology spread constraints
• hugepages / GPUs
• node group mix
• uneven AZ distribution
you create invisible holes in the cluster. Plenty of total capacity. Not enough schedulable shape.
So what happens? Teams:
• overprovision nodes
• blame Karpenter
• increase instance size
• add another node group
Expensive band-aids.
Real fix: Design node pools intentionally. Align resource requests with instance shapes. Use descheduler where appropriate. Audit anti-affinity rules regularly.
“Cluster has capacity” is a dashboard illusion. Schedulability is the real metric.
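The gap between total capacity and schedulable shape can be shown with a toy model. This is only a sketch: the node names, free-resource figures, and the `fits` and `schedulable_nodes` helpers are made up for illustration, and it looks at CPU and memory only, ignoring affinity, topology spread, and everything else the real Kubernetes scheduler evaluates.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu_m: int   # unrequested CPU in millicores (hypothetical figures)
    free_mem_mi: int  # unrequested memory in MiB (hypothetical figures)


def fits(node: Node, cpu_m: int, mem_mi: int) -> bool:
    """A pod fits only if ONE node can satisfy the whole request."""
    return node.free_cpu_m >= cpu_m and node.free_mem_mi >= mem_mi


def schedulable_nodes(nodes: list[Node], cpu_m: int, mem_mi: int) -> list[str]:
    """Names of nodes that could host a pod with the given requests."""
    return [n.name for n in nodes if fits(n, cpu_m, mem_mi)]


# Four nodes, each with a modest amount of free capacity left over.
nodes = [
    Node("node-a", 1500, 6000),
    Node("node-b", 1400, 7000),
    Node("node-c", 1300, 6500),
    Node("node-d", 1400, 6100),
]

total_free_cpu = sum(n.free_cpu_m for n in nodes)
total_free_mem = sum(n.free_mem_mi for n in nodes)

# A pod requesting 2 CPUs and 4 GiB.
pod_cpu_m, pod_mem_mi = 2000, 4096

# The dashboard view: summed free capacity covers the request.
print("cluster total covers request:",
      total_free_cpu >= pod_cpu_m and total_free_mem >= pod_mem_mi)

# The scheduler view: no single node has a hole of the right shape.
print("nodes that fit:", schedulable_nodes(nodes, pod_cpu_m, pod_mem_mi))
```

In this made-up cluster the summed free CPU is well above the request, yet the list of fitting nodes is empty, which is the "Pending with capacity" situation the post describes.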
-
I deployed a vLLM server on my local network. Then I exposed it to a JupyterHub running on a Kubernetes cluster in a private network, separate from the local network. As shown in the article, the vLLM deployment on the Dell Pro Max with GB10 is straightforward. But I had to modify nftables rules and the network policy for Jupyter Notebook. When I try doing something properly, nothing is that simple. https://lnkd.in/gzvFi-qZ
-
⚡ Faster turnaround, shared throughput, and clearer visibility into processing. The InstaLOD Grid series is a great place to start if you are exploring on‑prem compute for 3D pipelines. ✅ Download and more: Link in the comments https://lnkd.in/eZYAuKua
-
In the latest llm-d release, we explore the newest updates to the GPU Recommendation Tool. This key feature of the Configuration Explorer is specifically designed to help developers and researchers navigate the high costs of hardware resources by evaluating performance before requesting cluster access. Whether you are looking for the highest throughput, the lowest latency, or the most cost-effective setup, this tool provides a data-driven baseline to guide your decision-making.
In the video demo:
⚫️ The Power of the Roofline Algorithm: Learn how we leverage the LM Optimizer roofline algorithm to analyze hardware specs and compute potential inference performance.
⚫️ Performance vs. Cost Visualizations: We walk through our intuitive UI, featuring plots that map throughput against latency.
⚫️ Finding the Sweet Spot: See how we identify optimal configurations, represented by points on the lower right of the graph where the smallest "bubbles" indicate the lowest costs.
⚫️ UI & API Flexibility: See the tool in action via the web interface, or learn how to integrate it directly into your workflows using the config_explorer.recommender API.
⚫️ Beyond the Basics: A quick look at the Capacity Planner for memory planning and parallelism strategies.
https://lnkd.in/eRMDHp-d
Optimizing LLM Workloads: A Deep Dive into the GPU Recommendation Tool & Configuration Explorer
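For readers unfamiliar with the roofline idea mentioned in the post, here is a minimal sketch of the generic textbook roofline model. It is not the LM Optimizer algorithm or the config_explorer.recommender API, whose internals the post does not describe. The function names and the hardware figures are hypothetical.

```python
def roofline_flops(peak_flops: float, mem_bandwidth: float,
                   arithmetic_intensity: float) -> float:
    """Attainable FLOP/s under the classic roofline model.

    peak_flops:           peak compute throughput in FLOP/s
    mem_bandwidth:        peak memory bandwidth in bytes/s
    arithmetic_intensity: FLOPs performed per byte moved from memory
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def is_memory_bound(peak_flops: float, mem_bandwidth: float,
                    arithmetic_intensity: float) -> bool:
    """True when the workload sits left of the ridge point."""
    ridge_point = peak_flops / mem_bandwidth  # FLOP/byte where the two roofs meet
    return arithmetic_intensity < ridge_point


# Hypothetical accelerator: 100 TFLOP/s peak compute, 2 TB/s memory bandwidth.
PEAK_FLOPS = 100e12
MEM_BANDWIDTH = 2e12

# A low-intensity workload (2 FLOP/byte) is limited by memory bandwidth.
low = roofline_flops(PEAK_FLOPS, MEM_BANDWIDTH, 2.0)

# A high-intensity workload (200 FLOP/byte) is limited by peak compute.
high = roofline_flops(PEAK_FLOPS, MEM_BANDWIDTH, 200.0)

print(low, is_memory_bound(PEAK_FLOPS, MEM_BANDWIDTH, 2.0))
print(high, is_memory_bound(PEAK_FLOPS, MEM_BANDWIDTH, 200.0))
```

The point of the model is that the same hardware gives very different attainable throughput depending on how many operations a workload performs per byte it reads, which is why a tool can estimate inference performance from hardware specs before any cluster time is spent.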
-
There has been a lot of talk about the end of Software 2.0. What I am watching instead is where value is concentrating. It feels closer to a memory hierarchy dynamic. The closer you are to compute, the more pricing power you hold. AI server deployments are real. DDR5 is clearly server led now. This is not sentiment. It is build activity. Demand is quiet, but allocation windows are forming. MTC20F208XS1RC48BB1 MTC20F208XS1RC56BB1 MT40A2G8SA-062E IT:F Less narrative. More deployment. Happy to compare notes if you are mapping Q2 or Q3 server builds. #Memory #DDR5 #DRAM #AIInfrastructure #Semiconductor #SupplyChain #DataCenter
-
Elastic VM storage and what it means. 🔥🔥 Imagine you run out of disk space and your only option is to purchase more hardware - especially in these days of price hikes on storage. 😳😳 My fellow Nutant Marc Waldrop has dropped a short video explaining how we at Nutanix keep evolving our platform to help customers get the most out of their investment. Yet another piece of flexibility in what is probably the most flexible platform you will find today for your workloads. Jakob, Marcus, Nicolai, Lars
I wrote a blog about this feature a few weeks back! To put more color to it and actually show it in action, I put together this video highlighting the new "Elastic VM Storage" feature that was released in Nutanix AOS 7.5 and AHV 11. Pretty cool stuff! https://lnkd.in/ezNpZbmm
Elastic VM Storage Explained: NCI 7.5
-
Kubecost is migrating container image hosting from Google Container Registry (gcr.io) to IBM Container Registry (icr.io). On July 30, 2026, Kubecost images will be removed from gcr.io. After this date, Kubecost images will no longer be updated on or available from gcr.io.
-
Kubernetes just dropped something useful for anyone running non-trivial clusters: Node Readiness Controller 👇
Traditionally, Kubernetes decides if a node can run workloads using a single “Ready” flag. That’s simple, but in real clusters you often have extra pieces (network agents, storage drivers, GPUs, custom checks) that must be healthy before a node should truly accept pods.
The new Node Readiness Controller lets you define fine-grained, declarative readiness conditions so nodes only accept workloads once all required components are actually ready. This helps with bootstrapping complex node setups, enforcing health requirements, and improving reliability.
In other words:
>> You can have custom readiness gates beyond the built-in Ready condition
>> You get more control over heterogeneous nodes (e.g. GPU vs general workloads)
>> It fits better with real-world infra where services must be healthy before scheduling starts
If you’re working with clusters that need more than the default “ready/not ready”, this controller is worth a look. https://lnkd.in/gVzbeevy #Kubernetes #cloudnative #containers #devops #infrastructure