Optimizing Pod Resource Allocation in GKE


Summary

Optimizing pod resource allocation in GKE (Google Kubernetes Engine) means making sure each application running in a Kubernetes cluster gets the right amount of CPU and memory, helping prevent outages and keeping cloud costs down. By tuning how resources are assigned and scaled, organizations can ensure their apps stay reliable and avoid waste.

  • Set resource boundaries: Define clear CPU and memory requests and limits for each pod to prevent a single workload from using up all resources and disrupting other services.
  • Automate scaling decisions: Use tools like Cluster Autoscaler and Horizontal Pod Autoscaler to adjust the number of pods and nodes based on real-time demand and workload activity.
  • Monitor and review: Regularly check resource usage with monitoring tools and update configurations to match actual needs, making adjustments as usage patterns change.

  • Akum Blaise Acha

    Senior DevOps & Platform Engineer | AWS, Docker & Kubernetes Expert | 6+ Years Designing Scalable, Reliable, Cost-Efficient Cloud Systems | Mentor & Newsletter Creator for 1500+ Engineers

    4,009 followers

    A pod in your Kubernetes cluster is eating memory. It started at 200MB at deploy time. It's now at 1.8GB and climbing. No memory limit was set in the deployment. Other pods on the same node are getting OOMKilled. What do you do as an immediate fix? And what do you change so this never happens again? I have experienced this before while working as a DevOps Engineer. Here's what I learned.

    The immediate fix is not to kill the pod. Your first instinct is to delete it. Don't. If there's a deployment behind it, Kubernetes will restart it immediately and the memory leak starts all over again. You've bought yourself 20 minutes before you're back in the same situation.

    Instead, cordon the node first. This tells Kubernetes to stop scheduling new pods on that node. The damage is now contained. No new victims. Then set a memory limit on the deployment and redeploy. Even a generous limit like 512MB is better than no limit. The pod will get OOMKilled when it crosses 512MB instead of eating 1.8GB and starving everything around it. The leak still exists but now it has a ceiling.

    After that, check the other pods that were OOMKilled. They didn't die because of their own problems. They died because your leaking pod stole their memory. Kubernetes kills the pods it considers least important when the node runs out of memory. Your perfectly healthy services got evicted because one pod had no manners.

    Now the real work: making sure this never happens again. Every pod in your cluster needs resource requests and limits. Every single one. No exceptions. A pod without memory limits is a pod that can consume the entire node. It's not a question of if. It's when.

    Enforce this with admission controllers. Use OPA Gatekeeper or Kyverno to reject any deployment that doesn't include resource limits (see the policy sketch after this post). Don't rely on code reviews to catch this. Humans miss things. Policy engines don't.

    Add monitoring on container memory trends. Not just current usage. The trend. A pod sitting at 400MB is fine. A pod that was at 200MB yesterday and is at 400MB today and will be at 800MB tomorrow is a leak. Alert on the rate of change, not just the threshold.

    Set up namespace-level ResourceQuotas. Even if one team forgets limits on a pod, the namespace itself has a ceiling. One team's leak can't consume the entire cluster.

    And finally, fix the actual memory leak. Profile the application. Check for unclosed connections, growing caches, event listeners that never get cleaned up. The infrastructure guardrails keep you alive but the application code is where the real fix lives.

    Systems without boundaries will always consume everything available to them. Your job isn't just to fix incidents. It's to make sure the environment enforces good behavior even when humans forget. How would you handle this?

    #kubernetes #devops #platformengineering #sitereliability #cloudinfrastructure #systemdesign #containerorchestration
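    The admission-control step above can be enforced with a policy engine. Below is a minimal sketch of a Kyverno ClusterPolicy that rejects any Pod whose containers lack CPU or memory limits; the policy and rule names are illustrative, and Kyverno's sample policy library has a production-ready equivalent.

    apiVersion: kyverno.io/v1
    kind: ClusterPolicy
    metadata:
      name: require-resource-limits    # illustrative name
    spec:
      validationFailureAction: Enforce # reject, don't just audit
      rules:
      - name: check-container-limits
        match:
          any:
          - resources:
              kinds:
              - Pod
        validate:
          message: "CPU and memory limits are required for every container."
          pattern:
            spec:
              containers:
              - resources:
                  limits:
                    memory: "?*"       # any non-empty value
                    cpu: "?*"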

  • Vikash Kumar

    Senior Platform Engineer | Ex-Intel | DevOps Architect | Specializing in Multi-Cloud, AI/ML & Kubernetes | Mentor & Tech Content Creator

    8,521 followers

    "How I Cut Our Kubernetes Cloud Costs by 58% Without Downtime ⚡"

    Last quarter, I noticed our cloud bill skyrocketing. The reason? Idle Kubernetes workloads running 24/7, even when no one was using them.

    Here's what I did to optimize costs without impacting performance:

    1️⃣ Implemented Cluster Autoscaler:
    • Scaled down unused nodes automatically during off-peak hours.
    • Configured resource limits/requests to prevent over-provisioning.

    2️⃣ Scheduled Non-Critical Workloads:
    • Used Kubernetes CronJobs & KEDA (Kubernetes Event-Driven Autoscaling) to spin up workloads only when needed (see the ScaledObject sketch after this post).
    • Labeled dev/test namespaces with auto-suspend: true for automation.

    3️⃣ Optimized Resource Allocation:
    • Ran kubectl top to identify resource hogs.
    • Tuned CPU & memory requests based on real usage, not guesswork.

    4️⃣ Cleaned Up Unused Resources:
    • Automated cleanup of dangling PVCs, old Helm releases, and zombie services using custom scripts.
    • Set TTLs on Jobs so they delete themselves after completion.

    💡 The Results Were Game-Changing:
    • Monthly Kubernetes costs dropped from $12,000 to $5,040
    • Cluster performance improved with optimized node usage
    • Zero downtime for production apps
    • Automated Slack alerts for resource spikes or cost anomalies

    Pro Tip: Don't just focus on pods and nodes. Also check for:
    1️⃣ Orphaned Persistent Volumes (PVs)
    2️⃣ Unused Load Balancers or Ingresses
    3️⃣ Over-provisioned StatefulSets or DaemonSets

    Bonus: I created a lightweight Go-based controller that identifies untagged namespaces and idle workloads, and sends daily cost reports. Found $1,200 worth of wasted resources in the first week alone!

    Want the Helm chart or YAML manifests for this setup? Drop a comment below, and I'll share the GitHub repo.

    #Kubernetes #CloudCostOptimization #DevOps #FinOps #K8s #CloudComputing
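    For the scheduled-workloads step above, a KEDA cron trigger is one way to scale a dev/test Deployment to zero outside working hours. A minimal sketch, assuming a hypothetical Deployment named dev-api; the schedule and replica counts are placeholders:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: dev-api-offhours     # placeholder name
      namespace: dev
    spec:
      scaleTargetRef:
        name: dev-api            # hypothetical Deployment
      minReplicaCount: 0         # scale to zero off-hours
      maxReplicaCount: 5
      triggers:
      - type: cron
        metadata:
          timezone: UTC
          start: "0 8 * * 1-5"   # scale up on weekday mornings
          end: "0 18 * * 1-5"    # scale down in the evening
          desiredReplicas: "2"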

  • Phuong Le

    Software Engineer @VictoriaMetrics, building VictoriaLogs

    2,888 followers

    Scaling a pod vertically WITHOUT restarting is now feasible from Kubernetes v1.33.

    Normally, if you wanted to give an app (a Pod in Kubernetes) more memory or CPU, you had to restart it. That works fine if the app doesn't care about restarts, but some apps really don't like being stopped and restarted, such as databases, big batch jobs, or things that need to stay running smoothly.

    -- Beta

    The new "in-place Pod resize" feature lets you adjust how much memory or CPU a Pod is using while it's still running. In version 1.33 it graduated to beta: it's considered good enough for regular use and is turned on by default. Before that, you had to enable the InPlacePodVerticalScaling feature gate.

    -- How

    Instead of doing a normal kubectl edit on the Pod, you use a special /resize subresource. For example, you can run a patch like: kubectl patch pod mypod --subresource=resize ... (a fuller example follows this post). This is similar to how other Kubernetes features work. For example:
    - /status is a subresource you can use to update only the status field of an object without touching its spec.
    - /scale is a subresource on Deployments or StatefulSets that lets you change the number of replicas without editing the entire manifest.
    In the same way, /resize is a subresource on Pods that lets you adjust resources in place.

    Important: this applies to individual Pods. If you run kubectl set resources on a Deployment, StatefulSet, or Job, that still changes the template and triggers new Pods, NOT an in-place change.

    Increasing CPU is simple, and increasing memory usually works if the node has capacity. Lowering CPU is also easy, but lowering memory is the hardest case and may fail, get deferred, or force a restart depending on the policy. To check how a resize is going, watch the Pod's status fields and conditions.

    -- Resize policy

    Each container in the Pod spec can define a resizePolicy. Inside this field, you list CPU and memory separately, and for each one you choose a restart policy. The two possible values are:
    - NotRequired: Kubernetes will try to adjust the resource values in place while the container is running. This is the default. It's best-effort: if the runtime cannot apply the change safely (especially in cases like reducing memory), it will fail instead of forcing a restart.
    - RestartContainer: Kubernetes must restart the container to apply the new resource settings. This is useful for applications that only read their resource limits when they start up, such as some JVM-based workloads.

    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired
    - resourceName: memory
      restartPolicy: RestartContainer
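    A fuller version of the patch command above, per the Kubernetes 1.33 documentation for the resize subresource (the pod and container names are hypothetical):

    # Raise the CPU request/limit of container "app" in pod "mypod" in place
    kubectl patch pod mypod --subresource resize --patch \
      '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'

    # Watch the resize progress via the Pod's status fields and conditions
    kubectl get pod mypod -o yaml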

  • Thiruppathi Ayyavoo

    🚀 Cloud & DevOps | Application Support Engineer | PIAM | Broadcom Automic Batch Operation | Zerto Certified Associate

    3,590 followers

    Post 19: Real-Time Cloud & DevOps Scenario

    Scenario: Your organization's Kubernetes-based microservices faced a production outage due to a misconfigured pod overusing CPU and memory, causing resource starvation. As a DevOps engineer, your task is to prevent such issues and maintain system stability.

    Step-by-Step Solution:

    Set Resource Requests and Limits: Define resources.requests and resources.limits in pod specifications to control CPU and memory usage. Example:

      resources:
        requests:
          memory: "500Mi"
          cpu: "250m"
        limits:
          memory: "1Gi"
          cpu: "500m"

    Enable Namespace Resource Quotas: Use ResourceQuota objects to restrict the total resource consumption within a namespace. Example:

      apiVersion: v1
      kind: ResourceQuota
      metadata:
        name: namespace-quota
      spec:
        hard:
          requests.cpu: "4"
          requests.memory: "8Gi"
          limits.cpu: "8"
          limits.memory: "16Gi"

    Leverage Horizontal Pod Autoscaler (HPA): Use HPA to scale pods dynamically based on CPU, memory, or custom metrics. Example:

      apiVersion: autoscaling/v2
      kind: HorizontalPodAutoscaler
      metadata:
        name: example-hpa
      spec:
        scaleTargetRef:
          apiVersion: apps/v1
          kind: Deployment
          name: my-app
        minReplicas: 2
        maxReplicas: 10
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80

    Implement Pod Priority and Preemption: Assign priority classes to pods to ensure critical workloads get resources during contention. Example:

      apiVersion: scheduling.k8s.io/v1
      kind: PriorityClass
      metadata:
        name: high-priority
      value: 1000
      globalDefault: false
      description: "Priority for critical workloads"

    Monitor and Analyze Resource Usage: Use tools like Prometheus, Grafana, or the Kubernetes Metrics Server to monitor CPU and memory usage trends. Set up alerts for resource usage thresholds.

    Implement Node Affinity and Taints: Use node affinity and taints/tolerations to distribute workloads effectively across nodes, avoiding resource bottlenecks (see the toleration sketch after this post).

    Audit Configurations Regularly: Periodically review and update resource configurations for pods and namespaces. Conduct load tests to validate performance under different conditions.

    Enable Cluster Autoscaler: Use Cluster Autoscaler to add or remove nodes dynamically based on overall resource demand. This ensures sufficient capacity during peak loads.

    Outcome: Improved resource allocation prevents single pod failures from impacting other services. The system becomes more resilient and scales dynamically based on demand.

    💬 How do you handle resource contention in your Kubernetes clusters? Let's discuss strategies in the comments!

    ✅ Follow Thiruppathi Ayyavoo for daily real-time scenarios in Cloud and DevOps. Together, we learn and grow!

    #DevOps #Kubernetes #CloudComputing #ResourceManagement #Containers #HorizontalPodAutoscaler #RealTimeScenarios #CloudEngineering #LinkedInLearning #careerbytecode #thirucloud #linkedin #USA CareerByteCode
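    A small sketch for the node affinity and taints step referenced above, since it is the only step without an example; the node name and taint key are hypothetical:

    # Taint a node so that only pods tolerating the taint are scheduled there:
    #   kubectl taint nodes node-1 workload=batch:NoSchedule
    # Matching toleration in the pod spec:
    tolerations:
    - key: "workload"
      operator: "Equal"
      value: "batch"
      effect: "NoSchedule"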

  • Abdel SGHIOUAR

    Cloud Developer Advocate | Podcaster | Speaker | KubeCon Co-Chair | CNCF Ambassador | Kubestronaut | Human

    39,153 followers

    GKE just released custom compute classes, which I think is a killer and unique feature that only #GoogleCloud has for now 🎉

    Custom compute classes are a mechanism in #GoogleKubernetesEngine that lets you define the set of node configurations you want your workloads to run on, along with the order in which those configurations should be provisioned, fallback rules, scale-out configs, and defaults [1].

    Let's take an example. Say I want my Pods to run on Spot virtual machines unless they are not available, in which case I want to fail over to standard VMs. But I also want to fall back to Spot when it becomes available again, and in case neither Spot nor the config I want is available, I want a default node config.

    Typically in #Kubernetes you would have to build some sort of custom logic with labels, tolerations, selectors, balloon pods... With custom compute classes you define all of this declaratively (as a CRD), like the sketch below. This tells GKE to:
    - Provision N2 machines with at least 64 cores as Spot VMs.
    - If unavailable, fail over to N2 with any number of cores as Spot VMs.
    - If still unavailable, fail over to standard (non-Spot) N2.

    You define all of this as a custom compute class object and label the namespace [2]. This works for both Standard and Autopilot clusters and is supported by the autoscaler.

    [1] https://lnkd.in/eQwrTn6t
    [2] https://lnkd.in/ea7bqAsE
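    A rough sketch of a compute class implementing the Spot-first fallback order described above (the exact field set is an assumption based on the GKE custom compute class docs linked in [1]; verify against the current schema):

    apiVersion: cloud.google.com/v1
    kind: ComputeClass
    metadata:
      name: spot-first                 # illustrative name
    spec:
      priorities:
      - machineFamily: n2              # 1st choice: N2 Spot with 64+ cores
        spot: true
        minCores: 64
      - machineFamily: n2              # 2nd choice: any N2 Spot
        spot: true
      - machineFamily: n2              # last resort: standard (non-Spot) N2
      activeMigration:
        optimizeRulePriority: true     # move back to Spot when it returns
      nodePoolAutoCreation:
        enabled: true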

  • Piyush Ranjan

    28k+ Followers | AVP | Tech Lead | Forbes Technology Council | Thought Leader | Artificial Intelligence | Cloud Transformation | AWS | Cloud Native | Banking Domain

    28,389 followers

    Kubernetes Scaling Strategies:

    Horizontal Pod Autoscaling (HPA):
    Function: Adjusts the number of pod replicas based on CPU/memory usage or other select metrics.
    Workflow: The Metrics Server collects data → the API Server communicates with the HPA controller → the HPA controller scales the number of pods up or down based on the metrics.

    Vertical Pod Autoscaling (VPA):
    Function: Adjusts the resource limits and requests (CPU/memory) for containers within pods.
    Workflow: The Metrics Server collects data → the API Server communicates with the VPA controller → the VPA controller adjusts the resource requests and limits for pods.

    Cluster Autoscaling:
    Function: Adjusts the number of nodes in the cluster to ensure pods can be scheduled.
    Workflow: The scheduler identifies pending pods → the Cluster Autoscaler determines the need for more nodes → new nodes are added to the cluster to accommodate the pending pods.

    Manual Scaling:
    Function: Manually adjusts the number of pod replicas (see the kubectl one-liners after this post).
    Workflow: A user runs a kubectl command to scale pods → the API Server processes the command → the number of pods is adjusted accordingly.

    Predictive Scaling:
    Function: Uses machine learning models to predict future workloads and scales resources proactively.
    Workflow: An ML forecast generates predictions → KEDA (Kubernetes-based Event-Driven Autoscaling) acts on these predictions → the cluster controller keeps resources balanced by scaling them.

    Custom-Metrics-Based Scaling:
    Function: Scales pods based on custom, application-specific metrics.
    Workflow: A custom metrics server collects and provides metrics → the HPA controller retrieves these metrics → the HPA controller scales the deployment based on them.

    These strategies ensure that Kubernetes environments can efficiently manage varying loads, maintain performance, and optimize resource usage. Each method offers different benefits depending on the specific needs of the application and infrastructure.
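    For the manual and HPA strategies above, the corresponding kubectl one-liners look like this (the deployment name and thresholds are placeholders):

    # Manual scaling: set the replica count by hand
    kubectl scale deployment my-app --replicas=5

    # HPA from the CLI: keep average CPU at 80%, between 2 and 10 replicas
    kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=80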

  • Praveen Singampalli

    Helping Students & Professionals Get Jobs | Built 300k+ DevOps Family Across Socials | AWS Community Builder | Ex-Verizon | Ex-Infosys | 8x SSB Conference Out

    140,609 followers

    Kubernetes Cost Optimization Techniques

    Kubernetes, a powerful container orchestration platform, can significantly reduce costs when used effectively. Here are some key strategies for optimizing your Kubernetes environment:

    1. Rightsizing and Resource Allocation
    • Pod Limits and Requests: Set precise resource limits and requests for each pod to prevent over-allocation and under-utilization.
    • Node Sizing: Choose the appropriate node size based on your workload requirements to avoid paying for excess resources.
    • Horizontal Autoscaling: Automatically scale pods up or down based on demand to ensure optimal resource utilization.
    • Vertical Autoscaling: Adjust the resource allocation for pods to match their workload requirements.

    2. Cost Monitoring and Analysis
    • Utilize Cloud Provider Tools: Leverage cloud-specific tools (e.g., AWS Cost Explorer, GCP Cost Management) to track spending and identify cost-saving opportunities.
    • Third-Party Tools: Consider using tools like Kubecost or Prometheus for detailed cost analysis and visualization.
    • Regular Reviews: Regularly review your cost data to identify trends and areas for optimization.

    3. Spot Instances and Preemptible VMs
    • Leverage Spot Instances: Use spot instances or preemptible VMs for non-critical workloads to significantly reduce costs (see the GKE Spot sketch after this post).
    • Implement Fault Tolerance: Ensure your applications can handle interruptions caused by spot instance terminations.

    4. Image Optimization
    • Minimize Image Size: Remove unnecessary files and layers from your container images to reduce download and storage costs.
    • Use Multi-Stage Builds: Create optimized images by building in multiple stages and copying only necessary artifacts.

    5. Network Optimization
    • Network Policies: Use network policies to restrict traffic between pods and reduce unnecessary network traffic.
    • Load Balancing: Implement efficient load balancing strategies to distribute traffic evenly across pods.

    6. Storage Optimization
    • Persistent Volume Claims (PVCs): Use PVCs to manage persistent storage efficiently and avoid over-provisioning.
    • Storage Classes: Create storage classes to define different storage types and their associated costs.
    • Storage Provisioners: Choose appropriate storage provisioners based on your workload requirements and cost considerations.

    7. Cluster Sharing
    • Consolidate Clusters: If possible, consolidate multiple clusters into a single, shared cluster to reduce overhead costs.
    • Namespace Isolation: Use namespaces to logically isolate different workloads within a shared cluster.

    8. Consider Managed Kubernetes Services
    • Evaluate Managed Offerings: Explore managed Kubernetes services (e.g., EKS, GKE, AKS) that often provide cost-effective solutions and managed infrastructure.

    Check here for more Kubernetes projects - https://lnkd.in/g5jCpiQg

    Share this post with your devops friends :)
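    For the Spot instances point in section 3, a GKE workload opts into Spot capacity via the cloud.google.com/gke-spot node label. A minimal sketch of the relevant pod-spec fragment; the toleration is needed only if the Spot node pool carries a matching taint, which is a setup assumption here:

    spec:
      template:
        spec:
          nodeSelector:
            cloud.google.com/gke-spot: "true"  # schedule onto Spot nodes only
          tolerations:
          - key: cloud.google.com/gke-spot     # tolerate a Spot taint, if set
            operator: Equal
            value: "true"
            effect: NoSchedule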

  • Boosting start-up CPU resources for "hungry" containers ☸️

    🤔 Certain resource-intensive applications run processes (JVM, Tomcat, etc.) that spike CPU requirements during initial boot-up. This often leads to crashing pods, because the extra CPU can't be accommodated dynamically at runtime.

    Come to the rescue ~ "Kube Startup CPU Boost" 🚀

    ➡️ Kube Startup CPU Boost is an #opensource controller from Google that increases CPU resource requests and limits during #Kubernetes workload startup time.
    ➡️ It leverages the in-place resource resize feature for #k8s Pods introduced in Kubernetes 1.27.
    ➡️ It reverts the workload's CPU resource requests and limits back to their original values without recreating the Pods.

    This feature needs to be activated when deploying the cluster; for example, in kind the config should be:

    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    name: poc
    nodes:
    - role: control-plane
    - role: worker
    - role: worker
    featureGates:
      InPlacePodVerticalScaling: true

    How it works ❓
    ✅ When a new container (pod) starts in the system, Kube Startup CPU Boost automatically receives a notification.
    ✅ The notification triggers the Boost Manager to search for specific settings related to CPU boosts for this new container.
    ✅ If matching settings are found, the Boost Manager instructs the webhook to increase the container's CPU requests and limits as defined in the configuration.
    ✅ Once the container reaches its desired operational state, the Boost Manager automatically adjusts the CPU resources back to their original values.

    GitHub link - https://lnkd.in/gsbdbrvy

    #performance #optimization #devops #sre #cloudnative #engineering #platform #developer #microservices
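    A boost is configured per workload with a StartupCPUBoost object. The sketch below is based on a reading of the project's README; the API group, field names, and values are assumptions that may have drifted between releases, so check the GitHub repo above for the authoritative schema:

    apiVersion: autoscaling.x-k8s.io/v1alpha1   # assumed API group/version
    kind: StartupCPUBoost
    metadata:
      name: boost-example        # illustrative
      namespace: demo
    spec:
      selector:                  # which pods to boost
        matchExpressions:
        - key: app.kubernetes.io/name
          operator: In
          values: ["app-demo"]   # hypothetical app label
      resourcePolicy:
        containerPolicies:
        - containerName: app     # hypothetical container name
          percentageIncrease:
            value: 50            # +50% CPU requests/limits at startup
      durationPolicy:
        podCondition:            # revert once the pod reports Ready
          type: Ready
          status: "True"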

  • Darryl R.

    Principal Cloud Solutions Architect | AWS Community Builder

    34,159 followers

    Setting static CPU and memory requests in Kubernetes is often guesswork. Too low and you get throttling or OOMKills. Too high and you waste capacity. The Vertical Pod Autoscaler helps, but traditionally required pod restarts to apply changes. VPA v1.5.0 with Kubernetes 1.33 introduces InPlaceOrRecreate mode, which can adjust resource requests on running containers without disrupting the pod. This enables a practical strategy: start with minimal requests and let VPA right-size them dynamically. Anup Dubey shows the setup and explains how VPA makes its recommendations. https://lnkd.in/eD_8SSvW
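    A minimal VPA object using the new mode might look like this (the target name is a placeholder; the mode string is per the VPA 1.5 release notes):

    apiVersion: autoscaling.k8s.io/v1
    kind: VerticalPodAutoscaler
    metadata:
      name: my-app-vpa
    spec:
      targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app                      # placeholder workload
      updatePolicy:
        updateMode: "InPlaceOrRecreate"   # resize in place; recreate only if needed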
