Why your ML/Python container dies after 60 seconds on Cloud Run

Works perfectly on my machine. Dead in 60 seconds on Google Cloud Run.

I burned an entire day on this for a side project - a Gemini-integrated academic paper processor that extracts structured data from research papers.

Turns out Cloud Run's CPU model is nothing like I expected, and environment differences matter way more than I realized.

The Setup

Pretty standard multi-stage Docker build. Python worker with two jobs:

  • /health endpoint so Cloud Run knows we're alive
  • Background worker pulling messages from Redis Streams

Dependencies included the usual ML/NLP suspects: torch, sentence_transformers, spacy, sqlalchemy, requests, unstructured, langgraph, google.generativeai. About 2GB of packages. You know, a light build.
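The worker's shape, roughly (a minimal sketch with hypothetical names; the real service consumes Redis Streams via XREADGROUP, stubbed out here as fetch_messages()):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Tiny /health endpoint so Cloud Run's checks pass."""
    def do_GET(self):
        status = 200 if self.path == "/health" else 404
        self.send_response(status)
        self.end_headers()
        if status == 200:
            self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep request noise out of the logs

def fetch_messages():
    # Stub standing in for the Redis Streams consumer (XREADGROUP).
    return []

def handle(message):
    # Stub standing in for the paper-processing pipeline.
    pass

def worker_loop(stop_event, poll_interval=1.0):
    """Background job: keep polling for messages until asked to stop."""
    while not stop_event.is_set():
        for message in fetch_messages():
            handle(message)
        stop_event.wait(poll_interval)

def main():
    """Container entrypoint: start the poller thread, then serve /health."""
    stop = threading.Event()
    threading.Thread(target=worker_loop, args=(stop,), daemon=True).start()
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Nothing fancy: one thread polls, the main thread answers health checks. Both of the problems below come from how this shape interacts with Cloud Run.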

Problem #1: Slow Package Imports

Locally, it worked, of course. Blazing fast. Under a second to start.

Deployed to GCP? Worker took over 60 seconds just to get through imports. Health endpoint would start fine, but the background worker just... sat there. Importing. Forever.

Eventually Cloud Run got tired of waiting and killed it.

The Usual Suspects (That Didn't Help)

I tried all the obvious stuff:

  • Lazy loading everything
  • Singleton patterns for model loading (sentence transformers & spacy)
  • Deferred database connections
  • Lazy initialization of the LangGraph pipeline

Looking back, I was optimizing the wrong things. None of it made a difference.

What Actually Helped

After way too many failed deploys, I added timing logs everywhere. Classic brute force debugging, but it worked.

Found the culprit: requests and torch were each taking 30-60 seconds to import. The exact same code ran in under a second locally.
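The timing logs boiled down to this pattern (a sketch; stdlib modules stand in for the heavy packages here):

```python
import importlib
import time

def time_import(name):
    """Import a module and return how long it took, in seconds."""
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start

for pkg in ("json", "ssl"):  # swap in "requests", "torch" inside the container
    print(f"{pkg}: {time_import(pkg) * 1000:.1f} ms")
```

One caveat: a module already in sys.modules comes back near-instantly on re-import, so measure each package in a fresh process to get honest numbers.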

Why?

requests: Turns out SSL certificate validation does a bunch of network calls. In GCP's VPC, these hit timeouts and retries.

torch: Needs to compile Python to bytecode AND load 500MB+ of C++ extensions. On my local machine with unrestricted network? Fast. In GCP's environment? Slow as hell.

Solution: Import Warming

I learned you can pre-import packages during the Docker build instead of waiting for runtime:

# Dockerfile.base
RUN /app/.venv/bin/python3 -c "\
import numpy; \
import torch; \
import torchvision; \
import sentence_transformers; \
import spacy; \
import sqlalchemy; \
import requests; \
import lxml; \
import newspaper; \
import google.generativeai; \
import langgraph; \
import unstructured; \
"
        

(Full Dockerfile at the bottom of the article)

This makes Python do all the expensive work once during build time:

  • Compiles everything to .pyc bytecode
  • Creates the dynamic linker cache and symbol tables that map Python bindings to their C++ implementations
  • Initializes C++ extensions
  • Does all the SSL validation

All of this gets baked into the Docker image. When the container starts in production, Python just loads the pre-compiled bytecode instead of doing everything from scratch.
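You can see the bytecode half of this directly (a sketch using a throwaway module; importing a module at build time writes the same __pycache__ artifact that compileall produces here):

```python
import compileall
import pathlib
import tempfile

# Write a tiny module, compile it ahead of time, and check that the
# cached bytecode lands in __pycache__ -- the same on-disk artifact
# that build-time import warming bakes into the image.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "warm_me.py").write_text("ANSWER = 41 + 1\n")
compileall.compile_dir(str(tmp), quiet=1)
pyc_files = list((tmp / "__pycache__").glob("warm_me.*.pyc"))
print(len(pyc_files))  # 1 -> bytecode cached on disk, ready for fast startup
```

A fresh process that imports warm_me now skips the compile step and loads the .pyc straight from disk.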

My Docker layer structure:

  • Base layers: OS packages + Python dependencies (change rarely)
  • Warm-up layer: pre-imported packages, compiled bytecode (changes only with dependencies)
  • App layer: application code (changes on every deploy)

Code changes only rebuild the top layer. The expensive import warming stays cached.

Result: Startup went from 60+ seconds to under a second.

Problem #2: The One That Nearly Broke Me

Fast startup. Health checks passing. Everything looked good.

But messages just... weren't getting processed. Redis stream was piling up. After about a minute, Cloud Run would kill the container.

I stared at logs for hours. No errors. No crashes. Just... nothing happening.

The CPU Throttling Trap

Turns out Cloud Run throttles CPU when it thinks your container is idle. And by "idle" it means "not actively serving HTTP requests."

To Cloud Run, the Redis stream subscriber looked like it was doing nothing, so it got essentially zero CPU. The worker couldn't poll, messages piled up, and Cloud Run concluded the container was broken and killed it.

This design makes total sense for HTTP services: you only need CPU while handling requests, and in the idle periods Cloud Run can throttle away and save money!

For background workers? It's a trap. You need CPU all the time to poll queues.

I spent way too long on this before I found the actual solution.

Solution: No Throttling

gcloud run services update paper-processor \
  --no-cpu-throttling \
  --min-instances=1
        

Two flags that change everything:

--no-cpu-throttling means CPU is always on, even when the container isn't serving HTTP requests. Your background worker can actually do work.

--min-instances=1 keeps at least one instance running. No cold starts when messages arrive.

Trade-off: costs more. About 3-4x more per month for an always-on worker.

Worth it if your worker needs to actually process things.

What I Learned

Import warming: At runtime, C++ libraries still load into memory, but all the expensive initialization (symbol resolution, operator registration, certificate validation) is already done.

CPU allocation: Cloud Run is optimized for HTTP services that are idle between requests. Background workers need continuous CPU. The default throttling model is fundamentally incompatible with polling workers.

Cloud Run's model makes sense for what it's designed for, but it's not obvious to someone who didn't read the freaking manual!

Why This Might Matter to You

These issues hit any Cloud Run service using:

ML/AI libraries: torch, tensorflow, transformers, sentence-transformers. All have heavy C++ initialization.

Data processing: pandas, numpy, scipy, polars. Same deal.

Background workers: Celery, RQ, custom workers, anything polling Redis/Pub/Sub. All need --no-cpu-throttling.

Document processing: unstructured, PyPDF2, pdf parsers, OCR. Heavy imports and C dependencies.

Real-time systems: WebSockets, SSE, long polling. Anything that's not request/response.

The combination of import warming + CPU configuration turned Cloud Run from "doesn't work for my use case" into something that actually runs in production for ML workloads and background workers.

#CloudRun #Docker #Python #MLOps #GCP #BackgroundWorkers #Redis

Dockerfile.base

# Builder Stage – Install all build dependencies and Python packages
FROM python:3.12-slim AS builder

COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app

RUN apt-get update && apt-get install -y \
    gcc g++ curl build-essential liblzma-dev xz-utils zlib1g-dev libssl-dev libffi-dev \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN uv venv /app/.venv && \
    VIRTUAL_ENV=/app/.venv uv pip install --index-strategy unsafe-best-match -r requirements.txt && \
    VIRTUAL_ENV=/app/.venv uv pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl


# Intermediate Stage – Cached image with preinstalled dependencies
FROM python:3.12-slim

RUN apt-get update && apt-get install -y \
    curl liblzma5 ca-certificates supervisor libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY --from=builder /app/.venv /app/.venv

# Warm up import cache for heavy Python libraries (reduces cold-start latency)
RUN /app/.venv/bin/python3 -c "\
import numpy, torch, torchvision, sentence_transformers, spacy, sqlalchemy, requests, lxml, newspaper, google.generativeai, langgraph, unstructured; \
"

RUN useradd --create-home --shell /bin/bash app && \
    chown -R app:app /app && \
    mkdir -p /tmp && chown app:app /tmp

USER app

ENV VIRTUAL_ENV=/app/.venv \
    PYTHONPATH=/app:/app/.venv/lib/python3.12/site-packages \
    PYTHONUNBUFFERED=1 \
    PATH="/app/.venv/bin:$PATH"

EXPOSE 8080        

