Why your ML/Python container dies after 60 seconds on Cloud Run
Works perfectly on my machine. Dead in 60 seconds on Google Cloud Run.
I burned an entire day on this for a side project - a Gemini-integrated academic paper processor that extracts structured data from research papers.
Turns out Cloud Run's CPU model is nothing like I expected, and environment differences matter way more than I realized.
The Setup
Pretty standard multi-stage Docker build. Python worker with two jobs: serve an HTTP health endpoint, and consume messages from a Redis stream in the background.
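Roughly this shape, as a sketch. In the real image the two jobs run as separate processes under supervisord; a single process with a thread shows the idea, and all names here are placeholders:
# app.py - the two-job shape, sketched as one process (hypothetical names)
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Health(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

def worker_loop():
    while True:
        ...  # consume the Redis stream, call Gemini, store results

# The health endpoint satisfies Cloud Run's HTTP contract;
# the worker thread does the actual processing.
threading.Thread(target=worker_loop, daemon=True).start()
HTTPServer(("0.0.0.0", 8080), Health).serve_forever()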
Dependencies included the usual ML/NLP suspects: torch, sentence_transformers, spacy, sqlalchemy, requests, unstructured, langgraph, google.generativeai. About 2GB of packages. You know, a light build.
Problem #1: Slow Package Imports
Locally, it worked, of course. Blazing fast. Under a second to start.
Deployed to GCP? The worker took over 60 seconds just to get through imports. The health endpoint would start fine, but the background worker just... sat there. Importing. Forever.
Eventually Cloud Run got tired of waiting and killed it.
The Usual Suspects (That Didn't Help)
I tried all the obvious stuff first. Looking back, I was optimizing the wrong things; none of it made a difference.
What Actually Helped
After way too many failed deploys, I added timing logs everywhere. Classic brute force debugging, but it worked.
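The timing harness was nothing fancy; something like this throwaway sketch (the module list is illustrative, and python -X importtime gives you a similar per-module breakdown for free):
# time_imports.py - crude per-package import timer (a sketch)
import importlib
import time

for name in ["requests", "torch", "sentence_transformers", "spacy"]:
    start = time.perf_counter()
    importlib.import_module(name)
    print(f"import {name}: {time.perf_counter() - start:.1f}s")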
Found the culprit: requests and torch were each taking 30-60 seconds to import. The exact same code ran in under a second locally.
Why?
requests: Turns out SSL certificate validation does a bunch of network calls. In GCP's VPC, these hit timeouts and retries.
torch: Needs to compile Python to bytecode AND load 500MB+ of C++ extensions. On my local machine with unrestricted network? Fast. In GCP's environment? Slow as hell.
Solution: Import Warming
I learned you can pre-import packages during the Docker build instead of waiting for runtime:
# Dockerfile.base
RUN /app/.venv/bin/python3 -c "\
import numpy; \
import torch; \
import torchvision; \
import sentence_transformers; \
import spacy; \
import sqlalchemy; \
import requests; \
import lxml; \
import newspaper; \
import google.generativeai; \
import langgraph; \
import unstructured; \
"
(Full Dockerfile at the bottom of the article)
This makes Python do all the expensive work once, at build time. The compiled bytecode (and anything else the imports write to disk) gets baked into the Docker image, so when the container starts in production, Python loads the pre-compiled bytecode instead of doing everything from scratch.
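To sanity-check that the warming layer actually did something, you can look for the cached bytecode inside the built image. A quick sketch:
# check_warm.py - confirm the warming layer baked in bytecode (a sketch)
import os
import torch

# __cached__ points at the pre-compiled .pyc under __pycache__; if it
# exists on disk, the build-time import already paid the compile cost.
print(torch.__cached__, os.path.exists(torch.__cached__))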
My Docker layer structure, from bottom to top:
Base: python:3.12-slim plus the system packages
Dependencies: the virtualenv, copied in from the builder stage
Warming: the pre-import layer above
App code: copied in last
Code changes only rebuild the top layer. The expensive import warming stays cached.
Result: Startup went from 60+ seconds to under a second.
Problem #2: The One That Nearly Broke Me
Fast startup. Health checks passing. Everything looked good.
But messages just... weren't getting processed. Redis stream was piling up. After about a minute, Cloud Run would kill the container.
I stared at logs for hours. No errors. No crashes. Just... nothing happening.
The CPU Throttling Trap
Turns out Cloud Run throttles CPU when it thinks your container is "idle." And by "idle" it means "not actively serving HTTP requests."
To Cloud Run, the Redis stream subscriber looked like it was doing nothing, so it got essentially zero CPU. The worker couldn't poll, messages piled up, and Cloud Run assumed the container was broken and killed it.
This design makes total sense for HTTP services: you only need CPU while handling requests, so in the idle periods, throttle away and save money!
For background workers? It's a trap. You need CPU all the time to poll queues.
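For reference, the poll loop looks roughly like this. A minimal sketch using redis-py consumer groups; the stream, group, and consumer names are placeholders, and process() stands in for the actual Gemini extraction:
# worker.py - rough shape of the stream consumer (names are placeholders)
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def process(fields):
    ...  # extract structured data from the paper via Gemini

while True:
    # Block up to 5s waiting for new entries. Under CPU throttling this
    # loop barely gets scheduled, so the stream backs up with no errors.
    entries = r.xreadgroup("papers", "worker-1", {"paper-jobs": ">"},
                           count=10, block=5000)
    for _stream, messages in entries or []:
        for msg_id, fields in messages:
            process(fields)
            r.xack("paper-jobs", "papers", msg_id)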
I spent way too long on this before I found the actual solution.
Solution: No Throttling
gcloud run services update paper-processor \
--no-cpu-throttling \
--min-instances=1
Two flags that change everything:
--no-cpu-throttling means CPU is always on, even when the container isn't serving HTTP requests. Your background worker can actually do work.
--min-instances=1 keeps at least one instance running. No cold starts when messages arrive.
Trade-off: costs more. About 3-4x more per month for an always-on worker.
Worth it if your worker needs to actually process things.
What I Learned
Import warming: At runtime, C++ libraries still load into memory, but all the expensive initialization (symbol resolution, operator registration, certificate validation) is already done.
CPU allocation: Cloud Run is optimized for HTTP services that are idle between requests. Background workers need continuous CPU. The default throttling model is fundamentally incompatible with polling workers.
Cloud Run's model makes sense for what it's designed for, but it's not obvious to someone who didn't read the freaking manual!
Why This Might Matter to You
These issues hit any Cloud Run service using:
ML/AI libraries: torch, tensorflow, transformers, sentence-transformers. All have heavy C++ initialization.
Data processing: pandas, numpy, scipy, polars. Same deal.
Background workers: Celery, RQ, custom workers, anything polling Redis/Pub/Sub. All need --no-cpu-throttling.
Document processing: unstructured, PyPDF2, other PDF parsers, OCR. Heavy imports and C dependencies.
Real-time systems: WebSockets, SSE, long polling. Anything that's not request/response.
The combination of import warming + CPU configuration turned Cloud Run from "doesn't work for my use case" into something that actually runs in production for ML workloads and background workers.
#CloudRun #Docker #Python #MLOps #GCP #BackgroundWorkers #Redis
Dockerfile.base
# Builder Stage – Install all build dependencies and Python packages
FROM python:3.12-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
RUN apt-get update && apt-get install -y \
gcc g++ curl build-essential liblzma-dev xz-utils zlib1g-dev libssl-dev libffi-dev \
&& rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN uv venv /app/.venv && \
VIRTUAL_ENV=/app/.venv uv pip install --index-strategy unsafe-best-match -r requirements.txt && \
VIRTUAL_ENV=/app/.venv uv pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl
# Intermediate Stage – Cached image with preinstalled dependencies
FROM python:3.12-slim
RUN apt-get update && apt-get install -y \
curl liblzma5 ca-certificates supervisor libgl1 libglib2.0-0 \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
# Warm up import cache for heavy Python libraries (reduces cold-start latency)
RUN /app/.venv/bin/python3 -c "\
import numpy, torch, torchvision, sentence_transformers, spacy, sqlalchemy, requests, lxml, newspaper, google.generativeai, langgraph, unstructured; \
"
RUN useradd --create-home --shell /bin/bash app && \
chown -R app:app /app && \
mkdir -p /tmp && chown app:app /tmp
USER app
ENV VIRTUAL_ENV=/app/.venv \
PYTHONPATH=/app:/app/.venv/lib/python3.12/site-packages \
PYTHONUNBUFFERED=1 \
PATH="/app/.venv/bin:$PATH"
EXPOSE 8080