Why your ML/Python container dies after 60 seconds on Cloud Run

Works perfectly on my machine. Dead in 60 seconds on Google Cloud Run.

I burned an entire day on this for a side project - a Gemini-integrated academic paper processor that extracts structured data from research papers.

Turns out Cloud Run's CPU model is nothing like I expected, and environment differences matter way more than I realized.

The Setup

Pretty standard multi-stage Docker build. Python worker with two jobs:

  • /health endpoint so Cloud Run knows we're alive
  • Background worker pulling messages from Redis Streams

Dependencies included the usual ML/NLP suspects: torch, sentence_transformers, spacy, sqlalchemy, requests, unstructured, langgraph, google.generativeai. About 2GB of packages. You know, a light build.
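The worker's shape, roughly (a minimal sketch with hypothetical names; the real service consumes Redis Streams via XREADGROUP, stubbed out here as fetch_messages()):

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Tiny /health endpoint so Cloud Run's checks pass."""
    def do_GET(self):
        status = 200 if self.path == "/health" else 404
        self.send_response(status)
        self.end_headers()
        if status == 200:
            self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep request noise out of the logs

def fetch_messages():
    # Stub standing in for the Redis Streams consumer (XREADGROUP).
    return []

def handle(message):
    # Stub standing in for the paper-processing pipeline.
    pass

def worker_loop(stop_event, poll_interval=1.0):
    """Background job: keep polling for messages until asked to stop."""
    while not stop_event.is_set():
        for message in fetch_messages():
            handle(message)
        stop_event.wait(poll_interval)

def main():
    """Container entrypoint: start the poller thread, then serve /health."""
    stop = threading.Event()
    threading.Thread(target=worker_loop, args=(stop,), daemon=True).start()
    HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever()
```

Nothing fancy: one thread polls, the main thread answers health checks. Both of the problems below come from how this shape interacts with Cloud Run.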

Problem #1: Slow Package Imports

Locally, it worked, of course. Blazing fast. Under a second to start.

Deployed to GCP? Worker took over 60 seconds just to get through imports. Health endpoint would start fine, but the background worker just... sat there. Importing. Forever.

Eventually Cloud Run got tired of waiting and killed it.

The Usual Suspects (That Didn't Help)

I tried all the obvious stuff:

  • Lazy loading everything
  • Singleton patterns for model loading (sentence transformers & spacy)
  • Deferred database connections
  • Lazy initialization of the LangGraph pipeline

Looking back, I was optimizing the wrong things. None of it made a difference.

What Actually Helped

After way too many failed deploys, I added timing logs everywhere. Classic brute force debugging, but it worked.

Found the culprit: requests and torch were each taking 30-60 seconds to import. The exact same code ran in under a second locally.
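The timing logs boiled down to this pattern (a sketch; stdlib modules stand in for the heavy packages here):

```python
import importlib
import time

def time_import(name):
    """Import a module and return how long it took, in seconds."""
    start = time.perf_counter()
    importlib.import_module(name)
    return time.perf_counter() - start

for pkg in ("json", "ssl"):  # swap in "requests", "torch" inside the container
    print(f"{pkg}: {time_import(pkg) * 1000:.1f} ms")
```

One caveat: a module already in sys.modules comes back near-instantly on re-import, so measure each package in a fresh process to get honest numbers.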

Why?

requests: Turns out SSL certificate validation does a bunch of network calls. In GCP's VPC, these hit timeouts and retries.

torch: Needs to compile Python to bytecode AND load 500MB+ of C++ extensions. On my local machine with unrestricted network? Fast. In GCP's environment? Slow as hell.

Solution: Import Warming

I learned you can pre-import packages during the Docker build instead of waiting for runtime:

# Dockerfile.base
RUN /app/.venv/bin/python3 -c "\
import numpy; \
import torch; \
import torchvision; \
import sentence_transformers; \
import spacy; \
import sqlalchemy; \
import requests; \
import lxml; \
import newspaper; \
import google.generativeai; \
import langgraph; \
import unstructured; \
"
        

(Full Dockerfile at the bottom of the article)

This makes Python do all the expensive work once during build time:

  • Compiles everything to .pyc bytecode
  • Creates the dynamic linker cache and symbol tables that map Python bindings to their C++ implementations
  • Initializes C++ extensions
  • Does all the SSL validation

All of this gets baked into the Docker image. When the container starts in production, Python just loads the pre-compiled bytecode instead of doing everything from scratch.
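You can see the bytecode half of this directly (a sketch using a throwaway module; importing a module at build time writes the same __pycache__ artifact that compileall produces here):

```python
import compileall
import pathlib
import tempfile

# Write a tiny module, compile it ahead of time, and check that the
# cached bytecode lands in __pycache__ -- the same on-disk artifact
# that build-time import warming bakes into the image.
tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "warm_me.py").write_text("ANSWER = 41 + 1\n")
compileall.compile_dir(str(tmp), quiet=1)
pyc_files = list((tmp / "__pycache__").glob("warm_me.*.pyc"))
print(len(pyc_files))  # 1 -> bytecode cached on disk, ready for fast startup
```

A fresh process that imports warm_me now skips the compile step and loads the .pyc straight from disk.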

My Docker layer structure:

  • Base layers: OS packages + Python dependencies (change rarely)
  • Warm-up layer: pre-imported packages, compiled bytecode (changes only with dependencies)
  • App layer: application code (changes on every deploy)

Code changes only rebuild the top layer. The expensive import warming stays cached.

Result: Startup went from 60+ seconds to under a second.

Problem #2: The One That Nearly Broke Me

Fast startup. Health checks passing. Everything looked good.

But messages just... weren't getting processed. Redis stream was piling up. After about a minute, Cloud Run would kill the container.

I stared at logs for hours. No errors. No crashes. Just... nothing happening.

The CPU Throttling Trap

Turns out Cloud Run throttles CPU when it thinks your container is idle. And by "idle" it means "not actively serving HTTP requests."

To Cloud Run, the Redis stream subscriber looked like it was doing nothing, so it got essentially zero CPU. The worker couldn't poll, messages piled up, and Cloud Run concluded the container was broken and killed it.

This design makes total sense for HTTP services: you only need CPU while handling requests, and in the idle periods Cloud Run can throttle away and save money!

For background workers? It's a trap. You need CPU all the time to poll queues.

I spent way too long on this before I found the actual solution.

Solution: No Throttling

gcloud run services update paper-processor \
  --no-cpu-throttling \
  --min-instances=1
        

Two flags that change everything:

--no-cpu-throttling means CPU is always on, even when the container isn't serving HTTP requests. Your background worker can actually do work.

--min-instances=1 keeps at least one instance running. No cold starts when messages arrive.

Trade-off: costs more. About 3-4x more per month for an always-on worker.

Worth it if your worker needs to actually process things.

What I Learned

Import warming: At runtime, C++ libraries still load into memory, but all the expensive initialization (symbol resolution, operator registration, certificate validation) is already done.

CPU allocation: Cloud Run is optimized for HTTP services that are idle between requests. Background workers need continuous CPU. The default throttling model is fundamentally incompatible with polling workers.

Cloud Run's model makes sense for what it's designed for, but it's not obvious to someone who didn't read the freaking manual!

Why This Might Matter to You

These issues hit any Cloud Run service using:

ML/AI libraries: torch, tensorflow, transformers, sentence-transformers. All have heavy C++ initialization.

Data processing: pandas, numpy, scipy, polars. Same deal.

Background workers: Celery, RQ, custom workers, anything polling Redis/Pub/Sub. All need --no-cpu-throttling.

Document processing: unstructured, PyPDF2, pdf parsers, OCR. Heavy imports and C dependencies.

Real-time systems: WebSockets, SSE, long polling. Anything that's not request/response.

The combination of import warming + CPU configuration turned Cloud Run from "doesn't work for my use case" into something that actually runs in production for ML workloads and background workers.

#CloudRun #Docker #Python #MLOps #GCP #BackgroundWorkers #Redis

Dockerfile.base

# Builder Stage – Install all build dependencies and Python packages
FROM python:3.12-slim AS builder

COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app

RUN apt-get update && apt-get install -y \
    gcc g++ curl build-essential liblzma-dev xz-utils zlib1g-dev libssl-dev libffi-dev \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN uv venv /app/.venv && \
    VIRTUAL_ENV=/app/.venv uv pip install --index-strategy unsafe-best-match -r requirements.txt && \
    VIRTUAL_ENV=/app/.venv uv pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl


# Intermediate Stage – Cached image with preinstalled dependencies
FROM python:3.12-slim

RUN apt-get update && apt-get install -y \
    curl liblzma5 ca-certificates supervisor libgl1 libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY --from=builder /app/.venv /app/.venv

# Warm up import cache for heavy Python libraries (reduces cold-start latency)
RUN /app/.venv/bin/python3 -c "\
import numpy, torch, torchvision, sentence_transformers, spacy, sqlalchemy, requests, lxml, newspaper, google.generativeai, langgraph, unstructured; \
"

RUN useradd --create-home --shell /bin/bash app && \
    chown -R app:app /app && \
    mkdir -p /tmp && chown app:app /tmp

USER app

ENV VIRTUAL_ENV=/app/.venv \
    PYTHONPATH=/app:/app/.venv/lib/python3.12/site-packages \
    PYTHONUNBUFFERED=1 \
    PATH="/app/.venv/bin:$PATH"

EXPOSE 8080        

