Crash Recovery in Pynenc Task Orchestration Framework

I ran `kill -9` on a Python worker processing three tasks. They vanished: no error, no retry, no record.

This is the default behavior of most task frameworks: a worker dies mid-execution, and the work disappears.

So I built automatic crash recovery into pynenc, an open-source distributed task orchestration framework for Python. Here's what it does:

• Every runner emits periodic heartbeats
• When heartbeats stop, the recovery service detects the dead runner
• Orphaned tasks are automatically re-queued
• A healthy runner picks them up and finishes the job

No external monitoring. No manual re-queueing scripts. No lost work.

I wrote up the full scenario, including a runnable demo you can try locally with zero dependencies (no Docker, no Redis): https://lnkd.in/ehWVK-3p

The demo takes about 90 seconds and shows recovery happening end-to-end.

How does your team handle crashed workers today?

#python #distributedsystems #opensource #backend #reliability
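
For anyone curious how the heartbeat idea works in principle, here is a minimal in-memory sketch. It is not pynenc's API or implementation: `ToyBroker`, its methods, and the 5-second timeout are all hypothetical names and values chosen for illustration, and time is passed in explicitly so the example stays deterministic.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyBroker:
    """Hypothetical in-memory stand-in for a broker; NOT pynenc's API."""

    heartbeat_timeout: float = 5.0  # assumed value, for illustration only
    queue: deque = field(default_factory=deque)          # task ids waiting for a runner
    last_heartbeat: dict = field(default_factory=dict)   # runner_id -> time of last beat
    running: dict = field(default_factory=dict)          # task_id -> runner_id

    def heartbeat(self, runner_id: str, now: float) -> None:
        # A live runner calls this periodically.
        self.last_heartbeat[runner_id] = now

    def claim(self, runner_id: str, now: float):
        # Hand the next queued task to a runner and remember who owns it.
        self.heartbeat(runner_id, now)
        if not self.queue:
            return None
        task_id = self.queue.popleft()
        self.running[task_id] = runner_id
        return task_id

    def complete(self, task_id: str) -> None:
        self.running.pop(task_id, None)

    def recover(self, now: float) -> list:
        # Runners whose last heartbeat is older than the timeout count as dead.
        dead = {
            runner_id
            for runner_id, beat in self.last_heartbeat.items()
            if now - beat > self.heartbeat_timeout
        }
        # Tasks still owned by a dead runner are orphaned: put them back on the queue.
        orphaned = [t for t, r in self.running.items() if r in dead]
        for task_id in orphaned:
            del self.running[task_id]
            self.queue.append(task_id)
        for runner_id in dead:
            del self.last_heartbeat[runner_id]
        return orphaned


broker = ToyBroker(heartbeat_timeout=5.0)
broker.queue.extend(["task-1", "task-2", "task-3"])

# runner-a claims all three tasks at t=0, then is killed: no further heartbeats.
for _ in range(3):
    broker.claim("runner-a", now=0.0)

# runner-b is healthy and still beating when the recovery pass runs at t=10.
broker.heartbeat("runner-b", now=8.0)
requeued = broker.recover(now=10.0)

# runner-b picks up the re-queued tasks and finishes them.
finished = []
while (task := broker.claim("runner-b", now=10.0)) is not None:
    broker.complete(task)
    finished.append(task)
```

One design consequence of this style of recovery: a task that was partly executed before the crash runs again from the start, so tasks generally need to be safe to repeat.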
