Circuit Breakers in Distributed Systems — and the Go Library I Built

I haven’t posted anything in a long time, but I want to share a small project I built that I think still matters.

The Problem

In distributed systems, failure is normal.

Services depend on other services, containers, or pods. When one of them becomes slow or unhealthy, requests don’t just fail — they pile up. Timeouts increase, goroutines block, resources get exhausted, and a single failure can cascade and take down parts of the system that were actually healthy.

This is the problem a circuit breaker is meant to solve.

The Idea

Instead of continuously calling a failing dependency, the system temporarily stops sending requests to it.

This protects resources, avoids retry storms, and gives the failing service time to recover. Once it looks healthy again, traffic is gradually allowed back.

The circuit breaker has three states:

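CLOSED    → normal operation, requests pass through
OPEN      → dependency is failing, requests are rejected immediately; after a timeout the breaker moves to HALF-OPEN
HALF-OPEN → a limited number of probe requests are allowed; if they succeed, transition back to CLOSED, and if one fails, return to OPEN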
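
To make the transitions concrete, here is a minimal single-goroutine sketch of the state machine. It is an illustration only, not the library's code: the names Breaker, NewBreaker, maxFailures, and cooldown are assumptions made for this example, and it probes with a single request rather than a configurable number.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// State is one of the three circuit breaker states.
type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

// ErrOpen is returned when a call is rejected without reaching the dependency.
var ErrOpen = errors.New("circuit open")

// errDown simulates a failing dependency in the demo below.
var errDown = errors.New("downstream unavailable")

// Breaker is a deliberately simple, non-concurrent sketch (hypothetical type).
type Breaker struct {
	state       State
	failures    int           // failures counted since the last success
	maxFailures int           // failures that trip the breaker
	cooldown    time.Duration // how long to stay Open before probing
	openedAt    time.Time
}

func NewBreaker(maxFailures int, cooldown time.Duration) *Breaker {
	return &Breaker{maxFailures: maxFailures, cooldown: cooldown}
}

// Call runs fn unless the breaker is Open and the cooldown has not elapsed.
func (b *Breaker) Call(fn func() error) error {
	if b.state == Open {
		if time.Since(b.openedAt) < b.cooldown {
			return ErrOpen // fail fast, fn is never invoked
		}
		b.state = HalfOpen // cooldown over: let one probe through
	}

	if err := fn(); err != nil {
		b.failures++
		// A failed probe, or too many failures while Closed, opens the circuit.
		if b.state == HalfOpen || b.failures >= b.maxFailures {
			b.state = Open
			b.openedAt = time.Now()
		}
		return err
	}

	// Any success (including a passing probe) resets the breaker.
	b.state = Closed
	b.failures = 0
	return nil
}

func main() {
	b := NewBreaker(2, time.Minute)
	fail := func() error { return errDown }

	fmt.Println(b.Call(fail)) // first failure, breaker stays Closed
	fmt.Println(b.Call(fail)) // second failure trips the breaker
	fmt.Println(b.Call(fail)) // rejected with ErrOpen, fail is not called
}
```

A production breaker also has to be safe for concurrent use, which this sketch is not.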

Why It Matters for Big Data Pipelines

In high-throughput data pipelines, a slow downstream service doesn’t just delay one request. It can block an entire worker pool. A circuit breaker at the pipeline boundary means:

Failed stages fail fast instead of backing up
Worker goroutines aren’t stuck waiting on a dead dependency
The rest of the pipeline keeps moving
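
As a rough illustration of the points above, the sketch below runs a stage with a fixed worker pool and routes every downstream write through a breaker-like gate. The gate type, runStage, and the write callback are hypothetical stand-ins, not part of any real library. The point is only that a rejected record is set aside immediately instead of holding a worker.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// ErrOpen stands in for a breaker's "rejected" error.
var ErrOpen = errors.New("circuit open")

// gate is a toy breaker: once open is set, every call is rejected.
type gate struct{ open atomic.Bool }

func (g *gate) Call(fn func() error) error {
	if g.open.Load() {
		return ErrOpen // no downstream call, no waiting
	}
	return fn()
}

// runStage feeds records to a fixed pool of workers and reports how many
// were written and how many were set aside because the gate was open.
// Other errors are simply dropped in this sketch.
func runStage(g *gate, records []int, workers int, write func(int) error) (written, deferred int64) {
	in := make(chan int)
	var wg sync.WaitGroup
	var nWritten, nDeferred atomic.Int64

	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range in {
				err := g.Call(func() error { return write(r) })
				switch {
				case err == nil:
					nWritten.Add(1)
				case errors.Is(err, ErrOpen):
					nDeferred.Add(1) // e.g. hand off to a retry queue
				}
			}
		}()
	}
	for _, r := range records {
		in <- r
	}
	close(in)
	wg.Wait()
	return nWritten.Load(), nDeferred.Load()
}

func main() {
	g := &gate{}
	write := func(int) error { return nil }

	fmt.Println(runStage(g, []int{1, 2, 3, 4}, 2, write)) // gate closed: all written
	g.open.Store(true)
	fmt.Println(runStage(g, []int{5, 6, 7, 8}, 2, write)) // gate open: all deferred
}
```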

What I Built

go-bigdata-breaker — a lightweight circuit breaker in Go, built specifically for high-throughput distributed pipelines.

Key design decisions:

Zero external dependencies — pure Go
Atomic state transitions — goroutine-safe without a global mutex on the hot path
Configurable thresholds — failure count, timeout window, recovery probe count
Minimal overhead — one atomic read per call in the closed state

cb := breaker.New(breaker.Config{
    MaxFailures: 5,                // failure count
    Timeout:     10 * time.Second, // timeout window
    MaxRequests: 2,                // half-open probe count
})

err := cb.Call(func() error {
    return callDownstreamService()
})

if errors.Is(err, breaker.ErrOpen) {
    // circuit is open: fail fast, don't call downstream
} else if err != nil {
    // any other error: handle it as a normal failure
}
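
The "atomic state transitions" decision listed above can be illustrated with a compare-and-swap on the state word. What follows is a sketch of the general technique, not the library's actual implementation: atomicBreaker, allow, and recordFailure are hypothetical names, the half-open logic is left out, and the atomic.Int32 type assumes Go 1.19 or newer.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const (
	stateClosed int32 = iota
	stateOpen
)

// atomicBreaker is a hypothetical type showing only the state handling.
type atomicBreaker struct {
	state     atomic.Int32 // stateClosed or stateOpen
	failures  atomic.Int64 // failures recorded so far
	threshold int64        // failures needed to trip the breaker
}

// allow is the hot path: a single atomic load, no mutex.
func (b *atomicBreaker) allow() bool {
	return b.state.Load() != stateOpen
}

// recordFailure counts a failure and trips the breaker once the threshold
// is reached. CompareAndSwap lets exactly one goroutine perform the
// CLOSED -> OPEN transition, so any "on open" work would run only once.
func (b *atomicBreaker) recordFailure() (tripped bool) {
	if b.failures.Add(1) < b.threshold {
		return false
	}
	return b.state.CompareAndSwap(stateClosed, stateOpen)
}

func main() {
	b := &atomicBreaker{threshold: 5}
	var wg sync.WaitGroup
	var trips atomic.Int32

	// 100 goroutines report failures concurrently.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if b.recordFailure() {
				trips.Add(1)
			}
		}()
	}
	wg.Wait()

	fmt.Println("allowed:", b.allow(), "trips:", trips.Load())
}
```

However the goroutines interleave, only one of them wins the compare-and-swap, so the program reports a single trip and the breaker ends up rejecting calls.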

The Tradeoffs

A circuit breaker is not a retry mechanism. It’s the opposite: it deliberately stops sending calls to a dependency that is in trouble.

You still need:

Retries with backoff for transient failures before the breaker opens
Fallbacks for when the circuit is open (serve stale data, return a default, queue for later)
Observability — if you can’t see when circuits are opening, you’re flying blind

The breaker is one layer in a resilience strategy, not the whole strategy.
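
One way these layers might fit together is sketched below: retries with exponential backoff around a breaker, an early exit once the breaker reports it is open, and a stale cached value as the fallback. Everything here is hypothetical scaffolding for the example (the Caller interface, fetchWithFallback, and the two toy breakers); a real breaker would be plugged in where passthrough and alwaysOpen are used.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrOpen stands in for a breaker's sentinel error.
var ErrOpen = errors.New("circuit open")

// errTransient simulates a short-lived downstream failure.
var errTransient = errors.New("transient failure")

// Caller is the minimal behaviour assumed of a breaker.
type Caller interface {
	Call(fn func() error) error
}

// fetchWithFallback retries failures with exponential backoff, stops
// retrying as soon as the breaker is open, and returns the cached value
// (fresh == false) when no fresh value could be fetched.
func fetchWithFallback(cb Caller, fetch func() (string, error), cached string,
	attempts int, base time.Duration) (value string, fresh bool) {
	delay := base
	for i := 0; i < attempts; i++ {
		var v string
		err := cb.Call(func() error {
			var ferr error
			v, ferr = fetch()
			return ferr
		})
		if err == nil {
			return v, true
		}
		if errors.Is(err, ErrOpen) {
			break // retrying an open circuit only adds load
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return cached, false
}

// passthrough and alwaysOpen are toy breakers for the demo.
type passthrough struct{}

func (passthrough) Call(fn func() error) error { return fn() }

type alwaysOpen struct{}

func (alwaysOpen) Call(func() error) error { return ErrOpen }

func main() {
	calls := 0
	flaky := func() (string, error) {
		calls++
		if calls < 3 {
			return "", errTransient
		}
		return "fresh", nil
	}

	// Succeeds on the third attempt.
	fmt.Println(fetchWithFallback(passthrough{}, flaky, "stale", 5, time.Millisecond))
	// Breaker is open: no downstream call, stale value served.
	fmt.Println(fetchWithFallback(alwaysOpen{}, flaky, "stale", 5, time.Millisecond))
}
```

For observability, the branch that breaks out on ErrOpen is a natural place to increment a counter or emit a log line.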

Source on GitHub