I haven’t posted anything in a long time, but I want to share a small project I built that still matters.
The Problem
In distributed systems, failure is normal.
Services depend on other services, containers, or pods. When one of them becomes slow or unhealthy, requests don’t just fail — they pile up. Timeouts increase, goroutines block, resources get exhausted, and a single failure can cascade and take down parts of the system that were actually healthy.
This is the problem a circuit breaker is meant to solve.
The Idea
Instead of continuously calling a failing dependency, the system temporarily stops sending requests to it.
This protects resources, avoids retry storms, and gives the failing service time to recover. Once it looks healthy again, traffic is gradually allowed back.
The circuit breaker has three states:
- CLOSED → normal operation, requests pass through
- OPEN → dependency is failing, requests are rejected immediately
- HALF-OPEN → a limited number of probe requests are allowed; if they succeed, the breaker transitions back to CLOSED
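The three states can be read as a small state machine. Here is a minimal sketch of the transitions; the `State`, `Event`, and `Next` names are made up for illustration and are not the library's API. The OPEN to HALF-OPEN move on a timeout and the HALF-OPEN to OPEN move on a failed probe follow the conventional circuit breaker pattern and are assumptions here.

```go
package main

import "fmt"

// State is a hypothetical enum for the three breaker states.
type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

func (s State) String() string {
	switch s {
	case Closed:
		return "CLOSED"
	case Open:
		return "OPEN"
	default:
		return "HALF-OPEN"
	}
}

// Event is a hypothetical name for something the breaker observes.
type Event int

const (
	FailureThresholdReached Event = iota // enough failures while CLOSED
	TimeoutElapsed                       // the open window has passed (assumed trigger)
	ProbesSucceeded                      // the probe requests in HALF-OPEN succeeded
	ProbeFailed                          // a probe failed in HALF-OPEN (assumed to reopen)
)

// Next returns the state that follows s when ev occurs.
// Any pair not listed leaves the state unchanged.
func Next(s State, ev Event) State {
	switch {
	case s == Closed && ev == FailureThresholdReached:
		return Open
	case s == Open && ev == TimeoutElapsed:
		return HalfOpen
	case s == HalfOpen && ev == ProbesSucceeded:
		return Closed
	case s == HalfOpen && ev == ProbeFailed:
		return Open
	}
	return s
}

func main() {
	s := Closed
	events := []Event{FailureThresholdReached, TimeoutElapsed, ProbeFailed, TimeoutElapsed, ProbesSucceeded}
	for _, ev := range events {
		s = Next(s, ev)
		fmt.Println(s)
	}
	// Prints OPEN, HALF-OPEN, OPEN, HALF-OPEN, CLOSED on separate lines.
}
```

Keeping the transitions in one pure function makes them easy to test separately from any timing or concurrency concerns.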
Why It Matters for Big Data Pipelines
In high-throughput data pipelines, a slow downstream service doesn’t just delay one request. It can block an entire worker pool. A circuit breaker at the pipeline boundary means:
- Failed stages fail fast instead of backing up
- Worker goroutines aren’t stuck waiting on a dead dependency
- The rest of the pipeline keeps moving
What I Built
go-bigdata-breaker — a lightweight circuit breaker in Go, built specifically for high-throughput distributed pipelines.
Key design decisions:
- Zero external dependencies — pure Go
- Atomic state transitions — goroutine-safe without a global mutex on the hot path
- Configurable thresholds — failure count, timeout window, recovery probe count
- Minimal overhead — one atomic read per call in the closed state
Basic usage:
cb := breaker.New(breaker.Config{
    MaxFailures: 5,                // failure count threshold
    Timeout:     10 * time.Second, // timeout window
    MaxRequests: 2,                // half-open probe count
})
err := cb.Call(func() error {
    return callDownstreamService()
})
if errors.Is(err, breaker.ErrOpen) {
    // circuit is open: fail fast, don't call downstream
}
The Tradeoffs
A circuit breaker is not a retry mechanism. It’s the opposite — it deliberately stops retrying when a dependency is in trouble.
You still need:
- Retries with backoff for transient failures before the breaker opens
- Fallbacks for when the circuit is open (serve stale data, return a default, queue for later)
- Observability — if you can’t see when circuits are opening, you’re flying blind
The breaker is one layer in a resilience strategy, not the whole strategy.