Module 23: Circuit Breakers

What is a Circuit Breaker?

A circuit breaker prevents cascading failures by stopping requests to a failing service.
Problem: Cascading Failures

  Service A ──► Service B ──► Service C (SLOW/DOWN)
      │             │             │
      ▼             ▼             ▼
   timeout       timeout     overloaded
   threads       threads
  exhausted     exhausted
      │             │
      ▼             ▼
   FAILURE       FAILURE

Solution: Circuit Breaker

  Service A ──► [Circuit Breaker] ──► Service B
                        │
                ┌───────┴───────┐
                │               │
             CLOSED            OPEN
            (normal)       (fail fast)

  When B fails repeatedly:
    Circuit OPENS → Requests fail immediately
    No more load on failing service
    A can return fallback or error quickly

Circuit Breaker States

                  ┌──────────────┐
                  │    CLOSED    │ ◄── Normal operation
                  └──────┬───────┘
                         │
                 Failure threshold
                     exceeded
                         │
                         ▼
                  ┌──────────────┐
                  │     OPEN     │ ◄── Fail fast
                  └──────┬───────┘
                         │
                  Timeout expires
                         │
                         ▼
                  ┌──────────────┐
                  │  HALF-OPEN   │ ◄── Testing recovery
                  └──────┬───────┘
                         │
       ┌─────────────────┼─────────────────┐
       │                 │                 │
    Success           Failure          More tests
       │                 │                 │
       ▼                 ▼                 ▼
   ┌────────┐        ┌────────┐    (stay HALF-OPEN)
   │ CLOSED │        │  OPEN  │
   └────────┘        └────────┘

State Details:

  CLOSED:
    - Normal operation
    - Requests pass through
    - Track failures

  OPEN:
    - Fail immediately
    - Return fallback or error
    - Don't call downstream

  HALF-OPEN:
    - Allow limited requests
    - Test if service recovered
    - Success → CLOSED, Failure → OPEN
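The transitions in the state diagram can be written down as a small pure function. The sketch below is only an illustration of the diagram: the `Event` type and its constants are hypothetical names introduced here, and a real breaker derives these events from counters and timers rather than receiving them directly.

```go
package main

import "fmt"

// State mirrors the three states in the diagram.
type State int

const (
	Closed State = iota
	Open
	HalfOpen
)

// String returns the label used in the diagram.
func (s State) String() string {
	return [...]string{"CLOSED", "OPEN", "HALF-OPEN"}[s]
}

// Event is a hypothetical type naming what can happen to a breaker.
type Event int

const (
	FailureThresholdExceeded Event = iota
	TimeoutExpired
	ProbeSucceeded
	ProbeFailed
)

// Next returns the state that follows s when event e occurs.
// Events that do not apply to the current state leave it unchanged.
func Next(s State, e Event) State {
	switch {
	case s == Closed && e == FailureThresholdExceeded:
		return Open
	case s == Open && e == TimeoutExpired:
		return HalfOpen
	case s == HalfOpen && e == ProbeSucceeded:
		return Closed
	case s == HalfOpen && e == ProbeFailed:
		return Open
	}
	return s
}

func main() {
	s := Closed
	events := []Event{FailureThresholdExceeded, TimeoutExpired, ProbeFailed, TimeoutExpired, ProbeSucceeded}
	for _, e := range events {
		s = Next(s, e)
		fmt.Println(s)
	}
}
```

Keeping the transition rules in one function like this makes them easy to test in isolation from locking and timing concerns.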

Circuit Breaker Implementation

go
package circuitbreaker

import (
	"context"
	"errors"
	"sync"
	"time"
)

// State represents circuit breaker state
type State int

const (
	StateClosed State = iota
	StateOpen
	StateHalfOpen
)

func (s State) String() string {
	switch s {
	case StateClosed:
		return "CLOSED"
	case StateOpen:
		return "OPEN"
	case StateHalfOpen:
		return "HALF-OPEN"
	default:
		return "UNKNOWN"
	}
}

var (
	ErrCircuitOpen     = errors.New("circuit breaker is open")
	ErrTooManyRequests = errors.New("too many requests in half-open state")
)

// CircuitBreaker implements the circuit breaker pattern
type CircuitBreaker struct {
	mu sync.RWMutex

	name  string
	state State

	// Configuration
	failureThreshold    int           // Consecutive failures before opening
	successThreshold    int           // Successes in half-open before closing
	timeout             time.Duration // Time to stay open
	halfOpenMaxRequests int           // Max requests admitted in half-open

	// State tracking
	failures         int
	successes        int
	halfOpenRequests int
	lastFailure      time.Time
	openedAt         time.Time

	// Callbacks
	onStateChange func(from, to State)
}

// Config holds circuit breaker configuration
type Config struct {
	Name                string
	FailureThreshold    int
	SuccessThreshold    int
	Timeout             time.Duration
	HalfOpenMaxRequests int
	OnStateChange       func(from, to State)
}

// DefaultConfig returns default configuration
func DefaultConfig(name string) Config {
	return Config{
		Name:                name,
		FailureThreshold:    5,
		SuccessThreshold:    2,
		Timeout:             30 * time.Second,
		HalfOpenMaxRequests: 3,
	}
}

// New creates a new circuit breaker
func New(config Config) *CircuitBreaker {
	cb := &CircuitBreaker{
		name:                config.Name,
		state:               StateClosed,
		failureThreshold:    config.FailureThreshold,
		successThreshold:    config.SuccessThreshold,
		timeout:             config.Timeout,
		halfOpenMaxRequests: config.HalfOpenMaxRequests,
		onStateChange:       config.OnStateChange,
	}
	// Half-open must admit at least as many probes as are needed to close;
	// otherwise the breaker could never leave the half-open state.
	if cb.halfOpenMaxRequests < cb.successThreshold {
		cb.halfOpenMaxRequests = cb.successThreshold
	}
	return cb
}

// Execute runs the given function with circuit breaker protection
func (cb *CircuitBreaker) Execute(fn func() (interface{}, error)) (interface{}, error) {
	if err := cb.allowRequest(); err != nil {
		return nil, err
	}

	result, err := fn()
	cb.recordResult(err)
	return result, err
}

// ExecuteWithContext runs with context support
func (cb *CircuitBreaker) ExecuteWithContext(
	ctx context.Context,
	fn func(context.Context) (interface{}, error),
) (interface{}, error) {
	if err := cb.allowRequest(); err != nil {
		return nil, err
	}

	result, err := fn(ctx)
	cb.recordResult(err)
	return result, err
}

// allowRequest checks if request is allowed
func (cb *CircuitBreaker) allowRequest() error {
	cb.mu.Lock()
	defer cb.mu.Unlock()

	switch cb.state {
	case StateClosed:
		return nil

	case StateOpen:
		// Check if timeout has expired
		if time.Since(cb.openedAt) > cb.timeout {
			cb.transitionTo(StateHalfOpen)
			cb.halfOpenRequests = 1 // this request is the first probe
			return nil
		}
		return ErrCircuitOpen

	case StateHalfOpen:
		if cb.halfOpenRequests >= cb.halfOpenMaxRequests {
			return ErrTooManyRequests
		}
		cb.halfOpenRequests++
		return nil
	}

	return nil
}

// recordResult updates state based on result
func (cb *CircuitBreaker) recordResult(err error) {
	cb.mu.Lock()
	defer cb.mu.Unlock()

	if err != nil {
		cb.onFailure()
	} else {
		cb.onSuccess()
	}
}

func (cb *CircuitBreaker) onSuccess() {
	switch cb.state {
	case StateClosed:
		// Any success resets the consecutive-failure count
		cb.failures = 0
	case StateHalfOpen:
		cb.successes++
		if cb.successes >= cb.successThreshold {
			cb.transitionTo(StateClosed)
		}
	}
}

func (cb *CircuitBreaker) onFailure() {
	cb.lastFailure = time.Now()

	switch cb.state {
	case StateClosed:
		cb.failures++
		if cb.failures >= cb.failureThreshold {
			cb.transitionTo(StateOpen)
		}
	case StateHalfOpen:
		// A single failed probe reopens the circuit
		cb.transitionTo(StateOpen)
	}
}

// transitionTo changes state; callers must hold cb.mu
func (cb *CircuitBreaker) transitionTo(state State) {
	if cb.state == state {
		return
	}

	from := cb.state
	cb.state = state

	// Reset counters
	cb.failures = 0
	cb.successes = 0
	cb.halfOpenRequests = 0

	if state == StateOpen {
		cb.openedAt = time.Now()
	}

	if cb.onStateChange != nil {
		// Run the callback outside the lock so it cannot block the breaker
		go cb.onStateChange(from, state)
	}
}

// State returns current state
func (cb *CircuitBreaker) State() State {
	cb.mu.RLock()
	defer cb.mu.RUnlock()
	return cb.state
}

// Stats holds circuit breaker statistics
type Stats struct {
	State               State
	Failures            int
	Successes           int
	ConsecutiveFailures int
	LastFailure         time.Time
}

// Stats returns a snapshot of the current counters
func (cb *CircuitBreaker) Stats() Stats {
	cb.mu.RLock()
	defer cb.mu.RUnlock()

	return Stats{
		State:               cb.state,
		Failures:            cb.failures,
		Successes:           cb.successes,
		ConsecutiveFailures: cb.failures, // failures is reset on every success while closed
		LastFailure:         cb.lastFailure,
	}
}

// Reset manually resets the circuit breaker
func (cb *CircuitBreaker) Reset() {
	cb.mu.Lock()
	defer cb.mu.Unlock()
	cb.transitionTo(StateClosed)
}
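`Execute` above returns `interface{}`, so every caller needs a type assertion. On Go 1.18 or later, one possible alternative is a generic helper. The sketch below is an assumption-laden illustration, not part of the implementation above: the `Guard` interface, `Do`, and `countingGuard` are hypothetical names, and `countingGuard` is a deliberately simplified, non-concurrent stand-in for a breaker.

```go
package main

import (
	"errors"
	"fmt"
)

// Guard is a hypothetical two-method view of a breaker: Allow reports
// whether a call may proceed, Record feeds the outcome back.
type Guard interface {
	Allow() error
	Record(err error)
}

// Do runs fn behind g and returns a typed result, so callers avoid
// interface{} type assertions.
func Do[T any](g Guard, fn func() (T, error)) (T, error) {
	var zero T
	if err := g.Allow(); err != nil {
		return zero, err
	}
	v, err := fn()
	g.Record(err)
	if err != nil {
		return zero, err
	}
	return v, nil
}

var (
	errOpen       = errors.New("circuit breaker is open")
	errDownstream = errors.New("downstream failed")
)

// countingGuard is a toy guard that rejects calls once it has seen
// limit consecutive failures. It is not safe for concurrent use.
type countingGuard struct {
	failures, limit int
}

func (c *countingGuard) Allow() error {
	if c.failures >= c.limit {
		return errOpen
	}
	return nil
}

func (c *countingGuard) Record(err error) {
	if err != nil {
		c.failures++
		return
	}
	c.failures = 0
}

func main() {
	g := &countingGuard{limit: 2}

	n, err := Do(g, func() (int, error) { return 42, nil })
	fmt.Println(n, err)

	// Two consecutive failures trip the toy guard.
	for i := 0; i < 2; i++ {
		_, err = Do(g, func() (int, error) { return 0, errDownstream })
		fmt.Println(err)
	}

	// The next call is rejected without running the function.
	_, err = Do(g, func() (int, error) { return 1, nil })
	fmt.Println(err)
}
```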

Advanced Circuit Breaker

go
package circuitbreaker

import (
	"sync"
	"time"
)

// AdvancedCircuitBreaker with sliding window and more features
type AdvancedCircuitBreaker struct {
	mu sync.RWMutex

	name  string
	state State

	// Sliding window for failure tracking
	window     *SlidingWindow
	windowSize time.Duration

	// Thresholds
	failureRateThreshold float64 // e.g., 0.5 = 50%
	minimumRequests      int     // Minimum requests before calculating rate

	// Half-open configuration
	halfOpenTimeout time.Duration // Time to stay open before probing
	halfOpenMax     int           // Max requests admitted in half-open

	// State
	halfOpenRequests int
	openedAt         time.Time

	// Fallback
	fallback func() (interface{}, error)

	// Callbacks
	onStateChange func(name string, from, to State)
	onSuccess     func(name string, duration time.Duration)
	onFailure     func(name string, err error)
}

// SlidingWindow tracks requests in a time window
type SlidingWindow struct {
	mu          sync.Mutex
	buckets     []*Bucket
	size        time.Duration
	bucketCount int
	bucketSize  time.Duration
}

type Bucket struct {
	successes int
	failures  int
	startTime time.Time
}

func NewSlidingWindow(size time.Duration, bucketCount int) *SlidingWindow {
	w := &SlidingWindow{
		buckets:     make([]*Bucket, bucketCount),
		size:        size,
		bucketCount: bucketCount,
		bucketSize:  size / time.Duration(bucketCount),
	}
	w.resetLocked(time.Now())
	return w
}

// resetLocked replaces every bucket with an empty one so that the newest
// bucket starts at now. Callers must hold w.mu (or own w exclusively).
func (w *SlidingWindow) resetLocked(now time.Time) {
	for i := range w.buckets {
		w.buckets[i] = &Bucket{
			startTime: now.Add(-time.Duration(w.bucketCount-i-1) * w.bucketSize),
		}
	}
}

func (w *SlidingWindow) RecordSuccess() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.currentBucket().successes++
}

func (w *SlidingWindow) RecordFailure() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.currentBucket().failures++
}

// currentBucket rotates expired buckets out and returns the newest one.
// Callers must hold w.mu.
func (w *SlidingWindow) currentBucket() *Bucket {
	now := time.Now()
	lastBucket := w.buckets[len(w.buckets)-1]
	elapsed := now.Sub(lastBucket.startTime)

	if elapsed >= w.bucketSize {
		rotations := int(elapsed / w.bucketSize)
		if rotations >= w.bucketCount {
			// The whole window has expired. Start over aligned to now;
			// capping the rotation count instead would leave the newest
			// bucket in the past and discard fresh data on the next call.
			w.resetLocked(now)
			return w.buckets[len(w.buckets)-1]
		}

		for i := 0; i < rotations; i++ {
			// Shift buckets left
			copy(w.buckets, w.buckets[1:])
			// Add new bucket
			w.buckets[len(w.buckets)-1] = &Bucket{
				startTime: lastBucket.startTime.Add(w.bucketSize * time.Duration(i+1)),
			}
		}
	}

	return w.buckets[len(w.buckets)-1]
}

func (w *SlidingWindow) Stats() (successes, failures int) {
	w.mu.Lock()
	defer w.mu.Unlock()

	// Force bucket rotation
	w.currentBucket()

	for _, b := range w.buckets {
		successes += b.successes
		failures += b.failures
	}
	return
}

func (w *SlidingWindow) FailureRate() float64 {
	successes, failures := w.Stats()
	total := successes + failures
	if total == 0 {
		return 0
	}
	return float64(failures) / float64(total)
}

func (w *SlidingWindow) TotalRequests() int {
	successes, failures := w.Stats()
	return successes + failures
}

func (w *SlidingWindow) Reset() {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.resetLocked(time.Now())
}

// AdvancedConfig for advanced circuit breaker
type AdvancedConfig struct {
	Name                 string
	WindowSize           time.Duration
	BucketCount          int
	FailureRateThreshold float64
	MinimumRequests      int
	HalfOpenTimeout      time.Duration
	HalfOpenMax          int
	Fallback             func() (interface{}, error)
	OnStateChange        func(name string, from, to State)
	OnSuccess            func(name string, duration time.Duration)
	OnFailure            func(name string, err error)
}

func NewAdvanced(config AdvancedConfig) *AdvancedCircuitBreaker {
	return &AdvancedCircuitBreaker{
		name:                 config.Name,
		state:                StateClosed,
		window:               NewSlidingWindow(config.WindowSize, config.BucketCount),
		windowSize:           config.WindowSize,
		failureRateThreshold: config.FailureRateThreshold,
		minimumRequests:      config.MinimumRequests,
		halfOpenTimeout:      config.HalfOpenTimeout,
		halfOpenMax:          config.HalfOpenMax,
		fallback:             config.Fallback,
		onStateChange:        config.OnStateChange,
		onSuccess:            config.OnSuccess,
		onFailure:            config.OnFailure,
	}
}

func (cb *AdvancedCircuitBreaker) Execute(fn func() (interface{}, error)) (interface{}, error) {
	if err := cb.allowRequest(); err != nil {
		if cb.fallback != nil {
			return cb.fallback()
		}
		return nil, err
	}

	start := time.Now()
	result, err := fn()
	duration := time.Since(start)

	cb.recordResult(err, duration)
	return result, err
}

func (cb *AdvancedCircuitBreaker) allowRequest() error {
	cb.mu.Lock()
	defer cb.mu.Unlock()

	switch cb.state {
	case StateClosed:
		return nil

	case StateOpen:
		if time.Since(cb.openedAt) > cb.halfOpenTimeout {
			cb.transitionTo(StateHalfOpen)
			cb.halfOpenRequests = 1 // this request is the first probe
			return nil
		}
		return ErrCircuitOpen

	case StateHalfOpen:
		if cb.halfOpenRequests >= cb.halfOpenMax {
			return ErrTooManyRequests
		}
		cb.halfOpenRequests++
		return nil
	}

	return nil
}

func (cb *AdvancedCircuitBreaker) recordResult(err error, duration time.Duration) {
	cb.mu.Lock()
	defer cb.mu.Unlock()

	if err != nil {
		cb.window.RecordFailure()
		if cb.onFailure != nil {
			go cb.onFailure(cb.name, err)
		}
	} else {
		cb.window.RecordSuccess()
		if cb.onSuccess != nil {
			go cb.onSuccess(cb.name, duration)
		}
	}

	cb.evaluateState()
}

// evaluateState applies the transition rules; callers must hold cb.mu
func (cb *AdvancedCircuitBreaker) evaluateState() {
	switch cb.state {
	case StateClosed:
		if cb.window.TotalRequests() >= cb.minimumRequests {
			if cb.window.FailureRate() >= cb.failureRateThreshold {
				cb.transitionTo(StateOpen)
			}
		}

	case StateHalfOpen:
		// The window is reset on entering half-open, so it holds probe
		// results only.
		successes, failures := cb.window.Stats()
		if failures > 0 {
			cb.transitionTo(StateOpen)
		} else if successes > 0 && cb.halfOpenRequests >= cb.halfOpenMax {
			cb.transitionTo(StateClosed)
		}
	}
}

// transitionTo changes state; callers must hold cb.mu
func (cb *AdvancedCircuitBreaker) transitionTo(state State) {
	if cb.state == state {
		return
	}

	from := cb.state
	cb.state = state

	if state == StateOpen {
		cb.openedAt = time.Now()
	}

	// Clear the window when closing and when starting to probe. Without the
	// half-open reset, failures recorded before the circuit opened would
	// still be in the window and reopen it on the first probe result.
	if state == StateClosed || state == StateHalfOpen {
		cb.window.Reset()
	}

	cb.halfOpenRequests = 0

	if cb.onStateChange != nil {
		go cb.onStateChange(cb.name, from, state)
	}
}
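The closed-state rule in `evaluateState` combines two conditions: enough traffic in the window, and a failure rate at or above the threshold. A minimal sketch of that rule as a standalone function follows; `shouldOpen` is a hypothetical helper written for illustration, with parameter names borrowed from the configuration above.

```go
package main

import "fmt"

// shouldOpen reports whether a closed breaker should open, given the
// success and failure counts observed in the current window.
func shouldOpen(successes, failures, minimumRequests int, threshold float64) bool {
	total := successes + failures
	// With too little traffic the rate is statistically meaningless:
	// 1 failure out of 1 request is a 100% failure rate.
	if total == 0 || total < minimumRequests {
		return false
	}
	return float64(failures)/float64(total) >= threshold
}

func main() {
	fmt.Println(shouldOpen(1, 2, 10, 0.5))  // false: only 3 requests, below the minimum of 10
	fmt.Println(shouldOpen(6, 6, 10, 0.5))  // true: 6/12 = 50%, which meets the threshold
	fmt.Println(shouldOpen(15, 5, 10, 0.5)) // false: 5/20 = 25%
}
```

The minimum-request guard is what keeps a breaker from tripping on a single failed call right after startup or during a quiet period.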

Circuit Breaker with Bulkhead

go
package circuitbreaker

import (
	"context"
	"errors"
	"sync"
)

var ErrBulkheadFull = errors.New("bulkhead is full")

// BulkheadCircuitBreaker combines circuit breaker with bulkhead pattern
type BulkheadCircuitBreaker struct {
	cb            *CircuitBreaker
	semaphore     chan struct{}
	maxConcurrent int
}

func NewWithBulkhead(config Config, maxConcurrent int) *BulkheadCircuitBreaker {
	return &BulkheadCircuitBreaker{
		cb:            New(config),
		semaphore:     make(chan struct{}, maxConcurrent),
		maxConcurrent: maxConcurrent,
	}
}

func (b *BulkheadCircuitBreaker) Execute(fn func() (interface{}, error)) (interface{}, error) {
	// Try to acquire semaphore without blocking
	select {
	case b.semaphore <- struct{}{}:
		defer func() { <-b.semaphore }()
	default:
		return nil, ErrBulkheadFull
	}

	return b.cb.Execute(fn)
}

func (b *BulkheadCircuitBreaker) ExecuteWithContext(
	ctx context.Context,
	fn func(context.Context) (interface{}, error),
) (interface{}, error) {
	// Try to acquire semaphore, waiting until the context is done
	select {
	case b.semaphore <- struct{}{}:
		defer func() { <-b.semaphore }()
	case <-ctx.Done():
		return nil, ctx.Err()
	}

	return b.cb.ExecuteWithContext(ctx, fn)
}

// CircuitBreakerPool manages multiple circuit breakers
type CircuitBreakerPool struct {
	mu       sync.RWMutex
	breakers map[string]*CircuitBreaker
	config   Config
}

func NewPool(defaultConfig Config) *CircuitBreakerPool {
	return &CircuitBreakerPool{
		breakers: make(map[string]*CircuitBreaker),
		config:   defaultConfig,
	}
}

// Get returns the breaker for name, creating it on first use
func (p *CircuitBreakerPool) Get(name string) *CircuitBreaker {
	p.mu.RLock()
	cb, ok := p.breakers[name]
	p.mu.RUnlock()

	if ok {
		return cb
	}

	p.mu.Lock()
	defer p.mu.Unlock()

	// Double-check: another goroutine may have created it in the meantime
	if cb, ok := p.breakers[name]; ok {
		return cb
	}

	config := p.config
	config.Name = name
	cb = New(config)
	p.breakers[name] = cb

	return cb
}

func (p *CircuitBreakerPool) Stats() map[string]Stats {
	p.mu.RLock()
	defer p.mu.RUnlock()

	stats := make(map[string]Stats)
	for name, cb := range p.breakers {
		stats[name] = cb.Stats()
	}
	return stats
}
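The bulkhead part of the code above relies on a buffered channel used as a counting semaphore. The sketch below isolates that mechanism so it can be read and tested on its own; `Bulkhead`, `NewBulkhead`, and `TryAcquire` are hypothetical names for this illustration and are not part of the package above.

```go
package main

import (
	"errors"
	"fmt"
)

var errFull = errors.New("bulkhead is full")

// Bulkhead limits concurrency with a buffered channel: each in-flight
// call holds one slot, and the channel capacity is the limit.
type Bulkhead struct {
	slots chan struct{}
}

func NewBulkhead(maxConcurrent int) *Bulkhead {
	return &Bulkhead{slots: make(chan struct{}, maxConcurrent)}
}

// TryAcquire takes a slot without blocking. On success it returns a
// release function that must be called exactly once to free the slot.
func (b *Bulkhead) TryAcquire() (release func(), err error) {
	select {
	case b.slots <- struct{}{}:
		return func() { <-b.slots }, nil
	default:
		return nil, errFull
	}
}

func main() {
	b := NewBulkhead(2)

	release, _ := b.TryAcquire() // slot 1
	_, _ = b.TryAcquire()        // slot 2

	_, err := b.TryAcquire()
	fmt.Println(err) // bulkhead is full

	release() // free slot 1
	_, err = b.TryAcquire()
	fmt.Println(err) // <nil>
}
```

Because the slot is taken before the breaker is consulted, a call rejected by the bulkhead never reaches the breaker and is not counted as a downstream failure.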

HTTP Client with Circuit Breaker

go
package circuitbreaker

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// HTTPClient wraps http.Client with circuit breaker
type HTTPClient struct {
	client  *http.Client
	cb      *CircuitBreaker
	timeout time.Duration
}

func NewHTTPClient(timeout time.Duration, cbConfig Config) *HTTPClient {
	return &HTTPClient{
		client: &http.Client{
			Timeout: timeout,
		},
		cb:      New(cbConfig),
		timeout: timeout,
	}
}

func (c *HTTPClient) Do(req *http.Request) (*http.Response, error) {
	result, err := c.cb.Execute(func() (interface{}, error) {
		resp, err := c.client.Do(req)
		if err != nil {
			return nil, err
		}

		// Treat 5xx as failures
		if resp.StatusCode >= 500 {
			// The caller only receives the error, so close the body here
			// to avoid leaking the connection.
			resp.Body.Close()
			return nil, fmt.Errorf("server error: %d", resp.StatusCode)
		}

		return resp, nil
	})

	if err != nil {
		return nil, err
	}

	return result.(*http.Response), nil
}

func (c *HTTPClient) Get(ctx context.Context, url string) (*http.Response, error) {
	req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
	if err != nil {
		return nil, err
	}
	return c.Do(req)
}

// Usage example
func ExampleHTTPClient() {
	client := NewHTTPClient(5*time.Second, Config{
		Name:                "external-api",
		FailureThreshold:    5,
		SuccessThreshold:    2,
		Timeout:             30 * time.Second,
		HalfOpenMaxRequests: 3, // must be at least SuccessThreshold, or half-open can never close
		OnStateChange: func(from, to State) {
			fmt.Printf("Circuit breaker state changed: %s -> %s\n", from, to)
		},
	})

	ctx := context.Background()
	resp, err := client.Get(ctx, "https://api.example.com/data")
	if err != nil {
		if errors.Is(err, ErrCircuitOpen) {
			fmt.Println("Circuit is open, returning cached data or error")
			return
		}
		fmt.Printf("Request failed: %v\n", err)
		return
	}
	defer resp.Body.Close()

	fmt.Printf("Got response: %d\n", resp.StatusCode)
}
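Which responses count against the breaker is a policy decision. The client above treats every 5xx as a failure; the sketch below factors that decision into a function so it can be tested and tuned. Counting 429 (Too Many Requests) as a failure is an assumption added here for illustration, not something the client above does, and `countsAsFailure` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
)

// countsAsFailure reports whether an HTTP status should be recorded as a
// failure by the breaker. 4xx responses generally indicate a problem with
// the request rather than with the service, so they are not counted.
func countsAsFailure(status int) bool {
	if status >= 500 {
		return true
	}
	// Assumption for this sketch: an overloaded service that answers 429
	// is also treated as unhealthy. Drop this line to match a 5xx-only policy.
	return status == http.StatusTooManyRequests
}

func main() {
	for _, status := range []int{200, 404, 429, 500, 503} {
		fmt.Println(status, countsAsFailure(status))
	}
}
```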

Best Practices

1. CHOOSE APPROPRIATE THRESHOLDS
   • Start conservative, tune based on monitoring
   • Consider normal failure rates
   • Account for startup/deployment failures

2. IMPLEMENT FALLBACKS
   • Return cached data
   • Return default values
   • Degrade gracefully
   • Queue for retry

3. MONITOR AND ALERT
   • Track state changes
   • Monitor failure rates
   • Alert on circuit opens
   • Dashboard for visibility

4. USE PER-DEPENDENCY BREAKERS
   • Separate breaker for each downstream service
   • Don't let one failure affect all calls
   • Consider per-endpoint breakers

5. COMBINE WITH OTHER PATTERNS
   • Bulkhead for isolation
   • Retry with backoff
   • Timeout for fast failure
   • Rate limiting
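As one illustration of the "return cached data" fallback, the sketch below remembers the last good value and serves it, marked as stale, when the primary call fails (for example because the circuit is open). `staleCache` and `errUnavailable` are hypothetical names, the type holds a single string for brevity, and it is not safe for concurrent use without added locking.

```go
package main

import (
	"errors"
	"fmt"
)

var errUnavailable = errors.New("service unavailable")

// staleCache remembers the last successful value of one call.
type staleCache struct {
	last string
	has  bool
}

// Get calls primary. On success it caches and returns the fresh value.
// On failure it returns the last good value with stale=true, or the
// original error when nothing has been cached yet.
func (c *staleCache) Get(primary func() (string, error)) (value string, stale bool, err error) {
	v, err := primary()
	if err == nil {
		c.last, c.has = v, true
		return v, false, nil
	}
	if c.has {
		return c.last, true, nil
	}
	return "", false, err
}

func main() {
	c := &staleCache{}
	up := func() (string, error) { return "v1", nil }
	down := func() (string, error) { return "", errUnavailable }

	fmt.Println(c.Get(up))   // v1 false <nil>
	fmt.Println(c.Get(down)) // v1 true <nil>
}
```

Returning the stale flag lets the caller decide whether old data is acceptable for a given request or whether the error should be surfaced instead.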

Summary

States:
  • CLOSED: Normal operation, tracking failures
  • OPEN: Fail fast, protect downstream
  • HALF-OPEN: Testing recovery

Key Parameters:
  • Failure threshold (when to open)
  • Success threshold (when to close)
  • Timeout (how long to stay open)

Benefits:
  • Prevents cascade failures
  • Allows failing services to recover
  • Provides fast failure for users
  • Reduces load on struggling services

Key Insight:
  "Circuit breakers are about failing fast and gracefully.
   Better to return an error quickly than wait for timeout."
