Go Scheduler: The Engine Behind Your Goroutines

A Million Goroutines, Eight Cores

You launch a million goroutines on an 8 core machine. Each runs. None blocks the others excessively. The system stays responsive. How?
Operating systems can't handle a million threads. Context switching would consume all CPU time. Memory would explode. Yet Go makes a million goroutines feel effortless.
The secret is Go's scheduler. It's a sophisticated piece of software that maps potentially millions of goroutines onto a handful of OS threads efficiently.

Why Not Just Use OS Threads

Operating system threads are expensive. Each thread needs:
  • 1MB+ of stack memory
  • Kernel involvement for creation
  • Expensive context switches (save all registers, switch page tables)
  • Kernel scheduling overhead
Go blog diagram 1

Go blog diagram 1

Go's scheduler runs in user space, making goroutine switches orders of magnitude faster than thread switches.

The Traffic Controller Analogy

Think of it like this: Imagine an airport with thousands of planes (goroutines) but only a few runways (OS threads). The air traffic controller (scheduler) decides which planes take off when. Planes wait in queues. When one plane is waiting for fuel (I/O), another takes its slot. The controller balances load across runways and handles emergencies. Go's scheduler does exactly this for your goroutines.

The GMP Model Explained

Go's scheduler uses three main entities:
G (Goroutine): The work to be done. Your code, its stack, instruction pointer.
M (Machine): An OS thread. Actually runs the code.
P (Processor): A logical processor. Holds a run queue of goroutines. Required for an M to execute G.
Go blog diagram 2

Go blog diagram 2

The Number of Ps

By default, GOMAXPROCS equals the number of CPU cores. This is the maximum number of goroutines running truly in parallel at any moment.
go
// Filename: gomaxprocs.go package main import ( "fmt" "runtime" ) func main() { // Default: number of CPU cores fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // You can change it (rarely needed) runtime.GOMAXPROCS(4) fmt.Println("After change:", runtime.GOMAXPROCS(0)) }

How Scheduling Works

Step 1: Goroutine Creation

When you write go func(), a new G is created and placed in a P's local run queue.
go
go func() { // New G created // Added to current P's local queue }()

Step 2: Running a Goroutine

An M attached to P takes a G from P's local queue and executes it.

Step 3: Goroutine Yields

Goroutines yield control at specific points:
  • Channel operations (send/receive)
  • System calls
  • Function calls (checking stack growth)
  • Explicit runtime.Gosched()
  • Blocking operations

Step 4: Work Stealing

If a P's local queue is empty, it tries:
  1. Check global run queue
  2. Check network poller
  3. Steal from another P's queue
Go blog diagram 3

Go blog diagram 3

Work Stealing in Action

go
// Filename: work_stealing_demo.go package main import ( "fmt" "runtime" "sync" "time" ) func main() { runtime.GOMAXPROCS(2) // 2 Ps var wg sync.WaitGroup // Create unbalanced load // 100 goroutines started from one P for i := 0; i < 100; i++ { wg.Add(1) go func(id int) { defer wg.Done() // Simulate work time.Sleep(10 * time.Millisecond) }(i) } // Work stealing ensures both Ps stay busy // Even though all Gs started on one P wg.Wait() fmt.Println("All done - work stealing balanced the load") }

Preemption: No More Infinite Loops

Before Go 1.14, a tight loop could starve other goroutines:
go
// Before Go 1.14: This would block other goroutines go func() { for { // No function calls = no yield points } }()
Go 1.14+ introduced asynchronous preemption using signals. The scheduler can now interrupt any goroutine, even in tight loops.
Go blog diagram 4

Go blog diagram 4

Scheduling During System Calls

When a goroutine makes a blocking system call (file I/O, network), the M blocks. But P shouldn't wait!
Go blog diagram 5

Go blog diagram 5

This is why blocking operations don't block the whole scheduler.

Network Poller Integration

For network I/O, Go uses non-blocking syscalls with epoll/kqueue. The network poller runs in a separate thread.
go
// When you write: conn.Read(buffer) // Go's runtime: // 1. Tries non-blocking read // 2. If would block, registers with network poller // 3. Goroutine is parked (not running, not in queue) // 4. Network poller wakes goroutine when data arrives
This means network I/O is highly efficient. Thousands of goroutines can wait on network with minimal overhead.

Visualizing the Scheduler

Use the trace tool to see scheduling in action:
// Filename: trace_example.go package main import ( "os" "runtime/trace" "sync" ) func main() { f, _ := os.Create("trace.out") trace.Start(f) defer trace.Stop() var wg sync.WaitGroup for i := 0; i < 10; i++ { wg.Add(1) go func(id int) { defer wg.Done() // Work here }(i) } wg.Wait() }
This opens a browser showing goroutine scheduling, including:
  • When each G runs
  • Which P/M it runs on
  • When it blocks and why
  • Work stealing events

Scheduler-Friendly Code

Use Buffered Channels Wisely

go
// Can cause excessive scheduling unbuffered := make(chan int) // Reduces scheduling overhead buffered := make(chan int, 100)

Batch Work

go
// WRONG: One goroutine per tiny task for i := 0; i < 1000000; i++ { go process(item[i]) // Million goroutines! } // RIGHT: Batch into worker pools workers := runtime.GOMAXPROCS(0) batch := len(items) / workers for w := 0; w < workers; w++ { go func(start, end int) { for i := start; i < end; i++ { process(items[i]) } }(w*batch, (w+1)*batch) }

Avoid Lock Contention

go
// WRONG: All goroutines fight for one lock var globalMu sync.Mutex var globalCounter int // RIGHT: Reduce contention with sharding type ShardedCounter struct { shards [16]struct { mu sync.Mutex count int } } func (c *ShardedCounter) Inc(id int) { shard := &c.shards[id%16] shard.mu.Lock() shard.count++ shard.mu.Unlock() }

Key Scheduler Metrics

go
// Filename: scheduler_metrics.go package main import ( "fmt" "runtime" ) func main() { fmt.Println("NumCPU:", runtime.NumCPU()) fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) fmt.Println("NumGoroutine:", runtime.NumGoroutine()) // More detailed stats var memStats runtime.MemStats runtime.ReadMemStats(&memStats) fmt.Println("Num GC:", memStats.NumGC) }

GODEBUG Scheduler Tracing

bash
GODEBUG=schedtrace=1000 ./myprogram
Output every 1000ms:
SCHED 1000ms: gomaxprocs=8 idleprocs=6 threads=9 spinningthreads=1 idlethreads=2 runqueue=0 [1 0 0 0 0 0 0 0]
  • gomaxprocs: Number of Ps
  • idleprocs: Ps without work
  • threads: Total Ms
  • runqueue: Global queue size
  • [1 0 0...]: Each P's local queue size

Scheduler Internals Summary

Go blog diagram 6

Go blog diagram 6

Common Misconceptions

Myth: More GOMAXPROCS = Faster
Not always. If your work is I/O bound, more Ps won't help. If work is CPU bound, GOMAXPROCS > cores causes overhead.
Myth: Goroutines are free
They're cheap, not free. Million goroutines still use memory and scheduling overhead.
Myth: Order of execution is predictable
Scheduling is not deterministic. Never assume goroutine execution order.

What You Learned

You now understand that:
  • GMP model: Goroutines, Machines (threads), Processors
  • Work stealing balances load: Idle Ps steal from busy ones
  • Preemption prevents starvation: No goroutine can hog CPU
  • System calls hand off P: Blocking doesn't block scheduling
  • Network poller is efficient: Async I/O under the hood
  • GOMAXPROCS controls parallelism: Usually defaults are fine

Your Next Steps

  • Profile: Use go tool trace on your application
  • Read Next: Learn about goroutine leaks and how to detect them
  • Experiment: Try GODEBUG=schedtrace to see your scheduler in action
The Go scheduler is a masterpiece of systems programming. Understanding it helps you write faster, more efficient concurrent code. You don't need to control it, but knowing how it works makes you a better Go developer.
All Blogs
Tags:golangschedulergoroutinesruntimeperformance