TCP/IP: The Foundation of Internet Communication
You've just deployed your new microservice to production. Users are reporting intermittent connection failures, and you're staring at logs filled with "Connection reset by peer" and "Connection timeout" errors. Your monitoring dashboard shows spikes in latency, and some requests are mysteriously disappearing into the void. Sound familiar?
Here's the thing: most developers write networked applications daily without understanding what happens when they call `http.Get()` or `socket.send()`. They treat the network as a black box, magical infrastructure that "just works." Until it doesn't.
Understanding TCP/IP isn't just about fixing bugs. It's about building a mental model that lets you design better systems, debug production issues in minutes instead of hours, and make informed decisions about when to use TCP, when to reach for UDP, and why WebSocket implementations behave the way they do. Every single thing you do on the internet, from streaming videos to reading this article, relies on TCP/IP working correctly. Let's pull back the curtain.
What Problem Does TCP/IP Solve?
Before TCP/IP, computer networks were fragmented islands. In the 1960s and 70s, different manufacturers built proprietary networking systems that couldn't talk to each other. IBM machines spoke SNA, DEC machines used DECnet, and Unix systems had their own protocols. It was like having a phone that could only call other phones from the same manufacturer.
The internet needed two fundamental capabilities:
- Routing across networks (the IP part): How do you get a message from your laptop in San Francisco to a server in Tokyo, passing through dozens of intermediate routers?
- Reliable delivery (the TCP part): How do you ensure that data arrives intact, in order, and without corruption, even when the underlying network randomly loses packets, duplicates them, or delivers them out of sequence?

Diagram 1
The underlying network infrastructure (cables, routers, switches) is inherently unreliable:
- Packets get lost: Network congestion, buffer overflows, or hardware failures
- Packets arrive out of order: Different packets take different paths
- Packets get corrupted: Electrical interference or bit errors
- Packets get duplicated: Routing loops or retransmissions
Why build reliability into a shared protocol? Raw datagram delivery (the service UDP later standardized) moves packets but offers no reliability guarantees. Without TCP, every application would need to implement its own acknowledgment systems, retransmission logic, flow control, and congestion management, over and over again. TCP abstracted all of this complexity into a single, battle-tested protocol.
Real-World Analogy
Think of TCP/IP like the postal service combined with registered mail:
IP (Internet Protocol) is like the postal service's addressing and routing system. You write an address on an envelope, drop it in a mailbox, and the postal service figures out the route: local post office → sorting facility → regional hub → destination city → local carrier → recipient. You don't need to know the route; you just trust the addressing system.
TCP (Transmission Control Protocol) is like registered mail with delivery confirmation. When you send something important:
- You get a tracking number (sequence number)
- The recipient must sign for it (acknowledgment)
- If it doesn't arrive, it gets resent automatically (retransmission)
- Multiple packages arrive in the correct order, even if shipped separately (ordering)
- The postal service won't flood you with packages faster than you can process them (flow control)
Without TCP, you'd be sending postcards (UDP): cheap, fast, but no guarantees they'll arrive or arrive in order.
The Solution
How Does TCP/IP Solve the Problem?
TCP/IP uses a layered approach, separating concerns into two distinct protocols:
IP (Layer 3 - Network Layer):
- Best-effort delivery: Gets packets from source to destination
- Addressing: Uses 32-bit (IPv4) or 128-bit (IPv6) addresses
- Routing: Each router makes independent forwarding decisions
- Fragmentation: Breaks large packets to fit network MTU (Maximum Transmission Unit)
- No guarantees: Packets can be lost, duplicated, corrupted, or reordered
TCP (Layer 4 - Transport Layer):
- Reliability: Ensures all data arrives correctly
- Ordering: Delivers data in the correct sequence
- Error detection: Checksums detect corruption
- Flow control: Prevents overwhelming the receiver
- Congestion control: Prevents overwhelming the network
- Connection-oriented: Establishes state before data transfer
Why this approach? Separating routing (IP) from reliability (TCP) creates a clean abstraction. Routers only need to understand IP—they forward packets without tracking connections. End hosts (your laptop, servers) handle the complexity of reliable delivery. This keeps the network core simple and scalable.

Diagram 2
Key innovations:
- Sequence numbers: Every byte gets a unique number, allowing detection of loss and reordering
- Acknowledgments (ACKs): Receiver tells sender what was received successfully
- Sliding window: Allows multiple unacknowledged segments in flight (pipelining)
- Adaptive retransmission: Dynamically adjusts timeouts based on network conditions
- Congestion signals: Uses packet loss as feedback to slow down
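To make the first two ideas concrete, here is a minimal sketch of a receiver that reorders segments by sequence number and answers with cumulative ACKs. The class name `ToyReceiver` and its fields are illustrative, not a real API, and the sketch assumes segments never partially overlap.

```python
class ToyReceiver:
    """Toy model of in-order delivery driven by sequence numbers."""

    def __init__(self, initial_seq: int) -> None:
        self.next_expected = initial_seq   # next in-order byte we are waiting for
        self.out_of_order = {}             # seq -> payload held until the gap fills
        self.delivered = bytearray()       # bytes handed to the "application"

    def on_segment(self, seq: int, payload: bytes) -> int:
        """Accept one segment and return the cumulative ACK number to send back."""
        if seq + len(payload) <= self.next_expected:
            return self.next_expected      # duplicate of data already delivered
        self.out_of_order.setdefault(seq, payload)
        # Deliver every buffered segment that is now contiguous.
        while self.next_expected in self.out_of_order:
            chunk = self.out_of_order.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected


receiver = ToyReceiver(initial_seq=1000)
acks = [
    receiver.on_segment(1005, b"world"),   # arrives early: gap at 1000
    receiver.on_segment(1000, b"hello"),   # fills the gap, both chunks delivered
    receiver.on_segment(1000, b"hello"),   # duplicate, ignored
]
print(acks, bytes(receiver.delivered))
```

The first ACK stays at 1000 because the receiver is still missing the first five bytes; a real sender treats repeated ACKs like this as a hint that a segment was lost.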
Building the Mental Model
To truly understand TCP/IP, you need to visualize it operating at multiple levels simultaneously. Let's build this model piece by piece.
The Complete TCP Connection Lifecycle

Diagram 3
Why does each step happen?
Three-way handshake (SYN, SYN-ACK, ACK):
- SYN: Client declares initial sequence number (ISN). Why? Each connection needs unique sequence numbers to prevent old duplicate packets from corrupting new connections.
- SYN-ACK: Server acknowledges client's ISN and declares its own ISN. Why two numbers? TCP is full-duplex; data flows both directions simultaneously.
- ACK: Client acknowledges server's ISN. Why needed? Without it, server doesn't know if its SYN-ACK arrived.
Why not two-way? With only SYN and SYN-ACK, the server wouldn't know if the client received the SYN-ACK. The server might start sending data that the client isn't ready to receive.
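The sequence and acknowledgment numbers exchanged in the handshake can be sketched as follows. The function name and the tuple layout are illustrative assumptions; the one protocol fact the sketch relies on is that a SYN consumes one sequence number.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return (sender, flags, seq, ack) for each of the three handshake segments."""
    mod = 2 ** 32  # sequence numbers are 32-bit and wrap around
    syn = ("client", "SYN", client_isn, None)
    # A SYN consumes one sequence number, so the server ACKs client_isn + 1.
    syn_ack = ("server", "SYN-ACK", server_isn, (client_isn + 1) % mod)
    # The client's final ACK acknowledges the server's SYN the same way.
    ack = ("client", "ACK", (client_isn + 1) % mod, (server_isn + 1) % mod)
    return [syn, syn_ack, ack]


for step in three_way_handshake(client_isn=1000, server_isn=5000):
    print(step)
```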
Four-way termination (FIN, ACK, FIN, ACK):
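- Why four steps instead of three? TCP is full-duplex. One side finishing doesn't mean the other is done. Each direction is closed by its own FIN and ACK, and there may be a delay between receiving a FIN and being ready to send one. When the receiving side has nothing left to send, its ACK and FIN can share one segment, so a three-segment close is also common in practice.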
TIME_WAIT state:
- Why wait 2*MSL (Maximum Segment Lifetime)? To ensure the final ACK isn't lost. If the server doesn't receive the final ACK, it retransmits its FIN. The client must be around to re-ACK it. Also prevents old duplicate packets from corrupting a new connection using the same port numbers.
TCP State Machine

Diagram 4
Flow Control Mechanism
TCP uses a sliding window protocol for flow control, preventing a fast sender from overwhelming a slow receiver.

Diagram 5
Why sliding window?
- Without it, TCP would be "stop-and-wait": send one packet, wait for ACK, repeat. This wastes bandwidth, especially on high-latency networks.
- With sliding window, multiple packets can be "in flight" simultaneously, utilizing available bandwidth efficiently.
Why does receiver advertise window size?
- The receiver's application might read data slowly (e.g., writing to disk, processing, waiting for user input).
- If TCP kept accepting data, the receive buffer would overflow, forcing packet drops.
- By advertising available buffer space, the receiver controls the flow.
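A toy simulation can show how the advertised window throttles a fast sender. This is a sketch under simplifying assumptions: time advances in discrete ticks, nothing is lost, and the function name and parameters are made up for illustration.

```python
def simulate_flow_control(total_bytes: int, buffer_size: int, app_read_per_tick: int):
    """Return a per-tick trace of (bytes_sent, advertised_window)."""
    if app_read_per_tick <= 0:
        raise ValueError("the application must read something each tick")
    buffered = 0              # bytes sitting in the receive buffer
    remaining = total_bytes   # bytes the sender still wants to send
    trace = []
    while remaining > 0 or buffered > 0:
        window = buffer_size - buffered        # free space the receiver advertises
        sent = min(remaining, window)          # sender never exceeds the window
        remaining -= sent
        buffered += sent
        buffered -= min(buffered, app_read_per_tick)  # application drains the buffer
        trace.append((sent, window))
    return trace


trace = simulate_flow_control(total_bytes=10_000, buffer_size=4096, app_read_per_tick=1024)
for tick, (sent, window) in enumerate(trace):
    print(f"tick {tick}: window={window} sent={sent}")
```

After the first burst fills the buffer, the sender settles at the rate the application reads, which is the behavior the window advertisement is designed to produce.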
Congestion Control Visualization
TCP assumes packet loss indicates network congestion (not always true, but a reasonable assumption). It uses additive increase, multiplicative decrease (AIMD).

Diagram 6
Why this algorithm?
Slow Start:
- Starts conservatively because TCP doesn't know network capacity. The classic initial window is 1 MSS (Maximum Segment Size, typically 1460 bytes); modern stacks usually start at 10 MSS (RFC 6928).
- Doubles every RTT (Round Trip Time) to quickly discover available bandwidth.
- "Slow" is relative—it's exponential growth!
Congestion Avoidance:
- Once near network capacity (ssthresh = slow start threshold), growth becomes linear.
- Why linear? Exponential growth would quickly cause congestion again.
- Adds 1 MSS per RTT, gently probing for more capacity.
Fast Recovery:
- Three duplicate ACKs indicate packet loss but network still delivering (not totally congested).
- Halves window but doesn't drop to 1 like a timeout would.
- Why? Some packets are still getting through; don't be too aggressive.
Timeout:
- Indicates serious congestion—no ACKs arriving.
- Resets to slow start (cwnd = 1) to avoid making congestion worse.
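The four phases can be traced with a small simulation. It is a simplified sketch, not how any real stack is implemented: the congestion window is counted in whole MSS units, it starts at 1 MSS, growth is capped at `ssthresh` when slow start ends, and loss events are injected by round number through the hypothetical `events` argument.

```python
def simulate_cwnd(rounds: int, ssthresh: int, events: dict) -> list:
    """Return cwnd (in MSS units) at the start of each RTT round.

    events maps a round index to "dupack" (three duplicate ACKs) or "timeout".
    """
    cwnd = 1
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        event = events.get(rtt)
        if event == "timeout":
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1                         # severe signal: restart slow start
        elif event == "dupack":
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh                  # fast recovery: halve, skip slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double every RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per RTT
    return history


history = simulate_cwnd(rounds=12, ssthresh=16, events={7: "dupack", 10: "timeout"})
print(history)
```

Plotting `history` gives the familiar sawtooth: exponential ramp, linear climb, a halving on duplicate ACKs, and a drop to 1 on timeout.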
Packet Structure Deep Dive

Diagram 7
Critical fields explained:
Sequence Number (32 bits):
- Identifies the byte position of the first data byte in this segment.
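- Why 32 bits? Allows 4.3 billion unique sequence numbers (one per byte, so 4 GiB of data) before the counter wraps around. At 10 Gbps that takes only about 3.4 seconds, which is why fast links rely on the Timestamps option (PAWS) to tell old segments from new ones.
- Initial sequence number (ISN) is randomized for security: it makes sequence numbers hard for an off-path attacker to guess, and it also reduces the chance that old duplicate packets are accepted by a new connection.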
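A quick back-of-the-envelope calculation shows how long the 32-bit sequence space lasts at different line rates. The helper name is illustrative, and the sketch assumes the link is saturated with payload bytes.

```python
def seconds_until_wrap(link_bits_per_second: float) -> float:
    """Time to send 2**32 bytes (the full sequence space) at a given line rate."""
    sequence_space_bytes = 2 ** 32
    return sequence_space_bytes * 8 / link_bits_per_second


for label, rate in [("10 Mbps", 10e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
    print(f"{label}: {seconds_until_wrap(rate):.1f} s")
```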
Acknowledgment Number (32 bits):
- The next sequence number the receiver expects.
- If Ack=5001, it means "I've received everything up to byte 5000."
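- Cumulative ACK: Acknowledges every byte before this number. Segments that arrived out of order beyond a gap are buffered but not covered until the gap is filled (SACK can report them separately).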
Window Size (16 bits):
- Advertises receive buffer space (0-65,535 bytes).
- Limits throughput: max = window_size / RTT.
- TCP Window Scaling (option) extends this to 1 GB.
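Flags (control bits; the six classic ones are listed here, and ECE and CWR were added later for Explicit Congestion Notification):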
- SYN: Synchronize sequence numbers (connection setup)
- ACK: Acknowledgment field is valid
- FIN: No more data from sender (connection teardown)
- RST: Reset connection (error condition)
- PSH: Push data to application immediately
- URG: Urgent data present (rarely used)
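The six classic flags occupy the low bits of the flags byte, so decoding them is a matter of bit masking. A small sketch (the `decode_flags` helper is illustrative):

```python
# Bit positions of the six classic TCP flags in the flags byte.
TCP_FLAGS = {
    0x01: "FIN",
    0x02: "SYN",
    0x04: "RST",
    0x08: "PSH",
    0x10: "ACK",
    0x20: "URG",
}


def decode_flags(flag_byte: int) -> list:
    """Return the names of the flags set in a TCP flags byte."""
    return [name for bit, name in TCP_FLAGS.items() if flag_byte & bit]


print(decode_flags(0x02))  # first handshake segment
print(decode_flags(0x12))  # second handshake segment
print(decode_flags(0x11))  # typical teardown segment
```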
Deep Technical Dive
Architecture Breakdown
TCP/IP operates across multiple layers, each with distinct responsibilities:

Diagram 8
Component Communication Flow:
- Application → TCP Socket API: Application calls `write()` or `send()`, passing data.
- TCP Socket API → Send Buffer: Data is copied into the socket's send buffer (kernel memory).
- TCP Protocol Engine → Segmentation:
  - Breaks data into Maximum Segment Size (MSS) chunks, typically 1460 bytes (1500 MTU - 20 IP header - 20 TCP header).
  - Assigns sequence numbers to each byte.
  - Calculates checksum.
- TCP → Retransmission Queue: Keeps copy of sent-but-unacknowledged segments.
- TCP → IP Layer: Passes segments to IP with destination address.
- IP → Routing Table: Determines next hop (next router or final destination).
- IP → Data Link Layer: Encapsulates in Ethernet/WiFi frame with MAC addresses.
- Receiver Side (Reverse Flow):
  - Frame → IP packet → TCP segment
  - TCP checks sequence numbers, reorders if needed
  - Places in receive buffer
  - Application calls `read()` to retrieve data
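The segmentation step can be sketched in a few lines. This is an illustration only: the `segment` helper is hypothetical, it ignores headers and options, and it assumes the first data byte is numbered ISN + 1 because the SYN consumed the ISN itself.

```python
def segment(data: bytes, isn: int, mss: int = 1460) -> list:
    """Split application data into (sequence_number, payload) pairs."""
    mod = 2 ** 32
    first_seq = (isn + 1) % mod  # the SYN used up the ISN
    return [
        ((first_seq + offset) % mod, data[offset:offset + mss])
        for offset in range(0, len(data), mss)
    ]


segments = segment(b"x" * 3000, isn=999)
for seq, payload in segments:
    print(f"seq={seq} len={len(payload)}")
```

Each segment's sequence number is the position of its first byte, so the receiver can spot a gap as soon as a later segment arrives.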
Internal Mechanics
TCP Segment Structure in Detail
Let's examine a real TCP segment (hex dump format):
```
0000  45 00 00 3c 1c 46 40 00 40 06 b1 e6 c0 a8 01 64  E..<.F@.@......d
0010  c0 a8 01 65 04 d2 00 50 00 00 00 01 00 00 00 00  ...e...P........
0020  a0 02 72 10 fe 30 00 00 02 04 05 b4 04 02 08 0a  ..r..0..........
0030  00 00 00 00 00 00 00 00 01 03 03 07              ............
```
Decoded:
```
IP Header (bytes 0-19):
  45            Version=4, Header Length=5*4=20 bytes
  00            Type of Service (TOS)
  00 3c         Total Length = 60 bytes
  1c 46         Identification
  40 00         Flags=DF (Don't Fragment), Fragment Offset=0
  40            TTL = 64 hops
  06            Protocol = 6 (TCP)
  b1 e6         Header Checksum
  c0 a8 01 64   Source IP = 192.168.1.100
  c0 a8 01 65   Dest IP = 192.168.1.101

TCP Header (bytes 20-39+):
  04 d2         Source Port = 1234
  00 50         Dest Port = 80 (HTTP)
  00 00 00 01   Sequence Number = 1
  00 00 00 00   Ack Number = 0 (not valid, ACK flag not set)
  a0 02         Data Offset=10*4=40 bytes, Flags=SYN
  72 10         Window Size = 29,200 bytes
  fe 30         Checksum
  00 00         Urgent Pointer = 0

TCP Options (bytes 40-59):
  02 04 05 b4   MSS = 1460 bytes
  04 02         SACK Permitted
  08 0a 00 00 00 00 00 00 00 00   Timestamps
  01            NOP (padding)
  03 03 07      Window Scale = 7 (multiply window by 128)
```
Why these specific values?
- Sequence = 1: This is a SYN packet (initial connection). The ISN could be any value; 1 is just an example.
- MSS = 1460: Ethernet MTU is 1500 bytes. Subtract 20 (IP) + 20 (TCP) = 1460 bytes for data.
- Window Scale: Without scaling, max window is 65 KB. With scale factor 7, max window = 65536 * 2^7 = 8 MB. Essential for high-bandwidth, high-latency networks.
- SACK (Selective Acknowledgment): Allows receiver to acknowledge non-contiguous blocks, improving performance when multiple packets are lost.
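As a rough cross-check of the decoding above, the same 60 bytes can be unpacked with Python's `struct` module. This is a sketch: `parse_headers` and the dictionary keys are illustrative names, the code assumes an IPv4 packet carrying TCP, and it does not validate checksums.

```python
import socket
import struct

# The 60 bytes from the hex dump above.
PACKET = bytes.fromhex(
    "4500003c1c4640004006b1e6c0a80164"
    "c0a8016504d200500000000100000000"
    "a0027210fe300000020405b40402080a"
    "000000000000000001030307"
)


def parse_headers(packet: bytes) -> dict:
    """Decode the fixed IPv4 and TCP header fields of one packet."""
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _ip_csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    ip_header_len = (ver_ihl & 0x0F) * 4           # IHL is counted in 32-bit words
    tcp = packet[ip_header_len:ip_header_len + 20]
    (src_port, dst_port, seq, ack,
     offset_byte, flags, window, _tcp_csum, _urgent) = struct.unpack("!HHIIBBHHH", tcp)
    return {
        "total_length": total_len,
        "ttl": ttl,
        "protocol": proto,
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "tcp_header_len": (offset_byte >> 4) * 4,  # data offset, also in 32-bit words
        "syn": bool(flags & 0x02),
        "window": window,
    }


info = parse_headers(PACKET)
for key, value in info.items():
    print(f"{key}: {value}")
```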
Memory Layout and Buffers

Diagram 9
Buffer Sizing Implications:
- Bandwidth-Delay Product (BDP): Optimal buffer size = Bandwidth × RTT
  - Example: 100 Mbps, 100ms RTT → BDP = 1.25 MB
  - Buffer should be ≥ BDP to fully utilize bandwidth
  - A default of 256 KB limits throughput to ~20 Mbps at 100 ms RTT (256 KB / 0.1 s)
- Buffer Bloat: Excessively large buffers cause high latency
  - Routers with multi-second buffers lead to "bufferbloat"
  - TCP congestion control relies on packet loss signals
  - Large buffers delay these signals, causing latency spikes
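The buffer sizing arithmetic is easy to script when tuning a link. A small sketch with illustrative helper names; it ignores header overhead and assumes a steady RTT.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_seconds / 8


def buffer_limited_throughput_bps(buffer_bytes: int, rtt_seconds: float) -> float:
    """Best-case throughput when at most buffer_bytes can be in flight per RTT."""
    return buffer_bytes * 8 / rtt_seconds


print(f"BDP at 100 Mbps, 100 ms: {bdp_bytes(100e6, 0.1) / 1e6:.2f} MB")
print(f"256 KB buffer at 100 ms: {buffer_limited_throughput_bps(256 * 1024, 0.1) / 1e6:.1f} Mbps")
```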
Protocol Specifications
TCP Port Numbers
Ports multiplex multiple connections over a single IP address:
- Well-known ports (0-1023): Require root/admin privileges
  - 20/21: FTP
  - 22: SSH
  - 25: SMTP
  - 80: HTTP
  - 443: HTTPS
- Registered ports (1024-49151): Application-specific
  - 3306: MySQL
  - 5432: PostgreSQL
  - 6379: Redis
  - 27017: MongoDB
- Dynamic/ephemeral ports (49152-65535): Client-side connections
  - OS assigns from this range for outgoing connections
Connection Tuple: (source IP, source port, dest IP, dest port, protocol)
- Uniquely identifies a TCP connection
- Allows roughly 65K simultaneous connections from one client IP to a given server IP and port (one per source port)
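Conceptually, the kernel keeps a lookup table keyed by this tuple and uses it to hand each incoming segment to the right connection. A toy sketch of that demultiplexing, with made-up names and addresses and the protocol field left out because everything here is TCP:

```python
from typing import NamedTuple


class ConnKey(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


connections = {}  # ConnKey -> bytes received on that connection


def deliver(key: ConnKey, payload: bytes) -> None:
    """Append payload to the stream identified by the connection tuple."""
    connections.setdefault(key, bytearray()).extend(payload)


# Two connections from the same client to the same server port
# differ only in the client's ephemeral source port.
tab_one = ConnKey("192.168.1.100", 50001, "192.168.1.101", 80)
tab_two = ConnKey("192.168.1.100", 50002, "192.168.1.101", 80)
deliver(tab_one, b"GET /a")
deliver(tab_two, b"GET /b")
deliver(tab_one, b" HTTP/1.1")
print({key.src_port: bytes(data) for key, data in connections.items()})
```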
TCP Options
Beyond the basic 20-byte header, TCP supports options:
| Option | Length | Purpose | Usage |
|---|---|---|---|
| End of Option List | 1 byte | Marks end of options | Padding |
| No-Operation (NOP) | 1 byte | Padding | Align options to 4-byte boundaries |
| MSS | 4 bytes | Negotiate maximum segment size | SYN packets |
| Window Scale | 3 bytes | Multiply window size by 2^n | SYN packets |
| SACK Permitted | 2 bytes | Enable selective acknowledgment | SYN packets |
| SACK | Variable | Acknowledge non-contiguous blocks | Data packets |
| Timestamps | 10 bytes | RTT measurement, PAWS protection | All packets |
Why timestamps?
- RTT measurement: More accurate than relying on ACKs alone
- PAWS (Protect Against Wrapped Sequences): With high-speed networks, sequence numbers can wrap around in seconds. Timestamps disambiguate old vs. new data.
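Options are encoded as kind/length/value records, with the two single-byte kinds (End of Option List and NOP) as the exception. A minimal parser sketch, run here on the 20 option bytes from the example SYN segment; the function name and output labels are illustrative, and unknown or malformed options are handled only loosely.

```python
import struct


def parse_tcp_options(raw: bytes) -> list:
    """Decode a TCP options block into (name, value) pairs."""
    options = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:                      # End of Option List
            break
        if kind == 1:                      # NOP: one byte, no length field
            options.append(("NOP", None))
            i += 1
            continue
        if i + 1 >= len(raw):
            break                          # truncated option
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            break                          # malformed length
        value = raw[i + 2:i + length]
        if kind == 2 and len(value) == 2:
            options.append(("MSS", int.from_bytes(value, "big")))
        elif kind == 3 and len(value) == 1:
            options.append(("WindowScale", value[0]))
        elif kind == 4:
            options.append(("SACKPermitted", None))
        elif kind == 8 and len(value) == 8:
            options.append(("Timestamps", struct.unpack("!II", value)))
        else:
            options.append((f"Kind{kind}", value))
        i += length
    return options


example = bytes.fromhex("020405b40402080a000000000000000001030307")
parsed = parse_tcp_options(example)
print(parsed)
```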
Code Deep Dive
Example 1: TCP Server in Go
```go
// tcp_server.go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"net"
	"os"
	"time"
)

func main() {
	// Listen on all interfaces, port 8080
	// Protocol: "tcp", "tcp4", or "tcp6"
	listener, err := net.Listen("tcp", ":8080")
	if err != nil {
		fmt.Fprintf(os.Stderr, "Failed to listen: %v\n", err)
		os.Exit(1)
	}
	defer listener.Close()

	fmt.Println("Server listening on :8080")

	for {
		// Accept blocks until a client connects
		// Under the hood: the kernel completes the 3-way handshake,
		// then Accept returns the established connection
		conn, err := listener.Accept()
		if err != nil {
			fmt.Fprintf(os.Stderr, "Failed to accept: %v\n", err)
			continue // Keep accepting other connections
		}

		// Handle each connection concurrently
		// Why goroutine? One blocked connection shouldn't block others
		go handleConnection(conn)
	}
}

func handleConnection(conn net.Conn) {
	// Defer ensures cleanup even if panic occurs
	defer conn.Close()

	// Get client address for logging
	clientAddr := conn.RemoteAddr().String()
	fmt.Printf("Client connected: %s\n", clientAddr)

	// Set read timeout to prevent infinite blocking
	// Why? Protects against slow-loris attacks, hung clients
	conn.SetReadDeadline(time.Now().Add(30 * time.Second))

	// Buffered reader reduces system calls
	// Default bufio size: 4096 bytes
	reader := bufio.NewReader(conn)

	for {
		// ReadString reads until delimiter or EOF
		// Why '\n'? Simple text protocol convention
		message, err := reader.ReadString('\n')
		if err != nil {
			// io.EOF means client closed connection gracefully
			// (compare with errors.Is, not by error string)
			if errors.Is(err, io.EOF) {
				fmt.Printf("Client disconnected: %s\n", clientAddr)
			} else {
				fmt.Printf("Read error from %s: %v\n", clientAddr, err)
			}
			return
		}

		fmt.Printf("Received from %s: %s", clientAddr, message)

		// Echo back to client
		// Write copies data to the send buffer; doesn't guarantee immediate transmission
		_, err = conn.Write([]byte("Echo: " + message))
		if err != nil {
			fmt.Printf("Write error to %s: %v\n", clientAddr, err)
			return
		}

		// Reset read deadline after successful operation
		conn.SetReadDeadline(time.Now().Add(30 * time.Second))
	}
}
```
What happens under the hood:
- `net.Listen("tcp", ":8080")`:
  - Creates a socket: `socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)`
  - Binds to port: `bind(sockfd, {0.0.0.0, 8080})`
  - Marks as passive socket: `listen(sockfd, backlog)`
  - Backlog = max pending connections. On Linux it bounds the queue of fully established connections waiting for `accept()`; half-open connections in SYN_RECEIVED are limited separately. Typical defaults are 128 or 4096 depending on the kernel
- `listener.Accept()`:
  - Blocks in system call: `accept(sockfd, ...)`
  - Kernel completes 3-way handshake with client
  - Returns new socket for established connection
  - Original listening socket remains open for new connections
- `reader.ReadString('\n')`:
  - System call: `read(connfd, buffer, size)`
  - Blocks until data arrives in receive buffer
  - TCP handles buffering, reordering, retransmission
  - Application sees reliable byte stream
- `conn.Write([]byte(...))`:
  - System call: `write(connfd, buffer, length)`
  - Data copied to socket send buffer (kernel space)
  - TCP segments and transmits asynchronously
  - Write returns once the data is buffered (it can block while the send buffer is full); it doesn't wait for ACK
Example 2: TCP Client in Python
```python
# tcp_client.py
import socket
import sys
import time


def main():
    # Create TCP socket
    # AF_INET = IPv4, SOCK_STREAM = TCP
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    # Set socket options
    # SO_KEEPALIVE: Send TCP keepalive probes
    # Why? Detect broken connections (router failure, cable unplugged)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # TCP_NODELAY: Disable Nagle's algorithm
    # Nagle's algorithm: buffer small packets to reduce overhead
    # Why disable? For interactive applications needing low latency
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

    # Set connection timeout
    # Default is blocking forever, which is dangerous
    sock.settimeout(10.0)

    try:
        # Connect initiates 3-way handshake
        # Blocks until handshake completes or timeout
        print("Connecting to localhost:8080...")
        sock.connect(('localhost', 8080))
        print("Connected!")

        # Disable timeout for data transfer
        # We want blocking reads/writes now
        sock.settimeout(None)

        # Send data
        messages = [
            "Hello, server!",
            "How are you?",
            "Goodbye!"
        ]

        for msg in messages:
            # Encode string to bytes
            # TCP transports bytes, not characters
            data = (msg + '\n').encode('utf-8')

            # Send all data
            # Why sendall? send() might return after sending partial data
            # sendall() loops until all data sent or error occurs
            sock.sendall(data)
            print(f"Sent: {msg}")

            # Receive response
            # recv() returns up to 4096 bytes
            # May return less if less data available
            response = sock.recv(4096)
            print(f"Received: {response.decode('utf-8').strip()}")

            time.sleep(1)  # Pause between messages

    except socket.timeout:
        print("Connection timeout", file=sys.stderr)
        sys.exit(1)
    except ConnectionRefusedError:
        print("Connection refused - is server running?", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        sys.exit(1)
    finally:
        # Always close socket
        # Initiates graceful shutdown (FIN handshake)
        print("Closing connection...")
        sock.close()


if __name__ == '__main__':
    main()
```
Under the hood:
- `socket.socket(AF_INET, SOCK_STREAM)`:
  - System call: `socket(AF_INET, SOCK_STREAM, 0)`
  - OS allocates socket data structures
  - Returns file descriptor (Unix) or handle (Windows)
- `sock.connect(('localhost', 8080))`:
  - Resolves hostname to IP (if needed)
  - Initiates 3-way handshake:
    - Sends SYN
    - Waits for SYN-ACK
    - Sends ACK
  - Blocks until ESTABLISHED state or timeout
- `sock.sendall(data)`:
  - Loops calling `send()` until all bytes sent
  - Each `send()` copies data to kernel send buffer
  - Returns when all data buffered, NOT when ACKed
- `sock.recv(4096)`:
  - System call: `recv(sockfd, buffer, 4096, 0)`
  - Blocks until at least 1 byte available
  - May return less than 4096 bytes
  - TCP stream has no message boundaries!
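Because the stream has no message boundaries, applications add their own framing. One common approach is a length prefix; the sketch below assumes a 4-byte big-endian length header, which is a convention chosen for this example rather than anything TCP defines. It uses `socket.socketpair()` as a local stand-in for a connected TCP socket so it runs without a server.

```python
import socket
import struct


def send_message(sock: socket.socket, payload: bytes) -> None:
    """Send one framed message: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exactly(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have arrived."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Read one framed message, however the bytes were split in transit."""
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)


left, right = socket.socketpair()   # connected pair of stream sockets
send_message(left, b"Hello, server!")
send_message(left, b"How are you?")
first = recv_message(right)
second = recv_message(right)
left.close()
right.close()
print(first, second)
```

Both messages sit back to back in the receiver's buffer, yet each `recv_message()` call returns exactly one of them because the length prefix says where it ends.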
Example 3: Examining TCP State with netstat
```bash
#!/bin/bash
# tcp_monitor.sh
# Monitor TCP connections and state transitions

echo "Starting TCP connection monitor..."
echo "===================================="

# Function to display TCP connections in a formatted way
monitor_tcp() {
    while true; do
        clear
        echo "TCP Connection States ($(date))"
        echo "----------------------------------------"

        # On Linux: use ss (socket statistics) - faster than netstat
        # On macOS: use netstat
        if command -v ss &> /dev/null; then
            # -t: TCP only
            # -n: Numeric addresses (no DNS lookup)
            # -a: All sockets (listening and established)
            # -o: Show timer information
            ss -tano | head -20
            echo ""
            echo "State Summary:"
            ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn
        else
            netstat -an -p tcp | head -20
            echo ""
            echo "State Summary:"
            netstat -an -p tcp | awk 'NR>2 {print $6}' | sort | uniq -c | sort -rn
        fi

        echo ""
        echo "Press Ctrl+C to exit..."
        sleep 2
    done
}

# Trap Ctrl+C to exit cleanly
trap "echo 'Monitoring stopped.'; exit 0" INT

monitor_tcp
```
What you'll see:
```
State Summary:
     42 ESTABLISHED    # Active data transfer connections
     18 TIME_WAIT      # Recently closed connections (2MSL wait)
      5 LISTEN         # Servers waiting for connections
      2 CLOSE_WAIT     # Remote closed, local app hasn't closed yet
      1 FIN_WAIT_2     # Local closed, waiting for remote FIN
```
Why TIME_WAIT accumulates:
- Each closed connection sits in TIME_WAIT for 2*MSL on the side that closed first (60 seconds on Linux; other systems differ)
- High-traffic servers accumulate thousands of TIME_WAIT sockets
- They consume port numbers from the ephemeral range
- Can exhaust ports: max ~64K connections per (client IP, server IP, server port)
- Solution: Increase the ephemeral port range and use connection pooling; `SO_REUSEADDR` lets a restarted server rebind a listening port that still has connections in TIME_WAIT
Example 4: TCP Connection with Raw Sockets (C)
```c
// tcp_raw.c
// Demonstrates low-level TCP connection using raw sockets
// Requires root/admin privileges (Linux: uses struct iphdr)
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/tcp.h>
#include <netinet/ip.h>
#include <sys/socket.h>

// TCP header structure
struct tcp_header {
    uint16_t source_port;
    uint16_t dest_port;
    uint32_t seq_num;
    uint32_t ack_num;
    uint8_t  data_offset;   // 4 bits: offset, 4 bits: reserved
    uint8_t  flags;         // TCP flags (SYN, ACK, FIN, etc.)
    uint16_t window;
    uint16_t checksum;
    uint16_t urgent_pointer;
};

// Pseudo header for checksum calculation
// Why? TCP checksum includes IP addresses (pseudo header)
// Provides additional error detection
struct pseudo_header {
    uint32_t source_ip;
    uint32_t dest_ip;
    uint8_t  reserved;
    uint8_t  protocol;      // 6 for TCP
    uint16_t tcp_length;
};

// Calculate TCP checksum
uint16_t calculate_checksum(void *data, int length) {
    uint16_t *buf = (uint16_t *)data;
    uint32_t sum = 0;

    // Add all 16-bit words
    while (length > 1) {
        sum += *buf++;
        length -= 2;
    }

    // Add leftover byte if odd length
    if (length == 1) {
        sum += *(uint8_t *)buf;
    }

    // Fold 32-bit sum to 16 bits
    while (sum >> 16) {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }

    // One's complement
    return (uint16_t)~sum;
}

void send_syn_packet(const char *dest_ip, uint16_t dest_port) {
    int sockfd;
    struct sockaddr_in dest_addr;
    char packet[4096];

    // Create raw socket
    // IPPROTO_TCP: We're crafting TCP packets
    // Requires CAP_NET_RAW capability (root on Linux)
    sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);
    if (sockfd < 0) {
        perror("socket() failed - are you root?");
        exit(1);
    }

    // Tell kernel we're providing IP header
    int one = 1;
    if (setsockopt(sockfd, IPPROTO_IP, IP_HDRINCL, &one, sizeof(one)) < 0) {
        perror("setsockopt() failed");
        exit(1);
    }

    // Zero out packet buffer
    memset(packet, 0, sizeof(packet));

    // Build IP header
    struct iphdr *ip_hdr = (struct iphdr *)packet;
    ip_hdr->version = 4;
    ip_hdr->ihl = 5;                              // Header length: 5 * 4 = 20 bytes
    ip_hdr->tos = 0;
    ip_hdr->tot_len = htons(sizeof(struct iphdr) + sizeof(struct tcp_header));
    ip_hdr->id = htons(54321);                    // Identification
    ip_hdr->frag_off = 0;
    ip_hdr->ttl = 64;                             // Time to live
    ip_hdr->protocol = IPPROTO_TCP;
    ip_hdr->saddr = inet_addr("192.168.1.100");   // Source IP (hard-coded example)
    ip_hdr->daddr = inet_addr(dest_ip);           // Dest IP
    ip_hdr->check = 0;                            // Kernel fills this in

    // Build TCP header
    struct tcp_header *tcp_hdr = (struct tcp_header *)(packet + sizeof(struct iphdr));
    tcp_hdr->source_port = htons(12345);          // Arbitrary source port
    tcp_hdr->dest_port = htons(dest_port);
    tcp_hdr->seq_num = htonl(1000);               // Initial sequence number
    tcp_hdr->ack_num = 0;                         // No ACK yet
    tcp_hdr->data_offset = (5 << 4);              // 5 * 4 = 20 bytes, no options
    tcp_hdr->flags = 0x02;                        // SYN flag
    tcp_hdr->window = htons(65535);               // Max window size
    tcp_hdr->checksum = 0;                        // Calculate below
    tcp_hdr->urgent_pointer = 0;

    // Calculate TCP checksum using pseudo header
    struct pseudo_header psh;
    psh.source_ip = inet_addr("192.168.1.100");
    psh.dest_ip = inet_addr(dest_ip);
    psh.reserved = 0;
    psh.protocol = IPPROTO_TCP;
    psh.tcp_length = htons(sizeof(struct tcp_header));

    // Create buffer with pseudo header + TCP header
    char checksum_buf[4096];
    memcpy(checksum_buf, &psh, sizeof(psh));
    memcpy(checksum_buf + sizeof(psh), tcp_hdr, sizeof(struct tcp_header));
    tcp_hdr->checksum = calculate_checksum(checksum_buf,
                                           sizeof(psh) + sizeof(struct tcp_header));

    // Destination address (zeroed first so unused fields are defined)
    memset(&dest_addr, 0, sizeof(dest_addr));
    dest_addr.sin_family = AF_INET;
    dest_addr.sin_addr.s_addr = inet_addr(dest_ip);

    // Send packet
    if (sendto(sockfd, packet, ntohs(ip_hdr->tot_len), 0,
               (struct sockaddr *)&dest_addr, sizeof(dest_addr)) < 0) {
        perror("sendto() failed");
        exit(1);
    }

    printf("SYN packet sent to %s:%d\n", dest_ip, dest_port);
    printf("  Source: 192.168.1.100:12345\n");
    printf("  Seq: 1000\n");
    printf("  Flags: SYN\n");

    close(sockfd);
}

int main(int argc, char *argv[]) {
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <dest_ip> <dest_port>\n", argv[0]);
        exit(1);
    }

    const char *dest_ip = argv[1];
    uint16_t dest_port = atoi(argv[2]);

    send_syn_packet(dest_ip, dest_port);
    return 0;
}
```
Compile and run:
```bash
gcc -o tcp_raw tcp_raw.c
sudo ./tcp_raw 192.168.1.101 80
```
Why this matters:
- Understanding packet structure at this level helps debug network issues
- Tools like `tcpdump`, `wireshark` parse these same structures
- Security tools (firewalls, IDS) inspect these fields
- Network performance tuning requires understanding header overhead
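The one's-complement checksum used by IP and TCP is short enough to experiment with in a few lines. The sketch below computes it over a sample 20-byte IPv4 header; the header bytes are an arbitrary illustration, not taken from the capture earlier in this article.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IP, TCP, and UDP."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]    # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Sample IPv4 header with the checksum field (bytes 10-11) set to zero.
header = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
checksum = internet_checksum(header)
print(hex(checksum))

# A receiver verifies by summing the header with the checksum in place:
# a valid header yields zero.
filled = header[:10] + checksum.to_bytes(2, "big") + header[12:]
print(internet_checksum(filled))
```

For TCP, the same function is applied to the pseudo header, the TCP header, and the payload concatenated together.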
Example 5: Monitoring TCP Metrics (JavaScript/Node.js)
```javascript
// tcp_metrics.js
// Monitors TCP connection metrics and performance
const net = require('net');
const { performance } = require('perf_hooks');

class TCPMetrics {
  constructor() {
    this.connections = new Map();
    this.stats = {
      totalConnections: 0,
      activeConnections: 0,
      bytesReceived: 0,
      bytesSent: 0,
      errors: 0
    };
  }

  // Create a monitored TCP server
  createServer(port, callback) {
    const server = net.createServer((socket) => {
      const connId = `${socket.remoteAddress}:${socket.remotePort}`;
      const connMetrics = {
        id: connId,
        connectedAt: Date.now(),
        bytesReceived: 0,
        bytesSent: 0,
        rttSamples: [],
        errors: []
      };

      this.connections.set(connId, connMetrics);
      this.stats.totalConnections++;
      this.stats.activeConnections++;

      console.log(`[CONNECT] ${connId}`);

      // Inspect Node's user-space stream buffers
      // Note: these are not the kernel socket buffers that bound the TCP window
      console.log(`  Pending write bytes: ${socket.writableLength}`);
      console.log(`  Read high-water mark: ${socket.readableHighWaterMark} bytes`);

      // Data received
      socket.on('data', (data) => {
        connMetrics.bytesReceived += data.length;
        this.stats.bytesReceived += data.length;

        // Approximate latency by timing the echo write
        // Note: the callback fires once the data is flushed to the kernel,
        // so this is write latency, not a true network RTT
        const pingStart = performance.now();

        // Echo data back
        socket.write(data, () => {
          const rtt = performance.now() - pingStart;
          connMetrics.rttSamples.push(rtt);
          connMetrics.bytesSent += data.length;
          this.stats.bytesSent += data.length;

          // Keep only last 100 samples
          if (connMetrics.rttSamples.length > 100) {
            connMetrics.rttSamples.shift();
          }
        });
      });

      // Connection closed ('close' also fires after errors, so cleanup always runs)
      socket.on('close', () => {
        const duration = Date.now() - connMetrics.connectedAt;
        const samples = connMetrics.rttSamples;
        // Guard against division by zero when no data was echoed
        const avgRTT = samples.length > 0
          ? samples.reduce((a, b) => a + b, 0) / samples.length
          : 0;

        console.log(`[DISCONNECT] ${connId}`);
        console.log(`  Duration: ${duration}ms`);
        console.log(`  Bytes RX: ${connMetrics.bytesReceived}`);
        console.log(`  Bytes TX: ${connMetrics.bytesSent}`);
        console.log(`  Avg RTT: ${avgRTT.toFixed(2)}ms`);
        console.log(`  Errors: ${connMetrics.errors.length}`);

        this.connections.delete(connId);
        this.stats.activeConnections--;
      });

      // Error handling
      socket.on('error', (err) => {
        console.error(`[ERROR] ${connId}: ${err.message}`);
        connMetrics.errors.push({
          timestamp: Date.now(),
          error: err.message
        });
        this.stats.errors++;
      });

      // Timeout handling
      // Why? Detect idle connections that should be closed
      socket.setTimeout(60000); // 60 second timeout
      socket.on('timeout', () => {
        console.log(`[TIMEOUT] ${connId} - closing`);
        socket.end();
      });

      if (callback) {
        callback(socket, connMetrics);
      }
    });

    server.listen(port, () => {
      console.log(`Server listening on port ${port}`);
      this.startMetricsReporter();
    });

    return server;
  }

  // Periodically report aggregate metrics
  startMetricsReporter() {
    setInterval(() => {
      console.log('\n=== TCP Metrics Report ===');
      console.log(`Total connections: ${this.stats.totalConnections}`);
      console.log(`Active connections: ${this.stats.activeConnections}`);
      console.log(`Total bytes received: ${this.formatBytes(this.stats.bytesReceived)}`);
      console.log(`Total bytes sent: ${this.formatBytes(this.stats.bytesSent)}`);
      console.log(`Total errors: ${this.stats.errors}`);

      // Per-connection details
      if (this.connections.size > 0) {
        console.log('\nActive Connections:');
        for (const [id, metrics] of this.connections) {
          const duration = Date.now() - metrics.connectedAt;
          const avgRTT = metrics.rttSamples.reduce((a, b) => a + b, 0) /
            metrics.rttSamples.length || 0;
          console.log(`  ${id}: duration=${duration}ms, RTT=${avgRTT.toFixed(2)}ms, ` +
            `RX=${this.formatBytes(metrics.bytesReceived)}, ` +
            `TX=${this.formatBytes(metrics.bytesSent)}`);
        }
      }
      console.log('========================\n');
    }, 10000); // Report every 10 seconds
  }

  formatBytes(bytes) {
    if (bytes < 1024) return `${bytes} B`;
    if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(2)} KB`;
    return `${(bytes / (1024 * 1024)).toFixed(2)} MB`;
  }
}

// Usage
const metrics = new TCPMetrics();
metrics.createServer(8080);

// Exit cleanly on Ctrl+C
process.on('SIGINT', () => {
  console.log('\nShutting down...');
  process.exit(0);
});
```
Run the server:
```bash
node tcp_metrics.js
```
Test with multiple clients:
```bash
# Terminal 1
echo "Hello" | nc localhost 8080

# Terminal 2
echo "World" | nc localhost 8080

# Watch the metrics output
```
Visual Internals: Call Stack and System Calls
When you call `socket.connect()`, here's the journey through the software stack:
Diagram 10
Benefits & Why It Matters
Performance Benefits
Throughput Optimization:
- Pipelining: Sliding window allows multiple packets in flight, maximizing bandwidth utilization
- Selective acknowledgment (SACK): Recovers from multiple packet losses efficiently
- Window scaling: Supports window sizes up to 1 GB, essential for high-speed networks
- Fast retransmit: Detects loss after 3 duplicate ACKs, avoiding timeout delays
Latency Improvements:
- Fast open (TFO): Sends data in initial SYN packet, saves 1 RTT
- TCP_NODELAY: Disables Nagle's algorithm for interactive applications
- Keep-alive: Detects dead connections without application-level polling
Comparison Chart:

Diagram 11
Real-world success stories:
- Netflix: Serves 250+ million streams daily over TCP. Uses congestion control algorithms (BBR) to maximize throughput while minimizing bufferbloat.
- Google: Developed QUIC (a transport built on UDP) while continuing to run TCP widely. Google reported that BBR congestion control delivered 2-25x the throughput of CUBIC on its B4 backbone network.
- AWS: Uses optimized TCP stacks for inter-region replication, achieving 100 Gbps+ transfers.
Developer Experience
Simplicity:
- No manual retransmission: TCP handles it automatically
- No message boundaries: Can send/receive arbitrary chunks
- No ordering concerns: TCP delivers bytes in order
- No corruption handling: Checksums ensure integrity
Reliability guarantees:
- Data arrives intact or connection fails—no silent corruption
- No duplicate data delivered to application
- No gaps in data stream
Ecosystem:
- Every programming language has TCP socket libraries
- Extensive tooling: `tcpdump`, `wireshark`, `netstat`, `ss`
- Well-understood by operations teams
- Firewall-friendly (vs. UDP which is often blocked)
Scalability
Connection scalability:
- Modern servers handle millions of concurrent TCP connections
- Linux tuning: increase file descriptor limits, socket buffers, port range
- `epoll`/`kqueue` enable efficient event-driven I/O
Global internet scale:
- Billions of TCP connections active simultaneously
- Works across diverse network conditions (satellite, mobile, fiber)
- Congestion control prevents internet meltdown
Trade-offs & Gotchas
When to Use TCP
Use TCP for:
- Reliability is critical: Financial transactions, file transfers, database queries
- Ordered delivery required: Protocol state machines (HTTP, SSH, FTP)
- Variable data sizes: Streaming arbitrary amounts of data
- Firewall traversal: TCP widely allowed, UDP often blocked
- Mature ecosystem needed: Extensive tooling and libraries
Example use cases:
- Web applications (HTTP/HTTPS)
- API services (REST, GraphQL)
- Database connections (PostgreSQL, MySQL)
- File transfer (FTP, SFTP, rsync)
- Email (SMTP, IMAP)
- Remote access (SSH, RDP)
When NOT to Use TCP
Avoid TCP for:
- Latency-sensitive real-time apps: Gaming, VoIP, video conferencing
  - Why: Head-of-line blocking. If one packet is lost, all subsequent packets wait for retransmission, causing latency spikes
  - Alternative: UDP with application-level selective retransmission
- Broadcast/multicast: Sending to multiple recipients
  - Why: TCP is connection-oriented (one-to-one)
  - Alternative: UDP multicast
- Simple request/response: DNS queries, SNMP, DHCP
  - Why: TCP overhead (3-way handshake) doubles latency
  - Alternative: UDP
- Lossy networks with time-sensitive data: Live video streaming
  - Why: Retransmitting old video frames is useless; better to skip and continue
  - Alternative: UDP with forward error correction
- Extremely high packet rate: High-frequency trading, real-time telemetry
  - Why: Per-packet TCP overhead (ACKs, state management)
  - Alternative: UDP or kernel-bypass networking (DPDK)
Common Mistakes
1. Assuming send() means data was delivered
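Why it happens: Developers think `send()` returning means the recipient received the data.
Reality: `send()` returns once data has been copied into the socket send buffer, not when the peer has ACKed it. It may also accept only part of the data you passed in.
How to fix:
```python
# Wrong: ignoring the return value of send()
sock.send(data)  # May queue only part of data; nothing has been ACKed yet

# Better: sendall() loops until every byte has been handed to the kernel
try:
    sock.sendall(data)
except OSError as e:
    # Connection broke before all data was queued
    handle_error(e)

# Even sendall() does not confirm delivery. If the application needs that,
# have the peer send an explicit application-level acknowledgment.
```
Debugging: Use `tcpdump` to verify packets were actually transmitted.
2. Ignoring partial reads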
Why it happens: Expecting `recv(1024)` to always return 1024 bytes if available.
Reality: TCP is a byte stream without message boundaries. `recv()` returns as soon as any data is available, not when the buffer is full.
How to fix:
```go
// Wrong: assuming the full message was received
data := make([]byte, 1024)
n, _ := conn.Read(data) // n might be less than 1024!

// Right: loop until the full message is received
// (the standard library's io.ReadFull(conn, buf) does the same job)
func readExactly(conn net.Conn, size int) ([]byte, error) {
	buf := make([]byte, size)
	offset := 0
	for offset < size {
		n, err := conn.Read(buf[offset:])
		if err != nil {
			return nil, err
		}
		offset += n
	}
	return buf, nil
}
```
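Because the stream has no message boundaries, a read loop is usually paired with some framing rule. The sketch below shows one common choice, a 4-byte length prefix, in Python. The helper names (`recv_exactly`, `send_message`, `recv_message`) and the framing format are illustrative assumptions, not something the article's examples define.

```python
import socket
import struct


def recv_exactly(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # empty result means the peer closed the connection
            raise ConnectionError(f"peer closed with {remaining} bytes still expected")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_message(sock: socket.socket, payload: bytes) -> None:
    # Assumed framing: 4-byte big-endian length, then the payload itself.
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_message(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!I", recv_exactly(sock, 4))
    return recv_exactly(sock, length)


# Demo over a connected socket pair, so no network setup is needed.
left, right = socket.socketpair()
send_message(left, b"hello")
send_message(left, b"world!")
first = recv_message(right)
second = recv_message(right)
print(first, second)
left.close()
right.close()
```

Both messages are written back to back, yet each `recv_message` call returns exactly one of them, because the reader relies on the length prefix rather than on how TCP happened to segment the bytes.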
3. Not handling TIME_WAIT exhaustion
Why it happens: High-traffic clients exhaust ephemeral ports.
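Reality: The side that closes first holds the connection in TIME_WAIT for 2*MSL (up to 120s; Linux fixes it at 60s). At 120s and roughly 64K ports, a client can open only about 500 connections per second to a given server address and port before its ephemeral ports run out.
How to fix:
```python
# Let a restarted server bind its port while old connections sit in TIME_WAIT
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

# Use connection pooling
# Don't open/close connections for each request
```
System tuning:
```bash
# Linux: Increase ephemeral port range
sysctl -w net.ipv4.ip_local_port_range="10000 65000"

# Let outgoing connections reuse sockets in TIME_WAIT (relies on TCP timestamps)
sysctl -w net.ipv4.tcp_tw_reuse=1

# Shorten FIN_WAIT_2 for orphaned sockets (risky!)
# Note: this does not change TIME_WAIT, which Linux fixes at 60 seconds
sysctl -w net.ipv4.tcp_fin_timeout=30
```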
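The port arithmetic behind that limit is easy to check. The sketch below is a back-of-the-envelope calculation, not a measurement; the function name is made up for illustration, and the Linux figures (ports 32768-60999, 60-second TIME_WAIT) are assumed defaults that may differ on your system.

```python
def max_new_connections_per_second(port_low: int, port_high: int,
                                   time_wait_seconds: float) -> float:
    """Sustained rate of short-lived connections one client IP can open to a
    single destination IP:port before every ephemeral port is in TIME_WAIT."""
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


# Roughly 64K ports with a 120 s TIME_WAIT.
wide_range_rate = max_new_connections_per_second(1024, 65535, 120)

# Assumed Linux defaults: ports 32768-60999 and a fixed 60 s TIME_WAIT.
linux_default_rate = max_new_connections_per_second(32768, 60999, 60)

print(f"{wide_range_rate:.0f}/s, {linux_default_rate:.0f}/s")  # 538/s, 471/s
```

Either way the ceiling is a few hundred connections per second per destination, which is why connection pooling matters far more than any sysctl.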
4. Small writes causing poor performance (Nagle's algorithm)
Why it happens: Nagle's algorithm (RFC 896) buffers small writes to reduce packet overhead.
Reality: For interactive applications (SSH, gaming), Nagle's algorithm interacting with the receiver's delayed ACKs can add 40-200ms of latency to small writes.
How to fix:
```c
// Disable Nagle's algorithm
int flag = 1;
setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
```
Trade-off: More packets on network, higher overhead. Use only when latency matters more than bandwidth.
5. Not setting socket timeouts
Why it happens: Default socket behavior is blocking forever.
Reality: Hung connections (network failures, crashed peers) block forever, leaking resources.
How to fix:
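```python
# Set timeouts
sock.settimeout(30.0)  # 30 second timeout

# Or use SO_RCVTIMEO / SO_SNDTIMEO, which take a struct timeval
# (seconds, microseconds); the 'll' layout assumes a typical 64-bit Linux build
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, struct.pack('ll', 30, 0))
```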
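To see a timeout fire, the following Python sketch connects to a listener that completes the TCP handshake but never sends a byte. The helper name `recv_with_timeout` and the 0.2-second value are illustrative assumptions; a real service would pick a timeout from its own latency budget.

```python
import socket


def recv_with_timeout(host: str, port: int, timeout: float):
    """Connect and wait for data, giving up after `timeout` seconds per operation.

    Returns the bytes read, or None if the peer stayed silent.
    """
    # create_connection applies the timeout to connect() and to later recv() calls.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None


# Demo: the kernel completes the handshake for a listening socket,
# but nothing ever accepts the connection or writes to it.
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
silent.listen(1)
silent_port = silent.getsockname()[1]

result = recv_with_timeout("127.0.0.1", silent_port, timeout=0.2)
print("timed out" if result is None else result)
silent.close()
```

Without the timeout, the same `recv()` call would block indefinitely, which is exactly the resource leak described above.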
6. Misunderstanding window size limits
Why it happens: Developers wonder why throughput is capped despite high bandwidth.
Reality: Throughput ≤ Window Size / RTT. Without window scaling, the 16-bit window field caps the window at 64 KB, which limits throughput.
Example: 64 KB window, 100ms RTT → max ≈ 5.24 Mbps (65,535 bytes × 8 bits / 0.1 s), regardless of link speed.
How to fix:
```bash
# Linux: Increase buffer sizes
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"
```
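The window/RTT bound, and the buffer size needed to fill a given link (the bandwidth-delay product), can be worked out with a few lines of Python. This is a sketch of the arithmetic only; the function names and the 1 Gbps example link are assumptions chosen for illustration.

```python
def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_seconds / 1_000_000


def required_window_bytes(link_mbps: float, rtt_seconds: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return round(link_mbps * 1_000_000 / 8 * rtt_seconds)


# Unscaled 16-bit window (65,535 bytes) over a 100 ms path.
print(round(max_throughput_mbps(65535, 0.100), 2))  # 5.24

# Window needed to saturate an assumed 1 Gbps link at the same RTT.
print(required_window_bytes(1000, 0.100))  # 12500000
```

The second number, about 12.5 MB, fits within a 16 MB (16777216-byte) buffer maximum, which is why that value is a common tuning choice for fast long-distance paths.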
7. Ignoring TCP keep-alive configuration
Why it happens: Assuming keep-alive detects failures quickly.
Reality: Default settings probe after 2 hours idle, taking 11 minutes to detect failure (Linux).
How to fix:
```c
// Enable keep-alive
int optval = 1;
setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, &optval, sizeof(optval));

// Set aggressive timings (Linux)
optval = 60; // Start probing after 60s idle
setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPIDLE, &optval, sizeof(optval));

optval = 10; // Probe every 10s
setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPINTVL, &optval, sizeof(optval));

optval = 3; // 3 failed probes = dead connection
setsockopt(sockfd, IPPROTO_TCP, TCP_KEEPCNT, &optval, sizeof(optval));
```
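The same settings can be applied from Python, and the worst-case detection time follows directly from the three values: idle time plus interval times probe count. The sketch below assumes a platform where Python exposes the `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` constants (Linux does); on others it only turns keep-alive on. The function names are hypothetical.

```python
import socket


def keepalive_detection_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate time from last traffic until a dead peer is declared."""
    return idle + interval * count


def enable_keepalive(sock: socket.socket, idle: int = 60,
                     interval: int = 10, count: int = 3) -> None:
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants are platform-specific, so set only the ones that exist.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()

print(keepalive_detection_seconds(60, 10, 3))    # 90 seconds with the tuned values
print(keepalive_detection_seconds(7200, 75, 9))  # 7875 seconds with the defaults
```

The default case works out to 2 hours plus a little over 11 minutes, matching the figure quoted above.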
8. Not handling RST packets gracefully
Why it happens: Unexpected RST causes unhandled exceptions.
Reality: RST happens when:
- Connecting to closed port
- Sending data after remote closed
- Network middleboxes (firewalls, load balancers) time out an idle connection
How to fix:
```go
_, err := conn.Write(data)
if err != nil {
	// Check for connection reset; a write after the peer has closed
	// can also surface as a broken pipe (EPIPE)
	if errors.Is(err, syscall.ECONNRESET) || errors.Is(err, syscall.EPIPE) {
		// Handle gracefully: reconnect, log, alert
		log.Printf("Connection reset by peer")
	}
}
```
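A reset is easy to reproduce locally, which helps when testing error handling. The Python sketch below forces one by closing a socket with `SO_LINGER` set to zero seconds, then shows the peer catching `ConnectionResetError`. It assumes a POSIX system (the `struct linger` layout of two ints) and a working loopback interface; the helper names are hypothetical.

```python
import socket
import struct


def abort_connection(sock: socket.socket) -> None:
    """Close with SO_LINGER on and a 0 s linger time, so the kernel sends RST, not FIN."""
    # struct linger { int l_onoff; int l_linger; } on POSIX systems (assumption).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()


def read_or_detect_reset(sock: socket.socket):
    """Return received bytes, or None if the peer reset the connection."""
    try:
        return sock.recv(4096)
    except ConnectionResetError:
        return None


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

client = socket.create_connection(listener.getsockname(), timeout=2.0)
server_side, _ = listener.accept()

abort_connection(server_side)            # server aborts instead of closing gracefully
outcome = read_or_detect_reset(client)   # client sees ECONNRESET rather than EOF
print("reset detected" if outcome is None else outcome)

client.close()
listener.close()
```

Note the contrast with a graceful close: after a FIN, `recv()` returns an empty bytes object, while after a RST it raises.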
Performance Considerations
Bottlenecks to watch:
- CPU: Checksum calculations, encryption (TLS), system call overhead
- Memory: Socket buffers, per-connection state
- Network bandwidth: Obvious but often forgotten
- Port exhaustion: TIME_WAIT sockets consuming ephemeral ports
- File descriptor limits: Default 1024 on many systems
Optimization strategies:

Diagram 12
Benchmarking:
```bash
# Test throughput
iperf3 -c server_ip -t 60 -P 4

# Test latency
ping -c 100 server_ip

# Measure packet loss
mtr -c 100 server_ip

# Monitor TCP stats
ss -tin
```
Security Considerations
SYN Flood Attack
Vulnerability: Attacker sends flood of SYN packets with spoofed source IPs, exhausting server's connection backlog.
Mitigation:
```bash
# Enable SYN cookies (Linux)
sysctl -w net.ipv4.tcp_syncookies=1

# Reduce SYN-RECV timeout
sysctl -w net.ipv4.tcp_synack_retries=2
```
Connection Hijacking
Vulnerability: Attacker guesses sequence numbers to inject data.
Mitigation:
- Random ISN (implemented since 1990s)
- IPsec or TLS for authentication and encryption
RST Injection
Vulnerability: Attacker sends RST packet to forcibly close connection.
Mitigation:
- TCP MD5 signatures (RFC 2385) for BGP sessions
- TLS protects against injection
Amplification Attacks
Vulnerability: Attacker uses TCP to amplify traffic toward victim.
Mitigation:
- Egress filtering (BCP 38) to prevent source IP spoofing
- Rate limiting
Best Practices
- Always use TLS for sensitive data (HTTPS, not HTTP)
- Validate input at application layer—TCP doesn't protect against malicious data
- Rate limit new connections to prevent resource exhaustion
- Monitor connection states and metrics for anomalies
- Harden OS TCP/IP stack (disable unused features, tune parameters)
Comparison with Alternatives
| Feature | TCP | UDP | QUIC | SCTP |
|---|---|---|---|---|
| Reliability | Guaranteed delivery | Best effort | Guaranteed (per stream) | Guaranteed |
| Ordering | Strict in-order | No ordering | Per-stream ordering | Per-stream ordering |
| Connection | Connection-oriented | Connectionless | Connection-oriented | Connection-oriented |
| Head-of-line blocking | Yes (all data) | No | No (per stream) | No (per stream) |
| Latency | Higher (ACKs, retransmit) | Lower | Medium | Medium |
| Overhead | Medium (20+ bytes) | Low (8 bytes) | Higher (QUIC + UDP) | Medium |
| Congestion control | Built-in | None | Built-in (pluggable, e.g. CUBIC or BBR) | Built-in |
| Firewall traversal | Excellent | Poor | Medium (UDP-based) | Poor |
| TLS integration | Separate layer | Separate (DTLS) | Built-in (TLS 1.3) | Separate |
| Use cases | Web, email, file transfer | Gaming, VoIP, streaming | HTTP/3, low-latency web | Telecom (SS7, M3UA) |

Diagram 13
When to migrate from TCP:
- TCP → QUIC: Modern web applications needing low latency and multiplexing (HTTP/3)
- TCP → UDP: Real-time gaming, VoIP, live streaming (implement own reliability as needed)
- TCP → SCTP: Multi-homing, multi-streaming scenarios (telecom)
- TCP → WebSockets over TCP: Real-time web apps needing bidirectional communication
Hands-On Examples
Example 1: Simple HTTP Server (Understanding HTTP over TCP)
```go
// http_tcp_server.go
// Implements a minimal HTTP server over raw TCP to understand the protocol
package main

import (
	"bufio"
	"fmt"
	"html"
	"net"
	"strings"
	"time"
)

func main() {
	listener, err := net.Listen("tcp", ":8080")
	if err != nil {
		panic(err)
	}
	defer listener.Close()

	fmt.Println("HTTP server listening on :8080")
	fmt.Println("Try: curl http://localhost:8080/")

	for {
		conn, err := listener.Accept()
		if err != nil {
			fmt.Println("Accept error:", err)
			continue
		}
		go handleHTTPRequest(conn)
	}
}

func handleHTTPRequest(conn net.Conn) {
	defer conn.Close()
	reader := bufio.NewReader(conn)

	// Read request line: GET /path HTTP/1.1
	requestLine, err := reader.ReadString('\n')
	if err != nil {
		fmt.Println("Error reading request:", err)
		return
	}

	parts := strings.Fields(requestLine)
	if len(parts) < 3 {
		sendResponse(conn, 400, "Bad Request", "Invalid request line")
		return
	}
	method := parts[0]
	path := parts[1]
	version := parts[2]
	fmt.Printf("Request: %s %s %s\n", method, path, version)

	// Read headers
	headers := make(map[string]string)
	for {
		line, err := reader.ReadString('\n')
		if err != nil {
			return
		}
		line = strings.TrimSpace(line)
		if line == "" {
			break // Empty line indicates end of headers
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) == 2 {
			key := strings.TrimSpace(parts[0])
			value := strings.TrimSpace(parts[1])
			headers[key] = value
			fmt.Printf("  Header: %s = %s\n", key, value)
		}
	}

	// Route handling
	switch path {
	case "/":
		sendResponse(conn, 200, "OK", "<h1>Welcome!</h1><p>TCP/IP + HTTP = Magic</p>")
	case "/time":
		timeStr := time.Now().Format(time.RFC3339)
		sendResponse(conn, 200, "OK", fmt.Sprintf("<h1>Current Time</h1><p>%s</p>", timeStr))
	case "/headers":
		body := "<h1>Your Headers</h1><ul>"
		for k, v := range headers {
			// Escape client-supplied values before echoing them into HTML
			body += fmt.Sprintf("<li><b>%s:</b> %s</li>", html.EscapeString(k), html.EscapeString(v))
		}
		body += "</ul>"
		sendResponse(conn, 200, "OK", body)
	default:
		sendResponse(conn, 404, "Not Found", "<h1>404 Not Found</h1>")
	}
}

func sendResponse(conn net.Conn, statusCode int, statusText string, body string) {
	response := fmt.Sprintf("HTTP/1.1 %d %s\r\n", statusCode, statusText)
	response += "Content-Type: text/html; charset=utf-8\r\n"
	response += fmt.Sprintf("Content-Length: %d\r\n", len(body))
	response += "Connection: close\r\n" // Tell client we're closing after response
	response += "\r\n"                  // Empty line separates headers from body
	response += body

	// Write entire response
	// Under the hood: TCP segments this, handles ACKs, retransmissions
	_, err := conn.Write([]byte(response))
	if err != nil {
		fmt.Println("Error writing response:", err)
	}
}
```
Test it:
```bash
# Terminal 1: Run server
go run http_tcp_server.go

# Terminal 2: Test with curl
curl http://localhost:8080/
curl http://localhost:8080/time
curl http://localhost:8080/headers

# See raw HTTP with netcat
echo -e "GET / HTTP/1.1\r\nHost: localhost\r\n\r\n" | nc localhost 8080
```
What you'll learn:
- HTTP is a text protocol running over TCP
- Request/response structure
- How Content-Length determines body boundaries (TCP has no message boundaries)
- Connection management (Connection: close)
Example 2: TCP Chat Application
```python
# chat_server.py
import socket
import threading


class ChatServer:
    def __init__(self, host='0.0.0.0', port=9999):
        self.host = host
        self.port = port
        self.clients = {}  # {connection: username}
        self.lock = threading.Lock()

    def start(self):
        server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((self.host, self.port))
        server.listen(5)
        print(f"Chat server listening on {self.host}:{self.port}")

        try:
            while True:
                conn, addr = server.accept()
                print(f"New connection from {addr}")
                thread = threading.Thread(target=self.handle_client, args=(conn, addr))
                thread.daemon = True
                thread.start()
        except KeyboardInterrupt:
            print("\nShutting down...")
        server.close()

    def handle_client(self, conn, addr):
        try:
            # Request username
            conn.sendall(b"Enter your username: ")
            username = conn.recv(1024).decode('utf-8').strip()
            if not username:
                return

            # Register client
            with self.lock:
                self.clients[conn] = username
                online = len(self.clients)

            # Announce join
            join_msg = f"*** {username} has joined the chat ***\n"
            self.broadcast(join_msg, exclude=conn)
            print(f"{username} joined from {addr}")

            # Send welcome message
            conn.sendall(f"Welcome, {username}! ({online} users online)\n".encode())

            # Handle messages
            while True:
                data = conn.recv(4096)
                if not data:
                    break  # Client disconnected
                message = data.decode('utf-8').strip()
                if message:
                    formatted = f"[{username}] {message}\n"
                    print(formatted.strip())
                    self.broadcast(formatted, exclude=conn)
        except Exception as e:
            print(f"Error with {addr}: {e}")
        finally:
            # Unregister under the lock, then broadcast after releasing it:
            # broadcast() takes the same non-reentrant lock
            with self.lock:
                departed = self.clients.pop(conn, None)
            if departed is not None:
                self.broadcast(f"*** {departed} has left the chat ***\n")
                print(f"{departed} disconnected")
            conn.close()

    def broadcast(self, message, exclude=None):
        """Send message to all clients except 'exclude'"""
        with self.lock:
            for client_conn in list(self.clients.keys()):
                if client_conn != exclude:
                    try:
                        client_conn.sendall(message.encode('utf-8'))
                    except OSError:
                        # Client disconnected, remove it
                        self.clients.pop(client_conn, None)


if __name__ == '__main__':
    server = ChatServer()
    server.start()
```
Run it:
```bash
# Terminal 1: Start server
python chat_server.py

# Terminal 2: Client 1
python chat_client.py Alice
Hello everyone!

# Terminal 3: Client 2
python chat_client.py Bob
Hi Alice!

# Terminal 4: Client 3
python chat_client.py Charlie
Hey folks!
```
What you'll learn:
- Multi-client server architecture
- Threading for concurrent connections
- Broadcast messaging patterns
- Connection lifecycle management
- Buffer management (partial reads)
Example 3: Performance Testing Tool
```go
// tcp_bench.go
// Benchmark TCP throughput and latency
package main

import (
	"flag"
	"fmt"
	"io"
	"net"
	"sync"
	"time"
)

var (
	mode       = flag.String("mode", "server", "Mode: server or client")
	host       = flag.String("host", "localhost", "Host address")
	port       = flag.Int("port", 5001, "Port number")
	duration   = flag.Int("duration", 10, "Test duration in seconds")
	bufferSize = flag.Int("buffer", 32*1024, "Buffer size in bytes")
	parallel   = flag.Int("parallel", 1, "Number of parallel connections")
)

type Stats struct {
	bytesTransferred int64
	startTime        time.Time
	endTime          time.Time
	mu               sync.Mutex
}

func (s *Stats) add(bytes int) {
	s.mu.Lock()
	s.bytesTransferred += int64(bytes)
	s.mu.Unlock()
}

func (s *Stats) report() {
	duration := s.endTime.Sub(s.startTime).Seconds()
	throughputMbps := float64(s.bytesTransferred*8) / duration / 1_000_000
	fmt.Printf("\n--- Results ---\n")
	fmt.Printf("Duration: %.2f seconds\n", duration)
	fmt.Printf("Data transferred: %.2f MB\n", float64(s.bytesTransferred)/1_000_000)
	fmt.Printf("Throughput: %.2f Mbps\n", throughputMbps)
}

func runServer() {
	addr := fmt.Sprintf(":%d", *port)
	listener, err := net.Listen("tcp", addr)
	if err != nil {
		panic(err)
	}
	defer listener.Close()

	fmt.Printf("Server listening on %s\n", addr)
	fmt.Printf("Buffer size: %d bytes\n", *bufferSize)

	for {
		conn, err := listener.Accept()
		if err != nil {
			fmt.Println("Accept error:", err)
			continue
		}
		go handleServerConnection(conn)
	}
}

func handleServerConnection(conn net.Conn) {
	defer conn.Close()
	fmt.Printf("Client connected: %s\n", conn.RemoteAddr())

	stats := &Stats{startTime: time.Now()}
	buffer := make([]byte, *bufferSize)

	// Read all data from client
	for {
		n, err := conn.Read(buffer)
		if err != nil {
			if err != io.EOF {
				fmt.Println("Read error:", err)
			}
			break
		}
		stats.add(n)
	}

	stats.endTime = time.Now()
	fmt.Printf("Client %s disconnected\n", conn.RemoteAddr())
	stats.report()
}

func runClient() {
	addr := fmt.Sprintf("%s:%d", *host, *port)
	fmt.Printf("Connecting to %s\n", addr)
	fmt.Printf("Test duration: %d seconds\n", *duration)
	fmt.Printf("Parallel connections: %d\n", *parallel)
	fmt.Printf("Buffer size: %d bytes\n", *bufferSize)

	var wg sync.WaitGroup
	stats := &Stats{startTime: time.Now()}

	// Launch parallel connections
	for i := 0; i < *parallel; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			runClientConnection(id, stats)
		}(i)
	}

	wg.Wait()
	stats.endTime = time.Now()
	stats.report()
}

func runClientConnection(id int, stats *Stats) {
	addr := fmt.Sprintf("%s:%d", *host, *port)
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		fmt.Printf("Connection %d failed: %v\n", id, err)
		return
	}
	defer conn.Close()
	fmt.Printf("Connection %d established\n", id)

	// Fill the buffer with a repeating byte pattern
	buffer := make([]byte, *bufferSize)
	for i := range buffer {
		buffer[i] = byte(i % 256)
	}

	// Write as fast as possible until the deadline passes
	deadline := time.Now().Add(time.Duration(*duration) * time.Second)
	for time.Now().Before(deadline) {
		n, err := conn.Write(buffer)
		if err != nil {
			fmt.Printf("Connection %d write error: %v\n", id, err)
			break
		}
		stats.add(n)
	}
	fmt.Printf("Connection %d finished\n", id)
}

func main() {
	flag.Parse()
	if *mode == "server" {
		runServer()
	} else if *mode == "client" {
		runClient()
	} else {
		fmt.Println("Invalid mode. Use 'server' or 'client'")
	}
}
```
Run benchmarks:
```bash
# Compile
go build tcp_bench.go

# Terminal 1: Start server
./tcp_bench -mode server

# Terminal 2: Run client
./tcp_bench -mode client -duration 10 -parallel 4 -buffer 65536

# Test different scenarios
./tcp_bench -mode client -parallel 1 -buffer 1024    # Small buffers
./tcp_bench -mode client -parallel 10 -buffer 65536  # Multiple connections
```
Example output:
```
--- Results ---
Duration: 10.00 seconds
Data transferred: 4521.23 MB
Throughput: 3616.98 Mbps
```
Interview Preparation
Question 1: Explain the TCP three-way handshake
Answer: The TCP three-way handshake establishes a connection between client and server:
- SYN: Client sends SYN packet with initial sequence number (ISN)
- SYN-ACK: Server responds with its own ISN and acknowledges client's ISN
- ACK: Client acknowledges server's ISN
Both sides exchange initial sequence numbers, allocate buffers, and transition to ESTABLISHED state. Each side confirms the other received its SYN.
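To make the sequence-number bookkeeping concrete, here is a small Python model of the three segments. It is a sketch of the numbering rules only (a SYN consumes one sequence number, and values wrap at 2^32), not a packet-level implementation; the `Segment` class and function name are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_MODULUS = 2 ** 32  # sequence numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int]  # None when the ACK flag is not set


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the SYN, SYN-ACK, and ACK segments for the given initial sequence numbers."""
    syn = Segment("SYN", client_isn, None)
    # The SYN occupies one sequence number, so the server acknowledges ISN + 1.
    syn_ack = Segment("SYN-ACK", server_isn, (client_isn + 1) % SEQ_MODULUS)
    ack = Segment("ACK", (client_isn + 1) % SEQ_MODULUS, (server_isn + 1) % SEQ_MODULUS)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

With ISNs of 1000 and 5000, the server acknowledges 1001 and the client acknowledges 5001, which is how each side confirms the other received its SYN.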
Why they ask: Tests fundamental understanding of TCP connection establishment.
Red flags to avoid:
- Saying it's for "authentication" (it's not—anyone can complete the handshake)
- Not mentioning sequence numbers
- Confusing with TLS/SSL handshake
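Pro tip: Mention SYN cookies as defense against SYN flood attacks. Explain that the handshake adds 1 RTT of latency before any data flows (and TLS adds more on top), which is why HTTP/3 (QUIC) combines the transport and TLS 1.3 handshakes into a single round trip and supports 0-RTT when resuming a connection to a known server.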
Question 2: What happens when a TCP packet is lost?
Answer: TCP detects loss through two mechanisms:
- Timeout: If ACK not received within retransmission timeout (RTO), sender retransmits the segment
- Fast retransmit: If sender receives 3 duplicate ACKs, it immediately retransmits without waiting for timeout
After retransmission:
- Timeout: Sender resets congestion window to 1 MSS (slow start) and halves ssthresh
- Fast retransmit: Sender enters fast recovery, halving congestion window but not resetting to 1
Receiver buffers out-of-order segments and sends duplicate ACKs for the last in-order byte received.
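The interplay between cumulative ACKs and fast retransmit can be simulated in a few lines of Python. This is a simplified model under stated assumptions: fixed-size segments, no SACK, and one ACK per arriving segment. The function names are hypothetical.

```python
from typing import List


def receiver_acks(arrivals: List[int], segment_size: int) -> List[int]:
    """Cumulative ACK sent after each arriving segment.

    `arrivals` lists the starting byte offset of each segment in arrival order.
    """
    buffered = set()
    next_expected = 0
    acks = []
    for seq in arrivals:
        buffered.add(seq)
        # Advance past every segment that is now contiguous.
        while next_expected in buffered:
            buffered.remove(next_expected)
            next_expected += segment_size
        acks.append(next_expected)
    return acks


def fast_retransmit_points(acks: List[int], threshold: int = 3) -> List[int]:
    """Sequence numbers the sender retransmits after `threshold` duplicate ACKs."""
    retransmits = []
    last_ack, duplicates = None, 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                retransmits.append(ack)
        else:
            last_ack, duplicates = ack, 0
    return retransmits


# 100-byte segments; the segment starting at byte 100 is lost in transit.
acks = receiver_acks([0, 200, 300, 400, 500], segment_size=100)
print(acks)                          # [100, 100, 100, 100, 100]
print(fast_retransmit_points(acks))  # [100]
```

The first ACK for byte 100 is the original; the next three are duplicates, and the third duplicate triggers the retransmission of byte 100 without waiting for the RTO.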
Why they ask: Tests understanding of reliability mechanisms and congestion control.
Red flags to avoid:
- Saying TCP "prevents" packet loss (it handles it, not prevents)
- Not distinguishing timeout vs. fast retransmit
- Ignoring congestion control impact
Pro tip: Mention SACK (Selective Acknowledgment) as an optimization that lets receivers acknowledge non-contiguous blocks, improving performance when multiple packets are lost.
Question 3: Why does TCP have TIME_WAIT state?
Answer: TIME_WAIT state persists for 2*MSL (twice the Maximum Segment Lifetime; typically 60-120 seconds in total, and fixed at 60 seconds on Linux) for two reasons:
- Ensure final ACK arrives: If the remote's FIN isn't ACKed, it retransmits. The local side must be around to re-ACK.
- Prevent old duplicate packets: Packets from the old connection might still be in the network. TIME_WAIT ensures they expire before the port pair is reused, preventing corruption of a new connection.
Why they ask: Tests understanding of connection termination and edge cases.
Red flags to avoid:
- Saying TIME_WAIT is a bug or unnecessary
- Not understanding the implications for server scaling
- Confusing with CLOSE_WAIT
Pro tip: Discuss practical implications for high-traffic servers (port exhaustion) and mitigation strategies (SO_REUSEADDR, connection pooling, load balancing across multiple IPs).
Question 4: How does TCP flow control work?
Answer: TCP uses a sliding window protocol for flow control:
- Receiver advertises available buffer space in the window size field of ACK packets
- Sender limits unacknowledged data to this window size
- As receiver's application reads data, window "slides" forward and receiver advertises a larger window
- If window reaches 0, sender stops and periodically sends 1-byte probes to check if window opened
This prevents a fast sender from overwhelming a slow receiver's buffer.
Why they ask: Tests understanding of buffering and flow control mechanisms.
Red flags to avoid:
- Confusing flow control (receiver buffer management) with congestion control (network capacity management)
- Not mentioning window size field
- Ignoring buffer space implications
Pro tip: Mention that throughput is limited by window_size / RTT, and discuss TCP window scaling (RFC 1323) which extends the 16-bit window size field to support windows up to 1 GB for high-bandwidth, high-latency networks.
Question 5: What is TCP congestion control?
Answer: TCP congestion control prevents network overload using AIMD (Additive Increase, Multiplicative Decrease):
Slow Start: cwnd starts at 1 MSS and doubles each RTT until reaching ssthresh
Congestion Avoidance: cwnd increases linearly (by 1 MSS per RTT)
Loss detection:
- Timeout: Severe congestion—reset cwnd to 1, halve ssthresh, return to slow start
- 3 duplicate ACKs: Mild congestion—fast recovery halves cwnd but doesn't reset to 1
Modern algorithms (Cubic, BBR) improve on this basic scheme.
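A toy simulation makes the sawtooth shape of this scheme visible. The Python sketch below is a deliberately simplified Reno-style model: cwnd is counted in whole MSS per RTT round, and loss events are supplied by hand rather than produced by a network. The function name and event labels are assumptions for illustration.

```python
from typing import Dict, List


def simulate_cwnd(rounds: int, ssthresh: float, loss_events: Dict[int, str]) -> List[float]:
    """Return cwnd (in MSS) at the start of each RTT round.

    `loss_events` maps a round index to "timeout" or "dupack".
    """
    cwnd = 1.0
    history = []
    for rtt_round in range(rounds):
        history.append(cwnd)
        event = loss_events.get(rtt_round)
        if event == "timeout":
            ssthresh = max(cwnd / 2, 2)
            cwnd = 1.0                       # back to slow start
        elif event == "dupack":
            ssthresh = max(cwnd / 2, 2)
            cwnd = ssthresh                  # fast recovery: halve, do not reset
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per RTT
    return history


trace = simulate_cwnd(rounds=10, ssthresh=8, loss_events={5: "dupack", 8: "timeout"})
print(trace)  # [1.0, 2.0, 4.0, 8.0, 9.0, 10.0, 5.0, 6.0, 7.0, 1.0]
```

The trace shows exponential growth up to ssthresh, linear growth after it, a halving on duplicate ACKs, and a collapse to 1 MSS on timeout.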
Why they ask: Tests understanding of network capacity management and TCP's adaptive behavior.
Red flags to avoid:
- Confusing with flow control
- Not explaining why it's needed (prevent internet collapse)
- Ignoring modern improvements (BBR, Cubic)
Pro tip: Mention that congestion control is end-to-end (routers don't need to understand it), making it deployable without network upgrades. Discuss BBR (Bottleneck Bandwidth and RTT), developed by Google, which measures bandwidth and RTT directly instead of relying on packet loss signals.
Question 6: What causes connection resets (RST)?
Answer: TCP sends RST in these scenarios:
- Port closed: Client connects to closed port → server sends RST
- Invalid segment: Receiving data in an invalid state (e.g., data after connection closed)
- Resource exhaustion: Server can't allocate resources for connection
- Timeout: Middle boxes (firewalls, NAT) timeout idle connections
- Application abort: Application calls close with SO_LINGER = 0 or crashes
RST immediately aborts the connection without graceful shutdown, discarding any buffered data.
Why they ask: Tests troubleshooting ability and understanding of error conditions.
Red flags to avoid:
- Not distinguishing RST from FIN (graceful shutdown)
- Blaming RST on "network issues" without specifics
- Not mentioning application-level causes
Pro tip: Explain debugging approach: use `tcpdump` or Wireshark to capture the RST packet, examine flags and sequence numbers, and check application/firewall logs. Mention that RST packets can be spoofed for attacks (injection attacks).
Question 7: How would you debug high latency on a TCP connection?
Answer: Systematic debugging approach:
- Measure RTT: Use `ping` to measure network latency baseline
- Check TCP metrics: `ss -tin` shows retransmissions, RTT, congestion window
- Capture packets: `tcpdump` or Wireshark to see actual packet timing
- Look for:
  - Packet loss → retransmissions add latency
  - Small window size → limits throughput, causes waiting
  - Nagle's algorithm + delayed ACKs → 40-200ms added latency
  - Application-level delays (slow reads/writes)
  - Middle box issues (NAT, firewall timeouts)
- Check buffers: Bufferbloat (oversized buffers) causes latency spikes
Why they ask: Tests practical troubleshooting skills and deep technical knowledge.
Red flags to avoid:
- Immediately blaming "the network" without diagnosis
- Not using tools systematically
- Ignoring application-level issues
Pro tip: Mention enabling TCP timestamps for more accurate RTT measurement, and discuss the impact of congestion control algorithms on latency (BBR optimizes for low latency vs. traditional algorithms that react to loss).
Question 8: Explain the difference between TCP and UDP
Answer:
TCP (Transmission Control Protocol):
- Connection-oriented (handshake required)
- Reliable (guaranteed delivery, retransmission)
- Ordered (bytes delivered in sequence)
- Flow control (prevents overwhelming receiver)
- Congestion control (prevents overwhelming network)
- Higher latency (ACKs, retransmissions)
- 20+ byte header overhead
UDP (User Datagram Protocol):
- Connectionless (no handshake)
- Unreliable (best-effort delivery)
- Unordered (packets may arrive out of order)
- No flow or congestion control
- Lower latency
- 8 byte header overhead
Use TCP when: Reliability matters (web, email, file transfer, databases)
Use UDP when: Latency matters more than reliability (gaming, VoIP, streaming, DNS)
Why they ask: Tests fundamental understanding of transport layer protocols.
Red flags to avoid:
- Saying UDP is "bad" or "broken" (it's designed for different use cases)
- Not mentioning specific use cases
- Claiming TCP is "always better"
Pro tip: Discuss modern protocols like QUIC (used in HTTP/3) which implement TCP-like reliability over UDP to avoid head-of-line blocking and enable faster connection establishment.
Question 9: What is head-of-line blocking in TCP?
Answer: Head-of-line blocking occurs when a lost packet blocks delivery of all subsequent packets, even if they've been received successfully.
Example: Client requests files A, B, C over TCP. File A's first packet is lost. Even though B and C arrive successfully, TCP buffers them and doesn't deliver to the application until A's packet is retransmitted and received.
Why it happens: TCP guarantees in-order delivery. The receiver can't deliver byte N+1 until byte N arrives.
Impact: Increases latency, especially on lossy networks. One lost packet delays unrelated data.
Solution: Use multiple TCP connections (HTTP/1.1 does this), use UDP with application-level selective reliability (QUIC), or use protocols with per-stream ordering (SCTP, QUIC).
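The delay that head-of-line blocking imposes on packets that already arrived can be shown with a small Python model of an in-order receive buffer. It is a sketch with made-up timings (a retransmission arriving at 210 ms), and the function name is hypothetical.

```python
from typing import List, Tuple


def deliver_in_order(arrivals: List[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Model an in-order receive buffer.

    `arrivals` holds (arrival_ms, seq) pairs. Returns (seq, arrival_ms, delivered_ms)
    for each packet, where delivery waits for all lower sequence numbers.
    """
    pending = {}
    next_seq = 0
    delivered = []
    for time_ms, seq in sorted(arrivals):
        pending[seq] = time_ms
        # Release every packet that is now contiguous with what was delivered.
        while next_seq in pending:
            delivered.append((next_seq, pending.pop(next_seq), time_ms))
            next_seq += 1
    return delivered


# Packet 0 is lost and its retransmission arrives at 210 ms;
# packets 1-3 arrive on time at 11-13 ms.
timeline = deliver_in_order([(11, 1), (12, 2), (13, 3), (210, 0)])
for seq, arrived, handed_over in timeline:
    print(f"seq {seq}: arrived {arrived} ms, delivered {handed_over} ms")
```

Packets 1 through 3 sit in the buffer for almost 200 ms even though they arrived intact, which is the latency spike that per-stream ordering in QUIC and SCTP is designed to avoid.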
Why they ask: Tests understanding of TCP's ordering guarantees and their implications.
Red flags to avoid:
- Confusing with network congestion
- Not understanding why this is a problem for certain applications
- Not mentioning solutions
Pro tip: Explain that this is a major reason for HTTP/2 and HTTP/3 development. HTTP/2 has head-of-line blocking at the TCP layer despite multiplexing at the HTTP layer. HTTP/3 (QUIC) solves this with per-stream ordering over UDP.
Question 10: How do TCP keep-alive probes work?
Answer: TCP keep-alive detects dead connections by sending probes after idle periods:
- After `TCP_KEEPIDLE` seconds of inactivity (default: 2 hours), send a keep-alive probe (empty ACK)
- If no response, retry every `TCP_KEEPINTVL` seconds (default: 75 seconds)
- After `TCP_KEEPCNT` failed probes (default: 9), declare connection dead and close
Purpose: Detect:
- Crashed peer (no graceful FIN sent)
- Network failure (cable unplugged, router failure)
- Middle box timeout (NAT/firewall dropped state)
Limitations: Very slow by default (2 hours + 11 minutes), configurable per socket.
Why they ask: Tests understanding of connection management and failure detection.
Red flags to avoid:
- Confusing with application-level heartbeats
- Not understanding configurability
- Claiming it's always needed (it's optional)
Pro tip: Mention that keep-alive is often insufficient for production systems—most applications implement their own heartbeat/ping mechanism with shorter timeouts (30-60 seconds) for faster failure detection.
Quick Reference Sheet
Key Concepts:
- TCP provides reliable, ordered, connection-oriented byte streams over unreliable IP
- Three-way handshake (SYN, SYN-ACK, ACK) establishes connections
- Four-way handshake (FIN, ACK, FIN, ACK) terminates connections
- Sequence numbers enable ordering and loss detection
- Sliding window provides flow control (receiver buffer management)
- AIMD (Additive Increase, Multiplicative Decrease) provides congestion control
- TIME_WAIT lasts 2*MSL to ensure clean connection closure
Important Numbers:
- Default window size: 64 KB (extendable to 1 GB with scaling)
- MSS (Maximum Segment Size): Typically 1460 bytes (1500 MTU - 40 bytes headers)
- Default MSL: commonly 30-60 seconds in practice (TIME_WAIT = 2*MSL = 60-120s; Linux fixes TIME_WAIT at 60s)
- Port range: 0-65535 (well-known: 0-1023, ephemeral: 49152-65535 per IANA; Linux defaults to 32768-60999)
- TCP header: 20-60 bytes
- Initial cwnd: 1 MSS (or 10 MSS with IW10)
Decision Flowchart:

Diagram 14
Key Takeaways
🔑 TCP provides reliable byte streams over unreliable networks - It handles packet loss, reordering, duplication, and corruption transparently, so applications see a reliable stream.
🔑 The three-way handshake establishes bidirectional communication - Both sides exchange sequence numbers and allocate resources. It adds 1 RTT latency but ensures both sides are ready.
🔑 Sequence numbers are the foundation - Every byte has a sequence number, enabling TCP to detect loss, reorder packets, and prevent duplicate delivery.
🔑 Flow control and congestion control are distinct - Flow control (sliding window) prevents overwhelming the receiver's buffer. Congestion control (AIMD) prevents overwhelming the network.
🔑 TCP is a trade-off: reliability for latency - Retransmissions, ACKs, and in-order delivery add latency. For real-time applications prioritizing latency over reliability, UDP may be better.
🔑 Buffer sizes directly impact throughput - Maximum throughput = window_size / RTT. Small buffers limit performance on high-latency networks. Large buffers cause bufferbloat.
🔑 Connection management matters at scale - TIME_WAIT accumulation, port exhaustion, and file descriptor limits become bottlenecks for high-traffic servers. Use connection pooling and proper socket options.
Insights & Reflection
TCP/IP represents one of computing's most successful abstractions. By separating routing (IP) from reliability (TCP), it enabled the internet to scale from a few hundred hosts to billions of devices. The protocol's genius lies not in complexity but in simplicity—a few elegant mechanisms (sequence numbers, ACKs, sliding windows) provide robust reliability over chaotic networks.
The end-to-end principle guides TCP's design: intelligence at the edges, simplicity in the core. Routers simply forward packets; end hosts handle reliability. This makes the network deployable, upgradeable, and resilient. New congestion control algorithms (CUBIC, BBR) can be rolled out on end hosts without upgrading a single router.
TCP's evolution reflects changing network conditions. In 1981, networks were slow (56 kbps), high-latency (satellite links), and lossy. Today, we have gigabit connections with millisecond latencies. TCP has adapted along the way: window scaling for high-bandwidth networks, selective acknowledgments (SACK) for lossy links, TCP Fast Open for lower connection latency. The protocol's extensibility (the options field) enables innovation within the same framework.
Modern applications push TCP's limits. Real-time communication (gaming, VoIP) suffers from head-of-line blocking: one lost segment stalls everything queued behind it. Protocols like QUIC reimagine reliability by implementing TCP-like mechanisms over UDP, gaining flexibility TCP can't provide (per-stream ordering, 0-RTT connection establishment). Yet TCP remains foundational: a large share of internet traffic still flows over it.
Understanding TCP deeply changes how you approach system design. You stop treating the network as magic and start reasoning about failure modes, latency budgets, and resource consumption. You appreciate trade-offs: reliability vs. latency, throughput vs. fairness, simplicity vs. optimization. These lessons extend beyond networking—they're fundamental to distributed systems.
TCP isn't just a protocol; it's a philosophy of building robust systems in unreliable environments. Its techniques—acknowledgments, timeouts, exponential backoff, adaptive algorithms—appear throughout computer science. Master TCP, and you master a mental model applicable far beyond networking.