Migrating Monoliths to Microservices with Go

An opinionated migration framework for strangling a Go monolith — threat model, dependency analysis, strangler-fig routing, database-per-service, and when not to migrate at all.

Cloud · Golang · Architecture

Every monolith migration I’ve worked on had the same constraint: you can’t stop shipping features while you refactor. The big-bang rewrite is a fantasy — I’ve seen three attempted, two failed, and the one that succeeded took twice as long as projected and required rewriting the rewrite halfway through. What actually works is the incremental approach: strangle the monolith piece by piece while the business keeps running.

Before any of that, though, a question most posts skip: should you migrate at all? I’ll get to that at the end. For the rest of this post, assume you’ve decided yes and you need a framework. This is the one I use.

Threat Model: What Actually Goes Wrong

Migrations fail in specific, predictable ways. Name the threats before drawing architecture diagrams. Every control I introduce below defends against one of these.

Threat                      | How it bites
----------------------------+-----------------------------------------------------------------
Data loss during extraction | Shared tables get split; writes to the old schema are lost, or silently diverge from the new service
Long tail of coupling       | The last 20% of the monolith holds 80% of the cross-cutting hooks: reflection, shared globals, implicit transactions
Failed rollback             | You’ve cut over, the new service breaks, and the monolith can’t resume because its schema has moved on
Performance regression      | One JOIN becomes three network hops, p99 triples, and nobody measured before
Dual-write divergence       | Monolith and new service both write to “their” copies; a failure between the two writes leaves them inconsistent, forever
Contract drift              | Consumer calls provider with a shape the provider no longer returns; no compile-time signal, breaks in prod
Test gap                    | Unit tests still pass, but integration tests were coupled to monolith internals; nothing exercises the network boundary

Come back to this table when you’re tempted to skip a step. If you can’t explain which threat a given control addresses, drop the control.

Assess Before You Architect

You cannot draw service boundaries from a whiteboard. You need two things: a dependency graph of the code, and runtime traces of how requests actually flow. Both are grounded in reality; architecture diagrams are not.

Static Dependency Analysis

Walk the source tree, parse imports, build a graph. Low-coupling packages are your extraction candidates; highly coupled ones are the last to go.

Two things the common implementation gets wrong, both fixed here: parse errors must be surfaced loudly (silent skips under-count coupling, and you will extract the wrong package first), and the graph must count unique dependencies, not duplicate entries from repeated imports across files.

One trust note before the code: this tool is code-execution-adjacent when run in CI — it walks a source tree and feeds arbitrary .go files to go/parser. Treat its input as untrusted unless the repo itself is. The guards below (symlink resolution, root-prefix check, file-size cap) exist because a malicious or pathological input file can otherwise stall the walk or exhaust memory during parse.

// tools/depgraph/main.go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"io"
	"log"
	"os"
	"path/filepath"
	"sort"
	"strings"
)

// DependencyGraph maps a package path to the set of internal packages it imports.
type DependencyGraph map[string]map[string]struct{}

// maxGoFileBytes caps individual file size before we hand bytes to go/parser.
// Real source files are well under this; a pathological input crafted to stall
// the parser is not. 1 MB is generous.
const maxGoFileBytes = 1 << 20

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: depgraph <root> <module-prefix>")
		os.Exit(2)
	}
	rawRoot, modulePrefix := os.Args[1], os.Args[2]

	// Resolve symlinks once, up front, and use the resolved path as the
	// containment boundary. Without this, a symlink inside the tree can
	// redirect the walk into /etc or a sibling repo.
	root, err := filepath.EvalSymlinks(rawRoot)
	if err != nil {
		log.Fatalf("resolve root %q: %v", rawRoot, err)
	}
	rootAbs, err := filepath.Abs(root)
	if err != nil {
		log.Fatalf("abs root: %v", err)
	}

	graph, parseErrs := analyze(rootAbs, modulePrefix)

	// Parse errors must be loud. A silent skip is how you extract the wrong
	// package first and discover three weeks later that it was coupled to
	// half the codebase.
	if len(parseErrs) > 0 {
		log.Printf("WARNING: %d files failed to parse — results are incomplete", len(parseErrs))
		for _, e := range parseErrs {
			log.Printf("  parse error: %v", e)
		}
	}

	printGraph(graph)
	printExtractionCandidates(graph)
}

func analyze(root, modulePrefix string) (DependencyGraph, []error) {
	graph := DependencyGraph{}
	fset := token.NewFileSet()
	var parseErrs []error

	err := filepath.WalkDir(root, func(path string, d os.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || !strings.HasSuffix(path, ".go") || strings.HasSuffix(path, "_test.go") {
			return nil
		}
		// Containment check: resolve the current entry and confirm it still
		// sits under root. Skip anything that escapes (symlink out, Rel
		// starting with "..", etc.) — a malicious tree can otherwise pull
		// this tool outside the repo it was pointed at.
		resolved, err := filepath.EvalSymlinks(path)
		if err != nil {
			parseErrs = append(parseErrs, fmt.Errorf("%s: resolve: %w", path, err))
			return nil
		}
		resolvedAbs, err := filepath.Abs(resolved)
		if err != nil {
			parseErrs = append(parseErrs, fmt.Errorf("%s: abs: %w", path, err))
			return nil
		}
		rel, err := filepath.Rel(root, resolvedAbs)
		if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			return nil
		}
		// File-size cap before ParseFile. go/parser will happily eat a
		// gigabyte file and OOM the process on pathological input.
		// Open once and read through a LimitReader: a stat-then-parse
		// pair is TOCTOU-racy (swap the file between the two calls and
		// you bypass the cap), and ParseFile with a path argument would
		// re-open by name, opening the same race. One fd, one read.
		fh, err := os.Open(resolvedAbs)
		if err != nil {
			parseErrs = append(parseErrs, fmt.Errorf("%s: open: %w", path, err))
			return nil
		}
		// +1 so we can tell "exactly at cap" from "over cap".
		src, err := io.ReadAll(io.LimitReader(fh, maxGoFileBytes+1))
		_ = fh.Close()
		if err != nil {
			parseErrs = append(parseErrs, fmt.Errorf("%s: read: %w", path, err))
			return nil
		}
		if len(src) > maxGoFileBytes {
			parseErrs = append(parseErrs, fmt.Errorf("%s: skipped, exceeds %d-byte cap", path, maxGoFileBytes))
			return nil
		}
		f, err := parser.ParseFile(fset, resolvedAbs, src, parser.ImportsOnly)
		if err != nil {
			parseErrs = append(parseErrs, fmt.Errorf("%s: %w", path, err))
			return nil
		}
		pkgDir, err := filepath.Rel(root, filepath.Dir(resolvedAbs))
		if err != nil {
			parseErrs = append(parseErrs, fmt.Errorf("%s: rel: %w", path, err))
			return nil
		}
		pkgName := filepath.ToSlash(pkgDir)
		if _, ok := graph[pkgName]; !ok {
			graph[pkgName] = map[string]struct{}{}
		}
		for _, imp := range f.Imports {
			ip := strings.Trim(imp.Path.Value, `"`)
			// Match on prefix+"/" so module "example.com/app" doesn't
			// claim imports from a sibling like "example.com/app-other".
			if !strings.HasPrefix(ip, modulePrefix+"/") {
				continue
			}
			dep := strings.TrimPrefix(ip, modulePrefix+"/")
			if dep != pkgName {
				graph[pkgName][dep] = struct{}{}
			}
		}
		return nil
	})
	if err != nil {
		parseErrs = append(parseErrs, err)
	}
	return graph, parseErrs
}

func printExtractionCandidates(g DependencyGraph) {
	type entry struct {
		pkg  string
		deps int
	}
	var xs []entry
	for pkg, deps := range g {
		xs = append(xs, entry{pkg, len(deps)})
	}
	sort.Slice(xs, func(i, j int) bool { return xs[i].deps < xs[j].deps })
	fmt.Println("\nExtraction candidates (fewest internal deps first):")
	for _, e := range xs {
		fmt.Printf("  %3d  %s\n", e.deps, e.pkg)
	}
}

func printGraph(g DependencyGraph) {
	pkgs := make([]string, 0, len(g))
	for p := range g {
		pkgs = append(pkgs, p)
	}
	sort.Strings(pkgs)
	for _, p := range pkgs {
		fmt.Printf("%s (%d deps)\n", p, len(g[p]))
	}
}

Static analysis is necessary but insufficient. It cannot see runtime coupling — reflection, init() side effects, shared globals, implicit database transactions that span packages. A package with two imports can still be tangled with the rest of the system through a shared *sql.Tx passed through context, or a package-level singleton that every handler touches. Treat the output as a ranking hint, not a decree.

Runtime Traces of What Actually Talks to What

Static analysis tells you what could call what. Runtime tracing tells you what does. Run it in production for a week and you’ll find cross-package calls nobody documented.

The implementation detail that matters: stats aggregation across concurrent trace arrivals is a classic data race trap. Every Go monolith I’ve migrated had one of these in some analytics-adjacent code. The fix is to aggregate inside the same lock that protects the underlying map, and to compute derived stats on read — not update running averages on write, which is fragile under concurrency anyway.

Before the code, three rules this collector follows because a trace ingest endpoint is an attractive target for resource exhaustion:

  • Bind to localhost by default. A /report endpoint that exposes your architecture’s hot paths and component fan-out is operational intelligence. In production, gate it behind your mesh’s mTLS/authz (SPIFFE identity, service-account policy) and require a shared secret on /trace at minimum. The code below listens on 127.0.0.1 and documents the production expectation.
  • Cap the body, cap the fields, cap the cardinality. Path, Method, and every component string become map keys. Without caps, an unnormalized URL path (or a hostile client) turns the collector into a cardinality bomb: unique keys accumulate, the map grows, the process OOMs. We cap the body with http.MaxBytesReader, truncate each string field at ingest, and drop traces whose path would push the set of known paths past a fixed cap.
  • Use a real ring buffer, not s[1:]. Re-slicing a slice to drop the head keeps the backing array alive and pins old entries forever — same footgun that burns people in log buffers. A fixed array with head/tail indices has bounded memory and no hidden retention.

// tools/tracecollector/main.go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"log"
	"net"
	"net/http"
	"os/signal"
	"sync"
	"syscall"
	"time"
)

const (
	maxBodyBytes     = 64 << 10 // 64 KiB request body cap
	maxStringField   = 128      // per-field string truncation
	maxComponents    = 32       // per-trace component count cap
	maxDistinctPaths = 1024     // fixed cap on distinct Path keys
)

type Trace struct {
	TraceID    string    `json:"trace_id"`
	Path       string    `json:"path"`
	Method     string    `json:"method"`
	DurationMs float64   `json:"duration_ms"`
	StatusCode int       `json:"status_code"`
	Components []string  `json:"components"`
	ReceivedAt time.Time `json:"-"`
}

// Collector is a fixed-size ring buffer. head is the write index; count tracks
// how many slots are filled. No re-slicing, no hidden retention in a growing
// backing array.
type Collector struct {
	mu       sync.Mutex
	buf      []Trace
	head     int
	count    int
	capacity int

	// knownPaths is a bounded set: distinct Path values we'll accept. Once
	// full, new paths are dropped rather than letting the stats map grow
	// unbounded under attacker-chosen keys.
	knownPaths map[string]struct{}
}

func NewCollector(capacity int) *Collector {
	return &Collector{
		buf:        make([]Trace, capacity),
		capacity:   capacity,
		knownPaths: make(map[string]struct{}, maxDistinctPaths),
	}
}

// truncate caps s at n bytes. The bound here is memory, not display, so
// byte length is what matters; a multi-byte rune cut at the boundary is
// acceptable for a map key.
func truncate(s string, n int) string {
	if len(s) > n {
		return s[:n]
	}
	return s
}

// admitPath returns true if the path is already known or there's room for a
// new one. When room runs out, we drop — this is the cardinality cap. A
// production variant would evict least-recently-used; for a migration tool
// the fixed-cap drop is simpler and the operator just raises the bound if
// legitimate traffic has more distinct paths. The point is: attacker-chosen
// keys cannot grow the stats map past maxDistinctPaths.
func (c *Collector) admitPath(p string) bool {
	if _, ok := c.knownPaths[p]; ok {
		return true
	}
	if len(c.knownPaths) >= maxDistinctPaths {
		return false
	}
	c.knownPaths[p] = struct{}{}
	return true
}

func (c *Collector) Record(t Trace) {
	// Truncate every attacker-authored string before it becomes a map key
	// or sits in the ring. Fields arrive pre-bounded by the body cap, but
	// a 64 KiB body can still carry a single 60 KiB Path — cap the field.
	t.Path = truncate(t.Path, maxStringField)
	t.Method = truncate(t.Method, maxStringField)
	if len(t.Components) > maxComponents {
		t.Components = t.Components[:maxComponents]
	}
	for i := range t.Components {
		t.Components[i] = truncate(t.Components[i], maxStringField)
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.admitPath(t.Path) {
		return // cardinality cap reached; drop
	}
	c.buf[c.head] = t
	c.head = (c.head + 1) % c.capacity
	if c.count < c.capacity {
		c.count++
	}
}

// Report copies out a snapshot under the lock, then aggregates without it.
// Computing stats on read avoids the running-average race that burns people.
func (c *Collector) Report() map[string]PathStats {
	c.mu.Lock()
	snapshot := make([]Trace, c.count)
	// Walk the ring from oldest to newest.
	start := c.head - c.count
	if start < 0 {
		start += c.capacity
	}
	for i := 0; i < c.count; i++ {
		snapshot[i] = c.buf[(start+i)%c.capacity]
	}
	c.mu.Unlock()

	out := map[string]PathStats{}
	for _, t := range snapshot {
		s := out[t.Path]
		s.Count++
		s.TotalMs += t.DurationMs
		if t.DurationMs > s.MaxMs {
			s.MaxMs = t.DurationMs
		}
		if s.MinMs == 0 || t.DurationMs < s.MinMs {
			s.MinMs = t.DurationMs
		}
		if s.Components == nil {
			s.Components = map[string]int{}
		}
		for _, comp := range t.Components {
			s.Components[comp]++
		}
		out[t.Path] = s
	}
	for p, s := range out {
		if s.Count > 0 {
			s.AvgMs = s.TotalMs / float64(s.Count)
			out[p] = s
		}
	}
	return out
}

type PathStats struct {
	Count      int
	AvgMs      float64
	MinMs      float64
	MaxMs      float64
	TotalMs    float64
	Components map[string]int
}

// isLoopbackHost returns true only if the Host header names a loopback
// literal. DNS rebinding attacks rely on a name that currently resolves
// to 127.0.0.1 but is controlled by the attacker — the connection lands
// on localhost, but the browser sends the attacker's Host. Comparing the
// Host value against loopback literals closes that window.
func isLoopbackHost(host string) bool {
	h, _, err := net.SplitHostPort(host)
	if err != nil {
		h = host
	}
	if h == "localhost" {
		return true
	}
	ip := net.ParseIP(h)
	return ip != nil && ip.IsLoopback()
}

// recoverHandler wraps a handler so a panic in decode/aggregation doesn't
// take the process down. Tracing endpoints fed untrusted bodies must
// isolate failures per-request.
func recoverHandler(next http.HandlerFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		defer func() {
			if rec := recover(); rec != nil {
				log.Printf("panic in %s: %v", r.URL.Path, rec)
				http.Error(w, "internal error", http.StatusInternalServerError)
			}
		}()
		next(w, r)
	}
}

func main() {
	col := NewCollector(100_000)
	mux := http.NewServeMux()
	mux.HandleFunc("/trace", recoverHandler(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		// Cap the body BEFORE json.Decode — otherwise the decoder will
		// buffer an arbitrary-size payload on attacker-friendly terms.
		r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)
		var t Trace
		dec := json.NewDecoder(r.Body)
		dec.DisallowUnknownFields()
		if err := dec.Decode(&t); err != nil {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		t.ReceivedAt = time.Now()
		col.Record(t)
		w.WriteHeader(http.StatusCreated)
	}))
	mux.HandleFunc("/report", recoverHandler(func(w http.ResponseWriter, r *http.Request) {
		// Method check: without it, any verb is accepted, which makes
		// /report a nicer DNS-rebinding target (a browser pinned to a
		// rebind domain will happily issue GET/POST/etc. to 127.0.0.1).
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		// Host allow-list: a rebound DNS name resolves to 127.0.0.1 but
		// arrives with the attacker-controlled Host header. Refuse
		// anything that is not a loopback literal.
		if !isLoopbackHost(r.Host) {
			http.Error(w, "forbidden host", http.StatusForbidden)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(col.Report())
	}))

	// Bind to localhost. /report leaks architectural hot paths; /trace is
	// a cardinality-bomb target. In production, front this with your mesh
	// (SPIFFE mTLS + authz policy) or at minimum a shared-secret header
	// check — never expose on 0.0.0.0 without one of those.
	srv := &http.Server{Addr: "127.0.0.1:8080", Handler: mux, ReadHeaderTimeout: 5 * time.Second}
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()
	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("listen: %v", err)
		}
	}()
	<-ctx.Done()
	shutCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	_ = srv.Shutdown(shutCtx)
}

Feed this from your existing HTTP middleware. The payoff: a ranked list of endpoints by traffic, their true component fan-out, and their latency profile. That is your extraction priority list.

Database Coupling

The code graph tells one story; the database tells the real one. Tables joined on hot paths belong in the same service. Tables with no foreign keys to the rest of the schema are free candidates. Information schema plus slow-query log is enough to get started:

// tools/dbcoupling/main.go — key queries only
// Tables and their FK edges (MySQL example; adapt for Postgres):
//
//   SELECT TABLE_NAME, REFERENCED_TABLE_NAME
//   FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE
//   WHERE TABLE_SCHEMA = DATABASE() AND REFERENCED_TABLE_NAME IS NOT NULL;
//
// Hot joined-table pairs from the performance schema:
//
//   SELECT DIGEST_TEXT, COUNT_STAR, AVG_TIMER_WAIT/1e9 AS avg_ms
//   FROM performance_schema.events_statements_summary_by_digest
//   WHERE DIGEST_TEXT LIKE '%JOIN%'
//   ORDER BY COUNT_STAR DESC LIMIT 100;
//
// Build an edge-weighted graph: nodes are tables, edge weight is query
// frequency. Run a community-detection algorithm (or eyeball it for small
// schemas) to find clusters. Each cluster is a candidate bounded context.

If two tables are joined in 90% of hot queries, splitting them across services means you just turned every page load into a network call. Either keep them together, or change the access pattern first.

The Strangler Fig, and Its Tradeoffs

The strangler fig is the only migration pattern I recommend, not because it’s elegant but because it’s reversible at every step. You put a routing layer in front of the monolith, peel off one endpoint at a time, and leave the monolith running until its traffic drops to zero.

Phase 1          Phase 2                    Phase 3
┌────────┐      ┌────────┐                  ┌────────┐
│Gateway │      │Gateway │                  │Gateway │
└───┬────┘      └─┬────┬─┘                  └─┬──┬──┬┘
    │             │    │                      │  │  │
    ▼             ▼    ▼                      ▼  ▼  ▼
┌────────┐   ┌─────┐ ┌────────┐           ┌──┐┌──┐┌──┐
│Monolith│   │Svc A│ │Monolith│           │A ││B ││C │
└────────┘   └─────┘ └────────┘           └──┘└──┘└──┘

The tradeoff to name: you run hybrid infrastructure for the duration. Every cross-cutting concern — auth, observability, rate limiting, feature flags — has to work in both the monolith and every new service. That is expensive. It’s still cheaper than a big-bang rewrite, but it is not free, and the cost scales with how long the migration takes.

When the pattern starts to hurt: if the monolith and the new services need to participate in the same database transaction, you’ve either drawn the wrong boundary or you’re not ready to extract yet. Back off.

The Gateway: Boring on Purpose

The routing layer is a reverse proxy with a priority-ordered route table. I want it boring because it’s on the request path for every user request and it cannot be the thing that breaks:

// gateway/main.go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"log"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strconv"
	"strings"
	"time"
)

type Route struct {
	Prefix string
	// Target is the backend base URL. http:// here for readability; in
	// production, expect https:// or service-mesh mTLS (SPIFFE, Istio,
	// Linkerd) between the gateway and every backend. Plaintext east-west
	// is not a default.
	Target      string
	StripPrefix bool
}

// Routes are evaluated in order. Most-specific first; monolith is the catch-all.
// Changing a route is a gateway redeploy — do not mutate this slice at runtime
// without wrapping it in an atomic.Pointer or RWMutex, or you'll race the
// request handler on every migration flip.
var routes = []Route{
	{Prefix: "/api/users", Target: "http://user-service:8080", StripPrefix: true},
	{Prefix: "/api/products", Target: "http://product-service:8080", StripPrefix: true},
	{Prefix: "/", Target: "http://monolith:8080"},
}

// Hop-by-hop and client-supplied headers we refuse to forward upstream.
// Clients love to spoof X-Forwarded-For for rate-limit bypass, or invent
// X-Admin / X-Tenant-Override to see what sticks. The gateway owns the
// forwarded identity; the gateway emits it. Inbound copies die here.
var stripInboundHeaders = []string{
	"X-Forwarded-For",
	"X-Real-Ip",
	"X-Forwarded-Host",
	"X-Forwarded-Proto",
	"X-Forwarded-Port",
	"Forwarded",
}

// stripInboundPrefixes: any header namespace reserved for internal use.
// Document and enforce. If you need a client-supplied signal, give it a
// public name and validate it; do not share a namespace with internals.
var stripInboundPrefixes = []string{"X-Internal-", "X-Gateway-"}

func main() {
	// One shared transport. http.DefaultTransport has no response/header
	// timeouts: a hung backend stalls the proxy goroutine forever, and the
	// gateway accumulates goroutines until it OOMs. One misbehaving backend
	// should degrade one route, not take down the gateway.
	transport := &http.Transport{
		ResponseHeaderTimeout: 10 * time.Second,
		IdleConnTimeout:       90 * time.Second,
		TLSHandshakeTimeout:   5 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
		MaxIdleConns:          100,
		MaxIdleConnsPerHost:   10,
	}

	proxies := make(map[string]*httputil.ReverseProxy, len(routes))
	for _, r := range routes {
		u, err := url.Parse(r.Target)
		if err != nil {
			log.Fatalf("bad target %q: %v", r.Target, err)
		}
		p := httputil.NewSingleHostReverseProxy(u)
		p.Transport = transport
		proxies[r.Target] = p
	}

	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()

		// Per-request upstream budget. Even with transport timeouts set,
		// a slow-dripping backend can hold a request open indefinitely.
		// Cancel the context after the budget and the proxy aborts cleanly.
		ctx, cancel := context.WithTimeout(r.Context(), 25*time.Second)
		defer cancel()
		r = r.WithContext(ctx)

		sanitizeInboundHeaders(r)
		setForwardedIdentity(r)

		for _, rt := range routes {
			if !matchPrefix(r.URL.Path, rt.Prefix) {
				continue
			}
			originalPath := r.URL.Path
			if rt.StripPrefix && rt.Prefix != "/" {
				r.URL.Path = strings.TrimPrefix(r.URL.Path, rt.Prefix)
				if r.URL.Path == "" {
					r.URL.Path = "/"
				}
			}
			r.Header.Set("X-Trace-ID", newTraceID())
			r.Header.Set("X-Original-Path", originalPath)
			proxies[rt.Target].ServeHTTP(w, r)
			log.Printf("route=%s target=%s upstream_path=%s dur=%s",
				strconv.Quote(originalPath), rt.Target, strconv.Quote(r.URL.Path), time.Since(start))
			return
		}
		http.NotFound(w, r)
	})

	// Timeouts on every leg, not just the header. ReadHeaderTimeout alone
	// leaves you exposed to slow-read bodies and slow-write responses; both
	// are trivial DoS vectors against a gateway that sits on every request.
	srv := &http.Server{
		Addr:              ":8080",
		Handler:           h,
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       30 * time.Second,
		WriteTimeout:      30 * time.Second,
		IdleTimeout:       120 * time.Second,
	}
	log.Fatal(srv.ListenAndServe())
}

// sanitizeInboundHeaders removes any client-supplied routing/identity
// headers we own. The ReverseProxy would otherwise *append* the peer to
// whatever XFF the client sent, letting a caller prepend arbitrary IPs
// that downstream rate limiters then trust.
func sanitizeInboundHeaders(r *http.Request) {
	for _, h := range stripInboundHeaders {
		r.Header.Del(h)
	}
	for name := range r.Header {
		canon := http.CanonicalHeaderKey(name)
		for _, p := range stripInboundPrefixes {
			if strings.HasPrefix(canon, p) {
				r.Header.Del(name)
				break
			}
		}
	}
}

// setForwardedIdentity emits a single authoritative X-Forwarded-For
// derived from the direct peer. Downstream services rate-limit on this
// value; it must not be client-influenced.
func setForwardedIdentity(r *http.Request) {
	host, _, err := net.SplitHostPort(r.RemoteAddr)
	if err != nil {
		host = r.RemoteAddr
	}
	r.Header.Set("X-Forwarded-For", host)
	r.Header.Set("X-Real-Ip", host)
	if r.TLS != nil {
		r.Header.Set("X-Forwarded-Proto", "https")
	} else {
		r.Header.Set("X-Forwarded-Proto", "http")
	}
	if r.Host != "" {
		r.Header.Set("X-Forwarded-Host", r.Host)
	}
}

// matchPrefix only matches at a path-segment boundary. A naive HasPrefix
// check would route /api/users-admin/foo to user-service, because the
// string "/api/users-admin" starts with "/api/users". That is a strangler
// auth gap waiting to happen: one service's auth rules silently apply to
// another service's endpoints. Match on "==" or prefix+"/" only.
func matchPrefix(path, prefix string) bool {
	if prefix == "/" {
		return true
	}
	if path == prefix {
		return true
	}
	return strings.HasPrefix(path, prefix+"/")
}

// newTraceID returns a 128-bit random ID. Do not use timestamps as "unique"
// IDs — nanosecond collisions happen in practice under load, and clock skew
// across nodes makes them worse.
func newTraceID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		// crypto/rand failure is fatal; fail loud, don't fall back to time.
		log.Fatalf("rand: %v", err)
	}
	return hex.EncodeToString(b[:])
}

The matchPrefix helper looks like busywork until you ship a route collision in production. During a strangler migration the gateway is the trust boundary: the monolith’s auth middleware no longer sees requests routed to the extracted service, and vice versa. A HasPrefix check that lets /api/users-admin leak into the /api/users route hands admin endpoints to a service that was never written to authorize them. Match on segment boundaries.

A nit that is not actually a nit: trace IDs from time.Now().UnixNano() collide. On a multi-core machine under load, consecutive calls can return the same nanosecond; across nodes, clock skew compounds it. Use crypto/rand or UUIDv4 — 128 bits of entropy, no hidden structure. UUIDv7 is tempting because it sorts by time, but that timestamp is reconstructible from the ID: if your trace IDs leak to a client or a log a third party can read, so does the exact moment the request was born. For internal-only traces that’s often fine; for anything cross-trust, drop the time-sortable variant and use v4. Collisions in trace IDs produce bug reports that are impossible to reproduce.

Where to put the gateway. Most teams reach for a cloud LB (ALB, GCLB) or a k8s Ingress and stop there. That works for path-based routing. You need a dedicated gateway (this code, or Kong/Envoy/Traefik) when you need request mutation, per-route auth, traffic mirroring, canary percentages, or outbound header injection. When to skip the gateway entirely: monoliths behind an existing Ingress with decent path-routing can often migrate without adding a new L7 hop. Fewer hops, fewer things to debug.

Drawing Boundaries: Listen to the Data, Not DDD Buzzwords

Domain-Driven Design gives you the vocabulary (bounded contexts, aggregates, anti-corruption layer), but the actual boundaries should come from the coupling data you collected. The DDD-flavored presentation is:

// domain/user.go
package domain

import "time"

type User struct {
	ID        string
	Email     string
	Name      string
	CreatedAt time.Time
	UpdatedAt time.Time
}

type UserRepository interface {
	FindByID(id string) (*User, error)
	FindByEmail(email string) (*User, error)
	Save(u *User) error
	Delete(id string) error
}

Each extracted service owns a domain type, its repository interface, and its persistence. Other services do not import this package — they call it over HTTP/gRPC. That rule is what keeps the extraction real.

Database per Service: Where Migrations Die

Extracting code is the easy part. Splitting the database is the hard part. The core problem: shared tables mean the monolith and the new service both believe they own the truth, and any inconsistency in that belief becomes silent data corruption.

There are four phases, and skipping any of them has cost people their jobs.

Phase 1: Shared Tables, Shared API

The monolith still owns the table. The new service calls the monolith for reads and writes. This is the safest starting point because it moves zero data. The tradeoff: a network hop on a previously-in-process call. Keep the new service’s reads on this path behind a cache with a short TTL.

Phase 2: CDC-Driven Read Replica

The new service gets its own copy of the data, kept in sync by change-data-capture from the monolith’s database. Debezium → Kafka → consumer in the new service is the canonical stack. Postgres logical replication works for simpler cases. The new service reads from its local copy; writes still go to the monolith.

// cdc/consumer.go
package cdc

import (
	"context"
	"encoding/json"
	"fmt"
)

type Event struct {
	Op    string          `json:"op"`    // c, u, d, r (simplified Debezium-style envelope)
	Table string          `json:"table"`
	After json.RawMessage `json:"after"`
	Key   json.RawMessage `json:"key"`
}

type UserWriter interface {
	Upsert(ctx context.Context, raw json.RawMessage) error
	Delete(ctx context.Context, key json.RawMessage) error
}

// Apply is called for each message pulled off Kafka. The caller owns
// offset commits — commit ONLY after Apply returns nil, otherwise replay
// on restart will silently drop events and diverge the replica.
func Apply(ctx context.Context, w UserWriter, ev Event) error {
	if ev.Table != "users" {
		return nil
	}
	switch ev.Op {
	case "c", "u", "r": // create, update, snapshot
		return w.Upsert(ctx, ev.After)
	case "d":
		return w.Delete(ctx, ev.Key)
	default:
		// Return an error so the caller does NOT commit the offset. An
		// unrecognised op is usually a Debezium version bump or a new
		// envelope shape — silently skipping it and committing the offset
		// means the replica diverges and nobody notices until a support
		// ticket lands weeks later.
		return fmt.Errorf("cdc: unknown op %q for table %s", ev.Op, ev.Table)
	}
}

The reason I’m explicit about offset commits: the most common CDC bug is committing offsets before the downstream write succeeded. That turns a transient DB error into permanent replica drift.

The caller side of Apply is almost as important as Apply itself — the contract only holds if the consumer loop respects it. This is the shape I want every CDC consumer in the codebase to look like:

// Caller pattern — Apply succeeds, THEN the offset is committed. Any error
// (including the unknown-op case) short-circuits the commit and the message
// is redelivered on the next poll. Stream stands in for whatever Kafka
// client you use: something that yields messages and commits offsets.
type Stream interface {
	Messages() <-chan Message
	CommitOffset(Message) error
}

type Message struct{ Value []byte }

func consume(ctx context.Context, stream Stream, writer cdc.UserWriter) error {
	for msg := range stream.Messages() {
		var ev cdc.Event
		if err := json.Unmarshal(msg.Value, &ev); err != nil {
			return err // don't commit, don't swallow
		}
		if err := cdc.Apply(ctx, writer, ev); err != nil {
			return err // don't commit
		}
		if err := stream.CommitOffset(msg); err != nil {
			return err
		}
	}
	return nil
}

Phase 3: Dual-Write — Don’t Do This

The tempting pattern: both the monolith and the new service write to both databases, in parallel, on every change. It looks symmetric and safe. It isn’t.

The failure mode is a partial write: service A succeeds, service B fails, you have no transaction spanning both, and the state is permanently inconsistent. You can paper over it with retries, but retries without idempotency keys create duplicates, and idempotency with cross-database state is just a distributed transaction with extra steps.

The alternative that works: the transactional outbox pattern. The monolith writes to its own DB and an outbox table in the same transaction. A separate relay process reads the outbox and publishes to Kafka. The new service consumes from Kafka. Each hop is idempotent on its own. No distributed transaction, no dual-write.

// outbox/relay.go
package outbox

import (
	"context"
	"database/sql"
	"log"
	"time"
)

type Publisher interface {
	Publish(ctx context.Context, topic string, key, payload []byte) error
}

// Relay is the single writer to Kafka. Keep it single-instance per shard
// to preserve ordering; use row-level locking if you need HA with a warm
// standby.
func Relay(ctx context.Context, db *sql.DB, pub Publisher) error {
	t := time.NewTicker(500 * time.Millisecond)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-t.C:
			if err := drain(ctx, db, pub); err != nil {
				log.Printf("outbox: drain error: %v", err)
			}
		}
	}
}

func drain(ctx context.Context, db *sql.DB, pub Publisher) error {
	// FOR UPDATE SKIP LOCKED only holds locks for the life of the
	// transaction it runs in. Calling QueryContext directly on *sql.DB
	// uses an implicit per-statement transaction that commits as soon as
	// the rows are drained — the locks vanish, two workers claim the same
	// rows, and downstream consumers see duplicate events. Wrap the claim
	// and the UPDATE in one explicit tx so the locks survive until we
	// mark the rows published.
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	rows, err := tx.QueryContext(ctx, `
		SELECT id, topic, k, payload
		FROM outbox
		WHERE published_at IS NULL
		ORDER BY id
		LIMIT 100
		FOR UPDATE SKIP LOCKED`)
	if err != nil {
		return err
	}

	type row struct {
		id      int64
		topic   string
		key     []byte
		payload []byte
	}
	var batch []row
	for rows.Next() {
		var r row
		if err := rows.Scan(&r.id, &r.topic, &r.key, &r.payload); err != nil {
			rows.Close()
			return err
		}
		batch = append(batch, r)
	}
	if err := rows.Err(); err != nil {
		rows.Close()
		return err
	}
	rows.Close()

	for _, r := range batch {
		if err := pub.Publish(ctx, r.topic, r.key, r.payload); err != nil {
			return err
		}
		if _, err := tx.ExecContext(ctx,
			`UPDATE outbox SET published_at = NOW() WHERE id = $1`, r.id); err != nil {
			return err
		}
	}
	// Publish happens before the mark commits, so a crash between the
	// two republishes the batch on the next run. That's at-least-once
	// delivery: downstream consumers must be idempotent (they should
	// be anyway).
	return tx.Commit()
}

This pattern is worth reading twice. The write to the monolith’s data and the write to the outbox share one transaction. Nothing is published until the relay runs. If the relay crashes mid-batch, the unpublished rows remain, and the next run picks them up. FOR UPDATE SKIP LOCKED keeps multiple relay workers from double-claiming rows, so you can scale them horizontally if throughput demands it — but each worker publishes independently, so global ordering goes with it. That is why the comment on Relay keeps it single-instance per shard when ordering matters.

Phase 4: Cutover and Retire

Once the new service has been serving reads from its own DB for long enough to prove correctness (I usually want at least two weeks of shadow traffic with no divergence), flip writes over. The cutover is small and boring because all the hard work was in the earlier phases.

The rollback plan: keep the monolith’s write path intact for 30 days post-cutover, with reverse CDC flowing changes from the new service’s database back into the monolith’s. If the new service fails, you flip the gateway back and the monolith resumes serving from a database that never fell behind. Without a rollback plan, cutover is a one-way door — don’t walk through it.

Service Template: Consistent, Boring, Shippable

Every new service I extract uses the same template. Consistency beats cleverness: onboarding is fast, debugging uses the same playbook, and the platform team can automate around it.

// cmd/user-service/main.go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	app, err := newApp()
	if err != nil {
		log.Fatalf("init: %v", err)
	}
	defer app.Close()

	srv := &http.Server{
		Addr:              ":8080",
		Handler:           app.Router(),
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       30 * time.Second,
		WriteTimeout:      30 * time.Second,
		IdleTimeout:       120 * time.Second,
	}

	go func() {
		log.Printf("listening on %s", srv.Addr)
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatalf("serve: %v", err)
		}
	}()

	<-ctx.Done()
	log.Println("shutting down")
	shutCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutCtx); err != nil {
		log.Printf("shutdown: %v", err)
	}
}

Two things are non-negotiable: ReadHeaderTimeout (a missing one is a trivial slowloris vector), and a graceful-shutdown timeout longer than your longest in-flight request. In Kubernetes you also want a preStop hook with a small sleep so kube-proxy has time to drop the pod from its endpoints before the process exits.

Testing at Boundaries

Microservices need four test layers. Skip one and you’re gambling.

  • Unit tests with mocks for immediate feedback. These prove logic, nothing more.
  • Integration tests against a real DB (testcontainers or a local docker-compose). These catch migrations, SQL bugs, and repository contract violations.
  • Contract tests (Pact or similar) between every consumer/provider pair. The consumer declares its expectations; the provider’s CI verifies them. This is the highest-ROI test type in a microservices architecture — it replaces the compile-time signal you lost when you split the monolith. Without it, every provider deploy is a gamble against its consumers.
  • End-to-end tests covering a handful of critical business workflows. Small set, run against staging, allowed to be slow. Not the place for edge cases.

If contract tests feel like ceremony, consider: without them, there is nothing in your pipeline that says “this provider change will break order-service.” You find out in prod.

When NOT to Migrate

This is the section most migration posts skip, and it’s the most important one.

Do not migrate if:

  • The monolith works and you have fewer than 4-5 teams. Microservices solve a team-scaling problem, not a code problem. If your team fits in one room, a well-factored modular monolith is almost always simpler and cheaper. The overhead of service mesh, distributed tracing, inter-service debugging, and CI pipelines per service is real and permanent.
  • You don’t have real scaling pressure on distinct components. If everything scales together, you don’t need to scale services independently. You need better caches or a read replica.
  • You don’t have platform investment yet. Without a paved road (CI/CD per service, observability stack, gateway/mesh, artifact registry, testing strategy), every new service reinvents the wheel and the extraction cost compounds.
  • The team doesn’t have the operational maturity to own services 24/7. A service means on-call. If your org can’t staff that, you’re building a fragile distributed monolith.
  • Your pain is really “the code is messy.” That’s a refactoring problem. Distributing messy code produces distributed messy code, now with network hops.

Migrate if:

  • You have multiple teams stepping on each other’s deploys.
  • You have components with genuinely different scaling profiles (read-heavy search vs. write-heavy ingestion).
  • You have regulatory or security isolation requirements (PCI scope reduction is a real driver).
  • You’re on a runtime (Rails, Django monolith) that can’t scale past a point you’ve actually hit.

What I’d Actually Choose

If I’m advising a team of 3-15 engineers today, I rarely recommend full microservices. The sweet spot is one of these:

Option A: Modular Monolith. One deployable, clean internal package boundaries enforced by import restrictions, one database, feature flags for risky changes. You keep 95% of the benefits of microservices (testability, clear boundaries, per-module ownership) with 10% of the operational cost. When you genuinely need to scale a module independently, extract it — just that one. This is the default I reach for.

Option B: Small Number of Services (3-6). When you have independent scaling needs or strict isolation requirements. Draw boundaries around the things that actually differ — not every domain concept. “Users” is not a service on its own unless user management is genuinely a separate team with separate scaling needs.

Option C: Full Microservices. Dozens of services, service mesh, platform team. Only when you have the team count and platform maturity to justify it. If you’re below 30 engineers, you almost certainly don’t.

The strangler fig works because it’s boring and reversible. You extract one endpoint, verify it, move on. No heroics, no big-bang, no weekend migrations. But the best migration is the one you didn’t need to do because you stayed at modular-monolith until the pain was real. Boring is what you want when you’re refactoring the system that runs your business — and “don’t migrate” is often the most boring, and best, choice.

← Back to blog