Migrating from Monolith to Event-Driven Architecture
Lessons learned from decomposing a 2M-line monolithic Java application into event-driven microservices without downtime.
Two years ago, our core platform was a single 2M-line Java monolith deployed as one massive WAR file. It took 45 minutes to build, 20 minutes to deploy, and required a 4-hour maintenance window for releases.
Today, it’s 23 microservices communicating through an event backbone, deploying independently, and handling 10x the traffic with a fraction of the operational overhead.
This is the story of how we did it — and what we’d do differently.
The monolith wasn’t always a problem. For the first five years, it served us well. But as the team grew from 10 to 80 engineers, the cracks appeared:
- Merge conflicts were a daily occurrence
- Build times grew from 5 to 45 minutes
- Deployments required coordinating 12 teams
- Database migrations needed approval from three VPs
- On-call meant being paged for services you didn’t own
The final straw was a Black Friday incident where a memory leak in the reporting module took down checkout. In a monolith, everything shares the same JVM — one bad actor takes everyone down.
We chose the Strangler Fig pattern: gradually replace pieces of the monolith by routing traffic to new services, one bounded context at a time.
```
             ┌─────────────┐
             │     API     │
             │   Gateway   │
             └──────┬──────┘
                    │
       ┌────────────┼─────────────┐
       │            │             │
┌──────▼──────┐  ┌──▼───┐  ┌──────▼──────┐
│  Monolith   │  │ New  │  │     New     │
│ (shrinking) │  │Svc A │  │    Svc B    │
└─────────────┘  └──────┘  └─────────────┘
```

We used Domain-Driven Design to identify natural boundaries:
| Bounded Context | Lines of Code | Team |
|---|---|---|
| User Management | 180K | Identity |
| Order Processing | 340K | Commerce |
| Payment Processing | 220K | Payments |
| Inventory | 150K | Supply Chain |
| Reporting | 280K | Analytics |
| Notifications | 90K | Engagement |
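In practice, "routing traffic to new services" is just a set of path rules at the gateway: extracted domains are matched explicitly, and everything else falls through to the monolith. A minimal sketch of the idea, assuming Spring Cloud Gateway and illustrative service names (our actual gateway configuration is more involved):

```java
import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class StranglerRoutes {

    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder) {
        return builder.routes()
            // Domains that have already been extracted get their own route.
            .route("notifications", r -> r.path("/api/notifications/**")
                .uri("http://notification-service:8080"))
            .route("orders", r -> r.path("/api/orders/**")
                .uri("http://order-service:8080"))
            // Everything not yet extracted still goes to the monolith.
            .route("monolith", r -> r.path("/**")
                .uri("http://monolith:8080"))
            .build();
    }
}
```

Each extraction then becomes a small, reversible routing change rather than a big-bang cutover.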
Before extracting any services, we needed a communication layer. We chose Apache Kafka for:
- Ordered, partitioned event streams
- Replay capability for debugging and new service onboarding (see the sketch after this list)
- Decoupled producer/consumer lifecycles
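The replay point is worth making concrete. A brand-new service can build its own state by consuming the topic from the beginning: a fresh consumer group has no committed offsets, so with `auto.offset.reset=earliest` it starts at offset zero. A sketch with the plain Kafka consumer (broker, topic, and group names are illustrative):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ReplayConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "inventory-service");  // new group => no committed offsets
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");  // so consumption starts at the beginning
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders.created"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Rebuild the service's local state / projections from the full history.
                    System.out.printf("offset=%d key=%s%n", record.offset(), record.key());
                }
            }
        }
    }
}
```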
The key insight: events flow out of the monolith first. New services consume these events but don’t yet write back. This gives us a safe migration path.
```java
// Monolith publishes events for every significant action.
// The aggregate id is used as the Kafka key, so all events for a given
// order or payment land on the same partition and stay ordered.
@EventListener
public void onOrderCreated(OrderCreatedEvent event) {
    kafkaTemplate.send("orders.created", event.getOrderId(), event);
}

@EventListener
public void onPaymentProcessed(PaymentProcessedEvent event) {
    kafkaTemplate.send("payments.processed", event.getPaymentId(), event);
}
```

We extracted services in order of independence, starting with the ones that had the fewest dependencies on other monolith components.
Notifications was the ideal first candidate:
- It only reads data (sends emails/SMS based on events)
- No other service depends on it synchronously
- Failure is non-critical (queued retries)
```go
// New notification service (Go)
func (s *Service) HandleOrderCreated(ctx context.Context, event OrderCreated) error {
    template, err := s.templates.Get("order_confirmation")
    if err != nil {
        return err
    }
    return s.emailer.Send(ctx, event.CustomerEmail, template, event)
}
```

The migration process for each service:
- Dual-write — monolith continues handling the domain, new service consumes events
- Shadow traffic — route a copy of production traffic to the new service and compare outputs (see the sketch after this list)
- Cutover — switch API gateway routing to new service
- Decommission — remove monolith code for that domain
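Of these steps, shadow traffic caught the most bugs. The shape of it is simple: serve the user from the monolith, mirror the request to the new service asynchronously, and log any divergence. The sketch below is illustrative only; `monolithClient`, `newServiceClient`, the logger, and the request/response types are stand-ins, not our real code.

```java
// Illustrative shadow-traffic handler; clients and logger are assumed fields.
public OrderResponse createOrder(OrderRequest request) {
    OrderResponse primary = monolithClient.createOrder(request);  // this is what the user gets

    CompletableFuture.runAsync(() -> {
        try {
            // The shadow result is compared and then thrown away.
            OrderResponse shadow = newServiceClient.createOrder(request);
            if (!primary.equals(shadow)) {
                log.warn("Shadow mismatch for request {}: monolith={} newService={}",
                        request.getId(), primary, shadow);
            }
        } catch (Exception e) {
            log.warn("Shadow call failed", e);  // must never affect the primary path
        }
    });

    return primary;
}
```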
The hardest part wasn’t the code — it was the shared database. Our monolith had 847 tables, all in one PostgreSQL instance.
We used the outbox pattern to safely decompose:
```sql
-- Outbox table in the monolith database
CREATE TABLE outbox_events (
    id UUID PRIMARY KEY,
    aggregate_type VARCHAR(50),
    aggregate_id VARCHAR(100),
    event_type VARCHAR(100),
    payload JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    processed BOOLEAN DEFAULT FALSE
);
```

A CDC (Change Data Capture) connector reads the outbox table and publishes to Kafka, so the monolith never talks to Kafka directly on this path. Because the outbox row commits in the same transaction as the business change, no event is lost and nothing is emitted for a rolled-back change; delivery to Kafka is at-least-once, so consumers handle the occasional duplicate idempotently.
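The other half of the pattern is the write side: the outbox row is inserted in the same database transaction as the business change. A sketch of what that looks like inside the monolith (the repository names and the OutboxEvent/toJson helpers are illustrative):

```java
// Both writes commit or roll back together, so an event row exists
// if and only if the order itself was persisted.
@Transactional
public void createOrder(Order order) {
    orderRepository.save(order);

    outboxRepository.save(new OutboxEvent(
            UUID.randomUUID(),             // id
            "Order",                       // aggregate_type
            order.getId().toString(),      // aggregate_id
            "orders.created",              // event_type
            toJson(order)));               // payload
}
```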
| Metric | Before | After |
|---|---|---|
| Deploy frequency | Weekly | 50+ per day |
| Deploy duration | 4 hours | 3 minutes |
| Build time | 45 minutes | 2 minutes |
| P99 latency | 800ms | 120ms |
| Team autonomy | Coordinated releases | Independent deploys |
| Incident blast radius | Entire platform | Single service |
We spent the first six months flying blind. Every service extraction was a leap of faith. If we could redo it, we’d build comprehensive tracing and metrics before the first extraction.
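Concretely, the cheapest early win would have been a correlation id carried on every event from day one, so a single user request could be followed from the monolith into every service that consumed it. A sketch of the producer side (illustrative; it assumes the request id is already in SLF4J's MDC under "correlationId"):

```java
// Attach the originating request's correlation id as a Kafka header so
// downstream consumers can stitch their logs back to the request.
ProducerRecord<String, OrderCreatedEvent> record =
        new ProducerRecord<>("orders.created", event.getOrderId(), event);
String correlationId = MDC.get("correlationId");
if (correlationId != null) {
    record.headers().add("correlation-id",
            correlationId.getBytes(StandardCharsets.UTF_8));
}
kafkaTemplate.send(record);
```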
We started with a complex protobuf schema that required a central team to approve changes. We’ve since moved to a simpler JSON schema with backward compatibility rules enforced by CI.
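On the consumer side, "backward compatible" mostly means being a tolerant reader: fields added by newer producers must not break deserialization. With Jackson that comes down to a single setting (illustrative; the CI-side compatibility checks themselves are out of scope here):

```java
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

// Ignore fields this consumer doesn't know about instead of failing,
// so producers can add optional fields without coordinating a release.
ObjectMapper mapper = new ObjectMapper()
        .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
```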
We waited too long to decompose the database. The shared database became a coordination bottleneck even after the code was split.
“Microservices” isn’t a goal — it’s a means to an end. We should have defined success metrics upfront: deploy frequency, lead time, MTTR, and team velocity.
Migration to event-driven microservices isn’t about technology — it’s about organizational scalability. The monolith wasn’t failing technically; it was failing as a coordination mechanism for 80 engineers.
The event backbone gave us the decoupling we needed. The strangler fig pattern gave us a safe migration path. But the real win was giving each team ownership of their services, their databases, and their deployment timelines.
Questions about the migration? Find me on GitHub or drop a comment below.