Building a Real-Time Stock Trading Engine
Low-latency order matching engine handling 1M+ orders/day with deterministic execution.
A trading engine is one of the most demanding systems you can build. It must match buy and sell orders in microseconds, maintain perfect consistency, and never lose an order. We built an order matching engine in Rust that handles 1M+ orders daily with deterministic execution and P99 latency under 50 microseconds.
Here’s the architecture and the decisions that mattered.
The matching engine runs as a single-threaded event loop. This eliminates concurrency bugs and ensures deterministic ordering — the same sequence of inputs always produces the same sequence of outputs.
struct TradingEngine { order_books: HashMap<Symbol, OrderBook>, event_log: EventLog, sequence: u64,}
impl TradingEngine { fn process_event(&mut self, event: OrderEvent) -> Vec<ExecutionReport> { self.sequence += 1;
let reports = match event { OrderEvent::NewOrder(order) => { self.order_books .entry(order.symbol) .or_insert_with(|| OrderBook::new(order.symbol)) .add_order(order, self.sequence) } OrderEvent::CancelOrder(cancel) => { self.order_books .get_mut(&cancel.symbol) .map(|book| book.cancel_order(cancel)) .unwrap_or_default() } };
// Log all events for replay self.event_log.append(self.sequence, &event, &reports);
reports }}Single-threaded might sound like a limitation, but a modern CPU core can process millions of simple operations per second. Our bottleneck is never CPU — it’s network I/O.
The order book maintains buy and sell orders with price-time priority:
use std::collections::BTreeMap;
struct OrderBook { symbol: Symbol, bids: BTreeMap<Price, PriceLevel>, // Descending price asks: BTreeMap<Price, PriceLevel>, // Ascending price order_index: HashMap<OrderId, (Price, Side)>,}
struct PriceLevel { orders: VecDeque<Order>, total_volume: u64,}
impl OrderBook { fn add_order(&mut self, order: Order, sequence: u64) -> Vec<ExecutionReport> { let mut reports = Vec::new();
// Try to match against the opposite side let opposite = if order.side == Side::Buy { &mut self.asks } else { &mut self.bids };
while order.remaining_volume() > 0 { let best_price = match order.side { Side::Buy => opposite.keys().next().copied(), Side::Sell => opposite.keys().next_back().copied(), };
let Some(best) = best_price else { break };
// Check price compatibility if order.side == Side::Buy && best > order.price { break; } if order.side == Side::Sell && best < order.price { break; }
// Execute trade let level = opposite.get_mut(&best).unwrap(); let matched = level.orders.front_mut().unwrap(); let fill_qty = order.remaining_volume().min(matched.remaining_volume());
matched.fill(fill_qty, sequence); reports.push(ExecutionReport::fill(order.id, matched.id, best, fill_qty));
if matched.is_filled() { level.orders.pop_front(); if level.orders.is_empty() { opposite.remove(&best); } } }
// Rest goes to the book if order.remaining_volume() > 0 { self.add_to_book(order); reports.push(ExecutionReport::accepted(order.id)); }
reports }}Using BTreeMap gives us O(log n) price lookups and automatic price ordering. For the highest-performance implementations, you’d use a fixed-size array with direct indexing, but BTreeMap is fast enough for most use cases.
The engine publishes real-time market data for every state change:
struct MarketDataPublisher { udp_socket: UdpSocket, sequence: u64,}
impl MarketDataPublisher { fn publish_trade(&mut self, trade: Trade) { self.sequence += 1;
let msg = TradeMessage { msg_type: MessageType::Trade, sequence: self.sequence, timestamp: Instant::now(), symbol: trade.symbol, price: trade.price, quantity: trade.quantity, aggressor: trade.aggressor_side, };
// Serialize to binary format (no JSON — too slow) let bytes = msg.to_bytes(); self.udp_socket.send_to(&bytes, "239.0.0.1:5001").unwrap(); }}Market data is published via UDP multicast. Subscribers (trading algorithms, dashboards, risk systems) listen to the multicast group and receive updates in real-time.
The engine maintains an append-only event log for crash recovery:
struct EventLog { file: File, buffer: Vec<u8>,}
impl EventLog { fn append(&mut self, sequence: u64, event: &OrderEvent, reports: &[ExecutionReport]) { let entry = LogEntry { sequence, timestamp: Instant::now(), event: event.clone(), reports: reports.to_vec(), };
self.buffer.extend(entry.to_bytes());
// Flush every 1000 entries or 10ms if self.buffer.len() > 65536 { self.flush(); } }
fn flush(&mut self) { self.file.write_all(&self.buffer).unwrap(); self.file.sync_all().unwrap(); // fsync self.buffer.clear(); }}On restart, the engine replays the event log to reconstruct the order book state. A full day’s activity (~1M events) replays in under 30 seconds.
Key optimizations for sub-50μs latency:
// 1. Pre-allocate memory (no allocations in the hot path)let mut report_pool = ObjectPool::new(10000, ExecutionReport::default);
// 2. Use stack allocation for small messages#[repr(C)]struct OrderMessage { msg_type: u8, symbol: [u8; 8], order_id: u64, price: u64, quantity: u32, side: u8,} // 32 bytes — fits in cache line
// 3. Busy-wait spinlock instead of mutex for shared statestruct SpinLock<T> { inner: UnsafeCell<T>, locked: AtomicBool,}
// 4. Batch network I/Ofn process_batch(&mut self, messages: &[OrderMessage]) -> Vec<ExecutionReport> { let mut all_reports = Vec::with_capacity(messages.len() * 2); for msg in messages { all_reports.extend(self.process_message(msg)); } all_reports}- Single-threaded is simpler and fast enough — don’t add concurrency until you’ve measured
- Deterministic execution is non-negotiable — replay must produce identical results
- Pre-allocate everything — heap allocation in the hot path adds unpredictable latency
- Binary protocols over JSON — serialization speed matters at microsecond scale
- Test with production traffic replay — synthetic tests don’t capture real-world patterns
Questions about trading systems? Find me on GitHub or Twitter.
Related Posts
Market Data Feed Processing
High-throughput market data feed handling with low latency.
Order Book Implementation
Designing and implementing a high-performance order book for trading systems.
Algorithmic Trading Backtesting
Building a robust backtesting framework for algorithmic trading strategies.