Design a chat messaging system like WhatsApp or Facebook Messenger
A real-time chat messaging system like WhatsApp or Facebook Messenger must deliver messages with low latency, support offline users, maintain per-conversation ordering, fan messages out to all of a recipient's devices, and store a queryable history — all at the scale of billions of messages per day. The foundational challenge is maintaining persistent connections for millions of concurrent users while keeping the system stateless enough to scale horizontally.
Requirements and Scale
Functional requirements: send and receive text messages (1:1 and group), offline delivery via push notifications, read receipts, typing indicators, presence (online/offline status), media messages (images, video, voice notes), and message history. Non-functional: end-to-end latency under 100ms for online users, at-least-once delivery, per-conversation ordering, and 99.99% availability. At WhatsApp scale: ~2 billion users, ~100 billion messages per day, tens of millions of concurrent WebSocket connections globally.
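To make the scale concrete, the daily volume can be converted into a per-second rate. The sketch below is back-of-envelope arithmetic only; the peak-to-average ratio of 3 is an assumed value for illustration, not a measured figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60

messages_per_day = 100_000_000_000  # ~100 billion, the figure quoted above
peak_to_average = 3                 # assumed ratio, for illustration only

average_per_second = messages_per_day / SECONDS_PER_DAY
peak_per_second = average_per_second * peak_to_average

print(f"average: {average_per_second:,.0f} msg/s")
print(f"assumed peak: {peak_per_second:,.0f} msg/s")
```

The average works out to roughly 1.16 million messages per second, which is why every hop in the delivery path has to scale horizontally.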
Connection Layer: WebSocket Gateways
Each mobile or web client maintains a persistent WebSocket connection to a connection gateway node. WebSockets provide full-duplex, low-latency communication over a single TCP connection, which is far more efficient than HTTP polling. When a client connects, the gateway registers the mapping (user_id, device_id) → gateway_node_address in a Redis cluster so any component in the system can find which gateway holds a given user's connection. Gateways hold no durable state beyond their open sockets and sit behind a load balancer; losing one gateway only disconnects its users, who reconnect to another node and re-register there.
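The registry described above can be sketched with an in-memory dictionary standing in for the Redis cluster. The class name, method names, and gateway addresses here are hypothetical; a real deployment would also need expiry so that entries from a crashed gateway do not linger.

```python
from collections import defaultdict


class GatewayRegistry:
    """In-memory stand-in for Redis: (user_id, device_id) -> gateway address."""

    def __init__(self):
        # user_id -> {device_id: gateway_address}
        self._by_user = defaultdict(dict)

    def register(self, user_id, device_id, gateway_address):
        # Called by a gateway when a client's WebSocket is established.
        self._by_user[user_id][device_id] = gateway_address

    def unregister(self, user_id, device_id):
        # Called when the socket closes; drops the user once no devices remain.
        devices = self._by_user.get(user_id)
        if devices is None:
            return
        devices.pop(device_id, None)
        if not devices:
            del self._by_user[user_id]

    def lookup(self, user_id):
        # Returns a copy so callers cannot mutate registry state.
        return dict(self._by_user.get(user_id, {}))


registry = GatewayRegistry()
registry.register("bob", "phone", "gw-7.internal:9000")
registry.register("bob", "laptop", "gw-2.internal:9000")
registry.unregister("bob", "phone")
print(registry.lookup("bob"))
```

Keying by user first makes the common query, "which gateways hold this user's devices?", a single lookup.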
Message Routing and Delivery
Online delivery: when Alice sends a message to Bob, her gateway publishes the message to a message routing layer (backed by Kafka or a purpose-built router). The router looks up Bob's active gateway in Redis and forwards the message to that node, which pushes it down Bob's WebSocket. Within a single region this path should typically complete in under 50ms, well inside the 100ms end-to-end budget.
Offline delivery: if Bob is offline (no active WebSocket), the message is persisted to a per-user inbox in the message store and a push notification is dispatched via APNs (Apple) or FCM (Google). When Bob comes back online, his client fetches the missed messages from the inbox.
Group messaging: for a group of N members, the system fans out the message to each member's inbox individually. This keeps the read path identical for 1:1 and group chats. For very large groups (broadcast channels with millions of members), a tiered fan-out tree distributes the write across multiple worker nodes.
Multi-device sync: each user may have multiple active sessions (phone, tablet, desktop). The routing layer looks up every registered device for the user and delivers the message to each device's gateway, so every device stays in sync.
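The online, offline, group, and multi-device flows above can be combined into one small routing sketch. Everything here is a simplified assumption: plain lists stand in for WebSocket pushes and APNs/FCM calls, the `Router` class and its field names are hypothetical, and every recipient gets an inbox copy regardless of online status so that a reconnecting client can replay missed messages.

```python
from collections import defaultdict


class Router:
    """Toy routing layer; lists stand in for sockets, the inbox table, and APNs/FCM."""

    def __init__(self, registry, members):
        self.registry = registry        # user_id -> {device_id: gateway_address}
        self.members = members          # conversation_id -> list of user_ids
        self.inbox = defaultdict(list)  # user_id -> messages awaiting fetch
        self.socket_pushes = []         # (gateway_address, device_id, message)
        self.push_notifications = []    # user_ids that would get an APNs/FCM push

    def route(self, sender_id, conversation_id, body):
        message = {"from": sender_id, "conversation": conversation_id, "body": body}
        for user_id in self.members[conversation_id]:
            if user_id == sender_id:
                continue
            # Assumption: every recipient gets an inbox copy, online or not,
            # so a reconnecting client can replay anything it missed.
            self.inbox[user_id].append(message)
            devices = self.registry.get(user_id, {})
            if not devices:
                # No live socket anywhere: fall back to a push notification.
                self.push_notifications.append(user_id)
                continue
            # Multi-device sync: one push per registered device.
            for device_id, gateway in devices.items():
                self.socket_pushes.append((gateway, device_id, message))


registry = {"bob": {"phone": "gw-1", "laptop": "gw-2"}}  # carol has no live socket
router = Router(registry, {"trip": ["alice", "bob", "carol"]})
router.route("alice", "trip", "Landing at 6pm")
print(router.socket_pushes)
print(router.push_notifications)
```

A 1:1 chat is simply a conversation with two members, which is what keeps the read path identical for both cases. The sketch does not sync the sender's own other devices; that would be a further fan-out step.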
Message Storage
Cassandra is the canonical choice for message history. The data model partitions by conversation_id and clusters by message_timestamp (descending), making it efficient to fetch the most recent N messages for a conversation — the most common read pattern. Writes are append-only and naturally distributed across Cassandra nodes by conversation_id. Media files (images, voice notes, video) are stored in object storage (S3); the message record stores only the S3 URL. Clients upload media directly to S3 via signed URLs, bypassing the chat servers entirely.
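The partition-and-cluster layout can be modelled in a few lines to show why "latest N messages" is cheap. This is an in-memory illustration, not Cassandra client code: the `MessageStore` class is hypothetical, and it keeps rows ascending and reverses on read, whereas a descending clustering order would store them newest-first.

```python
import bisect
from collections import defaultdict


class MessageStore:
    """Toy model: one sorted list per conversation_id (the partition key)."""

    def __init__(self):
        # conversation_id -> [(timestamp_ms, message_id, body)], ascending
        self._partitions = defaultdict(list)

    def append(self, conversation_id, timestamp_ms, message_id, body):
        # message_id breaks ties between messages with equal timestamps.
        bisect.insort(self._partitions[conversation_id],
                      (timestamp_ms, message_id, body))

    def latest(self, conversation_id, limit):
        # Reads touch a single partition and return newest-first.
        if limit <= 0:
            return []
        rows = self._partitions.get(conversation_id, [])
        return rows[-limit:][::-1]


store = MessageStore()
store.append("alice:bob", 1000, "m1", "hi")
store.append("alice:bob", 3000, "m3", "see you then")
store.append("alice:bob", 2000, "m2", "lunch at noon?")
recent = store.latest("alice:bob", 2)
print(recent)
```

Because each query names exactly one conversation, it never has to scan across partitions, which is the property the data model is chosen for.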
Delivery Semantics and Ordering
Messages are delivered at-least-once: the sender retries until it receives an acknowledgment from the server, and idempotency keys prevent the server from storing duplicate messages. Per-conversation ordering is preserved by routing all messages within one conversation through a single Kafka partition (partitioned by conversation_id), so consumers see each conversation's messages in the order they were appended, and recipients see that same order. Receipts (in WhatsApp, one check mark for sent, two for delivered, two blue check marks for read) and typing indicators travel over the same WebSocket path but are ephemeral: they bypass durable storage entirely.
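A minimal sketch of both mechanisms, under stated assumptions: the partition count of 64 is arbitrary, SHA-256 is a stand-in hash (Kafka's own default partitioner uses a different hash function), and `IngestServer` is a hypothetical name. A real server would also expire old idempotency keys rather than keep them forever.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 64  # assumed partition count, for illustration


def partition_for(conversation_id, num_partitions=NUM_PARTITIONS):
    # Stable hash so one conversation always maps to the same partition.
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class IngestServer:
    def __init__(self):
        self._seen_keys = set()              # idempotency keys already stored
        self.partitions = defaultdict(list)  # partition -> ordered log

    def accept(self, conversation_id, idempotency_key, body):
        # A retried send carries the same key, so it is acknowledged
        # again but appended only once.
        if idempotency_key not in self._seen_keys:
            self._seen_keys.add(idempotency_key)
            self.partitions[partition_for(conversation_id)].append(
                (conversation_id, body))
        return "ack"


server = IngestServer()
server.accept("alice:bob", "msg-001", "hi")
server.accept("alice:bob", "msg-001", "hi")  # client retry after a lost ack
server.accept("alice:bob", "msg-002", "are you there?")
ordered = server.partitions[partition_for("alice:bob")]
print(ordered)
```

The retry is safe because the key is generated once on the client and reused for every attempt at the same message.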
Key design insight: The hardest part of a chat system is not storage — it is connection management and delivery routing. Registering which gateway holds each user's connection in Redis, routing messages to the right gateway node, and gracefully handling reconnections (including replaying missed messages) are the genuine engineering challenges. The message store is comparatively straightforward once you accept Cassandra's partition model.
Presence, End-to-End Encryption, and Reliability Extras
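Presence: each connected client sends a heartbeat every 30 seconds; on each heartbeat the gateway refreshes a Redis key with a short TTL (somewhat longer than the heartbeat interval, so one delayed heartbeat does not flip the user offline). Presence queries read from Redis: if the key has expired, the user is offline.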
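The heartbeat-plus-TTL mechanism can be sketched without Redis by storing an expiry time per user. The 45-second TTL is an assumed value (the text only says "short"), and `PresenceTracker` with its injectable clock is a hypothetical helper used here so the behaviour can be demonstrated deterministically.

```python
import time


class PresenceTracker:
    """Toy stand-in for Redis keys with a TTL, refreshed on each heartbeat."""

    def __init__(self, ttl_seconds=45.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # user_id -> time at which the key lapses

    def heartbeat(self, user_id):
        # Equivalent to rewriting the key with a fresh TTL.
        self._expires_at[user_id] = self._clock() + self._ttl

    def is_online(self, user_id):
        expires = self._expires_at.get(user_id)
        return expires is not None and self._clock() < expires


now = [0.0]  # fake clock so the example does not need to sleep
presence = PresenceTracker(ttl_seconds=45.0, clock=lambda: now[0])
presence.heartbeat("bob")

now[0] = 30.0
online_at_30s = presence.is_online("bob")

now[0] = 46.0  # no further heartbeat arrived
online_at_46s = presence.is_online("bob")
print(online_at_30s, online_at_46s)
```

No explicit "went offline" write is needed: a client that vanishes without closing its socket simply stops refreshing the key.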
End-to-end encryption — the Signal Protocol (used by WhatsApp and Signal) encrypts messages on the sending device before they reach the server. The server stores and routes ciphertext only; it never has access to plaintext. Key exchange is handled by a prekey bundle mechanism.
Reconnect storm protection — during a gateway node failure, thousands of clients reconnect simultaneously. Use exponential backoff with jitter on clients and rate-limit reconnect acceptance per gateway node.
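On the client side, exponential backoff with jitter might look like the sketch below. It uses the "full jitter" variant (a uniform random delay between zero and the exponential ceiling); the base of 1 second and cap of 60 seconds are assumed values, and `reconnect_delay` is a hypothetical helper name.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=60.0, rng=random):
    """Seconds to wait before reconnect attempt number `attempt` (0-based)."""
    # The ceiling doubles with each failed attempt, up to `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: spread clients uniformly across the whole window so
    # they do not all retry at the same instant.
    return rng.uniform(0.0, ceiling)


rng = random.Random(0)  # seeded only to make the demo repeatable
delays = [reconnect_delay(attempt, rng=rng) for attempt in range(8)]
for attempt, delay in enumerate(delays):
    print(f"attempt {attempt}: wait {delay:.2f}s")
```

Without the random component, every client dropped by the same gateway would retry on the same schedule and hit the surviving nodes in synchronized waves.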
Geo-distribution: deploy connection gateway clusters in multiple regions (US, EU, APAC) and route each user to the nearest cluster, for example via geo-aware DNS or anycast. Cross-region message routing uses dedicated inter-region links to keep latency low and predictable.
Start by explaining the WebSocket gateway model and the Redis user-to-gateway registry — this is the core mechanism most candidates miss. Then cover offline delivery (push notifications + per-user inbox) as a distinct flow. The ordering guarantee via Kafka partition-by-conversation-id is a concrete detail that impresses.
Mention the Signal Protocol for E2E encryption even briefly — it shows product-level knowledge. Do not spend the entire answer on message storage schema; the connection and routing layers are the interesting problems here.