R&D: Designing a Reliable and Lightning Fast Instant Messaging Architecture
2025-08-24 13:37:25 - Rao Ashish Kumar
1. Problem Statement
Instant messaging is not like email or push notifications. It demands:
- Low latency — messages should appear instantly.
- Guaranteed persistence — no message should ever disappear.
- Correct ordering — messages must arrive in the same sequence as they were sent.
Broadcasting gives speed but is volatile, while persistence gives durability but adds latency.
The challenge is to combine both.
2. Core Architecture
Infrastructure Components
- Database Persistence – Stores durable chat history.
- Outbox Table – Guarantees reliable event delivery and retries.
- Background Workers – Process unhandled outbox events and broadcast them.
- Real-Time Transport – WebSockets (or MQTT, SSE) for low-latency event delivery.
- Authentication & Authorization – Ensures only conversation participants can subscribe to channels.
3. Database Schema
Conversations
CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
Participants
CREATE TABLE participants (
    id BIGSERIAL PRIMARY KEY,
    conversation_id UUID REFERENCES conversations(id),
    user_id BIGINT REFERENCES users(id),
    role VARCHAR(20) DEFAULT 'member',
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(conversation_id, user_id)
);
Outbox
CREATE TABLE outbox (
    id BIGSERIAL PRIMARY KEY,
    event_type VARCHAR(50) NOT NULL,
    aggregate_type VARCHAR(50) NOT NULL,
    aggregate_id UUID NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    processed_at TIMESTAMP NULL
);
CREATE INDEX idx_outbox_unprocessed ON outbox(processed_at) WHERE processed_at IS NULL;
Messages
CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID REFERENCES conversations(id),
    sender_id BIGINT REFERENCES users(id),
    content TEXT NOT NULL,
    message_type VARCHAR(20) DEFAULT 'text',
    status VARCHAR(20) DEFAULT 'pending', -- pending, sending, sent, delivered, failed
    outbox_id BIGINT REFERENCES outbox(id), -- the outbox table must exist before this one is created
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_messages_conversation ON messages(conversation_id, created_at);
4. Message Lifecycle
- User sends a message
  - Client creates a temporary ID and marks the message as pending in the UI.
- Immediate WebSocket broadcast
  - Server assigns a unique message ID and broadcasts to all participants.
  - Participants see the message without waiting for the database write.
- Outbox entry
  - Message is inserted into the outbox table for guaranteed persistence.
- Background processing
  - Worker picks up the unprocessed outbox entry.
  - Saves the message to the messages table.
  - Marks the message as sent.
  - Broadcasts a persistence confirmation.
- Persistence confirmation or failure
  - If persistence is confirmed → UI continues as normal.
  - If confirmation does not arrive within a timeout window (e.g., 5–10s) → client shows a red retry icon.
- Recipient acknowledgment
  - Recipient’s client sends delivery/read acknowledgments.
  - Updates propagate to the sender.
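As a rough illustration of the lifecycle above, the following Python sketch models the server side with in-memory stand-ins for the WebSocket layer, the outbox table, and the messages table. The names `ChatServer`, `handle_send`, `run_worker_once`, and the event types `message.new` and `message.persisted` are hypothetical and chosen only for this example; a real implementation would write to the database tables instead of Python lists and dicts.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class OutboxEntry:
    id: int
    payload: dict
    processed_at: Optional[datetime] = None  # None means "not yet processed"


@dataclass
class ChatServer:
    broadcast: Callable[[dict], None]             # stand-in for the WebSocket layer
    outbox: list = field(default_factory=list)    # stand-in for the outbox table
    messages: dict = field(default_factory=dict)  # stand-in for the messages table

    def handle_send(self, conversation_id: str, sender_id: int,
                    temp_id: str, content: str) -> str:
        """Assign a message ID, broadcast immediately, then enqueue for persistence."""
        message_id = str(uuid.uuid4())
        payload = {
            "id": message_id,
            "temp_id": temp_id,
            "conversation_id": conversation_id,
            "sender_id": sender_id,
            "content": content,
        }
        self.broadcast({"type": "message.new", **payload})
        self.outbox.append(OutboxEntry(id=len(self.outbox) + 1, payload=payload))
        return message_id

    def run_worker_once(self) -> int:
        """Persist every unprocessed outbox entry and broadcast a confirmation."""
        processed = 0
        for entry in self.outbox:
            if entry.processed_at is not None:
                continue  # already handled on an earlier pass
            message_id = entry.payload["id"]
            # Keying by message ID means a replayed entry overwrites, not duplicates.
            self.messages[message_id] = {**entry.payload, "status": "sent"}
            entry.processed_at = datetime.now(timezone.utc)
            self.broadcast({
                "type": "message.persisted",
                "id": message_id,
                "temp_id": entry.payload["temp_id"],
            })
            processed += 1
        return processed
```

Passing `events.append` as the `broadcast` callable (with `events = []`) is enough to try it out: `handle_send` emits one `message.new` event, and a later `run_worker_once` call stores the message with status `sent` and emits `message.persisted`.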
Typing events are not stored in the database:
- When a user types, the client emits a typing event over WebSockets.
- The server broadcasts it to the other participants in the conversation channel.
- The event expires naturally: clients auto-hide the typing indicator after a timeout (e.g., 2s).
This reduces overhead and avoids unnecessary writes for transient signals.
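The client-side expiry of typing indicators could look roughly like the sketch below. `TypingTracker` and its method names are hypothetical, and the current time is passed in explicitly so the behavior is easy to follow; the 2-second default mirrors the example timeout above.

```python
from typing import Dict, List

TYPING_TIMEOUT_SECONDS = 2.0  # example value taken from the text


class TypingTracker:
    """Client-side view of who is typing; nothing here touches the database."""

    def __init__(self, timeout: float = TYPING_TIMEOUT_SECONDS) -> None:
        self.timeout = timeout
        self._last_seen: Dict[int, float] = {}  # user_id -> time of last typing event

    def on_typing_event(self, user_id: int, now: float) -> None:
        # Each incoming typing event simply refreshes the user's timestamp.
        self._last_seen[user_id] = now

    def active_typers(self, now: float) -> List[int]:
        # Drop users whose last event is older than the timeout, then report the rest.
        self._last_seen = {
            user_id: seen
            for user_id, seen in self._last_seen.items()
            if now - seen < self.timeout
        }
        return sorted(self._last_seen)
```

Because the indicator expires on its own, no "stopped typing" event is needed in this sketch, and a lost typing event only delays the indicator rather than leaving it stuck.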
- Client-side retry:
  - If the persistence confirmation doesn’t appear in time, show a retry icon.
  - User can tap retry to resend.
- Server-side retry:
  - The outbox guarantees retries until the entry is processed successfully.
- Crash recovery:
  - Since unprocessed messages remain in the outbox, they are replayed after restart.
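A minimal sketch of the client-side half of this, assuming the client tracks each outgoing message by its temporary ID. `PendingMessages` and its methods are hypothetical names, and reusing the same temporary ID on retry is an assumption of this example (it would let a server deduplicate resends), not something the design above specifies.

```python
from typing import Dict, List

CONFIRM_TIMEOUT_SECONDS = 10.0  # upper end of the 5-10s window mentioned earlier


class PendingMessages:
    """Tracks messages awaiting a persistence confirmation on the client."""

    def __init__(self, timeout: float = CONFIRM_TIMEOUT_SECONDS) -> None:
        self.timeout = timeout
        self._sent_at: Dict[str, float] = {}  # temp_id -> time the message was sent
        self.status: Dict[str, str] = {}      # temp_id -> "pending" | "sent" | "failed"

    def send(self, temp_id: str, now: float) -> None:
        self._sent_at[temp_id] = now
        self.status[temp_id] = "pending"

    def confirm(self, temp_id: str) -> None:
        # A confirmation that arrives late still wins over an earlier timeout.
        if temp_id in self.status:
            self._sent_at.pop(temp_id, None)
            self.status[temp_id] = "sent"

    def check_timeouts(self, now: float) -> List[str]:
        """Mark overdue messages as failed; the UI shows a retry icon for these."""
        expired = [t for t, sent in self._sent_at.items() if now - sent >= self.timeout]
        for temp_id in expired:
            del self._sent_at[temp_id]
            self.status[temp_id] = "failed"
        return expired

    def retry(self, temp_id: str, now: float) -> None:
        # Only failed messages can be retried; the timer restarts from `now`.
        if self.status.get(temp_id) == "failed":
            self.send(temp_id, now)
```

A client would call `check_timeouts` periodically (for example once per second) and render the retry icon for every ID it returns.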
- Pagination: Fetch messages in batches (e.g., 50).
- Indexes: Optimize conversation lookups.
- Compact payloads: Broadcast only minimal data.
- Single WebSocket connection: Multiplex all events (messages, typing, delivery).
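- Worker scaling: Multiple workers can process the outbox in parallel, provided each entry is claimed by only one worker (in PostgreSQL, for example, by selecting rows with FOR UPDATE SKIP LOCKED).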
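To make the pagination and index points concrete, here is one possible way to fetch history in batches of 50. It uses keyset (cursor) pagination, which is a choice made for this sketch rather than something the design prescribes, and it uses Python's built-in `sqlite3` module as a stand-in for the production database. `fetch_page` and the simplified column set are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple

PAGE_SIZE = 50  # example batch size from the text

Row = Tuple[str, str, str]  # (id, content, created_at)


def fetch_page(conn: sqlite3.Connection, conversation_id: str,
               before: Optional[Tuple[str, str]] = None,
               limit: int = PAGE_SIZE) -> List[Row]:
    """Return up to `limit` messages, newest first.

    `before` is a (created_at, id) cursor taken from the last row of the
    previous page; None fetches the most recent page.
    """
    if before is None:
        cursor = conn.execute(
            "SELECT id, content, created_at FROM messages "
            "WHERE conversation_id = ? "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (conversation_id, limit),
        )
    else:
        created_at, message_id = before
        # The id comparison breaks ties between messages with equal timestamps.
        cursor = conn.execute(
            "SELECT id, content, created_at FROM messages "
            "WHERE conversation_id = ? "
            "AND (created_at < ? OR (created_at = ? AND id < ?)) "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (conversation_id, created_at, created_at, message_id, limit),
        )
    return cursor.fetchall()
```

The `WHERE conversation_id = ? ... ORDER BY created_at` shape matches the `(conversation_id, created_at)` index defined in the schema, so each page can be served from the index instead of scanning and skipping earlier rows as an `OFFSET` query would.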
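| Status    | Meaning                                             |
|-----------|-----------------------------------------------------|
| pending   | Message created on client, waiting for confirmation |
| sending   | Outbox entry picked up, in processing               |
| sent      | Successfully persisted and confirmed                |
| delivered | Recipient acknowledged receipt                      |
| failed    | Timeout or error; red retry icon shown              |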
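The statuses listed above suggest a small state machine. The transition map below is an assumption inferred from the status descriptions and the retry behavior (for instance, that `failed` returns to `pending` when the user taps retry); the original design does not spell out the allowed transitions, and `next_status` is a hypothetical helper.

```python
from typing import Dict, Set

# Assumed transitions, inferred from the status descriptions.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"sending", "failed"},
    "sending": {"sent", "failed"},
    "sent": {"delivered"},
    "delivered": set(),       # terminal in this sketch
    "failed": {"pending"},    # user tapped retry
}


def next_status(current: str, new: str) -> str:
    """Return `new` if the transition is allowed, otherwise raise ValueError."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal status transition: {current!r} -> {new!r}")
    return new
```

Validating transitions in one place keeps a late or duplicated event (such as a second persistence confirmation) from moving a message backwards, for example from `delivered` to `sent`.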
11. Benefits
- Low latency → WebSockets ensure instant updates.
- Guaranteed reliability → Outbox ensures no message loss.
- Transparent UX → Retry icon informs users of failures.
- Lightweight design → No unnecessary persistence for typing indicators.
- Scalability → Outbox workers can be scaled out in parallel as message volume grows.
This architecture balances speed and reliability:
- WebSockets → instant broadcasting.
- Outbox Pattern → guaranteed persistence.
- Retry signals → transparent feedback for users.
- Direct typing broadcasts → reduce DB overhead.
It ensures that messages are delivered instantly, reliably, and in order — while providing clear recovery paths for failures.