R&D: Designing a Reliable and Lightning-Fast Instant Messaging Architecture

2025-08-24 13:37:25 - Rao Ashish Kumar

1. Problem Statement

Instant messaging is not like email or push notifications. It demands that messages arrive instantly, reliably, and in order.

Broadcasting gives speed but is volatile, while persistence gives durability but adds latency.

The challenge is to combine both.

2. Core Architecture

Infrastructure Components
  1. Database Persistence – Stores durable chat history.
  2. Outbox Table – Guarantees reliable event delivery and retries.
  3. Background Workers – Process unhandled outbox events and broadcast them.
  4. Real-Time Transport – WebSockets (or MQTT, SSE) for low-latency event delivery.
  5. Authentication & Authorization – Ensures only conversation participants can subscribe to channels.
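The authorization component can be illustrated with a small membership check that runs before a socket is allowed to join a channel. This is only a sketch: the `ChannelAuthorizer` class, the in-memory membership map, and the `conversation.<id>` channel naming scheme are all assumptions made for illustration, not part of the original design.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelAuthorizer:
    """Decides whether a user may subscribe to a conversation channel."""

    # conversation_id -> participant user_ids.
    # Assumption: an in-memory stand-in for a real membership lookup.
    participants: dict[str, set[int]] = field(default_factory=dict)

    def add_participant(self, conversation_id: str, user_id: int) -> None:
        self.participants.setdefault(conversation_id, set()).add(user_id)

    def can_subscribe(self, user_id: int, channel: str) -> bool:
        # Hypothetical channel naming scheme: "conversation.<conversation_id>".
        prefix = "conversation."
        if not channel.startswith(prefix):
            return False
        conversation_id = channel[len(prefix):]
        return user_id in self.participants.get(conversation_id, set())
```

Rejecting unknown channel names by default keeps the check closed: a channel is only joinable when it maps to a conversation the user belongs to.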

3. Database Schema

Conversations
CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);
Participants
CREATE TABLE participants (
    id BIGSERIAL PRIMARY KEY,
    conversation_id UUID REFERENCES conversations(id),
    user_id BIGINT REFERENCES users(id),
    role VARCHAR(20) DEFAULT 'member',
    created_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(conversation_id, user_id)
);
Outbox
CREATE TABLE outbox (
    id BIGSERIAL PRIMARY KEY,
    event_type VARCHAR(50) NOT NULL,
    aggregate_type VARCHAR(50) NOT NULL,
    aggregate_id UUID NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMP DEFAULT NOW(),
    processed_at TIMESTAMP NULL
);

CREATE INDEX idx_outbox_unprocessed ON outbox(processed_at) WHERE processed_at IS NULL;
Messages
-- Created after outbox because outbox_id references outbox(id).
CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID REFERENCES conversations(id),
    sender_id BIGINT REFERENCES users(id),
    content TEXT NOT NULL,
    message_type VARCHAR(20) DEFAULT 'text',
    status VARCHAR(20) DEFAULT 'pending', -- pending, sending, sent, delivered, failed
    outbox_id BIGINT REFERENCES outbox(id),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX idx_messages_conversation ON messages(conversation_id, created_at);
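With these tables, the usual transactional-outbox approach is to write the message row and its outbox row in a single transaction, so that neither can exist without the other. The sketch below shows that idea under several assumptions: SQLite (from the Python standard library) stands in for PostgreSQL so the example runs without a server, the columns are trimmed, and the `save_message` helper and the `message.created` event name are hypothetical.

```python
import json
import sqlite3
import uuid


def create_schema(db: sqlite3.Connection) -> None:
    # Trimmed SQLite version of the outbox and messages tables (assumption).
    db.executescript(
        """
        CREATE TABLE outbox (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            event_type TEXT NOT NULL,
            aggregate_type TEXT NOT NULL,
            aggregate_id TEXT NOT NULL,
            payload TEXT NOT NULL,
            processed_at TEXT NULL
        );
        CREATE TABLE messages (
            id TEXT PRIMARY KEY,
            conversation_id TEXT NOT NULL,
            sender_id INTEGER NOT NULL,
            content TEXT NOT NULL,
            status TEXT DEFAULT 'pending',
            outbox_id INTEGER REFERENCES outbox(id)
        );
        """
    )


def save_message(db: sqlite3.Connection, conversation_id: str,
                 sender_id: int, content: str) -> str:
    """Insert the outbox event and the message atomically; return the message id."""
    message_id = str(uuid.uuid4())
    payload = json.dumps({"message_id": message_id, "content": content})
    # "with db" commits on success and rolls back both inserts on any exception.
    with db:
        cur = db.execute(
            "INSERT INTO outbox (event_type, aggregate_type, aggregate_id, payload) "
            "VALUES (?, ?, ?, ?)",
            ("message.created", "message", message_id, payload),
        )
        db.execute(
            "INSERT INTO messages (id, conversation_id, sender_id, content, outbox_id) "
            "VALUES (?, ?, ?, ?, ?)",
            (message_id, conversation_id, sender_id, content, cur.lastrowid),
        )
    return message_id
```

Because the outbox row is inserted first, its generated id is available for `messages.outbox_id`, which matches the foreign key direction in the schema.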

4. Message Lifecycle
  1. User sends a message
  2. Immediate WebSocket broadcast
  3. Outbox entry
  4. Background processing
  5. Persistence confirmation or failure
  6. Recipient acknowledgment
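The outbox entry and background processing steps can be sketched as a worker pass that reads unprocessed outbox rows, hands each one to the transport, and stamps `processed_at` only after the broadcast succeeds. This is a minimal sketch under assumptions: SQLite stands in for PostgreSQL, and the `process_outbox` function, its `broadcast` callback, and the batch size are hypothetical names and values.

```python
import json
import sqlite3
from datetime import datetime, timezone
from typing import Callable


def process_outbox(db: sqlite3.Connection,
                   broadcast: Callable[[str, dict], None],
                   batch_size: int = 100) -> int:
    """Broadcast unprocessed outbox events in id order; return how many succeeded."""
    rows = db.execute(
        "SELECT id, event_type, payload FROM outbox "
        "WHERE processed_at IS NULL ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    handled = 0
    for outbox_id, event_type, payload in rows:
        try:
            broadcast(event_type, json.loads(payload))
        except Exception:
            # Stop so later events are not delivered ahead of the failed one.
            # processed_at stays NULL, so the next pass retries from here.
            break
        with db:
            db.execute(
                "UPDATE outbox SET processed_at = ? WHERE id = ?",
                (datetime.now(timezone.utc).isoformat(), outbox_id),
            )
        handled += 1
    return handled
```

Marking the row only after a successful broadcast gives at-least-once delivery: if the process dies between the broadcast and the update, the event is sent again on the next pass, so clients should deduplicate by message id.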
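
5. Typing Indicators (Ephemeral)

Typing indicators are sent over the real-time transport only and are never written to the database or the outbox. This reduces overhead and avoids unnecessary writes for transient signals.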
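One way to keep typing state entirely out of the database is an in-memory tracker whose entries expire on their own. The sketch below is illustrative only: the `TypingTracker` class, the 5-second expiry, and the injectable clock are assumptions, not details from the original design.

```python
import time
from typing import Callable


class TypingTracker:
    """Holds typing state in memory only; nothing is persisted."""

    def __init__(self, ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds  # assumed expiry window
        self.clock = clock
        # (conversation_id, user_id) -> time the last typing signal arrived
        self._last_seen: dict[tuple[str, int], float] = {}

    def mark_typing(self, conversation_id: str, user_id: int) -> None:
        self._last_seen[(conversation_id, user_id)] = self.clock()

    def typing_users(self, conversation_id: str) -> list[int]:
        now = self.clock()
        # Drop stale entries so indicators vanish without an explicit "stopped" event.
        self._last_seen = {
            key: seen for key, seen in self._last_seen.items()
            if now - seen < self.ttl_seconds
        }
        return sorted(user for (conv, user) in self._last_seen if conv == conversation_id)
```

If the process restarts, the typing state is simply lost, which is acceptable for a signal that is only meaningful for a few seconds.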


6. Retry & Failure Handling

7. Performance Optimizations

10. Message Status Lifecycle

Status      Meaning
pending     Message created on client, waiting for confirmation
sending     Outbox entry picked up, processing in progress
sent        Successfully persisted and confirmed
delivered   Recipient acknowledged receipt
failed      Timeout or error; red retry icon shown
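The statuses above can be treated as a small state machine so that a message never jumps to an impossible state. The status names come from the table; the allowed transitions, including the move from `failed` back to `pending` on a manual retry, are assumptions made for this sketch, as is the `advance` helper.

```python
from enum import Enum


class MessageStatus(str, Enum):
    PENDING = "pending"
    SENDING = "sending"
    SENT = "sent"
    DELIVERED = "delivered"
    FAILED = "failed"


# Assumed transition map; the source lists the statuses but not the edges.
ALLOWED_TRANSITIONS = {
    MessageStatus.PENDING: {MessageStatus.SENDING, MessageStatus.FAILED},
    MessageStatus.SENDING: {MessageStatus.SENT, MessageStatus.FAILED},
    MessageStatus.SENT: {MessageStatus.DELIVERED},
    MessageStatus.DELIVERED: set(),
    MessageStatus.FAILED: {MessageStatus.PENDING},  # user taps the retry icon
}


def advance(current: MessageStatus, new: MessageStatus) -> MessageStatus:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```

Centralizing the transitions in one map makes late or duplicate events easy to reject, for example a stale `sending` update arriving after the message is already `delivered`.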

11. Benefits

12. Conclusion

This architecture balances speed and reliability. It ensures that messages are delivered instantly, reliably, and in order, while providing clear recovery paths for failures.

