Scaling WebSocket Connections to 1 Million Users: Lessons from Building Wibe Chat
When we started building Wibe Chat, our prototype handled 100 connections beautifully. At 1,000, things got interesting. At 10,000, we rewrote our connection layer. At 100,000, we rewrote it again.
Scaling WebSocket connections is fundamentally different from scaling HTTP APIs. With HTTP, each request is independent. With WebSockets, each connection is stateful and persistent.
Here's what we learned building infrastructure that handles over a million concurrent connections.
The Challenge
A WebSocket connection is a persistent TCP socket. Unlike HTTP requests that complete in milliseconds, a WebSocket connection lives as long as the user's session.
This means each connection consumes server memory (20-50KB), holds a file descriptor (the default Linux soft limit is 1,024 open files per process), requires regular heartbeat pings, and needs state so messages can be routed to the specific server instance holding it.
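The file descriptor ceiling is usually the first one to bite. As a minimal sketch (Go on Linux, illustrative rather than our production startup code), a connection server can raise its soft limit toward the hard limit at boot:

```go
package main

import (
	"log"
	"syscall"
)

// raiseFDLimit lifts the soft open-file limit up to the hard limit so a single
// process can hold tens of thousands of sockets. Illustrative sketch only.
func raiseFDLimit() error {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return err
	}
	rl.Cur = rl.Max // soft limit may be raised as far as the hard limit
	return syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl)
}

func main() {
	if err := raiseFDLimit(); err != nil {
		log.Fatalf("raising RLIMIT_NOFILE: %v", err)
	}
	var rl syscall.Rlimit
	syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl)
	log.Printf("open-file soft limit now %d", rl.Cur)
}
```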
At 10,000 connections, a single server handles it. At 100,000, you need multiple servers with coordination. At 1,000,000, you need a distributed system with edge nodes.
Our Architecture
Layer 1: Edge Nodes. Lightweight connection servers in 30+ locations. Each handles up to 50,000 connections. Their job: accept connections, handle heartbeats, route messages. Stateless for business logic — fast, cheap, replaceable.
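To make the heartbeat job concrete, here is a minimal sketch of an edge-style connection handler in Go using the gorilla/websocket package — an assumption for illustration, not necessarily our stack, and the timeouts are placeholder values:

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/gorilla/websocket"
)

const (
	pingInterval = 20 * time.Second // how often we ping the client
	pongWait     = 50 * time.Second // how long we wait for any read (including pongs)
)

var upgrader = websocket.Upgrader{ReadBufferSize: 1024, WriteBufferSize: 1024}

// serveWS upgrades the HTTP request, then keeps the connection alive with
// pings and drops it if the client stops answering. Illustrative sketch.
func serveWS(w http.ResponseWriter, r *http.Request) {
	conn, err := upgrader.Upgrade(w, r, nil)
	if err != nil {
		return
	}
	defer conn.Close()

	conn.SetReadDeadline(time.Now().Add(pongWait))
	conn.SetPongHandler(func(string) error {
		// Any pong pushes the read deadline forward.
		return conn.SetReadDeadline(time.Now().Add(pongWait))
	})

	// Writer: periodic heartbeat pings.
	go func() {
		ticker := time.NewTicker(pingInterval)
		defer ticker.Stop()
		for range ticker.C {
			if err := conn.WriteControl(websocket.PingMessage, nil, time.Now().Add(5*time.Second)); err != nil {
				return // connection is gone; stop pinging
			}
		}
	}()

	// Reader: forward client messages to the router layer (omitted here).
	for {
		if _, _, err := conn.ReadMessage(); err != nil {
			return // dead or closed connection
		}
	}
}

func main() {
	http.HandleFunc("/ws", serveWS)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```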
Layer 2: Message Router. Validates messages, persists to database, looks up channel subscribers, determines which edge nodes hold those connections, fans out messages. Uses consistent hashing to distribute channels across router instances.
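The consistent-hashing piece is simpler than it sounds. A stripped-down ring with virtual nodes (Go, crc32 for brevity; our production router differs) shows how a channel ID maps to a router instance:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// Ring maps channel IDs onto router instances with consistent hashing, so
// adding or removing a router only moves a small fraction of channels.
type Ring struct {
	hashes  []uint32          // sorted virtual-node hashes
	routers map[uint32]string // virtual-node hash -> router address
}

func NewRing(routers []string, vnodes int) *Ring {
	r := &Ring{routers: make(map[uint32]string)}
	for _, addr := range routers {
		for i := 0; i < vnodes; i++ {
			h := crc32.ChecksumIEEE([]byte(addr + "#" + strconv.Itoa(i)))
			r.hashes = append(r.hashes, h)
			r.routers[h] = addr
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// Route returns the router instance responsible for a channel.
func (r *Ring) Route(channelID string) string {
	h := crc32.ChecksumIEEE([]byte(channelID))
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0 // wrap around the ring
	}
	return r.routers[r.hashes[i]]
}

func main() {
	ring := NewRing([]string{"router-a", "router-b", "router-c"}, 128)
	fmt.Println(ring.Route("channel:general")) // e.g. "router-b"
}
```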
Layer 3: Storage. Distributed database optimized for time-series appends. In-memory cache for hot channels (95%+ cache hit rate) plus persistent storage for history.
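The read path is roughly: check the in-memory cache for the channel, fall back to persistent storage on a miss, and cache the result. A simplified sketch — the Store interface and types here are hypothetical stand-ins, not our actual schema:

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a minimal stand-in for a stored chat message.
type Message struct {
	ID      int64
	Channel string
	Body    string
}

// Store is a hypothetical interface over the persistent message database.
type Store interface {
	RecentMessages(channel string, limit int) ([]Message, error)
}

// HotCache keeps recent messages for active channels in memory so most
// reads never touch the database.
type HotCache struct {
	mu     sync.RWMutex
	store  Store
	byChan map[string][]Message
}

func NewHotCache(store Store) *HotCache {
	return &HotCache{store: store, byChan: make(map[string][]Message)}
}

// Recent serves from memory when possible and falls back to the store,
// caching the result for the next reader.
func (c *HotCache) Recent(channel string, limit int) ([]Message, error) {
	c.mu.RLock()
	msgs, ok := c.byChan[channel]
	c.mu.RUnlock()
	if ok {
		return msgs, nil // cache hit: the common case for hot channels
	}

	msgs, err := c.store.RecentMessages(channel, limit)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.byChan[channel] = msgs
	c.mu.Unlock()
	return msgs, nil
}

// stubStore is a toy Store used only to make the example runnable.
type stubStore struct{}

func (stubStore) RecentMessages(channel string, limit int) ([]Message, error) {
	return []Message{{ID: 1, Channel: channel, Body: "welcome"}}, nil
}

func main() {
	cache := NewHotCache(stubStore{})
	msgs, _ := cache.Recent("channel:general", 50) // miss: hits the store
	msgs, _ = cache.Recent("channel:general", 50)  // hit: served from memory
	fmt.Println(len(msgs), "messages")
}
```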
Handling Reconnections
We spent more time on reconnection logic than any other feature. When a client reconnects, it presents a "last seen" message ID. The edge node fetches all messages since that ID and delivers them. For long offline periods, messages are batched into compressed chunks — the client renders the first batch immediately while the rest loads.
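Server-side, the resume flow boils down to "fetch everything after the last seen ID and deliver it in bounded batches." A hedged sketch of that loop in Go (fetchSince and sendBatch are hypothetical hooks, not our SDK's API):

```go
package main

import "fmt"

// Message is a minimal stand-in for a stored chat message.
type Message struct {
	ID   int64
	Body string
}

const batchSize = 500

// resumeSession replays everything after lastSeenID in bounded batches, so the
// client can render the first batch while the rest streams in.
func resumeSession(
	channel string,
	lastSeenID int64,
	fetchSince func(channel string, afterID int64, limit int) ([]Message, error),
	sendBatch func(msgs []Message) error,
) error {
	cursor := lastSeenID
	for {
		batch, err := fetchSince(channel, cursor, batchSize)
		if err != nil {
			return err
		}
		if len(batch) == 0 {
			return nil // client is caught up
		}
		if err := sendBatch(batch); err != nil {
			return err // connection dropped mid-replay; client retries with its new last seen ID
		}
		cursor = batch[len(batch)-1].ID // advance past the last delivered message
	}
}

func main() {
	// Toy in-memory backlog standing in for the real message store.
	backlog := []Message{{ID: 101, Body: "hi"}, {ID: 102, Body: "hello"}, {ID: 103, Body: "hey"}}
	fetch := func(_ string, after int64, limit int) ([]Message, error) {
		var out []Message
		for _, m := range backlog {
			if m.ID > after && len(out) < limit {
				out = append(out, m)
			}
		}
		return out, nil
	}
	send := func(msgs []Message) error { fmt.Println("delivering", len(msgs), "messages"); return nil }
	_ = resumeSession("channel:general", 101, fetch, send)
}
```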
Numbers
- Concurrent connections: 1M+ routinely
- Message latency (p50): 47ms globally
- Message latency (p99): 120ms globally
- Connection success rate: 99.7% on first attempt
- Reconnection time: Median 1.2 seconds
- Memory per connection: 32KB average
- Peak throughput: 500K messages/second
- Uptime: 99.99% over 12 months
What We'd Do Differently
Start with edge nodes from day one. We initially ran everything in a single region. The migration to edge was painful.
Invest in observability early. Real-time systems fail subtly. A message delivered in 200ms instead of 50ms triggers no alert, but users notice.
Don't underestimate mobile networks. A huge percentage of users are on 3G/4G. Designing for high-latency, low-bandwidth from the start saves rewrites.
The Takeaway
Scaling WebSocket connections gets exponentially harder with each order of magnitude. 1K to 10K is a config change. 10K to 100K is an architecture change. 100K to 1M is a distributed systems problem.
If real-time messaging is a feature (not the product), use an SDK and save months of engineering.