Scaling WebSockets for high concurrency poses unique challenges. This analysis explores those challenges, discusses the limitations of vertical and horizontal scaling, and walks through an optimization strategy step by step.
Real-time interactions are at the heart of the modern web, and delivering them to small audiences is relatively easy thanks to protocols such as WebSockets. There are two main types of communication:
Unidirectional Communication: the server pushes data to the client (for example, Server-Sent Events).
Bidirectional Communication: both client and server can send data at any time over the same connection (for example, WebSockets).
But there is a challenge: scaling your solution to handle tens of thousands of users/connections and more.
Scaling WebSockets is notably more complex than scaling HTTP due to their fundamentally different natures:
HTTP is stateless and straightforward. Each request is self-contained, making it easy to distribute across multiple servers using load balancers. HTTP serves the same data to every client and promptly forgets about them.
WebSockets, on the other hand, are stateful and persistent. Scaling them involves two main challenges:
Maintaining Persistent Connections: WebSockets rely on persistent connections between servers and potentially a vast number of clients. This requires careful management and resource allocation.
Data Synchronization: State often needs to be shared between clients/servers, creating a complex synchronization challenge. For instance, in a chat application, messages must be shared among all participants.
OS Limitations - File descriptors - Every TCP connection uses a file descriptor. Operating systems limit the number of file descriptors that can be open at once, and each running process may also have its own limit. With a large number of open TCP connections, you may run into these limits, which can cause new connections to be rejected, among other issues.
Higher possibility of downtime - Unless you have a backup server that can take over operations and requests, you will need considerable downtime to upgrade your machine.
Single point of failure - Having all your operations on a single server increases the risk of losing all your data if a hardware or software failure was to occur.
Resource limitations - There is a limit to how much you can upgrade a machine. Every machine has its threshold for RAM, storage, and processing power. If you have just one WebSocket server, there is a finite amount of resources you can add to it, which caps the number of concurrent connections your system can handle. Your tech stack might also be unable to exploit ever more threads and RAM. Take Node.js, for example: a single instance of the V8 JavaScript engine eventually hits an upper bound on how much RAM it can use, and since Node.js is single-threaded, a single Node.js process won't take advantage of additional CPU cores.
Scalability Costs - Vertical scaling can become expensive as you invest in high-end hardware. The costs may not be justified, especially when horizontal scaling provides a more cost-effective and fault-tolerant solution.
Resource Underutilization - Scaling vertically often results in underutilized resources during periods of low traffic. This inefficiency can be mitigated with horizontal scaling, which dynamically allocates resources as needed.
Most of the time, horizontal scaling is very good if you want to manage resources efficiently: start more instances when load goes up, or shut some down in quieter periods to save money. With WebSockets, however, we face more fundamental problems.
Data Synchronization - When WebSocket connections rely on shared data or state (e.g., chat messages), ensuring data consistency across multiple servers can be complicated. Proper synchronization mechanisms are necessary - for example, a Pub/Sub solution like GCP Pub/Sub, though that of course costs money...
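To make the fan-out problem concrete, here is a minimal in-process publish/subscribe hub in Go. In a multi-server deployment the same role is played by an external broker (Redis Pub/Sub, GCP Pub/Sub, etc.); the hub type and its methods below are illustrative names, not a real library API:

```go
package main

import (
	"fmt"
	"sync"
)

// hub fans messages out to every subscriber (one subscriber per
// WebSocket connection). Across servers, a broker replaces this struct.
type hub struct {
	mu   sync.Mutex
	subs map[chan string]struct{}
}

func newHub() *hub { return &hub{subs: make(map[chan string]struct{})} }

// subscribe registers a new buffered channel that will receive every
// published message.
func (h *hub) subscribe() chan string {
	ch := make(chan string, 16)
	h.mu.Lock()
	h.subs[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// publish delivers msg to all current subscribers.
func (h *hub) publish(msg string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.subs {
		ch <- msg
	}
}

func main() {
	h := newHub()
	a, b := h.subscribe(), h.subscribe()
	h.publish("hello")
	fmt.Println(<-a, <-b)
}
```

The broker-backed version has the same shape: each server subscribes to a shared topic and relays messages to its local connections.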
Session Management - Managing user sessions across multiple servers can be complex. Users may connect to different servers at different times, requiring a centralized session management solution.
Data integrity - Guaranteed ordering and exactly-once delivery are crucial for some use cases: once a WebSocket connection is re-established, the data stream must resume precisely where it left off.
Reconnection - You could implement a script to reconnect clients automatically. However, if reconnection attempts occur immediately after the connection closes and the server does not have enough capacity, this can put your system under even more pressure and lead to cascading failures.
Load balancing strategy decisions - Sticky sessions - A sticky session is a load-balancing strategy where each user is "sticky" to a specific server: if a user connects to server A, they will always connect to server A, even if another server has less load. The downside is unbalanced servers - if many users disconnect from some of the servers, one server can be left carrying most of the load.
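One way to implement stickiness without a shared session store is hash-based routing, similar in spirit to nginx's ip_hash. A hypothetical sketch (pickServer is an illustrative name):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickServer maps a client key (e.g. IP address or session ID) to one of
// n servers. The same key always hashes to the same server, so no shared
// session store is needed - at the cost of possibly uneven load.
func pickServer(clientKey string, servers []string) string {
	h := fnv.New32a()
	h.Write([]byte(clientKey))
	return servers[h.Sum32()%uint32(len(servers))]
}

func main() {
	servers := []string{"ws-0", "ws-1", "ws-2"}
	// The same client is always routed to the same backend.
	fmt.Println(pickServer("10.0.0.7", servers) == pickServer("10.0.0.7", servers))
}
```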
For the baseline we will use the go tool pprof profiler (enabled via the net/http/pprof package) together with the most used and renowned, but lately deprecated, gorilla/websocket package.
package main
import (
"github.com/gorilla/websocket"
"log"
"net/http"
_ "net/http/pprof"
"sync/atomic"
"syscall"
)
var count int64
func ws(w http.ResponseWriter, r *http.Request) {
// Upgrade connection
upgrader := websocket.Upgrader{}
conn, err := upgrader.Upgrade(w, r, nil)
if err != nil {
return
}
n := atomic.AddInt64(&count, 1)
if n%100 == 0 {
log.Printf("Total number of connections: %v", n)
}
defer func() {
n := atomic.AddInt64(&count, -1)
if n%100 == 0 {
log.Printf("Total number of connections: %v", n)
}
conn.Close()
}()
// Read messages from socket
for {
_, msg, err := conn.ReadMessage()
if err != nil {
return
}
log.Printf("msg: %s", string(msg))
}
}
func main() {
// Increase resources limitations
var rLimit syscall.Rlimit
if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
panic(err)
}
rLimit.Cur = rLimit.Max
if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
panic(err)
}
// Enable pprof hooks
go func() {
if err := http.ListenAndServe("localhost:6060", nil); err != nil {
log.Fatalf("Pprof failed: %v", err)
}
}()
// net/http serves every connection on its own goroutine
http.HandleFunc("/", ws)
if err := http.ListenAndServe("localhost:9080", nil); err != nil {
log.Fatal(err)
}
}
This takes up to ~300MB of RAM for 16,000 connections.
If you don't know how WebSockets work, it should be mentioned that the client switches to the WebSocket protocol by means of a special HTTP mechanism called Upgrade. After the successful processing of an Upgrade request, the server and the client use the TCP connection to exchange binary WebSocket frames.
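Concretely, the server's half of the Upgrade handshake boils down to answering the client's Sec-WebSocket-Key header with a Sec-WebSocket-Accept header, computed as specified by RFC 6455:

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// acceptKey computes the Sec-WebSocket-Accept value from the client's
// Sec-WebSocket-Key, per RFC 6455: base64(SHA-1(key + magic GUID)).
func acceptKey(key string) string {
	const magic = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
	h := sha1.Sum([]byte(key + magic))
	return base64.StdEncoding.EncodeToString(h[:])
}

func main() {
	// Sample key from RFC 6455 §1.3; expected accept value is
	// "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=".
	fmt.Println(acceptKey("dGhlIHNhbXBsZSBub25jZQ=="))
}
```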
Can we derive an equation from this analysis?
Memory = conn * (goroutinesMem + net/http buffers + gorilla/websocket buffers)
Memory = conn * (goroutinesMem + 2*NewReader + NewWriter)
Memory = conn * (8KB + 8KB + 4KB)
Memory = 16000 * (8KB + 8KB + 4KB) = 320000KB = 320MB
We get almost the same value as the test showed - 320MB. There will always be some difference due to other structures that are fixed in size or negligible.
What do you think will happen if this is shipped to the real world with 1,000,000 users?
Memory = 1000000 * (8KB + 8KB + 4KB) = 20GB
That does not look good: if we want to scale, it would certainly cost us a lot of money.
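The arithmetic above is easy to sanity-check in a few lines of Go (the constants mirror the article's estimates, not measured values):

```go
package main

import "fmt"

// estimateKB mirrors the equation above: total memory in KB for n
// connections at a given per-connection cost in KB.
func estimateKB(conns, perConnKB int) int {
	return conns * perConnKB
}

func main() {
	// 8KB goroutine stack + 8KB net/http reader buffers + 4KB writer buffer.
	perConn := 8 + 8 + 4
	fmt.Printf("16k connections: %d MB\n", estimateKB(16000, perConn)/1000)
	fmt.Printf("1M connections: %d GB\n", estimateKB(1000000, perConn)/1000000)
}
```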
net/http copies a lot of structures on the upgrade request, which is overkill. Headers live in http.Request, which contains a field of the same-named Header type that is unconditionally filled with all request headers by copying data from the connection into string values. Imagine how much extra data could be kept inside this field - for example, a large Cookie header - wasting CPU cycles and memory bandwidth.
Initial Protocol: Let's say you have a web server that initially serves content using plain HTTP.
Upgrade Request: When a client wants to transition to a different protocol, such as WebSocket, it sends an "upgrade" request to the server. This request typically includes specific headers like Upgrade and Connection.
Zero-Copy: In a zero-copy upgrade, the server doesn't copy the data from the initial connection buffer to a new buffer for the upgraded protocol. Instead, it cleverly uses the existing connection buffer for the new protocol.
For example, if the server is transitioning from HTTP to WebSocket, it can use the same underlying socket connection and buffer to handle WebSocket frames without copying the data to a new buffer.
Efficiency: This technique is highly efficient because it avoids the need to duplicate data, which can be especially crucial when dealing with a large number of connections, as you mentioned in your question about a million WebSocket connections.
That is essential, and happily there is a library that can do it for us: gobwas.
package main
import (
"github.com/gobwas/ws"
"github.com/gobwas/ws/wsutil"
"log"
"net/http"
_ "net/http/pprof"
"syscall"
)
func main() {
// Increase resources limitations
var rLimit syscall.Rlimit
if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
panic(err)
}
rLimit.Cur = rLimit.Max
if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
panic(err)
}
// Enable pprof hooks
go func() {
if err := http.ListenAndServe("localhost:6060", nil); err != nil {
log.Fatalf("pprof failed: %v", err)
}
}()
if err := http.ListenAndServe(":9080", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
conn, _, _, err := ws.UpgradeHTTP(r, w)
if err != nil {
log.Printf("upgrade error: %v", err)
return
}
go func() {
defer conn.Close()
for {
msg, op, err := wsutil.ReadClientData(conn)
if err != nil {
// connection closed or read failed
return
}
log.Printf("msg: %s", msg)
if err = wsutil.WriteServerMessage(conn, op, msg); err != nil {
return
}
}
}()
})); err != nil {
log.Fatal(err)
}
}
With zero-copy upgrades, we can remove the net/http and gorilla buffers from the equation:
Memory = conn * (goroutinesMem)
Memory = 16000 * (8KB) = 128000KB = 128MB.
// 1 million connections
Memory = 1000000 * (8KB) = 8GB.
8GB... we can do more
Netpolling is a technique used in network programming and event-driven systems to efficiently manage and monitor multiple network connections, such as sockets. It involves checking the status of network connections for events or data readiness without relying on blocking calls or per-connection polling loops.
We are going to use gnet for this purpose, together with gobwas, since we have already improved our performance with it. There are other frameworks that would also work with netpoll - of course, each with its own library specifics.
package main
import (
"bytes"
"flag"
"fmt"
"io"
"log"
"net/http"
"sync/atomic"
"syscall"
"time"
_ "net/http/pprof"
"github.com/gobwas/ws"
"github.com/gobwas/ws/wsutil"
"github.com/panjf2000/gnet/v2"
"github.com/panjf2000/gnet/v2/pkg/logging"
)
type wsServer struct {
gnet.BuiltinEventEngine
addr string
multicore bool
eng gnet.Engine
connected int64
}
func (wss *wsServer) OnBoot(eng gnet.Engine) gnet.Action {
wss.eng = eng
logging.Infof("echo server with multi-core=%t is listening on %s", wss.multicore, wss.addr)
return gnet.None
}
func (wss *wsServer) OnOpen(c gnet.Conn) ([]byte, gnet.Action) {
c.SetContext(new(wsCodec))
atomic.AddInt64(&wss.connected, 1)
return nil, gnet.None
}
func (wss *wsServer) OnClose(c gnet.Conn, err error) (action gnet.Action) {
if err != nil {
logging.Warnf("error occurred on connection=%s, %v\n", c.RemoteAddr().String(), err)
}
atomic.AddInt64(&wss.connected, -1)
//logging.Infof("conn[%v] disconnected", c.RemoteAddr().String())
return gnet.None
}
func (wss *wsServer) OnTraffic(c gnet.Conn) (action gnet.Action) {
ws := c.Context().(*wsCodec)
if ws.readBufferBytes(c) == gnet.Close {
return gnet.Close
}
ok, action := ws.upgrade(c)
if !ok {
return
}
if ws.buf.Len() <= 0 {
return gnet.None
}
messages, err := ws.Decode(c)
if err != nil {
log.Println(err)
return gnet.Close
}
if messages == nil {
return
}
for _, message := range messages {
err = wsutil.WriteServerMessage(c, message.OpCode, message.Payload)
if err != nil {
log.Println(err)
return gnet.Close
}
}
return gnet.None
}
func (wss *wsServer) OnTick() (delay time.Duration, action gnet.Action) {
//logging.Infof("[connected-count=%v]", atomic.LoadInt64(&wss.connected))
return 3 * time.Second, gnet.None
}
type wsCodec struct {
upgraded bool
buf bytes.Buffer
wsMsgBuf wsMessageBuf
}
type wsMessageBuf struct {
firstHeader *ws.Header
curHeader *ws.Header
cachedBuf bytes.Buffer
}
type readWrite struct {
io.Reader
io.Writer
}
func (w *wsCodec) upgrade(c gnet.Conn) (ok bool, action gnet.Action) {
if w.upgraded {
ok = true
return
}
buf := &w.buf
tmpReader := bytes.NewReader(buf.Bytes())
oldLen := tmpReader.Len()
hs, err := ws.Upgrade(readWrite{tmpReader, c})
skipN := oldLen - tmpReader.Len()
if err != nil {
log.Println(err)
if err == io.EOF || err == io.ErrUnexpectedEOF {
return
}
buf.Next(skipN)
logging.Infof("conn[%v] [err=%v]", c.RemoteAddr().String(), err.Error())
action = gnet.Close
return
}
buf.Next(skipN)
logging.Infof("conn[%v] upgraded to websocket protocol! Handshake: %v", c.RemoteAddr().String(), hs)
ok = true
w.upgraded = true
return
}
func (w *wsCodec) readBufferBytes(c gnet.Conn) gnet.Action {
size := c.InboundBuffered()
buf := make([]byte, size, size)
read, err := c.Read(buf)
if err != nil {
logging.Infof("read err! %v", err)
return gnet.Close
}
if read < size {
//logging.Infof("read bytes len err! size: %d read: %d", size, read)
return gnet.Close
}
w.buf.Write(buf)
return gnet.None
}
func (w *wsCodec) Decode(c gnet.Conn) (outs []wsutil.Message, err error) {
messages, err := w.readWsMessages()
if err != nil {
logging.Infof("Error reading message! %v", err)
return nil, err
}
if messages == nil || len(messages) <= 0 {
return
}
for _, message := range messages {
if message.OpCode.IsControl() {
err = wsutil.HandleClientControlMessage(c, message)
if err != nil {
log.Println(err)
return
}
continue
}
if message.OpCode == ws.OpText || message.OpCode == ws.OpBinary {
outs = append(outs, message)
}
}
return
}
func (w *wsCodec) readWsMessages() (messages []wsutil.Message, err error) {
msgBuf := &w.wsMsgBuf
in := &w.buf
for {
if msgBuf.curHeader == nil {
if in.Len() < ws.MinHeaderSize {
return
}
var head ws.Header
if in.Len() >= ws.MaxHeaderSize {
head, err = ws.ReadHeader(in)
if err != nil {
log.Println(err)
return messages, err
}
} else {
tmpReader := bytes.NewReader(in.Bytes())
oldLen := tmpReader.Len()
head, err = ws.ReadHeader(tmpReader)
skipN := oldLen - tmpReader.Len()
if err != nil {
log.Println(err)
if err == io.EOF || err == io.ErrUnexpectedEOF {
return messages, nil
}
in.Next(skipN)
return nil, err
}
in.Next(skipN)
}
msgBuf.curHeader = &head
err = ws.WriteHeader(&msgBuf.cachedBuf, head)
if err != nil {
log.Println(err)
return nil, err
}
}
dataLen := (int)(msgBuf.curHeader.Length)
if dataLen > 0 {
if in.Len() >= dataLen {
_, err = io.CopyN(&msgBuf.cachedBuf, in, int64(dataLen))
if err != nil {
log.Println(err)
return
}
} else {
logging.Infof("incomplete data: have %d bytes, need %d", in.Len(), dataLen)
return
}
}
if msgBuf.curHeader.Fin {
messages, err = wsutil.ReadClientMessage(&msgBuf.cachedBuf, messages)
if err != nil {
log.Println(err)
return nil, err
}
msgBuf.cachedBuf.Reset()
} else {
logging.Infof("The data is split into multiple frames")
}
msgBuf.curHeader = nil
}
}
func main() {
//Increase resources limitations
var rLimit syscall.Rlimit
if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
panic(err)
}
rLimit.Cur = rLimit.Max
if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rLimit); err != nil {
panic(err)
}
// Enable pprof hooks
go func() {
if err := http.ListenAndServe("localhost:6060", nil); err != nil {
log.Fatalf("pprof failed: %v", err)
}
}()
var port int
var multicore bool
// Example command: go run main.go --port 8080 --multicore=true
flag.IntVar(&port, "port", 9080, "server port")
flag.BoolVar(&multicore, "multicore", true, "multicore")
flag.Parse()
wss := &wsServer{addr: fmt.Sprintf("tcp://localhost:%d", port), multicore: multicore}
fmt.Println("start serving")
// Start serving!
log.Println("server exits:", gnet.Run(wss, wss.addr, gnet.WithMulticore(multicore), gnet.WithReusePort(true), gnet.WithTicker(true)))
}
We can remove the per-connection goroutines, or keep a small fixed pool, since they are no longer part of the per-connection equation. With roughly 1KB of connection state per connection:
Memory = conn * connState + fixedGoroutines * goroutinesMem
Memory = 16000 * 1KB + 20 * 8KB = 16160KB = ~16MB (the test showed ~20MB)
We started from 320MB for 16,000 connections, and now we are at around 20MB. That will definitely save money at scale:
Memory = 1000000 * 1KB + 20 * 8KB = ~1GB + fixed goroutines memory
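Putting the three stages side by side (the per-connection costs are the rough estimates used throughout this article, not measured constants):

```go
package main

import "fmt"

func main() {
	// Approximate per-connection cost at each optimization stage, in KB.
	stages := []struct {
		name      string
		perConnKB int
	}{
		{"net/http + gorilla/websocket", 8 + 8 + 4}, // goroutine + reader + writer buffers
		{"gobwas/ws zero-copy upgrade", 8},          // goroutine stack only
		{"gnet netpoll + gobwas/ws", 1},             // ~1KB connection state, no per-conn goroutine
	}
	const conns = 1000000
	for _, s := range stages {
		fmt.Printf("%-30s ~%d GB for %d connections\n", s.name, s.perConnKB*conns/1000000, conns)
	}
}
```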
WebSockets are crucial for real-time web applications, but scaling them is challenging due to their stateful and persistent nature.
Challenges in scaling WebSockets include maintaining persistent connections and synchronizing data between clients.
Vertical scaling has limitations like file descriptor constraints, downtime during upgrades, and single points of failure.
Horizontal scaling introduces complexities in data synchronization, session management, and data integrity.
Shared challenges include managing heartbeats for many concurrent connections.
To address these issues, efficient code using the gobwas/ws library optimizes memory usage and minimizes goroutines per connection, resulting in significant memory savings.