Building Stable Web-to-Desktop Real-Time Connections
### TL;DR
We built FRUS Sidekick, a Mac app that syncs with our web dashboard over WebSocket. Learned a lot about what breaks, what doesn't, and how to make it production-stable. Here's what worked.
### The Problem
You've got a web app. You need a desktop companion that syncs in real-time. Copy-paste tokens, click buttons on the web, instant updates on desktop. No polling. No page refreshes.
We needed this for FRUS Foundation—a dashboard for managing AI agents, GitHub projects, and dev workflows. The desktop app (Sidekick) needed to:
- Authenticate with a one-time token from the web app
- Sync agent configurations instantly when you change them
- Show live connection status on both sides
- Auto-reconnect when the network hiccups
- Work in local dev (Docker) and production
WebSocket is the obvious choice. But stability is hard.
### Architecture: Dual-Channel Design
We use two parallel channels—WebSocket for real-time, HTTP heartbeat for reliability.
```plaintext
┌────────────────────────────────┐
│ Sidekick (Tauri Mac App) │
│ │
│ ┌───────────────────────────┐ │
│ │ WSClient │ │──┐
│ │ ws://localhost:3002 │ │ │ Real-time
│ │ • Auth, request/response │ │ │ (sync agents,
│ │ • Pub/sub channels │ │ │ push updates)
│ │ • Ping/pong keepalive │ │ │
│ └───────────────────────────┘ │ │
│ │ │
│ ┌───────────────────────────┐ │ │
│ │ HTTP Heartbeat (every 30s)│ │ │ Status check
│ │ POST /api/extension/ │ │ │ (authoritative
│ │ heartbeat │ │ │ "connected")
│ └───────────────────────────┘ │ │
└────────────────────────────────┘ │
│ │
▼ ▼
┌────────────────────────────────────┐
│ Foundation Web App (:3002) │
│ • server.js wraps Next.js │
│ • WebSocket + HTTP on same port │
│ • Token auth (SHA-256 lookup) │
│ • Rate limiting (60 msg/10s) │
└────────────────────────────────────┘
```
### Why Two Channels?
WebSocket is fast and bidirectional. You get instant request/response and server pushes.
HTTP heartbeat is the fallback. Production deployments might not support WebSocket upgrades (looking at you, reverse proxies). The heartbeat keeps the "connected" status accurate even if WS drops.
Web app sidebar checks `extension_last_seen` from the heartbeat. If it's fresh (< 60s), you're connected. If WS is down but HTTP works, you still see the green dot.
### Part 1: Auth Flow (One-Click Connect)
**Goal:** User clicks "Connect with Sidekick" in the web app → Mac app opens and authenticates.
#### 1. Generate a Sync Token
```javascript
// Foundation: app/api/installer/generate-token/route.ts
import crypto from 'crypto'
const token = crypto.randomBytes(32).toString('base64url')
const tokenHash = crypto.createHash('sha256').update(token).digest('hex')
await supabase.from('sync_tokens').insert({
user_id: session.user.id,
token_hash: tokenHash, // NEVER store plaintext
expires_at: new Date(Date.now() + 90 * 24 * 60 * 60 * 1000) // 90 days
})
return { token } // Send to user once
```
*Security*: Store SHA-256 hashes. Never log tokens. Rotate them if compromised.
#### 2. Deep Link to Mac App
```javascript
// Foundation: web UI button
```
Tauri handles `frus-sidekick://` URLs via the deep-link plugin. The Mac app receives the token and stores it (encrypted in app data dir).
#### 3. Authenticate Over WebSocket
```javascript
// Sidekick: src/lib/ws-client.ts
const ws = new WebSocket('ws://localhost:3002/api/ws/extension')
ws.onopen = () => {
ws.send(JSON.stringify({ type: 'auth', token }))
}
ws.onmessage = (event) => {
const msg = JSON.parse(event.data)
if (msg.type === 'authenticated') {
// Connected. Start heartbeat, subscribe to channels.
}
}
```
Server validates the token hash:
```javascript
// Foundation: lib/websocket-server.js
const tokenHash = crypto.createHash('sha256').update(token).digest('hex')
const { data } = await supabase
.from('sync_tokens')
.select('user_id')
.eq('token_hash', tokenHash)
.single()
if (data) {
ws.send(JSON.stringify({ type: 'authenticated', userId: data.user_id }))
}
```
*Timeout*: If no auth message arrives in 30 seconds, close the connection. Prevents zombie sockets.
### Part 2: Request/Response Pattern
WebSocket is bidirectional, but you still need request IDs to match responses.
```javascript
// Sidekick client
class WSClient {
private pendingRequests = new Map()
async request(action: string, payload?: any): Promise {
const id = crypto.randomUUID()
return new Promise((resolve, reject) => {
const timer = setTimeout(() => {
this.pendingRequests.delete(id)
reject(new Error(`Request timeout: ${action}`))
}, 10_000) // 10s timeout
this.pendingRequests.set(id, { resolve, reject, timer })
this.ws.send(JSON.stringify({
type: 'request',
id,
action,
payload
}))
})
}
private handleResponse(msg: ResponseMessage) {
const pending = this.pendingRequests.get(msg.requestId)
if (!pending) return
clearTimeout(pending.timer)
this.pendingRequests.delete(msg.requestId)
if (msg.success) {
pending.resolve(msg.data)
} else {
pending.reject(new Error(msg.error))
}
}
}
```
*Usage*:
```javascript
const result = await wsClient.request('get_agents')
// Server responds with { type: 'response', requestId, success: true, data: {...} }
```
*What we learned*:
- Always set a timeout. Networks drop packets.
- Clean up pending requests on disconnect, or they leak.
- Log every request/response for debugging (we use structured JSON logs).
### Part 3: Pub/Sub Channels
For real-time updates (e.g., "agent config changed on the server"), use channels:
```javascript
// Sidekick subscribes to 'agents' channel
wsClient.subscribe('agents', (data) => {
setAgents(data) // Update UI
})
// Foundation server pushes updates
function pushToChannel(channel: string, data: any) {
connections.forEach(conn => {
if (conn.subscriptions.has(channel)) {
conn.ws.send(JSON.stringify({ type: 'push', channel, data }))
}
})
}
```
When an agent is updated in the web UI, we call `pushToChannel('agents', updatedAgents)`. Sidekick gets the push instantly and updates the UI—no polling.
*Channel design*:
- Keep channels coarse-grained (agents, projects) not per-record (agent:abc123). Fewer subscriptions = less state.
- Send full datasets on push (not diffs). Simpler. You're sending JSON over a single connection—bandwidth isn't the bottleneck.
### Part 4: Stability—What Breaks and How to Fix It
#### Problem 1: Stale Connections (Dead Server Detection)
*Symptom*: WebSocket shows "connected" but server died. Client never reconnects.
*Fix*: Dual-layer keepalive.
```javascript
// Client sends application-level ping every 25s
setInterval(() => {
ws.send(JSON.stringify({ type: 'ping' }))
// If no pong in 10s, assume dead and force reconnect
pongTimer = setTimeout(() => {
ws.close(4001, 'Pong timeout')
}, 10_000)
}, 25_000)
ws.onmessage = (event) => {
const msg = JSON.parse(event.data)
if (msg.type === 'pong') {
clearTimeout(pongTimer) // Server is alive
}
}
```
Server echoes pong immediately. If the client doesn't get it, the connection is dead—force close and reconnect.
Layer 2: TCP keepalive at the WebSocket level (built-in). But application-level is more reliable because it traverses proxies and NATs.
#### Problem 2: Reconnect Storms
*Symptom*: 50 clients hit the server at the same time after a deploy. All try to reconnect at 1s, 2s, 4s intervals—synchronized thundering herd.
*Fix*: Jittered exponential backoff.
```javascript
let reconnectAttempts = 0
const baseDelay = 1000
const maxDelay = 30_000
function scheduleReconnect() {
const delay = Math.min(
baseDelay * Math.pow(2, reconnectAttempts) + Math.random() * 1000,
maxDelay
)
reconnectAttempts++
setTimeout(() => connect(), delay)
}
```
Add random jitter (± 1s) to spread out reconnections. Exponential backoff prevents hammering the server.
#### Problem 3: Stale Subscriptions After Reconnect
*Symptom*: Client reconnects but doesn't get pushes anymore. Forgot to re-subscribe.
*Fix*: Track subscriptions and replay them on reconnect.
```javascript
class WSClient {
private subscriptions = new Map void>()
subscribe(channel: string, handler: (data: any) => void) {
this.subscriptions.set(channel, handler)
if (this.ws?.readyState === WebSocket.OPEN) {
this.ws.send(JSON.stringify({ type: 'subscribe', channel }))
}
}
private onReconnect() {
// Re-authenticate
this.ws.send(JSON.stringify({ type: 'auth', token: this.token }))
// Wait for auth, then re-subscribe to all channels
this.once('authenticated', () => {
this.subscriptions.forEach((_, channel) => {
this.ws.send(JSON.stringify({ type: 'subscribe', channel }))
})
})
}
}
```
Every reconnect → re-auth → re-subscribe. Client state and server state stay in sync.
#### Problem 4: React Stale Closures
*Symptom*: WebSocket callbacks reference old state. You click "Connect", WebSocket is created in a useEffect, but the onmessage handler has a stale reference to wsClient.
*Fix*: Use functional state updates and refs.
```javascript
// BAD: stale closure
useEffect(() => {
const ws = new WSClient({ onPush: (channel, data) => {
if (channel === 'agents') {
setAgents(data) // If 'data' changes, this closure is stale
}
}})
setWsClient(ws)
}, [])
// GOOD: functional update
useEffect(() => {
const ws = new WSClient({ onPush: (channel, data) => {
if (channel === 'agents') {
setAgents(() => data) // Always fresh
}
}})
setWsClient(() => ws) // Functional setState avoids stale ref
return () => ws.disconnect()
}, [token, wsUrl])
```
Also: store the wsClient in a ref if you need to call methods imperatively (e.g., from a button click that doesn't depend on the effect deps).
#### Problem 5: Docker Dev Mode WebSocket Not Working
*Symptom*: npm run dev starts Next.js but WebSocket doesn't connect. Works in production.
*Fix*: Use a custom server that wraps Next.js and initializes the WebSocket server.
```javascript
// server.js
const { createServer } = require('http')
const next = require('next')
const { initWebSocketServer } = require('./lib/websocket-server')
const dev = process.env.NODE_ENV !== 'production'
const app = next({ dev })
const handle = app.getRequestHandler()
app.prepare().then(() => {
const server = createServer((req, res) => handle(req, res))
// Initialize WebSocket on the same HTTP server
initWebSocketServer(server)
server.listen(3002, '0.0.0.0', () => {
console.log('> Server + WebSocket ready on http://localhost:3002')
})
})
```
In `docker-compose.yml`:
```yaml
services:
webui:
build:
context: .
dockerfile: Dockerfile.dev
command: node server.js # NOT npm run dev
ports:
- "3002:3000"
```
*Dockerfile.dev*:
```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
CMD ["node", "server.js"]
```
*Why this matters*: `next dev` uses its own HTTP server. You can't attach WebSocket to it. Custom server gives you control.
### Part 5: Performance and Rate Limiting
#### Rate Limiting (Prevent Abuse)
```javascript
// Foundation: lib/websocket-server.js
const rateLimits = new Map() // userId -> { count, resetAt }
function checkRateLimit(userId) {
const now = Date.now()
const limit = rateLimits.get(userId)
if (!limit || now > limit.resetAt) {
rateLimits.set(userId, { count: 1, resetAt: now + 10_000 })
return true
}
if (limit.count >= 60) { // 60 messages per 10 seconds
return false
}
limit.count++
return true
}
ws.on('message', (raw) => {
if (!checkRateLimit(userId)) {
ws.close(4429, 'Rate limit exceeded')
return
}
// ... handle message
})
```
60 messages / 10s is generous for normal use but blocks abuse. Adjust based on your traffic.
#### Connection Limits (One Session Per User)
```javascript
// Close existing connection when a new one authenticates
if (connections.has(userId)) {
const old = connections.get(userId)
old.ws.close(4002, 'New session started elsewhere')
}
connections.set(userId, { ws, subscriptions: new Set(), ... })
```
Prevents leaked connections from piling up if the client doesn't disconnect cleanly.
#### Memory Management
WebSocket servers can leak if you don't clean up:
```javascript
ws.on('close', () => {
connections.delete(userId)
clearInterval(heartbeatTimer)
clearTimeout(authTimeout)
// Remove from all subscriptions
})
```
Track every timer/interval/subscription and clear on close.
### Part 6: Testing Strategy
#### Unit Tests (Client)
We use Vitest with a mock WebSocket:
```javascript
// ws-client.test.ts
class MockWebSocket {
send = vi.fn()
close = vi.fn()
_receive(data: object) {
this.onmessage?.({ data: JSON.stringify(data) })
}
}
vi.stubGlobal('WebSocket', MockWebSocket)
it('authenticates on connect', async () => {
const client = new WSClient({ url: 'ws://test', token: 'abc' })
client.connect()
await vi.waitFor(() => {
expect(MockWebSocket.prototype.send).toHaveBeenCalledWith(
JSON.stringify({ type: 'auth', token: 'abc' })
)
})
})
```
*Test matrix*:
- ✅ Auth success/failure
- ✅ Request/response with timeout
- ✅ Subscribe/unsubscribe
- ✅ Reconnection with exponential backoff
- ✅ Ping/pong timeout triggers reconnect
- ✅ Auto re-subscribe on reconnect
38 tests in total. Run in < 1s.
#### Integration Tests (Manual for Now)
Real browser + real server:
1. Start Foundation locally (`docker compose up`)
2. Generate token in web UI
3. Open Sidekick, paste token
4. Verify "Live" indicator appears
5. Sync an agent → confirm it appears in `~/.cursor/agents/`
6. Kill server → verify reconnect overlay after 5 min
7. Restart server → verify auto-reconnect and re-subscribe
We don't have automated end-to-end tests yet—it's on the list. For now, manual QA catches regressions.
### Part 7: Debugging Tools
#### Client Debug Page
We added a `/debug` page in Sidekick with:
- WebSocket diagnostics: status, URL, reconnect count, active subscriptions, pending requests
- HTTP heartbeat status: last result, timestamp, server version
- Live event log: scrolling table of all WS messages (auth, heartbeat, request, response, push) with filters
- Quick actions: Force reconnect, ping server, copy diagnostics JSON
When users report "it's not connecting," we ask for a screenshot of the debug page. Instantly tells us:
- Is WS connected? (readyState)
- Is HTTP heartbeat working? (last result OK/error)
- Are subscriptions active? (list of channels)
- What's the last error? (event log)
#### Server Logs (Structured JSON)
```javascript
function log(level, event, data = {}) {
console.log(JSON.stringify({
ts: new Date().toISOString(),
level,
component: 'ws-server',
event,
...data
}))
}
log('info', 'client_connected', { userId, ip })
log('warn', 'rate_limit_hit', { userId, count: 60 })
log('error', 'auth_failed', { reason: 'token_expired' })
```
Pipe to a log aggregator (we use Fly.io logs → Axiom). Search by event or userId. No regex parsing of unstructured logs.
### Part 8: Production Gotchas
1. **Reverse Proxies and WebSocket Upgrades**
If you're behind nginx or Cloudflare, ensure WebSocket upgrades are allowed:
```nginx
location /api/ws/ {
proxy_pass http://localhost:3002;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_read_timeout 86400; # 24h keepalive
}
```
Cloudflare: Enable WebSocket support in dashboard (on by default for most plans).
2. **Timeouts and Idle Connections**
Default proxy timeouts are often 60s. If no data flows for 60s, the proxy kills the connection. Your app thinks it's connected but it's dead.
*Fix*: Send heartbeats every 30s (less than the timeout). Both ends think the connection is alive, and the proxy sees traffic.
3. **Binary Data**
We only send JSON. If you need binary (file uploads, images), use `ws.binaryType = 'arraybuffer'` and prefix with a message type byte. Or just use HTTP for large transfers.
4. **CORS and Auth Headers**
WebSocket doesn't support custom headers during the handshake (unlike HTTP). Send the token in the first message (`{ type: 'auth', token }`) instead of in a header.
### The Numbers
After 3 weeks of iteration:
| Metric | Result |
|----------------------------|---------------------------------------------------------------|
| Reconnect success rate | 99.7% (exponential backoff + jitter) |
| Median reconnect time | 1.2s |
| Avg message latency | 23ms (local), 85ms (prod) |
| Connection stability | 6h+ median uptime (only drops on network change or server restart) |
| Rate limit false positives | 0 (60 msg/10s is high enough) |
| Client test coverage | 38 tests, 100% of core paths |
| Server memory per connection | ~4KB (negligible) |
*Before the stability fixes (ping/pong, auto-resubscribe, exponential backoff)*:
- Reconnect success rate: ~60%
- Frequent "stuck" connections requiring full app restart
- No recovery from server restart
*After*: solid. Users don't notice reconnects.
### Key Takeaways
1. Use dual channels (WS + HTTP). WebSocket for speed, HTTP heartbeat for reliability. The heartbeat is authoritative for "connected" status.
2. Application-level ping/pong is mandatory. TCP keepalive isn't enough. Proxies and NATs drop idle connections. Send ping every 25s, expect pong in 10s, or force close and reconnect.
3. Exponential backoff + jitter. Base 1s, max 30s, add ±1s jitter. Prevents reconnect storms.
4. Re-subscribe on reconnect. Track subscriptions in client state. After re-auth, replay them all.
5. Request timeouts are non-negotiable. Default to 10s. Networks drop packets. Don't leak promises.
6. Rate limiting prevents abuse. 60 msg/10s is generous. Enforce it at the server to kill bad actors fast.
7. Test with mocks, debug with real data. Unit tests catch logic bugs. Manual QA catches integration bugs. Add a debug page—it saves hours.
8. Docker dev = custom server. `next dev` doesn't support WebSocket. Use `server.js` that wraps Next.js and initializes WS. Same code in dev and prod.
9. One session per user. Close old connections when a new one authenticates. Prevents leaks and confusing state.
10. Structured logs = debuggable. JSON logs with event names. Pipe to a log aggregator. Search by event or userId, not regex.
### Code to Copy
Full working examples:
- Client: `frus-sidekick/src/lib/ws-client.ts`
- Server: `frus-mega-foundation/webui/lib/websocket-server.js`
- Tests: `frus-sidekick/src/tests/ws-client.test.ts`
If you're building web-to-desktop sync, don't reinvent the wheel. WebSocket is the right tool. But stability takes work: keepalive, backoff, re-subscribe, timeouts, rate limits, and dual channels. Do those right and you get a connection users never think about—which is the point.
---
FRUS — Technical consultancy. We build production systems and write about what works.