One Button to Become the Hub — The Three-Layer Design of xGrid Promote

Promoting a Spoke to Hub doesn't require SSH, a laptop, or command-line skills. Scan a QR code for WiFi, open the PWA, press a button — three steps. Behind it is a Script → API → UI architecture that lets a nurse perform what would otherwise be a sysadmin operation.

The Misconception: "You Need SSH to Promote"

In the previous article, we described xGrid's Hub-Spoke topology. Its most powerful capability is that any Spoke can become a Hub — unplug, carry, power on, promote.

That article showed this command:

sudo xgrid-promote

A shell command over SSH. The natural follow-up question: "So I need a laptop, an SSH client, and command-line skills to do this?"

No.

SSH is the lowest layer — built for sysadmins and developers. But in a mass casualty incident, the person who needs to promote a Spoke is likely a nurse, an EMT, or an incident commander. They have an iPad, not a laptop.

So xGrid's promote is not just a shell script. It is a three-layer architecture.

Three Layers of Promote

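Layer            | Trigger                               | User              | Required Tools
-----------------+---------------------------------------+-------------------+------------------------
Layer 1 (Script) | sudo xgrid-promote                    | Sysadmin          | SSH + laptop
Layer 2 (API)    | POST /api/failover/promote            | Advanced operator | Any HTTP client
Layer 3 (PWA UI) | "Promote to Hub" button on red banner | Anyone            | iPad / iPhone + browser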
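Because Layer 2 is a plain HTTP endpoint, it can be driven from any client. The sketch below uses only the Python standard library. The base URL is the one printed on the connection card; the empty JSON body and the absence of authentication headers are assumptions, since the article does not describe the request format. The 60-second client timeout simply mirrors the API's own timeout.

```python
import urllib.request

BASE_URL = "http://10.0.0.1:8000"  # address printed on the connection card


def build_promote_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the Layer 2 promote request.

    Assumption: the endpoint accepts an empty JSON body and needs no
    extra authentication headers.
    """
    return urllib.request.Request(
        url=f"{base_url}/api/failover/promote",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def promote(base_url: str = BASE_URL, timeout_s: float = 60.0) -> bytes:
    """Send the request and return the raw response body."""
    request = build_promote_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.read()
```

Splitting request construction from sending keeps the first half easy to inspect without a live node.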

Layer 3 calls Layer 2. Layer 2 calls Layer 1. Each layer adds its own safeguards:

  • Layer 1: 8-step atomic state machine with full rollback on any failure
  • Layer 2: Role validation (already a Hub? rejected), state validation (transition in progress? rejected), 60-second timeout
  • Layer 3: confirm() dialog to prevent accidental taps, blue pulsing animation to prevent double-clicks, success/failure toast feedback
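The Layer 1 safeguard is an all-or-nothing run. One common way to get that behaviour is to pair every step with an undo action and, on failure, undo the completed steps in reverse order. The sketch below illustrates that pattern only; the article does not list the eight steps, so the step tuples are hypothetical, and the real script may be structured differently.

```python
from typing import Callable, List, Tuple

# (name, do, undo); the names are placeholders, not the script's real steps.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_with_rollback(steps: List[Step]) -> bool:
    """Run steps in order; on any failure undo completed steps in reverse.

    Returns True if every step succeeded, False if a rollback happened.
    The failing step's own undo is not called, because it never completed.
    """
    done: List[Step] = []
    for step in steps:
        name, do, _undo = step
        try:
            do()
        except Exception as exc:
            print(f"step {name!r} failed: {exc}; rolling back")
            for _name, _do, undo in reversed(done):
                undo()
            return False
        done.append(step)
    return True
```

Undoing in reverse order matters when later steps depend on earlier ones, for example a service that must be stopped before the configuration it reads is restored.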

Three layers mean that the user chooses how much complexity to face; the system does not impose it. A sysadmin can SSH in for fine-grained control. A nurse just presses a button.
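The Layer 2 checks listed above can be sketched as a guard function that sits in front of the script. The role strings, the HTTP-style status codes, and the function name are assumptions for illustration; only the three checks and the 60-second limit come from the article.

```python
import subprocess
from typing import Sequence, Tuple

PROMOTE_CMD = ("sudo", "xgrid-promote")  # the Layer 1 command
TIMEOUT_S = 60


def handle_promote(role: str, transition_in_progress: bool,
                   cmd: Sequence[str] = PROMOTE_CMD) -> Tuple[int, str]:
    """Validate the request, then run the Layer 1 script with a timeout.

    Returns (status, message); the status codes are illustrative.
    """
    if role == "HUB":
        return 409, "already a Hub"
    if transition_in_progress:
        return 409, "transition in progress"
    try:
        result = subprocess.run(list(cmd), capture_output=True,
                                text=True, timeout=TIMEOUT_S)
    except subprocess.TimeoutExpired:
        return 504, "promote timed out"
    if result.returncode != 0:
        return 500, "promote failed"
    return 200, "promoted"
```

Passing the command in as a parameter lets the guard logic be exercised with a harmless stand-in command on a development machine.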

The QR Code Workflow — Zero to Promote

Every RPi ships with a laminated QR Code card, mounted next to the device. It contains two codes:

┌─────────────────────────────────────────┐
│  DNO-HC02 Connection Card               │
│                                         │
│  ┌──────────┐    ┌──────────┐           │
│  │ WiFi QR  │    │ MIRS QR  │           │
│  │          │    │          │           │
│  │  Scan to │    │  Scan to │           │
│  │  join    │    │  open    │           │
│  │  WiFi    │    │  system  │           │
│  └──────────┘    └──────────┘           │
│                                         │
│  WiFi: DNO-HC02                         │
│  MIRS: http://10.0.0.1:8000             │
└─────────────────────────────────────────┘

Step 1 — Scan the WiFi QR Code. iPhone and iPad cameras natively support WiFi QR codes (format: WIFI:T:WPA;S:{SSID};P:{password};;). Scanning triggers a "Join Network" prompt. One tap and the device connects to the RPi's hotspot. No manual SSID or password entry.
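The payload behind the WiFi QR code is just a string in the format quoted above. A minimal sketch of building it follows; the function names are hypothetical, the password in the usage note is a placeholder, and the backslash-escaping of special characters follows the widely used convention for this format rather than anything stated in the article. Rendering the string as an image would need a QR library, which is not shown.

```python
def _escape(value: str) -> str:
    """Backslash-escape characters that are special in WiFi QR payloads.

    The backslash is handled first so that escapes added for the other
    characters are not escaped a second time.
    """
    for ch in ("\\", ";", ",", ":", '"'):
        value = value.replace(ch, "\\" + ch)
    return value


def wifi_qr_payload(ssid: str, password: str, auth: str = "WPA") -> str:
    """Return the text to encode in the WiFi QR code."""
    return f"WIFI:T:{auth};S:{_escape(ssid)};P:{_escape(password)};;"
```

For the card shown above, `wifi_qr_payload("DNO-HC02", "example-pass")` yields `WIFI:T:WPA;S:DNO-HC02;P:example-pass;;`.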

Step 2 — Scan the MIRS QR Code or type the URL. Safari opens the MIRS PWA. The Service Worker caches it immediately — subsequent visits work offline.

Step 3 — When the Hub goes offline, a red banner appears automatically. The PWA polls Hub status every 30 seconds. After 3 consecutive failures (90 seconds), a red banner slides in at the top of the screen:

Hub Offline (3 connection failures)   [Promote to Hub]
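The detection rule in Step 3 amounts to a counter of consecutive failed polls. The real logic runs in the browser; the sketch below only shows the counting in Python, with a hypothetical class name, and assumes that a single successful poll resets the counter (which is what "consecutive" implies).

```python
from dataclasses import dataclass

POLL_INTERVAL_S = 30     # seconds between Hub status polls
FAILURE_THRESHOLD = 3    # consecutive failures before the banner appears


@dataclass
class HubMonitor:
    consecutive_failures: int = 0

    def record(self, hub_reachable: bool) -> bool:
        """Record one poll result; return True when the banner should show."""
        if hub_reachable:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= FAILURE_THRESHOLD
```

With these constants the banner appears after 3 × 30 = 90 seconds of unbroken failures, and one successful poll in between restarts the count.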

Step 4 — Press "Promote to Hub." A confirmation dialog appears: "Are you sure you want to promote this node to Hub?" After confirmation, the banner turns blue with a pulsing animation: "Promoting to Hub, please wait..."

Within 60 seconds, the script completes its 8 steps. On success, the page auto-refreshes with a green toast: "Promotion complete! Role: HUB | Epoch: N".

No SSH. No laptop. No commands to memorize.

Automatic Detection, Not Manual Navigation

Notice the Layer 3 design choice: the button does not live in a menu for users to find. Instead, when the Hub actually goes offline, the banner appears and asks: "The Hub is down. Do you want to take over?"

This is deliberate.

Promote is not a routine operation. It is an emergency operation. You do not want someone pressing it out of curiosity — that would create a second Hub (split-brain). So the button only appears when it is needed: the Hub is genuinely offline, the current node is a Spoke, and there is a valid snapshot to load.
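The three conditions above combine into a single predicate. This is a sketch with hypothetical names; how each input is actually determined (failed polls, node configuration, snapshot checks) is outside what the article describes.

```python
def should_offer_promote(hub_offline: bool, role: str,
                         has_valid_snapshot: bool) -> bool:
    """Show the "Promote to Hub" button only when all three conditions hold."""
    return hub_offline and role == "SPOKE" and has_valid_snapshot
```

If any one condition is false, the button stays hidden, so a curious tap on a healthy network has nothing to press.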

Even then, the first safeguard is the confirm() dialog. The second is the hub_epoch mechanism — if someone accidentally triggers a split-brain, the lower-epoch Hub will automatically demote when the two networks reconnect.
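The epoch rule can be sketched as a comparison made when two Hubs see each other again. This assumes hub_epoch is an integer that increases with each promotion; the function name and return values are hypothetical, and the article does not say what happens when both epochs are equal, so the sketch only reports that case.

```python
def resolve_on_reconnect(local_epoch: int, remote_epoch: int) -> str:
    """Decide what this Hub does when it meets another Hub after a split."""
    if local_epoch < remote_epoch:
        return "demote"      # the lower-epoch Hub steps down
    if local_epoch > remote_epoch:
        return "stay_hub"    # the other Hub is expected to demote
    return "tie"             # not covered by the article; needs another rule
```

Both Hubs can run the same comparison independently and reach complementary decisions without negotiating.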

The design does not make mistakes impossible. It makes mistakes self-correcting.

When You Still Need SSH

Layer 3 covers 95% of field scenarios. But some cases still require SSH:

  • --skip-network mode: Skip hostapd/dnsmasq/static IP setup during localhost testing. The PWA button does not support this flag.
  • --dry-run mode: Preview what promote would do without executing. Useful for verifying snapshot availability.
  • Demote: Downgrade a Hub back to Spoke. This is an admin operation — not exposed via the auto-banner (though the Station Management PWA provides a manual interface).
  • Rollback diagnostics: When promote fails, inspect logs to determine the cause.
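To make the two flags above concrete, here is a sketch of how a script could interpret them. It is written in Python for illustration only; the real xgrid-promote may be a shell script, and the step descriptions are placeholders rather than its actual eight steps.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="xgrid-promote")
    parser.add_argument("--skip-network", action="store_true",
                        help="skip hostapd/dnsmasq/static IP setup")
    parser.add_argument("--dry-run", action="store_true",
                        help="show what would be done without executing")
    return parser.parse_args(argv)


def planned_steps(args: argparse.Namespace) -> List[str]:
    """Return the (hypothetical) steps that this invocation would perform."""
    steps = ["load snapshot"]
    if not args.skip_network:
        steps.append("configure hostapd, dnsmasq and static IP")
    steps.append("switch role to Hub")
    return steps


def main(argv: Optional[List[str]] = None) -> None:
    args = parse_args(argv)
    for step in planned_steps(args):
        # In dry-run mode the plan is printed and nothing is executed.
        prefix = "would run" if args.dry_run else "running"
        print(f"{prefix}: {step}")
```

Calling `main(["--dry-run", "--skip-network"])` prints the plan without the network step and changes nothing.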

SSH is the fallback, not the primary path. Like manual flight controls on an aircraft — you want them available, but you design so they are rarely needed.

Can iPhone / iPad Do SSH?

Yes. The App Store has Termius, Prompt, and Blink Shell, all capable SSH clients.

But the point is: you do not need them. Layer 3 exists precisely to make SSH optional rather than mandatory.

In a mass casualty incident, you are not going to ask a nurse to install an SSH app. You hand them a laminated QR code card: scan once for WiFi, scan once for the system, press a button when it is time to promote.

This is the core philosophy of the three-layer design: wrap sysadmin capabilities in an interface anyone can operate.


Further reading: Unplug and Go — Hub-Spoke Topology, Role Promotion, and Five-Minute Failover · "Offline-First" Is Not "Offline-Tolerable" · The Walkaway Test — Can Your System Survive Disconnection?