Unplug and Go — Hub-Spoke Designed for Disconnection

Traditional Hub-Spoke carries one fatal assumption: the Hub is always online. But in a disaster zone, when the Hub goes down, patients cannot wait. xGrid designs for disconnection as the normal state — every node is a complete system, and any one of them can take over.

First, a Hub-Spoke You Already Know

You may not have heard the term "Hub-Spoke topology," but you use it every day.

Open any airline's route map. You will see a few enormous nodes — Taoyuan, Narita, Singapore Changi — radiating a dense web of routes out to dozens of smaller cities. The large nodes are Hubs. The small cities are Spokes.

Point-to-point (top) vs Hub-Spoke (bottom): routing through a central node dramatically reduces the number of connections. Source: Wikipedia (public domain)

Why do airlines design it this way? Because if every city flew direct to every other city, 30 cities would need 435 routes (30 × 29 / 2). But if one of those cities serves as the Hub and every other city flies there and transfers, you need only 29 routes. The Hub is the coordinator, concentrating scheduling, transfers, and resource allocation.
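The route arithmetic can be checked in a few lines. This is only an illustrative sketch; the function names are made up for this post. Note that the hub-spoke count is 29 when the Hub is one of the 30 cities and 30 when the Hub is a separate, additional airport.

```python
def point_to_point_routes(cities: int) -> int:
    """Direct routes needed to link every pair of cities: n(n-1)/2."""
    return cities * (cities - 1) // 2


def hub_spoke_routes(cities: int, hub_is_one_of_them: bool = True) -> int:
    """Routes needed when every city connects only to the Hub."""
    return cities - 1 if hub_is_one_of_them else cities


if __name__ == "__main__":
    print(point_to_point_routes(30))                        # 435
    print(hub_spoke_routes(30))                             # 29
    print(hub_spoke_routes(30, hub_is_one_of_them=False))   # 30
```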

The pattern is common in information systems too: one central node coordinates many edge nodes. Data is concentrated at the Hub; the Spokes handle frontline operations.

But traditional Hub-Spoke carries one fatal assumption: the Hub is always online.

Flights can wait for the hub airport to reopen. Packages can wait for the sorting center. But in a disaster zone, when the Hub goes down, patients cannot wait.

xGrid's Hub-Spoke makes two crucial conceptual changes: every Spoke is a complete system, not just a terminal. And — any Spoke can take over on-site and become the new Hub.

Disconnection Is Not a Fault — It Is the Expected State

Traditional systems treat a lost network connection as a "fault" — detect the disconnection, fire an alert, wait for recovery.

xGrid designs disconnection as "normal." Every device is a complete system — with its own resource system, its own database. Disconnection means only the temporary loss of the ability to synchronize, not the loss of the ability to operate.

This is the biggest difference between xGrid's Hub-Spoke and the airline version: a Spoke is not a terminal waiting for the Hub's instructions, but a complete system that can operate on its own. What the Hub provides is coordination, not capability.

Every Node Is a Complete System

This is the single most important idea in the whole design: every device leaves the factory as a complete medical station.

A node's role is not determined by its hardware. The same machine can be a Hub or a Spoke; the difference lies in the role it is assigned, not in the hardware itself. This means you do not need to prepare "Hub-only machines" and "Spoke-only machines." What sits in the warehouse is not "two kinds of parts," but "a stack of identical spares." When any one fails, take a fresh one out of the box, plug it in, and carry on.

The smallest deployment needs just one machine, with no network infrastructure at all — one power source, one tablet, and you have a complete medical station. Need to scale up? Bring another machine over and connect it, and it becomes a new Spoke. One machine can stand up a forward medical station; a cluster of machines can stand up a medical center. The same design, scaled to fit.

Two Independent Networks — When One Drops, the Other Holds

An xGrid deployment is two separate, independent networks layered on top of each other: one handles operations (each machine provides its own wireless coverage, and a tablet connects to the nearest one to get to work), and one handles synchronization between stations.

The key is that these two layers are completely independent. The synchronization layer drops? Every station's tablets keep operating; only the synchronization between stations is temporarily lost. A machine's wireless coverage fails? Synchronization keeps running, and tablets in that area simply switch to nearby coverage.

One layer drops, the other holds. This is what "disconnection is the expected state" looks like when it is carried all the way into the network design.

Any Spoke Can Take Over

This is the most powerful capability in the entire design, and it has two faces.

Carrying one away. In a mass-casualty incident, the command center reports a second casualty collection point ten kilometers out, and a second medical station must open immediately. You walk to one of the Spokes, pack it into a backpack with its battery and tablet, and at the new location you plug in the power — and it becomes a complete, independently operating new medical station, carrying all the patient data the original Hub held until moments ago. No advance planning, no special machine.

Taking over in place. The Hub's hardware fails — its power supply burns out, or a falling ceiling crushes it. Every Spoke is continuously checking whether the Hub is still there. Once it is confirmed that the Hub really is offline, an operator designates one of the Spokes to take over. Because every Spoke holds a near-real-time backup, there is a clear upper bound on the patient data lost after takeover; and at the peak of an influx of casualties, the operator can manually push that bound even lower.

Promotion to take over is an all-or-nothing action — either it takes over completely, or it returns to its original state. There is no half-finished result where the promotion "gets stuck halfway."
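One common way to get this all-or-nothing behavior is to run every promotion step against a private copy of the node's state and swap it in only if all steps succeed. The sketch below illustrates that idea under stated assumptions; it is not xGrid's actual implementation, and NodeState, the step functions, and the generation counter are hypothetical names.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable


@dataclass
class NodeState:
    name: str
    role: str = "spoke"
    generation: int = 0               # hypothetical counter, bumped on each takeover
    accepting_admissions: bool = False


Step = Callable[[NodeState], None]


def claim_generation(state: NodeState) -> None:
    state.generation += 1


def assume_hub_role(state: NodeState) -> None:
    state.role = "hub"


def open_admissions(state: NodeState) -> None:
    state.accepting_admissions = True


def try_promote(node: NodeState, steps: Iterable[Step]) -> tuple[NodeState, bool]:
    """Run all steps on a private copy; return the copy only if every step succeeds."""
    draft = replace(node)             # shallow copy; the original is never mutated
    try:
        for step in steps:
            step(draft)
    except Exception:
        return node, False            # roll back: caller keeps the untouched original
    return draft, True
```

If any step raises, the caller gets back the original state object unchanged, so there is no observable "half-promoted" node.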

Why a human's decision, not an automatic one? Because in a disconnected environment, you cannot be sure whether the Hub is truly broken or whether a cable has merely come loose. If two Spokes both take over automatically, you end up with two Hubs each admitting patients on their own — this is called split-brain, and merging the data afterward would be a disaster. So taking over must be a deliberate human decision.
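The separation between "the Spoke suspects the Hub is gone" and "the Spoke takes over" might look like the following minimal sketch. The 30-second timeout, the SpokeMonitor class, and its method names are assumptions made for illustration. The point is only that a missed heartbeat raises a suspicion, and nothing changes role without an operator's explicit confirmation.

```python
from dataclasses import dataclass


@dataclass
class SpokeMonitor:
    timeout_s: float = 30.0       # hypothetical threshold for "Hub may be down"
    last_heartbeat: float = 0.0   # time of the last heartbeat seen from the Hub
    role: str = "spoke"

    def record_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def hub_suspected_down(self, now: float) -> bool:
        # A suspicion only: a loose cable looks exactly like a dead Hub from here.
        return now - self.last_heartbeat > self.timeout_s

    def take_over(self, now: float, operator_confirmed: bool) -> bool:
        """Promote only when the Hub looks down AND a human has confirmed it."""
        if not self.hub_suspected_down(now):
            return False
        if not operator_confirmed:
            return False
        self.role = "hub"
        return True
```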

Zombie Hubs and Split-Brain Protection — By Mechanism, Not Discipline

"Do not promote two machines at once" is a rule. But rules get broken in a disaster zone — what if someone presses the button one too many times in the chaos?

So discipline alone is not enough. xGrid's design lets a stale Hub step aside on its own: when an old Hub that had failed and was then plugged back into power boots up again, it discovers that a "newer-generation" Hub is already operating on-site — and rather than trying to seize authority back, it automatically steps down to become a Spoke. No one has to go switch it off.
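A generation number that increases with each takeover is one plausible way to implement this step-down. In the sketch below, a rebooted Hub compares its own generation against any Hub it can see and demotes itself if it is older. The Station class, its fields, and resolve_on_boot are hypothetical; the post only says that the stale Hub recognizes a "newer-generation" Hub.

```python
from dataclasses import dataclass


@dataclass
class Station:
    name: str
    role: str          # "hub" or "spoke"
    generation: int    # assumed to increase by one with every takeover


def resolve_on_boot(me: Station, visible: list[Station]) -> Station:
    """A rebooted Hub yields if any visible Hub carries a newer generation."""
    if me.role != "hub":
        return me
    newest_other = max(
        (s.generation for s in visible if s.role == "hub"),
        default=-1,    # no other Hub in sight: nothing to yield to
    )
    if newest_other > me.generation:
        me.role = "spoke"   # step aside instead of reclaiming authority
    return me
```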

Likewise, if a Spoke, on reconnecting, sees two contradictory "primary stations" at once, it does not blindly pick one — it stops and asks a person to confirm. Each deployment is also isolated from the others, so your Spoke will not accidentally connect to a neighboring deployment's Hub.
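The Spoke-side rule could be sketched as follows. Two assumptions are made here that the post does not spell out: deployments are told apart by an identifier carried in each Hub's announcement, and "contradictory" means any two distinct Hub claims within the same deployment. HubAdvert and choose_hub are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HubAdvert:
    hub_name: str
    deployment_id: str   # assumed isolation key; foreign deployments are ignored


def choose_hub(my_deployment: str, adverts: list[HubAdvert]) -> tuple[str, Optional[str]]:
    """Return (action, hub_name) for a Spoke that has just reconnected."""
    names = sorted({a.hub_name for a in adverts if a.deployment_id == my_deployment})
    if not names:
        return ("standalone", None)      # keep operating locally
    if len(names) == 1:
        return ("connect", names[0])
    return ("ask_operator", None)        # never pick between rival Hubs blindly
```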

This mechanism cannot prevent split-brain one hundred percent: if two completely disconnected subgroups each promote their own Hub, you really will end up with two independent Hubs. But it guarantees this: the moment the two subgroups reconnect, the older-generation Hub steps aside automatically. The question was never "how do we prevent split-brain forever," but "how do we correct it automatically, as fast as possible, after it happens."

Conflict Resolution: It Depends on the Nature of the Data

Two devices each modified the same record during a disconnection. When they reconnect, what do you do?

The answer depends on what the data is. What can be added up gets added up — the primary station consumed 5 gauze pads, a satellite station consumed 3, and the correct answer is that 8 were consumed, not "go with the more recent one" (which would lose one of the two sides). Records that cannot be altered (vital signs, handoffs) are kept on both sides.
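The two merge rules above can be illustrated with a short sketch. It assumes each station records consumption as a delta since the last synchronization (not a running total) and that immutable records carry a unique id; both are assumptions for illustration, as are the function names.

```python
def merge_consumption(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Sum per-item consumption recorded independently at two stations."""
    return {item: a.get(item, 0) + b.get(item, 0) for item in a.keys() | b.keys()}


def merge_append_only(a: list[dict], b: list[dict]) -> list[dict]:
    """Keep every record from both sides; an id seen on both sides is kept once."""
    merged = {r["id"]: r for r in a}
    for r in b:
        merged.setdefault(r["id"], r)
    return sorted(merged.values(), key=lambda r: r["id"])


if __name__ == "__main__":
    print(merge_consumption({"gauze": 5}, {"gauze": 3}))   # {'gauze': 8}
```

A "most recent write wins" rule would have reported either 5 or 3 here, silently dropping the other station's usage.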

Most important are the data where the cost of an error is too high to allow automatic resolution: blood products, controlled substances. A unit of blood marked "issued" by two stations at the same time is not a problem you can solve with a timestamp. The system flags it as a conflict and waits for the responsible person to verify it personally.
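For this high-risk category, a merge routine would refuse to choose and instead return both versions for a person to review. The sketch below shows one way to express that; IssueRecord, Resolution, and reconcile_issue are hypothetical names, and the real record fields are not described in this post.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IssueRecord:
    unit_id: str       # the blood unit being issued
    station: str       # which station marked it "issued"
    patient_id: str


@dataclass(frozen=True)
class Resolution:
    status: str                                  # "none", "accepted", or "needs_review"
    accepted: Optional[IssueRecord] = None
    candidates: tuple[IssueRecord, ...] = ()


def reconcile_issue(local: Optional[IssueRecord],
                    remote: Optional[IssueRecord]) -> Resolution:
    """Accept an uncontested issue record; flag contradictory ones for a human."""
    if local is None and remote is None:
        return Resolution("none")
    if local is None or remote is None or local == remote:
        return Resolution("accepted", accepted=local or remote)
    # Two different claims on the same unit: no timestamp rule is applied.
    return Resolution("needs_review", candidates=(local, remote))
```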

Treating "human judgment" as the correct answer for certain situations, rather than as a flaw to be eliminated — this is the crucial dividing line when designing for high-risk environments.

Design Philosophy: Built for Disconnection

Most systems are designed on the premise that "the network is reliable," and then add exception handling for the unreliable cases.

xGrid is designed on the premise that "the network is unreliable," and then optimizes for the reliable cases.

This inversion leads to entirely different design decisions:

  • Every node is a complete system (not a terminal that can only display a screen)
  • A node's role is assigned, not fixed by its hardware (no need for a "special Hub machine")
  • Synchronization is a periodic batch operation (not a real-time, always-on connection)
  • Conflict resolution is the default behavior (not exception handling)
  • Human judgment is the correct answer in certain situations (not a flaw to be eliminated)
  • Taking over is a deliberate human decision (because split-brain is more dangerous than waiting)
  • But a stale Hub steps aside automatically (because that is enforced by mechanism, not left to discipline)

A kicked-out cable is not a fault. A smashed switch is not the apocalypse. A burned-out Hub is not the end.

They are merely the triggers for a reorganization of the topology.


Further reading: "Offline-First" Is Not "Offline-Usable" · ISBAR Is More Than a Handoff Format — When Oral Tradition Meets Structured Data