MAUSA: Modular Architecture for Urgent and Stable Advancement
A routing protocol designed for NASA's simulated SolarNet, addressing the core challenges of interplanetary communication: extreme latency, intermittent connectivity, and mission-critical delivery.
What I Designed
Congestion Monitor Protocol:
- Periodic heartbeats (1/sec) carrying queue occupancy for high/medium/low priority
- Per-bundle ACKs with sequence numbers
- Local neighbor view without global coordination
- <0.008% bandwidth overhead (50 KB/s on 622 MB/s links)
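A minimal sketch of the heartbeat described above, assuming a simple record with per-priority queue occupancy; the field names and dict encoding are illustrative, not the project's wire format:

```python
from dataclasses import dataclass

# Hypothetical CMB heartbeat carrying per-priority queue occupancy.
# Field names and the dict encoding are illustrative assumptions.
@dataclass
class Heartbeat:
    node_id: str
    seq: int
    occupancy: dict[str, float]  # queue fill fraction per priority class

def make_heartbeat(node_id: str, seq: int, queues: dict, capacities: dict) -> Heartbeat:
    """Build the once-per-second heartbeat from local queue state."""
    occ = {p: len(queues[p]) / capacities[p] for p in ("high", "medium", "low")}
    return Heartbeat(node_id, seq, occ)

# Example: a node with a nearly full low-priority queue.
queues = {"high": [1], "medium": [1, 2, 3], "low": list(range(8))}
caps = {"high": 10, "medium": 10, "low": 10}
hb = make_heartbeat("LEOCom-3", seq=0, queues=queues, capacities=caps)
```

Because each node only reports its own queues, neighbors build a local view from received heartbeats without any global coordination.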
Dynamic Fragmentation:
Fragment Size = F_max / (1 + α_p × c_avg)
- F_max = 1 MB; α_p = {2.0, 1.0, 0.5} for {High, Med, Low} priority
- High-priority/high-congestion → smaller fragments (faster queue traversal)
- Low-priority/low-congestion → larger fragments (minimize header overhead)
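The fragment-size formula can be sketched directly from the constants above (function and parameter names are assumptions):

```python
def fragment_size(priority: str, c_avg: float, f_max: float = 1_000_000) -> float:
    """Fragment Size = F_max / (1 + alpha_p * c_avg).

    alpha_p weights congestion sensitivity by priority; c_avg is the
    average neighbor congestion (0 = idle, 1 = saturated) reported by
    recent heartbeats.
    """
    alpha = {"high": 2.0, "medium": 1.0, "low": 0.5}[priority]
    return f_max / (1 + alpha * c_avg)

# High priority under full congestion -> fragments shrink to 1/3 of F_max.
assert fragment_size("high", c_avg=1.0) == 1_000_000 / 3
# Low priority on an idle network -> full 1 MB fragments.
assert fragment_size("low", c_avg=0.0) == 1_000_000
```

The larger α_p for high priority makes its fragments shrink fastest as congestion rises, which is exactly the "faster queue traversal" behavior described above.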
Duplication Algorithm:
- Flooding (High-Priority): BFS via TSUNAMI primitive, KILL-ACK to purge caches
- Timeout-Driven (Routine): Duplicate to second-best neighbor after 2 consecutive timeouts
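The timeout-driven rule for routine traffic can be sketched as follows, assuming congestion values come from the heartbeat protocol (the function name and threshold parameter are illustrative):

```python
def next_hops(neighbors: dict[str, float], timeouts: int, threshold: int = 2) -> list[str]:
    """Pick forwarding targets for a routine (non-flooded) bundle.

    neighbors maps neighbor id -> reported congestion; timeouts is the
    count of consecutive timeouts on the current best path. After
    `threshold` timeouts, the bundle is duplicated to the second-best
    (next least congested) neighbor as well.
    """
    ranked = sorted(neighbors, key=neighbors.get)  # least congested first
    if timeouts >= threshold and len(ranked) > 1:
        return ranked[:2]  # best + second-best: duplicate
    return ranked[:1]      # single best path

assert next_hops({"a": 0.2, "b": 0.5}, timeouts=0) == ["a"]
assert next_hops({"a": 0.2, "b": 0.5}, timeouts=2) == ["a", "b"]
```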
Architecture
Physical Network
| Node Type | Distance | Bandwidth | Storage | RTT |
|-----------|----------|-----------|---------|-----|
| LEOCom | 833 km | 2 GB/s | 0.5 TB | 10 ms |
| GEOCom | 36,000 km | 1.2 GB/s | 0.5 TB | 250-300 ms |
| Ground | Earth | 622 MB/s | 10 TB | - |
| Relay | Moon/Mars | 500 MB/s | 0.5 TB | 2-20 min |
Storage Manager
Priority queues: High (1.2%), Medium (18.8%), Low (80%)
- High-priority replicated to all neighbors
- KILL-ACK removes copies after delivery
- <6 GB overhead per node
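A minimal sketch of the storage manager's replicate-then-purge behavior; the class interface is an assumption for illustration, not the project's API:

```python
class StorageManager:
    """Per-priority bundle caches with KILL-ACK purging (sketch)."""

    def __init__(self):
        self.cache = {"high": {}, "medium": {}, "low": {}}

    def store(self, priority: str, bundle_id: str, data: bytes) -> None:
        """Cache a bundle copy (high-priority copies exist on every neighbor)."""
        self.cache[priority][bundle_id] = data

    def kill_ack(self, bundle_id: str) -> int:
        """Delivery confirmed: purge every cached copy of this bundle."""
        removed = 0
        for queue in self.cache.values():
            if queue.pop(bundle_id, None) is not None:
                removed += 1
        return removed

sm = StorageManager()
sm.store("high", "b1", b"payload")
assert sm.kill_ack("b1") == 1
assert sm.cache["high"] == {}
```

Purging on KILL-ACK is what keeps the replication overhead bounded (under 6 GB per node) despite high-priority bundles being copied to all neighbors.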
Routing Manager
- High: Flood to all neighbors
- Medium: Route to lowest cong_i, fallback on timeout
- Low: Single-path, lowest cong_i
Link failures are detected via missed CMBs (no heartbeat for >1 s) or missing ACKs.
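The three routing rules above can be condensed into one dispatch function (a sketch; names and the timeout flag are assumptions):

```python
def route(priority: str, neighbor_congestion: dict[str, float],
          timed_out: bool = False) -> list[str]:
    """Select next hops per the priority rules.

    High floods to all neighbors; Medium picks the lowest-congestion
    neighbor and falls back to the next-best on timeout; Low is
    single-path to the lowest-congestion neighbor.
    """
    ranked = sorted(neighbor_congestion, key=neighbor_congestion.get)
    if priority == "high":
        return ranked                       # flood to all neighbors
    if priority == "medium" and timed_out and len(ranked) > 1:
        return [ranked[1]]                  # fallback to second-best path
    return [ranked[0]]                      # lowest cong_i

n = {"a": 0.1, "b": 0.4, "c": 0.7}
assert route("high", n) == ["a", "b", "c"]
assert route("medium", n, timed_out=True) == ["b"]
assert route("low", n) == ["a"]
```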
Technical Details
Bundle Protocol Extension
Extends NASA's Bundle Protocol:
- Primary block (134B): IPv6 addresses, flags, timestamps
- Flags: fragment, custody, ACK, priority (bits 0, 2, 3, 5, 6, 7-8)
Reliability Layer
TCP-inspired with priority scaling:
W ← W + α_p (on ACK)
W ← W × β_p (on timeout)
RTO = RTT_smooth + 4 × RTT_var
High-priority traffic ramps up faster and backs off less aggressively.
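The three update rules above can be written out directly; the per-priority constants in the example are illustrative assumptions, chosen only to show the "ramp faster, back off less" asymmetry:

```python
def on_ack(w: float, alpha_p: float) -> float:
    """W <- W + alpha_p: additive increase, larger alpha for high priority."""
    return w + alpha_p

def on_timeout(w: float, beta_p: float, w_min: float = 1.0) -> float:
    """W <- W * beta_p: multiplicative decrease, beta closer to 1 for high priority."""
    return max(w * beta_p, w_min)

def rto(rtt_smooth: float, rtt_var: float) -> float:
    """RTO = RTT_smooth + 4 * RTT_var (standard TCP-style timer)."""
    return rtt_smooth + 4 * rtt_var

# Illustrative walk-through with assumed high-priority constants.
w = 10.0
w = on_ack(w, alpha_p=2.0)      # -> 12.0
w = on_timeout(w, beta_p=0.75)  # -> 9.0
assert w == 9.0
assert rto(100.0, 5.0) == 120.0
```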
Results
Latency Overhead:
- Fragmentation: 0.5s/hop for 100MB (0.6% of 20min transit)
- Flooding: 24s for 10MB high-priority (within 10min SLA)
Failure Resilience:
- 45% latency reduction under stragglers (dynamic fragmentation)
- 95% delivery under 10% random failures (vs. 60% baseline)
Bandwidth:
- CMBs: 0.008% overhead
- KILL-ACKs: 200B per high-priority message
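The CMB overhead figure follows directly from the numbers quoted earlier (~50 KB/s of heartbeat traffic on a 622 MB/s ground link); a quick sanity check:

```python
# Sanity check of the quoted CMB bandwidth overhead.
cmb_rate = 50e3      # bytes/sec of heartbeat (CMB) traffic
link_rate = 622e6    # bytes/sec, ground-link bandwidth
overhead_pct = cmb_rate / link_rate * 100
assert round(overhead_pct, 3) == 0.008   # ~0.008% of link capacity
```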
Design Trade-offs
Why Local (Not Global) Routing?
- Earth-Mars RTT = 20min; global state always stale
- 150 nodes × full mesh = O(n²) overhead
- Local decisions faster, more robust
Why Flooding for High-Priority?
- Guarantees delivery even if 90% paths fail
- No stale routing tables
- Acceptable for <1% of traffic
Lessons Learned
Systems Design:
- Local signals scale better than global coordination
- Simplicity wins under extreme conditions
- Throughput vs. latency, reliability vs. overhead—trade-offs everywhere
Networking:
- Per-flow fairness wrong for collective operations
- Adapt fragmentation to network state
- Test failure modes: 10% outages, link drops, bursts
Evaluation:
- "45% latency reduction" > "faster"
- Tail latency matters more than average
Future Work
- ns-3 simulation for realistic packet-level testing
- ML-based congestion prediction
- Dynamic priority adjustment by deadline
- Multi-job fairness (currently single-job optimized)