
Federated Learning

Coming Q1 2026 — This feature is in development. Request early access to be notified when available.

Overview

Train machine learning models across distributed Earth and orbital infrastructure. Each node trains locally on its data, then synchronizes compressed gradients during ground station passes.

Key Components

FederatedClient: Local training client for Earth or orbital nodes
GradientAggregator: Central coordinator for gradient synchronization
CompressionConfig: Gradient compression settings (TopK + quantization)
Error Feedback: Lossless compression via error accumulation

Gradient Compression

Bandwidth between orbital and ground nodes is extremely limited. Raw gradient synchronization is infeasible for large models. Our compression pipeline achieves 100x reduction with minimal accuracy loss:

Compression Pipeline

1. Original Gradients (4.2 MB): Raw gradient tensor from backpropagation, e.g., ∇ = [0.12, -0.08, 0.003, ...]
2. TopK Sparsification (42 KB): Keep only the top 1% of gradients by magnitude. Reduces size by 100x while preserving the most important updates.
3. 8-bit Stochastic Quantization (10.5 KB): Convert Float32 values to Int8 with a scale factor. A further 4x reduction with minimal precision loss.
4. Error Feedback: Accumulate dropped gradients for the next round. Guarantees eventual convergence despite aggressive compression.
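
As a concrete illustration of the pipeline above, the sketch below shows how TopK sparsification, 8-bit stochastic quantization, and error feedback compose. It is a minimal NumPy sketch of the technique, not the library's internal implementation; the function names and the shared-scale int8 scheme are assumptions made for clarity.

import numpy as np

def topk_sparsify(grad, k_ratio=0.01):
    """Keep the top k_ratio fraction of entries by magnitude (step 2)."""
    flat = grad.ravel()
    k = max(1, int(k_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]      # indices of the k largest |g|
    return idx, flat[idx]

def stochastic_quantize_int8(values):
    """Map float32 values to int8 with a shared scale, rounding stochastically (step 3)."""
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    scaled = values / scale
    low = np.floor(scaled)
    round_up = np.random.rand(values.size) < (scaled - low)   # probabilistic rounding
    return (low + round_up).astype(np.int8), scale

class ErrorFeedbackCompressor:
    """Accumulate what compression drops and re-inject it next round (step 4)."""
    def __init__(self, shape, k_ratio=0.01):
        self.residual = np.zeros(shape, dtype=np.float32)
        self.k_ratio = k_ratio

    def compress(self, grad):
        corrected = grad + self.residual               # add back previously dropped mass
        idx, values = topk_sparsify(corrected, self.k_ratio)
        q, scale = stochastic_quantize_int8(values)
        decoded = np.zeros(corrected.size, dtype=np.float32)
        decoded[idx] = q.astype(np.float32) * scale
        self.residual = corrected - decoded.reshape(grad.shape)   # carry the loss forward
        return idx, q, scale                           # compact payload to transmit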

Configuration

from rotastellar_distributed import CompressionConfig

# Standard compression (100x reduction)
compression = CompressionConfig(
    method="topk_quantized",
    k_ratio=0.01,           # Keep top 1%
    quantization_bits=8,    # 8-bit quantization
    error_feedback=True     # Accumulate errors
)

# Aggressive compression (200x reduction)
aggressive = CompressionConfig(
    method="topk_quantized",
    k_ratio=0.005,          # Keep top 0.5%
    quantization_bits=4,    # 4-bit quantization
    error_feedback=True
)

# Light compression (10x reduction)
light = CompressionConfig(
    method="topk",
    k_ratio=0.1,            # Keep top 10%
    error_feedback=True
)
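
As a rough guide, applying these presets to the 4.2 MB gradient tensor from the pipeline example gives the payload sizes below. These are back-of-envelope figures (values only; the index overhead of the sparse entries is ignored):

# standard:   4.2 MB * 0.01  = 42 KB,  then 8-bit values (4x smaller) -> ~10.5 KB
# aggressive: 4.2 MB * 0.005 = 21 KB,  then 4-bit values (8x smaller) -> ~2.6 KB
# light:      4.2 MB * 0.1   = 420 KB  (no quantization)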

Federated Client

The FederatedClient runs on each participating node (Earth or orbital):
from rotastellar_distributed import FederatedClient, CompressionConfig

# Initialize client
client = FederatedClient(
    api_key="rs_...",
    node_id="orbital-3",
    node_type="orbital",        # "orbital" or "ground"
    compression=CompressionConfig(
        method="topk_quantized",
        k_ratio=0.01,
        quantization_bits=8
    )
)

# Training loop
for epoch in range(num_epochs):
    for batch in dataloader:
        # Local forward/backward pass
        gradients = client.train_step(model, batch)

        # Compress gradients
        compressed = client.compress(gradients)

        # Queue for sync (happens during ground pass)
        client.queue_sync(compressed, priority="normal")

# Force sync during ground station pass
client.sync_now()

# Get updated global model
global_weights = client.get_global_model()
model.load_state_dict(global_weights)

Gradient Aggregator

The GradientAggregator runs on a ground station or cloud, coordinating updates from all nodes:
from rotastellar_distributed import GradientAggregator

# Initialize aggregator
aggregator = GradientAggregator(
    api_key="rs_...",
    strategy="async_fedavg",    # Async Federated Averaging
    min_nodes=3,                # Wait for at least 3 nodes
    staleness_limit=5           # Accept updates up to 5 rounds old
)

# Register callback for incoming gradients
@aggregator.on_gradient_received
def handle_gradient(node_id, gradients, metadata):
    print(f"Received from {node_id}: {metadata['compression_ratio']}x compressed")

# Start aggregation loop
aggregator.start()

# Periodically get global model update
while training:
    if aggregator.has_new_update():
        global_update = aggregator.get_update()
        broadcast_to_nodes(global_update)

Aggregation Strategies

Strategy          Description                             Best For
sync_fedavg       Wait for all nodes before aggregating   Reliable connectivity
async_fedavg      Aggregate as updates arrive             Intermittent connectivity
weighted_fedavg   Weight by dataset size                  Heterogeneous data
momentum_fedavg   Add momentum to updates                 Faster convergence
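
The differences between strategies come down to the aggregation step itself. The sketch below shows weighted FedAvg (each node's update weighted by its local dataset size) in plain NumPy; it is illustrative only, not the aggregator's internal code, and the function name and inputs are assumptions.

import numpy as np

def weighted_fedavg(updates, sample_counts):
    """Average per-node updates, weighting each node by its local dataset size."""
    weights = np.asarray(sample_counts, dtype=np.float64)
    weights /= weights.sum()                           # normalize weights to sum to 1
    stacked = np.stack(updates).astype(np.float64)     # shape: (num_nodes, num_params)
    return (weights[:, None] * stacked).sum(axis=0)    # weighted average update

# Example: three nodes with very different shard sizes
# global_update = weighted_fedavg([g1, g2, g3], sample_counts=[120_000, 8_000, 45_000])

The sync and async variants differ in when this step runs (after all nodes report vs. as each update arrives), not in the arithmetic shown here.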

Handling Connectivity

Orbital nodes experience intermittent connectivity. The client handles this automatically:
# Client automatically:
# 1. Buffers gradients during no-contact periods
# 2. Syncs when ground station pass begins
# 3. Prioritizes critical updates
# 4. Resumes interrupted transfers

client = FederatedClient(
    api_key="rs_...",
    node_id="orbital-3",
    connectivity={
        "buffer_size_mb": 500,      # Local gradient buffer
        "auto_sync": True,          # Sync on pass detection
        "resume_partial": True,     # Resume interrupted transfers
        "priority_queue": True      # Priority-based queuing
    }
)
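
Conceptually, this buffering amounts to a priority-ordered queue that is drained whenever a ground station pass is active. The sketch below illustrates that idea only; it is not the client's actual internals, and the priority labels and byte budget are assumptions.

import heapq
import itertools

PRIORITY = {"critical": 0, "normal": 1, "low": 2}      # lower value drains first

class GradientBuffer:
    """Priority-ordered buffer for compressed gradients between ground passes."""
    def __init__(self):
        self._heap = []
        self._order = itertools.count()                # FIFO tie-breaker within a priority

    def queue(self, payload: bytes, priority: str = "normal"):
        heapq.heappush(self._heap, (PRIORITY[priority], next(self._order), payload))

    def drain(self, send, pass_budget_bytes: int) -> int:
        """Send buffered payloads in priority order until the pass budget is spent."""
        sent = 0
        while self._heap and sent + len(self._heap[0][2]) <= pass_budget_bytes:
            _, _, payload = heapq.heappop(self._heap)
            send(payload)
            sent += len(payload)
        return sent                                    # anything left waits for the next pass

# buf = GradientBuffer()
# buf.queue(compressed_bytes, priority="critical")
# ... when a pass begins: buf.drain(send=uplink.send, pass_budget_bytes=50_000_000)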

Convergence Guarantees

Despite compression and asynchronous updates, training converges to effectively the same solution as centralized training:

Property            Guarantee
Compression loss    Under 0.5% final accuracy difference vs. uncompressed
Staleness impact    Under 1% accuracy loss with staleness_limit=10
Error feedback      Mathematically lossless over time
Convergence rate    1.2-1.5x more rounds than centralized
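
The "lossless over time" guarantee for error feedback follows from a simple invariant: the residual always holds exactly the gradient mass not yet transmitted, so the sum of transmitted updates plus the current residual equals the sum of raw gradients. The following is a self-contained check of that invariant, using an arbitrary "keep the top 5 entries" compressor as a stand-in for the real pipeline.

import numpy as np

def compress_top5(x):
    """Stand-in compressor: keep the 5 largest-magnitude entries, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -5)[-5:]
    out[idx] = x[idx]
    return out

rng = np.random.default_rng(0)
dim, rounds = 100, 50
residual = np.zeros(dim)
sent_total = np.zeros(dim)
raw_total = np.zeros(dim)

for _ in range(rounds):
    g = rng.normal(size=dim)            # raw gradient for this round
    corrected = g + residual            # re-inject previously dropped mass
    sent = compress_top5(corrected)     # what actually crosses the link
    residual = corrected - sent         # carried forward to the next round
    sent_total += sent
    raw_total += g

# Invariant: everything transmitted plus the current residual equals everything produced
assert np.allclose(sent_total + residual, raw_total)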

Example: Training LLaMA-70B

from rotastellar_distributed import FederatedClient, CompressionConfig, Topology

# Define topology
topology = Topology(
    ground_nodes=["ground-us-east", "ground-eu-west", "ground-asia"],
    orbital_nodes=["orbital-1", "orbital-2", "orbital-3", "orbital-4", "orbital-5"],
    aggregator="ground-us-east"
)

# Configure for LLaMA-70B
compression = CompressionConfig(
    method="topk_quantized",
    k_ratio=0.01,
    quantization_bits=8,
    error_feedback=True
)

# Each node trains on local data shard
client = FederatedClient(
    api_key="rs_...",
    node_id="orbital-3",
    topology=topology,
    compression=compression,
    model_config={
        "name": "llama-70b",
        "gradient_checkpointing": True,
        "mixed_precision": "bf16"
    }
)

# Training metrics
# - 8 nodes total (3 ground + 5 orbital)
# - 100x gradient compression
# - ~40% energy savings vs all-terrestrial
# - +18% training time vs centralized
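
For a sense of scale, per-round sync traffic can be estimated from the parameter count and the quoted 100x compression. These are back-of-envelope estimates (assuming full-model fp32 gradients), not measured values:

# raw gradients:        70e9 params * 4 bytes ≈ 280 GB per full gradient exchange
# at ~100x compression: 280 GB / 100          ≈ 2.8 GB per node per sync round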

Next Steps