
Federated Learning

Coming Q1 2026 — This feature is in development. Request early access to be notified when available.

Overview

Train machine learning models across distributed Earth and orbital infrastructure. Each node trains locally on its data, then synchronizes compressed gradients during ground station passes.

Key Components

- FederatedClient: local training client for Earth or orbital nodes
- GradientAggregator: central coordinator for gradient synchronization
- CompressionConfig: gradient compression settings (TopK + quantization)
- Error feedback: lossless compression via error accumulation

Gradient Compression

Bandwidth between orbital and ground nodes is extremely limited. Raw gradient synchronization is infeasible for large models. Our compression pipeline achieves 100x reduction with minimal accuracy loss:
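A quick back-of-envelope calculation shows why. The 70B parameter count comes from the LLaMA-70B example later on this page; the 1 Gbit/s link rate is an illustrative assumption, not a platform specification:

```python
params = 70e9                      # e.g., LLaMA-70B
raw_gb = params * 4 / 1e9          # Float32 gradients: 280 GB per sync round
compressed_gb = raw_gb / 100       # with the 100x pipeline: 2.8 GB
link_gbps = 1.0                    # assumed downlink rate, for illustration
raw_minutes = raw_gb * 8 / link_gbps / 60

print(f"raw: {raw_gb:.0f} GB (~{raw_minutes:.0f} min at {link_gbps} Gbit/s)")
print(f"compressed: {compressed_gb:.1f} GB")
```

Even under this generous link assumption, a single uncompressed synchronization would outlast a typical ground station pass by a wide margin.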

Compression Pipeline

1. Original gradients (4.2 MB): the raw gradient tensor from backpropagation, e.g., ∇ = [0.12, -0.08, 0.003, ...]
2. TopK sparsification (42 KB): keep only the top 1% of gradients by magnitude. This alone reduces size 100x while preserving the most important updates.
3. 8-bit stochastic quantization (10.5 KB): convert Float32 to Int8 with a scale factor, a further 4x reduction with minimal precision loss.
4. Error feedback: accumulate the dropped gradients locally and add them back next round. This guarantees eventual convergence despite aggressive compression.
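The steps above can be sketched in NumPy. This is an illustrative implementation, not the library's internals; the function name and signature are made up here:

```python
import numpy as np

def compress_round(grad, residual, k_ratio=0.01, bits=8,
                   rng=np.random.default_rng(0)):
    """One compression round: error feedback + TopK + stochastic quantization."""
    corrected = grad + residual                   # fold in previously dropped error
    k = max(1, int(k_ratio * corrected.size))
    idx = np.argsort(np.abs(corrected))[-k:]      # top-k entries by magnitude
    values = corrected[idx]
    scale = np.abs(values).max() / (2 ** (bits - 1) - 1) or 1.0
    q = values / scale
    # Stochastic rounding: round up with probability equal to the fraction.
    q = (np.floor(q) + (rng.random(k) < q - np.floor(q))).astype(np.int8)
    transmitted = np.zeros_like(corrected)
    transmitted[idx] = q.astype(corrected.dtype) * scale
    new_residual = corrected - transmitted        # carry dropped mass forward
    return idx, q, scale, new_residual
```

Only `idx`, `q`, and `scale` need to cross the link; the receiver rebuilds a sparse gradient tensor from them, and everything not transmitted survives in the residual for the next round.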

Configuration

from rotastellar_distributed import CompressionConfig

# Standard compression (100x reduction)
compression = CompressionConfig(
    method="topk_quantized",
    k_ratio=0.01,           # Keep top 1%
    quantization_bits=8,    # 8-bit quantization
    error_feedback=True     # Accumulate errors
)

# Aggressive compression (200x reduction)
aggressive = CompressionConfig(
    method="topk_quantized",
    k_ratio=0.005,          # Keep top 0.5%
    quantization_bits=4,    # 4-bit quantization
    error_feedback=True
)

# Light compression (10x reduction)
light = CompressionConfig(
    method="topk",
    k_ratio=0.1,            # Keep top 10%
    error_feedback=True
)

Federated Client

The FederatedClient runs on each participating node (Earth or orbital):

from rotastellar_distributed import FederatedClient, CompressionConfig, CompressionMethod

# Initialize client
client = FederatedClient(
    node_id="orbital-3",
    node_type="orbital",        # "orbital" or "ground"
    compression=CompressionConfig(
        method=CompressionMethod.TOP_K_QUANTIZED,
        k_ratio=0.01,
        quantization_bits=8,
        error_feedback=True
    )
)

# Training loop
for epoch in range(num_epochs):
    for batch in dataloader:
        # Compute local gradients
        gradients = client.compute_gradients(model_params, batch)

        # Compress gradients for transmission
        compressed = client.compress(gradients)

        # Send to aggregator (implementation-specific)
        send_to_aggregator(compressed)

# Apply received global update
client.apply_update(global_update)

# Get compression statistics
stats = client.get_stats()
print(f"Compression ratio: {stats['compression_ratio']}x")

Gradient Aggregator

The GradientAggregator runs on a ground station or cloud, coordinating updates from all nodes:

from rotastellar_distributed import GradientAggregator

# Initialize aggregator
aggregator = GradientAggregator(
    api_key="rs_...",
    strategy="async_fedavg",    # Async Federated Averaging
    min_nodes=3,                # Wait for at least 3 nodes
    staleness_limit=5           # Accept updates up to 5 rounds old
)

# Register callback for incoming gradients
@aggregator.on_gradient_received
def handle_gradient(node_id, gradients, metadata):
    print(f"Received from {node_id}: {metadata['compression_ratio']}x compressed")

# Start aggregation loop
aggregator.start()

# Periodically get global model update
while training:
    if aggregator.has_new_update():
        global_update = aggregator.get_update()
        broadcast_to_nodes(global_update)

Aggregation Strategies

| Strategy | Description | Best For |
|---|---|---|
| sync_fedavg | Wait for all nodes before aggregating | Reliable connectivity |
| async_fedavg | Aggregate as updates arrive | Intermittent connectivity |
| weighted_fedavg | Weight by dataset size | Heterogeneous data |
| momentum_fedavg | Add momentum to updates | Faster convergence |
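As a sketch of what weighted_fedavg computes: each node's update is weighted by its local dataset size, so nodes with more data pull the global model harder. The function and data shapes here are illustrative, not the library API:

```python
import numpy as np

def weighted_fedavg(updates):
    """Average gradient vectors weighted by local dataset size.

    `updates` maps node_id -> (gradient_vector, num_local_samples).
    """
    total = sum(n for _, n in updates.values())
    agg = np.zeros_like(next(iter(updates.values()))[0], dtype=np.float64)
    for grad, n in updates.values():
        agg += (n / total) * np.asarray(grad, dtype=np.float64)
    return agg

# A node with 3x the data contributes 3x the weight:
result = weighted_fedavg({
    "ground-1":  (np.array([1.0, 1.0]), 1000),
    "orbital-3": (np.array([3.0, 3.0]), 3000),
})
# result == [2.5, 2.5]
```

With equal dataset sizes this reduces to plain FedAvg; sync vs. async only changes *when* the aggregation runs, not the weighting itself.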

Handling Connectivity

Orbital nodes experience intermittent connectivity. The client handles this automatically:

# The FederatedClient handles:
# 1. Gradient compression for bandwidth-limited links
# 2. Error feedback for lossless compression over time
# 3. Statistics tracking for monitoring

from rotastellar_distributed import FederatedClient, CompressionConfig, CompressionMethod

client = FederatedClient(
    node_id="orbital-3",
    node_type="orbital",
    compression=CompressionConfig(
        method=CompressionMethod.TOP_K_QUANTIZED,
        k_ratio=0.01,
        quantization_bits=8,
        error_feedback=True  # Accumulate dropped gradients
    )
)

Convergence Guarantees

Despite compression and asynchronous updates, training converges to the same solution as centralized training:

| Property | Guarantee |
|---|---|
| Compression loss | Under 0.5% final accuracy loss vs. uncompressed |
| Staleness impact | Under 1% accuracy loss with staleness_limit=10 |
| Error feedback | Mathematically lossless over time |
| Convergence rate | 1.2-1.5x more rounds than centralized |
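The "mathematically lossless over time" property rests on a simple invariant: with error feedback, anything not transmitted in a round is carried in the residual, so the cumulative transmitted update never drifts from the cumulative true gradient. A minimal simulation (TopK only, with illustrative dimensions chosen here for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, k, rounds = 100, 5, 50
residual = np.zeros(dim)
sent_total = np.zeros(dim)
true_total = np.zeros(dim)

for _ in range(rounds):
    g = rng.normal(size=dim)
    true_total += g
    corrected = g + residual                  # error feedback
    idx = np.argsort(np.abs(corrected))[-k:]  # keep top 5% by magnitude
    sent = np.zeros(dim)
    sent[idx] = corrected[idx]
    residual = corrected - sent               # carry dropped mass forward
    sent_total += sent

# Invariant: the cumulative transmitted update differs from the cumulative
# true gradient by exactly the current residual -- nothing is ever lost.
assert np.allclose(sent_total + residual, true_total)
```

Without the residual term, the dropped 95% of each round's gradient would be discarded permanently and the bias would compound; with it, the error stays bounded by a single round's dropped mass.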

Example: Training LLaMA-70B

from rotastellar_distributed import FederatedClient, CompressionConfig, CompressionMethod

# Configure compression for LLaMA-70B gradients
compression = CompressionConfig(
    method=CompressionMethod.TOP_K_QUANTIZED,
    k_ratio=0.01,           # Keep top 1%
    quantization_bits=8,    # 8-bit quantization
    error_feedback=True     # Lossless over time
)

# Initialize orbital node client
client = FederatedClient(
    node_id="orbital-3",
    node_type="orbital",
    compression=compression
)

# One training step with gradient compression
gradients = client.compute_gradients(model_params, local_batch)
compressed = client.compress(gradients)

# Check compression stats
stats = client.get_stats()
print(f"Compression ratio: {stats['compression_ratio']}x")
print(f"Total compressed: {stats['total_compressed']}")

# Training metrics for 8-node setup:
# - 100x gradient compression
# - ~40% energy savings vs all-terrestrial
# - +18% training time vs centralized

Next Steps