Model Partitioning

Coming Q1 2026 — This feature is in development. Request early access to be notified when available.

Overview

Large neural networks can be split across ground and orbital nodes to optimize for latency, bandwidth, or energy efficiency. The Model Partitioning system finds optimal cut points based on your infrastructure topology.

Key Components

PartitionOptimizer: Finds optimal model split points
ModelProfile: Analyzes model layer characteristics
LayerPlacement: Specifies ground vs. orbital assignment
LatencyEstimator: Predicts end-to-end inference latency

Why Partition Models?

| Scenario | Benefit |
| --- | --- |
| Large models, limited orbital memory | Run embedding layers on ground, attention in orbit |
| Latency-sensitive inference | Place early layers close to the data source |
| Energy optimization | Compute-heavy layers on solar-powered orbital nodes |
| Bandwidth constraints | Minimize activation transfer between nodes |

How It Works

1. Input Processing (Ground): Input tokens are received at a ground node, where the embedding layer (150M params) and layers 0-10 (2.8B params) compute the initial representation.
2. Activation Transfer (Uplink): Compressed activations (12 MB) are transmitted to the orbital node over the ground-to-space link.
3. Core Computation (Orbital): Layers 11-60 (35B params), the most compute-intensive portion of the model, run on the solar-powered orbital node.
4. Activation Transfer (Downlink): Output activations (12 MB) are transmitted back to a ground node.
5. Output Generation (Ground): Layers 61-80 (14B params) and the output head generate the final output tokens.

The partition optimizer automatically finds cut points that minimize total latency while respecting the memory constraints of each node type.
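
To build intuition for what the optimizer trades off, here is a deliberately simplified cost model for a single cut point. This is a sketch, not the optimizer's actual algorithm; it assumes latency decomposes into compute, transfer, and propagation terms drawn from the topology dict described below:

def cut_latency_ms(ground_flops_used, orbital_flops_used,
                   activation_bytes, topo):
    """Simplified cost of one cut: compute on each side, one uplink
    and one downlink transfer, plus round-trip propagation delay."""
    ground_ms = ground_flops_used / topo["ground_flops"] * 1000
    orbital_ms = orbital_flops_used / topo["orbital_flops"] * 1000
    uplink_ms = activation_bytes * 8 / topo["uplink_bandwidth"] * 1000
    downlink_ms = activation_bytes * 8 / topo["downlink_bandwidth"] * 1000
    propagation_ms = 2 * topo["ground_orbit_latency_ms"]
    return ground_ms + orbital_ms + uplink_ms + downlink_ms + propagation_ms

In practice the optimizer searches over cut points to minimize a cost of roughly this shape, subject to the per-node memory constraints mentioned above.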

Model Profile

First, analyze your model to understand layer characteristics:
from rotastellar_distributed import ModelProfile

# From PyTorch model
profile = ModelProfile.from_pytorch(model)

# From TensorFlow/Keras
profile = ModelProfile.from_tensorflow(model)

# From ONNX file
profile = ModelProfile.from_onnx("model.onnx")

# Inspect profile
print(f"Total parameters: {profile.total_params:,}")
print(f"Total layers: {profile.num_layers}")
print(f"Memory footprint: {profile.memory_mb:.1f} MB")

# Per-layer analysis
for layer in profile.layers:
    print(f"{layer.name}: {layer.params:,} params, "
          f"{layer.flops:,} FLOPs, "
          f"{layer.activation_size_mb:.1f} MB activations")

Partition Optimizer

Find optimal cut points based on your topology:
from rotastellar_distributed import PartitionOptimizer, ModelProfile

# Define your infrastructure
topology = {
    "ground_nodes": 2,
    "orbital_nodes": 4,
    "ground_flops": 100e12,       # 100 TFLOPS per ground node
    "orbital_flops": 20e12,       # 20 TFLOPS per orbital node
    "uplink_bandwidth": 100e6,    # 100 Mbps ground→orbit
    "downlink_bandwidth": 500e6,  # 500 Mbps orbit→ground
    "isl_bandwidth": 10e9,        # 10 Gbps inter-satellite
    "ground_orbit_latency_ms": 25 # LEO latency
}

# Profile your model
profile = ModelProfile.from_pytorch(model)

# Find optimal partition
optimizer = PartitionOptimizer(api_key="rs_...")
partition = optimizer.optimize(
    model=profile,
    topology=topology,
    objective="minimize_latency"  # or "minimize_bandwidth", "balance"
)

# View results
print(f"Optimal cut points: {partition.cut_points}")
print(f"Ground layers: {partition.ground_layers}")
print(f"Orbital layers: {partition.orbital_layers}")
print(f"Estimated latency: {partition.estimated_latency_ms:.1f} ms")
print(f"Activation transfer: {partition.transfer_size_mb:.1f} MB")

Optimization Objectives

| Objective | Optimizes For | Best When |
| --- | --- | --- |
| minimize_latency | End-to-end inference time | Real-time applications |
| minimize_bandwidth | Data transfer between nodes | Limited connectivity |
| minimize_energy | Total energy consumption | Battery/solar constraints |
| balance | Weighted combination | General purpose |
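
How the balance weighting is configured has not been finalized. The sketch below assumes a hypothetical weights argument purely for illustration; do not treat it as confirmed API:

# HYPOTHETICAL: the `weights` argument is an assumption, not confirmed API
partition = optimizer.optimize(
    model=profile,
    topology=topology,
    objective="balance",
    weights={"latency": 0.5, "bandwidth": 0.3, "energy": 0.2},
)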

Layer Placement

Manually specify or adjust layer placement:
from rotastellar_distributed import LayerPlacement

# Manual placement
placement = LayerPlacement()
placement.assign_ground(layers=[0, 1, 2, 3, 4])      # First 5 layers
placement.assign_orbital(layers=range(5, 75))         # Middle layers
placement.assign_ground(layers=[75, 76, 77, 78, 79])  # Last 5 layers

# Validate placement
validation = placement.validate(profile, topology)
if not validation.is_valid:
    print(f"Issues: {validation.issues}")

# Or refine optimizer result
partition = optimizer.optimize(model=profile, topology=topology)
partition.move_layer(15, to="ground")  # Manual adjustment
partition.recalculate()

Latency Estimation

Predict inference latency for a given partition:
from rotastellar_distributed import LatencyEstimator

estimator = LatencyEstimator(topology=topology)

# Estimate for a partition
estimate = estimator.estimate(partition)

print(f"Total latency: {estimate.total_ms:.1f} ms")
print(f"  Ground compute: {estimate.ground_compute_ms:.1f} ms")
print(f"  Orbital compute: {estimate.orbital_compute_ms:.1f} ms")
print(f"  Uplink transfer: {estimate.uplink_ms:.1f} ms")
print(f"  Downlink transfer: {estimate.downlink_ms:.1f} ms")
print(f"  Propagation: {estimate.propagation_ms:.1f} ms")

# Breakdown by layer
for layer_est in estimate.by_layer:
    print(f"  {layer_est.name}: {layer_est.total_ms:.1f} ms on {layer_est.node}")

Example: LLaMA-70B Partitioning

from rotastellar_distributed import PartitionOptimizer, ModelProfile

# LLaMA-70B architecture
profile = ModelProfile.from_pytorch(llama_70b)
# 80 transformer layers, ~70B parameters

topology = {
    "ground_nodes": 3,
    "orbital_nodes": 5,
    "ground_flops": 200e12,      # A100 equivalent
    "orbital_flops": 50e12,      # Space-qualified GPU
    "uplink_bandwidth": 200e6,
    "downlink_bandwidth": 1e9,
    "isl_bandwidth": 25e9
}

optimizer = PartitionOptimizer(api_key="rs_...")
partition = optimizer.optimize(
    model=profile,
    topology=topology,
    objective="minimize_latency"
)

# Result for LLaMA-70B:
# - Layers 0-8: Ground (embeddings + early attention)
# - Layers 9-72: Orbital (bulk computation)
# - Layers 73-79 + head: Ground (final layers)
# - Activation transfer: 24 MB per inference
# - Estimated latency: 180 ms (vs 400 ms all-ground)

Next Steps