Staged Training Report ✓ Complete

Run ID: multiheight_deconly_run7
Generated: 2026-03-07 14:39:42
Stages Completed: 1
Total Elapsed Time: 00:05:46

Configuration

No config defaults changed since last commit.

All Staged Training Parameters (72 parameters)
Parameter                                     Value
total_samples                                 10000000
batch_size                                    8
stage_samples_multiplier                      100000000000
update_interval                               250
window_size                                   100
num_best_models_to_keep                       1
sampling_mode                                 Loss-weighted
loss_weight_temperature                       0.5
loss_weight_refresh_interval                  50
stop_on_divergence                            True
divergence_gap                                0.002
divergence_ratio                              1.5
divergence_patience                           30
divergence_min_updates                        10
val_spike_threshold                           2.0
val_spike_window                              15
val_spike_frequency                           0.75
val_plateau_patience                          250
val_plateau_min_delta                         0.0001
custom_lr                                     1e-05
disable_lr_scaling                            True
custom_warmup                                 -1
lr_min_ratio                                  0.001
resume_warmup_ratio                           0.05
plateau_factor                                0.8
plateau_patience                              15
preserve_optimizer                            False
preserve_scheduler                            True
samples_mode                                  Train additional samples
num_random_obs_to_visualize                   2
selected_frame_offset                         3
runs_per_stage                                1
serial_runs                                   True
clean_old_checkpoints                         True
enable_baseline                               False
baseline_runs_per_stage                       1
run_id                                        multiheight_deconly_run7
seed                                          42
enable_wandb                                  True
wandb_project                                 developmental-robot-movement
lr_sweep.lr_min                               1e-07
lr_sweep.lr_max                               0.01
lr_sweep.phase_a_num_candidates               5
lr_sweep.phase_a_seeds                        1
lr_sweep.phase_a_time_budget_min              3.0
lr_sweep.phase_a_survivor_count               2
lr_sweep.phase_b_seeds                        3
lr_sweep.phase_b_time_budget_min              10.0
lr_sweep.ranking_metric                       median_best_val
lr_sweep.min_samples_before_timeout           1000
lr_sweep.min_evals_before_stop                5
lr_sweep.save_sweep_state                     True
plateau_sweep.enabled                         False
plateau_sweep.plateau_ema_alpha               0.85
plateau_sweep.plateau_improvement_threshold   0.0015
plateau_sweep.plateau_patience                25
plateau_sweep.cooldown_updates                5
plateau_sweep.max_sweeps_per_stage            2
plateau_sweep.min_sweep_improvement           0.0
initial_sweep_enabled                         False
stage_time_budget_min                         180
max_workers                                   None
model_type                                    None
vae_type                                      None
vae_checkpoint                                None
dit_embed_dim                                 None
dit_depth                                     None
dit_num_heads                                 None
dit_prediction_type                           None
dit_num_train_timesteps                       None
dit_num_inference_steps                       None
dit_beta_schedule                             None
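The report sets sampling_mode to Loss-weighted with loss_weight_temperature 0.5, but does not show the sampling code. A minimal sketch of one common interpretation, a softmax over per-sample losses scaled by a temperature, so that higher-loss samples are drawn more often (the function name and exact weighting scheme are assumptions, not the project's actual implementation):

```python
import math

def loss_weighted_probs(losses, temperature=0.5):
    """Convert per-sample losses into sampling probabilities.

    Lower temperature sharpens the distribution toward the
    highest-loss samples; temperature -> inf approaches uniform.
    """
    # Scale losses by temperature, then apply a numerically
    # stable softmax (subtract the max before exponentiating).
    scaled = [l / temperature for l in losses]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With loss_weight_refresh_interval 50, such probabilities would presumably be recomputed every 50 updates rather than after every batch.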
World Model Architecture (config.py)
Parameter                   Value
AUTOENCODER_LR              0.0003
BATCH_SIZE                  1
CANVAS_HISTORY_SIZE         3
DECODER_DEPTH               12
DECODER_EMBED_DIM           256
DECODER_NUM_HEADS           8
DINOV2_VARIANT              vitb14
DIT_BETA_END                0.02
DIT_BETA_SCHEDULE           linear
DIT_BETA_START              0.0001
DIT_DEPTH                   12
DIT_EMBED_DIM               256
DIT_LATENT_PATCH_SIZE       2
DIT_NUM_HEADS               4
DIT_NUM_INFERENCE_STEPS     50
DIT_NUM_TRAIN_TIMESTEPS     1000
DIT_PREDICTION_TYPE         epsilon
DIT_TRAINING_MODE           unconditional
ENCODER_DEPTH               5
ENCODER_EMBED_DIM           512
ENCODER_NUM_HEADS           8
FOCAL_BETA                  5
FOCAL_LOSS_ALPHA            1.0
FRAME_SIZE                  (224, 224)
GRADIO_UPDATE_INTERVAL      1
LR_MIN_RATIO                0.001
MODEL_TYPE                  decoder_only
PATCH_SIZE                  16
PERCEPTUAL_LOSS_WEIGHT      0.01
SEPARATOR_WIDTH             32
VAE_CHECKPOINT              None
VAE_COMPRESSION_FACTOR      8
VAE_LATENT_CHANNELS         4
VAE_MODE                    vae
VAE_TYPE                    pretrained_sd
WARMUP_STEPS                1000
WEIGHT_DECAY                0.01
MASK_RATIO_MIN              1
MASK_RATIO_MAX              1
TRAIN_MASK_RATIO_MIN        1.0
TRAIN_MASK_RATIO_MAX        1.0
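DIT_BETA_SCHEDULE linear with DIT_BETA_START 0.0001, DIT_BETA_END 0.02, and DIT_NUM_TRAIN_TIMESTEPS 1000 matches the standard DDPM linear noise schedule. A minimal sketch of what those three constants define (the function name is ours, not from config.py):

```python
def linear_beta_schedule(start=0.0001, end=0.02, steps=1000):
    """Evenly spaced noise variances (betas) from start to end.

    beta_t controls how much Gaussian noise is added at diffusion
    timestep t; the linear ramp goes from DIT_BETA_START at t=0
    to DIT_BETA_END at t=steps-1.
    """
    return [start + (end - start) * i / (steps - 1) for i in range(steps)]
```

At inference, DIT_NUM_INFERENCE_STEPS 50 would subsample this 1000-step schedule rather than run every timestep.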

Timing Summary

Stage     Plateau Sweeps  Sweep Time  Training Time  Stage Total
Stage 1   0               00:00:00    00:05:10       00:05:10
TOTAL     0               00:00:00    00:05:10       00:05:10

Stage Results

Stage     Best Loss  Stop Reason  Samples Trained  Time      Sweeps  LR (Initial→Final)
Stage 1   0.008065   divergence   18,176           00:05:10  0       1.0e-05

Total Plateau Sweeps: 0
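Stage 1 stopped with reason "divergence". The report does not show the criterion; one plausible reading of the configured divergence_gap 0.002, divergence_ratio 1.5, divergence_patience 30, and divergence_min_updates 10, with all names and logic hypothetical rather than taken from the training code:

```python
def diverged(train_losses, val_losses,
             gap=0.002, ratio=1.5, patience=30, min_updates=10):
    """Flag divergence when validation loss exceeds training loss
    by both an absolute gap and a multiplicative ratio for
    `patience` consecutive updates, after `min_updates` updates.
    """
    if len(val_losses) < min_updates:
        return False
    streak = 0
    for t, v in zip(train_losses, val_losses):
        if (v - t) > gap and v > ratio * t:
            streak += 1
            if streak >= patience:
                return True
        else:
            streak = 0  # any non-diverging update resets the count
    return False
```

Under this reading, the stop at 18,176 samples would mean the val/train gap held above both thresholds for 30 consecutive updates.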

Stop Reason Breakdown

Loss Across Full Training Run

Best Checkpoint

Name: best_model_auto_session_so101_multiheight_part1_1345_multiheight_deconly_run7_00091392_cont_val_0.008065.pth
Stage: 1
Hybrid Loss (full session): 0.009178

Learning Rate Timeline with Plateau Sweeps

Stage Progression

Stage  Orig Loss  Train Loss  Time      Samples  Stop Reason
1 ⭐    0.009178   0.008065    00:05:10  18176    divergence

Hybrid Loss Over Original Session (per Stage)

Stage 1 (Best) - Hybrid Loss: 0.009178

Sample Counts

Cumulative Across All Stages

Per Stage

Stage 1 (Best) - Total Samples: 18,176

Best Checkpoint Inference

Selected Frame 3

Action 0

Action 1

Action 2

Random Observations

Observation 645

Action 0
Action 1
Action 2

Observation 704

Action 0
Action 1
Action 2