Eval Report: mae - 20260317

Analysis Notes

ITERATION 3: Switch to MAE model (no autoregressive compounding)
================================================================

Changes from Iteration 2:
- Model type: GPT -> MAE (Masked Autoencoder)
- Architecture: encoder-decoder with spatial masking
- embed_dim=384, depth=8 (encoder), decoder_embed_dim=192, decoder_depth=4
- Mask strategy: mask the last-frame region and reconstruct it from context
- No autoregressive generation: all masked patches are predicted in parallel

Rationale:
The GPT model's biggest limitation is autoregressive error compounding. The per-position loss shows 10-50x higher error at the end of the raster scan (the motor strip patches). The TF/FR gap remained at 0.006 even with 3x more capacity. MAE predicts all masked patches simultaneously, eliminating:
1. Error compounding (no masked patch is conditioned on another patch's prediction)
2. The TF/FR gap (there is no autoregressive generation)
3. Position-dependent quality degradation

Expected improvements:
- Motor strip accuracy (the strip is no longer at the bottom of the raster scan)
- More uniform spatial error distribution
- Potentially better SSIM (no blur accumulation)

Potential downsides:
- MAE does not capture sequential dependencies between patches
- May produce less coherent local structure
- Different failure modes (may produce "average" predictions)

Comparison targets from iter2 (GPT 384d12):
val_mse=0.0102, SSIM=0.756, PSNR=20.4, action_disc=0.031
motor_dir_acc=83.1%, motor_pos_mae=0.612, motor_consistency=1.01
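A minimal sketch of the masking strategy described above. It assumes that the clip is tokenized as T frames of P patches each and that "mask the last-frame region" means hiding every patch of the final frame. The helper names (`last_frame_mask`, `masked_mse`) and the copy-previous-frame stand-in decoder are hypothetical and are not this run's code. The point being illustrated is that every masked patch is predicted in one pass from visible context only, so no prediction is ever fed back in as input.

```python
import numpy as np


def last_frame_mask(num_frames: int, patches_per_frame: int) -> np.ndarray:
    """Boolean mask over the flattened (frame, patch) token sequence.

    True marks a token hidden from the encoder. Here every patch of the
    final frame is masked (a hypothetical reading of "mask last frame region").
    """
    mask = np.zeros((num_frames, patches_per_frame), dtype=bool)
    mask[-1, :] = True
    return mask.reshape(-1)


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """MSE computed over masked tokens only, as in standard MAE training."""
    diff = (pred[mask] - target[mask]) ** 2
    return float(diff.mean())


# Toy clip: 4 frames, 6 patches per frame, 8-dim patch vectors.
rng = np.random.default_rng(0)
num_frames, patches_per_frame, patch_dim = 4, 6, 8
tokens = rng.normal(size=(num_frames * patches_per_frame, patch_dim))
mask = last_frame_mask(num_frames, patches_per_frame)

visible = tokens[~mask]  # encoder input: the context frames only

# Stand-in decoder (hypothetical): fill all masked patches in one shot from
# the visible context by copying the same patch index from the previous
# frame. No masked prediction is used as input to another prediction.
prev_frame = tokens.reshape(num_frames, patches_per_frame, patch_dim)[-2]
pred = tokens.copy()
pred[mask] = prev_frame

print(visible.shape, int(mask.sum()))  # → (18, 8) 6
print("masked MSE:", masked_mse(pred, tokens, mask))
```

Replacing the stand-in with a real decoder keeps the same shape of computation: the loss covers only the masked tokens, and no masked token's prediction waits on another's. That is the property the rationale above relies on.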

Metrics

Metric	Value
val_mse	0.014740
val_mse_visual	0.014740
ssim	0.616064
psnr	19.056644
val_mse_motor_strip	0.012590
val_mse_action_1	0.013135
val_mse_action_2	0.016605
val_mse_static	0.004921
val_mse_dynamic	0.027464
motor_position_mae_mean	1.695826
motor_velocity_mae_mean	0.068468
motor_direction_accuracy	0.806250
motor_consistency_error	1.946142
action_discrimination_score	0.021702
motor_discrimination_score	0.047564
motor_position_mae_per_joint	[4.2821, 1.0353, 1.6430, 2.2400, 0.9011, 0.0733]
motor_position_mae_action_1	1.649179
motor_position_mae_action_2	1.750037

Recommendations

Counterfactual Action Grids

Each grid: Row 1 = GT, Row 2 = STAY (red), Row 3 = MOVE+ (green), Row 4 = MOVE- (blue)
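Below is a quick consistency check on J0, sketched under stated assumptions. The per-joint list comes from the `motor_position_mae_per_joint` row of the Metrics table, and the J0 rows are copied from the Sample 0-3 joint tables in this section. The ordering test assumes that MOVE+ should raise J0 and MOVE- should lower it relative to STAY. That sign convention is a hypothesis and is not stated in the report. The helper names are hypothetical.

```python
# Per-joint position MAE (J0..J5), from the Metrics table.
PER_JOINT_MAE = [4.2821, 1.0353, 1.6430, 2.2400, 0.9011, 0.0733]

# J0 rows from the Sample 0-3 tables: (GT, STAY, MOVE+, MOVE-).
J0_ROWS = {
    0: (-39.7233, -45.6685, -35.6004, -56.2754),
    1: (-59.4725, -25.0172, -31.5712, -53.5583),
    2: (33.1292, 27.9879, 31.1145, 14.8720),
    3: (43.2233, 37.6790, 41.6237, 17.5690),
}


def joint_share(per_joint, index):
    """Fraction of the summed per-joint MAE contributed by one joint."""
    return per_joint[index] / sum(per_joint)


def is_ordered(stay, move_plus, move_minus):
    """Hypothetical sign check: MOVE- < STAY < MOVE+ on J0."""
    return move_minus < stay < move_plus


def summarize(rows):
    """Per sample: (sample, ordering flag, MOVE+ minus MOVE- spread, |STAY - GT|)."""
    out = []
    for sample, (gt, stay, plus, minus) in sorted(rows.items()):
        out.append((sample, is_ordered(stay, plus, minus), plus - minus, abs(stay - gt)))
    return out


mean_mae = sum(PER_JOINT_MAE) / len(PER_JOINT_MAE)
print(f"mean per-joint MAE = {mean_mae:.4f}, J0 share = {joint_share(PER_JOINT_MAE, 0):.2f}")
for sample, ordered, spread, stay_err in summarize(J0_ROWS):
    print(f"sample {sample}: ordered={ordered} spread={spread:.2f} |STAY-GT|={stay_err:.2f}")
```

The mean reproduces the reported motor_position_mae_mean (about 1.6958 against 1.695826, up to rounding of the per-joint values). J0 alone accounts for about 42% of the summed per-joint error. The assumed ordering holds for Samples 0, 2 and 3 but fails for Sample 1, where STAY (-25.02) lands above MOVE+ (-31.57) and sits about 34.5 from GT.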

Sample 0

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	-39.7233	-45.6685	-35.6004	-56.2754
J1	-90.9068	-89.3808	-90.1750	-89.8725
J2	66.3224	63.1548	70.2561	69.9488
J3	39.2845	38.9908	38.9596	38.4939
J4	8.5083	8.5706	8.6750	8.5793
J5	10.3896	10.5482	10.5452	9.9446

Sample 1

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	-59.4725	-25.0172	-31.5712	-53.5583
J1	-89.7081	-84.6556	-89.4076	-89.4094
J2	72.1129	72.4946	73.0660	71.8873
J3	39.2845	39.4672	38.7582	38.3339
J4	8.5083	8.8270	8.7659	8.6613
J5	10.3127	9.9885	10.5320	9.8787

Sample 2

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	33.1292	27.9879	31.1145	14.8720
J1	-89.7081	-89.7625	-89.3492	-89.4558
J2	70.5801	67.5131	70.9676	70.6356
J3	39.2845	38.5689	38.5775	38.8647
J4	8.3953	8.6432	8.6598	8.9428
J5	10.5433	10.2750	10.2948	10.3580

Sample 3

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	43.2233	37.6790	41.6237	17.5690
J1	-89.1403	-88.0972	-89.0935	-89.6248
J2	84.2048	79.1703	84.0419	84.6894
J3	40.3077	39.5126	39.6855	39.8198
J4	8.5083	8.7527	8.8158	9.1169
J5	10.5433	10.4297	10.4391	10.3780
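The iter2 comparison targets from the iteration notes can be lined up against this run's Metrics table. This small sketch rests on two editorial assumptions. First, the abbreviated target names map to the full metric names written below (action_disc to action_discrimination_score, motor_dir_acc to motor_direction_accuracy, motor_pos_mae to motor_position_mae_mean, motor_consistency to motor_consistency_error). Second, the "better" direction of each metric is as given in the hypothetical `HIGHER_IS_BETTER` set.

```python
# iter2 targets (GPT 384d12) from the iteration notes; 83.1% written as 0.831.
ITER2 = {
    "val_mse": 0.0102,
    "ssim": 0.756,
    "psnr": 20.4,
    "action_discrimination_score": 0.031,
    "motor_direction_accuracy": 0.831,
    "motor_position_mae_mean": 0.612,
    "motor_consistency_error": 1.01,
}

# iter3 (MAE 384d8) values from the Metrics table.
ITER3 = {
    "val_mse": 0.014740,
    "ssim": 0.616064,
    "psnr": 19.056644,
    "action_discrimination_score": 0.021702,
    "motor_direction_accuracy": 0.806250,
    "motor_position_mae_mean": 1.695826,
    "motor_consistency_error": 1.946142,
}

# Assumed: these metrics improve upward; all others improve downward.
HIGHER_IS_BETTER = {"ssim", "psnr", "action_discrimination_score", "motor_direction_accuracy"}


def compare(before, after):
    """Return (metric, iter2, iter3, delta, improved) for each shared metric."""
    rows = []
    for name, old in before.items():
        new = after[name]
        improved = new > old if name in HIGHER_IS_BETTER else new < old
        rows.append((name, old, new, new - old, improved))
    return rows


for name, old, new, delta, improved in compare(ITER2, ITER3):
    verdict = "better" if improved else "worse"
    print(f"{name:<28} {old:>8.4f} -> {new:>8.4f}  delta={delta:+.4f}  {verdict}")
```

Under these assumptions, all seven metrics move in the worse direction. The largest relative change is in motor_position_mae_mean, at roughly 2.8x the iter2 value.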

Evaluation Report: iter3_mae_384d8
