Eval Report: diffusion - 20260319

DIFFUSION ITERATION 4: Wider model (512d) + weight decay 0.01 for regularization
=================================================================================

Changes from Iteration 3:
- embed_dim: 384 -> 512 (wider model)
- depth: 12 (same)
- num_heads: 16 (matched to embed_dim)
- weight_decay: 0.0 -> 0.01 (light regularization to reduce overfitting)
- epochs: 500 -> 300 (iter3 best was at ~210, didn't improve after)
- lr: 5e-4 -> 3e-4 (slightly lower for larger model)

Rationale:
Iteration 3 finally got diffusion working with sample prediction + fixed inference. The model achieves competitive results with GPT but motor accuracy is weaker (pos MAE 0.954 vs GPT's 0.612). The iter3 model overfitted heavily (train/val gap grew to -0.013). Adding light weight decay should help regularize. A wider model should provide more capacity for precise predictions.

Key: iter3 showed diffusion models need NO weight decay to start training, but too little regularization causes overfitting after ~200 epochs. Weight decay 0.01 is a middle ground.

Results so far:
- diff_iter1: 256d8, epsilon, wd=0.05 -> NOISE (total failure)
- diff_iter2: 384d12, epsilon, wd=0.0, cosine -> NOISE (DDIM broken)
- diff_iter3: 384d12, sample, wd=0.0, cosine -> SSIM=0.730 (working!)
- This run: 512d12, sample, wd=0.01, cosine -> targeting SSIM>0.75
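The hyperparameter changes listed for this iteration can be written down as a small config diff. The sketch below is illustrative only: the `DiffusionConfig` class, the `config_diff` and `head_dim` helpers, and the `prediction_type` and `schedule` field names are assumptions, not the project's actual code. The values are taken from the notes. The notes do not say whether "cosine" refers to the noise schedule or the learning-rate schedule, and they do not give iter3's num_heads, so neither is guessed here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DiffusionConfig:
    """Hypothetical container; field names follow the keys used in the notes."""
    embed_dim: int
    depth: int
    weight_decay: float
    epochs: int
    lr: float
    prediction_type: str = "sample"  # iter1/iter2 used "epsilon"
    schedule: str = "cosine"         # noise vs. LR schedule is not stated in the notes


# Values copied from the "Changes from Iteration 3" list.
ITER3 = DiffusionConfig(embed_dim=384, depth=12, weight_decay=0.0, epochs=500, lr=5e-4)
ITER4 = DiffusionConfig(embed_dim=512, depth=12, weight_decay=0.01, epochs=300, lr=3e-4)
ITER4_NUM_HEADS = 16  # iter3's head count is not given, so it is kept out of the diff


def config_diff(old: DiffusionConfig, new: DiffusionConfig) -> dict:
    """Return {field: (old_value, new_value)} for every field that changed."""
    a, b = asdict(old), asdict(new)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


def head_dim(embed_dim: int, num_heads: int) -> int:
    """Per-head width; multi-head attention needs embed_dim divisible by num_heads."""
    if embed_dim % num_heads != 0:
        raise ValueError("embed_dim must be divisible by num_heads")
    return embed_dim // num_heads


if __name__ == "__main__":
    for field, (old, new) in config_diff(ITER3, ITER4).items():
        print(f"{field}: {old} -> {new}")
    print("head_dim:", head_dim(ITER4.embed_dim, ITER4_NUM_HEADS))
```

With 512 dimensions and 16 heads, each head is 32 wide, which is one reading of "matched to embed_dim".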

Metrics

Metric	Value
val_mse	0.008753
val_mse_visual	0.008753
ssim	0.774989
psnr	21.009076
val_mse_motor_strip	0.011761
val_mse_action_1	0.007847
val_mse_action_2	0.009807
val_mse_static	0.001445
val_mse_dynamic	0.018052
motor_position_mae_mean	0.956825
motor_velocity_mae_mean	0.056059
motor_direction_accuracy	0.825000
motor_consistency_error	1.205545
diffusion_loss_t_0_250	0.002058
diffusion_loss_t_250_500	0.004465
diffusion_loss_t_500_750	0.008339
diffusion_loss_t_750_1000	0.017111
action_discrimination_score	0.037558
motor_discrimination_score	0.049088
motor_position_mae_per_joint	[2.4623, 0.4575, 0.9046, 1.0166, 0.8754, 0.0246]
motor_position_mae_action_1	0.865819
motor_position_mae_action_2	1.062589

Counterfactual Action Grids

Each grid: Row 1 = GT, Row 2 = STAY (red), Row 3 = MOVE+ (green), Row 4 = MOVE- (blue)

Sample 0

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	-39.7233	-58.4362	-46.4312	-58.7470
J1	-90.9068	-90.5657	-90.5461	-90.6769
J2	66.3224	66.3306	65.9561	66.3921
J3	39.2845	38.5997	39.0355	38.8833
J4	8.5083	7.9224	8.6771	8.4876
J5	10.3896	10.5004	10.4672	10.3546

Sample 1

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	-59.4725	-56.0054	-40.7560	-58.8133
J1	-89.7081	-89.6776	-89.4568	-89.5167
J2	72.1129	72.7499	72.6902	72.8331
J3	39.2845	38.5274	39.3407	38.9401
J4	8.5083	7.6856	8.7110	8.3896
J5	10.3127	10.5167	10.5440	10.4008

Sample 2

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	33.1292	24.1081	33.6663	18.4541
J1	-89.7081	-90.2747	-89.8306	-90.5096
J2	70.5801	71.1516	71.2903	71.1998
J3	39.2845	38.6666	39.2701	39.4000
J4	8.3953	8.3575	8.6081	8.8027
J5	10.5433	10.6071	10.5073	10.5585

Sample 3

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	43.2233	29.0241	43.1118	25.5336
J1	-89.1403	-89.2733	-89.3082	-89.3114
J2	84.2048	83.3181	84.5922	85.4572
J3	40.3077	38.7707	39.5202	39.6215
J4	8.5083	7.4355	8.8270	8.7954
J5	10.5433	10.6310	10.5792	10.6138
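One way to read the four joint tables is to ask, per joint, how far apart the three action-conditioned predictions are, and which of them lands nearest the GT value for J0. The sketch below does that with the numbers copied from the tables. The helper names `action_spread` and `closest_action` are hypothetical, and the report does not state which action was actually taken in each sample, so "closest to GT" is only a descriptive statistic here, not a correctness score.

```python
# Joint positions copied from the sample tables: (GT, STAY, MOVE+, MOVE-).
ACTIONS = ("STAY", "MOVE+", "MOVE-")

SAMPLES = [
    {  # Sample 0
        "J0": (-39.7233, -58.4362, -46.4312, -58.7470),
        "J1": (-90.9068, -90.5657, -90.5461, -90.6769),
        "J2": (66.3224, 66.3306, 65.9561, 66.3921),
        "J3": (39.2845, 38.5997, 39.0355, 38.8833),
        "J4": (8.5083, 7.9224, 8.6771, 8.4876),
        "J5": (10.3896, 10.5004, 10.4672, 10.3546),
    },
    {  # Sample 1
        "J0": (-59.4725, -56.0054, -40.7560, -58.8133),
        "J1": (-89.7081, -89.6776, -89.4568, -89.5167),
        "J2": (72.1129, 72.7499, 72.6902, 72.8331),
        "J3": (39.2845, 38.5274, 39.3407, 38.9401),
        "J4": (8.5083, 7.6856, 8.7110, 8.3896),
        "J5": (10.3127, 10.5167, 10.5440, 10.4008),
    },
    {  # Sample 2
        "J0": (33.1292, 24.1081, 33.6663, 18.4541),
        "J1": (-89.7081, -90.2747, -89.8306, -90.5096),
        "J2": (70.5801, 71.1516, 71.2903, 71.1998),
        "J3": (39.2845, 38.6666, 39.2701, 39.4000),
        "J4": (8.3953, 8.3575, 8.6081, 8.8027),
        "J5": (10.5433, 10.6071, 10.5073, 10.5585),
    },
    {  # Sample 3
        "J0": (43.2233, 29.0241, 43.1118, 25.5336),
        "J1": (-89.1403, -89.2733, -89.3082, -89.3114),
        "J2": (84.2048, 83.3181, 84.5922, 85.4572),
        "J3": (40.3077, 38.7707, 39.5202, 39.6215),
        "J4": (8.5083, 7.4355, 8.8270, 8.7954),
        "J5": (10.5433, 10.6310, 10.5792, 10.6138),
    },
]


def action_spread(row):
    """Max minus min over the three action-conditioned predictions."""
    preds = row[1:]
    return max(preds) - min(preds)


def closest_action(row):
    """Name of the action whose prediction is nearest the GT position."""
    gt, preds = row[0], row[1:]
    errors = [abs(p - gt) for p in preds]
    return ACTIONS[errors.index(min(errors))]


if __name__ == "__main__":
    for i, sample in enumerate(SAMPLES):
        spreads = {joint: action_spread(row) for joint, row in sample.items()}
        widest = max(spreads, key=spreads.get)
        print(f"Sample {i}: widest spread on {widest} ({spreads[widest]:.4f}), "
              f"J0 nearest GT under {closest_action(sample['J0'])}")
```

On these four samples the widest spread falls on J0 each time, which is also the joint with the largest entry in motor_position_mae_per_joint (2.4623).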

Evaluation Report: diff_iter4_512d12_wd001

Diffusion Loss by Timestep Bucket

Bucket	Loss
t_0_250	0.002058
t_250_500	0.004465
t_500_750	0.008339
t_750_1000	0.017111
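The bucket losses rise steadily with the timestep range, and the reported aggregate motor MAE can be cross-checked against the per-joint list in the metrics table. The sketch below is a plain arithmetic check on the reported numbers, not part of the evaluation pipeline; the variable and function names are made up for illustration.

```python
# Values copied from the report.
BUCKET_LOSS = {
    "t_0_250": 0.002058,
    "t_250_500": 0.004465,
    "t_500_750": 0.008339,
    "t_750_1000": 0.017111,
}
PER_JOINT_MAE = [2.4623, 0.4575, 0.9046, 1.0166, 0.8754, 0.0246]
REPORTED_MAE_MEAN = 0.956825


def successive_ratios(values):
    """Ratio of each value to the one before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


bucket_ratios = successive_ratios(list(BUCKET_LOSS.values()))
per_joint_mean = sum(PER_JOINT_MAE) / len(PER_JOINT_MAE)
j0_share = PER_JOINT_MAE[0] / sum(PER_JOINT_MAE)

if __name__ == "__main__":
    print("bucket-to-bucket loss ratios:", [round(r, 3) for r in bucket_ratios])
    print("mean of per-joint MAE:", round(per_joint_mean, 6),
          "vs reported", REPORTED_MAE_MEAN)
    print("share of total MAE from J0:", round(j0_share, 3))
```

Each bucket's loss is roughly twice the previous one (ratios between about 1.87 and 2.17). The per-joint list averages to the reported motor_position_mae_mean up to the rounding of the four-decimal entries, and J0 alone accounts for a little over 40% of the summed per-joint error.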