Eval Report: diffusion - 20260319

DIFFUSION ITERATION 3: Sample prediction + higher LR + 500 epochs
==================================================================

Changes from Iteration 2 (still producing noise):
- prediction_type: epsilon -> sample (predict clean image directly)
- epochs: 300 -> 500 (diffusion needs much more training)
- lr: 3e-4 -> 5e-4 (higher peak since sample prediction is easier to optimize)
- warmup_epochs: 10 -> 20 (longer warmup for stability at higher LR)

Rationale:
Iteration 2 reduced denoising loss from 0.675 to 0.506, but inference still produces noise (SSIM=0.008). The issue: epsilon prediction asks the model to predict the noise component, which is harder to learn than predicting the clean image directly.

With "sample" prediction:
- The model predicts x_0 (clean patches) instead of epsilon (noise)
- More intuitive: model learns "what should this look like?" not "what noise was added?"
- Converges faster in practice for conditional generation
- The DDIM step still works: uses pred_x0 directly instead of deriving it from epsilon

Also, this canvas has an asymmetry: context patches are ALWAYS clean while last-frame patches are noisy. Epsilon prediction struggles with this because the model must learn different behaviors for context vs. target regions. Sample prediction is more uniform: predict clean for everything, training only optimizes the last-frame predictions.

The previous training ran for 300 epochs but loss was still decreasing. 500 epochs should allow better convergence.
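The rationale above relies on two mechanics: the DDIM step accepting pred_x0 directly, and the loss being restricted to last-frame patches. The sketch below illustrates both in NumPy. It is not the project's code: the function names `ddim_step` and `masked_mse`, the argument names, and the choice of a deterministic update (eta = 0) are assumptions made for illustration.

```python
import numpy as np


def ddim_step(x_t, model_out, alpha_bar_t, alpha_bar_prev, prediction_type="sample"):
    """One deterministic DDIM update (eta = 0), hypothetical helper.

    alpha_bar_t / alpha_bar_prev are the cumulative alpha products at the
    current and the previous (less noisy) timestep.
    """
    sqrt_ab_t = np.sqrt(alpha_bar_t)
    sqrt_one_minus_ab_t = np.sqrt(1.0 - alpha_bar_t)

    if prediction_type == "sample":
        # The network output already is the clean-image estimate.
        pred_x0 = model_out
        pred_eps = (x_t - sqrt_ab_t * pred_x0) / sqrt_one_minus_ab_t
    elif prediction_type == "epsilon":
        # The clean-image estimate has to be derived from the predicted noise.
        pred_eps = model_out
        pred_x0 = (x_t - sqrt_one_minus_ab_t * pred_eps) / sqrt_ab_t
    else:
        raise ValueError(f"unknown prediction_type: {prediction_type!r}")

    # Re-noise the clean estimate to the previous timestep's noise level.
    x_prev = np.sqrt(alpha_bar_prev) * pred_x0 + np.sqrt(1.0 - alpha_bar_prev) * pred_eps
    return x_prev, pred_x0


def masked_mse(pred_x0, target_x0, target_mask):
    """MSE over target (last-frame) positions only; context positions are ignored.

    target_mask is 1 where a patch belongs to the noisy last frame, 0 for
    clean context patches (assumed layout).
    """
    mask = np.broadcast_to(target_mask, pred_x0.shape).astype(pred_x0.dtype)
    sq_err = (pred_x0 - target_x0) ** 2
    return float((sq_err * mask).sum() / max(mask.sum(), 1.0))
```

With a perfect prediction, both prediction types produce the same `x_prev`, so switching to sample prediction changes what the network has to learn, not the sampler's trajectory.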

Metrics

Recommendations

Counterfactual Action Grids

Each grid: Row 1 = GT, Row 2 = STAY (red), Row 3 = MOVE+ (green), Row 4 = MOVE- (blue)

Metric	Value
val_mse	0.011394
val_mse_visual	0.011394
ssim	0.730140
psnr	19.706405
val_mse_motor_strip	0.015084
val_mse_action_1	0.010401
val_mse_action_2	0.012549
val_mse_static	0.002058
val_mse_dynamic	0.023309
motor_position_mae_mean	0.953616
motor_velocity_mae_mean	0.093113
motor_direction_accuracy	0.843750
motor_consistency_error	1.194727
diffusion_loss_t_0_250	0.002217
diffusion_loss_t_250_500	0.004562
diffusion_loss_t_500_750	0.008389
diffusion_loss_t_750_1000	0.018411
action_discrimination_score	0.038722
motor_discrimination_score	0.055427
motor_position_mae_per_joint	[2.9318, 0.6501, 0.8192, 0.2109, 1.0452, 0.0645]
motor_position_mae_action_1	0.954987
motor_position_mae_action_2	0.952021
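As a quick consistency check on the table above, the per-joint MAE values can be averaged and compared with motor_position_mae_mean. The sketch assumes the list is ordered J0 to J5, matching the joint tables in this report; the variable names are illustrative.

```python
import numpy as np

# Values copied from the metrics table (assumed order: J0 .. J5).
per_joint_mae = np.array([2.9318, 0.6501, 0.8192, 0.2109, 1.0452, 0.0645])
reported_mean = 0.953616  # motor_position_mae_mean

recomputed_mean = per_joint_mae.mean()
share_of_total = per_joint_mae / per_joint_mae.sum()
worst_joint = int(np.argmax(per_joint_mae))

print(f"recomputed mean: {recomputed_mean:.6f} (reported {reported_mean:.6f})")
for joint, (mae, share) in enumerate(zip(per_joint_mae, share_of_total)):
    print(f"J{joint}: MAE={mae:.4f}  share of total={share:.1%}")
print(f"largest error: J{worst_joint}")
```

The recomputed mean agrees with the reported value to rounding, and under the ordering assumption J0 alone accounts for roughly half of the summed per-joint error.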

Sample 0

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	-39.7233	-53.1471	-44.0113	-58.1136
J1	-90.9068	-90.5143	-90.5498	-90.7645
J2	66.3224	66.5114	65.4488	65.0265
J3	39.2845	39.6671	39.9770	39.7421
J4	8.5083	8.4293	9.1014	8.6378
J5	10.3896	10.6055	10.6353	10.5477

Sample 1

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	-59.4725	-49.4470	-41.1102	-58.3401
J1	-89.7081	-89.5667	-89.5738	-90.6919
J2	72.1129	72.4014	71.2697	71.2285
J3	39.2845	39.6514	39.9730	39.8268
J4	8.5083	8.6531	8.9874	8.8648
J5	10.3127	10.5807	10.6643	10.5929

Sample 2

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	33.1292	25.1916	34.0501	11.7716
J1	-89.7081	-90.2027	-90.0745	-90.0278
J2	70.5801	70.5743	70.1452	70.0515
J3	39.2845	39.6044	39.8749	40.0282
J4	8.3953	9.0581	8.8724	9.0839
J5	10.5433	10.6597	10.6393	10.6636

Sample 3

Error heatmap (jet colormap)

Joint	GT Pos	STAY	MOVE+	MOVE-
J0	43.2233	34.7238	51.9994	26.8380
J1	-89.1403	-89.3265	-89.4593	-89.1761
J2	84.2048	83.2126	84.9713	84.2591
J3	40.3077	39.8958	40.1965	40.0169
J4	8.5083	8.9923	9.1526	9.0080
J5	10.5433	10.6727	10.6717	10.6596
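The four sample tables can be summarized for J0, the joint whose predictions differ most across the counterfactual actions. The sketch below copies the J0 rows from the tables above and computes the absolute error per action and the spread between actions. It does not know which action was actually taken in each sample, so "closest" only means closest to the GT position; the variable names are illustrative.

```python
import numpy as np

ACTIONS = ["STAY", "MOVE+", "MOVE-"]

# J0 rows copied from Samples 0-3: ground truth, then one prediction per action.
j0_gt = np.array([-39.7233, -59.4725, 33.1292, 43.2233])
j0_pred = np.array([
    [-53.1471, -44.0113, -58.1136],
    [-49.4470, -41.1102, -58.3401],
    [25.1916, 34.0501, 11.7716],
    [34.7238, 51.9994, 26.8380],
])

# Absolute error of each counterfactual prediction against GT.
abs_err = np.abs(j0_pred - j0_gt[:, None])
closest = [ACTIONS[i] for i in abs_err.argmin(axis=1)]

# Does MOVE+ > STAY > MOVE- hold for the predicted J0 position in every sample?
ordered = bool(np.all((j0_pred[:, 1] > j0_pred[:, 0]) & (j0_pred[:, 0] > j0_pred[:, 2])))

# Range of predicted J0 positions across the three actions, per sample.
spread = j0_pred.max(axis=1) - j0_pred.min(axis=1)

for i in range(len(j0_gt)):
    print(f"Sample {i}: abs_err={np.round(abs_err[i], 4)}  closest={closest[i]}  spread={spread[i]:.4f}")
print(f"MOVE+ > STAY > MOVE- in all samples: {ordered}")
```

In these four samples the predicted J0 position is always ordered MOVE+ > STAY > MOVE-, which suggests the action conditioning has a consistent directional effect on J0, while the other joints change very little between actions.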

Evaluation Report: diff_iter3_sample_fixed_inference

Analysis Notes

Diffusion Loss by Timestep Bucket

Bucket	Loss
t_0_250	0.002217
t_250_500	0.004562
t_500_750	0.008389
t_750_1000	0.018411