Evaluation Report: iter5_gpt_512d12_wider

Run Name: iter5_gpt_512d12_wider

Model Type: gpt

Checkpoint: local/checkpoints/gpt_iter5_wider/best.pth

Dataset: local/datasets/single-action-shoulder-pan-700-combined

Date: 2026-03-18 01:11:18

Val Samples: 80

Analysis Notes

ITERATION 5: Wider GPT (embed_dim=512) instead of deeper
========================================================

Changes from Iteration 2 (best so far):
- embed_dim: 384 -> 512 (33% wider)
- depth: 12 (same)
- num_heads: 16 (scaled with embed_dim, so the per-head dimension stays at 32)
- ~38M parameters (up from 22M)
- lr: 1e-4 (slightly lower for the larger model)
- batch_size: 4

Rationale: Iteration 4 showed that depth=16 hurts: the extra layers amplify exposure bias in autoregressive generation (the TF/FR gap grew from 0.006 to 0.009). Published scaling-law studies generally report that scaling width is more stable than scaling depth for ViTs.

Architecture comparison so far:
- iter1: GPT 256d8 (6.9M) -> val_mse=0.0127, SSIM=0.697
- iter2: GPT 384d12 (22M) -> val_mse=0.0102, SSIM=0.756 *** BEST ***
- iter3: MAE 384d8 -> val_mse=0.0147, SSIM=0.616 (worst)
- iter4: GPT 384d16 (29M) -> val_mse=0.0133, SSIM=0.730 (worse than iter2)

Hypothesis: wider representations (512 dims) will capture richer features per token without the instability of additional sequential layers. With 512d/16 heads instead of 384d/12 heads, the attention layers have more heads and can attend to a more diverse set of features.
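The parameter counts quoted above can be sanity-checked with the common back-of-envelope estimate of about 12 * d^2 parameters per transformer block (4 * d^2 for the Q/K/V/output projections, 8 * d^2 for an MLP with 4x expansion). This is a rough sketch that assumes a standard GPT block with a 4x MLP and ignores embeddings, LayerNorms, biases, and output heads; the report does not describe the model's actual layer layout, and `approx_gpt_params` is a hypothetical helper.

```python
def approx_gpt_params(embed_dim: int, depth: int) -> int:
    """Rough transformer-block parameter count: ~12 * d^2 per layer.

    Per layer (assumed standard GPT block): attention Q/K/V/output
    projections (4 * d^2) plus an MLP with 4x hidden expansion (8 * d^2).
    Biases, LayerNorms, embeddings, and output heads are ignored.
    """
    return 12 * embed_dim**2 * depth


# (embed_dim, depth) for each GPT iteration in the comparison list.
configs = {
    "iter1 256d8": (256, 8),
    "iter2 384d12": (384, 12),
    "iter4 384d16": (384, 16),
    "iter5 512d12": (512, 12),
}

for name, (d, depth) in configs.items():
    print(f"{name}: ~{approx_gpt_params(d, depth) / 1e6:.1f}M block params")

# Per-head dimension for iter2 (384d / 12 heads) and iter5 (512d / 16 heads).
print(384 // 12, 512 // 16)
```

The estimates (about 6.3M, 21.2M, 28.3M, and 37.7M) sit slightly below the reported 6.9M, 22M, 29M, and ~38M, which is what one would expect if the remainder is mostly embedding and head parameters. Both iter2 and iter5 keep a per-head dimension of 32.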

Metrics

Metric | Value
val_mse | 0.010951
val_mse_visual | 0.010951
ssim | 0.745914
psnr | 20.128148
val_mse_motor_strip | 0.006325
val_mse_action_1 | 0.010272
val_mse_action_2 | 0.011740
val_mse_static | 0.001598
val_mse_dynamic | 0.022959
motor_position_mae_mean | 0.614608
motor_velocity_mae_mean | 0.055433
motor_direction_accuracy | 0.875000
motor_consistency_error | 1.036756
gpt_teacher_forcing_loss | 0.004312
gpt_free_running_loss | 0.010858
gpt_tf_fr_gap | 0.006546
action_discrimination_score | 0.032071
motor_discrimination_score | 0.048447
motor_position_mae_per_joint | [1.9679, 0.4020, 0.5150, 0.1226, 0.6442, 0.0359]
motor_position_mae_action_1 | 0.622564
motor_position_mae_action_2 | 0.605363
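Two of the reported metrics appear to be simple functions of others. The sketch below reproduces them under two assumptions that the report does not state explicitly: gpt_tf_fr_gap is the free-running loss minus the teacher-forcing loss, and motor_position_mae_mean is an unweighted mean over the six joints.

```python
# Values copied from the metrics table above.
teacher_forcing_loss = 0.004312
free_running_loss = 0.010858
per_joint_mae = [1.9679, 0.4020, 0.5150, 0.1226, 0.6442, 0.0359]

# Assumption: the gap is free-running loss minus teacher-forcing loss.
tf_fr_gap = free_running_loss - teacher_forcing_loss

# Assumption: the mean MAE is an unweighted mean over the six joints.
mae_mean = sum(per_joint_mae) / len(per_joint_mae)

# Share of the summed per-joint error contributed by the worst joint.
worst_joint = per_joint_mae.index(max(per_joint_mae))
worst_share = per_joint_mae[worst_joint] / sum(per_joint_mae)

print(f"tf_fr_gap = {tf_fr_gap:.6f}")  # report: 0.006546
print(f"mae_mean  = {mae_mean:.4f}")   # report: 0.614608 (inputs rounded to 4 dp)
print(f"worst joint J{worst_joint}: {worst_share:.1%} of summed MAE")
```

Both values match the table to the precision of the inputs, and J0 alone accounts for slightly more than half of the summed per-joint position error.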

Recommendations

Motor position and velocity predictions are inconsistent (motor_consistency_error = 1.036756). Consider adding a position/velocity consistency loss or simplifying the velocity encoding.
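One possible form of the suggested consistency loss penalizes the mismatch between each predicted velocity and the finite difference of consecutive predicted positions. This is a hypothetical formulation: the report does not say how motor_consistency_error is computed, and the function name, the [time][joint] layout, and the default dt of one frame are all assumptions. It is written with plain lists for clarity; in training, the same expression would be written with tensor operations so gradients can flow through it.

```python
from typing import List


def position_velocity_consistency_loss(
    positions: List[List[float]],
    velocities: List[List[float]],
    dt: float = 1.0,
) -> float:
    """Mean squared mismatch between predicted velocities and the finite
    difference of predicted positions (hypothetical formulation).

    positions, velocities: [T][J] nested lists (time steps x joints).
    v[t] is compared with (p[t+1] - p[t]) / dt; the last velocity is unused.
    """
    if len(positions) != len(velocities) or len(positions) < 2:
        raise ValueError("need matching sequences with at least 2 time steps")
    total, count = 0.0, 0
    for t in range(len(positions) - 1):
        for j in range(len(positions[t])):
            finite_diff = (positions[t + 1][j] - positions[t][j]) / dt
            total += (velocities[t][j] - finite_diff) ** 2
            count += 1
    return total / count


# Perfectly consistent single-joint example: velocities equal position deltas.
print(position_velocity_consistency_loss([[0.0], [1.0], [3.0]], [[1.0], [2.0], [0.0]]))
```

A squared-error term like this is easy to weight against the existing reconstruction losses; an L1 variant would be less sensitive to occasional large velocity errors.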

Counterfactual Action Grids

Each grid: Row 1 = GT, Row 2 = STAY (red), Row 3 = MOVE+ (green), Row 4 = MOVE- (blue)

Sample 0

[Figures: Counterfactual grid 0; Error heatmap 0 (jet colormap)]

Joint | GT Pos | STAY | MOVE+ | MOVE-
J0 | -39.7233 | -12.5490 | -38.3007 | -57.7037
J1 | -90.9068 | -89.7117 | -90.7776 | -90.6163
J2 | 66.3224 | 65.4994 | 66.0924 | 66.9252
J3 | 39.2845 | 39.0729 | 39.2476 | 39.1186
J4 | 8.5083 | 8.5206 | 8.4379 | 8.4380
J5 | 10.3896 | 10.2799 | 10.5316 | 10.1576
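One quick way to read a counterfactual table is to measure how far each joint's prediction moves when only the action changes. Because the dataset is single-action shoulder pan, the action should mainly affect the shoulder-pan joint; the sketch assumes that joint is J0, which the report does not state. The values are copied from the Sample 0 table.

```python
# Sample 0 predictions copied from the table above: joint -> (STAY, MOVE+, MOVE-).
sample0 = {
    "J0": (-12.5490, -38.3007, -57.7037),
    "J1": (-89.7117, -90.7776, -90.6163),
    "J2": (65.4994, 66.0924, 66.9252),
    "J3": (39.0729, 39.2476, 39.1186),
    "J4": (8.5206, 8.4379, 8.4380),
    "J5": (10.2799, 10.5316, 10.1576),
}

# Spread = max - min across the three counterfactual actions for each joint.
spread = {joint: max(preds) - min(preds) for joint, preds in sample0.items()}

# Print joints from most to least action-sensitive.
for joint, s in sorted(spread.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{joint}: {s:.4f}")
```

For Sample 0, J0 spreads by about 45.15 across the three actions, while every other joint moves by less than 1.5, consistent with the model having learned a single-joint action effect.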

Sample 1

[Figures: Counterfactual grid 1; Error heatmap 1 (jet colormap)]

Joint | GT Pos | STAY | MOVE+ | MOVE-
J0 | -59.4725 | -7.5981 | -36.9734 | -55.7658
J1 | -89.7081 | -88.4212 | -90.1794 | -89.7475
J2 | 72.1129 | 69.6192 | 71.2736 | 72.7784
J3 | 39.2845 | 38.9359 | 39.2578 | 39.0027
J4 | 8.5083 | 8.5422 | 8.4883 | 8.4900
J5 | 10.3127 | 10.2456 | 10.5691 | 10.1913

Sample 2

[Figures: Counterfactual grid 2; Error heatmap 2 (jet colormap)]

Joint | GT Pos | STAY | MOVE+ | MOVE-
J0 | 33.1292 | -5.4041 | 34.0406 | 15.9055
J1 | -89.7081 | -88.1873 | -90.0212 | -89.7407
J2 | 70.5801 | 69.5484 | 70.8955 | 70.1429
J3 | 39.2845 | 39.3103 | 39.2006 | 39.2875
J4 | 8.3953 | 8.8473 | 8.6255 | 8.8006
J5 | 10.5433 | 10.4989 | 10.5733 | 10.5287

Sample 3

[Figures: Counterfactual grid 3; Error heatmap 3 (jet colormap)]

Joint | GT Pos | STAY | MOVE+ | MOVE-
J0 | 43.2233 | 0.5371 | 43.5694 | 24.2516
J1 | -89.1403 | -86.6366 | -89.3674 | -89.2437
J2 | 84.2048 | 82.5729 | 84.3899 | 84.4728
J3 | 40.3077 | 39.8219 | 39.5539 | 39.7815
J4 | 8.5083 | 8.9285 | 8.7170 | 8.9575
J5 | 10.5433 | 10.5403 | 10.5821 | 10.5530
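Across the four samples, one can also ask which counterfactual's J0 prediction lands closest to the ground truth. This sketch assumes the GT column reflects the action actually executed (the report does not say which action that was) and that J0 is the action-driven joint; `closest_action` is a hypothetical helper, and the values are copied from the four tables.

```python
# J0 values copied from the four tables: (GT, STAY, MOVE+, MOVE-).
j0_rows = {
    "Sample 0": (-39.7233, -12.5490, -38.3007, -57.7037),
    "Sample 1": (-59.4725, -7.5981, -36.9734, -55.7658),
    "Sample 2": (33.1292, -5.4041, 34.0406, 15.9055),
    "Sample 3": (43.2233, 0.5371, 43.5694, 24.2516),
}
actions = ("STAY", "MOVE+", "MOVE-")


def closest_action(gt, preds):
    """Return (action, |error|) for the counterfactual whose J0 prediction is nearest GT."""
    errors = [abs(p - gt) for p in preds]
    best = min(range(len(errors)), key=errors.__getitem__)
    return actions[best], errors[best]


for name, (gt, *preds) in j0_rows.items():
    action, err = closest_action(gt, preds)
    print(f"{name}: {action} (|error| = {err:.4f})")
```

On these samples MOVE+ is nearest for Samples 0, 2, and 3 and MOVE- for Sample 1; STAY is never nearest, since its J0 prediction stays far from GT in every case.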

GPT Per-Position Loss (last frame, raster order)

Position | MSE
0 | 0.004466
1 | 0.003868
2 | 0.002848
3 | 0.004336
4 | 0.002460
5 | 0.001100
6 | 0.000401
7 | 0.000334
8 | 0.000254
9 | 0.000272
10 | 0.000555
11 | 0.004851
12 | 0.005865
13 | 0.007603
14 | 0.003435
15 | 0.001126
16 | 0.001565
17 | 0.001252
18 | 0.000838
19 | 0.000635
20 | 0.000413
21 | 0.000335
22 | 0.000407
23 | 0.001894
24 | 0.005692
25 | 0.005489
26 | 0.007226
27 | 0.008823
28 | 0.000842
29 | 0.001365
30 | 0.000596
31 | 0.001145
32 | 0.001013
33 | 0.000282
34 | 0.000387
35 | 0.000312
36 | 0.000571
37 | 0.004104
38 | 0.005751
39 | 0.005883
40 | 0.005751
41 | 0.009281
42 | 0.001026
43 | 0.000788
44 | 0.001156
45 | 0.001093
46 | 0.000568
47 | 0.000303
48 | 0.000377
49 | 0.000297
50 | 0.000892
51 | 0.004486
52 | 0.004765
53 | 0.004705
54 | 0.005951
55 | 0.006543
56 | 0.000406
57 | 0.000712
58 | 0.000538
59 | 0.000382
60 | 0.000342
61 | 0.000396
62 | 0.000296
63 | 0.000266
64 | 0.000984
65 | 0.003841
66 | 0.004385
67 | 0.005169
68 | 0.004328
69 | 0.005961
70 | 0.000532
71 | 0.000560
72 | 0.000355
73 | 0.000238
74 | 0.000315
75 | 0.000170
76 | 0.000161
77 | 0.001459
78 | 0.002653
79 | 0.002568
80 | 0.001654
81 | 0.002405
82 | 0.002330
83 | 0.002770
84 | 0.000900
85 | 0.000489
86 | 0.000280
87 | 0.000162
88 | 0.000187
89 | 0.000152
90 | 0.000140
91 | 0.000289
92 | 0.001142
93 | 0.001061
94 | 0.000707
95 | 0.001597
96 | 0.002875
97 | 0.002245
98 | 0.000429
99 | 0.000315
100 | 0.000289
101 | 0.000238
102 | 0.000178
103 | 0.000145
104 | 0.000152
105 | 0.000279
106 | 0.000612
107 | 0.000600
108 | 0.000508
109 | 0.000429
110 | 0.001413
111 | 0.004046
112 | 0.001114
113 | 0.000235
114 | 0.000259
115 | 0.000216
116 | 0.000214
117 | 0.000192
118 | 0.000145
119 | 0.000273
120 | 0.000462
121 | 0.000391
122 | 0.000278
123 | 0.000179
124 | 0.001500
125 | 0.004942
126 | 0.000463
127 | 0.000258
128 | 0.000235
129 | 0.000230
130 | 0.000241
131 | 0.000155
132 | 0.000128
133 | 0.000167
134 | 0.000230
135 | 0.000268
136 | 0.000195
137 | 0.000360
138 | 0.002194
139 | 0.002584
140 | 0.000212
141 | 0.000166
142 | 0.000169
143 | 0.000173
144 | 0.000171
145 | 0.000158
146 | 0.000131
147 | 0.000115
148 | 0.000237
149 | 0.000220
150 | 0.000411
151 | 0.001101
152 | 0.001588
153 | 0.002269
154 | 0.000204
155 | 0.000166
156 | 0.000158
157 | 0.000155
158 | 0.000160
159 | 0.000135
160 | 0.000121
161 | 0.000082
162 | 0.000085
163 | 0.000176
164 | 0.000308
165 | 0.001854
166 | 0.000702
167 | 0.001732
168 | 0.000173
169 | 0.000144
170 | 0.000129
171 | 0.000119
172 | 0.000132
173 | 0.000134
174 | 0.000110
175 | 0.000085
176 | 0.000074
177 | 0.000100
178 | 0.000286
179 | 0.000635
180 | 0.000233
181 | 0.000941
182 | 0.000164
183 | 0.000133
184 | 0.000126
185 | 0.000112
186 | 0.000122
187 | 0.000137
188 | 0.000222
189 | 0.000289
190 | 0.000272
191 | 0.000158
192 | 0.000157
193 | 0.000468
194 | 0.000384
195 | 0.000823
196 | 0.024217
197 | 0.020741
198 | 0.014729
199 | 0.013567
200 | 0.014224
201 | 0.016774
202 | 0.022256
203 | 0.018091
204 | 0.021605
205 | 0.019349
206 | 0.020265
207 | 0.021839
208 | 0.019799
209 | 0.017377
210 | 0.017416
211 | 0.014983
212 | 0.008980
213 | 0.011506
214 | 0.006936
215 | 0.008765
216 | 0.011200
217 | 0.014595
218 | 0.014613
219 | 0.012786
220 | 0.011574
221 | 0.009518
222 | 0.008872
223 | 0.009045
224 | 0.018374
225 | 0.014404
226 | 0.009987
227 | 0.011446
228 | 0.008452
229 | 0.007650
230 | 0.010365
231 | 0.011636
232 | 0.013439
233 | 0.013164
234 | 0.009280
235 | 0.007783
236 | 0.007477
237 | 0.006833
238 | 0.015354
239 | 0.010565
240 | 0.010548
241 | 0.011281
242 | 0.007266
243 | 0.007072
244 | 0.009188
245 | 0.011272
246 | 0.011597
247 | 0.010759
248 | 0.008433
249 | 0.007693
250 | 0.008263
251 | 0.008611
252 | 0.012961
253 | 0.007605
254 | 0.008638
255 | 0.008821
256 | 0.008706
257 | 0.006206
258 | 0.007846
259 | 0.010737
260 | 0.010487
261 | 0.009221
262 | 0.008744
263 | 0.007327
264 | 0.005009
265 | 0.005902
266 | 0.009195
267 | 0.006214
268 | 0.008953
269 | 0.006909
270 | 0.008034
271 | 0.007772
272 | 0.006713
273 | 0.008418
274 | 0.009810
275 | 0.006650
276 | 0.004319
277 | 0.004691
278 | 0.004192
279 | 0.004158
280 | 0.008339
281 | 0.005364
282 | 0.007248
283 | 0.008171
284 | 0.007198
285 | 0.008280
286 | 0.007101
287 | 0.011460
288 | 0.007127
289 | 0.005857
290 | 0.003819
291 | 0.003166
292 | 0.004211
293 | 0.004761
294 | 0.006680
295 | 0.006938
296 | 0.004988
297 | 0.005628
298 | 0.007591
299 | 0.003741
300 | 0.006530
301 | 0.010767
302 | 0.009231
303 | 0.005164
304 | 0.002889
305 | 0.002276
306 | 0.003391
307 | 0.001972
308 | 0.006427
309 | 0.003560
310 | 0.004300
311 | 0.005692
312 | 0.001240
313 | 0.010556
314 | 0.006052
315 | 0.005788
316 | 0.005994
317 | 0.004581
318 | 0.003959
319 | 0.002867
320 | 0.001293
321 | 0.001444
322 | 0.005221
323 | 0.003034
324 | 0.004636
325 | 0.001329
326 | 0.000289
327 | 0.011655
328 | 0.007112
329 | 0.001253
330 | 0.005594
331 | 0.004489
332 | 0.002446
333 | 0.001895
334 | 0.000882
335 | 0.001041
336 | 0.005668
337 | 0.005745
338 | 0.006096
339 | 0.000293
340 | 0.000222
341 | 0.008033
342 | 0.008186
343 | 0.000985
344 | 0.003188
345 | 0.004300
346 | 0.002544
347 | 0.001770
348 | 0.001118
349 | 0.001589
350 | 0.009521
351 | 0.007170
352 | 0.004230
353 | 0.000227
354 | 0.000210
355 | 0.019148
356 | 0.010214
357 | 0.001057
358 | 0.000565
359 | 0.005248
360 | 0.002886
361 | 0.001319
362 | 0.001351
363 | 0.000822
364 | 0.007977
365 | 0.010582
366 | 0.001243
367 | 0.000151
368 | 0.000319
369 | 0.013222
370 | 0.004374
371 | 0.001727
372 | 0.000360
373 | 0.000481
374 | 0.002236
375 | 0.002357
376 | 0.001062
377 | 0.000531
378 | 0.010970
379 | 0.004002
380 | 0.000169
381 | 0.000129
382 | 0.000553
383 | 0.007189
384 | 0.003874
385 | 0.001621
386 | 0.000202
387 | 0.000146
388 | 0.000581
389 | 0.001920
390 | 0.001118
391 | 0.000724
392 | 0.000658
393 | 0.001408
394 | 0.000297
395 | 0.000168
396 | 0.021531
397 | 0.000055
398 | 0.000109
399 | 0.021276
400 | 0.020397
401 | 0.000164
402 | 0.021998
403 | 0.001898
404 | 0.000022
405 | 0.000022
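Because the table is in raster order, it can help to fold the flat list back into a 2D grid to see where errors cluster spatially. The losses in positions 0-195 repeat with a period of 14, which suggests a 14x14 patch grid, but that is an assumption: the report does not state the grid size or what positions 196-405 encode. `GRID_W`, `raster_to_rc`, `loss_grid`, and `column_means` are hypothetical names. The example uses positions 0-27 copied from the table.

```python
from typing import List, Tuple

GRID_W = 14  # Assumption: positions 0-195 form a 14x14 patch grid in raster order.


def raster_to_rc(index: int, width: int = GRID_W) -> Tuple[int, int]:
    """Convert a raster-order token index to (row, col)."""
    return divmod(index, width)


def loss_grid(per_position_mse: List[float], width: int = GRID_W) -> List[List[float]]:
    """Fold a flat raster-order loss list into rows of `width` entries."""
    if len(per_position_mse) % width:
        raise ValueError("length must be a multiple of the grid width")
    return [per_position_mse[i:i + width] for i in range(0, len(per_position_mse), width)]


def column_means(grid: List[List[float]]) -> List[float]:
    """Average loss per column, to check whether errors cluster at an image edge."""
    return [sum(col) / len(col) for col in zip(*grid)]


# Positions 0-27 (the first two grid rows under the 14-wide assumption).
first_two_rows = [
    0.004466, 0.003868, 0.002848, 0.004336, 0.002460, 0.001100, 0.000401,
    0.000334, 0.000254, 0.000272, 0.000555, 0.004851, 0.005865, 0.007603,
    0.003435, 0.001126, 0.001565, 0.001252, 0.000838, 0.000635, 0.000413,
    0.000335, 0.000407, 0.001894, 0.005692, 0.005489, 0.007226, 0.008823,
]
means = column_means(loss_grid(first_two_rows))
worst_col = max(range(GRID_W), key=means.__getitem__)
print(f"worst column in rows 0-1: {worst_col}")
```

Applied to all 196 positions, the same folding would show whether the elevated losses at positions such as 13, 27, 41, 55, and 69 line up along the right-hand edge of the frame, as the 14-wide assumption implies.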