Evaluation Report: iter1_baseline_gpt_700

Run Name: iter1_baseline_gpt_700

Model Type: gpt

Checkpoint: local/checkpoints/gpt_iter1_baseline/best.pth

Dataset: local/datasets/single-action-shoulder-pan-700-combined

Date: 2026-03-17 22:42:46

Val Samples: 80

Analysis Notes

ITERATION 1: Baseline GPT on 700-canvas dataset
================================================

Model: GPT AutoregressiveViT (default params)
- embed_dim=256, depth=8, num_heads=8, patch_size=16
- 6.9M parameters
- 100 epochs, batch_size=8, lr=1.5e-4

Dataset: 700 canvases (single-action-shoulder-pan-700-combined)
- 630 train / 70 val
- Action distribution: 345 move+ (action=1), 355 move- (action=2), 0 stay (action=0)
- Canvas: 464x480 (448 visual + 16 motor strip, 6 joints)

Purpose: Establish baseline with 7x more data than previous 100-sample run.

Previous run (100 samples): best_val=0.009839, SSIM=0.62, PSNR=17.6, action_disc=0.016, motor_dir_acc=67.9%

Training result: best_val=0.005279 (46% improvement from more data)

Key questions for this evaluation:
1. Did the train/val gap close with more data?
2. Did action discrimination improve?
3. Is GPT error compounding still the dominant issue?
4. How does motor prediction quality change?
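The 46% figure in the notes above can be reproduced from the two best_val numbers. The snippet below is a minimal sketch of that arithmetic; the helper name `relative_improvement` is hypothetical and is not part of the evaluation code.

```python
def relative_improvement(previous: float, current: float) -> float:
    """Fractional reduction of `current` relative to `previous` (lower is better)."""
    return (previous - current) / previous


# best_val values quoted in the analysis notes.
prev_best_val = 0.009839  # previous run, 100 samples
curr_best_val = 0.005279  # this run, 700 canvases

improvement = relative_improvement(prev_best_val, curr_best_val)
print(f"best_val improvement: {improvement:.1%}")
```

The same helper can be applied to any other lower-is-better metric when comparing the two runs.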

Metrics

Metric                          Value
val_mse                         0.012655
val_mse_visual                  0.012655
ssim                            0.696908
psnr                            19.419347
val_mse_motor_strip             0.007126
val_mse_action_1                0.013139
val_mse_action_2                0.012093
val_mse_static                  0.002685
val_mse_dynamic                 0.025371
motor_position_mae_mean         0.876543
motor_velocity_mae_mean         0.063530
motor_direction_accuracy        0.868750
motor_consistency_error         1.169797
gpt_teacher_forcing_loss        0.005279
gpt_free_running_loss           0.012531
gpt_tf_fr_gap                   0.007253
action_discrimination_score     0.029936
motor_discrimination_score      0.053826
motor_position_mae_per_joint    [2.9007, 0.5437, 0.9453, 0.1488, 0.6802, 0.0406]
motor_position_mae_action_1     1.022885
motor_position_mae_action_2     0.706469
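Several of the metrics above are derived from others, so they can be cross-checked by hand. The sketch below recomputes the teacher-forcing/free-running gap and the mean per-joint position MAE from the reported values. The variable names and the extra ratio and J0-share figures are illustrative additions, not outputs of the evaluation script. The recomputed gap may differ from the reported one in the last digit, presumably because the inputs are already rounded.

```python
# Values copied from the Metrics table.
teacher_forcing_loss = 0.005279
free_running_loss = 0.012531
per_joint_mae = [2.9007, 0.5437, 0.9453, 0.1488, 0.6802, 0.0406]

# Gap and ratio between free-running and teacher-forced loss.
tf_fr_gap = free_running_loss - teacher_forcing_loss
tf_fr_ratio = free_running_loss / teacher_forcing_loss

# Mean position MAE across the six joints, and the share contributed by J0.
mean_mae = sum(per_joint_mae) / len(per_joint_mae)
j0_share = per_joint_mae[0] / sum(per_joint_mae)

print(f"tf/fr gap:   {tf_fr_gap:.6f}")
print(f"tf/fr ratio: {tf_fr_ratio:.2f}x")
print(f"mean MAE:    {mean_mae:.6f}")
print(f"J0 share:    {j0_share:.1%}")
```

On these numbers the first joint accounts for a little over half of the summed per-joint position error.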

Recommendations

Motor position and velocity predictions are inconsistent (motor_consistency_error = 1.169797). Consider adding a consistency loss or simplifying the velocity encoding.

Counterfactual Action Grids

Each grid: Row 1 = GT, Row 2 = STAY (red), Row 3 = MOVE+ (green), Row 4 = MOVE- (blue)

Sample 0

Counterfactual grid 0
Error heatmap 0 (jet colormap)

Joint   GT Pos      STAY        MOVE+       MOVE-
J0      -39.7233    -12.6752    -39.9395    -58.7795
J1      -90.9068    -89.7562    -90.6437    -90.9544
J2      66.3224     67.1236     65.4862     66.0939
J3      39.2845     39.3253     39.4098     39.4086
J4      8.5083      8.2676      8.4153      8.4163
J5      10.3896     9.8927      10.3845     10.1845

Sample 1

Counterfactual grid 1
Error heatmap 1 (jet colormap)

Joint   GT Pos      STAY        MOVE+       MOVE-
J0      -59.4725    -11.3310    -35.5370    -57.9923
J1      -89.7081    -88.8376    -89.7039    -90.1658
J2      72.1129     73.1630     72.2098     72.2999
J3      39.2845     39.2651     39.3413     39.1486
J4      8.5083      8.4100      8.5072      8.6224
J5      10.3127     10.2428     10.4678     10.3805

Sample 2

Counterfactual grid 2
Error heatmap 2 (jet colormap)

Joint   GT Pos      STAY        MOVE+       MOVE-
J0      33.1292     -11.1178    28.0767     7.8497
J1      -89.7081    -88.5362    -90.0627    -89.8683
J2      70.5801     71.3273     70.3892     69.8753
J3      39.2845     39.4063     39.3578     39.2719
J4      8.3953      8.8046      8.3926      8.7811
J5      10.5433     10.4314     10.4195     10.5387

Sample 3

Counterfactual grid 3
Error heatmap 3 (jet colormap)

Joint   GT Pos      STAY        MOVE+       MOVE-
J0      43.2233     -6.6134     42.0988     21.3623
J1      -89.1403    -87.7668    -89.1414    -88.9937
J2      84.2048     85.4601     82.8629     82.0960
J3      40.3077     39.6759     39.6016     39.6488
J4      8.5083      8.8073      8.6179      8.7744
J5      10.5433     10.3743     10.4663     10.4710
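One way to read the four joint tables is to ask, for each sample, which counterfactual J0 prediction lands nearest the ground-truth J0 position. The sketch below does that using the J0 rows copied from the tables. The `closest_action` helper is hypothetical, and the result says only which prediction is numerically closest, not which action was actually executed in the dataset.

```python
# J0 rows copied from the Sample 0-3 tables.
j0_rows = [
    {"gt": -39.7233, "STAY": -12.6752, "MOVE+": -39.9395, "MOVE-": -58.7795},
    {"gt": -59.4725, "STAY": -11.3310, "MOVE+": -35.5370, "MOVE-": -57.9923},
    {"gt": 33.1292, "STAY": -11.1178, "MOVE+": 28.0767, "MOVE-": 7.8497},
    {"gt": 43.2233, "STAY": -6.6134, "MOVE+": 42.0988, "MOVE-": 21.3623},
]

ACTIONS = ("STAY", "MOVE+", "MOVE-")


def closest_action(row: dict) -> str:
    """Return the action whose predicted J0 position is nearest to ground truth."""
    return min(ACTIONS, key=lambda action: abs(row[action] - row["gt"]))


closest = [closest_action(row) for row in j0_rows]
for index, (row, action) in enumerate(zip(j0_rows, closest)):
    error = abs(row[action] - row["gt"])
    print(f"Sample {index}: closest = {action} (|error| = {error:.4f})")
```

In all four samples the STAY prediction is the farthest from ground truth on J0, while the remaining joints barely differ across the three conditions.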

GPT Per-Position Loss (last frame, raster order)

Position  MSE
0         0.004843
1         0.004338
2         0.003212
3         0.004779
4         0.002410
5         0.002137
6         0.000953
7         0.000897
8         0.000719
9         0.000594
10        0.000658
11        0.006855
12        0.007609
13        0.008151
14        0.005726
15        0.001310
16        0.001745
17        0.002418
18        0.001825
19        0.001108
20        0.000997
21        0.001029
22        0.001051
23        0.001899
24        0.006937
25        0.007255
26        0.009636
27        0.010152
28        0.002039
29        0.001908
30        0.000742
31        0.001558
32        0.001227
33        0.000491
34        0.000950
35        0.000657
36        0.001118
37        0.004137
38        0.006188
39        0.007774
40        0.007824
41        0.009640
42        0.002102
43        0.001634
44        0.001679
45        0.001309
46        0.001055
47        0.000649
48        0.000896
49        0.000779
50        0.001851
51        0.005636
52        0.006273
53        0.006065
54        0.007580
55        0.008797
56        0.000835
57        0.001168
58        0.000915
59        0.000603
60        0.000678
61        0.000808
62        0.000786
63        0.000726
64        0.001569
65        0.006118
66        0.005586
67        0.005688
68        0.005527
69        0.008137
70        0.001004
71        0.001027
72        0.000630
73        0.000453
74        0.000596
75        0.000386
76        0.000353
77        0.001910
78        0.003930
79        0.003209
80        0.001977
81        0.003468
82        0.003614
83        0.003228
84        0.001715
85        0.001041
86        0.000562
87        0.000339
88        0.000407
89        0.000305
90        0.000277
91        0.000528
92        0.001756
93        0.001913
94        0.001214
95        0.002339
96        0.003840
97        0.002351
98        0.000746
99        0.000722
100       0.000640
101       0.000536
102       0.000437
103       0.000330
104       0.000330
105       0.000815
106       0.001018
107       0.001284
108       0.000681
109       0.001065
110       0.001290
111       0.004436
112       0.001129
113       0.000527
114       0.000550
115       0.000576
116       0.000517
117       0.000417
118       0.000373
119       0.000781
120       0.000956
121       0.000716
122       0.000432
123       0.000400
124       0.001863
125       0.006926
126       0.000797
127       0.000440
128       0.000439
129       0.000533
130       0.000519
131       0.000299
132       0.000268
133       0.000433
134       0.000560
135       0.000725
136       0.000388
137       0.000616
138       0.003617
139       0.005174
140       0.000317
141       0.000239
142       0.000261
143       0.000267
144       0.000328
145       0.000253
146       0.000294
147       0.000409
148       0.000606
149       0.000791
150       0.001231
151       0.002570
152       0.003553
153       0.004642
154       0.000319
155       0.000258
156       0.000232
157       0.000259
158       0.000295
159       0.000230
160       0.000294
161       0.000177
162       0.000209
163       0.000632
164       0.000943
165       0.005024
166       0.001502
167       0.004027
168       0.000314
169       0.000229
170       0.000197
171       0.000235
172       0.000224
173       0.000250
174       0.000256
175       0.000235
176       0.000187
177       0.000203
178       0.000690
179       0.001500
180       0.000480
181       0.001742
182       0.000260
183       0.000231
184       0.000226
185       0.000222
186       0.000279
187       0.000236
188       0.000474
189       0.000572
190       0.000464
191       0.000323
192       0.000312
193       0.001135
194       0.000912
195       0.001413
196       0.025451
197       0.016685
198       0.014141
199       0.014177
200       0.012128
201       0.014981
202       0.020606
203       0.021069
204       0.022623
205       0.018999
206       0.020197
207       0.026655
208       0.018792
209       0.020055
210       0.019154
211       0.016210
212       0.010705
213       0.012972
214       0.011106
215       0.010687
216       0.016270
217       0.022554
218       0.017890
219       0.016418
220       0.016083
221       0.012017
222       0.009457
223       0.011365
224       0.019695
225       0.015597
226       0.011073
227       0.013625
228       0.010256
229       0.010029
230       0.014167
231       0.017066
232       0.016075
233       0.016814
234       0.011123
235       0.008899
236       0.009232
237       0.007224
238       0.016828
239       0.011391
240       0.013375
241       0.012189
242       0.008872
243       0.007739
244       0.011257
245       0.015136
246       0.013977
247       0.016142
248       0.012630
249       0.008530
250       0.009430
251       0.008909
252       0.013592
253       0.009558
254       0.010632
255       0.010821
256       0.009237
257       0.007196
258       0.009152
259       0.013031
260       0.013779
261       0.012466
262       0.010057
263       0.007795
264       0.006594
265       0.006512
266       0.011282
267       0.006524
268       0.008110
269       0.007580
270       0.008749
271       0.007899
272       0.009374
273       0.010576
274       0.013120
275       0.009921
276       0.005853
277       0.005572
278       0.005271
279       0.004949
280       0.010664
281       0.007397
282       0.008621
283       0.007710
284       0.008719
285       0.009898
286       0.009119
287       0.013306
288       0.009166
289       0.008753
290       0.005926
291       0.004311
292       0.006350
293       0.005394
294       0.009174
295       0.010061
296       0.007325
297       0.007421
298       0.010495
299       0.005913
300       0.009006
301       0.014707
302       0.009792
303       0.007263
304       0.003950
305       0.003340
306       0.005117
307       0.002248
308       0.006894
309       0.004815
310       0.005312
311       0.006394
312       0.002016
313       0.009896
314       0.008098
315       0.007185
316       0.008793
317       0.005231
318       0.004513
319       0.003758
320       0.002188
321       0.002118
322       0.007122
323       0.004108
324       0.009072
325       0.002131
326       0.000647
327       0.012720
328       0.010072
329       0.001848
330       0.008278
331       0.006240
332       0.003238
333       0.002761
334       0.001436
335       0.001589
336       0.006540
337       0.006195
338       0.006620
339       0.000657
340       0.000578
341       0.008179
342       0.009520
343       0.001505
344       0.004120
345       0.006665
346       0.003401
347       0.002281
348       0.001475
349       0.002099
350       0.013373
351       0.011191
352       0.005178
353       0.000552
354       0.000488
355       0.016753
356       0.009345
357       0.002157
358       0.000982
359       0.005085
360       0.004556
361       0.001536
362       0.001750
363       0.001077
364       0.009603
365       0.011765
366       0.002339
367       0.000391
368       0.000649
369       0.011375
370       0.006675
371       0.002528
372       0.000681
373       0.000947
374       0.003385
375       0.003098
376       0.001542
377       0.000945
378       0.013377
379       0.006084
380       0.000320
381       0.000313
382       0.001006
383       0.009562
384       0.008211
385       0.002551
386       0.000370
387       0.000253
388       0.001020
389       0.002406
390       0.001378
391       0.001035
392       0.002001
393       0.002197
394       0.001101
395       0.000345
396       0.020397
397       0.000099
398       0.000220
399       0.023983
400       0.021714
401       0.000239
402       0.022358
403       0.002617
404       0.000057
405       0.000054
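The per-position losses are not uniform across raster order, and a quick way to locate where the error rises sharply is to look for the largest increase between neighbouring positions. The sketch below does this on an excerpt, positions 192 to 199 of the table above. The `largest_jump` helper is hypothetical, and the snippet makes no claim about why the loss changes at that position.

```python
def largest_jump(losses, start=0):
    """Return (position, increase) for the biggest rise between neighbours.

    `start` is the raster position of losses[0], so the returned position
    refers to the original table rather than to the excerpt.
    """
    best_position, best_increase = None, float("-inf")
    for i in range(1, len(losses)):
        increase = losses[i] - losses[i - 1]
        if increase > best_increase:
            best_position, best_increase = start + i, increase
    return best_position, best_increase


# MSE values for positions 192..199, copied from the table.
excerpt = [0.000312, 0.001135, 0.000912, 0.001413,
           0.025451, 0.016685, 0.014141, 0.014177]

position, increase = largest_jump(excerpt, start=192)
print(f"largest jump at position {position}: +{increase:.6f}")
```

Running the same helper over all 406 values, with `start=0`, would show whether this is also the largest step in the full sequence.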