Date: March 6-7, 2026
Context: First overnight session using Claude Code as an automated research agent. Starting from the best manual decoder-only checkpoint, Claude explored architecture variants, hyperparameters, and loss functions across 14 experiment runs.
Two architecture families were explored: an encoder-decoder (masked autoencoder) and a decoder-only model.
The decoder-only architecture consistently outperformed encoder-decoder, with the best run (run7) achieving a hybrid loss of 0.009178.
These runs explored the encoder-decoder (masked autoencoder) architecture with depth=5, embed_dim=256. Most runs failed to converge well, with 3 out of 6 runs stuck at loss ~0.089.
| Run | Key Changes | Best Loss | Report |
|---|---|---|---|
| encdec_debug | Baseline encoder-decoder, sep_width=16, no weight decay | 0.089601 | report |
| encdec_run1 | Same as debug | 0.089601 | report |
| encdec_run1_2 | Re-run with fixes | 0.016448 | report |
| encdec_run2 (Mar 6) | Further iteration | 0.089601 | report |
| encdec_run2 (Mar 7) | sep_width=32, weight_decay=0.01 | 0.018353 | report |
| encdec_run3 | focal_beta=10, focal_loss_alpha=0.3 | 0.075089 | report |
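encdec_run3 swapped in a focal-style reconstruction loss. A minimal sketch of what that loss plausibly looks like for binary canvas pixels is below; mapping `focal_loss_alpha` to the class-balance weight and `focal_beta` to the focusing exponent is an assumption on my part, not something the run logs confirm.

```python
import numpy as np

def focal_bce(pred, target, alpha=0.3, beta=10.0, eps=1e-7):
    """Focal binary cross-entropy: down-weights pixels the model
    already predicts confidently, focusing training on hard pixels.

    alpha -- class-balance weight for positive pixels (assumed to
             correspond to the log's focal_loss_alpha=0.3)
    beta  -- focusing exponent (assumed to correspond to focal_beta=10)
    """
    pred = np.clip(pred, eps, 1 - eps)
    # p_t: predicted probability assigned to the true class
    p_t = np.where(target == 1, pred, 1 - pred)
    # alpha_t: per-pixel class-balance factor
    a_t = np.where(target == 1, alpha, 1 - alpha)
    # (1 - p_t)^beta shrinks the loss for easy (high-p_t) pixels
    return float(np.mean(-a_t * (1 - p_t) ** beta * np.log(p_t)))
```

With beta=10 the modulating factor is extremely aggressive (a pixel predicted at 0.9 contributes almost nothing), which may help explain why this run plateaued at a higher loss than the plain runs.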
Starting from the successful decoder-only architecture (embed=256, depth=12), Claude systematically explored perceptual (VGG) loss weighting and model scaling.
| Run | Key Changes vs Baseline | Best Loss | Report |
|---|---|---|---|
| run4 | Baseline (no perceptual loss) | 0.009492 | report |
| run5 | Continued training from run4 | 0.009360 | report |
| run5_2 | Further continuation | 0.009239 | report |
| run6 | perceptual_loss_weight=0.1 (high) | 0.012570 | report |
| run7 | perceptual_loss_weight=0.01 | 0.009178 | report |
| run8 | perceptual_loss_weight=0.02 | 0.009435 | report |
| run9 | embed=512, depth=8 (wider, shallower) | 0.010240 | report |
| run10 | depth=16 (deeper) | 0.011437 | report |
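Runs 6-8 sweep the weight on the perceptual term of a hybrid loss. A sketch of the weighting scheme, assuming the common "pixel MSE plus weighted feature-space MSE" formulation: the real runs used VGG activations as features, but to stay self-contained this sketch substitutes image gradients as a stand-in feature map (an assumption, clearly not the actual loss).

```python
import numpy as np

def hybrid_loss(pred, target, perceptual_weight=0.01):
    """Pixel-space MSE plus a weighted feature-space MSE.

    perceptual_weight=0.01 matches run7, the best setting found;
    0.1 (run6) over-weighted the feature term and hurt.
    Features here are image gradients standing in for VGG activations.
    """
    mse = np.mean((pred - target) ** 2)

    def feats(x):
        # horizontal + vertical gradients as a lightweight feature map
        return np.concatenate([np.diff(x, axis=-1).ravel(),
                               np.diff(x, axis=-2).ravel()])

    perceptual = np.mean((feats(pred) - feats(target)) ** 2)
    return float(mse + perceptual_weight * perceptual)
```

The sweep's shape (0 < 0.01 is best, 0.02 slightly worse, 0.1 clearly worse) is the usual pattern for auxiliary-loss weights: a small nudge toward feature agreement helps, a large one distorts the primary objective.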
| Rank | Run | Architecture | Best Loss | Key Config |
|---|---|---|---|---|
| 1 | run7 | Decoder-Only | 0.009178 | embed=256, depth=12, perceptual=0.01 |
| 2 | run5_2 | Decoder-Only | 0.009239 | embed=256, depth=12, perceptual=0 (continued) |
| 3 | run5 | Decoder-Only | 0.009360 | embed=256, depth=12, perceptual=0 (continued) |
| 4 | run8 | Decoder-Only | 0.009435 | embed=256, depth=12, perceptual=0.02 |
| 5 | run4 | Decoder-Only | 0.009492 | embed=256, depth=12, perceptual=0 (baseline) |
| 6 | run9 | Decoder-Only | 0.010240 | embed=512, depth=8, perceptual=0.01 |
| 7 | run10 | Decoder-Only | 0.011437 | embed=256, depth=16, perceptual=0.01 |
| 8 | run6 | Decoder-Only | 0.012570 | embed=256, depth=12, perceptual=0.1 |
| 9 | encdec_run1_2 | Encoder-Decoder | 0.016448 | embed=256, depth=5 |
| 10 | encdec_run2 (Mar 7) | Encoder-Decoder | 0.018353 | embed=256, depth=5, sep_width=32 |
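The scaling results (ranks 6-7) are notable because neither variant was starved for parameters. A back-of-envelope count, using the standard estimate of ~12·embed² weights per transformer block (4·embed² for attention QKV+output, 8·embed² for a 4x MLP; embeddings and norms omitted, and this is an estimate, not the repo's exact count):

```python
def approx_params(embed, depth):
    """Rough transformer parameter count: ~12 * embed^2 per block
    (attention QKV + output projection = 4*embed^2, MLP with 4x
    expansion = 8*embed^2). Embeddings and layer norms omitted."""
    return 12 * depth * embed ** 2

for name, e, d in [("run7 baseline", 256, 12),
                   ("run9 wider   ", 512, 8),
                   ("run10 deeper ", 256, 16)]:
    print(f"{name}: ~{approx_params(e, d) / 1e6:.1f}M params")
```

By this estimate the wider run9 has roughly 2.7x the baseline's block parameters and the deeper run10 about 1.3x, yet both lost to the embed=256, depth=12 shape, suggesting the bottleneck at this data scale was not capacity.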
In a single overnight session, Claude Code explored 14 experiment variations across two architecture families — work that would have taken over a week manually. Key findings:

- The decoder-only architecture clearly beat encoder-decoder (best loss 0.009178 vs 0.016448), and half the encoder-decoder runs failed to converge at all.
- A small perceptual loss weight helped: 0.01 (run7) gave the overall best result, while 0.1 (run6) was worse than no perceptual loss.
- Scaling away from embed=256, depth=12 hurt in both directions: wider-and-shallower (run9) and deeper (run10) both underperformed the baseline.
These results motivated collecting more data and creating the canvas-world-model repo for further experiments.