Can a model rebuild a screen from words alone?
A vision model describes a mobile screenshot in text. A coding agent rebuilds the screen from that text only — never seeing the original — and we render it on an iOS simulator and score the match.
screenshot → vision model → text → harness + model → iOS render → score
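The flow above can be read as four stages chained together. The sketch below is only an illustration of that shape: `Pipeline`, `describe`, `rebuild`, `render`, and `score` are hypothetical names, and each stage is passed in as a plain callable because the page does not say how any of them is implemented or how the score is computed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Hypothetical wiring of the stages named on the page; not the real harness."""

    describe: Callable[[bytes], str]        # vision model: screenshot -> text
    rebuild: Callable[[str], str]           # harness + model: text -> app source
    render: Callable[[str], bytes]          # iOS simulator: app source -> screenshot
    score: Callable[[bytes, bytes], float]  # original vs. render -> match score

    def run(self, screenshot: bytes) -> float:
        description = self.describe(screenshot)
        # The coding agent is given only the text, never the original screenshot.
        source = self.rebuild(description)
        rendered = self.render(source)
        return self.score(screenshot, rendered)
```

Keeping `rebuild` typed as `str -> str` makes the key constraint explicit: the original image is not reachable from the reconstruction stage.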
2 screens · 2 vision models · 3 harnesses · 12/12 cells scored
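The 12 cells are what you get by crossing every screen with every vision model and every harness. A minimal sketch of that grid follows; the screen identifiers are placeholders (the page does not name the screens), and the harness/model pairs are taken from the tables on this page.

```python
from itertools import product

# Placeholder screen IDs: the page only says there are two screens.
screens = ["screen-1", "screen-2"]
vision_models = ["anthropic/claude-opus-4.5", "google/gemini-2.5-pro"]
# Each harness runs with one coding model in this benchmark.
harnesses = [
    "codex/gpt-5.5",
    "claude-code/sonnet",
    "opencode/deepseek/deepseek-v4-pro",
]

# One cell per (screen, vision model, harness) combination.
cells = list(product(screens, vision_models, harnesses))
print(len(cells))  # prints 12
```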
Vision models
Decomposition — marginal over harness × model
| # | Vision model | n | Score |
|---|---|---|---|
| 1 | anthropic/claude-opus-4.5 | 6 | 52.4 |
| 2 | google/gemini-2.5-pro | 6 | 49.9 |
Coding harnesses
Reconstruction — marginal over vision × model
| # | Harness | n | Score |
|---|---|---|---|
| 1 | codex | 4 | 53.5 |
| 2 | claude-code | 4 | 52.3 |
| 3 | opencode | 4 | 47.7 |
Coding models
Marginal over vision × harness
| # | Coding model | n | Score |
|---|---|---|---|
| 1 | gpt-5.5 | 4 | 53.5 |
| 2 | sonnet | 4 | 52.3 |
| 3 | deepseek/deepseek-v4-pro | 4 | 47.7 |
Full pipeline
Vision · harness / model, end to end
| # | Pipeline | Vision | Harness | n | Score |
|---|---|---|---|---|---|
| 1 | anthropic/claude-opus-4.5 · claude-code/sonnet | — | — | 2 | 54.7 |
| 2 | google/gemini-2.5-pro · codex/gpt-5.5 | — | — | 2 | 54.4 |
| 3 | anthropic/claude-opus-4.5 · codex/gpt-5.5 | — | — | 2 | 52.6 |
| 4 | google/gemini-2.5-pro · claude-code/sonnet | — | — | 2 | 49.8 |
| 5 | anthropic/claude-opus-4.5 · opencode/deepseek/deepseek-v4-pro | — | — | 2 | 49.8 |
| 6 | google/gemini-2.5-pro · opencode/deepseek/deepseek-v4-pro | — | — | 2 | 45.6 |
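The marginal tables can be approximately recovered from these six rows. The sketch below assumes a marginal is the plain mean over the pipelines that share a vision model or a harness; the page does not state the aggregation, so treat that as an assumption. It works here because every row has the same n = 2. The `marginal` helper is a hypothetical name. Since each harness is paired with exactly one coding model, the harness and coding-model marginals coincide.

```python
from collections import defaultdict

# (vision model, harness/model, score) copied from the Full pipeline table.
rows = [
    ("anthropic/claude-opus-4.5", "claude-code/sonnet", 54.7),
    ("google/gemini-2.5-pro", "codex/gpt-5.5", 54.4),
    ("anthropic/claude-opus-4.5", "codex/gpt-5.5", 52.6),
    ("google/gemini-2.5-pro", "claude-code/sonnet", 49.8),
    ("anthropic/claude-opus-4.5", "opencode/deepseek/deepseek-v4-pro", 49.8),
    ("google/gemini-2.5-pro", "opencode/deepseek/deepseek-v4-pro", 45.6),
]


def marginal(rows, key_index):
    """Average the score over all rows sharing the value at key_index.

    Assumes equal weight per row, which holds here because every row has n = 2.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[2])
    return {key: sum(scores) / len(scores) for key, scores in groups.items()}


vision_marginal = marginal(rows, 0)   # marginal over harness x model
harness_marginal = marginal(rows, 1)  # marginal over vision model
```

Under that assumption the vision marginals come out at about 52.4 and 49.9, matching the Vision models table. The claude-code value computed from these rounded rows is 52.25 rather than the published 52.3, which is consistent with the published marginals having been computed from unrounded scores.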