Can a model rebuild a screen from words alone?
A vision model describes a mobile screenshot in text. A coding agent rebuilds the screen from that text only — never seeing the original — and we render it on an iOS simulator and score the match.
screenshot → vision model → text → harness + model → iOS render → score
2 screens · 2 vision models · 3 harnesses · 12/12 cells scored
Vision models
Decomposition — marginal over harness × model
| # | Vision model | n | Score |
|---|---|---|---|
| 1 | google/gemini-2.5-pro | 6 | 69.7 |
| 2 | anthropic/claude-opus-4.5 | 6 | 66.5 |
Coding harnesses
Reconstruction — marginal over vision × model
| # | Harness | n | Score |
|---|---|---|---|
| 1 | codex | 4 | 69.3 |
| 2 | claude-code | 4 | 69.3 |
| 3 | opencode | 4 | 65.8 |
Coding models
Marginal over vision × harness
| # | Coding model | n | Score |
|---|---|---|---|
| 1 | gpt-5.5 | 4 | 69.3 |
| 2 | sonnet | 4 | 69.3 |
| 3 | deepseek/deepseek-v4-pro | 4 | 65.8 |
Full pipeline
Vision · harness / model, end to end
| # | Pipeline | Cost | n | Score |
|---|---|---|---|---|
| 1 | google/gemini-2.5-pro · claude-code/sonnet | — | 2 | 71.7 |
| 2 | google/gemini-2.5-pro · codex/gpt-5.5 | — | 2 | 71.0 |
| 3 | anthropic/claude-opus-4.5 · codex/gpt-5.5 | — | 2 | 67.6 |
| 4 | anthropic/claude-opus-4.5 · claude-code/sonnet | — | 2 | 66.9 |
| 5 | google/gemini-2.5-pro · opencode/deepseek/deepseek-v4-pro | — | 2 | 66.6 |
| 6 | anthropic/claude-opus-4.5 · opencode/deepseek/deepseek-v4-pro | — | 2 | 65.0 |
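
As a rough cross-check, the marginal tables can be approximated from the six pipeline rows. The sketch below assumes each marginal is a plain unweighted mean of the pipeline scores that share a vision model or a harness. That is reasonable here only because every pipeline has the same n, and it is not confirmed as the method the page actually uses. The names `PIPELINES` and `marginal` are hypothetical. Because the inputs are already rounded to one decimal, results can differ from the published marginals in the last digit.

```python
from collections import defaultdict
from statistics import mean

# (vision model, harness, coding model, score), copied from the Full pipeline table.
PIPELINES = [
    ("google/gemini-2.5-pro", "claude-code", "sonnet", 71.7),
    ("google/gemini-2.5-pro", "codex", "gpt-5.5", 71.0),
    ("anthropic/claude-opus-4.5", "codex", "gpt-5.5", 67.6),
    ("anthropic/claude-opus-4.5", "claude-code", "sonnet", 66.9),
    ("google/gemini-2.5-pro", "opencode", "deepseek/deepseek-v4-pro", 66.6),
    ("anthropic/claude-opus-4.5", "opencode", "deepseek/deepseek-v4-pro", 65.0),
]


def marginal(rows, key_index):
    """Average the scores of all rows that share the value at key_index.

    Assumption: an unweighted mean over pipelines, valid when every
    pipeline was scored on the same number of screens.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[3])
    return {key: mean(scores) for key, scores in groups.items()}


vision_marginals = marginal(PIPELINES, 0)   # grouped by vision model
harness_marginals = marginal(PIPELINES, 1)  # grouped by coding harness

# Print each marginal table, best score first.
for name, table in (("vision", vision_marginals), ("harness", harness_marginals)):
    for key, score in sorted(table.items(), key=lambda kv: -kv[1]):
        print(f"{name:8s} {key:28s} {score:.1f}")
```

Since each harness is paired with exactly one coding model in this run, grouping by coding model (index 2) gives the same numbers as grouping by harness.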