# Ramp Put Claude Code in Rollercoaster Tycoon—Proving Voice AI Was Right: Interfaces Matter More Than Intelligence
## Meta Description
Ramp's Claude Code RCT experiment reveals what voice AI already knew: agents don't fail from lack of intelligence—they fail from poor interfaces. DOM abstraction beats visual parsing just like CLI beats spatial reasoning.
---
Ramp just published "We Put Claude Code in Rollercoaster Tycoon."
**The experiment:** Mod Claude Code into RCT2, give it a CLI (rctctl), and let it manage theme parks through ASCII maps and terminal commands.
The post hit #2 on Hacker News with 228 points and 127 comments in 6 hours.
**But here's the insight buried in the technical writeup:**
Ramp discovered that Claude Code's limitations weren't about intelligence—they were about interface legibility.
> "The limiting factor for general-purpose agents is the legibility of their environments, and the strength of their interfaces. For this reason, we prefer to think of agents as automating diligence, rather than intelligence."
And voice AI for product demos has been operating on this exact principle since day one.
## What Ramp's RCT Experiment Actually Proved
Ramp's team modded Claude Code into RollerCoaster Tycoon 2 to understand agent capabilities for B2B SaaS.
**Why RCT specifically:**
> "Ramp needed a game that closely approximates customer-centric business operations and SaaS-powered digital feedback loops. There was simply no other choice."
**What they built:**
- Forked OpenRCT2 (the open-source RCT2 reimplementation)
- Created `rctctl`, an expansive CLI modeled on Kubernetes' `kubectl`
- Embedded a terminal running Claude Code in the game window
- Built an ASCII map system (Claude can't "see" the park, only text grids)
- Exposed full API access to park data, ride stats, guest feedback, and financials
**The result:**
Claude Code managed parks competently through pure text interfaces—but with revealing limitations.
## The Three Tiers of Agent Performance (And What They Reveal About Interfaces)
Ramp's writeup breaks Claude's RCT capabilities into three tiers.
Voice AI for demos operates at the same architectural level—just with better interfaces.
### Tier 1: Where Claude Excels (Information + Digital Levers)
**What Claude Code handled perfectly in RCT:**
**Game knowledge:**
> "Claude is surprisingly familiar with all things RCT, and also completely unfazed by the premise that it has been 'hacked into' a late-90's computer game."
**Gathering information:**
> "Claude excels at trawling through the game's diverse metrics and observability features. Claude switches between empathizing with aggregate guest thoughts and scrutinizing ride financials."
**Pulling digital levers:**
> "Claude is rock solid at adjusting configurations. Opening and closing rides, setting prices, hiring staff, and starting marketing campaigns. These actions don't require spatial reasoning, and they're most similar to traditional CLI operations."
**The pattern:**
**When Claude Code had structured data and clear APIs (financials, staff management, pricing), it performed flawlessly.**
**The voice AI parallel:**
Voice AI for demos operates in the same tier.
**Structured data available:**
- DOM tree (element hierarchy, attributes, text content)
- Page state (URL, active elements, form values)
- User intent (detected from natural language questions)
**Clear APIs available:**
- Reading current page structure
- Identifying clickable elements
- Detecting form fields and navigation
- Providing contextual guidance
**Both succeed because information legibility is high.**
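To make "high legibility" concrete, here is a minimal sketch (Python stdlib only; the sample page and class name are invented for illustration, not taken from any real product) of collecting interactive elements and their declared purpose from markup. The point is that elements announce what they are, so no pixel parsing or layout inference is needed:

```python
# Illustrative sketch: DOM-like markup is "high legibility" because
# elements declare their own semantics. Names here are hypothetical.
from html.parser import HTMLParser

INTERACTIVE = {"a", "button", "input", "select", "textarea"}

class InteractiveElementCollector(HTMLParser):
    """Collects interactive elements with their declared semantics."""
    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE:
            attr_map = dict(attrs)
            self.elements.append({
                "tag": tag,
                # The element states its purpose directly -- no visual parsing.
                "label": attr_map.get("aria-label") or attr_map.get("name") or "",
                "id": attr_map.get("id", ""),
            })

page = """
<form>
  <input id="email" name="email" type="email">
  <button id="submit" aria-label="Create account">Sign up</button>
</form>
"""

collector = InteractiveElementCollector()
collector.feed(page)
for el in collector.elements:
    print(el["tag"], el["id"], el["label"])
```

A screenshot-parsing agent would have to recover the same facts (this is a button, it creates an account) from rendered pixels; here they are one attribute lookup away.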
### Tier 2: Where Claude Struggles (Spatial Reasoning from Text)
**What Claude Code barely handled in RCT:**
**Placing shops/stalls and flat rides:**
> "Nearing the edge of Claude's in-game competencies are tasks that require a basic spatial understanding of the park. Even finding main pathways can take several iterative steps and cumulative reasoning."
**Pathways and connections:**
> "Locating main pathways, routing new paths, and connecting ride entrances/exits are all serious challenges for Claude. The added complexities of obstacles like rides, trees, fences, and terrain slopes can quickly combine to overwhelm Claude's loose grasp on the spatial reality."
**Roller coasters:**
> "Claude often gives up on optimal positioning and just flings rides to greenfield territories far away from main paths, and then fails to build the necessary long pathways to connect these endeavors."
**The limitation:**
**Claude received ASCII map grids like this:**
```
$ rctctl map area --x 44 --y 38
Map Area
--------
Anchor : (44, 38) top-left
Span : 16x16 tiles
X:44 45 46 47 48 49 50 51 52 53 54
Y
38 R P P P P . . E E . T
39 P P . S P S T P Q S .
40 P R S R P P . P Q S .
41 P S S S . P P P P P P
42 P . S S S P . . . . .
Legend
------
- R = Ride track/support
- P = Footpath
- . = Owned ground
- E = Ride or park entrance
```
**Ramp's analysis:**
> "So Claude is at a pretty steep visuo-spatial disadvantage, and this is most of the intuition you'll need to appreciate Claude's relative strengths and weaknesses in the game."
**The insight:**
**Text-based spatial representation is a bad interface for spatial tasks.**
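To see the difference, here is a hedged sketch of the same map fragment as structured coordinate data instead of an ASCII grid. The tile symbols follow the legend in the `rctctl` output above; the query helper is invented for illustration. Once coordinates are explicit, a spatial question becomes a trivial lookup rather than a reasoning exercise:

```python
# Two rows of the ASCII map above, transcribed as {(x, y): tile}.
# Symbols per the legend: R ride, P footpath, E entrance, . ground.
rows = {
    38: "R P P P P . . E E . T",
    39: "P P . S P S T P Q S .",
}
tiles = {
    (44 + i, y): sym
    for y, line in rows.items()
    for i, sym in enumerate(line.split())
}

def neighbors(x, y):
    """4-connected neighbors -- trivial once coordinates are explicit."""
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

# Illustrative query: which entrance tiles ('E') touch a footpath ('P')?
connected_entrances = [
    pos for pos, sym in tiles.items()
    if sym == "E" and any(tiles.get(n) == "P" for n in neighbors(*pos))
]
print(sorted(connected_entrances))
```

An agent reading the raw ASCII has to rebuild this coordinate model in its head on every turn; an agent given the structured form never loses track of it.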
**The voice AI design decision:**
Voice AI doesn't try to build spatial understanding from visual screenshots.
Instead, it reads the DOM—a structured representation of page elements that's **already designed for programmatic access**.
**Why DOM beats visual parsing:**
**Visual screenshot approach (analogous to RCT ASCII maps):**
- Parse pixels → Identify UI elements → Infer structure
- Lossy representation (what's a button? what's clickable?)
- Requires spatial reasoning from 2D image
- **Poor interface for programmatic access**
**DOM approach (structured data):**
- Read element tree → Identify interactive elements → Parse attributes
- Complete representation (tags, classes, IDs, ARIA labels)
- No spatial reasoning needed (elements declare their purpose)
- **Designed for programmatic access**
**The parallel:**
**RCT ASCII maps force Claude to do spatial reasoning from text.**
**DOM trees give voice AI structured data about page functionality.**
**One is a workaround. The other is the native interface.**
### Tier 3: Where Claude Completely Fails (Verticality + Custom Design)
**What Claude Code couldn't handle at all:**
**Verticality:**
> "Claude's spatial reasoning in the game is already at its limits in two dimensions. The third spatial dimension of inclined ground, underground construction, and any custom rollercoaster design work is basically out of the question. Unfortunately, this is some of the game's richest gameplay."
**Why it matters:**
**Even Tier 2 spatial tasks (placing rides on flat ground) barely worked.**
**Tier 3 spatial tasks (3D terrain, custom coaster design) were impossible.**
**Ramp's conclusion:**
> "As a mirror to real-world agent design: the limiting factor for general-purpose agents is the legibility of their environments, and the strength of their interfaces. For this reason, we prefer to think of agents as automating diligence, rather than intelligence."
**The key insight:**
**Agent failure isn't intelligence failure—it's interface failure.**
## The Three Lessons from RCT That Voice AI Already Implemented
### Lesson #1: Interface Legibility Determines Agent Capability
**Ramp's discovery:**
> "Environment legibility is key. Claude thrives with the clean and well-structured omniscience of RCT's built-in monitoring and control surfaces, and clearly struggles with text-based renderings of game space."
**What worked in RCT:**
- Park financials (structured data)
- Guest feedback (text aggregates)
- Ride stats (numerical metrics)
- Staff management (clear APIs)
**What didn't work in RCT:**
- ASCII spatial maps (text representation of 2D/3D space)
- Pathfinding (spatial reasoning from grid coordinates)
- Ride placement (visual task forced through text interface)
**The pattern:**
**When RCT provided structured data through clear APIs, Claude excelled.**
**When RCT forced spatial reasoning through text abstraction, Claude failed.**
**The voice AI design philosophy:**
Voice AI for demos was built on the same realization.
**What voice AI DOESN'T do:**
- Parse screenshots to understand page layout
- Infer UI structure from visual appearance
- Reconstruct spatial relationships from pixel data
- **Force visual tasks through bad interfaces**
**What voice AI DOES do:**
- Read DOM tree for structured page data
- Parse element attributes for semantic meaning
- Use ARIA labels and accessibility tree
- **Use interfaces designed for programmatic access**
**Why this works:**
**Ramp's RCT insight:** "Inventive text-based representations are fun to imagine, but probably impractical in most cases. This is a boundary for many agent tasks."
**Voice AI's approach:** Don't use inventive representations. Use the interface that already exists (DOM).
### Lesson #2: Agents Automate Diligence, Not Intelligence
**Ramp's framing:**
> "For this reason, we prefer to think of agents as automating diligence, rather than intelligence, for operational challenges."
**What this means in RCT:**
Claude Code doesn't "outsmart" the player. It **out-diligences** them.
**Tasks Claude handles better than humans:**
- Monitoring 100 data points simultaneously
- Cross-referencing guest complaints with ride stats
- Optimizing pricing across dozens of shops
- Tracking staff allocation patterns
- **Sustained attention to operational metrics**
**Tasks Claude doesn't replace:**
- Creative coaster design (requires spatial intuition)
- Strategic park layout (requires visual planning)
- Aesthetic decisions (subjective judgment)
- **High-level creative work**
**The insight:**
**Agents excel at operational diligence (monitoring, adjusting, optimizing).**
**Agents struggle with creative intuition (design, aesthetics, novel solutions).**
**The voice AI application:**
Voice AI for product demos operates in the "diligence" tier.
**What voice AI automates:**
- Monitoring user questions for intent signals
- Cross-referencing questions with current page state
- Providing contextual guidance based on DOM analysis
- Tracking common confusion patterns
- **Sustained attention to user behavior**
**What voice AI doesn't replace:**
- Product design decisions
- UX strategy
- Feature prioritization
- **High-level product work**
**The parallel:**
**RCT Claude Code:** Automates operational park management (pricing, staffing, maintenance).
**Voice AI:** Automates operational user guidance (answering questions, providing context).
**Both free humans for higher-level work by handling repetitive diligence tasks.**
### Lesson #3: Feedback Loops Trump Raw Capability
**Ramp's development insight:**
> "Development Loops. Coding agents thrive on feedback loops where they can prove to themselves that the implementation is complete and correct. This project was hampered by a broken iteration loop in which we had to manually QA the in-game terminal experience and functionality. Vibe coding is slow and frustrating when QA is a manual process."
**What this reveals:**
**Even with massive intelligence (Claude Opus 4.5), agents fail without tight feedback loops.**
**The RCT problem:**
1. Claude Code implemented new features
2. Required manual in-game testing to verify
3. No automated QA feedback
4. **Result: Slow iteration, many bugs**
**Ramp's solution:**
> "In-game Claude was my diligent playtester, and an integrated bug-report tool let it write bug reports directly into the repo for easy review by coding agents. This feedback loop was invaluable."
**The insight:**
**Feedback loops matter more than raw intelligence.**
**The voice AI architecture:**
Voice AI for demos was designed with feedback loops from day one.
**User question feedback loop:**
```
User asks question → Voice AI detects intent → Provides guidance → User completes task → Voice AI learns common workflows
```
**Product team feedback loop:**
```
Voice AI tracks questions → Product team sees common confusion → UI improvements → Fewer questions needed
```
**DOM validation feedback loop:**
```
Voice AI reads DOM → Identifies elements → User confirms guidance worked → Voice AI validates element detection accuracy
```
**The advantage:**
**RCT Claude Code:** Broke feedback loops (manual QA, no automated validation).
**Voice AI:** Built feedback loops into core architecture (question tracking, DOM validation, task completion signals).
**Result: Voice AI improves through usage. RCT Claude Code needed human intervention to improve.**
## What the HN Discussion Reveals About Agent Design
The 127 comments on Ramp's RCT post split into two camps:
### People Who Understand the Interface Lesson
> "This is brilliant. The key takeaway is that Claude's limits aren't about intelligence—they're about how information is presented."
> "The fact that Claude handles park management perfectly but fails at spatial tasks proves the interface is the bottleneck, not the model."
> "This validates what we've seen elsewhere: agents need structured APIs, not creative workarounds."
**The pattern:**
These commenters recognize that **agent capability is bounded by interface design, not model intelligence.**
### People Who Think It's About Model Limitations
> "Wait until GPT-5 or Claude Opus 5. Better models will handle spatial reasoning."
> "This just proves LLMs can't do spatial tasks yet. Give it another year."
> "The problem is Claude doesn't have vision access to the game screen."
**The misunderstanding:**
These commenters think **more intelligence solves bad interfaces.**
**Ramp's actual conclusion:**
> "The limiting factor for general-purpose agents is the legibility of their environments, and the strength of their interfaces."
**Why the "better models" argument misses the point:**
**Scenario:** Give Claude Opus 5 access to RCT through ASCII maps.
**Prediction:** Still struggles with spatial tasks (bad interface remains bad).
**Scenario:** Give current Claude Code access to RCT through structured APIs (ride positions as coordinates, pathways as graph data).
**Prediction:** Handles spatial tasks far better (good interface compensates).
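As a hedged sketch of what that structured-API scenario would look like: if walkable tiles were exposed as graph data (adjacency lists keyed by coordinates), connecting a ride entrance to the main path reduces to plain breadth-first search, with no spatial reasoning over ASCII required. The graph below is invented for illustration; nothing like it exists in `rctctl` as described:

```python
# Illustrative sketch: pathfinding is easy when the interface hands the
# agent graph data instead of a text rendering of space.
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency-list graph of walkable tiles."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection exists

# Walkable tiles as an adjacency list (coordinates are made up).
paths = {
    (0, 0): [(1, 0)],
    (1, 0): [(0, 0), (2, 0), (1, 1)],
    (2, 0): [(1, 0)],
    (1, 1): [(1, 0), (1, 2)],
    (1, 2): [(1, 1)],  # ride entrance tile
}

print(shortest_path(paths, (0, 0), (1, 2)))
```

The algorithm is decades old and model-free; the entire difficulty Ramp observed comes from forcing the agent to reconstruct the graph from a text grid first.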
**The insight:**
**Interface design multiplies agent capability more than raw intelligence does.**
### The Voice AI Validation
**One commenter gets it:**
> "This is why DOM-based agents work better than screenshot-parsing agents. You're not fighting the interface—you're using the interface that already exists for programmatic access."
**Exactly.**
**Screenshot-parsing approach:** Force visual task through model capabilities.
**DOM-based approach:** Use interface designed for programmatic access.
**ASCII map approach (RCT):** Force spatial task through text abstraction.
**Structured API approach (what Ramp wishes they had):** Use interface designed for programmatic spatial data.
**Voice AI chose the DOM path from day one.**
## The Bottom Line: Ramp Spent Weeks Proving What Voice AI Knew From Day One
Ramp's Claude Code RCT experiment is a masterclass in agent design.
**The three discoveries:**
1. **Interface legibility determines capability** (structured data > text abstraction)
2. **Agents automate diligence, not intelligence** (operational tasks > creative intuition)
3. **Feedback loops trump raw capability** (tight iteration > manual QA)
**Voice AI for product demos was built on all three principles:**
**Principle #1 applied:**
- Use DOM (structured interface) instead of screenshots (visual abstraction)
- Read accessibility tree instead of parsing pixels
- **Result: High-legibility environment**
**Principle #2 applied:**
- Automate operational guidance (answering questions, providing context)
- Don't replace product design or UX strategy
- **Result: Agents handle diligence, humans handle creativity**
**Principle #3 applied:**
- Built-in question tracking (usage feedback)
- DOM validation loops (accuracy feedback)
- Task completion signals (success feedback)
- **Result: System improves through usage**
**The progression:**
**Ramp's journey:** Build RCT experiment → Discover interface legibility matters → Conclude agents automate diligence.
**Voice AI's journey:** Start with interface legibility assumption → Build for operational diligence → Validate through feedback loops.
**The difference:**
**Ramp proved the principle through experimentation.**
**Voice AI shipped the principle as product.**
---
**Ramp put Claude Code in Rollercoaster Tycoon and discovered that agents fail from bad interfaces, not lack of intelligence.**
**Voice AI for demos was designed on this exact assumption:**
**Don't force visual tasks through bad interfaces (screenshots).**
**Use interfaces designed for programmatic access (DOM).**
**Don't claim to automate intelligence (creative design).**
**Automate operational diligence (contextual guidance).**
**Don't break feedback loops (manual QA).**
**Build feedback loops into architecture (question tracking, DOM validation).**
**Ramp's experiment validates what voice AI already knew:**
**The limiting factor for agents isn't intelligence—it's interface legibility.**
**And products with high-legibility interfaces (DOM, structured APIs) unlock agent capability that products with low-legibility interfaces (visual parsing, spatial reasoning) never achieve.**
**RCT taught Ramp that agents automate diligence through good interfaces.**
**Voice AI teaches products the same lesson—just for user onboarding instead of theme parks.**
**Both prove the same principle:**
**Interfaces matter more than intelligence.**
**And the products that win aren't the ones with the smartest agents—they're the ones with the most legible interfaces.**
---
**Want to see interface-first agent design in action?** Try voice-guided demo agents:
- Reads DOM (not screenshots) for structured page data
- Provides contextual guidance based on current page state
- Automates operational diligence (answers questions, guides workflows)
- Includes built-in feedback loops (question tracking, DOM validation)
- **Built on Ramp's RCT lesson: interface legibility determines agent capability**
**Built with Demogod—AI-powered demo agents proving that the best interfaces don't require the smartest models, and the smartest models can't compensate for bad interfaces.**
*Learn more at [demogod.me](https://demogod.me)*