# Soft Delete Pollutes Live Tables With Dead Data—Voice AI for Demos Proves Why Guidance Must Read Clean DOM, Not Mixed Hallucinations
*Hacker News #12 (85 points, 57 comments, 4hr): Database engineer reveals soft delete's hidden cost—archived_at columns make queries complex, indexes bloated, migrations risky, and restoration buggy. The solution: separate live data from archived data. Voice AI applies the same pattern to demos.*
---
## The Simple Solution That Oozes Complexity Everywhere
A developer writes about their experience with soft delete patterns:
"Adding an `archived_at` column seems to ooze complexity out into queries, operations, and applications. Recovering deleted records does happen, but 99% of archived records are never going to be read."
The pattern is seductive:
**Instead of:**
```sql
DELETE FROM users WHERE id = '123';
-- User is gone forever
```
**You do:**
```sql
UPDATE users SET archived_at = NOW() WHERE id = '123';
-- User still exists, just "archived"
```
**The pitch:** Customers can recover accidentally deleted data. Customer support teams don't need to restore from backups. Compliance requirements are met.
**The reality:** Your database tables become graveyards full of dead data that every query must step over to find what's actually alive.
## The Seven Ways Soft Delete Creates Complexity
### Problem #1: Every Query Gets More Complex
**Without soft delete:**
```sql
SELECT * FROM users WHERE email = 'user@example.com';
```
**With soft delete:**
```sql
SELECT * FROM users
WHERE email = 'user@example.com'
AND archived_at IS NULL;
```
The article: "Applications need to make sure they always avoid the archived data that's sitting right next to the live data. Manual queries run for debugging or analytics are longer and more complicated. There's always a risk that archived data accidentally leaks in when it's not wanted."
Every. Single. Query. Must remember to filter out the dead rows.
### Problem #2: Indexes Become Bloated
When you have millions of archived rows and thousands of live rows, your indexes contain 99% dead data.
**The index structure:**
```
users_email_idx:
- deleted_user_1@example.com → archived_at: 2024-03-15
- deleted_user_2@example.com → archived_at: 2024-04-20
- deleted_user_3@example.com → archived_at: 2024-05-10
...
- live_user_1@example.com → archived_at: NULL
- live_user_2@example.com → archived_at: NULL
```
The article: "Indexes need to be careful to avoid archived rows."
PostgreSQL must scan past millions of archived entries to find the handful of live users. A partial index built `WHERE archived_at IS NULL` mitigates this, but every index needs that clause and someone must remember to add it. Without it, your index is no longer an optimization—it's a performance liability.
### Problem #3: Backups Become Massive and Slow
The article: "If your project is popular, you might have a giant database full of dead data that takes a long time to recreate from a dump file."
**The scenario:** You need to restore a database backup (hopefully for testing, not because production died at 11 AM).
**Your backup contains:**
- 10,000 live user records
- 5,000,000 archived user records (from years of deletions)
Restoring takes hours. 99.8% of that time is spent loading data nobody will ever read.
### Problem #4: Migrations Become Risky
The article: "Migrations may involve more than just schema changes – perhaps you need to fix a mistake with default values, or add a new column and backfill values. Is that going to work on records from 2 years ago? I've done migrations where these questions were not trivial to answer."
**The migration:**
```sql
ALTER TABLE users ADD COLUMN account_tier TEXT NOT NULL DEFAULT 'free';
```
**The problem:** You're backfilling values for 5 million archived users who had accounts when your product worked completely differently. Do those old accounts even have valid `account_tier` values? Should `archived_at` users get the new column at all?
Your migration code must handle edge cases for data that's effectively dead.
### Problem #5: Deletion Creates More Dead Rows
The article mentions APIs that "didn't work well with Terraform, so Terraform would delete + recreate records on every run, and over time that led to millions of dead rows."
**The pattern:**
```bash
terraform apply
# Deletes 100 records (marks archived_at)
# Creates 100 new records
# Net result: 100 live records, 100 new dead records
terraform apply # (2nd run)
# Deletes 100 records (marks archived_at)
# Creates 100 new records
# Net result: 100 live records, 200 dead records
terraform apply # (100th run)
# Net result: 100 live records, 10,000 dead records
```
Over time, your "live" table is 99% graveyard.
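The accumulation arithmetic can be sketched directly. This is a toy model of the delete-and-recreate cycle, not real Terraform state:

```python
# Toy model of dead-row accumulation under delete + recreate cycles.
# Each apply soft-deletes one batch of records and creates a fresh batch.

def rows_after(applies, batch=100):
    """Return (live, dead) row counts after the given number of applies."""
    live = batch            # only the most recent batch is live
    dead = batch * applies  # every earlier batch is still in the table
    return live, dead

for n in (1, 2, 100):
    live, dead = rows_after(n)
    total = live + dead
    print(f"after {n} applies: {live} live, {dead} dead "
          f"({dead / total:.0%} of the table is graveyard)")
```

After 100 runs the table is about 99% dead rows, exactly the "graveyard" the article describes.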
### Problem #6: Restoration Code Is Always Buggy
The article: "Restoring an archived record is not always as simple as just running `SET archived_at = null` – creating a record may involve making calls to external systems as well."
**The restoration workflow:**
1. User deletes account (sets `archived_at`)
2. Deleting account cancels Stripe subscription
3. Deleting account removes user from email list
4. Deleting account revokes API keys
5. User wants to restore account (sets `archived_at = NULL`)
6. **Now what?**
The article: "I've seen complex restoration code that was always a buggy, partial implementation of the 'create' API endpoint. In the end, we removed the specialized restoration code and required all restoration to go through the standard APIs – that simplified the server implementation, and ensured that old data that had since become invalid, could not be restored incorrectly – it needs to pass the new validation rules."
Restoration becomes its own feature with its own bugs.
### Problem #7: Nobody Cleans Up the Archive
The article: "Hopefully, the project decided on a retention period in the beginning, and set up a periodic job to clean up those rows. Unfortunately, I'd bet that a significant percentage of projects did neither – it's really easy to ignore the archived data for a long time."
**Reality check:** How many projects with `archived_at` columns actually have automated cleanup jobs running?
Most teams:
1. Add `archived_at` column for "soft delete"
2. Ship feature
3. Never think about it again
4. Discover years later they have 10 million archived rows
## The Pattern: Mixing Live Data With Dead Data Creates Overhead Everywhere
The article's core insight: "I'm not a fan of the `archived_at` column approach. It's simple at first, but in my experience, it's full of pitfalls down the line."
**Why it fails:**
Live data and archived data have different access patterns:
- **Live data:** Read constantly, updated frequently, must be fast
- **Archived data:** Read rarely (if ever), never updated, can be slow
Storing them in the same table forces every operation to handle both:
- Queries must filter out archived rows
- Indexes must skip archived entries
- Migrations must handle archived edge cases
- Backups must dump archived bytes
**The overhead is constant and unavoidable.**
## The Three Alternative Approaches (And Why Triggers Win)
The article proposes three solutions for separating live data from archived data:
### Solution #1: Application-Level Events (SQS + S3)
**How it works:**
- When a record is deleted, application emits an event to SQS
- Background service consumes event and writes archived record to S3
**Benefits:**
- Primary database stays clean (no archived rows)
- Async processing improves performance
- Archived records serialized in application-friendly JSON format
**Tradeoffs:**
- More infrastructure to operate (SQS, background services)
- Risk of bugs silently dropping archived data (the author reports this happened multiple times)
- Archived S3 objects not easy to query without extra tooling
### Solution #2: Database Triggers (Separate Archive Table)
**How it works:**
- Create generic `archive` table that stores JSON blobs:
```sql
CREATE TABLE archive (
  id UUID PRIMARY KEY,
  table_name TEXT NOT NULL,
  record_id TEXT NOT NULL,
  data JSONB NOT NULL,
  archived_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
```
- Attach `BEFORE DELETE` trigger to every table:
```sql
CREATE TRIGGER archive_users
BEFORE DELETE ON users
FOR EACH ROW
EXECUTE FUNCTION archive_on_delete();
```
- Trigger function converts deleted row to JSON and inserts into `archive`
**Benefits:**
- Live tables stay clean (no `archived_at` columns, no dead rows)
- Queries don't need to filter archived records
- Indexes stay efficient
- Migrations only deal with live data
- Backups of main tables are smaller
- Archive table can live in separate tablespace or be partitioned
**Tradeoffs:**
- Triggers add overhead to deletes (but deletes are typically infrequent)
- Archive table grows (but cleanup is trivial: `WHERE archived_at < NOW() - INTERVAL '90 days'`)
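This pattern can be exercised end to end in miniature with Python's built-in `sqlite3`, which also supports `BEFORE DELETE` triggers. A minimal sketch, not production code; it assumes an SQLite build with the JSON1 functions (standard in recent Python distributions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT NOT NULL);

-- Generic archive table: one JSON blob per deleted row.
CREATE TABLE archive (
    table_name  TEXT NOT NULL,
    record_id   TEXT NOT NULL,
    data        TEXT NOT NULL,
    archived_at TEXT NOT NULL DEFAULT (datetime('now'))
);

-- BEFORE DELETE trigger copies the doomed row into the archive.
CREATE TRIGGER archive_users BEFORE DELETE ON users
BEGIN
    INSERT INTO archive (table_name, record_id, data)
    VALUES ('users', OLD.id, json_object('id', OLD.id, 'email', OLD.email));
END;
""")

conn.execute("INSERT INTO users VALUES ('123', 'user@example.com')")
conn.execute("DELETE FROM users WHERE id = '123'")

# Live table is clean; the archive holds the JSON snapshot.
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 0
print(conn.execute("SELECT data FROM archive").fetchone()[0])
```

The live table never sees an `archived_at` column, and cleanup is a single `DELETE` against `archive`.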
### Solution #3: WAL-Based CDC (Debezium/Kafka)
**How it works:**
- PostgreSQL writes every change to Write-Ahead Log (WAL)
- CDC tool (Debezium) reads WAL and streams changes to Kafka
- Consumer filters for DELETE events and writes archived records to storage
**Benefits:**
- Captures all changes without modifying application code or triggers
- Can stream to any destination (S3, Elasticsearch, data warehouses)
- Primary database has no additional query load
**Tradeoffs:**
- Significant operational complexity (Kafka cluster, Debezium, monitoring)
- Risk to primary database if consumers fall behind (WAL fills disk)
- Schema changes require coordination between source and consumers
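Debezium specifics aside, the consumer side of this pipeline reduces to a small filter. A toy sketch with plain dicts standing in for change events (the `op: "d"` code mirrors Debezium's convention, but the overall event shape here is invented for illustration):

```python
# Toy CDC consumer: pick DELETE events out of a change stream and keep
# the last row image in an archive store (a dict stands in for S3 or a
# warehouse). Event shape is illustrative, not Debezium's actual schema.

def consume(events, archive_store):
    for event in events:
        if event["op"] == "d":  # 'd' = delete in Debezium's op codes
            key = (event["table"], event["before"]["id"])
            archive_store[key] = event["before"]  # row image before deletion

archive = {}
stream = [
    {"op": "c", "table": "users", "after": {"id": "123", "email": "user@example.com"}},
    {"op": "d", "table": "users", "before": {"id": "123", "email": "user@example.com"}},
]
consume(stream, archive)
print(archive[("users", "123")]["email"])  # user@example.com
```

The logic is simple; the operational weight is in running Kafka, Debezium, and the monitoring around them.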
## Why Triggers Are the Goldilocks Solution
The article's conclusion: "If I were starting a new project today and needed soft delete, I'd reach for the trigger-based approach first. It's simple to set up, keeps live tables clean, and doesn't require extra infrastructure."
**Why triggers win:**
1. **Clean separation:** Live data in `users` table, archived data in `archive` table
2. **No query complexity:** `SELECT * FROM users` returns only live users (no `WHERE archived_at IS NULL`)
3. **Efficient indexes:** Indexes only contain live data
4. **Simple cleanup:** `DELETE FROM archive WHERE archived_at < NOW() - INTERVAL '90 days'`
5. **No extra infrastructure:** No SQS, no Kafka, no background services
6. **Easy to query:** Archived data is in PostgreSQL, query it with SQL when needed
**The core principle:** Separate live data from archived data at the storage layer.
## The Parallel: Chatbot Demos Mix Live DOM With Dead Hallucinations
The soft delete problem mirrors the chatbot demo problem exactly:
### Soft Delete Problem (Database)
**Pattern:**
- Live user records + archived user records in same `users` table
- Every query must filter `WHERE archived_at IS NULL`
- Indexes bloated with 99% archived data
- Migrations must handle archived edge cases
**Why it fails:** Mixing live data with dead data creates overhead everywhere.
### Chatbot Demo Problem (User Interface)
**Pattern:**
- Actual page content + hallucinated AI responses in same conversation
- User must verify every AI statement against actual page
- Memory bloated with conversation history
- Page updates (DOM changes) don't update AI's outdated responses
**Why it fails:** Mixing live DOM with dead hallucinations creates overhead everywhere.
## The Three Patterns of Mixed Data
### Pattern #1: Soft Delete Mixes Live Users With Archived Users
**The `users` table:**
```
| id | email | archived_at |
|-----|-------------------|-------------|
| 001 | user@example.com | NULL | ← LIVE
| 002 | old@example.com | 2024-01-15 | ← DEAD
| 003 | gone@example.com | 2024-02-20 | ← DEAD
| 004 | active@test.com | NULL | ← LIVE
```
**Every query must filter:**
```sql
SELECT * FROM users WHERE archived_at IS NULL;
```
**The overhead:** Database must scan past dead rows to find live data.
### Pattern #2: Chatbot Demos Mix Live DOM With Dead Responses
**The conversation context:**
```
[
  { role: "user", content: "What features does this plan have?" },
  { role: "assistant", content: "The Pro plan includes 10GB storage..." }, ← DEAD (DOM changed)
  { role: "user", content: "What about the Enterprise plan?" },
  { role: "assistant", content: "Enterprise includes 100GB..." }, ← DEAD (hallucinated)
  { role: "user", content: "Does Pro include API access?" },
  { role: "assistant", content: "No, API access is Enterprise only" } ← DEAD (wrong)
]
```
**Every user question must verify:**
- Is this response based on current DOM?
- Is this response hallucinated?
- Has the page changed since this response?
**The overhead:** User must fact-check AI responses against actual page.
### Pattern #3: Both Create Separation Problems
**Database soft delete:**
- Live data: Fast reads required
- Archived data: Rarely read, can be slow
- Stored together → Live reads become slow
**Chatbot demos:**
- Live DOM: Current page state, accurate
- AI responses: Generated from past context, potentially stale/wrong
- Mixed together → Users can't trust responses
## Why Voice AI Reads Clean DOM (No Hallucination Mixing)
The trigger-based archive solution reveals why voice AI must read the DOM directly:
### Trigger Approach = Clean Separation
**Database design:**
```sql
CREATE TABLE users (
  id UUID PRIMARY KEY,
  email TEXT NOT NULL
  -- No archived_at column
);

CREATE TABLE archive (
  id UUID PRIMARY KEY,
  table_name TEXT,
  record_id TEXT,
  data JSONB
);
```
**The pattern:**
- Live users in `users` table
- Archived users in `archive` table
- Never mixed
**Voice AI design:**
```javascript
// Read live DOM structure
const headings = [...document.querySelectorAll('h1, h2, h3')]; // spread into an array: NodeList has no .map
const buttons = document.querySelectorAll('button');
const forms = document.querySelectorAll('form');
// Describe what exists NOW
voiceAI.say(`The pricing page shows three plans: ${headings.map(h => h.textContent).join(', ')}`);
// No conversation history
// No hallucinated responses
// No archived answers
```
**The pattern:**
- Live DOM in browser
- Voice AI reads current state
- Never mixes with past responses
## The Three Reasons Separation Beats Mixing
### Reason #1: Live Data Performance Isn't Penalized by Dead Data
**Soft delete (mixed):**
- Query: `SELECT * FROM users WHERE archived_at IS NULL;`
- Database scans 5,000,000 archived rows + 10,000 live rows
- Index contains 99.8% dead data
- Performance degrades as archived data grows
**Trigger approach (separated):**
- Query: `SELECT * FROM users;`
- Database scans 10,000 live rows
- Index contains 100% live data
- Performance constant regardless of archive size
**Voice AI (separated):**
- User asks: "What plans are available?"
- Voice AI reads current DOM (headings, prices)
- No conversation history to search
- Response time constant regardless of session length
### Reason #2: Cleanup Is Trivial
**Soft delete (mixed):**
```sql
DELETE FROM users WHERE archived_at < NOW() - INTERVAL '90 days';
-- Must search entire table to find old archived rows
-- Impacts live queries while running
```
**Trigger approach (separated):**
```sql
DELETE FROM archive WHERE archived_at < NOW() - INTERVAL '90 days';
-- Only touches archive table
-- Zero impact on live queries
```
**Voice AI (separated):**
- No cleanup needed
- Each request reads fresh DOM
- No conversation history to expire
### Reason #3: Migrations Don't Touch Dead Data
**Soft delete (mixed):**
```sql
ALTER TABLE users ADD COLUMN account_tier TEXT NOT NULL DEFAULT 'free';
-- Backfills value for 5,000,000 archived users
-- Must handle edge cases for archived data from 2 years ago
```
**Trigger approach (separated):**
```sql
ALTER TABLE users ADD COLUMN account_tier TEXT NOT NULL DEFAULT 'free';
-- Only touches 10,000 live users
-- Archived data preserved as JSON snapshot, never migrated
```
**Voice AI (separated):**
- Page structure changes (new pricing plan added)
- Voice AI reads new DOM structure
- No need to "migrate" past responses to new schema
## The Restoration Problem Proves Why Generation Is Dangerous
The article reveals a critical insight about restoration:
"Restoring an archived record is not always as simple as just running `SET archived_at = null` – creating a record may involve making calls to external systems as well. I've seen complex restoration code that was always a buggy, partial implementation of the 'create' API endpoint."
**The restoration workflow failure:**
1. User deletes account (sets `archived_at`, cancels Stripe subscription, revokes API keys)
2. User wants to restore account (sets `archived_at = NULL`)
3. Restoration code must recreate Stripe subscription, regenerate API keys, re-add to email lists
4. **Restoration code is always incomplete or buggy**
5. Team gives up and requires users to create new accounts instead
**The lesson:** Recreating state from archived data is harder than it looks.
### Chatbot Restoration Problem
**The same failure mode:**
1. User asks: "What's included in the Pro plan?"
2. Chatbot generates response: "Pro includes 10GB storage, 5 users, email support"
3. Product team changes Pro plan (now 20GB storage, 10 users, chat support)
4. User asks follow-up: "Can I upgrade later?"
5. Chatbot must "restore" context: Does it use old plan details (10GB) or new plan details (20GB)?
6. **Chatbot generates inconsistent response mixing old and new information**
**The lesson:** Regenerating answers from mixed context (old conversation + new DOM) is unreliable.
### Voice AI Avoids Restoration Entirely
**Voice AI approach:**
1. User asks: "What's included in the Pro plan?"
2. Voice AI reads current DOM: `document.querySelector('.pro-plan').textContent`
3. Voice AI describes: "Pro includes 20GB storage, 10 users, chat support"
4. Product team changes Pro plan
5. User asks follow-up: "Can I upgrade later?"
6. Voice AI reads current DOM again: Always fresh, always accurate
**No restoration needed. No inconsistency possible.**
## The Foreign Key Cascade Problem
The article introduces an advanced problem with soft delete: foreign key cascades.
**The scenario:**
```sql
-- User has many documents
CREATE TABLE users (id UUID PRIMARY KEY);
CREATE TABLE documents (
  id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(id) ON DELETE CASCADE
);
```
**Without soft delete:**
```sql
DELETE FROM users WHERE id = '123';
-- PostgreSQL automatically deletes all documents for user 123
```
**With soft delete:**
```sql
UPDATE users SET archived_at = NOW() WHERE id = '123';
-- Documents are NOT automatically archived
-- You must manually find and archive all related documents
```
**The complexity:**
```sql
-- Application must manually cascade soft delete
UPDATE documents SET archived_at = NOW() WHERE user_id = '123';
UPDATE comments SET archived_at = NOW() WHERE document_id IN (
  SELECT id FROM documents WHERE user_id = '123'
);
UPDATE attachments SET archived_at = NOW() WHERE comment_id IN (
  SELECT id FROM comments WHERE document_id IN (
    SELECT id FROM documents WHERE user_id = '123'
  )
);
-- And so on for every related table...
```
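For contrast, hard deletes let the database do the cascading. A minimal `sqlite3` sketch (SQLite enforces foreign keys only when switched on explicitly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for ON DELETE CASCADE in SQLite
conn.executescript("""
CREATE TABLE users (id TEXT PRIMARY KEY);
CREATE TABLE documents (
    id TEXT PRIMARY KEY,
    user_id TEXT REFERENCES users(id) ON DELETE CASCADE
);
""")
conn.execute("INSERT INTO users VALUES ('123')")
conn.executemany("INSERT INTO documents VALUES (?, '123')", [("d1",), ("d2",)])

# One statement: the engine removes the user's documents automatically.
conn.execute("DELETE FROM users WHERE id = '123'")
print(conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0])  # 0
```

One `DELETE` replaces the whole hand-written cascade above, which is what makes the trigger-based archive attractive: you keep real deletes and real foreign keys.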
**The trigger solution:**
The article shows how triggers handle this using session variables:
```sql
CREATE OR REPLACE FUNCTION archive_on_delete() RETURNS TRIGGER AS $$
DECLARE
  cause_table TEXT;
  cause_id TEXT;
BEGIN
  -- Track what caused this deletion; transaction-local settings are
  -- visible to the cascaded deletes in the same transaction
  cause_table := nullif(current_setting('archive.cause_table', true), '');
  cause_id := nullif(current_setting('archive.cause_id', true), '');
  IF cause_table IS NULL THEN
    -- This is the root delete: record this row as the cause
    cause_table := TG_TABLE_NAME;
    cause_id := OLD.id::TEXT;
    PERFORM set_config('archive.cause_table', cause_table, true);
    PERFORM set_config('archive.cause_id', cause_id, true);
  END IF;
  -- Archive with cause information
  INSERT INTO archive (table_name, record_id, data, caused_by_table, caused_by_id)
  VALUES (TG_TABLE_NAME, OLD.id::TEXT, to_jsonb(OLD), cause_table, cause_id);
  RETURN OLD;
END;
$$ LANGUAGE plpgsql;
```
**Now:**
```sql
DELETE FROM users WHERE id = '123';
-- PostgreSQL cascades to documents, comments, attachments
-- Each delete fires trigger with cause tracking
```
**Query archived cascade:**
```sql
SELECT * FROM archive
WHERE caused_by_table = 'users'
AND caused_by_id = '123';
-- Returns all documents, comments, attachments archived due to user deletion
```
## Why Chatbot Demos Have No Cascade Tracking
The foreign key cascade problem reveals a critical gap in chatbot demos:
**Chatbot conversation:**
```
User: "What features are in the Pro plan?"
Chatbot: "Pro includes 10GB storage, 5 users, email support"
↓ (related to "Pro plan")
User: "How much does Pro cost?"
Chatbot: "$29/month"
↓ (related to "Pro plan")
User: "Can I add more users?"
Chatbot: "Yes, $5 per additional user"
↓ (related to "Pro plan")
```
**Product team changes Pro plan:**
- 10GB → 20GB storage
- 5 users → 10 users
- $29/month → $39/month
**Problem:** Chatbot has no "cascade tracking" to invalidate all answers related to Pro plan.
**User asks:** "So I can get 10 users for $29/month?"
**Chatbot:** [Generates response mixing old context with new reality]
**No cascade occurs.** Past responses about Pro plan remain in conversation context even though the plan details changed.
### Voice AI Has Implicit Cascade Via DOM Reading
**Voice AI approach:**
```
User: "What features are in the Pro plan?"
Voice AI: [Reads .pro-plan DOM] "Pro includes 20GB storage, 10 users, chat support"
User: "How much does Pro cost?"
Voice AI: [Reads .pro-plan .price DOM] "$39/month"
User: "Can I add more users?"
Voice AI: [Reads .pro-plan .add-ons DOM] "Yes, $5 per additional user"
```
**Product team changes Pro plan:**
- DOM updates automatically
- Every voice AI response re-reads current DOM
- **Cascade is implicit: All answers automatically reflect new plan details**
**No tracking needed. Reading reality provides cascade for free.**
## The Replica That Doesn't Process Deletes (Thought Experiment)
The article ends with an interesting thought experiment:
"What if you kept a PostgreSQL replica (e.g. using logical replication) that just didn't process DELETE queries? Would it effectively accumulate records and updates without conflict over time?"
**The idea:**
- Primary database: Processes INSERT, UPDATE, DELETE normally
- Archive replica: Processes INSERT and UPDATE, **ignores DELETE**
- Result: Archive replica has all records ever created, never deleted
**Benefits:**
- Archive is fully queryable (it's a real PostgreSQL database)
- Finding deleted records is easy (query the replica)
- No `archived_at` columns needed
**The transformation variation:**
Instead of ignoring DELETE entirely:
```sql
-- Transform DELETE into UPDATE on replica
DELETE FROM users WHERE id = '123';
-- Becomes on replica:
UPDATE users SET deleted_at = NOW() WHERE id = '123';
```
**Now you can query:**
```sql
-- Show deleted users
SELECT * FROM users WHERE deleted_at IS NOT NULL;
-- Show users deleted in last hour
SELECT * FROM users WHERE deleted_at > NOW() - INTERVAL '1 hour';
```
**Tradeoffs:**
- Cost: Running a full replica is expensive
- Complexity: Managing logical replication and transformation rules
- Migrations: Does the replica run migrations on deleted data?
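The transformation variation is easy to model: a replication consumer that applies inserts and updates normally but rewrites deletes as soft deletes on the replica copy. A toy in-memory sketch (the event shape is invented for illustration):

```python
from datetime import datetime, timezone

# Replica copy of the users table: id -> row dict.
replica = {}

def apply_event(event):
    """Apply one change event to the replica, transforming DELETE into
    a soft-delete UPDATE so history accumulates instead of vanishing."""
    op, row_id = event["op"], event["id"]
    if op == "insert":
        replica[row_id] = dict(event["row"], deleted_at=None)
    elif op == "update":
        replica[row_id].update(event["row"])
    elif op == "delete":
        # The primary really deletes; the replica just timestamps the row.
        replica[row_id]["deleted_at"] = datetime.now(timezone.utc)

apply_event({"op": "insert", "id": "123", "row": {"email": "user@example.com"}})
apply_event({"op": "delete", "id": "123"})

# The "deleted" user is still queryable on the replica.
print(replica["123"]["email"], replica["123"]["deleted_at"] is not None)
```

The primary stays clean; only the replica carries the `deleted_at` bookkeeping, which is the whole point of the thought experiment.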
## Why Voice AI Is Like the Replica That Only Processes Current State
The replica thought experiment reveals the voice AI pattern:
**Archive replica approach:**
- Primary DB: Contains current state (live users)
- Archive replica: Contains all historical state (live + deleted users)
- Application reads from primary (clean, current data)
- Customer support reads from replica (can see deleted users)
**Voice AI approach:**
- Page DOM: Contains current state (actual content)
- LLM training data: Contains all historical web content
- Voice AI reads from DOM (clean, current page structure)
- LLM context doesn't pollute guidance (no conversation mixing)
**The parallel:**
- Database replica separates live state from historical state
- Voice AI separates current DOM from LLM knowledge
Both avoid mixing live data with archived data.
## The Verdict: Separate Live Data from Dead Data
The HN article proves that soft delete's core problem is **mixing live data with archived data in the same table**.
The solution: **Separate them at the storage layer** (triggers, CDC, replicas).
The lesson for demo guidance: **Separate live DOM from dead hallucinations**.
**Soft delete pattern (mixing):**
- Live users + archived users in same table
- Every query must filter `WHERE archived_at IS NULL`
- Performance degrades as archives grow
- Migrations handle dead data
- Restoration code is buggy
**Trigger pattern (separation):**
- Live users in `users` table
- Archived users in `archive` table
- Queries are simple: `SELECT * FROM users`
- Performance constant
- Migrations only touch live data
- Archived data preserved as-is
**Chatbot demo pattern (mixing):**
- Live DOM + hallucinated responses in same conversation
- Every answer must be verified against page
- Accuracy degrades as conversation grows
- Context updates don't fix past responses
- Inconsistency inevitable
**Voice AI pattern (separation):**
- Live DOM in browser
- Voice AI reads current state
- No conversation history
- Accuracy constant
- DOM updates automatically reflected
- Hallucinations impossible
## The Three Lessons from Soft Delete for Demo Guidance
### Lesson #1: Mixing Live and Dead Data Creates Constant Overhead
**Soft delete:** Every query must filter `WHERE archived_at IS NULL` forever.
**Chatbot demos:** Every response must be verified against page forever.
**The cost:** Overhead doesn't decrease. It's permanent.
### Lesson #2: Separation Makes Cleanup Trivial
**Trigger approach:** `DELETE FROM archive WHERE archived_at < NOW() - INTERVAL '90 days'` (doesn't impact live tables)
**Voice AI:** No cleanup needed (reads fresh DOM every time)
**The benefit:** Maintenance is either trivial or unnecessary.
### Lesson #3: Live Data Performance Shouldn't Depend on Archive Size
**Soft delete:** As archived rows grow, live queries slow down (indexes bloated with dead data).
**Voice AI:** As session length grows, guidance stays fast (no conversation history to search).
**The principle:** Performance of live operations must be independent of historical data volume.
## The Alternative: `archived_at` Columns Everywhere
Imagine if the article's advice was:
**Bad database advice:**
- Add `archived_at` columns to every table
- Make every query filter `WHERE archived_at IS NULL`
- Let archived rows accumulate forever
- Write buggy restoration code that never works
**Why this fails:** Mixing live and archived data creates overhead that compounds with every feature.
Demo guidance has the same failure mode:
**Bad demo guidance:**
- Generate responses from LLM training data
- Make users verify every response against page
- Let conversation history accumulate
- Write inconsistent follow-ups that mix old and new info
**Why this fails:** Mixing live DOM with dead hallucinations creates overhead that compounds with every question.
Voice AI provides the separated version:
**Strategic demo guidance:**
- Read DOM structure directly (no generation)
- Users trust accuracy immediately (no verification needed)
- No conversation history (stateless guidance)
- Consistent responses (always reflects current DOM)
**Why this works:** Separation avoids overhead entirely.
## The Pattern: Storage Layer Separation Prevents Application Layer Complexity
The article's trigger solution reveals a critical principle:
**Storage layer separation (database):**
```sql
-- GOOD: Separate tables
CREATE TABLE users (id, email);
CREATE TABLE archive (table_name, record_id, data);
-- BAD: Mixed table
CREATE TABLE users (id, email, archived_at);
```
**When separation happens at storage layer:**
- Application code is simple: `SELECT * FROM users` (no filtering)
- Indexes are efficient: Only contain live data
- Migrations are safe: Only touch live data
- Performance is predictable: Doesn't degrade with archive size
**When separation happens at application layer:**
- Application code is complex: Every query filters `WHERE archived_at IS NULL`
- Indexes are bloated: Contain 99% dead data
- Migrations are risky: Must handle archived edge cases
- Performance degrades: Scales with archive size
**The lesson:** Architectural separation at the storage layer prevents complexity from leaking into every application operation.
### Demo Guidance Has the Same Choice
**Storage layer separation (voice AI):**
- Read DOM directly (storage = browser's DOM tree)
- Application code is simple: Query selectors return current elements
- No conversation history (no mixed state)
- Performance is constant (DOM queries are fast)
**Application layer separation (chatbot):**
- Generate responses with LLM
- Application code is complex: Must verify responses against page, detect stale info, handle inconsistencies
- Conversation history mixed with page state
- Performance degrades: Context window fills with conversation
**The principle:** When you separate at the right layer, complexity disappears.
## Why Reading Reality Beats Archiving Hallucinations
The article's conclusion applies to demo guidance:
**Database expert:** "If I were starting a new project today and needed soft delete, I'd reach for the trigger-based approach first."
**Why triggers:** Separate live data from archived data, keep queries simple, maintain performance.
**Demo guidance expert:** "If I were starting a new project today and needed demo guidance, I'd reach for DOM reading first."
**Why DOM reading:** Separate live page state from LLM context, keep guidance accurate, maintain trust.
Both patterns choose **separation over mixing**.
The author: "Adding an `archived_at` column seems to ooze complexity out into queries, operations, and applications."
Voice AI avoids this: Reading the DOM directly keeps demo guidance simple, accurate, and fast.
**No mixing. No overhead. No hallucinations.**
---
*Demogod's voice AI reads your site's DOM directly—like database triggers that separate live data from archived data. One line of code. Zero soft-delete complexity. [Try it on your site](https://demogod.me).*