
# PostgreSQL Experts Read EXPLAIN ANALYZE, Not Guesses—Voice AI for Demos Proves Why Guidance Must Read DOM, Not Generate Responses

*Hacker News #4 (146 points, 16 comments, 5hr): Database expert shares unconventional optimization techniques. The pattern: read actual execution plans (`EXPLAIN ANALYZE`), read check constraints, read statistics—never guess query performance. Demo guidance works the same way: read DOM structure, don't generate responses.*

---

## The Three Unconventional Optimizations (And Why They All Read Reality)

A PostgreSQL expert shared three unconventional optimization techniques on Hacker News. Each technique follows the same pattern: **read what actually exists, don't guess or assume.**

### Technique #1: Eliminate Full Table Scans Based on Check Constraints

**The problem:** You have a `users` table with a check constraint:

```sql
CREATE TABLE users (
    id INT PRIMARY KEY,
    username TEXT NOT NULL,
    plan TEXT NOT NULL,
    CONSTRAINT plan_check CHECK (plan IN ('free', 'pro'))
);
```

An analyst writes this query:

```sql
SELECT * FROM users WHERE plan = 'Pro'; -- Capital 'P'
```

**The result:** 0 rows (the stored value is 'pro', with a lowercase 'p').

**The cost:** PostgreSQL scans the entire table. Even though the check constraint guarantees no row can have the value 'Pro', the database still checks every row.

**The fix:**

```sql
SET constraint_exclusion TO 'on';
```

Now PostgreSQL **reads the check constraint** before executing the query. It sees that the condition `plan = 'Pro'` can never return rows, so it skips the scan entirely.

**Execution time:** 7.4ms → 0.008ms

**The principle:** Read constraints (reality about data rules) instead of scanning tables (guessing what exists).
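The constraint-exclusion idea is easy to see outside the database. Here is a minimal Python sketch (toy code, not how PostgreSQL implements it): consult the declared constraint before touching any rows.

```python
# Mirrors CHECK (plan IN ('free', 'pro')) -- the declared data rule.
ALLOWED_PLANS = {"free", "pro"}

def find_users(rows, plan):
    # Constraint exclusion: if the predicate can never satisfy the
    # constraint, return immediately without scanning a single row.
    if plan not in ALLOWED_PLANS:
        return []
    return [r for r in rows if r["plan"] == plan]

rows = [{"id": 1, "plan": "free"}, {"id": 2, "plan": "pro"}]
print(find_users(rows, "Pro"))  # [] -- excluded by the constraint, no scan
print(find_users(rows, "pro"))  # [{'id': 2, 'plan': 'pro'}]
```

The membership check is O(1) against metadata; the scan is O(n) against data — the same asymmetry the 7.4ms → 0.008ms numbers reflect.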
### Technique #2: Optimize for Lower Cardinality with Function-Based Indexes

**The problem:** You have 10 million sales records with timestamps:

```sql
CREATE TABLE sale (
    id INT PRIMARY KEY,
    sold_at TIMESTAMPTZ NOT NULL,
    charged INT NOT NULL
);
```

Analysts produce daily reports:

```sql
SELECT date_trunc('day', sold_at AT TIME ZONE 'UTC'), SUM(charged)
FROM sale
WHERE '2025-01-01 UTC' <= sold_at AND sold_at < '2025-02-01 UTC'
GROUP BY 1;
```

**Full table scan:** 627ms

**After adding a B-Tree index on `sold_at`:** 187ms

**Index size:** 214 MB (almost half the table size!)

**The insight:** Analysts want **daily** reports, but you indexed **millisecond** precision. You're giving them far more granularity than they need.

**The unconventional fix:** Create a function-based index on just the **date**, not the full timestamp:

```sql
CREATE INDEX sale_sold_at_date_ix
ON sale ((date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date);
```

**Index size:** 66 MB (3x smaller)

**Query time:** 145ms (faster than the large index!)

**The problem:** Function-based indexes are fragile. If an analyst writes the expression even slightly differently, the database won't use the index.

**The solution:** Virtual generated columns (PostgreSQL 18+):

```sql
ALTER TABLE sale
ADD sold_at_date DATE
GENERATED ALWAYS AS ((date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date);
```

Now analysts use `sold_at_date` in queries, and the database is guaranteed to use the index. No discipline required.

**The principle:** Read the actual cardinality (distinct dates, not distinct timestamps) instead of abstracting away precision.

### Technique #3: Enforce Uniqueness with Hash Indexes

**The problem:** You store URLs to avoid reprocessing the same page twice:

```sql
CREATE TABLE urls (
    id INT PRIMARY KEY,
    url TEXT NOT NULL,
    data JSON
);

CREATE UNIQUE INDEX urls_url_unique_ix ON urls (url);
```

**Table size:** 160 MB

**Index size:** 154 MB (almost the entire table!)

Why? B-Tree indexes store the actual values in leaf blocks.
Web URLs can be massive (some apps store entire application state in URLs).

**The unconventional fix:** Use a hash index via an exclusion constraint:

```sql
ALTER TABLE urls
ADD CONSTRAINT urls_url_unique_hash
EXCLUDE USING HASH (url WITH =);
```

**Hash index size:** 32 MB (5x smaller!)

**Why it works:** Hash indexes store hash values, not the actual URLs. Much smaller.

**Query performance:** 0.022ms (faster than the B-Tree's 0.046ms)

**The principle:** Read hash values (a fixed-size representation of the data) instead of storing full values (variable-size reality).

## The Common Pattern: Read Actual Execution, Don't Guess Performance

All three techniques share the same approach:

**Conventional optimization:**

1. Query is slow
2. Guess what might help
3. Add index
4. Hope it works

**Unconventional optimization:**

1. Query is slow
2. **Read `EXPLAIN ANALYZE` output** (actual execution plan)
3. **Read table statistics** (actual cardinality, actual distribution)
4. **Read constraints** (actual data rules)
5. Optimize based on reality, not assumptions

## Why Database Experts Use EXPLAIN ANALYZE

The article reveals the fundamental tool of database optimization:

```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE plan = 'Pro';
```

**Output:**

```
Seq Scan on users (cost=0.00..2185.00 rows=1) (actual time=7.406..7.407 rows=0.00 loops=1)
  Filter: (plan = 'Pro'::text)
  Rows Removed by Filter: 100000
Execution Time: 7.436 ms
```

**What `EXPLAIN ANALYZE` reveals:**

- **Actual execution:** Sequential scan (not index scan)
- **Actual rows processed:** 100,000 (not estimated)
- **Actual time:** 7.4ms (not guessed)
- **Actual filtering:** Every row checked (not skipped)

**The insight:** You can't optimize what you don't measure. `EXPLAIN ANALYZE` reads the actual query execution—it doesn't generate a prediction, it observes reality.
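The same habit carries over to any database that exposes its planner. As a hedged stand-in (SQLite's `EXPLAIN QUERY PLAN` via Python's stdlib `sqlite3`, not PostgreSQL itself), you can observe the plan instead of guessing:

```python
import sqlite3

# Build a tiny in-memory table with no index on the filtered column.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, plan TEXT NOT NULL)")
con.executemany("INSERT INTO users (plan) VALUES (?)", [("free",), ("pro",)])

# Ask the planner what it will actually do -- don't assume.
plan_rows = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE plan = 'Pro'"
).fetchall()

# The last column is the human-readable plan detail.
print(plan_rows[0][3])  # e.g. "SCAN users" -- a full table scan, observed
```

The exact wording varies by SQLite version, but the point is the same as `EXPLAIN ANALYZE`: the plan is read from the engine, not predicted.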
## The Three Levels of "Reading Reality" in PostgreSQL

### Level #1: Read Execution Plans (EXPLAIN ANALYZE)

**What it reads:**

- Actual rows scanned
- Actual indexes used
- Actual execution time
- Actual memory usage

**Why it matters:** You see what the database **actually did**, not what the query planner **predicted** it would do.

**The parallel to demos:** Voice AI reads what the page **actually contains** (DOM structure), not what it **predicts** users want to know.

### Level #2: Read Table Statistics

**What it reads:**

- Row count (actual table size)
- Column cardinality (distinct values)
- Data distribution (most common values, histogram)
- Null fraction (percentage of nulls)

**Why it matters:** The query planner uses these statistics to **estimate** execution cost. Reading statistics = reading reality about your data.

**The parallel to demos:** Voice AI reads the heading hierarchy, button labels, and form fields—actual page semantics, not predicted user intent.

### Level #3: Read Constraints

**What it reads:**

- Check constraints (valid value ranges)
- Unique constraints (no duplicates allowed)
- Foreign keys (referential integrity)
- NOT NULL constraints (required fields)

**Why it matters:** Constraints define data rules. Reading constraints lets the database eliminate impossible conditions **without scanning data**.

**The parallel to demos:** Voice AI reads ARIA labels, semantic HTML, and `alt` text—accessibility metadata that defines the page's meaning.
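The DOM side of these parallels can be sketched with Python's stdlib `html.parser`. The class name `MetadataReader` and the snippet fed to it are illustrative only — a real assistant would use a full DOM library — but the principle is the same: extract what the markup actually declares.

```python
from html.parser import HTMLParser

class MetadataReader(HTMLParser):
    """Collect aria-labels and heading text -- page metadata, not guesses."""

    def __init__(self):
        super().__init__()
        self.aria_labels = []
        self.headings = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "aria-label" in attrs:
            self.aria_labels.append(attrs["aria-label"])
        if tag in ("h1", "h2", "h3"):
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3"):
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading and data.strip():
            self.headings.append(data.strip())

reader = MetadataReader()
reader.feed('<section aria-label="Pricing Plans"><h2>Free Plan</h2></section>')
print(reader.aria_labels, reader.headings)  # ['Pricing Plans'] ['Free Plan']
```

Everything the reader reports exists verbatim in the markup — there is nothing to hallucinate.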
## Why Guessing Query Performance Doesn't Work

The article shows what happens when you optimize without reading reality:

**Scenario #1: Guessing index needs**

- Analyst queries are slow
- You guess: "Add an index on `sold_at`"
- Result: a 214 MB index (50% of the table size), only marginally faster queries

**Reading reality instead:**

- Run `EXPLAIN ANALYZE` to see the actual query pattern
- Read query logs to see the actual grouping (by day, not millisecond)
- Create a smaller, targeted index on just the date
- Result: a 66 MB index (3x smaller), faster queries

**Scenario #2: Guessing constraint impact**

- Query returns 0 rows
- You guess: "The database must scan the entire table to confirm no matches"
- Result: 7.4ms wasted on every impossible query

**Reading reality instead:**

- Enable `constraint_exclusion`
- The database reads check constraints before executing the query
- It sees the condition is impossible and skips the scan
- Result: 0.008ms (1000x faster)

## The Parallel: Chatbot Demos Guess, Voice AI Reads

The PostgreSQL optimization pattern mirrors the demo guidance pattern exactly:

### Chatbot Demo Pattern = Guessing Query Performance

**Chatbot approach:**

1. User asks: "How does pricing work?"
2. Chatbot generates an answer from training data (guessing based on past patterns)
3. User must verify against the page (like running `EXPLAIN ANALYZE` after optimizing)
4. If the answer is wrong, the user wasted time (like a bad index: costs resources, doesn't help)

**Why it fails:** Generating responses without reading the page is like optimizing queries without `EXPLAIN ANALYZE`—you're guessing, not measuring.

### Voice AI Pattern = Reading Execution Plans

**Voice AI approach:**

1. User asks: "How does pricing work?"
2. Voice AI reads the page structure:
   - Heading: "Pricing Plans"
   - Three cards: "Free", "Pro", "Enterprise"
   - Bullet points under each
3. Voice AI describes what exists: "There are three plans. The Free plan includes..."
4. User understands immediately (no verification needed)

**Why it works:** Reading the DOM is like running `EXPLAIN ANALYZE`—you're observing reality, not predicting it.

## The Three Optimization Principles That Apply to Demo Guidance

### Principle #1: Read Constraints to Eliminate Impossible Conditions

**In PostgreSQL:** Enable `constraint_exclusion` so the database reads check constraints:

```sql
SELECT * FROM users WHERE plan = 'Pro'; -- Lowercase 'pro' required
```

The database reads the constraint `CHECK (plan IN ('free', 'pro'))`, sees the condition is impossible, and skips the scan.

**In demo guidance:** Voice AI reads semantic constraints on the page:

```html
<label for="email">Email</label>
<input type="email" id="email" name="email" required>
```

User asks: "Is email optional?" Voice AI reads the `required` attribute. Answer: "Email is required." No guessing needed.

### Principle #2: Read Actual Cardinality, Not Assumed Precision

**In PostgreSQL:** Don't index millisecond precision when users query by day:

```sql
-- Too precise (214 MB index)
CREATE INDEX sale_sold_at_ix ON sale (sold_at);

-- Right precision (66 MB index)
CREATE INDEX sale_sold_at_date_ix ON sale ((date_trunc('day', sold_at))::date);
```

Read actual query patterns. Optimize for the granularity users actually need.

**In demo guidance:** Don't generate exhaustive feature lists when users need high-level categories:

**Chatbot approach:** User: "What features do you have?" Chatbot: *generates a 3000-word comprehensive feature list from training data*

**Voice AI approach:** User: "What features do you have?" Voice AI reads the page: "The navigation shows five main categories: Dashboard, Analytics, Integrations, Settings, Support."

Read the actual page organization. Provide the level of detail that exists on the page.

### Principle #3: Read Hash Values for Large, Unique Data

**In PostgreSQL:** Don't store entire URLs in B-Tree indexes:

```sql
-- Stores full URLs (154 MB index)
CREATE UNIQUE INDEX urls_url_unique_ix ON urls (url);

-- Stores hash values (32 MB index)
ALTER TABLE urls
ADD CONSTRAINT urls_url_unique_hash
EXCLUDE USING HASH (url WITH =);
```

Hash values are fixed-size; URLs are variable-size. The hash index is 5x smaller.

**In demo guidance:** Don't regenerate entire page content in responses:

**Chatbot approach:** User: "How do I export data?" Chatbot: *generates a step-by-step guide from training data, potentially outdated or inaccurate*

**Voice AI approach:** User: "How do I export data?" Voice AI reads the button label: "There's an 'Export to CSV' button in the top-right corner."

Reference actual page elements (pointers to the DOM) instead of regenerating content (full descriptions).
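The hash trade-off in Principle #3 — fixed-size digests instead of variable-size values — can be sketched in a few lines of Python. The helper `mark_seen` is a toy stand-in for illustration, not PostgreSQL's hash index:

```python
import hashlib

seen = set()  # stores 16-byte digests, never the full URLs

def mark_seen(url: str) -> bool:
    """Return True if the URL is new; remember only a fixed-size digest."""
    digest = hashlib.sha256(url.encode()).digest()[:16]
    if digest in seen:
        return False
    seen.add(digest)
    return True

print(mark_seen("https://example.com/a"))  # True  (new URL)
print(mark_seen("https://example.com/a"))  # False (duplicate detected)
```

However long the URL, the stored footprint is constant — the same reason the hash index came out 5x smaller than the B-Tree.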
## Why Reading EXPLAIN ANALYZE Is Like Reading the DOM

The article reveals what `EXPLAIN ANALYZE` actually does:

```sql
EXPLAIN ANALYZE SELECT * FROM users WHERE plan = 'Pro';
```

**Output structure:**

- **Operation type:** Sequential Scan / Index Scan / Hash Join
- **Actual rows:** How many rows were actually processed
- **Actual time:** How long execution actually took
- **Filter conditions:** What was actually checked

**This is metadata about query execution—exactly like DOM metadata about page structure.**

**DOM equivalent:**

```html
<section aria-label="Pricing Plans">
  <h2>Free Plan</h2>
  <ul>
    <li>10 users included</li>
    <li>2 GB storage</li>
  </ul>
</section>
```

**DOM metadata:**

- **Element type:** `<section>` (not generated, exists in HTML)
- **Semantic label:** "Pricing Plans" (aria-label, actual content)
- **Structure:** Heading + list (actual hierarchy)
- **Content:** "10 users included" (actual text, not hallucinated)

**The parallel:** `EXPLAIN ANALYZE` reads query execution metadata. Voice AI reads DOM metadata. Both observe reality instead of predicting it.

## The Three Mistakes Database Beginners Make (And Demo Chatbots Repeat)

### Mistake #1: Optimizing Without Measuring

**Database beginner:**

- Query is slow
- Adds an index blindly
- Doesn't run `EXPLAIN ANALYZE` to verify the improvement
- Result: wasted storage, no performance gain

**Chatbot demo:**

- User is confused
- Generates a more detailed response
- Doesn't read the page to verify accuracy
- Result: more hallucinations, no clarity gain

### Mistake #2: Indexing Everything "Just in Case"

**Database beginner:**

- Worried about slow queries
- Creates indexes on every column
- Doesn't read actual query patterns
- Result: massive storage cost, slower writes, minimal read improvement

**Chatbot demo:**

- Worried about user questions
- Trains on every possible topic
- Doesn't read what's actually on the page
- Result: generates answers about features that don't exist

### Mistake #3: Trusting Estimates Over Measurements

**Database beginner:**

- The query planner estimates 10 rows
- The query actually returns 100,000 rows
- Trusts the estimate, doesn't measure actual execution
- Result: wrong optimization decisions

**Chatbot demo:**

- Training data suggests a feature exists
- It was actually removed 6 months ago
- Trusts the training data, doesn't read the current page
- Result: tells the user about a feature that doesn't exist

## Why Conventional Optimization Fails (And Why Chatbot Demos Fail)

The article explains why "slapping a B-Tree on it" is the conventional approach:

**Why developers do this:**

1. Query is slow
2. Everyone says "add an index"
3. Add a B-Tree index on the filtered column
4. Query gets faster
5. Done

**Why it's suboptimal:**

- Index size not considered (214 MB for timestamps when a 66 MB date index would work)
- Index maintenance cost ignored (slower inserts/updates)
- Storage cost ignored (paying for precision you don't need)
- Alternative approaches not explored (hash indexes, partial indexes, expression indexes)

**The chatbot parallel:**

**Why teams build chatbot demos:**

1. Users need guidance
2. Everyone says "add an AI chatbot"
3. Add an LLM that generates responses
4. Users can ask questions
5. Done

**Why it's suboptimal:**

- Hallucination risk not considered (generates features that don't exist)
- Maintenance cost ignored (retraining when features change)
- Accuracy cost ignored (users must verify every response)
- Alternative approaches not explored (read the DOM directly instead of generating responses)

## The Virtual Generated Column Principle

The article introduces PostgreSQL 18's virtual generated columns as the solution to the "discipline problem":

**The problem:** Function-based indexes require an exact expression match:

```sql
-- Index defined as:
CREATE INDEX sale_sold_at_date_ix
ON sale ((date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date);

-- A query MUST use the exact same expression to use the index:
WHERE date_trunc('day', sold_at AT TIME ZONE 'UTC')::date = '2025-01-15' -- Works

-- A slightly different expression won't use the index:
WHERE (sold_at AT TIME ZONE 'UTC')::date = '2025-01-15' -- Full table scan!
```

**The solution:** A virtual generated column:

```sql
ALTER TABLE sale
ADD sold_at_date DATE
GENERATED ALWAYS AS ((date_trunc('day', sold_at AT TIME ZONE 'UTC'))::date);
```

Now queries use the column name, which expands to the exact indexed expression:

```sql
WHERE sold_at_date = '2025-01-15' -- Always uses the index!
```

**The principle:** Create a canonical representation that guarantees the right expression is used.
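The canonical-representation idea translates outside SQL as well. A minimal Python sketch (the `Sale` class and `sold_at_date` property are illustrative, not from the article): derive the value in one place, so every consumer uses the same expression.

```python
from datetime import datetime, timezone

class Sale:
    def __init__(self, sold_at: datetime, charged: int):
        self.sold_at = sold_at
        self.charged = charged

    @property
    def sold_at_date(self):
        # The one canonical expression -- analysts never rewrite it,
        # so every report groups on exactly the same derivation.
        return self.sold_at.astimezone(timezone.utc).date()

s = Sale(datetime(2025, 1, 15, 23, 59, tzinfo=timezone.utc), 100)
print(s.sold_at_date)  # 2025-01-15
```

As with the generated column, callers reference a name rather than an expression, so there is no "slightly different expression" failure mode.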
**The parallel to demos:**

**The problem:** Different pages describe the same feature differently:

- Homepage: "Real-time analytics dashboard"
- Features page: "Analytics & Reporting"
- Docs: "Dashboard - Analytics Module"

A chatbot must guess which term to use and which description is current.

**The solution:** Voice AI reads what's actually on the current page:

- On the homepage: reads "Real-time analytics dashboard"
- On the features page: reads "Analytics & Reporting"
- On the docs: reads "Dashboard - Analytics Module"

No canonical representation needed—just read what exists on each page.

## The Constraint Exclusion Insight

The article reveals the most powerful optimization: `constraint_exclusion`.

**What it does:** Before executing a query, PostgreSQL reads table constraints and eliminates conditions that can never be true.

**Example:**

```sql
-- Constraint: plan must be 'free' or 'pro'
CONSTRAINT plan_check CHECK (plan IN ('free', 'pro'))

-- Query for 'Pro' (capital P):
SELECT * FROM users WHERE plan = 'Pro';
```

**Without `constraint_exclusion`:**

- Scan all 100,000 rows
- Check each row: does `plan = 'Pro'`?
- Find 0 matches
- Time: 7.4ms

**With `constraint_exclusion = on`:**

- Read the constraint: `plan IN ('free', 'pro')`
- Check the query: `plan = 'Pro'`
- Realize: 'Pro' is not among the allowed values
- Skip the scan entirely
- Time: 0.008ms (1000x faster!)

**The insight:** Reading constraints (metadata about valid data) is faster than scanning the data itself.

**The parallel to demos:**

**Without reading the DOM:**

- User asks: "Can I add unlimited users?"
- Chatbot generates an answer from training data: "The Enterprise plan includes unlimited users."
- User must verify against the page
- If wrong, wasted time

**With reading the DOM:**

- User asks: "Can I add unlimited users?"
- Voice AI reads the pricing table:

  ```html
  <td>Users: Up to 100</td>
  ```

- Voice AI: "The plan includes up to 100 users."
- Accurate immediately (reading actual page metadata)

## The Three Reasons Unconventional Optimizations Work

### Reason #1: They Read Multiple Levels of Reality

**Conventional optimization:** Reads the query text only.

**Unconventional optimization:**

- Reads the query text
- Reads the execution plan (`EXPLAIN ANALYZE`)
- Reads table statistics
- Reads constraints
- Reads the data distribution

**More data sources = more accurate optimizations**

**The parallel to demos:**

**Chatbot:** Reads training data only.

**Voice AI:**

- Reads the page HTML
- Reads ARIA labels
- Reads the heading hierarchy
- Reads form structure
- Reads button labels

**More data sources = more accurate guidance**

### Reason #2: They Measure Actual Execution, Not Predicted Cost

**Query planner estimates:**

- Estimated rows: 10
- Estimated cost: 100
- Estimated time: 50ms

**Actual execution (`EXPLAIN ANALYZE`):**

- Actual rows: 100,000
- Actual cost: 10,000
- Actual time: 5000ms

**The problem:** Estimates can be wildly wrong. Only measurement reveals reality.
**The parallel to demos:**

**Chatbot prediction:**

- Predicts the user wants a feature overview
- Generates a comprehensive description
- Predicted helpfulness: high

**Actual user need:**

- User is looking at a specific button
- Wants to know what clicking it does
- Actual helpfulness: low (irrelevant response)

**Voice AI measurement:**

- Reads which page section the user is viewing
- Describes the elements in that section
- Measured helpfulness: high (contextually relevant)

### Reason #3: They Optimize for Actual Usage Patterns, Not Theoretical Needs

**The timestamp index example:**

**Theoretical need:** "Users might want to query at any precision"
**Actual usage:** Analysts only produce daily reports

**Theoretical optimization:** Index the full timestamp (214 MB)
**Actual optimization:** Index just the date (66 MB, faster)

**The parallel to demos:**

**Theoretical need:** "Users might ask any question about the product"
**Actual usage:** Users ask about elements visible on the current page

**Theoretical optimization:** A chatbot trained on the entire documentation
**Actual optimization:** Voice AI reads the current page structure

## The Verdict: Database Experts Read Reality, They Don't Guess

The HN article shows that database optimization requires reading multiple levels of reality:

**Level 1:** Read query execution (`EXPLAIN ANALYZE`)
**Level 2:** Read table statistics (cardinality, distribution)
**Level 3:** Read constraints (data rules)
**Level 4:** Read actual usage patterns (query logs)

**The lesson:** You can't optimize what you don't measure. You can't measure without reading reality.

**The parallel to demo guidance:**

**Level 1:** Read the page structure (DOM tree)
**Level 2:** Read semantic metadata (ARIA labels, heading hierarchy)
**Level 3:** Read element properties (`required`, `disabled`, `href`)
**Level 4:** Read user context (which section they're viewing)

**The lesson:** You can't guide what you don't read. You can't read without accessing the DOM.
## The Alternative: Guessing Performance Without Measurement

Imagine if database optimization worked like chatbot demos:

**Hypothetical "AI Database Optimizer":**

1. Developer: "This query is slow"
2. AI: *predicts* the query might need an index
3. AI: *generates* an index suggestion based on training data
4. Developer must run `EXPLAIN ANALYZE` to verify whether the suggestion helps
5. If wrong, wasted storage and time

**Why this would fail:** You need to measure actual execution to optimize. Predictions without measurement are guesses.

**Current chatbot demos work this way:**

1. User: "How does this work?"
2. Chatbot: *predicts* the user wants a feature overview
3. Chatbot: *generates* a description based on training data
4. User must check the page to verify whether the description is accurate
5. If wrong, wasted time and confusion

**Why this fails:** You need to read actual page content to guide. Predictions without reading are hallucinations.

**Voice AI works like EXPLAIN ANALYZE:**

1. User: "How does this work?"
2. Voice AI: *reads* the page structure
3. Voice AI: *describes* what actually exists
4. User understands immediately (no verification needed)
5. Always accurate (reading reality)

## The Pattern: Read First, Optimize Second

The article proves the fundamental pattern of database optimization:

**You can't optimize what you haven't measured.**

Every technique follows this sequence:

1. Identify the slow query
2. **Run `EXPLAIN ANALYZE`** (read the actual execution)
3. **Read table statistics** (understand the data distribution)
4. **Read constraints** (understand the data rules)
5. **Based on those readings**, choose an optimization approach
6. Implement the optimization
7. **Run `EXPLAIN ANALYZE` again** (verify the improvement)

**The parallel to demo guidance:**

**You can't guide what you haven't read.**

Every interaction follows this sequence:

1. User asks a question
2. **Read the DOM structure** (understand the page content)
3. **Read semantic metadata** (understand each element's purpose)
4. **Read user context** (understand which section they're viewing)
5. **Based on those readings**, provide relevant guidance
6. Describe what exists
7. **User understands immediately** (no verification needed)

---

*Demogod's voice AI reads your site's DOM structure like `EXPLAIN ANALYZE` reads query execution—observing reality instead of predicting it. One line of code. Zero hallucinations. [Try it on your site](https://demogod.me).*