SQL优化Skill sql-optimization

SQL 优化技能专注于提升数据库查询性能,通过索引设计、执行计划分析和查询重写等方法。它适用于慢查询优化、N+1问题解决、PostgreSQL调优等场景。关键词:SQL 优化,数据库性能,索引调优,执行计划,PostgreSQL,慢查询,查询重写,性能调优,SEO 搜索。

后端开发 0 次安装 0 次浏览 更新于 3/24/2026

name: sql-optimization description: 分析和优化 SQL 查询以提升性能,包括索引设计、查询重写、执行计划分析、数据库调优。覆盖 PostgreSQL 特定优化、N+1 预防、CTE/窗口函数优化、连接策略和常见反模式。触发关键词:SQL,查询优化,EXPLAIN,EXPLAIN ANALYZE,索引,慢查询,执行计划,查询计划,连接优化,子查询,CTE,公共表表达式,窗口函数,分区,N+1,查询缓存,数据库性能,顺序扫描,索引扫描,位图扫描,嵌套循环,哈希连接,合并连接,PostgreSQL,查询调优,表扫描,基数,统计,VACUUM,ANALYZE。 allowed-tools: Read, Grep, Glob, Bash

SQL 优化

概述

这个技能专注于分析和优化 SQL 查询以提升性能。它覆盖查询分析、索引优化、执行计划解释、查询重写策略、PostgreSQL 特定优化和常见反模式。使用这个技能处理慢查询、N+1 问题、连接优化、索引设计和数据库性能调优。

指令

1. 分析查询性能

  • 从日志中识别慢查询
  • 运行 EXPLAIN/EXPLAIN ANALYZE
  • 测量查询执行时间
  • 检查资源利用率

2. 理解执行计划

  • 识别扫描类型(顺序扫描、索引扫描、位图扫描)
  • 检查连接算法(嵌套循环、哈希连接、合并连接)
  • 分析索引使用和选择性
  • 查找瓶颈操作(排序、过滤、聚合)
  • 理解成本估计与实际行数
  • 检查缓冲使用和 I/O 模式

3. 应用优化

  • 设计合适索引(B-树、哈希、GiST、GIN)
  • 重写低效查询(子查询转 JOIN、CTE)
  • 优化连接顺序和算法
  • 使用窗口函数进行复杂聚合
  • 利用部分索引和覆盖索引
  • 考虑反规范化以处理读密集型工作负载
  • 更新表统计(ANALYZE)
  • 调整 PostgreSQL 配置参数

4. 验证改进

  • 比较优化前后指标
  • 用类似生产数据测试
  • 验证正确性
  • 部署后监控

最佳实践

  1. 策略性索引:索引 WHERE、JOIN、ORDER BY 中的列
  2. 避免 SELECT *:仅选择需要的列
  3. 使用 EXPLAIN ANALYZE:始终用实际时间分析执行计划
  4. 限制结果:对大数据集使用分页
  5. 避免 N+1:使用 JOIN 或批处理查询
  6. 优先 EXISTS 而非 IN:对于具有大结果集的子查询
  7. 更新统计:批量操作后运行 ANALYZE
  8. 使用 CTE 提升可读性:但注意优化屏障
  9. 避免在索引列上使用函数:防止索引使用
  10. 持续监控:随时间跟踪查询性能

PostgreSQL 特定优化

执行计划运算符

扫描类型:

  • 顺序扫描:全表扫描(对大表慢)
  • 索引扫描:使用索引加表查找(对低选择性好)
  • 仅索引扫描:使用覆盖索引(最快)
  • 位图索引扫描:多个索引扫描组合(对 OR 条件好)

连接算法:

  • 嵌套循环:对小表或索引查找最佳
  • 哈希连接:对中等大小表且等值连接最佳
  • 合并连接:对大型预排序表最佳

统计和维护

-- 更新表统计以改善查询计划
ANALYZE table_name;

-- 检查统计新鲜度
SELECT schemaname, tablename, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY last_analyze NULLS FIRST;

-- 查找膨胀表
SELECT schemaname, tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    n_dead_tup, n_live_tup,
    round(n_dead_tup * 100.0 / NULLIF(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

-- 清理膨胀表
VACUUM ANALYZE table_name;

配置调优

-- 关键参数检查
SHOW shared_buffers;        -- 应为 RAM 的 25%
SHOW effective_cache_size;  -- 应为 RAM 的 50-75%
SHOW work_mem;              -- 每操作内存
SHOW random_page_cost;      -- 对 SSD 降低(1.1-2.0)

常见反模式

1. 应用代码中的 SELECT *

-- 坏:获取不必要列
SELECT * FROM users WHERE id = 1;

-- 好:仅获取需要列
SELECT id, email, name FROM users WHERE id = 1;

2. 隐式类型转换

-- 坏:如果 id 是整数,不能使用索引
SELECT * FROM users WHERE id = '123';

-- 好:匹配列类型
SELECT * FROM users WHERE id = 123;

3. 无索引的 OR 条件

-- 坏:可能不高效使用索引
SELECT * FROM orders WHERE status = 'pending' OR status = 'processing';

-- 好:使用 IN 或创建部分索引
SELECT * FROM orders WHERE status IN ('pending', 'processing');

4. 相关子查询

-- 坏:为每行执行子查询
SELECT p.name,
    (SELECT COUNT(*) FROM order_items WHERE product_id = p.id) AS order_count
FROM products p;

-- 好:使用 JOIN 加聚合
SELECT p.name, COUNT(oi.id) AS order_count
FROM products p
LEFT JOIN order_items oi ON oi.product_id = p.id
GROUP BY p.id, p.name;

5. 缺少 WHERE 子句

-- 坏:更新整个表
UPDATE products SET updated_at = NOW();

-- 好:仅更新变化部分
UPDATE products SET updated_at = NOW()
WHERE id IN (SELECT product_id FROM price_changes);

高级模式

CTEs(公共表表达式)

-- CTE 用于可读性和复用
WITH recent_orders AS (
    SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
    FROM orders
    WHERE created_at > NOW() - INTERVAL '30 days'
    GROUP BY customer_id
),
high_value_customers AS (
    SELECT customer_id
    FROM recent_orders
    WHERE total_spent > 1000
)
SELECT c.name, c.email, ro.order_count, ro.total_spent
FROM customers c
INNER JOIN high_value_customers hvc ON c.id = hvc.customer_id
INNER JOIN recent_orders ro ON c.id = ro.customer_id;

-- 递归 CTE 用于层次数据
WITH RECURSIVE category_tree AS (
    SELECT id, name, parent_id, 1 AS level
    FROM categories
    WHERE parent_id IS NULL
    UNION ALL
    SELECT c.id, c.name, c.parent_id, ct.level + 1
    FROM categories c
    INNER JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree ORDER BY level, name;

窗口函数

-- 排名和行号
SELECT
    product_id,
    category_id,
    price,
    ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC) AS price_rank,
    RANK() OVER (ORDER BY price DESC) AS overall_rank,
    DENSE_RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS dense_rank
FROM products;

-- 运行总计和移动平均
SELECT
    date,
    revenue,
    SUM(revenue) OVER (ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
    AVG(revenue) OVER (ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS moving_avg_7day
FROM daily_sales
ORDER BY date;

-- Lead/Lag 用于时间序列分析
SELECT
    customer_id,
    order_date,
    total,
    LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_order_date,
    LEAD(total) OVER (PARTITION BY customer_id ORDER BY order_date) AS next_order_total,
    total - LAG(total) OVER (PARTITION BY customer_id ORDER BY order_date) AS total_diff
FROM orders;

示例

示例 1:使用 EXPLAIN 进行查询优化

-- 原始慢查询
SELECT o.*, c.name, c.email
FROM orders o, customers c
WHERE o.customer_id = c.id
AND o.status = 'pending'
AND o.created_at > '2024-01-01'
ORDER BY o.created_at DESC;

-- 步骤 1:用 EXPLAIN ANALYZE 分析
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, c.name, c.email
FROM orders o, customers c
WHERE o.customer_id = c.id
AND o.status = 'pending'
AND o.created_at > '2024-01-01'
ORDER BY o.created_at DESC;

-- 输出分析:
-- Seq Scan on orders  (cost=0.00..15420.00 rows=50000)
--   Filter: (status = 'pending' AND created_at > '2024-01-01')
--   Rows Removed by Filter: 450000
-- 问题:对大表进行顺序扫描!

-- 步骤 2:创建复合索引
CREATE INDEX idx_orders_status_created
ON orders(status, created_at DESC)
WHERE status IN ('pending', 'processing');

-- 步骤 3:用显式 JOIN 重写
SELECT o.id, o.total, o.created_at, c.name, c.email
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'pending'
AND o.created_at > '2024-01-01'
ORDER BY o.created_at DESC
LIMIT 100;

-- 优化后:
-- Index Scan using idx_orders_status_created (cost=0.42..125.50 rows=100)
-- 查询时间减少 99%!

示例 2:N+1 查询问题

-- 问题:N+1 查询
-- 应用代码:
-- orders = SELECT * FROM orders WHERE user_id = 1
-- for order in orders:
--     items = SELECT * FROM order_items WHERE order_id = order.id

-- 解决方案:单个查询加 JOIN
SELECT
    o.id AS order_id,
    o.total,
    o.created_at,
    oi.product_id,
    oi.quantity,
    oi.unit_price,
    p.name AS product_name
FROM orders o
LEFT JOIN order_items oi ON o.id = oi.order_id
LEFT JOIN products p ON oi.product_id = p.id
WHERE o.user_id = 1
ORDER BY o.created_at DESC, oi.id;

-- 替代方案:批处理查询
SELECT * FROM orders WHERE user_id = 1;
-- 获取订单 ID:[1, 2, 3, 4, 5]
SELECT * FROM order_items WHERE order_id IN (1, 2, 3, 4, 5);

示例 3:索引设计策略

-- 单列索引用于等值检查
CREATE INDEX idx_users_email ON users(email);

-- 复合索引用于多条件
-- 列顺序:等值先,然后范围,然后排序
CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at DESC);

-- 部分索引用于过滤查询
CREATE INDEX idx_orders_pending
ON orders(created_at DESC)
WHERE status = 'pending';

-- 覆盖索引避免表查找
CREATE INDEX idx_orders_summary
ON orders(user_id, status)
INCLUDE (total, created_at);

-- 表达式索引用于计算条件
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- 检查现有索引
SELECT
    indexname,
    indexdef,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'orders';

-- 查找未使用索引
SELECT
    schemaname, tablename, indexname,
    idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;

示例 4:查询重写模式

-- 模式 1:用 JOIN 替换子查询
-- 之前
SELECT * FROM orders
WHERE customer_id IN (SELECT id FROM customers WHERE region = 'US');

-- 之后
SELECT o.* FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'US';

-- 模式 2:对大子查询使用 EXISTS 而非 IN
-- 之前
SELECT * FROM products
WHERE id IN (SELECT product_id FROM order_items);

-- 之后
SELECT * FROM products p
WHERE EXISTS (
    SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);

-- 模式 3:避免在索引列上使用函数
-- 之前(不能使用索引)
SELECT * FROM users WHERE YEAR(created_at) = 2024;

-- 之后(使用索引)
SELECT * FROM users
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- 模式 4:优化分页
-- 之前(对大偏移慢)
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 10000;

-- 之后(键集分页)
SELECT * FROM products
WHERE id > 10000
ORDER BY id
LIMIT 20;

-- 模式 5:批处理操作
-- 之前(逐行)
UPDATE products SET price = price * 1.1 WHERE id = 1;
UPDATE products SET price = price * 1.1 WHERE id = 2;
-- ... 重复 1000 次

-- 之后(单批)
UPDATE products SET price = price * 1.1
WHERE id = ANY(ARRAY[1, 2, 3, ..., 1000]);

-- 或使用 CTE 进行复杂批处理
WITH price_updates AS (
    SELECT id, new_price FROM temp_price_updates
)
UPDATE products p
SET price = pu.new_price
FROM price_updates pu
WHERE p.id = pu.id;