分享下最近因一條慢sql導緻的數據庫卡頓,因為是新項目,所以坑比較多。下面一起來看看吧!
項目經理反映說下午的時候系統很卡,觀察了一下服務器,發現mysqld進程占了很大一部分資源,像這種情況80%以上都是sql引起的。
通過 show processlist 一看,發現很多是 Waiting for table metadata lock 狀态的連接。
MySQL在進行一些alter table等DDL操作時,如果該表上有未提交的事務則會出現 Waiting for table metadata lock ,而一旦出現metadata lock,該表上的後續操作都會被阻塞。
觀察了一下當前數據庫慢查詢sql的情況,發現有一條sql執行需要到600秒
--200多行 SELECT ( CASE WHEN ( category.equipment_type = 'T' OR product.is_pallet = TRUE ) THEN 1 ELSE 0 END ) AS is_pallet, ( CASE WHEN ( ob.project_code = 'FS' ) THEN 0 ELSE 1 END ) AS is_create_pallet, ( pallet.capa city_length * pallet.capacity_width * pallet.capacity_height ) AS max_pallet_volume, ... pallet.weight AS pallet__weight, pallet.capacity_length AS pallet__capacity_length, pallet.capacity_width AS pallet__capacity_width, pallet.capacity_height AS pallet__capacity_height FROM fsl_order_base_line AS pack_boxes_order_base_lines LEFT JOIN fsl_order_base AS ob ON pack_boxes_order_base_lines.order_base = ob.id LEFT JOIN fsl_thing_type AS product ON pack_boxes_order_base_lines.product = product.id LEFT JOIN fsl_thing_type AS category ON product.category = category.id LEFT JOIN fsl_thing_type AS pallet ON product.pallet = pallet.id WHERE ( ( ( ( ob.auto_releasing = 'Y' ) AND ( ob.project_code = 'DD' ) ) AND ( ob.order_status = '0' ) ) AND ( ob.model1 = 'pack' ) ) AND pack_boxes_order_base_lines.domain_name IN ( 'FSL' ) ORDER BY pack_boxes_order_base_lines.insert_date DESC, pack_boxes_order_base_lines.id DESC;
set profiling=1; --改寫如下: select ..... pallet.capacity_length AS pallet__capacity_length, pallet.capacity_width AS pallet__capacity_width, pallet.capacity_height AS pallet__capacity_height FROM (select * from fsl_order_base_line where domain_name IN ( 'FSL' ) ) AS pack_boxes_order_base_lines LEFT JOIN (select * from fsl_order_base where auto_releasing = 'Y' and project_code = 'FS' and order_status = '0' and model1 = 'pack') AS ob ON pack_boxes_order_base_lines.order_base = ob.id LEFT JOIN fsl_thing_type AS product ON pack_boxes_order_base_lines.product = product.id LEFT JOIN fsl_thing_type AS category ON product.category = category.id LEFT JOIN fsl_thing_type AS pallet ON product.pallet = pallet.id ORDER BY pack_boxes_order_base_lines.insert_date DESC, pack_boxes_order_base_lines.id DESC;
其實原理就是縮小下結果集,調增後就剩100秒了。
對應的執行計劃:
pt-online-schema-change --user=root -pxxx --host=1xxxx --alter "add INDEX idx_domain(domain_name)" D=test,t=fsl_order_base_line --execute
最新執行計劃:
效果不大,選擇去掉索引
pt-online-schema-change --user=root -pxxxx --host=xxxx --alter "drop INDEX idx_domain" D=test,t=fsl_order_base_line --execute
最後跟業務和開發确認後發現這是一個定時器的邏輯:
如果數據正常的話,一次會拿到很少的數據,但中間中斷過,導緻這次裡面有10000 的數據,而代碼去針對這批數據逐行執行一堆邏輯,所以會一直很慢,加上還有更新操作,就導緻了這條sql執行的很慢。之前都是很快就跑完了。
在代碼上做個限制,一次性不獲取全部數據,每次隻獲取500行,跑完就提交。然後sql上也做相應的優化。
覺得有用的朋友多幫忙轉發哦!後面會分享更多devops和DBA方面的内容,感興趣的朋友可以關注下~
,
更多精彩资讯请关注tft每日頭條,我们将持续为您更新最新资讯!