MySQL Case-带你感受Oracle与MySQL下SQL执行效率
机器环境
Oracle
- SGA 736M
- CPU 信息
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz
Stepping: 5
CPU MHz: 2494.140
BogoMIPS: 4988.28
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 4096K
L3 cache: 36608K
NUMA node0 CPU(s): 0
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat avx512_vnni
主机规格
MySQL
- innodb_buffer_pool_size 768M
- 实例规格 mysql.n1.micro.1
通过上面环境对比看出,测试环境无大差别
表结构与测试SQL
Oracle中表结构如下,表中600万数据,测试表为分区表,这没关系,因为查询表中所有的数据,和单表一样
CREATE TABLE "TPCH"."ORDERS"
( "O_ORDERKEY" NUMBER(20,2) NOT NULL ENABLE,
"O_CUSTKEY" NUMBER(20,2) NOT NULL ENABLE,
"O_ORDERSTATUS" CHAR(10) NOT NULL ENABLE,
"O_TOTALPRICE" NUMBER(20,2) NOT NULL ENABLE,
"O_ORDERDATE" DATE NOT NULL ENABLE,
"O_ORDERPRIORITY" CHAR(15) NOT NULL ENABLE,
"O_CLERK" CHAR(15) NOT NULL ENABLE,
"O_SHIPPRIORITY" NUMBER(20,2) NOT NULL ENABLE,
"O_COMMENT" VARCHAR2(79) NOT NULL ENABLE,
PRIMARY KEY ("O_ORDERKEY")
USING INDEX PCTFREE 10 INITRANS 2
MAXTRANS 255 COMPUTE STATISTICS
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "TPCH" ENABLE
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
STORAGE(
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "TPCH"
PARTITION BY RANGE ("O_ORDERDATE") INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(PARTITION
"O_ORDERDATE_01" VALUES LESS THAN (TO_DATE(' 1992-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) SEGMENT CREATION DEFERRED
PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
NOCOMPRESS LOGGING
STORAGE(
BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
TABLESPACE "TPCH" ) ;
MySQL下表结构,表中600万数据
CREATE TABLE `orders` (
`o_orderkey` int(11) NOT NULL,
`O_CUSTKEY` int(11) NOT NULL,
`O_ORDERSTATUS` char(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`O_TOTALPRICE` decimal(15,2) NOT NULL,
`O_ORDERDATE` date NOT NULL,
`O_ORDERPRIORITY` char(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`O_CLERK` char(15) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`O_SHIPPRIORITY` int(11) NOT NULL,
`O_COMMENT` varchar(79) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
PRIMARY KEY (`o_orderkey`),
KEY `orders_fk1` (`O_CUSTKEY`),
KEY `i_orders_date_clerk` (`O_ORDERDATE`,`O_CLERK`),
CONSTRAINT `orders_ibfk_1` FOREIGN KEY (`O_CUSTKEY`) REFERENCES `customer` (`c_custkey`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci
SQL如下,计算出每个员工最后成交的订单时间
SELECT
*
FROM
tpch.orders
WHERE
(o_clerk , o_orderdate) IN (
SELECT
o_clerk, MAX(o_orderdate)
FROM
tpch.orders
GROUP BY o_clerk);
Oracle中的效率
不创建任何索引情况下,执行效率如下
Plan hash value: 3518479617
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5335 |00:00:06.71 | 206K| 106K| | | |
|* 1 | HASH JOIN RIGHT SEMI | | 1 | 2494 | 5335 |00:00:06.71 | 206K| 106K| 1335K| 1335K| 1676K (0)|
| 2 | PART JOIN FILTER CREATE | :BF0000 | 1 | 4000 | 4000 |00:00:06.66 | 202K| 105K| | | |
| 3 | VIEW | VW_NSO_1 | 1 | 4000 | 4000 |00:00:06.65 | 202K| 105K| | | |
| 4 | HASH GROUP BY | | 1 | 4000 | 4000 |00:00:06.65 | 202K| 105K| 1214K| 1214K| 1570K (0)|
| 5 | PARTITION RANGE ALL | | 1 | 6000K| 6000K|00:00:05.88 | 202K| 105K| | | |
| 6 | TABLE ACCESS FULL | ORDERS | 81 | 6000K| 6000K|00:00:05.86 | 202K| 105K| | | |
| 7 | PARTITION RANGE JOIN-FILTER| | 1 | 6000K| 82351 |00:00:00.03 | 3173 | 1454 | | | |
| 8 | TABLE ACCESS FULL | ORDERS | 2 | 6000K| 82351 |00:00:00.03 | 3173 | 1454 | | | |
---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("O_ORDERDATE"="MAX(O_ORDERDATE)" AND "O_CLERK"="O_CLERK")
很显然,如果优化上述SQL,Oracle中要建立o_orderdate和o_clerk的组合索引
create index tpch.i_orders_date_clerk on tpch.orders(o_orderdate,o_clerk) tablespace tpch;
创建索引后执行计划如下
Plan hash value: 1581592608
---------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5335 |00:00:02.23 | 46776 | 33744 | | | |
| 1 | NESTED LOOPS | | 1 | 5371 | 5335 |00:00:02.22 | 46776 | 33744 | | | |
| 2 | NESTED LOOPS | | 1 | 5371 | 5335 |00:00:02.18 | 41488 | 32741 | | | |
| 3 | VIEW | VW_NSO_1 | 1 | 4000 | 4000 |00:00:02.16 | 32767 | 32741 | | | |
| 4 | HASH GROUP BY | | 1 | 4000 | 4000 |00:00:02.16 | 32767 | 32741 | 1214K| 1214K| 1478K (0)|
| 5 | INDEX FAST FULL SCAN | I_ORDERS_DATE_CLERK | 1 | 6000K| 6000K|00:00:00.87 | 32767 | 32741 | | | |
|* 6 | INDEX RANGE SCAN | I_ORDERS_DATE_CLERK | 4000 | 1 | 5335 |00:00:00.02 | 8721 | 0 | | | |
| 7 | TABLE ACCESS BY GLOBAL INDEX ROWID| ORDERS | 5335 | 1 | 5335 |00:00:00.04 | 5288 | 1003 | | | |
---------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("O_ORDERDATE"="MAX(O_ORDERDATE)" AND "O_CLERK"="O_CLERK")
执行计划中显示2.23秒返回返回结果,没有此索引要6.71秒,执行计划也从hash自动转变为nested loop。datagrip中显示也是2秒多,如下图
- 返回所有数据总共需要32秒873毫秒
- 服务器端执行2秒578毫秒
- 网络传输数据消耗30秒295毫秒
至此,Oracle部分分析完成,那么我们看下MySQL中的执行计划是什么样子的,执行效率是什么样的呢?
MySQL
MySQL中创建同样的索引,datadrip看到的结果
- 返回所有数据总共需要25秒148毫秒
- 服务器端执行9秒639毫秒
- 网络传输数据消耗15秒509毫秒
重点来了
执行计划 format=tree
-> Filter: <in_optimizer>((orders.O_CLERK,orders.O_ORDERDATE),(orders.O_CLERK,orders.O_ORDERDATE) in (select #2)) (cost=586310.50 rows=5645000)
-> Table scan on orders (cost=586310.50 rows=5645000)
-> Select #2 (subquery in condition; run only once)
-> Table scan on <temporary>
-> Aggregate using temporary table
-> Index scan on orders using i_orders_date_clerk (cost=586310.50 rows=5645000)
从上面执行计划看到,Using temporary说明用到了排序。
Extra这个字段中的“Using filesort”表示的就是需要排序,MySQL会给每个线程分配一块内存用于排序,称为sort_buffer。sort_buffer_size就是MySQL为排序开辟的内存sort_buffer的大小。如果要排序的数据量小于sort_buffer_size,排序就在内存中完成。但如果排序数据量太大,内存放不下,则不得不利用磁盘临时文件辅助排序。如上SQL就是由于sort buffer都不够了,使用到了磁盘。
要消除排序,因为group by的是clerk列,所以要创建如下索引
create index i_orders_clerk_date on orders(o_clerk,o_orderdate);
再次查看效率,效率非常高,消除排序后378毫秒之行结束
执行计划变为
-> Filter: <in_optimizer>((orders.O_CLERK,orders.O_ORDERDATE),(orders.O_CLERK,orders.O_ORDERDATE) in (select #2)) (cost=590764.20 rows=5645000)
-> Table scan on orders (cost=590764.20 rows=5645000)
-> Select #2 (subquery in condition; run only once)
-> Group aggregate (computed in earlier step): max(orders.O_ORDERDATE)
-> Index range scan on orders using index_for_group_by(i_orders_clerk_date) (cost=15589.00 rows=11135)
相同的索引,Oracle需要2秒,看IO消耗358ms,主要消耗在CPU上,Oracle的执行计划里面显示还是有排序的,这也是MySQL中的优势
Plan hash value: 2198191425
---------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5335 |00:00:02.22 | 46809 | 34371 | | | |
| 1 | NESTED LOOPS | | 1 | 5371 | 5335 |00:00:02.22 | 46809 | 34371 | | | |
| 2 | NESTED LOOPS | | 1 | 5371 | 5335 |00:00:02.19 | 41509 | 33369 | | | |
| 3 | VIEW | VW_NSO_1 | 1 | 4000 | 4000 |00:00:02.15 | 32759 | 31450 | | | |
| 4 | HASH GROUP BY | | 1 | 4000 | 4000 |00:00:02.14 | 32759 | 31450 | 1214K| 1214K| 1485K (0)|
| 5 | INDEX FAST FULL SCAN | I_ORDERS_CLERK_DATE | 1 | 6000K| 6000K|00:00:01.06 | 32759 | 31450 | | | |
|* 6 | INDEX RANGE SCAN | I_ORDERS_CLERK_DATE | 4000 | 1 | 5335 |00:00:00.04 | 8750 | 1919 | | | |
| 7 | TABLE ACCESS BY GLOBAL INDEX ROWID| ORDERS | 5335 | 1 | 5335 |00:00:00.03 | 5300 | 1002 | | | |
---------------------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
6 - access("O_CLERK"="O_CLERK" AND "O_ORDERDATE"="MAX(O_ORDERDATE)")
结论
单纯讨论压测TPS(TPM-C)值来评估哪个数据库更强,在我看来是没有意义的,因为硬件因素有着很重要的因素,也是看不出优化器或者表底层结构的区别的;今天我只是举了一个简单的SQL为例,对比说明Oracle和MySQL的执行效率,从中可以看到,当前场景下MySQL的执行效率是比Oracle高的,这也给我自己对数据库国产化增添了信心。
数据库是基础设施,最终为业务业务服务,能满足业务需求就是好数据库。
更多文章欢迎关注本人公众号,搜dbachongzi或扫二维码
作者:姚崇 Oracle OCM、MySQL OCP、Oceanbase OBCA、PingCAP PCTA认证,擅长基于Oracle、MySQL Performance Turning及多种关系型 NoSQL数据库。