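Elasticsearch 7.10.1 Cluster Benchmark Report (4 cores, 16 GB × 3 nodes, AMD)
The problems and solutions described in this article also apply to Tencent Cloud Elasticsearch Service (ES).
Also used: Tencent Cloud Cloud Virtual Machine (CVM).
This article continues from the previous one: Pitfalls in Deploying esrally, the Elasticsearch Benchmarking Tool (Part 2)
Follow-up articles:
Elasticsearch 7.10.1 3-Node Cluster Benchmark Report, 4 cores and 16 GB (Intel)
Elasticsearch 7.10.1 Benchmark Comparison (4 cores, 16 GB × 3, AMD vs Intel)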
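Environment
Note: the configuration and versions listed here are the ones verified for this article. To avoid pitfalls, match them as closely as you can.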
Esrally client environment
- Versions
Linux: CentOS 7.9
Python: 3.8.7
Pip: pip 20.2.3 from pip (python 3.8)
Java: openjdk version 1.8.0_302 (build 1.8.0_302-b08)
Git: 2.7.5
Esrally: 2.3.0
- Configuration
Memory: 32 GB
Disk: SSD cloud disk, 100 GB
Number of CPUs: 1
CPU cores: 16
Elasticsearch server environment
- Versions
Linux: CentOS 7.2
Java: openjdk version 11.0.9.1-ga (build 11.0.9.1-ga+1, mixed mode)
Elasticsearch version: 7.10.1 (Tencent Cloud Elasticsearch Service, Platinum edition)
- Configuration
Number of nodes: 3
Memory: 16 GB
Disk: SSD cloud disk, 1 TB
Number of CPUs: 1
CPU cores: 4
CPU model: AMD EPYC 7K62 48-Core Processor
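Background
In today's big-data era, business volume keeps growing, and a system can easily produce hundreds of GB or even several TB of data per day, so a high-performance Elasticsearch cluster matters a great deal. Before a service goes live, we need to simulate heavy read and write traffic for network logs and user-behavior logs, measure the performance metrics, and locate the cluster's bottlenecks. That tells us what hardware configuration and which business-side optimizations are needed to handle the current workload.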
Benchmarking
esrally terminology and parameters
Rally refers to car rally racing, so its terminology is borrowed from that sport.
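- track: the race track, which here means the sample data and the benchmark strategy used for the test. Run `esrally list tracks` to list the available tracks. The tracks that ship with Rally can be browsed at https://github.com/elastic/rally-tracks, and each track's directory contains a README.md that describes the data set and the parameters in detail. If no track is specified, the geonames track is used by default;
- target-hosts: the IP address and port of the remote Elasticsearch cluster, given as ip:port;
- pipeline: a benchmark workflow. Run `esrally list pipelines` to see the available ones. The benchmark-only pipeline leaves the management of Elasticsearch to the user and uses Rally only to drive the load, so use this mode to benchmark an existing cluster;
- track-params: overrides the track's default benchmark parameters;
- user-tag: a tag that labels this benchmark run;
- client-options: client connection options, such as the user name and password.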
Benchmark command
esrally race \
--track=geonames \
--target-hosts=10.0.10.4:9200 \
--pipeline=benchmark-only \
--track-params="number_of_shards:3, number_of_replicas:1" \
--user-tag="version:AMD_4C16G_1T*3" \
--client-options="basic_auth_user:'elastic', basic_auth_password:'your_password'"
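When the same race has to be repeated with different shard counts or against different clusters, it can help to assemble the command programmatically. The following is a minimal sketch, not part of the original test procedure: the helper names `format_kv` and `build_race_command` are hypothetical, and only the flags already shown in the command above are used. It builds the argument list and prints it; it does not run esrally.

```python
import shlex
from typing import Dict, List, Union

Value = Union[int, float, str]


def format_kv(options: Dict[str, Value], quote_strings: bool = False) -> str:
    """Render a dict as the comma-separated key:value string used by the flags above."""
    parts = []
    for key, value in options.items():
        if quote_strings and isinstance(value, str):
            # String values are single-quoted, as in the --client-options example.
            value = f"'{value}'"
        parts.append(f"{key}:{value}")
    return ",".join(parts)


def build_race_command(
    target_hosts: str,
    track_params: Dict[str, Value],
    user_tag: str,
    client_options: Dict[str, Value],
    track: str = "geonames",
    pipeline: str = "benchmark-only",
) -> List[str]:
    """Return the argument list for one esrally race (nothing is executed here)."""
    return [
        "esrally",
        "race",
        f"--track={track}",
        f"--target-hosts={target_hosts}",
        f"--pipeline={pipeline}",
        f"--track-params={format_kv(track_params)}",
        f"--user-tag={user_tag}",
        f"--client-options={format_kv(client_options, quote_strings=True)}",
    ]


# Same values as the command in this article; the password is a placeholder.
command = build_race_command(
    target_hosts="10.0.10.4:9200",
    track_params={"number_of_shards": 3, "number_of_replicas": 1},
    user_tag="version:AMD_4C16G_1T*3",
    client_options={
        "basic_auth_user": "elastic",
        "basic_auth_password": "your_password",
    },
)
print(shlex.join(command))  # shell-quoted form, suitable for logging or pasting
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run(command)` without shell quoting issues, should you choose to launch the race from Python.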
Benchmark report
| Metric | Task | Value | Unit |
|---|---|---|---|
| Cumulative indexing time of primary | | 16.6515 | min |
| Min cumulative indexing time across | | 0 | min |
| Median cumulative indexing time across | | 0.001258 | min |
| Max cumulative indexing time across | | 5.89373 | min |
| Cumulative indexing throttle time of | | 0 | min |
| Min cumulative indexing throttle time | | 0 | min |
| Median cumulative indexing throttle time | | 0 | min |
| Max cumulative indexing throttle time | | 0 | min |
| Cumulative merge time of primary | | 5.12393 | min |
| Cumulative merge count of primary | | 113 | |
| Min cumulative merge time across primary | | 0 | min |
| Median cumulative merge time across | | 0.001775 | min |
| Max cumulative merge time across primary | | 1.8119 | min |
| Cumulative merge throttle time of | | 0.954067 | min |
| Min cumulative merge throttle time | | 0 | min |
| Median cumulative merge throttle time | | 0 | min |
| Max cumulative merge throttle time | | 0.367133 | min |
| Cumulative refresh time of primary | | 1.98815 | min |
| Cumulative refresh count of primary | | 1037 | |
| Min cumulative refresh time across | | 0 | min |
| Median cumulative refresh time across | | 0.007558 | min |
| 99th percentile service time | phrase | 4.84451 | ms |
| 99.9th percentile service time | phrase | 22.3893 | ms |
| 100th percentile service time | phrase | 38.3952 | ms |
| error rate | phrase | 0 | % |
| Min Throughput | country_agg_uncached | 2.99 | ops/s |
| Mean Throughput | country_agg_uncached | 2.99 | ops/s |
| Median Throughput | country_agg_uncached | 2.99 | ops/s |
| Max Throughput | country_agg_uncached | 2.99 | ops/s |
| 50th percentile latency | country_agg_uncached | 263.52 | ms |
| 90th percentile latency | country_agg_uncached | 274.372 | ms |
| 99th percentile latency | country_agg_uncached | 298.735 | ms |
| 100th percentile latency | country_agg_uncached | 307.146 | ms |
| 50th percentile service time | country_agg_uncached | 262.61 | ms |
| 90th percentile service time | country_agg_uncached | 273.913 | ms |
| 99th percentile service time | country_agg_uncached | 297.431 | ms |
| 100th percentile service time | country_agg_uncached | 306.319 | ms |
| error rate | country_agg_uncached | 0 | % |
| Min Throughput | country_agg_cached | 97.24 | ops/s |
| Mean Throughput | country_agg_cached | 97.97 | ops/s |
| Median Throughput | country_agg_cached | 98.03 | ops/s |
| Max Throughput | country_agg_cached | 98.48 | ops/s |
| 50th percentile latency | country_agg_cached | 2.28737 | ms |
| 90th percentile latency | country_agg_cached | 3.5575 | ms |
| 99th percentile latency | country_agg_cached | 3.89999 | ms |
| 99.9th percentile latency | country_agg_cached | 17.5393 | ms |
| 100th percentile latency | country_agg_cached | 23.545 | ms |
| 50th percentile service time | country_agg_cached | 1.53483 | ms |
| 90th percentile service time | country_agg_cached | 1.83408 | ms |
| 99th percentile service time | country_agg_cached | 2.41103 | ms |
| 99.9th percentile service time | country_agg_cached | 6.71132 | ms |
| 100th percentile service time | country_agg_cached | 23.0925 | ms |
| error rate | country_agg_cached | 0 | % |
| Min Throughput | scroll | 20.03 | pages/s |
| Mean Throughput | scroll | 20.03 | pages/s |
| Median Throughput | scroll | 20.03 | pages/s |
| Max Throughput | scroll | 20.04 | pages/s |
| 50th percentile latency | scroll | 603.447 | ms |
| 90th percentile latency | scroll | 617.022 | ms |
| 99th percentile latency | scroll | 619.746 | ms |
| 100th percentile latency | scroll | 629.479 | ms |
| 50th percentile service time | scroll | 601.664 | ms |
| 90th percentile service time | scroll | 615.662 | ms |
| 99th percentile service time | scroll | 618.066 | ms |
| 100th percentile service time | scroll | 627.276 | ms |
| error rate | scroll | 0 | % |
| Min Throughput | expression | 1.5 | ops/s |
| Mean Throughput | expression | 1.5 | ops/s |
| Median Throughput | expression | 1.5 | ops/s |
| Max Throughput | expression | 1.5 | ops/s |
| 50th percentile latency | expression | 480.244 | ms |
| 90th percentile latency | expression | 497.217 | ms |
| 99th percentile latency | expression | 535.208 | ms |
| 100th percentile latency | expression | 677.822 | ms |
| 50th percentile service time | expression | 478.77 | ms |
| 90th percentile service time | expression | 496.211 | ms |
| 99th percentile service time | expression | 534.702 | ms |
| 100th percentile service time | expression | 677.301 | ms |
| error rate | expression | 0 | % |
| Min Throughput | painless_static | 1.4 | ops/s |
| Mean Throughput | painless_static | 1.4 | ops/s |
| Median Throughput | painless_static | 1.4 | ops/s |
| Max Throughput | painless_static | 1.4 | ops/s |
| 50th percentile latency | painless_static | 610.906 | ms |
| 90th percentile latency | painless_static | 650.136 | ms |
| 99th percentile latency | painless_static | 731.019 | ms |
| 100th percentile latency | painless_static | 770.706 | ms |
| 50th percentile service time | painless_static | 610.065 | ms |
| 90th percentile service time | painless_static | 648.115 | ms |
| 99th percentile service time | painless_static | 730.345 | ms |
| 100th percentile service time | painless_static | 769.865 | ms |
| error rate | painless_static | 0 | % |
| Min Throughput | painless_dynamic | 1.4 | ops/s |
| Mean Throughput | painless_dynamic | 1.4 | ops/s |
| Median Throughput | painless_dynamic | 1.4 | ops/s |
| Max Throughput | painless_dynamic | 1.4 | ops/s |
| 50th percentile latency | painless_dynamic | 608.727 | ms |
| 90th percentile latency | painless_dynamic | 649.741 | ms |
| 99th percentile latency | painless_dynamic | 695.298 | ms |
| 100th percentile latency | painless_dynamic | 702.601 | ms |
| 50th percentile service time | painless_dynamic | 608.169 | ms |
| 90th percentile service time | painless_dynamic | 649.34 | ms |
| 99th percentile service time | painless_dynamic | 694.455 | ms |
| 100th percentile service time | painless_dynamic | 701.651 | ms |
| error rate | painless_dynamic | 0 | % |
| Min Throughput | decay_geo_gauss_function_score | 1 | ops/s |
| Mean Throughput | decay_geo_gauss_function_score | 1 | ops/s |
| Median Throughput | decay_geo_gauss_function_score | 1 | ops/s |
| Max Throughput | decay_geo_gauss_function_score | 1 | ops/s |
| 50th percentile latency | decay_geo_gauss_function_score | 560.088 | ms |
| 90th percentile latency | decay_geo_gauss_function_score | 616.046 | ms |
| 99th percentile latency | decay_geo_gauss_function_score | 644.189 | ms |
| 100th percentile latency | decay_geo_gauss_function_score | 652.326 | ms |
| 50th percentile service time | decay_geo_gauss_function_score | 558.796 | ms |
| 90th percentile service time | decay_geo_gauss_function_score | 614.672 | ms |
| 99th percentile service time | decay_geo_gauss_function_score | 643.052 | ms |
| 100th percentile service time | decay_geo_gauss_function_score | 650.823 | ms |
| error rate | decay_geo_gauss_function_score | 0 | % |
| Min Throughput | decay_geo_gauss_script_score | 1 | ops/s |
| Mean Throughput | decay_geo_gauss_script_score | 1 | ops/s |
| Median Throughput | decay_geo_gauss_script_score | 1 | ops/s |
| Max Throughput | decay_geo_gauss_script_score | 1 | ops/s |
| 50th percentile latency | decay_geo_gauss_script_score | 575.714 | ms |
| 90th percentile latency | decay_geo_gauss_script_score | 602.96 | ms |
| 99th percentile latency | decay_geo_gauss_script_score | 629.875 | ms |
| 100th percentile latency | decay_geo_gauss_script_score | 643.619 | ms |
| 50th percentile service time | decay_geo_gauss_script_score | 574.411 | ms |
| 90th percentile service time | decay_geo_gauss_script_score | 602.263 | ms |
| 99th percentile service time | decay_geo_gauss_script_score | 628.526 | ms |
| 100th percentile service time | decay_geo_gauss_script_score | 641.251 | ms |
| error rate | decay_geo_gauss_script_score | 0 | % |
| Min Throughput | field_value_function_score | 1.5 | ops/s |
| Mean Throughput | field_value_function_score | 1.5 | ops/s |
| Median Throughput | field_value_function_score | 1.5 | ops/s |
| Max Throughput | field_value_function_score | 1.5 | ops/s |
| 50th percentile latency | field_value_function_score | 231.966 | ms |
| 90th percentile latency | field_value_function_score | 268.346 | ms |
| 99th percentile latency | field_value_function_score | 332.754 | ms |
| 100th percentile latency | field_value_function_score | 334.69 | ms |
| 50th percentile service time | field_value_function_score | 230.874 | ms |
| 90th percentile service time | field_value_function_score | 267.318 | ms |
| 99th percentile service time | field_value_function_score | 332.027 | ms |
| 100th percentile service time | field_value_function_score | 333.704 | ms |
| error rate | field_value_function_score | 0 | % |
| Min Throughput | field_value_script_score | 1.5 | ops/s |
| Mean Throughput | field_value_script_score | 1.5 | ops/s |
| Max Throughput | desc_sort_with_after_geonameid | 6.01 | ops/s |
| 50th percentile latency | desc_sort_with_after_geonameid | 125.9 | ms |
| 90th percentile latency | desc_sort_with_after_geonameid | 151.684 | ms |
| 99th percentile latency | desc_sort_with_after_geonameid | 185.673 | ms |
| 100th percentile latency | desc_sort_with_after_geonameid | 200.655 | ms |
| 50th percentile service time | desc_sort_with_after_geonameid | 124.833 | ms |
| 90th percentile service time | desc_sort_with_after_geonameid | 148.707 | ms |
| 99th percentile service time | desc_sort_with_after_geonameid | 185.15 | ms |
| 100th percentile service time | desc_sort_with_after_geonameid | 200.042 | ms |
| error rate | desc_sort_with_after_geonameid | 0 | % |
| Min Throughput | asc_sort_geonameid | 6.02 | ops/s |
| Mean Throughput | asc_sort_geonameid | 6.02 | ops/s |
| Median Throughput | asc_sort_geonameid | 6.02 | ops/s |
| Max Throughput | asc_sort_geonameid | 6.03 | ops/s |
| 50th percentile latency | asc_sort_geonameid | 5.46044 | ms |
| 90th percentile latency | asc_sort_geonameid | 6.02821 | ms |
| 99th percentile latency | asc_sort_geonameid | 7.26891 | ms |
| 100th percentile latency | asc_sort_geonameid | 7.97036 | ms |
| 50th percentile service time | asc_sort_geonameid | 4.58443 | ms |
| 90th percentile service time | asc_sort_geonameid | 5.08835 | ms |
| 99th percentile service time | asc_sort_geonameid | 6.91502 | ms |
| 100th percentile service time | asc_sort_geonameid | 7.10789 | ms |
| error rate | asc_sort_geonameid | 0 | % |
| Min Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
| Mean Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
| Median Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
| Max Throughput | asc_sort_with_after_geonameid | 6.01 | ops/s |
| 50th percentile latency | asc_sort_with_after_geonameid | 112.296 | ms |
| 90th percentile latency | asc_sort_with_after_geonameid | 132.813 | ms |
| 99th percentile latency | asc_sort_with_after_geonameid | 156.594 | ms |
| 100th percentile latency | asc_sort_with_after_geonameid | 176.157 | ms |
| 50th percentile service time | asc_sort_with_after_geonameid | 111.349 | ms |
| 90th percentile service time | asc_sort_with_after_geonameid | 132.107 | ms |
| 99th percentile service time | asc_sort_with_after_geonameid | 155.66 | ms |
| 100th percentile service time | asc_sort_with_after_geonameid | 175.446 | ms |
| error rate | asc_sort_with_after_geonameid | 0 | % |
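The report lists both latency and service time for each task. As Rally's documentation describes them, latency includes the time a request spends waiting before it is serviced, while service time does not, so the difference between the two gives a rough indication of whether requests are queueing. The sketch below is only an illustration and not part of the original test: it parses a few rows copied from this report, assuming they are laid out as pipe-delimited "Metric | Task | Value | Unit" rows, and prints latency minus service time per task. The function names `parse_rows` and `wait_time_by_task` are hypothetical.

```python
from typing import Dict, List, Tuple

# A few rows copied from the report, in "Metric | Task | Value | Unit" form.
REPORT_ROWS = """
| 50th percentile latency | country_agg_cached | 2.28737 | ms |
| 50th percentile service time | country_agg_cached | 1.53483 | ms |
| 50th percentile latency | scroll | 603.447 | ms |
| 50th percentile service time | scroll | 601.664 | ms |
| 50th percentile latency | asc_sort_geonameid | 5.46044 | ms |
| 50th percentile service time | asc_sort_geonameid | 4.58443 | ms |
"""

Row = Tuple[str, str, float, str]


def parse_rows(text: str) -> List[Row]:
    """Parse pipe-delimited report rows into (metric, task, value, unit) tuples."""
    rows: List[Row] = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue  # skip blank lines and anything that is not a 4-column row
        metric, task, value, unit = cells
        try:
            rows.append((metric, task, float(value), unit))
        except ValueError:
            continue  # skip header and separator rows, whose value is not a number
    return rows


def wait_time_by_task(rows: List[Row], percentile: str = "50th") -> Dict[str, float]:
    """Return latency minus service time (in ms) per task for one percentile."""
    latency: Dict[str, float] = {}
    service: Dict[str, float] = {}
    for metric, task, value, _unit in rows:
        if metric == f"{percentile} percentile latency":
            latency[task] = value
        elif metric == f"{percentile} percentile service time":
            service[task] = value
    # Only tasks that report both metrics can be compared.
    return {task: latency[task] - service[task] for task in latency if task in service}


for task, wait_ms in wait_time_by_task(parse_rows(REPORT_ROWS)).items():
    print(f"{task}: {wait_ms:.3f} ms of latency is not service time")
```

In the rows sampled here the gap is on the order of one or two milliseconds, which suggests the cluster was keeping up with the request rate that the track schedules for these tasks.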