ZooKeeper Cluster Tuning and a Load-Testing Example
I. ZooKeeper Cluster Tuning
1. Set the ZooKeeper heap size
vi /oldboyedu/softwares/zookeeper/conf/java.env
#!/bin/bash
# Set the ZooKeeper heap size
export JVMFLAGS="-Xms256m -Xmx256m $JVMFLAGS"
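Note:
The default heap size is 1 GB. I set it to 256 MB here for testing; for production, 2-4 GB is recommended.
2. Reference template for production tuning parameters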
vi /oldboyedu/softwares/zookeeper/conf/zoo.cfg
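# Tick: the basic time unit, in milliseconds. It is the smallest unit of time in ZooKeeper and is used to measure heartbeats and timeouts. The sample configuration uses 2000 ms (2 seconds), which is usually fine.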
tickTime=2000
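# Initialization limit, in ticks: the maximum number of ticks a follower may take to connect to the leader during initialization. The sample configuration uses 10 ticks (20 seconds); this template uses 5 ticks, i.e. 10 seconds.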
initLimit=5
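# Sync limit, in ticks: the maximum time a follower may take to sync with the leader. Like initLimit, it is expressed in units of tickTime. The sample configuration uses 5 ticks (10 seconds); this template uses 2 ticks, i.e. 4 seconds.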
syncLimit=2
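# ZooKeeper's data directory. This is a very important parameter: ZooKeeper keeps its data in memory and periodically writes snapshots to this directory. In production, keep an eye on the disk usage of this directory.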
dataDir=/oldboyedu/zookeeper
# The port ZooKeeper listens on for client connections. The default, 2181, is usually fine.
clientPort=2181
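# Limits the number of concurrent connections that a single client, identified by IP address, may open to ZooKeeper. This can help prevent certain classes of DoS attacks. Setting it to 0 removes the limit on concurrent connections.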
#maxClientCnxns=60
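# Since 3.4.0, ZooKeeper can automatically purge old transaction logs and snapshot files. This parameter sets the purge interval in hours; set it to an integer of 1 or greater to enable purging. The default is 0, which disables automatic purging.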
#autopurge.purgeInterval=1
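# Used together with the parameter above: the number of most recent snapshots (and their corresponding transaction logs) to keep. The default is 3.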
#autopurge.snapRetainCount=3
#server.A=B:C:D[:E]
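# A:
# The server ID, i.e. the number stored in that instance's myid file. It uniquely identifies a ZooKeeper instance.
# B:
# The host address of the instance with that ID.
# C:
# The port followers use to connect to the leader for data synchronization. Only the instance that is currently the leader listens on this port.
# D:
# The leader election port.
# E:
# The role of the instance: "participant" or "observer".
# A participant votes and can be elected leader; an observer neither votes nor can be elected leader.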
server.106=10.0.0.106:2888:3888:observer
server.107=10.0.0.107:2888:3888:participant
server.108=10.0.0.108:2888:3888:participant
# Skip ACL checks
# skipACL=yes
# Enable the four-letter-word command whitelist ("*" allows every command).
4lw.commands.whitelist=*
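With the whitelist enabled, a four-letter command can be sent to the client port over a plain TCP connection. The following is a minimal sketch rather than part of the original setup: the helper names four_letter_word and parse_mntr are hypothetical, and it assumes the node is reachable on its clientPort.

```python
import socket


def four_letter_word(host: str, port: int, cmd: str, timeout: float = 5.0) -> str:
    """Send a four-letter command (e.g. "ruok", "srvr", "mntr") and return the reply."""
    if len(cmd) != 4:
        raise ValueError("four-letter commands must be exactly 4 characters")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(reply: str) -> dict:
    """Turn the tab-separated "key<TAB>value" lines of an mntr reply into a dict."""
    stats = {}
    for line in reply.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # ignore any line that is not a key/value pair
            stats[key.strip()] = value.strip()
    return stats
```

For example, parse_mntr(four_letter_word("10.0.0.107", 2181, "mntr")).get("zk_server_state") should report whether that node is currently a leader, a follower, or an observer.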
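3. Tuning guide
In production, tuning usually comes down to three things: changing ZooKeeper's data directory, sizing the JVM heap, and setting the cluster-related parameters.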
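To sanity-check the cluster-related parameters, it can help to convert the tick-based limits into seconds and to list each server's role. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the sample text repeats the values from the template in section 2, and hosts are assumed to be IPv4 addresses or host names (no IPv6 literals).

```python
def parse_zoo_cfg(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def timeouts_in_seconds(settings: dict) -> dict:
    """Convert initLimit and syncLimit from ticks to seconds."""
    tick_ms = int(settings["tickTime"])
    return {
        "initLimit": tick_ms * int(settings["initLimit"]) / 1000,
        "syncLimit": tick_ms * int(settings["syncLimit"]) / 1000,
    }


def server_roles(settings: dict) -> dict:
    """Map each server ID to its role; entries without a role are participants."""
    roles = {}
    for key, value in settings.items():
        if not key.startswith("server."):
            continue
        server_id = int(key.split(".", 1)[1])
        # Drop an optional ";clientPort" suffix before splitting on ':'.
        parts = value.split(";", 1)[0].split(":")
        roles[server_id] = parts[3] if len(parts) > 3 else "participant"
    return roles


sample = """
tickTime=2000
initLimit=5
syncLimit=2
server.106=10.0.0.106:2888:3888:observer
server.107=10.0.0.107:2888:3888:participant
server.108=10.0.0.108:2888:3888:participant
"""
cfg = parse_zoo_cfg(sample)
print(timeouts_in_seconds(cfg))  # {'initLimit': 10.0, 'syncLimit': 4.0}
print(server_roles(cfg))
```

With this template only two servers vote, and a majority of two voters is two, so the cluster stops serving writes as soon as either participant is down. The observer adds read capacity but no fault tolerance.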
II. ZooKeeper Cluster Load-Testing Example
[root@elk102.oldboyedu.com ~]# docker run --rm --name ztest -it daocloud.io/daocloud/zookeeper:feature-pressure_test /bin/bash
Unable to find image 'daocloud.io/daocloud/zookeeper:feature-pressure_test' locally
Trying to pull repository daocloud.io/daocloud/zookeeper ...
feature-pressure_test: Pulling from daocloud.io/daocloud/zookeeper
4fd9376ba1c2: Pull complete
a3ed95caeb02: Pull complete
eccf25f58b53: Pull complete
cb1b2495580e: Pull complete
b185638a17e1: Pull complete
f3f8f2ef51dc: Pull complete
81efe7cd2a85: Pull complete
5525b9d5e84c: Pull complete
480c378eb9ff: Pull complete
Digest: sha256:39fd8d3cdee744674f1fadb2f71d2339d3504d4a988095736c2bb9f4d4201a64
Status: Downloaded newer image for daocloud.io/daocloud/zookeeper:feature-pressure_test
root@db5d2bee4be3:/local/git/zookeeper-benchmark#
root@db5d2bee4be3:/local/git/zookeeper-benchmark# ls
LICENSE README.md all.plot benchmark.conf multi.plot pom.xml runBenchmark.sh src target
root@db5d2bee4be3:/local/git/zookeeper-benchmark#
root@db5d2bee4be3:/local/git/zookeeper-benchmark# vim benchmark.conf
root@db5d2bee4be3:/local/git/zookeeper-benchmark#
root@db5d2bee4be3:/local/git/zookeeper-benchmark# egrep -v "^#|^$" benchmark.conf
totalTime=30000
interval=200
totalOperations=20000
lowerbound=8000
sync=true
server.106=10.0.0.106:2888:3888:observer
server.107=10.0.0.107:2888:3888:participant
server.108=10.0.0.108:2888:3888:participant
root@db5d2bee4be3:/local/git/zookeeper-benchmark#
root@db5d2bee4be3:/local/git/zookeeper-benchmark# ./runBenchmark.sh test01 ./benchmark.conf
Using configuration: ./benchmark.conf
Detailed logs going to: zk-benchmark.log
Running warm-up benchmark for 30 seconds...
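...(You can safely ignore errors like the one below. They appear because, at high concurrency, the current ZooKeeper cluster cannot keep up with the volume of requests.)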
21/04/30 03:32:14 ERROR curator.ConnectionState: Connection timed out for connection string (172.200.1.101:2181) and timeout (5000) / elapsed (5058)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106)
at com.netflix.curator.utils.EnsurePath$InitialHelper$1.call(EnsurePath.java:110)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:86)
at com.netflix.curator.utils.EnsurePath$InitialHelper.ensure(EnsurePath.java:102)
at com.netflix.curator.utils.EnsurePath.ensure(EnsurePath.java:90)
at com.netflix.curator.framework.imps.NamespaceImpl.ensurePath(NamespaceImpl.java:52)
at com.netflix.curator.framework.imps.NamespaceImpl.fixForNamespace(NamespaceImpl.java:34)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.fixForNamespace(CuratorFrameworkImpl.java:504)
at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:138)
at com.netflix.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:35)
at edu.brown.cs.zkbenchmark.BenchmarkClient.run(BenchmarkClient.java:91)
at java.lang.Thread.run(Thread.java:745)
...
done
Running znode read benchmark for 30 seconds... done
Running repeated single-znode write benchmark for 30 seconds... done
Running znode create benchmark for 30 seconds... done
Running different znode write benchmark for 30 seconds... done
Running znode delete benchmark for 30 seconds... done
root@db5d2bee4be3:/local/git/zookeeper-benchmark#
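root@1a202ecfa2d4:/local/git/zookeeper-benchmark# cat test01/zk-benchmark.log # Note: I trimmed much of the output below. This was only a rough load test, but the results already show where the current cluster's bottleneck lies.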
21/04/30 08:54:57 INFO zkbenchmark.ZooKeeperBenchmark: Loading benchmark from configuration file: benchmark.conf
21/04/30 08:54:57 INFO zkbenchmark.ZooKeeperBenchmark: benchmark set with: interval: 200 total number: 20000 threshold: 8000 time: 30000 sync: SYNC
...
21/04/30 08:55:42 INFO zkbenchmark.BenchmarkClient: Client #0 -- Current test complete. Completed 28253 operations.
21/04/30 08:55:42 INFO zkbenchmark.ZooKeeperBenchmark: READ finished, time elapsed (sec): 30.0 operations: 93533 avg rate: 3117.766666666667
21/04/30 08:55:42 INFO zkbenchmark.BenchmarkClient: Client #0 is sending srst command:
Server stats reset.
...
21/04/30 08:56:12 INFO zkbenchmark.BenchmarkClient: Client #2 -- Current test complete. Completed 28211 operations.
21/04/30 08:56:12 INFO zkbenchmark.ZooKeeperBenchmark: READ finished, time elapsed (sec): 30.0 operations: 89001 avg rate: 2966.7
...
21/04/30 08:56:42 INFO zkbenchmark.BenchmarkClient: Client #0 -- Current test complete. Completed 7396 operations.
21/04/30 08:56:42 INFO zkbenchmark.ZooKeeperBenchmark: SETSINGLE finished, time elapsed (sec): 30.0 operations: 22863 avg rate: 762.1
21/04/30 08:56:42 INFO zkbenchmark.BenchmarkClient: Client #0 is sending srst command:
...
21/04/30 08:57:12 INFO zkbenchmark.BenchmarkClient: Client #1 -- Current test complete. Completed 7042 operations.
21/04/30 08:57:12 INFO zkbenchmark.ZooKeeperBenchmark: CREATE finished, time elapsed (sec): 30.0 operations: 22646 avg rate: 754.8666666666667
...
21/04/30 08:57:42 INFO zkbenchmark.BenchmarkClient: Client #1 -- Current test complete. Completed 7333 operations.
21/04/30 08:57:42 INFO zkbenchmark.ZooKeeperBenchmark: SETMULTI finished, time elapsed (sec): 30.0 operations: 23485 avg rate: 782.8333333333334
21/04/30 08:57:42 INFO zkbenchmark.BenchmarkClient: Client #0 is sending srst command:
...
21/04/30 08:58:12 INFO zkbenchmark.BenchmarkClient: Client #0 -- Current test complete. Completed 7828 operations.
21/04/30 08:58:12 INFO zkbenchmark.ZooKeeperBenchmark: DELETE finished, time elapsed (sec): 30.0 operations: 24171 avg rate: 805.7
21/04/30 08:58:12 INFO zkbenchmark.ZooKeeperBenchmark: Tests completed, now cleaning-up
21/04/30 08:58:12 INFO zkbenchmark.ZooKeeperBenchmark: All tests are complete
root@1a202ecfa2d4:/local/git/zookeeper-benchmark#
...
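The "finished" lines in zk-benchmark.log can be collected into a short summary. The sketch below assumes the log format shown above; the regular expression and the summarize function are hypothetical helpers, not part of the benchmark tool.

```python
import re

# Matches lines such as:
# "... ZooKeeperBenchmark: READ finished, time elapsed (sec): 30.0 operations: 93533 avg rate: 3117.76"
FINISHED = re.compile(
    r"ZooKeeperBenchmark: (?P<test>\w+) finished, "
    r"time elapsed \(sec\): (?P<secs>[\d.]+) "
    r"operations: (?P<ops>\d+) avg rate: (?P<rate>[\d.]+)"
)


def summarize(log_text: str) -> list:
    """Return (test, operations, ops per second) for each "finished" line, in log order."""
    results = []
    for match in FINISHED.finditer(log_text):
        rate = round(float(match.group("rate")), 1)
        results.append((match.group("test"), int(match.group("ops")), rate))
    return results


sample_log = (
    "21/04/30 08:55:42 INFO zkbenchmark.ZooKeeperBenchmark: READ finished, "
    "time elapsed (sec): 30.0 operations: 93533 avg rate: 3117.766666666667\n"
    "21/04/30 08:56:42 INFO zkbenchmark.ZooKeeperBenchmark: SETSINGLE finished, "
    "time elapsed (sec): 30.0 operations: 22863 avg rate: 762.1\n"
)
for test, ops, rate in summarize(sample_log):
    print(test, ops, rate)
```

In the run above, reads complete at roughly 3,000 operations per second, while every write-type test (SETSINGLE, CREATE, SETMULTI, DELETE) stays around 750 to 800, about a quarter of the read rate.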