Building a PGXL Cluster and Implementing High Availability

Our company's new game, Solitaire Squirrel, requires a PostgreSQL database, which is said to be superior to MongoDB (from what angle, I'm not sure), so I got to work. Before building the database cluster I first crammed the basics (PostgreSQL, Postgres-XC, Postgres-XL)... A week later, I had learned a thing or two~~

Basic Concepts

PostgreSQL is an open-source database with full support for the ACID properties (Atomicity, Consistency, Isolation, Durability); Postgres-XC is an open-source, write-scalable, synchronous, symmetric, multi-tenant-safe PostgreSQL cluster solution; Postgres-XL grew out of XC and surpasses it.

PGXL Cluster Architecture (excerpted from here)

PGXL is a cluster of PostgreSQL databases that, seen from above, behaves like a single database. Depending on the design, each table can be either replicated or distributed.
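For a concrete feel (a minimal sketch of Postgres-XL's DISTRIBUTE BY syntax; the table names are illustrative, and the coordinator address assumes the ports configured later in this post):

psql -h node1 -p 20004 -d postgres <<'SQL'
-- replicated: every datanode keeps a full copy (suits small, read-mostly tables)
CREATE TABLE game_config (key text PRIMARY KEY, value text) DISTRIBUTE BY REPLICATION;
-- distributed: rows are hash-partitioned across datanodes by player_id
CREATE TABLE player_score (player_id bigint, score int) DISTRIBUTE BY HASH (player_id);
SQL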

PGXL has three main components: the GTM, Coordinators, and Datanodes.

  • The GTM (Global Transaction Manager) is responsible for providing the ACID properties of transactions.
  • Datanodes store table data and locally execute the SQL tasks dispatched by the Coordinators.
  • Coordinators handle each SQL task coming from the application, decide which Datanode should execute it, dispatch the task plan to the corresponding Datanodes, and collect the results and return them to the application as needed.

The GTM usually runs on a dedicated server, since it has to handle transaction requests from every Coordinator and Datanode. To aggregate the requests and responses of the Coordinator and Datanode processes on a given machine, you can configure a GTM-Proxy. The GTM-Proxy reduces the load on the GTM and also helps handle GTM failures. Even so, the GTM remains a potential single point of failure, so you can configure a GTM-Standby node as a backup for the GTM.

It is best to run one Coordinator and one Datanode on each machine: you don't have to worry about balancing the load between the two, and it reduces network traffic.

[Figure pgxl_001: PGXL cluster architecture]

Setting Up the PGXL Cluster Environment (pgxl-v9.2)

Two hosts:
172.16.0.3 node1
172.16.0.4 node2
OS: CentOS 7
node1 hosts gtm, gtm-proxy, datanode1, coordinator1, datanode2-standby, coordinator2-standby
node2 hosts gtm-standby, gtm-proxy, datanode2, coordinator2, datanode1-standby, coordinator1-standby

Since our current user base is small, two machines are enough for a minimal cluster: if either machine goes down, failover can proceed without losing data or breaking component completeness.

Managing the PGXL cluster (initialization, start/stop, failover, adding/removing nodes, and so on) can be done with the pgxc_ctl script. The script reads pgxc_ctl.conf, a configuration file that defines the cluster topology, and encapsulates the details of each operation inside its own functions. Managing the cluster this way is simple and reliable, but two points deserve attention:

  1. The machine that runs pgxc_ctl against the cluster must have passwordless SSH trust from every other node.
  2. Some operations dynamically update the contents of pgxc_ctl.conf, such as adding/removing nodes and failover.

It is common to designate one machine as the manager, so that every command loads or modifies the same pgxc_ctl.conf, and every node in the cluster trusts the manager over SSH. Here we pick node1 as the manager and run all pgxc_ctl commands from it to manage the whole cluster.
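A few representative invocations (a sketch of commands used later in this post; by default the script reads $HOME/pgxc_ctl/pgxc_ctl.conf, and a different file can be named with -c):

pgxc_ctl monitor all    # report the status of every component
pgxc_ctl stop all       # stop the whole cluster
pgxc_ctl -c $HOME/pgxc_ctl/pgxc_ctl.conf start all    # start it, naming the config explicitly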

  • Create the pgxl user on each node.
useradd pgxl
su pgxl
mkdir ~/.ssh
chmod 700 ~/.ssh
  • On each node, set up SSH trust for the pgxl user on the manager node1.

Log in to the manager node1

su pgxl
ssh-keygen -t rsa   # leave the passphrase empty
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
scp -p ~/.ssh/authorized_keys pgxl@node2:~/.ssh/  # copy authorized_keys to the pgxl user on the other nodes
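Before going further it is worth checking from node1 that passwordless login really works; pgxc_ctl will fail later if this still prompts for a password:

ssh pgxl@node2 hostname    # should print node2 without asking for a password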
  • Build and install pgxl v9.2 on each node.
yum update  # optional
yum install gcc gcc-c++ kernel-devel readline-devel flex bison bison-devel zlib zlib-devel make docbook-style-dsssl jade

wget http://sourceforge.net/projects/postgres-xl/files/Releases/Version_9.2rc/postgres-xl-v9.2-src.tar.gz/download
tar -zxvf postgres-xl-v9.2-src.tar.gz
cd postgres-xl
./configure --prefix=/usr/local/pgxl/
make
make install
vi /etc/profile
# append: export PATH=$PATH:/usr/local/pgxl/bin
source /etc/profile
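A quick sanity check that the freshly built binaries are the ones on the PATH:

which psql       # expect /usr/local/pgxl/bin/psql
psql --version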
  • Initialize the cluster.

Log in to the manager node1 and build/install pgxc_ctl

cd postgres-xl/contrib/pgxc_ctl/
./configure
make
make install

Pre-initialize to generate the pgxc_ctl.conf template file

su pgxl
cd ~
pgxc_ctl prepare
vi ~/pgxc_ctl/pgxc_ctl.conf

Edit pgxc_ctl.conf; sample contents:

#!/bin/bash
#
# Postgres-XC Configuration file for pgxc_ctl utility. 
#
# A configuration file can be specified with the -c option of the pgxc_ctl command.   Default is
# $PGXC_CTL_HOME/pgxc_ctl.org.
#
# This is bash script so you can make any addition for your convenience to configure
# your Postgres-XC cluster.
#
# Please understand that pgxc_ctl provides only a subset of the configuration that Postgres-XC
# provides.  Here are several assumptions/restrictions pgxc_ctl depends on.
#
# 1) All the resources of pgxc nodes have to be owned by the same user.   Same user means
#    user with the same user name.  User ID may be different from server to server.
#    This must be specified as a variable $pgxcOwner.
#
# 2) All the servers must be reachable via ssh without a password.   It is highly recommended
#    to setup key-based authentication among all the servers.
#
# 3) All the databases in coordinator/datanode have at least one common superuser.  Pgxc_ctl
#    uses this user to connect to coordinators and datanodes.   Again, no password should
#    be used to connect.  You have many options to do this, pg_hba.conf, pg_ident.conf and
#    others.  Pgxc_ctl provides a way to configure pg_hba.conf but not pg_ident.conf.   This
#    will be implemented in the later releases.
#
# 4) Gtm master and slave can listen on different ports, while coordinator and datanode
#    slaves should be assigned the same port number as their master.
#
# 5) Port number of a coordinator slave must be the same as its master's.
#
# 6) Master and slave are connected using synchronous replication.  Asynchronous replication
#    has a slight (almost zero) chance of bringing the total cluster into an inconsistent state.
#    This chance is very low and may be negligible.  Asynchronous replication may be
#    supported in a later release.
#
# 7) Each coordinator and datanode can have only one slave each.  Cascaded replication and
#    multiple slaves are not supported in the current pgxc_ctl.
#
# 8) Killing nodes may end up with IPC resource leaks, such as semaphores and shared memory.
#    Only listening port (socket) will be cleaned with clean command.
#
# 9) Backup and restore are not supported in pgxc_ctl at present.   This is a big task and
#    may need considerable resources.
#
#========================================================================================
#
#
# The pgxcInstallDir variable is needed if you invoke the "deploy" command from the pgxc_ctl utility.
# If not, you don't need this variable.
pgxcInstallDir=/data1/pgxc
#---- OVERALL -----------------------------------------------------------------------------
#
pgxcOwner=pgxl		# owner of the Postgres-XC database cluster.  Here, we use this
						# both as the Linux user and the database user.  This must be
						# the superuser of each coordinator and datanode.
pgxcUser=$pgxcOwner		# OS user of Postgres-XC owner

tmpDir=/tmp					# temporary dir used in XC servers
localTmpDir=$tmpDir			# temporary dir used here locally

configBackup=n					# If you want config file backup, specify y to this value.
configBackupHost=pgxc-linker	# host to backup config file
configBackupDir=/data1/pgxc		# Backup directory
configBackupFile=pgxc_ctl.bak	# Backup file name --> Need to synchronize when original changed.

#---- GTM ------------------------------------------------------------------------------------

# GTM is mandatory.  You must have at least (and only) one GTM master in your Postgres-XC cluster.
# If GTM crashes and you need to reconfigure it, you can do it by pgxc_update_gtm command to update
# GTM master with others.   Of course, we provide pgxc_remove_gtm command to remove it.  This command
# will not stop the current GTM.  It is up to the operator.

#---- Overall -------
gtmName=gtm

#---- GTM Master -----------------------------------------------

#---- Overall ----
gtmMasterServer=node1
gtmMasterPort=20001
gtmMasterDir=/data1/pgxc/nodes/gtm

#---- Configuration ---
gtmExtraConfig=none			# Will be added to gtm.conf for both Master and Slave (done at initialization only)
gtmMasterSpecificExtraConfig=none	# Will be added to Master's gtm.conf (done at initialization only)

#---- GTM Slave -----------------------------------------------

# Because GTM is a key component to maintain database consistency, you may want to configure GTM slave
# for backup.

#---- Overall ------
gtmSlave=y					# Specify y if you configure GTM Slave.   Otherwise, GTM slave will not be configured and
							# all the following variables will be reset.
gtmSlaveServer=node2		# value none means GTM slave is not available.  Give none if you don't configure GTM Slave.
gtmSlavePort=20001			# Not used if you don't configure GTM slave.
gtmSlaveDir=/data1/pgxc/nodes/gtm	# Not used if you don't configure GTM slave.
# Please note that when you have GTM failover, then there will be no slave available until you configure the slave
# again. (pgxc_add_gtm_slave function will handle it)

#---- Configuration ----
gtmSlaveSpecificExtraConfig=none # Will be added to Slave's gtm.conf (done at initialization only)

#---- GTM Proxy -------------------------------------------------------------------------------------------------------
# GTM proxy will be selected based upon which server each component runs on.
# When the master fails over to the slave, the slave inherits its master's gtm proxy.  It should be
# reconfigured based upon the new location.
#
# To do so, slave should be restarted.   So pg_ctl promote -> (edit postgresql.conf and recovery.conf) -> pg_ctl restart
#
# You don't have to configure GTM Proxy if you don't configure a GTM slave or you are happy if every component connects
# to the GTM Master directly.  If you configure a GTM slave, you must configure GTM proxies too.

#---- Shortcuts ------
gtmProxyDir=/data1/pgxc/nodes/gtm_pxy

#---- Overall -------
gtmProxy=y				# Specify y if you configure at least one GTM proxy.   You may skip gtm proxies
						# only when you don't configure GTM slaves.
						# If you specify a value other than y, the following parameters will be set to default empty values.
						# If we find there are no valid proxy server names (that is, every server is specified
						# as none), then the gtmProxy value will be set to "n" and all the entries will be set to
						# empty values.
gtmProxyNames=(gtm_pxy1 gtm_pxy2)	# Not used if it is not configured
gtmProxyServers=(node1 node2)			# Specify none if you don't configure it.
gtmProxyPorts=(20002 20002)				# Not used if it is not configured.
gtmProxyDirs=($gtmProxyDir $gtmProxyDir)	# Not used if it is not configured.

#---- Configuration ----
gtmPxyExtraConfig=none		# Extra configuration parameter for gtm_proxy.  Coordinator section has an example.
gtmPxySpecificExtraConfig=(none none)

#---- Coordinators ----------------------------------------------------------------------------------------------------

#---- shortcuts ----------
coordMasterDir=/data1/pgxc/nodes/coord
coordSlaveDir=/data1/pgxc/nodes/coord_slave
coordArchLogDir=/data1/pgxc/nodes/coord_archlog

#---- Overall ------------
# coordinator      master_node     slave_node     port
#    coord1          node06          node07      20004
#    coord2          node07          node08      20005
#    coord3          node08          node09      20004
#    coord4          node09          node06      20005
#
#  => node06:20004 -> coord1:master, node06:20005 -> coord4:slave
#     node07:20004 -> coord1:slave,  node07:20005 -> coord2:master
#     ...
#
coordNames=(coord1 coord2)		# Master and slave use the same name
coordPorts=(20004 20005)			# Master and slave use the same port
poolerPorts=(20010 20011)			# Master and slave use the same pooler port
coordPgHbaEntries=(172.16.0.0/12)				# Assumes that all the coordinators (master/slave) accept
												# the same connections
												# This entry allows only $pgxcOwner to connect.
												# If you'd like to set up another connection, you should
												# supply these entries through the files specified below.
# Note: The above parameter is extracted as "host all all 0.0.0.0/0 trust".   If you don't want
# such a setup, specify the value () for this variable and supply what you want using the coordExtraPgHba
# and/or coordSpecificExtraPgHba variables.

#---- Master -------------
coordMasterServers=(node1 node2)		# none means this master is not available
coordMasterDirs=($coordMasterDir $coordMasterDir)
coordMaxWALSender=5		# max_wal_senders: needed to configure slaves. If a zero value is specified,
						# it is expected that this parameter is supplied explicitly by the external files
						# specified in the following.	If you don't configure slaves, leave this value at zero.
coordMaxWALSenders=($coordMaxWALSender $coordMaxWALSender)
						# max_wal_senders configuration for each coordinator.

#---- Slave -------------
coordSlave=y			# Specify y if you configure at least one coordinator slave.  Otherwise, the following
						# configuration parameters will be set to empty values.
						# If no effective server names are found (that is, every server is specified as none),
						# then the coordSlave value will be set to n and all the following values will be set to
						# empty values.
coordSlaveSync=y		# Specify to connect with synchronized mode.
coordSlaveServers=(node2 node1)			# none means this slave is not available
coordSlaveDirs=($coordSlaveDir $coordSlaveDir)
coordArchLogDirs=($coordArchLogDir $coordArchLogDir)

#---- Configuration files---
# Needed when you'd like to set up specific non-default configuration 
# These files will go to the corresponding files for the master.
# You may supply your own bash script to set up extra config lines and extra pg_hba.conf entries 
# Or you may supply these files manually.
coordExtraConfig=coordExtraConfig	# Extra configuration file for coordinators.  
						# This file will be added to all the coordinators'
						# postgresql.conf
# Please note that the following sets up minimum parameters which you may want to change.
# You can put your postgresql.conf lines here.
cat > $coordExtraConfig <<EOF
#================================================
# Added to all the coordinator postgresql.conf
# Original: $coordExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_connections = 100
EOF

# Additional Configuration file for specific coordinator master.
# You can define each setting by similar means as above.
coordSpecificExtraConfig=(none none)
coordExtraPgHba=none	# Extra entries for pg_hba.conf.  This file will be added to all the coordinators' pg_hba.conf
coordSpecificExtraPgHba=(none none)

#----- Additional Slaves -----
#
# Please note that this section is just a suggestion of how we may extend the configuration for
# multiple and cascaded replication.   It is not used in the current version.
#
coordAdditionalSlaves=n		# Additional slaves can be specified as follows.
coordAdditionalSlaveSet=(cad1)		# Each entry specifies a set of slaves.   In this case, one set of slaves is
											# configured
cad1_Sync=n		  		# All the slaves at "cad1" are connected in asynchronous mode.
							# If not, specify "y"
							# The following lines specify the detailed configuration for each
							# slave tag, cad1.  You can define cad2 similarly.
cad1_Servers=(node08 node09 node06 node07)	# Hosts
cad1_dir=/data1/pgxc/nodes/coord_slave_cad1
cad1_Dirs=($cad1_dir $cad1_dir $cad1_dir $cad1_dir)
cad1_ArchLogDir=/data1/pgxc/nodes/coord_archlog_cad1
cad1_ArchLogDirs=($cad1_ArchLogDir $cad1_ArchLogDir $cad1_ArchLogDir $cad1_ArchLogDir)


#---- Datanodes -------------------------------------------------------------------------------------------------------

#---- Shortcuts --------------
datanodeMasterDir=/data1/pgxc/nodes/dn_master
datanodeSlaveDir=/data1/pgxc/nodes/dn_slave
datanodeArchLogDir=/data1/pgxc/nodes/datanode_archlog

#---- Overall ---------------
#primaryDatanode=datanode1				# Primary Node.
# At present, xc has a problem issuing ALTER NODE against the primary node.  Until it is fixed, the test will be done
# without this feature.
primaryDatanode=datanode1				# Primary Node.
datanodeNames=(datanode1 datanode2)
datanodePorts=(20008 20009)	# Master and slave use the same port!
datanodePoolerPorts=(20012 20013)	# Master and slave use the same port!
datanodePgHbaEntries=(172.16.0.0/12)	# Assumes that all the datanodes (master/slave) accept
										# the same connection
										# This list sets up pg_hba.conf for $pgxcOwner user.
										# If you'd like to setup other entries, supply them
										# through extra configuration files specified below.
# Note: The above parameter is extracted as "host all all 0.0.0.0/0 trust".   If you don't want
# such a setup, specify the value () for this variable and supply what you want using the datanodeExtraPgHba
# and/or datanodeSpecificExtraPgHba variables.

#---- Master ----------------
datanodeMasterServers=(node1 node2)	# none means this master is not available.
													# That is, there should be a master but it is down.
													# The cluster is not operational until the master is
													# recovered and ready to run.
datanodeMasterDirs=($datanodeMasterDir $datanodeMasterDir)
datanodeMaxWalSender=5								# max_wal_senders: needed to configure slave. If zero value is 
													# specified, it is expected this parameter is explicitly supplied
													# by external configuration files.
													# If you don't configure slaves, leave this value zero.
datanodeMaxWALSenders=($datanodeMaxWalSender $datanodeMaxWalSender)
						# max_wal_senders configuration for each datanode

#---- Slave -----------------
datanodeSlave=y			# Specify y if you configure at least one datanode slave.  Otherwise, the following
						# configuration parameters will be set to empty values.
						# If no effective server names are found (that is, every server is specified as none),
						# then the datanodeSlave value will be set to n and all the following values will be set to
						# empty values.
datanodeSlaveServers=(node2 node1)	# value none means this slave is not available
datanodeSlaveSync=y		# If datanode slave is connected in synchronized mode
datanodeSlaveDirs=($datanodeSlaveDir $datanodeSlaveDir)
datanodeArchLogDirs=( $datanodeArchLogDir $datanodeArchLogDir)

# ---- Configuration files ---
# You may supply your own bash script here to set up extra config lines and extra pg_hba.conf entries.
# These files will go to the corresponding files for the master.
# Or you may supply these files manually.
datanodeExtraConfig=none	# Extra configuration file for datanodes.  This file will be added to all the 
							# datanodes' postgresql.conf
datanodeSpecificExtraConfig=(none none)
datanodeExtraPgHba=none		# Extra entries for pg_hba.conf.  This file will be added to all the datanodes' pg_hba.conf
datanodeSpecificExtraPgHba=(none none)

#----- Additional Slaves -----
datanodeAdditionalSlaves=n	# Additional slaves can be specified as follows.
# datanodeAdditionalSlaveSet=(dad1 dad2)		# Each entry specifies a set of slaves.   In this case, two sets of slaves are
											# configured
# dad1_Sync=n		  		# All the slaves at "dad1" are connected in asynchronous mode.
							# If not, specify "y"
							# The following lines specify the detailed configuration for each
							# slave tag, dad1.  You can define dad2 similarly.
# dad1_Servers=(node08 node09 node06 node07)	# Hosts
# dad1_dir=/data1/pgxc/nodes/datanode_slave_dad1
# dad1_Dirs=($dad1_dir $dad1_dir $dad1_dir $dad1_dir)
# dad1_ArchLogDir=/data1/pgxc/nodes/datanode_archlog_dad1
# dad1_ArchLogDirs=($dad1_ArchLogDir $dad1_ArchLogDir $dad1_ArchLogDir $dad1_ArchLogDir)

#---- WAL archives -------------------------------------------------------------------------------------------------
walArchive=n	# If you'd like to configure WAL archive, edit this section.
				# Pgxc_ctl assumes that if you configure WAL archive, you configure it
				# for all the coordinators and datanodes.
				# Default is "no".   Please specify "y" here to turn it on.
#
#		End of Configuration Section
#
#==========================================================================================================================

#========================================================================================================================
# The following is for extension.  It just demonstrates how to write such an extension.  There's no code
# which takes care of them, so please ignore the following lines.  They are simply ignored by pgxc_ctl.
# No side effects.
#=============<< Beginning of future extension demonstration >> ========================================================
# You can set up more than one backup set for various purposes, such as disaster recovery.
walArchiveSet=(war1 war2)
war1_source=(master)	# you can specify master, slave or any other additional slave set as a source of WAL archive.
					# Default is the master
war1_source=(slave)
war1_source=(additional_coordinator_slave_set additional_datanode_slave_set)
war1_host=node10	# All the nodes are backed up at the same host for a given archive set
war1_backupdir=/data1/pgxc/backup_war1
war2_source=(master)
war2_host=node11
war2_backupdir=/data1/pgxc/backup_war2
#=============<< End of future extension demonstration >> ========================================================

Run the cluster initialization; once it completes, all the cluster components will have been brought up

pgxc_ctl init all
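After initialization, confirm that every component reports Running (the monitor output contains "Running:"/"Not running:" lines, which the HA script below also keys on):

pgxc_ctl monitor all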

Basic Operations and Caveats

Connecting to the database with psql

psql -h node2 -U pgxl -p 32001 -d postgres  # the port is the coordinator master's
psql -p 32001 -d testdb
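Once connected through a coordinator, the cluster members can be listed from the pgxc_node catalog (a quick check; the exact column set is as of Postgres-XL 9.2):

psql -h node2 -U pgxl -p 32001 -d postgres -c "SELECT node_name, node_type, node_host, node_port FROM pgxc_node;"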

pgxc_ctl writes its operation logs to $HOME/pgxc_ctl/pgxc_log/ by default. A monitoring script that keeps calling pgxc_ctl generates a lot of logs; if you don't want that output, you can point the log at /dev/null like this (log directory /dev, log file null):

pgxc_ctl monitor all -l /dev -L null

pgxc_ctl operations sometimes hang or time out; depending on your circumstances you can wrap them like this:

timeout 3 pgxc_ctl monitor all -l /dev -L null

pgxc_ctl remove/add slave changes the master's configuration and restarts the master:

pgxc_ctl remove datanode slave datanode2
pgxc_ctl add datanode slave datanode2 node2 ~/pgxc/nodes/dn_master  ~/pgxc/nodes/datanode_archlog

Re-running add slave after pgxc_ctl remove slave fails, possibly a script bug or some deliberate design; you have to manually edit postgresql.conf under the master and then add again:

archive_mode = on
max_wal_senders = 5
wal_level = hot_standby

After a failover, the old master is kicked out of cluster management… after a failover you can add a slave directly.

After a GTM failover, remember to reconnect the GTM proxies; otherwise SQL statements will fail with the error "cannot assign TransactionIds during recovery".
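The fix is a single pgxc_ctl call after the failover (the HA script below issues exactly this):

pgxc_ctl reconnect gtm_proxy all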

pgxc_ctl init is a dangerous operation, especially init all, which wipes out existing data.

Implementing HA

  • Automatic failover.

Status quo: a PGXL cluster does not currently implement automatic master/standby switchover internally; high availability needs external help.

Approach: inspect the status of every cluster component at a fixed interval. If the status is Not Running, try to restart it; if it is in any other state, or the restart fails, and the component is a master with a running slave, perform a failover.

  • Removing the monitoring script's own single point of failure.

Status quo: if a cluster managed through the pgxc_ctl script has only one manager, that manager is in theory a single point of failure, especially when the manager itself also hosts cluster components.

Approach: keep pgxc_ctl.conf in ZooKeeper. The auto-failover monitoring script first checks whether its local ZooKeeper server is the leader; if so, it pulls the latest pgxc_ctl.conf from the ZK cluster and runs the core logic. When a failover happens, it pushes the modified pgxc_ctl.conf back to the ZK cluster. To adjust the pgxl cluster manually through pgxc_ctl, stop all the monitors, work on the ZK leader, and push pgxc_ctl.conf back to the ZK cluster when done.

  • Write the monitoring script pgxl_ha.sh, register it as a systemctl service, and start it (a sample unit file follows the script).
#!/bin/bash
##
# pgxl_ha.sh - Postgres-XL keepalive/failover monitor, with pgxc_ctl.conf
# kept in sync across managers via ZooKeeper
# version 1.0.0
##

# node name lists, filled in by pgxl_conf_load from pgxc_ctl.conf
GTM_PROXY_NAMES=
DATANODE_NAMES=
COORDINATOR_NAMES=

TIMEOUT=3
INTERVAL=5

LOG_FILE=pgxl_ha.log
CONFIG_BACKUP_DIR=$HOME/pgxc_ctl/backup
CONFIG_FILE=$HOME/pgxc_ctl/pgxc_ctl.conf
PGXC_LOG_DIR=/dev
PGXC_LOG_FILE=null
MONITOR_GREP_TARGET="pgxl_ha.sh monitor"

ZOOKEEPER=on
#ZK_HOME=/data/zookeeper-3.4.7
#ZK_CLIENT_GROUP="node1:2181,node2:2181"
ZK_HOME=/opt/zookeeper-3.4.6
ZK_CLIENT_GROUP="pgxl-01:2181,pgxl-02:2181,web-01:2181"
ZK_ZNODE_ROOT=/squirrel
ZK_ZNODE_MODULE=$ZK_ZNODE_ROOT/pgxl
ZK_ZNODE_CONF=$ZK_ZNODE_MODULE/pgxc_ctl
ZK_ZNODE_VERSION=$ZK_ZNODE_MODULE/pgxc_ctl_version
ZK_GET_VALUE_LINE=3

var_monitor_status=0
var_zk_conf_version=0
var_zk_server_status=0

pgxl_init(){
    if [ "on" == "$ZOOKEEPER" ]; then
        pgxl_log "Postgres-XL zookeeper on."
        mkdir -p $CONFIG_BACKUP_DIR
        
        # init znodes
        echo "create $ZK_ZNODE_ROOT null" > /tmp/tmpzksh.info
        echo "create $ZK_ZNODE_MODULE null" >> /tmp/tmpzksh.info
        echo "create $ZK_ZNODE_CONF null" >> /tmp/tmpzksh.info
        echo "create $ZK_ZNODE_VERSION null" >> /tmp/tmpzksh.info
        echo "quit" >> /tmp/tmpzksh.info
        $ZK_HOME/bin/zkCli.sh -server $ZK_CLIENT_GROUP < /tmp/tmpzksh.info
        
        # init var_zk_server_status 1 leader or 0 follower
        local status=$($ZK_HOME/bin/zkServer.sh status | tail -n1)
        if [ "Mode: leader" == "$status" ]; then
            var_zk_server_status=1
        else
            var_zk_server_status=0
        fi
        
        # init pgxc_ctl.conf push or pull
        echo "get $ZK_ZNODE_VERSION" > /tmp/tmpzksh.info
        echo "quit" >> /tmp/tmpzksh.info
        local version=$($ZK_HOME/bin/zkCli.sh -server $ZK_CLIENT_GROUP < /tmp/tmpzksh.info |tail -n$ZK_GET_VALUE_LINE |head -n1)
        if [ "null" == "$version" ]; then
            if [ $var_zk_server_status -eq 1 ]; then
                pgxl_conf_push
            fi
        else
            pgxl_conf_pull $version
        fi
    fi
}

pgxl_monitor_startup(){
    # ensure a single running instance via a non-blocking file lock on fd 7
    exec 7<> .pgxl_ha.lock
    if ! flock -n 7; then
        return 1
    fi
    
    pgxl_log "Postgres-XL pgxl_ha monitor start."
    
    #init
    pgxl_init
    
    #monitor
    while [ $var_monitor_status -eq 0 ]
    do
        if [ "on" == "$ZOOKEEPER" ]; then
            local status=$(timeout $TIMEOUT $ZK_HOME/bin/zkServer.sh status | tail -n1)
            if [ "Mode: leader" == "$status" ]; then
                # follower change to leader. need pull
                if [ $var_zk_server_status -eq 0 ]; then
                    pgxl_conf_pull
                    var_zk_server_status=1
                fi
                
                # main
                pgxl_conf_load
                pgxl_keepalive_all
            else
                var_zk_server_status=0
            fi
        else
            pgxl_conf_load
            pgxl_keepalive_all
        fi
        sleep $INTERVAL
    done
}

pgxl_monitor_shutdown(){
    local pids=`ps -fe |grep "$MONITOR_GREP_TARGET" | grep -v "grep" | awk '{print $2}'`
    local pidarr=($pids)
    local length=${#pidarr[@]}
    local pid
    
    for ((i=0; i<$length; i++))
    do
        pid=${pidarr[$i]}
        kill -9 $pid
    done
    pgxl_log "Postgres-XL pgxl_ha monitor shutdown."
}

pgxl_conf_load(){
    # eval pgxc_ctl.conf line by line so that its bash arrays
    # (gtmProxyNames, datanodeNames, coordNames, ...) become visible here
    while read line
    do
        echo $line
        eval "$line"
    done < $CONFIG_FILE

    GTM_PROXY_NAMES=${gtmProxyNames[@]}
    DATANODE_NAMES=${datanodeNames[@]}
    COORDINATOR_NAMES=${coordNames[@]}
}

pgxl_conf_pull(){
    if [ "on" == "$ZOOKEEPER" ]; then
        # got version
        local version
        if [ "x$1" == "x" ]; then
            echo "get $ZK_ZNODE_VERSION" > /tmp/tmpzksh.info
            echo "quit" >> /tmp/tmpzksh.info
            version=$(timeout $TIMEOUT $ZK_HOME/bin/zkCli.sh -server $ZK_CLIENT_GROUP < /tmp/tmpzksh.info |tail -n$ZK_GET_VALUE_LINE |head -n1)
        else
            version=$1
        fi
        
        if [ "$var_zk_conf_version" == "$version" ]; then
            :
        elif [[ $version == VERSION_* ]]; then
            echo "get $ZK_ZNODE_CONF" > /tmp/tmpzksh.info
            echo "quit" >> /tmp/tmpzksh.info
            $ZK_HOME/bin/zkCli.sh -server $ZK_CLIENT_GROUP < /tmp/tmpzksh.info |tail -n$ZK_GET_VALUE_LINE |head -n1 |tr "/" "\n" |xxd -p -r > /tmp/tmpconf.pull
            
            cp $CONFIG_FILE $CONFIG_BACKUP_DIR/conf.bak.$(date '+%Y-%m-%d%H:%M:%S')
            
            cp /tmp/tmpconf.pull $CONFIG_FILE
            
            var_zk_conf_version=$version
            
            pgxl_log "Postgres-XL pull conf from zookeeper cluster. version: "$var_zk_conf_version
        fi
    fi
}

pgxl_conf_push(){
    if [ "on" == "$ZOOKEEPER" ]; then
        cat $CONFIG_FILE | xxd -p | tr "\n" "/" > /tmp/tmpconf.push
        
        echo "set $ZK_ZNODE_CONF `cat /tmp/tmpconf.push`" > /tmp/tmpconf.push
        echo "quit" >> /tmp/tmpconf.push
        $ZK_HOME/bin/zkCli.sh -server $ZK_CLIENT_GROUP < /tmp/tmpconf.push
        
        var_zk_conf_version="VERSION_"$(date '+%Y-%m-%d%H:%M:%S')
        
        echo "set $ZK_ZNODE_VERSION $var_zk_conf_version" > /tmp/tmpzksh.info
        echo "quit" >> /tmp/tmpzksh.info
        $ZK_HOME/bin/zkCli.sh -server $ZK_CLIENT_GROUP < /tmp/tmpzksh.info
        
        pgxl_log "Postgres-XL push conf to zookeeper cluster. version: "$var_zk_conf_version
    fi
}

pgxl_keepalive_all(){
    #gtm master
    pgxl_keepalive_member "gtm master"
    if [ $? -eq 1 ]; then
        pgxl_keepalive_member "gtm slave"
        if [ $? -ne 1 ]; then
            pgxl_failover_gtm
            if [ $? -ne 1 ]; then
                pgxl_conf_push
            fi
        fi
    fi
    
    #gtm slave
    pgxl_keepalive_member "gtm slave"
    
    #gtm proxy
    for target in $GTM_PROXY_NAMES; do
        pgxl_keepalive_member "gtm_proxy $target"
    done
    
    #coordinator master
    for target in $COORDINATOR_NAMES; do
        pgxl_keepalive_member "coordinator master $target"
    done
    
    #coordinator slave
    for target in $COORDINATOR_NAMES; do
        pgxl_keepalive_member "coordinator slave $target"
    done
    
    #datanode master
    for target in $DATANODE_NAMES; do
        pgxl_keepalive_member "datanode master $target"
        if [ $? -eq 1 ]; then
            pgxl_keepalive_member "datanode slave $target"
            if [ $? -ne 1 ]; then
                pgxl_failover_datanode $target
                if [ $? -ne 1 ]; then
                    pgxl_conf_push
                fi
            fi
        fi
    done
    
    #datanode slave
    for target in $DATANODE_NAMES; do
        pgxl_keepalive_member "datanode slave $target"
    done
}

pgxl_keepalive_member(){
    local target="$1"
    local status=$(timeout $TIMEOUT pgxc_ctl monitor $target -l $PGXC_LOG_DIR -L $PGXC_LOG_FILE | tail -n1)
    if [[ $status == *Running* ]]; then
#        pgxl_log "Postgres-XL $target is running"
        :
    elif [[ $status == *"Not running"* ]]; then
        pgxl_log "Postgres-XL $status. try to restart it."
        pgxc_ctl start $target -l $PGXC_LOG_DIR -L $PGXC_LOG_FILE
        
        status=$(timeout $TIMEOUT pgxc_ctl monitor $target -l $PGXC_LOG_DIR -L $PGXC_LOG_FILE | tail -n1)
        if [[ $status == *Running* ]]; then
            pgxl_log "Postgres-XL $target restart success"
        else
            pgxl_log "Postgres-XL $target restart failed."
            return 1
        fi
    else
        pgxl_log "Postgres-XL $status."
        return 1
    fi
}

pgxl_failover_gtm(){
    pgxl_log "Postgres-XL try to failover gtm."
    pgxc_ctl failover gtm -l $PGXC_LOG_DIR -L $PGXC_LOG_FILE
    pgxc_ctl reconnect gtm_proxy all -l $PGXC_LOG_DIR -L $PGXC_LOG_FILE
    
    local status=$(timeout $TIMEOUT pgxc_ctl monitor gtm master -l $PGXC_LOG_DIR -L $PGXC_LOG_FILE | tail -n1)
    if [[ $status == *Running* ]]; then
        pgxl_log "Postgres-XL gtm failover success."
    else
        pgxl_log "Postgres-XL gtm failover failed."
        return 1
    fi
}

pgxl_failover_datanode(){
    local target="$1"
    pgxl_log "Postgres-XL try to failover datanode $target."
    pgxc_ctl failover datanode $target -l $PGXC_LOG_DIR -L $PGXC_LOG_FILE
    
    local status=$(timeout $TIMEOUT pgxc_ctl monitor datanode master $target -l $PGXC_LOG_DIR -L $PGXC_LOG_FILE | tail -n1)
    if [[ $status == *Running* ]]; then
        pgxl_log "Postgres-XL datanode $target failover success."
    else
        pgxl_log "Postgres-XL datanode $target failover failed."
        return 1
    fi
}

pgxl_log(){
    echo $(date '+%Y-%m-%d %H:%M:%S')" ""$1" >> $LOG_FILE
}

case "$1" in
    monitor)        pgxl_monitor_startup > /dev/null 2>&1 &
                    if [ $? -eq 1 ]; then
                        echo "start fail."
                    fi
                    exit 1;;
                
    shutdown)       pgxl_monitor_shutdown
                    exit $?;;
                    
    push)           pgxl_conf_push
                    exit $?;;
                    
    pull)           pgxl_conf_pull
                    exit $?;;
                
    *)
                    echo "error method"
                    exit $?;;
esac
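To register pgxl_ha.sh with systemctl as mentioned above, a minimal unit file might look like the following (a sketch under my own assumptions: the script is installed at /home/pgxl/pgxl_ha.sh and runs as the pgxl user; the paths and the service name are illustrative). Type=forking matches the script's behavior of backgrounding the monitor loop and exiting.

cat > /etc/systemd/system/pgxl_ha.service <<'EOF'
[Unit]
Description=Postgres-XL HA monitor (pgxl_ha.sh)
After=network.target

[Service]
Type=forking
User=pgxl
WorkingDirectory=/home/pgxl
ExecStart=/home/pgxl/pgxl_ha.sh monitor
ExecStop=/home/pgxl/pgxl_ha.sh shutdown

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable pgxl_ha
systemctl start pgxl_ha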

This article is published under the Attribution-NonCommercial-ShareAlike 4.0 license. You are welcome to repost, use, and republish it, but please keep the author attribution wanghengbin (including the link: https://wanghengbin.com), do not use it for commercial purposes, and release works based on this article under the same license.
