Pacemaker and Corosync Study Notes

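A while ago, while building a Postgres-XL cluster, I first tried the officially recommended combination of Pacemaker and Corosync to achieve high availability. Since the official documentation gives no concrete worked example, I had to figure it out by trial and error. First, though, some background.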

What is Pacemaker

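Pacemaker is a scalable high-availability cluster resource manager. It relies on the messaging and membership capabilities of an external cluster infrastructure layer (such as OpenAIS, Heartbeat, or Corosync) to detect and recover from node-level and resource-level failures, which is how it keeps the cluster highly available. It became an independent project when the resource manager was split out of Heartbeat as of version 3. It continues the CRM (the Heartbeat V2 resource manager) but is no longer coupled to the messaging layer (Heartbeat). The main user-facing commands and interfaces for operating Pacemaker are crmsh (derived from V2) and pcs.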

What is Corosync

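Corosync is the Corosync Cluster Engine, an open-source project derived from OpenAIS, which in turn implements the Application Interface Specification (AIS), a standard set of APIs for cluster frameworks. In a high-availability cluster, Corosync serves as the Cluster Messaging Layer: it carries cluster messages and heartbeats, and its configuration file defines the transport method and protocol used to do so. It provides no resource management of its own.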

Setting up a Pacemaker and Corosync cluster

Three hosts:
172.16.0.3
172.16.0.4
172.16.0.5
Operating system: CentOS 7

  • Set up hostname resolution, time synchronization (NTP, omitted here), and mutual SSH trust between all nodes.

Edit /etc/hosts and add:

172.16.0.3  node01
172.16.0.4  node02
172.16.0.5  node03
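
The entries above have to be added on all three nodes, so it helps if the step can be rerun without duplicating lines. Below is a minimal sketch of one way to do that. The function name `add_host_entry` is my own invention for illustration, and the IPs and names are the ones listed above.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME  (hypothetical helper, not part of any package)
# Appends "IP  NAME" to FILE unless a non-comment line already maps NAME.
add_host_entry() {
    file=$1
    ip=$2
    name=$3
    # Look for NAME as a whole whitespace-delimited field after the address.
    if awk -v n="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file" 2>/dev/null; then
        return 0    # already present, nothing to do
    fi
    printf '%s  %s\n' "$ip" "$name" >> "$file"
}

# Example (run as root on each node):
#   add_host_entry /etc/hosts 172.16.0.3 node01
#   add_host_entry /etc/hosts 172.16.0.4 node02
#   add_host_entry /etc/hosts 172.16.0.5 node03
```

The check compares whole fields rather than substrings, so an existing entry for a longer name that merely contains node01 does not count as a match.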

Mutual SSH trust:

ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node01
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node02
ssh-copy-id -i ~/.ssh/id_rsa.pub root@node03
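
Once the keys are copied, it is worth confirming that every node really accepts key-based login before moving on. The following is a rough sketch rather than a required step. `check_ssh_trust` and the `SSH_CMD` override are hypothetical names I introduced here; `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
#!/bin/sh
# check_ssh_trust HOST...  (hypothetical helper)
# Tries a non-interactive root login on each host and reports the result.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# SSH_CMD can be overridden, which is mainly useful for dry runs.
check_ssh_trust() {
    failed=0
    for host in "$@"; do
        if ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 \
                "root@${host}" true 2>/dev/null; then
            echo "${host}: ok"
        else
            echo "${host}: FAILED"
            failed=1
        fi
    done
    return $failed
}

# Example (run on each node in turn):
#   check_ssh_trust node01 node02 node03
```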
  • Install Pacemaker and Corosync on every node.
yum install -y pacemaker pcs psmisc policycoreutils-python
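  • Disable the firewall and SELinux on every node.
setenforce 0  # switch SELinux to permissive mode until the next reboot
sed -i.bak "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config  # make the change persistent in /etc/selinux/config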
systemctl disable firewalld.service
systemctl stop firewalld.service
iptables --flush
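
Note that `setenforce 0` only changes the running mode, while the sed edit only changes what applies after the next reboot, so the two are worth checking separately. `getenforce` reports the running mode. For the persistent setting, a small helper along these lines could be used. `selinux_config_mode` is a hypothetical name, and the sketch assumes the file contains a plain `SELINUX=value` line as on a stock CentOS 7 install.

```shell
#!/bin/sh
# selinux_config_mode FILE  (hypothetical helper)
# Prints the value of the SELINUX= setting in FILE. Commented-out lines and
# the separate SELINUXTYPE= setting do not match the pattern.
selinux_config_mode() {
    sed -n 's/^SELINUX=\(.*\)$/\1/p' "$1" | tail -n 1
}

# Example:
#   getenforce                                   # running mode
#   selinux_config_mode /etc/selinux/config      # mode after next reboot
```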
  • Enable the pcsd service on every node.
systemctl start pcsd.service
systemctl enable pcsd.service
  • Create the hacluster user on every node (details omitted).
  • Authenticate the cluster nodes (run on any one node; node01 is used here).
pcs cluster auth -u hacluster -p hacluster 172.16.0.3 172.16.0.4 172.16.0.5
# Output:
172.16.0.3: Authorized
172.16.0.4: Authorized
172.16.0.5: Authorized
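
When this step is scripted, the output can be checked mechanically instead of by eye. The sketch below assumes the output format is exactly the `<node>: Authorized` form shown above, which may differ between pcs versions. `all_authorized` is a hypothetical helper name.

```shell
#!/bin/sh
# all_authorized  (hypothetical helper)
# Reads `pcs cluster auth` output on stdin and succeeds only if there is at
# least one non-empty line and every non-empty line ends in ": Authorized".
all_authorized() {
    awk 'NF { total++; if ($0 ~ /: Authorized[ \t]*$/) ok++ }
         END { exit !(total > 0 && ok == total) }'
}

# Example:
#   pcs cluster auth -u hacluster -p hacluster 172.16.0.3 172.16.0.4 172.16.0.5 \
#       | all_authorized || echo "authentication failed on at least one node"
```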
  • Create and sync the cluster configuration (run on a single node; still node01).
pcs cluster setup --last_man_standing=1 --name pgcluster 172.16.0.3 172.16.0.4 172.16.0.5
# Output:
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
172.16.0.3: Succeeded
172.16.0.4: Succeeded
172.16.0.5: Succeeded
  • Start the cluster (run on a single node; still node01).
pcs cluster start --all
# Output:
172.16.0.3: Starting Cluster...
172.16.0.4: Starting Cluster...
172.16.0.5: Starting Cluster...
  • Check the status.
pcs status corosync
# Output:
Membership information
 Nodeid      Votes Name
        1          1 172.16.0.3 (local)
        2          1 172.16.0.4 
        3          1 172.16.0.5 
pcs status
# Output:
Cluster name: pgcluster
WARNING: no stonith devices and stonith-enabled is not false
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Last updated: Mon Oct 19 15:08:06 2015 Last change:
Stack: unknown
Current DC: NONE
0 nodes and 0 resources configured
Full list of resources:
PCSD Status:
   node01 (172.16.0.3): Online
   node02 (172.16.0.4): Online
   node03 (172.16.0.5): Online
Daemon Status:
   corosync: active/disabled
   pacemaker: active/disabled
   pcsd: active/disabled
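
Two things in the status output above deserve attention: the membership table should list all three nodes, and `pcs status` prints two warnings. The second warning hints at its own cause (the cluster was set up with IPs rather than node names). The first is commonly silenced in throwaway test clusters with `pcs property set stonith-enabled=false`, which is not something to do in production. The sketch below pulls both facts out of the output. The function names are hypothetical, and the parsing assumes the output layout shown above.

```shell
#!/bin/sh
# corosync_member_count  (hypothetical helper)
# Counts member rows in `pcs status corosync` output read from stdin.
# A member row is assumed to start with two integers (Nodeid and Votes).
corosync_member_count() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# status_warnings  (hypothetical helper)
# Prints only the WARNING lines from `pcs status` output read from stdin.
# Succeeds even when there are no warnings.
status_warnings() {
    grep '^WARNING:' || true
}

# Example:
#   [ "$(pcs status corosync | corosync_member_count)" -eq 3 ] \
#       || echo "not all nodes have joined"
#   pcs status | status_warnings
```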

Custom OCF resource

#!/bin/sh
#
# Description:  OCF resource template
#
# Authors:      
#
# Copyright:    
#
# License:      GNU General Public License (GPL)
#
###############################################################################
# Initialization:

# Use the default OCF root only if the cluster manager did not set OCF_ROOT.
: ${OCF_ROOT:=/usr/lib/ocf}

: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs


OCF_RESKEY_username_default="pgxl"
OCF_RESKEY_slaveoo_default="on"

: ${OCF_RESKEY_username=${OCF_RESKEY_username_default}}
: ${OCF_RESKEY_slaveoo=${OCF_RESKEY_slaveoo_default}}

pgxl_meta_data() {
    ocf_log info "Postgres-XL ------------------------------pgxl_meta_data."
    cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="pgxl">
<version>1.0</version>
<longdesc lang="en">
Resource script for Postgres-XL. It manages a Postgres-XL as an HA resource.
</longdesc>
<shortdesc lang="en">Manages a Postgres-XL database instance</shortdesc>

<parameters>
    <parameter name="username" required="0" unique="0">
        <longdesc lang="en">
        Username of service process. Optional; defaults to ${OCF_RESKEY_username_default}.
        </longdesc>
        <shortdesc lang="en">username of service process</shortdesc>
        <content type="string" default="${OCF_RESKEY_username_default}"/>
    </parameter>
    <parameter name="slaveoo" unique="0" required="0">
        <longdesc lang="en">
        Specify on if you configure one slave for gtm or node.
        </longdesc>
        <shortdesc lang="en">configure slave for gtm or node</shortdesc>
        <content type="string" default="${OCF_RESKEY_slaveoo_default}" />
    </parameter>
</parameters>

<actions>
    <action name="start" timeout="120" />
    <action name="stop" timeout="120" />
    <action name="status" timeout="60" />
    <action name="monitor" depth="0" timeout="30" interval="20" />
    <action name="promote" timeout="120" />
    <action name="demote" timeout="120" />
    <action name="notify" timeout="90" />
    <action name="validate-all" timeout="5" />
    <action name="meta-data" timeout="5" />
</actions>
</resource-agent>
END
    return $OCF_SUCCESS
}

pgxl_status(){
    ocf_log info "Postgres-XL ------------------------------pgxl_status."
    return $OCF_SUCCESS
}

pgxl_start(){
    ocf_log info "Postgres-XL ------------------------------pgxl_start."
    return $OCF_SUCCESS
}

pgxl_stop(){
    ocf_log info "Postgres-XL ------------------------------pgxl_stop."
    return $OCF_SUCCESS
}

pgxl_monitor(){
    ocf_log info "Postgres-XL ------------------------------pgxl_monitor $OCF_RESKEY_username."
    return $OCF_SUCCESS
}

pgxl_promote(){
    ocf_log info "Postgres-XL ------------------------------pgxl_promote."
    return $OCF_SUCCESS
}

pgxl_demote(){
    ocf_log info "Postgres-XL ------------------------------pgxl_demote."
    return $OCF_SUCCESS
}

pgxl_notify(){
    ocf_log info "Postgres-XL ------------------------------pgxl_notify."
    return $OCF_SUCCESS
}

pgxl_validate_all(){
    ocf_log info "Postgres-XL ------------------------------pgxl_validate_all."
    return $OCF_SUCCESS
}

# Usage
usage()
{
    cat <<END
usage: $0 action

action:
        start | stop | status | monitor | promote | demote |
        notify | validate-all | meta-data
END
}

# What kind of method was invoked?
case "$1" in
    status)     pgxl_status
                exit $?;;

    monitor)    pgxl_monitor
                exit $?;;

    start)      pgxl_start
                exit $?;;
                
    stop)       pgxl_stop
                exit $?;;

    promote)    pgxl_promote
                exit $?;;

    demote)     pgxl_demote
                exit $?;;

    notify)     pgxl_notify
                exit $?;;

    validate-all)       pgxl_validate_all
                        exit $?;;
    
    meta-data)       pgxl_meta_data
                     exit $?;;
    *)          usage
                exit $OCF_ERR_UNIMPLEMENTED;;
esac
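
Every action in the template above returns OCF_SUCCESS unconditionally, so Pacemaker would always believe the resource is running. A real agent's monitor action has to distinguish a running resource from a stopped one, and the OCF convention for "cleanly stopped" is exit code 7 (OCF_NOT_RUNNING). The following is only a sketch of that shape, not the actual Postgres-XL check: `pgxl_process_running` and `pgxl_monitor_sketch` are hypothetical names, and probing by process name is a stand-in for whatever component-specific test (gtm, coordinator, datanode) a real agent would need.

```shell
#!/bin/sh
# Standard OCF exit codes. Inside a real agent these come from
# ocf-shellfuncs; they are defined here only if not already set.
: ${OCF_SUCCESS:=0}
: ${OCF_NOT_RUNNING:=7}

# Hypothetical probe: succeeds when a process whose name is exactly $1 exists.
pgxl_process_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Monitor sketch: report success while the process exists and
# OCF_NOT_RUNNING once it is gone. The default name "postgres" is an
# assumption made for illustration.
pgxl_monitor_sketch() {
    if pgxl_process_running "${1:-postgres}"; then
        return $OCF_SUCCESS
    fi
    return $OCF_NOT_RUNNING
}
```

Agents are conventionally installed as executables under ${OCF_ROOT}/resource.d/<provider>/, which is where the cluster manager looks them up by provider and name.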
Creative Commons License

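This article is published under the Attribution-NonCommercial-ShareAlike 4.0 license. You are welcome to repost, use, and redistribute it, but please keep the author attribution wanghengbin (including the link: https://wanghengbin.com), do not use it for commercial purposes, and release any work derived from this article under the same license.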
