System introduction
The Tencent Blue Whale (BlueKing) intelligent cloud system consists of platform-level products and general SaaS services. The platforms include the control platform, configuration platform, operation platform, data platform, container management platform, mining platform, PaaS platform and mobile platform. The general SaaS services include node management, standard operation and maintenance, log retrieval, Blue Whale monitoring and fault self-healing. Together they give users of all kinds of clouds (public, private and hybrid) a one-stop technical operations solution for different scenarios and needs.
Built on the concepts of enterprise SOA and integration, the Blue Whale intelligent cloud system uses Docker and other cloud technologies to establish a new operation and maintenance model. It aims to put DevOps into practice through "atomic service integration" and "low-cost tool construction", helping operations teams quickly make basic services unattended and deliver value-added services, and thereby achieve broader and more sustainable efficiency gains across the enterprise.
Architecture diagram
[Image: Blue Whale architecture diagram]
Environment preparation
System environment
The system requires CentOS 7 or later
IP address | Host name | Configuration | Description | System version |
---|---|---|---|---|
192.168.31.221 | cmdb_node1 | 8c/16G | Central control machine | CentOS 7.6.1810 |
192.168.31.223 | cmdb_node2 | 4c/8G | | CentOS 7.6.1810 |
192.168.31.224 | cmdb_node3 | 4c/8G | | CentOS 7.6.1810 |
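The version requirement can be checked on each node before going further. The following is a minimal sketch, assuming the release string has the usual CentOS form such as "CentOS Linux release 7.6.1810 (Core)". The helper name check_centos_release is hypothetical and is not part of the Blue Whale scripts.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the release file reports CentOS 7 or later.
# Usage: check_centos_release [release_file]   (defaults to /etc/centos-release)
check_centos_release() {
    release_file="${1:-/etc/centos-release}"
    [ -r "$release_file" ] || return 2
    # Extract the major version, e.g. 7 from "CentOS Linux release 7.6.1810 (Core)"
    major=$(sed -n 's/^CentOS.* release \([0-9][0-9]*\).*/\1/p' "$release_file" | head -n 1)
    [ -n "$major" ] && [ "$major" -ge 7 ]
}
```

On a node, `check_centos_release && echo "version ok"` reads the default release file.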
Perform the following environment preparation steps on all nodes
Disable SELinux
setenforce 0
echo "/usr/sbin/setenforce 0" >> /etc/rc.local
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
Disable NetworkManager
systemctl stop NetworkManager
systemctl disable NetworkManager
Set DNS
cat > /etc/resolv.conf << EOF
nameserver 127.0.0.1
nameserver 114.114.114.114
nameserver 8.8.8.8
EOF
Set hosts file
cat >> /etc/hosts << EOF
192.168.31.221 cmdb_node1
192.168.31.223 cmdb_node2
192.168.31.224 cmdb_node3
EOF
Turn off firewall
systemctl stop iptables
systemctl disable iptables
systemctl stop firewalld
systemctl disable firewalld
Clock calibration
/sbin/ntpdate ntp2.aliyun.com;/sbin/hwclock -w
echo "#Clock synchronization" >> /var/spool/cron/root
echo "01 00 * * * /sbin/ntpdate ntp2.aliyun.com;/sbin/hwclock -w" >> /var/spool/cron/root
Raise the open-file limit
cat <<EOF > /etc/security/limits.d/99-nofile.conf
root soft nofile 102400
root hard nofile 102400
EOF
Restart the machine
reboot
# After the restart, use sestatus to confirm that SELinux is disabled
sestatus
# Check whether the open-file limit has been updated
ulimit -n
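The two post-reboot checks can also be scripted. This is a rough sketch under stated assumptions: the helper names selinux_disabled_in_config and nofile_limit_ok are hypothetical, the first only inspects the configuration file (it does not query the running kernel as sestatus does), and 102400 is the value set in the limits file above.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the SELinux config file sets SELINUX=disabled.
selinux_disabled_in_config() {
    grep -Eq '^SELINUX=disabled[[:space:]]*$' "${1:-/etc/selinux/config}"
}

# Hypothetical helper: succeed when the limit in $1 is at least $2 (default 102400).
nofile_limit_ok() {
    actual="$1"
    required="${2:-102400}"
    [ "$actual" = "unlimited" ] && return 0
    # A non-numeric value makes the comparison fail, which is treated as "not ok"
    [ "$actual" -ge "$required" ] 2>/dev/null
}
```

For example, `selinux_disabled_in_config && nofile_limit_ok "$(ulimit -n)" && echo "node ready"` combines both checks in a root shell.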
Install rsync
yum -y install rsync
Replace yum source
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.cloud.tencent.com/repo/centos7_base.repo
wget -O /etc/yum.repos.d/epel.repo http://mirrors.cloud.tencent.com/repo/epel-7.repo
yum clean all
yum makecache
Download the Blue Whale Community Edition
Download address: https://bk.tencent.com/download/
Select the full version, download it, and then upload it to the central control machine 192.168.31.221
Download certificate
Before downloading the certificate, you need the MAC address of each machine
Get the MAC addresses of the three machines
cat /sys/class/net/eth0/address
On the download page, enter the MAC addresses separated by semicolons (ASCII ";"), download the certificate, and then upload it to the central control machine
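Once the three addresses are collected, they have to be joined into one semicolon-separated string. A small sketch of that step follows; join_macs is a hypothetical helper, and the addresses in the usage comment are placeholders rather than values from this environment.

```shell
#!/bin/sh
# Hypothetical helper: print all arguments joined by semicolons.
join_macs() {
    result=""
    for mac in "$@"; do
        if [ -z "$result" ]; then
            result="$mac"
        else
            result="$result;$mac"
        fi
    done
    printf '%s\n' "$result"
}

# Example with placeholder addresses:
#   join_macs 00:00:00:00:00:01 00:00:00:00:00:02 00:00:00:00:00:03
```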
Operations on the central control machine
Extract the downloaded Blue Whale source package to the /data directory
mkdir /data
tar xf bkce_src-4.1.16.tgz -C /data/
After extraction there are two directories, src and install:
src: stores the Blue Whale product packages and the open-source components they depend on
install: stores the installation and deployment scripts, the installation parameter configuration, and the routine operation and maintenance scripts
install.config
install.config maps modules to servers: it describes which modules are installed on which machine.
Each line has two columns. The first column is the IP address; the second column is a comma-separated list of module names.
For details, refer to the install.config.3IP.sample file (you can copy install.config.3IP.sample to install.config).
cp -rf /data/install/install.config.3IP.sample /data/install/install.config
# Change the IP addresses in this file to match your machines
cat /data/install/install.config
192.168.31.221 nginx,appt,rabbitmq,kafka,zk,es,bkdata,consul,fta
192.168.31.223 mongodb,appo,kafka,zk,es,mysql,beanstalk,consul
192.168.31.224 paas,cmdb,job,gse,license,kafka,zk,es,redis,consul,influxdb
Notes:
In the configuration file, the IP is separated from the service names by a space, and the list of services to install on that machine follows the IP. For machines with multiple intranet IPs, the first intranet IP in the /sbin/ifconfig output is used by default. Deployment assumes standard private addresses by default; if your environment uses non-standard private addresses, refer to the method for handling non-standard intranet IPs.
zk stands for zookeeper and es stands for elasticsearch
gse and redis must be deployed on the same machine
If gse needs cross-cloud support, the machine hosting gse must have an external network IP
When adding machines, you can move services from the configuration above to the new machines to share the load. Make sure that kafka, es and zk each still total 3 instances
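Two of the rules above (gse and redis on the same machine; three instances each of kafka, es and zk) are easy to check mechanically before running the installer. The following is a minimal sketch, not part of the Blue Whale tooling: check_install_config is a hypothetical helper, and it assumes a single gse and a single redis instance in the file.

```shell
#!/bin/sh
# Hypothetical helper: validate an install.config file against two rules
# from the notes. Prints one message per violation; exits non-zero if any.
check_install_config() {
    awk '
        # Skip blank lines and comment lines
        NF < 2 || $1 ~ /^#/ { next }
        {
            n = split($2, mods, ",")
            has_gse = 0
            has_redis = 0
            for (i = 1; i <= n; i++) {
                count[mods[i]]++
                if (mods[i] == "gse") has_gse = 1
                if (mods[i] == "redis") has_redis = 1
            }
            # Assumes one gse and one redis: they must share a line
            if (has_gse != has_redis) {
                printf "%s: gse and redis must be on the same machine\n", $1
                bad = 1
            }
        }
        END {
            split("kafka es zk", trio, " ")
            for (i = 1; i <= 3; i++) {
                if (count[trio[i]] != 3) {
                    printf "%s: expected 3 instances, found %d\n", trio[i], count[trio[i]]
                    bad = 1
                }
            }
            exit (bad ? 1 : 0)
        }
    ' "$1"
}
```

Running `check_install_config /data/install/install.config` on the sample above should print nothing and return success.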
globals.env
This file defines the accounts and passwords of the various components, the feature switches, and other options. Set them in this file now; the installation scripts read it during the steps that follow.
vim /data/install/globals.env
# Domain name information
export BK_DOMAIN="abcops.com" # Blue Whale root domain name (without a host name). This domain does not need to exist; it is resolved temporarily through the hosts file later
export PAAS_FQDN="paas.$BK_DOMAIN" # PAAS full domain name
export CMDB_FQDN="cmdb.$BK_DOMAIN" # CMDB full domain name
export JOB_FQDN="job.$BK_DOMAIN" # JOB full domain name
export APPO_FQDN="o.$BK_DOMAIN" # Production environment full domain name
export APPT_FQDN="t.$BK_DOMAIN" # Test environment full domain name

# HAS_DNS_SERVER option: resolve domain names through a DNS server or through hosts entries
# The default value 0 means that you have no DNS server of your own and the mappings are configured through hosts
# In that case the mappings for paas, cmdb, job and the other platforms are added to /etc/hosts on all machines
export HAS_DNS_SERVER=0

# DB information
export MYSQL_USER="root" # mysql user name
export MYSQL_PASS='123456' # mysql password
export REDIS_PASS='123456' # redis password
export MONGODB_USER="root" # mongodb user name
export MONGODB_PASS='123456' # mongodb password

# Account information (changing these is recommended)
export MQ_USER=admin
export MQ_PASS='123456' # MQ password
export ZK_USER=bkzk
export ZK_PASS='123456' # zookeeper password

export PAAS_ADMIN_USER=admin
export PAAS_ADMIN_PASS='123456' # Password for logging in to the paas platform
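Because the example uses 123456 for every password, it can be useful to list which variables still carry that placeholder before installing. The sketch below is an illustration only: find_placeholder_passwords is a hypothetical helper, and it assumes the `export NAME_PASS='value'` line format shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every *_PASS variable in the given
# file whose value is still the placeholder 123456.
find_placeholder_passwords() {
    grep -E "^export [A-Z_]+_PASS=['\"]?123456['\"]?([[:space:]]|\$)" "$1" |
        sed 's/^export \([A-Z_]*\)=.*/\1/'
}
```

For example, `find_placeholder_passwords /data/install/globals.env` lists the variables to review; empty output means none of them use the placeholder.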
Configure passwordless login
Configure passwordless SSH login on the central control machine (root must be able to log in to each system, so root login must not be disabled)
cd /data/install
bash configure_ssh_without_pass
Generating public/private rsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:uCDzX7a3zHr7uugh5ylZpH69ce/Qa+JXCuEJCjgoDM8 root@cmdb_node1
The key's randomart image is:
+---[RSA 2048]----+
| |
|. |
|oo . . |
|..E o ... . . |
| .o ...+S. o o |
| + ...o +. .|
| ..oo=.. o..o |
| .+*.B+o.o+. |
| .+B+OOo=+ |
+----[SHA256]-----+
Warning: Permanently added '192.168.31.221' (ECDSA) to the list of known hosts.
Warning: Permanently added '192.168.31.223' (ECDSA) to the list of known hosts.
root@192.168.31.223's password: # Enter the password for 192.168.31.223
Warning: Permanently added '192.168.31.224' (ECDSA) to the list of known hosts.
root@192.168.31.224's password: # Enter the password for 192.168.31.224
Import the SSL certificates
tar xf /usr/local/src/ssl_certificates.tar.gz -C /data/src/cert/
ls /data/src/cert/
gse_agent.crt gse_api_client.key gse_esb_api_client.key gse_server.key job_esb_api_client.key license_prv.key platform.key
gse_agent.key gseca.crt gse_job_api_client.p12 job_ca.crt job_server.p12 md5.txt
gse_api_client.crt gse_esb_api_client.crt gse_server.crt job_esb_api_client.crt license_cert.cert platform.cert
Check the environment before installation
After configuring the environment as described above, run the following script to verify that it meets the requirements:
cd /data/install
bash precheck.sh
# If the output looks like the following, the environment is ready. If any item is not OK, investigate it or recheck the steps above
<<check_ssh_nopass>> has been checked successfully... SKIP
<<check_password>> has been checked successfully... SKIP
start <<check_cert_mac>> ... [OK]
start <<check_selinux>> ... [OK]
start <<check_umask>> ... [OK]
start <<check_get_lan_ip>> ... [OK]
start <<check_rabbitmq_version>> ... Repository cr is listed more than once in the configuration
Repository fasttrack is listed more than once in the configuration
[OK]
start <<check_http_proxy>> ... [OK]
start <<check_open_files_limit>> ... [OK]
start <<check_domain>> ... [OK]
start <<check_rsync>> ... [OK]
start <<check_networkmanager>> ... [OK]
start <<check_firewalld>> ... [OK]
Deploy Blue Whale
Run the following steps in order to install the Blue Whale base platform
If a step fails, fix the error as prompted and re-run the same command (the installer resumes from where it stopped).
Make sure each step succeeds before moving on: the Blue Whale platforms depend on one another in installation order, so continuing after a failed step only produces more errors.
For the commands needed to repair errors, refer to the maintenance documentation
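The "stop at the first failure, fix, re-run" rule can be expressed as a small wrapper. This is only a sketch: run_install_steps is a hypothetical helper, and it assumes that the installer command returns a non-zero exit status on failure, which this guide does not confirm for bk_install. Some steps also prompt for input, so the wrapper does not make the installation unattended.

```shell
#!/bin/sh
# Hypothetical helper: run an installer once per step, in order, and stop
# at the first step that fails.
# Usage: run_install_steps INSTALLER STEP [STEP...]
run_install_steps() {
    installer="$1"
    shift
    for step in "$@"; do
        echo ">>> $installer $step"
        # Assumption: the installer signals failure through its exit status
        if ! "$installer" "$step"; then
            echo "step '$step' failed; fix the error and re-run the same command" >&2
            return 1
        fi
    done
}

# Example, from /data/install:
#   run_install_steps ./bk_install paas cmdb job app_mgr
```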
Deploy paas platform
Run on the central control machine
cd /data/install/
./bk_install paas
where do you want to install blueking products. enter a absolute path [/data/bkce]: # Press Enter here to accept the default installation path
If you agree to the Blue Whale agreement, enter "yes"
After the paas platform is installed, add an entry to the hosts file on your own computer so that you can access it
This example uses a Mac, so the entry can be added directly from the command line. On Windows, open the hosts file and add the same entry
tail -1 /etc/hosts
192.168.31.221 paas.abcops.com
On the login page, the account and password are the "admin/123456" specified earlier in the globals.env configuration file
After the deployment is completed, log in to the three machines again to reload the environment variables, because the host names have been changed
# cmdb_node1 is changed to
[root@nginx-1 ~]# hostname
nginx-1
# cmdb_node2 is changed to
[root@mongodb-1 ~]# hostname
mongodb-1
# cmdb_node3 is changed to
[root@paas-1 ~]# hostname
paas-1
Deploy cmdb platform
Run on the central control machine
[root@nginx-1 ~]# cd /data/install/
[root@nginx-1 install]# ./bk_install cmdb
[192.168.31.224] server cmdb_adminserver RUNNING pid 32168, uptime 0:14:53
[192.168.31.224] server cmdb_apiserver RUNNING pid 32160, uptime 0:14:53
[192.168.31.224] server cmdb_auditcontoller RUNNING pid 32149, uptime 0:14:53
[192.168.31.224] server cmdb_datacollection RUNNING pid 32163, uptime 0:14:53
[192.168.31.224] server cmdb_eventserver RUNNING pid 32161, uptime 0:14:53
[192.168.31.224] server cmdb_hostcontroller RUNNING pid 32143, uptime 0:14:53
[192.168.31.224] server cmdb_hostserver RUNNING pid 32144, uptime 0:14:53
[192.168.31.224] server cmdb_objectcontroller RUNNING pid 32146, uptime 0:14:53
[192.168.31.224] server cmdb_proccontroller RUNNING pid 32170, uptime 0:14:53
[192.168.31.224] server cmdb_procserver RUNNING pid 32148, uptime 0:14:53
[192.168.31.224] server cmdb_toposerver RUNNING pid 32145, uptime 0:14:53
[192.168.31.224] server cmdb_webserver RUNNING pid 32147, uptime 0:14:53
If no error is reported in the above steps, you can now visit the configuration platform at http://cmdb.abcops.com:80
[Screenshot: output returned after the cmdb installation completes]
Add the hosts entry on your own computer. This example uses a Mac, so the entry can be added directly from the command line. On Windows, open the hosts file and add the same entry
tail -2 /etc/hosts
192.168.31.221 paas.abcops.com
192.168.31.221 cmdb.abcops.com
After logging in, click the configuration platform
Deploy job platform
Run on the central control machine
[root@nginx-1 install]# cd /data/install/
[root@nginx-1 install]# ./bk_install job
[Screenshot: successful job installation]
Add the hosts entry
tail -3 /etc/hosts
192.168.31.221 paas.abcops.com
192.168.31.221 cmdb.abcops.com
192.168.31.221 job.abcops.com
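Since a hosts entry is added after each platform is deployed, a helper that appends an entry only when the name is missing avoids duplicates. A minimal sketch follows; add_hosts_entry is a hypothetical helper, and writing to /etc/hosts requires root (or sudo) on the machine where it runs.

```shell
#!/bin/sh
# Hypothetical helper: append "IP NAME" to a hosts file unless a non-comment
# line already lists NAME.
# Usage: add_hosts_entry IP NAME [hosts_file]   (defaults to /etc/hosts)
add_hosts_entry() {
    ip="$1"
    name="$2"
    hosts_file="${3:-/etc/hosts}"
    # Exact field comparison, so dots in the name are not treated as wildcards
    if awk -v name="$name" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$hosts_file"; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Example: add_hosts_entry 192.168.31.221 job.abcops.com
```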
After logging in, click the operation platform
Deploy app_mgr platform
[root@nginx-1 install]# cd /data/install/
[root@nginx-1 install]# ./bk_install app_mgr
After this step is completed, the successfully activated servers appear under server information and third-party service information in the developer center
SaaS applications (except Blue Whale monitoring and log retrieval) can now also be uploaded and deployed
[Screenshot: app_mgr installation complete]
If there are no errors in the above steps, the deployment of the production environment and the test environment is now complete. You can:
- Run ./bk_install saas-o bk_nodeman to deploy the node management app, or
- Deploy apps through the developer center
To install Blue Whale monitoring and log retrieval, you must first run ./bk_install bkdata
Deploy the node management app with saas-o bk_nodeman
[root@nginx-1 install]# ./bk_install saas-o bk_nodeman
Deploy bkdata
Install the base modules of the Blue Whale data platform and the services they depend on
[root@nginx-1 install]# ./bk_install bkdata
If no error is reported in the above steps, the bkdata deployment is complete. You can now:
1. Run ./bk_install saas-o bk_monitor to deploy the Blue Whale monitoring app, or
2. Deploy the Blue Whale monitoring app through the developer center
Install the Blue Whale monitoring app
Run ./bk_install saas-o bk_monitor to deploy the Blue Whale monitoring app
[root@nginx-1 install]# ./bk_install saas-o bk_monitor
Deploy the fta backend
Install the fault self-healing backend service
[root@nginx-1 install]# ./bk_install fta
If no error is reported in the above steps, the deployment of the fault self-healing backend is complete. You can now:
1. Run ./bk_install saas-o bk_fta to deploy the fault self-healing app, or
2. Deploy the fault self-healing app through the developer center
Install the fault self-healing app
Note that the command printed by the script is "./bk_install saas-o bk_fta". The package name in that command is incomplete. The installation packages are stored in the /data/src/official_saas directory
[root@nginx-1 install]# ls /data/src/official_saas/
bk_fta_solutions_V4.1.15.tar.gz bk_log_search_V1.1.24.tar.gz bk_monitor_V1.4.73.tar.gz bk_nodeman_V1.0.80.tar.gz bk_sops_V3.1.32-ce.tar.gz
Run the following command instead
[root@nginx-1 install]# ./bk_install saas-o bk_fta_solutions
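In the listing above, the name passed to saas-o matches the package file name up to the "_V<version>" suffix. Assuming that convention holds for the other packages as well (the guide only demonstrates it for bk_fta_solutions, bk_nodeman and bk_monitor), the name can be derived from the file name. saas_app_code is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: derive the app name from a SaaS package file name by
# dropping the "_V<version>..." suffix.
saas_app_code() {
    file=$(basename "$1")
    # e.g. bk_fta_solutions_V4.1.15.tar.gz -> bk_fta_solutions
    printf '%s\n' "${file%_V[0-9]*}"
}

# Example: list the app names of all packages in the directory
#   for f in /data/src/official_saas/*.tar.gz; do saas_app_code "$f"; done
```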
Deploy gse_agent
Reinstall gse_agent and register the correct cluster module to the configuration platform
[root@nginx-1 install]# ./bkcec install gse_agent
Deploy saas
Deploy the official SaaS apps to the production environment (the command deploys the SaaS packages from the /data/src/official_saas/ directory automatically)
[root@nginx-1 install]# ./bkcec install saas-o
Refreshing the platform now shows 7 modules
Install third-party partner components
Install the network management platform
Download address: https://bk.tencent.com/download_sdk/
After downloading, upload it to the central control machine
# Extract the packages
[root@nginx-1 src]# tar xf bknetwork.tgz -C /data/src/
[root@nginx-1 src]# tar xf /data/src/bknetwork/bknetwork-3.6.1.tgz -C /data/src/
# Synchronize the package contents
[root@nginx-1 src]# rsync -a /data/src/bknetwork/install/ /data/install/
The following file holds the full domain name of the Blue Whale network management platform. You can change it to suit your environment or leave it as is
[root@nginx-1 src]# cat /data/install/third/globals_bknetwork.env
# vim:ft=sh
# Domain name information (Blue Whale partner application)
export BKNETWORK_FQDN="bknetwork.$BK_DOMAIN" # Full domain name of the Blue Whale network management platform
Deploy network management
[root@nginx-1 src]# cd /data/install/
[root@nginx-1 install]# ./bkco_install bknetwork
[192.168.31.221]20190926-144533 43 please add 'bknetwork' to 'install.config'
# The error message says that bknetwork must be added to install.config
[root@nginx-1 install]# cat install.config
192.168.31.221 nginx,appt,rabbitmq,kafka,zk,es,bkdata,consul,fta
192.168.31.223 mongodb,appo,kafka,zk,es,mysql,beanstalk,consul,bknetwork # bknetwork added here so that it is deployed to the 31.223 machine
192.168.31.224 paas,cmdb,job,gse,license,kafka,zk,es,redis,consul,influxdb
# Install again
[root@nginx-1 install]# ./bkco_install bknetwork
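The edit to install.config above was made by hand. As an alternative, the following sketch appends a module to the line for a given IP and does nothing if the module is already listed. add_module_to_host is a hypothetical helper, not part of the Blue Whale scripts; note that it rewrites the matched line with single-space separators.

```shell
#!/bin/sh
# Hypothetical helper: append MODULE to the module list of the line whose
# first column is IP, unless it is already present.
# Usage: add_module_to_host IP MODULE CONFIG_FILE
add_module_to_host() {
    ip="$1"
    module="$2"
    config="$3"
    awk -v ip="$ip" -v mod="$module" '
        $1 == ip {
            n = split($2, mods, ",")
            present = 0
            for (i = 1; i <= n; i++) if (mods[i] == mod) present = 1
            if (!present) $2 = $2 "," mod
        }
        { print }
    ' "$config" > "$config.tmp" && mv "$config.tmp" "$config"
}

# Example: add_module_to_host 192.168.31.223 bknetwork /data/install/install.config
```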
After successful installation, add the following hosts entry on the three cluster machines and on your own computer
192.168.31.221 bknetwork.abcops.com
[root@nginx-1 nginx]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.31.221 cmdb_node1
192.168.31.223 cmdb_node2
192.168.31.224 cmdb_node3
192.168.31.221 nginx-1
192.168.31.221 paas.abcops.com
192.168.31.221 cmdb.abcops.com
192.168.31.221 job.abcops.com
192.168.31.221 rbtnode1
192.168.31.221 bknetwork.abcops.com
[root@mongodb-1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.31.221 cmdb_node1
192.168.31.223 cmdb_node2
192.168.31.224 cmdb_node3
192.168.31.223 mongodb-1
192.168.31.221 paas.abcops.com
192.168.31.221 cmdb.abcops.com
192.168.31.221 job.abcops.com
192.168.31.221 bknetwork.abcops.com
[root@paas-1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.31.217 cmdb_node1
192.168.31.221 cmdb_node1
192.168.31.223 cmdb_node2
192.168.31.224 cmdb_node3
192.168.31.224 paas-1
192.168.31.221 paas.abcops.com
192.168.31.221 cmdb.abcops.com
192.168.31.221 job.abcops.com
192.168.31.221 bknetwork.abcops.com
Then access the network management platform from your own computer through the bknetwork.abcops.com domain name