网络知识 娱乐 使用py-mysql2pgsql同步Mysql数据至Greenplum

使用py-mysql2pgsql同步Mysql数据至Greenplum

说明

本文延续上一篇文章 云数据库MySQL导入云数据仓库PostgreSQL最佳实践,继续介绍云数据库MySQL导入云数据仓库PostgreSQL的使用问题。其中描述的问题及解决方法同样适用于 腾讯云 云数据仓库 PostgreSQL(CDWPG)

背景

在上一个实验过程中我们发现,DTS数据同步硬性要求:"schema 和 table 必须提前创建好,如果没有创建好,则会报错"。

但在实际数据同步的场景中,大多的需求是迁移。这种情况下,如果表的数量很多的话,那数据同步的成本是非常大的,因为目标端需要提前构建出全部的表结构。这个时候我们可以视情况选择使用数据同步开源工具 (py-mysql2pgsql)

安装依赖

在安装py-mysql2pgsql之前,需要安装一些程序的依赖,否则会遇到各种问题。

[root@VM-5-48-centos ~]# yum install python-pip python-devel mysql-devel postgresql-devel -y
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
Package python2-pip-8.1.2-14.el7.noarch already installed and latest version
Resolving Dependencies
--> Running transaction check
---> Package mariadb-devel.x86_64 1:5.5.68-1.el7 will be installed
--> Processing Dependency: openssl-devel(x86-64) for package: 1:mariadb-devel-5.5.68-1.el7.x86_64
---> Package postgresql-devel.x86_64 0:9.2.24-4.el7_8 will be installed
--> Processing Dependency: postgresql-libs(x86-64) = 9.2.24-4.el7_8 for package: postgresql-devel-9.2.24-4.el7_8.x86_64
--> Processing Dependency: postgresql(x86-64) = 9.2.24-4.el7_8 for package: postgresql-devel-9.2.24-4.el7_8.x86_64
--> Processing Dependency: libpq.so.5()(64bit) for package: postgresql-devel-9.2.24-4.el7_8.x86_64
--> Processing Dependency: libpgtypes.so.3()(64bit) for package: postgresql-devel-9.2.24-4.el7_8.x86_64
--> Processing Dependency: libecpg_compat.so.3()(64bit) for package: postgresql-devel-9.2.24-4.el7_8.x86_64
--> Processing Dependency: libecpg.so.6()(64bit) for package: postgresql-devel-9.2.24-4.el7_8.x86_64
---> Package python-devel.x86_64 0:2.7.5-90.el7 will be installed
--> Processing Dependency: python(x86-64) = 2.7.5-90.el7 for package: python-devel-2.7.5-90.el7.x86_64
--> Processing Dependency: python2-rpm-macros > 3-30 for package: python-devel-2.7.5-90.el7.x86_64
--> Processing Dependency: python-rpm-macros > 3-30 for package: python-devel-2.7.5-90.el7.x86_64
--> Running transaction check
---> Package openssl-devel.x86_64 1:1.0.2k-21.el7_9 will be installed
--> Processing Dependency: openssl-libs(x86-64) = 1:1.0.2k-21.el7_9 for package: 1:openssl-devel-1.0.2k-21.el7_9.x86_64
--> Processing Dependency: zlib-devel(x86-64) for package: 1:openssl-devel-1.0.2k-21.el7_9.x86_64
--> Processing Dependency: krb5-devel(x86-64) for package: 1:openssl-devel-1.0.2k-21.el7_9.x86_64
---> Package postgresql.x86_64 0:9.2.24-4.el7_8 will be installed
---> Package postgresql-libs.x86_64 0:9.2.24-4.el7_8 will be installed
---> Package python.x86_64 0:2.7.5-88.el7 will be updated
---> Package python.x86_64 0:2.7.5-90.el7 will be an update
--> Processing Dependency: python-libs(x86-64) = 2.7.5-90.el7 for package: python-2.7.5-90.el7.x86_64
---> Package python-rpm-macros.noarch 0:3-34.el7 will be installed
--> Processing Dependency: python-srpm-macros for package: python-rpm-macros-3-34.el7.noarch
---> Package python2-rpm-macros.noarch 0:3-34.el7 will be installed
--> Running transaction check
---> Package krb5-devel.x86_64 0:1.15.1-50.el7 will be installed
--> Processing Dependency: libkadm5(x86-64) = 1.15.1-50.el7 for package: krb5-devel-1.15.1-50.el7.x86_64
--> Processing Dependency: krb5-libs(x86-64) = 1.15.1-50.el7 for package: krb5-devel-1.15.1-50.el7.x86_64
--> Processing Dependency: libverto-devel for package: krb5-devel-1.15.1-50.el7.x86_64
--> Processing Dependency: libselinux-devel for package: krb5-devel-1.15.1-50.el7.x86_64
--> Processing Dependency: libcom_err-devel for package: krb5-devel-1.15.1-50.el7.x86_64
--> Processing Dependency: keyutils-libs-devel for package: krb5-devel-1.15.1-50.el7.x86_64
---> Package openssl-libs.x86_64 1:1.0.2k-19.el7 will be updated
--> Processing Dependency: openssl-libs(x86-64) = 1:1.0.2k-19.el7 for package: 1:openssl-1.0.2k-19.el7.x86_64
---> Package openssl-libs.x86_64 1:1.0.2k-21.el7_9 will be an update
---> Package python-libs.x86_64 0:2.7.5-88.el7 will be updated
---> Package python-libs.x86_64 0:2.7.5-90.el7 will be an update
---> Package python-srpm-macros.noarch 0:3-34.el7 will be installed
---> Package zlib-devel.x86_64 0:1.2.7-19.el7_9 will be installed
--> Processing Dependency: zlib = 1.2.7-19.el7_9 for package: zlib-devel-1.2.7-19.el7_9.x86_64
--> Running transaction check
---> Package keyutils-libs-devel.x86_64 0:1.5.8-3.el7 will be installed
---> Package krb5-libs.x86_64 0:1.15.1-46.el7 will be updated
---> Package krb5-libs.x86_64 0:1.15.1-50.el7 will be an update
---> Package libcom_err-devel.x86_64 0:1.42.9-19.el7 will be installed
--> Processing Dependency: libcom_err(x86-64) = 1.42.9-19.el7 for package: libcom_err-devel-1.42.9-19.el7.x86_64
---> Package libkadm5.x86_64 0:1.15.1-50.el7 will be installed
---> Package libselinux-devel.x86_64 0:2.5-15.el7 will be installed
--> Processing Dependency: libsepol-devel(x86-64) >= 2.5-10 for package: libselinux-devel-2.5-15.el7.x86_64
--> Processing Dependency: pkgconfig(libsepol) for package: libselinux-devel-2.5-15.el7.x86_64
--> Processing Dependency: pkgconfig(libpcre) for package: libselinux-devel-2.5-15.el7.x86_64
---> Package libverto-devel.x86_64 0:0.2.5-4.el7 will be installed
---> Package openssl.x86_64 1:1.0.2k-19.el7 will be updated
---> Package openssl.x86_64 1:1.0.2k-21.el7_9 will be an update
---> Package zlib.x86_64 0:1.2.7-18.el7 will be updated
---> Package zlib.x86_64 0:1.2.7-19.el7_9 will be an update
--> Running transaction check
---> Package libcom_err.x86_64 0:1.42.9-17.el7 will be updated
--> Processing Dependency: libcom_err(x86-64) = 1.42.9-17.el7 for package: e2fsprogs-libs-1.42.9-17.el7.x86_64
--> Processing Dependency: libcom_err(x86-64) = 1.42.9-17.el7 for package: e2fsprogs-1.42.9-17.el7.x86_64
--> Processing Dependency: libcom_err(x86-64) = 1.42.9-17.el7 for package: libss-1.42.9-17.el7.x86_64
---> Package libcom_err.x86_64 0:1.42.9-19.el7 will be an update
---> Package libsepol-devel.x86_64 0:2.5-10.el7 will be installed
---> Package pcre-devel.x86_64 0:8.32-17.el7 will be installed
--> Running transaction check
---> Package e2fsprogs.x86_64 0:1.42.9-17.el7 will be updated
---> Package e2fsprogs.x86_64 0:1.42.9-19.el7 will be an update
---> Package e2fsprogs-libs.x86_64 0:1.42.9-17.el7 will be updated
---> Package e2fsprogs-libs.x86_64 0:1.42.9-19.el7 will be an update
---> Package libss.x86_64 0:1.42.9-17.el7 will be updated
---> Package libss.x86_64 0:1.42.9-19.el7 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

===============================================================================================================================================================================================
 Package                                             Arch                                   Version                                              Repository                               Size
===============================================================================================================================================================================================
Installing:
 mariadb-devel                                       x86_64                                 1:5.5.68-1.el7                                       os                                      757 k
 postgresql-devel                                    x86_64                                 9.2.24-4.el7_8                                       os                                      952 k
 python-devel                                        x86_64                                 2.7.5-90.el7                                         updates                                 399 k
Installing for dependencies:
 keyutils-libs-devel                                 x86_64                                 1.5.8-3.el7                                          os                                       37 k
 krb5-devel                                          x86_64                                 1.15.1-50.el7                                        os                                      273 k
 libcom_err-devel                                    x86_64                                 1.42.9-19.el7                                        os                                       32 k
 libkadm5                                            x86_64                                 1.15.1-50.el7                                        os                                      179 k
 libselinux-devel                                    x86_64                                 2.5-15.el7                                           os                                      187 k
 libsepol-devel                                      x86_64                                 2.5-10.el7                                           os                                       77 k
 libverto-devel                                      x86_64                                 0.2.5-4.el7                                          os                                       12 k
 openssl-devel                                       x86_64                                 1:1.0.2k-21.el7_9                                    updates                                 1.5 M
 pcre-devel                                          x86_64                                 8.32-17.el7                                          os                                      480 k
 postgresql                                          x86_64                                 9.2.24-4.el7_8                                       os                                      3.0 M
 postgresql-libs                                     x86_64                                 9.2.24-4.el7_8                                       os                                      234 k
 python-rpm-macros                                   noarch                                 3-34.el7                                             os                                      9.1 k
 python-srpm-macros                                  noarch                                 3-34.el7                                             os                                      8.8 k
 python2-rpm-macros                                  noarch                                 3-34.el7                                             os                                      8.1 k
 zlib-devel                                          x86_64                                 1.2.7-19.el7_9                                       updates                                  50 k
Updating for dependencies:
 e2fsprogs                                           x86_64                                 1.42.9-19.el7                                        os                                      701 k
 e2fsprogs-libs                                      x86_64                                 1.42.9-19.el7                                        os                                      168 k
 krb5-libs                                           x86_64                                 1.15.1-50.el7                                        os                                      809 k
 libcom_err                                          x86_64                                 1.42.9-19.el7                                        os                                       42 k
 libss                                               x86_64                                 1.42.9-19.el7                                        os                                       47 k
 openssl                                             x86_64                                 1:1.0.2k-21.el7_9                                    updates                                 493 k
 openssl-libs                                        x86_64                                 1:1.0.2k-21.el7_9                                    updates                                 1.2 M
 python                                              x86_64                                 2.7.5-90.el7                                         updates                                  96 k
 python-libs                                         x86_64                                 2.7.5-90.el7                                         updates                                 5.6 M
 zlib                                                x86_64                                 1.2.7-19.el7_9                                       updates                                  90 k

Installed:
  mariadb-devel.x86_64 1:5.5.68-1.el7                           postgresql-devel.x86_64 0:9.2.24-4.el7_8                           python-devel.x86_64 0:2.7.5-90.el7                          

Dependency Installed:
  keyutils-libs-devel.x86_64 0:1.5.8-3.el7         krb5-devel.x86_64 0:1.15.1-50.el7            libcom_err-devel.x86_64 0:1.42.9-19.el7         libkadm5.x86_64 0:1.15.1-50.el7               
  libselinux-devel.x86_64 0:2.5-15.el7             libsepol-devel.x86_64 0:2.5-10.el7           libverto-devel.x86_64 0:0.2.5-4.el7             openssl-devel.x86_64 1:1.0.2k-21.el7_9        
  pcre-devel.x86_64 0:8.32-17.el7                  postgresql.x86_64 0:9.2.24-4.el7_8           postgresql-libs.x86_64 0:9.2.24-4.el7_8         python-rpm-macros.noarch 0:3-34.el7           
  python-srpm-macros.noarch 0:3-34.el7             python2-rpm-macros.noarch 0:3-34.el7         zlib-devel.x86_64 0:1.2.7-19.el7_9             

Dependency Updated:
  e2fsprogs.x86_64 0:1.42.9-19.el7     e2fsprogs-libs.x86_64 0:1.42.9-19.el7     krb5-libs.x86_64 0:1.15.1-50.el7     libcom_err.x86_64 0:1.42.9-19.el7     libss.x86_64 0:1.42.9-19.el7    
  openssl.x86_64 1:1.0.2k-21.el7_9     openssl-libs.x86_64 1:1.0.2k-21.el7_9     python.x86_64 0:2.7.5-90.el7         python-libs.x86_64 0:2.7.5-90.el7     zlib.x86_64 0:1.2.7-19.el7_9    

Complete!

安装py-mysql2pgsql

这里直接使用pip进行安装即可。

[root@VM-5-48-centos ~]# pip install py-mysql2pgsql
Collecting py-mysql2pgsql
  Downloading http://mirrors.tencentyun.com/pypi/packages/28/49/c82e4313c63b26224950bef47ab19f52a921dd4c279abf31aa3ee9d9cbbe/py-mysql2pgsql-0.1.5.tar.gz (109kB)
    100% |████████████████████████████████| 112kB 807kB/s 
Collecting mysql-python>=1.2.3 (from py-mysql2pgsql)
  Downloading http://mirrors.tencentyun.com/pypi/packages/a5/e9/51b544da85a36a68debe7a7091f068d802fc515a3a202652828c73453cad/MySQL-python-1.2.5.zip (108kB)
    100% |████████████████████████████████| 112kB 1.6MB/s 
Collecting psycopg2>=2.4.2 (from py-mysql2pgsql)
  Downloading http://mirrors.tencentyun.com/pypi/packages/fd/ae/98cb7a0cbb1d748ee547b058b14604bd0e9bf285a8e0cc5d148f8a8a952e/psycopg2-2.8.6.tar.gz (383kB)
    100% |████████████████████████████████| 389kB 3.2MB/s 
Requirement already satisfied (use --upgrade to upgrade): pyyaml>=3.10.0 in /usr/lib64/python2.7/site-packages (from py-mysql2pgsql)
Collecting argparse (from py-mysql2pgsql)
  Downloading http://mirrors.tencentyun.com/pypi/packages/f2/94/3af39d34be01a24a6e65433d19e107099374224905f1e0cc6bbe1fd22a2f/argparse-1.4.0-py2.py3-none-any.whl
Collecting termcolor>=1.1.0 (from py-mysql2pgsql)
  Downloading http://mirrors.tencentyun.com/pypi/packages/8a/48/a76be51647d0eb9f10e2a4511bf3ffb8cc1e6b14e9e4fab46173aa79f981/termcolor-1.1.0.tar.gz
Installing collected packages: mysql-python, psycopg2, argparse, termcolor, py-mysql2pgsql
  Running setup.py install for mysql-python ... done
  Running setup.py install for psycopg2 ... done
  Running setup.py install for termcolor ... done
  Running setup.py install for py-mysql2pgsql ... done
Successfully installed argparse-1.4.0 mysql-python-1.2.5 psycopg2-2.8.6 py-mysql2pgsql-0.1.5 termcolor-1.1.0

配置任务

生成配置文件模板

在全新第一次运行时,会创建一个demo模板mysql2pgsql.yml,然后在模板里按需做一下修改即可,很简单。

[root@VM-5-48-centos ~]# py-mysql2pgsql
No configuration file found.
A new file has been initialized at: mysql2pgsql.yml
Please review the configuration and retry...

修改配置文件

这里贴一下我的配置文件,需要注意的是socket要注释掉。

[root@VM-5-48-centos ~]# egrep -v "^#" mysql2pgsql.yml 

mysql:
 hostname: 10.0.5.25
 port: 3306
# socket: /tmp/mysql.sock
 username: dts_user
 password: dts_admin
 database: dts_demo
 compress: false
destination:
 # if file is given, output goes to file, else postgres
 file: 
 postgres:
  hostname: 10.0.5.18
  port: 5436
  username: dts_user
  password: dts_admin
  database: dts_demo

supress_data: false

supress_ddl: false

force_truncate: false

数据同步

同步数据命令很简单,-v是打印详细过程,-f是指定配置文件。time是Linux系统命令,可以获取后面指令的耗时情况,如果没有记录耗时的需求,可以将其省略。

[root@VM-5-48-centos ~]# time py-mysql2pgsql -v -f mysql2pgsql.yml
>>>>>>>>>> STARTING <<<<<<<<<<


START CREATING TABLES
  START  - CREATING TABLE user_info
  FINISH - CREATING TABLE user_info
DONE CREATING TABLES
START WRITING TABLE DATA
  START  - WRITING DATA TO user_info
  25073.34 rows/sec [1000000]   
  FINISH - WRITING DATA TO user_info
DONE WRITING TABLE DATA
START CREATING INDEXES AND CONSTRAINTS
  START  - ADDING INDEXES TO user_info
  FINISH - ADDING INDEXES TO user_info
  START  - ADDING CONSTRAINTS ON user_info
  FINISH - ADDING CONSTRAINTS ON user_info
DONE CREATING INDEXES AND CONSTRAINTS


>>>>>>>>>> FINISHED <<<<<<<<<<

real	0m42.928s
user	0m39.899s
sys	0m0.451s

100万条数据,同步花了43秒,速度还是挺快的。

[root@VM-5-48-centos ~]# psql -d dts_demo -U dts_user -h 10.0.5.18 -p 5436 
Password for user dts_user: 
psql (9.5.25, server 9.4.24)
Type "help" for help.

dts_demo=> d
                 List of relations
  Schema  |       Name       |   Type   |  Owner   
----------+------------------+----------+----------
 dts_demo | user_info        | table    | dts_user
 dts_demo | user_info_id_seq | sequence | dts_user
(2 rows)

dts_demo=> SELECT COUNT(1) FROM user_info;
  count  
---------
 1000000
(1 row)

dts_demo=> SELECT * FROM user_info LIMIT 20;
 id |              c_user_id               |        c_name        | c_province_id | c_city_id |     create_time     
----+--------------------------------------+----------------------+---------------+-----------+---------------------
  2 | 1afd300e-88bc-11eb-9c30-0c42a125994e | Nj27hTrqAwIQUPiO0qXo |           727 |        95 | 2028-03-19 22:05:05
  3 | 1afd4041-88bc-11eb-9c30-0c42a125994e | J9rzo41MCC2dM5Whp4Zy |           482 |        22 | 2026-03-19 22:05:05
  4 | 1afd4562-88bc-11eb-9c30-0c42a125994e | RX3eSuFHkqXmNJ8hSoas |           517 |        67 | 2023-03-19 22:05:05
  6 | 1afd4ebd-88bc-11eb-9c30-0c42a125994e | ydfrgRm1VlPX8FLFSeo5 |           968 |         3 | 2027-03-19 22:05:05
  7 | 1afd530c-88bc-11eb-9c30-0c42a125994e | rsMpwgyPk0TiBXO2AFr3 |           585 |        25 | 2027-03-19 22:05:05
  8 | 1afd574a-88bc-11eb-9c30-0c42a125994e | H5aqu0qT4xgB06i1341J |           293 |        73 | 2027-03-19 22:05:05
  9 | 1afd5cf9-88bc-11eb-9c30-0c42a125994e | Y10PZgc4AzTDjxyY5ke0 |            31 |        60 | 2025-03-19 22:05:05
 10 | 1afd61a8-88bc-11eb-9c30-0c42a125994e | 761DXGqU7GUjHpKns2E0 |           732 |        12 | 2022-03-19 22:05:05
 13 | 1afd6f01-88bc-11eb-9c30-0c42a125994e | pNCyKUaVYVyQqowgB3kl |           370 |        31 | 2028-03-19 22:05:05
 16 | 1afd7bcf-88bc-11eb-9c30-0c42a125994e | j8zjGigivtHUhwDq2OK9 |           172 |        90 | 2025-03-19 22:05:05
 18 | 1afd842c-88bc-11eb-9c30-0c42a125994e | 0DZUqdFwtEGifda3AA4p |           480 |        67 | 2028-03-19 22:05:05
 19 | 1afd886b-88bc-11eb-9c30-0c42a125994e | 6SRyZ7v0mCP981zBaSIL |           374 |         5 | 2022-03-19 22:05:05
 21 | 1afd913b-88bc-11eb-9c30-0c42a125994e | JHbEzIIg037fKPJ0FbK4 |           730 |        93 | 2027-03-19 22:05:05
 22 | 1afd9596-88bc-11eb-9c30-0c42a125994e | FSemWreIG6i3eQm7k7qE |           673 |        87 | 2028-03-19 22:05:05
 24 | 1afd9ea8-88bc-11eb-9c30-0c42a125994e | YnMBwEvqqI8mg4oAzZ25 |           960 |         4 | 2026-03-19 22:05:05
 27 | 1afdab41-88bc-11eb-9c30-0c42a125994e | z1eUGAFq0zFvl2ZFf2ie |           953 |        53 | 2028-03-19 22:05:05
 28 | 1afdafc8-88bc-11eb-9c30-0c42a125994e | ZJvabgB2dPk0TfrhVB7D |           500 |         6 | 2028-03-19 22:05:05
 29 | 1afdb407-88bc-11eb-9c30-0c42a125994e | Y4fRnay6I454UaZgaSJS |           848 |        10 | 2028-03-19 22:05:05
 32 | 1afdc0bc-88bc-11eb-9c30-0c42a125994e | Lam7pt0r0zFs9dqnoJi6 |           300 |        68 | 2028-03-19 22:05:05
 33 | 1afdc4fc-88bc-11eb-9c30-0c42a125994e | HPPx2oG7mid4xiGpRSEu |           190 |        94 | 2028-03-19 22:05:05
(20 rows)

简单验证了一下,确认同步完成。

小结

不管是使用mysql2pgsql,还是DTS,都各有各的优势,也各有各的不足。根据实际需求,选择更符合业务场景的工具,才是最合适的。