setup airflow on MySQL
SQLite Database
https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-sqlite-database
用于开发环境,有一些限制,只支持 序列执行器, 不能用作产品环境。
SQLite database can be used to run Airflow for development purpose as it does not require any database server (the database is stored in a local file). There are a few limitations of using the SQLite database (for example it only works with Sequential Executor) and it should NEVER be used for production.
MySQL Database
https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#setting-up-a-mysql-database
先建立数据库:
You need to create a database and a database user that Airflow will use to access this database. In the example below, a database
airflow_db
and user with usernameairflow_user
with passwordairflow_pass
will be createdCREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; CREATE USER ‘airflow_user‘ IDENTIFIED BY ‘airflow_pass‘; GRANT ALL PRIVILEGES ON airflow_db.* TO ‘airflow_user‘;
设置数据库配置文件 explicit_defaults_for_timestamp=1
We rely on more strict ANSI SQL settings for MySQL in order to have sane defaults. Make sure to have specified
explicit_defaults_for_timestamp=1
option under[mysqld]
section in yourmy.cnf
file. You can also activate these options with the--explicit-defaults-for-timestamp
switch passed tomysqld
executable
安装python的客户端驱动库
pip install mysqlclient
We recommend using the
mysqlclient
driver and specifying it in your SqlAlchemy connection string.
设置DATABASE_URI
https://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#database-uri
修改 airflow.cfg配置文件中的 sql_alchemy_conn
参数,或者配置环境变量AIRFLOW__CORE__SQL_ALCHEMY_CONN
。
Airflow uses SQLAlchemy to connect to the database, which requires you to configure the Database URL. You can do this in option
sql_alchemy_conn
in section[core]
. It is also common to configure this option withAIRFLOW__CORE__SQL_ALCHEMY_CONN
environment variable.
mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>
初始化数据库
https://stackoverflow.com/questions/61663681/configure-sql-server-in-airflow-with-sql-alchemy-conn
airflow initdb
另外一种客户端驱动
支持SSL免证书选项。
But we also support the
mysql-connector-python
driver, which lets you connect through SSL without any cert options provided.mysql+mysqlconnector://<user>:<password>@<host>[:<port>]/<dbname>
客户端驱动issue
如果驱动未安装或者安装不正确, 执行 airflow initdb 会报错
ModuleNotFoundError: No module named ‘MySQLdb‘
解决办法就是使用pip安装相应的驱动。
https://stackoverflow.com/questions/51245121/mysqldb-modulenotfounderror
Database Urls of SQLAlchemy
https://docs.sqlalchemy.org/en/14/core/engines.html#database-urls
The
create_engine()
function produces anEngine
object based on a URL. These URLs follow RFC-1738, and usually can include username, password, hostname, database name as well as optional keyword arguments for additional configuration. In some cases a file path is accepted, and in others a “data source name” replaces the “host” and “database” portions. The typical form of a database URL is:dialect+driver://username:password@host:port/database
Dialect names include the identifying name of the SQLAlchemy dialect, a name such as
sqlite
,mysql
,postgresql
,oracle
, ormssql
. The drivername is the name of the DBAPI to be used to connect to the database using all lowercase letters. If not specified, a “default” DBAPI will be imported if available - this default is typically the most widely known driver available for that backend.
The MySQL dialect uses mysql-python as the default DBAPI. There are many MySQL DBAPIs available, including MySQL-connector-python and OurSQL:
# default engine = create_engine(‘mysql://scott:tiger@localhost/foo‘) # mysqlclient (a maintained fork of MySQL-Python) engine = create_engine(‘mysql+mysqldb://scott:tiger@localhost/foo‘) # PyMySQL engine = create_engine(‘mysql+pymysql://scott:tiger@localhost/foo‘)
Excercise
https://github.com/fanqingsong/machine_learning_workflow_on_airflow
基于本地的调测环境,将此项目改造为基于MySQL数据库后台, 并在readme中记录过程。