When I first started playing around with dbt (data build tool), being a traditional MySQL developer, the first thing that came to my mind is Can we connect dbt with mysql ? After some research on getdbt site, I found that there is a dbt-mysql adapter which can connect dbt with mysql. But then as I digged deeper to understand dbt in a real world scenario, the immediate question that popped up is what if I have to do the data processing at scale using MySQL ? MySQL is scalable however it's still single master node. Well that's when Vitess came to a rescue. Since we are in cloud era and that I don't want to manage Vitess on my own, went ahead and saw who provides Vitess as a managed service and thats when PlanetScale came into picture and is a perfect fit for our use case.
What if we combine the power of dbt data transformations with mysql compatible Vitess at scale?
Thats exactly what this case study is all about. In this post, we will connect dbt with Vitess using dbt-mysql adapter on PlanetScale Vitess database Cloud.
Vitess is a MySQL compatible database clustering system for horizontal scaling of MySQL. Since MySQL works with single master, Vitess helps you partition your data into multiple partitions with a unique key ( called as sharding key). It's a proven technology at scale originally developed at Youtube.
PlanetScale is a managed MySQL compatible serverless database platform. They provide database as a service with Vitess as a backend.
So there we go. We have a perfect combination of
- dbt ( data build tool)
- MySQL with dbt-mysql
- Vitess hosted on PlanetScale
Let's jump into connecting each of them. The flow goes like this.
- Vitess is a "MySQL compatible" database. MySQL compatible database indicates that it can support MySQL syntax for all the query needs even though the underlying technology of how data is stored is different from how MySQL natively stores the data.
- To connect dbt with Vitess, we will use dbt-mysql adapter which is a plugin to connect dbt with mysql or mariadb.
- dbt-mysql adapter is not supported by dbt cloud (which is a dbt cloud based online ide). So for this tutorial, we will use dbt cli for working with dbt. dbt cli is a client based tool which can be used to manage the entire dbt workflows.
- PlanetScale is a fully managed Vitess based service (The MySQL-compatible serverless database platform) .
Let's dive into the actual workflows.
Step 1: Create PlanetScale database
First thing we will do is to signup an account with PlanetScale. This is a straight forward step like any other account signup. Here is the planetscale signup link
Once you create an account using above steps, you can configure a
pscale planetscale client (which is a shell based tool) to connect to your db. To do that, for MacOS you use below steps. For other operating systems you can follow this guide
#Install Planetscale Client i.e. pscale brew install planetscale/tap/pscale #Install MySQL client that pscale can connect. brew install mysql-client #Login to planetscale account for cli pscale auth login #Create database named `demo` pscale database create demo #Connect to demo database. pscale connect demo
You can access PlanetScale
demo database using
pscale shell using
pscale shell demo
This is just like a mysql shell.
Step 2: Setup dbt client with dbt-mysql adapter
We will connect dbt-mysql adapter for using dbt client. Setting up dbt cli depends on the type of machine that you use. For the purpose of this tutorial , we will use
pip to setup dbt cli with dbt-mysql adapter.
If you have pip installed already you can skip this step.
Step 2.1: Setup pip:
pip (package installer for Python) is a simple package management system written in python that can be used install and manage software packages/libraries written in Python. If you don't have pip installed already you can use run below commands in your terminal.
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py python3 get-pip.py
Depending on the operating system this process might be a bit different and can be installed using these pip instructions
- If you are using python3 , this might be more like pip3.
Step 2.2: Setup dbt mysql adapter
Once you have pip installed , you can install dbt-mysql using the below command in your terminal.
pip install dbt-mysql
To verify if dbt installed fine, you can use
which should give results similar to this
tipseason$ dbt --version installed version: 1.0.8 latest version: 1.1.1 Your version of dbt is out of date! You can find instructions for upgrading here: https://docs.getdbt.com/docs/installation Plugins: - mysql5: 1.0.0 - No PYPI version available - mariadb: 1.0.0 - No PYPI version available - mysql: 1.0.0 - Up to date!
Once you have dbt-cli installed, you can create a sample dbt project using this.
dbt init demoproject
Now that we have dbt project ready along with planetscale database , let's connect the two pieces.
Step 3: Connect dbt-mysql with planetscale database
Inorder to connect dbt with mysql adapter, we need to modify profiles.yml file which serves as the configuration file to database connections.
For mac users the location of this file is in the home folder under
.dbt subfolder. In this if you don't have a
profiles.yml file, you can create one using
This file should have the following configuration information.
demoproject: target: dev outputs: dev: type: mysql5 server: [planetscale_servername] port: 3306 # optional database: [demo] # optional, should be same as schema schema: [planetscale_schema] username: [planetscale_username] password: [planetscale_password] driver: MySQL ODBC 8.0 ANSI Driver prod: type: mysql5 server: [server/host] port: [port] # optional database: [schema] # optional, should be same as schema schema: [schema] username: [username] password: [password] driver: MySQL ODBC 8.0 ANSI Driver
Make sure you input the data without square brackets (no
That's it. We have all the required configurations ready.
Perform sample queries with dbt-mysql and pscale:
In Step 2.2, we created
demoproject for dbt. dbt comes with sample models(my_first_dbt_model, my_second_dbt_model, my_third_dbt_model) when we initiate a project.
So let's go ahead and run our models from dbt. In terminal,
This will create 3 tables in planetscale vitess database. You can check it in your pscale shell using.
tipseason$ cd demoproject tipseason$ dbt run tipseason$ pscale shell demo demo/main> show tables; +---------------------+ | Tables_in_demo | +---------------------+ | my_first_dbt_model | | my_second_dbt_model | | my_third_dbt_model | +---------------------+ demo/main> select * from my_first_dbt_model; +------+ | id | +------+ | 1 | | NULL | +------+ demo/main> select * from my_second_dbt_model; +------+ | id | +------+ | 1 | +------+ demo/main>
As you can see the three models are properly being populated using PlanetScale Vitess MySQL database with dbt-mysql adapter.
Note that dbt-mysql is still in a testing phase and its only a community supported project. So if you plan to use this in production make sure you evaluate in more detail.
Hope this article finds it helpful. Let me know if any specific insights are needed in this integration in the comments. Happy to help.
Did you know?
Data Analytics + Data Engineers = Analytics Engineers
dbt (data build tool) helps you in becoming Analytics Engineers. If you don’t know what these roles are or you just started playing around data, dbt course towards dbt certification here.
We are launching a dbt course for beginners with practical hands on project.
First 20 people to signup for the course, will get course for free and next 30 people will get a 50% discount. Make sure to signup here.
- Tip: dbt is never spelled in capital letters.
Hope this post make things clear about dbt. In the next tutorial, we will talk about
- dbt Cloud and Step by step tutorial to setup your first dbt cloud project.
Make sure you subscribe to our website by subscribing to our mailing list and by following us on Twitter (@thetipseason)
Did you find this article valuable?
Support Mani Gopal by becoming a sponsor. Any amount is appreciated!