Commit 9aee3dc

docs: A lot of documentation improvements
1 parent eeaf1eb commit 9aee3dc

10 files changed: +396 -17 lines changed

README.md

Lines changed: 2 additions & 1 deletion
@@ -75,7 +75,8 @@ with DAG(
         exclude=["tag:deprecated"],
         target="production",
         profile="my-project",
-        full_refresh=True,
+        full_refresh=True,
+        See the full example [here](examples/use_dbt_artifacts_dag.py).
         do_xcom_push_artifacts=["manifest.json", "run_results.json"],
     )

docs/autodoc.rst

Lines changed: 0 additions & 15 deletions
This file was deleted.

docs/development.rst

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
.. _development:

Development
===========

This section describes how to set up a development environment. If you are looking to dig into the internals of airflow-dbt-python and make a (very appreciated) contribution to the project, read along.

Poetry
------

airflow-dbt-python uses `Poetry <https://python-poetry.org/>`_ for project management. Ensure it's installed before running any of the following commands: see `Poetry's installation documentation <https://python-poetry.org/docs/#installation>`_.

Additionally, we recommend running the following commands in a virtual environment.
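For example, a virtual environment can be created with Python's standard ``venv`` module (any other virtual environment tool works just as well):

.. code-block:: shell

   python -m venv .venv
   source .venv/bin/activate
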
Installing Airflow
------------------

For running unit tests, we require a local installation of Airflow. We can install a specific version using ``pip``:

.. code-block:: shell

   pip install apache-airflow==1.10.12

.. note::
   Installing any 1.X version of Airflow will raise warnings due to dependency conflicts with ``dbt-core``. These conflicts should not impact airflow-dbt-python.

Alternatively, install the ``airflow`` extra, which will fetch the latest version of Airflow with major version 2:

.. code-block:: shell

   cd airflow-dbt-python
   poetry install -E airflow


Building from source
--------------------

Clone the main repo and install it:

.. code-block:: shell

   git clone https://github.com/tomasfarias/airflow-dbt-python.git
   cd airflow-dbt-python
   poetry install


Testing
-------

Unit tests are available for all operators and hooks. That being said, only a fraction of the large number of possible inputs that the operators and hooks can take is currently covered, so the unit tests do not offer perfect coverage (a single peek at ``DbtBaseOperator`` should give you an idea of the level of state explosion we manage).

.. note::
   Unit tests (and airflow-dbt-python) assume dbt works correctly and do not assert the behavior of the dbt commands themselves.

Requirements
^^^^^^^^^^^^

Unit tests interact with a `PostgreSQL <https://www.postgresql.org/>`_ database as a target to run dbt commands. This requires PostgreSQL to be installed in your local environment. Installation instructions for all major platforms can be found here: https://www.postgresql.org/download/.
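If you prefer not to install PostgreSQL system-wide, a disposable instance can also be started with Docker. This is only a sketch: the user, password, database, and port below are placeholders and must match whatever your local test configuration expects.

.. code-block:: shell

   docker run -d --name airflow-dbt-python-postgres \
       -e POSTGRES_USER=dbt \
       -e POSTGRES_PASSWORD=dbt \
       -e POSTGRES_DB=test \
       -p 5432:5432 \
       postgres
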
Some unit tests require the `Amazon provider package for Airflow <https://pypi.org/project/apache-airflow-providers-amazon/>`_. Ensure it's installed via the ``amazon`` extra:

.. code-block:: shell

   poetry install -E amazon

Running unit tests with pytest
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

airflow-dbt-python uses `pytest <https://docs.pytest.org/>`_ as its testing framework. After you have saved your changes, all unit tests can be run with:

.. code-block:: shell

   poetry run pytest tests/ -vv

Generating coverage reports with pytest-cov can be done with:

.. code-block:: shell

   poetry run pytest -vv --cov=./airflow_dbt_python --cov-report=xml:./coverage.xml --cov-report term-missing tests/

Pre-commit hooks
----------------
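
Assuming the repository ships a ``.pre-commit-config.yaml`` and ``pre-commit`` is included in the development dependencies, the hooks can be installed and run against all files with:

.. code-block:: shell

   poetry run pre-commit install
   poetry run pre-commit run --all-files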

docs/example_dags.rst

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
Example DAGs
============

This section contains a few DAGs showing off some dbt pipelines to get you going.

docs/getting_started.rst

Lines changed: 194 additions & 0 deletions
@@ -0,0 +1,194 @@
Getting started
===============

This section gives a quick run-down on installing airflow-dbt-python and getting your first DAG running.

.. _requirements:

Requirements
------------

airflow-dbt-python requires the latest major version of `dbt-core <https://pypi.org/project/dbt-core/>`_, which at the time of writing is version 1.

To line up with ``dbt-core``, airflow-dbt-python supports Python 3.7, 3.8, and 3.9. We also include Python 3.10 in our testing pipeline, although as of the time of writing ``dbt-core`` does not yet support it.

On the Airflow side, we support release 1.10.12 and all Airflow major version 2 releases.

.. note::
   ``apache-airflow==1.10.12`` has a dependency conflict with ``dbt-core>=1.0.0``. airflow-dbt-python does not require the conflicting dependency, nor does it access the parts of ``dbt-core`` that use it, so it should work regardless.

That being said, installing airflow-dbt-python in an environment with ``apache-airflow==1.10.12`` will produce warnings, and we do recommend upgrading to version 2 or later, as future versions of airflow-dbt-python may drop support for version 1.10.12 entirely if the conflicts become unmanageable.

.. warning::
   Due to the dependency conflict described above, airflow-dbt-python does not include Airflow as a dependency. We expect it to be installed into an environment with Airflow already in it. For instructions on setting up a development environment, see :ref:`development`.


Installation
------------

airflow-dbt-python can be installed in any environment that has a supported version of Airflow already installed. See :ref:`requirements` for details, and refer to the `Airflow documentation <https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html>`_ for instructions on how to install it.

From PyPI
^^^^^^^^^

airflow-dbt-python is available on `PyPI <https://pypi.org/project/airflow-dbt-python/>`_ and can be installed with ``pip``:

.. code-block:: shell

   pip install airflow-dbt-python

As a convenience, any dbt adapters that are required can be installed by specifying extras. The ``all`` extra includes all adapters:

.. code-block:: shell

   pip install airflow-dbt-python[snowflake,postgres,redshift,bigquery]
   pip install airflow-dbt-python[all]

Building from source
^^^^^^^^^^^^^^^^^^^^

airflow-dbt-python can also be built from source by cloning the main repo:

.. code-block:: shell

   git clone https://github.com/tomasfarias/airflow-dbt-python.git
   cd airflow-dbt-python

And installing with ``poetry`` (without development dependencies):

.. code-block:: shell

   poetry install --no-dev

As with ``pip``, any extra adapters can be installed:

.. code-block:: shell

   poetry install -E postgres -E redshift -E bigquery -E snowflake --no-dev
   poetry install -E all --no-dev

Installing in MWAA
^^^^^^^^^^^^^^^^^^

airflow-dbt-python can be installed in an Airflow environment managed by AWS via their `Managed Workflows for Apache Airflow <https://aws.amazon.com/managed-workflows-for-apache-airflow/>`_ service. To do so, include airflow-dbt-python in MWAA's ``requirements.txt`` file, for example:

.. code-block:: shell
   :caption: requirements.txt

   airflow-dbt-python[redshift,amazon]

This installs airflow-dbt-python, dbt's Redshift adapter, and Airflow's Amazon provider package.

Setting up a dbt project
------------------------

Setting up a dbt project for airflow-dbt-python to run depends on the type of executor running in your production Airflow environment:

1. Using a `LocalExecutor <https://airflow.apache.org/docs/apache-airflow/stable/executor/local.html>`_ with a single-machine deployment means we can rely on the local machine's filesystem to store our project. This also applies to the DebugExecutor and SequentialExecutor, but these executors are generally only used for debugging/development so we will ignore them.

2. However, once your setup has evolved to a multi-machine/cloud installation, we must rely on an external backend to store any dbt files. The only currently supported backend is S3, although more are planned (see :ref:`download-dbt-files-from-s3`).


Single-machine setup
^^^^^^^^^^^^^^^^^^^^

As we can rely on the local machine's filesystem, simply copy your dbt project files and dbt ``profiles.yml`` to a path on your local machine. On your local machine, files may be laid out as:

.. code::

   .
   |-- ~/.dbt/
   |   `-- profiles.yml
   `-- /path/to/project/
       |-- dbt_project.yml
       |-- models/
       |   |-- model1.sql
       |   `-- model2.sql
       |-- seeds/
       |   |-- seed1.csv
       |   `-- seed2.csv
       |-- macros/
       |   |-- macro1.sql
       |   `-- macro2.sql
       `-- tests/
           |-- test1.sql
           `-- test2.sql


So we can simply set ``project_dir`` and ``profiles_dir`` to ``"/path/to/project/"`` and ``"~/.dbt/"`` respectively:

.. code-block:: python
   :linenos:
   :caption: example_local_1.py

   import datetime as dt

   from airflow import DAG
   from airflow.utils.dates import days_ago
   from airflow_dbt_python.operators.dbt import DbtRunOperator

   with DAG(
       dag_id="example_dbt_artifacts",
       schedule_interval="0 0 * * *",
       start_date=days_ago(1),
       catchup=False,
       dagrun_timeout=dt.timedelta(minutes=60),
   ) as dag:
       dbt_run = DbtRunOperator(
           task_id="dbt_run_daily",
           project_dir="/path/to/project",
           profiles_dir="~/.dbt/",
           select=["+tag:daily"],
           exclude=["tag:deprecated"],
           target="production",
           profile="my-project",
       )

.. note::
   Setting ``profiles_dir`` to ``"~/.dbt/"`` can be omitted as this is the default value.


If we have multiple operators, we can also utilize default arguments and include other parameters, like the profile and target to use:

.. code-block:: python
   :linenos:
   :caption: example_local_2.py

   import datetime as dt

   from airflow import DAG
   from airflow.utils.dates import days_ago
   from airflow_dbt_python.operators.dbt import DbtRunOperator, DbtSeedOperator

   default_args = {
       "project_dir": "/path/to/project/",
       "profiles_dir": "~/.dbt/",
       "target": "production",
       "profile": "my-project",
   }

   with DAG(
       dag_id="example_dbt_artifacts",
       schedule_interval="0 0 * * *",
       start_date=days_ago(1),
       catchup=False,
       dagrun_timeout=dt.timedelta(minutes=60),
       default_args=default_args,
   ) as dag:
       dbt_seed = DbtSeedOperator(
           task_id="dbt_seed",
       )

       dbt_run = DbtRunOperator(
           task_id="dbt_run_daily",
           select=["+tag:daily"],
           exclude=["tag:deprecated"],
       )

       dbt_seed >> dbt_run


.. note::
   dbt supports configuration via environment variables, which may also be used. Additionally, ``profile`` and ``target`` may be omitted if already specified in ``dbt_project.yml`` and ``profiles.yml`` respectively.

Multi-machine/cloud installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
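
As a minimal sketch of what this looks like, assuming the dbt project and ``profiles.yml`` have been uploaded to an S3 bucket that the Airflow workers can read (the bucket name, keys, and file name below are placeholders), the operators can point ``project_dir`` and ``profiles_dir`` at S3 URLs so the files are pulled down before dbt runs:

.. code-block:: python
   :caption: example_s3_1.py

   import datetime as dt

   from airflow import DAG
   from airflow.utils.dates import days_ago
   from airflow_dbt_python.operators.dbt import DbtRunOperator

   with DAG(
       dag_id="example_dbt_s3",
       schedule_interval="0 0 * * *",
       start_date=days_ago(1),
       catchup=False,
       dagrun_timeout=dt.timedelta(minutes=60),
   ) as dag:
       dbt_run = DbtRunOperator(
           task_id="dbt_run_daily",
           # Placeholder S3 URLs: point these at wherever your dbt files live.
           project_dir="s3://my-bucket/dbt/project/",
           profiles_dir="s3://my-bucket/dbt/profiles/",
           select=["+tag:daily"],
           exclude=["tag:deprecated"],
           target="production",
           profile="my-project",
       )

See :ref:`download-dbt-files-from-s3` for the details of how the S3 backend locates and downloads these files.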

docs/index.rst

Lines changed: 5 additions & 1 deletion
@@ -5,7 +5,11 @@ Welcome to airflow-dbt-python's documentation!
    :maxdepth: 2
    :caption: Contents:

-   autodoc
+   introduction.rst
+   getting_started.rst
+   example_dags.rst
+   development.rst
+   reference.rst

 Indices and tables
 ==================
