
Commit da149fe

docs: Update documentation with examples and reference
1 parent 9aee3dc commit da149fe

File tree

7 files changed: +360 additions, -23 deletions


docs/development.rst

Lines changed: 25 additions & 9 deletions
@@ -10,21 +10,19 @@ Poetry
 
 airflow-dbt-python uses `Poetry <https://python-poetry.org/>`_ for project management. Ensure it's installed before running: see `Poetry's installation documentation <https://python-poetry.org/docs/#installation>`_.
 
-Additionally, we recommend running the following commands in a virtual environment.
-
 Installing Airflow
 ------------------
 
-For running unit-tests we require a local installation of Airflow. We can install a specific version using ``pip``:
+Development requires a local installation of Airflow, as airflow-dbt-python doesn't come bundled with one. We can install a specific version using ``pip``:
 
 .. code-block:: shell
 
-   pip install apache-airflow==1.10.12
+   pip install apache-airflow==2.2
 
 .. note::
-   Installin any 1.X version of Airflow will raise warnings due to dependency conflicts with ``dbt-core``. These conflicts should not impact airflow-dbt-python.
+   Installing any 1.X version of Airflow will raise warnings due to dependency conflicts with ``dbt-core``. However, these conflicts should not impact airflow-dbt-python.
 
-Or install the ``airflow`` extra which will fetch the latest version of Airflow with major version 2:
+Installing the ``airflow`` extra will fetch the latest version of Airflow with major version 2:
 
 .. code-block:: shell
 
@@ -45,6 +43,27 @@ Clone the main repo and install it:
    poetry install
 
 
+Pre-commit hooks
+----------------
+
+A handful of `pre-commit <https://pre-commit.com/>`_ hooks are provided, including:
+
+* Trailing whitespace trimming.
+* EOF newline enforcement.
+* Secret detection.
+* Code formatting (`black <https://github.com/psf/black>`_).
+* PEP8 linting (`flake8 <https://github.com/pycqa/flake8/>`_).
+* Static type checking (`mypy <https://github.com/python/mypy>`_).
+* Import sorting (`isort <https://github.com/PyCQA/isort>`_).
+
+Install the hooks after cloning airflow-dbt-python:
+
+.. code-block:: shell
+
+   pre-commit install
+
+Ensuring hooks pass is highly recommended, as hooks are mapped to CI/CD checks that will block PRs.
+
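+By default, hooks run only against files staged for commit. To check the entire tree at once, for example before opening a PR, all hooks can be run manually:
+
+.. code-block:: shell
+
+   # Run every configured hook against all files, not just staged changes.
+   pre-commit run --all-files
+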
 Testing
 -------
 
@@ -78,6 +97,3 @@ Generating coverage reports with pytest-cov can be done with:
 .. code-block:: shell
 
    poetry run pytest -vv --cov=./airflow_dbt_python --cov-report=xml:./coverage.xml --cov-report term-missing tests/
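+When iterating on a single change, running the full suite with coverage on every edit can be slow. pytest's ``-k`` flag narrows the run to tests whose names match a keyword expression (the ``s3`` keyword below is only an example):
+
+.. code-block:: shell
+
+   # Run only tests whose names match the given keyword expression.
+   poetry run pytest -vv -k "s3" tests/
+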
-
-Pre-commit hooks
-----------------

docs/example_dags.rst

Lines changed: 247 additions & 0 deletions
@@ -2,3 +2,250 @@ Example DAGs
 ============
 
 This section contains a few DAGs showing off some dbt pipelines to get you going.
+
+Basic DAG
+^^^^^^^^^
+
+This basic DAG shows off a single ``DbtRunOperator`` that executes hourly:
+
+.. code-block:: python
+   :linenos:
+   :caption: basic_dag.py
+
+   """Sample basic DAG which runs a dbt project."""
+   import datetime as dt
+
+   from airflow import DAG
+   from airflow.utils.dates import days_ago
+   from airflow_dbt_python.operators.dbt import DbtRunOperator
+
+   with DAG(
+       dag_id="example_basic_dbt_run",
+       schedule_interval="0 * * * *",
+       start_date=days_ago(1),
+       catchup=False,
+       dagrun_timeout=dt.timedelta(minutes=60),
+   ) as dag:
+       dbt_run = DbtRunOperator(
+           task_id="dbt_run_hourly",
+           project_dir="/path/to/my/dbt/project/",
+           profiles_dir="~/.dbt/",
+           select=["+tag:hourly"],
+           exclude=["tag:deprecated"],
+           target="production",
+           profile="my-project",
+           full_refresh=False,
+       )
+
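+Once the file is in your DAGs folder, you can exercise it without waiting on the scheduler by triggering a single run from the Airflow CLI. The command below assumes Airflow 2, and the project paths in the example are placeholders that should point at a real dbt project first:
+
+.. code-block:: shell
+
+   # Execute one complete run of the example DAG for a given logical date.
+   airflow dags test example_basic_dbt_run 2021-01-01
+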
+Run and Docs from S3
+^^^^^^^^^^^^^^^^^^^^
+
+This DAG shows off a ``DbtRunOperator`` followed by a ``DbtDocsGenerateOperator``. Both execute hourly, and run off dbt project files available at an S3 URL:
+
+.. code-block:: python
+   :linenos:
+   :caption: dbt_project_in_s3_dag.py
+
+   """Sample basic DAG which showcases a dbt project being pulled from S3."""
+   import datetime as dt
+
+   from airflow import DAG
+   from airflow.utils.dates import days_ago
+   from airflow_dbt_python.operators.dbt import DbtDocsGenerateOperator, DbtRunOperator
+
+   with DAG(
+       dag_id="example_basic_dbt_run_with_s3",
+       schedule_interval="0 * * * *",
+       start_date=days_ago(1),
+       catchup=False,
+       dagrun_timeout=dt.timedelta(minutes=60),
+   ) as dag:
+       # Project files will be pulled from "s3://my-bucket/dbt/project/key/prefix/".
+       dbt_run = DbtRunOperator(
+           task_id="dbt_run_hourly",
+           project_dir="s3://my-bucket/dbt/project/key/prefix/",
+           profiles_dir="s3://my-bucket/dbt/profiles/key/prefix/",
+           select=["+tag:hourly"],
+           exclude=["tag:deprecated"],
+           target="production",
+           profile="my-project",
+           full_refresh=False,
+       )
+
+       # Documentation files (target/manifest.json, target/index.html, and
+       # target/catalog.json) will be pushed back to S3 after compilation is done.
+       dbt_docs = DbtDocsGenerateOperator(
+           task_id="dbt_docs_generate",
+           project_dir="s3://my-bucket/dbt/project/key/prefix/",
+           profiles_dir="s3://my-bucket/dbt/profiles/key/prefix/",
+       )
+
+       dbt_run >> dbt_docs
+
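+Pulling files from S3 relies on Airflow's AWS hooks, so workers need valid S3 credentials. Assuming the default ``aws_default`` connection is used, one way to provide them is through the Airflow CLI (the key pair below is a placeholder):
+
+.. code-block:: shell
+
+   # Register an AWS connection; replace the placeholders with real credentials.
+   airflow connections add aws_default \
+       --conn-type aws \
+       --conn-login "<aws-access-key-id>" \
+       --conn-password "<aws-secret-access-key>"
+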
+Complete dbt workflow
+^^^^^^^^^^^^^^^^^^^^^
+
+This DAG shows off an (almost) complete dbt workflow as it would be run from the CLI: we begin by running ``DbtSourceOperator`` to test the freshness of our source tables, and ``DbtSeedOperator`` follows to load up any static data. Then, two instances of ``DbtRunOperator`` are created: one to handle incremental models, and the other one to run any non-incremental models. Finally, we run our tests to ensure our models remain correct.
+
+.. code-block:: python
+   :linenos:
+   :caption: complete_dbt_workflow_dag.py
+
+   """Sample DAG showcasing a complete dbt workflow.
+
+   The complete workflow includes a sequence of source, seed, and several run commands.
+   """
+   import datetime as dt
+
+   from airflow import DAG
+   from airflow.utils.dates import days_ago
+   from airflow_dbt_python.operators.dbt import (
+       DbtRunOperator,
+       DbtSeedOperator,
+       DbtSourceOperator,
+       DbtTestOperator,
+   )
+
+   with DAG(
+       dag_id="example_complete_dbt_workflow",
+       schedule_interval="0 * * * *",
+       start_date=days_ago(1),
+       catchup=False,
+       dagrun_timeout=dt.timedelta(minutes=60),
+   ) as dag:
+       dbt_source = DbtSourceOperator(
+           task_id="dbt_source",
+           project_dir="/path/to/my/dbt/project/",
+           profiles_dir="~/.dbt/",
+           target="production",
+           profile="my-project",
+           do_xcom_push_artifacts=["sources.json"],
+       )
+
+       dbt_seed = DbtSeedOperator(
+           task_id="dbt_seed",
+           project_dir="/path/to/my/dbt/project/",
+           profiles_dir="~/.dbt/",
+           target="production",
+           profile="my-project",
+       )
+
+       dbt_run_incremental = DbtRunOperator(
+           task_id="dbt_run_incremental_hourly",
+           project_dir="/path/to/my/dbt/project/",
+           profiles_dir="~/.dbt/",
+           select=["tag:hourly,config.materialized:incremental"],
+           exclude=["tag:deprecated"],
+           target="production",
+           profile="my-project",
+           full_refresh=False,
+       )
+
+       dbt_run = DbtRunOperator(
+           task_id="dbt_run_hourly",
+           project_dir="/path/to/my/dbt/project/",
+           profiles_dir="~/.dbt/",
+           select=["+tag:hourly"],
+           exclude=["tag:deprecated,config.materialized:incremental"],
+           target="production",
+           profile="my-project",
+           full_refresh=True,
+       )
+
+       dbt_test = DbtTestOperator(
+           task_id="dbt_test",
+           project_dir="/path/to/my/dbt/project/",
+           profiles_dir="~/.dbt/",
+           target="production",
+           profile="my-project",
+       )
+
+       dbt_source >> dbt_seed >> dbt_run_incremental >> dbt_run >> dbt_test
+
+Using dbt artifacts
+^^^^^^^^^^^^^^^^^^^
+
+The following DAG showcases how to use `dbt artifacts <https://docs.getdbt.com/reference/artifacts/dbt-artifacts/>`_ that are made available via XCom by airflow-dbt-python. A sample function calculates the longest running dbt model by pulling the artifacts that were generated after ``DbtRunOperator`` executes. We specify which dbt artifacts to push via the ``do_xcom_push_artifacts`` parameter.
+
+.. code-block:: python
+   :linenos:
+   :caption: use_dbt_artifacts_dag.py
+
+   """Sample DAG to showcase pulling dbt artifacts from XCom."""
+   import datetime as dt
+
+   from airflow import DAG
+   from airflow.operators.python_operator import PythonOperator
+   from airflow.utils.dates import days_ago
+   from airflow_dbt_python.operators.dbt import DbtRunOperator
+
+
+   def process_dbt_artifacts(**context):
+       """Report which model or models took the longest to compile and execute."""
+       run_results = context["ti"].xcom_pull(
+           key="run_results.json", task_ids="dbt_run_daily"
+       )
+       longest_compile = None
+       longest_execute = None
+
+       for result in run_results["results"]:
+           if result["status"] != "success":
+               continue
+
+           model_id = result["unique_id"]
+           for timing in result["timing"]:
+               # Elapsed seconds for this timing step (completed minus started);
+               # strptime takes the format string as a positional argument.
+               duration = (
+                   dt.datetime.strptime(
+                       timing["completed_at"], "%Y-%m-%dT%H:%M:%S.%fZ"
+                   )
+                   - dt.datetime.strptime(
+                       timing["started_at"], "%Y-%m-%dT%H:%M:%S.%fZ"
+                   )
+               ).total_seconds()
+
+               if timing["name"] == "execute":
+                   if longest_execute is None or duration > longest_execute[1]:
+                       longest_execute = (model_id, duration)
+
+               elif timing["name"] == "compile":
+                   if longest_compile is None or duration > longest_compile[1]:
+                       longest_compile = (model_id, duration)
+
+       print(
+           f"{longest_execute[0]} took the longest to execute with a time of "
+           f"{longest_execute[1]} seconds!"
+       )
+       print(
+           f"{longest_compile[0]} took the longest to compile with a time of "
+           f"{longest_compile[1]} seconds!"
+       )
+
+
+   with DAG(
+       dag_id="example_dbt_artifacts",
+       schedule_interval="0 0 * * *",
+       start_date=days_ago(1),
+       catchup=False,
+       dagrun_timeout=dt.timedelta(minutes=60),
+   ) as dag:
+       dbt_run = DbtRunOperator(
+           task_id="dbt_run_daily",
+           project_dir="/path/to/my/dbt/project/",
+           profiles_dir="~/.dbt/",
+           select=["+tag:daily"],
+           exclude=["tag:deprecated"],
+           target="production",
+           profile="my-project",
+           full_refresh=True,
+           do_xcom_push_artifacts=["manifest.json", "run_results.json"],
+       )
+
+       process_artifacts = PythonOperator(
+           task_id="process_artifacts",
+           python_callable=process_dbt_artifacts,
+           provide_context=True,
+       )
+
+       dbt_run >> process_artifacts
