Getting started
===============

This section gives a quick run-down on installing airflow-dbt-python and getting your first DAG running.

.. _requirements:

Requirements
------------

airflow-dbt-python requires the latest major version of `dbt-core <https://pypi.org/project/dbt-core/>`_, which at the time of writing is version 1.

To line up with ``dbt-core``, airflow-dbt-python supports Python 3.7, 3.8, and 3.9. We also include Python 3.10 in our testing pipeline, although at the time of writing ``dbt-core`` does not yet support it.

On the Airflow side, we support Airflow 1.10.12 and all Airflow 2.x releases.

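You may want to check an environment against these requirements before installing. A minimal sketch follows. The helpers ``parse_version`` and ``is_supported`` are hypothetical and are not part of airflow-dbt-python. The support matrix is copied by hand from the paragraphs above, so it will go stale as the requirements change.

.. code-block:: python

   from typing import Tuple

   # Support matrix copied by hand from this section; update it when requirements change.
   SUPPORTED_PYTHON = {(3, 7), (3, 8), (3, 9)}
   SUPPORTED_DBT_CORE_MAJOR = 1


   def parse_version(version: str) -> Tuple[int, ...]:
       """Turn "1.10.12" into (1, 10, 12), stopping at the first non-numeric component."""
       parts = []
       for piece in version.split("."):
           if not piece.isdigit():
               break
           parts.append(int(piece))
       return tuple(parts)


   def is_supported(python: Tuple[int, int], dbt_core: str, airflow: str) -> bool:
       """Hypothetical helper: True if all three versions match the requirements above."""
       airflow_version = parse_version(airflow)
       # Airflow 1.10.12 exactly, or any 2.x release.
       airflow_ok = airflow_version == (1, 10, 12) or airflow_version[:1] == (2,)
       dbt_ok = parse_version(dbt_core)[:1] == (SUPPORTED_DBT_CORE_MAJOR,)
       return tuple(python) in SUPPORTED_PYTHON and dbt_ok and airflow_ok

For example, ``is_supported((3, 9), "1.0.1", "2.2.3")`` returns ``True``. By contrast, ``is_supported((3, 10), "1.0.1", "2.2.3")`` returns ``False``, because ``dbt-core`` does not support Python 3.10 yet. To keep the sketch short, ``parse_version`` truncates a version at the first component that is not a plain number (such as ``3rc1``).
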
.. note::
   ``apache-airflow==1.10.12`` has a dependency conflict with ``dbt-core>=1.0.0``. airflow-dbt-python does not require the conflicting dependency, nor does it use the parts of ``dbt-core`` that depend on it, so it should work regardless.

   That being said, installing airflow-dbt-python in an environment with ``apache-airflow==1.10.12`` will produce warnings. We recommend upgrading to Airflow 2 or later, because future versions of airflow-dbt-python are likely to drop support for 1.10.12 entirely if the conflicts become unmanageable.

.. warning::
   Because of the dependency conflict described above, airflow-dbt-python does not include Airflow as a dependency. We expect it to be installed into an environment that already has Airflow. For instructions on setting up a development environment, see :ref:`development`.

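Airflow must already be present, so you can confirm that before running ``pip install`` with a standard-library lookup. This is a hypothetical check, not something airflow-dbt-python ships. ``importlib.util.find_spec`` locates a top-level package without importing it, which keeps the check cheap.

.. code-block:: python

   import importlib.util


   def is_installed(package: str) -> bool:
       """Return True if a top-level package can be found, without importing it."""
       return importlib.util.find_spec(package) is not None


   # Hypothetical pre-install check: airflow-dbt-python expects Airflow to be present.
   if not is_installed("airflow"):
       print("Airflow not found: install a supported version before airflow-dbt-python.")
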
Installation
------------

airflow-dbt-python can be installed in any environment that already has a supported version of Airflow installed. See :ref:`requirements` for details, and refer to the `Airflow documentation <https://airflow.apache.org/docs/apache-airflow/stable/installation/index.html>`_ for instructions on how to install Airflow.

From PyPI
^^^^^^^^^

airflow-dbt-python is available on `PyPI <https://pypi.org/project/airflow-dbt-python/>`_ and can be installed with ``pip``:

.. code-block:: shell

   pip install airflow-dbt-python

As a convenience, any required dbt adapters can be installed by specifying extras. The ``all`` extra includes all adapters. Quoting the argument keeps shells such as ``zsh`` from interpreting the brackets:

.. code-block:: shell

   pip install "airflow-dbt-python[snowflake,postgres,redshift,bigquery]"
   pip install "airflow-dbt-python[all]"

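After installing extras, you may want to confirm which adapters actually landed in the environment. The sketch below assumes that each adapter exposes a ``dbt.adapters.<name>`` module, which is how the ``dbt-postgres``, ``dbt-redshift``, ``dbt-snowflake`` and ``dbt-bigquery`` plugins are laid out. The helper ``installed_adapters`` is hypothetical. Looking up a submodule imports its parent packages, and ``find_spec`` raises ``ModuleNotFoundError`` when ``dbt`` itself is missing; the sketch treats that case as "not installed".

.. code-block:: python

   import importlib.util
   from typing import Dict

   # Adapter names matching the extras above; each is assumed to live in dbt.adapters.<name>.
   ADAPTERS = ("postgres", "redshift", "snowflake", "bigquery")


   def installed_adapters() -> Dict[str, bool]:
       """Map each adapter name to whether its dbt.adapters module can be found."""
       found = {}
       for name in ADAPTERS:
           try:
               spec = importlib.util.find_spec(f"dbt.adapters.{name}")
           except ModuleNotFoundError:
               # Raised when a parent package (dbt or dbt.adapters) is not installed.
               spec = None
           found[name] = spec is not None
       return found


   print(installed_adapters())
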
Building from source
^^^^^^^^^^^^^^^^^^^^

airflow-dbt-python can also be built from source by cloning the main repo:

.. code-block:: shell

   git clone https://github.com/tomasfarias/airflow-dbt-python.git
   cd airflow-dbt-python

Then install it with ``poetry`` (without development dependencies):

.. code-block:: shell

   poetry install --no-dev

As with ``pip``, any extra adapters can be installed:

.. code-block:: shell

   poetry install -E postgres -E redshift -E bigquery -E snowflake --no-dev
   poetry install -E all --no-dev

Installing in MWAA
^^^^^^^^^^^^^^^^^^

airflow-dbt-python can be installed in an Airflow environment managed by AWS through their `Managed Workflows for Apache Airflow <https://aws.amazon.com/managed-workflows-for-apache-airflow/>`_ (MWAA) service. To do so, include airflow-dbt-python in MWAA's ``requirements.txt`` file, for example:

.. code-block:: text
   :caption: requirements.txt

   airflow-dbt-python[redshift,amazon]

This installs airflow-dbt-python, dbt's Redshift adapter, and Airflow's Amazon providers library.


Setting up a dbt project
------------------------

How to set up a dbt project for airflow-dbt-python depends on the type of executor running in your production Airflow environment:

1. Using a `LocalExecutor <https://airflow.apache.org/docs/apache-airflow/stable/executor/local.html>`_ with a single-machine deployment means we can rely on the local machine's filesystem to store our project. This also applies to the DebugExecutor and SequentialExecutor, but these are generally only used for debugging and development, so we will ignore them here.

2. Once your setup has evolved to a multi-machine or cloud installation, we must rely on an external backend to store any dbt files. The only backend currently supported is S3, although more are planned (see :ref:`download-dbt-files-from-s3`).

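The choice between the two setups can be expressed as a small decision helper. This is a hypothetical sketch. It reads the executor from the ``AIRFLOW__CORE__EXECUTOR`` environment variable, which is Airflow's standard override for the ``[core] executor`` option. When that variable is unset, it falls back to ``SequentialExecutor`` (Airflow's default); it does not read ``airflow.cfg``. Any executor outside the single-machine set is treated as needing a shared backend such as S3.

.. code-block:: python

   import os

   # Executors that run every task on a single machine, per the list above.
   SINGLE_MACHINE_EXECUTORS = {"LocalExecutor", "SequentialExecutor", "DebugExecutor"}


   def needs_shared_backend(executor: str) -> bool:
       """Hypothetical helper: True if dbt files must be stored remotely (e.g. in S3)."""
       return executor not in SINGLE_MACHINE_EXECUTORS


   # Only the environment variable is checked here; airflow.cfg is not read.
   executor = os.environ.get("AIRFLOW__CORE__EXECUTOR", "SequentialExecutor")
   print(executor, "-> shared backend" if needs_shared_backend(executor) else "-> local files")
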
Single-machine setup
^^^^^^^^^^^^^^^^^^^^

Because we can rely on the local machine's filesystem, simply copy your dbt project files and dbt ``profiles.yml`` to a path on your local machine. Files may be laid out as:

.. code::

   .
   |-- ~/.dbt/
   |   `-- profiles.yml
   `-- /path/to/project/
       |-- dbt_project.yml
       |-- models/
       |   |-- model1.sql
       |   `-- model2.sql
       |-- seeds/
       |   |-- seed1.csv
       |   `-- seed2.csv
       |-- macros/
       |   |-- macro1.sql
       |   `-- macro2.sql
       `-- tests/
           |-- test1.sql
           `-- test2.sql

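If you are starting from scratch, the layout above can be scaffolded with a few lines of standard-library Python. This is a hypothetical helper. The project name, the ``my-project`` profile, the ``production`` target and the Postgres connection values are placeholders. The password is read with dbt's ``env_var`` function so that it does not live in the file, and the ``*-paths`` keys use the dbt 1.x names.

.. code-block:: python

   import tempfile
   from pathlib import Path
   from typing import Tuple

   # Placeholder dbt 1.x project file; "my_project" and "my-project" are examples.
   DBT_PROJECT_YML = """\
   name: my_project
   version: "1.0.0"
   config-version: 2
   profile: my-project
   model-paths: ["models"]
   seed-paths: ["seeds"]
   macro-paths: ["macros"]
   test-paths: ["tests"]
   """

   # Placeholder Postgres profile; the password is read from DBT_PASSWORD at run time.
   PROFILES_YML = """\
   my-project:
     target: production
     outputs:
       production:
         type: postgres
         host: localhost
         port: 5432
         user: dbt
         password: "{{ env_var('DBT_PASSWORD') }}"
         dbname: analytics
         schema: public
         threads: 4
   """


   def scaffold(root: Path) -> Tuple[Path, Path]:
       """Create a profiles directory and project skeleton under root (hypothetical helper)."""
       profiles_dir = root / ".dbt"
       project_dir = root / "project"
       profiles_dir.mkdir(parents=True, exist_ok=True)
       (profiles_dir / "profiles.yml").write_text(PROFILES_YML)
       for subdir in ("models", "seeds", "macros", "tests"):
           (project_dir / subdir).mkdir(parents=True, exist_ok=True)
       (project_dir / "dbt_project.yml").write_text(DBT_PROJECT_YML)
       (project_dir / "models" / "model1.sql").write_text("select 1 as id\n")
       return project_dir, profiles_dir


   if __name__ == "__main__":
       project_dir, profiles_dir = scaffold(Path(tempfile.mkdtemp()))
       print(project_dir, profiles_dir)

The two returned directories can then be passed as ``project_dir`` and ``profiles_dir`` while trying things out.
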
With the files laid out this way, we can set ``project_dir`` and ``profiles_dir`` to ``"/path/to/project/"`` and ``"~/.dbt/"`` respectively:

.. code-block:: python
   :linenos:
   :caption: example_local_1.py

   import datetime as dt

   from airflow import DAG
   from airflow.utils.dates import days_ago
   from airflow_dbt_python.operators.dbt import DbtRunOperator

   with DAG(
       dag_id="example_dbt_artifacts",
       schedule_interval="0 0 * * *",
       start_date=days_ago(1),
       catchup=False,
       dagrun_timeout=dt.timedelta(minutes=60),
   ) as dag:
       # Run models tagged "daily" plus their parents, skipping deprecated ones.
       dbt_run = DbtRunOperator(
           task_id="dbt_run_daily",
           project_dir="/path/to/project/",
           profiles_dir="~/.dbt/",
           select=["+tag:daily"],
           exclude=["tag:deprecated"],
           target="production",
           profile="my-project",
       )

.. note::
   Setting ``profiles_dir`` to ``"~/.dbt/"`` can be omitted, as this is the default value.

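``~`` is expanded against the home directory of the user running the process, so the same default can point to different places on your machine and on the Airflow worker. Here is a quick way to see where it resolves, assuming the default ``"~/.dbt/"``. The helper name is hypothetical.

.. code-block:: python

   from pathlib import Path


   def profiles_yml_path(profiles_dir: str = "~/.dbt/") -> Path:
       """Resolve a profiles directory (expanding "~") to the profiles.yml inside it."""
       return Path(profiles_dir).expanduser() / "profiles.yml"


   # Run this as the same user as your Airflow worker to see what the worker will use.
   path = profiles_yml_path()
   print(path, "exists" if path.is_file() else "is missing")
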
If we have multiple operators, we can move shared parameters into the DAG's default arguments. These include the project and profiles directories, as well as the profile and target to use:

.. code-block:: python
   :linenos:
   :caption: example_local_2.py

   import datetime as dt

   from airflow import DAG
   from airflow.utils.dates import days_ago
   from airflow_dbt_python.operators.dbt import DbtRunOperator, DbtSeedOperator

   default_args = {
       "project_dir": "/path/to/project/",
       "profiles_dir": "~/.dbt/",
       "target": "production",
       "profile": "my-project",
   }

   with DAG(
       dag_id="example_dbt_artifacts",
       schedule_interval="0 0 * * *",
       start_date=days_ago(1),
       catchup=False,
       dagrun_timeout=dt.timedelta(minutes=60),
       default_args=default_args,
   ) as dag:
       dbt_seed = DbtSeedOperator(
           task_id="dbt_seed",
       )

       dbt_run = DbtRunOperator(
           task_id="dbt_run_daily",
           select=["+tag:daily"],
           exclude=["tag:deprecated"],
       )

       # Load seeds before running the daily models.
       dbt_seed >> dbt_run

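Airflow fills in any operator argument that was not passed explicitly from the DAG's ``default_args``, and an argument given to the operator directly takes precedence. The sketch below models that precedence with a plain dictionary merge. It is a simplification for illustration, not Airflow's implementation, and the ``dev`` target is a hypothetical override.

.. code-block:: python

   from typing import Any, Dict


   def resolve_operator_kwargs(default_args: Dict[str, Any], **explicit: Any) -> Dict[str, Any]:
       """Simplified model of Airflow's default_args handling: explicit arguments win.

       Unlike Airflow, this merges every key, even ones the operator would not accept.
       """
       return {**default_args, **explicit}


   default_args = {
       "project_dir": "/path/to/project/",
       "profiles_dir": "~/.dbt/",
       "target": "production",
       "profile": "my-project",
   }

   # A task that needs a different (hypothetical) target overrides only that argument.
   seed_kwargs = resolve_operator_kwargs(default_args, task_id="dbt_seed", target="dev")
   print(seed_kwargs["target"], seed_kwargs["project_dir"])  # dev /path/to/project/

In the DAG above, the equivalent would be passing ``target="dev"`` directly to ``DbtSeedOperator`` while leaving ``default_args`` unchanged.
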
.. note::
   dbt also supports configuration via environment variables, which may be used here as well. Additionally, ``profile`` and ``target`` may be omitted if they are already specified in ``dbt_project.yml`` and ``profiles.yml`` respectively.

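dbt's ``env_var`` function lets ``profiles.yml`` read values such as passwords from the environment. dbt raises an error when a referenced variable is unset and no default is given. Below is a hypothetical pre-flight check that lists such variables. Its regular expression only understands the simple ``env_var('NAME')`` and ``env_var('NAME', 'default')`` forms.

.. code-block:: python

   import os
   import re
   from pathlib import Path
   from typing import List, Mapping, Optional

   # Matches env_var('NAME') and env_var('NAME', 'default'); group 2 is set if a default follows.
   ENV_VAR_CALL = re.compile(r"""env_var\(\s*['"]([^'"]+)['"]\s*(,)?""")


   def missing_env_vars(profiles_yml: str, environ: Optional[Mapping[str, str]] = None) -> List[str]:
       """List variables referenced without a default that are not set (hypothetical helper)."""
       env = os.environ if environ is None else environ
       missing: List[str] = []
       for match in ENV_VAR_CALL.finditer(profiles_yml):
           name, has_default = match.group(1), match.group(2)
           if has_default is None and name not in env and name not in missing:
               missing.append(name)
       return missing


   if __name__ == "__main__":
       profiles = Path("~/.dbt/profiles.yml").expanduser()
       if profiles.is_file():
           print(missing_env_vars(profiles.read_text()))
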
Multi-machine/cloud installation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^