Commit 67c8d78
adamantike authored and tomasfarias committed
fix: Avoid heavy top-level imports in operators
For complex DAGs, the import time added by this library could cause `DagBag` import timeouts if the configured timeout is low enough. A documented Airflow best practice is to [avoid heavy Python code](https://airflow.apache.org/docs/apache-airflow/stable/best-practices.html#top-level-python-code) that runs on DAG and Operator creation, and profiling shows that `dbt` imports are slow.

Profiling can be run locally with the following command:

```shell
python -X importtime -c "from airflow_dbt_python.operators.dbt import DbtRunOperator" 2>import-times.log
```

The resulting log can then be parsed using a tool like [tuna](https://github.com/nschloe/tuna).

Before this change, the operator import took ~1.37s; this fix reduces it to ~0.25s. Of those 0.25s, more than 80% is spent importing `airflow` components, which are usually already loaded in DAGs, so this library's own import time for operators becomes insignificant.
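For a quick look at the log without an external viewer, the rows written by `-X importtime` can be ranked by cumulative time. The following is a minimal sketch, not part of this commit: the helper name `slowest_imports` is hypothetical, and the sample log uses made-up module names and timings purely to show the line format.

```python
import re

# Matches data rows of `python -X importtime` output, which look like:
# "import time:       300 |        420 | pkg_a"
# The header row ("self [us] | cumulative | imported package") does not match.
_ROW = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S+)\s*$")


def slowest_imports(log_text, top=5):
    """Return up to `top` (module, self_us, cumulative_us) rows, slowest first."""
    rows = []
    for line in log_text.splitlines():
        match = _ROW.match(line)
        if match is None:
            continue  # header row or unrelated stderr output
        self_us, cumulative_us, module = match.groups()
        rows.append((module, int(self_us), int(cumulative_us)))
    # Rank by cumulative time, which includes time spent in nested imports.
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]


# Illustrative input only: module names and timings are invented.
SAMPLE_LOG = """\
import time: self [us] | cumulative | imported package
import time:       120 |        120 |   pkg_a.helpers
import time:       300 |        420 | pkg_a
import time:        50 |         50 | pkg_b
"""

ranked = slowest_imports(SAMPLE_LOG, top=2)
```

To run it against the file produced by the command above, pass the contents of `import-times.log` as `log_text`.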
1 parent 3844351 commit 67c8d78

1 file changed: +15 -4 lines changed
airflow_dbt_python/operators/dbt.py

Lines changed: 15 additions & 4 deletions
@@ -8,15 +8,12 @@
 from dataclasses import asdict, is_dataclass
 from pathlib import Path
 from tempfile import TemporaryDirectory
-from typing import Any, Callable, Iterator, Optional, TypeVar, Union
+from typing import TYPE_CHECKING, Any, Callable, Iterator, Optional, TypeVar, Union

 from airflow import AirflowException
 from airflow.models.baseoperator import BaseOperator
 from airflow.models.xcom import XCOM_RETURN_KEY
 from airflow.version import version
-from dbt.contracts.results import RunExecutionResult, agate
-
-from airflow_dbt_python.hooks.dbt import BaseConfig, DbtHook, LogFormat, Output

 # apply_defaults is deprecated in version 2 and beyond. This allows us to
 # support version 1 and deal with the deprecation warning.
@@ -30,6 +27,12 @@ def apply_defaults(func: T) -> T:
         return func


+if TYPE_CHECKING:
+    from dbt.contracts.results import RunExecutionResult
+
+    from airflow_dbt_python.hooks.dbt import BaseConfig
+
+
 class DbtBaseOperator(BaseOperator):
     """The basic Airflow dbt operator.
@@ -107,6 +110,8 @@ def __init__(
         replace_on_push: bool = False,
         **kwargs,
     ) -> None:
+        from airflow_dbt_python.hooks.dbt import LogFormat
+
         super().__init__(**kwargs)
         self.project_dir = project_dir
         self.profiles_dir = profiles_dir
@@ -314,6 +319,8 @@ def prepare_directory(self, tmp_dir: str):
     def dbt_hook(self):
         """Provides an existing DbtHook or creates one."""
         if self._dbt_hook is None:
+            from airflow_dbt_python.hooks.dbt import DbtHook
+
             self._dbt_hook = DbtHook()
         return self._dbt_hook
@@ -611,6 +618,8 @@ def __init__(
         indirect_selection: Optional[str] = None,
         **kwargs,
     ) -> None:
+        from airflow_dbt_python.hooks.dbt import Output
+
         super().__init__(**kwargs)
         self.resource_types = resource_types
         self.select = select
@@ -757,6 +766,8 @@ def run_result_factory(data: list[tuple[Any, Any]]):
     We need to handle dt.datetime and agate.table.Table.
     The rest of the types should already be JSON-serializable.
     """
+    from dbt.contracts.results import agate
+
     d = {}
     for key, val in data:
         if isinstance(val, dt.datetime):
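
The diff combines two techniques: names needed only for annotations are imported under `TYPE_CHECKING`, and names needed at runtime are imported inside the function that uses them. The sketch below shows the same pattern in isolation. It is not code from this library: the standard-library `decimal` module stands in for the slow `dbt` imports, and `parse_amount` and `to_json_safe` are hypothetical names.

```python
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
    # Seen by static type checkers only; TYPE_CHECKING is False at runtime,
    # so this import adds nothing to module import time.
    from decimal import Decimal


def parse_amount(text: str) -> "Decimal":
    """Parse text into a Decimal (hypothetical helper)."""
    # Deferred import: the cost is paid on the first call, not when this
    # module is imported. The annotation is quoted because the name
    # `Decimal` does not exist at module level at runtime.
    from decimal import Decimal

    return Decimal(text)


def to_json_safe(value: Any) -> Any:
    """Convert Decimal values to str and return other values unchanged."""
    from decimal import Decimal

    if isinstance(value, Decimal):
        return str(value)
    return value


amount = parse_amount("1.50")
encoded = to_json_safe(amount)
```

Repeated function-level imports are cheap after the first call, because Python caches loaded modules in `sys.modules`. The trade-off is that an import error surfaces when the function first runs rather than when the module is loaded.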
