A data engineering practice project with toy data, inspired by this data engineering project video. The video demonstrates many good practices and procedures; this repo changes some elements in order to experiment with other tools.

- Python 3.12.11
- See `requirements.txt` for package dependencies:
  - loguru
  - requests
  - beautifulsoup4
  - fire
  - pydantic
  - polars

Install dependencies with `pip install -r requirements.txt`.

Data is sourced from: https://web.ais.dk/aisdata/

- `ingestion/`: Scripts to download, extract, and process AIS data.
- `utils/`: Utility scripts (e.g., file path helpers).
- `tests/`: Pytest-based tests for ingestion logic.

Run the ingestion for a date range with:

```bash
make run-ingest start_date="2025-01-01" end_date="2025-01-01"
```

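The `start_date` and `end_date` arguments suggest an inclusive date range. As a rough sketch of how such a range might be expanded into one date per day (the `daily_dates` helper is hypothetical and is not taken from this repo's ingestion code; the inclusive-range behaviour is an assumption):

```python
from datetime import date, timedelta


def daily_dates(start_date: str, end_date: str) -> list[date]:
    """Expand an inclusive ISO date range into a list of days (hypothetical helper)."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError(f"end_date {end_date} is before start_date {start_date}")
    # One entry per day, including both endpoints (assumed behaviour).
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]


if __name__ == "__main__":
    # Identical start and end dates yield a single day.
    print(daily_dates("2025-01-01", "2025-01-01"))
```

Under this assumption, passing the same value for both arguments, as in the command above, would process exactly one day.
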
## ToDos
- [ ] Build a LazyFrame validation step to handle unconventional column names
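
One possible starting point for this item, as a minimal sketch: normalize raw header names to snake_case and reject collisions before any further validation. The `normalize_column_name` and `build_rename_mapping` helpers and the sample headers are hypothetical; the actual column names in the AIS files may differ.

```python
import re


def normalize_column_name(name: str) -> str:
    """Convert a raw header to snake_case (hypothetical helper)."""
    # Replace every run of non-alphanumeric characters with one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return cleaned.strip("_").lower()


def build_rename_mapping(columns: list[str]) -> dict[str, str]:
    """Map raw headers to normalized names, failing on empty or duplicate results."""
    mapping: dict[str, str] = {}
    seen: dict[str, str] = {}
    for raw in columns:
        new = normalize_column_name(raw)
        if not new:
            raise ValueError(f"column {raw!r} has no usable characters")
        if new in seen:
            raise ValueError(f"columns {seen[new]!r} and {raw!r} both map to {new!r}")
        seen[new] = raw
        mapping[raw] = new
    return mapping


if __name__ == "__main__":
    # Sample headers are made up for illustration.
    print(build_rename_mapping(["# Timestamp", "Type of mobile", "MMSI"]))
```

In recent polars versions, the resulting mapping could be passed to `LazyFrame.rename`, with the raw names read from `LazyFrame.collect_schema().names()`, so the renaming stays lazy.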