This workspace is organised to make it easy to collaborate, keep notebooks reproducible, and avoid losing context (data sources, assumptions, decisions).
- Project brief / plan: `docs/PROJECT_CONTEXT.md`
- Candidate dataset ideas: `docs/DATA_SOURCE_CANDIDATES.md`
- How we get data from the web (manual vs URL vs API): `docs/GETTING_DATA_FROM_THE_WEB.md`
- Synthetic scenario datasets (baseline + interventions): `docs/SCENARIO_DATASETS.md`
- Dataset provenance (what you actually used): `docs/DATA_SOURCES.md`
- Decision log (important choices + assumptions): `docs/DECISIONS.md`
- `notebooks/`: Jupyter notebooks (analysis, exploration, modelling)
- `src/epidemiology_project/`: reusable Python code (loading/cleaning/modelling helpers)
- `data/`:
  - `data/raw/`: raw datasets (do not edit in-place)
  - `data/external/`: externally sourced reference data (e.g. shapefiles)
  - `data/interim/`: intermediate datasets (cleaned but not final)
  - `data/processed/`: final datasets ready for analysis
- `reports/`: write-ups, slides, exports
  - `reports/figures/`: figures for the report/poster
  - `reports/tables/`: tables for the report/poster
- `figures/`: quick ad-hoc figures (optional; prefer `reports/figures/` for final outputs)
- `docs/`: project context (brief, data sources, decisions)
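To illustrate how data is meant to move through these folders, here is a minimal sketch of a cleaning step that reads from `data/raw/` and writes a new file to `data/interim/` instead of editing the raw file. The file names (`cases.csv`, `cases_clean.csv`) are invented for the example, and a temporary directory stands in for the repo root:

```python
# Illustrative raw -> interim flow using the folder conventions above.
# File names here are invented; real datasets live in data/raw/.
import csv
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())          # stand-in for the repo root
raw = root / "data" / "raw"
interim = root / "data" / "interim"
raw.mkdir(parents=True)
interim.mkdir(parents=True)

# Pretend this file was downloaded once and is never edited again.
(raw / "cases.csv").write_text("date,cases\n2024-01-01,10\n2024-01-02,\n")

# The cleaning step writes a NEW file to data/interim/,
# leaving the raw file untouched.
with open(raw / "cases.csv", newline="") as f:
    rows = [r for r in csv.DictReader(f) if r["cases"]]  # drop blank counts

with open(interim / "cases_clean.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["date", "cases"])
    writer.writeheader()
    writer.writerows(rows)

print(len(rows))  # 1 row survives the cleaning
```

The same shape applies further along the pipeline: `data/interim/` feeds a final step that writes to `data/processed/`, so every stage stays reproducible from the one before it.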
- Create a virtual environment and install dependencies:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  pip install -e .
  ```

- Install pre-commit + ruff and enable the git hook:

  ```bash
  pip install -r requirements-dev.txt
  pre-commit install
  ```

- Launch Jupyter:

  ```bash
  jupyter lab
  ```

To work in GitHub Codespaces instead:

- In GitHub, open the repo and click Code → Codespaces → Create codespace on main.
- Wait for the dev container setup to finish (it installs dependencies automatically).
- Open a notebook in `notebooks/` and run cells using the Jupyter extension.
If you prefer launching JupyterLab explicitly:
```bash
jupyter lab --ip=0.0.0.0 --no-browser --port 8888
```

Codespaces will prompt you to open/forward port 8888.
If you want pre-existing real data first (time vs infected proxy):
- `notebooks/09_download_baseline_observed_ukhsa_covid_cases.ipynb`
If you then need separate intervention datasets (clean “what-if” scenarios):
- `notebooks/10_generate_baseline_no_intervention.ipynb`
- `notebooks/11_generate_intervention_scenarios.ipynb`
- `notebooks/12_compare_scenarios.ipynb`
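The kind of comparison these notebooks make can be sketched with a minimal discrete-time SIR model: a baseline epidemic versus an intervention that reduces the contact rate. The parameter values below are illustrative only, not the ones used in the notebooks:

```python
# Minimal SIR sketch: baseline vs. intervention (reduced contact rate).
# Parameters are illustrative, not taken from the scenario notebooks.

def simulate_sir(beta, gamma=0.1, n=1_000_000, i0=10, days=200):
    """Return a list of infected counts per day for a simple SIR model."""
    s, i, r = n - i0, i0, 0
    infected = [i]
    for _ in range(days):
        new_inf = beta * s * i / n   # new infections this step
        new_rec = gamma * i          # recoveries this step
        s -= new_inf
        i += new_inf - new_rec
        r += new_rec
        infected.append(i)
    return infected

baseline = simulate_sir(beta=0.3)    # no intervention
lockdown = simulate_sir(beta=0.15)   # contact rate halved

print(max(baseline) > max(lockdown))  # intervention lowers the peak
```

Comparing the two infected curves (peak height, peak timing, final size) is the "what-if" story the scenario notebooks tell with their own datasets.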
If you want to understand how the UKHSA API/Swagger/OpenAPI browsing works:
- `notebooks/00_tutorial_swagger_openapi_ukhsa.ipynb`
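As a rough sketch of what that tutorial covers: the UKHSA dashboard API nests metrics under a long REST path. The path segments below (theme, topic, geography, metric names) are illustrative guesses — verify them against the notebook and the Swagger UI before relying on them:

```python
# Sketch of how a UKHSA dashboard API URL is assembled.
# Segment values here are assumptions; check the Swagger/OpenAPI docs.

BASE = "https://api.ukhsa-dashboard.data.gov.uk"

def build_metric_url(theme, sub_theme, topic, geography_type, geography, metric):
    """Build the nested REST path the dashboard API uses for one metric."""
    return (
        f"{BASE}/themes/{theme}/sub_themes/{sub_theme}/topics/{topic}"
        f"/geography_types/{geography_type}/geographies/{geography}"
        f"/metrics/{metric}"
    )

url = build_metric_url(
    theme="infectious_disease",
    sub_theme="respiratory",
    topic="COVID-19",
    geography_type="Nation",
    geography="England",
    metric="COVID-19_cases_casesByDay",
)
print(url)

# Fetching the data (not run here) would then be e.g.:
#   import requests
#   payload = requests.get(url, params={"page_size": 365}).json()
```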
See CONTRIBUTING.md for the workflow. The short version:
- Work on a branch: `git checkout -b yourname/topic`
- Commit small changes often
- Merge to `main` when notebooks run and the story still makes sense
- Keep raw data immutable: add new raw files to `data/raw/`, never overwrite with cleaned versions.
- Notebooks should use relative paths rooted at the repo, e.g. `../data/raw/...` from inside `notebooks/`.
- When you make a key modelling choice, record it in `docs/DECISIONS.md`.
See docs/PROJECT_CONTEXT.md for the living project brief and current plan.