generated from VectorInstitute/aieng-template-implementation
Report Generation: Adding notebooks to better explain the usage #61
Merged
9 commits
53fd42e Halfway through the first notebook (lotif)
41766a2 Merge branch 'main' into marcelo/notebooks (lotif)
9e239a7 Finishing the first notebook, adding the second notebook (lotif)
6a5d8ca Finishing up the notebooks (lotif)
38ecb4b Small adjustments to the notebooks (lotif)
87a38bc Merge branch 'main' into marcelo/notebooks (amrit110)
d0724f8 Some other small improvements (lotif)
49a6832 Merge remote-tracking branch 'origin/marcelo/notebooks' into marcelo/… (lotif)
3122783 CR by Amrit (lotif)
195 changes: 195 additions & 0 deletions
implementations/report_generation/01_Importing_the_Dataset.ipynb
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,195 @@ | ||
| { | ||
| "cells": [ | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "cc016c29-e5e8-4338-9a31-0fa6700505ff", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "# Importing the Dataset for the Report Generation Agent\n", | ||
| "\n", | ||
| "This notebook implements the **data import** for the **Report Generation Agent** for a single-table relational\n", | ||
| "data source.\n", | ||
| "\n", | ||
| "The data source implemented here is an [SQLite](https://sqlite.org/) database, which is supported\n", | ||
| "natively by Python and stores the data on disk.\n", | ||
| "[SQLAlchemy](https://www.sqlalchemy.org/) is used as the SQL connection layer, so the\n", | ||
| "backing database can be easily swapped for another one.\n", | ||
| "\n", | ||
| "The SQLAlchemy connection is set up to allow **read-only queries** only, so the agent cannot run queries that modify the data in the database." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "8499a56f-716f-47a6-b255-8bdbbe0fd777", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Setting up\n", | ||
| "\n", | ||
| "The code below sets the notebook's working directory to the project's root folder, defines the default constants and checks that the required environment variables are present.\n", | ||
| "\n", | ||
| "The environment variables can be set in the `.env` file in the root folder of the project." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "4cc2db20-296f-4822-916c-b8255073c066", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "import os\n", | ||
| "import ssl\n", | ||
| "import urllib.request\n", | ||
| "import zipfile\n", | ||
| "from pathlib import Path\n", | ||
| "\n", | ||
| "import certifi\n", | ||
| "import pandas as pd\n", | ||
| "from aieng.agent_evals.async_client_manager import AsyncClientManager\n", | ||
| "\n", | ||
| "\n", | ||
| "# Setting the notebook directory to the project's root folder\n", | ||
| "if Path(\"\").absolute().name == \"eval-agents\":\n", | ||
| " print(f\"Notebook path is already the root path: {Path('').absolute()}\")\n", | ||
| "else:\n", | ||
| " os.chdir(Path(\"\").absolute().parent.parent)\n", | ||
| " print(f\"The notebook path has been set to: {Path('').absolute()}\")\n", | ||
| "\n", | ||
| "client_manager = AsyncClientManager.get_instance()\n", | ||
| "assert client_manager.configs.report_generation_db.database, (\n", | ||
| " \"[ERROR] The database path is not set! Please configure the REPORT_GENERATION_DB__DATABASE environment variable.\"\n", | ||
| ")\n", | ||
| "\n", | ||
| "print(\"All environment variables have been set.\")\n", | ||
| "\n", | ||
| "DATA_FOLDER = Path(\"implementations/report_generation/data\")\n", | ||
| "DATASET_PATH = DATA_FOLDER / \"OnlineRetail.csv\"\n", | ||
| "\n", | ||
| "from implementations.report_generation.data.import_online_retail_data import import_online_retail_data # noqa: E402" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "0aa0bdf0-a7ba-4458-868b-07d627b12ed9", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Dataset\n", | ||
| "\n", | ||
| "The dataset used in this example is the\n", | ||
| "**[Online Retail](https://archive.ics.uci.edu/dataset/352/online+retail) dataset**. It contains\n", | ||
| "information about **invoices** for products that were purchased by customers, which also includes\n", | ||
| "product quantity, the invoice date and the country that the customer resides in. For a more\n", | ||
| "detailed data structure, please check the [OnlineRetail.ddl](data/OnlineRetail.ddl) file." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "553dceaa-8fe7-4e4f-9940-b2d1e8d8d6ee", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Downloading the Dataset\n", | ||
| "\n", | ||
| "The code below will **download and unzip** the dataset to the `implementations/report_generation/data/` folder." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "554f1cc6-c42f-4fe3-8857-214fcbeafd95", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "url = \"https://archive.ics.uci.edu/static/public/352/online+retail.zip\"\n", | ||
| "zip_file_path = DATA_FOLDER / \"online_retail.zip\"\n", | ||
| "xlsx_file_path = DATA_FOLDER / \"Online Retail.xlsx\"\n", | ||
| "\n", | ||
| "print(\"Downloading the dataset...\")\n", | ||
| "ctx = ssl.create_default_context(cafile=certifi.where())\n", | ||
| "req = urllib.request.Request(url)\n", | ||
| "with urllib.request.urlopen(req, context=ctx) as resp, open(zip_file_path, \"wb\") as f:\n", | ||
| " f.write(resp.read())\n", | ||
| "\n", | ||
| "print(\"Extracting the dataset file...\")\n", | ||
| "with zipfile.ZipFile(zip_file_path, \"r\") as zf:\n", | ||
| " zf.extractall(DATA_FOLDER)\n", | ||
| "\n", | ||
| "print(\"Converting the dataset file from .xlsx to .csv...\")\n", | ||
| "df = pd.read_excel(xlsx_file_path)\n", | ||
| "df.to_csv(DATASET_PATH, index=False)\n", | ||
| "\n", | ||
| "print(\"Done!\")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "2e4e45de-1f07-41de-bf9c-f9c2543b8cb3", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Visualizing the Data" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "123d37f3-fd6f-4676-8f84-8bcfa45a0535", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "df = pd.read_csv(DATASET_PATH)\n", | ||
| "df # noqa: B018" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "ec3a84cb-3636-460a-a9bd-e6ea57d3f9d7", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Importing the Data\n", | ||
| "\n", | ||
| "The code below will import the `.csv` dataset into the database at the path set by the `REPORT_GENERATION_DB__DATABASE` environment variable." | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "id": "ee28609b-6eed-4aea-a5e0-c7d6df57e0af", | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "import_online_retail_data(DATASET_PATH)\n", | ||
| "print(\"Done!\")" | ||
| ] | ||
| }, | ||
| { | ||
| "cell_type": "markdown", | ||
| "id": "ce2ab32d-859f-44b1-9c54-cf9c2c1f1da4", | ||
| "metadata": {}, | ||
| "source": [ | ||
| "## Conclusion\n", | ||
| "\n", | ||
| "Now the data should be ready to be consumed by the agent in the **next notebook**." | ||
| ] | ||
| } | ||
| ], | ||
| "metadata": { | ||
| "kernelspec": { | ||
| "display_name": "Python 3 (ipykernel)", | ||
| "language": "python", | ||
| "name": "python3" | ||
| }, | ||
| "language_info": { | ||
| "codemirror_mode": { | ||
| "name": "ipython", | ||
| "version": 3 | ||
| }, | ||
| "file_extension": ".py", | ||
| "mimetype": "text/x-python", | ||
| "name": "python", | ||
| "nbconvert_exporter": "python", | ||
| "pygments_lexer": "ipython3", | ||
| "version": "3.12.0" | ||
| } | ||
| }, | ||
| "nbformat": 4, | ||
| "nbformat_minor": 5 | ||
| } |
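The notebook states that the database connection only allows read-only queries, but the diff does not show how that is configured. As a rough illustration of the idea only, and not the project's actual SQLAlchemy setup, the sketch below uses Python's standard `sqlite3` module with SQLite's `mode=ro` URI parameter. The `sales` table, its columns, and the helper names `open_read_only` and `demo` are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file in read-only mode through a file: URI."""
    # mode=ro makes SQLite reject any statement that would write to the file.
    return sqlite3.connect(f"{db_path.resolve().as_uri()}?mode=ro", uri=True)


def demo() -> tuple[int, bool]:
    """Create a throwaway database, then show that reads work and writes fail."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "demo.db"

        # Populate the file through a normal read-write connection.
        # The table and its columns are hypothetical, for illustration only.
        writer = sqlite3.connect(db_path)
        writer.execute("CREATE TABLE sales (invoice TEXT, quantity INTEGER)")
        writer.execute("INSERT INTO sales VALUES ('A-1', 6)")
        writer.commit()
        writer.close()

        reader = open_read_only(db_path)
        try:
            row_count = reader.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
            try:
                reader.execute("DELETE FROM sales")
                write_blocked = False
            except sqlite3.OperationalError:
                # SQLite refuses the write on a read-only connection.
                write_blocked = True
        finally:
            reader.close()
        return row_count, write_blocked


if __name__ == "__main__":
    print(demo())
```

Enforcing read-only access at the connection level means the guarantee does not depend on inspecting the SQL text the agent produces.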
I'm generally not in favour of committing the outputs of cells. It just adds a lot more clutter to the git history, and usually it can change between runs as well. So consider clearing the outputs and only commit the code.
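One way to follow this suggestion is to clear the outputs before committing. The sketch below is a minimal version that works on the notebook's JSON directly, assuming the nbformat 4 layout visible in the diff (a top-level `cells` list whose code cells carry `outputs` and `execution_count`). The helper names `clear_outputs` and `clear_outputs_in_file` are hypothetical.

```python
import json
from pathlib import Path


def clear_outputs(notebook: dict) -> dict:
    """Reset outputs and execution counts of every code cell, in place."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_file(path: Path) -> None:
    """Rewrite a notebook file on disk with all code-cell outputs cleared."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    cleaned = clear_outputs(notebook)
    path.write_text(json.dumps(cleaned, indent=1) + "\n", encoding="utf-8")


if __name__ == "__main__":
    example = {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title"]},
            {
                "cell_type": "code",
                "execution_count": 3,
                "outputs": [{"output_type": "stream", "text": ["Done!\n"]}],
                "source": ["print('Done!')"],
            },
        ]
    }
    print(json.dumps(clear_outputs(example), indent=1))
```

The same effect is available from the command line with `jupyter nbconvert --clear-output --inplace <notebook>`, which may be more convenient as a pre-commit step.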