chore(sinan DAGS): create DAG to fetch dengue data from SINAN by luabida · Pull Request #201 · thegraphnetwork/EpiGraphHub

luabida · 2023-10-09T14:21:07Z

No description provided.

fccoelho · 2023-10-11T13:23:19Z

containers/airflow/dags/brasil/sinan/dengue.py

+                except ProgrammingError as error:
+                    if str(error).startswith("(psycopg2.errors.UndefinedColumn)"):
+                        # Include new columns to table
+                        column_name = str(error).split('"')[1]


I don't think extracting the missing column name from the error message is a good approach: if psycopg2 changes the wording of its error messages, our code will break. Instead, we should compare the column names in the parquet files with the columns in the current table schema. The difference between the two lists, which can be obtained efficiently as list(set(cols1)-set(cols2)), gives the columns to add, and from it we can build the ALTER TABLE query that adds them to the database table. With this approach we don't need to rely on an exception being raised at all: the missing columns can be determined before the first insert.
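A minimal sketch of the comparison described above, not the code that ended up in the PR. The helper names, the `brasil` schema, and the `sinan_dengue` table are hypothetical; the `TEXT` column type is an assumption. It uses an order-preserving list comprehension instead of `list(set(cols1)-set(cols2))` so the generated statements come out in a stable order. In practice the two input lists might come from something like `pyarrow.parquet.read_schema(path).names` and SQLAlchemy's `inspect(engine).get_columns(table, schema=schema)`.

```python
from typing import Iterable, List


def missing_columns(parquet_cols: Iterable[str], table_cols: Iterable[str]) -> List[str]:
    """Return the parquet columns that the table lacks, in parquet order."""
    existing = set(table_cols)
    return [col for col in parquet_cols if col not in existing]


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def add_column_statements(
    schema: str, table: str, columns: Iterable[str], sql_type: str = "TEXT"
) -> List[str]:
    """Build one ALTER TABLE statement per missing column."""
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    return [
        f"ALTER TABLE {target} ADD COLUMN IF NOT EXISTS {quote_ident(col)} {sql_type};"
        for col in columns
    ]


if __name__ == "__main__":
    # Hypothetical column lists, for illustration only.
    new_cols = missing_columns(["ID", "DT_NOTIFIC", "NEW_FIELD"], ["ID", "DT_NOTIFIC"])
    for statement in add_column_statements("brasil", "sinan_dengue", new_cols):
        print(statement)
```

Because the statements are built before any insert, they can be executed once up front and the `ProgrammingError` handler is no longer needed for this case.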

fccoelho · 2023-10-11T13:23:48Z

containers/airflow/dags/brasil/sinan/dengue.py

-                logging.debug(f"{file} inserted into db")
+            try:
+                insert_parquets(parquets.path, year)
+            except ProgrammingError as error:


Same comment as above

fccoelho

I wonder whether it would make sense to merge these three DAGs into a single SINAN DAG that takes the disease name as a parameter, much as PySUS uses a single function to fetch all the "agravos" (notifiable conditions).
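One way the parameterization suggested above could look, as a rough sketch only. The disease codes mirror the DAG names in this PR's commits (SINAN_DENG, SINAN_ZIKA, SINAN_CHIK); the function name, the dictionary keys, and the table naming scheme are hypothetical, and the Airflow wiring is deliberately left out.

```python
from typing import Dict

# Disease name -> code used in the DAG names of this PR.
DISEASE_CODES: Dict[str, str] = {
    "dengue": "DENG",
    "zika": "ZIKA",
    "chikungunya": "CHIK",
}


def sinan_dag_config(disease: str) -> Dict[str, str]:
    """Return the settings a single parametrized SINAN DAG would need."""
    key = disease.lower()
    if key not in DISEASE_CODES:
        raise ValueError(f"Unsupported disease: {disease!r}")
    code = DISEASE_CODES[key]
    return {
        "dag_id": f"SINAN_{code}",
        "disease_code": code,
        "table": f"sinan_{key}",  # hypothetical table naming scheme
    }


if __name__ == "__main__":
    for name in DISEASE_CODES:
        print(sinan_dag_config(name))
```

A shared DAG factory could loop over `DISEASE_CODES` and reuse the same fetch and insert tasks for each entry, so that adding a disease means adding one dictionary entry rather than a new DAG file.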

chore(sinan DAGS): create DAG to fetch dengue data from SINAN

c0cee37

luabida force-pushed the create-sinan-dags branch from 89c1f65 to c0cee37 Compare October 9, 2023 14:22

luabida added 4 commits October 9, 2023 13:38

Include EGH_CONN var to image

e940b7a

fix sql statements

97fb367

finish SINAN_DENG DAG

73f2fec

Use parquets chunks, preventing the RAM to get fulfilled

08e498b

luabida marked this pull request as ready for review October 10, 2023 14:26

luabida requested a review from fccoelho October 10, 2023 14:26

Handle UndefinedColumn error & add column

caebfb6

fccoelho requested changes Oct 11, 2023

luabida added 4 commits October 11, 2023 10:24

Use recursion to handle to_sql

f743f5c

Add columns to table before inserting the dataframe

8ffb904

Parse all columns to TEXT before inserting to db

7d276b5

minor fixes

d7435c2

luabida requested a review from fccoelho October 11, 2023 19:03

Include SINAN_ZIKA DAG

afc99a1

fccoelho reviewed Oct 16, 2023

Include SINAN_CHIK DAG

c76c431

luabida force-pushed the create-sinan-dags branch from f6f0cd7 to c76c431 Compare October 17, 2023 19:56

luabida wants to merge 12 commits into thegraphnetwork:main from luabida:create-sinan-dags