Custom CSV handling, small improvements to types and enrichment example #16

DanDits · 2026-01-26T22:01:34Z

No description provided.

DanDits · 2026-01-26T22:24:27Z

Should something be mentioned in the CHANGELOG? If we merge this, the only user-visible changes will be the slightly adjusted example, the "support" for Python 3.11 and some type annotation improvements. In that sense, the CSV changes are not new features or behavior changes but rather fixes that achieve the expected behavior in various 'edge' cases.

buddemat · 2026-01-30T08:59:48Z

src/cadenzaanalytics/util/csv.py

+    lines.append(_format_row(columns_list, columns_list, None, None, None, None))
+
+    # Write data rows
+    for _, row in df.iterrows():


iterrows is very slow. It would probably be 5-10x faster to use itertuples or to convert the DataFrame into a NumPy array, like so:

    values = df.to_numpy(dtype=object, na_value=None)
    for row in values:
        lines.append(_format_row(list(row), ...))
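A minimal, self-contained sketch of the suggested row loop follows. It assumes a hypothetical `format_row_sketch` helper in place of the PR's `_format_row`, whose remaining arguments are not shown in the review, and it compares the `to_numpy` and `itertuples` variants. Both avoid constructing one pandas Series per row, which is what makes `iterrows` slow.

```python
import pandas as pd


def format_row_sketch(values: list) -> str:
    """Hypothetical stand-in for _format_row: None becomes an empty field,
    and fields containing a comma, quote or line break are quoted."""
    fields = []
    for value in values:
        if value is None:
            fields.append("")
            continue
        text = str(value)
        if any(ch in text for ch in ',"\r\n'):
            # Escape embedded quotes by doubling them, then wrap in quotes.
            text = '"' + text.replace('"', '""') + '"'
        fields.append(text)
    return ",".join(fields)


def rows_via_numpy(df: pd.DataFrame) -> list[str]:
    # One conversion for the whole frame; missing values become None.
    values = df.to_numpy(dtype=object, na_value=None)
    return [format_row_sketch(list(row)) for row in values]


def rows_via_itertuples(df: pd.DataFrame) -> list[str]:
    # name=None yields plain tuples; NaN/None still have to be mapped to None.
    return [
        format_row_sketch([None if pd.isna(v) else v for v in row])
        for row in df.itertuples(index=False, name=None)
    ]


df = pd.DataFrame({"name": ["a,b", 'say "hi"', None], "n": [1, 2, 3]})
print(rows_via_numpy(df))
```

A side benefit besides speed: because `iterrows` returns a Series per row, it does not preserve dtypes across a row (an integer column can come back as float when the row also contains floats), whereas both variants above hand the original per-column values to the formatter.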

buddemat · 2026-02-02T09:36:21Z

src/cadenzaanalytics/util/csv.py

+                # Quoted value - extract content (can contain newlines)
+                pos += 1
+                value = []
+                while pos < len(csv_data):


This walks through the whole payload character by character, which will be very slow for large data.
We could use str.find() instead to locate the next quote (which should be implemented in C).

Something along the lines of:

    while pos < len(csv_data):
        next_quote = csv_data.find('"', pos)
        value_parts.append(csv_data[pos:next_quote])
        pos = next_quote + 1
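A complete version of this idea might look like the sketch below. The function name `read_quoted_value` and its `(value, position)` return convention are assumptions for illustration, not the PR's actual API. Each `str.find` call jumps straight to the next quote, a doubled quote (`""`) is treated as an escaped literal quote as in RFC 4180, and a missing closing quote raises instead of silently slicing with `-1`.

```python
def read_quoted_value(csv_data: str, pos: int) -> tuple[str, int]:
    """Parse a quoted CSV field, starting just after its opening quote.

    Returns the unescaped value and the index just past the closing quote.
    """
    value_parts: list[str] = []
    while True:
        next_quote = csv_data.find('"', pos)
        if next_quote == -1:
            raise ValueError("unterminated quoted value")
        # Copy everything up to the quote in one slice (newlines included).
        value_parts.append(csv_data[pos:next_quote])
        pos = next_quote + 1
        if pos < len(csv_data) and csv_data[pos] == '"':
            # Doubled quote "" is an escaped literal quote.
            value_parts.append('"')
            pos += 1
        else:
            # A single quote closes the field.
            return "".join(value_parts), pos


print(read_quoted_value('"a ""b""\nc",next', 1))
```

The loop body runs once per quote character rather than once per character, so a long quoted value without embedded quotes is copied with a single slice.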

buddemat

I have two comments concerning performance, which I guess should be addressed. I have not looked at the tests.

From what I read in the channels, both @julianjanssen and @ArneBab see some issues with the "full custom csv import" approach. We might want to discuss this once more?

feat: improve type annotations for parameter values and mappings

b21062f

DanDits requested a review from buddemat January 26, 2026 22:01

DanDits self-assigned this Jan 26, 2026

DanDits force-pushed the slb/dd branch from b2de9e0 to ef4871d January 26, 2026 22:14

dittmar added 3 commits January 26, 2026 23:18

fix example enrichment extension to actually enrich some values, not only empty strings, by working on the input dataframe and not an empty dataframe

d2eff39

CADENZA-42792 feat: add custom csv parser handling all special definitions and to allow working around issues with pandas csv parsing and writing

1a08ee0

decrease required python version to 3.11

6e64ade

DanDits force-pushed the slb/dd branch from ef4871d to 6e64ade January 26, 2026 22:19

add changelog entries

4e405cf

buddemat reviewed Jan 30, 2026


buddemat reviewed Feb 2, 2026


buddemat requested changes Feb 2, 2026

3 participants

Custom CSV handling, small improvements to types and enrichment example #16

Are you sure you want to change the base?

Custom CSV handling, small improvements to types and enrichment example #16

Uh oh!

Conversation

DanDits commented Jan 26, 2026

Uh oh!

DanDits commented Jan 26, 2026

Uh oh!

buddemat Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

buddemat Feb 2, 2026

Choose a reason for hiding this comment

Uh oh!

buddemat left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants