-
Notifications
You must be signed in to change notification settings - Fork 9
Normalize version fields to strings in import scripts #39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Normalize version fields to strings in import scripts #39
Conversation
|
Thanks @arash77 this is a neat contribution. It's mixed together with formatting though, which albeit a great idea, it muddles what is the fix vs purely formatting. Is there a way you could split the two aspects? And if an adoption of PEP8 is desired in this repo, how about a GH Action that applies it automatically? |
|
I will exclude the formatting from this PR. I can create a separate PR to talk about how an automated formatting could be applied. |
17db00a to
74ba363
Compare
Add normalize_version_fields function to convert version fields (which can be int, float, or str) to string type for consistency. Integrate version normalization into all import scripts: - bioconda: normalize package.version - bioconductor: normalize Version - biotools: normalize version and nested version fields - galaxytool: normalize Suite_version, conda package version, and workflow versions
74ba363 to
c1bb215
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request introduces a new common utility module for normalizing version fields from numeric types to strings across various metadata import scripts, addressing data integrity issues when processing tool and package metadata.
Changes:
- Added
common/metadata.pymodule withnormalize_version_to_stringandnormalize_version_fieldsfunctions - Updated four import scripts (galaxytool-import, biotools-import, bioconductor-import, bioconda-import) to use the new normalization functions
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| common/metadata.py | New utility module providing functions to normalize version fields (integers/floats) to strings with support for nested paths and list structures |
| galaxytool-import/galaxytool-import.py | Integrated version normalization for Suite_version, Latest_suite_conda_package_version, and Related_Workflows latest_version fields |
| biotools-import/import.py | Added version field normalization for both top-level version field and nested version fields within version arrays |
| bioconductor-import/import.py | Applied normalization to the Version field in package metadata |
| bioconda-import/bioconda_importer.py | Normalized package.version field in conda package data |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try: | ||
| if "[" in field_path: | ||
| if "[]." not in field_path: | ||
| list_key = field_path.rstrip("[]") |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using rstrip("[]") removes all occurrences of '[' and ']' characters from the right side, not just the substring "[]". This could cause unexpected behavior if a field name contains these characters. Consider using string slicing instead: field_path[:-2] if field_path.endswith("[]") else field_path
| list_key = field_path.rstrip("[]") | |
| list_key = field_path[:-2] if field_path.endswith("[]") else field_path |
| except Exception: | ||
| continue |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The broad exception handler silently swallows all errors, making it difficult to debug issues with field path syntax or data structure problems. Consider logging the exception or at least handling specific expected exceptions separately to aid in debugging
| return value | ||
|
|
||
|
|
||
| def normalize_version_fields(data, field_paths): |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function lacks documentation explaining its purpose, parameters, return value, and the syntax for field_paths (e.g., "field", "field.nested", "field[]", "field[].nested"). Adding docstrings would improve maintainability and help users understand how to use the different path syntaxes
| def normalize_version_fields(data, field_paths): | |
| def normalize_version_fields(data, field_paths): | |
| """ | |
| Normalize version-like fields in a metadata dictionary to string form. | |
| This function walks the provided ``data`` mapping and applies | |
| :func:`normalize_version_to_string` to values located at the paths | |
| specified in ``field_paths``. The ``data`` dictionary is modified in | |
| place and also returned. | |
| Parameters | |
| ---------- | |
| data : dict | |
| A dictionary of metadata to normalize. Must be a mapping; a | |
| :class:`TypeError` is raised otherwise. | |
| field_paths : Iterable[str] | |
| An iterable of field path strings describing which values in | |
| ``data`` should be normalized. Supported path syntaxes are: | |
| * ``"field"`` – a top-level key in ``data``. | |
| * ``"field.nested"`` – a nested key reached by following | |
| dictionaries (e.g. ``data["field"]["nested"]``). | |
| * ``"field[]"`` – a top-level key whose value is a list; every | |
| element of that list is normalized. | |
| * ``"field[].nested"`` – a top-level key whose value is a list of | |
| dictionaries; for each dictionary item, the ``"nested"`` key is | |
| normalized (e.g. ``data["field"][i]["nested"]``). | |
| Returns | |
| ------- | |
| dict | |
| The same ``data`` dictionary, after in-place normalization. | |
| """ |
| @@ -0,0 +1,51 @@ | |||
| def normalize_version_to_string(value): | |||
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function lacks documentation explaining its purpose, parameters, and behavior. Consider adding a docstring that describes what values are normalized and which are preserved (e.g., None, bool, strings are returned as-is, while int/float are converted to strings)
| def normalize_version_to_string(value): | |
| def normalize_version_to_string(value): | |
| """ | |
| Normalize version-like values to strings while preserving certain types. | |
| This function is intended to prepare version fields for consistent handling | |
| by converting numeric values (int, float) to their string representation. | |
| Behavior: | |
| - None and bool values are returned as-is. | |
| - int and float values are converted to strings. | |
| - list values are processed recursively, applying the same rules to each | |
| element. | |
| - dict values are processed recursively, applying the same rules to each | |
| value. | |
| - All other values (e.g., existing strings, custom objects) are returned | |
| unchanged. | |
| :param value: The value to normalize (may be a scalar, list, or dict). | |
| :return: The normalized value with numeric types converted to strings and | |
| other types preserved according to the rules above. | |
| """ |
| drop_false = lambda path, key, value: bool(value) | ||
| tool_cleaned = remap(tool, visit=drop_false) | ||
| tool_cleaned = normalize_version_fields( | ||
| tool_cleaned, ["version", "version[].version"] |
Copilot
AI
Jan 22, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using both "version" and "version[].version" may result in redundant processing. The normalize_version_to_string function recursively processes lists and dictionaries, so normalizing "version" would already handle nested version fields within list items. Consider whether both field paths are necessary or if "version" alone would suffice
| tool_cleaned, ["version", "version[].version"] | |
| tool_cleaned, ["version"] |
Introduce a normalization function to convert version fields to strings across various import scripts, ensuring consistent data formatting. This change enhances data integrity when processing tool and package metadata.
Closes research-software-ecosystem/content#1190