Validation_goldens modfication by niveditasing · Pull Request #2043 · datacommonsorg/data

niveditasing · 2026-05-28T05:51:46Z

Modified code to fix 2 bugs:
Problem 1: The "must-include" file (top_100k_places.csv) was being loaded into a set of IDs without namespaces (e.g., "Earth", "country/USA"). However, input data (WorldBank.csv) contains values with namespaces (e.g., "dcid:Earth", "dcid:country/USA").
Result: An exact match ("dcid:Earth" == "Earth") is False, so the script filtered out all 450,000+ nodes, resulting in 0 goldens.
Fix: By using strip_namespace(), the script compares "Earth" to "Earth", allowing the nodes to correctly pass the filter.

Problem 2: The default utility function (file_write_csv_dict) is designed for complex data structures. When it sees a single property, it doesn't always recognize it as a column header. Instead, it generates a "cluttered" format with a redundant value column containing stringified dictionaries (e.g., "{'Column name ': 'Column value'}").
Result: This format is difficult to read and doesn't match the standard "single column" CSV format used in other golden files.
Fix: Manual writing allows us to explicitly define the header (Output Column) and write only the clean values, ensuring the output is exactly one column as you expected.

gemini-code-assist

Code Review

This pull request updates validator_goldens.py to support namespace-stripped matching when generating goldens and replaces the helper function for writing CSVs with custom csv.DictWriter logic. A review comment points out that extracting CSV column headers from only the first node in golden_nodes assumes all nodes share the same keys, which can lead to errors or missing columns if keys vary. It suggests collecting the union of all keys across all nodes instead.

balit-raibot · 2026-05-28T06:35:26Z

@gemini-code-assist review

gemini-code-assist

Code Review

This pull request updates tools/import_validation/validator_goldens.py to import the csv module, strip namespaces when matching nodes against must_include_values, and replace the helper file_write_csv_dict with a custom csv.DictWriter implementation. The feedback suggests adding newline='' when opening the CSV file to prevent platform-specific carriage return issues and simplifying the logic for extracting and sorting unique keys.

modified few lines

5f363e3

gemini-code-assist Bot reviewed May 28, 2026

View reviewed changes

Comment thread tools/import_validation/validator_goldens.py

niveditasing added 2 commits May 28, 2026 06:01

modified few lines

399873e

fixed test

bd3f90d

gemini-code-assist Bot reviewed May 28, 2026

View reviewed changes

Comment thread tools/import_validation/validator_goldens.py Outdated

niveditasing added 3 commits May 28, 2026 06:44

fixed test

4f14df9

fixed test

83b5647

fixed test

d88c10f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validation_goldens modfication#2043

Validation_goldens modfication#2043
niveditasing wants to merge 6 commits into
datacommonsorg:masterfrom
niveditasing:validation_golden_fix

niveditasing commented May 28, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

balit-raibot commented May 28, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

niveditasing commented May 28, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

balit-raibot commented May 28, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants