Skip to content

fix: implement set_timezone_to_utc() for BigQuery and ClickHouse#52

Merged
dtsong merged 2 commits intomasterfrom
fix/29-timezone-bigquery-clickhouse
Mar 30, 2026
Merged

fix: implement set_timezone_to_utc() for BigQuery and ClickHouse#52
dtsong merged 2 commits intomasterfrom
fix/29-timezone-bigquery-clickhouse

Conversation

@dtsong
Copy link
Copy Markdown
Owner

@dtsong dtsong commented Mar 30, 2026

Summary

  • Implement set_timezone_to_utc() for BigQuery (SET @@time_zone = 'UTC') and ClickHouse (SET session_timezone = 'UTC')
  • Elevate timezone warning in _connect.py from DEBUG to WARNING level
  • Previously, timezone failures were silently swallowed, risking incorrect cross-database timestamp comparisons

Key Files

  • data_diff/databases/bigquery.py — implement set_timezone_to_utc()
  • data_diff/databases/clickhouse.py — implement set_timezone_to_utc()
  • data_diff/databases/_connect.py — elevate log level from DEBUG to WARNING

Test plan

  • Unit tests pass (uv run pytest tests/test_query.py tests/test_utils.py -x — 48 passed)
  • Ruff lint and format pass
  • BigQuery/ClickHouse connections now set UTC timezone on session creation
  • Warning logged when a database cannot set timezone

Closes #29

🤖 Generated with Claude Code

- BigQuery: use SET @@time_zone = 'UTC' session variable
- ClickHouse: use SET session_timezone = 'UTC'
- Elevate _connect.py timezone warning from DEBUG to WARNING level

Previously, when these databases couldn't set timezone to UTC, it was
silently logged at DEBUG level. For a data comparison tool, timezone
mismatches can produce incorrect diff results — users need to know.

Closes #29

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d3397f58ed

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


def set_timezone_to_utc(self) -> str:
raise NotImplementedError()
return "SET @@time_zone = 'UTC'"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Execute BigQuery timezone SET in a persistent session

Returning SET @@time_zone = 'UTC' here makes Connect_SetUTC treat BigQuery as fully supported, but BigQuery._query_atom() runs each statement as an independent self._client.query(...).result() call with no session creation or session reuse, so this SET does not persist to subsequent queries. In practice, timezone-dependent expressions can still run with BigQuery defaults while the previous warning is suppressed, leading to silent timestamp mismatches during cross-database diffs.

Useful? React with 👍 / 👎.

BigQuery uses stateless REST jobs — SET @@time_zone only affects the
single job it runs in, not subsequent queries. The elevated WARNING
in _connect.py will inform users. BigQuery stores timestamps in UTC
internally, so this is safe.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dtsong added a commit that referenced this pull request Mar 30, 2026
ClickHouse set_timezone_to_utc() is now implemented (PR #52), so
the "No session timezone" limitation no longer applies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dtsong dtsong merged commit 6c34e64 into master Mar 30, 2026
6 checks passed
@dtsong dtsong deleted the fix/29-timezone-bigquery-clickhouse branch March 30, 2026 09:20
dtsong added a commit that referenced this pull request Mar 30, 2026
* docs: add tiered database support matrix to README

Replace the flat "Supported" list with a tiered matrix showing
Production, Stable, Limited, and Experimental tiers with known
limitations per database. Users can now self-select appropriate
databases before investing setup time.

Closes #32

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct MsSQL and Presto limitation descriptions in support matrix

- MsSQL: change "OFFSET requires ORDER BY" to "no OFFSET support"
  (the driver raises NotImplementedError for non-zero offset)
- Presto: add "No primary key detection; fixed precision defaults"
  (lacks SUPPORTS_PRIMARY_KEY; hardcodes datetime/numeric precision to 3)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove stale ClickHouse timezone limitation from support matrix

ClickHouse set_timezone_to_utc() is now implemented (PR #52), so
the "No session timezone" limitation no longer applies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: elevate timezone warning and implement set_timezone_to_utc() for BigQuery and ClickHouse

1 participant