fix: implement set_timezone_to_utc() for BigQuery and ClickHouse#52
fix: implement set_timezone_to_utc() for BigQuery and ClickHouse#52
Conversation
- BigQuery: use SET @@time_zone = 'UTC' session variable - ClickHouse: use SET session_timezone = 'UTC' - Elevate _connect.py timezone warning from DEBUG to WARNING level Previously, when these databases couldn't set timezone to UTC, it was silently logged at DEBUG level. For a data comparison tool, timezone mismatches can produce incorrect diff results — users need to know. Closes #29 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d3397f58ed
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
data_diff/databases/bigquery.py
Outdated
|
|
||
| def set_timezone_to_utc(self) -> str: | ||
| raise NotImplementedError() | ||
| return "SET @@time_zone = 'UTC'" |
There was a problem hiding this comment.
Execute BigQuery timezone SET in a persistent session
Returning SET @@time_zone = 'UTC' here makes Connect_SetUTC treat BigQuery as fully supported, but BigQuery._query_atom() runs each statement as an independent self._client.query(...).result() call with no session creation or session reuse, so this SET does not persist to subsequent queries. In practice, timezone-dependent expressions can still run with BigQuery defaults while the previous warning is suppressed, leading to silent timestamp mismatches during cross-database diffs.
Useful? React with 👍 / 👎.
BigQuery uses stateless REST jobs — SET @@time_zone only affects the single job it runs in, not subsequent queries. The elevated WARNING in _connect.py will inform users. BigQuery stores timestamps in UTC internally, so this is safe. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ClickHouse set_timezone_to_utc() is now implemented (PR #52), so the "No session timezone" limitation no longer applies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add tiered database support matrix to README Replace the flat "Supported" list with a tiered matrix showing Production, Stable, Limited, and Experimental tiers with known limitations per database. Users can now self-select appropriate databases before investing setup time. Closes #32 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct MsSQL and Presto limitation descriptions in support matrix - MsSQL: change "OFFSET requires ORDER BY" to "no OFFSET support" (the driver raises NotImplementedError for non-zero offset) - Presto: add "No primary key detection; fixed precision defaults" (lacks SUPPORTS_PRIMARY_KEY; hardcodes datetime/numeric precision to 3) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: remove stale ClickHouse timezone limitation from support matrix ClickHouse set_timezone_to_utc() is now implemented (PR #52), so the "No session timezone" limitation no longer applies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Summary
set_timezone_to_utc()for BigQuery (SET @@time_zone = 'UTC') and ClickHouse (SET session_timezone = 'UTC')_connect.pyfrom DEBUG to WARNING levelKey Files
data_diff/databases/bigquery.py— implementset_timezone_to_utc()data_diff/databases/clickhouse.py— implementset_timezone_to_utc()data_diff/databases/_connect.py— elevate log level from DEBUG to WARNINGTest plan
uv run pytest tests/test_query.py tests/test_utils.py -x— 48 passed)Closes #29
🤖 Generated with Claude Code