
Feat/furnace sql#113

Open
nicornk wants to merge 12 commits into main from feat/furnace-sql

Conversation


@nicornk nicornk commented Nov 27, 2025

Summary

Checklist

  • You agree with our CLA
  • Included tests (or is not applicable).
  • Updated documentation (or is not applicable).
  • Used pre-commit hooks to format and lint the code.

Copilot AI left a comment

Pull request overview

Adds support for a new “V2” Foundry SQL Server query flow (execute → poll status → stream results) and wires it into the context plus error handling, with integration tests covering the new client behavior.

Changes:

  • Introduce FoundrySqlServerClientV2 with Arrow-stream result retrieval.
  • Add a mapping from FurnaceSql:SqlParseError to the new FurnaceSqlSqlParseError SQL parse error class.
  • Add integration tests for the V2 client and expose it via FoundryContext.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Summary per file:

  • tests/integration/clients/test_foundry_sql_server.py: Adds V2 integration tests (smoke, return types, parse error, compression flag).
  • libs/foundry-dev-tools/src/foundry_dev_tools/errors/sql.py: Adds a new Furnace SQL parse exception type.
  • libs/foundry-dev-tools/src/foundry_dev_tools/errors/handling.py: Maps the Furnace SQL parse errorName to the new exception.
  • libs/foundry-dev-tools/src/foundry_dev_tools/config/context.py: Exposes foundry_sql_server_v2 as a cached context property.
  • libs/foundry-dev-tools/src/foundry_dev_tools/clients/foundry_sql_server.py: Implements the V2 SQL client (execute/status/stream).


Comment on lines 399 to 403
        # Poll for completion
        while response_json.get("type") != "success":
            time.sleep(0.2)
            response = self.api_status(query_identifier)
            response_json = response.json()
Copilot AI Feb 17, 2026

The polling loop has no timeout/escape condition and no way to pass a request timeout to api_status, so a query that never reaches type=success can hang indefinitely (default read timeout is None). Implement a timeout (like V1 does) and raise FoundrySqlQueryClientTimedOutError when exceeded.
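A timeout-guarded polling loop could look like the sketch below. This is illustrative only: `get_status`, the interval, and the stdlib exception types stand in for the client's `api_status` call and its error classes, and the status shape mirrors the V2 payloads shown in this review.

```python
import time


def poll_until_success(get_status, timeout: float = 600.0, interval: float = 0.2) -> dict:
    """Poll until the status payload reports success or failure, or the deadline passes."""
    deadline = time.monotonic() + timeout
    status = get_status()
    while status.get("type") != "success":
        if status.get("type") == "failed":
            raise RuntimeError(f"query failed: {status!r}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"query did not complete within {timeout} seconds")
        time.sleep(interval)
        status = get_status()
    return status
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline immune to wall-clock adjustments.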

Copilot uses AI. Check for mistakes.
Comment on lines 406 to 407
raise FoundrySqlQueryFailedError(response)

Copilot AI Feb 17, 2026

On V2 failures you raise FoundrySqlQueryFailedError(response), but that exception extracts the message from the V1 response shape (status.failed.errorMessage). If the V2 failed payload differs, this will likely produce empty/incorrect error details. Consider adding a V2-specific error type (or updating FoundrySqlQueryFailedError to handle both shapes) so users get actionable messages.

Suggested change

        raise FoundrySqlQueryFailedError(response)

        # Adapt V2 failure payload to the V1 error shape expected by FoundrySqlQueryFailedError.
        # Try to extract a meaningful message from the V2 response JSON.
        message: str = (
            response_json.get("message")
            or response_json.get("error", {}).get("message")
            or response_json.get("failed", {}).get("errorMessage")
            or "SQL query failed"
        )

        class _V2ErrorResponseAdapter:
            """Minimal adapter to provide a V1-shaped JSON payload for error handling."""

            def __init__(self, msg: str) -> None:
                self.status_code = getattr(response, "status_code", 400)
                self.text = msg
                self._msg = msg

            def json(self) -> dict[str, Any]:
                # Shape compatible with V1: status.failed.errorMessage
                return {
                    "status": {
                        "failed": {
                            "errorMessage": self._msg,
                        }
                    }
                }

        raise FoundrySqlQueryFailedError(_V2ErrorResponseAdapter(message))

Comment on lines 123 to 127
        schema, rows = TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(
            query=f"SELECT sepal_width, sepal_length FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 3",
            application_id=TEST_SINGLETON.iris_new.rid,
            return_type="raw",
        )
Copilot AI Feb 17, 2026

In this test, schema and rows are assigned but never used. Ruff will flag this as unused local variables (F841) even though the call is expected to raise. Consider calling the method without unpacking, or assigning to _ placeholders to avoid lint failures.

Comment on lines 23 to 28
class FurnaceSqlSqlParseError(FoundryAPIError):
    """Exception is thrown when SQL Query is not valid."""

    message = "Foundry SQL Query Parsing Failed."

Copilot AI Feb 17, 2026

The exception class name FurnaceSqlSqlParseError contains a duplicated "Sql" segment, which makes the public API a bit confusing and inconsistent with other error names in this module. Consider renaming it to something like FurnaceSqlParseError / FurnaceSqlQueryParseError and updating the error mapping + tests accordingly.

Suggested change

class FurnaceSqlSqlParseError(FoundryAPIError):
    """Exception is thrown when SQL Query is not valid."""

    message = "Foundry SQL Query Parsing Failed."

class FurnaceSqlParseError(FoundryAPIError):
    """Exception is thrown when SQL Query is not valid."""

    message = "Foundry SQL Query Parsing Failed."


# Backwards compatibility: keep the old name as an alias.
FurnaceSqlSqlParseError = FurnaceSqlParseError
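The alias approach keeps old call sites working because both names bind the same class object, so `except` clauses and `isinstance` checks against the old name still match. A standalone sketch (the base class is stubbed here; only the names mirror the suggestion):

```python
class FoundryAPIError(Exception):
    """Stub standing in for the library's base error class."""


class FurnaceSqlParseError(FoundryAPIError):
    """Exception is thrown when SQL Query is not valid."""

    message = "Foundry SQL Query Parsing Failed."


# Backwards compatibility: the old duplicated-"Sql" name stays importable
# and refers to the exact same class object.
FurnaceSqlSqlParseError = FurnaceSqlParseError
```

Code written against the old name, such as `except FurnaceSqlSqlParseError:`, continues to catch the renamed exception.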

Comment on lines 355 to 360
    def query_foundry_sql(
        self,
        query: str,
        return_type: SQLReturnType = "pandas",
        disable_arrow_compression: bool = False,
        application_id: str | None = None,
Copilot AI Feb 17, 2026

query_foundry_sql overloads advertise a timeout parameter and require application_id, but the concrete implementation signature does not accept timeout and allows application_id=None. This mismatch breaks type-checking and also prevents callers from setting a client-side timeout for V2 queries.

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.



Comment on lines 315 to 363
    @overload
    def query_foundry_sql(
        self,
        query: str,
        application_id: str,
        return_type: Literal["pandas"],
        disable_arrow_compression: bool = ...,
        timeout: int = ...,
    ) -> pd.core.frame.DataFrame: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        return_type: Literal["polars"],
        branch: Ref = ...,
        sql_dialect: SqlDialect = ...,
        timeout: int = ...,
    ) -> pl.DataFrame: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        application_id: str,
        return_type: Literal["spark"],
        disable_arrow_compression: bool = ...,
        timeout: int = ...,
    ) -> pyspark.sql.DataFrame: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        application_id: str,
        return_type: Literal["arrow"],
        disable_arrow_compression: bool = ...,
        timeout: int = ...,
    ) -> pa.Table: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        application_id: str,
        return_type: SQLReturnType = ...,
        disable_arrow_compression: bool = ...,
        timeout: int = ...,
    ) -> tuple[dict, list[list]] | pd.core.frame.DataFrame | pl.DataFrame | pa.Table | pyspark.sql.DataFrame: ...
Copilot AI Feb 17, 2026

The timeout parameter is declared in all overload signatures (lines 322, 332, 342, 352, 362) but is never accepted in the actual function implementation (line 365-371) nor is it used anywhere in the method body. Either implement timeout handling (similar to how V1 client does it in lines 149-150 of the same file) or remove the parameter from the overload signatures. This creates a misleading API contract where callers think they can pass a timeout, but it will be silently ignored.

Comment on lines 315 to 370
    @overload
    def query_foundry_sql(
        self,
        query: str,
        application_id: str,
        return_type: Literal["pandas"],
        disable_arrow_compression: bool = ...,
        timeout: int = ...,
    ) -> pd.core.frame.DataFrame: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        return_type: Literal["polars"],
        branch: Ref = ...,
        sql_dialect: SqlDialect = ...,
        timeout: int = ...,
    ) -> pl.DataFrame: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        application_id: str,
        return_type: Literal["spark"],
        disable_arrow_compression: bool = ...,
        timeout: int = ...,
    ) -> pyspark.sql.DataFrame: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        application_id: str,
        return_type: Literal["arrow"],
        disable_arrow_compression: bool = ...,
        timeout: int = ...,
    ) -> pa.Table: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        application_id: str,
        return_type: SQLReturnType = ...,
        disable_arrow_compression: bool = ...,
        timeout: int = ...,
    ) -> tuple[dict, list[list]] | pd.core.frame.DataFrame | pl.DataFrame | pa.Table | pyspark.sql.DataFrame: ...

    def query_foundry_sql(
        self,
        query: str,
        return_type: SQLReturnType = "pandas",
        disable_arrow_compression: bool = False,
        application_id: str | None = None,
Copilot AI Feb 17, 2026

The overload signatures declare application_id as a required positional parameter (line 319, 339, 349, 359), but the actual implementation (line 370) declares it as an optional parameter with default None. This creates a type checking inconsistency. If application_id should be optional (as the implementation suggests), it should be marked as optional in the overload signatures with application_id: str | None = .... Alternatively, if it should be required, the implementation should not have a default value and should raise an error if not provided, rather than falling back to User-Agent.

Comment on lines 459 to 463
        if response_json["type"] == "triggered" and "plan" in response_json["triggered"]:
            plan = response_json["triggered"]["plan"]
            LOGGER.debug("plan %s", plan)
        return response_json[response_json["type"]]["query"]
Copilot AI Feb 17, 2026

The method directly accesses response_json[response_json["type"]]["query"] without checking if the keys exist. If the response structure is unexpected or if response_json["type"] doesn't match a key in the response, this will raise a KeyError. Consider adding defensive checks or using .get() with appropriate error handling to provide clearer error messages when the API response doesn't match the expected format.

Suggested change

        if response_json["type"] == "triggered" and "plan" in response_json["triggered"]:
            plan = response_json["triggered"]["plan"]
            LOGGER.debug("plan %s", plan)
        return response_json[response_json["type"]]["query"]

        response_type = response_json.get("type")
        if not isinstance(response_type, str):
            msg = f"Unexpected execute response: missing or invalid 'type' field in {response_json!r}"
            raise ValueError(msg)
        # Log plan details for triggered queries, if present
        if response_type == "triggered":
            triggered_section = response_json.get("triggered") or {}
            plan = triggered_section.get("plan")
            if plan is not None:
                LOGGER.debug("plan %s", plan)
        section = response_json.get(response_type)
        if not isinstance(section, dict):
            msg = (
                f"Unexpected execute response: missing or invalid '{response_type}' section "
                f"in {response_json!r}"
            )
            raise ValueError(msg)
        query_identifier = section.get("query")
        if not isinstance(query_identifier, dict):
            msg = (
                f"Unexpected execute response: missing or invalid 'query' field in "
                f"'{response_type}' section: {section!r}"
            )
            raise ValueError(msg)
        return query_identifier

Comment on lines 429 to 431
        # throws an ImportError when trying to access attributes of the module.
        # This ImportError is caught below to fall back to query_foundry_sql_legacy
        # which will again raise an ImportError when polars is not installed.
Copilot AI Feb 17, 2026

The comment mentions falling back to query_foundry_sql_legacy when polars is not installed, but this fallback logic is not actually implemented in the V2 client. The V1 client has this fallback (lines 156-197), but the V2 implementation will just raise an ImportError directly if polars is not installed. Either remove this misleading comment or implement the fallback behavior if it's intended.

Suggested change

        # throws an ImportError when trying to access attributes of the module.
        # This ImportError is caught below to fall back to query_foundry_sql_legacy
        # which will again raise an ImportError when polars is not installed.

        # throws an ImportError when trying to access attributes of the module
        # if polars is not installed. In that case, this ImportError will
        # propagate to the caller and no automatic fallback is performed.

Comment on lines 329 to 332
        return_type: Literal["polars"],
        branch: Ref = ...,
        sql_dialect: SqlDialect = ...,
        timeout: int = ...,
Copilot AI Feb 17, 2026

The overload signature for the "polars" return type is inconsistent with other overloads and the actual implementation. This overload specifies branch and sql_dialect parameters, but the actual implementation (line 365-371) does not accept these parameters. Instead, it accepts application_id and disable_arrow_compression like the other overloads. The parameters should match: application_id: str, disable_arrow_compression: bool = ..., and timeout should be removed if not implemented (see other issue). The V2 API uses a different flow than V1 and does not use branch/sql_dialect parameters.

Suggested change

        return_type: Literal["polars"],
        branch: Ref = ...,
        sql_dialect: SqlDialect = ...,
        timeout: int = ...,

        application_id: str,
        return_type: Literal["polars"],
        disable_arrow_compression: bool = ...,


def test_v2_polars_return_type():
    polars_df = TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(
        f"SELECT sepal_length FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 2",
Copilot AI Feb 17, 2026

The test is missing the application_id parameter. While the implementation defaults to using the User-Agent header when application_id is None (line 400), all other V2 tests explicitly provide this parameter. For consistency and to properly test the API contract, this test should also pass application_id=TEST_SINGLETON.iris_new.rid like the other tests do.

Suggested change

        f"SELECT sepal_length FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 2",

        query=f"SELECT sepal_length FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 2",
        application_id=TEST_SINGLETON.iris_new.rid,

Comment on lines 410 to 417
        while response_json.get("type") != "success":
            time.sleep(0.2)
            response = self.api_status(query_identifier)
            response_json = response.json()

            if response_json.get("type") == "failed":
                raise FoundrySqlQueryFailedError(response)

Copilot AI Feb 17, 2026

The polling loop lacks timeout protection and could run indefinitely if the query never completes or the server stops responding with "success" or "failed" status. Unlike the V1 client which implements timeout checking (see lines 149-150), this implementation has no safeguard against infinite loops. Consider adding timeout tracking similar to V1, or at minimum add a maximum iteration count to prevent runaway polling. The docstring claims to raise FoundrySqlQueryClientTimedOutError on timeout, but this is never actually raised.

Suggested change

        while response_json.get("type") != "success":
            time.sleep(0.2)
            response = self.api_status(query_identifier)
            response_json = response.json()

            if response_json.get("type") == "failed":
                raise FoundrySqlQueryFailedError(response)

        start_time = time.monotonic()
        while response_json.get("type") not in ("success", "failed"):
            if timeout is not None and time.monotonic() - start_time > timeout:
                raise FoundrySqlQueryClientTimedOutError(
                    f"Query timed out after {timeout} seconds."
                )
            time.sleep(0.2)
            response = self.api_status(query_identifier)
            response_json = response.json()

        if response_json.get("type") == "failed":
            raise FoundrySqlQueryFailedError(response)


        Raises:
            FoundrySqlQueryFailedError: If the query fails
            FoundrySqlQueryClientTimedOutError: If the query times out
Copilot AI Feb 17, 2026

The docstring claims this method raises FoundrySqlQueryClientTimedOutError on timeout, but timeout handling is not implemented in the V2 client. This is inconsistent with the actual behavior. Either implement timeout handling or remove this from the Raises section of the docstring.

Suggested change

            FoundrySqlQueryClientTimedOutError: If the query times out


            raise ValueError(msg)

        chunks = response_json["success"]["result"]["interactive"]["chunks"]
Copilot AI Feb 17, 2026

The method directly accesses deeply nested keys response_json["success"]["result"]["interactive"]["chunks"] without defensive checks. If any intermediate key is missing, this will raise a KeyError with an unclear error message. Consider adding proper error handling or using .get() with default values to provide more helpful error messages when the API response structure is unexpected.

Suggested change

        chunks = response_json["success"]["result"]["interactive"]["chunks"]

        try:
            chunks = response_json["success"]["result"]["interactive"]["chunks"]
        except (KeyError, TypeError) as exc:
            msg = (
                "Unexpected structure in success response: expected "
                "'success.result.interactive.chunks' to be present."
            )
            raise ValueError(msg) from exc
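Beyond a one-off try/except, a small helper can name the exact path segment that is missing, which makes debugging unexpected API payloads much faster. The `dig` function below is purely illustrative and not part of the library:

```python
from typing import Any


def dig(payload: Any, *keys: str) -> Any:
    """Walk nested dict keys, raising a ValueError that names the first missing segment."""
    current = payload
    for i, key in enumerate(keys):
        if not isinstance(current, dict) or key not in current:
            path = ".".join(keys[: i + 1])
            raise ValueError(f"Unexpected response structure: missing '{path}'")
        current = current[key]
    return current
```

With such a helper, the chunk lookup becomes `dig(response_json, "success", "result", "interactive", "chunks")` and a malformed response reports exactly which level was absent.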

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.



def test_v2_return_type_raw_not_supported():
    """Test V2 client with raw return type."""
    with pytest.raises(ValueError, match="The following return_type is not supported: .+"):
        schema, rows = TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(
Copilot AI Feb 18, 2026

Inside this pytest.raises block, assigning to schema, rows is unnecessary and will be flagged by Ruff as an unused assignment. Just call query_foundry_sql(...) without binding the result.

Suggested change

        schema, rows = TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(

        TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 12 comments.



Comment on lines 453 to 455
        # throws an ImportError when trying to access attributes of the module.
        # This ImportError is caught below to fall back to query_foundry_sql_legacy
        # which will again raise an ImportError when polars is not installed.
Copilot AI Feb 20, 2026

The comment states "this ImportError is caught below to fall back to query_foundry_sql_legacy", but the V2 implementation doesn't have this fallback mechanism. Unlike the V1 client which has a try-except block that catches ImportError and falls back to query_foundry_sql_legacy, the V2 client will simply raise the ImportError if polars is not installed. Either remove this misleading comment or implement the fallback behavior to match V1.

Suggested change

        # throws an ImportError when trying to access attributes of the module.
        # This ImportError is caught below to fall back to query_foundry_sql_legacy
        # which will again raise an ImportError when polars is not installed.

        # throws an ImportError when trying to access attributes of the module
        # if polars is not installed, causing this call to fail accordingly.

Comment on lines +126 to +192
def test_v2_experimental_use_trino(mocker, test_context_mock):
    """Test that experimental_use_trino parameter modifies the query correctly."""
    import pandas as pd

    mocker.patch("time.sleep")  # we do not want to wait in tests

    # Mock the arrow stream reader to return a simple pandas DataFrame
    mock_arrow_reader = mocker.MagicMock()
    mock_arrow_reader.read_pandas.return_value = pd.DataFrame({"col1": [1, 2, 3]})
    mocker.patch.object(
        test_context_mock.foundry_sql_server_v2,
        "read_stream_results_arrow",
        return_value=mock_arrow_reader,
    )

    # Mock the api_query endpoint (initial query execution)
    query_matcher = mocker.MagicMock()
    test_context_mock.mock_adapter.register_uri(
        "POST",
        build_api_url(TEST_HOST.url, "foundry-sql-server", "sql-endpoint/v1/queries/query"),
        json={"type": "running", "running": {"queryHandle": {"queryId": "test-query-id-123", "type": "foundry"}}},
        additional_matcher=query_matcher,
    )

    # Mock the api_status endpoint (poll for completion - returns ready immediately)
    test_context_mock.mock_adapter.register_uri(
        "POST",
        build_api_url(TEST_HOST.url, "foundry-sql-server", "sql-endpoint/v1/queries/status"),
        json={
            "status": {
                "type": "ready",
                "ready": {"tickets": [{"tickets": ["eyJhbGc...mock-ticket-1", "eyJhbGc...mock-ticket-2"]}]},
            }
        },
    )

    # Test with experimental_use_trino=True
    df = test_context_mock.foundry_sql_server_v2.query_foundry_sql(
        "SELECT * FROM `ri.foundry.main.dataset.test-dataset`",
        experimental_use_trino=True,
    )

    # Verify the query was modified to include the Trino backend hint
    call_args = query_matcher.call_args_list[0]
    request = call_args[0][0]
    request_json = request.json()

    assert "SELECT /*+ backend(trino) */ * FROM" in request_json["querySpec"]["query"]
    assert df.shape[0] == 3

    # Reset for second test
    query_matcher.reset_mock()

    # Test with experimental_use_trino=False (default)
    df = test_context_mock.foundry_sql_server_v2.query_foundry_sql(
        "SELECT * FROM `ri.foundry.main.dataset.test-dataset`",
        experimental_use_trino=False,
    )

    # Verify the query was NOT modified
    call_args = query_matcher.call_args_list[0]
    request = call_args[0][0]
    request_json = request.json()

    assert request_json["querySpec"]["query"] == "SELECT * FROM `ri.foundry.main.dataset.test-dataset`"
    assert "backend(trino)" not in request_json["querySpec"]["query"]
    assert df.shape[0] == 3
Copilot AI Feb 20, 2026

The test for experimental_use_trino only tests the case where the query starts with uppercase "SELECT". Add test cases for lowercase "select", queries with leading whitespace, and queries that use CTEs (WITH ... SELECT) to ensure the string replacement works correctly in these common scenarios.
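A regex-based insertion would handle lowercase keywords and leading whitespace. This is only a sketch of the idea, not the PR's implementation; note that for a CTE it tags the first SELECT inside the WITH clause, which may not be where the backend expects the hint:

```python
import re


def add_trino_hint(query: str) -> str:
    """Insert the backend hint after the first SELECT keyword, case-insensitively."""
    return re.sub(r"(?i)\bselect\b", "SELECT /*+ backend(trino) */", query, count=1)
```

With this variant, `select a from t` and `  SELECT x FROM y` both receive the hint, which the current uppercase `str.replace` misses for the lowercase case.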

Comment on lines 417 to 424
        if experimental_use_trino:
            query = query.replace("SELECT ", "SELECT /*+ backend(trino) */ ", 1)

        response_json = self.api_query(
            query=query,
            dialect=sql_dialect,
            branch=branch,
            arrow_compression_codec=arrow_compression_codec,
Copilot AI Feb 20, 2026

The arrow_compression_codec parameter is not validated with assert_in_literal like sql_dialect is. Add validation to ensure only valid codec values (NONE, LZ4, ZSTD) are passed to the API. This would provide better error messages if an invalid codec is specified.
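The exact signature of the codebase's `assert_in_literal` helper is not shown here, but a Literal-based check along these lines would catch typos early; the helper name `check_codec` is hypothetical:

```python
from typing import Literal, get_args

ArrowCompressionCodec = Literal["NONE", "LZ4", "ZSTD"]


def check_codec(value: str) -> str:
    """Raise a clear error if `value` is not one of the allowed codec literals."""
    allowed = get_args(ArrowCompressionCodec)
    if value not in allowed:
        raise ValueError(f"'{value}' is not a valid arrow_compression_codec; allowed: {allowed}")
    return value
```

Deriving the allowed set via `typing.get_args` keeps the validation in sync with the Literal type used in the signatures.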

def test_v2_disable_arrow_compression():
    """Test V2 client with arrow compression disabled."""
    result = TEST_SINGLETON.ctx.foundry_sql_server_v2.query_foundry_sql(
        query=f"SELECT * FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 5",
Copilot AI Feb 20, 2026

The test name 'test_v2_disable_arrow_compression' suggests it's testing with compression disabled, but it doesn't pass arrow_compression_codec parameter. Either rename the test to something more descriptive like 'test_v2_default_compression' or explicitly pass arrow_compression_codec="NONE" to match the test name's intent.

Suggested change

        query=f"SELECT * FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 5",

        query=f"SELECT * FROM `{TEST_SINGLETON.iris_new.rid}` LIMIT 5",
        arrow_compression_codec="NONE",

Comment on lines 496 to 502
        Returns:
            List of tickets for fetching results

        """
        # we combine all tickets into one to get the full data
        # if performance is a concern this should be done in parallel
        return {
Copilot AI Feb 20, 2026

The ticket extraction uses a nested list comprehension that flattens tickets from multiple ticket groups. However, there's no error handling if the response structure is unexpected (e.g., missing "status", "ready", or "tickets" keys). Consider adding error handling to provide a more informative error message if the response structure is not as expected, similar to how _extract_query_handle uses KeyError-prone dictionary access.
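A defensive version of that flattening could look like the sketch below. The payload shape follows the mocked status response shown in the Trino test above, and the helper name is hypothetical:

```python
from typing import Any


def flatten_ready_tickets(response_json: dict[str, Any]) -> list[str]:
    """Flatten all ticket groups from a ready-status payload, failing loudly on bad shapes."""
    try:
        groups = response_json["status"]["ready"]["tickets"]
        return [ticket for group in groups for ticket in group["tickets"]]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Unexpected ready response structure: {response_json!r}") from exc
```

Wrapping the nested access in one try/except keeps the comprehension readable while still producing an informative error for malformed responses.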

Comment on lines +490 to +492
    def _extract_ticket(self, response_json: dict[str, Any]) -> dict[str, Any]:
        """Extract tickets from success response.

Copilot AI Feb 20, 2026

The docstring for _extract_ticket says "Returns: List of tickets for fetching results" but the method actually returns a dictionary with keys "id", "tickets", and "type", not a list. The return type annotation correctly shows dict[str, Any], but the docstring is misleading. Update the docstring to accurately describe the returned dictionary structure.

            )
            raise ValueError(msg)

    def _extract_query_handle(self, response_json: dict[str, Any]) -> dict[str, Any]:
Copilot AI Feb 20, 2026

The Args section in the docstring for _extract_query_handle has an extra blank line after the response_json parameter description and before the Returns section. This is inconsistent with other docstrings in the codebase. Remove the extra blank line at line 477.

Comment on lines +220 to +227
def test_v2_ansi_dialect_not_supported(test_context_mock):
    """Test that V2 client rejects ANSI SQL dialect."""
    with pytest.raises(TypeError, match="'ANSI' is not a valid option for dialect"):
        test_context_mock.foundry_sql_server_v2.query_foundry_sql(
            "SELECT * FROM `ri.foundry.main.dataset.test-dataset`",
            sql_dialect="ANSI",  # type: ignore[arg-type]
        )

Copilot AI Feb 20, 2026

Add a unit test for V2 query failure to verify that FoundrySqlQueryFailedError is raised correctly when api_status returns a failed status. This would complement the integration test and ensure the error handling logic works correctly with mocked responses.

Comment on lines 329 to 388
    ) -> pd.core.frame.DataFrame: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        return_type: Literal["polars"],
        branch: Ref = ...,
        sql_dialect: FurnaceSqlDialect = ...,
        arrow_compression_codec: ArrowCompressionCodec = ...,
        timeout: int = ...,
        experimental_use_trino: bool = ...,
    ) -> pl.DataFrame: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        return_type: Literal["spark"],
        branch: Ref = ...,
        sql_dialect: FurnaceSqlDialect = ...,
        arrow_compression_codec: ArrowCompressionCodec = ...,
        timeout: int = ...,
        experimental_use_trino: bool = ...,
    ) -> pyspark.sql.DataFrame: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        return_type: Literal["arrow"],
        branch: Ref = ...,
        sql_dialect: FurnaceSqlDialect = ...,
        arrow_compression_codec: ArrowCompressionCodec = ...,
        timeout: int = ...,
        experimental_use_trino: bool = ...,
    ) -> pa.Table: ...

    @overload
    def query_foundry_sql(
        self,
        query: str,
        return_type: Literal["pandas", "polars", "spark", "arrow"] = ...,
        branch: Ref = ...,
        sql_dialect: FurnaceSqlDialect = ...,
        arrow_compression_codec: ArrowCompressionCodec = ...,
        timeout: int = ...,
        experimental_use_trino: bool = ...,
    ) -> pd.core.frame.DataFrame | pl.DataFrame | pa.Table | pyspark.sql.DataFrame: ...

    def query_foundry_sql(
        self,
        query: str,
        return_type: Literal["pandas", "polars", "spark", "arrow"] = "pandas",
        branch: Ref = "master",
        sql_dialect: FurnaceSqlDialect = "SPARK",
        arrow_compression_codec: ArrowCompressionCodec = "NONE",
        timeout: int = 600,
        experimental_use_trino: bool = False,
    ) -> pd.core.frame.DataFrame | pl.DataFrame | pa.Table | pyspark.sql.DataFrame:
Copilot AI Feb 20, 2026

The V2 client uses pd.core.frame.DataFrame in type annotations while the V1 client uses pd.DataFrame. For consistency within the codebase and following Python conventions, use pd.DataFrame instead of the internal module path pd.core.frame.DataFrame. The shorter form is the public API and is what users would expect to see in type hints.

            sql_dialect: The SQL dialect to use (only SPARK is supported for V2)
            arrow_compression_codec: Arrow compression codec (NONE, LZ4, ZSTD)
            timeout: Query timeout in seconds
            experimental_use_trino: If True, modifies the query to use Trino backend by adding /*+ backend(trino) */ hint
Copilot AI Feb 20, 2026

The experimental_use_trino parameter documentation should clarify its limitations: it only works for queries that start with uppercase "SELECT " (with space). It won't work for lowercase "select", queries with different whitespace, or queries starting with WITH clauses. Consider documenting these limitations in the parameter description to set proper expectations for users.

Suggested change

            experimental_use_trino: If True, modifies the query to use Trino backend by adding /*+ backend(trino) */ hint

            experimental_use_trino: If True, modifies the query to use the Trino backend by adding a
                ``/*+ backend(trino) */`` hint after the first ``SELECT ``. This is experimental and
                only affects queries that contain an uppercase ``SELECT `` (with a trailing space);
                it will not modify queries using lowercase ``select``, different whitespace, or
                queries starting with ``WITH`` clauses.
Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


