Skip to content

Conversation

@yuanjieding-db
Copy link
Collaborator

What changes are proposed in this pull request?

WHAT

  • Extending retry function with a new parameter max_attempt to allow client to retry and fail after certain amount
  • Remove 500 from FilesExt retry status code
  • Add new config to set the retry attempts for FilesExt
  • Update the retry logic of FilesExt to fail after certain attempts.

WHY

  • 500 errors shouldn't be retried
  • The FilesExt should always prioritize fallback over retry to avoid regression

How is this tested?

Unit tests were updated to reflect the change.

@github-actions
Copy link

If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
go/deco-tests-run/sdk-py

Inputs:

  • PR number: 1211
  • Commit SHA: 87e5849e37a24942e5fd092edade3e5aa970907d

Checks will be approved automatically on success.

Comment on lines +236 to +239
# Maximum number of retry attempts for FilesExt cloud API operations.
# This works in conjunction with retry_timeout_seconds - whichever limit
# is hit first will stop the retry loop.
files_ext_cloud_api_max_retries: int = 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use this as a temporary parameter, since our end goal is not to fallback.

Suggested change
# Maximum number of retry attempts for FilesExt cloud API operations.
# This works in conjunction with retry_timeout_seconds - whichever limit
# is hit first will stop the retry loop.
files_ext_cloud_api_max_retries: int = 3
# Maximum number of retry attempts for FilesExt cloud API operations.
# This works in conjunction with retry_timeout_seconds - whichever limit
# is hit first will stop the retry loop.
experimental_files_ext_cloud_api_max_retries: int = 3


# Determine which limit was hit
if max_attempts is not None and attempt > max_attempts:
raise TimeoutError(f"Exceeded max retry attempts ({max_attempts})") from last_err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a better error to represent this error? TimeoutError feels a bit odd for this case, as the function is not actually timed out.

Comment on lines +334 to +337
def failing_function():
nonlocal call_count
call_count += 1
raise ValueError("test error")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we sleep on this failing function for a sec? I am a bit worried that this test will become flaky because it is possible to retry 100 times in 2 seconds.

Comment on lines +372 to +373
def test_max_attempts_none_preserves_backward_compatibility():
"""Test that max_attempts=None only uses timeout (backward compatibility)."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_max_attempts_none_preserves_backward_compatibility():
"""Test that max_attempts=None only uses timeout (backward compatibility)."""
def test_max_attempts_none():
"""Test that max_attempts=None only uses timeout."""

I think we don't need to mention that this test is for backward compatibility because, almost always, a unit test is for regression catching.

assert call_count == attempts


def test_max_attempts_respected():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make these a table test to simplify the tests?

with pytest.raises(TimeoutError) as exc_info:
failing_function()

# Should have attempted 3 times (initial + 2 retries)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Should have attempted 3 times (initial + 2 retries)
# Should have attempted 3 times (initial + 2 retries).

Period after a sentence, ditto all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants