Enhance _info method to check file and directory info in parallel by yuxin00j · Pull Request #786 · fsspec/gcsfs

yuxin00j · 2026-03-25T10:03:57Z

Optimize the performance of the _info method by enabling concurrent checks for file paths and directory listings.

Early Return Strategy: If _get_object completes first and resolves to a valid file (not a directory marker), _info cancels the pending directory scan tasks and returns the file metadata immediately.
Fallback Logic: If _get_object fails or yields a directory marker, _info falls back to the result of the directory tree scan.
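
The two rules above can be illustrated with a minimal sketch. It is not the code in this PR: `info_sketch`, `get_object` and `list_directory` are hypothetical names, the directory-marker test (a name ending in "/") and the use of `FileNotFoundError` for a missing object are assumptions, and it awaits the file lookup directly instead of using `asyncio.wait`, which keeps the "a real file wins" priority explicit.

```python
import asyncio


async def info_sketch(get_object, list_directory):
    """Run a file lookup and a directory lookup concurrently; prefer a real file.

    get_object and list_directory are hypothetical zero-argument coroutine
    functions standing in for _get_object and _get_directory_info.
    """
    file_task = asyncio.create_task(get_object())
    dir_task = asyncio.create_task(list_directory())
    try:
        try:
            # The directory lookup keeps running while we wait here.
            entry = await file_task
        except FileNotFoundError:  # assumption: a missing object raises this
            entry = None
        # Assumption: a directory marker is an object whose name ends in "/".
        if entry is not None and not entry["name"].endswith("/"):
            return entry  # early return: a valid file was found
        # Fallback: no file, or only a marker, so use the directory result.
        return await dir_task
    finally:
        # Cancel whatever is still running and collect results so that no
        # task is left pending and no exception goes unretrieved.
        for task in (file_task, dir_task):
            if not task.done():
                task.cancel()
        await asyncio.gather(file_task, dir_task, return_exceptions=True)
```

For a directory path, the listing has already been in flight while the object lookup was failing, which is where the concurrency saves time. For a file path, the latency is still bounded by the object lookup itself.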

Benchmark run result

Folder Info

Execution times consistently dropped by 30% to 60% across all single-threaded and multi-process configurations.

File Info

Results are mixed but generally neutral, showing minor speedups of up to 24.6% in high process count runs. One outlier showed a minor regression in deep regional tests.

Bucket Info

This optimization does not affect info calls on bucket paths.

codecov · 2026-03-25T10:14:35Z

Codecov Report

❌ Patch coverage is 90.32258% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.40%. Comparing base (6c5f744) to head (b4d835e).

Files with missing lines	Patch %	Lines
gcsfs/core.py	85.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #786      +/-   ##
==========================================
+ Coverage   75.96%   76.40%   +0.43%     
==========================================
  Files          14       15       +1     
  Lines        2663     2687      +24     
==========================================
+ Hits         2023     2053      +30     
+ Misses        640      634       -6


yuxin00j · 2026-03-31T06:46:10Z

Hi @ankitaluthra1, you may check the update on optimization in _info here and in #780

ankitaluthra1 · 2026-03-31T17:07:13Z

/gcbrun

ankitaluthra1 · 2026-03-31T17:55:46Z

@yuxin00j Can you please check the e2e failure?

yuxin00j · 2026-04-02T05:46:18Z

Hi @ankitaluthra1, I have fixed the test failure.

Mahalaxmibejugam · 2026-04-02T20:05:26Z

gcsfs/core.py

+                self._get_directory_info(path, bucket, key, generation),
+            ]
+        ) as (tasks, done, pending):
+            exact_task, dir_task = tasks


nit: Can we rename the exact_task variable to get_object_task or some other meaningful name?

Mahalaxmibejugam · 2026-04-02T20:07:09Z

QQ: Was the 30% to 60% improvement also observed for HNS buckets where we are parallelizing get_object and get_folder calls?

Mahalaxmibejugam · 2026-04-02T20:18:16Z

gcsfs/tests/test_core.py

-    placeholder = f"{base_dir}/folder_with_placeholder/"
+        res = await gcs._info(path)
+        # Should prefer directory info over marker
+        assert res["extra"] == "info"


Let's also assert the type of the result here, as in the other cases.

Mahalaxmibejugam · 2026-04-02T20:25:40Z

gcsfs/tests/test_core.py

-
-    assert found_names == expected_basenames
+@pytest.mark.asyncio
+async def test_info_parallel(gcs):


Consider refactoring these scenarios using @pytest.mark.parametrize. This will make the test cleaner by removing the need for manual mock resets (like mock_get_dir.side_effect = None) and ensure that a failure in one case doesn't prevent the remaining cases from running. Alternatively, check whether we can break the test down into smaller ones.
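
A rough sketch of what a parametrized version could look like. `resolve_info` is a hypothetical stand-in for the logic under test, not the real `_info`, and the case data is invented for illustration. The sketch uses `asyncio.run` inside a synchronous test so that it does not depend on an asyncio pytest plugin; the real test is an `async def` marked with `@pytest.mark.asyncio`.

```python
import asyncio

import pytest


async def resolve_info(file_result, dir_result):
    """Hypothetical stand-in: prefer a real file, else the directory result."""
    is_file = isinstance(file_result, dict) and not file_result["name"].endswith("/")
    if is_file:
        return file_result
    if isinstance(dir_result, Exception):
        raise dir_result
    return dir_result


# Each tuple is one independent scenario: (file_result, dir_result, expected_type).
CASES = [
    ({"name": "b/f", "type": "file"}, FileNotFoundError("b/f"), "file"),
    (FileNotFoundError("b/d"), {"name": "b/d", "type": "directory"}, "directory"),
    ({"name": "b/d/", "type": "file"}, {"name": "b/d", "type": "directory"}, "directory"),
]


@pytest.mark.parametrize(
    "file_result, dir_result, expected_type",
    CASES,
    ids=["file-found", "file-missing", "marker-only"],
)
def test_info_resolution(file_result, dir_result, expected_type):
    # Every case gets fresh inputs, so there is no shared mock state to reset.
    res = asyncio.run(resolve_info(file_result, dir_result))
    assert res["type"] == expected_type
```

Because pytest collects each parameter set as a separate test, a failure in one scenario is reported on its own and the others still run.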

Mahalaxmibejugam · 2026-04-02T20:28:57Z

gcsfs/concurrency.py

Let's add a test file to cover the code in this file.
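
As an illustration of the kind of behaviour such a test file might pin down, here is a sketch built around a hypothetical helper, `first_completed`. It is modelled only on the `as (tasks, done, pending)` shape visible in the diff above and is not the code in gcsfs/concurrency.py. The coroutines are wrapped in tasks explicitly because `asyncio.wait` no longer accepts bare coroutines as of Python 3.11.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def first_completed(coros):
    """Hypothetical helper: run coros as tasks, yield once one finishes,
    and cancel whatever is still pending when the block exits."""
    tasks = [asyncio.create_task(c) for c in coros]
    try:
        done, pending = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED
        )
        yield tasks, done, pending
    finally:
        for task in tasks:
            if not task.done():
                task.cancel()
        # Wait for cancellations to land and retrieve any exceptions.
        await asyncio.gather(*tasks, return_exceptions=True)


async def check_cancels_pending():
    """Test-style check: the slow task must be cancelled after the block."""

    async def fast():
        return "fast"

    async def slow():
        await asyncio.sleep(60)
        return "slow"

    async with first_completed([fast(), slow()]) as (tasks, done, pending):
        fast_task, slow_task = tasks
        assert fast_task in done
        assert slow_task in pending
    assert slow_task.cancelled()
    return fast_task.result()
```

A real test file would presumably also cover the case where the first task to finish raises, and the case where the surrounding block itself raises, since both go through the same cleanup path.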

Mahalaxmibejugam · 2026-04-02T20:45:22Z

File Info: Results are mixed but generally neutral, showing minor speedups of up to 24.6% in high process count runs. One outlier showed a minor regression in deep regional tests.

Is the speedup for file paths related to the changes in this PR? I am assuming it is variance and not related to this PR as the latency for file paths shouldn't be impacted by this change, let me know if I am missing something here.

yuxin00j marked this pull request as ready for review March 26, 2026 02:22

yuxin00j changed the title ~~Enhance _info method to check file and directory info in parallel.Optimize info~~ Enhance _info method to check file and directory info in parallel Mar 26, 2026

yuxin00j force-pushed the optimize-info branch from e3b924e to b113b77 Compare March 26, 2026 08:14

jasha26 reviewed Mar 27, 2026

yuxin00j force-pushed the optimize-info branch from b113b77 to 9189002 Compare March 31, 2026 03:03

yuxin00j requested a review from jasha26 March 31, 2026 05:58

jasha26 approved these changes Mar 31, 2026

yuxin00j and others added 11 commits April 2, 2026 04:26

Enhance _info method to check file and directory info in parallel.

43b9ae7

perf: Add latency logging for _get_object and _get_directory_info calls.

59d8fa6

Improve test coverage for parallel _info

2b405a3

Optimize _info with early return using asyncio.wait

4e32021

Refactor tasks dictionary to set in _info method per review feedback and format with black

ee8d351

Update gcsfs/core.py

59bc2c2

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Implement try...finally block in _info for robust task cleanup per review feedback

2dd4551

Refactor _info to use parallel_tasks_first_completed and fix logic bugs

030965c

Move parallel_tasks_first_completed import to top level in core.py

25f509f

Add test_info_parallel_dir_first to cover directory listing finishes first scenario

d3ef171

Fix info regression and directory marker check

f361127

yuxin00j force-pushed the optimize-info branch from b4d835e to f361127 Compare April 2, 2026 04:26

Address review comments: ensure Python 3.11 compatibility in asyncio.wait and simplify parallel task evaluation in _info

625ef11

yuxin00j mentioned this pull request Apr 2, 2026

Parallelize bucket get and listing in _info for bucket paths #780

Open

Mahalaxmibejugam reviewed Apr 2, 2026

Mahalaxmibejugam suggested changes Apr 2, 2026

yuxin00j wants to merge 12 commits into fsspec:main from ankitaluthra1:optimize-info

4 participants
