
[Confluence] Fix pagination for get_all_* methods and unify _get_paged across Cloud/Server #1616

Open
Zircoz wants to merge 7 commits into atlassian-api:master from Zircoz:master

Conversation

Zircoz commented Feb 15, 2026

Summary

Fixes #1598

get_all_pages_from_space and related get_all_* methods only returned the first page of results because they called self.get() directly instead of self._get_paged(). Additionally, pagination broke when the Confluence API returned _links.next as a plain URL string rather than a {"href": "..."} dict.


Changes

Pagination fix (core — #1598)

  • Switch get_all_* methods to use _get_paged on Server so they return fully-paginated generators instead of single-page dicts
  • Add get_all_pages_from_space and get_all_blog_posts_from_space to Cloud — these were missing from the Cloud implementation entirely
  • Unify _get_paged into ConfluenceBase — removed duplicate implementations from ConfluenceCloudBase and ConfluenceServerBase, replacing them with a single method that handles both str and dict formats for _links.next
  • Fix relative pagination URLs — when the API returns a relative path (e.g. /rest/api/content?start=25), the scheme and host are extracted from self.url via urlparse and correctly prepended
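
The next-link handling described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `resolve_next_url`, `iter_paged`, and the `fetch` callable are hypothetical names, and the sketch assumes each page is a decoded JSON dict with `results` and `_links.next` keys.

```python
from urllib.parse import urlparse


def resolve_next_url(base_url, next_link):
    """Return an absolute URL for the next page, or None when there is none.

    `next_link` is the value of `_links.next`: either a plain string or a
    {"href": "..."} dict. Hypothetical helper, not the library's code.
    """
    if isinstance(next_link, dict):
        next_link = next_link.get("href")
    if not next_link:
        return None
    if urlparse(next_link).netloc:
        return next_link  # already absolute, use as-is
    # Relative path: prepend the scheme and host taken from the base URL.
    base = urlparse(base_url)
    return f"{base.scheme}://{base.netloc}/{next_link.lstrip('/')}"


def iter_paged(fetch, base_url, first_url):
    """Yield every result across all pages.

    `fetch(url)` is an assumed callable that returns one decoded page.
    """
    url = first_url
    while url:
        page = fetch(url)
        yield from page.get("results", [])
        url = resolve_next_url(base_url, page.get("_links", {}).get("next"))
```

Because `iter_paged` is a generator, each page is requested only when the caller has consumed the previous one.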

Routing improvements (Confluence wrapper)

  • Fix URL substring sanitization (CodeQL) — replaced naive "atlassian.net" in url checks with urlparse(url).hostname to prevent false matches on paths like evil.com/fake/atlassian.net/
  • Recognize api.atlassian.com — the Confluence wrapper now correctly routes OAuth2 API gateway URLs to ConfluenceCloud
  • Support explicit cloud= kwarg — allows callers to force Cloud or Server routing regardless of URL heuristics
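
As a rough illustration of the routing rules above, the check might look like the sketch below. `is_cloud_url` and `CLOUD_HOSTS` are hypothetical names rather than the wrapper's actual code, and the sketch assumes the URL includes a scheme so that `urlparse` can find the hostname.

```python
from urllib.parse import urlparse

# Assumed set of hostnames treated as Cloud; taken from the PR description.
CLOUD_HOSTS = ("atlassian.net", "api.atlassian.com")


def is_cloud_url(url, cloud=None):
    """Decide Cloud vs Server routing (hypothetical helper).

    An explicit `cloud` value always wins over the URL heuristic.
    """
    if cloud is not None:
        return bool(cloud)
    # Compare against the parsed hostname only, never the full URL string,
    # so a path segment such as /fake/atlassian.net/ cannot match.
    hostname = (urlparse(url).hostname or "").lower()
    return any(hostname == h or hostname.endswith("." + h) for h in CLOUD_HOSTS)
```

Matching on an exact hostname or a dot-prefixed suffix also rejects lookalike domains such as `notatlassian.net`, which a plain substring test would accept.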

Out of scope (intentionally not in this PR)

The Cloud api_root and api_version defaults (wiki/api/v2, "2") are unchanged. An earlier iteration of this branch changed these to wiki/rest/api / "latest" (v1 REST API), but that was reverted — it was unrelated to #1598 and would break existing Cloud users who have already set up v2 OAuth scopes. Users who need v1 REST API access can pass api_root="wiki/rest/api", api_version="latest" explicitly to the constructor.


Breaking change

The following methods now return generators instead of dicts:

get_all_pages_from_space, get_all_blog_posts_from_space, get_all_pages_by_label, get_all_blog_posts_by_label, get_all_draft_pages_from_space, get_all_draft_blog_posts_from_space, get_trash_content, get_all_pages_from_space_trash, get_all_blog_posts_from_space_trash

Before (broken — only first page):

result = confluence.get_all_pages_from_space("MYSPACE")
pages = result["results"]  # KeyError on Cloud; only 25 results on Server

After (correct — all pages):

for page in confluence.get_all_pages_from_space("MYSPACE"):
    process(page)

# or collect all at once
pages = list(confluence.get_all_pages_from_space("MYSPACE"))
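
Since the methods now return generators, a caller that only needs the first few items can stop early instead of collecting everything. A minimal sketch, assuming the generator fetches pages lazily; `first_n` is a hypothetical helper, not part of the library:

```python
from itertools import islice


def first_n(items, n):
    """Return at most n items from any iterable without consuming the rest."""
    return list(islice(items, n))


# Usage (assumes an already configured `confluence` client):
# newest = first_n(confluence.get_all_pages_from_space("MYSPACE"), 10)
```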

🤖 Generated with Claude Code

Zircoz changed the title from "[Confluence] Fix get_all_pages_from_space pagination #1598" to "[Confluence] Fix pagination for get_all_* methods and unify _get_paged across Cloud/Server" on Feb 15, 2026
Zircoz force-pushed the master branch 2 times, most recently from 8c2f7f9 to ea2fa33 on February 15, 2026 18:36
Zircoz marked this pull request as ready for review on February 15, 2026 18:37
…d across Cloud/Server

Fixes atlassian-api#1598
Fixes atlassian-api#1480

- Switch 10 get_all_* methods to use _get_paged for full pagination
- Unify _get_paged into ConfluenceBase (remove Cloud/Server duplicates)
- Handle _links.next as both string and dict formats
- Fix relative pagination URLs by prepending base URL correctly
- Fix Cloud api_root from wiki/api/v2 to wiki/rest/api (endpoints use v1 paths)
- Recognize api.atlassian.com in Cloud detection; support explicit cloud= kwarg
- Add routing tests and pagination edge-case tests for both Cloud and Server

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gonchik (Member) commented Mar 3, 2026

So let's check how the AI will work; fortunately, it works for me

gonchik (Member) commented Mar 3, 2026

@Zircoz looks like a lot of extra code

Zircoz (Author) commented Mar 3, 2026

I'll tell Claude to review and rethink. :P

More seriously tho, I'll take a closer look myself with assistance from Claude and come back to ya on this.

Zircoz and others added 6 commits March 4, 2026 10:06
Use urlparse to extract and check the hostname directly instead of
naive substring matching, preventing spoofing via paths like
evil.com/atlassian.net/...

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…itization

[Confluence] Fix pagination for get_all_* methods and CodeQL URL sanitization
- Revert Cloud api_version from "latest" back to "2" (original)
- Revert Cloud api_root from "wiki/rest/api" back to "wiki/api/v2" (original)
- Revert Cloud URL construction: remove api_root suffix appended to self.url
- Simplify _get_paged relative URL resolution: drop api_root-stripping
  branch (was only needed due to the Cloud URL change) and use
  urlparse(self.url).netloc directly
- Update test_init_defaults assertions to match reverted Cloud defaults

The Cloud api_version/api_root/URL changes were unrelated to atlassian-api#1598 and
constituted a breaking change for existing Cloud users. The complex
api_root stripping logic in _get_paged was a direct consequence of that
change and is no longer needed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion-11278499126321725972

Follow-up: CodeQL fix + PR scope simplification

Zircoz (Author) commented Mar 7, 2026

Migration note for maintainers

Hi — a note for when this gets merged, in case it's useful for release notes or a changelog entry:

What changed

Several get_all_* methods that previously returned a single-page dict now return a generator that automatically paginates through all results. This is the fix for #1598 — the old behaviour silently truncated results at the first page.

Affected methods:

  • get_all_pages_from_space
  • get_all_blog_posts_from_space
  • get_all_pages_by_label
  • get_all_blog_posts_by_label
  • get_all_draft_pages_from_space
  • get_all_draft_blog_posts_from_space
  • get_trash_content
  • get_all_pages_from_space_trash
  • get_all_blog_posts_from_space_trash

Upgrading

# Before — broken (only first page returned, dict access required)
result = confluence.get_all_pages_from_space("MYSPACE")
pages = result["results"]

# After — iterate the generator directly
for page in confluence.get_all_pages_from_space("MYSPACE"):
    process(page)

# or collect everything at once
pages = list(confluence.get_all_pages_from_space("MYSPACE"))
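
For code that has to run against both the old and the new return type during an upgrade, a small adapter is one option. This is a sketch under the assumption that the old return value is a dict with a `results` list; `iter_results` is a hypothetical helper, not something this PR adds:

```python
def iter_results(result):
    """Yield items from either a legacy single-page dict or a generator."""
    if isinstance(result, dict):
        # Old behaviour: one page wrapped in {"results": [...]}.
        yield from result.get("results", [])
    else:
        # New behaviour: an iterable that already spans all pages.
        yield from result


# Usage (assumes an already configured `confluence` client):
# for page in iter_results(confluence.get_all_pages_from_space("MYSPACE")):
#     process(page)
```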

Cloud users on v1 REST API

If you use api.atlassian.com with a v1 API token, pass these explicitly to the constructor — the Cloud defaults remain wiki/api/v2:

confluence = Confluence(
    url="https://api.atlassian.com/ex/confluence/<tenant-id>",
    username=email,
    password=api_token,
    cloud=True,
    api_root="wiki/rest/api",
    api_version="latest",
)

A couple of requests:

  • Could you add something along the above lines to the changelog / release notes when merging? Happy to submit a separate PR for that if you'd prefer.
  • Could you trigger the CI checks once more? There were some automated review comments from GitHub Advanced Security that I'd like to make sure are resolved.
  • Any feedback on the approach or anything you'd like changed before merging is very welcome.

Thanks!

🤖 Generated with Claude Code

Development

Successfully merging this pull request may close these issues.

Confluence bug: get_all_pages_from_space only loads first page