This repository was archived by the owner on Nov 24, 2025. It is now read-only.
The GitHub Extractor package is a Python library for extracting data from GitHub.
It provides functions to fetch information about repositories, including the languages used, releases, contributors, topics, workflows and more, with error handling and configuration support.
Features
List organizations for a user from GitHub.
List repositories for a user from GitHub.
List repositories for a specified organization from GitHub.
Support for authentication using GitHub API tokens.
Filtering of organizations and repositories based on given patterns.
Pagination handling for API requests.
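The package handles pagination for you. For background, GitHub's REST API splits long lists into pages and advertises the next page in the `Link` response header. The sketch below shows one way to pull the `next` URL out of such a header. It is not taken from the package's source, and `next_page_url` is a hypothetical name.

```python
import re
from typing import Optional

# Matches one entry of a Link header: <URL>; rel="next"
_LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL marked rel="next" in a GitHub Link header, or None.

    Hypothetical helper for illustration; not part of wolfsoftware.github_extractor.
    """
    if not link_header:
        return None
    for url, rel in _LINK_ENTRY.findall(link_header):
        # One entry may carry several space-separated rel values.
        if "next" in rel.split():
            return url
    return None


header = (
    '<https://api.github.com/user/repos?page=2>; rel="next", '
    '<https://api.github.com/user/repos?page=5>; rel="last"'
)
print(next_page_url(header))  # https://api.github.com/user/repos?page=2
```

A client requests a page, processes it, then repeats with the returned URL until the function returns None.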
Installation
You can install GitHub Extractor via pip:
pip install wolfsoftware.github-extractor
Usage
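Getting Token Information
You can get basic information about the given token.
Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |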
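GitHub exposes one piece of token information in the `X-OAuth-Scopes` response header, which lists the scopes granted to a classic personal access token. Scopes matter for this package because a token without the right scope can lead to a `NotFoundError`. Whether the package reports scopes is an assumption. `parse_oauth_scopes` below is a hypothetical helper that only splits such a header value.

```python
from typing import List, Optional


def parse_oauth_scopes(header_value: Optional[str]) -> List[str]:
    """Split an X-OAuth-Scopes header value into individual scope names.

    Hypothetical helper: GitHub sends the scopes as a comma-separated list.
    """
    if not header_value:
        return []
    return [scope.strip() for scope in header_value.split(",") if scope.strip()]


scopes = parse_oauth_scopes("repo, read:org, workflow")
print(scopes)  # ['repo', 'read:org', 'workflow']
print("read:org" in scopes)  # True
```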
Listing Organizations
You can list the organisations you are a member of. The function is provided under both British (list_organisations) and American (list_organizations) English spellings.
```python
from wolfsoftware.github_extractor import list_organisations, list_organizations

config = {
    "token": "your_github_token",
    "ignore_orgs": ["Test*"],
}

# Using British English spelling
organisations = list_organisations(config)

# Using American English spelling
organisations_us = list_organizations(config)
```
Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |
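With `slugs` enabled, the results are reduced to a list of names. As an illustration only, the sketch below assumes the full results are dictionaries carrying GitHub's `login` field (the field GitHub uses for organisation names) and shows the equivalent reduction. The helper `to_slugs` is hypothetical, and the field the package actually keeps is an assumption.

```python
from typing import Any, Dict, List


def to_slugs(results: List[Dict[str, Any]], key: str = "login") -> List[str]:
    """Reduce full API results to a plain list of names.

    Hypothetical illustration of slugs=True; the key is an assumption
    ("login" for organisations, "full_name" for repositories).
    """
    return [item[key] for item in results]


# Hypothetical sample input shaped like GitHub organisation objects.
orgs = [{"login": "GitHubToolbox", "id": 1}, {"login": "TestOrg", "id": 2}]
print(to_slugs(orgs))  # ['GitHubToolbox', 'TestOrg']
```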
Filtering Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| include_orgs | No | A list of organisation names to include in the results. |
| ignore_orgs | No | A list of organisation names to exclude from the results. |
| get_members | No | Include organisation members in the results. |
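The `Test*` value in the example above suggests that `include_orgs` and `ignore_orgs` accept shell-style wildcard patterns. Assuming that is the case, and assuming that an include list restricts results before the ignore list removes matches, the filtering behaves like this sketch built on Python's standard `fnmatch` module. The helper `filter_names` is hypothetical and is not the package's implementation. Case-sensitive matching is also an assumption.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def filter_names(
    names: Iterable[str],
    include: Optional[List[str]] = None,
    ignore: Optional[List[str]] = None,
) -> List[str]:
    """Keep names matching any include pattern, then drop those matching an ignore pattern.

    An empty or missing include list means "include everything".
    fnmatchcase keeps matching case-sensitive on every platform.
    """
    kept = []
    for name in names:
        if include and not any(fnmatchcase(name, pattern) for pattern in include):
            continue
        if ignore and any(fnmatchcase(name, pattern) for pattern in ignore):
            continue
        kept.append(name)
    return kept


# Hypothetical organisation names.
orgs = ["GitHubToolbox", "TestOrg", "Testing-Sandbox", "wolfsoftware"]
print(filter_names(orgs, ignore=["Test*"]))  # ['GitHubToolbox', 'wolfsoftware']
print(filter_names(orgs, include=["Git*", "Test*"], ignore=["Testing*"]))
# ['GitHubToolbox', 'TestOrg']
```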
Listing User Repositories
You can list repositories for a user, with optional filters.
Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |
| username | No | The GitHub username to list repositories for (the authenticated user is used if this is not supplied). |
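When `username` is omitted, the authenticated user's repositories are listed. In GitHub REST API terms that is the difference between `GET /user/repos`, which lists repositories the authenticated user can access (including private ones), and `GET /users/{username}/repos`, which lists a named user's public repositories. This is consistent with `skip_private` applying only to the authenticated user. Which endpoints the package actually calls is an assumption. The hypothetical `user_repos_url` helper below only illustrates the choice.

```python
from typing import Optional
from urllib.parse import quote

API_ROOT = "https://api.github.com"


def user_repos_url(username: Optional[str] = None, per_page: int = 100) -> str:
    """Build the list-repositories URL for a named user or the authenticated user.

    Hypothetical helper; 100 is the largest page size GitHub accepts.
    """
    if username:
        # Public repositories of the named user.
        return f"{API_ROOT}/users/{quote(username)}/repos?per_page={per_page}"
    # Repositories visible to the user that owns the token, including private ones.
    return f"{API_ROOT}/user/repos?per_page={per_page}"


print(user_repos_url())           # https://api.github.com/user/repos?per_page=100
print(user_repos_url("octocat"))  # https://api.github.com/users/octocat/repos?per_page=100
```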
Additional Data Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| get_branches | No | Add details about all branches to each repository. |
| get_contributors | No | Add details about all contributors to each repository. |
| get_languages | No | Add the list of identified languages to each repository. |
| get_releases | No | Add details about all releases to each repository. |
| get_tags | No | Add details about all tags to each repository. |
| get_topics | No | Add the list of defined topics to each repository. |
| get_workflows | No | Add details about all workflows to each repository. |
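GitHub serves each kind of additional data from a separate per-repository endpoint. Enabling a `get_*` flag therefore plausibly costs at least one extra API request per repository, and every request counts against the rate limit. The mapping below pairs each flag with the corresponding GitHub REST endpoint. That the package calls exactly these endpoints is an assumption, and `extra_requests` is a hypothetical helper.

```python
from typing import Dict, List

# GitHub REST endpoint paths (relative to the repository URL) for each flag.
EXTRA_DATA_ENDPOINTS: Dict[str, str] = {
    "get_branches": "branches",
    "get_contributors": "contributors",
    "get_languages": "languages",
    "get_releases": "releases",
    "get_tags": "tags",
    "get_topics": "topics",
    "get_workflows": "actions/workflows",
}


def extra_requests(config: Dict[str, object], full_name: str) -> List[str]:
    """List the extra URLs implied by the enabled get_* flags for one repository.

    Hypothetical helper: full_name is "owner/repo".
    """
    base = f"https://api.github.com/repos/{full_name}"
    return [
        f"{base}/{path}"
        for flag, path in EXTRA_DATA_ENDPOINTS.items()
        if config.get(flag)
    ]


config = {"token": "your_github_token", "get_languages": True, "get_workflows": True}
for url in extra_requests(config, "GitHubToolbox/github-extractor-package"):
    print(url)
# https://api.github.com/repos/GitHubToolbox/github-extractor-package/languages
# https://api.github.com/repos/GitHubToolbox/github-extractor-package/actions/workflows
```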
Filtering Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| include_names | No | A list of repository names to include in the results. |
| ignore_names | No | A list of repository names to exclude from the results. |
| include_repos | No | A list of full repository names (organisation name/repository name) to include in the results. |
| ignore_repos | No | A list of full repository names (organisation name/repository name) to exclude from the results. |
| skip_private | No | Exclude private repositories (applies to the authenticated user only). |
Ignore and include names use the full name of the repository, i.e. the organisation name and repository name separated by a slash, e.g. GitHubToolbox/github-extractor-package.
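As stated above, the include and ignore filters work on full repository names. The sketch below shows that kind of matching for `include_repos` and `ignore_repos`, together with `skip_private`, against dictionaries shaped like GitHub repository objects (which carry `full_name` and `private` fields). Exact-name comparison and the order of the checks are assumptions. `keep_repo` is a hypothetical helper, not part of the package.

```python
from typing import Any, Dict, List, Optional


def keep_repo(
    repo: Dict[str, Any],
    include_repos: Optional[List[str]] = None,
    ignore_repos: Optional[List[str]] = None,
    skip_private: bool = False,
) -> bool:
    """Return True if the repository passes the filters (hypothetical sketch)."""
    # skip_private drops repositories that GitHub marks as private.
    if skip_private and repo.get("private", False):
        return False
    # Both lists compare against the full "organisation/repository" name.
    if include_repos and repo["full_name"] not in include_repos:
        return False
    if ignore_repos and repo["full_name"] in ignore_repos:
        return False
    return True


repos = [
    {"full_name": "GitHubToolbox/github-extractor-package", "private": False},
    {"full_name": "GitHubToolbox/sandbox", "private": True},  # hypothetical repository
]
kept = [r["full_name"] for r in repos if keep_repo(r, ignore_repos=["GitHubToolbox/sandbox"])]
print(kept)  # ['GitHubToolbox/github-extractor-package']
```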
Listing Repositories by Organization
You can list repositories for a specific organization, with optional filters.
Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |
| org_name | No | The GitHub organisation to list repositories for. |
Additional Data Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| get_branches | No | Add details about all branches to each repository. |
| get_contributors | No | Add details about all contributors to each repository. |
| get_languages | No | Add the list of identified languages to each repository. |
| get_releases | No | Add details about all releases to each repository. |
| get_tags | No | Add details about all tags to each repository. |
| get_topics | No | Add the list of defined topics to each repository. |
| get_workflows | No | Add details about all workflows to each repository. |
Filtering Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| include_names | No | A list of repository names to include in the results. |
| ignore_names | No | A list of repository names to exclude from the results. |
| include_repos | No | A list of full repository names (organisation name/repository name) to include in the results. |
| ignore_repos | No | A list of full repository names (organisation name/repository name) to exclude from the results. |
| skip_private | No | Exclude private repositories (applies to the authenticated user only). |
Ignore and include names use the full name of the repository, i.e. the organisation name and repository name separated by a slash, e.g. GitHubToolbox/github-extractor-package.
Listing all Organisation Repositories
You can list all repositories for all organisations you're a member of.
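Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |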
Additional Data Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| get_branches | No | Add details about all branches to each repository. |
| get_contributors | No | Add details about all contributors to each repository. |
| get_languages | No | Add the list of identified languages to each repository. |
| get_releases | No | Add details about all releases to each repository. |
| get_tags | No | Add details about all tags to each repository. |
| get_topics | No | Add the list of defined topics to each repository. |
| get_workflows | No | Add details about all workflows to each repository. |
Filtering Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| include_names | No | A list of repository names to include in the results. |
| ignore_names | No | A list of repository names to exclude from the results. |
| include_repos | No | A list of full repository names (organisation name/repository name) to include in the results. |
| ignore_repos | No | A list of full repository names (organisation name/repository name) to exclude from the results. |
| skip_private | No | Exclude private repositories (applies to the authenticated user only). |
Ignore and include names use the full name of the repository, i.e. the organisation name and repository name separated by a slash, e.g. GitHubToolbox/github-extractor-package.
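Conceptually, listing all organisation repositories combines two simpler steps: list the organisations, then list each organisation's repositories. The sketch below shows that composition with both fetch steps passed in as plain functions, so it runs without network access. The function names and the flat, in-order result are assumptions, not the package's documented behaviour.

```python
from typing import Any, Callable, Dict, List

Repo = Dict[str, Any]


def all_org_repos(
    list_orgs: Callable[[], List[str]],
    list_repos_for_org: Callable[[str], List[Repo]],
) -> List[Repo]:
    """Collect the repositories of every organisation into one list.

    Hypothetical composition sketch: the two callables stand in for real API
    calls, so the cost is one organisation listing plus one repository listing
    per organisation (each of which may span several pages).
    """
    combined: List[Repo] = []
    for org in list_orgs():
        combined.extend(list_repos_for_org(org))
    return combined


# Canned data standing in for API responses (hypothetical repositories).
fake_repos = {
    "GitHubToolbox": [{"full_name": "GitHubToolbox/github-extractor-package"}],
    "wolfsoftware": [{"full_name": "wolfsoftware/example"}],
}
result = all_org_repos(lambda: list(fake_repos), lambda org: fake_repos[org])
print([r["full_name"] for r in result])
# ['GitHubToolbox/github-extractor-package', 'wolfsoftware/example']
```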
Listing all Visible Repositories
You can list repositories that you are able to access.
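Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |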
Additional Data Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| get_branches | No | Add details about all branches to each repository. |
| get_contributors | No | Add details about all contributors to each repository. |
| get_languages | No | Add the list of identified languages to each repository. |
| get_releases | No | Add details about all releases to each repository. |
| get_tags | No | Add details about all tags to each repository. |
| get_topics | No | Add the list of defined topics to each repository. |
| get_workflows | No | Add details about all workflows to each repository. |
Filtering Parameters
| Name | Required | Purpose |
| --- | --- | --- |
| include_names | No | A list of repository names to include in the results. |
| ignore_names | No | A list of repository names to exclude from the results. |
| include_repos | No | A list of full repository names (organisation name/repository name) to include in the results. |
| ignore_repos | No | A list of full repository names (organisation name/repository name) to exclude from the results. |
| skip_private | No | Exclude private repositories (applies to the authenticated user only). |
Ignore and include names use the full name of the repository, i.e. the organisation name and repository name separated by a slash, e.g. GitHubToolbox/github-extractor-package.
Exceptions
The following custom exceptions are used:
| Name | Purpose |
| --- | --- |
| AuthenticationError | Raised when authentication fails. This is caused by an invalid token. |
| MissingOrgNameError | Raised when the organization name is missing. |
| MissingTokenError | Raised when the GitHub API token is required but missing. |
| NotFoundError | Raised when a requested resource is not found. This is commonly caused by the token lacking the required scope. |
| RateLimitExceededError | Raised when the GitHub API rate limit is exceeded. |
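These exception names map naturally onto GitHub's HTTP responses: an invalid token yields `401 Unauthorized`, a resource the token cannot see yields `404 Not Found`, and an exhausted primary rate limit yields `403` or `429` with the `x-ratelimit-remaining` header set to `0`. The sketch below defines local stand-in classes with the same names, because the package's import path for its exceptions is not shown here. The hypothetical `raise_for_github_status` performs that mapping; whether the package maps responses the same way is an assumption.

```python
from typing import Mapping


# Local stand-ins that mirror the names in the table above; they are NOT
# imported from wolfsoftware.github_extractor.
class AuthenticationError(Exception):
    pass


class NotFoundError(Exception):
    pass


class RateLimitExceededError(Exception):
    pass


def raise_for_github_status(status: int, headers: Mapping[str, str]) -> None:
    """Translate a GitHub API response status into one of the exceptions above.

    headers is assumed to use lowercase keys (HTTP header names are case-insensitive).
    """
    if status in (403, 429) and headers.get("x-ratelimit-remaining") == "0":
        raise RateLimitExceededError("GitHub API rate limit exceeded")
    if status == 401:
        raise AuthenticationError("Authentication failed: check the token")
    if status == 404:
        raise NotFoundError("Resource not found: check the name and the token's scopes")


try:
    raise_for_github_status(429, {"x-ratelimit-remaining": "0"})
except RateLimitExceededError as exc:
    print(f"Rate limited: {exc}")  # Rate limited: GitHub API rate limit exceeded
```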