GitHub Graph Explorer

A tool for collecting and analyzing GitHub collaboration data using Social Network Analysis. This project helps visualize and understand team dynamics through GitHub activity.

For background on the theory and application of Social Network Analysis to engineering teams, see Theory.md.

Usage

Dependencies

uv - for packaging and running the project
GitHub Personal Access Token - with Discussions, Issues, Pull requests scopes for accessing data.
docker - for running the Neo4j database (optional, but recommended for Neo4j output)

Installation

As a Python Library

Install the package from your repository or local directory:

pip install gh-graph-explorer
# or for editable install from local clone:
pip install -e .
# or with uv:
uv pip install -e .

Then import and use the API in your Python code:

from gh_graph_explorer import collect, analyze, get_edges, bipartite_collapse, __version__

# Example: collect data programmatically
import asyncio

orgs = [{"username": "octocat", "org": "github"}]
result = asyncio.run(collect(
    orgs,
    since_iso="2025-01-01",
    until_iso="2025-01-15",
    output="csv",
    output_file="data.csv"
))
print(result)

# Example: analyze data
analysis = analyze(source="csv", file="data.csv")
print(analysis)

# Example: retrieve edges
edges = get_edges(source="csv", file="data.csv")
for edge in edges:
    print(edge)

Using the CLI

The package includes a command-line interface. After installation, the gh-graph-explorer command will be available:

gh-graph-explorer --help

Set up your GitHub Personal Access Token
- Create a .env file in the root directory of the project.
- Add your GitHub token to the .env file:
```
GITHUB_TOKEN=your_github_token_here
```

For Docker Setup

In case you don't have uv installed or prefer to run the project in a container (note: currently doesn't work with neo4j or jupyter):

docker build -f Dockerfile.local -t gh-graph-explorer-local .
chmod +x ./run-local.sh
Run all the following commands with ./run-local.sh, for example: ./run-local.sh collect ...

Collecting Data

Use the CLI or library to collect GitHub data:

# Print output (default: last 1 day)
gh-graph-explorer collect --orgs data/orgs.json --output print

# CSV output
gh-graph-explorer collect --orgs data/orgs.json --output csv --output-file github_data.csv

# Neo4j output
gh-graph-explorer collect --orgs data/orgs.json --output neo4j --neo4j-uri bolt://localhost:7687

# Using specific date ranges (recommended for precise control)
gh-graph-explorer collect --orgs data/orgs.json --since-iso 2025-05-01 --until-iso 2025-05-20 --output csv --output-file github_data.csv

# Using full ISO datetime format
gh-graph-explorer collect --orgs data/orgs.json --since-iso 2025-05-01T00:00:00 --until-iso 2025-05-20T23:59:59 --output neo4j

Note: The CLI uses --since-iso and --until-iso to specify the date range for data collection. Both parameters accept either YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS formats. If not provided, the default is the last day.

Analyzing Data

# Analyze from CSV file
gh-graph-explorer analyze --source csv --file github_data.csv

# Analyze from Neo4j
gh-graph-explorer analyze --source neo4j --neo4j-uri bolt://localhost:7687

The analyzer will use the appropriate loader (CSVLoader or Neo4jLoader) to load the data, create a networkx MultiGraph, and then use the GraphAnalyzer's analyze method to display information about the graph, such as the number of nodes and edges.

If you want to customize the Neo4j query for analysis, you can also use the --neo4j-query parameter:

gh-graph-explorer analyze --source neo4j --neo4j-query "MATCH (source)-[rel]->(target) WHERE rel.created_at > \"2025-04-01\" RETURN source.name AS source, target.url AS target, type(rel) AS type, properties(rel) AS properties" --neo4j-uri bolt://localhost:7687

Retrieving Edges

# Get edges from CSV
gh-graph-explorer get-edges --source csv --file github_data.csv --output print

# Save edges to neo4j
gh-graph-explorer get-edges --source csv --file github_data.csv --output neo4j --neo4j-uri bolt://localhost:7687

Transforming Data

Bipartite Graph Collapse

# Collapse bipartite graph from CSV
gh-graph-explorer transform bipartite_collapse --source csv --file github_data.csv --output-file collapsed.csv

Or use the library API:

from gh_graph_explorer import bipartite_collapse

bipartite_collapse(source="csv", file="github_data.csv", output_file="collapsed.csv")

Analyzing Data with Jupyter Lab + CSVs

uv run jupyter lab

There is a template notebook available at notebooks/graph-explorer-template.ipynb that you can use to explore the data interactively.

Using the GitHub Action

You can use this tool as a GitHub Action in your own repositories. This will automatically collect GitHub repository data and commit the results to your repository.

Setting up the Action

Create a .github/workflows/collect-github-data.yml file in your repository with the following content:

name: Collect GitHub Graph Data

on:
  # Run daily at midnight
  schedule:
    - cron: '0 0 * * *'
  
  # Allow manual trigger
  workflow_dispatch:

jobs:
  collect-data:
    runs-on: ubuntu-latest
    
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      
      - name: Run GitHub Graph Data Collector
        uses: yourusername/gh-graph-explore@v1
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          orgs_file: 'orgs.json'
          output_file: 'github_data.csv'
          commit_message: 'Update GitHub organization data [Skip CI]'

Creating an orgs.json file

Create an orgs.json file in the root of your repository with the following structure:

[
  {
    "username": "dependabot",
    "org": "octocat"
  },
  {
    "username": "user1",
    "org": "organization1"
  },
  {
    "username": "user2",
    "org": "organization2"
  }
]

Action Inputs

The GitHub Action accepts the following inputs:

github_token: GitHub token with read access to orgs (required)
orgs_file: Path to the JSON file containing organization information (default: orgs.json)
since_iso: Start date in ISO format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS) (optional)
until_iso: End date in ISO format (YYYY-MM-DD or YYYY-MM-DDTHH:MM:SS) (optional)
output_file: Output file path for CSV (default: github_data.csv)
commit_message: Commit message for the CSV file update (default: Update GitHub repository data)

Note: For CLI usage, use --since-iso and --until-iso as shown above. The days parameter is only relevant for the GitHub Action workflow.

Setup with Claude Desktop

{
    "mcpServers": {
        "gh-graph-explorer": {
            "command": "uv",
            "args": [
                "--directory",
                "path to directory",
                "run",
                "mcp_server.py"
            ],
            "env": {
                "GITHUB_TOKEN": "*****",
                "NEO4J_PASSWORD": "password",
                "NEO4J_USERNAME": "neo4j",
                "NEO4J_URI": "bolt://localhost:7687"
            }
        }
      }
}

Useful queries

Filter by date

MATCH (source)-[rel:PR_REVIEW_APPROVED]->(target)  WHERE rel.created_at > "2025-04-15" RETURN * limit 100

Create a user Projection


MATCH (u1:User)-[] -> (g:GitHubObject) <- []-(u2:User)
WHERE u1.name < u2.name
WITH u1, u2, collect(g.name) as gitobjects, count(g) as weight
MERGE (u1)-[r:CONNECTED]->(u2)
SET r.name = gitobjects, r.weight = weight

View the user projection

MATCH p = (u1:User)-[]-(u2:User) Return p limit 200

Bipartite Graph Collapse Transformation

The BipartiteCollapser class in src/transformations/bipartite_collapser.py provides functionality to transform bipartite graphs by collapsing one set of nodes and creating direct connections between the other set.

Usage

Initialize with a Loader strategy
The collapser requires a loader implementing the Loader abstract class (see src/load_strategies/base.py).

from src.transformations.bipartite_collapser import BipartiteCollapser
from src.load_strategies.csv_loader import CSVLoader

loader = CSVLoader('path/to/input.csv')
collapser = BipartiteCollapser(loader)

Run the transformation pipeline
Use the run method to load, transform, and save the collapsed graph edges to a CSV file.
```
collapser.run('path/to/output.csv')
```
This will:
- Load the bipartite graph using the loader
- Collapse resource nodes, connecting users directly
- Save the resulting edges to the specified CSV file

Command-Line Example

You can also use the transformation in a script:

from src.transformations.bipartite_collapser import BipartiteCollapser
from src.load_strategies.csv_loader import CSVLoader

if __name__ == "__main__":
    loader = CSVLoader("github_bipartite_collapse.csv")
    collapser = BipartiteCollapser(loader)
    collapser.run("collapsed_edges.csv")

Output Format

The output CSV will contain the following columns:

source: Source node (user)
target: Target node (user)
type: Edge type
title: Resource title
created_at: Creation date
url: Resource URL

Notes

Ensure your loader is compatible with the expected graph format.
The transformation assumes resource nodes are identified by URLs (starting with https://).

For more details, see the source code in src/transformations/bipartite_collapser.py.

Name		Name	Last commit message	Last commit date
Latest commit History 105 Commits
.vscode		.vscode
data		data
examples		examples
images		images
notebooks		notebooks
old		old
src/gh_graph_explorer		src/gh_graph_explorer
.gitignore		.gitignore
.python-version		.python-version
Dockerfile		Dockerfile
Dockerfile.local		Dockerfile.local
LICENSE.md		LICENSE.md
README.md		README.md
Theory.md		Theory.md
__init__.py		__init__.py
action.yml		action.yml
docker-compose.yml		docker-compose.yml
docker-entrypoint.sh		docker-entrypoint.sh
mcp_server.py		mcp_server.py
pyproject.toml		pyproject.toml
run-local.sh		run-local.sh
uv.lock		uv.lock

Section	Description
Dependencies	Required tools and libraries
Installation	Setup instructions
Collecting Data	Data collection commands
Analyzing Data	Data analysis commands
Using the GitHub Action	Automating data collection
Setup with Claude Desktop	Example configuration
Useful Queries	Example Neo4j queries

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

GitHub Graph Explorer

Table of Contents

Usage

Dependencies

Installation

As a Python Library

Using the CLI

For Docker Setup

Collecting Data

Analyzing Data

Retrieving Edges

Transforming Data

Bipartite Graph Collapse

Analyzing Data with Jupyter Lab + CSVs

Using the GitHub Action

Setting up the Action

Creating an orgs.json file

Action Inputs

Setup with Claude Desktop

Useful queries

Bipartite Graph Collapse Transformation

Usage

Command-Line Example

Output Format

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

geramirez/gh-graph-explorer

Folders and files

Latest commit

History

Repository files navigation

GitHub Graph Explorer

Table of Contents

Usage

Dependencies

Installation

As a Python Library

Using the CLI

For Docker Setup

Collecting Data

Analyzing Data

Retrieving Edges

Transforming Data

Bipartite Graph Collapse

Analyzing Data with Jupyter Lab + CSVs

Using the GitHub Action

Setting up the Action

Creating an orgs.json file

Action Inputs

Setup with Claude Desktop

Useful queries

Bipartite Graph Collapse Transformation

Usage

Command-Line Example

Output Format

Notes

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages