feat: contributor start dates for data accuracy #89
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Link.from_github() and update_from_github() now accept an optional `since` parameter. When provided, commits are filtered server-side via the GitHub API and counted manually instead of using the unfiltered contributor.contributions count. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
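The counting behaviour described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `count_contributions` and `FakeRepo` are hypothetical names, the `author` keyword is an assumption (the description only mentions `since`), and returning `None` on zero commits is inferred from the test plan. The `get_commits(author=..., since=...)` keywords mirror PyGithub's `Repository.get_commits`, which the project appears to use.

```python
from datetime import datetime, timezone
from typing import Optional


def count_contributions(
    repo,
    login: str,
    unfiltered_count: int,
    since: Optional[datetime] = None,
) -> Optional[int]:
    """Return the commit count for `login`, optionally clipped to `since`.

    Hypothetical helper: without `since`, the unfiltered count is returned
    unchanged. With `since`, commits are requested with the date filter and
    counted one by one. Zero filtered commits yields None, meaning no link.
    """
    if since is None:
        return unfiltered_count
    filtered = sum(1 for _ in repo.get_commits(author=login, since=since))
    return filtered or None


class FakeRepo:
    """Offline stand-in for a repository object (an assumption for this demo).

    It emulates the date filter locally; the real API applies it server-side.
    """

    def __init__(self, commit_dates):
        self._dates = commit_dates

    def get_commits(self, author, since):
        return [d for d in self._dates if d >= since]


# Placeholder commit dates, not real project data.
repo = FakeRepo(
    [
        datetime(2019, 5, 1, tzinfo=timezone.utc),
        datetime(2022, 2, 1, tzinfo=timezone.utc),
        datetime(2023, 7, 1, tzinfo=timezone.utc),
    ]
)
start = datetime(2021, 1, 1, tzinfo=timezone.utc)
```

With these placeholder dates, omitting `since` keeps the unfiltered count, while passing `start` counts only the two later commits.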
Populates start_date for 62 contributors. Also adds firzaariany, renames aliziel to "Alison Ziel" and pantierra to "Felix Delattre". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
✨ Preview deployed to S3! Visit http://ds-preview-contributor-network-89.s3-website-us-west-2.amazonaws.com/
@gadomski I know you mentioned separating code changes from data updates, but I needed the data updates to verify the code changes. Is it okay to include both in one PR here? cc @maxrjones to verify this is the change you were expecting with #85
LGTM!
This is an interesting change that diverges a bit from how I originally thought about this visualization. When I made this, my vibe was one of "we're all connected", which is why the original version of this included "friends of DevSeed" (non-DevSeed folks who work on similar project portfolios). To me, it's more interesting how we all have projects that we've worked on over a long time, even before coming here. By clipping contributions by start-of-employment date, it feels like we're implicitly saying that all contributions after that date were funded/supported/etc. by DevSeed, which may not be the case. Speaking personally, I do a lot of open source maintenance that I don't bill as DevSeed time. I don't object to the change per se, but I did maybe want to pause and have a chat about "what are we doing here?"
Similarly, I don't object to this not changing. I was just confused by the messaging. E.g., the data didn't seem to fully align with the repo description of "An experimental visualization of contributions at Development Seed to repositories both within and outside of our organization". I hold no strong opinions about whether it's best to change the data, the messaging, or neither if my interpretation/confusion was unusual.
@aboydnw I think I'll throw this to you, since you've got the vision of how we'd like to use this "product" in our comms/marketing/etc. I'm totally ok with clipping contributions on folks' start date, but I do worry that it doesn't really capture the full impact that newer employees might have on community (non-DevSeed) open source repos.
These are thought-provoking questions, thank you both! I think there will be more of a need for some date-based filtering when we get to the next phase. And the way I implemented this doesn't really take that into account. So, I'd suggest we close this for now, go live with what we have, then think about this date-based filtering more holistically when we get to that point. Give me a shout if you disagree and we can reopen this PR. Again, thank you! 🙏
Summary
- Adds `start_date` to `config.toml` so only post-employment commits count as DevSeed contributions
- `Link.from_github()` uses `get_commits(since=...)` and manual counting when a start date is present
- `aliziel` → "Alison Ziel", `pantierra` → "Felix Delattre"
Before / After Comparison
Data re-fetched with `uv run contributor-network fetch` after populating start dates.
Contributors with "—" had no link data in either state (no tracked repo contributions).
Small positive deltas are from normal commit count changes during the re-fetch, not filtering.
Largest filtering impacts (pre-employment commits removed):
Test plan
- `ContributorEntry` model parses both string and inline-table formats
- `ContributorEntry` objects
- `Link.from_github()` with `since` counts only filtered commits
- `Link.from_github()` without `since` behaves as before
- `None` (link not created)
- `update_from_github()` returns `False` on zero commits, triggering link deletion
- `config.toml` with mixed formats parses correctly
🤖 Generated with Claude Code