feat: contributor start dates for data accuracy #89

Closed
aboydnw wants to merge 7 commits into main from feat/contributor-start-dates

Conversation

@aboydnw
Member

aboydnw commented Mar 24, 2026

Summary

  • Adds per-contributor start_date to config.toml so only post-employment commits count as DevSeed contributions
  • Backwards compatible: contributors without a start date (plain string format) still work unchanged
  • Link.from_github() uses get_commits(since=...) and manual counting when a start date is present
  • Existing link JSON files are deleted if a contributor has zero commits after their start date
  • 62 contributors populated with start dates, 1 new contributor added (firzaariany)
  • Name updates: aliziel → "Alison Ziel", pantierra → "Felix Delattre"

Before / After Comparison

Data re-fetched with uv run contributor-network fetch after populating start dates.

| Contributor | Start Date | Repos (before) | Repos (after) | Commits (before) | Commits (after) | Delta |
|---|---|---|---|---|---|---|
| Aimee Barciauskas | 2018-01-08 | 13 | 13 | 505 | 505 | 0 |
| Alex I. Mandel | 2019-11-01 | 4 | 4 | 59 | 59 | 0 |
| Alexandra Kirk | 2021-04-19 | 7 | 5 | 699 | 649 | -50 |
| Alice Rühl | 2020-04-01 | 4 | 4 | 359 | 359 | 0 |
| Anthony Boyd | 2021-10-18 | 9 | 9 | 70 | 70 | 0 |
| Anthony Lukach | 2019-04-18 | 14 | 14 | 870 | 878 | 8 |
| Brianna Corremonte | 2025-03-03 | 1 | 1 | 2 | 2 | 0 |
| Chris Holden | 2024-07-08 | 4 | 3 | 15 | 14 | -1 |
| Chuck Daniels | 2019-07-08 | 7 | 7 | 51 | 51 | 0 |
| Ciaran Sweet | 2024-07-15 | 8 | 8 | 79 | 79 | 0 |
| Daniel Wiesmann | 2023-03-06 | 8 | 5 | 173 | 167 | -6 |
| Daniel da Silva | 2015-01-01 | 4 | 4 | 1,355 | 1,355 | 0 |
| Danny Bauman | 2021-08-09 | — | — | — | — | — |
| David Bitner | 2020-01-01 | 9 | 9 | 300 | 300 | 0 |
| Emma Paz | 2023-03-01 | 4 | 4 | 31 | 31 | 0 |
| Emmanuel Mathot | 2024-08-01 | 16 | 14 | 709 | 591 | -118 |
| Fausto Pérez | 2022-10-03 | — | — | — | — | — |
| Firza Ariany | 2025-11-10 | 0 | 1 | 0 | 20 | +20 |
| Gjore Milevski | 2024-05-02 | 4 | 4 | 598 | 594 | -4 |
| Hanbyul Jo | 2021-09-13 | 3 | 3 | 1,368 | 1,317 | -51 |
| Henry Rodman | 2024-05-22 | 16 | 14 | 188 | 180 | -8 |
| Ian Schuler | 2014-01-01 | — | — | — | — | — |
| Indraneel Purohit | 2025-03-17 | 2 | 2 | 36 | 36 | 0 |
| Isayah Vidito | 2022-10-11 | 9 | 9 | 271 | 272 | 1 |
| Jamison French | 2022-06-27 | 5 | 5 | 59 | 59 | 0 |
| Jennifer Tran | 2020-10-05 | 5 | 5 | 339 | 358 | 19 |
| Jonas Sølvsteen | 2022-08-15 | 10 | 10 | 185 | 185 | 0 |
| Kevin Bullock | 2024-02-05 | — | — | — | — | — |
| Kim Murphy | 2018-02-08 | — | — | — | — | — |
| Kiri Carini | 2022-06-06 | 1 | 1 | 22 | 22 | 0 |
| Kyle Barron | 2023-08-14 | 20 | 15 | 2,113 | 1,980 | -133 |
| Lane Goodman | 2019-01-07 | 4 | 4 | 532 | 532 | 0 |
| Leo Thomas | 2020-05-26 | 5 | 5 | 80 | 80 | 0 |
| Lilly Thomas | 2020-04-01 | 1 | 1 | 8 | 8 | 0 |
| Loïc Houpert | 2025-11-17 | 6 | 5 | 38 | 37 | -1 |
| Marc Farra | 2014-09-15 | 2 | 2 | 711 | 711 | 0 |
| Martha Morrissey | 2025-09-22 | 1 | 1 | 39 | 39 | 0 |
| Max Jones | 2024-07-15 | 15 | 15 | 402 | 281 | -121 |
| Olaf Veerman | 2015-01-01 | 3 | 3 | 6 | 6 | 0 |
| Pete Gadomski | 2024-09-30 | 21 | 16 | 1,922 | 809 | -1,113 |
| Ricardo Duplos | 2015-01-01 | 3 | 3 | 255 | 255 | 0 |
| Saadiq Mohiuddin | 2022-11-28 | 5 | 5 | 371 | 371 | 0 |
| Sajjad Anwar | 2018-01-02 | 7 | 5 | 156 | 113 | -43 |
| Sandra Hoang | 2023-06-28 | 7 | 7 | 539 | 538 | -1 |
| Sanjay Bhangar | 2018-04-30 | 7 | 7 | 643 | 643 | 0 |
| Sean Harkins | 2018-11-15 | 4 | 4 | 25 | 25 | 0 |
| Sharon Lu | 2025-05-12 | — | — | — | — | — |
| Simi Damani | 2025-10-13 | 2 | 2 | 29 | 29 | 0 |
| Soumya Ranjan Mohanty | 2021-12-06 | 3 | 3 | 22 | 22 | 0 |
| Tarashish Mishra | 2022-10-31 | 7 | 6 | 103 | 102 | -1 |
| Vincent Sarago | 2019-01-07 | 20 | 20 | 3,005 | 3,007 | 2 |
| Vitor George | 2018-08-01 | 6 | 6 | 758 | 757 | -1 |
| Wei Ji | 2023-02-01 | 9 | 7 | 349 | 151 | -198 |
| Will Rynearson | 2022-06-13 | 3 | 3 | 102 | 102 | 0 |
| Wille Marcel | 2021-07-26 | 10 | 10 | 2,105 | 221 | -1,884 |
| Zac Deziel | 2023-01-30 | 7 | 7 | 57 | 57 | 0 |
| Felix Delattre | 2024-10-07 | 12 | 11 | 473 | 315 | -158 |

Contributors with "—" had no link data in either state (no tracked repo contributions).
Small positive deltas are from normal commit count changes during the re-fetch, not filtering.

Largest filtering impacts (pre-employment commits removed):

  • Wille Marcel (-1,884): Long OSM contribution history before joining DevSeed in Jul 2021
  • Pete Gadomski (-1,113): Extensive STAC ecosystem work before joining in Sep 2024
  • Wei Ji (-198): PyGMT contributions before joining in Feb 2023
  • Felix Delattre (-158): EOEPCA work before joining in Oct 2024
  • Kyle Barron (-133): geoarrow work before joining in Aug 2023
  • Max Jones (-121): xarray/VirtualiZarr work before joining in Jul 2024
  • Emmanuel Mathot (-118): EOEPCA work before joining in Aug 2024
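
The deltas in this list can be checked directly against the comparison table. The snippet below is only an arithmetic check using the before/after commit counts reported above; the variable names are illustrative.

```python
# (commits before, commits after), copied from the comparison table.
commit_counts = {
    "Wille Marcel": (2105, 221),
    "Pete Gadomski": (1922, 809),
    "Wei Ji": (349, 151),
    "Felix Delattre": (473, 315),
    "Kyle Barron": (2113, 1980),
    "Max Jones": (402, 281),
    "Emmanuel Mathot": (709, 591),
}

# Delta = after - before; a negative value means commits were filtered out.
deltas = {name: after - before for name, (before, after) in commit_counts.items()}

# Most negative first, matching the order of the list above.
ranked = sorted(deltas.items(), key=lambda item: item[1])
for name, delta in ranked:
    print(f"{name}: {delta:+,}")
```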

Test plan

  • ContributorEntry model parses both string and inline-table formats
  • Config validator normalizes plain strings into ContributorEntry objects
  • Link.from_github() with since counts only filtered commits
  • Link.from_github() without since behaves as before
  • Zero post-start-date commits returns None (link not created)
  • update_from_github() returns False on zero commits, triggering link deletion
  • Real config.toml with mixed formats parses correctly
  • mypy and ruff pass cleanly
  • All 10 Python tests + 128 JS tests pass
  • Data re-fetched and verified filtering effect
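
The counting and deletion behavior covered by the test plan can be sketched as below. This is an illustration under stated assumptions, not the code in this PR. The PR filters server-side through the GitHub API, while this sketch filters an in-memory list so it runs without network access. `count_since`, `sync_link`, the repository keys, and the commit timestamps are all hypothetical, and treating a commit made exactly at `since` as included is also an assumption.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def count_since(
    commit_times: Iterable[datetime],
    total: int,
    since: Optional[datetime] = None,
) -> Optional[int]:
    """Return the commit count to record, or None if no link should exist."""
    if since is None:
        # No start date: keep the unfiltered contribution count, as before.
        return total
    filtered = sum(1 for t in commit_times if t >= since)
    # Zero post-start-date commits means the link is not created.
    return filtered or None


def sync_link(links: dict[str, int], key: str, count: Optional[int]) -> bool:
    """Store the count, or delete a stale link and report False."""
    if count is None:
        links.pop(key, None)
        return False
    links[key] = count
    return True


# Illustrative data only: one commit before the start date, two after.
start = datetime(2024, 9, 30, tzinfo=timezone.utc)
commits = [
    datetime(2023, 5, 1, tzinfo=timezone.utc),
    datetime(2024, 10, 2, tzinfo=timezone.utc),
    datetime(2025, 1, 15, tzinfo=timezone.utc),
]
links = {"someone/repo-a": 3, "someone/repo-b": 1}

kept = sync_link(links, "someone/repo-a", count_since(commits, total=3, since=start))
dropped = sync_link(links, "someone/repo-b", count_since(commits[:1], total=1, since=start))
```

Returning `None` rather than `0` keeps "no link" distinct from "a link with a count", which is what allows the caller to delete the existing entry.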

🤖 Generated with Claude Code

aboydnw and others added 5 commits March 24, 2026 17:46
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Link.from_github() and update_from_github() now accept an optional
`since` parameter. When provided, commits are filtered server-side
via the GitHub API and counted manually instead of using the
unfiltered contributor.contributions count.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Populates start_date for 62 contributors. Also adds firzaariany,
renames aliziel to "Alison Ziel" and pantierra to "Felix Delattre".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aboydnw requested a review from gadomski as a code owner March 24, 2026 21:13
aboydnw and others added 2 commits March 24, 2026 21:16
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
aboydnw linked an issue Mar 24, 2026 that may be closed by this pull request
@aboydnw
Member Author

aboydnw commented Mar 24, 2026

@gadomski I know you mentioned separating code changes from data updates, but I needed the data updates to verify the code changes. Is it okay to include on one PR here?

cc @maxrjones to verify this is the change you were expecting with #85

@maxrjones
Member

> @gadomski I know you mentioned separating code changes from data updates, but I needed the data updates to verify the code changes. Is it okay to include on one PR here?
>
> cc @maxrjones to verify this is the change you were expecting with #85

LGTM!

@gadomski
Collaborator

gadomski commented Mar 25, 2026

> Adds per-contributor start_date to config.toml so only post-employment commits count as DevSeed contributions

This is an interesting change that diverges a bit from how I originally thought about this visualization. When I made this, my vibe was one of "we're all connected", which is why the original version of this included "friends of DevSeed" (non-DevSeed folks who work on similar project portfolios). To me, it's more interesting how we all have projects that we've worked on over a long time, even before coming here.

By clipping contributions by start-of-employment date, it feels like we're implicitly saying that all contributions after that date were funded/supported/etc by DevSeed, which may not be the case. Speaking personally, I do a lot of open source maintenance that I don't bill as DevSeed time.

I don't object to the change per-se, but I did maybe want to pause and have a chat about "what are we doing here"?

@maxrjones
Member

> I don't object to the change per-se, but I did maybe want to pause and have a chat about "what are we doing here"?

Similarly, I don't object to this not changing. I was just confused by the messaging. E.g., the data didn't seem to fully align with the repo description of "An experimental visualization of contributions at Development Seed to repositories both within and outside of our organization". I hold no strong opinions about whether it's best to change the data, the messaging, or neither if my interpretation/confusion was unusual.

@gadomski
Collaborator

@aboydnw I think I'll throw this to you, since you've got the vision of how we'd like to use this "product" in our comms/marketing/etc. I'm totally ok with clipping contributions on folks' start date, but I do worry that it doesn't really capture the full impact that newer employees might have on community (non-DevSeed) open source repos.

@aboydnw
Member Author

aboydnw commented Mar 26, 2026

These are thought-provoking questions, thank you both!

I think there will be more of a need for some date-based filtering when we get to the next phase. And the way I implemented this doesn't really take that into account.

So, I'd suggest we close this for now, go live with what we have, then think about this date-based filtering more holistically when we get to that point. Give me a shout if you disagree and we can reopen this PR.

Again, thank you! 🙏

aboydnw closed this Mar 26, 2026
gadomski deleted the feat/contributor-start-dates branch March 26, 2026 22:45

Development

Successfully merging this pull request may close these issues.

Filter based on DevSeed start dates?

3 participants