feat(proxy): resolve push identity from token via SCM provider API by coopernetes · Pull Request #1604 · finos/git-proxy

coopernetes · 2026-06-19T19:18:37Z

Description

parsePush uses the last commit's committer as the push user. This adds a new chain processor that extracts the token from HTTP Basic auth, calls the SCM provider's user API (GitHub GET /user for now), and maps the SCM login to a git-proxy user via the gitAccount field.

TokenIdentityProvider interface with hostname-based dispatch
GitHubTokenIdentityProvider calling api.github.com/user
resolveUserFromToken chain processor (non-blocking on failure)
findUserByGitAccount DB lookup (file + mongo)
GET/PUT /api/v1/user/:username/git-account endpoints
In-memory token→user cache (5 min TTL, SHA-512 keyed) to avoid hitting the SCM API on every push. Only positive resolutions are cached. Cache is evicted per-user when gitAccount is updated via the API.
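A minimal sketch of what hostname-based dispatch could look like. The `ScmUserInfo` shape, the `hostnames` field, and the `ProviderRegistry` class are hypothetical names used for illustration; the interface in this PR may differ.

```typescript
// Hypothetical shapes for illustration; not the PR's actual definitions.
interface ScmUserInfo {
  login: string;
}

interface TokenIdentityProvider {
  // Hostnames this provider is responsible for, e.g. "github.com".
  readonly hostnames: readonly string[];
  // Resolves the SCM login that owns the token, or null if it cannot.
  resolve(token: string): Promise<ScmUserInfo | null>;
}

class ProviderRegistry {
  private readonly byHost = new Map<string, TokenIdentityProvider>();

  register(provider: TokenIdentityProvider): void {
    for (const host of provider.hostnames) {
      this.byHost.set(host.toLowerCase(), provider);
    }
  }

  // Picks a provider from the upstream URL's hostname.
  // Returns undefined when no provider is registered for that host.
  forUpstream(upstreamUrl: string): TokenIdentityProvider | undefined {
    const host = new URL(upstreamUrl).hostname.toLowerCase();
    return this.byHost.get(host);
  }
}
```

An upstream with no registered provider simply yields `undefined`, which fits the non-blocking behaviour described above: the processor can skip resolution and leave the push untouched.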

⚠️ This PR does not store PATs or tokens. The token is never written to disk or any database. The in-memory cache stores only a one-way SHA-512 hash of provider:token as the lookup key, alongside the resolved git-proxy username. The hash is non-reversible — the original token cannot be recovered from it. The cache lives in process memory only and is cleared on restart.
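The cache behaviour described above can be sketched roughly as follows. `TokenUserCache`, its method names, and the injectable clock are assumptions made for this example, not the PR's actual code; only the SHA-512 key over `provider:token`, the 5 minute TTL, positive-only caching, and per-user eviction come from the description.

```typescript
import { createHash } from 'node:crypto';

const TTL_MS = 5 * 60 * 1000; // 5 minutes, as stated in the description

interface CacheEntry {
  username: string;
  expiresAt: number; // epoch milliseconds
}

class TokenUserCache {
  private readonly entries = new Map<string, CacheEntry>();
  private readonly now: () => number;

  // The clock is injectable only to make expiry easy to exercise.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // One-way key: the raw token is never kept in the map.
  private static keyFor(provider: string, token: string): string {
    return createHash('sha512').update(`${provider}:${token}`).digest('hex');
  }

  get(provider: string, token: string): string | undefined {
    const key = TokenUserCache.keyFor(provider, token);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.username;
  }

  // Intended for successful resolutions only; misses are not cached.
  set(provider: string, token: string, username: string): void {
    this.entries.set(TokenUserCache.keyFor(provider, token), {
      username,
      expiresAt: this.now() + TTL_MS,
    });
  }

  // Drops every entry for a user, e.g. after their gitAccount is updated.
  evictUser(username: string): void {
    this.entries.forEach((entry, key) => {
      if (entry.username === username) this.entries.delete(key);
    });
  }
}
```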

A push is not blocked when no gitAccount is mapped, so that users can first add their gitAccount via the UI. For now this acts as a "soft" check, unless the maintainer team wishes to adopt this model and require it for authorising the "pusher" identity link that is currently missing, as described in #1400.

How it works

resolveUserFromToken runs in the push chain after parsePush, before checkUserPushPermission
Extracts the token from the HTTP Basic auth header (the password field)
Checks in-memory cache (SHA-512 of provider:token as key) — returns immediately on hit
Dispatches to a TokenIdentityProvider based on the upstream hostname (github.com → GitHubTokenIdentityProvider)
Calls GET /user with the token to get the SCM login
Looks up the git-proxy user by gitAccount field — if found, sets action.user and action.userEmail from the DB user and stores in cache
If no gitAccount match, falls back to using the SCM login directly (non-blocking)
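The steps above can be sketched as a single function. This is a simplified illustration under stated assumptions: `PushAction`, `ProxyUser`, `ResolveDeps`, `tokenFromBasicAuth`, and `resolveUser` are hypothetical names, the SCM call and DB lookup are passed in as plain functions, and the cache step is left out for brevity.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface PushAction {
  user?: string;
  userEmail?: string;
}

interface ProxyUser {
  username: string;
  email: string;
}

interface ResolveDeps {
  lookupLogin(token: string): Promise<string | null>; // stands in for GET /user
  findUserByGitAccount(login: string): Promise<ProxyUser | null>;
}

// Extracts the password part of an HTTP Basic Authorization header.
function tokenFromBasicAuth(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Basic\s+(\S+)$/i.exec(header.trim());
  if (!match) return null;
  const decoded = Buffer.from(match[1], 'base64').toString('utf8');
  const sep = decoded.indexOf(':'); // the user-id cannot contain ':', the password can
  if (sep === -1) return null;
  const password = decoded.slice(sep + 1);
  return password.length > 0 ? password : null;
}

async function resolveUser(
  action: PushAction,
  authHeader: string | undefined,
  deps: ResolveDeps,
): Promise<PushAction> {
  const token = tokenFromBasicAuth(authHeader);
  if (!token) return action; // nothing to resolve; keep what parsePush set
  try {
    const login = await deps.lookupLogin(token);
    if (!login) return action;
    const user = await deps.findUserByGitAccount(login);
    if (user) {
      action.user = user.username;
      action.userEmail = user.email;
    } else {
      action.user = login; // no gitAccount match: fall back to the SCM login
    }
  } catch {
    // Non-blocking: a failed lookup must not fail the push.
  }
  return action;
}
```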

Cache note

The cache of token hashes is intentionally held in process memory. It should not persist across restarts because it is purely an optimization for API calls (respect the user's own rate limits, don't look up data that doesn't change). A database-backed cache would be the natural next step if horizontal scaling becomes a concern, but for a single-process proxy this is sufficient and avoids a schema migration.

Limitations

Does not work for a generic git repository provider that doesn't provide a user API. Making this behaviour mandatory within Git Proxy would restrict its applicability to providers that expose an API for identity lookups, so that the result can be matched to a valid Git Proxy user.
For specific providers (GitLab, Forgejo/Codeberg/Gitea), an additional scope is needed. Originally documented here: https://github.com/RBC/fogwall/blob/main/docs/CONFIGURATION.md#token-scope-requirements

Token scope requirements

The SCM login check calls GET /user (or equivalent) on the upstream SCM using the pusher's token. The token must carry at least the following scope:

Provider	API endpoint	Additional scope
GitHub	`GET https://api.github.com/user`	No additional scopes required for either classic or fine-grained PATs.
GitLab	`GET {uri}/api/v4/user`	`read_user` or `api` (not recommended, prefer `read_user`)
Codeberg	`GET https://codeberg.org/api/v1/user`	`read:user`
Gitea	`GET https://gitea.com/api/v1/user`	`read:user`
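As a rough illustration, the table could be encoded as a lookup that a future set of providers might share. The `Provider` union, `IdentityEndpoint`, and `identityEndpoint` are hypothetical names; defaulting GitLab to `https://gitlab.com` and allowing a base URI override for Gitea are assumptions for self-hosted instances, not something this PR implements (only GitHub is supported for now).

```typescript
type Provider = 'github' | 'gitlab' | 'codeberg' | 'gitea';

interface IdentityEndpoint {
  url: string;
  // Minimum additional token scope, or null when none is needed.
  requiredScope: string | null;
}

// Strips trailing slashes so joined URLs do not contain "//".
function trimBase(uri: string): string {
  return uri.replace(/\/+$/, '');
}

function identityEndpoint(provider: Provider, baseUri?: string): IdentityEndpoint {
  switch (provider) {
    case 'github':
      return { url: 'https://api.github.com/user', requiredScope: null };
    case 'gitlab':
      // {uri} in the table; gitlab.com is only an assumed default here.
      return {
        url: `${trimBase(baseUri ?? 'https://gitlab.com')}/api/v4/user`,
        requiredScope: 'read_user',
      };
    case 'codeberg':
      return { url: 'https://codeberg.org/api/v1/user', requiredScope: 'read:user' };
    case 'gitea':
      return {
        url: `${trimBase(baseUri ?? 'https://gitea.com')}/api/v1/user`,
        requiredScope: 'read:user',
      };
  }
}
```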

Bitbucket is just... weird. It has two separate sets of permissions, one for git and one for the Bitbucket APIs. A user email can be linked between both "realms", but you cannot use your email to push code to that platform. Supporting Bitbucket proper requires some credential rewriting, which is error-prone and brittle. See BitbucketProvider and BitbucketIdentityFilter in RBC/fogwall for details on what is needed in the HTTP flow. It's shared here as prior art/learnings only.

Related Issue

related to #1400

General

I have read the CONTRIBUTING.md guidelines
Commit messages follow Conventional Commits format
I have a FINOS CLA on file

Documentation

Required user docs for adding their gitAccount (GitHub username in this current iteration)
Update any architectural docs with the identity resolution

Configuration

no configuration changes introduced

Tests

Tests have been added/updated for new functionality
Unit tests pass (npm test)
Linting and formatting pass (npm run lint and npm run format:check)
Type checks pass (npm run check-types)
API route tests for GET/PUT /api/v1/user/:username/git-account (coverage exists but UI integration testing is deferred)

netlify · 2026-06-19T19:18:42Z

✅ Deploy Preview for endearing-brigadeiros-63f9d0 canceled.

Name	Link
🔨 Latest commit	`2a00c7c`
🔍 Latest deploy log	https://app.netlify.com/projects/endearing-brigadeiros-63f9d0/deploys/6a3c554efed26c00088b45d7

linux-foundation-easycla · 2026-06-19T19:18:47Z

The committers listed above are authorized under a signed CLA.

✅ login: coopernetes / name: Thomas Cooper (a086b6b, e443da3)

github-actions · 2026-06-19T19:18:54Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

codecov · 2026-06-19T19:25:11Z

Codecov Report

❌ Patch coverage is 97.16981% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.69%. Comparing base (ca1d5aa) to head (2a00c7c).

Files with missing lines	Patch %	Lines
...oxy/processors/push-action/resolveUserFromToken.ts	96.42%	3 Missing ⚠️
src/db/file/users.ts	81.81%	2 Missing ⚠️
src/db/index.ts	50.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1604      +/-   ##
==========================================
+ Coverage   85.38%   85.69%   +0.30%     
==========================================
  Files          83       85       +2     
  Lines        7878     8090     +212     
  Branches     1312     1360      +48     
==========================================
+ Hits         6727     6933     +206     
- Misses       1123     1129       +6     
  Partials       28       28


coopernetes · 2026-06-19T21:57:53Z


+export const findUserByGitAccount = async function (gitAccount: string): Promise<User | null> {
+  const collection = await connect(collectionName);
+  const doc = await collection.findOne({ gitAccount: { $eq: gitAccount.toLowerCase() } });


Any desire to support a list of accounts here? gitAccount is somewhat of a holdover from v1. It's also singular across the whole user context: there's no shape in the data model today that associates git accounts with an upstream provider/hostname.

Ideally, we revisit this shape in support of this PR. Something like this:

# MongoDB doc
{
  # existing keys...
  "username": "git-proxy-user",
  "email": "user@corpo-example.com",
  "gitAccounts": {
    "github.com": ["foo", "bar"],
    "gitlab.com": ["baz"]
  }
}
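A lookup against that proposed shape might look like the sketch below. It searches an in-memory array rather than a database, and `ProxyUserDoc` and `findUserByHostAndAccount` are hypothetical names; the `gitAccounts` map keyed by hostname is the shape proposed in this comment, not something that exists in the data model today.

```typescript
// Hypothetical document type mirroring the proposed shape.
interface ProxyUserDoc {
  username: string;
  email: string;
  // hostname -> SCM logins owned by this user on that host
  gitAccounts?: Record<string, string[]>;
}

// Finds the user who owns `login` on `hostname`, comparing case-insensitively.
function findUserByHostAndAccount(
  users: readonly ProxyUserDoc[],
  hostname: string,
  login: string,
): ProxyUserDoc | undefined {
  const host = hostname.toLowerCase();
  const wanted = login.toLowerCase();
  return users.find((user) =>
    (user.gitAccounts?.[host] ?? []).some((account) => account.toLowerCase() === wanted),
  );
}
```

Keying by hostname means the same login on two different providers can belong to two different git-proxy users without colliding.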

kriswest · 2026-06-22T08:47:43Z

We should probably make a decision on whether we're going to start storing PATs/passwords for git accounts in git proxy, or github/gitlab apps etc. It's been brought up multiple times as a necessary change to fulfil the ultimate goals of several Git Proxy contributors (e.g. raising PRs).

Perhaps a topic for the next meeting.

coopernetes · 2026-06-22T12:15:20Z

@kriswest this PR does not propose storing the PATs, only an irreversible SHA-512 hash of the token to avoid excessive calls to the user lookup APIs. Unless I missed something?

kriswest · 2026-06-22T12:34:00Z

@coopernetes sorry, I wasn't suggesting it did! I think it's a great approach to solving the pusher validation issue using the current data we have on users. It just prompted me to raise the fact that other desired features are going to need to store PATs, or be authorised applications, in order to do some of the other things contributors have made clear they want to achieve with git-proxy. We should make a formal decision soon on whether we're going to take that on, as it affects the design of various features (such as this one and proxy format/doing the second push for you/raising the PR). I'm aware you are looking at similar features in fogwall and would love to have a chat about approaches for git proxy soon.

coopernetes · 2026-06-22T14:13:22Z

Understood, my mistake. I misinterpreted.

Some good candidates for relevant issues worth discussing in a design session:

GitProxy as a GitHub/GitLab App #1450 (closed for now, needs to be scoped properly - TL;DR "finos/git-proxy as an OAuth integration")
Asynchronous management of a push request #428 (historical but still relevant)

jescalada

LGTM - wondering if any other @finos/git-proxy-maintainers wants to check it out before merging?

Also: Is this a complete fix for #1400 or is there anything else we need to patch up for checkUserPushPermission to work as expected?

jescalada · 2026-06-24T12:10:24Z

+      );
+      action.user = identity.login;
+      if (identity.email) {
+        action.userEmail = identity.email;


Is GitProxy identity guaranteed to be the same as the SCM identity? If not, should we document that proper push identity can only be obtained if the SCM user's email is set to match?

The only guarantee in this flow is that the identity.login string is reliable insofar as the PAT will always be linked to a real GitHub user (except in odd cases like using a separate GitHub OAuth App to generate an OAuth token then feeding that into a git client... not impossible but likely not a common setup for developers).

This will decouple any email-to-user linkage from this new token resolver. The existing commit metadata in the chain already captures committer/author identity and links it to a GitProxy user account. This new step will just run as a later step to link back to the gitAccount.

gitAccount was previously "overloaded" and set to an email address to link to internal GitProxy user account objects based on conventions established from the original maintainer (as far as I remember, I could be mistaken). As I understand it, that was an internal, organization specific convention. This PR reclaims that field to mean the pusher's SCM profile name / GitHub login making it a reliable identifier for who actually performed the push.

I'm gonna remove the if (identity.email) check. As discussed in #1400, the email address is only ever returned in that GET /user endpoint if a user explicitly goes against the private-by-default setting of hiding it. Almost no one on GitHub enables that setting because of the obvious privacy implication.

jescalada · 2026-06-24T12:25:44Z


+// Get git account (SCM identity) for a user
+router.get('/:username/git-account', async (req: Request<{ username: string }>, res: Response) => {
+  const targetUsername = req.params.username.toLowerCase();


I think we might be missing an auth check here:

Suggested change

-  const targetUsername = req.params.username.toLowerCase();
+  if (!req.user) {
+    res.status(401).json({ error: 'Authentication required' });
+    return;
+  }
+  const targetUsername = req.params.username.toLowerCase();

This new endpoint is consistent with the rest of the users Router endpoints. If auth is needed here across the suite of endpoints, let's track that in a separate issue+PR.

coopernetes · 2026-06-24T21:57:04Z

Thanks for the review @jescalada, I've addressed those comments, so just one final round of review. On your question regarding #1400: this is a partial fix. When a user has their gitAccount set, checkUserPushPermission will correctly check the actual pusher's permission. Without it, parsePush still falls back to the last committer. This is intentional and documented in the PR description; it's designed as a non-blocking soft check to allow gradual adoption until the decision on changing gitAccount to an associative map of SCM hostnames to identities (see #1604 (comment)) is sorted. Happy to add those changes here though, so it's a complete solution to #1400.

…1400)

parsePush incorrectly uses the last commit's committer as the push user. This adds a new chain processor that extracts the token from HTTP Basic auth, calls the SCM provider's user API (GitHub GET /user for now), and maps the SCM login to a git-proxy user via the gitAccount field.

- TokenIdentityProvider interface with hostname-based dispatch
- GitHubTokenIdentityProvider calling api.github.com/user
- resolveUserFromToken chain processor (non-blocking on failure)
- findUserByGitAccount DB lookup (file + mongo)
- GET/PUT /api/v1/user/:username/git-account endpoints

… identity resolver

GitHub's GET /user only returns email if the user has explicitly made it public, which is effectively never. Remove the if (identity.email) branch and the email field from ScmUserInfo to avoid the misleading implication that an email fallback exists.

Add AbortSignal.timeout(5000) to the GitHub API fetch to prevent the push chain from hanging if the API is slow or unreachable.
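A hedged sketch of what a timeout-bounded lookup might look like. `fetchGitHubLogin` and the `FetchLike` type are hypothetical; the fetch implementation is injectable purely so the function can be exercised without network access. `AbortSignal.timeout` and the global `fetch` are standard in current Node.js releases.

```typescript
// Minimal subset of fetch used here, so a stub can stand in during tests.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string>; signal: AbortSignal },
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Returns the GitHub login for the token, or null on any failure.
async function fetchGitHubLogin(
  token: string,
  fetchImpl: FetchLike = fetch,
  timeoutMs = 5000,
): Promise<string | null> {
  try {
    const res = await fetchImpl('https://api.github.com/user', {
      headers: {
        Authorization: `Bearer ${token}`,
        Accept: 'application/vnd.github+json',
        'User-Agent': 'git-proxy-sketch', // illustrative value
      },
      // Aborts the request if GitHub is slow or unreachable.
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (!res.ok) return null;
    const body = (await res.json()) as { login?: unknown };
    // Only the login is read; email is deliberately ignored.
    return typeof body.login === 'string' ? body.login : null;
  } catch {
    // Timeout, network error, or malformed JSON: treat as unresolved.
    return null;
  }
}
```

Returning `null` for every failure mode keeps the caller's logic simple: an unresolved identity is handled the same way regardless of why the lookup failed.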

coopernetes requested a review from a team as a code owner June 19, 2026 19:18

coopernetes force-pushed the feat/token-id-mapping branch from 9c3d053 to ef788cf Compare June 19, 2026 19:20

coopernetes force-pushed the feat/token-id-mapping branch 2 times, most recently from 4b14544 to e443da3 Compare June 21, 2026 04:01

jescalada approved these changes Jun 24, 2026

coopernetes force-pushed the feat/token-id-mapping branch from b373c9b to a086b6b Compare June 24, 2026 21:53

coopernetes added 2 commits June 24, 2026 17:58

coopernetes force-pushed the feat/token-id-mapping branch from a086b6b to 2a00c7c Compare June 24, 2026 22:08
