feat(proxy): resolve push identity from token via SCM provider API#1604
feat(proxy): resolve push identity from token via SCM provider API#1604coopernetes wants to merge 2 commits into
Conversation
✅ Deploy Preview for endearing-brigadeiros-63f9d0 canceled.
|
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
9c3d053 to
ef788cf
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1604 +/- ##
==========================================
+ Coverage 85.38% 85.69% +0.30%
==========================================
Files 83 85 +2
Lines 7878 8090 +212
Branches 1312 1360 +48
==========================================
+ Hits 6727 6933 +206
- Misses 1123 1129 +6
Partials 28 28 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
|
||
| export const findUserByGitAccount = async function (gitAccount: string): Promise<User | null> { | ||
| const collection = await connect(collectionName); | ||
| const doc = await collection.findOne({ gitAccount: { $eq: gitAccount.toLowerCase() } }); |
There was a problem hiding this comment.
Any desire to support an list of accounts here? gitAccount is somewhat of a holdover from v1. It's also singular across the whole user context - there's no shape in the data model today that supports associative git account by upstream provider/hostname.
Ideally, we revisit this shape in support of this PR. Something like this:
# MongoDB doc
{
# existing keys...
"username": "git-proxy-user",
"email": "user@corpo-example.com",
"gitAccounts": {
"github.com": ["foo", "bar"],
"gitlab.com": [ "baz" ]
}
}
4b14544 to
e443da3
Compare
|
We should probably make a decision on whether we're going to start storing PATs/passwords for git accounts in git proxy, or github/gitlab apps etc.. Its been brought up multiple times as a necessary change to fulfil the ultimate goals of several Git Proxy contributors (e.g. raising PRs). Perhaps a topic for the next meeting. |
|
@kriswest this PR does not propose storing the PATs. Only a irrevisible SHA-512 hash of the token to avoid excessive calls to the user lookup APIs. Unless I missed something? |
|
@coopernetes sorry I wasn't suggesting it did! I think its a great approach to solving the pusher validation issue using the current data we have on users - it was just prompting me raise the fact that other desired features are going to need to store PATs or be authorised applications in order to do some of the other things contributors have made clear they want to try and achieve with git-proxy and that we should get on a make a formal decision as to whether we're going to take that on soon or not. as it effects the design of various features (such as this one and proxy format/doing the second push for you/raising the PR). I'm aware you are looking at similar features in fogwall and would love to have a chat about approaches for git proxy soon. |
|
Understood, my mistake. I misinterpreted. Some good candidates for relevant issues worth discussing in a design session:
|
jescalada
left a comment
There was a problem hiding this comment.
LGTM - wondering if any other @finos/git-proxy-maintainers wants to check it out before merging?
Also: Is this a complete fix for #1400 or is there anything else we need to patch up for checkUserPushPermission to work as expected?
| ); | ||
| action.user = identity.login; | ||
| if (identity.email) { | ||
| action.userEmail = identity.email; |
There was a problem hiding this comment.
Is GitProxy identity guaranteed to be the same as the SCM identity? If not, should we document that proper push identity can only be obtained if the SCM user's email is set to match?
There was a problem hiding this comment.
The only guarantee in this flow is that the identity.login string is reliable insofar as the PAT will always be linked to a real GitHub user (except in odd cases like using a separate GitHub OAuth App to generate an OAuth token then feeding that into a git client... not impossible but likely not a common setup for developers).
This will decouple any email-to-user linkage from this new token resolver. The existing commit metadata in the chain already captures committer/author identity and links it to a GitProxy user account. This new step will just run as a later step to link back to the gitAccount.
gitAccount was previously "overloaded" and set to an email address to link to internal GitProxy user account objects based on conventions established from the original maintainer (as far as I remember, I could be mistaken). As I understand it, that was an internal, organization specific convention. This PR reclaims that field to mean the pusher's SCM profile name / GitHub login making it a reliable identifier for who actually performed the push.
I'm gonna remove the if (identity.email) check. As discussed in #1400, the email address is only ever returned in that GET /user endpoint if a user explicitly goes against the private-by-default setting of hiding it. Almost no one on GitHub enables that setting because of the obvious privacy implication.
|
|
||
| // Get git account (SCM identity) for a user | ||
| router.get('/:username/git-account', async (req: Request<{ username: string }>, res: Response) => { | ||
| const targetUsername = req.params.username.toLowerCase(); |
There was a problem hiding this comment.
I think we might be missing an auth check here:
| const targetUsername = req.params.username.toLowerCase(); | |
| if (!req.user) { | |
| res.status(401).json({ error: 'Authentication required' }); | |
| return; | |
| } | |
| const targetUsername = req.params.username.toLowerCase(); |
There was a problem hiding this comment.
This new endpoint is consistent with the rest of the users Router endpoints. If auth is needed here across the suite of endpoints, let's track that in a separate issue+PR.
b373c9b to
a086b6b
Compare
|
Thanks for the review @jescalada , addressed those comments so just one final round of review. On your question regarding 1400, this is a partial fix. When a user has their |
…1400) parsePush incorrectly uses the last commit's committer as the push user. This adds a new chain processor that extracts the token from HTTP Basic auth, calls the SCM provider's user API (GitHub GET /user for now), and maps the SCM login to a git-proxy user via the gitAccount field. - TokenIdentityProvider interface with hostname-based dispatch - GitHubTokenIdentityProvider calling api.github.com/user - resolveUserFromToken chain processor (non-blocking on failure) - findUserByGitAccount DB lookup (file + mongo) - GET/PUT /api/v1/user/:username/git-account endpoints
… identity resolver GitHub's GET /user only returns email if the user has explicitly made it public — effectively never. Remove the if (identity.email) branch and the email field from ScmUserInfo to avoid the misleading implication that an email fallback exists. Add AbortSignal.timeout(5000) to the GitHub API fetch to prevent the push chain from hanging if the API is slow or unreachable.
a086b6b to
2a00c7c
Compare
Description
parsePush uses the last commit's committer as the push user. This adds a new chain processor that extracts the token from HTTP Basic auth, calls the SCM provider's user API (GitHub
GET /userfor now), and maps the SCM login to a git-proxy user via the gitAccount field.This doesn't block a push if the gitAccount isn't mapped in order to allow introduction of the gitAccount via the UI. This acts as a "soft" check for now unless the maintainer team wishes to adopt this model and use it as a requirement for authorising the "pusher" identity link that is missing as per what is described in #1400
How it works
resolveUserFromTokenruns in the push chain afterparsePush, beforecheckUserPushPermissionprovider:tokenas key) — returns immediately on hitTokenIdentityProviderbased on the upstream hostname (github.com →GitHubTokenIdentityProvider)GET /userwith the token to get the SCM logingitAccountfield — if found, setsaction.userandaction.userEmailfrom the DB user and stores in cachegitAccountmatch, falls back to using the SCM login directly (non-blocking)Cache note
The cache of token hashes is in-process memory intentionally — caches should not persist across restarts as it is an API driven optimization (respect user's own rate limits, don't look up data that doesn't change). A database-backed cache would be the natural next step if horizontal scaling becomes a concern, but for a single-process proxy this is sufficient and avoids a schema migration.
Limitations
gitand Bitbucket APIs. A user email can be linked between both "realms" but you cannot use your email to push code to that platform. Supporting Bitbucket proper requires some credential rewriting which is error-prone and brittle. See BitbucketProvider and BitbucketIdentityFilter in RBC/fogwall for details on what is needed in the HTTP flow. It's shared here as prior art/learnings only.Related Issue
related to #1400
General
Documentation
Configuration
no configuration changes introduced
Tests
npm test)npm run lintandnpm run format:check)npm run check-types)GET/PUT /api/v1/user/:username/git-account(coverage exists but UI integration testing is deferred)