Kmonte/custom launcher validator by kmontemayor2-sc · Pull Request #605 · Snapchat/GiGL

kmontemayor2-sc · 2026-04-29T19:00:20Z

Scope of work done

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

Adds `CustomResourceConfig { string launcher_fn; map<string,string> launcher_args; }` as a new oneof member (field 5) on both TrainerResourceConfig and InferencerResourceConfig. Also surfaces the new oneof in the GiglResourceConfigPbWrapper so callers can retrieve it alongside the existing VertexAi/Local/KFP/Dataflow variants. Also widens the Union parameter of resource_config_checks._validate_machine_config and the trainer annotation in check_if_trainer_resource_config_valid to include the new type, with a trivial launcher_fn-truthy branch; full semantic validation (e.g. import-resolvability, dry-run) is intentionally deferred to the follow-up validator PR.

Adds gigl/src/common/custom_launcher.py with launch_custom(), which resolves CustomResourceConfig.launcher_fn via import_obj and invokes it with the standard trainer/inferencer kwargs plus the opaque launcher_args map. Wires CustomResourceConfig isinstance branches into the existing dispatch in glt_trainer.py (__execute_VAI_training + run()) and glt_inferencer.py (__execute_VAI_inference + run()). V1 trainer and v1 gnn_inferencer remain Vertex-only. Also adds gigl/env/runtime.py with is_ray_runtime()/get_runtime_env() for callers that need to branch on execution environment. Step 2 of 3 in the upstream series (A: proto, B: dispatch, C: validator). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Extends the resource-config validator to handle CustomResourceConfig: - check_if_trainer_resource_config_valid / check_if_inferencer_resource_ config_valid short-circuit for custom configs (launcher-pluggable; no machine shape to validate). - Reverts the minimal stub from PR #A inside _validate_machine_config — that function's "unrecognized config" else-branch is restored as the contract demands. - New check_if_custom_resource_config_dry_run_valid helper invokes launch_custom(..., is_dry_run=True) (lazy import keeps assert_yaml_configs_parse import-free). - New --check_custom_launcher_dry_run CLI flag on config_validator. - New check_custom_resource_config_requires_glt_backend compatibility check raises when CustomResourceConfig is paired with a task config that has should_use_glt_backend=False (v1 dispatchers don't consult the custom oneof). Step 3 of 3 in the upstream series (A: proto, B: dispatch, C: validator).

Introduces a generic, repeatable --env_vars KEY=VALUE flag on gigl.orchestration.kubeflow.runner that bakes environment variables into every GiGL-owned container at compile time via PipelineTask.set_env_variable. Applied uniformly across all SPECED_COMPONENTS plus the GLT eligibility check and log_metrics_to_ui tasks; the managed VertexNotificationEmailOp exit handler is intentionally excluded. Rejected with --action=run_no_compile to prevent silent UX failure (envs are baked at compile time, so the flag would do nothing in that mode). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…onte/custom-launcher-validator

ThreadPoolExecutor(max_workers=0) raises ValueError. The previous code passed len(node_data_references) / len(edge_data_references) directly, which crashes when a preprocessor returns empty preprocessing-spec dicts (legitimate use case for an end-to-end harness). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Same root cause as the enumerate fix: ThreadPoolExecutor(max_workers=0) raises ValueError. When both node_ref_to_preprocessing_spec and edge_ref_to_preprocessing_spec are empty, num_dataflow_jobs is 0 and the executor blows up in __init__. Early-return an empty PreprocessedMetadataReferences in that case — no Dataflow work to do. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ss dispatch Replace the import-callable ``launcher_fn`` / ``launcher_args`` plumbing with a generic shell-style ``command`` + ``args`` subprocess invocation: - ``CustomResourceConfig.command`` (string) is the shell snippet invoked via ``/bin/sh -c`` so callers can inline leading ``KEY=VALUE`` env assignments. ``CustomResourceConfig.args`` (repeated string) elements are individually shell-quoted and appended. - ``launch_custom`` no longer imports a Python callable. It populates a new ``gigl`` OmegaConf resolver from runtime kwargs (task config URI, applied task identifier, component, docker images, dry-run flag), re-resolves any ``\${gigl:*}`` placeholders embedded in the proto's ``command`` / ``args`` strings, and shells out via ``subprocess.run``. Dry-run logs the resolved shell line and skips the spawn. - The new ``gigl`` resolver returns ``\${gigl:<key>}`` literally when a key is unset so the first-pass YAML load is lossless. - Pre-commit also touched a number of unrelated legacy files (trailing whitespace, scalafmt) which are bundled here for cleanliness. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

semgrep-code-snapchat · 2026-05-04T19:19:28Z

@@ -41,12 +41,11 @@ runs:
              console.log(`Error checking collaborator status: ${error.message}`);
            }
          }
-          
+


Semgrep identified an issue in your code:

GitHub Actions input ${{ inputs.username }} is directly interpolated into JavaScript code in actions/github-script, allowing command injection if an attacker provides malicious input.

More details about this

The script directly interpolates ${{ inputs.username }} into a JavaScript string using GitHub Actions' expression syntax. This creates a command injection vulnerability in the actions/github-script step.

Attack Scenario:

An attacker opens a pull request or triggers a workflow and provides a malicious username input containing JavaScript code, such as: test"; process.env.GITHUB_TOKEN = ""; //

When this workflow runs, the string interpolation happens in the YAML parsing phase before the script executes, so the malicious input is directly embedded into the JavaScript code like this:

const username = "test"; process.env.GITHUB_TOKEN = ""; //";

The attacker's injected JavaScript code (process.env.GITHUB_TOKEN = "";) now executes with full access to the GitHub Actions runner environment, including all secrets available to the workflow.

The attacker can steal GITHUB_TOKEN, access repository secrets, modify code, or exfiltrate sensitive data that the runner has access to.

The vulnerability exists because inputs.username is user-controlled (can be set by anyone triggering the workflow) and is directly embedded into the script without sanitization or escaping.

To resolve this comment:

✨ Commit fix suggestion

Suggested change

env:

GITHUB_TOKEN: ${{ inputs.github-token }}

USERNAME: ${{ inputs.username }} # Move untrusted input into the env block

script: |

try {

const username = process.env.USERNAME; // Reference the username from env for safety

const result = await github.rest.repos.checkCollaborator({

owner: context.repo.owner,

repo: context.repo.repo,

username: username

});

if (result.status === 204) {

console.log(`${username} is a collaborator.`);

} else {

throw new Error(`${username} is NOT a collaborator.`);

}

} catch (error) {

if (error.status === 404) {

throw new Error(`${username} is NOT a collaborator.`);

} else if (error.status === 403) {

console.log(`User is not a collaborator: ${error.message}`);

} else {

console.log(`Error checking collaborator status: ${error.message}`);

}

}

View step-by-step instructions

Move the untrusted input ${{ inputs.username }} from the script: section to the env: section of the step. For example, add USERNAME: ${{ inputs.username }} under env:.

Update the code inside the script: block to use process.env.USERNAME instead of "${{ inputs.username }}". Change the line to const username = process.env.USERNAME;.

Always use double quotes when referencing an environment variable in shell contexts, which is not needed in this JS context, but important if used elsewhere for safety.

This prevents any user input from being injected directly as code into the script. Using process.env.USERNAME reads the value at runtime, reducing the risk of code injection.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

/fp <comment> for false positive

/ar <comment> for acceptable risk

/other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by github-script-injection.

_{You can view more details about this finding in the Semgrep AppSec Platform.}

semgrep-code-snapchat · 2026-05-04T19:19:29Z

+    )
+    if is_dry_run:
+        return
+    subprocess.run(shell_line, shell=True, check=True)


Semgrep identified an issue in your code:

The subprocess.run() call uses shell=True on user-controlled command strings, allowing shell injection attacks that could execute arbitrary commands with the process's privileges.

More details about this

The subprocess.run() call executes shell_line with shell=True, which spawns a shell process to interpret the command. This is dangerous because if resolved_command or any element in resolved_args comes from untrusted input (e.g., user-provided configuration, external data), an attacker can inject arbitrary shell commands.

For example, if an attacker controls custom_resource_config.command and sets it to id; rm -rf /, the shell will execute both id and the destructive rm -rf / command. Even though shlex.quote() escapes individual arguments, it doesn't protect against injection in the command itself—only in the args list. An attacker who controls the command field can bypass this protection entirely.

Exploit scenario:

Attacker provides custom_resource_config.command = "echo test; cat /etc/passwd #"

After joining with args, shell_line becomes something like "echo test; cat /etc/passwd #"

When subprocess.run() executes this with shell=True, the shell interprets the semicolon as a command separator

Both echo test and cat /etc/passwd execute, leaking sensitive system files

If the process runs with elevated privileges, the attacker can exfiltrate or modify sensitive data

The shell inherits environment variables and settings from the parent process, which further expands the attack surface. Using shell=False would treat the entire string as a single command name rather than allowing shell metacharacters to be interpreted.

To resolve this comment:

💡 Follow autofix suggestion

Suggested change

subprocess.run(shell_line, shell=True, check=True)

subprocess.run(shell_line, shell=False, check=True)

View step-by-step instructions

Replace subprocess.run(shell_line, shell=True, check=True) with an invocation that does not use shell=True.

Change the code to directly pass the command and arguments as a list, rather than joining into a string. Use: subprocess.run([resolved_command] + resolved_args, check=True).

Remove any uses of shlex.quote when building the argument list, since quoting is only necessary when passing a string to the shell.

Ensure that resolved_command and every item in resolved_args are unquoted strings representing the command and its arguments.

Passing the command and arguments as a list with shell=False (the default) is safer, because it avoids any interpretation by a shell, preventing command injection vulnerabilities.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

/fp <comment> for false positive

/ar <comment> for acceptable risk

/other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by subprocess-shell-true.

_{You can view more details about this finding in the Semgrep AppSec Platform.}

kmontemayor and others added 10 commits April 17, 2026 01:56

Merge branch 'main' into kmonte/custom-launcher-validator

64d9c3f

Merge branch 'main' into kmonte/custom-launcher-validator

ae8159d

Merge remote-tracking branch 'origin/kmonte/add-kfp-env-vars' into km…

8260ad9

…onte/custom-launcher-validator

semgrep-code-snapchat Bot reviewed May 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Kmonte/custom launcher validator#605

Kmonte/custom launcher validator#605
kmontemayor2-sc wants to merge 10 commits intomainfrom
kmonte/custom-launcher-validator

kmontemayor2-sc commented Apr 29, 2026

Uh oh!

semgrep-code-snapchat Bot May 4, 2026

Uh oh!

semgrep-code-snapchat Bot May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

+    env:
+      GITHUB_TOKEN: ${{ inputs.github-token }}
+      USERNAME: ${{ inputs.username }} # Move untrusted input into the env block
+    script: |
+      try {
+        const username = process.env.USERNAME; // Reference the username from env for safety
+        const result = await github.rest.repos.checkCollaborator({
+          owner: context.repo.owner,
+          repo: context.repo.repo,
+          username: username
+        });
+        if (result.status === 204) {
+          console.log(`${username} is a collaborator.`);
+        } else {
+          throw new Error(`${username} is NOT a collaborator.`);
+        }
+      } catch (error) {
+        if (error.status === 404) {
+          throw new Error(`${username} is NOT a collaborator.`);
+        } else if (error.status === 403) {
+          console.log(`User is not a collaborator: ${error.message}`);
+        } else {
+          console.log(`Error checking collaborator status: ${error.message}`);
+        }
+      }

	subprocess.run(shell_line, shell=True, check=True)
	subprocess.run(shell_line, shell=False, check=True)

Conversation

kmontemayor2-sc commented Apr 29, 2026

Uh oh!

semgrep-code-snapchat Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

semgrep-code-snapchat Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants