Skip to content

Kmonte/custom launcher validator#605

Draft
kmontemayor2-sc wants to merge 10 commits intomainfrom
kmonte/custom-launcher-validator
Draft

Kmonte/custom launcher validator#605
kmontemayor2-sc wants to merge 10 commits intomainfrom
kmonte/custom-launcher-validator

Conversation

@kmontemayor2-sc
Copy link
Copy Markdown
Collaborator

Scope of work done

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

kmontemayor and others added 10 commits April 17, 2026 01:56
Adds `CustomResourceConfig { string launcher_fn; map<string,string>
launcher_args; }` as a new oneof member (field 5) on both
TrainerResourceConfig and InferencerResourceConfig. Also surfaces the
new oneof in the GiglResourceConfigPbWrapper so callers can retrieve
it alongside the existing VertexAi/Local/KFP/Dataflow variants.

Also widens the Union parameter of
resource_config_checks._validate_machine_config and the trainer
annotation in check_if_trainer_resource_config_valid to include the
new type, with a trivial launcher_fn-truthy branch; full semantic
validation (e.g. import-resolvability, dry-run) is intentionally
deferred to the follow-up validator PR.
Adds gigl/src/common/custom_launcher.py with launch_custom(), which
resolves CustomResourceConfig.launcher_fn via import_obj and invokes
it with the standard trainer/inferencer kwargs plus the opaque
launcher_args map.

Wires CustomResourceConfig isinstance branches into the existing
dispatch in glt_trainer.py (__execute_VAI_training + run()) and
glt_inferencer.py (__execute_VAI_inference + run()). V1 trainer and
v1 gnn_inferencer remain Vertex-only.

Also adds gigl/env/runtime.py with is_ray_runtime()/get_runtime_env()
for callers that need to branch on execution environment.

Step 2 of 3 in the upstream series (A: proto, B: dispatch, C: validator).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends the resource-config validator to handle CustomResourceConfig:
- check_if_trainer_resource_config_valid / check_if_inferencer_resource_
  config_valid short-circuit for custom configs (launcher-pluggable; no
  machine shape to validate).
- Reverts the minimal stub from PR #A inside _validate_machine_config
  — that function's "unrecognized config" else-branch is restored as
  the contract demands.
- New check_if_custom_resource_config_dry_run_valid helper invokes
  launch_custom(..., is_dry_run=True) (lazy import keeps
  assert_yaml_configs_parse import-free).
- New --check_custom_launcher_dry_run CLI flag on config_validator.
- New check_custom_resource_config_requires_glt_backend compatibility
  check raises when CustomResourceConfig is paired with a task config
  that has should_use_glt_backend=False (v1 dispatchers don't consult
  the custom oneof).

Step 3 of 3 in the upstream series (A: proto, B: dispatch, C: validator).
Introduces a generic, repeatable --env_vars KEY=VALUE flag on
gigl.orchestration.kubeflow.runner that bakes environment variables into
every GiGL-owned container at compile time via PipelineTask.set_env_variable.
Applied uniformly across all SPECED_COMPONENTS plus the GLT eligibility
check and log_metrics_to_ui tasks; the managed VertexNotificationEmailOp
exit handler is intentionally excluded.

Rejected with --action=run_no_compile to prevent silent UX failure (envs
are baked at compile time, so the flag would do nothing in that mode).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ThreadPoolExecutor(max_workers=0) raises ValueError. The previous code
passed len(node_data_references) / len(edge_data_references) directly,
which crashes when a preprocessor returns empty preprocessing-spec
dicts (legitimate use case for an end-to-end harness).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same root cause as the enumerate fix: ThreadPoolExecutor(max_workers=0)
raises ValueError. When both node_ref_to_preprocessing_spec and
edge_ref_to_preprocessing_spec are empty, num_dataflow_jobs is 0 and
the executor blows up in __init__. Early-return an empty
PreprocessedMetadataReferences in that case — no Dataflow work to do.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ss dispatch

Replace the import-callable ``launcher_fn`` / ``launcher_args`` plumbing
with a generic shell-style ``command`` + ``args`` subprocess invocation:

- ``CustomResourceConfig.command`` (string) is the shell snippet invoked
  via ``/bin/sh -c`` so callers can inline leading ``KEY=VALUE`` env
  assignments. ``CustomResourceConfig.args`` (repeated string) elements
  are individually shell-quoted and appended.
- ``launch_custom`` no longer imports a Python callable. It populates a
  new ``gigl`` OmegaConf resolver from runtime kwargs (task config URI,
  applied task identifier, component, docker images, dry-run flag),
  re-resolves any ``\${gigl:*}`` placeholders embedded in the proto's
  ``command`` / ``args`` strings, and shells out via ``subprocess.run``.
  Dry-run logs the resolved shell line and skips the spawn.
- The new ``gigl`` resolver returns ``\${gigl:<key>}`` literally when a
  key is unset so the first-pass YAML load is lossless.
- Pre-commit also touched a number of unrelated legacy files (trailing
  whitespace, scalafmt) which are bundled here for cleanliness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment on lines 22 to +44
@@ -41,12 +41,11 @@ runs:
console.log(`Error checking collaborator status: ${error.message}`);
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semgrep identified an issue in your code:

GitHub Actions input ${{ inputs.username }} is directly interpolated into JavaScript code in actions/github-script, allowing command injection if an attacker provides malicious input.

More details about this

The script directly interpolates ${{ inputs.username }} into a JavaScript string using GitHub Actions' expression syntax. This creates a command injection vulnerability in the actions/github-script step.

Attack Scenario:

  1. An attacker opens a pull request or triggers a workflow and provides a malicious username input containing JavaScript code, such as: test"; process.env.GITHUB_TOKEN = ""; //

  2. When this workflow runs, the string interpolation happens in the YAML parsing phase before the script executes, so the malicious input is directly embedded into the JavaScript code like this:

    const username = "test"; process.env.GITHUB_TOKEN = ""; //";
  3. The attacker's injected JavaScript code (process.env.GITHUB_TOKEN = "";) now executes with full access to the GitHub Actions runner environment, including all secrets available to the workflow.

  4. The attacker can steal GITHUB_TOKEN, access repository secrets, modify code, or exfiltrate sensitive data that the runner has access to.

The vulnerability exists because inputs.username is user-controlled (can be set by anyone triggering the workflow) and is directly embedded into the script without sanitization or escaping.

To resolve this comment:

✨ Commit fix suggestion

Suggested change
env:
GITHUB_TOKEN: ${{ inputs.github-token }}
USERNAME: ${{ inputs.username }} # Move untrusted input into the env block
script: |
try {
const username = process.env.USERNAME; // Reference the username from env for safety
const result = await github.rest.repos.checkCollaborator({
owner: context.repo.owner,
repo: context.repo.repo,
username: username
});
if (result.status === 204) {
console.log(`${username} is a collaborator.`);
} else {
throw new Error(`${username} is NOT a collaborator.`);
}
} catch (error) {
if (error.status === 404) {
throw new Error(`${username} is NOT a collaborator.`);
} else if (error.status === 403) {
console.log(`User is not a collaborator: ${error.message}`);
} else {
console.log(`Error checking collaborator status: ${error.message}`);
}
}
View step-by-step instructions
  1. Move the untrusted input ${{ inputs.username }} from the script: section to the env: section of the step. For example, add USERNAME: ${{ inputs.username }} under env:.
  2. Update the code inside the script: block to use process.env.USERNAME instead of "${{ inputs.username }}". Change the line to const username = process.env.USERNAME;.
  3. Always use double quotes when referencing an environment variable in shell contexts, which is not needed in this JS context, but important if used elsewhere for safety.

This prevents any user input from being injected directly as code into the script. Using process.env.USERNAME reads the value at runtime, reducing the risk of code injection.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

  • /fp <comment> for false positive
  • /ar <comment> for acceptable risk
  • /other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by github-script-injection.

You can view more details about this finding in the Semgrep AppSec Platform.

)
if is_dry_run:
return
subprocess.run(shell_line, shell=True, check=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semgrep identified an issue in your code:

The subprocess.run() call uses shell=True on user-controlled command strings, allowing shell injection attacks that could execute arbitrary commands with the process's privileges.

More details about this

The subprocess.run() call executes shell_line with shell=True, which spawns a shell process to interpret the command. This is dangerous because if resolved_command or any element in resolved_args comes from untrusted input (e.g., user-provided configuration, external data), an attacker can inject arbitrary shell commands.

For example, if an attacker controls custom_resource_config.command and sets it to id; rm -rf /, the shell will execute both id and the destructive rm -rf / command. Even though shlex.quote() escapes individual arguments, it doesn't protect against injection in the command itself—only in the args list. An attacker who controls the command field can bypass this protection entirely.

Exploit scenario:

  1. Attacker provides custom_resource_config.command = "echo test; cat /etc/passwd #"
  2. After joining with args, shell_line becomes something like "echo test; cat /etc/passwd #"
  3. When subprocess.run() executes this with shell=True, the shell interprets the semicolon as a command separator
  4. Both echo test and cat /etc/passwd execute, leaking sensitive system files
  5. If the process runs with elevated privileges, the attacker can exfiltrate or modify sensitive data

The shell inherits environment variables and settings from the parent process, which further expands the attack surface. Using shell=False would treat the entire string as a single command name rather than allowing shell metacharacters to be interpreted.

To resolve this comment:

💡 Follow autofix suggestion

Suggested change
subprocess.run(shell_line, shell=True, check=True)
subprocess.run(shell_line, shell=False, check=True)
View step-by-step instructions
  1. Replace subprocess.run(shell_line, shell=True, check=True) with an invocation that does not use shell=True.
  2. Change the code to directly pass the command and arguments as a list, rather than joining into a string. Use: subprocess.run([resolved_command] + resolved_args, check=True).
  3. Remove any uses of shlex.quote when building the argument list, since quoting is only necessary when passing a string to the shell.
  4. Ensure that resolved_command and every item in resolved_args are unquoted strings representing the command and its arguments.

Passing the command and arguments as a list with shell=False (the default) is safer, because it avoids any interpretation by a shell, preventing command injection vulnerabilities.

💬 Ignore this finding

Reply with Semgrep commands to ignore this finding.

  • /fp <comment> for false positive
  • /ar <comment> for acceptable risk
  • /other <comment> for all other reasons

Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by subprocess-shell-true.

You can view more details about this finding in the Semgrep AppSec Platform.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants