Kmonte/custom launcher validator#605
Conversation
Adds `CustomResourceConfig { string launcher_fn; map<string,string>
launcher_args; }` as a new oneof member (field 5) on both
TrainerResourceConfig and InferencerResourceConfig. Also surfaces the
new oneof in the GiglResourceConfigPbWrapper so callers can retrieve
it alongside the existing VertexAi/Local/KFP/Dataflow variants.
Also widens the Union parameter of
resource_config_checks._validate_machine_config and the trainer
annotation in check_if_trainer_resource_config_valid to include the
new type, with a trivial launcher_fn-truthy branch; full semantic
validation (e.g. import-resolvability, dry-run) is intentionally
deferred to the follow-up validator PR.
Adds gigl/src/common/custom_launcher.py with launch_custom(), which resolves CustomResourceConfig.launcher_fn via import_obj and invokes it with the standard trainer/inferencer kwargs plus the opaque launcher_args map.
Wires CustomResourceConfig isinstance branches into the existing dispatch in glt_trainer.py (__execute_VAI_training + run()) and glt_inferencer.py (__execute_VAI_inference + run()). The v1 trainer and v1 gnn_inferencer remain Vertex-only.
Also adds gigl/env/runtime.py with is_ray_runtime()/get_runtime_env() for callers that need to branch on the execution environment.
Step 2 of 3 in the upstream series (A: proto, B: dispatch, C: validator).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
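The import-and-invoke dispatch described above can be sketched as follows. This is an illustrative stand-in, not the real module: import_obj is inlined here as a minimal dotted-path resolver, and the exact kwargs passed by the real dispatchers are assumptions.

```python
# Hedged sketch of launch_custom(): resolve a dotted path to a callable
# and invoke it with the opaque launcher_args map plus caller kwargs.
import importlib
from typing import Any, Callable, Dict


def import_obj(dotted_path: str) -> Callable[..., Any]:
    # "pkg.module.fn" -> the fn object
    module_path, _, attr = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_path), attr)


def launch_custom(
    launcher_fn: str, launcher_args: Dict[str, str], **kwargs: Any
) -> Any:
    fn = import_obj(launcher_fn)
    # Standard trainer/inferencer kwargs plus the opaque args map.
    return fn(launcher_args=launcher_args, **kwargs)
```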
Extends the resource-config validator to handle CustomResourceConfig:
- check_if_trainer_resource_config_valid / check_if_inferencer_resource_config_valid short-circuit for custom configs (launcher-pluggable; no machine shape to validate).
- Reverts the minimal stub from PR #A inside _validate_machine_config; that function's "unrecognized config" else-branch is restored as the contract demands.
- New check_if_custom_resource_config_dry_run_valid helper invokes launch_custom(..., is_dry_run=True) (the lazy import keeps assert_yaml_configs_parse import-free).
- New --check_custom_launcher_dry_run CLI flag on config_validator.
- New check_custom_resource_config_requires_glt_backend compatibility check raises when CustomResourceConfig is paired with a task config that has should_use_glt_backend=False (v1 dispatchers don't consult the custom oneof).
Step 3 of 3 in the upstream series (A: proto, B: dispatch, C: validator).
Introduces a generic, repeatable --env_vars KEY=VALUE flag on gigl.orchestration.kubeflow.runner that bakes environment variables into every GiGL-owned container at compile time via PipelineTask.set_env_variable.
Applied uniformly across all SPECED_COMPONENTS plus the GLT eligibility check and log_metrics_to_ui tasks; the managed VertexNotificationEmailOp exit handler is intentionally excluded.
Rejected with --action=run_no_compile to prevent silent UX failure (envs are baked at compile time, so the flag would do nothing in that mode).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
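The flag-parsing and rejection behavior described above can be sketched with argparse. This is a hedged sketch under the assumptions stated in the comments; the real runner's argument names beyond --env_vars and --action, and its KFP wiring, are not reproduced here.

```python
# Sketch of a repeatable --env_vars KEY=VALUE flag that is rejected in
# run_no_compile mode (envs are baked at compile time, so the flag
# would silently do nothing there).
import argparse
from typing import Tuple


def parse_env_var(kv: str) -> Tuple[str, str]:
    key, sep, value = kv.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected KEY=VALUE, got {kv!r}")
    return key, value


parser = argparse.ArgumentParser()
parser.add_argument(
    "--action", choices=["compile", "run", "run_no_compile"], default="run"
)
parser.add_argument(
    "--env_vars",
    action="append",
    type=parse_env_var,
    default=[],
    help="Repeatable; baked into every GiGL-owned container at compile time.",
)


def validate(args: argparse.Namespace) -> dict:
    if args.env_vars and args.action == "run_no_compile":
        # Fail loudly instead of silently ignoring the flag.
        raise SystemExit("--env_vars has no effect with --action=run_no_compile")
    # The real runner would then call task.set_env_variable(k, v) for
    # each pair on every GiGL-owned PipelineTask.
    return dict(args.env_vars)
```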
…onte/custom-launcher-validator
ThreadPoolExecutor(max_workers=0) raises ValueError. The previous code passed len(node_data_references) / len(edge_data_references) directly, which crashes when a preprocessor returns empty preprocessing-spec dicts (legitimate use case for an end-to-end harness). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same root cause as the enumerate fix: ThreadPoolExecutor(max_workers=0) raises ValueError. When both node_ref_to_preprocessing_spec and edge_ref_to_preprocessing_spec are empty, num_dataflow_jobs is 0 and the executor blows up in __init__. Early-return an empty PreprocessedMetadataReferences in that case — no Dataflow work to do. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
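A minimal reproduction of the guard described in the two fixes above, with hypothetical names: the real function signatures in the preprocessor differ, but the shape of the fix is the same.

```python
# ThreadPoolExecutor(max_workers=0) raises ValueError, so an empty
# spec dict must early-return before the executor is constructed.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_preprocessing_jobs(
    ref_to_spec: Dict[str, Callable[[], str]]
) -> Dict[str, str]:
    if not ref_to_spec:
        # No Dataflow work to do; constructing the executor would crash.
        return {}
    with ThreadPoolExecutor(max_workers=len(ref_to_spec)) as pool:
        futures = {ref: pool.submit(job) for ref, job in ref_to_spec.items()}
        return {ref: fut.result() for ref, fut in futures.items()}
```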
…ss dispatch
Replace the import-callable ``launcher_fn`` / ``launcher_args`` plumbing
with a generic shell-style ``command`` + ``args`` subprocess invocation:
- ``CustomResourceConfig.command`` (string) is the shell snippet invoked
via ``/bin/sh -c`` so callers can inline leading ``KEY=VALUE`` env
assignments. ``CustomResourceConfig.args`` (repeated string) elements
are individually shell-quoted and appended.
- ``launch_custom`` no longer imports a Python callable. It populates a
new ``gigl`` OmegaConf resolver from runtime kwargs (task config URI,
applied task identifier, component, docker images, dry-run flag),
re-resolves any ``\${gigl:*}`` placeholders embedded in the proto's
``command`` / ``args`` strings, and shells out via ``subprocess.run``.
Dry-run logs the resolved shell line and skips the spawn.
- The new ``gigl`` resolver returns ``\${gigl:<key>}`` literally when a
key is unset so the first-pass YAML load is lossless.
- Pre-commit also touched a number of unrelated legacy files (trailing
whitespace, scalafmt) which are bundled here for cleanliness.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
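The placeholder resolution and shell-out flow described above can be sketched as follows. This is an assumption-laden stand-in: the real code uses an OmegaConf resolver, while a plain regex substitution stands in here; the function and parameter names are illustrative.

```python
# Sketch of the command/args plumbing: resolve ${gigl:key} placeholders
# (unknown keys resolve to themselves so the first-pass load is
# lossless), shell-quote each arg, and shell out unless dry-running.
import re
import shlex
import subprocess
from typing import Dict, Sequence


def resolve_gigl(text: str, ctx: Dict[str, str]) -> str:
    # Unset keys return the literal placeholder unchanged.
    return re.sub(
        r"\$\{gigl:([^}]+)\}",
        lambda m: ctx.get(m.group(1), m.group(0)),
        text,
    )


def launch_custom(
    command: str,
    args: Sequence[str],
    ctx: Dict[str, str],
    is_dry_run: bool = False,
) -> str:
    # command is a shell snippet (may carry leading KEY=VALUE env
    # assignments); args are individually quoted and appended.
    resolved_command = resolve_gigl(command, ctx)
    resolved_args = [shlex.quote(resolve_gigl(a, ctx)) for a in args]
    shell_line = " ".join([resolved_command, *resolved_args])
    if is_dry_run:
        return shell_line  # log-and-skip in the real code
    subprocess.run(shell_line, shell=True, check=True)
    return shell_line
```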
@@ -41,12 +41,11 @@ runs:
      console.log(`Error checking collaborator status: ${error.message}`);
    }
  }
Semgrep identified an issue in your code:
GitHub Actions input ${{ inputs.username }} is directly interpolated into JavaScript code in actions/github-script, allowing command injection if an attacker provides malicious input.
More details about this
The script directly interpolates ${{ inputs.username }} into a JavaScript string using GitHub Actions' expression syntax. This creates a command injection vulnerability in the actions/github-script step.
Attack Scenario:
- An attacker opens a pull request or triggers a workflow and provides a malicious username input containing JavaScript code, such as: test"; process.env.GITHUB_TOKEN = ""; //
- When this workflow runs, the string interpolation happens in the YAML parsing phase before the script executes, so the malicious input is directly embedded into the JavaScript code like this: const username = "test"; process.env.GITHUB_TOKEN = ""; //";
- The attacker's injected JavaScript code (process.env.GITHUB_TOKEN = "";) now executes with full access to the GitHub Actions runner environment, including all secrets available to the workflow.
- The attacker can steal GITHUB_TOKEN, access repository secrets, modify code, or exfiltrate sensitive data that the runner has access to.
The vulnerability exists because inputs.username is user-controlled (can be set by anyone triggering the workflow) and is directly embedded into the script without sanitization or escaping.
To resolve this comment:
✨ Commit fix suggestion
env:
  GITHUB_TOKEN: ${{ inputs.github-token }}
  USERNAME: ${{ inputs.username }} # Move untrusted input into the env block
script: |
  try {
    const username = process.env.USERNAME; // Reference the username from env for safety
    const result = await github.rest.repos.checkCollaborator({
      owner: context.repo.owner,
      repo: context.repo.repo,
      username: username
    });
    if (result.status === 204) {
      console.log(`${username} is a collaborator.`);
    } else {
      throw new Error(`${username} is NOT a collaborator.`);
    }
  } catch (error) {
    if (error.status === 404) {
      throw new Error(`${username} is NOT a collaborator.`);
    } else if (error.status === 403) {
      console.log(`User is not a collaborator: ${error.message}`);
    } else {
      console.log(`Error checking collaborator status: ${error.message}`);
    }
  }
View step-by-step instructions
- Move the untrusted input ${{ inputs.username }} from the script: section to the env: section of the step. For example, add USERNAME: ${{ inputs.username }} under env:.
- Update the code inside the script: block to use process.env.USERNAME instead of "${{ inputs.username }}". Change the line to const username = process.env.USERNAME;.
- Always use double quotes when referencing an environment variable in shell contexts; this is not needed in this JS context, but it is important if used elsewhere for safety.
This prevents any user input from being injected directly as code into the script. Using process.env.USERNAME reads the value at runtime, reducing the risk of code injection.
💬 Ignore this finding
Reply with Semgrep commands to ignore this finding.
- /fp <comment> for false positive
- /ar <comment> for acceptable risk
- /other <comment> for all other reasons
Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by github-script-injection.
You can view more details about this finding in the Semgrep AppSec Platform.
    )
    if is_dry_run:
        return
    subprocess.run(shell_line, shell=True, check=True)
Semgrep identified an issue in your code:
The subprocess.run() call uses shell=True on user-controlled command strings, allowing shell injection attacks that could execute arbitrary commands with the process's privileges.
More details about this
The subprocess.run() call executes shell_line with shell=True, which spawns a shell process to interpret the command. This is dangerous because if resolved_command or any element in resolved_args comes from untrusted input (e.g., user-provided configuration, external data), an attacker can inject arbitrary shell commands.
For example, if an attacker controls custom_resource_config.command and sets it to id; rm -rf /, the shell will execute both id and the destructive rm -rf / command. Even though shlex.quote() escapes individual arguments, it doesn't protect against injection in the command itself—only in the args list. An attacker who controls the command field can bypass this protection entirely.
Exploit scenario:
- Attacker provides custom_resource_config.command = "echo test; cat /etc/passwd #"
- After joining with args, shell_line becomes something like "echo test; cat /etc/passwd #"
- When subprocess.run() executes this with shell=True, the shell interprets the semicolon as a command separator
- Both echo test and cat /etc/passwd execute, leaking sensitive system files
- If the process runs with elevated privileges, the attacker can exfiltrate or modify sensitive data
The shell inherits environment variables and settings from the parent process, which further expands the attack surface. Using shell=False would treat the entire string as a single command name rather than allowing shell metacharacters to be interpreted.
To resolve this comment:
💡 Follow autofix suggestion
- subprocess.run(shell_line, shell=True, check=True)
+ subprocess.run(shell_line, shell=False, check=True)
View step-by-step instructions
- Replace subprocess.run(shell_line, shell=True, check=True) with an invocation that does not use shell=True.
- Change the code to directly pass the command and arguments as a list, rather than joining into a string. Use: subprocess.run([resolved_command] + resolved_args, check=True).
- Remove any uses of shlex.quote when building the argument list, since quoting is only necessary when passing a string to the shell.
- Ensure that resolved_command and every item in resolved_args are unquoted strings representing the command and its arguments.
Passing the command and arguments as a list with shell=False (the default) is safer, because it avoids any interpretation by a shell, preventing command injection vulnerabilities.
💬 Ignore this finding
Reply with Semgrep commands to ignore this finding.
- /fp <comment> for false positive
- /ar <comment> for acceptable risk
- /other <comment> for all other reasons
Alternatively, triage in Semgrep AppSec Platform to ignore the finding created by subprocess-shell-true.
You can view more details about this finding in the Semgrep AppSec Platform.
Scope of work done
Where is the documentation for this feature?: N/A
Did you add automated tests or write a test plan?
Updated Changelog.md? NO
Ready for code review?: NO