Adding TabSyn examples and ensemble attack code #143
base: main

`.github/workflows/docs.yml`

```diff
@@ -42,10 +42,10 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout code
-        uses: actions/checkout@v6.0.2
+        uses: actions/checkout@v7.0.0

       - name: Install uv
-        uses: astral-sh/setup-uv@v8.1.0
+        uses: astral-sh/setup-uv@v8.2.0
```

Comment on lines +45 to +48

🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain: scripts executed against `VectorInstitute/midst-toolkit` (each run separately under `set -euo pipefail`):

```bash
#!/bin/bash
set -euo pipefail
# Inspect the workflow file around the referenced lines
nl -ba .github/workflows/docs.yml | sed -n '1,140p'

# --- next script ---
# Show the workflow file and any action refs used in it
grep -nE 'uses:|persist-credentials' .github/workflows/docs.yml

# --- next script ---
awk 'NR>=35 && NR<=110 {printf "%4d:%s\n", NR, $0}' .github/workflows/docs.yml

# --- next script ---
awk 'NR>=1 && NR<=140 {printf "%4d:%s\n", NR, $0}' .github/workflows/docs.yml

# --- next script ---
python3 - <<'PY'
from pathlib import Path
p = Path('.github/workflows/docs.yml')
for i, line in enumerate(p.read_text().splitlines(), 1):
    if 35 <= i <= 110:
        print(f"{i:4d}:{line}")
PY
```

Pin all workflow actions to commit SHAs and disable checkout credentials.

🧰 Tools: 🪛 zizmor (1.26.1)

- [error] 45-45: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
- [error] 48-48: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

Source: Linters/SAST tools

Collaborator: Same here.

Reply, with the script it executed:

```bash
#!/bin/bash
set -euo pipefail
repo="VectorInstitute/midst-toolkit"
pr="143"
echo "=== PR review comments mentioning workflow pinning / similar replies ==="
gh api "repos/$repo/pulls/$pr/comments?per_page=100" --paginate \
  --jq '.[] | {id, path, line, user: .user.login, in_reply_to_id, body, url} |
  select((.path | startswith(".github/workflows/")) or (.body | test("persist-credentials|unpinned|Same here|Pin all workflow actions"; "i")))' || true
echo
echo "=== Issue/PR conversation comments ==="
gh api "repos/$repo/issues/$pr/comments?per_page=100" --paginate \
  --jq '.[] | {user: .user.login, body, url}' || true
```

I’m good with treating it as non-blocking and leaving it for a follow-up if you want to keep this change focused.

```diff
         with:
           version: "0.5.21"
           enable-cache: true

@@ -77,7 +77,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout code
-        uses: actions/checkout@v6.0.2
+        uses: actions/checkout@v7.0.0

       - name: Configure Git Credentials
         run: |
```
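
The pinning advice above can be checked mechanically. The sketch below is a minimal, regex-based illustration, not a replacement for zizmor's `unpinned-uses` audit: it assumes one `uses:` reference per line and treats only a full 40-character hex commit SHA as pinned. The function name `unpinned_actions` is hypothetical.

```python
import re

# Matches "uses: ref" and "- uses: ref"; the ref stops at whitespace or a trailing comment.
USES_RE = re.compile(r"^\s*-?\s*uses:\s*([^\s#]+)")
# A full commit SHA is 40 hex characters; tags such as v7.0.0 can be moved.
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line_number, reference) for each action ref not pinned to a full SHA."""
    findings = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if match is None:
            continue
        ref = match.group(1)
        # Local actions and Docker images use other reference schemes; skip them here.
        if ref.startswith(("./", "docker://")):
            continue
        _, _, version = ref.partition("@")
        if not SHA_RE.match(version):
            findings.append((lineno, ref))
    return findings


if __name__ == "__main__":
    example = (
        "    steps:\n"
        "      - name: Checkout code\n"
        "        uses: actions/checkout@v7.0.0\n"
        "      - name: Install uv\n"
        "        uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39\n"
    )
    print(unpinned_actions(example))  # [(3, 'actions/checkout@v7.0.0')]
```

In this sketch the `setup-uv` line passes because it already uses a commit SHA, while the `v7.0.0` tag is flagged.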

`.github/workflows/integration_tests.yml`

```diff
@@ -41,10 +41,10 @@ jobs:
   integration-tests:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v6.0.2
+      - uses: actions/checkout@v7.0.0
```

Review comment:

🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain: script executed against `VectorInstitute/midst-toolkit`:

```bash
#!/bin/bash
set -euo pipefail
# Locate the workflow and inspect the relevant section with line numbers.
git ls-files .github/workflows/integration_tests.yml
printf '\n--- file excerpt ---\n'
sed -n '1,120p' .github/workflows/integration_tests.yml | cat -n
```

🌐 Web query results:

- As of June 24, 2026, the latest release version of the actions/checkout repository is v7.0.0, released on June 18, 2026 [1][2][3]. Regarding the
- In the GitHub actions/checkout action, the persist-credentials input defaults to true [1][2]. When persist-credentials is set to its default value of true, the action configures the authentication token or SSH key in the local git configuration [1][2]. This allows subsequent steps in your workflow to run authenticated Git commands (such as git push or git fetch) [1][3]. The token or key is removed during the post-job cleanup process [1][3]. Starting with recent versions, these credentials are stored in a file under $RUNNER_TEMP to improve security, rather than directly in .git/config [3][4]. If you wish to disable this behavior, you must explicitly set persist-credentials to false in your workflow file [1].
- The actions/checkout v7.0.0 release was published on June 18, 2026 [1][2][3]. Key changes and highlights in v7.0.0 include:
  - Security Update: It introduces a mechanism to block checking out fork pull requests when using pull_request_target and workflow_run events to prevent potential security vulnerabilities [1][4].
  - Dependency Updates: Various dependencies were updated, including

Pin

🧰 Tools: 🪛 zizmor (1.26.1)

- [warning] 44-44: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
- [error] 44-44: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

Source: Linters/SAST tools

```diff

       - name: Install uv
-        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b
+        uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39
         with:
           # Install a specific version of uv.
           version: "0.5.21"
```
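
zizmor's `artipacked` warning rests on the default noted in the web-query result above: `persist-credentials` defaults to true, so a checkout step keeps its credentials configured unless it opts out explicitly. A minimal sketch of that rule, assuming each workflow step has already been parsed into a Python dict (for example by a YAML loader); `persists_checkout_credentials` is a hypothetical helper, not part of any tool.

```python
from typing import Any


def persists_checkout_credentials(step: dict[str, Any]) -> bool:
    """Return True if an actions/checkout step leaves git credentials configured.

    persist-credentials defaults to true, so only an explicit false opts out.
    Non-checkout steps return False because the input does not apply to them.
    """
    if not str(step.get("uses", "")).startswith("actions/checkout@"):
        return False
    options = step.get("with") or {}
    value = options.get("persist-credentials", True)
    # A YAML loader may yield a bool (false) or a string ("false") depending on quoting.
    return str(value).strip().lower() != "false"


if __name__ == "__main__":
    default_step = {"uses": "actions/checkout@v7.0.0"}
    hardened_step = {
        "uses": "actions/checkout@<commit-sha>",  # placeholder for a real pinned SHA
        "with": {"persist-credentials": False},
    }
    print(persists_checkout_credentials(default_step))   # True
    print(persists_checkout_credentials(hardened_step))  # False
```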

`.github/workflows/publish.yml`

```diff
@@ -16,10 +16,10 @@ jobs:
         run: |
           sudo apt-get update
           sudo apt-get install libcurl4-openssl-dev libssl-dev
-      - uses: actions/checkout@v6.0.2
+      - uses: actions/checkout@v7.0.0
```

Review comment:

🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain: scripts executed against `VectorInstitute/midst-toolkit` (each run separately under `set -euo pipefail`):

```bash
#!/bin/bash
set -euo pipefail
# Inspect the workflow and nearby context.
git ls-files .github/workflows/publish.yml
wc -l .github/workflows/publish.yml
cat -n .github/workflows/publish.yml | sed -n '1,120p'

# --- next script ---
# Find whether checkout is used elsewhere with SHA pinning or persist-credentials settings.
rg -n --hidden --glob '.github/workflows/*.yml' --glob '.github/workflows/*.yaml' 'actions/checkout@|persist-credentials:' .github/workflows

# --- next script ---
# Check whether any workflow already pins actions/checkout by SHA in the repository.
rg -n --hidden 'actions/checkout@' .github/workflows . || true
```

Pin Line 19 uses

🧰 Tools: 🪛 zizmor (1.26.1)

- [warning] 19-19: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
- [error] 19-19: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

Source: Linters/SAST tools

```diff

       - name: Install uv
-        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b
+        uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39
         with:
           # Install a specific version of uv.
           version: "0.5.21"
```
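
Applying the SHA-pinning fix means replacing each tag with the commit it currently resolves to, while keeping the tag as a trailing comment so reviewers can still see the release. The sketch below assumes you have already looked up those SHAs yourself and pass them in as a mapping. It does not query GitHub, and the zero-filled SHA in the usage example is a placeholder, not a real commit. `pin_uses_lines` is a hypothetical name.

```python
import re

# Captures the indentation/"uses:" prefix, the action name, and its current ref.
USES_LINE_RE = re.compile(
    r"^(?P<prefix>\s*-?\s*uses:\s*)(?P<action>[^@\s#]+)@(?P<ref>[^\s#]+)\s*$"
)


def pin_uses_lines(workflow_text: str, pins: dict[str, str]) -> str:
    """Rewrite "owner/repo@tag" to "owner/repo@<sha> # tag" for refs listed in `pins`.

    `pins` maps "owner/repo@tag" to the full commit SHA that the tag points to.
    Lines that do not match, or whose ref is not in `pins`, are left unchanged.
    Note: a trailing newline in `workflow_text` is not preserved.
    """
    out = []
    for line in workflow_text.splitlines():
        match = USES_LINE_RE.match(line)
        if match is not None:
            key = f"{match['action']}@{match['ref']}"
            if key in pins:
                line = f"{match['prefix']}{match['action']}@{pins[key]} # {match['ref']}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    placeholder_sha = "0" * 40  # NOT a real commit; look up the SHA for the tag you use
    before = "      - uses: actions/checkout@v7.0.0"
    print(pin_uses_lines(before, {"actions/checkout@v7.0.0": placeholder_sha}))
```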

`.github/workflows/unit_tests.yml`

```diff
@@ -41,10 +41,10 @@ jobs:
   unit-tests:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v6.0.2
+      - uses: actions/checkout@v7.0.0
```

Review comment:

🔒 Security & Privacy | 🟠 Major

🧩 Analysis chain: script executed (twice) against `VectorInstitute/midst-toolkit`:

```bash
sed -n '1,140p' .github/workflows/unit_tests.yml
```

Pin

🧰 Tools: 🪛 zizmor (1.26.1)

- [warning] 44-44: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
- [error] 44-44: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

Source: Linters/SAST tools

```diff

       - name: Install uv
-        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b
+        uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39
         with:
           # Install a specific version of uv.
           version: "0.5.21"
```

```diff
@@ -40,6 +40,14 @@ def make_training_config(config: DictConfig) -> dict[Any, Any]:
     Returns:
         The ensemble attack training config for the CTGAN model.
     """
+    base_data_dir = str
+    if "base_data_dir" in config:
+        base_data_dir = config.base_data_dir
+    elif "data_dir" in config:
+        base_data_dir = config.data_dir
+    else:
+        raise ValueError("Either base_data_dir or data_dir must be provided in the config.")
+
     # Saving the model config from the config.yaml into a json file
     # because that's what the ensemble attack code will be looking for
     training_config_path = Path(config.ensemble_attack.shadow_training.training_json_config_paths.training_config_path)
@@ -48,10 +56,10 @@ def make_training_config(config: DictConfig) -> dict[Any, Any]:
         training_config = OmegaConf.to_container(config.ensemble_attack.shadow_training.model_config, resolve=True)
         assert isinstance(training_config, dict), "Training config must be a dictionary."
         training_config["general"] = {
-            "test_data_dir": config.base_data_dir,
+            "test_data_dir": base_data_dir,
             "sample_prefix": "ctgan",
-            "data_dir": config.base_data_dir,
-            "workspace_dir": str(Path(config.base_data_dir) / "shadow_workspace"),
+            "data_dir": base_data_dir,
+            "workspace_dir": str(Path(base_data_dir) / "shadow_workspace"),
```

Collaborator: Super minor, but I think you can avoid the str -> Path -> str conversion here with

```diff
             "exp_name": "pre_trained_model",
         }
         json.dump(training_config, f)
```
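
The `base_data_dir`/`data_dir` fallback added in this hunk could be factored into a small helper. This is a sketch that assumes a plain mapping in place of OmegaConf's `DictConfig`, which supports the same `in` checks and key lookups; `resolve_base_data_dir` is a hypothetical name. The diff's first added line, `base_data_dir = str`, binds the name to the built-in `str` type rather than annotating it. Because every branch either reassigns the name or raises, this has no runtime effect, and an annotation such as `base_data_dir: str` is likely what was intended.

```python
from collections.abc import Mapping
from typing import Any


def resolve_base_data_dir(config: Mapping[str, Any]) -> str:
    """Return config["base_data_dir"] if present, else config["data_dir"].

    Mirrors the precedence in the diff: base_data_dir wins when both keys exist.
    """
    for key in ("base_data_dir", "data_dir"):
        if key in config:
            # str() so callers always get a plain string, whatever the config stored.
            return str(config[key])
    raise ValueError("Either base_data_dir or data_dir must be provided in the config.")


if __name__ == "__main__":
    print(resolve_base_data_dir({"data_dir": "examples/tabsyn/data"}))  # examples/tabsyn/data
    print(resolve_base_data_dir({"base_data_dir": "a", "data_dir": "b"}))  # a
```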

`examples/tabsyn/README.md` (new file, `@@ -0,0 +1,92 @@`)

# TabSyn Single Table Example

This example will go over training a single-table [TabSyn](https://arxiv.org/abs/2310.09656)
model and synthesizing data afterwards.

## Downloading data

First, we need the data. Download it from this
[Google Drive link](https://drive.google.com/file/d/1HTgfgeL5GXc8uAGfeQirJrUynK7vFeyb/view?usp=drive_link),
extract the files, and place them in a `/data` folder within this folder
(`examples/tabsyn`).

> [!NOTE]
> If you wish to change the data folder, you can do so by editing the `data_dir` attribute
> of the [`config.yaml`](config.yaml) file.

Here is a description of the files that have been extracted:
- `trans.csv`: The training data. It consists of information about bank transactions and
  contains 20,000 data points.
- `trans_info.json`: Metadata about the `trans.csv` data, such as which columns are
  numerical and which are categorical, the task type, etc.
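
`trans_info.json` records which columns are numerical and which are categorical. The sketch below shows one way to turn such metadata into groups of column names. It assumes, without confirmation from this README, that the file stores column indices under TabSyn-style keys `num_col_idx`, `cat_col_idx`, and `target_col_idx`, so check the extracted file for its actual layout. The helpers and the toy column names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def load_info(path: Path) -> dict[str, Any]:
    """Read a metadata JSON file such as data/trans_info.json."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def group_columns(info: dict[str, Any], column_names: list[str]) -> dict[str, list[str]]:
    """Map index lists in `info` to column names (ASSUMED key names, see above)."""
    return {
        role: [column_names[i] for i in info.get(key, [])]
        for role, key in (
            ("numerical", "num_col_idx"),
            ("categorical", "cat_col_idx"),
            ("target", "target_col_idx"),
        )
    }


if __name__ == "__main__":
    # Toy metadata and header; real values come from trans_info.json and trans.csv.
    info = {"num_col_idx": [0, 1], "cat_col_idx": [2], "target_col_idx": [2]}
    print(group_columns(info, ["col_a", "col_b", "col_c"]))
```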

## Kicking off training

To kick off training, simply run the command below from the project's root folder:

```bash
python -m examples.tabsyn.train
```

## Training results

The result files will be saved inside a `/results` folder within this folder
(`examples/tabsyn`).

> [!NOTE]
> If you wish to change the save folder, you can do so by editing the `results_dir` attribute
> of the [`config.yaml`](config.yaml) file.

In the `/results/trans` folder, there will be a file called `model.pt`, which is a PyTorch saved model.

Collaborator: Given the path discussions above, just want to make sure this is the right path for the model checkpoint?

## Synthesizing data

To synthesize some data with the trained model, run:

```bash
python -m examples.tabsyn.synthesize
```

If there is already a trained model in the `/results` folder, it will use that model.
Otherwise, it will train one from scratch. At the end of the script, it will save the
synthesized data to `/results/trans/synthetic_data/trans_synthetic.csv`.

Collaborator: Same with this guy.
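
The reuse-or-train behaviour described for the synthesize script comes down to an existence check on the checkpoint. This is a sketch under stated assumptions: the caller supplies the checkpoint path and the `load_fn`/`train_fn` callables, and all names here are hypothetical rather than the example scripts' own API.

```python
from pathlib import Path
from typing import Callable, TypeVar

ModelT = TypeVar("ModelT")


def load_or_train(
    checkpoint: Path,
    load_fn: Callable[[Path], ModelT],
    train_fn: Callable[[], ModelT],
) -> ModelT:
    """Load `checkpoint` if it exists as a file; otherwise train a new model from scratch."""
    if checkpoint.is_file():
        return load_fn(checkpoint)
    return train_fn()


if __name__ == "__main__":
    # Hypothetical location following the README's /results/trans/model.pt layout.
    path = Path("examples/tabsyn/results/trans/model.pt")
    model = load_or_train(path, load_fn=lambda p: f"loaded {p}", train_fn=lambda: "trained")
    print(model)
```

A real training path would also save the new checkpoint, so that the next run can reuse it.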

## Evaluating the quality of the synthetic data

### Alpha Precision

To run a round of evaluation with [Alpha Precision](https://arxiv.org/abs/2301.07573)
metrics on a set of synthetic data, run the `midst_alpha_precision_eval` script:

```bash
python -m midst_toolkit.evaluation.quality.scripts.midst_alpha_precision_eval \
    --synthetic_data_path examples/tabsyn/results/trans/synthetic_data/trans_synthetic.csv \
    --real_data examples/tabsyn/data/trans_sampled.csv \
    --meta_info_path examples/gan/data/meta_info.json \
    --save_directory examples/gan/results/
```

Comment on lines +68 to +73

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Alpha Precision command points at

📝 Suggested fix:

```diff
 --real_data examples/tabsyn/data/trans_sampled.csv \
- --meta_info_path examples/gan/data/meta_info.json \
- --save_directory examples/gan/results/
+ --meta_info_path examples/tabsyn/data/meta_info.json \
+ --save_directory examples/tabsyn/results/
```

Collaborator: @lotif: Seconded 🙂

Reply: 🐇🙂

It will save the evaluation results under the `/results/model.txt` file.

### Additional Metrics

The calculation of the additional metrics is set up in the `evaluate.py` file. They are the
Kolmogorov-Smirnov (KS) test, Total Variation Distance (TVD), Correlation Matrix Difference,
and Mutual Information Difference.

To compute those metrics, you can run the command below. The data files should
be under `/data/{table_name}.csv` for the real data, `/data/{table_name}_samples.csv`
for the sampleed data used for training, and `/results/{table_name}_synthetic.csv`
for the synthetic data.

Comment on lines +83 to +86

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Filename inconsistency: The README documents the sampled training file as

📝 Suggested fix:

```diff
-be under `/data/{table_name}.csv` for the real data, `/data/{table_name}_samples.csv`
-for the sampleed data used for training, and `/results/{table_name}_synthetic.csv`
+be under `/data/{table_name}.csv` for the real data, `/data/{table_name}_sampled.csv`
+for the sampled data used for training, and `/results/{table_name}_synthetic.csv`
```

Collaborator: @lotif: This is also good to fix up, here and other readmes

Reply, 🧩 analysis chain: scripts executed (each run separately under `set -euo pipefail`):

```bash
#!/bin/bash
set -euo pipefail
echo "== Matches for _samples.csv in README files =="
rg -n --glob '**/README.md' '_samples\.csv'
echo
echo "== Matches for sampleed in README files =="
rg -n --glob '**/README.md' 'sampleed'
echo
echo "== Matches for _sampled.csv in README files =="
rg -n --glob '**/README.md' '_sampled\.csv'

# --- next script ---
echo "== examples/tabsyn/README.md relevant lines =="
sed -n '78,90p' examples/tabsyn/README.md || true
echo
echo "== Matches for _samples.csv in README files =="
rg -n --glob '**/README.md' '_samples\.csv' || true
echo
echo "== Matches for sampleed in README files =="
rg -n --glob '**/README.md' 'sampleed' || true
echo
echo "== Matches for _sampled.csv in README files =="
rg -n --glob '**/README.md' '_sampled\.csv' || true
echo
echo "== README lines mentioning sampled/sample data =="
rg -n --glob '**/README.md' 'sampled|sample data|samples\.csv|sampled\.csv' || true
```

```bash
python -m examples.tabsyn.evaluate
```

The results will be saved in the `/results/evaluation.json` file.
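
For reference, the first two metrics listed under Additional Metrics have simple textbook definitions. The two-sample KS statistic is the largest gap between the empirical CDFs of a numerical column in the real and synthetic data. TVD is half the L1 distance between the category frequencies of a categorical column. The sketch below implements only those definitions in plain Python. It is not the toolkit's `evaluate.py`, and it ignores the `significance_level` and `permutations` settings in the `ks_tv` section of the configuration. `scipy.stats.ks_2samp` computes the same KS statistic along with a p-value.

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(real: list[float], synthetic: list[float]) -> float:
    """Two-sample KS statistic: the maximum absolute gap between the two ECDFs.

    Assumes both inputs are non-empty.
    """
    real_sorted, synth_sorted = sorted(real), sorted(synthetic)
    n, m = len(real_sorted), len(synth_sorted)
    # Both ECDFs are right-continuous step functions, so the largest gap occurs at an observed value.
    return max(
        abs(bisect_right(real_sorted, x) / n - bisect_right(synth_sorted, x) / m)
        for x in real_sorted + synth_sorted
    )


def total_variation_distance(real: list[str], synthetic: list[str]) -> float:
    """TVD between category frequencies: 0.5 * sum over categories of |p - q|."""
    p, q = Counter(real), Counter(synthetic)
    n, m = len(real), len(synthetic)
    # Counter returns 0 for categories that appear in only one of the two columns.
    return 0.5 * sum(abs(p[c] / n - q[c] / m) for c in p.keys() | q.keys())


if __name__ == "__main__":
    print(ks_statistic([1.0, 2.0], [3.0, 4.0]))  # 1.0
    print(total_variation_distance(["a", "a", "b", "b"], ["a", "a", "a", "a"]))  # 0.5
```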

YAML configuration for the TabSyn example (new file, `@@ -0,0 +1,85 @@`):

```yaml
data_dir: examples/tabsyn/data
results_dir: examples/tabsyn/results
tabsyn_config: examples/tabsyn/tabsyn_config.toml
table_name: trans

training:
  sample_size: 20000

evaluation:
  # Reporting Configuration
  write_report: True
  metric_report_path: ${results_dir}/evaluation_results.txt

  # Privacy Evaluation Configurations
  dcr:
    run: True
    norm: "l2"
    batch_size: 1000

  median_dcr:
    run: True
    norm: "l2"
    batch_size: 1000

  hitting_rate:
    run: True
    hitting_threshold: 0.03

  eir:
    run: True
    norm: "gower"

  nndr:
    run: True
    norm: "l2"
    batch_size: 1000

  # Quality Evaluation Configurations
  ks_tv:
    run: True
    significance_level: 0.05
    permutations: 1000

  alpha_precision:
    run: True
    naive_only: False

  ci_overlap:
    run: True
    confidence_level: 95

  correlation_diff:
    run: True
    compute_mixed_correlations: True

  mean_diff:
    run: True

  f1_score_diff:
    run: True
    label_column: "trans_type"
    folds: 5
    f1_type: "macro"

  regression_score_diff:
    run: True
    label_column: "trans_type"
    preprocess_labels: True
    measure_metrics_in_original_label_space: False
    verbose: True
    convert_label_to_float: True

  hellinger:
    run: True
    include_numerical_columns: True

  propensity_mse:
    run: True
    folds: 5
    max_iterations: 50
    solver: "liblinear"

  mutual_information:
    run: True
    include_numerical_columns: False
```
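
The `dcr` and `median_dcr` entries configure a distance-to-closest-record (DCR) check with an L2 norm. Under a common definition, DCR is the distance from each synthetic row to its nearest real (training) row, and very small distances suggest the generator may be copying training records. The sketch below computes that on rows that are already numeric and reports the median. It illustrates that definition only. The toolkit's implementation may encode or normalise columns first, and the `batch_size` setting presumably controls chunked computation, which this version does not do.

```python
import math
import statistics


def distances_to_closest_record(
    synthetic: list[list[float]], real: list[list[float]]
) -> list[float]:
    """For each synthetic row, the Euclidean (L2) distance to its nearest real row."""
    if not real:
        raise ValueError("real data must contain at least one row")
    return [min(math.dist(row, ref) for ref in real) for row in synthetic]


def median_dcr(synthetic: list[list[float]], real: list[list[float]]) -> float:
    """Median of the per-row DCR values."""
    return statistics.median(distances_to_closest_record(synthetic, real))


if __name__ == "__main__":
    real_rows = [[0.0, 0.0], [3.0, 4.0]]
    synthetic_rows = [[0.0, 1.0], [3.0, 4.0]]
    print(distances_to_closest_record(synthetic_rows, real_rows))  # [1.0, 0.0]
    print(median_dcr(synthetic_rows, real_rows))  # 0.5
```

In the toy data, the second synthetic row is an exact copy of a real row, so its DCR is 0.0.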

Review comment:

🔒 Security & Privacy | 🟠 Major

Pin `actions/checkout` by SHA and disable persisted credentials.

`.github/workflows/code_checks.yml:32` still uses the mutable `v7.0.0` tag; replace it with the release commit SHA and set `persist-credentials: false` here, since this job doesn’t need git auth.

🧰 Tools: 🪛 zizmor (1.26.1)

- [warning] 32-32: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
- [error] 32-32: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)

Source: Linters/SAST tools

@lotif: I guess we can do this, but I don't see it as essential.

@emersodb Fair point: I agree this is more defense-in-depth hardening than something essential for correctness in this PR. The recommendation still stands as a best practice for GitHub Actions, but I’m fine treating it as non-blocking and leaving it for a follow-up if you want to keep this change focused.