Draft: Add automated test runner and visualization script for stress testing #1080
Conversation
ahzero7d1 commented on Mar 20, 2026
- Add extra arguments for running full test configurations
- Separate test configuration and execution script
- Add visualization script
Force-pushed from a6673ed to 75e469e
JulienVig left a comment
Thanks Ahyoung, the scripts are very handy! I've left a few comments and questions.
- Can you rename `scripts` to something like `python` to clarify the difference with the other TypeScript scripts also in `cli/src`?
- Can you also add a short `README.md` in the `scripts` (`python`) folder with instructions and an example of how to use the two scripts?
```python
import pandas as pd
import matplotlib.pyplot as plt
```
Could you add a requirements.txt with the libraries contributors should install?
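For reference, based on the two imports in this script, a minimal `requirements.txt` could be as small as (unpinned versions here are just a sketch):

```
pandas
matplotlib
```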
```python
def plot_mean_std(df, metric, output_path, title, ylabel):
    summary = df.groupby("step")[metric].agg(["mean", "std"]).reset_index()

    summary["std"] = summary["std"].fillna(0)
```
Why were there NaN stds?
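(For context, one likely cause: pandas computes the *sample* standard deviation with `ddof=1`, so any `step` group that contains a single row yields NaN. A minimal repro, assuming a `step`/`loss` layout like these logs:)

```python
import pandas as pd

# two runs logged step 0, but only one run reached step 1
df = pd.DataFrame({"step": [0, 0, 1], "loss": [1.0, 2.0, 3.0]})

summary = df.groupby("step")["loss"].agg(["mean", "std"]).reset_index()
print(summary["std"].isna().tolist())  # [False, True]: single-sample group -> NaN

# the fillna(0) in the script turns those into a zero-height error bar
summary["std"] = summary["std"].fillna(0)
print(summary["std"].tolist())  # [0.7071067811865476, 0.0]
```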
```python
def main():
    # path of configuration file for experiments
    config_path = Path(sys.argv[1])
```
Could you set the default value to the path to basic_tests.json?
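(One way to do that, sketched with `argparse` and assuming the `basic_tests.json` name from this PR; the exact path is up to you:)

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser(description="Run stress-test experiments")
parser.add_argument(
    "config_path",
    nargs="?",                         # make the positional argument optional
    type=Path,
    default=Path("basic_tests.json"),  # assumed default config location
    help="path to the experiment configuration file",
)

args = parser.parse_args([])           # no argument given -> falls back to the default
print(args.config_path)                # basic_tests.json
```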
```json
"experiments": [
  {
    "testID": "mnist_fed_mean_cnn3_p3_d600_e50_r2",
    "task": "mnist_federated",
```
I'm getting this error:

```
cli/dist/args.js:46
        throw Error(`${unsafeArgs.task} not implemented.`);
              ^
Error: mnist_federated not implemented
```
Did you create new default tasks (mnist_federated, titanic_decentralized, etc.)?
```typescript
const streamPath = path.join(dir, `client${userIndex}_local_log.ndjson`);

const finalLog: SummaryLogs[] = [];
// create a write stream that saves learning logs during the train
let ndjsonStream: ReturnType<typeof createWriteStream> | null = null;

if (args.save) {
  ndjsonStream = createWriteStream(streamPath, { flags: "w" });
}
```
Did you choose the "ndjson" name over "jsonl" for a particular reason? I had only ever seen jsonl until now, so unless you particularly prefer ndjson, I would change it to jsonl.
```typescript
/**
 * Return validation metrics
 *
 * TODO: currently only works for TFJS, gpt models
 */
evaluate(
  _validationDataset?: Dataset<Batched<DataFormat.ModelEncoded[D]>>
): Promise<ValidationMetrics> {
  throw new Error("Evaluation not supported for this model");
}
```
```suggestion
  /**
   * Return validation metrics
   */
  abstract evaluate(
    _validationDataset?: Dataset<Batched<DataFormat.ModelEncoded[D]>>
  ): Promise<ValidationMetrics>;
```
I think we can make this method abstract rather than throwing