| 1 | 1 | # cwm-simulator |
| 2 | | -A SLURM-like cluster workload manager simulator |
| 2 | +A SLURM-like cluster workload manager simulator, for evaluating power-aware scheduling policies.
| 3 | + |
| 4 | +The simulator lets users implement and test their own scheduling policies. It also offers scripts
| 5 | +to create a job load, given a number of jobs. The simulator uses application traces
| 6 | +to model job execution (it does not actually run the jobs).
| 7 | + |
| 8 | + |
| 9 | +* Configuration Files: The following files need to be created in the conf directory.
| 10 | + These files describe the workload, the cluster and workload manager settings. |
| 11 | + |
| 12 | + - job-list: a list of jobs used to create a workload (used to generate
| 13 | + traffic, not by the simulator itself). Should contain tuples of job name and
| 14 | + estimated execution time.
| 15 | + |
| 16 | + - host-list: a list of host machine names |
| 17 | + |
| 18 | + - job_profiler_manager.txt: settings for the job profiler (the job profiler is used to
| 19 | + process traces and make power predictions for jobs)
| 20 | + |
| 21 | + - sched_default.txt: scheduler settings
| 22 | + |
| 23 | + - traffic: a file describing traffic, used by gen_traffic.py to replay a traffic |
| 24 | + scenario and send jobs to be scheduled by the simulator. |
| 25 | + |
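As a rough illustration of the job-list entry above, the sketch below parses job name / estimated execution time tuples. The on-disk layout (one whitespace-separated pair per line, `#` for comments) and the job names are assumptions made for this example; the README does not specify the actual format.

```python
from typing import List, Tuple


def parse_job_list(text: str) -> List[Tuple[str, float]]:
    """Parse lines of '<job name> <estimated execution time>' (assumed layout)."""
    jobs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumed convention)
        name, est_time = line.split()
        jobs.append((name, float(est_time)))
    return jobs


# Hypothetical job names and times, for illustration only.
sample = """\
# name  estimated_time
lulesh  120
hpcg    300.5
"""
jobs = parse_job_list(sample)
print(jobs)
```

A real job-list file may differ; check the files shipped in conf/ before relying on this layout.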
| 26 | +* Application Traces and predictions: |
| 27 | + Traces are lists of tuples that contain a timestamp and the corresponding power consumption.
| 28 | + The actual execution time is also required. Example traces and statistics used for making
| 29 | + power predictions can be found in the data/ folder. |
| 30 | + |
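To make the trace description concrete, here is a minimal sketch that treats a trace as a list of (timestamp, power) tuples and derives energy, average power, and peak power from it. The units (seconds, watts), the sample values, and the function names are assumptions for illustration; this is not the job profiler's actual code.

```python
from typing import List, Tuple

# (timestamp, power) samples; seconds and watts are assumed units.
Trace = List[Tuple[float, float]]


def trace_energy(trace: Trace) -> float:
    """Integrate power over time with the trapezoidal rule (joules)."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(trace, trace[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def average_power(trace: Trace) -> float:
    """Energy divided by the time spanned by the trace."""
    duration = trace[-1][0] - trace[0][0]
    if duration <= 0:
        raise ValueError("trace must span a positive duration")
    return trace_energy(trace) / duration


def peak_power(trace: Trace) -> float:
    """Largest sampled power value in the trace."""
    return max(power for _, power in trace)


# Hypothetical trace, for illustration only.
trace = [(0.0, 100.0), (1.0, 110.0), (2.0, 90.0), (3.0, 100.0)]
print(trace_energy(trace), average_power(trace), peak_power(trace))
```

Peak power is the figure to compare against a per-job power cap, while average power is more relevant when budgeting energy over a whole run.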
| 31 | + |
| 32 | +* Generate/Replay Traffic: ./scripts/gen_traffic.py
| 33 | + This script generates a traffic load, replays one to send jobs to the simulator for
| 34 | + execution, or both. Without replaying, users need to send jobs manually. See
| 35 | + ./scripts/gen_traffic.py --help for more information.
| 36 | + |
| 37 | + |
| 38 | +* Simulator: ./scripts/cluster_simulator.py |
| 39 | + Accepts incoming jobs and schedules them according to user-implemented scheduling
| 40 | + policies (some example policies are already implemented in scripts). Terminates when a "quit"
| 41 | + job is received.
| 42 | + See ./scripts/cluster_simulator.py --help for more information. |
| 43 | + |
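For a sense of what a power-aware policy might look like, the sketch below admits queued jobs in arrival order as long as a host is free and the predicted power stays within a global budget. Everything here (the `Job` class, the function name and signature, one job per host) is a hypothetical illustration and not the simulator's actual policy interface; see the example policies in scripts for the real one.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Job:
    name: str
    predicted_power: float  # watts; assumed to come from a power prediction


def schedule_fifo_power_capped(
    queue: Deque[Job],
    running_power: float,
    power_budget: float,
    free_hosts: int,
) -> List[Job]:
    """Start jobs in arrival order while a host is free and the budget holds.

    Assumes one job per host (a simplification for this sketch).
    """
    started = []
    while queue and free_hosts > 0:
        job = queue[0]
        if running_power + job.predicted_power > power_budget:
            break  # strict FIFO: never skip ahead of a blocked job
        queue.popleft()
        running_power += job.predicted_power
        free_hosts -= 1
        started.append(job)
    return started


# Hypothetical jobs and budget, for illustration only.
queue = deque([Job("a", 120.0), Job("b", 100.0), Job("c", 90.0)])
started = schedule_fifo_power_capped(
    queue, running_power=0.0, power_budget=250.0, free_hosts=4
)
print([job.name for job in started])
```

Stopping at the first job that does not fit keeps arrival order fair but can leave budget unused; a backfilling variant would keep scanning the queue for smaller jobs instead.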
| 44 | + |
| 45 | +An example of running the simulator with a pre-generated traffic file:
| 46 | + |
| 47 | +./scripts/gen_traffic.py --job-list ./conf/jobs.txt \ |
| 48 | + --time-unit 1 \ |
| 49 | + --replay-traffic \ |
| 50 | + --traffic-infile ./conf/traffic_bursty.txt | \ |
| 51 | +./scripts/cluster_simulator.py \ |
| 52 | + --host-list ./conf/hosts.txt \ |
| 53 | + -v --time-unit 1 \ |
| 54 | + --enable-statistics \ |
| 55 | + --stats-log stats.txt \ |
| 56 | + --stats-trace-file stats_trace.csv \ |
| 57 | + --log log.txt \ |
| 58 | + --scheduler naive \
| 59 | + --global-power-budget 20000 \
| 60 | + --time-limit 3600 \
| 61 | + --job-power-cap 120
| 62 | +# Note: the "naive" scheduler is SLURM extended.
| 63 | + |