Commit 45f48b5

author
dchasap
committed
initial commit
1 parent cfb90a5 commit 45f48b5

19 files changed

+5258
-1
lines changed

README.md

Lines changed: 62 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -1,2 +1,63 @@
11
# cwm-simulator
2-
A SLURM-like cluster workload manager simulator
2+
A SLURM-like cluster workload manager simulator, for evaluating power-aware scheduling policies.
3+
4+
The simulator lets users implement and test their own policies. It also offers scripts
5+
to create a job load, given a number of jobs. The simulator uses application traces
6+
to simulate running jobs (it does not actually run them).
7+
8+
9+
* Configuration Files: The following files need to be created in the conf directory.
10+
These files describe the workload, the cluster and workload manager settings.
11+
12+
- job-list: a list of jobs that are to be used to create a workload (used to generate
13+
traffic, not by the simulator itself). Should contain tuples of job name and
14+
estimated execution time
15+
16+
- host-list: a list of host machine names
17+
18+
- job_profiler_manager.txt: setting options for job profiler (job profiler is used to
19+
process traces and make power predictions for jobs)
20+
21+
- sched_default.txt: settings for scheduler options
22+
23+
- traffic: a file describing traffic, used by gen_traffic.py to replay a traffic
24+
scenario and send jobs to be scheduled by the simulator.
25+
26+
* Application Traces and predictions:
27+
Traces are lists of tuples, each containing a timestamp and the corresponding power consumption.
28+
The actual execution time is also required. Example traces and statistics used for making
29+
power predictions can be found in the data/ folder.
30+
31+
32+
* Generate/Replay Traffic: ./scripts/gen_traffic.py
33+
This script is used to generate a traffic load and/or replay it, sending jobs to
34+
the simulator for execution. Without replaying, users need to send jobs manually. See
35+
./scripts/gen_traffic.py --help for more information.
36+
37+
38+
* Simulator: ./scripts/cluster_simulator.py
39+
Accepts incoming jobs and schedules them according to user-implemented scheduling
40+
policies (some example policies are already implemented in scripts). Terminates when a "quit"
41+
job is received.
42+
See ./scripts/cluster_simulator.py --help for more information.
43+
44+
45+
An example of running the simulator with a pre-generated traffic file (the naive scheduler is SLURM extended):
46+
47+
./scripts/gen_traffic.py --job-list ./conf/jobs.txt \
48+
--time-unit 1 \
49+
--replay-traffic \
50+
--traffic-infile ./conf/traffic_bursty.txt | \
51+
./scripts/cluster_simulator.py \
52+
--host-list ./conf/hosts.txt \
53+
-v --time-unit 1 \
54+
--enable-statistics \
55+
--stats-log stats.txt \
56+
--stats-trace-file stats_trace.csv \
57+
--log log.txt \
58+
--scheduler naive \
59+
--global-power-budget 20000 \
60+
--time-limit 3600 \
61+
--job-power-cap 120
62+
63+
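
The README above describes traces as lists of (timestamp, power consumption) tuples. As a rough illustration of how such a trace could be summarised, the sketch below integrates power over time with the trapezoidal rule. It is not taken from the repository: the function names `trace_energy` and `average_power` are hypothetical, and the units (seconds and watts) are an assumption, since the README does not state them.

```python
from typing import List, Tuple

# One trace sample: (timestamp, power). Seconds and watts are assumed units.
Trace = List[Tuple[float, float]]


def trace_energy(trace: Trace) -> float:
    """Integrate power over time with the trapezoidal rule.

    Returns joules if timestamps are in seconds and power is in watts.
    """
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(trace, trace[1:]):
        if t1 < t0:
            raise ValueError("timestamps must be non-decreasing")
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def average_power(trace: Trace) -> float:
    """Average power over the time span covered by the trace."""
    if len(trace) < 2:
        raise ValueError("need at least two samples")
    duration = trace[-1][0] - trace[0][0]
    if duration <= 0:
        raise ValueError("trace must span a positive duration")
    return trace_energy(trace) / duration


if __name__ == "__main__":
    # Hypothetical three-sample trace, not data from the repository.
    example = [(0.0, 100.0), (1.0, 120.0), (2.0, 110.0)]
    print(trace_energy(example), average_power(example))
```

A summary like this could be compared against a per-job power cap or a global power budget, but how the simulator itself uses traces is defined by the scripts in the repository, not by this sketch.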

conf/job_profiler_manager.conf

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
#test_node quartz584_0
2+
test_node quartz1466_1
3+
test_job cholesky
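
The file above appears to hold whitespace-separated key/value settings, with one entry disabled by a leading `#`. The sketch below shows one way such a file could be read. It is an assumption about the format, not the repository's parser: `parse_conf` is a hypothetical helper, and treating `#` lines as comments is inferred only from the first line of the file.

```python
from typing import Dict, List


def parse_conf(text: str) -> Dict[str, List[str]]:
    """Parse 'key value...' lines, skipping blanks and '#' lines (assumed comments)."""
    settings: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, *values = line.split()
        # Repeated keys accumulate their values instead of overwriting.
        settings.setdefault(key, []).extend(values)
    return settings


if __name__ == "__main__":
    sample = "#test_node quartz584_0\ntest_node quartz1466_1\ntest_job cholesky\n"
    print(parse_conf(sample))
```

The same helper would also accept a single-setting file such as `estimation_error 0`, returning the value as the string `"0"`; converting values to numbers is left to the caller.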

conf/jobs.txt

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
blackscholes 1 200
2+
bodytrack 1 140
3+
dedup 1 110
4+
facesim 1 350
5+
ferret 1 50
6+
fluidanimate 1 150
7+
freqmine 1 80
8+
streamcluster 1 60
9+
swaptions 1 45
10+
bt-mz_D.8 8 240
11+
bt-mz_D.16 16 200
12+
bt-mz_D.64 64 45
13+
lu-mz_C.8 8 100
14+
lu-mz_C.16 16 50
15+
bt-mz_D.8 8 120
16+
bt-mz_D.16 16 200
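
Each line of the job list above has three whitespace-separated fields, while the README only mentions a job name and an estimated execution time. The sketch below parses the three columns under a stated guess: the middle column is treated as a job size (for example a node or task count, since it matches the `.8`, `.16` and `.64` suffixes), and the last column as the estimated execution time. The names `JobSpec` and `parse_job_list` are hypothetical and do not come from the repository.

```python
from typing import List, NamedTuple


class JobSpec(NamedTuple):
    name: str
    size: int            # assumed meaning: node or task count
    estimated_time: int  # assumed meaning: estimated execution time


def parse_job_list(text: str) -> List[JobSpec]:
    """Parse 'name size time' lines; blank lines are skipped."""
    jobs: List[JobSpec] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        name, size, estimated_time = fields
        jobs.append(JobSpec(name, int(size), int(estimated_time)))
    return jobs


if __name__ == "__main__":
    sample = "ferret 1 50\nbt-mz_D.8 8 240\n"
    for job in parse_job_list(sample):
        print(job)
```

Note that the list contains repeated names (`bt-mz_D.8`, `bt-mz_D.16`), so a list of records preserves every entry, whereas a dictionary keyed by job name would silently drop duplicates.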

conf/sched_default.conf

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
estimation_error 0
