30 commits
53d1ccf
Added first draft of OLS model with sample data and helper tools
ArneTR Sep 9, 2025
a501ac8
Moving energy logger again to 0.1. We need many samples
ArneTR Sep 17, 2025
a3561a6
Itegrates prediction; DataFrame transformation for numerical stabilit…
ArneTR Sep 17, 2025
8f16c4a
Merge branch 'main' into OLS-model
ArneTR Oct 25, 2025
48ae8cf
Added Workload energy samples of new system only metrics and energy d…
ArneTR Oct 27, 2025
b030705
Changed readme to recommend 99 ms anti lockstep sampling by default
ArneTR Oct 27, 2025
dccaffd
(fix): Predictions where not re-transformed from logspace
ArneTR Oct 27, 2025
ea3daa8
Added missing statsmodels requirement
ArneTR Oct 27, 2025
380614b
More robust np.subtype check
ArneTR Oct 27, 2025
5e141e5
Added examples in the readme for usage with sample data and for the p…
ArneTR Oct 27, 2025
66e6344
Added better prediction and scaling by incorporating SKLearn instead …
ArneTR Jan 10, 2026
4ff4403
Added XGBoost model, Ridge model and HuberRegressor
ArneTR Jan 10, 2026
6922943
Added Sublime Workspace got gitignore
ArneTR Jan 10, 2026
9176c5d
Removing any version requirements. Just pull latest available
ArneTR Mar 21, 2026
126b257
Added missing requirement XGBoost
ArneTR Mar 21, 2026
e82e43d
Made code allow to use different target: rapl_core_sum_uj; Refactored…
ArneTR Mar 21, 2026
51961c8
Removed run-workload and merged with Didis benchmark.sh
ArneTR Mar 26, 2026
0af2b0d
Added powercurve display functionality
ArneTR Mar 26, 2026
d990add
Benchmark option added to end after CPU workload
ArneTR Mar 26, 2026
5021e83
Made memory benchmarks more versatile
ArneTR Mar 26, 2026
f38a8b5
Added all mode to model which allows to include all parameters withou…
ArneTR Mar 26, 2026
62d2e25
Added sMAPE and WAPE to evaluation aprams which are more reliable whe…
ArneTR Mar 26, 2026
5db9685
Brought plotting functionality to model.py
ArneTR Mar 26, 2026
d7b836f
Updated Readme to reflect new benchmark file
ArneTR Mar 26, 2026
0c68012
(fix): Plot shold happen on diffed DF
ArneTR Mar 26, 2026
bb27667
timestamp and sample_ns should not be diffed
ArneTR Mar 26, 2026
331e434
Added sample_ns to extractor
ArneTR Mar 26, 2026
7f4d85e
Integrated MCP functionality into energy logger
ArneTR Apr 3, 2026
2a0c668
Updated Sample data
ArneTR Apr 3, 2026
6d21e4c
Added psu_energy_ac_mcp_machine and slightly altered normal features …
ArneTR Apr 3, 2026
4 changes: 4 additions & 0 deletions .gitignore
@@ -52,3 +52,7 @@ Mkfile.old
dkms.conf
__pycache__
venv/
sftp-config.json

*.sublime-project
*.sublime-workspace
78 changes: 61 additions & 17 deletions README.md
@@ -166,6 +166,13 @@ Field | Meaning

It is important to note that we treat pid 0 as the whole system, not as the idle process.

## Test

There is a test script that you need to run on a host; it builds the kernel extension and then checks that everything works. Just run
```
sudo ./test.sh
```

## Data Collection Helper
The repository ships with **`energy-logger.sh`** which periodically dumps `/proc/energy/all` to disk for offline analysis (e.g. weight regression).

@@ -180,32 +187,69 @@

## Model

To get energy values per process we use an ML model to estimate weights for the respective hardware component metrics.

This model is fitted on the machine power and the machine metrics of the system and then mapped onto the processes via their individual hardware component metrics.
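
As a sketch of this two-step idea — fit weights at machine level, then map them onto a process — here is a minimal example with synthetic data. The feature columns are hypothetical (the real feature names come from the logger output), and plain OLS via scikit-learn stands in for whichever model variant you pick:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Machine-level samples: hardware counters per interval (hypothetical columns,
# e.g. instructions and cache misses) and the measured machine power.
X_machine = rng.uniform(1e6, 1e9, size=(500, 2))
true_weights = np.array([3e-8, 5e-7])
y_power = X_machine @ true_weights + rng.normal(0, 0.5, 500)

# Step 1: fit weights on the machine-level data (plain OLS).
model = LinearRegression().fit(X_machine, y_power)

# Step 2: map the fitted weights onto one process via its own counter readings.
x_process = np.array([[2e8, 1e6]])
process_power = model.predict(x_process)[0]
```

With the exact weights the process estimate would be 2e8·3e-8 + 1e6·5e-7 = 6.5; the fitted model should land close to that.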

### Install requirements

1) `$ cd tools`
2) `$ python3 -m venv venv`
3) `$ source venv/bin/activate`
4) `$ pip install -r requirements.txt`
5) `$ sudo apt install stress-ng` - Or equivalent for your distribution

### Capture energy data while running a workload

```bash
# optionally set the sampling interval first
echo 99000000 | sudo tee /sys/module/energy_proc/parameters/sample_ns

# run the workload sandwiched between energy-logger start and stop
sudo pkill -f energy-logger.sh
sudo ./energy-logger.sh &  # prints the file it will write to
bash benchmark.sh
sudo pkill -f energy-logger.sh
```

By default it will run the *mixed* workload. Check `$ python3 run-workload.py -h` for the other implemented workloads.

### Estimating the weights

This estimation reflects how your system is currently behaving.
If your system is idle, it will produce weights that fit idle workloads very well,
but those weights will not fit compute workloads. So effectively you need to run different workloads and create a mash-up model.
The mixed workload is already a step in this direction, but the better you know your future workloads, the better the predictions will be.

Run: `$ python model.py LOG_FILE_NAME --model ols --no-validate`

Example on sample data: `python3 model.py --model ols ../sample_data/energy_logs/mixed_workload.log --no-validate`

*OLS* is the default model for now.

You also have multiple switches that help with transformations to achieve better numerical stability. Try
altering the model to use transformed input variables (`--fit OLS-IPS` for instructions per second) or
try transforming the data as a whole into a different space (`--log` to fit in log space).
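
As an illustration of the log-space idea (and of the back-transform pitfall fixed in this PR's history: predictions made in log space must be re-transformed), here is a hedged sketch on synthetic data — the actual transform pair used by `model.py` may differ from the assumed `np.log1p`/`np.expm1`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(1, 1e6, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 1.0, 200)  # energy-like target

# Fit in log space for numerical stability.
model = LinearRegression().fit(np.log1p(X), np.log1p(y))

# Predictions come out in log space and MUST be re-transformed,
# otherwise the "energy" values are off by orders of magnitude.
y_pred = np.expm1(model.predict(np.log1p(X)))
```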

### Validating the weights through a prediction

Run: `$ python model.py LOG_FILE_NAME --no-validate --model ols --predict LOG_FILE_NAME_2`

Example on sample data: `python3 model.py ../sample_data/energy_logs/mixed_workload.log --model ols --no-validate --predict ../sample_data/energy_logs/mixed_workload_2.log --scale`

Here you will get descriptive statistics like *R2*, *MAE* and *MAPE* to evaluate the goodness of fit for your machine.
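
The commit history also adds *sMAPE* and *WAPE*, which are more reliable than plain MAPE when target values approach zero. A sketch of the three metrics under their standard definitions (the evaluation code in `model.py` may differ in detail):

```python
import numpy as np

def mape(y_true, y_pred):
    # Explodes when y_true has near-zero entries.
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def smape(y_true, y_pred):
    # Symmetric: bounded by 200 % even for tiny targets.
    return np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred))) * 100

def wape(y_true, y_pred):
    # Weighted: errors are normalized by total magnitude, not per sample.
    return np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true)) * 100

y_true = np.array([100.0, 200.0, 0.5])
y_pred = np.array([110.0, 190.0, 1.0])
# The 0.5 → 1.0 sample dominates MAPE, while sMAPE and WAPE stay moderate.
```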


### Adding estimated weights to kernel module

For each weight you can write its value to the kernel module's parameter endpoint via:

```bash
echo 1231232 | sudo tee /sys/module/energy_proc/parameters/PARAM
```

Please remember that you cannot pass floats. We use fixed-point values with 3 decimal places.
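
So a float weight has to be scaled by 1000 and rounded before being written; a minimal sketch of that conversion (the helper name is just illustrative):

```python
# Convert an estimated float weight to the fixed-point integer format
# (3 decimal places) that the kernel module expects.
def to_fixed_point(weight: float, decimals: int = 3) -> int:
    return round(weight * 10 ** decimals)

print(to_fixed_point(1231.232))  # -> 1231232
```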

## Troubleshooting
| Symptom | Fix |