
Added first draft of OLS model with sample data and helper tools #3

Open
ArneTR wants to merge 30 commits into main from OLS-model

Conversation


ArneTR commented Sep 9, 2025

This PR re-works the current model implementation, using an OLS model to estimate the weights.

The implementation is done in Python with the statsmodels library, which makes the formulas used more readable than sklearn's OLS implementation.

OLS was chosen as the model because it has the highest interpretability, allowing direct conclusions about component energy factors.

This PR also adds sample data recorded on my machine (Framebook) to support project development.
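As an illustration of the statsmodels-based approach, here is a minimal sketch of such a fit on synthetic data (the column names cpu_ns, wakeups, and energy_uj and their magnitudes are assumptions for illustration, not the real sample data):

```python
# Minimal OLS fit in the style described above, on synthetic data.
# Column names (cpu_ns, wakeups, energy_uj) are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "cpu_ns": rng.uniform(0, 1e9, 200),
    "wakeups": rng.uniform(0, 500, 200),
})
# Synthetic target: the intercept models idle energy, the weights model
# per-component contributions.
df["energy_uj"] = (1e6 + 0.002 * df["cpu_ns"] + 4.5 * df["wakeups"]
                   + rng.normal(0, 1e4, 200))

model = smf.ols("energy_uj ~ cpu_ns + wakeups", data=df).fit()
print(model.summary())           # full diagnostics table
print(model.params["cpu_ns"])    # directly interpretable component weight
```

The formula string makes the model specification readable at a glance, which is the interpretability argument made above.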

Suggested TODOs for further exploration

  • Validate the OLS assumptions
  • Incorporate a ridge (L2) model for testing
  • Consider alternative models, e.g. logarithmic models, which correlate more closely with the energy consumption of a CPU
  • Generally, record a power curve of the system, check whether it is linear, logarithmic, or something else, and draw conclusions from that
  • Generally validate how well the mixed model holds up across repetitions. Currently this was only done once


ArneTR commented Sep 9, 2025

I also want to share some experiences from playing around with different models:

Workload design experiences

Computer systems draw power very differently under different workloads. Thus it was explored whether separate, disparate workloads should be used, or one large mixed workload.

Advantages of the separate workloads could be:

  • Better fit and better prediction
  • Might work especially well in edge cases
  • Might need less sample data to fit

The respective disadvantages:

  • When conditions change, the fit might be unusable. This was tried with a fit on a compute workload: the intercept was already 16 W, although the system draws 4 W in idle. Thus the model had no way of knowing what would happen if the CPU was idle.
  • For later prediction one must know which model to choose when. This is hard without domain knowledge, as a rule like "switch to the compute model when instructions are > 10,000,000" might be misleading on a system with a higher base frequency, or on systems that only have one core and cannot fully sleep it.

Model design experiences

  • Fitting on all variables leads to high collinearity. This means that repeated runs produce highly different weights. This can be combated by higher sample sizes, by changing the sampling interval, or by simply dropping variables or combining them.

    • Needs to be explored further
  • Idle workloads are dominated by wakeups. If a simple model with only wakeups is fitted, idle is properly estimated: around 1 W for the intercept and about 2-3 W contributed by the wakeups.
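The collinearity problem described above can be quantified with variance inflation factors; here is a sketch on synthetic data (the near-collinear cpu_ns/instructions pair is constructed on purpose for illustration):

```python
# Quantifying collinearity with variance inflation factors (VIF).
# A common rule of thumb flags VIF > 10 as problematic; such variables
# produce unstable weights across repeated fits. The data is synthetic,
# with instructions nearly collinear to cpu_ns by construction.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
cpu = rng.uniform(0, 1, 500)
X = pd.DataFrame({
    "const": 1.0,
    "cpu_ns": cpu,
    "instructions": 0.99 * cpu + rng.normal(0, 0.01, 500),  # nearly collinear
    "wakeups": rng.uniform(0, 1, 500),                      # independent
})
vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns)}
print(vifs)  # cpu_ns and instructions get very large VIFs, wakeups stays near 1
```

Dropping or combining one of the offending variables, as suggested above, brings the remaining VIFs back down.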


ArneTR commented Sep 9, 2025

@ribalba ping for you


ArneTR commented Sep 17, 2025

I just upgraded the PR and added some more transformations.

Try it on your box and tell me the values.

Here is what I do:

sudo pkill -f energy-logger.sh
sudo ./energy-logger.sh & # will output the file it will write to
python3 run-workload.py mixed
sudo pkill -f energy-logger.sh

# now the logger will have printed the name of the file it wrote to, e.g. /tmp/energy-0yb45hR0.log

# now you run the exact same stuff again

sudo pkill -f energy-logger.sh
sudo ./energy-logger.sh & # will output the file it will write to
python3 run-workload.py mixed
sudo pkill -f energy-logger.sh

# this second run will have printed a different file name, e.g. /tmp/energy-890342da.log

# Then you can run:

python3 model.py /tmp/energy-0yb45hR0.log --no-validate --fit OLS --predict /tmp/energy-890342da.log

This will effectively fit a model on the first benchmark run and then make an out of sample prediction with the second run.

Also, please run the last command again with --log appended.

Eager for the results :)


ArneTR commented Sep 17, 2025

@ribalba


ArneTR commented Oct 27, 2025

@ribalba I added a prediction stage that now also back-transforms the data from the logarithm space.

It can now be iterated on quite quickly. I added examples for using model.py and its predict stage, together with energy sample data from the newly added endpoint in /sys/kernel/debug/energy/sys.

This can be merged now if you feel the functionality contributes to the tool.

Happy for your feedback.

TODOs (TBD in a different PR though):

  • Automatic validation of model assumptions (condition number, F-statistic, normal distribution of errors, etc.)
    • Statsmodels already calculates all these numbers and outputs them in a statistics table. They currently must be interpreted by the reader, but conditions could be introduced to make this automatic
  • Implementing more and better models
    • The model chokes on numerical errors. A log transformation helps quite a bit, but sacrifices explainability and needs a log transformation in the kernel.
    • Thus other models and other transformations should be explored to make the model more viable
    • Alternatively, models could be situational: a compute model for when compute is happening and an idle model for when the system is idle.
  • Reducing overhead
    • At a 99 ms sampling interval the kernel module requires 10% of one core. That is a negligible load on the 10-core system I am testing on, but combined with 10 Hz sampling from userspace, the 5% margin of system load (above which this PR assumes the system is no longer idle) is quickly reached

@ArneTR ArneTR marked this pull request as ready for review October 27, 2025 06:50

ArneTR commented Jan 10, 2026

I just pushed another big chunk of implementations to the branch:

Models

  • Implemented XGBoost Model
  • Implemented Ridge Model
  • Implemented Huber Regressor

Data Preparation

  • Implemented Standard Scaler
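A sketch of how a Standard Scaler can feed one of the new models, here using a scikit-learn pipeline (feature names, scales, and weights are illustrative assumptions):

```python
# Sketch of a Standard Scaler feeding a Ridge model via a scikit-learn
# pipeline. Feature names, scales, and weights are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = rng.uniform(0, 1e9, size=(400, 3))     # e.g. cpu_ns, instructions, wakeups
y = X @ np.array([2e-3, 3e-4, 4.5]) + rng.normal(0, 1e4, 400)

# Scaling first keeps the Ridge penalty from being dominated by whichever
# feature happens to have the largest raw magnitude.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(model.score(X, y))  # R² on the training data
```

The same pipeline shape works for the Huber Regressor by swapping the final estimator.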

Results

python3 model.py ../sample_data/energy_logs/mixed_workload.log --dump-raw --dump-diff --predict ../sample_data/energy_logs/mixed_workload_2.log --add-intercept --features extra --model xgboost --dump-predictions --no-validate
=> 82% Accuracy

python3 model.py ../sample_data/energy_logs/mixed_workload.log --dump-raw --dump-diff --predict ../sample_data/energy_logs/mixed_workload_2.log --add-intercept --features normal --model ols --dump-predictions --no-validate
=> 66% Accuracy

Summary

  • Huber and Ridge provide no benefit over a scaled OLS
  • XGBoost far outperforms all other models on mixed data. However, the model becomes uninterpretable.
    • If we want to use it in the model output, we need to put a user-space conversion step after the kernel output

Possible next steps

  • Train separate OLS models for idle, compute etc. and see if we can get OLS to > 80% accuracy
  • Massage the dataset even more, removing outliers, apply different scaling strategies etc.

Call me :)


ArneTR commented Apr 3, 2026

@ribalba

This model is now in very good shape.

  • Brought over the benchmark.sh script with more versatile network etc. workloads
  • Made the memory workload more versatile to also support --stream
  • Better output of accuracy, including WMAPE and sMAPE (plain MAPE can collapse due to singularities and report falsely negative accuracy)
  • Added MCP integration (See Mcp granularity green-coding-solutions/green-metrics-tool#1630 for high resolution energy output)
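For reference, the accuracy metrics mentioned above can be sketched as follows (standard definitions; the near-zero sample illustrates why plain MAPE can collapse):

```python
# WAPE and sMAPE vs. plain MAPE. MAPE divides by each individual true
# value and blows up on near-zero samples; WAPE normalizes by the total,
# sMAPE by the symmetric mean of prediction and truth.
import numpy as np

def mape(y_true, y_pred):
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))

def wape(y_true, y_pred):
    return 100 * np.sum(np.abs(y_true - y_pred)) / np.sum(np.abs(y_true))

def smape(y_true, y_pred):
    return 100 * np.mean(2 * np.abs(y_pred - y_true)
                         / (np.abs(y_true) + np.abs(y_pred)))

y_true = np.array([100.0, 200.0, 0.5])   # one near-zero sample
y_pred = np.array([110.0, 190.0, 1.0])
print(mape(y_true, y_pred))   # dominated by the 0.5 sample
print(wape(y_true, y_pred))   # robust: errors weighted by total energy
```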

Results

The results are on par with the model from #4 ... thus I would favor closing #4 and #5, as they add complexity with no gain

To reproduce:

$ python3 model.py ../sample_data/energy_logs/M2-fixed-frequency-mixed-workload-new.log --no-validate --target rapl_core_sum_uj --add-intercept --features all --predict ../sample_data/energy_logs/M2-fixed-frequency-mixed-workload-new-2.log --model ols --scale
                            OLS Regression Results
==============================================================================
Dep. Variable:       rapl_core_sum_uj   R-squared:                       0.920
Model:                            OLS   Adj. R-squared:                  0.920
Method:                 Least Squares   F-statistic:                     3216.
Date:                Fri, 03 Apr 2026   Prob (F-statistic):               0.00
Time:                        17:29:00   Log-Likelihood:                -30414.
No. Observations:                2244   AIC:                         6.085e+04
Df Residuals:                    2235   BIC:                         6.090e+04
Df Model:                           8
Covariance Type:            nonrobust
================================================================================
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
const         1.058e+06   3938.888    268.579      0.000    1.05e+06    1.07e+06
cpu_ns        4.688e+05   6511.658     71.990      0.000    4.56e+05    4.82e+05
mem           1.216e+04   3959.779      3.072      0.002    4399.143    1.99e+04
instructions  1.976e+05   6016.728     32.842      0.000    1.86e+05    2.09e+05
wakeups       4.503e+04   4489.838     10.028      0.000    3.62e+04    5.38e+04
diski        -1087.3669   4157.814     -0.262      0.794   -9240.948    7066.214
disko         1.692e+04   4193.800      4.033      0.000    8691.217    2.51e+04
rx            5.319e+04   4.68e+04      1.137      0.256   -3.86e+04    1.45e+05
tx           -2.092e+04   4.68e+04     -0.447      0.655   -1.13e+05    7.08e+04
==============================================================================
Omnibus:                      238.860   Durbin-Watson:                   0.813
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1114.578
Skew:                          -0.406   Prob(JB):                    9.39e-243
Kurtosis:                       6.356   Cond. No.                         26.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Rescaled params:
 const           274158.747531
cpu_ns               0.002019
mem                  0.000135
instructions         0.000322
wakeups              4.524398
diski               -0.044344
disko                0.081460
rx                 211.375596
tx                -189.914460
dtype: float64
MAE: 127893.85497056974
MAPE (%): 13.415981823943687
WAPE (%): 12.35187010382711
sMAPE (%): 12.744775805994854
R²: 0.9156642472220207

As seen, the average error is around 10%, sometimes more, sometimes less, depending on fluctuation.

With XGBoost, which is the gold standard, we get down to 5%.
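The "Rescaled params" block above converts coefficients from the scaled feature space back into raw units; a sketch of the equivalent arithmetic on synthetic data (all values are assumptions for illustration):

```python
# With z = (x - mean) / std, a weight b_z learned on z corresponds to
# b_x = b_z / std in raw units, and the intercept shifts by -b_z * mean / std.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 1e9, 1000)
y = 0.002 * x + 1e6 + rng.normal(0, 1e4, 1000)   # true weight 0.002, idle 1e6

mean, std = x.mean(), x.std()
z = (x - mean) / std
b_z, b0_z = np.polyfit(z, y, 1)   # fit in the scaled space

b_x = b_z / std                   # back to raw units: ~0.002
b0_x = b0_z - b_z * mean / std    # raw intercept: ~1e6
print(b_x, b0_x)
```

With more than one feature the same per-column division applies, which is what the rescaling step above does for every coefficient.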

Next steps

To advance further I now need more variables in the procpower output:

  • Unhalted Clock Cycles
    • Is this available per core?
  • Halted Clock Cycles (Might not be available)
    • Is this available per core?
  • Frequency (Only if available directly. Otherwise I will produce it from ClockCycles / WallTime )
    • Is this available per core?
  • APERF
    • Is this available per core?
  • MPERF
    • Is this available per core?
  • DRAM-serviced LLC-miss loads
  • TLB Miss
  • DMA Access
  • Interrupts (is that available distinctly from wakeups?)
  • Other instruction values. What are we using at the moment? Unhalted instructions? Retired instructions?
    • Is this available per core?
  • uOPs (sub-Instructions)
    • Is this available per core?
  • Temperature (Should be in IA32_PACKAGE_THERM_STATUS and can be Intel only for now)
