Skip to content

Main reduce hp#1142

Open
wmousa wants to merge 5 commits into
mainfrom
main-reduce-hp
Open

Main reduce hp#1142
wmousa wants to merge 5 commits into
mainfrom
main-reduce-hp

Conversation

@wmousa

@wmousa wmousa commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

No description provided.

wmousa added 5 commits June 23, 2026 17:27
  Previously hugepages were only increased when below the required amount.
  Now the value is set whenever current != required, allowing reduction when
  the node has more hugepages allocated than needed. Kubelet restart in the
  storage init job is also triggered on any change, not just increases.
  When sn restart is called with a new max_lvol, the sn_config_file on the
  node was not updated and hugepages were not adjusted to match the new
  requirement.

  - Add update_node_config API endpoint (docker + k8s) that updates
    max_lvol and huge_page_memory for the matching NUMA node in
    sn_config_file, matched by socket + SSD PCI intersection
  - Add update_node_config client method in SNodeClient
  - In _restart_storage_node_impl, call update_node_config before
    set_hugepages() only when max_lvol actually changed (lvol_changed flag
    captures the delta before snode.max_lvol is overwritten)
  - Fix hugepage drift bug in set_hugepages_if_needed: store the adjusted
    kernel total (required) instead of raw hugepages_needed so that
    user_delta = current - prev correctly reflects only manual changes
  When a user adds or removes hugepages manually between deploys, the delta
  is captured correctly for the first restart but then silently lost on the
  second: prev_sb advances to the new required (which already absorbed the
  delta), so user_delta becomes 0 and the baseline wins � reverting to the
  pre-change count.

  Fix by writing the updated user reservation (baseline + delta) back to
  hugepages_baseline_nodeN whenever user_delta != 0. The baseline then
  reflects the user's intended permanent allocation, and each subsequent
  restart computes the correct total without needing a manual re-add.
  /tmp is subject to periodic cleanup by systemd-tmpfiles-clean (default
  10-day TTL), which could silently delete hugepages_sb_node* and
  hugepages_baseline_node* between deployments without a reboot, causing
  the baseline tracking to reset incorrectly.

  /var/run (/run) is a tmpfs that clears only on reboot � the correct
  behaviour for these files � and is never touched by tmpfiles cleanup.
  Updates the constant, docker volume mount, k8s hostPath mounts, and
  the init job shell script.
@wmousa wmousa requested review from geoffrey1330 and mxsrc June 26, 2026 13:57

@mxsrc mxsrc left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Largely this looks good. A few comments:

I'd like to use a consistent naming of numa_node. Instead of node or numa. "node" in this codebase commonly refers to storage nodes, I feel we should avoid any kind of ambiguity.

As for the general design, I think the hugepages config/state file(s) could be simpler represented in a json file, that would make things a bit more self-descriptive, allow to validate the values, and make the logic more readable. Not sure you have the capacity for that sort of change right now though.

Failures in storing the hugepage allocation are ignored, I feel they should be exposed to the caller for them to decide how to handle them.

What's the difference between the new update_node_config endpoint and the old apply_config? It seems like there is overlap, at least concerning the intent of both. Can they be merged?

Comment on lines +676 to +679
max_lvol: int
huge_page_memory: int
numa: int
ssd_list: list

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be typed better I feel. We should at least use List[str] for the ssd_list. In other places of the API we use ssd_pcie, does this list also contain a list of PCIe addresses? Then I'd argue we should name this consistenly. For the other values we should at least use util.Unsigned as the values cannot be negative. It might also make sense to make the values optional to allow partial updates.

Comment thread simplyblock_core/env_var
Comment on lines +4 to +5
SIMPLY_BLOCK_DOCKER_IMAGE=simplyblock/simplyblock:main
SIMPLY_BLOCK_SPDK_ULTRA_IMAGE=simplyblock/spdk:main-latest

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be merged.

Comment on lines +682 to +709
@api.post('/update_node_config', responses={
200: {'content': {'application/json': {'schema': utils.response_schema({
'type': 'boolean'
})}}},
})
def update_node_config(body: UpdateNodeConfigParams):
node_info = core_utils.load_config(constants.NODES_CONFIG_FILE)
if not node_info.get("nodes"):
return utils.get_response(False, "Config not found")

ssd_set = set(body.ssd_list)
matched = False
for node_config in node_info["nodes"]:
if node_config["socket"] != body.numa:
continue
if not ssd_set.intersection(set(node_config.get("ssd_pcis", []))):
continue
node_config["max_lvol"] = body.max_lvol
node_config["huge_page_memory"] = body.huge_page_memory
matched = True
break

if not matched:
return utils.get_response(False, "No matching node found for given numa and ssd_list")

core_utils.store_config_file(node_info, constants.NODES_CONFIG_FILE)
return utils.get_response(True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates the docker endpoint, we should define it once and reuse it. Check the bottom of the file for how that's done.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants