Add install-openchami as part of openchami-release#52

Open
erl-hpe wants to merge 2 commits into OpenCHAMI:main from erl-hpe:add-install-openchami-tool

erl-hpe (Contributor) commented May 6, 2026

Pull Request Template

Thank you for your contribution! Please ensure the following before submitting:

Checklist

  • My code follows the style guidelines of this project
  • I have added/updated comments where needed
  • I have added tests that prove my fix is effective or my feature works
  • I have run make test (or equivalent) locally and all tests pass
  • DCO Sign-off: All commits are signed off (git commit -s) with my real name and email
  • REUSE Compliance:
    • Each new/modified source file has SPDX copyright and license headers
    • Any non-commentable files include a <filename>.license sidecar
    • All referenced licenses are present in the LICENSES/ directory

NOTE: I have taken advantage of the REUSE license sidecar mechanism for several files in this PR that are technically commentable where licensing comments would be inappropriate for one of two reasons:

  • The file is used as a template to generate a similar but not necessarily identical file as part of the deployment of OpenCHAMI and incorporating Copyright / License information into the generated file would be inconsistent with the intention of a Copyright. These files are found in the install_openchami/install_openchami/templates directory.
  • The file content is displayed by the install_openchami -b command and carrying Copyright / License information in that output is inconsistent with the intention of the command. The file in question is install_openchami/install_openchami/config/config.yaml.
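
For readers unfamiliar with the sidecar mechanism mentioned in the note above, here is a minimal sketch of what a REUSE `.license` sidecar contains and how one could be generated. The helper names `sidecar_text` and `write_sidecar` are hypothetical, and the copyright holder, year, and license identifier are placeholders supplied by the caller, not the values used in this PR.

```python
from pathlib import Path


def sidecar_text(holder: str, year: int, license_id: str) -> str:
    """Return the body of a REUSE sidecar: two SPDX tags, one per line."""
    return (
        f"SPDX-FileCopyrightText: {year} {holder}\n"
        f"SPDX-License-Identifier: {license_id}\n"
    )


def write_sidecar(target: Path, holder: str, year: int, license_id: str) -> Path:
    """Write '<target>.license' next to 'target' and return the sidecar path.

    The target file itself is left untouched, which is the point of the
    mechanism: templates and displayed files carry no licensing comments.
    """
    sidecar = target.with_name(target.name + ".license")
    sidecar.write_text(sidecar_text(holder, year, license_id), encoding="utf-8")
    return sidecar
```

Calling `write_sidecar(Path("config.yaml"), ...)` would produce `config.yaml.license` alongside the file, so the content shown to users or rendered from a template stays free of headers.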

Testing

The installer tool added here was used to deploy 'host' mode deployments of OpenCHAMI using nested virtualization, stand-alone, on both an AMD-64 virtual machine (running Rocky 9 under KVM on an Ubuntu 24.04 host) and an ARM-64 virtual machine (running Rocky 9 under Apple Virtualization on macOS). It was also used manually to deploy a 'cluster' mode OpenCHAMI installation on a GCP project created as a 'bare' OpenCHAMI cluster using vTDS and the vtds-application-openchami application layer, and it was used via vtds-application-openchami orchestration to automatically deploy a 'quadlet' OpenCHAMI cluster on a GCP project. Both vTDS deployments created a head node as a vTDS Virtual Node, a set of 4 Compute Nodes as vTDS Virtual Nodes, a cluster network connecting these 5 nodes as a vTDS Virtual Network, and a BMC implemented as a vTDS Virtual Blade running Redfish, which was used to power on the Compute Nodes and allow them to boot from OpenCHAMI across the cluster network.

Description

This PR introduces a Python package that implements an OpenCHAMI installation and deployment tool derived from the procedure described in the OpenCHAMI 2025 Tutorial and the work done in the vtds-application-openchami GitHub project. This iteration of the tool lets a developer quickly deploy OpenCHAMI onto a host Virtual Machine or similar platform by invoking two simple commands, then verify that OpenCHAMI deploys correctly and that a Virtual Compute Node (VM) co-resident on the OpenCHAMI Head Node boots from OpenCHAMI and is accessible using SSH.

The motivation behind this project is twofold:

  • Provide a development tool that encourages developers contributing to OpenCHAMI to run a full, from-scratch test deployment of their changes and verify that they work before proposing them.
  • Move the deployment logic for OpenCHAMI out of the vtds-application-openchami package, where maintenance of the mechanism is decoupled from the evolution of OpenCHAMI, and into the OpenCHAMI product itself, where it can be maintained naturally as part of the product.

There is a README.md in the install_openchami sub-directory that covers the intent and use of this new feature.

Fixes #53

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

For more info, see Contributing Guidelines.

erl-hpe marked this pull request as draft May 6, 2026 15:44
erl-hpe force-pushed the add-install-openchami-tool branch 3 times, most recently from a37f01f to 464f834 May 6, 2026 16:23
erl-hpe marked this pull request as ready for review May 6, 2026 16:55
erl-hpe force-pushed the add-install-openchami-tool branch 3 times, most recently from 579491b to 07ec0e9 May 6, 2026 18:45
Signed-off-by: Eric Lund <77127214+erl-hpe@users.noreply.github.com>
erl-hpe force-pushed the add-install-openchami-tool branch from 07ec0e9 to 94d1d75 May 6, 2026 18:48
adrianreber commented

The OpenHPC project publishes similar documentation, which is already tested end to end on bare-metal systems:

RHEL 10 and clones:

RHEL 9 and clones:

Just in case you are looking for some additional inspiration or (even better) a place to contribute.

The documentation is available here: https://github.com/openhpc/ohpc/tree/4.x/docs/install/templates/provisioner/openchami

The nice thing about the OpenHPC documentation is that it gives you a PDF, as mentioned above, and also a recipe.sh that runs all the steps from the PDF and leaves you with a working cluster at the end, similar to your Python installer. See https://repos.openhpc.community/results/4/4.1/0-LATEST-OHPC-4.1-almalinux10-openchami-ethernet-gpu-none-x86_64-slurm/console.out for a complete test run of a cluster installation with two compute nodes using Slurm, with over 1200 tests run on top of it to make sure everything works, from provisioning to the resource manager and three different MPI stacks.

Comment thread openchami.spec
fi

%post
# Create a shared python virtual environmnet in which to install

I am not from OpenCHAMI, but I build a lot of RPMs for OpenHPC, so you can definitely ignore my comments. Having something like this in %post is something I would not want to see on my systems. I would not be happy to have files installed on my system that RPM does not track. This looks like it will download the dependencies from the Internet.

This should happen inside the %build or %install section. But, I am not from OpenCHAMI, so just ignore it.

erl-hpe (Contributor, Author) commented May 11, 2026

For some reason my previous response did not show up under this conversation, and then when I tried to get it back there it got lost. This is okay because I have had a chance to give this a bit more thought...

First off, thanks for bringing this up. It has given me some valuable things to think about.

I see two separate concerns here, both of which need responses. The first concern is that the RPM is actively installing content using 'pip' during the %post phase, which means it does not simply define RPM dependencies and install them. This is necessary in order to preserve the noarch character of the RPM, since some of the content being installed during %post is architecture specific and needs to be installed with knowledge of the host architecture on which OpenCHAMI is being installed. The content being installed here is, specifically, a set of Python modules that are installed into the shared virtual environment used exclusively by the OpenCHAMI Installer. That shared virtual environment and the wrapper that invokes the OpenCHAMI Installer from that environment are removed in %postun, so no residue of the installer or its dependencies remains on the system after removal. Unless I am missing something important, I think those factors address that concern.

The second concern has to do with pulling content from the public Internet as part of the install. This has the potential to complicate installation for an air-gapped OpenCHAMI system, since it is more difficult to collect all of the dependencies and place them onto the system prior to installing OpenCHAMI from its RPM. On thinking more about that issue, however, I recognize that installation and deployment of OpenCHAMI is already highly dependent on access to the public Internet for content. The OpenCHAMI Release installation does not provide the container images called for by the quadlet configuration for OpenCHAMI nor does it provide the S3 and Registry container images. The OpenCHAMI Release installation also does not provide the boot image data used to build compute node images. In short, installation on an air-gapped system calls for something more than what we currently provide as procedures for installing OpenCHAMI, whether manual or automated. That seems like a problem worth addressing, but not one that needs to be addressed by this PR.
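
As an illustration of the create-in-%post, remove-in-%postun lifecycle described in this comment, here is a minimal Python sketch. It is not the spec file's actual scriptlet code (which is not shown in this thread); the function names are hypothetical, and the `bin/python` layout assumes a POSIX host.

```python
import platform
import shutil
import venv
from pathlib import Path


def create_env(env_dir: Path, with_pip: bool = True) -> None:
    """Rough analogue of the %post step: create the dedicated virtual environment."""
    venv.create(env_dir, with_pip=with_pip)


def pip_install_command(env_dir: Path, packages: list[str]) -> list[str]:
    """Build a pip invocation that installs into the venv rather than the system.

    Because this runs on the target host, pip can choose wheels that match
    the machine reported by host_architecture(); that is the property that
    lets the package itself stay noarch.
    """
    python = env_dir / "bin" / "python"  # assumption: POSIX venv layout
    return [str(python), "-m", "pip", "install", *packages]


def host_architecture() -> str:
    """Machine type of the host, for example 'x86_64' or 'aarch64'."""
    return platform.machine()


def remove_env(env_dir: Path) -> None:
    """Rough analogue of the %postun step: delete the venv and all its contents."""
    shutil.rmtree(env_dir, ignore_errors=True)
```

Everything pip installs lands under `env_dir`, so a single recursive delete removes it. This does not answer the objection that the files are untracked by RPM while they exist; it only shows why removal can be complete.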

Signed-off-by: Eric Lund <77127214+erl-hpe@users.noreply.github.com>
erl-hpe force-pushed the add-install-openchami-tool branch from 053bc4a to 24016f2 May 14, 2026 13:24
alexlovelltroy (Member) commented

Since this is fully self-contained, I'm curious if we could make the installer a separate repo that installs the OpenCHAMI rpm as a dependency?

erl-hpe (Contributor, Author) commented May 20, 2026

@alexlovelltroy with respect to making the installer a separate repository, that could certainly be done, but I think it undermines one of the goals of the effort, which is to ensure that the installer represents a correct way to install any given OpenCHAMI release at any point in time. To the extent that OpenCHAMI evolves and grows the installer will be required to reflect that evolution and growth from OpenCHAMI release to OpenCHAMI release. By including the installer in the release, the evolution of the installer along with OpenCHAMI is assured, and the relationship between a given release of OpenCHAMI and the installer required to install that release of OpenCHAMI remains invariant and obvious.

If the installer were moved to its own repository, then breaking changes to OpenCHAMI release (like the addition of a new service and its associated configuration) could happen without the installer being aware of them. Furthermore, it would become necessary to create some kind of compatibility matrix between OpenCHAMI releases and installer releases so that anyone trying to use the installer to deploy OpenCHAMI knows what version of OpenCHAMI Release to use for that deployment. I think that adds unnecessary maintenance and coordination complexity to OpenCHAMI as a whole.

It makes sense for OpenCHAMI Release to be at the center of OpenCHAMI (at least for those who want to use it that way) and for it to provide its own tool for deploying OpenCHAMI that remains consistent and properly maintained as it moves forward.

[I responded here instead of in a response to a thread because GitHub did not offer me a way to respond to the thread. There may still be issues with my relationship to this repository]
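
To make the compatibility-matrix concern in the comment above concrete, here is a small sketch of the kind of table a separate installer repository might have to publish and keep current. Every version string is invented for illustration, and `COMPATIBILITY` and `installers_for` are hypothetical names; nothing here reflects real OpenCHAMI or installer release numbers.

```python
# Hypothetical data: maps an OpenCHAMI release to the installer versions
# known to deploy it. All version strings are invented for illustration.
COMPATIBILITY: dict[str, set[str]] = {
    "1.0": {"0.1", "0.2"},
    "1.1": {"0.2"},
    "2.0": {"0.3"},
}


def installers_for(release: str) -> set[str]:
    """Return the installer versions recorded for a given release.

    A release missing from the table is the failure mode the comment warns
    about: the product moved on and nobody updated the matrix.
    """
    if release not in COMPATIBILITY:
        raise LookupError(f"no installer recorded for OpenCHAMI release {release}")
    return COMPATIBILITY[release]
```

Shipping the installer inside the release makes this lookup unnecessary, since each release carries exactly one installer.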


Development

Successfully merging this pull request may close these issues.

[Feature]: an OpenCHAMI Installation / Deployment Tool based on the Tutorials

3 participants