SynaXG plugin dev scheme review #626
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by: einsteinXue. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
Hi @einsteinXue. Thanks for your PR. I'm waiting for an openshift member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/ok-to-test
@einsteinXue: The following tests failed, say /retest to rerun all failed tests.
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@bn222 some tests failed.
// Firmware image path/package path (e.g. /quay.io/openshift/firmware/dpu:v1.0.8)
// +kubebuilder:validation:Required
FirmwarePath string `json:"firmwarePath,omitempty"`
Does having your firmware inside a container (using it as a file system) work for you?
Yes, thanks for the POC you provided, it works now. We already verified it in SynaXG-operator v1.2.
We need a default firmware version hard-coded in the code. Since we want to ship the whole stack, we need full control of all components, including the firmware. Otherwise we can't ensure that it works e2e.
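A minimal sketch of what that could look like, assuming a hypothetical defaultFirmwareVersion constant baked into the operator and a trimmed-down DpuFirmwareSpec; all names and the version value here are illustrative, not part of this PR:

```go
package sketch

// Trimmed-down stand-in for the DpuFirmwareSpec in this PR; only the field
// needed for the sketch is shown.
type DpuFirmwareSpec struct {
	TargetVersion string `json:"targetVersion,omitempty"`
}

// Hypothetical default pinned at build time so the shipped stack always has
// a known-good firmware, even when the user specifies nothing.
const defaultFirmwareVersion = "v1.0.8"

// resolveFirmwareVersion falls back to the shipped default when the user
// does not pin a version explicitly.
func resolveFirmwareVersion(spec DpuFirmwareSpec) string {
	if spec.TargetVersion != "" {
		return spec.TargetVersion
	}
	return defaultFirmwareVersion
}
```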
// Detailed configuration for firmware upgrade, required when Operation is upgrade type
// +kubebuilder:validation:RequiredWhen=Operation,FirmwareUpgrade
Firmware DpuFirmwareSpec `json:"firmware,omitempty"`
Configuration for fw upgrade? That's new to me. Please explain the situation where that would be useful.
Take the SynaXG card as an example: the card contains an internal SDK. To upgrade this SDK, we want to avoid manual intervention and instead automate the process via an Operator.
It is important to note that the SynaXG card does not run Kubernetes or OpenShift (OCP); instead, it runs OAM (Operations, Administration, and Maintenance) software, which communicates with the SynaXG VSP through gRPC. The SynaXG VSP downloads the SDK image and transmits it to the SynaXG card via gRPC, where the local OAM software then executes the SDK installation.
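A rough sketch of that hand-off from the VSP side, assuming a hypothetical client-streaming RPC for pushing the SDK image to the OAM server on the card; the stream and message types below are illustrative stand-ins, not the actual SynaXG proto:

```go
package sketch

import (
	"io"
	"os"
)

// Illustrative stand-ins for generated gRPC client-stream types.
type SdkChunk struct{ Data []byte }

type UploadStream interface {
	Send(chunk *SdkChunk) error
	CloseAndRecv() error
}

// uploadSdk streams a locally downloaded SDK image to the OAM software on
// the card in fixed-size chunks; the OAM side then runs the installation.
func uploadSdk(stream UploadStream, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	buf := make([]byte, 1<<20) // 1 MiB per chunk
	for {
		n, readErr := f.Read(buf)
		if n > 0 {
			if err := stream.Send(&SdkChunk{Data: buf[:n]}); err != nil {
				return err
			}
		}
		if readErr == io.EOF {
			return stream.CloseAndRecv() // wait for the card to acknowledge
		}
		if readErr != nil {
			return readErr
		}
	}
}
```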
The host runs OCP, and that means the host needs to be compatible with what's on the card, even if custom things run on the card. Once the host is implicated in any way, there is at least some level of compatibility that needs to be taken into account.
Are we talking about custom firmware? Where is the firmware released?
> The host runs OCP, and that means the host needs to be compatible with what's on the card, even if custom things run on the card. Once the host is implicated in any way, there is at least some level of compatibility that needs to be taken into account.

The custom software on the card is just a gRPC server; the VSP has a gRPC client connected to that server. This means the host side and the card side are compatible.
Could this implementation potentially deviate from the intended architecture of the dpu-operator?
> Are we talking about custom firmware? Where is the firmware released?

Yes. We released the firmware as a Docker image here:
https://quay.io/repository/synaxgcom/sdk-img?tab=tags
If we are "supporting SynaXG" from the host side, it means we need to know what firmware is on the card (not what it does, only that it's a specific version). Can we extend the API to report the firmware version as a string?
Could you elaborate on the specific concerns you have regarding this?
Do you mean we shouldn't have three fields in DpuFirmwareSpec, and that just a firmware version is enough?
If we simply allow an opaque blob to configure the card, we don't have any way to have test coverage for this. It opens the door to programming any firmware and any configuration. At a minimum, we should have a check somewhere that ties down what is allowed in OCP. This should hold even if it's on the card, since the proper functioning of the OCP deployment depends on what's running on the card.
We want to manually control what is allowed. If you have a specific FW version (the ones you linked on quay), we need the code to check that it's a known-good firmware to use. This way, our QE knows what to test.
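A minimal sketch of such a check, assuming a hard-coded allowlist maintained in the operator; the variable, function, and versions below are illustrative:

```go
package sketch

import "fmt"

// Hypothetical allowlist of firmware versions that QE has qualified for OCP.
var allowedFirmwareVersions = map[string]bool{
	"v1.0.8": true,
	"v1.2.0": true,
}

// validateFirmwareVersion rejects any version that has not been explicitly
// qualified, so an opaque blob cannot be programmed onto the card.
func validateFirmwareVersion(version string) error {
	if !allowedFirmwareVersions[version] {
		return fmt.Errorf("firmware version %q is not on the qualified allowlist", version)
	}
	return nil
}
```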
CompletionTime *metav1.Time `json:"completionTime,omitempty"`

// Upgrade-related versions (valid only when SubOperation is FirmwareUpgrade)
OriginalVersion string `json:"originalVersion,omitempty"` // Version before upgrade
Why omitempty?
This DpuNodeOperationStatus field manages both 'Reboot' and 'Firmware Upgrade' status. Note that OriginalVersion is not required for 'Reboot' tasks.
The DPU config CR is user facing, and I'm wondering what this field gives us.
If you upgrade the firmware, this field will tell you the firmware version before the upgrade, so the user can compare it with the targetVersion.
But the name of this field is not accurate; maybe previousVersion is better?
previousVersion is indeed better. It seems to me that there is always some version running on the card. Can we report the previous version when we've detected it and remove the omitempty portion?
No problem.
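The agreed change could look roughly like this; the comment wording is illustrative and only the relevant field is shown:

```go
package sketch

// Sketch of the revised status field.
type DpuNodeOperationStatus struct {
	// PreviousVersion is the firmware version that was running before the
	// upgrade. Some version is always running on the card, so the field is
	// reported once detected and carries no omitempty.
	PreviousVersion string `json:"previousVersion"`
}
```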
-// DataProcessingUnitConfigStatus defines the observed state of DataProcessingUnitConfig.
-type DataProcessingUnitConfigStatus struct {
+// DpuNodeOperationStatus defines the observed state of DataProcessingUnitConfig.
+type DpuNodeOperationStatus struct {
Please make sure that we can use kubectl wait ... to block on firmware upgrades.
Do you mean that the update logic for the Phase field must be highly reliable? In other words, it must accurately reflect the actual real-time status of the operation?
Yes. We have the right infrastructure in place with APIs to ensure that we know when the card is up. We already ping the card from the host, and we track the state of the success of the ping. You could hook into that logic.
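One way to make kubectl wait work is to publish the upgrade outcome as a standard condition, sketched here with the apimachinery condition helpers; the condition type FirmwareUpgraded and the reason strings are illustrative names, not something defined in this PR:

```go
package sketch

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setFirmwareUpgraded records the upgrade outcome as a condition that
// `kubectl wait --for=condition=FirmwareUpgraded ...` can block on.
func setFirmwareUpgraded(conditions *[]metav1.Condition, generation int64, done bool, msg string) {
	cond := metav1.Condition{
		Type:               "FirmwareUpgraded",
		Status:             metav1.ConditionFalse,
		Reason:             "UpgradeInProgress",
		Message:            msg,
		ObservedGeneration: generation,
	}
	if done {
		cond.Status = metav1.ConditionTrue
		cond.Reason = "UpgradeSucceeded"
	}
	meta.SetStatusCondition(conditions, cond)
}
```

Hooking this into the existing ping/health tracking would let the condition flip only once the card is reachable again.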
@@ -517,6 +517,126 @@ func (x *PingResponse) GetHealthy() bool {
	return false
We're going to need 2 commits: one for the actual changes, and one for the generated code, to make review and history easier. That means one commit won't compile. I know that's painful. We've decided we favor review-ability over being able to compile each commit. Note, we should still be able to compile each PR in its entirety.
type DataProcessingUnitConfigReconciler struct {
	client.Client
	Scheme *runtime.Scheme
	vsp    *plugin.VendorPlugin
Why do we need to pass in a vsp? We are going to use dpu selectors. Please refer to https://docs.google.com/document/d/1niEIreHlMsnFF4Ol2RidvHBuTpXcRphLKGY7Ebtzo-0/edit?tab=t.0 and implement that design.
} else {
	// if labels do not match, skip processing
	log.Info("Labels are not matched, skip processing")
	return ctrl.Result{}, nil
Where is the err handling?
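A sketch of the pattern being asked for, as a fragment of Reconcile; labelsMatchNode is a hypothetical helper standing in for whatever the matching logic ends up being:

```go
// Propagate errors from the match check instead of assuming success;
// returning the error makes controller-runtime requeue with backoff.
matched, err := labelsMatchNode(dpuConfig, nodeName) // hypothetical helper
if err != nil {
	log.Error(err, "failed to evaluate DPU selector")
	return ctrl.Result{}, err
}
if !matched {
	log.Info("Labels are not matched, skip processing")
	return ctrl.Result{}, nil
}
```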
}
// If the CR is targeted at DPU rebooting
if dpuConfig.Spec.DpuManagement.Operation == "Reboot" {
else if
// If the CR is targeted at DPU rebooting
if dpuConfig.Spec.DpuManagement.Operation == "Reboot" {
	r.vsp.RebootDpu(r.pciAddr)
Keep the level of abstraction consistent as you move deeper into the call stack.
Do you mean that pciAddr shouldn't be exposed at this level, and instead should be encapsulated within the vsp client?
Yes. We don't use PCI addresses to identify cards. We use the DPU CRs (see the doc that I linked earlier) and DPU selectors.
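A sketch of that encapsulation, with the caller naming the DPU by its CR identifier and the plugin resolving device-level details internally; the interface and method shape are illustrative:

```go
package sketch

import "context"

// DpuIdentifier mirrors the plugin.DpuIdentifier already used in this PR.
type DpuIdentifier string

// Hypothetical plugin-facing interface: callers stay at the CR level of
// abstraction, and the plugin maps the identifier to a PCI address (or
// whatever the vendor needs) internally.
type DpuManager interface {
	RebootDpu(ctx context.Context, dpu DpuIdentifier) error
}
```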
// Extract nodeName and pci-address from selector
if dpuConfig.Spec.DpuSelector != nil {
	nodeName, pciAddr, err = GetNodenameAndPCIAddressFromSelector(dpuConfig.Spec.DpuSelector)
If we already do this, why then do we need to pass in the vsp in the CR?
General rule: we don't expect the user to repeat themselves. If we can derive fields from other fields, then they will only need to provide the information once.
func (r *DataProcessingUnitConfigReconciler) CheckPCIAddressExists(pciAddr string) (string, error) {
	// check /sys/bus/pci/devices/pciAddr
	cmd := exec.Command("ls", fmt.Sprintf("/sys/bus/pci/devices/%s", pciAddr))
Abstract all file system access (see tests).
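A minimal sketch of that abstraction, replacing the exec of ls with an injectable interface so tests can fake the sysfs tree; all names are illustrative:

```go
package sketch

import (
	"fmt"
	"os"
)

// FileSystem abstracts the filesystem operations the reconciler needs, so
// tests can substitute a fake instead of touching the real sysfs.
type FileSystem interface {
	Exists(path string) (bool, error)
}

// osFileSystem is the production implementation backed by the real OS.
type osFileSystem struct{}

func (osFileSystem) Exists(path string) (bool, error) {
	_, err := os.Stat(path)
	if err == nil {
		return true, nil
	}
	if os.IsNotExist(err) {
		return false, nil
	}
	return false, err
}

// checkPCIAddressExists queries sysfs through the abstraction rather than
// shelling out to ls.
func checkPCIAddressExists(fs FileSystem, pciAddr string) (bool, error) {
	return fs.Exists(fmt.Sprintf("/sys/bus/pci/devices/%s", pciAddr))
}
```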
}

func (r *DataProcessingUnitConfigReconciler) getCurrentNodeName() string {
	return os.Getenv("MY_NODE_NAME")
You're increasing cohesion between pieces of code within the patch. We don't want some code to depend on the node name and then have another piece of code depend on the node name again. Can we please use the serial number concept that we're already using elsewhere?
Will do.
BTW: currently the dpu-operator only supports one DPU per node, right?
We have not yet tested multiple DPUs per node. However, in the latest refactor, we have laid the foundation for multiple DPUs.
Without testing, and looking at the code, it's more likely that a node with two DPUs from different vendors will work than a node with two DPUs from the same vendor.
OK, I see the current status.
Testing nodes with up to four SynaXG DPUs is a potential requirement for our project.
That sounds good and most of the groundwork to support that is already here.
	return g.dsClient.SetNumVfs(context.Background(), c)
}

func (g *GrpcPlugin) RebootDpu(dmr pb.DPUManagementRequest) (*pb.DPUManagementResponse, error) {
I'm missing where you're checking the health so that you know you can complete the reboot operation.
Sorry, I did miss the health check.
Are you saying you don't know where it is in the code? Or are you saying you will adjust the patch to add it?
I mean I will adjust the patch.
}

func (g *GrpcPlugin) UpgradeFirmware(dmr pb.DPUManagementRequest) (*pb.DPUManagementResponse, error) {
	err := g.ensureConnected()
I'm missing where you're checking the health so that you know you can complete the update operation.
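A sketch of the health gate both comments ask for, reusing the Ping RPC visible in this diff; ensureHealthy and the PingRequest shape are assumptions about the plugin API, not code from the PR:

```go
// ensureHealthy gates management operations on the card answering the
// existing ping, so reboot/upgrade completion can actually be observed.
func (g *GrpcPlugin) ensureHealthy(ctx context.Context) error {
	resp, err := g.dsClient.Ping(ctx, &pb.PingRequest{}) // assumes the Ping RPC from this diff
	if err != nil {
		return fmt.Errorf("ping failed: %w", err)
	}
	if !resp.GetHealthy() {
		return fmt.Errorf("DPU is not healthy; refusing to run the operation")
	}
	return nil
}
```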
pathManager   utils.PathManager
stopRequested bool
dpListener    net.Listener
dpuMgrClient  pb2.DataProcessingUnitManagementServiceClient
dpuManagerClient.
// FIXME: Must be a unique value on the DPU that is non-changing.
func (d *SynaXGDetector) DpuPlatformIdentifier(platform Platform) (plugin.DpuIdentifier, error) {
	return plugin.DpuIdentifier("SynaXG-dpu"), nil
}
Add a test. Patch one: add the SynaXG detector (if we really need it) plus a test. Patch two: the rest.
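A minimal sketch of such a test, assuming it sits in the detector's own package so SynaXGDetector and Platform resolve; the package name and zero-value Platform are placeholders:

```go
package sketch // place alongside the detector; name illustrative

import "testing"

// Pins the identifier behavior; once the FIXME is resolved with a real
// unique, non-changing value, this test makes the change deliberate.
func TestSynaXGDetectorDpuPlatformIdentifier(t *testing.T) {
	d := &SynaXGDetector{}
	id, err := d.DpuPlatformIdentifier(Platform{})
	if err != nil {
		t.Fatalf("DpuPlatformIdentifier returned error: %v", err)
	}
	if id == "" {
		t.Fatal("expected a non-empty DPU identifier")
	}
}
```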
bn222
left a comment
General idea is ok. Make sure to align with https://docs.google.com/document/d/1niEIreHlMsnFF4Ol2RidvHBuTpXcRphLKGY7Ebtzo-0/edit?tab=t.0
Thank you so much! I will go through your comments one by one and address them accordingly.
Thank you. Please also make the PR pass most of the CI lanes.
Hi @bn222 @wizhaoredhat @thom311
I have submitted a pull request for review of the design scheme. Would you be kind enough to help assess whether the design is reasonable?
Certain implementation details — including the Dockerfile, Makefile, and SynaXG Plugin logic — have not yet been fully finalized. Please disregard these items for the time being and focus solely on the design scheme. Thank you.
First, I would like to provide some background:
The SynaXG card will not have Kubernetes or OpenShift Container Platform (OCP) installed. As such, the SynaXG VSP will run exclusively on the host side. The VSP’s core functions are twofold:
1) To perform DPU reboot by unbinding the target PCI address (a minimal sketch of the unbind mechanism follows this list).
2) To implement firmware upgrade via gRPC — specifically, a gRPC server runs on the SynaXG card itself, while the corresponding gRPC client is deployed within the VSP pod. (This gRPC is only for firmware upgrade, has nothing to do with dpu-operator)
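For (1), the usual Linux mechanism is to write the PCI address into the device driver's sysfs unbind file; a minimal sketch, with the helper name illustrative and the path being the standard sysfs layout:

```go
package sketch

import (
	"fmt"
	"os"
)

// unbindPCIDevice detaches a device from its driver by writing the PCI
// address to /sys/bus/pci/devices/<addr>/driver/unbind.
func unbindPCIDevice(pciAddr string) error {
	path := fmt.Sprintf("/sys/bus/pci/devices/%s/driver/unbind", pciAddr)
	if err := os.WriteFile(path, []byte(pciAddr), 0200); err != nil {
		return fmt.Errorf("unbinding %s: %w", pciAddr, err)
	}
	return nil
}
```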
Let me briefly introduce my idea:
A "DataProcessingUnitConfig" CRD will be created for each DPU.
Daemon pods will be deployed on all DPU nodes in the cluster, meaning "HostSideManagers" will run on every DPU node.
The "DataProcessingUnitConfigReconciler" is configured within the "HostSideManager", so each DPU node can monitor changes to the "DataProcessingUnitConfig" CRD.
When a user adds "nodeName" and specifies "reboot DPU" in the CR, each "DataProcessingUnitConfigReconciler" will verify whether its node is the target node. If the labels match, the "DataProcessingUnitConfigReconciler" will call the gRPC method to execute the reboot operation.
From my understanding, the gRPC connection is already established by the "HostSideManager"—thus, when the "DataProcessingUnitConfigReconciler" is initialized, "vsp" is passed in as an input parameter.
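To make the described flow concrete, a condensed sketch of that reconcile path; the field names and the node-match helper are stand-ins for the PR's actual types, and per the review feedback the final design should identify DPUs via DPU CRs and selectors rather than a vsp handed to the reconciler:

```go
// Condensed Reconcile fragment; error handling and status updates trimmed.
func (r *DataProcessingUnitConfigReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var dpuConfig DataProcessingUnitConfig
	if err := r.Get(ctx, req.NamespacedName, &dpuConfig); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// Only the daemon on the targeted node acts; all other nodes skip.
	if !r.isTargetNode(dpuConfig) { // hypothetical helper
		return ctrl.Result{}, nil
	}

	if dpuConfig.Spec.DpuManagement.Operation == "Reboot" {
		if err := r.vsp.RebootDpu(ctx); err != nil {
			return ctrl.Result{}, err
		}
	}
	return ctrl.Result{}, nil
}
```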