Skip to content

feat: Initial changes for Astra support#2216

Draft
srinivasadmurthy wants to merge 16 commits into
NVIDIA:mainfrom
srinivasadmurthy:sdmvr
Draft

feat: Initial changes for Astra support#2216
srinivasadmurthy wants to merge 16 commits into
NVIDIA:mainfrom
srinivasadmurthy:sdmvr

Conversation

@srinivasadmurthy
Copy link
Copy Markdown
Contributor

Description

Changes to core to support both SVPC and ASTRA. SVPC is code complete. Some initial changes for Astra.

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Jun 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 4, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4661870d-eac2-4775-8755-f40aaaab94bc

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@srinivasadmurthy
Copy link
Copy Markdown
Contributor Author

/ok to test f65bcb9

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 4, 2026

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 4, 2026

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-06-04 22:06:16 UTC | Commit: f65bcb9

@srinivasadmurthy
Copy link
Copy Markdown
Contributor Author

/ok to test 37ce636

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 4, 2026

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
nico-flow 116 13 50 39 4 10
nico-nsm 133 11 45 64 11 2
nico-psm 118 13 52 39 4 10
nico-rest-api 182 16 84 63 7 12
nico-rest-cert-manager 95 5 47 30 3 10
nico-rest-db 116 13 50 39 4 10
nico-rest-site-agent 115 13 50 39 3 10
nico-rest-site-manager 102 6 48 35 3 10
nico-rest-workflow 118 13 52 39 4 10
TOTAL 1095 103 478 387 43 84

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
@srinivasadmurthy
Copy link
Copy Markdown
Contributor Author

/ok to test eec95f7

#[clap(help = "PCI name (e.g. 5e:00.0)")]
pub pci_name: String,
#[clap(help = "Interface type (e.g. SVPC or ASTRA)")]
pub interface_type: i32,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be able to be a DpaInterfaceType -- there will be the correct implementations on it to allow it to naturally be set as svpc or astra (or whatever the Display variants are) -- and then you get validation for free

Comment thread crates/api-core/src/handlers/dpa.rs Outdated
Comment thread crates/api-core/src/handlers/dpa.rs
CREATE TYPE dpa_interface_type AS ENUM ('Svpc', 'Astra');

ALTER TABLE dpa_interfaces
ADD COLUMN interface_type dpa_interface_type NOT NULL;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...so i guess we're just standardizing as "DPA" on the name for E/W then? lol. it works. since S-VPC is the new FW name, it's easy to roll off the tongue to say "DPA" as E/W.

..but this also means, what if we have an E/W card that we won't run SVPC on, and its not ASTRA, but we want to manage it? I think it makes sense to have some other variant that still gives us the ability to flash stock firmware, put mlxconfig parameters on it, and lock it down, even if its not SVPC.

Comment thread crates/api-model/src/dpa_interface/mod.rs
Comment thread crates/dpa-manager/src/card_handler/astra.rs Outdated
Comment thread crates/dpa-manager/src/card_handler/astra.rs Outdated
Comment thread crates/dpa-manager/src/card_handler/astra.rs
Comment thread crates/dpa-manager/src/card_handler/astra.rs Outdated
client: Arc<MqtteaClient>,
dpa_info: &Arc<DpaInfo>,
hb_interval: TimeDelta,
metrics: &mut DpaMonitorMetrics,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could break this down into a couple of "contexts" (like a context with services, and a context with data), so we could get rid of #[allow(clippy::too_many_arguments)] -- or if any of those could be a part of the SvpcInterfaceHandler then you could just refer to them as self.


let mut vni = 0_u32;

let mut this_nic_configured_attachments = spx_config
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah again, we do try to avoid mut -- i think it's even in the STYLE_GUIDE.md -- sometimes its an annoying exercise, sometimes its a fun exercise, sometimes its a hopeless exercise, but if you can rewrite the flow such that you avoid mut, it usually comes out very idiomatic

.to_string(),
));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So one thing you could do is express this as "configured" and "observed" -- you already got the "observed" part, so rephrase the top half as the "configured" half and it will naturally all come together later. Like you could introduce a new at_most_one:

fn at_most_one<I: Iterator>(mut iter: I, ctx: &str) -> DpaManagerResult<Option<I::Item>> {
    let first = iter.next();
    if first.is_some() && iter.next().is_some() {
        tracing::error!("{ctx}: more than one match");
        return Err(DpaManagerError::InvalidArgument(format!("{ctx}: more than one match")));
    }
    Ok(first)
}

...and let configured become:

let configured = at_most_one(
    instance.config.spxconfig.spx_attachments
        .iter()
        .filter(|a| a.mac_address.as_deref() == Some(this_mac_str.as_str())),
    "reconcile_assigned_state configured attachments",
)?;

...and then observed becomes:

let observed = at_most_one(
    machine.spx_status_observation
        .iter()
        .flat_map(|o| &o.spx_attachments)
        .filter(|a| a.mac_address == this_mac),
    "reconcile_assigned_state observed attachments",
)?;

and then your need_creation can turn into an action of sorts match configured?

  • Nothing configured, then the action is SetVni 0
  • Something configured, then look up the VNI and see if it needs to be SetVni or a Heartbeat.

And then based on the action, you either monitor.send_set_vni_command or monitor.do_heartbeat.

The action could probably be some type of new MonitorAction enum with SetVni(vni, version) or Heartbeat(vni) variants, so then the match on action can just match on the enum type.

Sorry, lol. Super long feedback on this.

DpaInterfaceType interface_type = 21;
}

enum DpaInterfaceType {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 0 should be our basic E/W NIC type, not SVPC or ASTRA (e.g. don't default to SVPC).


enum DpaInterfaceType {
Svpc = 0;
Astra = 1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can SVPC and ASTRA these -- Rust will naturally stub them out as Svpc and Astra. If you want to be compatible with other proto parsing libraries (whatever the C one is), these should be:

DPA_INTERFACE_TYPE_SVPC
DPA_INTERFACE_TYPE_ASTRA

Rust knows the right thing to do and still calls them Svpc and Astra in its generated code.

Signed-off-by: Srinivasa Murthy <srmurthy@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants