HYPERFLEET-531 - feat: Add Hyperfleet release process spike report #83

Open
86254860 wants to merge 2 commits into openshift-hyperfleet:main from 86254860:HYPERFLEET-531

@86254860 86254860 commented Jan 29, 2026

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive release process guide covering release entry/readiness criteria, branching and code‑freeze workflows, multi‑gate governance, post‑code‑freeze bug/hotfix handling, release artifacts and publishing (containers, charts, tags), templates for notes/upgrades and ad‑hoc requests, a Prow‑based MVP with a Konflux post‑MVP migration plan, suggested 3‑week sprint cadence with ad‑hoc releases, testing/security gates, and success metrics.

coderabbitai bot commented Jan 29, 2026

Walkthrough

Adds a new comprehensive release-process spike document at hyperfleet/docs/release-process-spike-report.md specifying release entry criteria, branching and code-freeze workflows (release branches, RCs, GA), a 3‑week sprint cadence with ad‑hoc releases, multi‑gate readiness checks, post‑freeze bug and hotfix handling, release artifacts (container images, Helm charts, adapters, Git tags, release repo), documentation and testing requirements, security gates, governance, MVP (Prow manual releases) → Post‑MVP (Konflux) migration plan, templates, appendices, and success metrics.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Dev as Developer
    participant Repo as Git Repo
    participant CI as CI System (Prow / Konflux)
    participant Registry as Image Registry
    participant ReleaseRepo as Release Repository
    participant Ops as Operations

    Dev->>Repo: Push feature branch / open PR
    Repo->>CI: Trigger CI (tests, build)
    CI->>Repo: Report status (pass/fail)
    CI->>Registry: Publish build artifact (on success)
    Note over CI,Repo: Evaluate release entry criteria & readiness gates
    Dev->>Repo: Create release branch / tag (RC)
    Repo->>CI: Trigger release pipeline
    CI->>ReleaseRepo: Publish release artifacts (charts, manifests, notes)
    CI->>Registry: Push release images (tagged)
    ReleaseRepo->>Ops: Provide release bundle & release notes
    Ops->>Registry: Deploy to environments (canary -> GA)
    Ops->>Repo: Report deployment status / feedback

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • rh-amarin
  • ciaranRoche
  • xueli181114
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding a HyperFleet release process spike report document. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@hyperfleet/docs/release-process-spike-report.md`:
- Around line 638-665: Replace emphasized section titles that use bold (e.g.,
"**1. Conduct Retrospectives and Identify Improvements**", "**2. Migrate to
Konflux for Official Releases**", "**Why Konflux:**", "**Migration Approach:**",
"**3. Additional Process Improvements**") with proper Markdown headings (e.g.,
"#", "##", or "###" as appropriate) so they are true headings instead of
emphasis; update the three numbered/section headings to at least "##" and the
subheadings like "Why Konflux:" and "Migration Approach:" to "###" for
consistent document structure and to satisfy MD036.
- Line 79: The fenced code block at the shown diff is missing a language
identifier and triggers MD040; update the opening fence from ``` to include a
language (e.g., change the opening fence to ```text or ```bash) so the block
becomes a tagged code fence; ensure the corresponding closing fence remains ```
and leave the block contents unchanged.
- Line 250: The fenced code block in
hyperfleet/docs/release-process-spike-report.md is missing a language identifier
(triggering MD040); update the opening fence from ``` to ```text (or another
appropriate language) for the code block that contains the flow
diagram (the sequence starting with "Bug Reported") so the linter recognizes it
as a tagged code block and MD040 is satisfied.
- Line 303: A fenced code block in the markdown ends with a bare triple backtick
(```); MD040 requires a language identifier—update that fenced block (the
trailing/backtick-only block shown in the diff) to include a language token like
text (e.g., change ``` to ```text) so the block becomes ```text and closes
properly; ensure the same fenced block that contains "Developer → Code Review →
Release Owner → Automated Tests → Merge" is updated.
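The MD040 findings above can be reproduced locally before pushing. What follows is a minimal sketch, assuming backtick fences only (tilde fences and nested fences are ignored) and treating any opening fence with an empty info string as a violation. `fences_missing_language` is a hypothetical helper for illustration, not part of markdownlint.

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def fences_missing_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that have no language tag."""
    missing = []
    in_fence = False
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if in_fence:
            in_fence = False  # this line closes the current block
        else:
            in_fence = True  # this line opens a new block
            if stripped[len(FENCE):].strip() == "":
                missing.append(lineno)
    return missing


if __name__ == "__main__":
    doc = "\n".join(["Intro", FENCE, "Bug Reported", FENCE, FENCE + "text", "ok", FENCE])
    print(fences_missing_language(doc))
```

Running it against the spike report would list the lines that still need a tag such as `text`.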
🧹 Nitpick comments (1)
hyperfleet/docs/release-process-spike-report.md (1)

400-400: Consider tightening wording (“under discussion”).
LanguageTool notes this as wordy; “proposed” or “being considered” reads tighter.

✅ Suggested tweak
-- Note: Umbrella chart strategy (hyperfleet-chart repo) is under discussion
+- Note: Umbrella chart strategy (hyperfleet-chart repo) is being considered

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@hyperfleet/docs/release-process-spike-report.md`:
- Line 199: Rename the section header "### 3.4 Documentation Completeness
(Mandatory)" to "### 3.3 Documentation Completeness (Mandatory)" and renumber
the following headers accordingly: change "3.5 Cross-Team Coordination" to "3.4
Cross-Team Coordination", "3.6 Security & Compliance" to "3.5 Security &
Compliance", and "3.7 Release Artifacts Verification" to "3.6 Release Artifacts
Verification" so the sequence reads 3.2 → 3.3 → 3.4 → 3.5 → 3.6; update any
in-file references to these section numbers if present.
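A numbering gap like the 3.2 → 3.4 jump described above is easy to miss by eye. The sketch below is one way to detect it, assuming headings follow the `### 3.4 Title` pattern quoted in the comment. `numbering_gaps` is a hypothetical helper, and it does not skip headings that appear inside fenced code blocks.

```python
import re

# Matches "## 3.4 Title" style headings; deeper numbers such as 3.4.1 are skipped.
HEADING = re.compile(r"^#{1,6}\s+(\d+)\.(\d+)(?![.\d])")


def numbering_gaps(markdown: str) -> list[tuple[str, str]]:
    """Return (previous, current) pairs where a section number is not previous + 1."""
    gaps = []
    last_minor = {}  # last minor number seen for each major section
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            continue
        major, minor = int(match.group(1)), int(match.group(2))
        previous = last_minor.get(major)
        if previous is not None and minor != previous + 1:
            gaps.append((f"{major}.{previous}", f"{major}.{minor}"))
        last_minor[major] = minor
    return gaps


if __name__ == "__main__":
    doc = "\n".join([
        "### 3.2 Bug Severity Gates (Mandatory)",
        "### 3.4 Documentation Completeness (Mandatory)",
        "### 3.5 Cross-Team Coordination",
    ])
    print(numbering_gaps(doc))
```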

@86254860 86254860 force-pushed the HYPERFLEET-531 branch 2 times, most recently from 0d6b5cb to 5d202ca January 29, 2026 05:56

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@hyperfleet/docs/release-process-spike-report.md`:
- Around line 98-104: The document has conflicting support-window language
between the "After GA" branching diagram and the "Section 2.4" paragraph; pick
one explicit policy and update both places to match (e.g., replace "support
window: 12 months" in the "After GA" diagram with the exact wording used in
Section 2.4, or update Section 2.4 to state "12 months" if that is the chosen
policy), and ensure the same canonical phrase appears in both the "After GA"
diagram block and the Section 2.4 text so the policy is unambiguous throughout.
🧹 Nitpick comments (1)
hyperfleet/docs/release-process-spike-report.md (1)

416-456: Make templates explicitly “examples” or use placeholders.

The release notes and appendix templates use concrete versions/dates (e.g., v1.5.0, 2026‑05‑12, April 14, 2026), which can be misread as committed schedules. Consider labeling these blocks as “Example” and/or replacing with placeholders (e.g., vX.Y.Z, YYYY‑MM‑DD).

♻️ Example tweak (one option)
-# HyperFleet v1.5.0 Release Notes
+# HyperFleet vX.Y.Z Release Notes (Example)

-## [1.5.0] - 2026-05-12
+## [X.Y.Z] - YYYY-MM-DD

-- Feature Freeze: April 14, 2026
-- Code Freeze: April 28, 2026
-- GA Target: May 12, 2026
+- Feature Freeze: YYYY-MM-DD
+- Code Freeze: YYYY-MM-DD
+- GA Target: YYYY-MM-DD

Also applies to: 508-529, 732-736


### 2.4 Release Branch Maintenance

**Support Policy:** N-2 OR 6 months (whichever is longer)
Contributor

I am not sure if my math is mathing here, but with N-2 and a 6-week release cycle, doesn't this mean 12 weeks of support vs. 6 months 🤔 So wouldn't 6 months be the de facto policy?
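The arithmetic behind this question can be checked quickly. This is a rough sketch, assuming N-2 means two release cycles of coverage (the reading used in the comment) and approximating a month as 52/12 weeks. `support_weeks` and its parameter names are illustrative only.

```python
def support_weeks(cycle_weeks: int, releases_back: int = 2, floor_months: int = 6) -> dict:
    """Compare an N-k support window with a fixed floor, both expressed in weeks."""
    n_minus_weeks = releases_back * cycle_weeks
    floor_weeks = floor_months * 52 / 12  # average weeks per month, an approximation
    return {
        "n_minus_weeks": n_minus_weeks,
        "floor_weeks": floor_weeks,
        # "whichever is longer" picks the larger of the two windows
        "effective_weeks": max(n_minus_weeks, floor_weeks),
    }


if __name__ == "__main__":
    for cycle in (3, 6, 14):
        print(cycle, support_weeks(cycle))
```

Under these assumptions the N-2 term only wins once the cycle exceeds 13 weeks, so for a 6-week (or 3-week) cadence the 6-month floor always applies. Passing `releases_back=3` models the stricter reading where a release is supported until its third successor ships; at 18 weeks that is still shorter than 6 months.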

Contributor Author

@86254860 86254860 Feb 2, 2026

Good catch - the math wasn't working as intended! Therefore, I've updated the policy to use lifecycle stages instead:

  • Every release gets exactly 6 months of support from GA date
  • Phase 1 (months 0-3): Full support - all Major+ bug fixes
  • Phase 2 (months 3-6): Security maintenance - only CRITICAL/HIGH CVEs and Blockers
  • After 6 months: EOL

This approach:

  • Gives users a clear, predictable 6-month upgrade window regardless of our release cadence
  • Naturally reduces maintenance burden for developers (security-only after month 3)
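The lifecycle stages described in this reply can be expressed as a small lookup. This is a sketch under stated assumptions: the boundary days (exactly 3 and 6 months after GA) are assigned to the later phase, which the comment does not specify, and the GA date in the usage lines is a made-up example.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def support_phase(ga_date: date, today: date) -> str:
    """Classify a release by how long ago it reached GA."""
    if today < ga_date:
        return "pre-GA"
    if today < add_months(ga_date, 3):
        return "full-support"  # Phase 1: all Major+ bug fixes
    if today < add_months(ga_date, 6):
        return "security-maintenance"  # Phase 2: CRITICAL/HIGH CVEs and Blockers only
    return "EOL"


if __name__ == "__main__":
    ga = date(2026, 4, 22)  # hypothetical GA date
    for probe in (date(2026, 5, 1), date(2026, 8, 1), date(2026, 11, 1)):
        print(probe.isoformat(), support_phase(ga, probe))
```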

- [ ] Other: [specify]

## Rollback Plan
How will we rollback if issues are discovered?
Contributor

Are rollbacks something we want to support, or should we always roll forward?

Contributor Author

@86254860 86254860 Feb 2, 2026

Taking the hyperfleet-api database as an example: if a database schema change is involved and the offering team encounters an emergency in production, the fastest and simplest mitigation is usually to roll back. Without a defined rollback plan, how is the offering team expected to handle such a situation?

Here's some analysis from AI.

Roll-forward (Always fix forward):

  • ✅ Safer for Kubernetes environments (CRDs, state changes, data migrations)
  • ✅ Aligns with GitOps and immutable infrastructure principles
  • ✅ Forces proper testing and validation of fixes
  • ✅ Simpler support matrix (no need to maintain rollback compatibility)
  • ❌ Slower response time (need to build/test/release new version)
  • ❌ Requires discipline to fix issues quickly

Rollback (Support reverting):

  • ✅ Faster emergency response (immediate revert)
  • ✅ Buys time to properly fix issues
  • ❌ Complex with CRDs, state changes, database schemas
  • ❌ Requires bi-directional migration support (N → N-1)
  • ❌ More testing burden (must test both upgrade and downgrade paths)

My opinion for HyperFleet, WDYT?

  • Primary strategy: Roll-forward (fix issues with new patch release)
  • Emergency escape hatch: Rollback support (only for critical/blocker scenarios where roll-forward would take too long)

Contributor

I would lean towards roll-forward until such time as we require rollback support. Adding rollback support right now will stretch the testing capacity we have before the end of Q1. I would focus this doc on roll-forward only and create a separate epic in our backlog for rollback support post Q1.


### 2.2 Timeline and Freeze Process

**Sprint-Based Release Cycle (6 weeks / 2 sprints):**
Contributor

6 weeks seems pretty long to me considering how new the project is. I think it would be better to tighten this up and push releases a lot quicker so that we get early feedback loops between us and the pillar teams. As the project matures we can look to extend the release cadence, but for now I think we should focus on getting the release cycle as fast as possible.

With 2 weeks given to stabilization and code freeze, I think we can tighten this up a lot once the right CI/CD is in place.

Contributor

Just to add: if we can ship a hotfix in 48 hours, what is the reason 'features' take 6 weeks 🤔

I'm not saying we need to release every 2 days, just worried that things could bypass the process by being labeled as a hotfix to get a feature out ASAP.

Contributor Author

@86254860 86254860 Feb 2, 2026

My initial thought was that, because the CI/CD and E2E testing automation is still being built, the Stabilization & Release Phase might take longer in the initial releases.

And yes, as you mentioned, it's very important for our new product to get early and rapid feedback from the pillar teams. Therefore, I've updated the cadence from 6 weeks to 3 weeks. I also added a note: "Since CI/CD automation is currently being built, the Stabilization & Release Phase (Week 3) may take longer in the initial releases. As the product matures and automation capabilities improve, the team should continuously refine and optimize the release cadence based on actual data and lessons learned from each release cycle."

Contributor

Yeah that sounds good to me, a release per sprint feels a lot better. We should definitely refine this as we go 🙏

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@hyperfleet/docs/release-process-spike-report.md`:
- Around line 171-176: The bold phase labels "**Phase 1: Full Support (first 3
months)**" and "**Phase 2: Security Maintenance (months 3-6)**" should be
converted to Markdown subheadings (e.g., use "#### Phase 1: Full Support (first
3 months)" and "#### Phase 2: Security Maintenance (months 3-6)") so the section
titles are proper headings; keep the subsequent bullet lists unchanged and
ensure there is a blank line before each new heading to satisfy markdownlint
MD036.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@hyperfleet/docs/release-process-spike-report.md`:
- Around line 755-763: The timeline section under "Release v1.5.0 Tracking
Issue" contains incorrect weekday labels for dates; update the lines "Feature
Freeze: April 15, 2026 (Monday, Week 3)", "Code Freeze: April 17, 2026
(Thursday, Week 3)", and optionally "GA Target: April 19, 2026 (Friday, Week 3)"
by either removing the parenthesized weekday labels entirely (recommended) or
replacing them with the correct weekdays for those dates, keeping the date
strings "Sprint Start: April 1, 2026", "Feature Freeze: April 15, 2026", "Code
Freeze: April 17, 2026", and "GA Target: April 19, 2026" intact so the timeline
no longer shows inconsistent weekday/date pairs.
🧹 Nitpick comments (1)
hyperfleet/docs/release-process-spike-report.md (1)

471-480: Use placeholders for sample versions/dates in templates.

The example release notes and dependency versions (e.g., “Go: 1.25”, “2026-05-12”) may go stale or imply commitments. Prefer placeholders so teams fill in current values per release.

✅ Suggested fix (placeholder examples)
-## [1.5.0] - 2026-05-12
+## [X.Y.Z] - YYYY-MM-DD
...
-- Go: 1.25
-- Base image: gcr.io/distroless/static-debian12:nonroot
-- Helm: 3.14+
+- Go: X.Y
+- Base image: <image-ref>
+- Helm: X.Y+

Also applies to: 533-555

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@hyperfleet/docs/release-process-spike-report.md`:
- Line 217: Section 3.2 and 4.2 contain conflicting policies about
Major-severity bugs (3.2 at line 217 says "No Major severity bugs" as a hard GA
gate while 4.2 at lines 304-307 allows deferral); reconcile them by choosing one
of the three recommended resolutions and apply consistent wording: either soften
3.2 to "No Blocker/Critical bugs; Major bugs evaluated by Release Owner and may
be deferred with documented risk", or strengthen 4.2 to require Major fixes
before GA (or mandate release delay/severity downgrade if not fixable), or add
an explicit exception clause in 3.2 stating Major bugs may be accepted only with
documented stakeholder approval and workarounds; update both Section 3.2 and 4.2
text so they match the chosen policy and add a short sentence referencing
"Release Owner" and "stakeholder approval" where applicable.
🧹 Nitpick comments (2)
hyperfleet/docs/release-process-spike-report.md (2)

428-428: Provide timeline for umbrella chart strategy decision.

The note "under discussion" is clear, but a release process document ideally should have resolved strategies. Consider adding a timeline for when this decision will be made (e.g., "Decision expected by vX.Y.0 release" or "To be resolved in Q2 2026").

This helps teams understand whether they should plan for a unified chart or continue with individual component charts.


759-762: Use placeholder date format for template consistency.

The Release Tracking Issue template uses specific dates (April 1, 15, 17, 19, 2026) while the Ad-Hoc Release Request template (Appendix D) correctly uses "YYYY-MM-DD" placeholders. For consistency and to prevent copy-paste errors, consider using:

-- Sprint Start: April 1, 2026
-- Feature Freeze: April 15, 2026
-- Code Freeze: April 17, 2026
-- GA Target: April 19, 2026
+- Sprint Start: YYYY-MM-DD (Week 1, Day 1)
+- Feature Freeze: YYYY-MM-DD (Week 3, Day 1)
+- Code Freeze: YYYY-MM-DD (Week 3, Day 3)
+- GA Target: YYYY-MM-DD (Week 3, Day 5)

Or use relative notation like "Sprint Day 1", "Sprint Day 15", etc.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@hyperfleet/docs/release-process-spike-report.md`:
- Around line 756-796: The Release Candidates checklist contains incorrect
weekday labels for the April 2026 dates (e.g., "v1.5.0-rc.1 (April 15,
Monday)"). Update the text under the "Release Candidates" section to either
remove all weekday names or correct them to the proper weekdays (April 15 →
Wednesday, April 16 → Thursday, April 17 → Friday, April 18 → Saturday) so the
date/weekday entries in the v1.5.0-rc.1/rc.2/rc.3 lines are accurate and won't
drift.
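The weekday corrections in this comment can be confirmed with the standard library. A minimal sketch follows; the day-name list is hard-coded so the output does not depend on locale, and `weekday_name` is an illustrative helper.

```python
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_name(d: date) -> str:
    """Return the English weekday name for a date."""
    return DAY_NAMES[d.weekday()]


if __name__ == "__main__":
    for day in (15, 16, 17, 18):
        d = date(2026, 4, day)
        print(d.isoformat(), weekday_name(d))
```

A check like this is only useful if the template keeps concrete dates; dropping the weekday labels, as the comment also suggests, avoids the problem entirely.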

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@hyperfleet/docs/release-process-spike-report.md`:
- Around line 762-766: Update the template timeline to match the 3-week sprint
cadence: keep Sprint Start as April 1, 2026, change Feature Freeze to April
15–16, 2026, change Code Freeze to April 17–18, 2026, and change GA Target to
April 22, 2026 so the dates across the timeline in
release-process-spike-report.md align with the 3-week (21-day) sprint cadence
referenced elsewhere.
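Deriving the milestones from the sprint start keeps the template aligned with the 21-day cadence. In the sketch below, the offsets are an assumption: they are read off the first day of each range suggested in this comment (April 15, April 17, April 22, relative to an April 1 start) and are not defined by the spike report itself.

```python
from datetime import date, timedelta

SPRINT_DAYS = 21  # 3-week sprint cadence

# Day offsets from sprint start; assumed from the dates suggested in the review.
MILESTONE_OFFSETS = {
    "sprint_start": 0,
    "feature_freeze": 14,
    "code_freeze": 16,
    "ga_target": SPRINT_DAYS,
}


def milestones(sprint_start: date) -> dict:
    """Return each milestone date for a sprint that begins on sprint_start."""
    return {
        name: sprint_start + timedelta(days=offset)
        for name, offset in MILESTONE_OFFSETS.items()
    }


if __name__ == "__main__":
    for name, when in milestones(date(2026, 4, 1)).items():
        print(f"{name}: {when.isoformat()}")
```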
🧹 Nitpick comments (3)
hyperfleet/docs/release-process-spike-report.md (3)

536-536: Date inconsistency between changelog example and template.

The changelog example shows release date 2026-05-12 (May 12), but the release tracking template in Appendix C uses April 2026 dates (lines 762-766). For consistency and to avoid confusion, consider using the same month in both examples.

🗓️ Suggested fix
-## [1.5.0] - 2026-05-12
+## [1.5.0] - 2026-04-22

Or update both to use the same example month consistently throughout the document.


780-782: Update RC checklist dates to match corrected timeline.

If the main timeline is adjusted to properly reflect a 3-week sprint (per previous comment), update the RC checklist dates accordingly.

📅 Suggested alignment
 ## Release Candidates
-- [ ] v1.5.0-rc.1 (April 15, at Feature Freeze)
-- [ ] v1.5.0-rc.2 (April 16-17, if needed)
-- [ ] v1.5.0-rc.3 (April 18, if needed)
+- [ ] v1.5.0-rc.1 (April 16, at Feature Freeze)
+- [ ] v1.5.0-rc.2 (April 17-18, if needed)
+- [ ] v1.5.0-rc.3 (April 20-21, if needed)

431-431: Optional: Consider more concise wording.

The phrase "is under discussion" could be simplified to "is being discussed" or "is TBD" for brevity, though the current wording is acceptable for a spike document.

✍️ More concise alternatives
-- Note: Umbrella chart strategy (hyperfleet-chart repo) is under discussion
+- Note: Umbrella chart strategy (hyperfleet-chart repo) is being discussed

Or:

-- Note: Umbrella chart strategy (hyperfleet-chart repo) is under discussion
+- Note: Umbrella chart strategy (hyperfleet-chart repo) is TBD

@ciaranRoche ciaranRoche left a comment

Left a small NP, but otherwise looks good to me

What testing will be deferred to next regular release?
- [ ] Full exploratory testing
- [ ] Performance regression testing
- [ ] Cross-browser testing
Contributor

NP: cross-browser testing is not required for our system

- ✓ Performance regression tests show no degradation vs. previous release

**Build & CI Health:**
- ✓ Prow CI pipeline is green for all components on the main branch
Contributor

As I understand it, a green Prow CI pipeline is equivalent to the automated tests passing, so this part mixes testing and building. A possible split:

  **CI/CD Pipeline Health:**
  - ✓ Prow CI pipeline green on main branch for all components
    - Unit tests: >70% coverage for new code
    - Integration tests: Passing consistently
    - E2E tests: Critical user journeys validated
    - Performance tests: No degradation vs. previous release

  **Build Artifacts:**
  - ✓ Container images build successfully for all target architectures
  - ✓ Helm charts package without errors

- Image naming: `registry.ci.openshift.org/hyperfleet/{component}:v{version}`
- api-service:v1.5.0
- sentinel:v1.5.0
- adapter-framework:v1.5.0
Contributor

Current examples show all components using the same version (v1.5.0). I think we may need to clarify the versioning strategy here. Specifically, whether we intend to use a unified version across all components for each release, or allow components to maintain independent versions while the HyperFleet release defines a validated version set.
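The two strategies raised here differ only in the version map fed into the naming scheme. As a sketch, the snippet below uses the image template quoted from the document; the independent version numbers are invented for illustration and do not reflect any actual component versions.

```python
IMAGE_TEMPLATE = "registry.ci.openshift.org/hyperfleet/{component}:v{version}"

COMPONENTS = ["api-service", "sentinel", "adapter-framework"]


def image_refs(versions: dict) -> dict:
    """Map each component to its full image reference."""
    return {
        component: IMAGE_TEMPLATE.format(component=component, version=version)
        for component, version in versions.items()
    }


# Option A: one unified version for every component in the release.
unified = dict.fromkeys(COMPONENTS, "1.5.0")

# Option B: independent component versions, pinned together as a validated set.
# These numbers are hypothetical.
validated_set = {"api-service": "1.5.0", "sentinel": "1.4.2", "adapter-framework": "2.0.1"}

if __name__ == "__main__":
    for label, versions in (("unified", unified), ("validated set", validated_set)):
        print(label)
        for ref in image_refs(versions).values():
            print("  " + ref)
```

With option B, the HyperFleet release would need to publish the version map itself (for example in the release repository) so consumers know which combination was tested together.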

### 3.2 Bug Severity Gates (Mandatory)

- ✓ No open bugs with severity **Major** or above (Blocker, Critical, Major)
- ✓ Normal and Minor bugs: No gate, tracked for future releases

For MVP, we can skip Normal bugs. Post-MVP, let's set stricter quality criteria and fix Normal bugs before release.
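A configurable threshold would let the same gate cover both the MVP and post-MVP policies. This is a minimal sketch, assuming the severity ordering Minor < Normal < Major < Critical < Blocker implied by the quoted section; the bug IDs are placeholders.

```python
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    NORMAL = 2
    MAJOR = 3
    CRITICAL = 4
    BLOCKER = 5


def gate_violations(open_bugs, threshold=Severity.MAJOR):
    """Return the open bugs at or above the threshold, which block the release."""
    return [bug for bug in open_bugs if bug[1] >= threshold]


if __name__ == "__main__":
    # (bug id, severity) pairs; the IDs are hypothetical
    bugs = [("BUG-1", Severity.NORMAL), ("BUG-2", Severity.MAJOR), ("BUG-3", Severity.MINOR)]
    print("MVP gate:", gate_violations(bugs))
    print("Post-MVP gate:", gate_violations(bugs, threshold=Severity.NORMAL))
```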

│ │
│ │ Code Freeze (critical fixes only)
│ │
│ ├─── vX.Y.0-rc.1 (Release Candidate 1)
Contributor

Is vX.Y... a branch created from a branch, or a tag?

Maybe it's worth adding that it's a tag. Example: (Tag - Release Candidate 1)

6 participants