Semi aggregated query for hardware details summary endpoint#1832
Semi aggregated query for hardware details summary endpoint#1832alanpeixinho wants to merge 1 commit intokernelci:mainfrom
Conversation
9c304aa to
11b5c99
Compare
162a78e to
93f201f
Compare
MarceloRobert
left a comment
There was a problem hiding this comment.
Seems like there's some discrepancy between the listing and the details, specially with larger hardware such as kubernetes
|
Good code though, very organized 👍 |
0002e6c to
a19b158
Compare
I happened due to a misunderstanding in the filtering of dummy builds. Should be correct now. |
| "base_hardware, filters", | ||
| [ | ||
| (ASUS_HARDWARE, {"config_name": "defconfig+kcidebug+x86-board"}), | ||
| (ASUS_HARDWARE, {"config_name": "defconfig"}), |
There was a problem hiding this comment.
Check if this is change is correct.
| "base_hardware, filters", | ||
| [ | ||
| (ASUS_HARDWARE, {"architecture": "i386"}), | ||
| (ASUS_HARDWARE, {"architecture": "asus-CM1400CXA-dalboz"}), |
There was a problem hiding this comment.
Check if this is change is correct.
There was a problem hiding this comment.
Pull request overview
This PR refactors the Hardware details summary endpoint to use a semi-aggregated SQL query, aiming to reduce row counts returned from the DB and improve page load performance.
Changes:
- Replace per-record processing with server-side aggregation via new
get_hardware_details_summary()query. - Add filter prefetch/sanitization (
get_hardware_details_filters) and status-filter validation. - Update summary aggregation logic and adjust integration tests to match new behaviors.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
backend/kernelCI_app/views/hardwareDetailsSummaryView.py |
Reworks endpoint logic to consume aggregated rows and rebuild summary/common/filters. |
backend/kernelCI_app/queries/hardware.py |
Adds new aggregated summary query + filter discovery query; refactors query_records. |
backend/kernelCI_app/helpers/filters.py |
Adds status validation + helper is_filtered_out; adjusts filter handler keys. |
backend/kernelCI_app/typeModels/common.py |
Extends StatusCount.increment() to support incrementing by an arbitrary count. |
backend/kernelCI_app/typeModels/commonDetails.py |
Adds __add__/__iadd__ to BuildArchitectures to support aggregation. |
backend/kernelCI_app/tests/integrationTests/hardwareDetailsSummary_test.py |
Updates expectations for invalid ID / invalid filters; adjusts some filter values. |
backend/kernelCI_app/helpers/hardwareDetails.py |
Makes issue fields access more defensive via record.get(...). |
backend/kernelCI_app/constants/localization.py |
Adds client-facing error string for invalid filters. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| clause += ( | ||
| "AND (NOT (tests.path like 'boot.%%' or tests.path = 'boot') " | ||
| f"OR tests.duration >= {duration_min})\n" | ||
| ) |
There was a problem hiding this comment.
doesn't this mean "AND (test is not boot OR test_duration >= min)"? IIUC in this case the right would be to include the test if it is boot and its duration is >= min
What I'm understanding is that you are filtering in the results of the clause, meaning that if the clause is True then the result will return. So if we want boots where the duration is within the interval, we want to include "tests that are boot and have duration greater than min and lower than max", the current code sounds like the opposite. I might be mistaken though, not sure here
There was a problem hiding this comment.
Yes, you are correct. But the idea is to NOT apply this filter on lines that are not boot (regular tests).
The idea is (for this filter):
| not boot | passes |
| boot with duration >= min | passes |
But as you mentioned before, we should include the boot checking for the regular tests as well.
I am not entirelly satisfied with this clauses. If you have any better alternative, I am willing to try.
8b78eaa to
ea1c257
Compare
MarceloRobert
left a comment
There was a problem hiding this comment.
Looks good, the behavior seems correct. Only some small changes left about the comments that were already made
|
|
||
|
|
||
| def is_valid_status(status: str) -> bool: | ||
| return status in StatusChoices or status == "NULL" or status == "null" |
There was a problem hiding this comment.
very nit: status.lower() == "null" or status.upper() == "NULL" wouldn't work?
There was a problem hiding this comment.
got it, the only problem here, is that we might accept combinations such as "Null", "nUll". But I dont think this is a big problem.
| def is_filtered_out(value: str, filter_values: set[set]): | ||
| if filter_values and value not in filter_values: | ||
| return True | ||
| return False |
There was a problem hiding this comment.
I think its more concise:
| def is_filtered_out(value: str, filter_values: set[set]): | |
| if filter_values and value not in filter_values: | |
| return True | |
| return False | |
| def is_filtered_out(value: str, filter_values: set[set]): | |
| return filter_values and value not in filter_values |
|
|
||
|
|
||
| def test_invalid_filters(invalid_filters_input): | ||
|
|
| NULL: Optional[int] = 0 | ||
|
|
||
| def increment(self, status: Optional[str]) -> None: | ||
| def increment(self, status: Optional[str], value=1) -> None: |
There was a problem hiding this comment.
maybe type value here?
| def filter_instance( | ||
| self, | ||
| *, | ||
| hardware_id: str, | ||
| config: str, | ||
| origin: str, | ||
| lab: str, | ||
| compiler: str, | ||
| architecture: str, | ||
| status: str, | ||
| known_issues: set[str], | ||
| is_build: bool, | ||
| is_boot: bool, | ||
| is_test: bool, | ||
| ) -> bool: |
There was a problem hiding this comment.
origin seems unused here
| if is_build and is_filtered_out(status, filters.filterBuildStatus): | ||
| return True | ||
| if is_boot and is_filtered_out(status, filters.filterBootStatus): | ||
| return True | ||
| if ( | ||
| is_test | ||
| and not is_boot | ||
| and is_filtered_out(status, filters.filterTestStatus) | ||
| ): | ||
| return True |
There was a problem hiding this comment.
here I think we could do:
| if is_build and is_filtered_out(status, filters.filterBuildStatus): | |
| return True | |
| if is_boot and is_filtered_out(status, filters.filterBootStatus): | |
| return True | |
| if ( | |
| is_test | |
| and not is_boot | |
| and is_filtered_out(status, filters.filterTestStatus) | |
| ): | |
| return True | |
| filter_type = self.get_filter_type(is_build, is_boot, is_test) | |
| status_filter_map = { | |
| "build": filters.filterBuildStatus, | |
| "boot": filters.filterBootStatus, | |
| "test": filters.filterTestStatus, | |
| } | |
| if is_filtered_out(status, status_filter_map[filter_type]): | |
| return True |
| if is_filtered_out(config, filters.filterConfigs): | ||
| return True | ||
| if is_filtered_out(lab, filters.filter_labs): | ||
| return True | ||
| if is_filtered_out(architecture, filters.filterArchitecture): | ||
| return True | ||
| if is_filtered_out(hardware_id, filters.filterHardware): | ||
| return True |
There was a problem hiding this comment.
these ones couldn't be a list together with compiler? or order is important here?
| if is_filtered_out(config, filters.filterConfigs): | |
| return True | |
| if is_filtered_out(lab, filters.filter_labs): | |
| return True | |
| if is_filtered_out(architecture, filters.filterArchitecture): | |
| return True | |
| if is_filtered_out(hardware_id, filters.filterHardware): | |
| return True | |
| field_checks = [ | |
| (compiler, filters.filterCompiler), | |
| (config, filters.filterConfigs), | |
| (lab, filters.filter_labs), | |
| (architecture, filters.filterArchitecture), | |
| (hardware_id, filters.filterHardware), | |
| ] | |
| if any(is_filtered_out(val, filt) for val, filt in field_checks): | |
| return True |
| if is_build: | ||
| self.increment_build( | ||
| builds_summary=builds_summary, | ||
| status_count=status_count, | ||
| architecture=architecture, | ||
| config=config, | ||
| lab=lab, | ||
| origin=origin, | ||
| known_issues=len(known_issues) - 1, | ||
| compiler=compiler, | ||
| ) | ||
|
|
||
| elif is_boot: | ||
| self.increment_test( | ||
| tests_summary=boots_summary, | ||
| status_count=status_count, | ||
| config=config, | ||
| lab=lab, | ||
| origin=origin, | ||
| known_issues=len(known_issues) - 1, | ||
| architecture=architecture, | ||
| compiler=compiler, | ||
| platform=platform, | ||
| ) | ||
|
|
||
| elif is_test: | ||
| self.increment_test( | ||
| tests_summary=tests_summary, | ||
| status_count=status_count, | ||
| config=config, | ||
| lab=lab, | ||
| origin=origin, | ||
| known_issues=len(known_issues) - 1, | ||
| architecture=architecture, | ||
| compiler=compiler, | ||
| platform=platform, | ||
| ) |
There was a problem hiding this comment.
I think we could use get_filter_type here too to avoid multiple branches, not sure though, because of the params differences
There was a problem hiding this comment.
we could go for something like:
summary_type = self.get_summary_type(is_build=is_build, is_boot=is_boot, is_test=is_test)
increment_strategy = {
'builds': partial(increment_build, self, builds_summary=builds_summary),
'boots': partial(increment_test, self, tests_summary=boots_summary),
'tests': partial(increment_test, self, tests_summary=tests_summary),
}
increment_strategy[summary_type](
status_count=status_count,
architecture=architecture,
config=config,
lab=lab,
origin=origin,
known_issues=len(known_issues) - 1,
compiler=compiler,
platform=platform,
)But I am afraid is a little bit of overengineer for 3 conditionals
There was a problem hiding this comment.
What you think of it?
3a753af to
4270998
Compare
MarceloRobert
left a comment
There was a problem hiding this comment.
From testing, seems like the hardware compatible filter is always returning empty, and the tree filter is not working
dede999
left a comment
There was a problem hiding this comment.
Review Summary
The performance strategy is solid — moving aggregation into SQL (GROUP BY + UNION ALL) to reduce rows returned to Python is the right approach. A few items to address before merge, the most critical being the SQL injection in duration clauses.
| # builds | ||
| duration_min, duration_max = builds_duration | ||
| if duration_min: | ||
| clause += f"AND builds.duration >= {duration_min}\n" |
There was a problem hiding this comment.
🔴 bad — SQL injection via f-string interpolation.
duration_min and duration_max are interpolated directly into SQL via f-strings here and on the lines below (_get_boot_test_duration_clause has the same pattern). The rest of the query correctly uses %s placeholders — these should too.
def _get_build_duration_clause(builds_duration, params: list) -> str:
clause = ""
duration_min, duration_max = builds_duration
if duration_min:
clause += "AND builds.duration >= %s\n"
params.append(duration_min)
if duration_max:
clause += "AND builds.duration <= %s\n"
params.append(duration_max)
return clauseSame fix needed for _get_boot_test_duration_clause.
There was a problem hiding this comment.
Good call, I will include named parametrization to avoid this
| false AS is_boot | ||
| FROM | ||
| builds | ||
| INNER JOIN tests ON |
There was a problem hiding this comment.
🟡 medium — The build subquery does INNER JOIN tests ON tests.build_id = builds.id, which means builds without any test records will be silently dropped from the summary. Is this intentional? If a build exists but no tests have run for it yet, it won't appear.
There was a problem hiding this comment.
@MarceloRobert , builds without associated should also be considered in such scenario?
There was a problem hiding this comment.
I don't think so, the hardwareDetails page only shows the builds related to the tests that were performed in that hardware, so if a build isn't related to any tests then it won't be shown in that hardware
| (tests.path like 'boot.%%' or tests.path = 'boot') AS is_boot | ||
| FROM | ||
| builds | ||
| inner JOIN tests ON |
There was a problem hiding this comment.
🔵 nit — Inconsistent SQL casing: inner JOIN here vs INNER JOIN on the build subquery (line ~505). Minor but easy to standardize.
| ] | ||
|
|
||
| # TODO: check if we can reuse parameters to avoid double passing | ||
| params = [*params, *params] |
There was a problem hiding this comment.
🟡 medium — params = [*params, *params] duplicates all params including commit hashes.
The TODO acknowledges this, but if the two SELECT branches ever diverge in parameter needs (one already has builds_duration_clause vs boots_tests_duration_clause), this duplication will silently break. Consider building params per-branch to make it explicit.
There was a problem hiding this comment.
Solved when changed to named parameters
| NULL_STRINGS = set(["null", UNKNOWN_STRING, "NULL"]) | ||
|
|
||
|
|
||
| def is_valid_status(status: str) -> bool: |
There was a problem hiding this comment.
🔴 bad — {*StatusChoices, "NULL"} reconstructs the set on every call, and if status is None, status.upper() will raise AttributeError.
Suggestion:
VALID_STATUSES = {choice.value for choice in StatusChoices} | {"NULL"}
def is_valid_status(status: str) -> bool:
return status is not None and status.upper() in VALID_STATUSESThere was a problem hiding this comment.
This shouldn't be a problem, as it is not in any hot path of the application, also the set is quite small.
The status is a str, not Optional[str], it should not be None.
| return not in_filter | ||
|
|
||
|
|
||
| def is_filtered_out(value: str, filter_values: set[set]): |
There was a problem hiding this comment.
🟡 medium — Type hint set[set] should be set[str] (or just set). Won't break at runtime but misleads type checkers and readers.
There was a problem hiding this comment.
Good point, let this one slip
| "test.status": self._handle_test_status, | ||
| "test.duration": self._handle_test_duration, | ||
| "build.status": self._handle_build_status, | ||
| "duration": self._handle_build_duration, # TODO: same as build.duration (should be standardized) |
There was a problem hiding this comment.
🔵 nit — If this "duration" alias is known tech debt, consider creating a ticket instead of a TODO — it may confuse future contributors about which filter key to use.
| status = instance["status"] | ||
| count = instance["count"] | ||
| incidents = instance["incidents_count"] | ||
| known_issues = set(parse_issue(issue) for issue in instance["known_issues"]) |
There was a problem hiding this comment.
🟡 medium — When instance["known_issues"] contains None entries (from array_agg when there are no incidents), parse_issue(None) returns (UNCATEGORIZED_STRING, None). These phantom tuples are then checked against filters.filterIssues, which could cause false-positive filtering.
Suggestion:
known_issues = set(parse_issue(issue) for issue in instance["known_issues"] if issue is not None)| ) | ||
|
|
||
| # ensure uniqueness on architecture and compilers (maybe we could change data structures???) | ||
| for summary in builds_summary.architectures.values(): |
There was a problem hiding this comment.
🔵 nit — The loop variable summary shadows the method parameter summary: list[dict]. Consider renaming to arch_summary or similar to avoid confusion.
| MISS=self.MISS + other.MISS, | ||
| DONE=self.DONE + other.DONE, | ||
| NULL=self.NULL + other.NULL, | ||
| compilers=self.compilers, |
There was a problem hiding this comment.
🟡 medium — __add__ only keeps self.compilers, silently discarding other.compilers. __iadd__ also doesn't merge compilers. This might be intentional (compilers are merged separately in aggregate_summaries), but it's a silent data-loss trap if these operators are used elsewhere.
There was a problem hiding this comment.
StatusCount does not have a compilers, if we do include an option to add two BuildArchitectures (which could be the case some point in future), than we would need to deal with this
There was a problem hiding this comment.
I also dont know if Build Architectures should inherit from StatusCount, composition might make more sense in this case, instead of inheritance
d0a33ef to
98ea304
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 12 out of 12 changed files in this pull request and generated 12 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| as known_issues, | ||
| array[builds.compiler, builds.architecture] AS compiler_arch, | ||
| builds.config_name, | ||
| builds.misc->>'runtime' AS lab, |
a6ee1b8 to
cb8b2d9
Compare
* Bring data grouped by the filter values, ensuring a mid point between bringing all the data to be aggregated and filtered on the app, and needing to query for every filter change. * Build / Test duration is an exception, as they are continuous columns, and any change will trigger a new query. * Include a validation for invalid status values. * Small frontend bugfix on rerender loop when applying trees filter.
cb8b2d9 to
366299f
Compare
Description
This implements performance improvement on the Hardware details summary endpoint.
How to test