| 1 | +--- |
| 2 | +description: |
| 3 | +globs: |
| 4 | +alwaysApply: false |
| 5 | +--- |
| 6 | +# Rule Name: dql-aggregation |
| 7 | +# Description: Generate DQL queries for aggregation operations including counting, summing, averaging, and other mathematical operations on graph data. |
| 8 | + |
| 9 | +# DQL Aggregation Query Patterns |
| 10 | + |
| 11 | +DQL aggregation queries perform mathematical operations and calculations on graph data, including counting nodes, relationships, and computing statistical values. |
| 12 | + |
| 13 | +Follow [dql-language.mdc](mdc:.cursor/rules/dql-language.mdc) to generate valid DQL. |
| 14 | + |
| 15 | +Respond with the parameterized query, including meaningful parameters and identifier values. Include comments in the query to explain the aggregation steps. |
| 16 | + |
| 17 | +Don't prompt the user for anything else. Just produce the query. |
| 18 | + |
| 19 | +--- |
| 20 | + |
| 21 | +## 1. Basic Counting Operations |
| 22 | + |
| 23 | +**Pattern**: Count entities, relationships, or specific attribute values. |
| 24 | + |
| 25 | +**Instructions**: |
| 26 | +- Use `count(uid)` to count the nodes matched by a block |
| 27 | +- Use `count(predicate)` to count the edges of a relationship predicate on each node |
| 28 | +- Combine with filters to count subsets |
| 29 | +- Use `has()` to ensure entities have the predicate before counting |
| 30 | + |
| 31 | +**Generic Template**: |
| 32 | +```dql |
| 33 | +query countEntities($filterValue: string = "FILTER_VALUE") { |
| 34 | + # Count total entities |
| 35 | + totalCount(func: has(ENTITY.IDENTIFIER)) { |
| 36 | + totalEntities: count(uid) |
| 37 | + } |
| 38 | + |
| 39 | + # Count entities with specific criteria |
| 40 | + filteredCount(func: has(ENTITY.ATTRIBUTE)) |
| 41 | + @filter(eq(ENTITY.FILTER_ATTRIBUTE, $filterValue)) { |
| 42 | + filteredEntities: count(uid) |
| 43 | + } |
| 44 | + |
| 45 | + # Count relationships per entity |
| 46 | + entityRelationCounts(func: has(ENTITY.RELATION)) { |
| 47 | + ENTITY.IDENTIFIER |
| 48 | + relationCount: count(ENTITY.RELATION) |
| 49 | + } |
| 50 | +} |
| 51 | +``` |
| 52 | + |
| 53 | +**Replace**: |
| 54 | +- `ENTITY` with the entity type |
| 55 | +- `IDENTIFIER`, `ATTRIBUTE`, `FILTER_ATTRIBUTE` with actual attribute names |
| 56 | +- `RELATION` with relationship predicate |
| 57 | +- `FILTER_VALUE` with the filter criteria |
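A concrete instance of the counting template, as a minimal sketch. The `User.email`, `User.status`, and `User.posts` predicates are hypothetical, and `eq()` on `User.status` assumes that predicate is indexed.

```dql
query countUsers($status: string = "active") {
  # Count every node that has an email
  totalCount(func: has(User.email)) {
    totalUsers: count(uid)
  }

  # Count only the users whose status matches the parameter
  filteredCount(func: has(User.email)) @filter(eq(User.status, $status)) {
    activeUsers: count(uid)
  }

  # Count outgoing post edges for each user
  postCounts(func: has(User.posts)) {
    User.email
    postCount: count(User.posts)
  }
}
```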
| 58 | + |
| 59 | +--- |
| 60 | + |
| 61 | +## 2. Statistical Aggregations (Sum, Average, Min, Max) |
| 62 | + |
| 63 | +**Pattern**: Perform mathematical operations on numeric attributes. |
| 64 | + |
| 65 | +**Instructions**: |
| 66 | +- Use `sum()`, `avg()`, `min()`, `max()` functions |
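| 67 | +- Apply `sum()` and `avg()` to `int` or `float` values only; `min()` and `max()` also accept other orderable types such as `dateTime` |
| 68 | +- Combine with grouping for category-wise statistics |
| 69 | +- Bind values to a variable with `as`, then reference it with `val()`; `val()` takes a variable, not a predicate name |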
| 70 | + |
| 71 | +**Generic Template**: |
| 72 | +```dql |
| 73 | +query getStatistics($groupBy: string = "GROUP_VALUE") { |
| 74 | + # Overall statistics: bind the values to a variable, then aggregate in an empty block |
| 75 | + var(func: has(ENTITY.NUMERIC_ATTRIBUTE)) { overallValue as ENTITY.NUMERIC_ATTRIBUTE } |
| 76 | + overallStats() { |
| 77 | + totalSum: sum(val(overallValue)) |
| 78 | + average: avg(val(overallValue)) |
| 79 | + minimum: min(val(overallValue)) |
| 80 | + maximum: max(val(overallValue)) |
| 81 | + } |
| 82 | + overallCount(func: has(ENTITY.NUMERIC_ATTRIBUTE)) { count: count(uid) } |
| 83 | + # Grouped statistics |
| 84 | + groupedStats(func: eq(ENTITY.GROUP_ATTRIBUTE, $groupBy)) { |
| 85 | + ENTITY.GROUP_ATTRIBUTE |
| 86 | + groupSum: sum(val(groupValue)) |
| 87 | + groupAvg: avg(val(groupValue)) |
| 88 | + groupCount: count(uid) |
| 89 | + # Include entities in this group and bind their values to groupValue |
| 90 | + ENTITY.RELATION { |
| 91 | + RELATED_ENTITY.IDENTIFIER |
| 92 | + groupValue as RELATED_ENTITY.NUMERIC_ATTRIBUTE |
| 93 | + } |
| 94 | + } |
| 95 | +} |
| 96 | +``` |
| 97 | + |
| 98 | +**Replace**: |
| 99 | +- `NUMERIC_ATTRIBUTE` with numeric predicate (e.g., `price`, `age`, `score`) |
| 100 | +- `GROUP_ATTRIBUTE` with grouping predicate (e.g., `category`, `status`) |
| 101 | +- `GROUP_VALUE` with the specific group to analyze |
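A minimal sketch of the statistics pattern using value variables. The `Product.price`, `Category.name`, and `Category.products` predicates are hypothetical, and `eq()` on `Category.name` assumes an index on that predicate. Values are bound with `as` and then aggregated, either in a block with no root function (whole-set totals) or one level above the block where they were bound (per-parent totals).

```dql
query priceStatistics($category: string = "books") {
  # Bind each product's price to a value variable
  var(func: has(Product.price)) {
    price as Product.price
  }

  # Aggregate the bound values in a block with no root function
  overallStats() {
    totalSum: sum(val(price))
    average: avg(val(price))
    minimum: min(val(price))
    maximum: max(val(price))
  }

  # Per-category statistics: bind prices one level down, aggregate at the category
  categoryStats(func: eq(Category.name, $category)) {
    Category.name
    Category.products {
      categoryPrice as Product.price
    }
    categorySum: sum(val(categoryPrice))
    categoryAvg: avg(val(categoryPrice))
  }
}
```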
| 102 | + |
| 103 | +--- |
| 104 | + |
| 105 | +## 3. Hierarchical Aggregations |
| 106 | + |
| 107 | +**Pattern**: Aggregate data across hierarchical relationships (parent-child, category-subcategory). |
| 108 | + |
| 109 | +**Instructions**: |
| 110 | +- Use nested aggregations to roll up values |
| 111 | +- Combine parent and child counts/sums |
| 112 | +- Use variables to propagate values up the hierarchy |
| 113 | +- Include both individual and cumulative totals |
| 114 | + |
| 115 | +**Generic Template**: |
| 116 | +```dql |
| 117 | +query getHierarchicalAggregation($parentId: string = "PARENT_VALUE") { |
| 118 | + # Parent level aggregation |
| 119 | + parentAggregation(func: eq(PARENT_ENTITY.IDENTIFIER, $parentId)) { |
| 120 | + PARENT_ENTITY.IDENTIFIER |
| 121 | + PARENT_ENTITY.ATTRIBUTE |
| 122 | + |
| 123 | + # Direct children aggregation |
| 124 | + directChildrenCount: count(PARENT_ENTITY.CHILD_RELATION) |
| 125 | + directChildrenSum: sum(val(childValue)) |
| 126 | + |
| 127 | + # Detailed children with their own aggregations |
| 128 | + PARENT_ENTITY.CHILD_RELATION { |
| 129 | + CHILD_ENTITY.IDENTIFIER |
| 130 | + CHILD_ENTITY.NUMERIC_ATTRIBUTE |
| 131 | + childValue as CHILD_ENTITY.NUMERIC_ATTRIBUTE |
| 132 | + # Grandchildren aggregation |
| 133 | + grandchildrenCount: count(CHILD_ENTITY.GRANDCHILD_RELATION) |
| 134 | + grandchildrenSum: sum(val(grandchildValue)) |
| 135 | + # Include grandchildren details and bind their values |
| 136 | + CHILD_ENTITY.GRANDCHILD_RELATION { |
| 137 | + GRANDCHILD_ENTITY.IDENTIFIER |
| 138 | + GRANDCHILD_ENTITY.NUMERIC_ATTRIBUTE |
| 139 | + grandchildValue as GRANDCHILD_ENTITY.NUMERIC_ATTRIBUTE |
| 140 | + } |
| 141 | + } |
| 142 | + } |
| 143 | +} |
| 144 | +``` |
| 145 | + |
| 146 | +**Replace**: |
| 147 | +- `PARENT_ENTITY`, `CHILD_ENTITY`, `GRANDCHILD_ENTITY` with entity types |
| 148 | +- `CHILD_RELATION`, `GRANDCHILD_RELATION` with relationship predicates |
| 149 | +- `NUMERIC_ATTRIBUTE` with the attribute to aggregate |
| 150 | + |
| 151 | +--- |
| 152 | + |
| 153 | +## 4. Time-based Aggregations |
| 154 | + |
| 155 | +**Pattern**: Aggregate data by time periods (daily, monthly, yearly). |
| 156 | + |
| 157 | +**Instructions**: |
| 158 | +- Use date/time functions with aggregations |
| 159 | +- Group by time periods using date extraction |
| 160 | +- Use `ge()`, `le()` for date range filtering |
| 161 | +- Combine with other filters for specific time-based analysis |
| 162 | + |
| 163 | +**Generic Template**: |
| 164 | +```dql |
| 165 | +query getTimeBasedAggregation($startDate: string = "2024-01-01", $endDate: string = "2024-12-31") { |
| 166 | + # Aggregation within date range |
| 167 | + timeRangeStats(func: has(ENTITY.DATE_ATTRIBUTE)) |
| 168 | + @filter(ge(ENTITY.DATE_ATTRIBUTE, $startDate) AND le(ENTITY.DATE_ATTRIBUTE, $endDate)) { |
| 169 | + |
| 170 | + # Overall stats for the period |
| 171 | + totalCount: count(uid) |
| 172 | + totalSum: sum(val(ENTITY.NUMERIC_ATTRIBUTE)) |
| 173 | + avgValue: avg(val(ENTITY.NUMERIC_ATTRIBUTE)) |
| 174 | + |
| 175 | + # Group by related entity (e.g., by category, user, etc.) |
| 176 | + ENTITY.GROUP_RELATION { |
| 177 | + GROUP_ENTITY.IDENTIFIER |
| 178 | + periodCount: count(~ENTITY.GROUP_RELATION @filter(ge(ENTITY.DATE_ATTRIBUTE, $startDate) AND le(ENTITY.DATE_ATTRIBUTE, $endDate))) |
| 179 | + periodSum: sum(val(~ENTITY.GROUP_RELATION.NUMERIC_ATTRIBUTE @filter(ge(ENTITY.DATE_ATTRIBUTE, $startDate) AND le(ENTITY.DATE_ATTRIBUTE, $endDate)))) |
| 180 | + } |
| 181 | + } |
| 182 | +} |
| 183 | +``` |
| 184 | + |
| 185 | +**Replace**: |
| 186 | +- `DATE_ATTRIBUTE` with date/datetime predicate |
| 187 | +- `GROUP_RELATION` with the relationship to group by |
| 188 | +- `GROUP_ENTITY` with the entity type to group by |
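A minimal sketch of a date-range aggregation built on value variables. The `Order.createdAt` and `Order.amount` predicates are hypothetical; the sketch assumes `Order.createdAt` is a `dateTime` predicate with an index, which `ge()` and `le()` need.

```dql
query ordersInRange($startDate: string = "2024-01-01", $endDate: string = "2024-12-31") {
  # Collect the orders inside the date range and bind their amounts
  inRange as var(func: has(Order.createdAt))
    @filter(ge(Order.createdAt, $startDate) AND le(Order.createdAt, $endDate)) {
    amount as Order.amount
  }

  # Aggregate the bound amounts in a block with no root function
  periodStats() {
    totalSum: sum(val(amount))
    avgValue: avg(val(amount))
  }

  # Count the orders that fell inside the range
  periodCount(func: uid(inRange)) {
    totalCount: count(uid)
  }
}
```

Running one such query per period (for example, one per month) is one way to get period-by-period totals.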
| 189 | + |
| 190 | +--- |
| 191 | + |
| 192 | +## 5. Complex Multi-level Aggregations |
| 193 | + |
| 194 | +**Pattern**: Perform aggregations across multiple relationship levels with complex conditions. |
| 195 | + |
| 196 | +**Instructions**: |
| 197 | +- Use variables to collect UIDs at different levels |
| 198 | +- Apply multiple aggregation functions |
| 199 | +- Use `val()` for computed value references |
| 200 | +- Combine filtering with aggregation |
| 201 | + |
| 202 | +**Generic Template**: |
| 203 | +```dql |
| 204 | +query getComplexAggregation($param1: string = "VALUE1", $threshold: int = 100) { |
| 205 | + # Step 1: Identify entities meeting criteria |
| 206 | + var(func: eq(ENTITY1.ATTRIBUTE, $param1)) { |
| 207 | + entity1_set as uid |
| 208 | + } |
| 209 | + |
| 210 | + # Step 2: Find related entities and compute intermediate values |
| 211 | + var(func: uid(entity1_set)) { |
| 212 | + ENTITY1.RELATION1 { |
| 213 | + intermediate_entities as uid |
| 214 | + intermediate_value as ENTITY2.NUMERIC_ATTRIBUTE |
| 215 | + } |
| 216 | + } |
| 217 | + |
| 218 | + # Step 3: Aggregate intermediate values |
| 219 | + var(func: uid(intermediate_entities)) { |
| 220 | + total_intermediate as sum(val(intermediate_value)) |
| 221 | + } |
| 222 | + |
| 223 | + # Step 4: Final aggregation with conditions |
| 224 | + complexAggregation(func: uid(entity1_set)) { |
| 225 | + ENTITY1.IDENTIFIER |
| 226 | + ENTITY1.ATTRIBUTE |
| 227 | + |
| 228 | + # Direct aggregations |
| 229 | + directCount: count(ENTITY1.RELATION1) |
| 230 | + directSum: sum(val(ENTITY1.RELATION1.NUMERIC_ATTRIBUTE)) |
| 231 | + |
| 232 | + # Conditional aggregations |
| 233 | + highValueCount: count(ENTITY1.RELATION1 @filter(gt(ENTITY2.NUMERIC_ATTRIBUTE, $threshold))) |
| 234 | + |
| 235 | + # Multi-level aggregations |
| 236 | + ENTITY1.RELATION1 { |
| 237 | + ENTITY2.IDENTIFIER |
| 238 | + ENTITY2.NUMERIC_ATTRIBUTE |
| 239 | + nestedCount: count(ENTITY2.RELATION2) |
| 240 | + nestedSum: sum(val(ENTITY2.RELATION2.NUMERIC_ATTRIBUTE)) |
| 241 | + } |
| 242 | + |
| 243 | + # Reference computed totals |
| 244 | + totalIntermediate: val(total_intermediate) |
| 245 | + } |
| 246 | +} |
| 247 | +``` |
| 248 | + |
| 249 | +--- |
| 250 | + |
| 251 | +## 6. Ranking and Top-K Aggregations |
| 252 | + |
| 253 | +**Pattern**: Find top/bottom entities based on aggregated values. |
| 254 | + |
| 255 | +**Instructions**: |
| 256 | +- Use `orderdesc` or `orderasc` for sorting |
| 257 | +- Use `first` parameter to limit results |
| 258 | +- Combine aggregation with ranking |
| 259 | +- Use `val()` to sort by computed values |
| 260 | + |
| 261 | +**Generic Template**: |
| 262 | +```dql |
| 263 | +query getTopEntities($topK: int = 10, $minThreshold: int = 0) { |
| 264 | + # Bind related values, then aggregate them per entity |
| 265 | + var(func: has(ENTITY.RELATION)) { |
| 266 | + ENTITY.RELATION { related_value as RELATED_ENTITY.NUMERIC_ATTRIBUTE } |
| 267 | + aggregated_value as sum(val(related_value)) |
| 268 | + average_value as avg(val(related_value)) |
| 269 | + } |
| 270 | + # Get top K entities by aggregated value |
| 271 | + topEntities(func: uid(aggregated_value), orderdesc: val(aggregated_value), first: $topK) |
| 272 | + @filter(gt(val(aggregated_value), $minThreshold)) { |
| 273 | + |
| 274 | + ENTITY.IDENTIFIER |
| 275 | + ENTITY.ATTRIBUTE |
| 276 | + totalValue: val(aggregated_value) |
| 277 | + |
| 278 | + # Show breakdown of the aggregated value |
| 279 | + relationCount: count(ENTITY.RELATION) |
| 280 | + avgValue: val(average_value) |
| 281 | + |
| 282 | + # Include top contributing relationships |
| 283 | + ENTITY.RELATION (orderdesc: RELATED_ENTITY.NUMERIC_ATTRIBUTE, first: 5) { |
| 284 | + RELATED_ENTITY.IDENTIFIER |
| 285 | + RELATED_ENTITY.NUMERIC_ATTRIBUTE |
| 286 | + } |
| 287 | + } |
| 288 | +} |
| 289 | +``` |
| 290 | + |
| 291 | +--- |
| 292 | + |
| 293 | +## 7. Conditional Aggregations |
| 294 | + |
| 295 | +**Pattern**: Perform aggregations with complex conditional logic. |
| 296 | + |
| 297 | +**Instructions**: |
| 298 | +- Use `@filter` within aggregation functions |
| 299 | +- Combine multiple conditions with `AND`, `OR`, `NOT` |
| 300 | +- Use `uid_in()` for relationship-based conditions |
| 301 | +- Apply different aggregations based on conditions |
| 302 | + |
| 303 | +**Generic Template**: |
| 304 | +```dql |
| 305 | +query getConditionalAggregation($condition1: string = "VALUE1", $condition2: int = 50) { |
| 306 | + conditionalAggregation(func: has(ENTITY.IDENTIFIER)) { |
| 307 | + ENTITY.IDENTIFIER |
| 308 | + |
| 309 | + # Total counts |
| 310 | + totalRelations: count(ENTITY.RELATION) |
| 311 | + |
| 312 | + # Conditional counts |
| 313 | + condition1Count: count(ENTITY.RELATION @filter(eq(RELATED_ENTITY.ATTRIBUTE1, $condition1))) |
| 314 | + condition2Count: count(ENTITY.RELATION @filter(gt(RELATED_ENTITY.NUMERIC_ATTRIBUTE, $condition2))) |
| 315 | + bothConditionsCount: count(ENTITY.RELATION @filter(eq(RELATED_ENTITY.ATTRIBUTE1, $condition1) AND gt(RELATED_ENTITY.NUMERIC_ATTRIBUTE, $condition2))) |
| 316 | + |
| 317 | + # Conditional sums and averages: bind values in filtered, aliased child blocks |
| 318 | + condition1Items: ENTITY.RELATION @filter(eq(RELATED_ENTITY.ATTRIBUTE1, $condition1)) { value1 as RELATED_ENTITY.NUMERIC_ATTRIBUTE } |
| 319 | + condition2Items: ENTITY.RELATION @filter(gt(RELATED_ENTITY.NUMERIC_ATTRIBUTE, $condition2)) { value2 as RELATED_ENTITY.NUMERIC_ATTRIBUTE } |
| 320 | + condition1Sum: sum(val(value1)) |
| 321 | + condition2Sum: sum(val(value2)) |
| 322 | + condition1Avg: avg(val(value1)) |
| 323 | + condition2Avg: avg(val(value2)) |
| 324 | + } |
| 325 | +} |
| 326 | +``` |
| 327 | + |
| 328 | +--- |
| 329 | + |
| 330 | +## Validation Guidelines |
| 331 | + |
| 332 | +- Always validate that aggregation functions are applied to appropriate data types |
| 333 | +- Use meaningful parameter names and default values for thresholds and limits |
| 334 | +- Include comments explaining the aggregation logic |
| 335 | +- Consider performance implications of complex aggregations |
| 336 | +- Use variables efficiently to avoid redundant calculations |
| 337 | +- Test aggregation results with known data sets |
| 338 | +- Handle edge cases (empty results, null values) |
| 339 | +- Use appropriate root functions to minimize the initial result set |
| 340 | + |
| 341 | +--- |
| 342 | + |
| 343 | +## Common Aggregation Functions |
| 344 | + |
| 345 | +- `count(predicate)` - Count relationships or nodes |
| 346 | +- `count(uid)` - Count nodes in current context |
| 347 | +- `sum(val(variable))` - Sum the numeric values bound to a value variable |
| 348 | +- `avg(val(variable))` - Average of the numeric values bound to a value variable |
| 349 | +- `min(val(variable))` - Minimum of the values bound to a value variable |
| 350 | +- `max(val(variable))` - Maximum of the values bound to a value variable |
| 351 | +- `val(variable)` - Reference computed values |
| 352 | +- `math(expression)` - Custom mathematical expressions |
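A minimal sketch of `math()` combined with a per-parent `sum()`. The `Order.items`, `Order.id`, `Item.price`, and `Item.quantity` predicates are hypothetical. `math()` works on value variables bound at the same level, and the result is rolled up one level above.

```dql
query orderTotals($topK: int = 5) {
  var(func: has(Order.items)) {
    Order.items {
      price as Item.price
      quantity as Item.quantity
      # Per-item computed value
      lineTotal as math(price * quantity)
    }
    # Roll the per-item values up to the order
    orderTotal as sum(val(lineTotal))
  }

  # Rank orders by the computed total
  topOrders(func: uid(orderTotal), orderdesc: val(orderTotal), first: $topK) {
    Order.id
    total: val(orderTotal)
  }
}
```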
| 353 | + |
| 354 | +--- |
| 355 | + |
| 356 | +## Performance Considerations |
| 357 | + |
| 358 | +- Use `has(predicate)` to filter entities before aggregation |
| 359 | +- Apply filters early to reduce the dataset size |
| 360 | +- Use variables to avoid recomputing the same aggregations |
| 361 | +- Consider using `first` parameter to limit large result sets |
| 362 | +- Be cautious with deep nesting in aggregation queries |
| 363 | +- Use appropriate indexes on predicates used in aggregations |