
update ppl doc examples to use otel data #5303

Draft
ritvibhatt wants to merge 5 commits into opensearch-project:main from ritvibhatt:update-ppl-examples

Conversation

@ritvibhatt
Contributor

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

PR Reviewer Guide 🔍

(Review updated until commit 2000094)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Update docs exporter to support otel playground buttons

Relevant files:

  • scripts/docs_exporter/export_to_docs_website.py

Sub-PR theme: Replace otellogs test data with new otel dataset

Relevant files:

  • doctest/test_data/otellogs.json

Sub-PR theme: Update PPL command doc examples to use otel data

Relevant files:

  • docs/user/ppl/cmd/search.md
  • docs/user/ppl/cmd/where.md
  • docs/user/ppl/cmd/graphlookup.md

⚡ Recommended focus areas for review

Regex Ordering

The new PPL block regex (lines 259-269) converts ppl fences with otellogs to sql fences with playground buttons, then line 272 converts remaining ppl fences to sql. However, the new regex at line 260 uses ^```ppl\b[^\n]*\n(.*?)^```$ which will match ALL ppl blocks (not just otellogs ones) — the otellogs check is done inside the lambda, not in the pattern. This means the non-otellogs branch returns '```ppl\n' + m.group(1) + '```' (without the language tag suffix), potentially losing any extra info on the opening fence line. Also, after this substitution, the remaining ppl fences (non-otellogs) are re-converted by line 272, which is correct, but the interaction should be verified to ensure no double-processing or missed blocks.

```python
content = re.sub(
    r'^```ppl\b[^\n]*\n(.*?)^```$',
    lambda m: (
        '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
        if _is_playground_eligible(m.group(1), current_file_path)
        else (
            '```ppl\n' + m.group(1) + '```'
        )
    ),
    content, flags=re.MULTILINE | re.DOTALL,
)

# Convert remaining PPL code fences to SQL
content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
```
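The two-pass interaction can be sanity-checked in isolation. The sketch below uses a simplified `_is_playground_eligible` stand-in (hypothetical, without the `current_file_path` argument) and two sample fences, confirming that pass 1 matches every `ppl` fence and pass 2 converts whatever pass 1 left behind:

```python
import re

# Sample content: one playground-eligible fence, one plain fence.
content = (
    "```ppl\nsource=otellogs | head 1\n```\n"
    "\n"
    "```ppl\nsource=accounts | head 1\n```\n"
)

def _is_playground_eligible(body):
    # Hypothetical stand-in: only otellogs queries qualify.
    return "otellogs" in body

# Pass 1: matches EVERY ppl fence; the lambda decides eligibility.
content = re.sub(
    r'^```ppl\b[^\n]*\n(.*?)^```$',
    lambda m: (
        '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n'
        '{% include try-in-playground.html %}'
        if _is_playground_eligible(m.group(1))
        else '```ppl\n' + m.group(1) + '```'
    ),
    content, flags=re.MULTILINE | re.DOTALL,
)

# Pass 2: any fence pass 1 left as ```ppl becomes a plain ```sql fence.
content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)

assert '```ppl' not in content                   # no ppl fences survive
assert content.count('try-in-playground') == 1   # only the otellogs block
```

Under these assumptions there is no double-processing: pass 2 only touches the fences the lambda deliberately left as `ppl`, though the non-eligible blocks end up without buttons.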
Copy Button Duplication

The regex at line 280 adds copy buttons after sql/bash/sh fences that don't already have one, using a negative lookahead (?!\n\{% include copy). However, the playground-eligible PPL blocks (converted to sql at lines 262) already have {% include copy.html %} appended. The negative lookahead should prevent double-adding, but this interaction should be carefully validated, especially since the copy button is on the same line as the closing fence in the new format.

```python
content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)',
                 r'```\1\n\2```\n{% include copy.html %}',
                 content, flags=re.MULTILINE | re.DOTALL)
```
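A minimal sketch of that interaction, using hypothetical sample fences, shows the lookahead leaving an existing copy button alone while still adding one to the bare fence:

```python
import re

# One fence without a copy button, one that already has it.
content = (
    "```sql\nSELECT 2;\n```\n"
    "```sql\nSELECT 1;\n```\n{% include copy.html %}\n"
)

# The negative lookahead rejects a close fence already followed by a
# copy include, so only the first block gets a button appended.
content = re.sub(
    r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)',
    r'```\1\n\2```\n{% include copy.html %}',
    content, flags=re.MULTILINE | re.DOTALL,
)

# Each fence now carries exactly one copy button.
assert content.count('{% include copy.html %}') == 2
```

This checks only the straightforward case; the playground-converted blocks place `{% include copy.html %}` directly after the closing fence, which is exactly what the lookahead tests for.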
Formatting Inconsistency

Line 414 shows | ERROR | (with extra trailing spaces before the pipe), which is inconsistent with the other rows in the same table (| ERROR |). This may cause rendering issues in the documentation website.

```text
| ERROR       |
```

Missing Index Directives

The old data file used {"index": {"_id": "N"}} bulk API index directives before each document, which is the standard format for OpenSearch bulk ingestion. The new file omits these directives entirely. If this file is used for bulk indexing in doctests or CI, the ingestion will fail or produce unexpected results without the action/metadata lines.

{"@timestamp": "2024-02-01T09:10:00Z", "time": "2024-02-01T09:10:00Z", "severityNumber": 9, "severityText": "INFO", "body": "[2024-02-01T09:10:00.123Z] \"GET /api/products HTTP/1.1\" 200 - 1024 45 frontend-6b7b4c9f-x2kl9", "traceId": "abcd1234efgh5678", "spanId": "span0001", "flags": 0, "instrumentationScope": {"name": "@opentelemetry/instrumentation-http", "version": "0.57.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:11:00Z", "time": "2024-02-01T09:11:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Order #1234 placed successfully by user U100", "traceId": "abcd1234efgh5678", "spanId": "span0002", "flags": 0, "instrumentationScope": {"name": "Microsoft.Extensions.Hosting", "version": "9.0.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:12:00Z", "time": "2024-02-01T09:12:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Slow query detected: SELECT * FROM products WHERE category = 'electronics' took 3200ms", "traceId": "abcd1234efgh5678", "spanId": "span0003", "flags": 0, "instrumentationScope": {"name": "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc", "version": "0.49.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:13:00Z", "time": "2024-02-01T09:13:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Payment failed: connection timeout to payment gateway after 30000ms", "traceId": "ijkl9012mnop3456", "spanId": "span0004", "flags": 0, "instrumentationScope": {"name": "@opentelemetry/instrumentation-http", "version": "0.57.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "payment"}, "host": {"name": "payment-6f8d4b-ht7q3"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:14:00Z", "time": "2024-02-01T09:14:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "Cache miss for key user:session:U200 in Valkey cluster", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:15:00Z", "time": "2024-02-01T09:15:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "NullPointerException in CheckoutService.placeOrder at line 142", "traceId": "qrst5678uvwx9012", "spanId": "span0006", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:16:00Z", "time": "2024-02-01T09:16:00Z", "severityNumber": 9, "severityText": "INFO", "body": "User U300 authenticated via OAuth2 from 10.0.0.5", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:17:00Z", "time": "2024-02-01T09:17:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Out of memory: Java heap space - shutting down pod payment-6f8d4b-ht7q3", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "payment"}, "host": {"name": "payment-6f8d4b-ht7q3"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:18:00Z", "time": "2024-02-01T09:18:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Connection pool 80% utilized on database replica db-replica-02", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:19:00Z", "time": "2024-02-01T09:19:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Kafka consumer group rebalanced: 4 partitions assigned to consumer checkout-consumer-01", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:20:00Z", "time": "2024-02-01T09:20:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "[2024-02-01T09:20:00.456Z] \"POST /api/checkout HTTP/1.1\" 503 - 0 30000 checkout-8d4f7b-mk2p9", "traceId": "yzab3456cdef7890", "spanId": "span0011", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:21:00Z", "time": "2024-02-01T09:21:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "gRPC call /ProductCatalogService/GetProduct completed in 12ms", "traceId": "abcd1234efgh5678", "spanId": "span0012", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:22:00Z", "time": "2024-02-01T09:22:00Z", "severityNumber": 13, "severityText": "WARN", "body": "SSL certificate for api.example.com expires in 14 days", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:23:00Z", "time": "2024-02-01T09:23:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Deployment frontend-v2.2.0 rolled out successfully to 3/3 replicas", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:24:00Z", "time": "2024-02-01T09:24:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Failed to process recommendation request: invalid product ID from 203.0.113.50", "traceId": "ghij7890klmn1234", "spanId": "span0015", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "recommendation"}, "host": {"name": "recommendation-5f7c-bn3k8"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:25:00Z", "time": "2024-02-01T09:25:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Database primary node unreachable: connection refused to db-primary-01:5432", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:26:00Z", "time": "2024-02-01T09:26:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Health check passed: all 5 dependencies healthy", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:27:00Z", "time": "2024-02-01T09:27:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Rate limit threshold reached: 450/500 requests per minute for API key ending in ...abc789", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:28:00Z", "time": "2024-02-01T09:28:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "Valkey SETEX user:session:U300 3600 - session refreshed", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:29:00Z", "time": "2024-02-01T09:29:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Kafka producer delivery failed: message too large for topic order-events (max 1048576 bytes)", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
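For reference, restoring the bulk-API shape the review describes can be sketched as follows (documents abbreviated and hypothetical): each source line is preceded by an `{"index": {...}}` action/metadata line, and the body ends with a trailing newline as the `_bulk` endpoint requires.

```python
import json

# Two abbreviated documents standing in for the otellogs records above.
docs = [
    {"severityText": "INFO", "body": "example log line"},
    {"severityText": "ERROR", "body": "another log line"},
]

# Interleave a bulk action/metadata line before each document,
# mirroring the {"index": {"_id": "N"}} directives the old file used.
bulk_lines = []
for i, doc in enumerate(docs, start=1):
    bulk_lines.append(json.dumps({"index": {"_id": str(i)}}))
    bulk_lines.append(json.dumps(doc))

# The _bulk endpoint requires a newline after the last line.
bulk_body = "\n".join(bulk_lines) + "\n"
```

If doctests ingest the new file through `_bulk` unchanged, either the action lines need to be restored like this or the ingestion path must generate them.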

@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

PR Code Suggestions ✨

Latest suggestions up to 2000094

Explore these optional code suggestions:

Possible issue
Remove stray code lines outside code block

The PPL code block for Example 3 is malformed. There is a stray | where age > 30 and
| fields state, age outside the code fence, which are leftover lines from the old
example that were not properly removed. These lines should be deleted.

docs/user/ppl/cmd/replace.md [83-91]

```diff
 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```
```

Suggestion importance[1-10]: 9


Why: The code block for Example 3 is clearly malformed - there are leftover lines `| where age > 30` and `| fields state, age` outside the closing code fence, which are remnants from the old example. This is a real bug in the documentation that would confuse readers.


Impact: High




Fix duplicate identical queries in example



**Example 3 contains two identical PPL queries but claims they demonstrate different behaviors (one with `keepempty=true` and one without). The second query should use `keepempty=true` to actually demonstrate including null values, matching the description "deduplicates while ignoring documents with empty values" (which should instead show `keepempty=true` behavior). The queries and their descriptions need to be differentiated.**

[docs/user/ppl/cmd/dedup.md [86-115]](https://github.com/opensearch-project/sql/pull/5303/files#diff-532601ea92e852acbb8b9bcdd2e269377bc05a311ec141be38148539d20c5bafR86-R115)

```diff
 The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
 | dedup instrumentationScope.name
 | fields instrumentationScope.name
 | sort instrumentationScope.name

The query returns the following results:

fetched rows / total rows = 3/3
-...
++-----------------------------------------------------------------------------+
+| instrumentationScope.name                                                   |
+|-----------------------------------------------------------------------------|
+| @opentelemetry/instrumentation-http                                         |
+| Microsoft.Extensions.Hosting                                                |
+| go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
++-----------------------------------------------------------------------------+

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates while keeping documents with empty values in the specified field:

source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
| fields instrumentationScope.name
| sort instrumentationScope.name
```

Suggestion importance[1-10]: 8


Why: The two PPL queries in Example 3 are identical but claim to demonstrate different behaviors. The second query should include `keepempty=true` to actually show the described behavior of "ignoring documents with empty values," making this a genuine documentation correctness issue.


Impact: Medium




Non-playground PPL blocks skip SQL conversion



**The non-playground branch preserves the original `ppl` fence, which means the subsequent `re.sub` for converting remaining PPL fences to SQL will still match and convert them. However, the non-playground branch does NOT add a copy button, while the original code added one for all SQL fences. This is likely intentional, but the non-playground branch should also add a copy button after converting to SQL, or the conversion to SQL should happen inside this lambda for the non-playground case as well, to ensure consistent behavior.**

[scripts/docs_exporter/export_to_docs_website.py [259-269]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR259-R269)

```diff
 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
-            '```ppl\n' + m.group(1) + '```'
+            '```sql\n' + m.group(1) + '```'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
```

Suggestion importance[1-10]: 7


Why: The non-playground branch preserves the ppl fence, which will be caught by the subsequent re.sub on line 272 that converts remaining PPL fences to SQL. However, that conversion won't add a copy button, while playground-eligible blocks do get one. The suggestion to convert to sql directly in the non-playground branch is valid and would ensure consistent SQL conversion, though the copy button omission for non-playground blocks may be intentional design.

Impact: Medium
Fix incorrect duplicate row in example result table

The result table for Example 3 shows WARN appearing twice with the same count (4),
which looks like a data error. The override=true behavior should replace the main
search agg values with subsearch values only for matching rows (ERROR and WARN),
leaving INFO and DEBUG from the main search unchanged. The duplicate WARN row and
missing DEBUG row make the example misleading and potentially incorrect.

docs/user/ppl/cmd/appendcol.md [95-104]

```diff
 fetched rows / total rows = 4/4
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
```
Suggestion importance[1-10]: 6


Why: The result table shows WARN appearing twice with the same count and is missing a DEBUG row, which appears to be a data error that could mislead readers about the override=true behavior.

Impact: Low
General
Fix non-sequential example numbering

The examples jump from Example 4 to Example 6, skipping Example 5. This appears to
be a numbering error introduced in the PR when examples were reorganized. The
example numbering should be sequential to avoid confusing readers.

docs/user/ppl/cmd/sort.md [180]

```diff
-## Example 6: Specifying the number of sorted documents to return
+## Example 5: Specifying the number of sorted documents to return
```
Suggestion importance[1-10]: 7


Why: The examples jump from Example 4 to Example 6, skipping Example 5. This is a real numbering error in the PR that would confuse readers navigating the documentation.

Impact: Medium
Fix filter placement affecting aggregation correctness

The where clause filters by severityText after eventstats, but eventstats computes
counts over all rows (not just ERROR rows). This means scope_count reflects counts
across all severity levels, not just errors, which is misleading. The where clause
should be placed before eventstats to ensure the count reflects only ERROR entries,
or the description should clarify this behavior.

docs/user/ppl/cmd/eventstats.md [141-146]

```diff
 source=otellogs
+| where severityText = 'ERROR'
 | eventstats bucket_nullable=false count() as scope_count by instrumentationScope.name
-| where severityText = 'ERROR'
 | sort `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, `instrumentationScope.name`, scope_count
```

Suggestion importance[1-10]: 5


Why: The `where severityText = 'ERROR'` clause is placed after `eventstats`, meaning `scope_count` reflects counts across all severity levels rather than just errors. While this may be intentional to demonstrate `bucket_nullable=false`, the query logic could be misleading. However, this may be a deliberate design choice for the example, so the impact is moderate.


Impact: Low




Add missing result block for first subsection



**The "Matching with a prefix pattern" subsection shows a query but has no result block, while only the second subsection ("Matching with a wildcard pattern") shows results. Readers of the first subsection will be left without example output. A result block should be added after the first query, or a note should clarify that only the second query's results are shown.**

[docs/user/ppl/cmd/where.md [108-130]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R108-R130)

```diff
 ### Matching with a prefix pattern
 
 The following query uses a percent sign (`%`) to find all services starting with `frontend`:
 
 ```ppl
 source=otellogs
 | where LIKE(`resource.attributes.service.name`, 'frontend%')
 | fields severityText, `resource.attributes.service.name`, body
 | head 3
+```
+
+The query returns the following results:
+
+```text
+fetched rows / total rows = 3/3
++--------------+----------------------------------+-------+
+| severityText | resource.attributes.service.name | body  |
+|--------------+----------------------------------+-------|
+| ...          | frontend                         | ...   |
+| ...          | frontend-proxy                   | ...   |
+| ...          | frontend-proxy                   | ...   |
++--------------+----------------------------------+-------+

Matching with a wildcard pattern

The following query finds all logs from services containing product in their name:

source=otellogs
| where LIKE(`resource.attributes.service.name`, '%product%')
| fields severityText, `resource.attributes.service.name`, body
| head 3

The query returns the following results:

```

Suggestion importance[1-10]: 4


Why: The "Matching with a prefix pattern" subsection lacks a result block, which is inconsistent with the rest of the documentation. However, the `improved_code` contains placeholder data (`...`) rather than actual query results, making it an incomplete fix.


Impact: Low




Fix misaligned table column padding in result output



**The `unknown` values in the result table are not padded to align with the column width, causing the table to appear misformatted. The `unknown` entries should be padded with trailing spaces to match the column width defined by the header, consistent with the formatting of other result tables in the documentation.**

[docs/user/ppl/cmd/fillnull.md [53-64]](https://github.com/opensearch-project/sql/pull/5303/files#diff-e05ce7bc17faa0105fe2b13aeacef92b8f08f90aac016e003b615f5f575efaa4R53-R64)

```diff
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
 | ERROR        | payment                          | @opentelemetry/instrumentation-http                                         |
-| ERROR        | payment                          | unknown                                                            |
+| ERROR        | payment                          | unknown                                                                     |
 | WARN         | product-catalog                  | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
-| WARN         | product-catalog                  | unknown                                                            |
-| ERROR        | product-catalog                  | unknown                                                            |
-| ERROR        | recommendation                   | unknown                                                            |
+| WARN         | product-catalog                  | unknown                                                                     |
+| ERROR        | product-catalog                  | unknown                                                                     |
+| ERROR        | recommendation                   | unknown                                                                     |
```

Suggestion importance[1-10]: 4


Why: The unknown values in the result table are not padded to match the column width, causing visual misalignment. This is a cosmetic issue that affects readability but not correctness.

Impact: Low
Lookahead pattern may not reliably prevent duplicate copy buttons

The negative lookahead `(?!\n\{% include copy)` tests for a literal newline followed by `{% include copy`, even though re.DOTALL makes `.` match newlines. Because re.MULTILINE is set, the `$` before the lookahead matches at end-of-line, so the lookahead should work correctly. The `{% include try-in-playground.html %}` button is added after `{% include copy.html %}`, so the pattern will correctly skip blocks that already have a copy button. Verify that the lookahead correctly handles cases where the playground button was added (which also includes a copy button before it) to avoid double copy buttons.

scripts/docs_exporter/export_to_docs_website.py [280-282]

```diff
-content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)',
+content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{%[ ]include copy)',
                  r'```\1\n\2```\n{% include copy.html %}',
                  content, flags=re.MULTILINE | re.DOTALL)
```
Suggestion importance[1-10]: 3


Why: The suggestion to add a space in the lookahead pattern ({%[ ]include copy) is a minor robustness improvement, but the existing pattern {% include copy already matches the literal string produced by the code. The improved_code change is minimal and the concern about double copy buttons is valid but the existing logic should handle it correctly since playground blocks add copy.html before try-in-playground.html.

Impact: Low
Add missing newline at end of file

The file is missing a newline at the end. Files should end with a newline character
to follow standard text file conventions and avoid potential issues with file
concatenation or diff tools.

docs/user/ppl/cmd/appendpipe.md [86]

```diff
+* **Schema compatibility**: When fields with the same name exist in both the main search and the subpipeline but have incompatible types, the query fails with an error. To avoid type conflicts, ensure that fields with the same name share the same data type. Alternatively, use different field names. You can rename the conflicting fields using `eval` or select non-conflicting columns using `fields`.
 
-
```
Suggestion importance[1-10]: 1


Why: The existing_code and improved_code are identical, meaning no actual change is proposed. Adding a trailing newline is a very minor style concern with negligible impact.

Impact: Low

Previous suggestions

Suggestions up to commit c051fd0
Possible issue
Remove orphaned code lines outside code fence

The PPL code block for Example 3 is broken — there is a stray closing
triple-backtick followed by two orphaned pipeline stages (| where age > 30 and |
fields state, age) that appear outside the code fence. These leftover lines from the
old example were not removed and will render as raw text in the documentation.

docs/user/ppl/cmd/replace.md [83-91]

```diff
 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```
```

Suggestion importance[1-10]: 8


Why: The `| where age > 30` and `| fields state, age` lines are leftover from the old example and appear outside the code fence, which will render as raw text in the documentation and break the example's presentation.


Impact: Medium




Prevent duplicate copy buttons on playground-eligible blocks

___


The non-playground branch reconstructs the block as a plain ```ppl fence, but the subsequent regex `re.sub(r'^```ppl\b.*$', '```sql', ...)` then converts it to SQL anyway, which is the intended behavior. However, the non-playground branch omits both the copy button and the try-in-playground button, so non-otellogs PPL blocks only receive a copy button from the later generic SQL regex; that part is fine. The problem is that the playground-eligible branch adds the copy button inline AND the later regex can add another one, because the `(?!\n\{% include copy)` lookahead only checks for a newline followed by `{% include copy`, while the playground block ends with `{% include try-in-playground.html %}`, not `{% include copy`. As a result, playground-eligible blocks can get a duplicate `{% include copy.html %}` appended.

[scripts/docs_exporter/export_to_docs_website.py [259-282]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR259-R282)

```diff
 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
             '```ppl\n' + m.group(1) + '```'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
 
+# Convert remaining PPL code fences to SQL
+content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
+
+# Add copy buttons after sql/bash/sh fences that don't already have one
+content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)(?!\n\{% include try-in-playground)',
+                 r'```\1\n\2```\n{% include copy.html %}',
+                 content, flags=re.MULTILINE | re.DOTALL)
+
Suggestion importance[1-10]: 7

__

Why: This is a valid bug: playground-eligible blocks already include {% include copy.html %}, but the later regex at line 280 checks only for (?!\n\{% include copy) — since those blocks end with {% include try-in-playground.html %}, the lookahead won't match and a second copy button will be appended. The fix to also exclude blocks followed by {% include try-in-playground is correct and important.

Medium
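The interaction behind this report can be reproduced standalone. The sketch below uses made-up document text together with the single-lookahead pattern quoted above; it is illustrative, not the exporter's actual code path. Because the negative lookahead fails at the guarded `sql` fence (a copy include immediately follows it), the lazy group backtracks past that fence and the match crosses into the next block, which is the interaction worth verifying:

```python
import re

# Hypothetical document text: a playground-converted block, then a bash block.
doc = (
    "```sql\n"
    "source=otellogs | head 1\n"
    "```\n"
    "{% include copy.html %}\n"
    "{% include try-in-playground.html %}\n"
    "\n"
    "```bash\n"
    "curl localhost:9200\n"
    "```\n"
)

# The copy-button pass with only the single lookahead guard.
pattern = r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)'
m = re.search(pattern, doc, flags=re.MULTILINE | re.DOTALL)

# The match does not stop at the sql block's closing fence: the negative
# lookahead fails there, so the engine extends the lazy group all the way to
# the bash block's closing fence, capturing both includes in group(2).
```

Appending `{% include copy.html %}` to such an over-long match is what risks mangling playground blocks, which is why guarding the close fence against the playground include as well (as the suggested diff does) deserves a test on real exported files.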
Fix duplicate queries with different intended behaviors

The two sub-examples in Example 3 now use identical PPL queries (both lack
keepempty=true) and return identical results, making them indistinguishable. The
first sub-example should use keepempty=true to demonstrate retaining null values, as
the preceding description states "by default, records with null values are dropped."

docs/user/ppl/cmd/dedup.md [108-128]

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

-The query returns the following results:

-```text
-fetched rows / total rows = 3/3
-+-----------------------------------------------------------------------------+

- instrumentationScope.name
- @opentelemetry/instrumentation-http
- Microsoft.Extensions.Hosting
- go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
-+-----------------------------------------------------------------------------+
-```
Suggestion importance[1-10]: 7

__

Why: Both sub-examples in Example 3 use identical queries without `keepempty=true`, making them indistinguishable despite the description implying different behaviors. The first sub-example should include `keepempty=true` to match its stated purpose.


Medium




Fix incorrect duplicate row in example output

___


The result table for Example 3 shows `WARN` appearing twice with the same count (`4`), which looks like a data error. With `override=true`, the subsearch values should replace the main search values for matching rows, so `INFO` and `DEBUG` rows (which have no match in the subsearch) should retain their original counts while `ERROR` and `WARN` rows get replaced. The current output is misleading and inconsistent with the described behavior.

[docs/user/ppl/cmd/appendcol.md [96-103]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R96-R103)

```diff
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
Suggestion importance[1-10]: 6

__

Why: The result table shows WARN appearing twice with count 4, which is inconsistent with the described behavior of override=true. The query filters for ERROR and WARN only in the subsearch, so INFO and DEBUG rows from the main search should still appear with their original values, making the output misleading.

Low
General
Fill in missing example body content

Example 3 has a heading but no body — the PPL query, description, and expected
output are completely missing. This leaves a blank section in the documentation
between the heading and Example 4.

docs/user/ppl/cmd/convert.md [78-80]

 ## Example 3: Extract and convert memory sizes from error messages
+
+The following query extracts and converts memory size strings from error messages to kilobytes:
+
+```ppl
+source=otellogs
+| rex field=body "(?<memory>\d+[kmgKMG]?b?)"
+| convert memk(memory)
+| where NOT ISNULL(memory)
+| stats avg(memory) as avg_memory_kb by `resource.attributes.service.name`
+```
 
 ## Example 4: Multiple field conversions
Suggestion importance[1-10]: 5

__

Why: Example 3 has a heading but no content, leaving a blank section in the documentation. However, the suggested improved_code is speculative and may not reflect actual data available in the otellogs index used throughout the PR.

Low
Use precise pattern matching for source index check

The check 'source=otellogs' not in block_text is a plain substring match and could
produce false positives for strings like source=otellogssomething or #
source=otellogs in a comment. Consider using a more precise pattern such as a
word-boundary regex to ensure the source is exactly otellogs.

scripts/docs_exporter/export_to_docs_website.py [248-256]

 def _is_playground_eligible(block_text, file_path):
     """Check if a PPL block should get a Try in Playground button."""
-    if 'source=otellogs' not in block_text:
+    if not re.search(r'\bsource\s*=\s*otellogs\b', block_text):
         return False
     if file_path:
         stem = file_path.stem if hasattr(file_path, 'stem') else str(file_path).rsplit('/', 1)[-1].replace('.md', '')
         if stem in PLAYGROUND_UNSUPPORTED_CMDS:
             return False
     return True
Suggestion importance[1-10]: 4

__

Why: The suggestion to use a regex word-boundary check instead of a plain substring match for source=otellogs is a reasonable robustness improvement, but the risk of false positives in practice (e.g., source=otellogssomething) is low given the PPL syntax. The improvement is minor but valid.

Low
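This false-positive behavior is cheap to verify directly. A minimal sketch of the proposed word-boundary check (the wrapper function name here is illustrative, not part of the script):

```python
import re

def is_otellogs_source(block_text: str) -> bool:
    # Word-boundary match: tolerates optional spaces around '=' and rejects
    # longer identifiers such as 'otellogssomething'.
    return re.search(r'\bsource\s*=\s*otellogs\b', block_text) is not None

# The plain substring check being flagged matches a longer identifier...
assert 'source=otellogs' in 'source=otellogssomething'
# ...while the word-boundary regex does not.
assert not is_otellogs_source('source=otellogssomething')
assert is_otellogs_source('source = otellogs | head 5')
```

Note the regex still matches a commented-out `# source=otellogs` line, so the comment case mentioned in the suggestion would need a separate check if it matters in practice.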
Remove duplicated note about coordinating node

The new introductory paragraph duplicates the information already present in the
existing > Note block directly below it. Having both the paragraph and the note
say the same thing about the coordinating node creates redundancy. Consider removing
the note block since the new paragraph is more descriptive, or merging them.

docs/user/ppl/cmd/eval.md [6-8]

 The `eval` command processes data after documents are retrieved from the shards. This means that `eval` cannot be used to filter documents before they are returned. Use a `where` clause for filtering. Additionally, because `eval` computations are performed on the coordinating node rather than distributed across data nodes, performance may be slower for large result sets.
 
-> **Note**: The `eval` command is not rewritten to [query domain-specific language (DSL)](https://docs.opensearch.org/latest/query-dsl/). It is only executed on the coordinating node.
-
Suggestion importance[1-10]: 4

__

Why: The new paragraph and the existing > **Note** block both describe the same coordinating node behavior, creating redundancy. Removing the note would improve clarity, but this is a minor documentation style issue.

Low
Restore accidentally removed limitations section

The entire "Limitations" section was removed from appendpipe.md in this PR. This
section contained important information about schema compatibility constraints that
users need to be aware of when using the command.

docs/user/ppl/cmd/appendpipe.md [82]

+## Limitations
 
+The `appendpipe` command has the following limitations:
 
+* **Schema compatibility**: When fields with the same name exist in both the main search and the subpipeline but have incompatible types, the query fails with an error. To avoid type conflicts, ensure that fields with the same name share the same data type. Alternatively, use different field names. You can rename the conflicting fields using `eval` or select non-conflicting columns using `fields`.
+
Suggestion importance[1-10]: 4

__

Why: The Limitations section was removed from appendpipe.md in this PR, but looking at the diff, the old hunk shows it was present and the new hunk does not include it. However, the existing_code and improved_code are identical, so this is more of a verification request than a concrete fix.

Low
Fix misaligned table column padding in output

The unknown values in the instrumentationScope.name column are not padded to match
the column width, causing misaligned table formatting in the rendered output. The
column width should be consistent across all rows to maintain proper table alignment
in the markdown text block.

docs/user/ppl/cmd/fillnull.md [53-63]

-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
 | ERROR        | payment                          | @opentelemetry/instrumentation-http                                         |
-| ERROR        | payment                          | unknown                                                            |
+| ERROR        | payment                          | unknown                                                                     |
 | WARN         | product-catalog                  | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
-| WARN         | product-catalog                  | unknown                                                            |
-| ERROR        | product-catalog                  | unknown                                                            |
-| ERROR        | recommendation                   | unknown                                                            |
+| WARN         | product-catalog                  | unknown                                                                     |
+| ERROR        | product-catalog                  | unknown                                                                     |
+| ERROR        | recommendation                   | unknown                                                                     |
Suggestion importance[1-10]: 3

__

Why: The unknown values in the instrumentationScope.name column are not padded to match the column width, causing visual misalignment in the text block. This is a minor formatting issue that affects readability but not functionality.

Low
Suggestions up to commit f0e3a6a
Possible issue
Fix corrupted markdown document structure

The stats.md file has been corrupted — raw query result output has been injected
directly into the ## Parameters section header and the parameter table header,
breaking the document structure. The query result blocks appear to have been
incorrectly merged into the markdown prose. These sections need to be restored to
their original clean markdown headings and table structure.

docs/user/ppl/cmd/stats.md [18-36]

-## fetched rows / total rows = 1/1
-+-----+
-| p90 |
-|-----|
-| 21  |
-+-----+arameters
+## Parameters
 
 The `stats` command supports the following parameters.
 
-| fetched rows / total rows = 5/5
-...
-+--------------------------------+--------------+arameter | Required/Optional | Description |
+| Parameter | Required/Optional | Description |
Suggestion importance[1-10]: 9

__

Why: The stats.md file has raw query result output injected directly into the ## Parameters section header and the parameter table header, completely breaking the document structure. This is a critical documentation corruption issue that needs to be fixed.

High
Fix corrupted aggregation function list entries

The aggregation function list in stats.md has raw query result tables injected mid-word throughout the bullet points (e.g., `VAR_SAM` followed by a result table instead of `VAR_SAMP`). This corrupts all the function name entries and their descriptions. The list needs to be restored to clean bullet points with proper function names.

docs/user/ppl/cmd/stats.md [52-187]

-* `VAR_SAMfetched rows / total rows = 5/5
-...
-` -- Sample variance
-* `VAR_fetched rows / total rows = 1/1
-...
-Ofetched rows / total rows = 3/3
-...
-` -- fetched rows / total rows = 1/1
-...
-opulation variance
-* `STDDEV_SAMfetched rows / total rows = 4/4
-...
-` -- Sample standard deviation
-* `STDDEV_fetched rows / total rows = 5/5
-...
-Ofetched rows / total rows = 1/1
-...
-` -- fetched rows / total rows = 5/5
-...
-opulation standard deviation
+* `VAR_SAMP` -- Sample variance
+* `VAR_POP` -- Population variance
+* `STDDEV_SAMP` -- Sample standard deviation
+* `STDDEV_POP` -- Population standard deviation
+* `DISTINCT_COUNT_APPROX` -- Approximate distinct count
+* `TAKE` -- List of original values
+* `PERCENTILE`/`PERCENTILE_APPROX` -- Percentile calculations
+* `PERC<percent>`/`P<percent>` -- Percentile shortcut functions
Suggestion importance[1-10]: 9

__

Why: The aggregation function list has raw query result tables injected mid-word throughout the bullet points (e.g., VAR_SAM<table> instead of VAR_SAMP), corrupting all function name entries and descriptions. This is a critical documentation corruption that makes the content unreadable.

High
Fix corrupted Parameters section header

The ## Parameters section header and the parameter table header have been
accidentally replaced with query result blocks. This appears to be a copy-paste
error that corrupts the documentation structure, making the parameters section
unreadable.

docs/user/ppl/cmd/head.md [16-30]

-## fetched rows / total rows = 1/1
-+------------+
-| total_logs |
-|------------|
-| 20         |
-+------------+arameters
+## Parameters
 
 The `head` command supports the following parameters.
 
-| fetched rows / total rows = 1/1
-+--------------+
-| avg_severity |
-|--------------|
-| 12.0         |
-+--------------+arameter | Required/Optional | Description |
+| Parameter | Required/Optional | Description |
Suggestion importance[1-10]: 9

__

Why: The ## Parameters section header and the parameter table header have been accidentally replaced with query result blocks, making the documentation structurally broken and unreadable. This is a clear copy-paste error that needs to be fixed.

High
Fix duplicate queries with different intended behaviors

The two sub-examples in Example 3 use identical PPL queries (both use dedup
instrumentationScope.name without keepempty=true) and return identical results. The
first sub-example is supposed to demonstrate keepempty=true behavior (keeping null
values), while the second demonstrates the default behavior (dropping nulls). The
first query is missing keepempty=true and the result should show a null row.

docs/user/ppl/cmd/dedup.md [107-127]

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates while keeping documents with `null` values in the specified field:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name
 ```

The query returns the following results:

-fetched rows / total rows = 3/3
+fetched rows / total rows = 4/4
+-----------------------------------------------------------------------------+
| instrumentationScope.name                                                   |
|-----------------------------------------------------------------------------|
+| null                                                                        |
| @opentelemetry/instrumentation-http                                         |
| Microsoft.Extensions.Hosting                                                |
| go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
+-----------------------------------------------------------------------------+
Suggestion importance[1-10]: 8

__

Why: Both sub-examples in Example 3 use identical PPL queries, but they are meant to demonstrate different behaviors (`keepempty=true` vs default). The first query is missing `keepempty=true` and the result should include a null row, making this a genuine documentation bug.


Medium




Fix filter value to match actual dataset service name

___


The example filters for `resource.attributes.service.name = 'payment-service'`, but the dataset uses `payment` (not `payment-service`) as the service name, resulting in 0 rows returned. An example that intentionally returns no results is confusing for documentation purposes and does not demonstrate the `AND` operator effectively. The service name in the filter should match an actual value present in the dataset, such as `payment`.

[docs/user/ppl/cmd/where.md [50-69]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R50-R69)

```diff
 ## Example 2: Filter using combined criteria
 
 The following query narrows down errors to a specific service during an incident investigation, combining severity and service name conditions with `AND`:
 
 ```ppl
 source=otellogs
-| where severityNumber >= 17 AND `resource.attributes.service.name` = 'payment-service'
+| where severityNumber >= 17 AND `resource.attributes.service.name` = 'payment'
 | sort severityNumber
 | fields severityText, severityNumber, `resource.attributes.service.name`

The query returns the following results:

-fetched rows / total rows = 0/0
+fetched rows / total rows = 2/2
+--------------+----------------+----------------------------------+
| severityText | severityNumber | resource.attributes.service.name |
|--------------+----------------+----------------------------------|
+| ERROR        | 17             | payment                          |
+| ERROR        | 17             | payment                          |
+--------------+----------------+----------------------------------+
Suggestion importance[1-10]: 6

__

Why: The filter uses `'payment-service'` but the dataset contains `'payment'`, causing 0 rows returned. An example returning empty results for a combined `AND` filter is misleading and doesn't demonstrate the feature effectively.


Low

General



Remove duplicate example with identical query and results

___


Example 2 is a duplicate of Example 1: both use the exact same PPL query and return the exact same results. This provides no additional value to the reader. Replace Example 2 with a distinct scenario, such as demonstrating the `replace` command with a different field or showing how multiple replacements interact when both patterns match.

[docs/user/ppl/cmd/replace.md [52-78]](https://github.com/opensearch-project/sql/pull/5303/files#diff-8644862a762e96b24031ad83e31729768a40b3387fda49b9a3a4f0c6f4c3de39R52-R78)

```diff
-## Example 2: Replace service names for display
+## Example 2: Replace severity text labels for display
 
-The following query shortens service names for a compact dashboard view:
+The following query replaces verbose severity labels with shorter abbreviations for compact reporting:
 
 ```ppl
 source=otellogs
 | where severityText IN ('ERROR', 'WARN')
-| replace 'payment-service' with 'payment' in `resource.attributes.service.name`
-| replace 'inventory-service' with 'inventory' in `resource.attributes.service.name`
+| replace 'ERROR' with 'ERR' in severityText
+| replace 'WARN' with 'WRN' in severityText
 | sort severityNumber, `resource.attributes.service.name`
 | fields severityText, `resource.attributes.service.name`
 | head 4
 ```

-The query returns the following results:

-```text
-fetched rows / total rows = 4/4
-+--------------+----------------------------------+
-| severityText | resource.attributes.service.name |
-|--------------+----------------------------------|
-| WARN         | frontend-proxy                   |
-| WARN         | frontend-proxy                   |
-| WARN         | product-catalog                  |
-| WARN         | product-catalog                  |
-+--------------+----------------------------------+
-```

Suggestion importance[1-10]: 7

__

Why: Example 2 in `replace.md` is an exact duplicate of Example 1 — same PPL query, same description, same results. This provides no value to readers and should be replaced with a distinct scenario.


Medium




Fill in missing example content

___


Example 3 has a heading but no content: the query, description, and expected output are missing entirely. This leaves a broken section in the documentation that jumps directly from an empty heading to Example 4.

[docs/user/ppl/cmd/convert.md [78-80]](https://github.com/opensearch-project/sql/pull/5303/files#diff-76c639678d1e487015ecd16ab9353af28891a46b2d29760c877e41904ed29eedR78-R80)

```diff
 ## Example 3: Extract and convert memory sizes from error messages
+
+The following query extracts and converts memory size strings from error log messages to kilobytes:
+
+```ppl
+source=otellogs
+| where severityText = 'ERROR'
+| rex field=body "(?<mem_size>\d+[kmg]b?)"
+| convert memk(mem_size)
+| where NOT ISNULL(mem_size)
+| fields body, mem_size
+| head 3
+```
 
 ## Example 4: Multiple field conversions
Suggestion importance[1-10]: 6

__

Why: Example 3 has a heading but no content, jumping directly to Example 4. While the suggestion provides plausible content, the improved code is speculative and may not match the actual intended example for this PR.

Low
Ensure non-playground PPL blocks get copy buttons

The non-playground branch converts ppl blocks back to ppl fences without adding a
copy button or converting to sql. However, the subsequent re.sub call converts all
remaining ppl fences to sql without adding a copy button. Non-playground ppl blocks
should be converted to sql with a copy button in the first pass, or the else branch
should emit sql with a copy button directly, to avoid inconsistent output.

scripts/docs_exporter/export_to_docs_website.py [259-269]

 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
-            '```ppl\n' + m.group(1) + '```'
+            '```sql\n' + m.group(1) + '```\n{% include copy.html %}'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
Suggestion importance[1-10]: 5

__

Why: The non-playground else branch keeps ppl fences without a copy button, relying on the subsequent re.sub to convert them to sql, but that subsequent call also doesn't add copy buttons. Converting to sql with a copy button directly in the else branch would be more consistent and correct.

Low
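One way to avoid the inconsistency is to emit the final form in a single pass, as the suggestion proposes. A self-contained sketch under that assumption (helper names hypothetical, not the exporter itself):

```python
import re

PLAYGROUND_SUFFIX = '{% include copy.html %}\n{% include try-in-playground.html %}'
COPY_SUFFIX = '{% include copy.html %}'

def convert_ppl_blocks(content: str, eligible) -> str:
    """Single-pass conversion: every fenced ppl block becomes a sql block
    with a copy button; eligible blocks also get the playground button."""
    def repl(m):
        body = m.group(1)
        suffix = PLAYGROUND_SUFFIX if eligible(body) else COPY_SUFFIX
        return '```sql\n' + body + '```\n' + suffix
    return re.sub(r'^```ppl\b[^\n]*\n(.*?)^```$', repl,
                  content, flags=re.MULTILINE | re.DOTALL)
```

With this shape there is no second `re.sub` that must be kept consistent with the first, so eligible and non-eligible blocks cannot drift apart.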
Fix duplicate row in override example result table

The result table for Example 3 shows WARN appearing twice with the same agg value of
4, which is inconsistent with the described behavior of override=true replacing main
search values with subsearch values. The subsearch only covers ERROR and WARN, so
INFO and DEBUG rows from the main search should retain their original counts while
ERROR and WARN are overridden. The duplicated WARN row and the explanation should be
reviewed for accuracy.

docs/user/ppl/cmd/appendcol.md [94-104]

 ```text
 fetched rows / total rows = 4/4
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
Suggestion importance[1-10]: 5

__

Why: The result table shows `WARN` duplicated with the same `agg=4`, which is inconsistent with the described `override=true` behavior. The table should show all four severity levels (`DEBUG`, `ERROR`, `INFO`, `WARN`) with the subsearch overriding only `ERROR` and `WARN` values.


Low

Suggestions up to commit d0b2a08

Suggestions up to commit cc17d7e
Possible issue



Fix duplicate queries with missing parameter

___


Example 3 contains two identical PPL queries that are supposed to demonstrate different behaviors (`keepempty=true` vs. default). The first query is missing `keepempty=true` and both queries produce the same result, making the distinction between the two behaviors unclear. The first query should include `keepempty=true` to demonstrate that null values are retained.

[docs/user/ppl/cmd/dedup.md [86-126]](https://github.com/opensearch-project/sql/pull/5303/files#diff-532601ea92e852acbb8b9bcdd2e269377bc05a311ec141be38148539d20c5bafR86-R126)

```diff
 The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name
 ```

The query returns the following results:

-fetched rows / total rows = 2/2
+fetched rows / total rows = 3/3
+---------------------------+
| instrumentationScope.name |
|---------------------------|
+| null                      |
| opentelemetry-java        |
| opentelemetry-python      |
+---------------------------+

The following query deduplicates while ignoring documents with empty values in the specified field:

source=otellogs
| dedup instrumentationScope.name
| fields instrumentationScope.name
| sort instrumentationScope.name
Suggestion importance[1-10]: 8

__

Why: The two PPL queries in Example 3 are identical, but they should demonstrate different behaviors. The first query is missing `keepempty=true`, which would show null values being retained, making the documentation misleading and incorrect.


Medium




Fix incorrect duplicate row in result table

___


The result table for Example 3 (override parameter) is missing the `DEBUG` row and shows `FATAL` duplicated, which appears inconsistent with the described behavior. The main search produces 5 severity levels (DEBUG, ERROR, FATAL, INFO, WARN), but the result only shows 5 rows with FATAL appearing twice and DEBUG missing. This is likely a documentation error that could confuse readers about how `override=true` works.

[docs/user/ppl/cmd/appendcol.md [94-104]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R94-R104)

```diff
 ```text
 fetched rows / total rows = 5/5
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 5   | ERROR        |
-| 2   | FATAL        |
 | 2   | FATAL        |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
Suggestion importance[1-10]: 6

__

Why: The result table shows `FATAL` duplicated and `DEBUG` missing, which appears inconsistent with the described `override=true` behavior. However, this could be intentional based on how the subsearch aligns rows positionally, so the actual correctness depends on the implementation behavior.


Low




Fix inaccurate total row count in result

___


The new data contains two documents matching "Payment failed" (one ERROR and one FATAL), but the query claims `fetched rows / total rows = 1/1`. The `head 1` limits output to one row, but the total count should reflect all matching rows (2/2). Verify the expected total count in the result block is accurate.

[docs/user/ppl/cmd/search.md [178]](https://github.com/opensearch-project/sql/pull/5303/files#diff-dd984cb777515ecc5ea6aa39ef6df5a798c5eb668370ba32279b00f311be3d4cR178-R178)

````diff
 search "Payment failed" source=otellogs
 | sort `resource.attributes.service.name` | fields body | head 1
+```
 
+The query returns the following results:
+
+```text
+fetched rows / total rows = 1/2
++---------------------------------------------------------------------+
+| body                                                                |
+|---------------------------------------------------------------------|
+| Payment failed: connection timeout to payment gateway after 30000ms |
++---------------------------------------------------------------------+
+
````

<details><summary>Suggestion importance[1-10]: 4</summary>

__

Why: The suggestion raises a valid concern about the `fetched rows / total rows = 1/1` count when `head 1` is used, but the improved code changes the reported total to `1/2`, which may or may not be accurate depending on how the engine reports totals with `head`. The claim about two matching documents needs verification against the actual data.

</details></details></td><td align=center>Low

</td></tr><tr><td>
General

<details><summary>Remove duplicate documentation examples</summary>

___

**Examples 1 and 2 are identical in both query and output, which provides no additional value to the reader. Replace Example 2 with a distinct use case, such as replacing a severity text value or normalizing a different field, to demonstrate the command's versatility.**

docs/user/ppl/cmd/replace.md [24-78]

````diff
-## Example 1: Normalize service names for a compact dashboard
+## Example 2: Normalize severity text labels
 
-The following query shortens long service names for a compact dashboard view:
-...
-## Example 2: Replace service names for display
+The following query replaces verbose severity labels with shorter codes for compact display:
 
-The following query shortens service names for a compact dashboard view:
-...
-| replace 'payment-service' with 'payment' in `resource.attributes.service.name`
-| replace 'inventory-service' with 'inventory' in `resource.attributes.service.name`
+```ppl
+source=otellogs
+| where severityText IN ('ERROR', 'FATAL')
+| replace 'ERROR' with 'ERR' in severityText
+| replace 'FATAL' with 'FTL' in severityText
 | sort severityNumber, `resource.attributes.service.name`
 | fields severityText, `resource.attributes.service.name`
 | head 4
+```
````

<details><summary>Suggestion importance[1-10]: 7</summary>

__

Why: Examples 1 and 2 in replace.md are completely identical in both query and output, which is a clear documentation error that provides no value. This is a valid and important observation that should be fixed to improve documentation quality.

</details></details></td><td align=center>Medium

</td></tr><tr><td>
<details><summary>Restore removed syntax reference sections</summary>

___

**The PR removes the syntax reference sections ("Optional choices", "Repetition") that explain PPL syntax conventions, replacing them entirely with examples. These reference sections are important for users learning PPL syntax and should be retained in a syntax reference document. The examples are a good addition but should supplement, not replace, the syntax documentation.**

docs/user/ppl/cmd/syntax.md [52-60]

````diff
+### Optional choices
+
+Optional choices between alternatives are shown in square brackets with pipe separators (`[option1 | option2]`). You can choose one of the options or omit them entirely.
+
+**Example**: `[asc | desc]` means you can specify `asc`, `desc`, or neither.
+
+### Repetition
+
+An ellipsis (`...`) indicates that the preceding element can be repeated multiple times.
+
+**Examples**:
+- `<field>...` means one or more fields without commas: `field1 field2 field3`
+- `<field>, ...` means comma-separated repetition: `field1, field2, field3`
+
 ## Example 1: Fetch data from an index with a filter
 
 The following query retrieves all error logs from the `otellogs` index. The filter is specified inline with the `source` command:
 
 ```ppl
 source=otellogs severityText = 'ERROR'
 | fields severityText, `resource.attributes.service.name`
 | sort `resource.attributes.service.name`
````
<details><summary>Suggestion importance[1-10]: 6</summary>

__

Why: The PR removes important syntax reference sections ("Optional choices" and "Repetition") from `syntax.md`, which are valuable for users learning PPL conventions. Restoring these sections alongside the new examples would make the documentation more complete and useful.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Add missing query results block</summary>

___


**The subsection "Matching with a prefix pattern" shows a query but is missing its expected output block. Every other example in the file includes a results block, and this omission leaves the documentation incomplete and inconsistent.**

[docs/user/ppl/cmd/where.md [104-113]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R104-R113)

````diff
 ### Matching with a prefix pattern
 
 The following query uses a percent sign (`%`) to find all services in the `cart` domain, useful when investigating issues across a team's microservices:
 
 ```ppl
 source=otellogs
 | where LIKE(`resource.attributes.service.name`, 'cart-%')
 | sort severityNumber
 | fields severityText, `resource.attributes.service.name`, body
 
+The query returns the following results:
+
+```text
+fetched rows / total rows = 3/3
++--------------+----------------------------------+----------------------------------------------------------------------------------------------+
+| severityText | resource.attributes.service.name | body |
+|--------------+----------------------------------+---------------------------------...
````

</details>

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0e3a6a.

| Path | Line | Severity | Description |
|------|------|----------|-------------|
| docs/user/ppl/cmd/stats.md | 18 | low | Query result output is embedded directly inside section headings and aggregation function bullet points (e.g., '## fetched rows / total rows = 1/1 ... +arameters' and 'VAR_SAMfetched rows...+--...+arameter'). This corrupted content is anomalous and inconsistent with normal editing behavior, though it appears to be an accidental copy-paste error rather than deliberate injection. |
| docs/user/ppl/cmd/head.md | 15 | low | Query result tables are embedded inside the '## Parameters' section heading and the parameter table header, producing malformed markdown. The same pattern as stats.md: anomalous but likely an accidental copy-paste error during the documentation rewrite. |

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

Persistent review updated to latest commit d0b2a08

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 3, 2026

Persistent review updated to latest commit f0e3a6a

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 6, 2026

Persistent review updated to latest commit c051fd0

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 2000094
