
update ppl doc examples to use otel data #5303

Draft
ritvibhatt wants to merge 5 commits into opensearch-project:main from ritvibhatt:update-ppl-examples

Conversation

@ritvibhatt
Contributor

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

PR Reviewer Guide 🔍

(Review updated until commit 2000094)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Update docs exporter to support otel playground buttons

Relevant files:

  • scripts/docs_exporter/export_to_docs_website.py

Sub-PR theme: Replace otellogs test data with new otel dataset

Relevant files:

  • doctest/test_data/otellogs.json

Sub-PR theme: Update PPL command doc examples to use otel data

Relevant files:

  • docs/user/ppl/cmd/search.md
  • docs/user/ppl/cmd/where.md
  • docs/user/ppl/cmd/graphlookup.md

⚡ Recommended focus areas for review

Regex Ordering

The new PPL block regex (lines 259-269) converts ppl fences with otellogs to sql fences with playground buttons, then line 272 converts remaining ppl fences to sql. However, the new regex at line 260 uses ^```ppl\b[^\n]*\n(.*?)^```$ which will match ALL ppl blocks (not just otellogs ones) — the otellogs check is done inside the lambda, not in the pattern. This means the non-otellogs branch returns '```ppl\n' + m.group(1) + '```' (without the language tag suffix), potentially losing any extra info on the opening fence line. Also, after this substitution, the remaining ppl fences (non-otellogs) are re-converted by line 272, which is correct, but the interaction should be verified to ensure no double-processing or missed blocks.

```python
content = re.sub(
    r'^```ppl\b[^\n]*\n(.*?)^```$',
    lambda m: (
        '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
        if _is_playground_eligible(m.group(1), current_file_path)
        else (
            '```ppl\n' + m.group(1) + '```'
        )
    ),
    content, flags=re.MULTILINE | re.DOTALL,
)

# Convert remaining PPL code fences to SQL
content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
```
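The two-pass interaction can be sanity-checked in isolation. The sketch below uses a simplified `_is_playground_eligible` stand-in (hypothetical, without the `current_file_path` argument) and two sample fences, confirming that pass 1 matches every `ppl` fence and pass 2 converts whatever pass 1 left behind:

```python
import re

# Sample content: one playground-eligible fence, one plain fence.
content = (
    "```ppl\nsource=otellogs | head 1\n```\n"
    "\n"
    "```ppl\nsource=accounts | head 1\n```\n"
)

def _is_playground_eligible(body):
    # Hypothetical stand-in: only otellogs queries qualify.
    return "otellogs" in body

# Pass 1: matches EVERY ppl fence; the lambda decides eligibility.
content = re.sub(
    r'^```ppl\b[^\n]*\n(.*?)^```$',
    lambda m: (
        '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n'
        '{% include try-in-playground.html %}'
        if _is_playground_eligible(m.group(1))
        else '```ppl\n' + m.group(1) + '```'
    ),
    content, flags=re.MULTILINE | re.DOTALL,
)

# Pass 2: any fence pass 1 left as ```ppl becomes a plain ```sql fence.
content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)

assert '```ppl' not in content                   # no ppl fences survive
assert content.count('try-in-playground') == 1   # only the otellogs block
```

Under these assumptions there is no double-processing: pass 2 only touches the fences the lambda deliberately left as `ppl`, though the non-eligible blocks end up without buttons.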
Copy Button Duplication

The regex at line 280 adds copy buttons after sql/bash/sh fences that don't already have one, using a negative lookahead (?!\n\{% include copy). However, the playground-eligible PPL blocks (converted to sql at lines 262) already have {% include copy.html %} appended. The negative lookahead should prevent double-adding, but this interaction should be carefully validated, especially since the copy button is on the same line as the closing fence in the new format.

```python
content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)',
                 r'```\1\n\2```\n{% include copy.html %}',
                 content, flags=re.MULTILINE | re.DOTALL)
```
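A minimal sketch of that interaction, using hypothetical sample fences, shows the lookahead leaving an existing copy button alone while still adding one to the bare fence:

```python
import re

# One fence without a copy button, one that already has it.
content = (
    "```sql\nSELECT 2;\n```\n"
    "```sql\nSELECT 1;\n```\n{% include copy.html %}\n"
)

# The negative lookahead rejects a close fence already followed by a
# copy include, so only the first block gets a button appended.
content = re.sub(
    r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)',
    r'```\1\n\2```\n{% include copy.html %}',
    content, flags=re.MULTILINE | re.DOTALL,
)

# Each fence now carries exactly one copy button.
assert content.count('{% include copy.html %}') == 2
```

This checks only the straightforward case; the playground-converted blocks place `{% include copy.html %}` directly after the closing fence, which is exactly what the lookahead tests for.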
Formatting Inconsistency

Line 414 shows | ERROR | (with extra trailing spaces before the pipe), which is inconsistent with the other rows in the same table (| ERROR |). This may cause rendering issues in the documentation website.

```text
| ERROR       |
```

Missing Index Directives

The old data file used {"index": {"_id": "N"}} bulk API index directives before each document, which is the standard format for OpenSearch bulk ingestion. The new file omits these directives entirely. If this file is used for bulk indexing in doctests or CI, the ingestion will fail or produce unexpected results without the action/metadata lines.

{"@timestamp": "2024-02-01T09:10:00Z", "time": "2024-02-01T09:10:00Z", "severityNumber": 9, "severityText": "INFO", "body": "[2024-02-01T09:10:00.123Z] \"GET /api/products HTTP/1.1\" 200 - 1024 45 frontend-6b7b4c9f-x2kl9", "traceId": "abcd1234efgh5678", "spanId": "span0001", "flags": 0, "instrumentationScope": {"name": "@opentelemetry/instrumentation-http", "version": "0.57.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:11:00Z", "time": "2024-02-01T09:11:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Order #1234 placed successfully by user U100", "traceId": "abcd1234efgh5678", "spanId": "span0002", "flags": 0, "instrumentationScope": {"name": "Microsoft.Extensions.Hosting", "version": "9.0.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:12:00Z", "time": "2024-02-01T09:12:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Slow query detected: SELECT * FROM products WHERE category = 'electronics' took 3200ms", "traceId": "abcd1234efgh5678", "spanId": "span0003", "flags": 0, "instrumentationScope": {"name": "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc", "version": "0.49.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:13:00Z", "time": "2024-02-01T09:13:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Payment failed: connection timeout to payment gateway after 30000ms", "traceId": "ijkl9012mnop3456", "spanId": "span0004", "flags": 0, "instrumentationScope": {"name": "@opentelemetry/instrumentation-http", "version": "0.57.0", "droppedAttributesCount": 0}, "resource": {"attributes": {"service": {"name": "payment"}, "host": {"name": "payment-6f8d4b-ht7q3"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:14:00Z", "time": "2024-02-01T09:14:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "Cache miss for key user:session:U200 in Valkey cluster", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:15:00Z", "time": "2024-02-01T09:15:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "NullPointerException in CheckoutService.placeOrder at line 142", "traceId": "qrst5678uvwx9012", "spanId": "span0006", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:16:00Z", "time": "2024-02-01T09:16:00Z", "severityNumber": 9, "severityText": "INFO", "body": "User U300 authenticated via OAuth2 from 10.0.0.5", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:17:00Z", "time": "2024-02-01T09:17:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Out of memory: Java heap space - shutting down pod payment-6f8d4b-ht7q3", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "payment"}, "host": {"name": "payment-6f8d4b-ht7q3"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:18:00Z", "time": "2024-02-01T09:18:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Connection pool 80% utilized on database replica db-replica-02", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:19:00Z", "time": "2024-02-01T09:19:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Kafka consumer group rebalanced: 4 partitions assigned to consumer checkout-consumer-01", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:20:00Z", "time": "2024-02-01T09:20:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "[2024-02-01T09:20:00.456Z] \"POST /api/checkout HTTP/1.1\" 503 - 0 30000 checkout-8d4f7b-mk2p9", "traceId": "yzab3456cdef7890", "spanId": "span0011", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:21:00Z", "time": "2024-02-01T09:21:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "gRPC call /ProductCatalogService/GetProduct completed in 12ms", "traceId": "abcd1234efgh5678", "spanId": "span0012", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:22:00Z", "time": "2024-02-01T09:22:00Z", "severityNumber": 13, "severityText": "WARN", "body": "SSL certificate for api.example.com expires in 14 days", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:23:00Z", "time": "2024-02-01T09:23:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Deployment frontend-v2.2.0 rolled out successfully to 3/3 replicas", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:24:00Z", "time": "2024-02-01T09:24:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Failed to process recommendation request: invalid product ID from 203.0.113.50", "traceId": "ghij7890klmn1234", "spanId": "span0015", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "recommendation"}, "host": {"name": "recommendation-5f7c-bn3k8"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:25:00Z", "time": "2024-02-01T09:25:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Database primary node unreachable: connection refused to db-primary-01:5432", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "product-catalog"}, "host": {"name": "productcatalog-7c9d-zn4p2"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:26:00Z", "time": "2024-02-01T09:26:00Z", "severityNumber": 9, "severityText": "INFO", "body": "Health check passed: all 5 dependencies healthy", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend"}, "host": {"name": "frontend-6b7b4c9f-x2kl9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:27:00Z", "time": "2024-02-01T09:27:00Z", "severityNumber": 13, "severityText": "WARN", "body": "Rate limit threshold reached: 450/500 requests per minute for API key ending in ...abc789", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "frontend-proxy"}, "host": {"name": "frontendproxy-envoy-7d4b8c-xk2q9"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:28:00Z", "time": "2024-02-01T09:28:00Z", "severityNumber": 5, "severityText": "DEBUG", "body": "Valkey SETEX user:session:U300 3600 - session refreshed", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "cart"}, "host": {"name": "cart-5d8f7b-mk29s"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
{"@timestamp": "2024-02-01T09:29:00Z", "time": "2024-02-01T09:29:00Z", "severityNumber": 17, "severityText": "ERROR", "body": "Kafka producer delivery failed: message too large for topic order-events (max 1048576 bytes)", "traceId": "", "spanId": "", "flags": 0, "instrumentationScope": {}, "resource": {"attributes": {"service": {"name": "checkout"}, "host": {"name": "checkout-8b4c2d-jp5r7"}}, "droppedAttributesCount": 0}, "attributes": {}, "droppedAttributesCount": 0}
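For reference, restoring the bulk-API shape the review describes can be sketched as follows (documents abbreviated and hypothetical): each source line is preceded by an `{"index": {...}}` action/metadata line, and the body ends with a trailing newline as the `_bulk` endpoint requires.

```python
import json

# Two abbreviated documents standing in for the otellogs records above.
docs = [
    {"severityText": "INFO", "body": "example log line"},
    {"severityText": "ERROR", "body": "another log line"},
]

# Interleave a bulk action/metadata line before each document,
# mirroring the {"index": {"_id": "N"}} directives the old file used.
bulk_lines = []
for i, doc in enumerate(docs, start=1):
    bulk_lines.append(json.dumps({"index": {"_id": str(i)}}))
    bulk_lines.append(json.dumps(doc))

# The _bulk endpoint requires a newline after the last line.
bulk_body = "\n".join(bulk_lines) + "\n"
```

If doctests ingest the new file through `_bulk` unchanged, either the action lines need to be restored like this or the ingestion path must generate them.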

@github-actions
Contributor

github-actions Bot commented Apr 1, 2026

PR Code Suggestions ✨

Latest suggestions up to 2000094

Explore these optional code suggestions:

Possible issue
Remove stray code lines outside code block

The PPL code block for Example 3 is malformed. There is a stray | where age > 30 and
| fields state, age outside the code fence, which are leftover lines from the old
example that were not properly removed. These lines should be deleted.

docs/user/ppl/cmd/replace.md [83-91]

```diff
 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```
```

Suggestion importance[1-10]: 9


Why: The code block for Example 3 is clearly malformed - there are leftover lines `| where age > 30` and `| fields state, age` outside the closing code fence, which are remnants from the old example. This is a real bug in the documentation that would confuse readers.


Impact: High




Fix duplicate identical queries in example



**Example 3 contains two identical PPL queries but claims they demonstrate different behaviors (one with `keepempty=true` and one without). The second query should use `keepempty=true` to actually demonstrate including null values, matching the description "deduplicates while ignoring documents with empty values" (which should instead show `keepempty=true` behavior). The queries and their descriptions need to be differentiated.**

[docs/user/ppl/cmd/dedup.md [86-115]](https://github.com/opensearch-project/sql/pull/5303/files#diff-532601ea92e852acbb8b9bcdd2e269377bc05a311ec141be38148539d20c5bafR86-R115)

```diff
 The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
 | dedup instrumentationScope.name
 | fields instrumentationScope.name
 | sort instrumentationScope.name

The query returns the following results:

fetched rows / total rows = 3/3
-...
++-----------------------------------------------------------------------------+
+| instrumentationScope.name                                                   |
+|-----------------------------------------------------------------------------|
+| @opentelemetry/instrumentation-http                                         |
+| Microsoft.Extensions.Hosting                                                |
+| go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
++-----------------------------------------------------------------------------+

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates while keeping documents with empty values in the specified field:

source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
| fields instrumentationScope.name
| sort instrumentationScope.name
```

Suggestion importance[1-10]: 8


Why: The two PPL queries in Example 3 are identical but claim to demonstrate different behaviors. The second query should include `keepempty=true` to actually show the described behavior of "ignoring documents with empty values," making this a genuine documentation correctness issue.


Impact: Medium




Non-playground PPL blocks skip SQL conversion



**The non-playground branch preserves the original `ppl` fence, which means the subsequent `re.sub` for converting remaining PPL fences to SQL will still match and convert them. However, the non-playground branch does NOT add a copy button, while the original code added one for all SQL fences. This is likely intentional, but the non-playground branch should also add a copy button after converting to SQL, or the conversion to SQL should happen inside this lambda for the non-playground case as well, to ensure consistent behavior.**

[scripts/docs_exporter/export_to_docs_website.py [259-269]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR259-R269)

```diff
 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
-            '```ppl\n' + m.group(1) + '```'
+            '```sql\n' + m.group(1) + '```'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
```

Suggestion importance[1-10]: 7


Why: The non-playground branch preserves the ppl fence, which will be caught by the subsequent re.sub on line 272 that converts remaining PPL fences to SQL. However, that conversion won't add a copy button, while playground-eligible blocks do get one. The suggestion to convert to sql directly in the non-playground branch is valid and would ensure consistent SQL conversion, though the copy button omission for non-playground blocks may be intentional design.

Impact: Medium
Fix incorrect duplicate row in example result table

The result table for Example 3 shows WARN appearing twice with the same count (4),
which looks like a data error. The override=true behavior should replace the main
search agg values with subsearch values only for matching rows (ERROR and WARN),
leaving INFO and DEBUG from the main search unchanged. The duplicate WARN row and
missing DEBUG row make the example misleading and potentially incorrect.

docs/user/ppl/cmd/appendcol.md [95-104]

```diff
 fetched rows / total rows = 4/4
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
```
Suggestion importance[1-10]: 6


Why: The result table shows WARN appearing twice with the same count and is missing a DEBUG row, which appears to be a data error that could mislead readers about the override=true behavior.

Impact: Low
General
Fix non-sequential example numbering

The examples jump from Example 4 to Example 6, skipping Example 5. This appears to
be a numbering error introduced in the PR when examples were reorganized. The
example numbering should be sequential to avoid confusing readers.

docs/user/ppl/cmd/sort.md [180]

```diff
-## Example 6: Specifying the number of sorted documents to return
+## Example 5: Specifying the number of sorted documents to return
```
Suggestion importance[1-10]: 7


Why: The examples jump from Example 4 to Example 6, skipping Example 5. This is a real numbering error in the PR that would confuse readers navigating the documentation.

Impact: Medium
Fix filter placement affecting aggregation correctness

The where clause filters by severityText after eventstats, but eventstats computes
counts over all rows (not just ERROR rows). This means scope_count reflects counts
across all severity levels, not just errors, which is misleading. The where clause
should be placed before eventstats to ensure the count reflects only ERROR entries,
or the description should clarify this behavior.

docs/user/ppl/cmd/eventstats.md [141-146]

```diff
 source=otellogs
+| where severityText = 'ERROR'
 | eventstats bucket_nullable=false count() as scope_count by instrumentationScope.name
-| where severityText = 'ERROR'
 | sort `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, `instrumentationScope.name`, scope_count
```

Suggestion importance[1-10]: 5


Why: The `where severityText = 'ERROR'` clause is placed after `eventstats`, meaning `scope_count` reflects counts across all severity levels rather than just errors. While this may be intentional to demonstrate `bucket_nullable=false`, the query logic could be misleading. However, this may be a deliberate design choice for the example, so the impact is moderate.


Impact: Low




Add missing result block for first subsection



**The "Matching with a prefix pattern" subsection shows a query but has no result block, while only the second subsection ("Matching with a wildcard pattern") shows results. Readers of the first subsection will be left without example output. A result block should be added after the first query, or a note should clarify that only the second query's results are shown.**

[docs/user/ppl/cmd/where.md [108-130]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R108-R130)

```diff
 ### Matching with a prefix pattern
 
 The following query uses a percent sign (`%`) to find all services starting with `frontend`:
 
 ```ppl
 source=otellogs
 | where LIKE(`resource.attributes.service.name`, 'frontend%')
 | fields severityText, `resource.attributes.service.name`, body
 | head 3
+```
+
+The query returns the following results:
+
+```text
+fetched rows / total rows = 3/3
++--------------+----------------------------------+-------+
+| severityText | resource.attributes.service.name | body  |
+|--------------+----------------------------------+-------|
+| ...          | frontend                         | ...   |
+| ...          | frontend-proxy                   | ...   |
+| ...          | frontend-proxy                   | ...   |
++--------------+----------------------------------+-------+

Matching with a wildcard pattern

The following query finds all logs from services containing product in their name:

source=otellogs
| where LIKE(`resource.attributes.service.name`, '%product%')
| fields severityText, `resource.attributes.service.name`, body
| head 3

The query returns the following results:

```

Suggestion importance[1-10]: 4


Why: The "Matching with a prefix pattern" subsection lacks a result block, which is inconsistent with the rest of the documentation. However, the `improved_code` contains placeholder data (`...`) rather than actual query results, making it an incomplete fix.


Impact: Low




Fix misaligned table column padding in result output



**The `unknown` values in the result table are not padded to align with the column width, causing the table to appear misformatted. The `unknown` entries should be padded with trailing spaces to match the column width defined by the header, consistent with the formatting of other result tables in the documentation.**

[docs/user/ppl/cmd/fillnull.md [53-64]](https://github.com/opensearch-project/sql/pull/5303/files#diff-e05ce7bc17faa0105fe2b13aeacef92b8f08f90aac016e003b615f5f575efaa4R53-R64)

```diff
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
 | ERROR        | payment                          | @opentelemetry/instrumentation-http                                         |
-| ERROR        | payment                          | unknown                                                            |
+| ERROR        | payment                          | unknown                                                                     |
 | WARN         | product-catalog                  | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
-| WARN         | product-catalog                  | unknown                                                            |
-| ERROR        | product-catalog                  | unknown                                                            |
-| ERROR        | recommendation                   | unknown                                                            |
+| WARN         | product-catalog                  | unknown                                                                     |
+| ERROR        | product-catalog                  | unknown                                                                     |
+| ERROR        | recommendation                   | unknown                                                                     |
```

Suggestion importance[1-10]: 4


Why: The unknown values in the result table are not padded to match the column width, causing visual misalignment. This is a cosmetic issue that affects readability but not correctness.

Impact: Low
Lookahead pattern may not reliably prevent duplicate copy buttons

The negative lookahead `(?!\n\{% include copy)` tests for a literal newline followed by `{% include copy`, even though re.DOTALL makes `.` match newlines. Because re.MULTILINE is set, the `$` before the lookahead matches at end-of-line, so the lookahead should work correctly. The `{% include try-in-playground.html %}` button is added after `{% include copy.html %}`, so the pattern will correctly skip blocks that already have a copy button. Verify that the lookahead correctly handles cases where the playground button was added (which also includes a copy button before it) to avoid double copy buttons.

scripts/docs_exporter/export_to_docs_website.py [280-282]

```diff
-content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)',
+content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{%[ ]include copy)',
                  r'```\1\n\2```\n{% include copy.html %}',
                  content, flags=re.MULTILINE | re.DOTALL)
```
Suggestion importance[1-10]: 3


Why: The suggestion to add a space in the lookahead pattern ({%[ ]include copy) is a minor robustness improvement, but the existing pattern {% include copy already matches the literal string produced by the code. The improved_code change is minimal and the concern about double copy buttons is valid but the existing logic should handle it correctly since playground blocks add copy.html before try-in-playground.html.

Impact: Low
Add missing newline at end of file

The file is missing a newline at the end. Files should end with a newline character
to follow standard text file conventions and avoid potential issues with file
concatenation or diff tools.

docs/user/ppl/cmd/appendpipe.md [86]

```diff
+* **Schema compatibility**: When fields with the same name exist in both the main search and the subpipeline but have incompatible types, the query fails with an error. To avoid type conflicts, ensure that fields with the same name share the same data type. Alternatively, use different field names. You can rename the conflicting fields using `eval` or select non-conflicting columns using `fields`.
 
-
```
Suggestion importance[1-10]: 1


Why: The existing_code and improved_code are identical, meaning no actual change is proposed. Adding a trailing newline is a very minor style concern with negligible impact.

Impact: Low

Previous suggestions

Suggestions up to commit c051fd0
Possible issue
Remove orphaned code lines outside code fence

The PPL code block for Example 3 is broken — there is a stray closing
triple-backtick followed by two orphaned pipeline stages (| where age > 30 and |
fields state, age) that appear outside the code fence. These leftover lines from the
old example were not removed and will render as raw text in the documentation.

docs/user/ppl/cmd/replace.md [83-91]

```diff
 ```ppl
 source=otellogs
 | where severityText = 'ERROR'
 | replace "frontend-proxy" WITH "frontend proxy" IN `resource.attributes.service.name`
 | fields `resource.attributes.service.name`, body
 | head 3

-| where age > 30
-| fields state, age
-```
```

Suggestion importance[1-10]: 8


Why: The `| where age > 30` and `| fields state, age` lines are leftover from the old example and appear outside the code fence, which will render as raw text in the documentation and break the example's presentation.


Impact: Medium




Prevent duplicate copy buttons on playground-eligible blocks

___


The non-playground branch reconstructs the block as a plain ```ppl fence, but the subsequent regex `re.sub(r'^```ppl\b.*$', '```sql', ...)` then converts it to SQL anyway, which is the intended behavior. However, the non-playground branch omits both the copy button and the try-in-playground button, so non-otellogs PPL blocks only receive a copy button from the later generic SQL regex; that part is fine. The problem is that the playground-eligible branch adds the copy button inline AND the later regex can add another one, because the `(?!\n\{% include copy)` lookahead only checks for a newline followed by `{% include copy`, while the playground block ends with `{% include try-in-playground.html %}`, not `{% include copy`. As a result, playground-eligible blocks can get a duplicate `{% include copy.html %}` appended.

[scripts/docs_exporter/export_to_docs_website.py [259-282]](https://github.com/opensearch-project/sql/pull/5303/files#diff-181ca7ed47f0c6ab9ec4a7da4a3f26e0c13c35b9d3cfe596f072eb1df128c84eR259-R282)

```diff
 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
             '```ppl\n' + m.group(1) + '```'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
 
+# Convert remaining PPL code fences to SQL
+content = re.sub(r'^```ppl\b.*$', '```sql', content, flags=re.MULTILINE)
+
+# Add copy buttons after sql/bash/sh fences that don't already have one
+content = re.sub(r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)(?!\n\{% include try-in-playground)',
+                 r'```\1\n\2```\n{% include copy.html %}',
+                 content, flags=re.MULTILINE | re.DOTALL)
+
Suggestion importance[1-10]: 7

__

Why: This is a valid bug: playground-eligible blocks already include {% include copy.html %}, but the later regex at line 280 checks only for (?!\n\{% include copy) — since those blocks end with {% include try-in-playground.html %}, the lookahead won't match and a second copy button will be appended. The fix to also exclude blocks followed by {% include try-in-playground is correct and important.

Medium
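The interaction behind this report can be reproduced standalone. The sketch below uses made-up document text together with the single-lookahead pattern quoted above; it is illustrative, not the exporter's actual code path. Because the negative lookahead fails at the guarded `sql` fence (a copy include immediately follows it), the lazy group backtracks past that fence and the match crosses into the next block, which is the interaction worth verifying:

```python
import re

# Hypothetical document text: a playground-converted block, then a bash block.
doc = (
    "```sql\n"
    "source=otellogs | head 1\n"
    "```\n"
    "{% include copy.html %}\n"
    "{% include try-in-playground.html %}\n"
    "\n"
    "```bash\n"
    "curl localhost:9200\n"
    "```\n"
)

# The copy-button pass with only the single lookahead guard.
pattern = r'^```(bash|sh|sql)\b[^\n]*\n(.*?)^```$(?!\n\{% include copy)'
m = re.search(pattern, doc, flags=re.MULTILINE | re.DOTALL)

# The match does not stop at the sql block's closing fence: the negative
# lookahead fails there, so the engine extends the lazy group all the way to
# the bash block's closing fence, capturing both includes in group(2).
```

Appending `{% include copy.html %}` to such an over-long match is what risks mangling playground blocks, which is why guarding the close fence against the playground include as well (as the suggested diff does) deserves a test on real exported files.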
Fix duplicate queries with different intended behaviors

The two sub-examples in Example 3 now use identical PPL queries (both lack
keepempty=true) and return identical results, making them indistinguishable. The
first sub-example should use keepempty=true to demonstrate retaining null values, as
the preceding description states "by default, records with null values are dropped."

docs/user/ppl/cmd/dedup.md [108-128]

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name

-The query returns the following results:

-```text
-fetched rows / total rows = 3/3
-+-----------------------------------------------------------------------------+

- instrumentationScope.name
- @opentelemetry/instrumentation-http
- Microsoft.Extensions.Hosting
- go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc
-+-----------------------------------------------------------------------------+
-```
Suggestion importance[1-10]: 7

__

Why: Both sub-examples in Example 3 use identical queries without `keepempty=true`, making them indistinguishable despite the description implying different behaviors. The first sub-example should include `keepempty=true` to match its stated purpose.


Medium




Fix incorrect duplicate row in example output

___


The result table for Example 3 shows `WARN` appearing twice with the same count (`4`), which looks like a data error. With `override=true`, the subsearch values should replace the main search values for matching rows, so `INFO` and `DEBUG` rows (which have no match in the subsearch) should retain their original counts while `ERROR` and `WARN` rows get replaced. The current output is misleading and inconsistent with the described behavior.

[docs/user/ppl/cmd/appendcol.md [96-103]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R96-R103)

```diff
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
Suggestion importance[1-10]: 6

__

Why: The result table shows WARN appearing twice with count 4, which is inconsistent with the described behavior of override=true. The query filters for ERROR and WARN only in the subsearch, so INFO and DEBUG rows from the main search should still appear with their original values, making the output misleading.

Low
General
Fill in missing example body content

Example 3 has a heading but no body — the PPL query, description, and expected
output are completely missing. This leaves a blank section in the documentation
between the heading and Example 4.

docs/user/ppl/cmd/convert.md [78-80]

 ## Example 3: Extract and convert memory sizes from error messages
+
+The following query extracts and converts memory size strings from error messages to kilobytes:
+
+```ppl
+source=otellogs
+| rex field=body "(?<memory>\d+[kmgKMG]?b?)"
+| convert memk(memory)
+| where NOT ISNULL(memory)
+| stats avg(memory) as avg_memory_kb by `resource.attributes.service.name`
+```
 
 ## Example 4: Multiple field conversions
Suggestion importance[1-10]: 5

__

Why: Example 3 has a heading but no content, leaving a blank section in the documentation. However, the suggested improved_code is speculative and may not reflect actual data available in the otellogs index used throughout the PR.

Low
Use precise pattern matching for source index check

The check 'source=otellogs' not in block_text is a plain substring match and could
produce false positives for strings like source=otellogssomething or #
source=otellogs in a comment. Consider using a more precise pattern such as a
word-boundary regex to ensure the source is exactly otellogs.

scripts/docs_exporter/export_to_docs_website.py [248-256]

 def _is_playground_eligible(block_text, file_path):
     """Check if a PPL block should get a Try in Playground button."""
-    if 'source=otellogs' not in block_text:
+    if not re.search(r'\bsource\s*=\s*otellogs\b', block_text):
         return False
     if file_path:
         stem = file_path.stem if hasattr(file_path, 'stem') else str(file_path).rsplit('/', 1)[-1].replace('.md', '')
         if stem in PLAYGROUND_UNSUPPORTED_CMDS:
             return False
     return True
Suggestion importance[1-10]: 4

__

Why: The suggestion to use a regex word-boundary check instead of a plain substring match for source=otellogs is a reasonable robustness improvement, but the risk of false positives in practice (e.g., source=otellogssomething) is low given the PPL syntax. The improvement is minor but valid.

Low
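This false-positive behavior is cheap to verify directly. A minimal sketch of the proposed word-boundary check (the wrapper function name here is illustrative, not part of the script):

```python
import re

def is_otellogs_source(block_text: str) -> bool:
    # Word-boundary match: tolerates optional spaces around '=' and rejects
    # longer identifiers such as 'otellogssomething'.
    return re.search(r'\bsource\s*=\s*otellogs\b', block_text) is not None

# The plain substring check being flagged matches a longer identifier...
assert 'source=otellogs' in 'source=otellogssomething'
# ...while the word-boundary regex does not.
assert not is_otellogs_source('source=otellogssomething')
assert is_otellogs_source('source = otellogs | head 5')
```

Note the regex still matches a commented-out `# source=otellogs` line, so the comment case mentioned in the suggestion would need a separate check if it matters in practice.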
Remove duplicated note about coordinating node

The new introductory paragraph duplicates the information already present in the
existing > Note block directly below it. Having both the paragraph and the note
say the same thing about the coordinating node creates redundancy. Consider removing
the note block since the new paragraph is more descriptive, or merging them.

docs/user/ppl/cmd/eval.md [6-8]

 The `eval` command processes data after documents are retrieved from the shards. This means that `eval` cannot be used to filter documents before they are returned. Use a `where` clause for filtering. Additionally, because `eval` computations are performed on the coordinating node rather than distributed across data nodes, performance may be slower for large result sets.
 
-> **Note**: The `eval` command is not rewritten to [query domain-specific language (DSL)](https://docs.opensearch.org/latest/query-dsl/). It is only executed on the coordinating node.
-
Suggestion importance[1-10]: 4

__

Why: The new paragraph and the existing > **Note** block both describe the same coordinating node behavior, creating redundancy. Removing the note would improve clarity, but this is a minor documentation style issue.

Low
Restore accidentally removed limitations section

The entire "Limitations" section was removed from appendpipe.md in this PR. This
section contained important information about schema compatibility constraints that
users need to be aware of when using the command.

docs/user/ppl/cmd/appendpipe.md [82]

+## Limitations
 
+The `appendpipe` command has the following limitations:
 
+* **Schema compatibility**: When fields with the same name exist in both the main search and the subpipeline but have incompatible types, the query fails with an error. To avoid type conflicts, ensure that fields with the same name share the same data type. Alternatively, use different field names. You can rename the conflicting fields using `eval` or select non-conflicting columns using `fields`.
+
Suggestion importance[1-10]: 4

__

Why: The Limitations section was removed from appendpipe.md in this PR, but looking at the diff, the old hunk shows it was present and the new hunk does not include it. However, the existing_code and improved_code are identical, so this is more of a verification request than a concrete fix.

Low
Fix misaligned table column padding in output

The unknown values in the instrumentationScope.name column are not padded to match
the column width, causing misaligned table formatting in the rendered output. The
column width should be consistent across all rows to maintain proper table alignment
in the markdown text block.

docs/user/ppl/cmd/fillnull.md [53-63]

-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | checkout                         | unknown                                                            |
-| ERROR        | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
-| WARN         | frontend-proxy                   | unknown                                                            |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | checkout                         | unknown                                                                     |
+| ERROR        | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
+| WARN         | frontend-proxy                   | unknown                                                                     |
 | ERROR        | payment                          | @opentelemetry/instrumentation-http                                         |
-| ERROR        | payment                          | unknown                                                            |
+| ERROR        | payment                          | unknown                                                                     |
 | WARN         | product-catalog                  | go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
-| WARN         | product-catalog                  | unknown                                                            |
-| ERROR        | product-catalog                  | unknown                                                            |
-| ERROR        | recommendation                   | unknown                                                            |
+| WARN         | product-catalog                  | unknown                                                                     |
+| ERROR        | product-catalog                  | unknown                                                                     |
+| ERROR        | recommendation                   | unknown                                                                     |
Suggestion importance[1-10]: 3

__

Why: The unknown values in the instrumentationScope.name column are not padded to match the column width, causing visual misalignment in the text block. This is a minor formatting issue that affects readability but not functionality.

Low
Suggestions up to commit f0e3a6a
Possible issue
Fix corrupted markdown document structure

The stats.md file has been corrupted — raw query result output has been injected
directly into the ## Parameters section header and the parameter table header,
breaking the document structure. The query result blocks appear to have been
incorrectly merged into the markdown prose. These sections need to be restored to
their original clean markdown headings and table structure.

docs/user/ppl/cmd/stats.md [18-36]

-## fetched rows / total rows = 1/1
-+-----+
-| p90 |
-|-----|
-| 21  |
-+-----+arameters
+## Parameters
 
 The `stats` command supports the following parameters.
 
-| fetched rows / total rows = 5/5
-...
-+--------------------------------+--------------+arameter | Required/Optional | Description |
+| Parameter | Required/Optional | Description |
Suggestion importance[1-10]: 9

__

Why: The stats.md file has raw query result output injected directly into the ## Parameters section header and the parameter table header, completely breaking the document structure. This is a critical documentation corruption issue that needs to be fixed.

High
Fix corrupted aggregation function list entries

The aggregation function list in stats.md has raw query result tables injected mid-word throughout the bullet points (e.g., `VAR_SAM` followed by a result table instead of `VAR_SAMP`). This corrupts all the function name entries and their descriptions. The list needs to be restored to clean bullet points with proper function names.

docs/user/ppl/cmd/stats.md [52-187]

-* `VAR_SAMfetched rows / total rows = 5/5
-...
-` -- Sample variance
-* `VAR_fetched rows / total rows = 1/1
-...
-Ofetched rows / total rows = 3/3
-...
-` -- fetched rows / total rows = 1/1
-...
-opulation variance
-* `STDDEV_SAMfetched rows / total rows = 4/4
-...
-` -- Sample standard deviation
-* `STDDEV_fetched rows / total rows = 5/5
-...
-Ofetched rows / total rows = 1/1
-...
-` -- fetched rows / total rows = 5/5
-...
-opulation standard deviation
+* `VAR_SAMP` -- Sample variance
+* `VAR_POP` -- Population variance
+* `STDDEV_SAMP` -- Sample standard deviation
+* `STDDEV_POP` -- Population standard deviation
+* `DISTINCT_COUNT_APPROX` -- Approximate distinct count
+* `TAKE` -- List of original values
+* `PERCENTILE`/`PERCENTILE_APPROX` -- Percentile calculations
+* `PERC<percent>`/`P<percent>` -- Percentile shortcut functions
Suggestion importance[1-10]: 9

__

Why: The aggregation function list has raw query result tables injected mid-word throughout the bullet points (e.g., VAR_SAM<table> instead of VAR_SAMP), corrupting all function name entries and descriptions. This is a critical documentation corruption that makes the content unreadable.

High
Fix corrupted Parameters section header

The ## Parameters section header and the parameter table header have been
accidentally replaced with query result blocks. This appears to be a copy-paste
error that corrupts the documentation structure, making the parameters section
unreadable.

docs/user/ppl/cmd/head.md [16-30]

-## fetched rows / total rows = 1/1
-+------------+
-| total_logs |
-|------------|
-| 20         |
-+------------+arameters
+## Parameters
 
 The `head` command supports the following parameters.
 
-| fetched rows / total rows = 1/1
-+--------------+
-| avg_severity |
-|--------------|
-| 12.0         |
-+--------------+arameter | Required/Optional | Description |
+| Parameter | Required/Optional | Description |
Suggestion importance[1-10]: 9

__

Why: The ## Parameters section header and the parameter table header have been accidentally replaced with query result blocks, making the documentation structurally broken and unreadable. This is a clear copy-paste error that needs to be fixed.

High
Fix duplicate queries with different intended behaviors

The two sub-examples in Example 3 use identical PPL queries (both use dedup
instrumentationScope.name without keepempty=true) and return identical results. The
first sub-example is supposed to demonstrate keepempty=true behavior (keeping null
values), while the second demonstrates the default behavior (dropping nulls). The
first query is missing keepempty=true and the result should show a null row.

docs/user/ppl/cmd/dedup.md [107-127]

-The following query deduplicates while ignoring documents with empty values in the specified field:
+The following query deduplicates while keeping documents with `null` values in the specified field:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name
 ```

The query returns the following results:

-fetched rows / total rows = 3/3
+fetched rows / total rows = 4/4
+-----------------------------------------------------------------------------+
| instrumentationScope.name                                                   |
|-----------------------------------------------------------------------------|
+| null                                                                        |
| @opentelemetry/instrumentation-http                                         |
| Microsoft.Extensions.Hosting                                                |
| go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc |
+-----------------------------------------------------------------------------+
Suggestion importance[1-10]: 8

__

Why: Both sub-examples in Example 3 use identical PPL queries, but they are meant to demonstrate different behaviors (`keepempty=true` vs default). The first query is missing `keepempty=true` and the result should include a null row, making this a genuine documentation bug.


Medium




Fix filter value to match actual dataset service name

___


The example filters for `resource.attributes.service.name = 'payment-service'`, but the dataset uses `payment` (not `payment-service`) as the service name, resulting in 0 rows returned. An example that intentionally returns no results is confusing for documentation purposes and does not demonstrate the `AND` operator effectively. The service name in the filter should match an actual value present in the dataset, such as `payment`.

[docs/user/ppl/cmd/where.md [50-69]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R50-R69)

```diff
 ## Example 2: Filter using combined criteria
 
 The following query narrows down errors to a specific service during an incident investigation, combining severity and service name conditions with `AND`:
 
 ```ppl
 source=otellogs
-| where severityNumber >= 17 AND `resource.attributes.service.name` = 'payment-service'
+| where severityNumber >= 17 AND `resource.attributes.service.name` = 'payment'
 | sort severityNumber
 | fields severityText, severityNumber, `resource.attributes.service.name`

The query returns the following results:

-fetched rows / total rows = 0/0
+fetched rows / total rows = 2/2
+--------------+----------------+----------------------------------+
| severityText | severityNumber | resource.attributes.service.name |
|--------------+----------------+----------------------------------|
+| ERROR        | 17             | payment                          |
+| ERROR        | 17             | payment                          |
+--------------+----------------+----------------------------------+
Suggestion importance[1-10]: 6

__

Why: The filter uses `'payment-service'` but the dataset contains `'payment'`, causing 0 rows returned. An example returning empty results for a combined `AND` filter is misleading and doesn't demonstrate the feature effectively.


Low

General



Remove duplicate example with identical query and results

___


Example 2 is a duplicate of Example 1: both use the exact same PPL query and return the exact same results. This provides no additional value to the reader. Replace Example 2 with a distinct scenario, such as demonstrating the `replace` command with a different field or showing how multiple replacements interact when both patterns match.

[docs/user/ppl/cmd/replace.md [52-78]](https://github.com/opensearch-project/sql/pull/5303/files#diff-8644862a762e96b24031ad83e31729768a40b3387fda49b9a3a4f0c6f4c3de39R52-R78)

```diff
-## Example 2: Replace service names for display
+## Example 2: Replace severity text labels for display
 
-The following query shortens service names for a compact dashboard view:
+The following query replaces verbose severity labels with shorter abbreviations for compact reporting:
 
 ```ppl
 source=otellogs
 | where severityText IN ('ERROR', 'WARN')
-| replace 'payment-service' with 'payment' in `resource.attributes.service.name`
-| replace 'inventory-service' with 'inventory' in `resource.attributes.service.name`
+| replace 'ERROR' with 'ERR' in severityText
+| replace 'WARN' with 'WRN' in severityText
 | sort severityNumber, `resource.attributes.service.name`
 | fields severityText, `resource.attributes.service.name`
 | head 4
 ```

-The query returns the following results:

-```text
-fetched rows / total rows = 4/4
-+--------------+----------------------------------+
-| severityText | resource.attributes.service.name |
-|--------------+----------------------------------|
-| WARN         | frontend-proxy                   |
-| WARN         | frontend-proxy                   |
-| WARN         | product-catalog                  |
-| WARN         | product-catalog                  |
-+--------------+----------------------------------+
-```

Suggestion importance[1-10]: 7

__

Why: Example 2 in `replace.md` is an exact duplicate of Example 1 — same PPL query, same description, same results. This provides no value to readers and should be replaced with a distinct scenario.


Medium




Fill in missing example content

___


Example 3 has a heading but no content: the query, description, and expected output are missing entirely. This leaves a broken section in the documentation that jumps directly from an empty heading to Example 4.

[docs/user/ppl/cmd/convert.md [78-80]](https://github.com/opensearch-project/sql/pull/5303/files#diff-76c639678d1e487015ecd16ab9353af28891a46b2d29760c877e41904ed29eedR78-R80)

```diff
 ## Example 3: Extract and convert memory sizes from error messages
+
+The following query extracts and converts memory size strings from error log messages to kilobytes:
+
+```ppl
+source=otellogs
+| where severityText = 'ERROR'
+| rex field=body "(?<mem_size>\d+[kmg]b?)"
+| convert memk(mem_size)
+| where NOT ISNULL(mem_size)
+| fields body, mem_size
+| head 3
+```
 
 ## Example 4: Multiple field conversions
Suggestion importance[1-10]: 6

__

Why: Example 3 has a heading but no content, jumping directly to Example 4. While the suggestion provides plausible content, the improved code is speculative and may not match the actual intended example for this PR.

Low
Ensure non-playground PPL blocks get copy buttons

The non-playground branch converts ppl blocks back to ppl fences without adding a
copy button or converting to sql. However, the subsequent re.sub call converts all
remaining ppl fences to sql without adding a copy button. Non-playground ppl blocks
should be converted to sql with a copy button in the first pass, or the else branch
should emit sql with a copy button directly, to avoid inconsistent output.

scripts/docs_exporter/export_to_docs_website.py [259-269]

 content = re.sub(
     r'^```ppl\b[^\n]*\n(.*?)^```$',
     lambda m: (
         '```sql\n' + m.group(1) + '```\n{% include copy.html %}\n{% include try-in-playground.html %}'
         if _is_playground_eligible(m.group(1), current_file_path)
         else (
-            '```ppl\n' + m.group(1) + '```'
+            '```sql\n' + m.group(1) + '```\n{% include copy.html %}'
         )
     ),
     content, flags=re.MULTILINE | re.DOTALL,
 )
Suggestion importance[1-10]: 5

__

Why: The non-playground else branch keeps ppl fences without a copy button, relying on the subsequent re.sub to convert them to sql, but that subsequent call also doesn't add copy buttons. Converting to sql with a copy button directly in the else branch would be more consistent and correct.

Low
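One way to avoid the inconsistency is to emit the final form in a single pass, as the suggestion proposes. A self-contained sketch under that assumption (helper names hypothetical, not the exporter itself):

```python
import re

PLAYGROUND_SUFFIX = '{% include copy.html %}\n{% include try-in-playground.html %}'
COPY_SUFFIX = '{% include copy.html %}'

def convert_ppl_blocks(content: str, eligible) -> str:
    """Single-pass conversion: every fenced ppl block becomes a sql block
    with a copy button; eligible blocks also get the playground button."""
    def repl(m):
        body = m.group(1)
        suffix = PLAYGROUND_SUFFIX if eligible(body) else COPY_SUFFIX
        return '```sql\n' + body + '```\n' + suffix
    return re.sub(r'^```ppl\b[^\n]*\n(.*?)^```$', repl,
                  content, flags=re.MULTILINE | re.DOTALL)
```

With this shape there is no second `re.sub` that must be kept consistent with the first, so eligible and non-eligible blocks cannot drift apart.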
Fix duplicate row in override example result table

The result table for Example 3 shows WARN appearing twice with the same agg value of
4, which is inconsistent with the described behavior of override=true replacing main
search values with subsearch values. The subsearch only covers ERROR and WARN, so
INFO and DEBUG rows from the main search should retain their original counts while
ERROR and WARN are overridden. The duplicated WARN row and the explanation should be
reviewed for accuracy.

docs/user/ppl/cmd/appendcol.md [94-104]

 ```text
 fetched rows / total rows = 4/4
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 7   | ERROR        |
-| 4   | WARN         |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
Suggestion importance[1-10]: 5

__

Why: The result table shows `WARN` duplicated with the same `agg=4`, which is inconsistent with the described `override=true` behavior. The table should show all four severity levels (`DEBUG`, `ERROR`, `INFO`, `WARN`) with the subsearch overriding only `ERROR` and `WARN` values.


Low

Suggestions up to commit d0b2a08

Suggestions up to commit cc17d7e
Possible issue



Fix duplicate queries with missing parameter

___


Example 3 contains two identical PPL queries that are supposed to demonstrate different behaviors (`keepempty=true` vs. default). The first query is missing `keepempty=true` and both queries produce the same result, making the distinction between the two behaviors unclear. The first query should include `keepempty=true` to demonstrate that null values are retained.

[docs/user/ppl/cmd/dedup.md [86-126]](https://github.com/opensearch-project/sql/pull/5303/files#diff-532601ea92e852acbb8b9bcdd2e269377bc05a311ec141be38148539d20c5bafR86-R126)

```diff
 The following query deduplicates by instrumentation scope name to see which OTel SDKs are reporting. By default, records with null values are dropped:
   
 ```ppl
 source=otellogs
-| dedup instrumentationScope.name
+| dedup instrumentationScope.name keepempty=true
 | fields instrumentationScope.name
 | sort instrumentationScope.name
 ```

The query returns the following results:

-fetched rows / total rows = 2/2
+fetched rows / total rows = 3/3
+---------------------------+
| instrumentationScope.name |
|---------------------------|
+| null                      |
| opentelemetry-java        |
| opentelemetry-python      |
+---------------------------+

The following query deduplicates while ignoring documents with empty values in the specified field:

source=otellogs
| dedup instrumentationScope.name
| fields instrumentationScope.name
| sort instrumentationScope.name
Suggestion importance[1-10]: 8

__

Why: The two PPL queries in Example 3 are identical, but they should demonstrate different behaviors. The first query is missing `keepempty=true`, which would show null values being retained, making the documentation misleading and incorrect.


Medium




Fix incorrect duplicate row in result table

___


The result table for Example 3 (override parameter) is missing the `DEBUG` row and shows `FATAL` duplicated, which appears inconsistent with the described behavior. The main search produces 5 severity levels (DEBUG, ERROR, FATAL, INFO, WARN), but the result only shows 5 rows with FATAL appearing twice and DEBUG missing. This is likely a documentation error that could confuse readers about how `override=true` works.

[docs/user/ppl/cmd/appendcol.md [94-104]](https://github.com/opensearch-project/sql/pull/5303/files#diff-d780d2b5919b1afcebbdbc20cc6a898119e430a7387532d8355a251ad895f203R94-R104)

```diff
 ```text
 fetched rows / total rows = 5/5
 +-----+--------------+
 | agg | severityText |
 |-----+--------------|
+| 3   | DEBUG        |
 | 5   | ERROR        |
-| 2   | FATAL        |
 | 2   | FATAL        |
 | 6   | INFO         |
 | 4   | WARN         |
 +-----+--------------+
Suggestion importance[1-10]: 6

__

Why: The result table shows `FATAL` duplicated and `DEBUG` missing, which appears inconsistent with the described `override=true` behavior. However, this could be intentional based on how the subsearch aligns rows positionally, so the actual correctness depends on the implementation behavior.


Low




Fix inaccurate total row count in result

___


The new data contains two documents matching "Payment failed" (one ERROR and one FATAL), but the query claims `fetched rows / total rows = 1/1`. The `head 1` limits output to one row, but the total count should reflect all matching rows (2/2). Verify the expected total count in the result block is accurate.

[docs/user/ppl/cmd/search.md [178]](https://github.com/opensearch-project/sql/pull/5303/files#diff-dd984cb777515ecc5ea6aa39ef6df5a798c5eb668370ba32279b00f311be3d4cR178-R178)

````diff
 search "Payment failed" source=otellogs
 | sort `resource.attributes.service.name` | fields body | head 1
+```
 
+The query returns the following results:
+
+```text
+fetched rows / total rows = 1/2
++---------------------------------------------------------------------+
+| body                                                                |
+|---------------------------------------------------------------------|
+| Payment failed: connection timeout to payment gateway after 30000ms |
++---------------------------------------------------------------------+
+
````

<details><summary>Suggestion importance[1-10]: 4</summary>

__

Why: The suggestion raises a valid concern about the `fetched rows / total rows = 1/1` count when `head 1` is used, but the improved code changes the reported total to `1/2`, which may or may not be accurate depending on how the engine reports totals with `head`. The claim about two matching documents needs verification against the actual data.

</details></details></td><td align=center>Low

</td></tr><tr><td>
General

<details><summary>Remove duplicate documentation examples</summary>

___

**Examples 1 and 2 are identical in both query and output, which provides no additional value to the reader. Replace Example 2 with a distinct use case, such as replacing a severity text value or normalizing a different field, to demonstrate the command's versatility.**

docs/user/ppl/cmd/replace.md [24-78]

````diff
-## Example 1: Normalize service names for a compact dashboard
+## Example 2: Normalize severity text labels
 
-The following query shortens long service names for a compact dashboard view:
-...
-## Example 2: Replace service names for display
+The following query replaces verbose severity labels with shorter codes for compact display:
 
-The following query shortens service names for a compact dashboard view:
-...
-| replace 'payment-service' with 'payment' in `resource.attributes.service.name`
-| replace 'inventory-service' with 'inventory' in `resource.attributes.service.name`
+```ppl
+source=otellogs
+| where severityText IN ('ERROR', 'FATAL')
+| replace 'ERROR' with 'ERR' in severityText
+| replace 'FATAL' with 'FTL' in severityText
 | sort severityNumber, `resource.attributes.service.name`
 | fields severityText, `resource.attributes.service.name`
 | head 4
+```
````

<details><summary>Suggestion importance[1-10]: 7</summary>

__

Why: Examples 1 and 2 in replace.md are completely identical in both query and output, which is a clear documentation error that provides no value. This is a valid and important observation that should be fixed to improve documentation quality.

</details></details></td><td align=center>Medium

</td></tr><tr><td>
<details><summary>Restore removed syntax reference sections</summary>

___

**The PR removes the syntax reference sections ("Optional choices", "Repetition") that explain PPL syntax conventions, replacing them entirely with examples. These reference sections are important for users learning PPL syntax and should be retained in a syntax reference document. The examples are a good addition but should supplement, not replace, the syntax documentation.**

docs/user/ppl/cmd/syntax.md [52-60]

````diff
+### Optional choices
+
+Optional choices between alternatives are shown in square brackets with pipe separators (`[option1 | option2]`). You can choose one of the options or omit them entirely.
+
+**Example**: `[asc | desc]` means you can specify `asc`, `desc`, or neither.
+
+### Repetition
+
+An ellipsis (`...`) indicates that the preceding element can be repeated multiple times.
+
+**Examples**:
+- `<field>...` means one or more fields without commas: `field1 field2 field3`
+- `<field>, ...` means comma-separated repetition: `field1, field2, field3`
+
 ## Example 1: Fetch data from an index with a filter
 
 The following query retrieves all error logs from the `otellogs` index. The filter is specified inline with the `source` command:
 
 ```ppl
 source=otellogs severityText = 'ERROR'
 | fields severityText, `resource.attributes.service.name`
 | sort `resource.attributes.service.name`
````
<details><summary>Suggestion importance[1-10]: 6</summary>

__

Why: The PR removes important syntax reference sections ("Optional choices" and "Repetition") from `syntax.md`, which are valuable for users learning PPL conventions. Restoring these sections alongside the new examples would make the documentation more complete and useful.


</details></details></td><td align=center>Low

</td></tr><tr><td>



<details><summary>Add missing query results block</summary>

___


**The subsection "Matching with a prefix pattern" shows a query but is missing its expected output block. Every other example in the file includes a results block, and this omission leaves the documentation incomplete and inconsistent.**

[docs/user/ppl/cmd/where.md [104-113]](https://github.com/opensearch-project/sql/pull/5303/files#diff-ea8ca7481f94d3d9401c4057d33afd640010be2df62b62730137101ef083c140R104-R113)

````diff
 ### Matching with a prefix pattern
 
 The following query uses a percent sign (`%`) to find all services in the `cart` domain, useful when investigating issues across a team's microservices:
 
 ```ppl
 source=otellogs
 | where LIKE(`resource.attributes.service.name`, 'cart-%')
 | sort severityNumber
 | fields severityText, `resource.attributes.service.name`, body
 
+The query returns the following results:
+
+```text
+fetched rows / total rows = 3/3
++--------------+----------------------------------+----------------------------------------------------------------------------------------------+
+| severityText | resource.attributes.service.name | body |
+|--------------+----------------------------------+---------------------------------...
````

</details>

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit f0e3a6a.

| Path | Line | Severity | Description |
|------|------|----------|-------------|
| docs/user/ppl/cmd/stats.md | 18 | low | Query result output is embedded directly inside section headings and aggregation function bullet points (e.g., '## fetched rows / total rows = 1/1 ... +arameters' and 'VAR_SAMfetched rows...+--...+arameter'). This corrupted content is anomalous and inconsistent with normal editing behavior, though it appears to be an accidental copy-paste error rather than deliberate injection. |
| docs/user/ppl/cmd/head.md | 15 | low | Query result tables are embedded inside the '## Parameters' section heading and the parameter table header, producing malformed markdown. The same pattern as stats.md: anomalous but likely an accidental copy-paste error during the documentation rewrite. |

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions
Contributor

github-actions Bot commented Apr 2, 2026

Persistent review updated to latest commit d0b2a08

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 3, 2026

Persistent review updated to latest commit f0e3a6a

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

github-actions Bot commented Apr 6, 2026

Persistent review updated to latest commit c051fd0

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 2000094
