[Improvement] LLM indexability #372

Open
g-despot wants to merge 2 commits into main from llm-indexability

Conversation

g-despot (Contributor) commented on Mar 6, 2026

What's being changed:

Add a test suite that validates AI agents and web crawlers can access all documentation content on the live site.

HTML structure tests (pytest -m indexability, no API keys needed):

  • All 10 representative pages return HTTP 200 with proper meta tags and heading hierarchy
  • Tabbed code blocks render ALL tab panels in HTML (not just the active tab)
  • Code blocks, collapsible <details> sections, and images are present and non-empty
  • llms.txt and sitemap.xml are accessible with substantial content
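
As an illustration of the tab-panel check listed above, here is a minimal sketch that uses only the standard library (the PR itself adds beautifulsoup4 for parsing). It assumes tab panels are rendered as elements with `role="tabpanel"`; that selector, the sample HTML, and the helper names are assumptions for illustration, not taken from the test suite.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TabPanelCollector(HTMLParser):
    """Collect the text inside every element marked role="tabpanel"."""

    def __init__(self):
        super().__init__()
        self.panels = []   # text of each finished panel
        self._depth = 0    # nesting depth inside the current panel (0 = outside)
        self._chunks = []  # text fragments of the panel being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif dict(attrs).get("role") == "tabpanel":  # assumed marker
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.panels.append("".join(self._chunks).strip())

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def empty_tab_panels(html):
    """Return (total_panels, indexes_of_empty_panels) for one page."""
    collector = TabPanelCollector()
    collector.feed(html)
    empty = [i for i, text in enumerate(collector.panels) if not text]
    return len(collector.panels), empty


# Hypothetical page fragment: two populated panels and one empty, inactive panel.
SAMPLE = """
<div role="tabpanel"><pre><code>client = connect()</code></pre></div>
<div role="tabpanel" hidden><pre><code>const client = connect();</code></pre></div>
<div role="tabpanel" hidden></div>
"""
total, empty = empty_tab_panels(SAMPLE)
```

A test built on this idea would fail whenever `empty` is non-empty, which is the situation a crawler hits when only the active tab is rendered server-side.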

Agent tests (pytest -m indexability_agents, requires ANTHROPIC_API_KEY and OPENAI_API_KEY):

  • Claude (web_fetch) fetches the quickstart page and extracts the exact vectorizer config line for all 5 languages (Python, TypeScript, Go, Java, C#)
  • Claude fetches the collections config-ref page and reads text2vec-contextionary from inside a collapsible section
  • Claude fetches and reads llms.txt
  • ChatGPT (web_search_preview) finds the quickstart URL and identifies multi-language code tabs

Files:

| File | Action |
| --- | --- |
| tests/test_docs_indexability.py | New (47 HTML tests + 6 agent tests) |
| tests/README-INDEXABILITY.md | New (documentation for the test suite) |
| .github/workflows/indexability_tests.yml | New (weekly CI, Sunday 22:00 UTC) |
| pytest.ini | Add indexability and indexability_agents markers |
| pyproject.toml | Add beautifulsoup4, anthropic, openai deps |
| tools/chatgpt_fetch_quickstart.py | New (debug script to inspect ChatGPT web_search output) |

Type of change:

  • Feature or enhancements (non-breaking change to add functionality)

How has this been tested?

  • uv run pytest -m indexability -v — 47 HTML structure tests pass
  • uv run pytest -m indexability_agents -v — Claude tests pass (all 5 vectorizer lines found); ChatGPT finds quickstart URL and languages but can only see the active tab's code (known web_search_preview limitation)

@g-despot g-despot requested a review from Copilot March 6, 2026 10:48

@orca-security-eu orca-security-eu bot left a comment

Orca Security Scan Summary

| Status | Check | High | Medium | Low | Info |
| --- | --- | --- | --- | --- | --- |
| Passed | Infrastructure as Code | 0 | 0 | 0 | 0 |
| Passed | SAST | 0 | 0 | 0 | 0 |
| Passed | Secrets | 0 | 0 | 0 | 0 |
| Passed | Vulnerabilities | 0 | 0 | 0 | 0 |

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a new indexability test suite that validates AI agents (Claude, ChatGPT) and web crawlers can access all documentation content on the live Weaviate docs site. The tests are split into HTML structure tests (no API keys needed) and agent-based tests (requiring Anthropic/OpenAI API keys), with a corresponding CI workflow that runs weekly.

Changes:

  • New test suite (test_docs_indexability.py) with ~48 HTML structure tests and 6 agent tests covering page status codes, meta tags, heading hierarchy, tabbed content, code blocks, collapsibles, images, llms.txt, and sitemap.xml
  • New CI workflow (.github/workflows/indexability_tests.yml) running weekly on Sunday, with corresponding pytest markers and dependencies
  • Supporting files including README documentation and a debug utility script

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/test_docs_indexability.py | New test suite with HTML structure and AI agent indexability tests |
| tests/README-INDEXABILITY.md | Documentation for the indexability test suite |
| .github/workflows/indexability_tests.yml | Weekly CI workflow for running indexability tests |
| pytest.ini | Adds indexability and indexability_agents markers |
| pyproject.toml | Adds anthropic, beautifulsoup4, openai dependencies |
| uv.lock | Lock file updates for new dependencies |
| tools/chatgpt_fetch_quickstart.py | Debug script for inspecting ChatGPT web_search output |

- name: Run agent tests
  id: agent-tests
  continue-on-error: true
  if: env.ANTHROPIC_API_KEY != '' && env.OPENAI_API_KEY != ''

Copilot AI Mar 6, 2026

The if condition references env.ANTHROPIC_API_KEY and env.OPENAI_API_KEY, but these variables are defined only in this step's own env: block (lines 47-49). In GitHub Actions, a step's if: is evaluated before that step's env: is applied. Since these secrets are not defined in the workflow-level env: block either, both values will always be empty, and the agent tests will never run.

Referencing the secrets directly in the condition is not an option, because the secrets context is not available in a step-level if:. Instead, map both secrets to environment variables in a job-level (or workflow-level) env: block; the existing condition can then stay as written.

Suggested change (add at the job level)
env:
  ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
  OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
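
Independently of how the workflow condition is written, the test module can guard itself so that agent tests are skipped rather than failed when keys are absent. The following is a minimal sketch under that assumption; `REQUIRED_KEYS` and `missing_api_keys` are hypothetical names, not part of the PR.

```python
import os

# The two keys named in the PR description for the agent tests.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None):
    """Return the required key names that are unset or blank.

    `env` defaults to the process environment; passing a dict makes the
    helper easy to exercise without touching real secrets.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

In a pytest module, a non-empty result could feed a skip condition for the `indexability_agents` tests, so a missing secret shows up as a skip with a clear reason.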

Comment on lines +42 to +50
- **Fetch quickstart code tabs** — fetches `/weaviate/quickstart` and must extract actual code lines for all 5 languages (Python, TypeScript, Go, Java, C#). Asserts on language-specific code tokens like `near_text` (Python), `nearText` (TS), `NearTextArgBuilder` (Go), etc.
- Fetch a page with collapsible sections and read the content
- Fetch `/llms.txt` and identify it as Weaviate documentation

### ChatGPT agent tests (Part 3)

Uses GPT-4.1 Mini with the `web_search_preview` tool to verify ChatGPT can:

- **Search quickstart code tabs** — searches for `/weaviate/quickstart` and must extract actual code lines for all 5 languages. Uses the same language-specific code token assertions as the Claude test.

Copilot AI Mar 6, 2026

The README documentation has multiple inaccuracies compared to the actual test code:

  1. It states the tests assert on "language-specific code tokens like near_text (Python), nearText (TS), NearTextArgBuilder (Go), etc." but the actual code asserts on vectorizer configuration lines like Configure.Vectors.text2vec_weaviate(), vectors.text2VecWeaviate(), etc. (see QUICKSTART_VECTORIZER_LINES in test_docs_indexability.py lines 297-303).

  2. Both lines 42 and 50 say the tests require "all 5 languages", but the ChatGPT test actually only requires 3 out of 5 (line 501 of test_docs_indexability.py).

These appear to be leftover from an earlier version of the tests and should be updated to match the current implementation.
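
To make the difference between "all 5 languages" and "3 out of 5" concrete, here is a hedged sketch of a threshold check over an agent's reply. Only the two vectorizer lines quoted in this comment are included; the real `QUICKSTART_VECTORIZER_LINES` mapping covers five languages, and the helper names below are hypothetical.

```python
# Two entries quoted in the review comment; the remaining languages are
# intentionally left out rather than guessed.
VECTORIZER_LINES = {
    "Python": "Configure.Vectors.text2vec_weaviate()",
    "TypeScript": "vectors.text2VecWeaviate()",
}


def languages_found(reply, expected=VECTORIZER_LINES):
    """Return the languages whose exact vectorizer line appears in the reply."""
    return sorted(lang for lang, line in expected.items() if line in reply)


def meets_threshold(reply, min_languages, expected=VECTORIZER_LINES):
    """True when at least `min_languages` expected lines were extracted."""
    return len(languages_found(reply, expected)) >= min_languages


# Hypothetical agent reply that only surfaced the Python tab.
reply = "The Python example uses Configure.Vectors.text2vec_weaviate() as the vectorizer."
```

With a helper of this shape, the strict test would pass `min_languages=len(expected)` and the relaxed test a smaller number, which keeps the README wording and the assertion easy to compare.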

if: always()
uses: ./.github/actions/handle-test-results
with:
  test-outcome: ${{ steps.html-tests.outcome }}

Copilot AI Mar 6, 2026

The test-outcome only considers the HTML tests outcome (steps.html-tests.outcome), ignoring the agent tests outcome entirely. If the HTML tests pass but agent tests fail, the workflow will report success and no failure notification will be sent. Consider combining both outcomes, e.g., reporting failure if either step failed: test-outcome: ${{ steps.html-tests.outcome == 'success' && (steps.agent-tests.outcome == 'success' || steps.agent-tests.outcome == 'skipped') && 'success' || 'failure' }}

Suggested change
test-outcome: ${{ steps.html-tests.outcome }}
test-outcome: ${{ steps.html-tests.outcome == 'success' && (steps.agent-tests.outcome == 'success' || steps.agent-tests.outcome == 'skipped') && 'success' || 'failure' }}
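
The combined expression is dense, so here is the same decision written as a small Python function. This is only a model of the logic for reading purposes; `combined_outcome` is a hypothetical name, and the outcome strings are the step outcome values GitHub Actions reports (`success`, `failure`, `cancelled`, `skipped`).

```python
def combined_outcome(html_outcome, agent_outcome):
    """Mirror the suggested expression: fail unless both parts are acceptable.

    HTML tests must succeed; agent tests may succeed or be skipped
    (for example when API keys are not configured).
    """
    html_ok = html_outcome == "success"
    agent_ok = agent_outcome in ("success", "skipped")
    return "success" if html_ok and agent_ok else "failure"
```

The workflow expression relies on `cond && 'success' || 'failure'`, which behaves like a conditional here because the string `'success'` is truthy.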

import os
import time
from functools import lru_cache

Copilot AI Mar 6, 2026

lru_cache is imported but never used. The caching is implemented manually via the _page_cache dict. This unused import should be removed.

Suggested change
from functools import lru_cache

("/weaviate/search/similarity", {"tabs", "code"}),
("/weaviate/search/hybrid", {"tabs", "code"}),
("/weaviate/connections/connect-cloud", {"tabs", "code"}),
("/weaviate/config-refs/collections", {"details", "table"}),

Copilot AI Mar 6, 2026

The "table" feature tag is declared in TEST_PAGES and documented in the README as an available feature tag, but no test uses it (there's no "table" in f filtering). This is a dead feature tag with no effect. Either add a corresponding test (e.g., test_tables_present that verifies <table> elements exist on pages with this feature), or remove the tag from this entry and the README to avoid confusion.
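
If the tag is kept, a test along the lines of `test_tables_present` could be built from two pieces: selecting the pages that declare the feature, and counting `<table>` elements in the fetched HTML. The sketch below uses the standard library only and a trimmed copy of `TEST_PAGES`; `pages_with`, `TableCounter`, and `count_tables` are hypothetical helpers, not code from the PR.

```python
from html.parser import HTMLParser

# Trimmed copy of the entries shown in the diff context above.
TEST_PAGES = [
    ("/weaviate/search/hybrid", {"tabs", "code"}),
    ("/weaviate/config-refs/collections", {"details", "table"}),
]


def pages_with(feature, pages=TEST_PAGES):
    """Return the paths of pages that declare the given feature tag."""
    return [path for path, features in pages if feature in features]


class TableCounter(HTMLParser):
    """Count <table> and <tr> start tags in a document."""

    def __init__(self):
        super().__init__()
        self.tables = 0
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables += 1
        elif tag == "tr":
            self.rows += 1


def count_tables(html):
    """Return (number_of_tables, number_of_rows) found in `html`."""
    counter = TableCounter()
    counter.feed(html)
    return counter.tables, counter.rows
```

A test could then loop over `pages_with("table")`, fetch each page, and assert that `count_tables` reports at least one table with at least one row.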
