Skip to content

Fix xpath parsing xpath that has string literal containing parentheses and brackets#318

Open
tompng wants to merge 1 commit into
ruby:masterfrom
tompng:xpath_paren_bracket_in_string
Open

Fix xpath parsing xpath that has string literal containing parentheses and brackets#318
tompng wants to merge 1 commit into
ruby:masterfrom
tompng:xpath_paren_bracket_in_string

Conversation

@tompng
Copy link
Copy Markdown
Member

@tompng tompng commented May 24, 2026

String literal in XPath 1.0 doesn't have escapes.
begin end while cond is not a normal modifier while, but a do-while loop. (I didn't know it). REXML uses this syntax in several files.

I don't think preprocess like strip_ignorable_spaces is a good idea. This is just a first aid.
I found that REXML can't parse //a[1] [2]. I think it could be fixed by changing strip_ignoreable_spaces, but leaved as, it's a separate issue, and I think it should be fixed elsewhere.

Copilot AI review requested due to automatic review settings May 24, 2026 17:38
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR improves XPath parsing robustness around whitespace handling, especially when spaces appear inside string literals, and adds regression tests to cover these cases.

Changes:

  • Add tests validating that extra whitespace in XPath expressions is treated as ignorable where appropriate.
  • Update XPathParser#parse to strip ignorable spaces without altering whitespace inside quoted string literals.
  • Update group parsing to ignore parentheses/brackets that appear inside quoted string literals.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
test/xpath/test_base.rb Adds test coverage for whitespace normalization and for bracket/paren characters inside XPath string literals.
lib/rexml/parsers/xpathparser.rb Refactors space-stripping into a helper that respects string literals; improves grouping logic to ignore delimiters inside quotes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +34 to +46
def strip_ignorable_spaces(path)
# Safely do `path.gsub(/([\(\[])\s+/, '\1').gsub( /\s+([\]\)])/, '\1')`
# without modifying spaces inside string literals.
quote = nil
path.gsub(/([\(\[])\s+|\s+([\]\)])|(['"])/) do
if quote
quote = nil if $3 == quote
$&
else
$1 || $2 || (quote = $3)
end
end
end
Comment on lines +695 to +706
if quote
# ignore () [] inside quotes
quote = nil if string[ind] == quote
else
case string[ind]
when st
depth += 1
when en
depth -= 1
when '"', "'"
quote = string[ind]
end
Comment on lines +38 to +44
path.gsub(/([\(\[])\s+|\s+([\]\)])|(['"])/) do
if quote
quote = nil if $3 == quote
$&
else
$1 || $2 || (quote = $3)
end
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants