Skip to content

reject non-xsd lexical forms in lexFloat and lexDouble#44

Closed
aizu-m wants to merge 2 commits into
apache:trunkfrom
aizu-m:lexfloat-lexdouble-xsd-lexical
Closed

reject non-xsd lexical forms in lexFloat and lexDouble#44
aizu-m wants to merge 2 commits into
apache:trunkfrom
aizu-m:lexfloat-lexdouble-xsd-lexical

Conversation

@aizu-m

@aizu-m aizu-m commented Jun 13, 2026

Copy link
Copy Markdown

Found while feeding malformed numerics through the lexical converters.

XsTypeConverter.lexFloat("0x1p4")     -> 16.0     (accepted)
XsTypeConverter.lexFloat("Infinity")  -> Infinity (accepted)
XsTypeConverter.lexFloat("1.0d")      -> 1.0      (accepted)
XsTypeConverter.lexDouble("1.0f")     -> 1.0      (accepted)

Float.parseFloat and Double.parseDouble take hex floats, the Java "Infinity" token and a trailing f/F/d/D suffix. None of those are in the xsd:float/xsd:double lexical space. lexFloat only guarded a trailing f/F and lexDouble only a trailing d/D, so each let the other type's suffix through, and both let hex floats and "Infinity" through.

This is reachable from untrusted XML through JavaFloatHolder/JavaDoubleHolder.validateLexical, so an out-of-space value validates clean instead of being flagged invalid.

Moved the guard into a shared helper that rejects the hex marker, the Infinity letters and the trailing suffix before parsing. INF, -INF, NaN and the ordinary decimal/exponent forms are untouched.

@pjfanning

pjfanning commented Jun 13, 2026

Copy link
Copy Markdown
Member

xmlbeans is over 20 years old and for it do suddenly be strict on numbers like this - I don't think that is a good user experience

  • I thought0x1p4 looked wrong - but apparently it is parseable (power of) - so not XSD compliant but parseable
  • failing because of trailing d or f, I would prefer if this was optional based on the XmlOptions class
  • likewise failing for infinity failures could be controlled by the same XmlOptions option

@pjfanning

Copy link
Copy Markdown
Member

Users can validate their XML matches their XML schema using built-in Java classes

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import java.io.IOException;
import java.io.InputStream;

public static boolean validate(InputStream xmlInputStream, 
                                   InputStream xsdInputStream) 
            throws SAXException, IOException {

        // Create SchemaFactory for W3C XML Schema
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        
        // Load the schema
        Schema schema = factory.newSchema(new StreamSource(xsdInputStream));
        
        // Create Validator
        Validator validator = schema.newValidator();
        
        // Optional: set error handler for detailed messages
        // validator.setErrorHandler(new MyErrorHandler());
        
        try {
            validator.validate(new StreamSource(xmlInputStream));
            System.out.println("XML is valid according to the schema.");
            return true;
        } catch (SAXException e) {
            System.err.println("XML validation failed: " + e.getMessage());
            return false;
        }
    }

@aizu-m

aizu-m commented Jun 13, 2026

Copy link
Copy Markdown
Author

Fair point, defaulting to strict was the wrong call for a library this old.

When I traced the path in, the snag is that lexFloat/lexDouble are static and the validation route (JavaFloatHolder/JavaDoubleHolder.validateLexical) only carries a ValidationContext, not XmlOptions. So an XmlOptions opt-in means threading the flag down through validateLexical and the ValidationContext interface, since the options object doesn't reach the lexer today. Worth knowing before I touch that interface.

For context, the trailing f/F (float) and d/D (double) check isn't new, it predates this PR. The parts I added on top were the cross-type suffix, the hex form and the "Infinity" spelling.

So either I wire up the XmlOptions opt-in and thread it through (default stays lenient), or I drop the added strictness and leave the lexers as they were. Which would you rather? Given your point about users running strict validation through JAXP, I'm happy with the latter if you'd sooner keep the lexer lenient.

@pjfanning

Copy link
Copy Markdown
Member

I've got stuff on today so can't spend much time on this today.
This is not urgent so I'd suggest that you don't spend too much time on it until a design is agreed.
Maybe the ValidationContext or an existing param could be modified to include the XmlOptions.

The aim would be that calls like SstDocument.Factory.parse(is, DEFAULT_XML_OPTIONS); would have their options passed through the call stack and that we can add new overloads for static methods that need options and use these overloads instead of the existing ones.

@aizu-m

aizu-m commented Jun 14, 2026

Copy link
Copy Markdown
Author

Sounds good, no rush from my side. The overload route fits: thread XmlOptions through parse() into the ValidationContext, add option-carrying overloads of lexFloat/lexDouble, and have JavaFloatHolder/JavaDoubleHolder call those. The existing static signatures stay lenient, so nothing changes for current callers unless the option is set.

I'll leave it parked until you're happy with the shape rather than touch the ValidationContext interface speculatively. Can do the wiring once the design's settled.

@pjfanning

Copy link
Copy Markdown
Member

Sounds good, no rush from my side. The overload route fits: thread XmlOptions through parse() into the ValidationContext, add option-carrying overloads of lexFloat/lexDouble, and have JavaFloatHolder/JavaDoubleHolder call those. The existing static signatures stay lenient, so nothing changes for current callers unless the option is set.

I'll leave it parked until you're happy with the shape rather than touch the ValidationContext interface speculatively. Can do the wiring once the design's settled.

Sure @aizu-m - I think we agree on the high level design. If you run into issues and want to ask me to review a design decision, just add a comment here. It might save you some time by avoiding having you implement changes that I might then ask you to reimplement.

@aizu-m

aizu-m commented Jun 15, 2026

Copy link
Copy Markdown
Author

Had a dig through the validation path before touching the interface.

Trace in: holder.set_text -> validateLexical(s, _voorVc) -> lexFloat/lexDouble. The _voorVc it passes is the static ValueOutOfRangeValidationContext singleton on XmlObjectBase (XmlObjectBase.java:347), so it's shared across every parse, not a per-document object. Nothing per-parse hangs off it to read an option from.

Then chased where XmlOptions actually lands. The Locale doesn't keep the options object around; it pulls the flags it needs in its constructor (e.g. _validateOnSet = options.isValidateOnSet(), Locale.java:104) and drops the rest. So XmlOptions isn't sitting next to the validation context to begin with.

Two ways to bridge that:

  1. Ride on ValidationContext: swap the static singleton for a per-instance one and add an accessor to the interface (4 implementors). Works, but it touches a public interface for what's really a per-parse boolean.
  2. Mirror _validateOnSet: add the option, have Locale extract it into a field, and have JavaFloatHolder/JavaDoubleHolder read it via get_locale() rather than off the context. No interface change, follows the existing pattern, default stays lenient.

I'd lean to (2). It's lighter, and it's exactly how Locale already carries a per-parse validation flag down to the value layer. Shout if you'd rather it ride on ValidationContext. Only open detail then is the option name, something like setLoadStrictFloatingPoint/isLoadStrictFloatingPoint, or whatever fits your convention. Once you pick, I'll wire it.

@pjfanning

Copy link
Copy Markdown
Member

setLoadStrictFloatingPoint/isLoadStrictFloatingPoint seem ok to me

I'm not against using the locale to get the XmlOptions as long as we avoid ThreadLocals. Cur.CurLoadContext in Locale.java has the options available. Maybe we can leverage that.

@aizu-m

aizu-m commented Jun 15, 2026

Copy link
Copy Markdown
Author

Wired up and pushed.

Went the lighter route we talked about. The Locale already reads the option in its constructor, right next to _validateOnSet, because getLocale(stl, options) builds it from the parse options. The holders read it back through get_locale() in set_text, so the flag just rides the XmlOptions that parse() already threads through. No ThreadLocals, and it turned out I didn't need to reach into CurLoadContext for it after all.

setLoadStrictFloatingPoint/isLoadStrictFloatingPoint, off by default. XsTypeConverter gets lexFloat(cs, strict)/lexDouble(cs, strict) overloads, and the existing no-arg signatures stay lenient so current callers don't move.

One decision worth flagging. With the option off the behaviour is exactly what it was before this PR, including the trailing f/d guard that already predated it, so nothing shifts for existing users. Turning the option on applies the full XSD check (hex floats, the Infinity spelling, any f/F/d/D suffix). If you'd sooner have that old f/d guard lifted too when the option is off, that's a one-liner, just say.

Tests cover both modes in XsTypeConverter plus an end-to-end parse through XmlFloat/XmlDouble with the option set.

@pjfanning pjfanning left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove all the new build files like the .gradle ones

// the xsd:float/xsd:double space (hex floats, the java "Infinity" token and
// the f/F/d/D suffix). Driven by XmlOptions.setLoadStrictFloatingPoint.
default boolean isLoadStrictFloatingPoint ( ) { return false; }
} No newline at end of file

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a blank line at the end of the file

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, file ends with a trailing newline now.

return lexFloat(cs, false);
}

public static float lexFloat(CharSequence cs, boolean strict)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add @since 5.4.0 to this method and javadoc that describes what strict means - possibly rename the field to enforceXmlSchemaTypeDefinition or something like that.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the javadoc and @SInCE 5.4.0. On the rename: I kept the param as strict so it lines up with the setLoadStrictFloatingPoint option name we settled on, and spelt out in the javadoc exactly what it enforces (hex floats, the Infinity token, the f/F/d/D suffix). Happy to switch it to enforceXmlSchemaTypeDefinition if you'd still rather, just say.

return lexDouble(cs, false);
}

public static double lexDouble(CharSequence cs, boolean strict)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc like what I requested for lexFloat

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, same javadoc and @SInCE on the double overload.

* outside the xsd:float/xsd:double lexical space: hexadecimal floats
* ({@code 0x1p4}), the java {@code Infinity} token, and a trailing type
* suffix ({@code f}/{@code F}/{@code d}/{@code D}). It is off by default, so
* the long-standing lenient behaviour is unchanged unless this is set.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javadoc on all new public methods and @since 5.4.0

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Both setters and the getter have javadoc and @SInCE 5.4.0 now.

return set(XmlOptionsKeys.LOAD_STRICT_FLOATING_POINT, b);
}

public boolean isLoadStrictFloatingPoint() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is a test for XmlOptions class and if there isn't, there should be one.
These methods should be tested to check the default is false and that the set method affects the get.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a XmlOptionsTest already, so I added testLoadStrictFloatingPointFlag next to the unsynchronized one. It checks the default is false and that set flips the get. Green.

Default parsing stays lenient (unchanged), and the strict XSD lexical
check is only applied when setLoadStrictFloatingPoint is set. The flag is
read off the parse Locale in JavaFloatHolder/JavaDoubleHolder.set_text, so
it follows the XmlOptions already threaded through parse(), no ThreadLocals.

lexFloat/lexDouble gain a strict overload; the existing signatures keep
their long-standing lenient behaviour. INF/-INF/NaN and decimal/exponent
forms are accepted in both modes.
@aizu-m aizu-m force-pushed the lexfloat-lexdouble-xsd-lexical branch from 6dec743 to b8fc5b7 Compare June 15, 2026 19:16
@aizu-m

aizu-m commented Jun 15, 2026

Copy link
Copy Markdown
Author

Pushed the review fixes.

Dropped the .gradle build files that slipped into the last commit, and added .gradle/ to .gitignore so they can't sneak back in. The diff is back to just the source and tests.

The rest: javadoc + @SInCE 5.4.0 on the new XmlOptions methods and the lexFloat/lexDouble strict overloads, trailing newline on XmlLocale.java, and a test for the new flag in XmlOptionsTest. ./gradlew test --tests XsTypeConverterTest --tests XmlOptionsTest is green, 20 tests.

@pjfanning

Copy link
Copy Markdown
Member

thanks - merged with 21dbc28

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants