reject non-xsd lexical forms in lexFloat and lexDouble#44
Conversation
|
xmlbeans is over 20 years old and for it do suddenly be strict on numbers like this - I don't think that is a good user experience
|
|
Users can validate their XML matches their XML schema using built-in Java classes |
|
Fair point, defaulting to strict was the wrong call for a library this old. When I traced the path in, the snag is that lexFloat/lexDouble are static and the validation route (JavaFloatHolder/JavaDoubleHolder.validateLexical) only carries a ValidationContext, not XmlOptions. So an XmlOptions opt-in means threading the flag down through validateLexical and the ValidationContext interface, since the options object doesn't reach the lexer today. Worth knowing before I touch that interface. For context, the trailing f/F (float) and d/D (double) check isn't new, it predates this PR. The parts I added on top were the cross-type suffix, the hex form and the "Infinity" spelling. So either I wire up the XmlOptions opt-in and thread it through (default stays lenient), or I drop the added strictness and leave the lexers as they were. Which would you rather? Given your point about users running strict validation through JAXP, I'm happy with the latter if you'd sooner keep the lexer lenient. |
|
I've got stuff on today so can't spend much time on this today. The aim would be that calls like |
|
Sounds good, no rush from my side. The overload route fits: thread XmlOptions through parse() into the ValidationContext, add option-carrying overloads of lexFloat/lexDouble, and have JavaFloatHolder/JavaDoubleHolder call those. The existing static signatures stay lenient, so nothing changes for current callers unless the option is set. I'll leave it parked until you're happy with the shape rather than touch the ValidationContext interface speculatively. Can do the wiring once the design's settled. |
Sure @aizu-m - I think we agree on the high level design. If you run into issues and want to ask me to review a design decision, just add a comment here. It might save you some time by avoiding having you implement changes that I might then ask you to reimplement. |
|
Had a dig through the validation path before touching the interface. Trace in: holder.set_text -> validateLexical(s, _voorVc) -> lexFloat/lexDouble. The Then chased where XmlOptions actually lands. The Locale doesn't keep the options object around; it pulls the flags it needs in its constructor (e.g. Two ways to bridge that:
I'd lean to (2). It's lighter, and it's exactly how Locale already carries a per-parse validation flag down to the value layer. Shout if you'd rather it ride on ValidationContext. Only open detail then is the option name, something like |
|
setLoadStrictFloatingPoint/isLoadStrictFloatingPoint seem ok to me I'm not against using the locale to get the XmlOptions as long as we avoid ThreadLocals. Cur.CurLoadContext in Locale.java has the options available. Maybe we can leverage that. |
|
Wired up and pushed. Went the lighter route we talked about. The Locale already reads the option in its constructor, right next to _validateOnSet, because getLocale(stl, options) builds it from the parse options. The holders read it back through get_locale() in set_text, so the flag just rides the XmlOptions that parse() already threads through. No ThreadLocals, and it turned out I didn't need to reach into CurLoadContext for it after all. setLoadStrictFloatingPoint/isLoadStrictFloatingPoint, off by default. XsTypeConverter gets lexFloat(cs, strict)/lexDouble(cs, strict) overloads, and the existing no-arg signatures stay lenient so current callers don't move. One decision worth flagging. With the option off the behaviour is exactly what it was before this PR, including the trailing f/d guard that already predated it, so nothing shifts for existing users. Turning the option on applies the full XSD check (hex floats, the Infinity spelling, any f/F/d/D suffix). If you'd sooner have that old f/d guard lifted too when the option is off, that's a one-liner, just say. Tests cover both modes in XsTypeConverter plus an end-to-end parse through XmlFloat/XmlDouble with the option set. |
pjfanning
left a comment
There was a problem hiding this comment.
please remove all the new build files like the .gradle ones
| // the xsd:float/xsd:double space (hex floats, the java "Infinity" token and | ||
| // the f/F/d/D suffix). Driven by XmlOptions.setLoadStrictFloatingPoint. | ||
| default boolean isLoadStrictFloatingPoint ( ) { return false; } | ||
| } No newline at end of file |
There was a problem hiding this comment.
nit: add a blank line at the end of the file
There was a problem hiding this comment.
Done, file ends with a trailing newline now.
| return lexFloat(cs, false); | ||
| } | ||
|
|
||
| public static float lexFloat(CharSequence cs, boolean strict) |
There was a problem hiding this comment.
Add @since 5.4.0 to this method and javadoc that describes what strict means - possibly rename the field to enforceXmlSchemaTypeDefinition or something like that.
There was a problem hiding this comment.
Added the javadoc and @SInCE 5.4.0. On the rename: I kept the param as strict so it lines up with the setLoadStrictFloatingPoint option name we settled on, and spelt out in the javadoc exactly what it enforces (hex floats, the Infinity token, the f/F/d/D suffix). Happy to switch it to enforceXmlSchemaTypeDefinition if you'd still rather, just say.
| return lexDouble(cs, false); | ||
| } | ||
|
|
||
| public static double lexDouble(CharSequence cs, boolean strict) |
There was a problem hiding this comment.
javadoc like what I requested for lexFloat
There was a problem hiding this comment.
Done, same javadoc and @SInCE on the double overload.
| * outside the xsd:float/xsd:double lexical space: hexadecimal floats | ||
| * ({@code 0x1p4}), the java {@code Infinity} token, and a trailing type | ||
| * suffix ({@code f}/{@code F}/{@code d}/{@code D}). It is off by default, so | ||
| * the long-standing lenient behaviour is unchanged unless this is set. |
There was a problem hiding this comment.
javadoc on all new public methods and @since 5.4.0
There was a problem hiding this comment.
Done. Both setters and the getter have javadoc and @SInCE 5.4.0 now.
| return set(XmlOptionsKeys.LOAD_STRICT_FLOATING_POINT, b); | ||
| } | ||
|
|
||
| public boolean isLoadStrictFloatingPoint() { |
There was a problem hiding this comment.
I think there is a test for XmlOptions class and if there isn't, there should be one.
These methods should be tested to check the default is false and that the set method affects the get.
There was a problem hiding this comment.
There's a XmlOptionsTest already, so I added testLoadStrictFloatingPointFlag next to the unsynchronized one. It checks the default is false and that set flips the get. Green.
Default parsing stays lenient (unchanged), and the strict XSD lexical check is only applied when setLoadStrictFloatingPoint is set. The flag is read off the parse Locale in JavaFloatHolder/JavaDoubleHolder.set_text, so it follows the XmlOptions already threaded through parse(), no ThreadLocals. lexFloat/lexDouble gain a strict overload; the existing signatures keep their long-standing lenient behaviour. INF/-INF/NaN and decimal/exponent forms are accepted in both modes.
6dec743 to
b8fc5b7
Compare
|
Pushed the review fixes. Dropped the .gradle build files that slipped into the last commit, and added .gradle/ to .gitignore so they can't sneak back in. The diff is back to just the source and tests. The rest: javadoc + @SInCE 5.4.0 on the new XmlOptions methods and the lexFloat/lexDouble strict overloads, trailing newline on XmlLocale.java, and a test for the new flag in XmlOptionsTest. ./gradlew test --tests XsTypeConverterTest --tests XmlOptionsTest is green, 20 tests. |
|
thanks - merged with 21dbc28 |
Found while feeding malformed numerics through the lexical converters.
Float.parseFloat and Double.parseDouble take hex floats, the Java "Infinity" token and a trailing f/F/d/D suffix. None of those are in the xsd:float/xsd:double lexical space. lexFloat only guarded a trailing f/F and lexDouble only a trailing d/D, so each let the other type's suffix through, and both let hex floats and "Infinity" through.
This is reachable from untrusted XML through JavaFloatHolder/JavaDoubleHolder.validateLexical, so an out-of-space value validates clean instead of being flagged invalid.
Moved the guard into a shared helper that rejects the hex marker, the Infinity letters and the trailing suffix before parsing. INF, -INF, NaN and the ordinary decimal/exponent forms are untouched.