Underlying unicode char types, updates to JSON reader #18
C++ Unicode character types are defined in terms of underlying integer types of at least N bits, so it is correct to use those types, and doing so improves platform independence.
* Replaced character-pointer logic with string_view functionality.
* Corrected non-compiling `reader::match_any()`.
* Simplified the consumption loop.
Fantastic! I'll take a look at this in a little bit and try to get it merged quickly.
| return consume_while([chars](Ch ch) { | ||
| return chars.find(ch) != std::basic_string_view<Ch>::npos; | ||
| return consume_while([&chars](Ch ch) { | ||
| return chars.find(ch) != chars.npos; |
I'm not a fan of accessing static members via an instance. I recognize that it's shorter, but I feel it muddies the waters a bit.
| size_t consume(std::basic_string_view<Ch> chars) noexcept { | ||
| return consume_while([chars](Ch ch) { | ||
| return chars.find(ch) != std::basic_string_view<Ch>::npos; | ||
| return consume_while([&chars](Ch ch) { |
I think capturing a string_view by ref isn't necessary, as it is a simple pointer/length pair designed to be passed around by value.
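The two suggestions on this lambda (capture the view by value, and name `npos` through the type) can be sketched roughly as below. `count_leading` is a hypothetical free function standing in for the reader's `consume()`; it is an illustration only, not the project's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for the reader's consume(): counts how many leading
// characters of `input` belong to the set `chars`.
template <class Ch>
std::size_t count_leading(std::basic_string_view<Ch> input,
                          std::basic_string_view<Ch> chars) noexcept {
    // Capture the view by value: it is only a pointer/length pair.
    auto in_set = [chars](Ch ch) {
        // npos is a static member, so it is named through the type here.
        return chars.find(ch) != std::basic_string_view<Ch>::npos;
    };
    std::size_t n = 0;
    while (n < input.size() && in_set(input[n])) {
        ++n;
    }
    return n;
}
```

A call such as `count_leading<char>(" \t\nabc", " \t\n")` needs the explicit `<char>` because string literals do not deduce `Ch` on their own.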
| void surrogate_pair_to_utf8(std::uint_least16_t w1, std::uint_least16_t w2, Out &out) { | ||
| std::uint32_t cp; | ||
| std::uint_least32_t cp = '\0'; |
Maybe we should use char32_t instead of uint_least32_t. According to https://en.cppreference.com/w/c/string/multibyte/char32_t.html it's an equivalent typedef anyway, but it's a bit clearer on intent.
| uint16_t w1 = 0; | ||
| uint16_t w2 = 0; | ||
| std::uint_least16_t w1 = 0; |
If we're changing this, I think I'd prefer char16_t, since it's defined as uint_least16_t anyway but is clearer.
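For reference, a small sketch of how the two families of types relate in C++. The linked page describes C, where `char16_t` and `char32_t` are typedefs; in C++ they are distinct types that share size and signedness with `std::uint_least16_t` and `std::uint_least32_t`. `combine_surrogates` is a hypothetical helper written for this illustration, not the PR's `surrogate_pair_to_utf8`, and it assumes `w1` is a high surrogate and `w2` a low surrogate.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Same size as the underlying types, but not the same types in C++.
static_assert(sizeof(char16_t) == sizeof(std::uint_least16_t));
static_assert(sizeof(char32_t) == sizeof(std::uint_least32_t));
static_assert(!std::is_same_v<char16_t, std::uint_least16_t>);
static_assert(!std::is_same_v<char32_t, std::uint_least32_t>);

// Hypothetical helper: combines a UTF-16 surrogate pair into one code point.
// Assumes w1 is in [0xD800, 0xDBFF] and w2 is in [0xDC00, 0xDFFF].
constexpr char32_t combine_surrogates(char16_t w1, char16_t w2) noexcept {
    return 0x10000
         + ((static_cast<char32_t>(w1) - 0xD800) << 10)
         + (static_cast<char32_t>(w2) - 0xDC00);
}
```

Either spelling works for the arithmetic; the character types mainly document that the values are code units and code points rather than arbitrary integers.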
| */ | ||
| std::optional<std::basic_string<Ch>> match(const std::basic_regex<Ch> &regex) { | ||
| std::match_results<const Ch *> matches; | ||
| std::match_results<typename std::basic_string_view<Ch>::const_iterator> matches; |
It's funny, I've used this reader in several projects and found/fixed this bug elsewhere... just didn't fix it in this one! good catch 👍🏻
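A minimal sketch of the fix under discussion, assuming the input is held as a `std::basic_string_view<Ch>`: the `std::match_results` specialization has to use the same iterator type that is passed to `std::regex_search`, which is `basic_string_view<Ch>::const_iterator` rather than `const Ch *` on every implementation. `match_prefix` is a hypothetical free function, not the reader's actual member.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <string_view>

// Hypothetical free-function version of a "match at the current position" helper.
template <class Ch>
std::optional<std::basic_string<Ch>> match_prefix(std::basic_string_view<Ch> input,
                                                  const std::basic_regex<Ch> &regex) {
    // The match_results iterator type must agree with input.begin()/end().
    using Iter = typename std::basic_string_view<Ch>::const_iterator;
    std::match_results<Iter> matches;
    // match_continuous only accepts a match that starts at the first character.
    if (std::regex_search(input.begin(), input.end(), matches, regex,
                          std::regex_constants::match_continuous)) {
        return matches.str();
    }
    return std::nullopt;
}
```

With this shape, `match_prefix<char>("123abc", std::regex("[0-9]+"))` yields `"123"`, while the same pattern on `"abc123"` yields no match.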
| std::basic_string<Ch> m(&input_[start], &input_[index_]); | ||
| if (!m.empty()) { | ||
| return m; | ||
| else { |
No need for an else after a return
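Roughly, the early-return shape being asked for looks like the sketch below. `slice_if_non_empty` is a hypothetical helper, not the reader's code; it also uses `substr()` with an offset and length instead of taking the address of `input_[index_]`, on the assumption that the caller guarantees `start <= end <= input.size()`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: returns input[start, end) if that range is non-empty.
inline std::optional<std::string> slice_if_non_empty(std::string_view input,
                                                     std::size_t start,
                                                     std::size_t end) {
    // Precondition assumed for this sketch.
    assert(start <= end && end <= input.size());
    std::string m(input.substr(start, end - start));
    if (!m.empty()) {
        return m;
    }
    // No else needed: this line is only reached when the branch above did not return.
    return std::nullopt;
}
```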
Originally I set out to correct a warning about `snprintf()` requiring an `unsigned int` when passed a `char32_t`.

So this PR fixes this warning and more:
* `char` -> `unsigned char`, `char8_t` -> `unsigned char`, `char16_t` -> `std::uint_least16_t`, `char32_t` -> `std::uint_least32_t`.
* `json::basic_reader<>::match()` due to accessing the character array past-the-end.
* `json::basic_reader<>::match_any()`.
* `json::basic_reader<>` with string_view functionality or iterator logic.

Additionally: