Skip to content

Conversation

@trueqbit
Copy link

Originally i set out to correct a warning about snprintf() requiring an unsigned int when passed a char32_t.

So this PR fixes this warning and more:

  • It is better, or even the only correct approach, to use the documented underlying types of C++ Unicode character types for byte-based, arithmetic, or bitwise operations. These are defined to have at least the required bit size, as opposed to a fixed bit size. Also there are platforms where integral types of fixed bit size are unavailable. Specifically: char -> unsigned char, char8_t -> unsigned char, char16_t -> std::uint_least16_t, char32_t -> std::uint_least32_t.
  • Fixed a crash in json::basic_reader<>::match() due to accessing the character array past-the-end.
  • Fixed non-compiling json::basic_reader<>::match_any().
  • Replaced character-pointer logic in json::basic_reader<> with string_view functionality or iterator logic.

Additionally:

  • The CMake project was built with Linux in mind. I updated it so it works on Windows with Visual C++.
  • In addition, I added the compiler option to treat the warning about incorrect order of initialization of members as an error, which I found very useful in other codebases.

C++ unicode character types are defined as being equivalent to underlying types of at least N bits, so it's correct to use those, and improves platform-independency.
* replaced character-pointer logic with string_view functionality.
* corrected non-compiling `reader::match_any()`.
* simplified consumption loop.
@eteran
Copy link
Owner

eteran commented Nov 10, 2025

Fantastic! I'll take a look at this in a little bit and try to get it merged quickly.

return consume_while([chars](Ch ch) {
return chars.find(ch) != std::basic_string_view<Ch>::npos;
return consume_while([&chars](Ch ch) {
return chars.find(ch) != chars.npos;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of accessing static members via an instance. I recognize that it's shorter, but I feel it muddies the waters a bit.

size_t consume(std::basic_string_view<Ch> chars) noexcept {
return consume_while([chars](Ch ch) {
return chars.find(ch) != std::basic_string_view<Ch>::npos;
return consume_while([&chars](Ch ch) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think capturing a string_view by ref isn't necessary, as it is a simple pointer/length pair designed to be passed around by value.

void surrogate_pair_to_utf8(std::uint_least16_t w1, std::uint_least16_t w2, Out &out) {

std::uint32_t cp;
std::uint_least32_t cp = '\0';
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we use char32_t instead of uint_least32_t. According to https://en.cppreference.com/w/c/string/multibyte/char32_t.html it's an equivalent typedef anyway but is a bit more clear on intent.


uint16_t w1 = 0;
uint16_t w2 = 0;
std::uint_least16_t w1 = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're changing this, I think I'd prefer char16_t since it defined as uint_least16_t anyway but is more clear.

*/
std::optional<std::basic_string<Ch>> match(const std::basic_regex<Ch> &regex) {
std::match_results<const Ch *> matches;
std::match_results<typename std::basic_string_view<Ch>::const_iterator> matches;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's funny, I've used this reader in several projects and found/fixed this bug elsewhere... just didn't fix it in this one! good catch 👍🏻

std::basic_string<Ch> m(&input_[start], &input_[index_]);
if (!m.empty()) {
return m;
else {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for an else after a return

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants