robertroessler/rmj

RMj

The RMj (rmj) "mini" JSON parser project can be seen as YAJP (Yet Another JSON Parser) - OK, it probably will be seen that way, as it is technically guilty as charged.

It is written in idiomatic modern C++ 20, and it has a key advantage: it uses the rva::variant template class from the Recursive Variant Authority to provide a pretty convincing simulation of an actual recursive "sum type" - something the C++ language does not provide directly at this time.

See the recursive-variant project's site for more details - or just look at how rmj uses it in rmj.h.
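
To make the idea of a recursive sum type concrete, here is a minimal sketch that uses only the standard library. It is not how rva::variant or rmj.h is implemented: the names Value, Array, Base and count_leaves are hypothetical, and JSON objects are left out to keep the sketch short. It relies on std::vector being usable with an incomplete element type, which lets the variant name itself indirectly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

struct Value;                      // forward declaration: the type refers to itself below
using Array = std::vector<Value>;  // std::vector accepts an incomplete element type (C++17)
using Base  = std::variant<std::nullptr_t, bool, double, std::string, Array>;

// Hypothetical stand-in for a recursive JSON value (objects omitted for brevity).
struct Value : Base {
    using Base::Base;                           // inherit the variant constructors
    const Base& base() const { return *this; }  // explicit view of the underlying variant
};

// Counts scalar leaves by recursing into nested arrays.
std::size_t count_leaves(const Value& v) {
    if (const Array* items = std::get_if<Array>(&v.base())) {
        std::size_t total = 0;
        for (const Value& item : *items) total += count_leaves(item);
        return total;
    }
    return 1;  // every non-array alternative is a single leaf
}

// Usage: the document [1.0, true, [null, "x"]] holds four scalar leaves.
inline void count_leaves_example() {
    const Value doc(Array{Value(1.0), Value(true),
                          Value(Array{Value(nullptr), Value(std::string("x"))})});
    assert(count_leaves(doc) == 4);
}
```

The wrapper struct is the price of staying within the standard library here; a recursive variant template removes that boilerplate, which is the convenience the recursive-variant project provides.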

Note that rmj has been implemented as a C++ 20 "header-only library": to use it, you only need to include the "rmj.h" file, which will itself include variant.hpp. These two files therefore need to be present in your source folder, or at least on your compile-time include path.

The primary "user" (as well as "developer") documentation for rmj is present in the rmj.h header file, while examples and a test harness are provided in "t0.cpp".

Besides being "pure" C++ 20, the code is believed to be both 32/64-bit "safe", and to contain no dependencies (overt or lurking) on any particular OS / hardware platform.

Details

As I had need of a simple JSON parser, and didn't require a whole ecosystem to be imported, I ended up creating rmj to just be a simple JSON parser (as of RFC 8259) that implements a [static] js_val::parse method - yielding variant js_val objects - and a js_val::to_string method which serializes a js_val to a std::string.

The parse method

(signature: static js_val js_val::parse(std::string_view))

accepts a string - expected to be valid [UTF-8 encoded] RFC 8259 JSON - and returns a js_val, which is effectively a recursive sum type containing all JSON data types and values mapped to C++ 20 data types and values.

If parse detects invalid JSON syntax it throws an exception (a std::runtime_error) with a message stating the problem and the precise offset of this error in the input string.
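
As an illustration of this style of error reporting, the following sketch validates a single JSON keyword and throws a std::runtime_error carrying the offset of the first unexpected character. It is not taken from rmj.h: parse_keyword, error_message_for and the exact message wording are assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper (not rmj code): accepts exactly one JSON keyword and
// otherwise reports the offset of the first unexpected character.
std::string_view parse_keyword(std::string_view text) {
    constexpr std::array<std::string_view, 3> keywords{"true", "false", "null"};
    std::size_t furthest = 0;  // longest prefix of the input matched by any keyword
    for (std::string_view kw : keywords) {
        if (text == kw) return kw;
        std::size_t i = 0;
        while (i < kw.size() && i < text.size() && text[i] == kw[i]) ++i;
        if (i > furthest) furthest = i;
    }
    throw std::runtime_error("invalid JSON literal at offset " + std::to_string(furthest));
}

// Returns the exception message for bad input, or an empty string on success.
inline std::string error_message_for(std::string_view text) {
    try {
        parse_keyword(text);
        return {};
    } catch (const std::runtime_error& e) {
        return e.what();
    }
}

// Usage: "trux" matches "true" for three characters, so the error is at offset 3.
inline void parse_keyword_example() {
    assert(parse_keyword("null") == "null");
    assert(error_message_for("trux") == "invalid JSON literal at offset 3");
}
```

Callers of the real parse method would wrap the call in the same kind of try / catch block and read the message from what().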

The to_string method

(signature: constexpr std::string to_string() const)

will serialize its [recursive] js_val variant data as valid [UTF-8 encoded] RFC 8259 JSON. With the "v1" release of RMj, to_string also takes an "optional" (defaulted) parameter. Calling to_string() with no argument "escapes" every non-ASCII UTF-8 sequence, so the output contains only printable ASCII characters. Passing true selects "pass-through" mode, in which any valid UTF-8 sequence appears unchanged in the serialized output (excepting the required "always-escaped" control codes and other RFC 8259-defined special characters).
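
The difference between the two modes can be sketched with a small standalone function. This is not the rmj implementation: quote_json and append_u_escape are hypothetical names, the input is assumed to be valid UTF-8, and details such as lowercase hex digits and escaping control characters as backslash-u sequences (rather than short forms) are choices made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <string_view>

// Appends one UTF-16 code unit as a six-character JSON escape (backslash, 'u', 4 hex digits).
static void append_u_escape(std::string& out, unsigned unit) {
    assert(unit <= 0xFFFFu);
    char buf[7];
    std::snprintf(buf, sizeof buf, "\\u%04x", unit);
    out += buf;
}

// Hypothetical sketch: quotes a UTF-8 string as a JSON string (input assumed valid UTF-8).
std::string quote_json(std::string_view utf8, bool pass_thru = false) {
    std::string out = "\"";
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c == '"' || c == '\\') { out += '\\'; out += static_cast<char>(c); ++i; continue; }
        if (c < 0x20) { append_u_escape(out, c); ++i; continue; }  // control codes: always escaped
        if (c < 0x80) { out += static_cast<char>(c); ++i; continue; }
        // The lead byte determines the length of the multi-byte sequence.
        const std::size_t len = c >= 0xF0 ? 4 : c >= 0xE0 ? 3 : 2;
        if (pass_thru) { out.append(utf8.substr(i, len)); i += len; continue; }
        std::uint32_t cp = c & (0xFFu >> (len + 1));  // payload bits of the lead byte
        for (std::size_t k = 1; k < len && i + k < utf8.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3Fu);
        if (cp >= 0x10000) {  // outside the BMP: emit a UTF-16 surrogate pair
            cp -= 0x10000;
            append_u_escape(out, 0xD800u + (cp >> 10));
            append_u_escape(out, 0xDC00u + (cp & 0x3FFu));
        } else {
            append_u_escape(out, cp);
        }
        i += len;
    }
    out += '"';
    return out;
}

// Usage: the same input in default ("escape") mode and in pass-through mode.
inline void quote_json_example() {
    assert(quote_json("caf\xC3\xA9") == "\"caf\\u00e9\"");
    assert(quote_json("caf\xC3\xA9", true) == "\"caf\xC3\xA9\"");
}
```

Escaped output is safe to print on any ASCII-only terminal or log, while pass-through output is shorter and stays readable wherever UTF-8 is displayed correctly.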

But wait, there's more added for "v1": to make it easier to write code to display js_val objects, the following operator<< plus std::format "formatter" are also supplied (both based on the new, more nuanced to_string):

(signature: inline std::ostream& operator<<(std::ostream&, const js_val&))

To "stringify" stream output using operator<<:

[output-stream] << rmj::parse("\"我能吞下玻璃而不伤身体\"") << std::endl;

... to invoke "stringify" in "pass_thru", use the i/o stream manipulator helper std::setw(rmj::pass_thru):

[output-stream] << std::setw(rmj::pass_thru) << rmj::parse("\"我能吞下玻璃而不伤身体\"") << std::endl;
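
One way a stream's width field can double as a one-shot mode flag is sketched below. This is only a guess at the general technique, not a description of rmj.h: Word, pass_thru_flag and render are hypothetical, and the "raw:" / "esc:" prefixes merely stand in for the two serialization modes.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

constexpr int pass_thru_flag = 1;   // hypothetical flag value carried through std::setw

struct Word { std::string text; };  // hypothetical stand-in for a js_val

// Reads the pending width as a mode flag, then clears it so it cannot pad the output.
std::ostream& operator<<(std::ostream& os, const Word& w) {
    const bool raw = os.width() == pass_thru_flag;
    os.width(0);
    return os << (raw ? "raw:" : "esc:") << w.text;
}

// Renders a Word to a string in either mode.
std::string render(const Word& w, bool raw) {
    std::ostringstream os;
    if (raw) os << std::setw(pass_thru_flag);
    os << w;
    return os.str();
}

// Usage: the manipulator affects only the next insertion.
inline void render_example() {
    assert(render(Word{"hi"}, false) == "esc:hi");
    assert(render(Word{"hi"}, true) == "raw:hi");
}
```

Because the width applies only to the next formatted insertion, the flag has to be repeated before each value that should be written in the alternate mode.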

To "stringify" a type-checked js_val using std::format, simply include the js_val in the format call's parameter list:

[output-stream] << std::format("{}\n", rmj::parse("\"我能吞下玻璃而不伤身体\""));

...to invoke "stringify" in "pass_thru", include the '_' format specifier:

[output-stream] << std::format("{:_}\n", rmj::parse("\"我能吞下玻璃而不伤身体\""));

While all of these use to_string to serialize their js_val to an output std::ostream, the former in each pair uses the default ("all non-ASCII is escaped") mode, resulting in guaranteed printable output, while the latter uses the ("pass-through") mode, resulting in pure UTF-8 - which may or may not be printable.

Finally, for completeness, a C++ 20 "spaceship" operator<=> and a "deep compare" operator== are supplied:

(signature: constexpr auto operator<=>(const js_val&) const)

This compares two js_val objects and returns the result as a "three-way" comparison, with -1, 0, 1 representing the left-hand js_val object being less than, equal to, or greater than the right-hand js_val object.

(signature: constexpr bool operator==(const js_val&) const)

This performs a "recursive deep compare" on two js_val objects, returning a simple bool.

More Details

C++ [20] Language Issues

Note that with only the supplied operator<=> and operator== functionality, the C++ 20 compiler is able to synthesize all of the "secondary" comparison operators: <, <=, !=, >=, >... so we don't need to write them.

Dependencies

If for some reason use of either (or both) of the standard library's stream output or type-checked text-formatting capabilities is not wanted, references to either (or both) are easily removed (along with inclusion of their associated header files) by defining the following old-school preprocessor symbols:

NO_STREAM: defining this symbol results in the operator<< definition for js_val not being included

NO_FORMAT: defining this symbol results in the std::format formatter for js_val not being included

Note that the definitions of these two symbols are present near the top of rmj.h, but commented out. Also, as is mentioned in the source, removing stream support will be problematic for the included test/demo file t0.cpp.
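
The general shape of such a compile-time switch is sketched below. Only the NO_STREAM symbol name comes from this README; Tag and describe are hypothetical, and the layout of the guards is an assumption rather than a copy of rmj.h.

```cpp
#include <cassert>
#include <string>

// #define NO_STREAM   // uncomment (or pass -DNO_STREAM) to drop all stream support

#ifndef NO_STREAM
#include <ostream>
#include <sstream>
#endif

struct Tag { std::string name; };   // hypothetical stand-in for js_val

#ifndef NO_STREAM
// Only compiled when stream support is wanted.
inline std::ostream& operator<<(std::ostream& os, const Tag& t) {
    return os << '<' << t.name << '>';
}
#endif

// Produces the same text with or without stream support.
inline std::string describe(const Tag& t) {
#ifndef NO_STREAM
    std::ostringstream os;
    os << t;
    return os.str();
#else
    return '<' + t.name + '>';
#endif
}

// Usage: callers that avoid operator<< directly keep working in both configurations.
inline void describe_example() {
    assert(describe(Tag{"a"}) == "<a>");
}
```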

Performance

While no "exotic" attempts were made to break any JSON parsing speed records, at the same time, some efforts were made to not do a terrible job... the results are:

On a 12th Gen Intel Core i7 12700K, the 1.4 KB file pass1.json from the json.org test suite is parsed in ~50 microseconds, while to_string serialization of that parsed js_val requires ~15 microseconds.

JSON -> C++ 20 Type and Value Mapping

Finally, the mapping of JSON data types and values to C++ 20 data types and values shouldn't really contain any surprises, with reasonably "direct" mappings suggesting themselves in all cases - as shown in the following table, which details the correspondence between JSON data types and values and the C++ js_val recursive sum type:

| JSON Data Type or Value | js_val (C++ 20 "std::variant" with the listed elements) |
| --- | --- |
| null | nullptr (nullptr_t) |
| true / false | true / false (bool) |
| json-numeric-value | (double) |
| json-string-value | (std::string) |
| json-object | (std::map<std::string, js_val>) |
| json-array | (std::vector<js_val>) |

Implementation Notes

As mentioned above, a "header-only library" is supplied, consisting of the local file "rmj.h" in conjunction with the imported file "variant.hpp".

At least as of the end of 2024, there were still some non-conforming "C++20" compilers that did not fully implement the C++17 library function std::from_chars. While there are some fairly heavy-weight workarounds that re-implement the float support from std::from_chars, it was decided to not commit any of these to this repo. If you are really stuck with an older "almost C++20" compiler, feel free to open an issue on the subject.
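
For reference, this is the floating-point overload in question. The sketch below is not rmj code (to_double is a hypothetical helper); it only shows the call whose availability varies between standard library implementations, so it compiles only where that overload exists.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: converts the whole of `text` to a double, or returns nullopt.
// Note that std::from_chars accepts some spellings that JSON does not (e.g. "inf"),
// so a JSON parser still has to validate the number syntax itself.
std::optional<double> to_double(std::string_view text) {
    double value = 0.0;
    const char* first = text.data();
    const char* last = first + text.size();
    const auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc{} || ptr != last) return std::nullopt;  // error or trailing characters
    return value;
}

// Usage: conversion is locale-independent and does not allocate.
inline void to_double_example() {
    assert(to_double("1.5e3").value() == 1500.0);
    assert(!to_double("12x").has_value());
}
```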

Also, again depending on the exact version and what is supported in your "C++20" compiler, there is a slight chance of an issue with the declaration of the "format" method in the formatter for js_val params. A workaround is detailed in llvm/llvm-project#66466, which basically replaces the std::format_context param with a new template param, e.g., FormatContext.

While the rmj.h header does implement the basic parsing and serializing described above - plus some helper functions to make C++ use a little friendlier - the real magic that enables the above is supplied by the recursive-variant project, contributing the variant.hpp header which allows at least the illusion of a recursive sum type in C++ 20!

The extensively commented code in rmj.h and variant.hpp shows both how the js_val type is implemented and how it is used in its own definition, with the file "t0.cpp" present both for testing and for illustrating standard use cases.

Note that the variant.hpp file from the recursive-variant project is included here in the rmj project in slightly modified form to simplify "dependency" issues, as well as to include a minor syntactic fix necessitated by a later version of C++, and to include the license from that project.

About

Simple "mini" JSON Parser for creating in-memory C++ data structures from UTF-8 encoded JSON.
