-
Notifications
You must be signed in to change notification settings - Fork 25
feat: URL resolving support for validate & metaschema #605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
050993e to
eca24c0
Compare
|
Interesting heisenbug failure - the URL is fine, but it had a 404 when the test ran? |
|
Interesting feature! Thanks for sending it. I'll book some time today to properly review it 🙏🏻 |
jviotti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the idea. I think the way Codex/you are implementing it is fighting the current machinery for iterating on schema arguments. I think we need for make for_each_json transparently work with both URLs or file paths. The src/input.h file abstracts a way a lot of the logic, including line number gathering, reading, etc so that the actual commands never have to worry about it.
for_each_json will loop and pass specific positional arguments to handle_json_entry. It feels to me that such handle_json_entry needs to take either a path or a URI (maybe just a string?) and then internally check: if it is a path and it exists and it is a file, do X, if it is a directory, do Y, if it is an HTTP/S URL, do Z.
I think if we do all of that there, we very elegantly gain URL support in every applicable command.
|
On the transient CI failure, yeah, we've been getting that quite a bit recently. I'm on it right now! |
|
If you merge main back in, the transient CI issue should be gone! |
6bf416d to
3d93e71
Compare
Closes: sourcemeta#234 Co-authored-by: Codex <codex@openai.com> Generated-with: OpenAI Codex CLI (partial) Signed-off-by: Robin H. Johnson <rjohnson@coreweave.com>
3d93e71 to
7ced116
Compare
|
couple more CI bumps from the code but ready again; nothing transient this time |
jviotti
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments. I also wonder if there is a way to split up this PR so it's easy to merge the core logic first and avoid git conflicts, but not sure what that would be 🤔
| -> const std::filesystem::path & { | ||
| if (!this->path.has_value()) { | ||
| std::ostringstream error; | ||
| error << "The `" << command << "` command does not support HTTP inputs"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note we never dynamically construct strings for exceptions. Instead, use one of the existing error classes that would be handled in src/error.h or create a new exception class there
| } | ||
|
|
||
| [[nodiscard]] auto base_uri() const -> std::string { | ||
| if (sourcemeta::jsonschema::is_http_url(this->first)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (sourcemeta::jsonschema::is_http_url(this->first)) { | |
| if (is_http_url(this->first)) { |
You are already in the sourcemeta::jsonschema:: namespace, so you can omit it in every case.
| return this->path.value(); | ||
| } | ||
|
|
||
| [[nodiscard]] auto base_uri() const -> std::string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is called base_uri but it doesn't return a base URI, but the whole file URI given the path?
|
|
||
| struct InputJSON { | ||
| std::filesystem::path first; | ||
| std::string first; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the thing here is that the first might be either a path or a URI, maybe a std::variant expresses this better
| const auto path_without_query_or_fragment = | ||
| sourcemeta::jsonschema::uri_path_without_query_or_fragment( | ||
| entry_identifier); | ||
| if (path_without_query_or_fragment.ends_with(".jsonl")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beyond just the extension, I think we need to look at the content type mime header?
| { "name": "foo", "kind": "example" } | ||
| EOF | ||
|
|
||
| "$1" validate "http://127.0.0.1:$PORT/schema.json" "$TMP/instance.json" --http \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now wondering if a command invocation to an HTTP URL should assume --http. I think it makes sense?
| "$1" validate "http://127.0.0.1:$PORT/schema.json?ignored=1" "$TMP/instance.json" \ | ||
| --http 2>"$TMP/stderr.txt" | ||
|
|
||
| cat << EOF > "$TMP/expected.txt" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we run all of these that present no output with --verbose and actually capture BOTH stdout and stderr? i.e. if for whatever reason this command printed non-sense to stdout, we would never know
| SCHEMA_PATH="$(cd "$TMP" && pwd)/schema.json" | ||
| SCHEMA_URI="file://$SCHEMA_PATH" | ||
|
|
||
| "$1" lint "$SCHEMA_URI" --json >"$TMP/output.json" 2>&1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a _json suffix to the test file as this is only covering the JSON part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get this test, but I'm wondering now if it is a bit confusing. I think if any input is an HTTP URL, we should assume --http, as that's clearly the user's intent?
| EOF | ||
|
|
||
| diff "$TMP/output.txt" "$TMP/expected.txt" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like various test files have this empty line here?
Closes: #234
Co-authored-by: Codex codex@openai.com
Generated-with: OpenAI Codex CLI (partial)
Signed-off-by: Robin H. Johnson rjohnson@coreweave.com