feat(aws_s3 sink): add Parquet encoder with schema_file and auto infer schema support #25156
Keep JSON-based build_record_batch/find_null_non_nullable_fields for Parquet compatibility. Drop unused serde_arrow dep. Regenerate Cargo.lock. Made-with: Cursor
All contributors have signed the CLA ✍️ ✅
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1eaa921f96
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 03a4544905
Hey @petere-datadog, while I take a look at this PR, please see #25156 (comment) and the Codex review comments.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ca201c5078
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6dcc854103
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 08b9b547f4
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f8dde57d0
recheck
I have read the CLA Document and I hereby sign the CLA
pront left a comment
Some things that stood out. I will take another look.
Hey, we are currently deploying Vector on our UAT instances to test a few workflows. One of our use cases requires logs in Parquet format with gzip compression. I checked the documentation but couldn't find any good resources on it. Could you let me know when this PR will be merged, or whether there is any resource I can read on how to set it up?
tessneau left a comment
Nice! Overall this seems good to me, just some non-blockers. Thanks for all the tests.
It should be merged later today, and yes, this will support encoding batched events with Parquet and gzip compression.
…ik/aws-s3-parquet-encoding
pront left a comment
This PR is now in a reasonable state, thanks!
Due to its large size and number of changes, there are some rough edges and pre-existing issues that can be addressed in follow-up PRs.
Created: #25191 (not introduced by this PR but directly related)
Co-authored-by: May Lee <may.lee@datadoghq.com>
…rdotdev/vector into peter.ehik/aws-s3-parquet-encoding
Summary
This PR was initially started by @szibis: #24706
Vector configuration
auto-infer.yaml
schema-file.yaml
apache-common.schema
How did you test this PR?
Change Type
Is this a breaking change?
All the new stuff is added under features: codec-parquet.
Does this PR include user facing changes?
… no-changelog label to this PR.
Notes
- … @vectordotdev/vector to reach out to us regarding this PR.
- … pre-push hook, please see this template.
- make fmt
- make check-clippy (if there are failures it's possible some of them can be fixed with make clippy-fix)
- make test
- … git merge origin master and git push.
- … Cargo.lock), please run make build-licenses to regenerate the license inventory and commit the changes (if any). More details on the dd-rust-license-tool.