
Commit 17989cc

cosmo0920 and esmerel authored
out_s3: Add a description for pure C parquet (#2111)
* out_s3: Add an instruction for enabling parquet compression (apply suggestion from @esmerel)
* out_s3: Add classic format configuration for Parquet
* out_s3: Align headers
* out_s3: Remove a needless newline

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Co-authored-by: Lynette Miles <6818907+esmerel@users.noreply.github.com>
1 parent 9ca03a4 commit 17989cc

File tree

1 file changed: 85 additions, 2 deletions


pipeline/outputs/s3.md

Lines changed: 85 additions & 2 deletions
@@ -6,7 +6,7 @@ description: Send logs, data, and metrics to Amazon S3

 ![AWS logo](../../.gitbook/assets/image%20(9).png)

-The _Amazon S3_ output plugin lets you ingest records into the [S3](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) cloud object store.
+The _Amazon S3_ output plugin lets you ingest records into the [S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) cloud object store.

 The plugin can upload data to S3 using the [multipart upload API](https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html) or [`PutObject`](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html). Multipart is the default and is recommended. Fluent Bit will stream data in a series of _parts_. This limits the amount of data buffered on disk at any point in time. By default, every time 5&nbsp;MiB of data have been received, a new part will be uploaded. The plugin can create files up to gigabytes in size from many small chunks or parts using the multipart API. All aspects of the upload process are configurable.
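The buffering behavior described in the paragraph above (stream the input as fixed-size parts so only a bounded amount is held at once) can be sketched in Python. This is an illustration of the part-splitting idea only, not Fluent Bit source; the 5 MiB constant mirrors the plugin's documented default part size.

```python
# Illustrative sketch: split a byte stream into fixed-size parts for a
# multipart upload, mirroring the plugin's default 5 MiB part size.
PART_SIZE = 5 * 1024 * 1024  # 5 MiB default

def split_into_parts(data: bytes, part_size: int = PART_SIZE):
    """Yield successive parts of at most part_size bytes each."""
    for offset in range(0, len(data), part_size):
        yield data[offset:offset + part_size]

# 12 MiB of input becomes three parts: 5 MiB + 5 MiB + 2 MiB.
parts = list(split_into_parts(b"x" * (12 * 1024 * 1024)))
print(len(parts))  # 3
```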

@@ -42,7 +42,7 @@ The [Prometheus success/retry/error metrics values](../../administration/monitor
 | `blob_database_file` | Absolute path to a database file to be used to store blob files contexts. | _none_ |
 | `bucket` | S3 Bucket name | _none_ |
 | `canned_acl` | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | _none_ |
-| `compression` | Compression type for S3 objects. `gzip`, `arrow`, `parquet` and `zstd` are the supported values, `arrow` and `parquet` are only available if Apache Arrow was enabled at compile time. Defaults to no compression. | _none_ |
+| `compression` | Compression/format for S3 objects. Supported: `gzip` (always available) and `parquet` (requires Arrow build). For `gzip`, the `Content-Encoding` header is set to `gzip`. `parquet` is available **only when Fluent Bit is built with `-DFLB_ARROW=On`** and Arrow GLib/Parquet GLib are installed. Parquet is typically used with `use_put_object On`. | _none_ |
 | `content_type` | A standard MIME type for the S3 object, set as the Content-Type HTTP header. | _none_ |
 | `endpoint` | Custom endpoint for the S3 API. Endpoints can contain scheme and port. | _none_ |
 | `external_id` | Specify an external ID for the STS API. Can be used with the `role_arn` parameter if your role requires an external ID. | _none_ |
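As a minimal illustration of the `compression` parameter described in the table above, a `gzip` output (always available, no Arrow build needed) might be configured like this; the bucket name and region are placeholders:

```yaml
pipeline:
  outputs:
    - name: s3
      match: '*'
      region: us-east-1          # placeholder region
      bucket: <your_bucket>      # placeholder bucket
      compression: gzip          # sets Content-Encoding: gzip on objects
```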
@@ -645,6 +645,7 @@ After being compiled, Fluent Bit can upload incoming data to S3 in Apache Arrow
 For example:

 {% tabs %}
+
 {% tab title="fluent-bit.yaml" %}

 ```yaml
@@ -701,3 +702,85 @@ The following example uses `pyarrow` to analyze the uploaded data:
 3 2021-04-27T09:33:56.539430Z 0.0 0.0 0.0 0.0 0.0 0.0
 4 2021-04-27T09:33:57.539803Z 0.0 0.0 0.0 0.0 0.0 0.0
 ```
+
+## Enable Parquet support
+
+### Build requirements for Parquet
+
+To enable Parquet, build Fluent Bit with Apache Arrow support and install Arrow GLib/Parquet GLib:
+
+```bash
+# Ubuntu/Debian example
+sudo apt-get update
+sudo apt-get install -y -V ca-certificates lsb-release wget
+wget https://packages.apache.org/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+sudo apt-get install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+sudo apt-get update
+sudo apt-get install -y -V libarrow-glib-dev libparquet-glib-dev
+
+# Build Fluent Bit with Arrow:
+cd build/
+cmake -DFLB_ARROW=On ..
+cmake --build .
+```
+
+For other Linux distributions, refer to [the Apache Arrow installation instructions](https://arrow.apache.org/install/). Apache Parquet GLib is part of the Apache Arrow project.
+
+### Testing Parquet support
+
+Example configuration:
+
+{% tabs %}
+{% tab title="fluent-bit.yaml" %}
+
+```yaml
+service:
+  flush: 5
+  daemon: off
+  log_level: debug
+  http_server: off
+
+pipeline:
+  inputs:
+    - name: dummy
+      tag: dummy.local
+      dummy: '{"boolean": false, "int": 1, "long": 1, "float": 1.1, "double": 1.1, "bytes": "foo", "string": "foo"}'
+
+  outputs:
+    - name: s3
+      match: dummy*
+      region: us-east-2
+      bucket: <your_testing_bucket>
+      use_put_object: on
+      compression: parquet
+      # other parameters
+```
+
+{% endtab %}
+{% tab title="fluent-bit.conf" %}
+
+```text
+[SERVICE]
+    Flush        5
+    Daemon       Off
+    Log_Level    debug
+    HTTP_Server  Off
+
+[INPUT]
+    Name   dummy
+    Tag    dummy.local
+    Dummy  {"boolean": false, "int": 1, "long": 1, "float": 1.1, "double": 1.1, "bytes": "foo", "string": "foo"}
+
+[OUTPUT]
+    Name            s3
+    Match           dummy*
+    Region          us-east-2
+    Bucket          <your_testing_bucket>
+    Use_Put_Object  On
+    Compression     parquet
+    # other parameters
+```
+
+{% endtab %}
+{% endtabs %}
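After the configuration above has run, a quick way to sanity-check downloaded objects without installing `pyarrow` is to look for the Parquet magic bytes: per the Parquet format specification, every Parquet file begins and ends with the 4-byte sequence `PAR1`. This helper is a sketch, not part of Fluent Bit; the file path is whatever you downloaded from the test bucket.

```python
# Sketch: verify a downloaded object is a Parquet file by checking the
# "PAR1" magic bytes that the Parquet format places at both ends of a file.
import os

def is_parquet(path: str) -> bool:
    """Return True if the file has Parquet magic bytes at head and tail."""
    if os.path.getsize(path) < 8:  # too small to hold both magics
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"
```

This only checks framing, not schema; use `pyarrow` (as in the Arrow example earlier in the page) to inspect the actual columns.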
