Commit 875f838

Update hdfs howto for fluent-package (#546)
NOTE: It might be better to write about recent CDH information later.
See https://community.cloudera.com/t5/Support-Questions/is-there-any-open-source-CDH-version-available-to-download/td-p/316250

Signed-off-by: Kentaro Hayashi <hayashi@clear-code.com>
1 parent 18365e0 commit 875f838

1 file changed (+25 -10 lines)

how-to-guides/http-to-hdfs.md

Lines changed: 25 additions & 10 deletions
````diff
@@ -16,25 +16,40 @@ The figure below shows the high-level architecture:
 
 ![HTTP-to-HDFS Overview](../.gitbook/assets/http-to-hdfs.png)
 
-## Install
+## Prerequisites
+
+The following software/services are required to be set up correctly:
+
+* [Fluentd](https://www.fluentd.org/)
+* Apache HDFS
+* [WebHDFS Output Plugin](https://github.com/fluent/fluent-plugin-webhdfs/) ([`out_webhdfs`](../output/webhdfs.md))
 
 For simplicity, this article will describe how to set up a one-node configuration. Please install the following software on the same node:
 
-* [Fluentd](http://fluentd.org/)
-* [WebHDFS Output Plugin](https://github.com/fluent/fluent-plugin-webhdfs/)
+You can install Fluentd via major packaging systems.
 
-\([`out_webhdfs`](../output/webhdfs.md)\)
+* [Installation](../installation/)
 
-* Apache HDFS
+For Cloudera CDH, please refer to the [downloads page](https://www.cloudera.com/downloads.html)
 
-The WebHDFS Output plugin is included in the latest version of Fluentd's deb/rpm package \(v1.1.10 or later\). If you want to use RubyGems to install the plugin, please use `gem install fluent-plugin-webhdfs`.
+{% hint style='info' %}
+NOTE: CDH (Cloudera Distributed Hadoop) was discontinued. Superseded by Cloudera's CDP Private Cloud.
+{% endhint %}
 
-* [Installation](../installation/)
-* For CDH, please refer to the [downloads page](https://www.cloudera.com/downloads.html)
+
+### Install plugin
+
+If `out_webhdfs` (fluent-plugin-webhdfs) is not installed yet, please install it manually.
+
+See [Plugin Management](..//installation/post-installation-guide#plugin-management) section how to install fluent-plugin-webhdfs on your environment.
+
+{% hint style='info' %}
+If you use `fluent-package`, out_webhdfs (fluent-plugin-webhdfs) is bundled by default.
+{% endhint %}
 
 ## Fluentd Configuration
 
-Let's start configuring Fluentd. If you used the deb/rpm package, Fluentd's config file is located at `/etc/td-agent/td-agent.conf`. Otherwise, it is located at `/etc/fluentd/fluentd.conf`.
+Let's start configuring Fluentd. If you used the deb/rpm package, Fluentd's config file is located at `/etc/fluent/fluentd.conf`.
 
 ### HTTP Input
 
````

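The new "Install plugin" section tells readers to install `out_webhdfs` manually unless `fluent-package` already bundles it. The following is a minimal shell sketch of that check, not the documented procedure (the Plugin Management guide remains the reference). It assumes that `fluent-package` provides a `fluent-gem` wrapper for its embedded Ruby and falls back to the plain `gem` command for a RubyGems-based Fluentd. The function names `pick_gem_cmd` and `ensure_webhdfs_plugin` are hypothetical.

```shell
#!/bin/sh
# Sketch only: ensure fluent-plugin-webhdfs (out_webhdfs) is installed.
# Assumption: fluent-package provides a `fluent-gem` wrapper for its
# embedded Ruby; otherwise fall back to the system `gem` command.
# The function names below are hypothetical, not part of Fluentd.

PLUGIN=fluent-plugin-webhdfs

# Print the gem command to use, preferring fluent-gem over gem.
pick_gem_cmd() {
  if command -v fluent-gem >/dev/null 2>&1; then
    echo fluent-gem
  elif command -v gem >/dev/null 2>&1; then
    echo gem
  else
    return 1
  fi
}

# Install the plugin only if `<gem> list -i` reports it as missing.
ensure_webhdfs_plugin() {
  gem_cmd=$(pick_gem_cmd) || {
    echo "neither fluent-gem nor gem found in PATH" >&2
    return 1
  }
  if "$gem_cmd" list -i "$PLUGIN" >/dev/null 2>&1; then
    echo "$PLUGIN is already installed"
  else
    echo "installing $PLUGIN with $gem_cmd"
    "$gem_cmd" install "$PLUGIN"
  fi
}
```

To use it, source the file and call `ensure_webhdfs_plugin` as a user who can write to the target Ruby installation (usually root for `fluent-package`). Because `fluent-package` bundles the plugin, the check there should normally take the "already installed" branch.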
````diff
@@ -101,7 +116,7 @@ To test the configuration, just post the JSON to Fluentd \(we use the `curl` com
 ```text
 $ curl -X POST -d 'json={"action":"login","user":2}' \
 http://localhost:8888/hdfs.access.test
-$ kill -USR1 `cat /var/run/td-agent/td-agent.pid`
+$ kill -USR1 `cat /var/run/fluent/fluentd.pid`
 ```
 
 We can then access HDFS to see the stored data:
````

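The second hunk only changes the PID file that receives `SIGUSR1`. That signal makes Fluentd flush its buffered records, so the test event reaches HDFS without waiting for the next flush. The following hedged shell sketch tries the `fluent-package` path from the diff first and falls back to the old `td-agent` path. The function names and the optional root-prefix argument are hypothetical. The prefix exists only so the lookup can be exercised against a scratch directory.

```shell
#!/bin/sh
# Sketch only: send SIGUSR1 to Fluentd so it flushes its buffers.
# PID file paths are taken from the diff: fluent-package writes
# /var/run/fluent/fluentd.pid, the older td-agent used
# /var/run/td-agent/td-agent.pid.
# The optional first argument is a hypothetical root prefix for testing.

# Print the first non-empty PID file found, preferring fluent-package.
find_fluentd_pid_file() {
  root=${1:-}
  for f in "$root/var/run/fluent/fluentd.pid" \
           "$root/var/run/td-agent/td-agent.pid"; do
    if [ -s "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

# Signal the process whose PID is stored in that file.
flush_fluentd_buffers() {
  pid_file=$(find_fluentd_pid_file "${1:-}") || {
    echo "no Fluentd PID file found" >&2
    return 1
  }
  kill -USR1 "$(cat "$pid_file")"
}
```

After the `curl` call, run `flush_fluentd_buffers` with no argument to use the real `/var/run/...` paths. Run it as a user who is allowed to signal the Fluentd process.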