Commit f11bd9e

update(site): update docs for version 1.3
1 parent 1f809eb commit f11bd9e

3 files changed: 51 additions, 20 deletions

site/assets/scss/_variables_project.scss

Lines changed: 0 additions & 1 deletion
@@ -38,7 +38,6 @@ $light: $white !default;
 
 table{
 width:100% !important;
-display: table!important;
 }
 
 .td-box, .td-content {

site/content/en/docs/Developer Guide/configuration.md

Lines changed: 9 additions & 3 deletions
@@ -1,5 +1,5 @@
 ---
-date: 2020-05-21
+date: 2020-05-25
 title: "Basic Configuration"
 linkTitle: "Basic Configuration"
 weight: 20
@@ -21,8 +21,14 @@ These configurations are described in detail in subsequent chapters.
 |`fs.scan.filters` | Filters used to list eligible input files | list | *-* | medium |
 |`filters` | List of filter aliases to apply to each record (order is important) | list | *-* | medium |
 |`internal.kafka.reporter.topic` | Name of the internal topic used by tasks and the connector to report and monitor file progression. | string | *connect-file-pulse-status* | high |
-|`internal.kafka.reporter.id` | The reporter identifier which is used as a group.id (must be unique for each connect instance) | string | *-* | high |
 |`internal.kafka.reporter.bootstrap.servers` | A list of host/port pairs used by the reporter to establish the initial connection to the Kafka cluster. | string | *-* | high |
 |`task.reader.class` | The fully qualified name of the class used by tasks to read input files | class | *io.streamthoughts.kafka.connect.filepulse.reader.RowFileReader* | high |
 |`offset.strategy` | The strategy used to build source offsets from an input file; must be one of [name, path, name+hash] | string | *name+hash* | high |
-|`topic` | The default output topic to write | string | *-* | high |
+|`topic` | The default output topic to write to | string | *-* | high |
+
+
+### Prior to Connect FilePulse 1.3.x (deprecated)
+| Configuration | Description | Type | Default | Importance |
+| --------------| --------------|-----------| --------- | ------------- |
+|`internal.kafka.reporter.id` | The reporter identifier used by tasks and the connector to report and monitor file progression (default: null). This property must only be set by users who have run the connector with a version prior to 1.3.x, to ensure backward compatibility (when set, it must be unique for each Connect instance). | string | *-* | high |
+

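The following minimal Java sketch illustrates how the defaults and constraints in the table fit together. It fills in a few of the documented defaults for a user-supplied configuration and rejects an `offset.strategy` outside [name, path, name+hash]. It also flags the deprecated `internal.kafka.reporter.id`. The class `FilePulseConfigSketch` and its methods are hypothetical. They are not part of Connect FilePulse or the Kafka Connect API, and in a real deployment these properties are simply passed to the connector.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Connect FilePulse.
final class FilePulseConfigSketch {

    // Allowed values for offset.strategy, as listed in the table.
    static final Set<String> OFFSET_STRATEGIES = Set.of("name", "path", "name+hash");

    private FilePulseConfigSketch() {}

    // Returns a copy of userConfig completed with some of the documented defaults,
    // and rejects an offset.strategy value outside the documented set.
    static Map<String, String> withDefaults(Map<String, String> userConfig) {
        Map<String, String> config = new HashMap<>();
        config.put("internal.kafka.reporter.topic", "connect-file-pulse-status");
        config.put("task.reader.class",
                "io.streamthoughts.kafka.connect.filepulse.reader.RowFileReader");
        config.put("offset.strategy", "name+hash");
        config.putAll(userConfig); // user-supplied values override the defaults

        String strategy = config.get("offset.strategy");
        if (strategy == null || !OFFSET_STRATEGIES.contains(strategy)) {
            throw new IllegalArgumentException(
                    "offset.strategy must be one of [name, path, name+hash], got: " + strategy);
        }
        return config;
    }

    // True when the deprecated internal.kafka.reporter.id is set, which the table
    // reserves for users upgrading from a version prior to 1.3.x.
    static boolean usesDeprecatedReporterId(Map<String, String> config) {
        return config.containsKey("internal.kafka.reporter.id");
    }
}
```

If only `topic` is supplied, the resulting map resolves `offset.strategy` to `name+hash` and `internal.kafka.reporter.topic` to `connect-file-pulse-status`.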
site/content/en/docs/Developer Guide/scanning-files.md

Lines changed: 42 additions & 16 deletions
@@ -1,47 +1,73 @@
 ---
-date: 2020-05-21
+date: 2020-05-25
 title: "Scanning Files"
 linkTitle: "Scanning Files"
 weight: 30
 description: >
   The common configuration for Connect File Pulse.
 ---
 
-The connector can be configured with a specific [FSDirectoryWalker](https://github.com/streamthoughts/kafka-connect-file-pulse/blob/master/connect-file-pulse-plugin/src/main/java/io/streamthoughts/kafka/connect/filepulse/scanner/local/FSDirectoryWalker.java)
-implementation that will be responsible to scan an input directory looking for files to stream into Kafka.
+The connector must be configured with a specific [FSDirectoryWalker](https://github.com/streamthoughts/kafka-connect-file-pulse/blob/master/connect-file-pulse-plugin/src/main/java/io/streamthoughts/kafka/connect/filepulse/scanner/local/FSDirectoryWalker.java)
+implementation, which is responsible for scanning an input directory to find files eligible to be streamed into Kafka.
 
 The default `FSDirectoryWalker` implementation is:
 
 `io.streamthoughts.kafka.connect.filepulse.scanner.local.LocalFSDirectoryWalker`.
 
-When scheduled, the `LocalFSDirectoryWalker` will recursively scan the input directory configured via `input.directory.path`.
-The SourceConnector will run a background-thread to periodically trigger a file system scan using the configured FSDirectoryWalker.
+The `FilePulseSourceConnector` periodically triggers a file system scan of the directory specified by the `fs.scan.directory.path`
+connector property. The scan is executed in a background thread that invokes the configured `FSDirectoryWalker`.
 
-## Connector Configuration
+## Configuring Directory Scan (using `LocalFSDirectoryWalker`)
 
 | Configuration | Description | Type | Default | Importance |
 | --------------| --------------|-----------| --------- | ------------- |
 |`fs.scanner.class` | The class used to scan the file system | class | *io.streamthoughts.kafka.connect.filepulse.scanner.local.LocalFSDirectoryWalker* | medium |
 |`fs.scan.directory.path` | The input directory to scan | string | *-* | high |
 |`fs.scan.interval.ms` | Time interval, in milliseconds, at which the input directory is scanned | long | *10000* | high |
+|`fs.scan.filters` | A comma-separated list of fully qualified class names of the filters used to list eligible input files | list | *-* | medium |
+|`fs.recursive.scan.enable` | Whether the local directory should be scanned recursively | boolean | *true* | medium |
 
-## Filter files
+## Filtering input files
 
-Files can be filtered to determine if they need to be scheduled or ignored. Files which are filtered are simply skipped and
-keep untouched on the file system until next scan. On the next scan, previously filtered files will be evaluate again to determine if there are now eligible to be processing.
+You can configure one or more `FileFilter`s that determine whether a file should be scheduled for processing or ignored.
+All files that are filtered out are simply ignored and remain untouched on the file system until the next scan.
+At the next scan, previously filtered files are evaluated again to determine whether they are now eligible for processing.
 
-These filters are available for use with Kafka Connect File Pulse:
+FilePulse ships with the following built-in filters:
 
-| Filter | Description |
-|--- | --- |
-| IgnoreHiddenFileFilter | Filters hidden files from being read. |
-| LastModifiedFileFilter | Filters files that been modified to recently based on their last modified date property |
-| RegexFileFilter | Filter file that do not match the specified regex |
+### IgnoreHiddenFileFilter
 
+The `IgnoreHiddenFileFilter` can be used to prevent hidden files from being read.
+
+**Configuration example**
+
+```properties
+fs.scan.filters=io.streamthoughts.kafka.connect.filepulse.scanner.local.filter.IgnoreHiddenFileListFilter
+```
+
+### LastModifiedFileFilter
+
+The `LastModifiedFileFilter` can be used to filter out files that have been modified too recently, based on their last modified date.
+
+```properties
+fs.scan.filters=io.streamthoughts.kafka.connect.filepulse.scanner.local.filter.LastModifiedFileFilter
+# Minimum time, in milliseconds, since a file was last modified before it can be accepted (default: 5000)
+file.filter.minimum.age.ms=10000
+```
+
+### RegexFileFilter
+
+The `RegexFileFilter` can be used to filter out files that do not match the specified regex.
+
+```properties
+fs.scan.filters=io.streamthoughts.kafka.connect.filepulse.scanner.local.filter.RegexFileFilter
+# The regex pattern used to match input files
+file.filter.regex.pattern=\\.log$
+```
 
 ## Supported file types
 
-`LocalFSDirectoryWalker` will try to detect if a file needs to be decompressed by probing its content type or its extension (javadoc : [Files#probeContentType](https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#probeContentType-java.nio.file.Path)
+`LocalFSDirectoryWalker` tries to detect whether a file needs to be decompressed by probing its content type or its extension (javadoc: [Files#probeContentType](https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#probeContentType-java.nio.file.Path)).
 
 The connector supports the following content types:
 
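The scan-and-filter behaviour described in `scanning-files.md` can be illustrated with a minimal Java sketch that uses only `java.nio.file`. `ScanSketch` and its methods are hypothetical names. They only mimic the documented behaviour of `LocalFSDirectoryWalker` and the three built-in filters and are not FilePulse's implementation. The sketch also assumes that `RegexFileFilter` applies its pattern to the file name with find-style matching, which is what the `\.log$` example suggests. Note that `Files.isHidden` is platform-dependent: on Unix-like systems, it treats names starting with `.` as hidden.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: mimics the documented scan and filter behaviour with java.nio.file.
// None of these names belong to Connect FilePulse.
final class ScanSketch {

    private ScanSketch() {}

    // Like IgnoreHiddenFileFilter: rejects files the platform reports as hidden.
    static Predicate<Path> ignoreHidden() {
        return p -> {
            try {
                return !Files.isHidden(p);
            } catch (IOException e) {
                return false; // treat unreadable files as not eligible
            }
        };
    }

    // Like LastModifiedFileFilter: accepts a file only once it was last modified
    // at least minimumAgeMs before nowMs (cf. file.filter.minimum.age.ms).
    static Predicate<Path> minimumAge(long minimumAgeMs, long nowMs) {
        return p -> {
            try {
                FileTime modified = Files.getLastModifiedTime(p);
                return nowMs - modified.toMillis() >= minimumAgeMs;
            } catch (IOException e) {
                return false;
            }
        };
    }

    // Like RegexFileFilter: accepts files whose name contains a match for regex
    // (assumption: find-style matching, cf. file.filter.regex.pattern).
    static Predicate<Path> nameMatches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return p -> pattern.matcher(p.getFileName().toString()).find();
    }

    // Lists regular files under root that pass every filter, descending into
    // subdirectories only when recursive is true (cf. fs.recursive.scan.enable).
    // Rejected files are left untouched and are evaluated again on the next call.
    static List<Path> scan(Path root, boolean recursive, List<Predicate<Path>> filters)
            throws IOException {
        int maxDepth = recursive ? Integer.MAX_VALUE : 1;
        try (Stream<Path> paths = Files.walk(root, maxDepth)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> filters.stream().allMatch(f -> f.test(p)))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

Rejected files are never moved or deleted, so calling `scan` again later (for example, every `fs.scan.interval.ms`) re-evaluates them. This mirrors the re-evaluation behaviour described in the filtering section.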