From b0ca87540a28aac823791782d9f12d098ca71ed3 Mon Sep 17 00:00:00 2001
From: Filipe Silva
Date: Sun, 4 Feb 2024 22:56:12 +0000
Subject: [PATCH] Don't hash contents of every watched file

From https://github.com/gmethvin/directory-watcher?tab=readme-ov-file#configuration:

> By default, DirectoryWatcher will try to prevent duplicate events (...). This is done by creating a hash for every file encountered and keeping that hash in memory. This might result in slower performance, because the library has to calculate the hash of the entire file.
> ...
> In the above example we use the last modified time hasher. This hasher is only suitable for platforms that have at least millisecond precision in last modified times from Java. It's known to work with JDK 10+ on Macs with APFS.

The default is rather slow actually. On a 230 MB directory with 1700 files, the initial watch call takes 2951ms. With the changes in this PR (which are the changes shown in their docs) it goes down to 2ms.

2951ms isn't terribly bad mind you... but I came across this problem when watching a 12.5 GB folder with 180k files instead. The watcher never really started in that case. With these changes it started in 5659ms.

Regarding the restrictions listed, APFS is the default on Macs since 10.13 (released late 2017). Perhaps this should not be the default, but I will leave that to your judgement.
---
 src/nextjournal/beholder.clj | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/nextjournal/beholder.clj b/src/nextjournal/beholder.clj
index fb293d6..af74a57 100644
--- a/src/nextjournal/beholder.clj
+++ b/src/nextjournal/beholder.clj
@@ -1,6 +1,7 @@
 (ns nextjournal.beholder
   (:import [io.methvin.watcher DirectoryChangeEvent DirectoryChangeEvent$EventType
             DirectoryChangeListener DirectoryWatcher]
+           [io.methvin.watcher.hashing FileHasher]
            [java.nio.file Paths]))
 
 (defn- fn->listener ^DirectoryChangeListener [f]
@@ -24,6 +25,7 @@
   Not meant to be called directly but use `watch` or `watch-blocking` instead."
   [cb paths]
   (-> (DirectoryWatcher/builder)
+      (.fileHasher FileHasher/LAST_MODIFIED_TIME)
       (.paths (map to-path paths))
       (.listener (fn->listener cb))
       (.build)))
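
Since the commit message leaves open whether the last-modified hasher should be the default, one possible alternative is to make it opt-in. The following is only a sketch under assumptions, not part of this patch: the namespace `example.configurable-watcher`, the function `create-with-options`, and the `:last-modified-hasher?` option are hypothetical names, and the two small helpers are stand-ins for the ones in `beholder.clj`. It assumes `io.methvin/directory-watcher` is on the classpath.

```clojure
(ns example.configurable-watcher
  ;; Hypothetical namespace; requires io.methvin/directory-watcher on the classpath.
  (:import [io.methvin.watcher DirectoryChangeListener DirectoryWatcher]
           [io.methvin.watcher.hashing FileHasher]
           [java.nio.file Paths]))

(defn- to-path
  "Stand-in helper: turns a path string into a java.nio.file.Path."
  [s]
  (Paths/get s (make-array String 0)))

(defn- fn->listener
  "Stand-in helper: wraps a one-argument function as a DirectoryChangeListener."
  ^DirectoryChangeListener [f]
  (reify DirectoryChangeListener
    (onEvent [_ event] (f event))))

(defn create-with-options
  "Hypothetical variant of `create`. Keeps the library's default content
  hashing unless `:last-modified-hasher?` is truthy, in which case the
  watcher compares last modified times instead of hashing file contents."
  [cb paths {:keys [last-modified-hasher?]}]
  (-> (DirectoryWatcher/builder)
      ;; Only swap the hasher when the caller opts in.
      (cond-> last-modified-hasher?
        (.fileHasher FileHasher/LAST_MODIFIED_TIME))
      (.paths (map to-path paths))
      (.listener (fn->listener cb))
      (.build)))
```

A caller on a platform with millisecond-precision modification times could then pass `{:last-modified-hasher? true}`, while everyone else would keep the slower but more conservative default.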