1 change: 1 addition & 0 deletions .gitignore
@@ -9,6 +9,7 @@ src/borg/crypto/low_level.c
src/borg/item.c
src/borg/chunkers/buzhash.c
src/borg/chunkers/buzhash64.c
src/borg/chunkers/fastcdc.c
src/borg/chunkers/reader.c
src/borg/checksums.c
src/borg/platform/darwin.c
14 changes: 14 additions & 0 deletions docs/changes.rst
@@ -168,6 +168,20 @@ above.

New features:

- buzhash64 chunker: add FastCDC-style normalized chunking and enable it by default
(``nc_level=2``). It switches between a stricter and a looser cut mask around the target
chunk size, which greatly tightens the chunk-size distribution (chunk-size variance /
coefficient of variation cut by roughly 60% in tests) and removes the dedup-hostile
max-size-clamped chunks, at negligible throughput cost and with unchanged deduplication.
``chunker-params`` for buzhash64 gains a required 6th field ``nc_level``
(``buzhash64,chunk_min,chunk_max,chunk_mask,window_size,nc_level``).
buzhash (32bit) is unchanged and stays bit-compatible with borg 1.x.
- new ``fastcdc`` chunker: a FastCDC content-defined chunker using a window-less, keyed Gear
rolling hash (the gear table is derived from the repo's id key, like buzhash64, so cut points
stay unpredictable without the key). It supports the same normalized chunking as buzhash64 and
produces the same chunk-size distribution and deduplication, but chunks roughly 1.3-1.5x faster.
Select it via ``--chunker-params fastcdc,chunk_min,chunk_max,chunk_mask,nc_level`` (no window
field; e.g. ``fastcdc,19,23,21,2``). ``borg benchmark cpu`` now reports its throughput too.
- repo-create: split ``--encryption`` into orthogonal options. ``--encryption`` now
selects only the cipher / AE algorithm (``none``, ``authenticated``, ``aes256-ocb``
or ``chacha20-poly1305``), the new ``--id-hash`` selects the id hash function
1 change: 1 addition & 0 deletions docs/global.rst.inc
@@ -19,6 +19,7 @@
.. _OpenSSL: https://www.openssl.org/
.. _`Python 3`: https://www.python.org/
.. _Buzhash: https://en.wikipedia.org/wiki/Buzhash
.. _FastCDC: https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia
.. _msgpack: https://msgpack.org/
.. _`msgpack-python`: https://pypi.org/project/msgpack-python/
.. _llfuse: https://pypi.org/project/llfuse/
18 changes: 18 additions & 0 deletions docs/internals/data-structures.rst
@@ -403,6 +403,8 @@ Borg has these chunkers:
- "buzhash": variable, content-defined blocksize, uses a rolling hash
computed by the Buzhash_ algorithm.
- "buzhash64": similar to "buzhash", but improved 64bit implementation
- "fastcdc": variable, content-defined blocksize, uses the window-less, keyed
Gear rolling hash (FastCDC_); faster than buzhash, same deduplication.

For some more general usage hints see also ``--chunker-params``.

@@ -483,6 +485,22 @@ The buzhash table is cryptographically derived from secret key material.
These changes should improve resistance against attacks and also solve
some of the issues of the original (32bit / XORed table) implementation.

"fastcdc" chunker
+++++++++++++++++

FastCDC_ content-defined chunker using the Gear rolling hash. Unlike buzhash it
is window-less (each byte's influence simply decays out of the hash), so its
update is cheaper and it chunks noticeably faster, while producing the same
deduplication and (with normalized chunking) the same chunk-size distribution.

Like "buzhash64", the Gear table is cryptographically derived from secret key
material, so chunk cut points are unpredictable without the key.
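
The window-less behaviour described above can be illustrated with a minimal
sketch of a keyed Gear hash. This is not borg's implementation: the function
names and the table derivation (keyed BLAKE2b over each byte value) are
assumptions made only for this example. It shows that a 64-bit Gear hash
depends only on the last 64 bytes, because each update shifts older bytes
one bit further out of the hash.

```python
import hashlib

MASK64 = (1 << 64) - 1


def make_gear_table(key: bytes) -> list[int]:
    """Derive 256 pseudo-random 64-bit values from a secret key.

    Hypothetical derivation for illustration only (keyed BLAKE2b,
    key length at most 64 bytes); borg's real derivation may differ.
    """
    table = []
    for byte_value in range(256):
        digest = hashlib.blake2b(
            bytes([byte_value]), key=key, digest_size=8
        ).digest()
        table.append(int.from_bytes(digest, "little"))
    return table


def gear_update(h: int, byte_value: int, table: list[int]) -> int:
    """One Gear step: shift left by one bit, add the entry for the new byte."""
    return ((h << 1) + table[byte_value]) & MASK64


def gear_hash(data: bytes, table: list[int]) -> int:
    """Feed all bytes of data through the Gear update, starting from 0."""
    h = 0
    for b in data:
        h = gear_update(h, b, table)
    return h


table = make_gear_table(b"k" * 32)
tail = bytes(range(64))
# Bytes older than 64 positions have been shifted out of the 64-bit hash.
same = gear_hash(b"older data" + tail, table) == gear_hash(tail, table)
```

Because no byte has to be removed explicitly from a sliding window, each
update needs only a shift, a table lookup and an addition.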

``borg create --chunker-params fastcdc,CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,NC_LEVEL``

There is no window size (Gear is window-less). NC_LEVEL is the normalized
chunking level (0 disables it); 2 is a good default. E.g.: ``fastcdc,19,23,21,2``.
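
As a rough illustration of how these fields can be read, the sketch below
converts a ``fastcdc`` parameter string into byte sizes and mask widths.
``parse_fastcdc_params`` and ``FastCDCParams`` are hypothetical names, not
part of borg. The mask widths are an assumption taken from the FastCDC paper
(stricter mask: ``HASH_MASK_BITS + NC_LEVEL`` bits, looser mask:
``HASH_MASK_BITS - NC_LEVEL`` bits); borg's actual masks may be chosen
differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FastCDCParams:
    min_size: int          # 2 ** CHUNK_MIN_EXP bytes
    max_size: int          # 2 ** CHUNK_MAX_EXP bytes
    target_size: int       # roughly 2 ** HASH_MASK_BITS bytes
    strict_mask_bits: int  # assumed mask width before the target size
    loose_mask_bits: int   # assumed mask width after the target size


def parse_fastcdc_params(spec: str) -> FastCDCParams:
    """Parse 'fastcdc,CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,NC_LEVEL'."""
    name, *fields = spec.split(",")
    if name != "fastcdc" or len(fields) != 4:
        raise ValueError(f"not a fastcdc parameter string: {spec!r}")
    min_exp, max_exp, mask_bits, nc_level = (int(f) for f in fields)
    return FastCDCParams(
        min_size=1 << min_exp,
        max_size=1 << max_exp,
        target_size=1 << mask_bits,
        # With NC_LEVEL 0 both masks are equal, i.e. no normalization.
        strict_mask_bits=mask_bits + nc_level,
        loose_mask_bits=mask_bits - nc_level,
    )


params = parse_fastcdc_params("fastcdc,19,23,21,2")
```

For ``fastcdc,19,23,21,2`` this gives a 512 KiB minimum, an 8 MiB maximum and
a target of about 2 MiB.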

.. _cache:

The cache
2 changes: 1 addition & 1 deletion docs/usage/transfer.rst
@@ -55,7 +55,7 @@ locations and passphrases first:
# The AEAD cipher does not matter (everything must be re-encrypted and
# re-authenticated anyway); you could also choose -e chacha20-poly1305 -i blake3.
$ borg repo-create -e aes256-ocb -i blake3
- $ export CHUNKER_PARAMS="buzhash64,19,23,21,4095"
+ $ export CHUNKER_PARAMS="buzhash64,19,23,21,4095,2"

# 2. Check what and how much it would transfer:
$ borg transfer --from-borg1 --chunker-params=$CHUNKER_PARAMS --dry-run