Commit b5fc3da

update README.rst
Signed-off-by: Stephen L. <lrq3000@gmail.com>
1 parent 89ee145 commit b5fc3da

1 file changed, with 32 additions and 9 deletions

README.rst

@@ -5,9 +5,13 @@ pyFileFixity

 |Build-Status| |Coverage|

-This project aims to provide a set of open source, cross-platform, easy
+pyFileFixity provides a suite of open source, cross-platform, easy
 to use and easy to maintain (readable code) to protect and manage data
-for long term storage. The project is done in pure-Python to meet those criteria.
+for long term storage/archival, and also test the performance of any data protection algorithms.
+
+The project is done in pure-Python to meet those criteria,
+although cythonized extensions are available for core routines to speed up encoding/decoding,
+but always with a pure python specification available so as to allow long term replication.

 Here is an example of what pyFileFixity can do:

@@ -104,20 +108,20 @@ Note: this also works for a single file, just replace "your_folder" by "your_fil

 - DEPRECATED (because Gooey is not maintained anymore it seems): To use the GUI with any tool, use ``--gui`` and do not supply any other argument, eg: ``python rfigc.py --gui``.

-- You can also use `PyPy <http://pypy.org/>`_ to hugely speedup the processing time of any tool here.
+- You can also use `PyPy <http://pypy.org/>`_ or Cython to hugely speedup the processing time of any tool here.

 The problem of long term storage
 --------------------------------

-Why are data corrupted with time? Entropy, my friend, entropy.
+Why are data corrupted with time? One sole reason: entropy.
 Entropy refers to the universal tendency for systems to become
-less ordered over time. Corruption is exactly that: a disorder
+less ordered over time. Data corruption is exactly that: a disorder
 in bits order. In other words: *the Universe hates your data*.

 Long term storage is thus a very difficult topic: it's like fighting with
 death (in this case, the death of data). Indeed, because of entropy,
 data will eventually fade away because of various silent errors such as
-bit rot. pyFileFixity aims to provide tools to detect any data
+bit rot or cosmic rays. pyFileFixity aims to provide tools to detect any data
 corruption, but also fight data corruption by providing repairing tools.

 The only solution is to use a principle of engineering that is long
@@ -178,6 +182,15 @@ corruption, so that you can process it by your own means if you want to,
 without having to study for hours how the code works (contrary to PAR2
 format).

+In practice, both approaches are not exclusive, and the best is to
+combine them: protect the most precious data with error correction codes,
+then duplicate them across multiple storage mediums. Hence, this suite of
+data protection tools, just like any other such suite, is not sufficient to
+guarantee your data is protected, you must have an active data curation strategy
+which includes regularly checking your data and replacing copies that are damaged.
+
+For a primer on storage mediums and data protection strategies, see `this post I wrote <https://web.archive.org/web/20220529125543/https://superuser.com/questions/374609/what-medium-should-be-used-for-long-term-high-volume-data-storage-archival/873260>`_.
+
 Why not just use RAID ?
 -----------------------
@@ -645,10 +658,20 @@ Cython implementation
 ---------------------

 This section describes how to use the Cython implementation. However,
-you should first try PyPy, as it did give 10x to 100x speedup over
-Cython in our case.
+you should first try PyPy, as it may give great performances too.
+
+Simply follow the instruction to install the `reedsolo <https://github.com/tomerfiliba/reedsolomon/releases/tag/v2.0.5>`_ module with
+the cythonized module:
+
+.. code:: sh
+
+    pip install --upgrade reedsolo --install-option="--cythonize" --verbose
+
+Then make sure to use ``ecc_algo=3`` in all your ``eccman`` calls, and you
+are then good to go, the cythonized module ``creedsolo`` will always be used
+for both encoding and decoding transparently.

-THIS SECTION IS OLD AND DEPRECATED, because the Cython compilation is now
+THE REST OF THIS SECTION IS OLD AND DEPRECATED, because the Cython compilation is now
 done directly in the Reed-Solomon submodules, instead of here, so you
 should not need to worry about it, just pip install with the requirements.txt
 and you should be set. The information below is left for historical purposes:
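
Reviewer note on the Cython hunk: after running the ``pip install`` command added above, it can be useful to check which Reed-Solomon module is actually importable. The following is only a minimal sketch and not part of the commit. The helper name ``first_available`` is hypothetical, and it assumes the cythonized build is importable as ``creedsolo`` and the pure-Python one as ``reedsolo``, as the README text suggests. It does not reproduce how ``eccman`` selects its backend.

```python
import importlib


def first_available(module_names):
    """Return (name, module) for the first importable module, else (None, None).

    Hypothetical helper for illustration; not part of pyFileFixity or reedsolo.
    """
    for name in module_names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # Module not installed (or its compiled extension is missing): try the next one.
            continue
    return None, None


# Assumption: prefer the cythonized codec, fall back to the pure-Python one.
backend_name, backend = first_available(["creedsolo", "reedsolo"])
if backend_name is None:
    print("Neither creedsolo nor reedsolo is installed")
else:
    print("Reed-Solomon backend available:", backend_name)
```

If the script reports ``reedsolo`` rather than ``creedsolo``, the cythonized extension was probably not built during installation.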
