diff --git a/docs/support/tutorials/data-mover.md b/docs/support/tutorials/data-mover.md
new file mode 100644
index 0000000000..491b984f83
--- /dev/null
+++ b/docs/support/tutorials/data-mover.md
@@ -0,0 +1,75 @@
+# Data-mover
+
+Data-mover is a tool to move data between Puhti and Mahti local filesystems and
+Allas and LUMI-O object storage servers, when
+[simple transfers](../faq/how-to-move-data-between-puhti-and-allas.md#move-data-with-rclone)
+are not practical, either because there are many small files, or the size of the
+dataset is large.
+
+Data-mover stores the data in the object storage in [Restic](https://restic.readthedocs.io)
+repository format. Restic, as used by data-mover, in turn uses [Rclone](https://rclone.org)
+backend for the actual data transfers to the object storage servers and back, with S3 authentication.
+Moving data to Restic repository in an object storage could also be achieved by using Restic directly, or with
+[allas-backup tool](../../data/Allas/using_allas/a_backup/). The main differentiating feature is
+that data-mover does the actual transfers unattended using batch jobs.
+
+## Simple example case, moving data from Puhti to Allas and back
+
+Below is a guide for a simple scenario, moving data from Puhti project scratch
+directory to corresponding project in Allas, and then back. The same works with
+Mahti and LUMI-O. Please have a look at `data-mover help` and `data-mover --help`
+for additional documentation.
+
+### Setting up the connection from Puhti to Allas
+
+1. Your CSC project needs to have Allas service enabled. The project PI can add
+Allas service for the project in [my.csc.fi](https://my.csc.fi), if not already enabled, and
+the project members need to [accept the service terms](../../accounts/how-to-add-service-access-for-project.md).
+
+2. Create a configuration for rclone and store the authentication token in the
+file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from
+[Puhti web interface](https://puhti.csc.fi). Open "Cloud storage configuration" from the
+"Tools" drop-down menu, and
+[create Allas S3 rclone configuration for the project](../../computing/webinterface/file-browser.md#accessing-allas-and-lumi-o).
+
+3. Open a terminal to Puhti, and take the data-mover tool `data-mover` into use with
+```
+module load data-mover
+```
+
+### Moving a single directory in Puhti to Allas
+
+1. Clean (delete) all the files that are not needed from the target scratch directory,
+`/scratch//exampledir`, for example. There is no need
+to compress the files separately.
+
+2. Move the data to Allas
+```
+data-mover export /scratch//exampledir
+```
+
+3. Check the status of the data transfer with
+```
+data-mover status
+```
+
+### Listing the data in Allas
+
+```
+data-mover list
+```
+
+### Moving data from Allas to Puhti
+
+Import data back to the original directory with
+```
+data-mover import /scratch//exampledir
+```
+
+## Links to related material
+
+- [Lue tool for data inventory](lue.md)
+- [Data cleaning](clean-up-data.md)
+- [Allas introduction](../../data/Allas/introduction.md)
+- [Allas-backup tool](../../data/Allas/using_allas/a_backup/)
+- [Restic](https://restic.readthedocs.io)
diff --git a/docs/support/tutorials/index.md b/docs/support/tutorials/index.md
index 4bdff3756a..13f848c14a 100644
--- a/docs/support/tutorials/index.md
+++ b/docs/support/tutorials/index.md
@@ -12,11 +12,15 @@
 * [Developing scripts remotely](remote-dev.md)
 * [Using CSC HPC environment efficiently](https://csc-training.github.io/csc-env-eff/)
 * [How to run existing containers in Puhti](../../computing/containers/overview.md#running-containers)
-* [Getting disk usage using Lue](lue.md)
 * [Running Julia jobs on Puhti and Mahti clusters](julia.md)
 * [Using Python on CSC supercomputers](python-usage-guide.md)
 * [Setting up SSH keys at CSC](https://csc-training.github.io/csc-env-eff/hands-on/connecting/ssh-keys.html)
 
+## Data management
+* [Managing data on Puhti and Mahti scratch disks](clean-up-data.md)
+* [Getting disk usage using Lue](lue.md)
+* [Moving large datasets to Allas](data-mover.md)
+
 ## Installation of tools on supercomputers
 
 * [Installing software with Spack](user-spack.md)