From ac3f09f3388ae97bfa93e342887853012733113e Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Wed, 5 Mar 2025 14:54:38 +0200
Subject: [PATCH 01/11] WIP data-mover https://csc-guide-preview.2.rahtiapp.fi/origin/data-mover/support/tutorials/data-mover

---
 docs/support/tutorials/data-mover.md | 75 ++++++++++++++++++++++++++++
 1 file changed, 75 insertions(+)
 create mode 100644 docs/support/tutorials/data-mover.md

diff --git a/docs/support/tutorials/data-mover.md b/docs/support/tutorials/data-mover.md
new file mode 100644
index 0000000000..0c32eb4aba
--- /dev/null
+++ b/docs/support/tutorials/data-mover.md
@@ -0,0 +1,75 @@
+# Data-mover
+
+Data-mover is a tool to move data between Puhti and Mahti local filesystems and
+Allas and LUMI-O object storage servers, when a simple transfers like
+https://docs.csc.fi/support/faq/how-to-move-data-between-puhti-and-allas/#move-data-with-rclone
+are not practical, either because there are many small files, or the size of the
+dataset is large.
+
+We wish the data-mover tool `dm` to be simple to use, and handle all possible
+hard corner cases. It is basically a wrapper around Restic backup tool
+https://restic.readthedocs.io/ , and stores the data in Restic repositories.
+Restic in turn uses Rclone https://rclone.org/ for the actual data transfers to
+the object storage servers and back. In addition, the data-mover tool does the
+data transfers in the background, using batch jobs, allowing larger transfers
+than would be practical in regular interactive login sessions.
+
+Below is a guide for a simple scenario, moving data from Puhti project scratch
+directory to corresponding project in Allas, and then back. Similar works with
+Mahti and LUMI-O. Please have a look at `dm help` and `dm --help`
+for additional documentation.
+
+## Setting up the connection from Puhti or Mahti to Allas
+
+1. Your CSC project needs to have Allas service enabled. The project PI can add
+Allas service for the project in https://my.csc.fi , if not already enabled, and
+the project members need to accept the service terms
+https://docs.csc.fi/accounts/how-to-add-service-access-for-project/ .
+
+2. Create a configuration for rclone and store the authentication token in the
+file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from the
+web interface https://puhti.csc.fi . Open "Cloud storage configuration" from the
+"Tools" drop-down menu, and create Allas S3 rclone configuration for the
+project,
+https://docs.csc.fi/computing/webinterface/file-browser/#accessing-allas-and-lumi-o
+.
+
+3. Open a terminal to Puhti, and take the data-mover tool `dm` into use with
+```
+module load .data-mover
+```
+
+## Moving data from Puhti to Allas
+
+1. Put the data in a single directory, for example
+`/scratch/project_/exampledir` in Puhti, _deleting all the files that
+you do not need_. There is no need to compress the files.
+
+2. Move the data to Allas
+```
+dm export /scratch/project_/exampledir
+```
+
+3. Check the status of the data transfer with
+```
+dm status
+```
+
+## Listing the data in Allas
+
+```
+dm list
+```
+
+## Moving data from Allas to Puhti
+
+Import data back to the original directory with
+```
+dm import /scratch/project_/exampledir
+```
+
+## Links to related material
+
+- https://docs.csc.fi/support/tutorials/lue/
+- https://docs.csc.fi/support/tutorials/clean-up-data/
+- https://docs.csc.fi/data/Allas/introduction/

From fe732777f462d07cc07f6f3c3e1c6e617ae75b05 Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Wed, 5 Mar 2025 15:07:24 +0200
Subject: [PATCH 02/11] Update index.md

---
 docs/support/tutorials/index.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/docs/support/tutorials/index.md b/docs/support/tutorials/index.md
index bed52f3db6..536c931f3b 100644
--- a/docs/support/tutorials/index.md
+++ b/docs/support/tutorials/index.md
@@ -3,7 +3,6 @@
 ## General
 * [Getting started with supercomputing at CSC](hpc-quick.md)
 * [Getting started with Helmi](../../computing/quantum-computing/helmi/helmi-from-lumi.md)
-* [Managing data on Puhti and Mahti scratch disks](clean-up-data.md)
 * [CSC Quick reference (pdf)](../../img/csc-quick-reference/csc-quick-reference.pdf)
 * [Linux basics for CSC](env-guide/index.md)
 * [Interactive and batch job hands-on in Puhti](cmdline-handson.md)
@@ -11,10 +10,14 @@
 * [Developing scripts remotely](remote-dev.md)
 * [Using CSC HPC environment efficiently](https://csc-training.github.io/csc-env-eff/)
 * [How to run existing containers in Puhti](../../computing/containers/run-existing.md)
-* [Getting disk usage using Lue](lue.md)
 * [Running Julia jobs on Puhti and Mahti clusters](julia.md)
 * [Using Python on CSC supercomputers](python-usage-guide.md)
 
+## Data management
+* [Managing data on Puhti and Mahti scratch disks](clean-up-data.md)
+* [Getting disk usage using Lue](lue.md)
+* [Moving large datasets to Allas](data-mover.md)
+
 ## Installation of tools on supercomputers
 * [Installing software with Spack](user-spack.md)
 * [Building Singularity containers from scratch](singularity-scratch.md)

From 3d0fe01e76647b839039f7c6f6883a95cc45ec27 Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Thu, 6 Mar 2025 08:38:06 +0200
Subject: [PATCH 03/11] Fixing links

---
 docs/support/tutorials/data-mover.md | 28 +++++++++++++---------------
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/docs/support/tutorials/data-mover.md b/docs/support/tutorials/data-mover.md
index 0c32eb4aba..8f3d4f0608 100644
--- a/docs/support/tutorials/data-mover.md
+++ b/docs/support/tutorials/data-mover.md
@@ -1,15 +1,15 @@
 # Data-mover
 
 Data-mover is a tool to move data between Puhti and Mahti local filesystems and
-Allas and LUMI-O object storage servers, when a simple transfers like
-https://docs.csc.fi/support/faq/how-to-move-data-between-puhti-and-allas/#move-data-with-rclone
+Allas and LUMI-O object storage servers, when
+[simple transfers](../faq/how-to-move-data-between-puhti-and-allas.md#move-data-with-rclone)
 are not practical, either because there are many small files, or the size of the
 dataset is large.
 
 We wish the data-mover tool `dm` to be simple to use, and handle all possible
-hard corner cases. It is basically a wrapper around Restic backup tool
-https://restic.readthedocs.io/ , and stores the data in Restic repositories.
-Restic in turn uses Rclone https://rclone.org/ for the actual data transfers to
+hard corner cases. It is basically a wrapper around [Restic backup tool](https://restic.readthedocs.io)
+, and stores the data in Restic repositories.
+Restic in turn uses [Rclone](https://rclone.org) for the actual data transfers to
 the object storage servers and back. In addition, the data-mover tool does the
 data transfers in the background, using batch jobs, allowing larger transfers
 than would be practical in regular interactive login sessions.
@@ -22,17 +22,15 @@ for additional documentation.
 ## Setting up the connection from Puhti or Mahti to Allas
 
 1. Your CSC project needs to have Allas service enabled. The project PI can add
-Allas service for the project in https://my.csc.fi , if not already enabled, and
-the project members need to accept the service terms
-https://docs.csc.fi/accounts/how-to-add-service-access-for-project/ .
+Allas service for the project in [my.csc.fi](https://my.csc.fi) , if not already enabled, and
+the project members need to [accept the service terms](../../accounts/how-to-add-service-access-for-project.md).
 
 2. Create a configuration for rclone and store the authentication token in the
-file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from the
-web interface https://puhti.csc.fi . Open "Cloud storage configuration" from the
+file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from
+[Puhti web interface](https://puhti.csc.fi). Open "Cloud storage configuration" from the
 "Tools" drop-down menu, and create Allas S3 rclone configuration for the
 project,
-https://docs.csc.fi/computing/webinterface/file-browser/#accessing-allas-and-lumi-o
-.
+[](../../computing/webinterface/file-browser.md#accessing-allas-and-lumi-o).
 
 3. Open a terminal to Puhti, and take the data-mover tool `dm` into use with
 ```
@@ -70,6 +68,6 @@ dm import /scratch/project_/exampledir
 
 ## Links to related material
 
-- https://docs.csc.fi/support/tutorials/lue/
-- https://docs.csc.fi/support/tutorials/clean-up-data/
-- https://docs.csc.fi/data/Allas/introduction/
+- [Lue tool for data inventory](lue.md)
+- [Data cleaning](clean-up-data.md)
+- [Allas introduction](../../data/Allas/introduction.md)

From 88c863a7d4079f0bae6f776ed2accf581ae8e50d Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Thu, 6 Mar 2025 08:41:15 +0200
Subject: [PATCH 04/11] Fixing links

---
 docs/support/tutorials/data-mover.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/docs/support/tutorials/data-mover.md b/docs/support/tutorials/data-mover.md
index 8f3d4f0608..cfe0419d75 100644
--- a/docs/support/tutorials/data-mover.md
+++ b/docs/support/tutorials/data-mover.md
@@ -28,11 +28,10 @@ the project members need to [accept the service terms](../../accounts/how-to-add
 2. Create a configuration for rclone and store the authentication token in the
 file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from
 [Puhti web interface](https://puhti.csc.fi). Open "Cloud storage configuration" from the
-"Tools" drop-down menu, and create Allas S3 rclone configuration for the
-project,
-[](../../computing/webinterface/file-browser.md#accessing-allas-and-lumi-o).
+"Tools" drop-down menu, and
+[create Allas S3 rclone configuration for the project](../../computing/webinterface/file-browser.md#accessing-allas-and-lumi-o).
 
-3. Open a terminal to Puhti, and take the data-mover tool `dm` into use with
+4. Open a terminal to Puhti, and take the data-mover tool `dm` into use with
 ```
 module load .data-mover
 ```

From 70475656e4243dcdfab3821284230f7da1706167 Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Thu, 6 Mar 2025 08:44:33 +0200
Subject: [PATCH 05/11] example is Puhti-Allas-Puhti

---
 docs/support/tutorials/data-mover.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/support/tutorials/data-mover.md b/docs/support/tutorials/data-mover.md
index cfe0419d75..d843e03dd8 100644
--- a/docs/support/tutorials/data-mover.md
+++ b/docs/support/tutorials/data-mover.md
@@ -19,7 +19,7 @@ directory to corresponding project in Allas, and then back. Similar works with
 Mahti and LUMI-O. Please have a look at `dm help` and `dm --help`
 for additional documentation.
 
-## Setting up the connection from Puhti or Mahti to Allas
+## Setting up the connection from Puhti to Allas
 
 1. Your CSC project needs to have Allas service enabled. The project PI can add
 Allas service for the project in [my.csc.fi](https://my.csc.fi) , if not already enabled, and

From c01f224a1c6ab37827e21702d77e0d258aa064d4 Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Fri, 7 Mar 2025 10:37:34 +0200
Subject: [PATCH 06/11] Update data-mover.md based on the comments so far

---
 docs/support/tutorials/data-mover.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/support/tutorials/data-mover.md b/docs/support/tutorials/data-mover.md
index d843e03dd8..83a9316721 100644
--- a/docs/support/tutorials/data-mover.md
+++ b/docs/support/tutorials/data-mover.md
@@ -8,7 +8,7 @@ dataset is large.
 
 We wish the data-mover tool `dm` to be simple to use, and handle all possible
 hard corner cases. It is basically a wrapper around [Restic backup tool](https://restic.readthedocs.io)
-, and stores the data in Restic repositories.
+, and stores the data in Restic repository format.
 Restic in turn uses [Rclone](https://rclone.org) for the actual data transfers to
 the object storage servers and back. In addition, the data-mover tool does the
 data transfers in the background, using batch jobs, allowing larger transfers
@@ -36,13 +36,13 @@ file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from
 module load .data-mover
 ```
 
-## Moving data from Puhti to Allas
+## Moving a single directory in Puhti to Allas
 
-1. Put the data in a single directory, for example
-`/scratch/project_/exampledir` in Puhti, _deleting all the files that
-you do not need_. There is no need to compress the files.
+1. Delete all the files that are not needed from the scratch directory,
+`/scratch/project_/exampledir`, for example. There is no need
+to compress the files.
 
-2. Move the data to Allas
+3. Move the data to Allas
 ```
 dm export /scratch/project_/exampledir
 ```

From c36d49f5a0f607ceea557e1a0ccc5ef009dbaceb Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Mon, 24 Mar 2025 10:40:47 +0200
Subject: [PATCH 07/11] Update data-mover.md

---
 docs/support/tutorials/data-mover.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/docs/support/tutorials/data-mover.md b/docs/support/tutorials/data-mover.md
index 83a9316721..b6b3a8baf6 100644
--- a/docs/support/tutorials/data-mover.md
+++ b/docs/support/tutorials/data-mover.md
@@ -9,17 +9,19 @@ dataset is large.
 We wish the data-mover tool `dm` to be simple to use, and handle all possible
 hard corner cases. It is basically a wrapper around [Restic backup tool](https://restic.readthedocs.io)
 , and stores the data in Restic repository format.
-Restic in turn uses [Rclone](https://rclone.org) for the actual data transfers to
+Restic (as used by data-mover) in turn uses [Rclone](https://rclone.org) backend for the actual data transfers to
 the object storage servers and back. In addition, the data-mover tool does the
 data transfers in the background, using batch jobs, allowing larger transfers
 than would be practical in regular interactive login sessions.
 
+## Simple exaple case, moving data from Puhti to Allas and back
+
 Below is a guide for a simple scenario, moving data from Puhti project scratch
 directory to corresponding project in Allas, and then back. Similar works with
 Mahti and LUMI-O. Please have a look at `dm help` and `dm --help`
 for additional documentation.
 
-## Setting up the connection from Puhti to Allas
+### Setting up the connection from Puhti to Allas
 
 1. Your CSC project needs to have Allas service enabled. The project PI can add
 Allas service for the project in [my.csc.fi](https://my.csc.fi) , if not already enabled, and
@@ -36,7 +38,7 @@ file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from
 module load .data-mover
 ```
 
-## Moving a single directory in Puhti to Allas
+### Moving a single directory in Puhti to Allas
 
 1. Delete all the files that are not needed from the scratch directory,
 `/scratch/project_/exampledir`, for example. There is no need
@@ -52,13 +54,13 @@ dm export /scratch/project_/exampledir
 dm status
 ```
 
-## Listing the data in Allas
+### Listing the data in Allas
 
 ```
 dm list
 ```
 
-## Moving data from Allas to Puhti
+### Moving data from Allas to Puhti
 
 Import data back to the original directory with
 ```

From 4db9e9c4dc14ba2a6f8d31360b59fd281f87e6a9 Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Wed, 26 Mar 2025 12:13:34 +0200
Subject: [PATCH 08/11] `dm` --> `data-mover` executable name

---
 docs/support/tutorials/data-mover.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/support/tutorials/data-mover.md b/docs/support/tutorials/data-mover.md
index b6b3a8baf6..6f883ed91d 100644
--- a/docs/support/tutorials/data-mover.md
+++ b/docs/support/tutorials/data-mover.md
@@ -6,7 +6,7 @@ Allas and LUMI-O object storage servers, when
 are not practical, either because there are many small files, or the size of the
 dataset is large.
 
-We wish the data-mover tool `dm` to be simple to use, and handle all possible
+We wish the data-mover tool `data-mover` to be simple to use, and handle all possible
 hard corner cases. It is basically a wrapper around [Restic backup tool](https://restic.readthedocs.io)
 , and stores the data in Restic repository format.
 Restic (as used by data-mover) in turn uses [Rclone](https://rclone.org) backend for the actual data transfers to
@@ -18,7 +18,7 @@ than would be practical in regular interactive login sessions.
 
 Below is a guide for a simple scenario, moving data from Puhti project scratch
 directory to corresponding project in Allas, and then back. Similar works with
-Mahti and LUMI-O. Please have a look at `dm help` and `dm --help`
+Mahti and LUMI-O. Please have a look at `data-mover help` and `data-mover --help`
 for additional documentation.
 
 ### Setting up the connection from Puhti to Allas
@@ -33,7 +33,7 @@ file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from
 "Tools" drop-down menu, and
 [create Allas S3 rclone configuration for the project](../../computing/webinterface/file-browser.md#accessing-allas-and-lumi-o).
 
-4. Open a terminal to Puhti, and take the data-mover tool `dm` into use with
+4. Open a terminal to Puhti, and take the data-mover tool `data-mover` into use with
 ```
 module load .data-mover
 ```
@@ -46,25 +46,25 @@ to compress the files.
 
 3. Move the data to Allas
 ```
-dm export /scratch/project_/exampledir
+data-mover export /scratch/project_/exampledir
 ```
 
 3. Check the status of the data transfer with
 ```
-dm status
+data-mover status
 ```
 
 ### Listing the data in Allas
 
 ```
-dm list
+data-mover list
 ```
 
 ### Moving data from Allas to Puhti
 
 Import data back to the original directory with
 ```
-dm import /scratch/project_/exampledir
+data-mover import /scratch/project_/exampledir
 ```
 
 ## Links to related material

From 058cffb45b2b1169a81b24fd536dc5a6995ff288 Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Wed, 26 Mar 2025 12:18:00 +0200
Subject: [PATCH 09/11] Update data-mover.md

---
 docs/support/tutorials/data-mover.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/support/tutorials/data-mover.md b/docs/support/tutorials/data-mover.md
index 6f883ed91d..25a33280f6 100644
--- a/docs/support/tutorials/data-mover.md
+++ b/docs/support/tutorials/data-mover.md
@@ -14,7 +14,7 @@ the object storage servers and back. In addition, the data-mover tool does the
 data transfers in the background, using batch jobs, allowing larger transfers
 than would be practical in regular interactive login sessions.
 
-## Simple exaple case, moving data from Puhti to Allas and back
+## Simple example case, moving data from Puhti to Allas and back
 
 Below is a guide for a simple scenario, moving data from Puhti project scratch
 directory to corresponding project in Allas, and then back. Similar works with

From 8865c71f312edad84100341398f44af90b79024c Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Tue, 28 Oct 2025 13:03:26 +0200
Subject: [PATCH 10/11] Update data-mover.md

---
 docs/support/tutorials/data-mover.md | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/docs/support/tutorials/data-mover.md b/docs/support/tutorials/data-mover.md
index 25a33280f6..491b984f83 100644
--- a/docs/support/tutorials/data-mover.md
+++ b/docs/support/tutorials/data-mover.md
@@ -6,13 +6,12 @@ Allas and LUMI-O object storage servers, when
 are not practical, either because there are many small files, or the size of the
 dataset is large.
 
-We wish the data-mover tool `data-mover` to be simple to use, and handle all possible
-hard corner cases. It is basically a wrapper around [Restic backup tool](https://restic.readthedocs.io)
-, and stores the data in Restic repository format.
-Restic (as used by data-mover) in turn uses [Rclone](https://rclone.org) backend for the actual data transfers to
-the object storage servers and back. In addition, the data-mover tool does the
-data transfers in the background, using batch jobs, allowing larger transfers
-than would be practical in regular interactive login sessions.
+Data-mover stores the data in the object storage in [Restic](https://restic.readthedocs.io)
+repository format. Restic, as used by data-mover, in turn uses [Rclone](https://rclone.org)
+backend for the actual data transfers to the object storage servers and back, with S3 authentication.
+Moving data to Restic repository in an object storage could also be achieved by using Restic directly, or with
+[allas-backup tool](../../data/Allas/using_allas/a_backup/). The main differentiating feature is
+that data-mover does the actual transfers unattended using batch jobs.
 
 ## Simple example case, moving data from Puhti to Allas and back
 
@@ -35,18 +34,18 @@ file `$HOME/.config/rclone/rclone.conf` in Puhti. This is easiest to do from
 
 4. Open a terminal to Puhti, and take the data-mover tool `data-mover` into use with
 ```
-module load .data-mover
+module load data-mover
 ```
 
 ### Moving a single directory in Puhti to Allas
 
-1. Delete all the files that are not needed from the scratch directory,
-`/scratch/project_/exampledir`, for example. There is no need
-to compress the files.
+1. Clean (delete) all the files that are not needed from the target scratch directory,
+`/scratch//exampledir`, for example. There is no need
+to compress the files separately.
 
 3. Move the data to Allas
 ```
-data-mover export /scratch/project_/exampledir
+data-mover export /scratch//exampledir
 ```
 
 3. Check the status of the data transfer with
@@ -64,7 +63,7 @@ data-mover list
 
 Import data back to the original directory with
 ```
-data-mover import /scratch/project_/exampledir
+data-mover import /scratch//exampledir
 ```
 
 ## Links to related material
@@ -72,3 +71,5 @@ data-mover import /scratch/project_/exampledir
 - [Lue tool for data inventory](lue.md)
 - [Data cleaning](clean-up-data.md)
 - [Allas introduction](../../data/Allas/introduction.md)
+- [Allas-backup tool](../../data/Allas/using_allas/a_backup/)
+- [Restic](https://restic.readthedocs.io)

From ebbf57c05991c47b342702643bfc3c782011b610 Mon Sep 17 00:00:00 2001
From: Juha Lento
Date: Tue, 28 Oct 2025 13:12:13 +0200
Subject: [PATCH 11/11] Update index.md

---
 docs/support/tutorials/index.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/support/tutorials/index.md b/docs/support/tutorials/index.md
index d8a0ab45d7..13f848c14a 100644
--- a/docs/support/tutorials/index.md
+++ b/docs/support/tutorials/index.md
@@ -12,7 +12,6 @@
 * [Developing scripts remotely](remote-dev.md)
 * [Using CSC HPC environment efficiently](https://csc-training.github.io/csc-env-eff/)
 * [How to run existing containers in Puhti](../../computing/containers/overview.md#running-containers)
-* [Getting disk usage using Lue](lue.md)
 * [Running Julia jobs on Puhti and Mahti clusters](julia.md)
 * [Using Python on CSC supercomputers](python-usage-guide.md)
 * [Setting up SSH keys at CSC](https://csc-training.github.io/csc-env-eff/hands-on/connecting/ssh-keys.html)