[HWORKS-2731] Document HopsFS access path for shared datasets #569
gibchikafa merged 13 commits into logicalclocks:main
Conversation
Add the mounted shared-dataset path to the Accessing project data sections in the Python notebook, Ray notebook, Python job, notebook job, and Ray job guides. The new docs clarify that when HopsFS is mounted, shared datasets are available under /hopsfs/shared-datasets/<source-project>/<dataset-name>, alongside the existing examples for project-local datasets under /hopsfs.
Pull request overview
Updates the Hopsworks user guides to document the filesystem path for accessing shared datasets when HopsFS is mounted, complementing existing /hopsfs/<dataset>/... examples for project-local datasets.
Changes:
- Add shared-dataset mount path `/hopsfs/shared-datasets/<source-project>/<dataset-name>` to notebook guides (Python, Ray).
- Add the same shared-dataset mount path to job guides (Python, Notebook Job, Ray Job).
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/user_guides/projects/jupyter/ray_notebook.md | Documents the shared-dataset access path for Ray notebooks. |
| docs/user_guides/projects/jupyter/python_notebook.md | Documents the shared-dataset access path for Python notebooks. |
| docs/user_guides/projects/jobs/ray_job.md | Documents the shared-dataset access path for Ray jobs. |
| docs/user_guides/projects/jobs/python_job.md | Documents the shared-dataset access path for Python jobs. |
| docs/user_guides/projects/jobs/notebook_job.md | Documents the shared-dataset access path for notebook jobs. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Extend the shared datasets guidance in notebook and job documentation to mention that the mounted shared datasets directory is also exposed through the SHARED_DATASETS_DIR environment variable.
The project datasets are mounted under `/hopsfs`, so you can access `data.csv` from the `Resources` dataset using `/hopsfs/Resources/data.csv` in your script.
If HopsFS is mounted, project datasets are available under `/hopsfs`, so you can access `data.csv` from the `Resources` dataset using `/hopsfs/Resources/data.csv` in your script.
If HopsFS is mounted, shared datasets are accessible at `/hopsfs/shared-datasets/<source-project>/<dataset-name>`. The shared datasets directory is also available through the `SHARED_DATASETS_DIR` environment variable.
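As a rough illustration of the sentence above, a script could build shared-dataset paths from the `SHARED_DATASETS_DIR` environment variable. This is only a sketch: the helper name `shared_dataset_path` is hypothetical, and falling back to `/hopsfs/shared-datasets` when the variable is unset is an assumption based on the path documented in this PR.

```python
import os
from pathlib import Path

# Assumption: if SHARED_DATASETS_DIR is unset, fall back to the mount path
# documented in this PR. The variable is expected to hold the directory that
# contains one sub-directory per source project.
DEFAULT_SHARED_DATASETS_DIR = "/hopsfs/shared-datasets"


def shared_dataset_path(source_project: str, dataset_name: str, *parts: str) -> Path:
    """Build the path to a shared dataset, or to a file inside it.

    Hypothetical helper: it only joins path components and does not check
    that HopsFS is mounted or that the dataset exists.
    """
    root = Path(os.environ.get("SHARED_DATASETS_DIR", DEFAULT_SHARED_DATASETS_DIR))
    return root.joinpath(source_project, dataset_name, *parts)
```

For example, `shared_dataset_path("<source-project>", "<dataset-name>", "data.csv")` would point at `data.csv` inside the shared dataset, with the placeholders replaced by real names.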
@copilot apply changes based on this feedback
### Absolute paths

The project datasets are mounted under `/hopsfs`, so you can access `data.csv` from the `Resources` dataset using `/hopsfs/Resources/data.csv` in your script.
Shared datasets are accessible at `/hopsfs/shared-datasets/<source-project>/<dataset-name>` if HopsFS is mounted. The shared datasets directory is also available through the `SHARED_DATASETS_DIR` environment variable.
### Absolute paths

The project datasets are mounted under `/hopsfs`, so you can access `data.csv` from the `Resources` dataset using `/hopsfs/Resources/data.csv` in your notebook.
Shared datasets are accessible at `/hopsfs/shared-datasets/<source-project>/<dataset-name>` if HopsFS is mounted. The shared datasets directory is also available through the `SHARED_DATASETS_DIR` environment variable.
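In a notebook it can be handy to see which datasets have been shared with the project before hard-coding a path. The sketch below walks the `<source-project>/<dataset-name>` layout described above; the function name `list_shared_datasets` is hypothetical, and the `/hopsfs/shared-datasets` fallback is an assumption for when `SHARED_DATASETS_DIR` is not set.

```python
import os
from pathlib import Path


def list_shared_datasets(root: str) -> dict[str, list[str]]:
    """Map each source project directory to the dataset names under it."""
    root_path = Path(root)
    if not root_path.is_dir():
        # The directory is missing, e.g. because HopsFS is not mounted.
        return {}
    return {
        project.name: sorted(d.name for d in project.iterdir() if d.is_dir())
        for project in sorted(root_path.iterdir())
        if project.is_dir()
    }


if __name__ == "__main__":
    # Assumed fallback path; the environment variable takes precedence.
    shared_root = os.environ.get("SHARED_DATASETS_DIR", "/hopsfs/shared-datasets")
    print(list_shared_datasets(shared_root))
```

Returning an empty mapping when the directory is missing keeps the notebook cell from failing in environments where HopsFS is not mounted.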
The project datasets are mounted under `/hopsfs` in the Ray containers, so you can access `data.csv` from the `Resources` dataset using `/hopsfs/Resources/data.csv`.
If HopsFS is mounted in the Ray containers, project datasets are available under `/hopsfs`, so you can access `data.csv` from the `Resources` dataset using `/hopsfs/Resources/data.csv`.
If HopsFS is mounted in the Ray containers, shared datasets are accessible at `/hopsfs/shared-datasets/<source-project>/<dataset-name>`. The shared datasets directory is also available through the `SHARED_DATASETS_DIR` environment variable.
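Because the new wording is conditional on HopsFS being mounted, code that runs inside a container may want to fail with a clear message when the mount is absent. A minimal sketch in plain Python, assuming the `/hopsfs` mount root from the text; the helper name `project_file` is hypothetical and no Ray API is involved:

```python
from pathlib import Path


def project_file(dataset: str, filename: str, mount_root: str = "/hopsfs") -> Path:
    """Return the absolute path of a file in a project-local dataset.

    Raises FileNotFoundError if the mount root is not a directory, which is
    taken here as a sign that HopsFS is not mounted in this container.
    """
    root = Path(mount_root)
    if not root.is_dir():
        raise FileNotFoundError(f"HopsFS does not appear to be mounted at {root}")
    return root / dataset / filename
```

With the default mount root, `project_file("Resources", "data.csv")` corresponds to the `/hopsfs/Resources/data.csv` example above.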
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@copilot apply changes based on the comments in this thread. Also commit all the changes.
@copilot apply changes based on the comments in this thread