feat(storage): add datasets resource-type prefix to dataset logical paths #5911
aicam wants to merge 37 commits into
…ent via public cluster services
- Add CloudMapperSourceOpDesc, ReferenceGenome, ReferenceGenomeEnum operator classes
- Add FileResolver.resolveDirectory for resolving dataset directories by path
- Add DatasetFileDocument directory mode: downloads all files as a zip via LakeFS/FileService
- Add DocumentFactory.openReadonlyDocument isDirectory parameter
- Add ENV_FILE_SERVICE_LIST_DIRECTORY_OBJECTS_ENDPOINT env var
- Add Kubernetes Helm chart and PVC for the cloudmapper service

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ntend integration
- Add ClusterResource, ClusterCallbackResource, ClusterServiceClient, ClusterUtils backend API for managing EC2 clusters
- Add cluster dashboard component with launch/stop/terminate/start actions and management modal
- Add ClusterSelectionComponent and ClusterAutoCompleteComponent for operator property panel
- Add DirectoryPathInput and DirectorySelection components for dataset directory selection
- Add cluster route in app-routing, cluster declarations in app.module
- Add cluster_enabled feature flag to gui-config, dashboard sidebar, and admin settings
- Add clusterautocomplete and directorypathinput formly field types
- Register cluster/directoryName/fastQFiles/fastAFiles/gtfFile fields in operator property editor
- Add SQL schema for cluster and cluster_activity tables
- Add dknet logo, CloudBioMapper operator icon, and sequence-alignment workflow assets
- Add DatasetDirectoryDocument and PathUtils storage utilities

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Feat/cloudbiomapper
Revert "Feat/cloudbiomapper"
👋 Thanks for opening this pull request, @aicam! It looks like the pull request description doesn't quite follow our template yet:
Filling out the template helps reviewers understand and triage your contribution faster. Please edit the description to complete it. This message will disappear automatically once the template is followed. You can find the template prompts by editing the description, or see CONTRIBUTING.md for the full contribution flow.
Automated Reviewer Suggestions
Based on the
Codecov Report
❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | main | #5911 | +/- |
|---|---:|---:|---:|
| Coverage | 54.11% | 54.06% | -0.05% |
| Complexity | 2819 | 2811 | -8 |
| Files | 1103 | 1103 | |
| Lines | 42650 | 42654 | +4 |
| Branches | 4588 | 4589 | +1 |
| Hits | 23079 | 23061 | -18 |
| Misses | 18226 | 18245 | +19 |
| Partials | 1345 | 1348 | +3 |
*This pull request uses carry forward flags.
aglinxinyuan left a comment:
Should it be `datasets` or `dataset`? Should `ownerEmail` always be the first segment, i.e. `/ownerEmail/dataset/name/versionName/fileRelativePath`?
Pull request overview
This PR introduces a datasets resource-type prefix segment for dataset logical file paths (e.g., /datasets/<ownerEmail>/<datasetName>/<versionName>/...) to better namespace dataset resources and leave room for future resource types.
Changes:
- Backend: update dataset logical-path parsing/resolution and dataset file-tree generation to include/handle the `datasets` prefix.
- Frontend: update dataset path parsing and relative-path extraction to account for the extra leading segment.
- Tests: update `FileResolverSpec` fixtures to use the new prefixed path format.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| frontend/src/app/common/type/datasetVersionFileTree.ts | Strips 4 leading segments when computing relative paths (accounts for datasets/owner/dataset/version). |
| frontend/src/app/common/type/dataset-file.ts | Accepts optional datasets prefix when parsing; always emits prefixed paths when serializing. |
| file-service/src/main/scala/org/apache/texera/service/type/dataset/DatasetFileNode.scala | Adds an intermediate datasets directory node as the root child for committed-object file trees. |
| file-service/src/main/scala/org/apache/texera/service/resource/DatasetResource.scala | Adjusts traversal to descend through the new datasets node; updates size-rooting. |
| common/workflow-core/src/test/scala/org/apache/texera/amber/storage/FileResolverSpec.scala | Updates dataset test paths to include /datasets/.... |
| common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/FileResolver.scala | Strips known resource-type prefix (datasets) before parsing dataset logical paths; updates docs/examples. |
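The frontend behaviour summarized in the `dataset-file.ts` and `datasetVersionFileTree.ts` rows (accept an optional `datasets` prefix when parsing, always emit it when serializing, and drop four leading segments for relative paths) can be illustrated with a small TypeScript sketch. `DatasetFileRef`, `parseDatasetPath`, `toDatasetPath`, and `relativePathInVersion` are hypothetical names; they are not the real `parseFilePathToDatasetFile` / `parseDatasetFileToFilePath` signatures, which are not shown in this PR page.

```typescript
// Hypothetical shape of a parsed dataset file reference; the real type in
// dataset-file.ts may differ.
interface DatasetFileRef {
  ownerEmail: string;
  datasetName: string;
  versionName: string;
  fileRelativePath: string;
}

const DATASETS_PREFIX = "datasets";

const splitSegments = (path: string): string[] =>
  path.split("/").filter((s) => s.length > 0);

// Accepts both "/datasets/owner/dataset/version/file" and the legacy
// unprefixed "/owner/dataset/version/file".
function parseDatasetPath(filePath: string): DatasetFileRef {
  const segments = splitSegments(filePath);
  if (segments[0] === DATASETS_PREFIX) {
    segments.shift(); // drop the optional resource-type prefix
  }
  if (segments.length < 4) {
    throw new Error(`Invalid dataset file path: ${filePath}`);
  }
  const [ownerEmail, datasetName, versionName, ...rest] = segments;
  return { ownerEmail, datasetName, versionName, fileRelativePath: rest.join("/") };
}

// Always emits the prefixed form.
function toDatasetPath(ref: DatasetFileRef): string {
  return `/${DATASETS_PREFIX}/${ref.ownerEmail}/${ref.datasetName}/${ref.versionName}/${ref.fileRelativePath}`;
}

// Relative path within a version for a prefixed path: drop
// datasets/owner/dataset/version, i.e. four leading segments instead of three.
function relativePathInVersion(fullPath: string): string {
  return splitSegments(fullPath).slice(4).join("/");
}
```

Making the prefix optional on the parsing side keeps older unprefixed paths readable. This sketch assumes no owner email is ever the literal string `datasets`, which holds as long as owner emails always contain `@`.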
`@@ -1197,6 +1197,8 @@ class DatasetResource extends LazyLogging {`
`)`
`.head`
`val ownerNode = datasetsNode.children.get.head`

`)`
`.head`
`val ownerFileNode = datasetsNode.children.get.head`
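A simplified TypeScript model of the tree these hunks traverse: the new `datasets` node sits between the root and the owner node, so code that previously took the owner node directly now steps through `datasetsNode` first, and the total size can be summed from the `datasets` node. `FileNode`, `buildVersionTree`, `ownerNodeOf`, and `totalSize` are hypothetical; the real `DatasetFileNode` is a Scala class, and this sketch keeps files flat under the version node for brevity.

```typescript
// Hypothetical, simplified stand-in for the Scala DatasetFileNode.
interface FileNode {
  name: string;
  isFile: boolean;
  size: number; // size in bytes for files; 0 for directories in this sketch
  children: FileNode[];
}

const dir = (name: string, children: FileNode[]): FileNode =>
  ({ name, isFile: false, size: 0, children });

const file = (name: string, size: number): FileNode =>
  ({ name, isFile: true, size, children: [] });

// Builds datasets -> owner -> dataset -> version -> files for one dataset version.
function buildVersionTree(
  ownerEmail: string,
  datasetName: string,
  versionName: string,
  files: Array<[string, number]>, // [fileName, sizeInBytes]
): FileNode {
  const versionNode = dir(versionName, files.map(([n, s]) => file(n, s)));
  const datasetNode = dir(datasetName, [versionNode]);
  const ownerNode = dir(ownerEmail, [datasetNode]);
  return dir("datasets", [ownerNode]); // the new intermediate node
}

// Mirrors `datasetsNode.children.get.head`: take the first (here, only) owner child.
function ownerNodeOf(datasetsNode: FileNode): FileNode {
  const owner = datasetsNode.children[0];
  if (owner === undefined) {
    throw new Error("datasets node has no owner child");
  }
  return owner;
}

// Recursively sums file sizes below a node.
function totalSize(node: FileNode): number {
  return node.isFile
    ? node.size
    : node.children.reduce((sum, child) => sum + totalSize(child), 0);
}
```

Because the `datasets` node only wraps the owner node, summing from either node yields the same total for a single-version tree like this one.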
`* Parses a dataset file path and extracts its components.`
`* Expected format: /ownerEmail/datasetName/versionName/fileRelativePath`
`* Expected format: /datasets/ownerEmail/datasetName/versionName/fileRelativePath`
`*`
`* The first segment is a resource type prefix (e.g. "datasets") and is stripped before parsing.`
`*`
We checked similar systems, and they follow the same naming convention: `datasets/`.
aglinxinyuan left a comment:
LGTM! Can we add test case coverage for types as well?
This is a draft PR for @tanishqgandhi1908, so I expect him to take it over, after which I will do the final review. I've converted it to a draft for now.
Summary
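Introduces a `datasets` resource-type prefix to dataset logical file paths, changing the format from `/ownerEmail/datasetName/versionName/fileRelativePath` to `/datasets/ownerEmail/datasetName/versionName/fileRelativePath`.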
This namespaces dataset paths under an explicit resource-type segment, leaving room for other resource types in the future.
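One way to leave room for other resource types is to check the first path segment against a set of known prefixes, which is the idea behind `RESOURCE_TYPE_PREFIXES = Set("datasets")` in `FileResolver`. The TypeScript sketch below is a hypothetical rendering of that idea; `ResourcePath` and `splitResourcePath` are illustrative names, and returning `undefined` for an unknown prefix is an assumption of this sketch, not documented behaviour of the Scala code.

```typescript
// Known resource-type prefixes; mirrors the idea of
// RESOURCE_TYPE_PREFIXES = Set("datasets") in the Scala FileResolver.
const RESOURCE_TYPE_PREFIXES: ReadonlySet<string> = new Set(["datasets"]);

// Hypothetical result type: the resource type plus the remaining segments.
interface ResourcePath {
  resourceType: string;
  segments: string[];
}

// Splits a logical path such as "/datasets/owner/dataset/version/file"
// into its resource type and the rest. Returns undefined when the first
// segment is not a known resource type (an assumption of this sketch).
function splitResourcePath(logicalPath: string): ResourcePath | undefined {
  const segments = logicalPath.split("/").filter((s) => s.length > 0);
  const [first, ...rest] = segments;
  if (first === undefined || !RESOURCE_TYPE_PREFIXES.has(first)) {
    return undefined;
  }
  return { resourceType: first, segments: rest };
}
```

Under this design, supporting a new resource type would mean adding its prefix to the set plus a parser for that type's remaining segments, without touching how dataset paths are parsed.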
Changes
Backend (Scala)
- `FileResolver`: strips a known `RESOURCE_TYPE_PREFIXES = Set("datasets")` leading segment before parsing; docs/examples updated.
- `DatasetFileNode.fromLakeFSRepositoryCommittedObjects`: builds an intermediate `datasets` directory node as the parent of owner nodes.
- `DatasetResource`: descends through the new `datasets` node to reach the owner node; total-size calculation rooted at the `datasets` node.
- `FileResolverSpec`: test fixtures updated to the prefixed path format.

Frontend (TypeScript)
- `dataset-file.ts`: `parseFilePathToDatasetFile` strips the `datasets` prefix if present; `parseDatasetFileToFilePath` emits it.
- `datasetVersionFileTree.ts`: relative-path extraction now strips four leading segments (`datasets/owner/dataset/version`) instead of three.

Notes
`apache/main` (via the aicam fork's `main`) before opening this PR.

🤖 Generated with Claude Code