
Reusable React components on JSF: file uploader (DVWebloader v2) and lazy file tree view (#6691, #12179) #12382

Open
ErykKul wants to merge 61 commits into develop from 6691_reusable_components


ErykKul commented May 5, 2026

TL;DR

A reusable React file tree + uploader that mounts inside the modern SPA and inside the legacy JSF dataset page — same component, two contexts. Off by default; flip a feature flag to enable.


For users:

  • Browse & select with a folder tree. Lazy folder loading — opens 100k-file datasets without choking. Tri-state checkboxes (partial folder selection), "select all" at any level. Full keyboard navigation (WAI-ARIA tree). The folder you're viewing is bookmarkable in the URL.
  • Streaming-zip download of the selection. Per-file bytes pipe into a single .zip written progressively to disk by the browser. No :ZipDownloadLimit cap, no Payara timeout, no server-side zip build.
  • Per-file resume on connection drops. With S3 download redirect on, the browser talks to S3 directly; flaky links reconnect and continue rather than restart from byte 0.
  • Failure tray, not a lost zip. A file that errors mid-stream surfaces as a recoverable item; retry, skip, or run a "second pass" that walks only the failures. Successful files in the same selection stay on disk.
  • Multipart upload for large files. Server splits into S3 multipart parts; the SDK uploads parts in parallel and stitches via ETags. Browser-computed checksum (MD5 default; :FileFixityChecksumAlgorithm honored) — server verifies on receipt. Cancel and retry per file.
  • i18n out of the box (English + Spanish); the bundle pulls translations from the same path operators serve.
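
The tri-state checkbox behaviour from the first bullet can be sketched as a pure function over the states of a folder's children. This is only an illustration: `folder_state` and the three state labels are hypothetical names, not the component's API, and the sketch ignores children that have not been lazily loaded yet.

```python
from typing import Iterable


def folder_state(child_states: Iterable[str]) -> str:
    """Derive a folder's checkbox state from its children's states.

    Each child state is "checked", "unchecked" or "partial" (hypothetical labels).
    """
    states = set(child_states)
    if not states:
        # Assumption: an empty folder is shown as unchecked.
        return "unchecked"
    if states == {"checked"}:
        return "checked"
    if states == {"unchecked"}:
        return "unchecked"
    # Any mixture, or any partially selected child, makes the folder partial.
    return "partial"
```

In a WAI-ARIA tree the "partial" state would typically be exposed as `aria-checked="mixed"`.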

For operators:

  • Three feature flags (off by default): dataverse.feature.react-uploader, dataverse.feature.react-tree-view, plus the existing dataverse.feature.api-session-auth.
  • Bundle ships baked into the WAR by default (webapp/dvwebloader/reusable-components/); the dataverse.reusable-components.base-url setting lets operators redirect to a sidecar container (gdcc/dataverse-reusable-components) or CDN if they prefer that deployment shape. Out of the box: nothing extra to host.

For installations on JSF that haven't migrated to the SPA: you can adopt the new bulk-download UX today — the React bundles work via Shadow-DOM mount points inside the existing JSF page, no SPA rollout required.

Presented at Dataverse Community Meeting 2026 — Barcelona, 2026-05-14 ("Creating New UI components that can be used in Dataverse JSF & SPA: A DualMode Architecture for Dataverse UI Extensions").


What this PR does / why we need it:

Adds the server-side surface for the reusable React frontend components (the dual-mode pattern that lets one React tree run in both the SPA and the legacy JSF UI), plus a paginated tree-listing endpoint to back the new React file tree.

Concretely:

  • New tree-listing API endpoint GET /api/datasets/{id}/versions/{vid}/tree with opaque-cursor keyset pagination, ETag/304 caching for released versions, and a covering index. Backs the React tree on both the SPA and JSF.
  • Two new feature flags (off by default): dataverse.feature.react-uploader and dataverse.feature.react-tree-view. When on, the matching JSF page mounts the React bundle in place of the PrimeFaces widget.
  • One new JVM setting dataverse.reusable-components.base-url (default /dvwebloader). Tells JSF where to load the bundle from — same-origin path, sidecar container, or CDN URL.
  • tagging field on the S3 upload destination response (additive, non-breaking). Lets the SDK decide whether to send x-amz-tagging based on the storage's dataverse.files.<id>.disable-tagging setting. Fixes the compatibility blocker for KU Leuven (storage that doesn't support S3 tagging).
  • Reusable-components bundle baked into the WAR at webapp/dvwebloader/reusable-components/ (A1-build distribution variant — see "Open ends" below).
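
From a client's point of view, the cursor pagination of the tree endpoint could be consumed roughly as follows. This is a sketch under assumptions: `fetch_page` is a hypothetical callable that performs the HTTP request for one cursor value and returns the decoded JSON body, and the last page is assumed to carry a null or missing `nextCursor`.

```python
from typing import Callable, Iterator, Optional


def iter_tree_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item of one folder listing by following nextCursor.

    The cursor is treated as opaque: it is passed back unchanged, never parsed.
    """
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor:
            # Assumption: no nextCursor means the listing is exhausted.
            return
```

Keeping the HTTP call behind `fetch_page` lets the same loop work with session-cookie or token authentication.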

Which issue(s) this PR closes:

Special notes for your reviewer:

Cross-repo coupling — read together

This PR is one of three that ship together:

Repo                                PR
IQSS/dataverse                      this PR (IQSS/dataverse#12382)
IQSS/dataverse-frontend             IQSS/dataverse-frontend#898
IQSS/dataverse-client-javascript    IQSS/dataverse-client-javascript#403

Recommended ordering: SDK (IQSS/dataverse-client-javascript#403) → frontend (IQSS/dataverse-frontend#898) → server (#12382).

This PR has a co-landing dependency on IQSS/dataverse#12188 (session-cookie API hardening). The standalone bundles authenticate via dataverse.feature.api-session-auth; #12188 adds the matching dataverse.feature.api-session-auth-hardening (Origin/Referer + X-Dataverse-CSRF-Token) for CSRF protection, and a small follow-up PR on @iqss/dataverse-client-javascript adds the matching CSRF-token wiring on the SDK side. All four PRs target the same Dataverse release. Reviewers can read this PR independently of #12188 — the hardening flag has no effect when off.

Reviewer's guide

The diff is ~4k LOC across 37 files. Roughly half of that is pre-built bundle artifacts and tests; the actual Java surface is 7 source files. Walk in this order:

Java code (~30 min):

  1. src/main/java/edu/harvard/iq/dataverse/api/Datasets.java — the new endpoint method (search for versions/{versionId}/tree). ETag handling, cursor decode, permission gate, response envelope. ~140 lines added.
  2. src/main/java/edu/harvard/iq/dataverse/datasetversiontree/DatasetVersionTreeService.java — the SQL keyset paginator. Two native queries (folders + files), positional ?N binding only, no string-concatenated user input. The cursor is opaque base64 JSON, server-issued. Read this carefully — the keyset stability is the load-bearing piece.
  3. src/main/java/edu/harvard/iq/dataverse/util/SystemConfig.java — small additions: getReusableComponentsBaseUrl, getReusableComponentsVersion (cache-busting), and the two is*Enabled flag helpers.
  4. src/main/java/edu/harvard/iq/dataverse/datasetversiontree/{DatasetVersionTreeMapper,FileTreePage,...}.java — DTOs and JSON mapping; mostly mechanical.
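
For reviewers less familiar with keyset pagination, the idea behind item 2 can be sketched in a few lines. This is not the PR's SQL: the `files` table, the column names and the cursor fields `k` and `id` are assumptions for illustration, and SQLite stands in for PostgreSQL. What carries over is the technique: order by a unique key (lowercased name, then id), encode the last row's key in an opaque base64 JSON cursor, and bind all values as parameters.

```python
import base64
import json
import sqlite3
from typing import List, Optional, Tuple


def encode_cursor(key: str, last_id: int) -> str:
    """Pack the last row's sort key into an opaque token."""
    raw = json.dumps({"k": key, "id": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(token: str) -> Tuple[str, int]:
    data = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    return str(data["k"]), int(data["id"])


def list_files(conn: sqlite3.Connection, limit: int,
               cursor: Optional[str] = None) -> Tuple[List[Tuple[int, str]], Optional[str]]:
    """Return one page ordered by (lower(name), id) plus the next cursor.

    Assumes limit >= 1. One extra row is fetched to detect a further page.
    """
    if cursor is None:
        sql = ("SELECT id, name, lower(name) FROM files "
               "ORDER BY lower(name), id LIMIT ?")
        params: tuple = (limit + 1,)
    else:
        key, last_id = decode_cursor(cursor)
        # Keyset predicate: strictly after the last row already returned.
        sql = ("SELECT id, name, lower(name) FROM files "
               "WHERE lower(name) > ? OR (lower(name) = ? AND id > ?) "
               "ORDER BY lower(name), id LIMIT ?")
        params = (key, key, last_id, limit + 1)
    rows = conn.execute(sql, params).fetchall()
    page = rows[:limit]
    has_more = len(rows) > limit
    next_cursor = encode_cursor(page[-1][2], page[-1][0]) if has_more else None
    return [(r[0], r[1]) for r in page], next_cursor
```

Because the predicate depends only on the last returned key, a page boundary stays stable even if rows before it are inserted or removed between requests, which an offset-based cursor cannot guarantee.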

JSF (~10 min):

  1. src/main/webapp/filesFragment.xhtml — adds the useReactTreeView ui:param and the conditional mount-point <div id="dv-tree-view"> + script tag.
  2. src/main/webapp/editFilesFragment.xhtml — same shape for useReactUploader.
  3. src/main/webapp/editdatafiles.xhtml — hides the legacy "Done" button when the React uploader is active, so users don't see two finish actions.

Skim or skip:

  • src/main/webapp/dvwebloader/reusable-components/*.js — pre-built minified JS, ~1.4 MB total. Unreviewable as text; see "Bundle artifacts" below.
  • src/main/webapp/dvwebloader/locales/{en,es}/*.json — i18n strings.
  • doc/release-notes/6691-reusable-frontend-components.md — the operator-facing summary.
  • doc/sphinx-guides/source/container/running/reusable-components.rst — the new operator guide.

Bundle artifacts in the diff — explained

src/main/webapp/dvwebloader/reusable-components/ contains 7 pre-built minified JS files (~1.4 MB): two entry bundles (dv-tree-view.js, dv-uploader.js) and five content-hashed chunks (chunks/dataverse-shared-*.js, chunks/i18n-*.js, chunks/react-*.js, chunks/shadow-mount-*.js, chunks/vendor-*.js). This is the A1-build distribution variant: the bundle ships inside the WAR, JSF references it via a same-origin path. Reviewers will reasonably ask "why is minified JS in git?" — short version: the alternative (build the React frontend at WAR-package time) requires Node-on-CI for every Dataverse build, which we wanted to avoid.

The longer-term variant (A1-extract: download the bundle from a published dataverse-frontend artifact at WAR-build time, no Node on the build host) is deliberately not implemented in this PR — open for the team to decide. Either choice is consistent with the JVM setting (dataverse.reusable-components.base-url); operators can already point this at a CDN or a sidecar container today, regardless of what the WAR ships with.

Suggestions on how to test this:

Required developer setup: add three entries to /etc/hosts on the developer machine: 127.0.0.1 localstack, 127.0.0.1 minio, and 127.0.0.1 keycloak.mydomain.com. This is documented in the release notes.

Test paths:

  • API smoke: curl -b cookies.txt "http://localhost:8080/api/datasets/{id}/versions/:draft/tree?limit=10" should return paginated tree items with nextCursor.
  • Direct upload via React uploader (LocalStack): enable dataverse.feature.react-uploader=true, navigate to dataset Edit Files, upload a file. Should PUT directly to LocalStack S3 (visible in DevTools Network).
  • Same against MinIO: switch the dataset's storage driver to minio1 and repeat. With the fixed compose file, MinIO should behave the same as LocalStack.
  • Tree-view toggle stability: enable dataverse.feature.react-tree-view=true, on the Files tab toggle Table↔Tree several times rapidly. The tree should load every time. (Pre-fix it would orphan after the first toggle due to the let cite redeclaration aborting the PF partial-response pipeline.)
  • Streaming-zip download: select multiple files in the tree, click Download. The browser should produce a single zip (no server-side ZIP endpoint involved).
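
The streaming-zip idea in the last bullet can be illustrated outside the browser. The sketch below is not the bundle's implementation (that runs in JavaScript); it only shows the principle of writing each file's chunks into a single archive as they arrive, without buffering whole files. `stream_zip` and its argument shape are hypothetical.

```python
import io
import zipfile
from typing import BinaryIO, Iterable, Iterator, Tuple


def stream_zip(files: Iterable[Tuple[str, Iterator[bytes]]], out: BinaryIO) -> None:
    """Write (name, chunk iterator) pairs into one zip, entry by entry.

    ZIP_STORED is used because this sketch only demonstrates the streaming
    shape, not compression.
    """
    with zipfile.ZipFile(out, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name, chunks in files:
            # Each entry is opened for writing and filled chunk by chunk.
            with zf.open(name, mode="w") as entry:
                for chunk in chunks:
                    entry.write(chunk)
```

For example, `stream_zip([("a.txt", iter([b"he", b"llo"]))], io.BytesIO())` produces an archive with one entry whose content is the concatenation of the chunks.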

Backend tests

mvn test -Dtest=DatasetVersionTreeKeysetIT
mvn test -Dtest=DatasetsTreeIT

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Yes — for users on instances with the new feature flags enabled:

  • React tree replaces the PrimeFaces tree on the dataset Files tab when in Tree view.
  • React uploader replaces the PrimeFaces upload widget on the dataset Edit Files page.
  • Legacy "Done" button is hidden when the React uploader is active.

The visual/UX is documented in IQSS/dataverse-frontend#898.

Is there a release notes update needed for this change?:

Yes — doc/release-notes/6691-reusable-frontend-components.md is included in this PR.

Additional documentation:

  • New sphinx guide: doc/sphinx-guides/source/container/running/reusable-components.rst — operator-facing setup.
  • Architecture doc: doc/Architecture/reusable_frontend_components.md — the backend half of the dual-mode contract; the frontend half lives in IQSS/dataverse-frontend's docs/reusable-components.md.
  • API doc additions in doc/sphinx-guides/source/api/native-api.rst for the new tree endpoint, including the cursor opacity contract and ETag behaviour.

AI-assistance disclosure

Some parts of this work — including the SQL keyset paginator design (after the EclipseLink positional-parameter discovery), and most of the operator-facing documentation in doc/release-notes/6691-reusable-frontend-components.md — were developed with the help of Claude (Anthropic) via Claude Code.

Reviewer attention is still required: AI-assisted code is still author-owned, and we've reviewed every diff that landed. Flagging this so reviewers can apply whatever scrutiny they reserve for AI-touched changes — particularly the SQL paginator and the Flyway migration.

ErykKul added 14 commits May 5, 2026 11:08
When S3 tagging is enabled (DISABLE_S3_TAGGING is false or unset),
generateTemporaryS3UploadUrls now includes "tagging": "dv-state=temp" in
the JSON response. The client reads this field and sets x-amz-tagging
accordingly — making the server authoritative instead of duplicating the
JVM setting on the client.
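
On the client side, the server-authoritative behaviour described above amounts to copying the field into a header only when it is present. A minimal sketch, assuming the upload destination response has already been decoded into a mapping (the function name and the `url` key in the usage example are hypothetical; only the `tagging` field and the `x-amz-tagging` header come from this commit):

```python
from typing import Dict, Mapping


def upload_headers(destination: Mapping[str, object]) -> Dict[str, str]:
    """Build the extra PUT headers for a direct S3 upload.

    The x-amz-tagging header is sent only if the server included a
    non-empty "tagging" value in its response.
    """
    headers: Dict[str, str] = {}
    tagging = destination.get("tagging")
    if isinstance(tagging, str) and tagging:
        headers["x-amz-tagging"] = tagging
    return headers
```

With tagging enabled, `upload_headers({"url": "https://example.invalid/put", "tagging": "dv-state=temp"})` yields the tagging header; when the field is absent the result is an empty dict, so storage without tagging support never sees the header.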

Also adds doc/Architecture/reusable_frontend_components.md covering
the cross-repo uploader and tree view design decisions.

- FeatureFlags.REACT_UPLOADER: replace @todo with @since 6.11; document
  the runtime requirement (api-session-auth) and the expected bundle URL.
- editFilesFragment.xhtml: short comment explaining why
  dropBoxUploadFinished is now hoisted out of the legacy upload block
  (the Dropbox panel renders independently of the React/JSF upload
  switch and still needs the callback).
- reusable_frontend_components.md: document the CSS isolation strategy
  and the remaining Bootstrap-globals limitation, with PostCSS scoping
  / Shadow DOM as the planned follow-ups.

The JSF page that mounts the React uploader currently hardcodes the
bundle path as `/dvwebloader/...` (legacy from DVWebloader v1). This
worked only when the dataverse-frontend dev environment served the
build output at that same-origin path.

To support institutions that don't run the SPA — and that may host
the bundle from a sidecar container, an existing nginx alias, or a
CDN — make the base URL configurable.

- JvmSettings: new entry REUSABLE_COMPONENTS_BASE_URL bound to
  `dataverse.reusable-components.base-url`.
- SystemConfig.getReusableComponentsBaseUrl(): returns the configured
  URL with any trailing slash trimmed, defaulting to `/dvwebloader`
  to preserve backward compatibility with the existing dev nginx
  alias and any same-origin operator setup.
- editFilesFragment.xhtml: the React-uploader script tag now reads
  `#{systemConfig.reusableComponentsBaseUrl}/reusable-components/
  dv-uploader.js` instead of the literal `/dvwebloader/...`. JSF
  fallback path is unchanged.
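
The resolution rule in the list above (configured value, trailing slash trimmed, `/dvwebloader` as the default) can be restated as a small sketch. This is not the Java code from SystemConfig; the function names are hypothetical, and treating a blank value like an unset one is an assumption.

```python
from typing import Optional

DEFAULT_BASE_URL = "/dvwebloader"


def reusable_components_base_url(configured: Optional[str] = None) -> str:
    """Return the base URL with trailing slashes trimmed, or the default."""
    if configured is None or not configured.strip():
        # Assumption: a blank setting falls back to the default.
        return DEFAULT_BASE_URL
    return configured.strip().rstrip("/")


def uploader_script_url(configured: Optional[str] = None) -> str:
    """Compose the script URL the JSF page would reference."""
    return reusable_components_base_url(configured) + "/reusable-components/dv-uploader.js"
```

For instance, a hypothetical CDN value `https://cdn.example.org/dv/` resolves to `https://cdn.example.org/dv/reusable-components/dv-uploader.js`, while an unset value keeps the same-origin `/dvwebloader/reusable-components/dv-uploader.js`.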

Non-breaking: default behaviour matches the previous hardcoded path.

Operator-facing documentation for the reusable React components track:
how to host the bundle, how to point the JSF page at it, and how
versioning flows through npm → Docker image → JVM setting.

- doc/sphinx-guides/source/container/running/reusable-components.rst
  is a new guide page modelled on previewers-provider in the demo
  guide. It explains the npm + sidecar-image distribution model,
  walks through three valid hosting choices (gdcc/dataverse-reusable-
  components container, operator-managed nginx, CDN), gives a sample
  Docker Compose service block, and cross-references the relevant
  feature flags + the frontend-side contract document.

- frontend-dev.rst now links to the new page so readers landing on
  the SPA-frontend guide find the JSF integration story.

- container/running/index.rst toctree includes the new page between
  frontend-dev and backend-dev.

- installation/config.rst adds:
  - dataverse.feature.react-uploader (the existing flag, finally
    documented) with prerequisite notes.
  - dataverse.reusable-components.base-url next to dataverse.siteUrl,
    with examples for sidecar / nginx / CDN setups.

- doc/release-notes/6691-reusable-frontend-components.md describes
  the React uploader feature flag, the new JVM setting, the S3
  tagging server-authoritative change, the prerequisites for
  enabling the feature, and the cross-repo coordination.

The original document mixed cross-repo decision-log content with
backend-side integration mechanics. Split that responsibility:

- This document (in dataverse) is now strictly the BACKEND HALF of
  the dual-mode contract: how JSF pages mount React components built
  in dataverse-frontend, how feature flags gate the swap, how nginx
  hosts the bundle, and how to add a new JSF page that mounts an
  SPA component.

- The matching FRONTEND HALF — config interfaces, build pipeline,
  CSS isolation, how to make a component reusable — lives in
  dataverse-frontend/docs/reusable-components.md (added in that repo).

- Cross-repo decisions, branch tracking, and active-track notes move
  out of this file entirely; they belong in the working plan rather
  than in committed Dataverse documentation.

The new content covers:
- Why dual-mode + the integration pattern diagram.
- Feature flag conventions and naming.
- Authentication prerequisites (session-cookie + hardening).
- Hosting options for the bundle (image / nginx / CDN).
- A worked example of replacing a JSF widget with an SPA component
  (the uploader).
- Adding a brand-new reusable component to a JSF page (the upcoming
  tree-view case).
- Currently shipped components (uploader, tree-view planned).
- Risks and trade-offs (Bootstrap collision, session-cookie, etc.).

New API endpoint that lazy-lists the immediate children (folders +
files) inside a folder of a dataset version, enabling tree-view UIs
to fetch on demand and paginate stably across very large datasets:

  GET /api/datasets/{id}/versions/{versionId}/tree

Query parameters: path, limit (default 100, clamped 1-1000), cursor
(opaque keyset token), include (all|folders|files), order
(NameAZ|NameZA), includeDeaccessioned, originals.

Response: {path, items[], nextCursor, limit, order, include,
approximateCount}. Folders come first, then files; both name-sorted
case-insensitively, files break ties on data file id for stability.
Folder items carry counts of distinct subfolders + descendant files.
File items carry id, size, contentType, access (public/restricted/
embargoed), optional checksum, and downloadUrl. Permissions and
embargoes are honoured exactly as on .../files.
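
The ordering rule just described can be written down as a short sketch. The `type`, `name` and `id` keys are assumed item fields for illustration, and reversing the id tie-break in descending order is also an assumption; the commit only states that folders stay first and that the within-type sort is case-insensitive.

```python
from typing import Dict, List


def sort_items(items: List[Dict], descending: bool = False) -> List[Dict]:
    """Folders first, then files; names compared case-insensitively.

    Files break ties on their id so the order is total and stable.
    """
    folders = [i for i in items if i["type"] == "folder"]
    files = [i for i in items if i["type"] == "file"]
    folders.sort(key=lambda i: i["name"].lower(), reverse=descending)
    files.sort(key=lambda i: (i["name"].lower(), i["id"]), reverse=descending)
    return folders + files
```

A total order matters here because cursor pagination needs every item to have exactly one position in the listing.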

Implementation:
- DatasetVersionTreeService (new package edu.harvard.iq.dataverse.
  datasetversiontree): walks DatasetVersion.fileMetadatas once,
  groups files by their first segment relative to the requested
  path, applies include/order, paginates in memory with an opaque
  Base64 "offset=N" cursor. Wire format and cursor behaviour are
  stable; promotion to native keyset SQL is tracked as a follow-up
  and won't change the contract.
- Datasets.getVersionTree handler + jsonTreePage serialiser.
- Bundle.properties keys for invalid-query / not-found errors.
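
The in-memory approach of this commit (group by first path segment, then page with a Base64 "offset=N" cursor) can be sketched as below. It is an illustration rather than the service code: files are represented as plain path strings, which is an assumption, and all function names are hypothetical.

```python
import base64
from typing import List, Optional, Sequence, Tuple


def immediate_children(file_paths: Sequence[str], folder: str) -> Tuple[List[str], List[str]]:
    """Split everything under `folder` into immediate subfolders and direct files."""
    prefix = folder + "/" if folder else ""
    folders, files = set(), []
    for path in file_paths:
        if not path.startswith(prefix):
            continue
        head, sep, _ = path[len(prefix):].partition("/")
        if sep:
            folders.add(head)   # deeper file: only its first segment is listed
        else:
            files.append(head)  # file directly inside the folder
    return sorted(folders, key=str.lower), sorted(files, key=str.lower)


def encode_offset(offset: int) -> str:
    return base64.b64encode(f"offset={offset}".encode("ascii")).decode("ascii")


def decode_offset(token: str) -> int:
    text = base64.b64decode(token).decode("ascii")
    if not text.startswith("offset="):
        raise ValueError("invalid cursor")
    return int(text[len("offset="):])


def page(items: Sequence[str], limit: int, cursor: Optional[str] = None):
    """Return one slice of `items` and the cursor for the next slice, if any."""
    start = decode_offset(cursor) if cursor else 0
    end = start + limit
    return list(items[start:end]), (encode_offset(end) if end < len(items) else None)
```

Since clients treat the token as opaque, the encoding behind it can later change without altering the wire contract.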

Tests:
- DatasetVersionTreeServiceTest covers root grouping, folder-only
  immediate-children listing, path normalisation
  (/data//sub/// → data/sub), include filter, cursor-paginated
  retrieval, invalid-cursor / invalid-order rejection, originals
  toggle on the downloadUrl, descending order, restricted /
  embargoed access strings, and folder-counts semantics.
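
Two of the input rules covered here, path normalisation and the limit clamp from the query parameters, are simple enough to restate as a sketch. The function names are hypothetical, and how the real handler treats non-numeric limits is not shown.

```python
from typing import Optional

DEFAULT_LIMIT = 100
MIN_LIMIT, MAX_LIMIT = 1, 1000


def normalise_path(path: str) -> str:
    """Drop leading, trailing and repeated slashes: /data//sub/// -> data/sub."""
    return "/".join(segment for segment in path.split("/") if segment)


def clamp_limit(limit: Optional[int]) -> int:
    """Apply the default of 100 and clamp the value into 1..1000."""
    if limit is None:
        return DEFAULT_LIMIT
    return max(MIN_LIMIT, min(MAX_LIMIT, limit))
```

Under this sketch the root folder normalises to the empty string.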

Sphinx native-api.rst gains a "List a Folder of a Dataset Version
(Tree View)" section. Release-notes snippet at
doc/release-notes/6691-dataset-version-tree-listing-api.md.

End-to-end coverage of the new dataset-version tree endpoint, run
against a live container in CI. Complements the unit-level
DatasetVersionTreeServiceTest which only exercises the service bean.

Tests:
- root listing returns immediate children, folders first, with the
  expected counts {files, folders} on each folder item.
- folder listing returns only immediate children.
- path normalisation (/data//sub///) → "data/sub".
- cursor pagination is stable and exhausts cleanly.
- invalid cursor → 400.
- invalid order → 400.
- include filter restricts items to folders or files.
- descending order keeps folders-first but reverses the within-type
  sort.
- originals=true switches the file downloadUrl to ?format=original.
- unauthenticated access to a draft → 401/403.
- another authenticated user without permission → 404 (Dataverse's
  standard "draft not visible" behaviour, not 403).
- empty dataset → empty items list with approximateCount=0.
- a published dataset is readable via :latest.

UtilIT gains a getVersionTree helper that mirrors the existing
getVersionFiles helper.

For published, non-deaccessioned versions, the response now carries:

  ETag:          "<sha256-prefix>"
  Cache-Control: public, immutable

The ETag is derived from a stable hash of (version id, version
state, path, limit, cursor, include, order, originals,
includeDeaccessioned). Subsequent requests including a matching
If-None-Match header receive 304 Not Modified with no body.
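
A hedged sketch of how such an ETag could be derived and checked follows. The separator, the 16-character prefix length and the function names are assumptions, not the handler's actual choices, and the comparison ignores `If-None-Match` lists, `*` and weak validators.

```python
import hashlib
from typing import Optional


def tree_etag(version_id: int, state: str, path: str, limit: int,
              cursor: Optional[str], include: str, order: str,
              originals: bool, include_deaccessioned: bool) -> str:
    """Hash every input that can change the response body into a quoted ETag."""
    parts = [version_id, state, path, limit, cursor or "", include, order,
             originals, include_deaccessioned]
    # A separator that cannot appear in the values keeps the encoding unambiguous.
    canonical = "\x1f".join(str(p) for p in parts)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return '"' + digest[:16] + '"'


def status_for(etag: str, if_none_match: Optional[str]) -> int:
    """Return 304 when the client's validator matches, else 200."""
    return 304 if if_none_match == etag else 200
```

Including the query parameters in the hash is what makes two different pages of the same version carry different ETags.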

Drafts and deaccessioned versions do not emit an ETag because their
content can change in place. The published-version assumption holds
because Dataverse versions are immutable once released; deaccession
is the only state change, and we exclude it explicitly.

Doc + release-notes updates describe the caching contract.

DatasetsTreeIT gains two tests:
- draft response must NOT carry an ETag
- published response carries ETag + Cache-Control, honours
  If-None-Match (returning 304), and changes the ETag on
  different query params.

Sphinx guide and the per-issue release-notes snippet now mention the
ETag / Cache-Control / If-None-Match contract added in the previous
commit. The behaviour itself is unchanged.

…#12179)

Mirrors the existing react-uploader pattern: a JVM feature flag
controls whether the JSF page renders the React reusable component
or the classic PrimeFaces widget.

- New feature flag dataverse.feature.react-tree-view in
  FeatureFlags.java + SystemConfig.isReactTreeViewEnabled().
- filesFragment.xhtml: when the flag is on AND the user selects the
  Tree mode of the existing Table/Tree toggle, the page renders
  <div id="dv-tree-view"> + a window.dvTreeViewConfig snippet + a
  module script tag pointing at #{systemConfig.reusableComponentsBaseUrl}
  /reusable-components/dv-tree-view.js. Otherwise the existing
  p:tree continues to render unchanged.
- Sphinx config.rst documents the new flag next to react-uploader
  and links to the operator guide.
- container/running/reusable-components.rst notes both shipped
  components share the same build/distribution.
- 6691-reusable-frontend-components.md release-notes file gains a
  bullet for the tree-view flag.

The React bundle is built by the dataverse-frontend
build-uploader script (vite.config.uploader.ts) and ships
alongside dv-uploader.js with shared chunks.

This satisfies #12179 (direct JS mount in JSF for tree view).

Replaces the 'in development' tree-view note with the shipped surface
(JSF mount path, config interface, backend endpoint, ETag, streaming
zip) and updates the greenfield-pattern paragraph to reflect that the
tree view has landed.

Drops the last two FQN references in the new tree handler's ETag
helper. Cosmetic; matches prevailing style in the file.

github-actions bot added labels on May 5, 2026:

  • Component: JSF. Involves modifying JSF (Jakarta Server Faces) code, which is being replaced with React.
  • Feature: File Upload & Handling
  • Type: Feature. A feature request.
  • Type: Suggestion. An idea.
  • User Role: Depositor. Creates datasets, uploads data, etc.
  • User Role: Guest. Anyone using the system, even without an account.
ErykKul marked this pull request as draft May 5, 2026 16:33
Title 'List a Folder of a Dataset Version (Tree View)' is 46
characters; underline was 45. Sphinx 7.x treats this as a build error
('Warning, treated as error: Title underline too short.') under the
docs / readthedocs CI. One extra tilde fixes it.
coveralls commented May 5, 2026

Coverage Status

Coverage is 24.881% for 6691_reusable_components into develop. No base build found for develop.


This lands the manual one-shot variant of the §6 distribution
mechanism: the prebuilt `dist-uploader/` from `dataverse-frontend`
is committed into `src/main/webapp/dvwebloader/` so the WAR carries
it and Payara serves it at `/dvwebloader/...` out of the box. Same
end result as the planned A1-extract / A1-build Maven steps in §11.5
of the cross-repo plan, just done by hand for the local end-to-end
test pass. Replacing this with a `docker-maven-plugin` execution
that pulls (A1-extract) or builds (A1-build) the bundle stays a
pre-merge follow-up before the PR opens for review.

What's in `dvwebloader/`:
* `reusable-components/{dv-uploader.js, dv-tree-view.js}` — entry
  bundles, plus shared chunks (`react`, `i18n`, `vendor`,
  `dataverse-shared`).
* `locales/{en,es}/{files.json, shared.json}` — only the
  namespaces the standalone wrappers actually load
  (`ns: ['files', 'shared']` for tree-view, `ns: ['shared']` for
  uploader). The 22 other namespaces under `public/locales/` were
  copied during build but are dead weight at runtime, so they're
  trimmed here.

Compose dev-env feature flags:
* `DATAVERSE_FEATURE_REACT_TREE_VIEW=1` — turns on the JSF tree
  mount.
* `DATAVERSE_FEATURE_REACT_UPLOADER=1` — turns on the JSF uploader
  mount.
* `DATAVERSE_FEATURE_API_SESSION_AUTH=1` and
  `DATAVERSE_FEATURE_API_SESSION_AUTH_HARDENING=1` — the React
  bundles authenticate via JSESSIONID; without these flags the API
  rejects cookie-bearing requests as `:guest`. The frontend
  `dev-env/docker-compose-dev.yml` already sets both; mirroring
  here so the dataverse-side dev compose is independently usable
  for the JSF mount tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…eaks not related to reusable components
Comment thread modules/container-configbaker/scripts/apply-db-settings.sh
Comment thread modules/container-configbaker/scripts/apply-db-settings.sh

ErykKul marked this pull request as ready for review May 9, 2026 22:26

@github-actions

📦 Pushed preview images as

ghcr.io/gdcc/dataverse:6691-reusable-components
ghcr.io/gdcc/configbaker:6691-reusable-components

🚢 See on GHCR. Use by referencing with full name as printed above, mind the registry name.


Development

Successfully merging this pull request may close these issues.

  • Feature Request: Feature-Flagged Direct JSF Mount for SPA Tree View (with iframe Fallback)
  • Allow selecting of files in Tree View to Edit or Download

2 participants