@pnedelko pnedelko commented Jan 7, 2026

Description

Fixes #4252

This PR adds support for storing pre-aggregations in subdirectories within an export bucket for the BigQuery driver. This allows multi-tenant setups where each tenant has an isolated folder in a shared bucket.

Changes

  • Parse exportBucket URL to extract bucket name and path using BaseDriver's parseBucketUrl() method
  • Support paths in exportBucket configuration (e.g., gs://bucket/tenant-1)
  • Use extracted path as prefix when creating and listing unload files
  • Maintain full backward compatibility with existing configurations

Implementation Details

Modified Files

  • packages/cubejs-bigquery-driver/src/BigQueryDriver.ts
    • Constructor (lines 146-150): Parse bucket URL to extract bucket name without path component
    • unload() method (lines 328-360): Extract path from bucket URL and use as prefix for file operations

Code Changes

The implementation leverages BaseDriver's existing parseBucketUrl() method (inherited from @cubejs-backend/base-driver) to parse bucket URLs and extract path components. This approach is consistent with how Snowflake and Databricks drivers handle bucket paths.
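
To make the bucket/path split concrete, here is a minimal sketch of the kind of parsing described above. It is not BaseDriver's parseBucketUrl(): the helper name splitBucketUrl and the BucketLocation type are hypothetical, and the real method may normalize input differently.

```typescript
// Hypothetical helper, NOT BaseDriver's parseBucketUrl(). It only illustrates
// splitting an exportBucket value into a bucket name and an optional path.
interface BucketLocation {
  bucketName: string;
  path: string;
}

function splitBucketUrl(input: string): BucketLocation {
  // Drop an optional scheme such as "gs://".
  const withoutScheme = input.trim().replace(/^[a-zA-Z][a-zA-Z0-9+.-]*:\/\//, '');
  // Drop leading and trailing slashes so "gs://bucket/tenant-1/" has no empty segments.
  const normalized = withoutScheme.replace(/^\/+|\/+$/g, '');
  const segments = normalized.split('/');
  // The first segment is the bucket; everything after it is the path.
  return {
    bucketName: segments[0] ?? '',
    path: segments.slice(1).join('/'),
  };
}

console.log(splitBucketUrl('my-bucket'));
console.log(splitBucketUrl('gs://my-bucket/data/exports/tenant-1'));
```

Under this sketch, a value without a path yields an empty path string, which is the case the driver treats as "files at bucket root".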

Constructor change:

// Before
this.bucket = this.storage.bucket(this.options.exportBucket);

// After
const { bucketName } = this.parseBucketUrl(this.options.exportBucket);
this.bucket = this.storage.bucket(bucketName);

unload() method change:

// Extract path and build prefix
const { path } = this.parseBucketUrl(this.options.exportBucket);
const exportPrefix = path ? `${path}/${table}` : table;

// Use exportPrefix instead of table directly
const destination = this.bucket.file(`${exportPrefix}-*.csv.gz`);
const [files] = await this.bucket.getFiles({ prefix: `${exportPrefix}-` });
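
The prefix logic above can be exercised in isolation. The sketch below assumes a hypothetical helper named buildExportPrefix; it mirrors the ternary in the snippet and additionally trims stray slashes, which the PR's code may or may not need depending on what parseBucketUrl() returns.

```typescript
// Hypothetical helper mirroring the prefix logic quoted above.
function buildExportPrefix(path: string, table: string): string {
  // Trim stray slashes so 'tenant-1/' does not produce 'tenant-1//schema.table'.
  const cleanPath = path.replace(/^\/+|\/+$/g, '');
  // An empty path keeps the previous behavior: files land at the bucket root.
  return cleanPath ? `${cleanPath}/${table}` : table;
}

const exportPrefix = buildExportPrefix('tenant-1', 'schema.table');
// Wildcard object name used as the export destination.
console.log(`${exportPrefix}-*.csv.gz`);
// Prefix used to list the exported files afterwards.
console.log(`${exportPrefix}-`);
```

Using the same exportPrefix for both the destination and the listing keeps the two calls in agreement, so files written under a tenant folder are the ones found afterwards.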

Configuration Examples

// Backward compatible - files at bucket root
exportBucket: 'my-bucket' // or 'gs://my-bucket'
// Result: gs://my-bucket/schema.table-*.csv.gz

// Single-level path - tenant isolation
exportBucket: 'gs://my-bucket/tenant-1'
// Result: gs://my-bucket/tenant-1/schema.table-*.csv.gz

// Multi-level path
exportBucket: 'gs://my-bucket/data/exports/tenant-1'
// Result: gs://my-bucket/data/exports/tenant-1/schema.table-*.csv.gz

Use Case: Multi-Tenant with IAM Isolation

This feature enables a shared bucket architecture where:

  • Each tenant has a dedicated subdirectory (e.g., gs://shared-bucket/tenant-1/)
  • IAM permissions can be scoped to specific paths
  • Infrastructure overhead is lower than provisioning a separate bucket per tenant
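
One way to scope permissions to a tenant path is an IAM Condition on the object resource name. The sketch below is an assumption about how a deployment might do this, not part of the PR: the helper name tenantObjectCondition is hypothetical, and it only builds the condition expression string. IAM Conditions on Cloud Storage require uniform bucket-level access, and listing objects may need an additional condition, so check the Cloud Storage documentation before relying on it.

```typescript
// Hypothetical helper: builds a CEL expression for an IAM Condition that
// matches objects under a single tenant's folder in a shared bucket.
function tenantObjectCondition(bucketName: string, tenantPath: string): string {
  // Trim stray slashes, then add exactly one trailing slash so that
  // 'tenant-1' does not also match 'tenant-10'.
  const cleanPath = tenantPath.replace(/^\/+|\/+$/g, '');
  return `resource.name.startsWith("projects/_/buckets/${bucketName}/objects/${cleanPath}/")`;
}

console.log(tenantObjectCondition('shared-bucket', 'tenant-1'));
```

The resulting expression would be attached to a role binding for the tenant's service account; the role and member are deployment-specific and are left out here.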

Testing

The existing integration tests (testUnload, testUnloadEscapeSymbol) already cover the unload functionality and will validate this change with real GCS buckets in the CI environment.

Backward Compatibility

Fully backward compatible - existing configurations without paths continue to work exactly as before. The change is purely additive.


🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com

@pnedelko pnedelko requested a review from a team as a code owner January 7, 2026 16:28

@github-actions github-actions bot added the driver:bigquery, javascript, data source driver, and pr:community labels Jan 7, 2026

The parseBucketUrl tests were testing BaseDriver's implementation,
not BigQueryDriver-specific functionality. The existing integration
tests already cover the unload functionality adequately.

Add documentation for the new subdirectory feature in CUBEJS_DB_EXPORT_BUCKET,
including:
- Updated environment variable description in the table
- New section explaining subdirectory configuration
- Examples for single-level and multi-level paths
- Multi-tenant use case with IAM permissions

Member

@KSDaemon KSDaemon left a comment

👍🏻 LGTM!

Author

pnedelko commented Jan 7, 2026

Works on my PC 😉

Development

Successfully merging this pull request may close these issues.

Store pre-aggregations in a directory within an export bucket with the BigQuery driver
