[FSTORE-2021] Add support for SAP HANA as a Data Source #571
Open
jimdowling wants to merge 3 commits into logicalclocks:main from
Conversation
Adds a SAP HANA creation page modeled on the Snowflake guide, registers it under Cloud Agnostic in the data source index, links it into mkdocs.yml under Configuration and Creation, and extends the dltHub ingestion page's supported-source list to mention SAP HANA in the SQL family. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Match the new Java/Python/UI default. 39015 is the SQL port for the first tenant DB on a default multi-tenant or HANA Express install (instance 90); 30015 is documented as the alternative for a non-tenant single-host install (instance 00). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a Type mapping table that lists the source HANA type → Hopsworks offline feature type, so users can predict the shape of their feature group up front (DECIMAL(p,s) preserves precision/scale, SMALLINT/TINYINT distinct from INT, BOOLEAN/REAL mapped, etc.).

Add a Known limitations section calling out the two practical traps discovered while bringing up the integration:

- Tables under the SYSTEM schema do not reflect cleanly through the sqlalchemy-hana DLT path. Recommend creating a regular user schema (HOPSDEMO etc.) and renaming source tables there.
- DLT online ingestion validates non-null primary keys. If the source can hold NULLs in the chosen PK column, filter them out, pick a different PK, or disable online serving for the feature group.

Move the existing Authentication admonition from the prerequisites list into Known limitations so all caveats live together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
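The NULL-primary-key trap above can be handled before ingestion. A minimal sketch with pandas, assuming a hypothetical frame read from a HANA table whose `customer_id` column (an assumed PK name, not one from this PR) may contain NULLs:

```python
import pandas as pd

# Hypothetical data read from a HANA source table; "customer_id" is an
# assumed primary-key column that may contain NULLs.
df = pd.DataFrame({
    "customer_id": [1, 2, None, 4],
    "amount": [10.0, 20.5, 5.0, 7.5],
})

# DLT online ingestion validates non-null primary keys, so drop the
# offending rows before writing the feature group (the alternatives are
# picking a different PK or disabling online serving).
clean = df[df["customer_id"].notna()].copy()
clean["customer_id"] = clean["customer_id"].astype(int)

print(len(clean))  # 3
```

The same filter could equally be pushed down to HANA as a `WHERE customer_id IS NOT NULL` predicate, which avoids transferring the invalid rows at all.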
Contributor
Pull request overview
This PR extends the Hopsworks documentation site with a new SAP HANA Data Source guide and wires it into the existing navigation and ingestion-source documentation.
Changes:
- Added a new “SAP HANA” Data Source creation guide.
- Updated Data Source index and mkdocs navigation to include the new SAP HANA page.
- Updated the dltHub ingestion guide to list SAP HANA among supported SQL-family sources.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| mkdocs.yml | Adds the SAP HANA page to the Data Source creation nav. |
| docs/user_guides/fs/feature_group/ingest_with_dlthub.md | Documents SAP HANA as part of the supported SQL-family ingestion sources. |
| docs/user_guides/fs/data_source/index.md | Adds SAP HANA to the “Cloud Agnostic” Data Source list. |
| docs/user_guides/fs/data_source/creation/sap_hana.md | New how-to page describing SAP HANA Data Source prerequisites, UI setup, and limitations. |
Comment on lines +1 to +7

```
# How-To set up a SAP HANA Data Source

## Introduction

SAP HANA is an in-memory relational database used by many enterprises as the system of record for ERP, CRM, and analytics workloads.

A SAP HANA Data Source in Hopsworks stores the connection details required to read tables and views from a HANA tenant database.
```
Comment on lines +23 to +27

```
The default is `39015`, the SQL port for the first tenant database on a default
multi-tenant or HANA Express (HXE) install (instance number 90).
For a non-tenant single-host install (instance 00) use `30015`.
SAP HANA Cloud typically uses `443`.
Consult your DBA if you are unsure.
```
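The port convention quoted above follows the `3<instance>15` pattern, which can be sketched as a small helper. This is an illustrative function, not part of the Hopsworks codebase, and it only covers the two cases the docs cite; verify against your actual landscape (SAP HANA Cloud uses `443` instead):

```python
def hana_sql_port(instance: int) -> int:
    """SQL port for a HANA instance following the 3<instance>15 convention.

    Instance 90 (default multi-tenant / HANA Express) -> 39015 for the
    first tenant DB; instance 00 (non-tenant single-host) -> 30015.
    """
    if not 0 <= instance <= 99:
        raise ValueError("instance number must be between 00 and 99")
    return 30000 + instance * 100 + 15

print(hana_sql_port(90))  # 39015
print(hana_sql_port(0))   # 30015
```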
Comment on lines +34 to +36

```
Use this when your SAP HANA system hosts more than one tenant database and you need to target a specific one.
- **Schema**: The default schema applied to unqualified queries on the connection.
  If you leave this empty, queries must fully qualify table names with the schema prefix.
- **Table**: The default table the connector points at when no SQL query is provided.
- **Application**: A short identifier surfaced in HANA's session tracing (`APPLICATION` session variable).
  This makes it easier to attribute load to Hopsworks in HANA monitoring tools.
```

```
05. Provide the **Password** for that user.
06. Optionally fill in **Database**, **Schema**, **Table**, and **Application**.
07. Optionally add additional key/value arguments.
    These are forwarded both to the Python driver used by the on-demand read path and to the Spark JDBC reader used by notebook jobs.
```
```
## Use it as an ingestion source

Once the SAP HANA data source exists, you can also use it with the dltHub-based ingestion workflow described in [Ingest Data with dltHub](../../feature_group/ingest_with_dlthub.md).

## Next Steps

Move on to the [usage guide for data sources](../usage.md) to see how you can use your newly created SAP HANA connector.
```
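For the dltHub path, the sqlalchemy-hana dialect expects a SQLAlchemy-style `hana://` connection URL. A minimal, hedged sketch of building one (host, user, and password here are placeholders, and the dlt call in the comment is an assumption about how such a URL would typically be consumed, not code from this PR):

```python
from urllib.parse import quote_plus

def hana_connection_url(user: str, password: str, host: str, port: int = 39015) -> str:
    # "hana" is the sqlalchemy-hana dialect name; user and password are
    # URL-escaped in case they contain reserved characters like "/".
    return f"hana://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}"

url = hana_connection_url("HOPSDEMO", "s3cret/pwd", "hana.example.com")
print(url)  # hana://HOPSDEMO:s3cret%2Fpwd@hana.example.com:39015

# A dlt sql_database source would then take this URL roughly as:
#   source = sql_database(credentials=url, schema="HOPSDEMO")
```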
```
4. [HopsFS](creation/hopsfs.md): Easily connect and read from directories of Hopsworks' internal File System.
5. [CRM, Sales & Analytics](creation/crm_sales_analytics.md): Connect to supported CRM, sales, and analytics platforms.
6. [REST API](creation/rest_api.md): Connect to external HTTP APIs with configurable headers and authentication.
7. [SAP HANA](creation/sap_hana.md): Query SAP HANA tenant databases using SQL.
```
Comment on lines +103 to +104

```
Place tables you intend to ingest or expose as feature groups in a regular user schema (for example a project-specific `MYAPP` or `HOPSDEMO`).
Tables created under the system-owned `SYSTEM` schema do not reflect cleanly through the SQLAlchemy HANA dialect that powers DLT ingestion.
```
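The user-schema workaround can be sketched in HANA SQL. `HOPSDEMO` is the example schema name from the commit message; the table name `SALES` is a hypothetical placeholder:

```sql
-- Create a regular user-owned schema to hold the tables you plan to ingest.
CREATE SCHEMA HOPSDEMO;

-- Move a source table out of the system-owned SYSTEM schema so the
-- sqlalchemy-hana dialect used by DLT ingestion can reflect it.
RENAME TABLE SYSTEM.SALES TO HOPSDEMO.SALES;
```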
🤖 Generated with Claude Code