From 7b3b94c4e8755b2d50d513a693dbf14f86caa4f9 Mon Sep 17 00:00:00 2001 From: Jim Dowling Date: Mon, 4 May 2026 11:04:56 +0200 Subject: [PATCH 1/5] [FSTORE-2050] Document SAP HANA data source Adds a SAP HANA creation page modeled on the Snowflake guide, registers it under Cloud Agnostic in the data source index, links it into mkdocs.yml under Configuration and Creation, and extends the dltHub ingestion page's supported-source list to mention SAP HANA in the SQL family. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../fs/data_source/creation/sap_hana.md | 82 +++++++++++++++++++ docs/user_guides/fs/data_source/index.md | 1 + .../fs/feature_group/ingest_with_dlthub.md | 1 + mkdocs.yml | 1 + 4 files changed, 85 insertions(+) create mode 100644 docs/user_guides/fs/data_source/creation/sap_hana.md diff --git a/docs/user_guides/fs/data_source/creation/sap_hana.md b/docs/user_guides/fs/data_source/creation/sap_hana.md new file mode 100644 index 0000000000..7072988740 --- /dev/null +++ b/docs/user_guides/fs/data_source/creation/sap_hana.md @@ -0,0 +1,82 @@ +# How-To set up a SAP HANA Data Source + +## Introduction + +SAP HANA is an in-memory relational database used by many enterprises as the system of record for ERP, CRM, and analytics workloads. + +A SAP HANA Data Source in Hopsworks stores the connection details required to read tables and views from a HANA tenant database. +Once configured, you can use the same data source as the basis for an external (on-demand) Feature Group, or as the source for a dltHub-driven ingestion job that materialises HANA data into a managed Feature Group. + +In this guide, you will configure a Data Source in Hopsworks that holds the authentication information needed to connect to your SAP HANA database. + +!!! note + Currently, it is only possible to create data sources in the Hopsworks UI. + You cannot create a data source programmatically. + +## Prerequisites + +Before you begin this guide you'll need to retrieve the following information from your SAP HANA tenant. +The following options are **mandatory**: + +- **Host**: The hostname of the SAP HANA endpoint, for example `hxehost.example.com` for an on-premise instance or the endpoint shown in SAP BTP for SAP HANA Cloud. +- **Port**: The SQL port of the tenant database. +The default is `30015` for the first tenant on a single-host system. +SAP HANA Cloud typically uses `443`. +Consult your DBA if you are unsure. +- **User**: The HANA database user that the connector authenticates as. +- **Password**: The password for that user. + +These are a few additional **optional** arguments: + +- **Database**: The tenant database name. +Use this when your SAP HANA system hosts more than one tenant database and you need to target a specific one. +- **Schema**: The default schema applied to unqualified queries on the connection. +If you leave this empty, queries must fully qualify table names with the schema prefix. +- **Table**: The default table the connector points at when no SQL query is provided. +- **Application**: A short identifier surfaced in HANA's session tracing (`APPLICATION` session variable). +This makes it easier to attribute load to Hopsworks in HANA monitoring tools. +- **Additional arguments**: Free-form key/value options forwarded to the underlying SAP HANA Python driver (`hdbcli`) and the Spark JDBC reader. + +!!! note "Authentication" + The SAP HANA data source currently supports username and password authentication. + Certificate-based and JWT authentication are tracked as follow-up work. + +!!! 
info "Drivers" + Hopsworks ships the SAP HANA drivers needed to read from HANA out of the box. + The Hopsworks Spark image bundles the SAP `ngdbc` JDBC driver for Spark JDBC reads, and the dlt ingestion image and Arrow Flight server bundle SAP's `hdbcli` Python DBAPI driver. + You do not need to install or upload the drivers yourself. + +## Creation in the UI + +### Step 1: Set up new Data Source + +Head to the Data Source View on Hopsworks and start the creation flow for a new data source. + +
+ ![Data Source Creation](../../../../assets/images/guides/fs/data_source/data_source_overview.png) +
The Data Source View in the User Interface
+
+ +### Step 2: Enter SAP HANA Settings + +Enter the details for your SAP HANA connector. +Start by giving it a **name** and an optional **description**. + +01. Select "SAP HANA" as storage. +02. Specify the **Host** of your SAP HANA endpoint. +03. Specify the **Port** the tenant SQL service listens on (default `30015`). +04. Provide the **User** name of the HANA database user. +05. Provide the **Password** for that user. +06. Optionally fill in **Database**, **Schema**, **Table**, and **Application**. +07. Optionally add additional key/value arguments. +These are forwarded both to the Python driver used by the on-demand read path and to the Spark JDBC reader used by notebook jobs. +08. Click on "Save Credentials". + +## Use it as an ingestion source + +Once the SAP HANA data source exists, you can also use it with the dltHub-based ingestion workflow described in [Ingest Data with dltHub](../../feature_group/ingest_with_dlthub.md). +SAP HANA is treated as a SQL-like source, so the ingestion job supports both full and incremental loading. + +## Next Steps + +Move on to the [usage guide for data sources](../usage.md) to see how you can use your newly created SAP HANA connector. diff --git a/docs/user_guides/fs/data_source/index.md b/docs/user_guides/fs/data_source/index.md index dda3a29711..061a78b008 100644 --- a/docs/user_guides/fs/data_source/index.md +++ b/docs/user_guides/fs/data_source/index.md @@ -38,6 +38,7 @@ Cloud agnostic storage systems: 4. [HopsFS](creation/hopsfs.md): Easily connect and read from directories of Hopsworks' internal File System. 5. [CRM, Sales & Analytics](creation/crm_sales_analytics.md): Connect to supported CRM, sales, and analytics platforms. 6. [REST API](creation/rest_api.md): Connect to external HTTP APIs with configurable headers and authentication. +7. [SAP HANA](creation/sap_hana.md): Query SAP HANA tenant databases using SQL. ## AWS diff --git a/docs/user_guides/fs/feature_group/ingest_with_dlthub.md b/docs/user_guides/fs/feature_group/ingest_with_dlthub.md index 24bff8edd6..dc9f29fd59 100644 --- a/docs/user_guides/fs/feature_group/ingest_with_dlthub.md +++ b/docs/user_guides/fs/feature_group/ingest_with_dlthub.md @@ -31,6 +31,7 @@ Use `Ingest Data to New Feature Group` when you want to: This ingestion flow supports multiple data sources: - SQL-like sources can either create an external feature group or ingest data into a new feature group. +- The SQL family currently includes Snowflake, BigQuery, Redshift, generic JDBC (MySQL, PostgreSQL, Oracle), and SAP HANA. - CRM and REST API sources use the ingestion path only. - Incremental loading is available for SQL and REST API sources. - CRM sources currently use full-load ingestion. diff --git a/mkdocs.yml b/mkdocs.yml index 652e0f461e..6d90744fbe 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -74,6 +74,7 @@ nav: - CRM, Sales & Analytics: user_guides/fs/data_source/creation/crm_sales_analytics.md - REST API: user_guides/fs/data_source/creation/rest_api.md - Unity Catalog: user_guides/fs/data_source/creation/unity_catalog.md + - SAP HANA: user_guides/fs/data_source/creation/sap_hana.md - Usage: user_guides/fs/data_source/usage.md - Feature Group: - user_guides/fs/feature_group/index.md From 5203af8e40b48c3e2d023c491b0bdbffc27df39a Mon Sep 17 00:00:00 2001 From: Jim Dowling Date: Tue, 5 May 2026 08:43:49 +0200 Subject: [PATCH 2/5] [FSTORE-2050] Document default SAP HANA port as 39015 Match the new Java/Python/UI default. 
39015 is the SQL port for the first tenant DB on a default multi-tenant or HANA Express install (instance 90); 30015 is documented as the alternative for a non-tenant single-host install (instance 00). Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/user_guides/fs/data_source/creation/sap_hana.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/user_guides/fs/data_source/creation/sap_hana.md b/docs/user_guides/fs/data_source/creation/sap_hana.md index 7072988740..fe6cecfd80 100644 --- a/docs/user_guides/fs/data_source/creation/sap_hana.md +++ b/docs/user_guides/fs/data_source/creation/sap_hana.md @@ -20,7 +20,9 @@ The following options are **mandatory**: - **Host**: The hostname of the SAP HANA endpoint, for example `hxehost.example.com` for an on-premise instance or the endpoint shown in SAP BTP for SAP HANA Cloud. - **Port**: The SQL port of the tenant database. -The default is `30015` for the first tenant on a single-host system. +The default is `39015`, the SQL port for the first tenant database on a default +multi-tenant or HANA Express (HXE) install (instance number 90). +For a non-tenant single-host install (instance 00) use `30015`. SAP HANA Cloud typically uses `443`. Consult your DBA if you are unsure. - **User**: The HANA database user that the connector authenticates as. @@ -64,7 +66,7 @@ Start by giving it a **name** and an optional **description**. 01. Select "SAP HANA" as storage. 02. Specify the **Host** of your SAP HANA endpoint. -03. Specify the **Port** the tenant SQL service listens on (default `30015`). +03. Specify the **Port** the tenant SQL service listens on (default `39015`). 04. Provide the **User** name of the HANA database user. 05. Provide the **Password** for that user. 06. Optionally fill in **Database**, **Schema**, **Table**, and **Application**. From a4db5428a5f86d96e7b541f588fd3ea34c32e8c1 Mon Sep 17 00:00:00 2001 From: Jim Dowling Date: Wed, 6 May 2026 11:29:33 +0200 Subject: [PATCH 3/5] [FSTORE-2050] Document SAP HANA type mapping + known limitations MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a Type mapping table that lists the source HANA type → Hopsworks offline feature type Hopsworks now produces, so users can predict the shape of their feature group up front (DECIMAL(p,s) preserves precision/scale, SMALLINT/TINYINT distinct from INT, BOOLEAN/REAL mapped, etc.). Add a Known limitations section calling out the two practical traps discovered while bringing up the integration: - Tables under the SYSTEM schema do not reflect cleanly through the sqlalchemy-hana DLT path. Recommend creating a regular user schema (HOPSDEMO etc.) and renaming source tables there. - DLT online ingestion validates non-null primary keys. If the source can hold NULLs in the chosen PK column, filter them out, pick a different PK, or disable online serving for the feature group. Move the existing Authentication admonition from the prerequisites list into Known limitations so all caveats live together. 
Co-Authored-By: Claude Opus 4.7 (1M context) --- .../fs/data_source/creation/sap_hana.md | 50 +++++++++++++++++-- 1 file changed, 46 insertions(+), 4 deletions(-) diff --git a/docs/user_guides/fs/data_source/creation/sap_hana.md b/docs/user_guides/fs/data_source/creation/sap_hana.md index fe6cecfd80..bf2356214b 100644 --- a/docs/user_guides/fs/data_source/creation/sap_hana.md +++ b/docs/user_guides/fs/data_source/creation/sap_hana.md @@ -39,10 +39,6 @@ If you leave this empty, queries must fully qualify table names with the schema This makes it easier to attribute load to Hopsworks in HANA monitoring tools. - **Additional arguments**: Free-form key/value options forwarded to the underlying SAP HANA Python driver (`hdbcli`) and the Spark JDBC reader. -!!! note "Authentication" - The SAP HANA data source currently supports username and password authentication. - Certificate-based and JWT authentication are tracked as follow-up work. - !!! info "Drivers" Hopsworks ships the SAP HANA drivers needed to read from HANA out of the box. The Hopsworks Spark image bundles the SAP `ngdbc` JDBC driver for Spark JDBC reads, and the dlt ingestion image and Arrow Flight server bundle SAP's `hdbcli` Python DBAPI driver. @@ -79,6 +75,52 @@ These are forwarded both to the Python driver used by the on-demand read path an Once the SAP HANA data source exists, you can also use it with the dltHub-based ingestion workflow described in [Ingest Data with dltHub](../../feature_group/ingest_with_dlthub.md). SAP HANA is treated as a SQL-like source, so the ingestion job supports both full and incremental loading. +## Type mapping + +Hopsworks reads each source column's HANA type from the cursor description and maps it to a Hopsworks offline feature type. +The mapping preserves precision and scale where possible, so a source `DECIMAL(12, 2)` becomes a Hopsworks `decimal(12,2)` feature rather than collapsing to `bigint`. + +| SAP HANA type | Hopsworks offline feature type | +|---------------|-------------------------------| +| `TINYINT` | `tinyint` | +| `SMALLINT` | `smallint` | +| `INTEGER` | `int` | +| `BIGINT` | `bigint` | +| `DECIMAL(p, s)` | `decimal(p,s)` | +| `REAL` | `float` | +| `DOUBLE` | `double` | +| `BOOLEAN` | `boolean` | +| `DATE` | `date` | +| `TIME` | `timestamp` | +| `TIMESTAMP` / `SECONDDATE` / `LONGDATE` | `timestamp` | +| `CHAR` / `VARCHAR` / `NCHAR` / `NVARCHAR` / `TEXT` / `CLOB` / `NCLOB` / `ALPHANUM` | `string` | +| `BINARY` / `VARBINARY` / `BLOB` | `binary` | + +## Known limitations + +### Avoid the `SYSTEM` schema for source tables + +Place tables you intend to ingest or expose as feature groups in a regular user schema (for example a project-specific `MYAPP` or `HOPSDEMO`). +Tables created under the system-owned `SYSTEM` schema do not reflect cleanly through the SQLAlchemy HANA dialect that powers DLT ingestion. +A typical setup is: + +```sql +CREATE SCHEMA HOPSDEMO; +RENAME TABLE SYSTEM.MY_TABLE TO HOPSDEMO.MY_TABLE; +``` + +Then set **Schema** in the data source to `HOPSDEMO` (or pick it from the schema browser) and use that as the basis for any external feature group or DLT ingestion job. + +### Online ingestion requires non-null primary keys + +When you create a managed feature group fed from SAP HANA via DLT and enable online serving, online ingestion validates that every row has a non-null value in the feature group's primary-key column. 
+If the source rows can carry `NULL` in that column, either filter them out at source, pick a different primary key on the feature group, or disable online serving for the feature group. + +### Authentication + +The SAP HANA data source currently supports username and password authentication. +Certificate-based and JWT authentication are tracked as follow-up work. + ## Next Steps Move on to the [usage guide for data sources](../usage.md) to see how you can use your newly created SAP HANA connector. From 42977207f24fcc6c5876a30f0f50d5507c1e680b Mon Sep 17 00:00:00 2001 From: Jim Dowling Date: Wed, 6 May 2026 19:09:56 +0200 Subject: [PATCH 4/5] [FSTORE-2021] Address Copilot review on SAP HANA docs Title and lead use 'an SAP' (vowel sound), bullets indented under their parent items so Markdown renders the continuation in-list, internal links go through stable heading IDs (`[text][heading-id]`) instead of relative file paths so links survive mike versioning, and 'Feature Group' is the consistent product noun. Adds a `data-source-sap-hana` heading id so the data-source index can cross-reference the page. Co-Authored-By: Claude Opus 4.7 (1M context) --- .../fs/data_source/creation/sap_hana.md | 30 +++++++++---------- docs/user_guides/fs/data_source/index.md | 2 +- 2 files changed, 16 insertions(+), 16 deletions(-) diff --git a/docs/user_guides/fs/data_source/creation/sap_hana.md b/docs/user_guides/fs/data_source/creation/sap_hana.md index bf2356214b..1dd643d804 100644 --- a/docs/user_guides/fs/data_source/creation/sap_hana.md +++ b/docs/user_guides/fs/data_source/creation/sap_hana.md @@ -1,10 +1,10 @@ -# How-To set up a SAP HANA Data Source +# How-To set up an SAP HANA Data Source { #data-source-sap-hana } ## Introduction SAP HANA is an in-memory relational database used by many enterprises as the system of record for ERP, CRM, and analytics workloads. -A SAP HANA Data Source in Hopsworks stores the connection details required to read tables and views from a HANA tenant database. +An SAP HANA Data Source in Hopsworks stores the connection details required to read tables and views from a HANA tenant database. Once configured, you can use the same data source as the basis for an external (on-demand) Feature Group, or as the source for a dltHub-driven ingestion job that materialises HANA data into a managed Feature Group. In this guide, you will configure a Data Source in Hopsworks that holds the authentication information needed to connect to your SAP HANA database. @@ -20,23 +20,23 @@ The following options are **mandatory**: - **Host**: The hostname of the SAP HANA endpoint, for example `hxehost.example.com` for an on-premise instance or the endpoint shown in SAP BTP for SAP HANA Cloud. - **Port**: The SQL port of the tenant database. -The default is `39015`, the SQL port for the first tenant database on a default -multi-tenant or HANA Express (HXE) install (instance number 90). -For a non-tenant single-host install (instance 00) use `30015`. -SAP HANA Cloud typically uses `443`. -Consult your DBA if you are unsure. + The default is `39015`, the SQL port for the first tenant database on a default + multi-tenant or HANA Express (HXE) install (instance number 90). + For a non-tenant single-host install (instance 00) use `30015`. + SAP HANA Cloud typically uses `443`. + Consult your DBA if you are unsure. - **User**: The HANA database user that the connector authenticates as. - **Password**: The password for that user. 
These are a few additional **optional** arguments: - **Database**: The tenant database name. -Use this when your SAP HANA system hosts more than one tenant database and you need to target a specific one. + Use this when your SAP HANA system hosts more than one tenant database and you need to target a specific one. - **Schema**: The default schema applied to unqualified queries on the connection. -If you leave this empty, queries must fully qualify table names with the schema prefix. + If you leave this empty, queries must fully qualify table names with the schema prefix. - **Table**: The default table the connector points at when no SQL query is provided. - **Application**: A short identifier surfaced in HANA's session tracing (`APPLICATION` session variable). -This makes it easier to attribute load to Hopsworks in HANA monitoring tools. + This makes it easier to attribute load to Hopsworks in HANA monitoring tools. - **Additional arguments**: Free-form key/value options forwarded to the underlying SAP HANA Python driver (`hdbcli`) and the Spark JDBC reader. !!! info "Drivers" @@ -67,12 +67,12 @@ Start by giving it a **name** and an optional **description**. 05. Provide the **Password** for that user. 06. Optionally fill in **Database**, **Schema**, **Table**, and **Application**. 07. Optionally add additional key/value arguments. -These are forwarded both to the Python driver used by the on-demand read path and to the Spark JDBC reader used by notebook jobs. + These are forwarded both to the Python driver used by the on-demand read path and to the Spark JDBC reader used by notebook jobs. 08. Click on "Save Credentials". ## Use it as an ingestion source -Once the SAP HANA data source exists, you can also use it with the dltHub-based ingestion workflow described in [Ingest Data with dltHub](../../feature_group/ingest_with_dlthub.md). +Once the SAP HANA data source exists, you can also use it with the dltHub-based ingestion workflow described in [Ingest Data with dltHub][ingest-data-with-dlthub]. SAP HANA is treated as a SQL-like source, so the ingestion job supports both full and incremental loading. ## Type mapping @@ -113,8 +113,8 @@ Then set **Schema** in the data source to `HOPSDEMO` (or pick it from the schema ### Online ingestion requires non-null primary keys -When you create a managed feature group fed from SAP HANA via DLT and enable online serving, online ingestion validates that every row has a non-null value in the feature group's primary-key column. -If the source rows can carry `NULL` in that column, either filter them out at source, pick a different primary key on the feature group, or disable online serving for the feature group. +When you create a managed Feature Group fed from SAP HANA via DLT and enable online serving, online ingestion validates that every row has a non-null value in the Feature Group's primary-key column. +If the source rows can carry `NULL` in that column, either filter them out at source, pick a different primary key on the Feature Group, or disable online serving for the Feature Group. ### Authentication @@ -123,4 +123,4 @@ Certificate-based and JWT authentication are tracked as follow-up work. ## Next Steps -Move on to the [usage guide for data sources](../usage.md) to see how you can use your newly created SAP HANA connector. +Move on to the [usage guide for data sources][data-source-usage] to see how you can use your newly created SAP HANA connector. 
diff --git a/docs/user_guides/fs/data_source/index.md b/docs/user_guides/fs/data_source/index.md index 061a78b008..3fdb9deb23 100644 --- a/docs/user_guides/fs/data_source/index.md +++ b/docs/user_guides/fs/data_source/index.md @@ -38,7 +38,7 @@ Cloud agnostic storage systems: 4. [HopsFS](creation/hopsfs.md): Easily connect and read from directories of Hopsworks' internal File System. 5. [CRM, Sales & Analytics](creation/crm_sales_analytics.md): Connect to supported CRM, sales, and analytics platforms. 6. [REST API](creation/rest_api.md): Connect to external HTTP APIs with configurable headers and authentication. -7. [SAP HANA](creation/sap_hana.md): Query SAP HANA tenant databases using SQL. +7. [SAP HANA][data-source-sap-hana]: Query SAP HANA tenant databases using SQL. ## AWS From 6049e2b84c8cb7e906a36cea55053108ebf8e29e Mon Sep 17 00:00:00 2001 From: Jim Dowling Date: Wed, 6 May 2026 20:24:03 +0200 Subject: [PATCH 5/5] [FSTORE-2021] Use spaced pipes in SAP HANA type table MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit markdownlint MD060 rejected the table separator row in the SAP HANA type-mapping table because the dashes were flush against the pipes (`|---|---|`) but the data rows used spaced pipes (`| col | col |`). Switch the separator to `| --- | --- |` to match the rest of the cells — no rendered output change. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/user_guides/fs/data_source/creation/sap_hana.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/user_guides/fs/data_source/creation/sap_hana.md b/docs/user_guides/fs/data_source/creation/sap_hana.md index 1dd643d804..96d0e8a3f0 100644 --- a/docs/user_guides/fs/data_source/creation/sap_hana.md +++ b/docs/user_guides/fs/data_source/creation/sap_hana.md @@ -81,7 +81,7 @@ Hopsworks reads each source column's HANA type from the cursor description and m The mapping preserves precision and scale where possible, so a source `DECIMAL(12, 2)` becomes a Hopsworks `decimal(12,2)` feature rather than collapsing to `bigint`. | SAP HANA type | Hopsworks offline feature type | -|---------------|-------------------------------| +| --- | --- | | `TINYINT` | `tinyint` | | `SMALLINT` | `smallint` | | `INTEGER` | `int` |
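
A quick way to sanity-check the connection details described in the new page before saving them in the Hopsworks data source form is to query HANA's built-in `DUMMY` table with SAP's `hdbcli` driver, the same Python DBAPI driver the guide notes is bundled in the dlt ingestion image. This is a minimal sketch with placeholder host, port, user, and password values, and it assumes `hdbcli` is installed locally (`pip install hdbcli`):

```python
from hdbcli import dbapi

# Placeholder connection details; substitute the values you plan to
# enter in the Hopsworks data source form.
conn = dbapi.connect(
    address="hxehost.example.com",  # Host
    port=39015,                     # Port (SQL port of the tenant database)
    user="HOPSDEMO_USER",           # User
    password="********",            # Password
)

cur = conn.cursor()
# DUMMY is HANA's built-in one-row table, so this verifies both that the
# credentials authenticate and that the SQL port is reachable.
cur.execute("SELECT CURRENT_USER, CURRENT_SCHEMA FROM DUMMY")
print(cur.fetchone())

cur.close()
conn.close()
```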