diff --git a/docs/user_guides/fs/data_source/creation/sap_hana.md b/docs/user_guides/fs/data_source/creation/sap_hana.md
new file mode 100644
index 000000000..96d0e8a3f
--- /dev/null
+++ b/docs/user_guides/fs/data_source/creation/sap_hana.md
@@ -0,0 +1,126 @@
+# How-To set up an SAP HANA Data Source { #data-source-sap-hana }
+
+## Introduction
+
+SAP HANA is an in-memory relational database used by many enterprises as the system of record for ERP, CRM, and analytics workloads.
+
+An SAP HANA Data Source in Hopsworks stores the connection details required to read tables and views from a HANA tenant database.
+Once configured, you can use the same data source as the basis for an external (on-demand) Feature Group, or as the source for a dltHub-driven ingestion job that materialises HANA data into a managed Feature Group.
+
+In this guide, you will configure a Data Source in Hopsworks that holds the authentication information needed to connect to your SAP HANA database.
+
+!!! note
+    Currently, it is only possible to create data sources in the Hopsworks UI.
+    You cannot create a data source programmatically.
+
+## Prerequisites
+
+Before you begin this guide you'll need to retrieve the following information from your SAP HANA tenant.
+The following options are **mandatory**:
+
+- **Host**: The hostname of the SAP HANA endpoint, for example `hxehost.example.com` for an on-premise instance or the endpoint shown in SAP BTP for SAP HANA Cloud.
+- **Port**: The SQL port of the tenant database.
+  The default is `39015`, the SQL port for the first tenant database on a default multi-tenant or HANA Express (HXE) install (instance number 90).
+  For a non-tenant single-host install (instance 00) use `30015`.
+  SAP HANA Cloud typically uses `443`.
+  Consult your DBA if you are unsure.
+- **User**: The HANA database user that the connector authenticates as.
+- **Password**: The password for that user.
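For on-premise installs, the two default ports quoted above both follow SAP's standard port layout, in which the tenant SQL port is derived from the two-digit instance number as `3<NN>15`. A minimal sketch of that rule of thumb (the helper is illustrative, not a Hopsworks or SAP API, and does not apply to SAP HANA Cloud, which uses `443`):

```python
def tenant_sql_port(instance_number: int) -> int:
    """SQL port under SAP's common 3<NN>15 layout for instance NN.

    Covers the defaults above: instance 90 (HXE) -> 39015, and
    instance 00 (single-host install) -> 30015. SAP HANA Cloud
    exposes SQL on 443 instead, so this convention does not apply there.
    """
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must be a two-digit value (00-99)")
    # 3<NN>15: "3", then the zero-padded instance number, then "15".
    return 30000 + instance_number * 100 + 15
```

For example, `tenant_sql_port(90)` gives `39015` and `tenant_sql_port(0)` gives `30015`, matching the defaults listed above.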
+
+These are a few additional **optional** arguments:
+
+- **Database**: The tenant database name.
+  Use this when your SAP HANA system hosts more than one tenant database and you need to target a specific one.
+- **Schema**: The default schema applied to unqualified queries on the connection.
+  If you leave this empty, queries must fully qualify table names with the schema prefix.
+- **Table**: The default table the connector points at when no SQL query is provided.
+- **Application**: A short identifier surfaced in HANA's session tracing (`APPLICATION` session variable).
+  This makes it easier to attribute load to Hopsworks in HANA monitoring tools.
+- **Additional arguments**: Free-form key/value options forwarded to the underlying SAP HANA Python driver (`hdbcli`) and the Spark JDBC reader.
+
+!!! info "Drivers"
+    Hopsworks ships the SAP HANA drivers needed to read from HANA out of the box.
+    The Hopsworks Spark image bundles the SAP `ngdbc` JDBC driver for Spark JDBC reads, and the dlt ingestion image and Arrow Flight server bundle SAP's `hdbcli` Python DBAPI driver.
+    You do not need to install or upload the drivers yourself.
+
+## Creation in the UI
+
+### Step 1: Set up new Data Source
+
+Head to the Data Source View on Hopsworks and start the creation flow for a new data source.
+
+<figure>
+  ![Data Source Creation](../../../../assets/images/guides/fs/data_source/data_source_overview.png)
+  <figcaption>The Data Source View in the User Interface</figcaption>
+</figure>
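Before filling in the form, you can optionally sanity-check the credentials from any machine that can reach the HANA host, using SAP's `hdbcli` Python driver (the same DBAPI driver Hopsworks bundles). This is an illustrative sketch: the host, user, and password values are placeholders, and `hdbcli` must be installed separately (`pip install hdbcli`):

```python
def connect_kwargs(host: str, port: int, user: str, password: str) -> dict:
    """Assemble the keyword arguments that hdbcli's dbapi.connect() takes."""
    return {"address": host, "port": port, "user": user, "password": password}

# Placeholder values -- replace with the details gathered in Prerequisites.
kwargs = connect_kwargs("hxehost.example.com", 39015, "MY_USER", "MY_PASSWORD")

# Uncomment to actually test connectivity against your tenant:
# from hdbcli import dbapi
# conn = dbapi.connect(**kwargs)
# cursor = conn.cursor()
# cursor.execute("SELECT CURRENT_USER FROM DUMMY")
# print(cursor.fetchone())
# conn.close()
```

If the commented-out query returns your user name, the same host, port, user, and password values should work in the Data Source form below.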
+
+### Step 2: Enter SAP HANA Settings
+
+Enter the details for your SAP HANA connector.
+Start by giving it a **name** and an optional **description**.
+
+01. Select "SAP HANA" as storage.
+02. Specify the **Host** of your SAP HANA endpoint.
+03. Specify the **Port** the tenant SQL service listens on (default `39015`).
+04. Provide the **User** name of the HANA database user.
+05. Provide the **Password** for that user.
+06. Optionally fill in **Database**, **Schema**, **Table**, and **Application**.
+07. Optionally add additional key/value arguments.
+    These are forwarded both to the Python driver used by the on-demand read path and to the Spark JDBC reader used by notebook jobs.
+08. Click on "Save Credentials".
+
+## Use it as an ingestion source
+
+Once the SAP HANA data source exists, you can also use it with the dltHub-based ingestion workflow described in [Ingest Data with dltHub][ingest-data-with-dlthub].
+SAP HANA is treated as a SQL-like source, so the ingestion job supports both full and incremental loading.
+
+## Type mapping
+
+Hopsworks reads each source column's HANA type from the cursor description and maps it to a Hopsworks offline feature type.
+The mapping preserves precision and scale where possible, so a source `DECIMAL(12, 2)` becomes a Hopsworks `decimal(12,2)` feature rather than collapsing to `bigint`.
+
+| SAP HANA type | Hopsworks offline feature type |
+| --- | --- |
+| `TINYINT` | `tinyint` |
+| `SMALLINT` | `smallint` |
+| `INTEGER` | `int` |
+| `BIGINT` | `bigint` |
+| `DECIMAL(p, s)` | `decimal(p,s)` |
+| `REAL` | `float` |
+| `DOUBLE` | `double` |
+| `BOOLEAN` | `boolean` |
+| `DATE` | `date` |
+| `TIME` | `timestamp` |
+| `TIMESTAMP` / `SECONDDATE` / `LONGDATE` | `timestamp` |
+| `CHAR` / `VARCHAR` / `NCHAR` / `NVARCHAR` / `TEXT` / `CLOB` / `NCLOB` / `ALPHANUM` | `string` |
+| `BINARY` / `VARBINARY` / `BLOB` | `binary` |
+
+## Known limitations
+
+### Avoid the `SYSTEM` schema for source tables
+
+Place tables you intend to ingest or expose as feature groups in a regular user schema (for example a project-specific `MYAPP` or `HOPSDEMO`).
+Tables created under the system-owned `SYSTEM` schema do not reflect cleanly through the SQLAlchemy HANA dialect that powers DLT ingestion.
+A typical setup is:
+
+```sql
+CREATE SCHEMA HOPSDEMO;
+RENAME TABLE SYSTEM.MY_TABLE TO HOPSDEMO.MY_TABLE;
+```
+
+Then set **Schema** in the data source to `HOPSDEMO` (or pick it from the schema browser) and use that as the basis for any external feature group or DLT ingestion job.
+
+### Online ingestion requires non-null primary keys
+
+When you create a managed Feature Group fed from SAP HANA via DLT and enable online serving, online ingestion validates that every row has a non-null value in the Feature Group's primary-key column.
+If the source rows can carry `NULL` in that column, either filter them out at source, pick a different primary key on the Feature Group, or disable online serving for the Feature Group.
+
+### Authentication
+
+The SAP HANA data source currently supports username and password authentication.
+Certificate-based and JWT authentication are tracked as follow-up work.
+
+## Next Steps
+
+Move on to the [usage guide for data sources][data-source-usage] to see how you can use your newly created SAP HANA connector.
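The mapping in the Type mapping table above, including the precision-preserving `DECIMAL` case, can be sketched as a small lookup. This is an illustrative reimplementation of the documented behaviour, not Hopsworks' actual code, and the function name is an assumption:

```python
import re

# HANA types with a fixed one-to-one mapping (per the type-mapping table).
_SIMPLE_TYPES = {
    "TINYINT": "tinyint", "SMALLINT": "smallint", "INTEGER": "int",
    "BIGINT": "bigint", "REAL": "float", "DOUBLE": "double",
    "BOOLEAN": "boolean", "DATE": "date",
    "TIME": "timestamp", "TIMESTAMP": "timestamp",
    "SECONDDATE": "timestamp", "LONGDATE": "timestamp",
    "CHAR": "string", "VARCHAR": "string", "NCHAR": "string",
    "NVARCHAR": "string", "TEXT": "string", "CLOB": "string",
    "NCLOB": "string", "ALPHANUM": "string",
    "BINARY": "binary", "VARBINARY": "binary", "BLOB": "binary",
}


def hana_to_offline_type(hana_type: str) -> str:
    """Map a HANA column type string to a Hopsworks offline feature type."""
    t = hana_type.strip().upper()
    # DECIMAL(p, s) keeps its precision and scale instead of widening.
    match = re.fullmatch(r"DECIMAL\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)", t)
    if match:
        return f"decimal({match.group(1)},{match.group(2)})"
    # Unlisted types raise KeyError rather than silently guessing.
    return _SIMPLE_TYPES[t]
```

For example, `hana_to_offline_type("DECIMAL(12, 2)")` yields `decimal(12,2)` rather than `bigint`, matching the behaviour described above.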
diff --git a/docs/user_guides/fs/data_source/index.md b/docs/user_guides/fs/data_source/index.md
index dda3a2971..3fdb9deb2 100644
--- a/docs/user_guides/fs/data_source/index.md
+++ b/docs/user_guides/fs/data_source/index.md
@@ -38,6 +38,7 @@ Cloud agnostic storage systems:
 4. [HopsFS](creation/hopsfs.md): Easily connect and read from directories of Hopsworks' internal File System.
 5. [CRM, Sales & Analytics](creation/crm_sales_analytics.md): Connect to supported CRM, sales, and analytics platforms.
 6. [REST API](creation/rest_api.md): Connect to external HTTP APIs with configurable headers and authentication.
+7. [SAP HANA][data-source-sap-hana]: Query SAP HANA tenant databases using SQL.
 
 ## AWS
diff --git a/docs/user_guides/fs/feature_group/ingest_with_dlthub.md b/docs/user_guides/fs/feature_group/ingest_with_dlthub.md
index 24bff8edd..dc9f29fd5 100644
--- a/docs/user_guides/fs/feature_group/ingest_with_dlthub.md
+++ b/docs/user_guides/fs/feature_group/ingest_with_dlthub.md
@@ -31,6 +31,7 @@ Use `Ingest Data to New Feature Group` when you want to:
 This ingestion flow supports multiple data sources:
 
 - SQL-like sources can either create an external feature group or ingest data into a new feature group.
+- The SQL family currently includes Snowflake, BigQuery, Redshift, generic JDBC (MySQL, PostgreSQL, Oracle), and SAP HANA.
 - CRM and REST API sources use the ingestion path only.
 - Incremental loading is available for SQL and REST API sources.
 - CRM sources currently use full-load ingestion.
diff --git a/mkdocs.yml b/mkdocs.yml
index 845e8466b..1a6c715d7 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -74,6 +74,7 @@ nav:
           - CRM, Sales & Analytics: user_guides/fs/data_source/creation/crm_sales_analytics.md
           - REST API: user_guides/fs/data_source/creation/rest_api.md
           - Unity Catalog: user_guides/fs/data_source/creation/unity_catalog.md
+          - SAP HANA: user_guides/fs/data_source/creation/sap_hana.md
       - Usage: user_guides/fs/data_source/usage.md
   - Feature Group:
       - user_guides/fs/feature_group/index.md