126 changes: 126 additions & 0 deletions docs/user_guides/fs/data_source/creation/sap_hana.md
# How-To set up an SAP HANA Data Source { #data-source-sap-hana }

## Introduction

SAP HANA is an in-memory relational database used by many enterprises as the system of record for ERP, CRM, and analytics workloads.

An SAP HANA Data Source in Hopsworks stores the connection details required to read tables and views from a HANA tenant database.
Once configured, you can use the same data source as the basis for an external (on-demand) Feature Group, or as the source for a dltHub-driven ingestion job that materialises HANA data into a managed Feature Group.

In this guide, you will configure a Data Source in Hopsworks that holds the authentication information needed to connect to your SAP HANA database.

!!! note
    Currently, it is only possible to create data sources in the Hopsworks UI.
    You cannot create a data source programmatically.

## Prerequisites

Before you begin this guide, you'll need to retrieve the following information from your SAP HANA tenant.
These options are **mandatory**:

- **Host**: The hostname of the SAP HANA endpoint, for example `hxehost.example.com` for an on-premise instance or the endpoint shown in SAP BTP for SAP HANA Cloud.
- **Port**: The SQL port of the tenant database.
  On-premise SQL ports follow the pattern `3<instance-number>15`: the default `39015` is the SQL port for the first tenant database on a default multi-tenant or HANA Express (HXE) install (instance number 90), while a single-container install with instance number 00 uses `30015`.
  SAP HANA Cloud typically uses `443`.
  Consult your DBA if you are unsure.
- **User**: The HANA database user that the connector authenticates as.
- **Password**: The password for that user.
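
For reference, the on-premise port convention described above can be expressed as a small helper. This is purely illustrative; `hana_sql_port` is not part of any Hopsworks or SAP API, and it does not apply to SAP HANA Cloud, which uses `443`:

```python
def hana_sql_port(instance_number: int) -> int:
    """On-premise HANA SQL ports follow the pattern 3<instance-number>15,
    e.g. instance 00 -> 30015, instance 90 -> 39015."""
    return int(f"3{instance_number:02d}15")
```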

The following arguments are **optional**:

- **Database**: The tenant database name.
Use this when your SAP HANA system hosts more than one tenant database and you need to target a specific one.
- **Schema**: The default schema applied to unqualified queries on the connection.
If you leave this empty, queries must fully qualify table names with the schema prefix.
- **Table**: The default table the connector points at when no SQL query is provided.
- **Application**: A short identifier surfaced in HANA's session tracing (`APPLICATION` session variable).
This makes it easier to attribute load to Hopsworks in HANA monitoring tools.
- **Additional arguments**: Free-form key/value options forwarded to the underlying SAP HANA Python driver (`hdbcli`) and the Spark JDBC reader.
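
To illustrate how these fields reach the Python driver, here is a sketch of the translation into `hdbcli.dbapi.connect()` keyword arguments. The helper name and the exact keys Hopsworks forwards are assumptions; `address`, `port`, `user`, `password`, `databaseName`, and `currentSchema` are standard `hdbcli` connect properties:

```python
def hana_connect_kwargs(host, port, user, password,
                        database=None, schema=None,
                        application=None, extra=None):
    """Sketch: map data source fields onto hdbcli.dbapi.connect() kwargs."""
    kwargs = {
        "address": host,       # Host
        "port": port,          # Port, e.g. 39015 on-premise or 443 for HANA Cloud
        "user": user,          # User
        "password": password,  # Password
    }
    if database:
        kwargs["databaseName"] = database   # tenant database to target
    if schema:
        kwargs["currentSchema"] = schema    # default schema for unqualified names
    if application:
        # Surfaced in HANA session tracing; this property name is an assumption.
        kwargs["sessionVariable:APPLICATION"] = application
    kwargs.update(extra or {})              # Additional arguments, forwarded verbatim
    return kwargs
```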

!!! info "Drivers"
    Hopsworks ships the SAP HANA drivers needed to read from HANA out of the box.
    The Hopsworks Spark image bundles the SAP `ngdbc` JDBC driver for Spark JDBC reads, and the dlt ingestion image and Arrow Flight server bundle SAP's `hdbcli` Python DBAPI driver.
    You do not need to install or upload the drivers yourself.

## Creation in the UI

### Step 1: Set up new Data Source

Head to the Data Source View on Hopsworks and start the creation flow for a new data source.

<figure markdown>
![Data Source Creation](../../../../assets/images/guides/fs/data_source/data_source_overview.png)
<figcaption>The Data Source View in the User Interface</figcaption>
</figure>

### Step 2: Enter SAP HANA Settings

Enter the details for your SAP HANA connector.
Start by giving it a **name** and an optional **description**.

01. Select "SAP HANA" as storage.
02. Specify the **Host** of your SAP HANA endpoint.
03. Specify the **Port** the tenant SQL service listens on (default `39015`).
04. Provide the **User** name of the HANA database user.
05. Provide the **Password** for that user.
06. Optionally fill in **Database**, **Schema**, **Table**, and **Application**.
07. Optionally add additional key/value arguments.
These are forwarded both to the Python driver used by the on-demand read path and to the Spark JDBC reader used by notebook jobs.
08. Click on "Save Credentials".

## Use it as an ingestion source

Once the SAP HANA data source exists, you can also use it with the dltHub-based ingestion workflow described in [Ingest Data with dltHub][ingest-data-with-dlthub].
SAP HANA is treated as a SQL-like source, so the ingestion job supports both full and incremental loading.

## Type mapping

Hopsworks reads each source column's HANA type from the cursor description and maps it to a Hopsworks offline feature type.
The mapping preserves precision and scale where possible, so a source `DECIMAL(12, 2)` becomes a Hopsworks `decimal(12,2)` feature rather than collapsing to `bigint`.

| SAP HANA type | Hopsworks offline feature type |
| --- | --- |
| `TINYINT` | `tinyint` |
| `SMALLINT` | `smallint` |
| `INTEGER` | `int` |
| `BIGINT` | `bigint` |
| `DECIMAL(p, s)` | `decimal(p,s)` |
| `REAL` | `float` |
| `DOUBLE` | `double` |
| `BOOLEAN` | `boolean` |
| `DATE` | `date` |
| `TIME` | `timestamp` |
| `TIMESTAMP` / `SECONDDATE` / `LONGDATE` | `timestamp` |
| `CHAR` / `VARCHAR` / `NCHAR` / `NVARCHAR` / `TEXT` / `CLOB` / `NCLOB` / `ALPHANUM` | `string` |
| `BINARY` / `VARBINARY` / `BLOB` | `binary` |
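
The table above amounts to a simple lookup plus a pattern match for parameterised decimals. A minimal re-implementation for illustration only — the actual Hopsworks mapping code may differ, and the `string` fallback for unlisted types is an assumption:

```python
import re

# Fixed-width mappings from the table above.
_SIMPLE = {
    "TINYINT": "tinyint", "SMALLINT": "smallint", "INTEGER": "int",
    "BIGINT": "bigint", "REAL": "float", "DOUBLE": "double",
    "BOOLEAN": "boolean", "DATE": "date",
    "TIME": "timestamp", "TIMESTAMP": "timestamp",
    "SECONDDATE": "timestamp", "LONGDATE": "timestamp",
}
_STRING = {"CHAR", "VARCHAR", "NCHAR", "NVARCHAR",
           "TEXT", "CLOB", "NCLOB", "ALPHANUM"}
_BINARY = {"BINARY", "VARBINARY", "BLOB"}

def to_offline_type(hana_type: str) -> str:
    """Map a HANA column type name to a Hopsworks offline feature type."""
    t = hana_type.strip().upper()
    m = re.match(r"DECIMAL\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)", t)
    if m:
        # Preserve precision and scale instead of collapsing to bigint.
        return f"decimal({m.group(1)},{m.group(2)})"
    if t in _STRING:
        return "string"
    if t in _BINARY:
        return "binary"
    return _SIMPLE.get(t, "string")  # fallback for unlisted types (assumption)
```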

## Known limitations
### Avoid the `SYSTEM` schema for source tables

Place tables you intend to ingest or expose as feature groups in a regular user schema (for example a project-specific `MYAPP` or `HOPSDEMO`).
Tables created under the system-owned `SYSTEM` schema do not reflect cleanly through the SQLAlchemy HANA dialect that powers DLT ingestion.
A typical setup is:

```sql
CREATE SCHEMA HOPSDEMO;
RENAME TABLE SYSTEM.MY_TABLE TO HOPSDEMO.MY_TABLE;
```

Then set **Schema** in the data source to `HOPSDEMO` (or pick it from the schema browser) and use that as the basis for any external feature group or DLT ingestion job.

### Online ingestion requires non-null primary keys

When you create a managed Feature Group fed from SAP HANA via DLT and enable online serving, online ingestion validates that every row has a non-null value in the Feature Group's primary-key column.
If the source rows can carry `NULL` in that column, either filter them out at source, pick a different primary key on the Feature Group, or disable online serving for the Feature Group.
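
Conceptually, the validation is a per-row null check on the key column. A toy sketch of the source-side filtering option (names are placeholders, not a Hopsworks API):

```python
def drop_null_keys(rows, key_column):
    """Keep only rows whose primary-key column is non-null,
    mirroring what online ingestion validates."""
    return [row for row in rows if row.get(key_column) is not None]
```

Equivalently, filter at source with a `WHERE <key_column> IS NOT NULL` predicate in the query that feeds the ingestion.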

### Authentication

The SAP HANA data source currently supports username and password authentication.
Certificate-based and JWT authentication are tracked as follow-up work.

## Next Steps

Move on to the [usage guide for data sources][data-source-usage] to see how you can use your newly created SAP HANA connector.
1 change: 1 addition & 0 deletions docs/user_guides/fs/data_source/index.md
Cloud agnostic storage systems:
4. [HopsFS](creation/hopsfs.md): Easily connect and read from directories of Hopsworks' internal File System.
5. [CRM, Sales & Analytics](creation/crm_sales_analytics.md): Connect to supported CRM, sales, and analytics platforms.
6. [REST API](creation/rest_api.md): Connect to external HTTP APIs with configurable headers and authentication.
7. [SAP HANA][data-source-sap-hana]: Query SAP HANA tenant databases using SQL.

## AWS

1 change: 1 addition & 0 deletions docs/user_guides/fs/feature_group/ingest_with_dlthub.md
Use `Ingest Data to New Feature Group` when you want to:
This ingestion flow supports multiple data sources:

- SQL-like sources can either create an external feature group or ingest data into a new feature group.
- The SQL family currently includes Snowflake, BigQuery, Redshift, generic JDBC (MySQL, PostgreSQL, Oracle), and SAP HANA.
- CRM and REST API sources use the ingestion path only.
- Incremental loading is available for SQL and REST API sources.
- CRM sources currently use full-load ingestion.
1 change: 1 addition & 0 deletions mkdocs.yml
nav:
- CRM, Sales & Analytics: user_guides/fs/data_source/creation/crm_sales_analytics.md
- REST API: user_guides/fs/data_source/creation/rest_api.md
- Unity Catalog: user_guides/fs/data_source/creation/unity_catalog.md
- SAP HANA: user_guides/fs/data_source/creation/sap_hana.md
- Usage: user_guides/fs/data_source/usage.md
- Feature Group:
- user_guides/fs/feature_group/index.md