microsoft · astafan8 · Jun 12, 2026 · Jun 30, 2026 · Jun 30, 2026 · Jun 30, 2026
@@ -75,3 +75,33 @@ We note that the dataset currently exclusively supports storing data in an
 SQLite database. This is not an intrinsic limitation of the dataset and
 measurement layer. Support for writing to a different backend may be added
 in the future.
+
+.. _sec:design_split_storage:
+
+Split Raw Data Storage
+======================
+
+As the main SQLite database grows with many datasets, browsing experiments and
+loading metadata can become slower due to the file size. To address this,
+QCoDeS supports an optional **split raw data storage** mode (see
+:ref:`sec:intro_split_raw_data` for user-facing details).
+
+From a design perspective, this feature adds a thin routing layer inside the
+``DataSet`` class without changing any public interfaces:
+
+- A ``_data_conn`` property transparently returns either the main database
+  connection or a per-dataset raw data connection, depending on the
+  configuration.
+- Write paths (``add_results``, ``_BackgroundWriter``) and read paths
+  (``get_parameter_data``, ``DataSetCacheWithDBBackend``, ``number_of_results``,
+  ``__len__``) all go through this single routing point.
+- The per-dataset SQLite file is a lightweight database containing only the
+  results table (read and written with the same numpy type adapters as the
+  main database); it has no QCoDeS metadata schema.
+- Subscriber triggers (used for real-time data callbacks) are created on the
+  data connection so that they fire regardless of which database holds the
+  results table.
+
+The implementation is contained in ``qcodes.dataset._raw_data_storage`` (helper
+functions) and a handful of additions to ``qcodes.dataset.data_set`` (routing
+logic). The ``Measurement`` context manager, ``DataSaver``, and all export
+functions work without modification.
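To make the routing idea concrete, here is a minimal sketch built only on the standard-library `sqlite3` module. The `ToyDataSet` class, its `data_conn` property, and the `notify` SQL function are hypothetical stand-ins, not the QCoDeS implementation; two in-memory databases play the roles of the main database and the per-dataset file. It illustrates the two points above: all results reads and writes resolve through one property, and a temporary trigger only fires for inserts made through the connection it was created on, which is why subscriber triggers belong on the data connection.

```python
import sqlite3
from typing import Optional


class ToyDataSet:
    """Hypothetical stand-in for DataSet with one connection-routing point."""

    def __init__(
        self,
        main_conn: sqlite3.Connection,
        raw_conn: Optional[sqlite3.Connection] = None,
        table: str = "results",
    ) -> None:
        self._main_conn = main_conn
        self._raw_conn = raw_conn  # None means split storage is disabled
        self.table = table
        self.notified: list = []  # values seen by the subscriber callback

    @property
    def data_conn(self) -> sqlite3.Connection:
        # The single routing point: every results read/write asks for this.
        return self._raw_conn if self._raw_conn is not None else self._main_conn

    def create_results_table(self) -> None:
        ddl = f'CREATE TABLE IF NOT EXISTS "{self.table}" (x REAL)'
        self._main_conn.execute(ddl)  # the schema always lives in the main DB
        if self._raw_conn is not None:
            self._raw_conn.execute(ddl)  # the rows will live here

    def subscribe(self) -> None:
        # Register the callback and trigger on the *data* connection, so the
        # trigger fires on whichever database actually receives the INSERTs.
        conn = self.data_conn
        conn.create_function("notify", 1, self.notified.append)
        conn.execute(
            f'CREATE TEMP TRIGGER notify_insert AFTER INSERT ON "{self.table}" '
            "BEGIN SELECT notify(NEW.x); END"
        )

    def add_results(self, values: list) -> None:
        self.data_conn.executemany(
            f'INSERT INTO "{self.table}" (x) VALUES (?)', [(v,) for v in values]
        )

    def __len__(self) -> int:
        query = f'SELECT COUNT(*) FROM "{self.table}"'
        return self.data_conn.execute(query).fetchone()[0]


# Two in-memory databases stand in for the main DB and the <guid>.db file.
main, raw = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
ds = ToyDataSet(main, raw)
ds.create_results_table()
ds.subscribe()
ds.add_results([1.0, 2.0, 3.0])
print(len(ds), main.execute('SELECT COUNT(*) FROM "results"').fetchone()[0])  # 3 0
print(ds.notified)  # [1.0, 2.0, 3.0]
```

Because `subscribe()` asks `data_conn` for its connection instead of using the main connection directly, the same code path works whether or not split storage is enabled.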
@@ -75,3 +75,44 @@ For dataset operations, QCoDeS provides functions for:
 - **Exporting datasets**: :doc:`Exporting data to other file formats <../examples/DataSet/Exporting-data-to-other-file-formats>`
 - **Extracting runs between databases**: :doc:`Extracting runs from one DB file to another <../examples/DataSet/Extracting-runs-from-one-DB-file-to-another>` and :func:`qcodes.dataset.extract_runs_into_db`
 - **Bulk export and metadata-only databases**: :func:`qcodes.dataset.export_datasets_and_create_metadata_db` for creating lightweight metadata-only databases while exporting all data to NetCDF files
+
+.. _sec:intro_split_raw_data:
+
+Split Raw Data Storage
+======================
+
+By default, all measurement data (the results table rows) is stored in the same SQLite database alongside metadata such as experiments, runs, parameter layouts, and dependencies. Over time, the main database file can grow very large, which can slow down operations like browsing experiments and loading metadata.
+
+QCoDeS supports an optional **split raw data storage** mode in which the actual measurement data for each ``DataSet`` is written to an individual, per-dataset SQLite file while all metadata remains in the main database. Each per-dataset file is named after the dataset's GUID (e.g. ``<guid>.db``) and is stored in a configurable folder.
+
+This feature is controlled by two configuration options in ``qcodesrc.json``:
+
+- ``dataset.raw_data_to_separate_db`` (bool, default ``false``): enables or disables split storage.
+- ``dataset.raw_data_path`` (string, default ``"{db_location}"``): the folder where per-dataset files are created. The ``{db_location}`` placeholder is expanded to a folder derived from the main database path (e.g. ``~/experiments.db`` becomes ``~/experiments_db/``).
+
+When enabled:
+
+- The main database retains the full results table schema (column definitions) but no data rows are written to it, keeping it lightweight.
+- All ``INSERT`` and ``SELECT`` operations on results data are transparently routed to the per-dataset file.
+- The path to the per-dataset file is persisted in the run's metadata (``raw_data_db_path``), so ``load_by_id`` and related loading functions automatically reconnect to the correct file.
+- All public ``DataSet`` APIs (``get_parameter_data``, ``to_pandas_dataframe``, ``to_xarray_dataset``, ``cache``, ``export``, etc.) work identically whether split storage is enabled or not.
+
+Example runtime configuration::
+
+    import qcodes as qc
+
+    qc.config.dataset.raw_data_to_separate_db = True
+    qc.config.dataset.raw_data_path = "/data/raw_measurements/"
+
+If the per-dataset raw data files are moved to a different folder (e.g. during data migration or archival), the stored paths in the main database will become stale. Use the :func:`~qcodes.dataset.update_raw_data_paths` helper to update them::
+
+    from qcodes.dataset import update_raw_data_paths
+
+    update_raw_data_paths(
+        db_path="/path/to/main_database.db",
+        new_raw_data_folder="/new/location/of/raw_files/"
+    )
+
+This scans all datasets with a ``raw_data_db_path`` metadata entry, checks whether the corresponding ``.db`` file exists in the new folder, and updates the stored path accordingly. Datasets whose files are not found in the new folder are skipped with a warning.
+
+For more details on database management, see the :doc:`Database notebook <../examples/DataSet/Database>`.
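The flow described above (schema in the main database, rows in `<guid>.db`, path persisted in the run metadata) can be sketched with plain `sqlite3`. This is an illustration under simplifying assumptions: the `runs` table layout, the single `results` table, and the helpers `create_split_run` and `load_rows` are hypothetical and do not mirror the real QCoDeS schema or API.

```python
import sqlite3
import tempfile
import uuid
from contextlib import closing
from pathlib import Path

DDL = 'CREATE TABLE IF NOT EXISTS "results" (x REAL, y REAL)'


def create_split_run(main_db: Path, raw_folder: Path, rows: list) -> int:
    """Keep the schema and metadata in main_db; write the rows to <guid>.db."""
    guid = str(uuid.uuid4())
    raw_folder.mkdir(parents=True, exist_ok=True)
    raw_path = raw_folder / f"{guid}.db"

    with closing(sqlite3.connect(main_db)) as main:
        with main:  # commit on success, roll back on error
            main.execute(
                "CREATE TABLE IF NOT EXISTS runs (run_id INTEGER PRIMARY KEY, "
                "guid TEXT, raw_data_db_path TEXT)"
            )
            main.execute(DDL)  # schema only: no result rows go here
            cur = main.execute(
                "INSERT INTO runs (guid, raw_data_db_path) VALUES (?, ?)",
                (guid, str(raw_path)),
            )
            run_id = cur.lastrowid

    with closing(sqlite3.connect(raw_path)) as raw:
        with raw:
            raw.execute(DDL)
            raw.executemany('INSERT INTO "results" VALUES (?, ?)', rows)
    return run_id


def load_rows(main_db: Path, run_id: int) -> list:
    """Look up the stored path, then read the rows from the per-dataset file."""
    with closing(sqlite3.connect(main_db)) as main:
        row = main.execute(
            "SELECT raw_data_db_path FROM runs WHERE run_id = ?", (run_id,)
        ).fetchone()
    if row is None:
        raise KeyError(f"no run with id {run_id}")
    # mode=rw makes SQLite raise if the file is missing instead of
    # silently creating an empty database at a stale path.
    uri = Path(row[0]).resolve().as_uri() + "?mode=rw"
    with closing(sqlite3.connect(uri, uri=True)) as raw:
        return raw.execute('SELECT x, y FROM "results"').fetchall()


with tempfile.TemporaryDirectory() as tmp:
    main_db = Path(tmp) / "experiments.db"
    rid = create_split_run(main_db, Path(tmp) / "experiments_db", [(0.0, 1.0), (1.0, 4.0)])
    print(load_rows(main_db, rid))  # [(0.0, 1.0), (1.0, 4.0)]
```

Opening the stored path with `mode=rw` is a design choice worth noting: a plain `sqlite3.connect` on a stale path would silently create an empty database instead of failing.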
@@ -167,6 +167,97 @@
     "\n",
     "Moreover, we have also written an [example notebook](Extracting-runs-from-one-DB-file-to-another.ipynb) of transferring `DataSets` between database flies that may serve as a template for more complex data organization protocols."
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Split Raw Data Storage\n",
+    "\n",
+    "As the main database grows with many datasets, browsing experiments and loading metadata can become slower. QCoDeS supports an optional **split raw data storage** mode that writes the raw measurement data for each dataset into its own SQLite file, while keeping all metadata (experiments, runs, parameters, dependencies) in the main database.\n",
+    "\n",
+    "This keeps the main database lightweight and makes it faster to work with, while still allowing all existing `DataSet` APIs to function identically.\n",
+    "\n",
+    "### Configuration\n",
+    "\n",
+    "Split raw data storage is controlled by two configuration options:\n",
+    "\n",
+    "- `dataset.raw_data_to_separate_db` (bool, default `False`): enables or disables split storage.\n",
+    "- `dataset.raw_data_path` (string, default `\"{db_location}\"`): the folder where per-dataset SQLite files are created. The `{db_location}` placeholder expands to a folder derived from the main database path (e.g. `~/experiments.db` becomes `~/experiments_db/`).\n",
+    "\n",
+    "You can enable it at runtime:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Enable split raw data storage\n",
+    "qc.config.dataset.raw_data_to_separate_db = True\n",
+    "\n",
+    "# Optionally set a custom path for per-dataset files\n",
+    "qc.config.dataset.raw_data_path = \"/data/raw_measurements/\"\n",
+    "\n",
+    "# Or use the default which derives from the main DB location:\n",
+    "# qc.config.dataset.raw_data_path = \"{db_location}\"\n",
+    "# e.g. ~/experiments.db -> ~/experiments_db/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Or permanently in your `qcodesrc.json`:\n",
+    "\n",
+    "```json\n",
+    "{\n",
+    "  \"dataset\": {\n",
+    "    \"raw_data_to_separate_db\": true,\n",
+    "    \"raw_data_path\": \"{db_location}\"\n",
+    "  }\n",
+    "}\n",
+    "```\n",
+    "\n",
+    "### How It Works\n",
+    "\n",
+    "When split storage is enabled:\n",
+    "\n",
+    "1. When a measurement starts (`mark_started()`), a per-dataset SQLite file named `<guid>.db` is created in the configured folder.\n",
+    "2. All measurement data (results table rows) is written to this per-dataset file instead of the main database.\n",
+    "3. The main database retains the results table schema (column definitions) but contains no data rows, keeping it small.\n",
+    "4. The path to the per-dataset file is saved in the run metadata, so `load_by_id()` and related functions automatically find and reconnect to the correct file.\n",
+    "5. All `DataSet` methods (`get_parameter_data`, `to_pandas_dataframe`, `to_xarray_dataset`, `cache`, `export`, etc.) work transparently with split storage.\n",
+    "\n",
+    "> **Note:** Datasets created with split storage enabled can always be loaded later, even if the configuration is changed back to the default, as long as the per-dataset files remain at their original paths.\n",
+    "\n",
+    "### Updating Paths After Moving Raw Data Files\n",
+    "\n",
+    "If you move the per-dataset raw data files to a different folder, the paths stored in the main database become stale. Use `update_raw_data_paths` to fix them:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# from qcodes.dataset import update_raw_data_paths\n",
+    "\n",
+    "# After moving raw data files to a new folder:\n",
+    "# update_raw_data_paths(\n",
+    "#     db_path=\"/path/to/main_database.db\",\n",
+    "#     new_raw_data_folder=\"/new/location/of/raw_files/\"\n",
+    "# )"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The function scans all datasets that have a `raw_data_db_path` metadata entry, checks whether the corresponding `.db` file exists in the new folder, and updates the stored path. Datasets whose files are not found in the new folder are skipped with a warning."
+   ]
   }
  ],
  "metadata": {

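The scan, check, and update steps of `update_raw_data_paths` can be approximated as below. This is a hedged sketch against a simplified, hypothetical `runs` table with a `raw_data_db_path` column; `relink_raw_data_paths` is not the QCoDeS function and its signature is assumed for illustration. It reuses only the file name (`<guid>.db`) of each stored path, and leaves a run untouched, with a warning, when its file is missing from the new folder.

```python
import sqlite3
import warnings
from contextlib import closing
from pathlib import Path


def relink_raw_data_paths(main_db: Path, new_folder: Path) -> int:
    """Point every stored raw-data path at new_folder; return how many changed."""
    new_folder = Path(new_folder)
    updated = 0
    with closing(sqlite3.connect(main_db)) as conn:
        with conn:  # apply all updates in a single transaction
            rows = conn.execute(
                "SELECT run_id, raw_data_db_path FROM runs "
                "WHERE raw_data_db_path IS NOT NULL"
            ).fetchall()
            for run_id, old_path in rows:
                # Keep the file name (<guid>.db) and swap in the new folder.
                candidate = new_folder / Path(old_path).name
                if not candidate.is_file():
                    warnings.warn(f"run {run_id}: {candidate} not found, skipping")
                    continue
                conn.execute(
                    "UPDATE runs SET raw_data_db_path = ? WHERE run_id = ?",
                    (str(candidate), run_id),
                )
                updated += 1
    return updated
```

Fetching all candidate rows before issuing any `UPDATE` avoids modifying the table while a cursor over it is still being iterated.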
@@ -79,7 +79,9 @@
         "export_chunked_export_of_large_files_enabled": false,
         "export_chunked_threshold": 1000,
         "in_memory_cache": true,
-        "load_from_exported_file": false
+        "load_from_exported_file": false,
+        "raw_data_to_separate_db": false,
+        "raw_data_path": "{db_location}"
     },
     "telemetry":
     {

@@ -382,6 +382,16 @@
                     "type": "boolean",
                     "default": true,
                     "description": "Should the data be cached in memory as it is measured. Useful to disable for large datasets to save on memory consumption."
+                },
+                "raw_data_to_separate_db": {
+                    "type": "boolean",
+                    "default": false,
+                    "description": "If true, raw measurement data (results tables) will be written to individual per-dataset SQLite files instead of the main database. Metadata remains in the main database."
+                },
+                "raw_data_path": {
+                    "type": "string",
+                    "default": "{db_location}",
+                    "description": "Path to the folder where per-dataset raw data SQLite files are stored. The placeholder {db_location} expands to a directory next to the main .db file, named after it with a _db suffix, e.g. for ~/experiments.db raw data files are stored in ~/experiments_db/."
                 }
             },
             "description": "Settings related to the DataSet and Measurement Context manager",

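The `{db_location}` expansion described in the schema can be sketched as a small pure function. `expand_raw_data_path` is a hypothetical helper written for illustration; QCoDeS may resolve the placeholder differently (for example, with respect to relative paths).

```python
from pathlib import Path


def expand_raw_data_path(template: str, db_path: str) -> Path:
    """Hypothetical helper: expand "{db_location}" as the schema describes.

    "~/experiments.db" -> "~/experiments_db" (same parent folder, stem + "_db").
    """
    db = Path(db_path).expanduser()
    db_location = db.with_name(f"{db.stem}_db")
    # Plain substitution; templates without the placeholder pass through.
    return Path(template.replace("{db_location}", str(db_location))).expanduser()


print(expand_raw_data_path("{db_location}", "/data/experiments.db"))  # /data/experiments_db on POSIX
```

Using `str.replace` rather than `str.format` keeps any other literal braces in a user-supplied path from raising an error.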
@@ -3,6 +3,7 @@
 and from disk
 """
 
+from ._raw_data_storage import update_raw_data_paths
 from .data_set import (
     get_guids_by_run_spec,
     load_by_counter,
@@ -120,4 +121,5 @@
     "plot_dataset",
     "reset_default_experiment_id",
     "rundescriber_from_json",
+    "update_raw_data_paths",
 ]