112 changes: 65 additions & 47 deletions create-use-case/prepare-dataset.mdx
@@ -1,31 +1,31 @@
---
title: "Prepare Data"
description: "Learn how to prepare and ingest your datasets into tracebloc using containerized data ingestors. Complete guide for CSV, image, and text data with Kubernetes deployment steps."

---

## Overview

Make your data available to the Kubernetes cluster so it can be used for training and evaluation. Whether your client runs on Azure, AWS, Google Cloud, or a local Minikube setup, the process of ingesting datasets is the same.

The data ingestor is a lightweight service that bridges your raw data and the cluster's persistent storage. It comes with ready-made templates (CSV, images, text) that you can use as starting points and customize for your own dataset. Because the ingestion step is containerized, the ingestor validates data format and schema, enforces consistency, and transfers the dataset securely into the cluster's SQL storage, where it becomes accessible to all training and evaluation jobs.

This guide covers:
- Customizing ingestor templates for different data types (CSV, images, text)
- Deploying the data ingestor for training and test data using Kubernetes
- Managing datasets through the tracebloc interface

**IMPORTANT** Make sure the data format and ML task are supported and that data standards are met by reviewing the [docs](/create-use-case/prerequisites). You must run the process twice: once to ingest training data and once to ingest test data.

## Quick Setup

Use this quick setup if you already have an ingestor configured and just want to switch datasets or toggle between training and testing. If you are setting up for the first time, go to the next section for the detailed walkthrough.

### Steps

1. Pick a template script and edit it, e.g. `templates/tabular_classification/tabular_classification.py`
- Update `csv_options` and `data_path`
- Only for tabular data: Update schema
- Set `schema` and `CSVIngestor()` parameters like category, intent, label_column, etc. to match the data type, task, and train/test purpose

```python
ingestor = CSVIngestor(
@@ -41,28 +41,27 @@
Make sure Docker is running on your system (e.g. by starting Docker Desktop), then execute the following command:

```bash
# Build for cloud (multi-arch) and push directly to registry
docker buildx build --platform linux/amd64,linux/arm64 -t <your-username>/<image-name>:<tag> --push .
```
3. Edit `ingestor-job.yaml`:
- `metadata.name`: Unique job name (e.g. ingestor-job-train and ingestor-job-test)
- `image`: The tag you built and pushed
- `LABEL_FILE`: Path inside the pod to the labels CSV, under the PVC mount (e.g. `/data/shared/labels.csv`). For tabular data, this is the same file that contains both labels and features.
- `TABLE_NAME`: Unique table name (no spaces, one per dataset). `TITLE` is optional
- `PATH_TO_LOCAL_DATASET_FILE`: Path to your dataset file within the container
- `SRC_PATH`: Root of the mounted dataset directory inside the pod (`/data/shared`, backed by `~/.tracebloc/<workspace>/data` on the client host)

4. Deploy to Kubernetes
```bash
kubectl apply -f ingestor-job.yaml -n <workspace>
```
## Detailed Setup

### 1. Configure a Template

This section walks you through the step-by-step setup of a data ingestor. You will clone the repository, select the right template for your data type, and customize it to match your task. Follow this guide if you are setting up an ingestor for the first time or need full control beyond the quick setup.

### Clone the Data Ingestor Repository

Clone the public [Data Ingestor GitHub repository](https://github.com/tracebloc/data-ingestors):
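If you only need the command, cloning is the standard git invocation (repository URL taken from the link above):

```bash
git clone https://github.com/tracebloc/data-ingestors.git
cd data-ingestors
```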

@@ -85,6 +84,9 @@
| Data Type | Template File | Data Category | Data Format |
|-----------|---------------|---------------|-------------|
| Tabular | templates/tabular_classification/tabular_classification.py | `TaskCategory.TABULAR_CLASSIFICATION` | `DataFormat.TABULAR` |
| Tabular | templates/tabular_regression/tabular_regression.py | `TaskCategory.TABULAR_REGRESSION` | `DataFormat.TABULAR` |
| Tabular | templates/time_series_forecasting/time_series_forecasting.py | `TaskCategory.TIME_SERIES_FORECASTING` | `DataFormat.TABULAR` |
| Tabular | templates/time_to_event_prediction/time_to_event_prediction.py | `TaskCategory.TIME_TO_EVENT_PREDICTION` | `DataFormat.TABULAR` |
| Image | templates/image_classification/image_classification.py | `TaskCategory.IMAGE_CLASSIFICATION` | `DataFormat.IMAGE` |
| Image | templates/object_detection/object_detection.py | `TaskCategory.OBJECT_DETECTION` | `DataFormat.IMAGE` |
| Text | templates/text_classification/text_classification.py | `TaskCategory.TEXT_CLASSIFICATION` | `DataFormat.TEXT` |
@@ -124,14 +126,14 @@
...
```

The Database, APIClient, and other values are configured automatically from the environment variables defined in `ingestor-job.yaml`.

- `config.LABEL_FILE`: Path to the local CSV label file
- `config.BATCH_SIZE`: Batch size used during ingestion

### Customize a Template

Templates provide a starting point, but every dataset has its own format and labels. In this step you adapt the template to your data by tuning CSV ingestion options and setting the ingestor parameters (category, label column, intent, data path and schema). The following example in `templates/tabular_classification/tabular_classification.py` shows how to ingest a tabular dataset, but the setup works the same way for image or text data.

#### Needed for Tabular Data: Define Schema

@@ -180,11 +182,11 @@
Define file extensions.

```python
text_options = {"allowed_extension": FileExtension.TXT} # Allowed text file extensions
text_options = {"extension": FileExtension.TXT} # Allowed text file extensions
```

#### Set CSV ingestion options
Customize parsing, memory handling, and data cleaning with the `csv_options` dictionary:

```python
csv_options = {
@@ -199,9 +201,9 @@
}
```

#### Set Up the Ingestor

Define the Ingestor instance with the required configuration. See the tabular data example below:

```python
ingestor = CSVIngestor(
@@ -231,31 +233,54 @@

With your template configured, the next step is to package it into a Docker image so it can run inside the Kubernetes cluster.

### Docker Hub Setup (first-time users)

The cluster pulls your ingestor image from a public Docker registry, so you need an account before you can push. If you already have one, skip to [Edit Dockerfile](#edit-dockerfile).

1. **Create a Docker Hub account** at [hub.docker.com/signup](https://hub.docker.com/signup) and verify your email.
2. **Log in from your terminal** so the `docker push` command can authenticate:

```bash
docker login
```

3. **Push the data ingestor image** to your account using the build/push commands in the next section. The image name takes the form `<your-docker-username>/<image-name>:<tag>` — the username segment must match the account you just created.

4. **Make the image public** so the cluster can pull it without credentials:
- Go to [hub.docker.com/repositories](https://hub.docker.com/repositories) and open the repository you just pushed.
- Click **Settings → Visibility settings → Make public**.

Keeping the image private is also fine, but then you must create a Kubernetes `imagePullSecret` named `regcred` in the client namespace (the `ingestor-job.yaml` already references it).
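If you keep the image private, the secret can be created with a standard kubectl command (placeholder credentials shown):

```bash
kubectl create secret docker-registry regcred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<your-docker-username> \
  --docker-password=<your-docker-password-or-access-token> \
  -n <workspace>
```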

### Place data files on the client host

Datasets are **not** baked into the Docker image. They live on the client host in the per-workspace data directory and are mounted into the ingestor pod through the shared PVC (`client-pvc` → `/data/shared`).

#### Copy data files
Copy your dataset into the client's data directory, where `<workspace>` is the workspace name you chose during client install (which is also the Helm release name and the Kubernetes namespace — the chart uses the same value for all three). The directory `~/.tracebloc/<workspace>/data/` is created automatically by the installer; just drop your files into it:

```bash
# Host path on the machine where the tracebloc client is installed.
# HOST_DATA_DIR defaults to ~/.tracebloc; override only if you set it during install.
cp -R LOCAL_PATH/images ~/.tracebloc/<workspace>/data/
cp LOCAL_PATH/labels.csv ~/.tracebloc/<workspace>/data/
```

Inside the ingestor pod this directory is mounted at `/data/shared`, so the same files appear as `/data/shared/images/...` and `/data/shared/labels.csv`. Set `SRC_PATH` and `LABEL_FILE` in `ingestor-job.yaml` to point at those in-pod paths (see [Configure Kubernetes](#3-configure-kubernetes) below).

For tabular data the same rule applies — drop the single `labels.csv` (with features and labels) into `~/.tracebloc/<workspace>/data/`.

### Edit Dockerfile

The Dockerfile only needs to package the ingestion script — the dataset is mounted at runtime, so do **not** `COPY` data into the image:

```dockerfile
# Copy the ingestion script into /app
COPY templates/tabular_classification/tabular_classification.py /app/ingestor.py
```


### Build Docker Image

You need a Docker Hub username and password to proceed with the next step. Cloud platforms run a mix of x86 and ARM nodes (e.g. AWS Graviton, Azure Ampere, GCP Tau T2A). Building a multi-arch image with `--platform linux/amd64,linux/arm64` guarantees the image runs on either, particularly if you build on Apple Silicon (M1/M2) or other ARM-based systems. Pick a setup, then build and deploy the image:

#### For Local Development/Testing

@@ -270,11 +295,8 @@
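For a quick local test (for example on Minikube), a single-architecture build without a push is usually enough. This is a sketch using standard Docker and Minikube commands, not a tracebloc-specific requirement:

```bash
# Build for the local architecture only
docker build -t <your-username>/<image-name>:<tag> .

# If your cluster is Minikube with its own container runtime, load the image into it
minikube image load <your-username>/<image-name>:<tag>
```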
#### For Cloud Deployment (AWS, Azure, GCP)

```bash
# Build a multi-arch image (works on x86 and ARM cloud nodes) and push directly to the registry
docker buildx build --platform linux/amd64,linux/arm64 -t <your-username>/<image-name>:<tag> --push .
```


@@ -287,7 +309,7 @@
kind: Job
metadata:
name: <JOBNAME> # Set a job name e.g. ingestor-job-train
namespace: <workspace> # Use the client namespace
spec:
template:
spec:
@@ -297,7 +319,7 @@
imagePullPolicy: Always # Use IfNotPresent only for local tests
volumeMounts:
- name: shared-volume
mountPath: "/data/shared" # Client shared storage. Target for copied files, not the local source path
mountPath: "/data/shared" # Client shared PVC. Backed by ~/.tracebloc/<workspace>/data on the client host — read your dataset from here
env:
# Client credentials
- name: CLIENT_ENV
@@ -315,25 +337,23 @@
- name: MYSQL_HOST # value has to match the mysql deployment name in the client values.yaml
value: "mysql-client"

# Dataset information — paths inside the ingestor pod.
# /data/shared is the mount of the client-pvc, which is backed by
# ~/.tracebloc/<workspace>/data on the client host.
- name: SRC_PATH
value: "/app" # Source folder path within the data ingestor
value: "/data/shared" # Root of the mounted dataset directory
- name: LABEL_FILE
value: "/data/shared/labels.csv" # Path to the labels CSV inside the pod
- name: TABLE_NAME
value: <UNIQUE_TABLE_NAME> # Different for train and test, no spaces
- name: TITLE
value: <DATASET_TITLE> # Optional
- name: BATCH_SIZE
value: "4000" # Number of entries per request. Depends on CPU memory, not data size. 5,000 is a safe default, tested up to 10,000.
value: "4000" # Optional, defaults to 4000
- name: LOG_LEVEL
value: "DEBUG" # Set DEBUG, "WARNING", "INFO" or "ERROR"
imagePullSecrets:
- name: regcred
volumes:
- name: shared-volume
persistentVolumeClaim:
@@ -347,30 +367,28 @@
- `image`, your Docker image (imagePullPolicy: Always for DockerHub, IfNotPresent for local)
- `CLIENT_ID`, `CLIENT_PASSWORD` from the [tracebloc client view](https://ai.tracebloc.io/clients)
- `TABLE_NAME`, unique per dataset, no spaces. Using different names for training and test data is mandatory
- `LABEL_FILE`, path inside the ingestor pod (under `/data/shared`) to the CSV with file paths and labels — must match the location of the file you placed in `~/.tracebloc/<workspace>/data/`
- `SRC_PATH`, root inside the pod where the dataset directory is mounted (`/data/shared`)
- `BATCH_SIZE`, number of entries sent to the server per request. Optional, defaults to 4000. Keep it consistent across data types. It depends on available CPU memory, not on the data itself (for example, image dimensions). Too large a value can exhaust memory. Tested up to 10,000; 5,000 is a safe default for most systems.
- `LOG_LEVEL`, "WARNING" for all warnings and errors, "INFO" for all logs, "ERROR" for errors only

### 4. Deploy

Run the ingestor as a Kubernetes Job:

```bash
kubectl apply -f ingestor-job.yaml -n <workspace>
kubectl wait -n <workspace> --for=condition=complete job/<INGESTOR_JOB_NAME>
kubectl logs -n <workspace> job/<INGESTOR_JOB_NAME>

# Delete the job only after verifying logs
kubectl delete -n <workspace> job/<INGESTOR_JOB_NAME>
```
This starts a pod and runs the ingestion process once; after it completes and you have verified the logs, you can delete the job.

**IMPORTANT:** You must run this process twice — once for training data and once for test data. Use different `JOBNAME` and `TABLE_NAME` values for each run (e.g. `ingestor-job-train` / `ingestor-job-test`), and set `intent` to `TRAIN` or `TEST` accordingly in your template script.
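One way to organize the two runs (purely a suggestion, with hypothetical file names) is to keep two copies of the manifest that differ only in `metadata.name` and `TABLE_NAME`:

```bash
# Training data: manifest with name ingestor-job-train and a train table name
kubectl apply -f ingestor-job-train.yaml -n <workspace>

# Test data: manifest with name ingestor-job-test and a test table name
kubectl apply -f ingestor-job-test.yaml -n <workspace>
```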

The data ingestor always runs a validation step before ingesting and moving files.

#### Verify Deployment
@@ -378,8 +396,8 @@
Verify that the jobs and pods were deployed successfully and are running:

```bash
kubectl get jobs,pods -n <workspace>
kubectl logs -n <workspace> <pod-name>
```

Look for "All records processed successfully" in the logs.
@@ -392,7 +410,7 @@
**Interface displays:**
- Dataset name, ID, and record count
- Data type (Tabular, Image, Text) and purpose (Training/Testing)
- Namespace and GPU requirements

## Best Practices
- Deploy jobs for training and testing simultaneously using different job names
@@ -402,17 +420,17 @@

## Troubleshooting

**Recommended for debugging:** Use [k9s](https://k9scli.io/), a terminal-based Kubernetes dashboard, to monitor jobs, pods, and logs in real time. Run `k9s -n <workspace>` to get a live view of resources, switch between them instantly, and inspect logs or events with a few keystrokes. Compared to kubectl, it is faster and more convenient.

**Stale Kubernetes Job preventing new Job execution:**
```bash
kubectl delete job ingestor-job -n <workspace>
kubectl logs <pod-name> -n <workspace>
```

**Storage Issues:**
```bash
kubectl get pvc -n <workspace>
```
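If the PVC exists but the ingestor cannot find your files, describing the claim and listing the mount from a pod usually narrows it down (`client-pvc` is the claim name referenced in `ingestor-job.yaml`):

```bash
kubectl describe pvc client-pvc -n <workspace>
kubectl exec -n <workspace> <pod-name> -- ls -la /data/shared
```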

---