ModelTrainer generates sm_train.sh with CRLF line endings on Windows causing training job failure

**PySDK Version**
- [ ] PySDK V3 (3.x)

**Describe the bug**
When using `ModelTrainer` with `SourceCode` on Windows, the SDK internally generates `sm_train.sh` with CRLF (`\r\n`) line endings. This causes the training job to fail immediately when the Linux container tries to execute it.

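The root cause is in `model_trainer.py`, in the `_prepare_train_script` method. The file is opened in text mode without a `newline` argument, so on Windows Python translates every `\n` to `\r\n` when writing:

    with open(os.path.join(tmp_dir.name, TRAIN_SCRIPT), "w") as f:
        f.write(train_script)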
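
The translation can be seen outside the SDK with a minimal sketch. It assumes nothing about SageMaker: `write_script` and `SAMPLE_SCRIPT` are hypothetical stand-ins, and `newline="\r\n"` is passed explicitly to reproduce on any OS what the default text mode does on Windows.

```python
import os
import tempfile

# Hypothetical stand-in for the script body the SDK generates.
SAMPLE_SCRIPT = "#!/bin/bash\nset -e\necho training\n"


def write_script(path, text, newline=None):
    """Write text in text mode, then return the raw bytes that landed on disk."""
    with open(path, "w", newline=newline) as f:
        f.write(text)
    with open(path, "rb") as f:  # binary read shows the real line endings
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "sm_train.sh")
    # newline="\r\n" mimics the Windows default (os.linesep) on any platform.
    windows_like = write_script(target, SAMPLE_SCRIPT, newline="\r\n")
    # newline="\n" disables translation, which is the proposed fix.
    forced_lf = write_script(target, SAMPLE_SCRIPT, newline="\n")

print(windows_like)
print(forced_lf)
```

With `newline="\n"` the bytes on disk match the in-memory string exactly, regardless of the host OS.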

**To reproduce**

    from sagemaker.train import ModelTrainer
    from sagemaker.train.configs import SourceCode, Compute, InputData, OutputDataConfig

    source_code = SourceCode(
        source_dir="src",
        entry_script="train.py",
        requirements="requirements.txt"
    )

    compute = Compute(
        instance_type="ml.m5.xlarge",
        instance_count=1
    )

    model_trainer = ModelTrainer(
        training_image="<xgboost-image-uri>",
        role="<iam-role>",
        source_code=source_code,
        compute=compute,
    )

    train_data = InputData(channel_name="train", data_source="s3://bucket/train/")
    val_data = InputData(channel_name="validation", data_source="s3://bucket/val/")

    model_trainer.train(input_data_config=[train_data, val_data], wait=True)

**Expected behavior**
`sm_train.sh` should always be written with LF (`\n`) line endings regardless of the host OS, since it is always executed inside a Linux container.

**Error in CloudWatch Logs**

    /opt/ml/input/data/sm_drivers/sm_train.sh: line 1: $'\r': command not found
    /opt/ml/input/data/sm_drivers/sm_train.sh: line 3: set: -#015: invalid option
    set: usage: set [-abefhkmnptuvxBCHP] [-o option-name] [--] [arg ...]
    /opt/ml/input/data/sm_drivers/sm_train.sh: line 6: syntax error near unexpected token `$'{\r''
    /opt/ml/input/data/sm_drivers/sm_train.sh: line 6: `handle_error() {#015'
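
Both `$'\r'` and `#015` in these messages refer to the carriage return byte (0x0D, octal 015) left at the end of each line. A quick way to confirm that a generated script is affected is to scan its raw bytes. The sketch below is illustrative only: `find_crlf_lines` is a hypothetical helper, not part of the SDK, and the sample byte strings are made up.

```python
def find_crlf_lines(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines that end with a carriage return."""
    return [
        number
        for number, line in enumerate(data.split(b"\n"), start=1)
        if line.endswith(b"\r")
    ]


# Made-up script contents: one written with CRLF, one with LF.
crlf_script = b"#!/bin/bash\r\n\r\nset -e\r\n"
lf_script = b"#!/bin/bash\n\nset -e\n"

print(find_crlf_lines(crlf_script))  # [1, 2, 3]
print(find_crlf_lines(lf_script))    # []
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the helper; a non-empty result means bash in the container will see stray `\r` characters.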

**Proposed Fix**
Current code (in `_prepare_train_script`):

    with open(os.path.join(tmp_dir.name, TRAIN_SCRIPT), "w") as f:

Fix, forcing LF line endings:

    with open(os.path.join(tmp_dir.name, TRAIN_SCRIPT), "w", newline="\n") as f:

**System information**
- **SageMaker Python SDK version**: 3.x
- **Framework name or algorithm**: XGBoost
- **Framework version**: 1.7-1
- **Python version**: 3.11.5
- **CPU or GPU**: CPU (ml.m5.xlarge)
- **Custom Docker image (Y/N)**: N

**Additional context**
This issue affects all Windows users of the new `ModelTrainer` API (PySDK V3). The `sm_train.sh` file is generated entirely by the SDK on the client machine and is never touched by the user, so there is no way to work around it without either patching the SDK or switching to the older Estimator API. The fix is a one-argument change: adding `newline="\n"` to the `open()` call.

