feat: add Trackio as a new experiment monitoring backend#8065

Open
chanduripranav wants to merge 1 commit into deepspeedai:master from chanduripranav:feature/trackio-monitor-7964

@chanduripranav

Closes #7964

Summary

Adds Trackio as a new experiment monitoring backend to DeepSpeed,
following the existing pattern used by WandB, TensorBoard, Comet, and CSV monitors.

Trackio is a lightweight, offline-first logging library developed by
Hugging Face with a WandB-compatible API. Runs can be visualized as
an HF Space or dataset on the HF Hub.

Changes

  • deepspeed/monitor/trackio.py — new TrackioMonitor class implementing
    the Monitor interface with log() and write_events() methods
  • deepspeed/monitor/config.py — new TrackioConfig with enabled and
    project fields; registered in get_monitor_config() and
    DeepSpeedMonitorConfig; included in check_enabled validator
  • deepspeed/monitor/utils.py — new check_trackio_availability() helper
    with install instructions
  • deepspeed/monitor/monitor.py — TrackioMonitor imported and wired
    into MonitorMaster.__init__() and write_events()
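
For readers unfamiliar with the monitor pattern, here is a minimal sketch of what a monitor with log() and write_events() might look like. It is not the code from this PR. `TrackioMonitorSketch` and `RecordingBackend` are hypothetical names, the backend is injected so the example runs without trackio or DeepSpeed installed, and the (label, value, step) event layout is assumed from the other DeepSpeed monitors. Whether the installed trackio version accepts a `step` argument to `log` should be checked separately.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumed event layout: (label, value, step), as consumed by other DeepSpeed monitors.
Event = Tuple[str, Any, int]


class RecordingBackend:
    """Hypothetical stand-in for the `trackio` module; it only records calls."""

    def __init__(self) -> None:
        self.project: Optional[str] = None
        self.logged: List[Tuple[Dict[str, Any], Optional[int]]] = []

    def init(self, project: str) -> None:
        self.project = project

    def log(self, data: Dict[str, Any], step: Optional[int] = None) -> None:
        self.logged.append((data, step))


class TrackioMonitorSketch:
    """Illustrative monitor: initializes and logs only when enabled and on rank 0."""

    def __init__(self, backend: Any, project: str = "deepspeed",
                 enabled: bool = True, rank: int = 0) -> None:
        self.backend = backend
        self.enabled = enabled
        self.rank = rank
        if self._active():
            self.backend.init(project=project)

    def _active(self) -> bool:
        return self.enabled and self.rank == 0

    def log(self, data: Dict[str, Any], step: Optional[int] = None) -> None:
        if self._active():
            self.backend.log(data, step=step)

    def write_events(self, event_list: List[Event]) -> None:
        # Each event becomes one single-key log call at its own step.
        for label, value, step in event_list:
            self.log({label: value}, step=step)


backend = RecordingBackend()
monitor = TrackioMonitorSketch(backend, project="my-deepspeed-run")
monitor.write_events([("Train/loss", 0.5, 1), ("Train/lr", 0.001, 1)])
```

In a real integration the backend would be the imported `trackio` module and the rank would come from the distributed runtime rather than a constructor argument.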

Usage

Add to your DeepSpeed config:

{
  "trackio": {
    "enabled": true,
    "project": "my-deepspeed-run"
  }
}
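
As a rough illustration of how the "trackio" section could be read, the sketch below parses the JSON above into a small settings object. It is an assumption-laden stand-in, not the PR's pydantic-based TrackioConfig: `TrackioSettings` and `parse_trackio_section` are hypothetical names, and the defaults (enabled=False, project="deepspeed") are taken from the TrackioConfig excerpt quoted in the review comment on this PR.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class TrackioSettings:
    """Hypothetical mirror of the two fields described in this PR."""
    enabled: bool = False
    project: str = "deepspeed"


def parse_trackio_section(ds_config: Dict[str, Any]) -> TrackioSettings:
    """Read the optional "trackio" section, falling back to defaults."""
    section = ds_config.get("trackio", {})
    unknown = set(section) - {"enabled", "project"}
    if unknown:
        raise ValueError(f"Unknown trackio config keys: {sorted(unknown)}")
    return TrackioSettings(**section)


ds_config = json.loads(
    '{"trackio": {"enabled": true, "project": "my-deepspeed-run"}}'
)
settings = parse_trackio_section(ds_config)
```

Omitting the "trackio" section entirely leaves the monitor disabled under these assumed defaults.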

Testing

  • Verified pattern consistency with existing WandB and Comet backends
  • TrackioMonitor only initializes on rank 0, consistent with other monitors
  • Graceful ImportError with a helpful install message if trackio is not installed
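
The availability check in the last bullet can be sketched as follows. This is a generic illustration under stated assumptions, not the check_trackio_availability() helper from this PR: `check_package_available` is a hypothetical name, and the error wording is invented for the example.

```python
import importlib.util


def check_package_available(package: str, install_hint: str) -> None:
    """Raise ImportError with an install hint if `package` cannot be found.

    Hypothetical helper for illustration only; find_spec looks the package up
    without importing it.
    """
    if importlib.util.find_spec(package) is None:
        raise ImportError(
            f"The '{package}' package is required for this monitor but is not "
            f"installed. Install it with: {install_hint}"
        )


# A standard-library module is always found, so this call returns silently.
check_package_available("json", "n/a")
```

For this backend the call would presumably be `check_package_available("trackio", "pip install trackio")`, made before the monitor is constructed.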

…#7964)

- Add TrackioMonitor class in deepspeed/monitor/trackio.py
- Add TrackioConfig with enabled and project fields in config.py
- Add check_trackio_availability() helper in utils.py
- Register TrackioMonitor in MonitorMaster in monitor.py
- Trackio is a lightweight offline-first logging library with
  WandB-compatible API, logs can be visualized on HF Hub

Signed-off-by: Pranav Chanduri <preethivardhanc@gmail.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 486c1022a4

Comment on lines +125 to +131
class TrackioConfig(DeepSpeedConfigModel):
    """Sets parameters for Trackio monitor."""

    enabled: bool = False
    """ Whether logging to Trackio is enabled. Requires `trackio` package is installed. """

    project: str = "deepspeed"

P1 Badge Add required Trackio tests and docs

This introduces a new monitoring backend but the diff only changes production code; the workspace AGENTS.md requires new features to include corresponding tests and documentation updates. Please add coverage for the Trackio config/MonitorMaster wiring and logging behavior, and update the monitor docs so users can discover the new trackio configuration.

Development

Successfully merging this pull request may close these issues.

[REQUEST] Add Trackio as a New Backend for Experiment Monitoring