Skip to content

Commit 1e4e224

Browse files
committed
[blog] adds Using Codex for education at Dagster Labs
1 parent c31be99 commit 1e4e224

File tree

1 file changed

+151
-0
lines changed

1 file changed

+151
-0
lines changed
Lines changed: 151 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,151 @@
1+
---
2+
title: 'Using Codex for education at Dagster Labs'
3+
date: "2025-10-27"
4+
tags: ["ai", "education"]
5+
categories: ["ai"]
6+
---
7+
8+
I recently had the opportunity to showcase how we use Codex for educational content at Dagster Labs which was featured on the OpenAI developer blog; you can find a full copy of that post below.
9+
10+
<!--more-->
11+
12+
https://developers.openai.com/blog/codex-for-documentation-dagster
13+
14+
## Overview
15+
16+
At [Dagster Labs](https://dagster.io/), we produce a lot of technical educational content for data engineers, machine learning engineers, and analysts to better understand how to use Dagster, an open source workflow orchestration framework. Because our users come from varied technical backgrounds, we’ve found it essential to meet each persona at the right technical depth.
17+
18+
In this post, I’ll share how we use OpenAI’s Codex to accelerate documentation, translate content across mediums, and even measure how complete our docs are.
19+
The power of CONTRIBUTING.md files
20+
21+
To make it easier for our community members and internal engineers to contribute documentation, we overhauled our [CONTRIBUTING.md](https://github.com/dagster-io/dagster/blob/3c2d36054f4014ca8316e533975a538d6eff62c4/docs/CONTRIBUTING.md) file. To our surprise, we had inadvertently significantly improved the utility of Codex. It turns out there is serious value in clearly outlining the hierarchy, structure, and best practices for writing documentation in your code base. Both for humans and robots.
22+
23+
# Contributing documentation
24+
25+
## Content
26+
27+
### Links
28+
29+
#### Use full paths instead of relative links
30+
31+
Docusaurus doesn't always render relative links correctly, which can result in users seeing intermittent 404s when accessing those links. Use full paths instead of relative links, like this:
32+
33+
```
34+
For more information, see "[Defining assets](/guides/build/assets/defining-assets)".
35+
```
36+
37+
instead of this:
38+
39+
```
40+
For more information, see "[Defining assets](defining-assets)".
41+
```
42+
43+
#### Use non-trailing slash links to Dagster docs
44+
45+
e.g. use `/guides/build/assets/defining-assets` instead of `/guides/build/assets/defining-assets/`.
46+
47+
**Context:** Links to Dagster docs with trailing slashes automatically redirect to non-trailing slash links. While that's helpful for docs links we don't control, too many redirects on our own pages can confuse search engines and cause SEO issues.
48+
49+
### API documentation
50+
51+
...
52+
53+
Codex is only as good as the scaffolding you give it. A well-structured CONTRIBUTING.md becomes both documentation for humans and a map for AI.
54+
55+
## Codex for understanding
56+
57+
Beyond writing docs, Codex can act as an always-available code explainer. For developer advocates and technical writers, this has been invaluable. In open source projects, or projects with many engineers, it can often be difficult to stay up-to-date on all of the features being developed, and how they work. This is especially true for smaller teams of developer advocates and technical writers. We’ve found that some of the best assistance Codex provides is through explaining pull requests, or pointing it to a part of the codebase and asking for an explanation.
58+
59+
A tip we’ve found here is to leverage the gh command from within Codex to explain pull requests. Ask it to review the PR description and diff, summarize why the feature was implemented, and explain how it should be exposed to end users.
60+
The power of the mono repo
61+
62+
This might be a controversial opinion, but I’m a big fan of mono repos. When context is key, having everything accessible in a single repository makes it so much easier to get access to what you need, and in the case of Codex, this means complete context: code, docs, and examples all in one place.
63+
64+
While some people have expressed concern about tools like Codex failing to scale with repositories, I’ve found this not to be the case. By making use of file references within Codex (@) you can give Codex the initial sub directory or file to start with before further exploration. And there are some serious benefits to having your framework code alongside your documentation in a single repository. This setup lets us prompt Codex to read framework code and draft initial documentation scaffolds that we can then refine.
65+
66+
Here’s an example where we prompted Codex to review an existing pull request, and add a section to the documentation on how exactly these environment variables are useful when configuring your deployment.
67+
68+
>_ You are using OpenAI Codex in ~/src/dagster
69+
70+
To get started, describe a task or try one of these commands:
71+
72+
/init - create an AGENTS.md file with instructions for Codex
73+
/status - show current session configuration and token usage
74+
/approvals - choose what Codex can do without approval
75+
/model - choose what model and reasoning effort to use
76+
77+
▌ Review the changes made in pull request 32557 using `gh pr diff <number>`, review the environment
78+
▌ variables that were introduced and already exist for configuring the gRPC server, then update
79+
▌ docs/docs/deployment/oss/oss-instance-configuration.md page to outline the configuration options.
80+
81+
> I’ll fetch the diff for PR 32557 and scan for added/updated gRPC-related
82+
environment variables, then cross-check with existing env var usage in the
83+
repo.
84+
85+
• Ran gh pr diff 32557 | sed -n '1,200p'
86+
87+
• Ran gh pr diff 32557 | sed -n '200,400p'
88+
89+
• Explored
90+
└ Search DAGSTER_GRPC_[A-Z_]+ in dagster
91+
Read api.py, code_server.py, oss-instance-configuration.md
92+
Search DAGSTER_CODE_SERVER_[A-Z_]+
93+
Read constants.py
94+
Search DAGSTER_(REPOSITORY|SCHEDULE|SENSOR)_GRPC_TIMEOUT_SECONDS|
95+
DAGSTER_GRPC_SHUTDOWN_GRACE_PERIOD in dagster
96+
Read proxy_server_manager.py
97+
98+
> Next, I’ll expand the gRPC servers section of the instance configuration docs
99+
to include API gRPC server variables and communication timeouts/limits, keeping
100+
style consisten
101+
102+
...
103+
104+
You can find the pull request that was produced by this session at [dagster-io/dagster#32558](https://github.com/dagster-io/dagster/pull/32558)
105+
106+
## Translating between content mediums
107+
108+
Different personas prefer different learning formats, but the underlying ideas are often the same. Because of that, we produce a variety of content mediums: blog posts, tutorials, online courses, YouTube videos, and more. The core content of these materials can often be the same, just presented differently depending on the target persona.
109+
110+
Codex works exceptionally well at translating between these mediums. For example, taking a tutorial and producing a transcript for a YouTube video. Or taking a low-level tutorial, and abstracting it slightly for a blog post. In the below screenshot you can find an example prompt used to take one of our example projects and produce a video transcript. This approach saves hours of rewriting while keeping our messaging consistent across channels.
111+
112+
>_ You are using OpenAI Codex in ~/src/dagster
113+
114+
▌Review the example project: examples/docs_projects/project_dagster_modal_pipes
115+
▌and produce a video transcript to teach users the core Dagster and Modal concept
116+
▌covered. Break the transcript into sections, and include code snippets to be
117+
▌showcased in the video. The prose of the script should be fun but professional.
118+
▌Save this to MODAL_VIDEO_TRANSCRIPT.md.
119+
120+
> I’m going to scan the example project to understand its structure and code
121+
before drafting the video transcript.
122+
123+
• Ran
124+
└ bash -lc 'ls -la' bash -lc 'ls -la examples/docs_projects/
125+
project_dagster_modal_pipes' bash -lc "rg --files examples/docs_projects/
126+
project_dagster_modal_pipes | sed -n '1,120p'"
127+
128+
• Explored
129+
└ List project_dagster_modal_pipes
130+
Search rg --files examples/docs_projects/project_dagster_modal_pipes
131+
Read README.md, definitions.py, __init__.py, resources.py, constants.py,
132+
feeds.py, pipeline_factory.py, summarize.py, config.py, transcribe.py,
133+
hello_dagster.py, verify_cloud_mount.py
134+
135+
• Added MODAL_VIDEO_TRANSCRIPT.md (+335 -0)
136+
137+
See the full video transcript [here](https://gist.github.com/cmpadden/a371c1300bdc42601a77b3eefb938483)!
138+
139+
## Evaluating documentation coverage
140+
141+
One of the more experimental ways we’re using Codex is as a proxy for human understanding.
142+
143+
By using documentation as the source-of-truth, and base context for Codex, we can have it generate code. For example, people often use Dagster for running and observing their dbt data models alongside other data processing code.
144+
145+
By prompting Codex to reference the documentation and produce the code for this project, we can then run a test suite on the resulting code to check that it operates as expected. If it works as expected we can assume that our documentation has adequate coverage of the required content. If Codex can generate working code purely from our documentation, it’s a strong signal that humans can as well which becomes an indirect measure of documentation completeness.
146+
147+
## Summary
148+
149+
All in all, the Dagster team has found Codex to be immensely helpful for creating, reviewing, and translating education content. It has enabled us to scale beyond our original capacity, helped us ensure that we have adequate documentation coverage as the framework evolves, and more importantly, it has made it so that we can more easily support our community.
150+
151+
Codex has underscored how important context and structure are. For us, that means refining our documentation architecture so both humans and AI can navigate it easily. This feedback loop, powered by AI, has improved both how we create content and how users generate framework code. As AI tools evolve, the line between documentation, code, and automation will blur. Teams that treat documentation as structured data will have a major advantage.

0 commit comments

Comments
 (0)