CLI slugify drops non-ASCII characters from job and workflow identifiers #1388

@elias-ba

Description

Came across this while investigating lightning#4577 (allowing special characters in workflow step names). Flagging it here because it's a pre-existing bug in @openfn/project that affects the CLI's pull/deploy round-trip, and it will become more visible to users once Lightning loosens its own validation.

The slugify() helper in packages/project/src/util/slugify.ts is:

text.replace(/\W/g, ' ').trim().replace(/\s+/g, '-').toLowerCase()

JavaScript's \W is ASCII-only: it matches anything outside [A-Za-z0-9_], so every non-ASCII character, including accented Latin letters like é, is treated as a separator. Running it on a few realistic inputs:

  • Vérifier l'état du patient becomes v-rifier-l-tat-du-patient
  • café becomes caf
  • résumé becomes r-sum
  • Anything written entirely in a non-Latin script collapses to an empty string
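The outputs above can be reproduced by wrapping the quoted expression in a standalone function. This is a sketch, not the package's source: `slugifyAsciiOnly` and `slugifyWithUFlag` are hypothetical names used only for illustration. The second function shows that adding the `u` flag is not a fix on its own, because without the `i` flag `\W` still means `[^A-Za-z0-9_]` in Unicode mode.

```typescript
// The expression quoted from packages/project/src/util/slugify.ts,
// wrapped in a function for illustration.
function slugifyAsciiOnly(text: string): string {
  return text.replace(/\W/g, ' ').trim().replace(/\s+/g, '-').toLowerCase();
}

// Same expression with the `u` flag: \W is still ASCII-only here,
// so the output is identical.
function slugifyWithUFlag(text: string): string {
  return text.replace(/\W/gu, ' ').trim().replace(/\s+/g, '-').toLowerCase();
}

const inputs = ["Vérifier l'état du patient", 'café', 'résumé', 'مرحبا', '你好'];

for (const input of inputs) {
  // Print each input next to its slugified form.
  console.log(JSON.stringify(input), '->', JSON.stringify(slugifyAsciiOnly(input)));
}
```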

The problem is that slugify() isn't only used for URLs or display; it's used for identity. In parse/from-app-state.ts it sets the local workflow id (line 117), the local job id (line 164), and the keys used for edge references (lines 144 and 187). In serialize/to-app-state.ts it produces the workflow key (line 60) and a fallback job key (line 155). So on pull, a name containing non-ASCII characters becomes a truncated or empty id locally, and on deploy those ids flow back as keys and break edge references. Two jobs in the same workflow whose names slugify to the same string (for example, two names written entirely in Arabic, which both become the empty string) would even collide on the same id.
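To make the collision concrete, here is a sketch that groups a list of jobs by the slug that would become their local id. The `Job` shape, the placeholder UUIDs, and `indexBySlug` are hypothetical and only approximate what the parse step does; the slugify expression is the one quoted above. Four distinct jobs end up under two keys.

```typescript
// Same expression as the current slugify() helper.
function slugify(text: string): string {
  return text.replace(/\W/g, ' ').trim().replace(/\s+/g, '-').toLowerCase();
}

// Hypothetical job shape for illustration; not the real @openfn/project type.
interface Job {
  id: string; // UUID assigned by Lightning (placeholder values below)
  name: string;
}

const jobs: Job[] = [
  { id: '11111111-1111-4111-8111-111111111111', name: 'café' },
  { id: '22222222-2222-4222-8222-222222222222', name: 'Caf!' },
  { id: '33333333-3333-4333-8333-333333333333', name: 'مرحبا' },
  { id: '44444444-4444-4444-8444-444444444444', name: '你好' },
];

// Group jobs by the slug that would be used as their local id.
function indexBySlug(list: Job[]): Map<string, Job[]> {
  const index = new Map<string, Job[]>();
  for (const job of list) {
    const key = slugify(job.name);
    const bucket = index.get(key) ?? [];
    bucket.push(job);
    index.set(key, bucket);
  }
  return index;
}

const index = indexBySlug(jobs);
index.forEach((bucket, key) => {
  console.log(JSON.stringify(key), '->', bucket.map((j) => j.name));
});
// Four jobs, two keys: "caf" (café, Caf!) and "" (مرحبا, 你好).
```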

This already affects workflow names today, since Lightning places no charset restriction on them. The only reason we haven't seen it more often is that step names are restricted. Once lightning#4577 merges, step names will be unrestricted too, and users with non-English names will hit this as soon as they pull their project via the CLI.

The fix direction I'd suggest: stop using slugify for identity. Keep the original string or the UUID as the local key. Slugification is still fine for filenames or URLs, but it shouldn't be what decides whether two things are the same thing.
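One possible shape for that direction, sketched under assumptions: the local key is taken directly from the UUID Lightning sends, and a Unicode-aware slug is used only to build a readable filename. `ProjectJob`, `LocalEntry`, `filenameSlug`, and `toLocalEntries` are hypothetical names, not part of @openfn/project. The slug keeps letters (`\p{L}`), combining marks (`\p{M}`) and digits (`\p{N}`) from any script, and a numeric suffix keeps filenames distinct when two names still slugify the same way.

```typescript
// Hypothetical job shape for illustration.
interface ProjectJob {
  id: string; // UUID from Lightning, used as the identity key
  name: string; // original name, kept verbatim
}

interface LocalEntry {
  key: string; // identity: the UUID, never derived from the name
  name: string;
  filename: string; // presentation only
}

// Unicode-aware slug for filenames/URLs only. NFC normalization first
// composes decomposed accents (e + U+0301) into single code points.
function filenameSlug(text: string): string {
  return text
    .normalize('NFC')
    .replace(/[^\p{L}\p{M}\p{N}]+/gu, '-') // runs of other chars -> one hyphen
    .replace(/^-+|-+$/g, '') // trim leading/trailing hyphens
    .toLowerCase();
}

function toLocalEntries(jobs: ProjectJob[]): LocalEntry[] {
  const used = new Set<string>();
  return jobs.map((job) => {
    const base = filenameSlug(job.name) || 'job'; // fallback for names with no letters/digits
    let filename = base;
    // Disambiguate filenames that still collide after slugifying.
    for (let n = 2; used.has(filename); n++) {
      filename = `${base}-${n}`;
    }
    used.add(filename);
    return { key: job.id, name: job.name, filename };
  });
}
```

In this sketch, renaming a step changes its filename but not its key, so anything that refers to the job by key is unaffected.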

For testing, the golden path is round-tripping a project with non-ASCII names (French accents, Arabic, Chinese, an apostrophe or two) and asserting the deployed state matches what Lightning sent. A worthwhile edge case is two jobs whose names share the same slugified form, to make sure they stay distinct locally.
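A rough shape for the golden-path test, written as a plain script so it runs without a test framework. `AppJob`, `toLocal`, and `toAppState` are hypothetical stand-ins for the real parse and serialize steps (from-app-state.ts and to-app-state.ts), and the ids are placeholders; the point is the fixture and the two checks: every job keeps a distinct local key, and the state survives the round trip unchanged. The fixture includes "café" and "Caf!", which the current slugify() maps to the same string.

```typescript
interface AppJob {
  id: string; // placeholder ids; real ones would be UUIDs
  name: string;
}

// French accents, apostrophes, Arabic, Chinese, plus a pair that shares
// the same slug under the current helper ("caf").
const fixture: AppJob[] = [
  { id: 'a1', name: "Vérifier l'état du patient" },
  { id: 'a2', name: 'café' },
  { id: 'a3', name: 'Caf!' },
  { id: 'a4', name: 'التحقق من البيانات' },
  { id: 'a5', name: '检查数据' },
];

// Hypothetical stand-in for parsing app state into local state,
// keyed by id rather than by slug.
function toLocal(jobs: AppJob[]): Map<string, AppJob> {
  return new Map(jobs.map((job): [string, AppJob] => [job.id, { ...job }]));
}

// Hypothetical stand-in for serializing local state back to app state.
function toAppState(local: Map<string, AppJob>): AppJob[] {
  return Array.from(local.values());
}

function assertRoundTrip(jobs: AppJob[]): void {
  const local = toLocal(jobs);
  if (local.size !== jobs.length) {
    throw new Error(`expected ${jobs.length} distinct local keys, got ${local.size}`);
  }
  const back = toAppState(local);
  if (JSON.stringify(back) !== JSON.stringify(jobs)) {
    throw new Error('round trip changed the job list');
  }
}

assertRoundTrip(fixture);
console.log('round trip ok');
```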

Related: OpenFn/lightning#4577, OpenFn/apollo#446.

Metadata

Assignees: No one assigned
Labels: CLI (Tracking issue to do with the CLI)
Type: No type
Projects: Status: Tech Backlog
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
