feat(android): resolve consumer locales against shipped translation bundles #493

Open
jkmassel wants to merge 2 commits into trunk from jkmassel/locale-resolver-android


jkmassel commented May 5, 2026

Summary

  • Adds an Android LocaleResolver and a new EditorConfiguration.Builder.setLocale(locale: Locale) method that resolves against a compile-time-generated set of shipped locales before storing the tag for serialization.
  • Drops the public setLocale(String?) overload. It is replaced by an internal setLocaleTag(String?) reserved for toBuilder round-trip and tests; external consumers must go through the Locale API.
  • The set of shipped locales is generated into a Kotlin internal object SupportedLocales at build time from the JS-side manifest, so a missing manifest fails the gradle build instead of silently falling through to English at runtime.
  • The shared Vite manifest plugin from #490 (Resolve consumer locales against shipped translation bundles) also lands here; the iOS sibling PR carries the same plugin. Whichever PR lands second is a no-op for that file.

Refs #490.

Root Cause

A locale string is a lossy encoding of what the system actually knows. Android hands the consumer a Locale — language, region, script, variant, extensions — and any boundary that flattens that to a string before the library decodes it throws data away. WP-Android's PerAppLocaleManager.getCurrentLocaleLanguageCode() returns Locale.getLanguage() — ISO 639-1 only — so a Brazilian user sends pt (matches the language-only pt bundle but not the regional Brazilian one), a Chinese user sends zh (no zh-cn/zh-tw match → English), and a user who picked en_GB / nl_BE / pt_BR in the system language picker loses their region variant before we ever reach GutenbergKit.

The library is the obvious place to decode a Locale into a wire-format tag, because the library is the only place that knows what bundles actually ship. Historically the supported list lived only in bin/prep-translations.js, so consumers had to mirror that table to do the right thing — and historically hadn't. Taking a Locale at the boundary also keeps signal the string would have dropped: the script subtag lets zh-Hant-HK resolve to zh-tw instead of English, and Android's legacy ISO 639-1 codes (iw for Hebrew, in for Indonesian) get aliased to the canonical bundle names before lookup.
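
To make the data loss concrete, here is a minimal sketch using only `java.util.Locale`. The helper names `languageOnly` and `describe` are hypothetical and exist only for illustration; they are not part of this PR.

```kotlin
import java.util.Locale

// What a language-code-only boundary forwards to the library.
fun languageOnly(locale: Locale): String = locale.language

// What the Locale still knows before it is flattened to a language code.
fun describe(locale: Locale): String =
    "language=${locale.language} script=${locale.script} region=${locale.country}"
```

For example, `languageOnly(Locale.forLanguageTag("pt-BR"))` yields `pt`, while `describe(Locale.forLanguageTag("zh-Hant-HK"))` yields `language=zh script=Hant region=HK`, so the script and region are still available when the library receives the `Locale` itself.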

Changes

Build

vite.config.js: New emitSupportedLocalesManifest plugin scans src/translations/ at build time and emits dist/supported-locales.json (sorted array of locale tags). The existing make copy-dist-android target ships it into the library's assets/.

android/Gutenberg/build.gradle.kts: New :Gutenberg:generateSupportedLocales gradle task reads the manifest from src/main/assets/supported-locales.json and emits an internal object SupportedLocales { val ALL: Set<String> } into build/generated/source/locales/main/. Wired into the main source set and runs ahead of every compile*Kotlin task. A missing manifest fails the build with a message pointing at make build, so the silent-fall-through-to-English failure mode is unreachable in shipped artifacts. Manifest parsing is type-checked — non-string entries fail the task with a clear message instead of silently shipping a nonsense locale.
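
The generation step can be sketched in plain Kotlin, independent of Gradle. This is an illustration under assumptions, not the task in this PR: `requireManifest` and `renderSupportedLocales` are hypothetical names, the package name and the error wording are guesses, and the input is assumed to be the already-parsed JSON array.

```kotlin
import java.io.File

// Hypothetical guard: fail fast when the manifest has not been built yet.
fun requireManifest(manifest: File) {
    check(manifest.exists()) { "Missing ${manifest.path}; run `make build` first." }
}

// Hypothetical renderer: turns parsed manifest entries into Kotlin source.
// The default package name is an assumption, not taken from the PR.
fun renderSupportedLocales(
    entries: List<Any?>,
    packageName: String = "org.wordpress.gutenberg.model",
): String {
    // Reject non-string entries instead of emitting a nonsense locale.
    val tags = entries.mapIndexed { index, entry ->
        entry as? String
            ?: throw IllegalArgumentException("Manifest entry $index is not a string: $entry")
    }
    val literals = tags.sorted().joinToString(",\n") { "        \"$it\"" }
    return buildString {
        appendLine("package $packageName")
        appendLine()
        appendLine("internal object SupportedLocales {")
        appendLine("    val ALL: Set<String> = setOf(")
        appendLine(literals)
        appendLine("    )")
        appendLine("}")
    }
}
```

Because both failure cases throw, a task built around helpers like these would stop the build rather than produce an empty or malformed `SupportedLocales`.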

.gitignore: The generated supported-locales.json is treated the same as the other build outputs in assets/ so it cannot be accidentally committed.

Resolution chain

For an input locale, normalised to lowercase with _ → -:

  1. Full tag (xx-yy) — match if shipped
  2. Script-implied region for macrolanguages we ship disjoint regional bundles for (e.g. zh-Hant-HK → zh-tw, zh-Hans → zh-cn)
  3. Language-only tag (xx) — match if shipped
  4. Fall back to en

Legacy ISO 639-1 codes that Android's Locale class still emits (iw → he, in → id, no → nb) are aliased to their canonical bundle names before lookup, so Hebrew/Indonesian/Norwegian users on devices reporting the legacy codes don't silently land on English.

Implemented in android/Gutenberg/src/main/java/org/wordpress/gutenberg/model/LocaleResolver.kt as an internal class with a cached LocaleResolver.Default singleton the builder reuses. No runtime IO, no Context parameter.
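
The chain above can be sketched as follows. This is a simplified illustration, not the contents of `LocaleResolver.kt`: the `shipped` set is a hypothetical subset standing in for the generated `SupportedLocales.ALL`, and the names `legacyAliases`, `scriptRegions`, and `resolve` are assumptions.

```kotlin
import java.util.Locale

// Hypothetical subset of shipped bundles; the real set is generated from the manifest.
val shipped = setOf("en", "fr", "he", "id", "nb", "nl", "nl-be", "pt", "pt-br", "zh-cn", "zh-tw")

// Legacy ISO 639-1 codes mapped to canonical bundle names.
val legacyAliases = mapOf("iw" to "he", "in" to "id", "no" to "nb")

// Script-implied regions, keyed by language and then by lowercase script.
val scriptRegions = mapOf("zh" to mapOf("hant" to "tw", "hans" to "cn"))

fun resolve(locale: Locale, supported: Set<String> = shipped): String {
    val rawLanguage = locale.language.lowercase(Locale.ROOT)
    val language = legacyAliases[rawLanguage] ?: rawLanguage
    val region = locale.country.lowercase(Locale.ROOT)
    val script = locale.script.lowercase(Locale.ROOT)

    // 1. Full tag (xx-yy).
    if (region.isNotEmpty()) {
        val full = "$language-$region"
        if (full in supported) return full
    }
    // 2. Script-implied region, e.g. zh + Hant resolves to zh-tw.
    scriptRegions[language]?.get(script)?.let { implied ->
        val tag = "$language-$implied"
        if (tag in supported) return tag
    }
    // 3. Language-only tag (xx).
    if (language in supported) return language
    // 4. Fall back to English.
    return "en"
}
```

With this subset, `resolve(Locale.forLanguageTag("zh-Hant-HK"))` returns `zh-tw`, `resolve(Locale.forLanguageTag("fr-CA"))` returns `fr`, and `resolve(Locale.forLanguageTag("iw-IL"))` returns `he` whether or not the runtime reports the legacy code.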

API change

fun setLocale(locale: Locale)                     // new — runs resolution
internal fun setLocaleTag(locale: String?)        // toBuilder + tests only

The previously-public setLocale(String?) is removed. Consumers must pass a Locale value; the resolver decides the wire-format string. The internal setLocaleTag exists so toBuilder can round-trip a stored tag without re-running resolution.

Tests

  • Curated LocaleResolverTest covers the resolution chain (full-tag → script-implied region → language-only → en fallback, normalisation of pt_BR / EN_GB / etc., script subtags, legacy alias mapping).
  • A parameterised test asserts that every locale in the generated SupportedLocales.ALL resolves to itself — catches regressions where a locale gets added but the resolver mishandles it. Reads from the generated constant directly, so the test can never drift from what the resolver actually uses in production.
  • Builder-level integration test exercises setLocale(Locale) through to config.locale against the shipped manifest.
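
The exhaustive check can be sketched as a small helper that takes the resolver as a parameter. The name `tagsNotResolvingToThemselves` is hypothetical, and the actual test in this PR is a parameterised test over `SupportedLocales.ALL` whose exact shape may differ.

```kotlin
import java.util.Locale

// Hypothetical helper: returns every shipped tag that does not resolve to itself.
// An empty result means the resolver handles every shipped locale.
fun tagsNotResolvingToThemselves(
    supported: Set<String>,
    resolve: (Locale) -> String,
): List<String> =
    supported.filter { tag -> resolve(Locale.forLanguageTag(tag)) != tag }
```

A test would pass the generated set and the production resolver, then assert that the returned list is empty; printing the list on failure names the offending locales directly.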

What We Explored

  • Hard-coding the supported set in native code — the footgun the issue calls out. Rejected.
  • Reading the manifest from assets/ at runtime — the original direction. Worked, but had three problems collapsed into one fix by the build-time generator: a missing manifest silently degraded every consumer to English with no log signal; the builder did a disk read on every setLocale call (StrictMode-visible on the main thread); and the API needed a Context parameter purely to reach assets/. Generating a Kotlin constant at build time fails the build if the manifest is missing, removes the runtime IO entirely, and lets setLocale take just a Locale.
  • Reading the locale list from filenames in assets/ — Vite hashes the chunk filenames (pt-br-UCkBcRdR.js) and prefixes can collide (nl-be-... vs nl-...), so a parser would have to re-encode the SUPPORTED_LOCALES list anyway. A manifest is simpler and unambiguous.
  • A JS-side mirror resolver — initially this PR carried one to defend the JS load path against opaque locale strings. With the native side now resolving to a canonical tag before the value reaches the editor, the JS resolver was redundant duplication — and during review, a divergence had already crept in (JS treated zh-Hans-CN as language-only zh while the native side correctly collapsed it to zh-cn). The JS load path now trusts the native resolution result and falls back to English for anything not in the static glob.

Behaviour change

| Input | Before | After |
| --- | --- | --- |
| `setLocale(Locale("pt", "BR"))` | (didn't exist) | `pt-br` ✅ |
| `setLocale(Locale("zh", "CN"))` | (didn't exist) | `zh-cn` ✅ |
| `setLocale(Locale("nl", "BE"))` | (didn't exist) | `nl-be` ✅ |
| `setLocale(Locale("fr", "CA"))` | (didn't exist) | `fr` ✅ (regional bundle absent → language fallback) |
| `setLocale(Locale.forLanguageTag("zh-Hant-HK"))` | (didn't exist) | `zh-tw` ✅ (script-implied region) |
| `setLocale(Locale("iw", "IL"))` | (didn't exist) | `he` ✅ (legacy alias) |
| `setLocale("pt_BR")` (was opaque) | `pt_BR` (no match → English) | does not compile |
| `Locale.getLanguage()` returning `zh` (existing path) | English | English (unchanged: no `zh` bundle; consumers passing the full `Locale` now get `zh-cn`) |

Removing the string overload is a breaking change for any caller that was passing a raw string. The migration is mechanical (`setLocale("pt_BR")` → `setLocale(Locale.forLanguageTag("pt-BR"))`).
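
For call sites that hold underscore-separated tags at runtime rather than string literals, a small conversion helper may be enough. This is a sketch; `localeFromLegacyTag` is a hypothetical name and not part of this PR.

```kotlin
import java.util.Locale

// Hypothetical migration helper for call sites that still hold tags such as "pt_BR".
// Underscores are swapped for hyphens before parsing, because
// Locale.forLanguageTag expects hyphen-separated BCP-47 subtags.
fun localeFromLegacyTag(tag: String): Locale =
    Locale.forLanguageTag(tag.trim().replace('_', '-'))
```

A caller would then write `setLocale(localeFromLegacyTag(storedTag))`, and the resolver decides the wire-format string from there.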

Test plan

  • `./android/gradlew :Gutenberg:testDebugUnitTest --tests "org.wordpress.gutenberg.model.LocaleResolverTest"` — curated + 49-locale exhaustive, all pass
  • `./android/gradlew :Gutenberg:testDebugUnitTest --tests "org.wordpress.gutenberg.model.EditorConfigurationBuilderTest"` — existing suite still green (callsites that passed strings updated to `setLocaleTag`)
  • `./android/gradlew detekt` — clean
  • Deleting `src/main/assets/supported-locales.json` and re-running `:Gutenberg:generateSupportedLocales` fails with the expected "run `make build`" message
  • `npm run lint:js` on the changed files — clean

Out of scope

  • Changing the `SUPPORTED_LOCALES` list itself.
  • Changing the wire format of `EditorConfiguration.locale` — still an opaque string on the JS side.
  • Pluralization / RTL handling — separate concerns.

Related

github-actions bot added the [Type] Enhancement label May 5, 2026
@jkmassel jkmassel force-pushed the jkmassel/locale-resolver-android branch 3 times, most recently from ba9eaed to a40dbff Compare May 5, 2026 20:14
Adds a Vite build plugin that scans `src/translations/` and emits
`dist/supported-locales.json` — a sorted array of the locale tags we
actually ship. The native iOS and Android sides consume this so the
"what do we ship?" answer has exactly one source of truth.

Also switches `loadTranslations` from a dynamic `import()` (which threw
on a missing locale and fell back to English from the catch) to an
`import.meta.glob` lookup that returns early with a warning when the
tag isn't in the static map. Same exact-match-or-English behaviour, but
the loader map is enumerable at build time so the failure mode is
explicit rather than catch-driven.

The native side now resolves consumer-supplied locales to a shipped tag
before the value reaches JS, so the JS load path doesn't need its own
resolver — anything not in the static glob is a bug upstream and falls
back to English with a warn().
@jkmassel jkmassel force-pushed the jkmassel/locale-resolver-android branch from a40dbff to 7863e1d Compare May 5, 2026 20:16
@jkmassel jkmassel requested a review from oguzkocer May 5, 2026 20:17
@jkmassel jkmassel marked this pull request as ready for review May 5, 2026 20:17
@jkmassel jkmassel force-pushed the jkmassel/locale-resolver-android branch 4 times, most recently from a262058 to d5eaa76 Compare May 5, 2026 20:49
…undles

Adds an Android `LocaleResolver` and a new
`EditorConfiguration.Builder.setLocale(locale: Locale)` that resolves
against a compile-time-generated set of shipped locales before storing
the tag for serialization. The previously-public `setLocale(String?)`
overload is removed; an `internal` `@JvmSynthetic setLocaleTag(String?)`
is reserved for `toBuilder` round-trip and tests.

The set of shipped locales is generated into a Kotlin
`internal object SupportedLocales` at build time from the JS-side
manifest, so a missing manifest fails the gradle build instead of
silently falling through to English at runtime. The Gradle task uses
`JsonSlurper` to parse the manifest and validates each entry is a
string; non-string entries fail the task with a clear message.

## Resolution chain

For an input locale, normalised to lowercase with `_` → `-`:

1. Full tag (`xx-yy`) — match if shipped
2. Script-implied region for macrolanguages we ship disjoint regional
   bundles for (e.g. `zh-Hant-HK` → `zh-tw`, `zh-Hans` → `zh-cn`)
3. Language-only tag (`xx`) — match if shipped
4. Fall back to `en`

Inputs are parsed as BCP-47 via `Locale.forLanguageTag`, so script-
tagged inputs like `zh-Hans-CN` collapse to `zh-cn` rather than falling
through to English. Variant and Unicode-extension subtags (e.g.
`de-DE-u-ca-gregory`) are ignored — the editor doesn't vary
translations by calendar or numbering system.
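
One way to picture "ignored" here is to rebuild the locale from its language, script, and region only. This is a sketch using `Locale.Builder`; `coreSubtags` is a hypothetical name and the resolver in this commit may simply never read the other fields.

```kotlin
import java.util.Locale

// Hypothetical helper: keeps language, script, and region,
// and drops variants and Unicode extensions.
fun coreSubtags(locale: Locale): Locale =
    Locale.Builder()
        .setLanguage(locale.language)
        .setScript(locale.script)
        .setRegion(locale.country)
        .build()
```

For example, `coreSubtags(Locale.forLanguageTag("de-DE-u-ca-gregory")).toLanguageTag()` is `de-DE`, while `zh-Hans-CN` keeps its script and region unchanged.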

Legacy ISO 639-1 codes that Android's `Locale` class still emits
(`iw` → `he`, `in` → `id`, `no` → `nb`) are aliased to canonical
bundle names before lookup, so Hebrew/Indonesian/Norwegian users on
devices reporting the legacy codes don't silently land on English.

## Why a Locale and not a String

A locale string is a lossy encoding of what the system actually knows.
Android hands the consumer a `Locale` — language, region, script,
variant, extensions — and any boundary that flattens that to a string
before the library decodes it throws data away. Taking a `Locale` at
the boundary keeps signal the string would have dropped: the script
subtag lets `zh-Hant-HK` resolve to `zh-tw` instead of English, and
Android's legacy ISO 639-1 codes get aliased to the canonical bundle
names before lookup.

## Tests

- Curated `LocaleResolverTest` covers the resolution chain (full-tag →
  script-implied region → language-only → `en` fallback, normalisation
  of `pt_BR` / `EN_GB` / etc., script subtags, legacy alias mapping).
- A parameterised test asserts that every locale in the generated
  `SupportedLocales.ALL` resolves to itself — catches regressions
  where a locale gets added but the resolver mishandles it. Reads from
  the generated constant directly, so the test can never drift from
  what the resolver actually uses in production.
- Builder-level integration test exercises `setLocale(Locale)` through
  to `config.locale` against the shipped manifest.

`make test-android` now depends on `make build` so the manifest is
populated before the exhaustive test runs.

Refs #490.
@jkmassel jkmassel force-pushed the jkmassel/locale-resolver-android branch from d5eaa76 to 5d2d4ac Compare May 5, 2026 21:05