feat(android): resolve consumer locales against shipped translation bundles#493
Open
feat(android): resolve consumer locales against shipped translation bundles#493
Conversation
ba9eaed to
a40dbff
Compare
Adds a Vite build plugin that scans `src/translations/` and emits `dist/supported-locales.json` — a sorted array of the locale tags we actually ship. The native iOS and Android sides consume this so the "what do we ship?" answer has exactly one source of truth. Also switches `loadTranslations` from a dynamic `import()` (which threw on a missing locale and fell back to English from the catch) to an `import.meta.glob` lookup that returns early with a warning when the tag isn't in the static map. Same exact-match-or-English behaviour, but the loader map is enumerable at build time so the failure mode is explicit rather than catch-driven. The native side now resolves consumer-supplied locales to a shipped tag before the value reaches JS, so the JS load path doesn't need its own resolver — anything not in the static glob is a bug upstream and falls back to English with a warn().
a40dbff to
7863e1d
Compare
a262058 to
d5eaa76
Compare
…undles Adds an Android `LocaleResolver` and a new `EditorConfiguration.Builder.setLocale(locale: Locale)` that resolves against a compile-time-generated set of shipped locales before storing the tag for serialization. The previously-public `setLocale(String?)` overload is removed; an `internal` `@JvmSynthetic setLocaleTag(String?)` is reserved for `toBuilder` round-trip and tests. The set of shipped locales is generated into a Kotlin `internal object SupportedLocales` at build time from the JS-side manifest, so a missing manifest fails the gradle build instead of silently falling through to English at runtime. The Gradle task uses `JsonSlurper` to parse the manifest and validates each entry is a string; non-string entries fail the task with a clear message. ## Resolution chain For an input locale, normalised to lowercase with `_` → `-`: 1. Full tag (`xx-yy`) — match if shipped 2. Script-implied region for macrolanguages we ship disjoint regional bundles for (e.g. `zh-Hant-HK` → `zh-tw`, `zh-Hans` → `zh-cn`) 3. Language-only tag (`xx`) — match if shipped 4. Fall back to `en` Inputs are parsed as BCP-47 via `Locale.forLanguageTag`, so script- tagged inputs like `zh-Hans-CN` collapse to `zh-cn` rather than falling through to English. Variant and Unicode-extension subtags (e.g. `de-DE-u-ca-gregory`) are ignored — the editor doesn't vary translations by calendar or numbering system. Legacy ISO 639-1 codes that Android's `Locale` class still emits (`iw` → `he`, `in` → `id`, `no` → `nb`) are aliased to canonical bundle names before lookup, so Hebrew/Indonesian/Norwegian users on devices reporting the legacy codes don't silently land on English. ## Why a Locale and not a String A locale string is a lossy encoding of what the system actually knows. Android hands the consumer a `Locale` — language, region, script, variant, extensions — and any boundary that flattens that to a string before the library decodes it throws data away. Taking a `Locale` at the boundary keeps signal the string would have dropped: the script subtag lets `zh-Hant-HK` resolve to `zh-tw` instead of English, and Android's legacy ISO 639-1 codes get aliased to the canonical bundle names before lookup. ## Tests - Curated `LocaleResolverTest` covers the resolution chain (full-tag → script-implied region → language-only → `en` fallback, normalisation of `pt_BR` / `EN_GB` / etc., script subtags, legacy alias mapping). - A parameterised test asserts that every locale in the generated `SupportedLocales.ALL` resolves to itself — catches regressions where a locale gets added but the resolver mishandles it. Reads from the generated constant directly, so the test can never drift from what the resolver actually uses in production. - Builder-level integration test exercises `setLocale(Locale)` through to `config.locale` against the shipped manifest. `make test-android` now depends on `make build` so the manifest is populated before the exhaustive test runs. Refs #490.
d5eaa76 to
5d2d4ac
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
LocaleResolverand a newEditorConfiguration.Builder.setLocale(locale: Locale)method that resolves against a compile-time-generated set of shipped locales before storing the tag for serialization.setLocale(String?)overload. It is replaced by aninternalsetLocaleTag(String?)reserved fortoBuilderround-trip and tests; external consumers must go through theLocaleAPI.internal object SupportedLocalesat build time from the JS-side manifest, so a missing manifest fails the gradle build instead of silently falling through to English at runtime.Refs #490.
Root Cause
A locale string is a lossy encoding of what the system actually knows. Android hands the consumer a
Locale— language, region, script, variant, extensions — and any boundary that flattens that to a string before the library decodes it throws data away. WP-Android'sPerAppLocaleManager.getCurrentLocaleLanguageCode()returnsLocale.getLanguage()— ISO 639-1 only — so a Brazilian user sendspt(matches the language-onlyptbundle but not the regional Brazilian one), a Chinese user sendszh(nozh-cn/zh-twmatch → English), and a user who pickeden_GB/nl_BE/pt_BRin the system language picker loses their region variant before we ever reach GutenbergKit.The library is the obvious place to decode a
Localeinto a wire-format tag, because the library is the only place that knows what bundles actually ship. Historically the supported list lived only inbin/prep-translations.js, so consumers had to mirror that table to do the right thing — and historically hadn't. Taking aLocaleat the boundary also keeps signal the string would have dropped: the script subtag letszh-Hant-HKresolve tozh-twinstead of English, and Android's legacy ISO 639-1 codes (iwfor Hebrew,infor Indonesian) get aliased to the canonical bundle names before lookup.Changes
Build
vite.config.js: New
emitSupportedLocalesManifestplugin scanssrc/translations/at build time and emitsdist/supported-locales.json(sorted array of locale tags). The existingmake copy-dist-androidtarget ships it into the library'sassets/.android/Gutenberg/build.gradle.kts: New
:Gutenberg:generateSupportedLocalesgradle task reads the manifest fromsrc/main/assets/supported-locales.jsonand emits aninternal object SupportedLocales { val ALL: Set<String> }intobuild/generated/source/locales/main/. Wired into themainsource set and runs ahead of everycompile*Kotlintask. A missing manifest fails the build with a message pointing atmake build, so the silent-fall-through-to-English failure mode is unreachable in shipped artifacts. Manifest parsing is type-checked — non-string entries fail the task with a clear message instead of silently shipping a nonsense locale..gitignore: The generated
supported-locales.jsonis treated the same as the other build outputs inassets/so it cannot be accidentally committed.Resolution chain
For an input locale, normalised to lowercase with
_→-:xx-yy) — match if shippedzh-Hant-HK→zh-tw,zh-Hans→zh-cn)xx) — match if shippedenLegacy ISO 639-1 codes that Android's
Localeclass still emits (iw→he,in→id,no→nb) are aliased to their canonical bundle names before lookup, so Hebrew/Indonesian/Norwegian users on devices reporting the legacy codes don't silently land on English.Implemented in android/Gutenberg/src/main/java/org/wordpress/gutenberg/model/LocaleResolver.kt as an
internal classwith a cachedLocaleResolver.Defaultsingleton the builder reuses. No runtime IO, noContextparameter.API change
The previously-public
setLocale(String?)is removed. Consumers must pass aLocalevalue; the resolver decides the wire-format string. The internalsetLocaleTagexists sotoBuildercan round-trip a stored tag without re-running resolution.Tests
LocaleResolverTestcovers the resolution chain (full-tag → script-implied region → language-only →enfallback, normalisation ofpt_BR/EN_GB/ etc., script subtags, legacy alias mapping).SupportedLocales.ALLresolves to itself — catches regressions where a locale gets added but the resolver mishandles it. Reads from the generated constant directly, so the test can never drift from what the resolver actually uses in production.setLocale(Locale)through toconfig.localeagainst the shipped manifest.What We Explored
assets/at runtime — the original direction. Worked, but had three problems collapsed into one fix by the build-time generator: a missing manifest silently degraded every consumer to English with no log signal; the builder did a disk read on everysetLocalecall (StrictMode-visible on the main thread); and the API needed aContextparameter purely to reachassets/. Generating a Kotlin constant at build time fails the build if the manifest is missing, removes the runtime IO entirely, and letssetLocaletake just aLocale.assets/— Vite hashes the chunk filenames (pt-br-UCkBcRdR.js) and prefixes can collide (nl-be-...vsnl-...), so a parser would have to re-encode the SUPPORTED_LOCALES list anyway. A manifest is simpler and unambiguous.zh-Hans-CNas language-onlyzhwhile the native side correctly collapsed it tozh-cn). The JS load path now trusts the native resolution result and falls back to English for anything not in the static glob.Behaviour change
Removing the string overload is a breaking change for any caller that was passing a raw string. The migration is mechanical (`setLocale("pt_BR")` → `setLocale(Locale.forLanguageTag("pt-BR"))`).
Test plan
Out of scope
Related