Commit d96faf5

Add population sizing doc
1 parent ce39146 commit d96faf5

1 file changed

+196
-0
lines changed

@@ -0,0 +1,196 @@
1+
---
2+
id: population-sizing
3+
title: Population Sizing
4+
sidebar_position: 3
5+
---
6+
7+
Population sizing can be estimated using targeting context metrics available for Firefox on Desktop, iOS, and Android. Fields needed to translate Advanced Targeting expressions may not be fully available for all platforms.
8+
9+
This page covers how to estimate the number of clients that match a particular targeting expression, not how to decide how many enrolled clients an experiment might need.
10+
11+
### Accuracy
12+
13+
Population sizing accuracy of the queries below was investigated in [EXP-6101](https://mozilla-hub.atlassian.net/browse/EXP-6101), and the `(estimated population from preceding week)/(observed population)` ratios were in the following ranges for recent experiments run on Firefox versions that had been out for at least one week:
14+
15+
* Desktop: 0.9-1.3, average 1.163
16+
* Android: 1.02-1.21, average 1.136
17+
* iOS: 1.41-1.55, average 1.506
18+
19+
This means that Desktop estimates ranged from 10% smaller to 30% larger than observed enrollments, Android estimates were 2% to 21% larger, and iOS estimates were 41% to 55% larger. Estimates may need to be adjusted downward accordingly to ensure a sufficient minimum population.
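
As a rough illustration of that adjustment, the sketch below divides a weekly estimate by the upper end of the ratio range reported above for each platform. The function name and the choice of the worst-case ratio (rather than the average) are assumptions made for this example, not part of the investigation in EXP-6101.

```python
# Upper end of the (estimated / observed) ratio ranges listed above.
WORST_CASE_RATIO = {"desktop": 1.3, "android": 1.21, "ios": 1.55}


def conservative_population(estimate: int, platform: str) -> int:
    """Scale a weekly estimate down by the worst-case overestimate ratio."""
    return int(estimate / WORST_CASE_RATIO[platform])


# Example: an iOS query result of 1,000,000 clients.
print(conservative_population(1_000_000, "ios"))
```

Using the average ratio instead would give a less pessimistic figure; which one is appropriate depends on how costly an undersized population would be.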
20+
21+
### Estimating for Desktop
22+
23+
The following query can be used to estimate the available population for Firefox Desktop. If not using Redash, replace `{{ sql_targeting_expression }}` with the actual SQL expression:
24+
25+
```sql
26+
SELECT
27+
COUNT(DISTINCT metrics.uuid.legacy_telemetry_profile_group_id) AS available_weekly_population,
28+
FROM
29+
`mozdata.firefox_desktop.nimbus_targeting_context`
30+
WHERE
31+
DATE(submission_timestamp) BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
32+
AND ({{ sql_targeting_expression }})
33+
```
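
When working outside Redash, the placeholder can be filled in with a plain string substitution. The following is a minimal sketch; `render_query` is a hypothetical helper, not part of any Mozilla tooling, and it does not validate the SQL it is given.

```python
DESKTOP_QUERY_TEMPLATE = """\
SELECT
  COUNT(DISTINCT metrics.uuid.legacy_telemetry_profile_group_id) AS available_weekly_population,
FROM
  `mozdata.firefox_desktop.nimbus_targeting_context`
WHERE
  DATE(submission_timestamp) BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
  AND ({{ sql_targeting_expression }})
"""

PLACEHOLDER = "{{ sql_targeting_expression }}"


def render_query(template: str, sql_targeting_expression: str) -> str:
    """Replace the Redash-style placeholder with a concrete SQL expression."""
    if PLACEHOLDER not in template:
        raise ValueError("template has no targeting placeholder")
    return template.replace(PLACEHOLDER, sql_targeting_expression.strip())


print(render_query(DESKTOP_QUERY_TEMPLATE, 'normalized_channel = "release"'))
```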
34+
35+
#### Translating Desktop Targeting Expressions from JEXL to SQL
36+
37+
To determine the SQL targeting expression for a given experiment, first get the full targeting expression from the Audience section, which is in JEXL format. A simple targeting expression might look like this:
38+
39+
```
40+
(browserSettings.update.channel in ["release"]) && (version|versionCompare('140.!') >= 0) && (locale in ['en-CA', 'en-GB', 'en-US']) && (region == 'US')
41+
```
42+
43+
Translating one condition at a time:
44+
* `browserSettings.update.channel` is available as `normalized_channel`, so `browserSettings.update.channel in ["release"]` could be translated as `normalized_channel = "release"`
45+
* When comparing only the major version, it's available as `metrics.quantity.nimbus_targeting_context_firefox_version`, so `(version|versionCompare('140.!') >= 0)` could be translated as `metrics.quantity.nimbus_targeting_context_firefox_version >= 140`
46+
* The full version is available as `metrics.string.nimbus_targeting_context_version`, so it could alternatively be translated as `mozfun.norm.extract_version(metrics.string.nimbus_targeting_context_version, "major") >= 140`
47+
* `locale` is available as `metrics.string.nimbus_targeting_context_locale`, and `[]` should be converted to `()` for BigQuery SQL `IN` expressions, so `locale in ['en-CA', 'en-GB', 'en-US']` could be translated as `metrics.string.nimbus_targeting_context_locale IN ('en-CA', 'en-GB', 'en-US')`
48+
* `region` is available as `metrics.string.nimbus_targeting_context_region`, and SQL uses a single `=` for equality comparison, so `region == 'US'` could be translated as `metrics.string.nimbus_targeting_context_region = 'US'`
49+
* `==`, `&&`, `||`, and `!` translate to `=`, `AND`, `OR`, and `NOT` respectively
50+
51+
So the SQL version of the targeting expression for Desktop could be:
52+
53+
```sql
54+
(normalized_channel = "release")
55+
AND (metrics.quantity.nimbus_targeting_context_firefox_version >= 140)
56+
AND (metrics.string.nimbus_targeting_context_locale IN ('en-CA', 'en-GB', 'en-US'))
57+
AND (metrics.string.nimbus_targeting_context_region = 'US')
58+
```
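
The same translation can be sketched as a short list of regular-expression rewrites applied in order. This is an illustrative subset written for the sample expression above, not a general translator; the `RULES` list and `translate` function are names chosen for this example. It emits `IN ("release")` for the channel condition, which is equivalent to the `=` form shown above.

```python
import re

# (pattern, replacement) pairs, applied top to bottom.
RULES = [
    (r"&&", "AND"),
    (r"\|\|", "OR"),
    (r"browserSettings\.update\.channel in \[([^\]]*)\]",
     r"normalized_channel IN (\1)"),
    (r"version\|versionCompare\('([0-9]+)\.!'\) >= 0",
     r"metrics.quantity.nimbus_targeting_context_firefox_version >= \1"),
    (r"\blocale in \[([^\]]*)\]",
     r"metrics.string.nimbus_targeting_context_locale IN (\1)"),
    # Accept either quote style around the region value.
    (r"""\bregion == ('[^']*'|"[^"]*")""",
     r"metrics.string.nimbus_targeting_context_region = \1"),
]


def translate(jexl: str) -> str:
    """Apply each rewrite rule to the JEXL string in order."""
    for pattern, replacement in RULES:
        jexl = re.sub(pattern, replacement, jexl)
    return jexl


SAMPLE = (
    """(browserSettings.update.channel in ["release"]) """
    """&& (version|versionCompare('140.!') >= 0) """
    """&& (locale in ['en-CA', 'en-GB', 'en-US']) && (region == 'US')"""
)
print(translate(SAMPLE))
```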
59+
60+
:::warning
61+
`NOT` has a different operator precedence in BigQuery than `!` in JEXL: `!` binds to the single operand that follows it, while BigQuery evaluates `NOT` after comparison operators such as `=` and `IN`. `NOT` should therefore always be wrapped in `()` together with the single expression it negates. For example, `!isMac && isFxAEnabled` should be translated as `(NOT BOOL(metrics.object.nimbus_targeting_context_os.isMac)) AND metrics.boolean.nimbus_targeting_context_is_fx_a_enabled`
62+
:::
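
When assembling SQL by hand or in a script, one way to follow this rule consistently is to route every negation through a small helper. The `sql_not` function below is a hypothetical convenience for this purpose; it adds an inner pair of parentheses as well, so that a compound operand is negated as a whole.

```python
def sql_not(expression: str) -> str:
    """Negate an SQL expression as one fully parenthesized operand."""
    return f"(NOT ({expression}))"


condition = (
    sql_not("BOOL(metrics.object.nimbus_targeting_context_os.isMac)")
    + " AND metrics.boolean.nimbus_targeting_context_is_fx_a_enabled"
)
print(condition)
```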
63+
64+
The following Python script can be referenced for additional examples of converting JEXL targeting expression conditions to SQL for Desktop. It is by no means comprehensive, and it may produce invalid SQL for some expressions. In particular, it treats all pref values as boolean, and it cannot add parentheses to correct for the difference in `NOT` precedence.
65+
66+
<details><summary>`desktop_jexl_to_sql.py`</summary>
67+
68+
```python
69+
#!/usr/bin/env python3
70+
71+
import re
72+
import sys
73+
74+
jexl = sys.stdin.read()
75+
jexl = re.sub(r"\&\&", "AND", jexl)
76+
jexl = re.sub(r"\|\|", "OR", jexl)
77+
jexl = re.sub(r"== false", r"IS FALSE", jexl)
78+
jexl = re.sub(r"browserSettings.update.channel in \[([^]]*)\]", r"normalized_channel IN (\1)", jexl)
79+
jexl = re.sub(r"""browserSettings.update.channel == ("[^"]*")""", r"normalized_channel IN (\1)", jexl)
80+
jexl = re.sub(r"region in \[([^]]*)\]", r"metrics.string.nimbus_targeting_context_region IN (\1)", jexl)
81+
jexl = re.sub(r"""region == ("[^"]*")""", r"metrics.string.nimbus_targeting_context_region IN (\1)", jexl)
82+
jexl = re.sub(r"locale in \[([^]]*)\]", r"metrics.string.nimbus_targeting_context_locale IN (\1)", jexl)
83+
jexl = re.sub(r"""locale == ("[^"]*")""", r"metrics.string.nimbus_targeting_context_locale IN (\1)", jexl)
84+
jexl = re.sub(r"('[^']*') in enrollments", r"EXISTS(SELECT * FROM UNNEST(JSON_EXTRACT_ARRAY(metrics.object.nimbus_targeting_context_enrollments_map)) AS _ WHERE STRING(_.experimentSlug) = \1)", jexl)
85+
jexl = re.sub(r"enrollmentsMap\[('[^']*')\] == ('[^']*')", r"EXISTS(SELECT * FROM UNNEST(JSON_EXTRACT_ARRAY(metrics.object.nimbus_targeting_context_enrollments_map)) AS _ WHERE STRING(_.experimentSlug) = \1 AND STRING(_.branchSlug) = \2)", jexl)
86+
jexl = re.sub(r"\(version\|versionCompare\('([0-9]+).!'\) >= 0\)", r'metrics.quantity.nimbus_targeting_context_firefox_version >= \1', jexl)
87+
jexl = re.sub(r"\(experiment.slug in activeExperiments\) OR ", r"", jexl)
88+
jexl = re.sub(r"isFirstStartup", r"metrics.boolean.nimbus_targeting_context_is_first_startup", jexl)
89+
jexl = re.sub(r"os.isMac", r"BOOL(metrics.object.nimbus_targeting_context_os.isMac)", jexl)
90+
# `&&` has already been rewritten to `AND` above, so match the SQL keyword here.
jexl = re.sub(r"os.isWindows AND os.windowsVersion", r"os.windowsVersion", jexl)
91+
jexl = re.sub(r"os.windowsVersion", r"FLOAT64(metrics.object.nimbus_targeting_context_os.windowsVersion)", jexl)
92+
jexl = re.sub(r"os.windowsBuildNumber", r"INT64(metrics.object.nimbus_targeting_context_os.windowsBuildNumber)", jexl)
93+
jexl = re.sub(r"os.isWindows", r"metrics.object.nimbus_targeting_context_os.windowsVersion IS NOT NULL", jexl)
94+
jexl = re.sub(r"isFxAEnabled", r"metrics.boolean.nimbus_targeting_context_is_fx_a_enabled", jexl)
95+
jexl = re.sub(r"isFxASignedIn", r"metrics.boolean.nimbus_targeting_context_is_fx_a_signed_in", jexl)
96+
jexl = re.sub(r"hasActiveEnterprisePolicies", r"metrics.boolean.nimbus_targeting_context_has_active_enterprise_policies", jexl)
97+
jexl = re.sub(r"\(currentDate\|date - profileAgeCreated\|date\) / 86400000", r'(UNIX_MILLIS(SAFE.PARSE_TIMESTAMP("%a, %d %b %Y %H:%M:%S %Z", metrics.string.nimbus_targeting_context_current_date)) - metrics.quantity.nimbus_targeting_context_profile_age_created) / 86400000', jexl)
98+
jexl = re.sub(r"!\(('[^']*')\|preferenceIsUserSet\)", r"\1 NOT IN UNNEST(JSON_EXTRACT_STRING_ARRAY(metrics.object.nimbus_targeting_environment_user_set_prefs))", jexl)
99+
jexl = re.sub(r"\(('[^']*')\|preferenceIsUserSet\)", r"\1 IN UNNEST(JSON_EXTRACT_STRING_ARRAY(metrics.object.nimbus_targeting_environment_user_set_prefs))", jexl)
100+
for match in re.findall(r"('[^']*')\|preferenceValue", jexl):
101+
path = match.replace("-", "_").replace(".", "__").replace("'", "")
102+
jexl = jexl.replace(f"{match}|preferenceValue", f"metrics.object.nimbus_targeting_environment_pref_values.{path}")
103+
jexl = re.sub(r"(metrics.object.nimbus_targeting_environment_pref_values.[a-zA-Z0-9_]+)", r"BOOL(\1)", jexl)
104+
jexl = re.sub(r"!", r"NOT ", jexl)
105+
print(jexl)
106+
```
107+
108+
</details>
109+
110+
### Estimating for Firefox on Android and iOS
111+
112+
The following queries can be used to estimate the available population for release channel experiments on Android and iOS. If not using Redash, replace `{{ sql_targeting_expression }}` with the actual SQL expression.
113+
114+
For Android release channel:
115+
116+
```sql
117+
SELECT
118+
COUNT(DISTINCT client_info.client_id) AS available_weekly_population,
119+
FROM
120+
`mozdata.org_mozilla_firefox.nimbus`
121+
WHERE
122+
DATE(submission_timestamp) BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
123+
AND ({{ sql_targeting_expression }})
124+
```
125+
126+
For iOS release channel, the table is `mozdata.org_mozilla_ios_firefox.nimbus`:
127+
128+
```sql
129+
SELECT
130+
COUNT(DISTINCT client_info.client_id) AS available_weekly_population,
131+
FROM
132+
`mozdata.org_mozilla_ios_firefox.nimbus`
133+
WHERE
134+
DATE(submission_timestamp) BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1
135+
AND ({{ sql_targeting_expression }})
136+
```
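
Because the two mobile queries differ only in the table name, they can be generated from a single lookup. The sketch below uses the two release-channel tables named above; `MOBILE_TABLES` and `mobile_sizing_query` are names invented for this example.

```python
# Release-channel tables from the queries above.
MOBILE_TABLES = {
    "android": "mozdata.org_mozilla_firefox.nimbus",
    "ios": "mozdata.org_mozilla_ios_firefox.nimbus",
}


def mobile_sizing_query(platform: str, sql_targeting_expression: str) -> str:
    """Build the weekly population query for the given mobile platform."""
    table = MOBILE_TABLES[platform]  # raises KeyError for unknown platforms
    return (
        "SELECT\n"
        "  COUNT(DISTINCT client_info.client_id) AS available_weekly_population,\n"
        "FROM\n"
        f"  `{table}`\n"
        "WHERE\n"
        "  DATE(submission_timestamp) BETWEEN CURRENT_DATE - 7 AND CURRENT_DATE - 1\n"
        f"  AND ({sql_targeting_expression})\n"
    )


print(mobile_sizing_query("ios", "TRUE"))
```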
137+
138+
#### Translating Android and iOS Targeting Expressions from JEXL to SQL
139+
140+
To determine the SQL targeting expression for a given experiment, first get the full targeting expression from the Audience section, which is in JEXL format. A simple targeting expression might look like this:
141+
142+
```
143+
((is_already_enrolled) || ((isFirstRun == 'true') && (app_version|versionCompare('142.!') >= 0)))
144+
```
145+
146+
Translating one condition at a time:
147+
* `is_already_enrolled` should be omitted, because sizing is done before enrollment has started, so no client can already be enrolled
148+
* `isFirstRun` is available inside the `JSON` field `metrics.object.nimbus_system_recorded_nimbus_context`, so it could be translated as `BOOL(metrics.object.nimbus_system_recorded_nimbus_context.isFirstRun)`
149+
* `app_version` is available inside the JSON field `metrics.object.nimbus_system_recorded_nimbus_context` at `appVersion`, so `app_version|versionCompare('142.!') >= 0` could be translated as `mozfun.norm.extract_version(STRING(metrics.object.nimbus_system_recorded_nimbus_context.appVersion), "major") >= 142`
150+
151+
So the SQL version of the targeting expression could be:
152+
153+
```sql
154+
BOOL(metrics.object.nimbus_system_recorded_nimbus_context.isFirstRun)
155+
AND mozfun.norm.extract_version(STRING(metrics.object.nimbus_system_recorded_nimbus_context.appVersion), "major") >= 142
156+
```
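
In the examples on this page, snake_case JEXL names such as `app_version` appear in the JSON field under camelCase keys such as `appVersion`. Assuming that convention holds for the field being translated (an assumption worth checking against the actual ping), the accessor can be built mechanically. `to_camel_case` and `context_field` are hypothetical helpers for this sketch.

```python
CONTEXT = "metrics.object.nimbus_system_recorded_nimbus_context"


def to_camel_case(name: str) -> str:
    """Convert snake_case to camelCase; names without underscores pass through."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def context_field(jexl_name: str, cast: str) -> str:
    """Build a typed accessor into the recorded Nimbus context JSON field."""
    return f"{cast}({CONTEXT}.{to_camel_case(jexl_name)})"


print(context_field("isFirstRun", "BOOL"))
print(context_field("app_version", "STRING"))
print(context_field("days_since_install", "INT64"))
```

The cast (`BOOL`, `STRING`, `INT64`) still has to be chosen by hand to match the type of the value stored in the JSON field.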
157+
158+
:::warning
159+
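`NOT` has a different operator precedence in BigQuery than `!` in JEXL: `!` binds to the single operand that follows it, while BigQuery evaluates `NOT` after comparison operators such as `=` and `IN`. `NOT` should therefore always be wrapped in `()` together with the single expression it negates. For example, using Desktop fields, `!isMac && isFxAEnabled` should be translated as `(NOT BOOL(metrics.object.nimbus_targeting_context_os.isMac)) AND metrics.boolean.nimbus_targeting_context_is_fx_a_enabled`; the same rule applies to mobile expressions.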
160+
:::
161+
162+
The following Python script can be referenced for additional examples of converting JEXL targeting expression conditions to SQL for mobile. It is by no means comprehensive, and it may produce invalid SQL for some expressions. In particular, current enrollments are not available in the mobile nimbus ping, and the script cannot add parentheses to correct for the difference in `NOT` precedence.
163+
164+
<details><summary>`mobile_jexl_to_sql.py`</summary>
165+
166+
```python
167+
#!/usr/bin/env python3
168+
169+
import re
170+
import sys
171+
172+
jexl = sys.stdin.read()
173+
jexl = re.sub(r"\&\&", "AND", jexl)
174+
jexl = re.sub(r"\|\|", "OR", jexl)
175+
jexl = re.sub(r"== false", r"IS FALSE", jexl)
176+
jexl = re.sub(r"region in \[([^]]*)\]", r"STRING(metrics.object.nimbus_system_recorded_nimbus_context.region) IN (\1)", jexl)
177+
jexl = re.sub(r"""region == ("[^"]*")""", r"STRING(metrics.object.nimbus_system_recorded_nimbus_context.region) = \1", jexl)
178+
jexl = re.sub(r"locale in \[([^]]*)\]", r"STRING(metrics.object.nimbus_system_recorded_nimbus_context.locale) IN (\1)", jexl)
179+
jexl = re.sub(r"""locale == ("[^"]*")""", r"STRING(metrics.object.nimbus_system_recorded_nimbus_context.locale) = \1", jexl)
180+
jexl = re.sub(r"language in \[([^]]*)\]", r"STRING(metrics.object.nimbus_system_recorded_nimbus_context.language) IN (\1)", jexl)
181+
jexl = re.sub(r"""language == ("[^"]*")""", r"STRING(metrics.object.nimbus_system_recorded_nimbus_context.language) = \1", jexl)
182+
jexl = re.sub(r"app_version\|versionCompare\('([0-9]+).!'\) >= 0", r"mozfun.norm.extract_version(STRING(metrics.object.nimbus_system_recorded_nimbus_context.appVersion), 'major') >= \1", jexl)
183+
jexl = re.sub(r"android_sdk_version\|versionCompare\('([0-9]+)'\) >= 0", r"INT64(metrics.object.nimbus_system_recorded_nimbus_context.androidSdkVersion) >= \1", jexl)
184+
jexl = re.sub(r"\(is_already_enrolled\) OR ", "", jexl)
185+
jexl = re.sub(r"user_accepted_tou", "BOOL(metrics.object.nimbus_system_recorded_nimbus_context.userAcceptedTou)", jexl)
186+
jexl = re.sub(r"user_clicked_tou_prompt_link", "BOOL(metrics.object.nimbus_system_recorded_nimbus_context.userClickedTouPromptLink)", jexl)
187+
jexl = re.sub(r"user_clicked_tou_prompt_remind_me_later", "BOOL(metrics.object.nimbus_system_recorded_nimbus_context.userClickedTouPromptRemindMeLater)", jexl)
188+
jexl = re.sub(r"isFirstRun == 'true'", "BOOL(metrics.object.nimbus_system_recorded_nimbus_context.isFirstRun)", jexl)
189+
jexl = re.sub(r"cannot_use_apple_intelligence", "BOOL(metrics.object.nimbus_system_recorded_nimbus_context.cannotUseAppleIntelligence)", jexl)
190+
jexl = re.sub(r"days_since_install", "INT64(metrics.object.nimbus_system_recorded_nimbus_context.daysSinceInstall)", jexl)
191+
jexl = re.sub(r"(?<=[^.])!", r"NOT ", jexl)
192+
jexl = re.sub(r" AND \(TRUE\)", r"", jexl)
193+
print(jexl)
194+
```
195+
196+
</details>
