Flaky test analysis: Xamarin.Android-PR pipeline, April 10–24 2026 #11203

@simonrozsival

Analysis scope

Period: 2026-04-10 to 2026-04-24 (14 days)
Analysis run: 2026-04-24 09:39 UTC
Repository: dotnet/android
PRs analyzed: 57 merged PRs


Data sources

| Resource | Details |
| --- | --- |
| GitHub REST API | Merged PRs via gh pr list --state merged --limit 100 --json number,title,mergedAt,headRefName,isCrossRepository |
| Azure DevOps pipeline | Xamarin.Android-PR (definition ID 12278) on devdiv.visualstudio.com, project DevDiv |
| AZDO test runs API | GET /DevDiv/_apis/test/runs?buildUri=vstfs:///Build/Build/{buildId}&api-version=7.1 |
| AZDO test results API | GET /DevDiv/_apis/test/runs/{runId}/results?outcomes=Failed&$top=200&api-version=7.1 |

The dotnet-android pipeline on dnceng-public (dev.azure.com) was not used as the primary data source. Since #11153 (merged 2026-04-17), that pipeline runs build-only for direct team-member PRs — device test results for those PRs exist exclusively in the Xamarin.Android-PR devdiv pipeline.

Of the 57 PRs in the window, 55 were direct (team-member) PRs and 2 were fork PRs.

For each merged PR, the script used the highest-numbered build ID for that PR in Xamarin.Android-PR (the last/final build, representing the merge commit). Build-to-PR correlation used the sourceBranch field (refs/pull/{number}/merge).
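
The build selection described above might look roughly like the following. This is a sketch, not the script that produced the analysis: it assumes each build is a dict with `id` and `sourceBranch` keys, and `final_build_per_pr` is a hypothetical name.

```python
import re

# Matches the merge ref used for build-to-PR correlation, e.g. refs/pull/11203/merge.
PR_MERGE_REF = re.compile(r"^refs/pull/(\d+)/merge$")


def final_build_per_pr(builds):
    """Return {pr_number: build}, keeping the highest-numbered build for each PR."""
    latest = {}
    for build in builds:
        match = PR_MERGE_REF.match(build.get("sourceBranch", ""))
        if match is None:
            continue  # not a PR merge build (e.g. a branch or scheduled build)
        pr_number = int(match.group(1))
        if pr_number not in latest or build["id"] > latest[pr_number]["id"]:
            latest[pr_number] = build
    return latest
```

Builds whose `sourceBranch` does not match the merge-ref pattern are skipped rather than guessed at.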


How AZDO auto-retry creates the flakiness signal

When a test fails in Xamarin.Android-PR, the pipeline creates a separate auto-retry run identified by (Auto-Retry) in the test run name (e.g. MSBuildDeviceIntegration On Device - macOS-3 (Auto-Retry)). Only originally-failed tests are re-executed in this run.

Critical data artifact: after an auto-retry run completes, the failedTests counter on the original test run is reset to 0 regardless of the retry outcome. As a result, querying failedTests on standard runs almost always returns 0, even when tests genuinely failed. All test failure evidence lives exclusively in the (Auto-Retry) named runs.

This analysis targets (Auto-Retry) runs and uses the result outcome of each test within those runs as the signal.
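
A minimal sketch of that filtering step, assuming test runs are dicts with `id` and `name` keys and that per-run results carry `testCaseTitle` and `outcome`. The function names and the `results_by_run_id` mapping are hypothetical.

```python
AUTO_RETRY_MARKER = "(Auto-Retry)"


def is_auto_retry(run_name):
    """True when the test run name carries the auto-retry marker."""
    return AUTO_RETRY_MARKER in run_name


def collect_retry_outcomes(runs, results_by_run_id):
    """Return (test name, outcome) pairs taken only from auto-retry runs."""
    outcomes = []
    for run in runs:
        if not is_auto_retry(run["name"]):
            continue  # original runs are ignored: their failedTests counter is reset
        for result in results_by_run_id.get(run["id"], []):
            outcomes.append((result["testCaseTitle"], result["outcome"]))
    return outcomes
```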

A PR is classified as "merged with red CI" if the devdiv Xamarin.Android-PR build's result field was "failed" (evaluated at analysis time). This includes builds where the failure was in a non-test step (setup, artifact publishing, infra) — those will have a failed build result but zero auto-retry test records.


Signal categories

| Signal | Symbol | Meaning |
| --- | --- | --- |
| red_auto_retry_passed | 🔴 | Test appeared in auto-retry run AND passed on retry; the overall build result was failed, so the PR was merged past this test failure |
| green_auto_retry_passed | 🟢 | Test appeared in auto-retry run AND passed on retry; the overall build ultimately succeeded |
| auto_retry_failed | ⚠️ | Test appeared in auto-retry run but still failed on retry; the build remained red due to this test |

The 🔴 column is the highest-confidence flakiness signal: the team observed the failure, CI retried, it passed, and the PR was merged — the team explicitly decided the failure was not a blocker.
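
The three signals can be expressed as a small decision function. This is a sketch under two assumptions: the retry outcome string is `"Passed"` for a passing test, and any build result other than `"failed"` counts as green. The function name `classify` is hypothetical.

```python
def classify(retry_outcome, build_result):
    """Map one auto-retry test result plus the build result to a signal name."""
    if retry_outcome != "Passed":
        # Still failing on retry, regardless of the overall build result.
        return "auto_retry_failed"
    if build_result == "failed":
        # Passed on retry, but the PR was merged with a failed build.
        return "red_auto_retry_passed"
    # Assumption: every non-failed build result is treated as green.
    return "green_auto_retry_passed"
```

A stricter variant could reject build results other than `"succeeded"` and `"failed"` instead of folding them into the green category.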


Test run environments

All auto-retry runs executed on macOS agents. The test suite is sharded into parallel slots (macOS-1 through macOS-12). Two suites produced auto-retry data in this window:

  • MSBuildDeviceIntegration On Device — primary device integration test suite
  • WearOS On Device — WearOS-specific tests (runs on macOS agents with an Android emulator)

A test listed under "Both" was observed failing independently in both suites' auto-retry runs, across different PRs.
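
To attribute a result to a suite and shard, the run name can be split apart. The sketch below assumes every run name follows the pattern of the single example given earlier (`<suite> - macOS-<n>`, optionally followed by ` (Auto-Retry)`); the function names and the short-name mapping are hypothetical.

```python
import re

# Assumed run-name layout, derived from one example in this report.
RUN_NAME = re.compile(
    r"^(?P<suite>.+?) - (?P<shard>macOS-\d+)(?P<retry> \(Auto-Retry\))?$"
)

SHORT_NAMES = {
    "MSBuildDeviceIntegration On Device": "MSDI",
    "WearOS On Device": "WearOS",
}


def parse_run_name(name):
    """Return (suite, shard, is_auto_retry), or None if the name does not match."""
    match = RUN_NAME.match(name)
    if match is None:
        return None
    return match["suite"], match["shard"], match["retry"] is not None


def suite_label(suites):
    """Collapse the non-empty set of suites a test was seen in to one label."""
    if len(suites) > 1:
        return "Both"
    (only,) = suites
    return SHORT_NAMES.get(only, only)
```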


Summary statistics

| Metric | Count |
| --- | --- |
| Merged PRs analyzed | 57 |
| Direct (team-member) PRs | 55 |
| Fork PRs | 2 |
| PRs merged with failed Xamarin.Android-PR build | 38 |
| Unique test names observed in auto-retry runs | 51 |
| Tests with ≥1 🔴 (passed on retry in a red build) | 33 |
| Tests with ≥1 🟢 (passed on retry in a green build) | 14 |
| Tests that appear in both 🔴 and 🟢 | 10 |


Full flaky test table (51 tests)

Sorted by 🔴 count descending, then total PRs affected descending.

Legend (Suite column): MSDI = MSBuildDeviceIntegration On Device, WearOS = WearOS On Device, Both = observed in both suites

| # | Test | Suite | 🔴 RedRetryPass | 🟢 GreenRetryPass | ⚠️ RetryFail | PRs |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ApplicationRunsWithDebuggerAndBreaks(True,null,"apk",True,MonoVM) | WearOS | 8 | 1 | 0 | 9 |
| 2 | Build_XAML_Change(False) | MSDI | 5 | 0 | 3 | 8 |
| 3 | DotNetRunWaitForExit | MSDI | 5 | 0 | 0 | 5 |
| 4 | ApplicationRunsWithDebuggerAndBreaks(False,null,"aab",True,MonoVM) | Both | 4 | 1 | 1 | 6 |
| 5 | ApplicationRunsWithDebuggerAndBreaks(False,"guest1","aab",True,MonoVM) | Both | 4 | 1 | 1 | 6 |
| 6 | ApplicationRunsWithDebuggerAndBreaks(False,null,"apk",True,MonoVM) | WearOS | 4 | 0 | 1 | 5 |
| 7 | ApplicationRunsWithDebuggerAndBreaks(True,null,"apk",False,MonoVM) | WearOS | 4 | 0 | 0 | 4 |
| 8 | ApplicationRunsWithDebuggerAndBreaks(False,"guest1","apk",True,MonoVM) | Both | 3 | 1 | 2 | 6 |
| 9 | DesignTimeBuild_CSharp_From_Clean | MSDI | 3 | 0 | 1 | 4 |
| 10 | EnsureUncaughtExceptionWorks(MonoVM) | MSDI | 3 | 0 | 0 | 3 |
| 11 | ApplicationRunsWithDebuggerAndBreaks(True,null,"aab",True,MonoVM) | WearOS | 2 | 2 | 0 | 4 |
| 12 | Install_CSharp_Change | MSDI | 2 | 0 | 5 | 7 |
| 13 | Build_AndroidManifest_Change | MSDI | 2 | 0 | 4 | 6 |
| 14 | BuildBasicApplicationAndAotProfileIt | MSDI | 2 | 1 | 1 | 4 |
| 15 | Build_AndroidResource_Change | MSDI | 2 | 1 | 1 | 4 |
| 16 | Install_CSharp_FromClean | MSDI | 2 | 0 | 0 | 2 |
| 17 | SupportDesugaringStaticInterfaceMethods(MonoVM) | MSDI | 2 | 0 | 0 | 2 |
| 18 | DotNetRunWithDeviceParameter | MSDI | 2 | 0 | 0 | 2 |
| 19 | MonoAndroidExportReferencedAppStarts(False,False,MonoVM) | MSDI | 2 | 0 | 0 | 2 |
| 20 | ApplicationRunsWithDebuggerAndBreaks(True,"guest1","apk",True,MonoVM) | WearOS | 1 | 1 | 1 | 3 |
| 21 | ApplicationRunsWithDebuggerAndBreaks(True,"guest1","aab",True,MonoVM) | WearOS | 1 | 1 | 0 | 2 |
| 22 | SupportDesugaringStaticInterfaceMethods(CoreCLR) | MSDI | 1 | 1 | 0 | 2 |
| 23 | Build_CSharp_Change | MSDI | 1 | 0 | 3 | 4 |
| 24 | Build_AndroidAsset_Change | MSDI | 1 | 0 | 2 | 3 |
| 25 | DotNetInstallAndRunPreviousSdk(True,MonoVM) | WearOS | 1 | 0 | 1 | 2 |
| 26 | MonoAndroidExportReferencedAppStarts(True,False,CoreCLR) | MSDI | 1 | 0 | 0 | 1 |
| 27 | MonoAndroidExportReferencedAppStarts(True,True,CoreCLR) | MSDI | 1 | 0 | 0 | 1 |
| 28 | ApplicationRunsWithoutDebugger(True,True,True,MonoVM) | MSDI | 1 | 0 | 0 | 1 |
| 29 | ApplicationRunsWithoutDebugger(True,False,True,MonoVM) | MSDI | 1 | 0 | 0 | 1 |
| 30 | SubscribeToAppDomainUnhandledException(MonoVM) | MSDI | 1 | 0 | 0 | 1 |
| 31 | ExportedMembersSurviveGarbageCollection(True,CoreCLR) | MSDI | 1 | 0 | 0 | 1 |
| 32 | JsonDeserializationCreatesJavaHandle(True,CoreCLR) | MSDI | 1 | 0 | 0 | 1 |
| 33 | DotNetInstallAndRunPreviousSdk(False,MonoVM) | WearOS | 1 | 0 | 0 | 1 |
| 34 | TypeAndMemberRemapping(False,MonoVM) | MSDI | 0 | 1 | 0 | 1 |
| 35 | SmokeTestBuildAndRunWithSpecialCharacters("随机生成器",CoreCLR) | MSDI | 0 | 1 | 0 | 1 |
| 36 | ApkSet | MSDI | 0 | 1 | 0 | 1 |
| 37 | GradleFBProj(True,CoreCLR) | MSDI | 0 | 1 | 0 | 1 |
| 38 | DotNetNewAndroidTest(MonoVM) | MSDI | 0 | 0 | 1 | 1 |
| 39 | DotNetInstallAndRunMinorAPILevels(True,"net10.0-android36.1",MonoVM) | MSDI | 0 | 0 | 1 | 1 |
| 40 | UnhandledExceptionFromButtonClick(CoreCLR) | MSDI | 0 | 0 | 1 | 1 |
| 41 | InstallWithoutSharedRuntime(CoreCLR) | MSDI | 0 | 0 | 1 | 1 |
| 42 | SkiaSharpCanvasBasedAppRuns(True,True,MonoVM) | MSDI | 0 | 0 | 1 | 1 |
| 43 | FixLegacyResourceDesignerStep(False,CoreCLR) | MSDI | 0 | 0 | 1 | 1 |
| 44 | InstantRunFastDevDexes(False) | MSDI | 0 | 0 | 1 | 1 |
| 45 | CustomLinkDescriptionPreserve(SdkOnly,MonoVM) | MSDI | 0 | 0 | 1 | 1 |
| 46 | GradleFBProj(False,CoreCLR) | MSDI | 0 | 0 | 1 | 1 |
| 47 | DotNetRun(False,"llvm-ir",MonoVM) | MSDI | 0 | 0 | 1 | 1 |
| 48 | TestAndroidStoreKey(False,False,"apk","True","file:android","-keystore test.keystore",True) | MSDI | 0 | 0 | 1 | 1 |
| 49 | SingleProject_ApplicationId(True,MonoVM) | MSDI | 0 | 0 | 1 | 1 |
| 50 | Build_No_Changes | MSDI | 0 | 0 | 1 | 1 |
| 51 | Build_XAML_Change(True) | MSDI | 0 | 0 | 1 | 1 |

Suite totals (counted from the Suite column above): MSDI: 40 tests; WearOS: 8 tests; Both: 3 tests (appearing in both suites across different PRs)
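
The table ordering and the suite totals can be reproduced mechanically from the rows. The sketch below uses four rows copied from the table as sample input; the `Row` type and both function names are hypothetical.

```python
from collections import Counter, namedtuple

Row = namedtuple("Row", "test suite red green retry_fail prs")

# Four rows copied from the table, deliberately out of order.
rows = [
    Row("ApkSet", "MSDI", 0, 1, 0, 1),
    Row("DotNetRunWaitForExit", "MSDI", 5, 0, 0, 5),
    Row('ApplicationRunsWithDebuggerAndBreaks(True,null,"apk",True,MonoVM)', "WearOS", 8, 1, 0, 9),
    Row("Build_XAML_Change(False)", "MSDI", 5, 0, 3, 8),
]


def sort_rows(rows):
    """Sort by red count descending, then by PRs affected descending."""
    return sorted(rows, key=lambda r: (-r.red, -r.prs))


def suite_totals(rows):
    """Count how many tests carry each Suite label."""
    return Counter(r.suite for r in rows)
```

Because Python's sort is stable, rows that tie on both keys keep their input order.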


Tests with elevated ⚠️ retry-fail counts

Several tests have a notably high ⚠️ count relative to 🔴, meaning they frequently fail and do not recover on retry. These may represent persistent failures in specific configurations or environments, not just flakiness:

| Test | 🔴 | ⚠️ | PRs |
| --- | --- | --- | --- |
| Install_CSharp_Change | 2 | 5 | 7 |
| Build_AndroidManifest_Change | 2 | 4 | 6 |
| Build_XAML_Change(False) | 5 | 3 | 8 |
| Build_CSharp_Change | 1 | 3 | 4 |
| Build_AndroidAsset_Change | 1 | 2 | 3 |
| ApplicationRunsWithDebuggerAndBreaks(False,"guest1","apk",True,MonoVM) | 3 | 2 | 6 |

PRs merged with red CI (38)

These PRs had a failed result on their last Xamarin.Android-PR build at the time of merge.

| PR | Title |
| --- | --- |
| #11070 | Guard Mono-specific AOT targets for CoreCLR runtime, add XA1042 warning |
| #11082 | Remove duplicate @ prefix from issueAuthor in GitOps |
| #11084 | Run FixLegacyResourceDesigner before trimming |
| #11109 | Remove broken 'Windows > Tests > Debugging' CI lane |
| #11110 | Use Assert.Inconclusive for emulator acquisition failures |
| #11112 | [Mono.Android] fix global ref leak in TypeManager.Activate |
| #11117 | LEGO: Pull request from juno/hb_6dddf33b-c6da-43d8-ac04-14d2c339cb00_20260415103450130 to main |
| #11121 | Localized file check-in by OneLocBuild Task: Build definition ID 17928: Build ID 13854166 |
| #11122 | [TrimmableTypeMap] Implement alias support in codegen and runtime |
| #11124 | Remove unused HashSet allocations in HashJavaNames |
| #11125 | Bump external/Java.Interop from 85919bb to 7b018fe |
| #11126 | Port TypeMapObjectsXmlFile.Import to XmlReader streaming |
| #11127 | Avoid O(n²) array growth in GenerateTypeMappings |
| #11129 | Document AndroidInstrumentation and EnableMSTestRunner build properties |
| #11132 | Compute Java name hashes on demand instead of pre-computing both |
| #11133 | Use stackalloc in TypeMapHelper to reduce allocations |
| #11141 | Add investigation & debugging practices to copilot-instructions |
| #11142 | [TrimmableTypeMap] Package CoreCLR preserve list in SDK pack |
| #11143 | [TrimmableTypeMap] Manifest generator fixes |
| #11144 | [TrimmableTypeMap] Fix IL1034 by excluding app assembly from trimmer roots |
| #11149 | [copilot] Add /review agentic workflow and update android-reviewer skill |
| #11150 | [copilot] Use Claude Opus 4.6 for android-reviewer workflow |
| #11152 | Set min-integrity: none on /review workflow |
| #11153 | [ci] Only run dnceng-public pipeline for fork PRs |
| #11155 | Localized file check-in by OneLocBuild Task: Build definition ID 17928: Build ID 13878536 |
| #11159 | Bump com.android.tools.build:manifest-merger from 32.1.0 to 32.1.1 |
| #11160 | [main] Update dependencies from dotnet/dotnet |
| #11162 | [tests] Improve NUnit runner reporting and dry-run auditing |
| #11163 | LEGO: Pull request from juno/hb_6dddf33b-c6da-43d8-ac04-14d2c339cb00_20260420103228482 to main |
| #11164 | Add network allowlist to android-reviewer workflow |
| #11168 | [TrimmableTypeMap] Fix UCO boolean return type mismatch causing n_* callback trimming |
| #11169 | Make CoreCLR the default runtime for Debug builds |
| #11171 | Bump external/Java.Interop from 7b018fe to 69c9daa |
| #11173 | Add roles restriction to /review slash command |
| #11175 | Localized file check-in by OneLocBuild Task: Build definition ID 17928: Build ID 13905231 |
| #11178 | [TrimmableTypeMap] Add exception handling to UCO constructor callbacks (nctor_*_uco) |
| #11181 | [TrimmableTypeMap] Per-assembly typemap universes with startup hook initialization |
| #11195 | [Xamarin.Android.Build.Tasks] Retry RemoveDirFixed on ERROR_DIR_NOT_EMPTY |

Caveats and limitations

  • Counter reset artifact: the failedTests count on original AZDO test runs is reset to 0 after auto-retry. All failure evidence exists exclusively in (Auto-Retry) named runs. Tests that fail without triggering an auto-retry are not captured by this analysis.
  • Single sample per PR: only the last build for each PR is analyzed. If a PR was rebuilt multiple times, only the final build's test records are included.
  • No error message content: this analysis captures test names and pass/fail outcomes only. Error messages and stack traces were not collected.
  • Infrastructure failures: some of the 38 red-CI PRs had failures in non-test stages (setup, artifact publishing). Those PRs have no associated test records in this dataset. The exact count of infra-vs-test failures was not determined.
  • Time window: the 14-day window captures the most recent flaky tests but does not reflect longer-term trends. Tests with ⚠️ retry-fail counts of 1 (rows 38–51) may represent one-off events.
  • No per-shard attribution: when a test fails on multiple macOS shards for the same PR, each shard failure is counted as a separate ⚠️ or 🔴 occurrence. The PR count (total_prs) deduplicated this; the signal counts did not.
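
The last caveat, signal counts per shard versus PR counts per PR, can be illustrated with a small tally. This is a sketch only: the observation tuple layout, the function name, and the sample values in the usage note are hypothetical and not taken from the dataset.

```python
from collections import Counter, defaultdict


def tally(observations):
    """Tally (test, pr_number, shard, signal) tuples.

    Signal counts grow once per shard failure, while the PR count is
    deduplicated by collecting PR numbers in a set.
    """
    signal_counts = defaultdict(Counter)
    prs_by_test = defaultdict(set)
    for test, pr_number, shard, signal in observations:
        signal_counts[test][signal] += 1  # one increment per shard occurrence
        prs_by_test[test].add(pr_number)  # the same PR is counted once
    total_prs = {test: len(prs) for test, prs in prs_by_test.items()}
    return signal_counts, total_prs
```

For example, a placeholder test that fails on `macOS-1` and `macOS-4` for the same PR contributes 2 to its signal count but only 1 to `total_prs`.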
