Skip to content

[Feat] [SDK-399] Capture logcat output as telemetry events#369

Open
buongarzoni wants to merge 1 commit intomasterfrom
feat/SDK-399/add-android-logs-telemetry
Open

[Feat] [SDK-399] Capture logcat output as telemetry events#369
buongarzoni wants to merge 1 commit intomasterfrom
feat/SDK-399/add-android-logs-telemetry

Conversation

@buongarzoni
Copy link
Copy Markdown
Collaborator

@buongarzoni buongarzoni commented Apr 20, 2026

Description of the change

Please include a summary of the change and which issues are fixed.
Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Maintenance
  • New release

Related issues

Shortcut stories and GitHub issues (delete irrelevant)

Checklists

Development

  • Lint rules pass locally
  • The code changed/added as part of this pull request has been covered with tests
  • All tests related to the changed code pass in development

Code review

  • This pull request has a descriptive title and information useful to a reviewer. There may be a screenshot or screencast attached
  • "Ready for review" label attached to the PR and reviewers assigned
  • Issue from task tracker has a link to this pull request
  • Changes have been reviewed by at least one other engineer

@buongarzoni buongarzoni added this to the 2.3.0 milestone Apr 20, 2026
@buongarzoni buongarzoni self-assigned this Apr 20, 2026
@linear
Copy link
Copy Markdown

linear bot commented Apr 20, 2026

@buongarzoni
Copy link
Copy Markdown
Collaborator Author

@claude review

Comment on lines +163 to +181
static String logcatPriorityFor(Level level) {
if (level == null) {
return "W";
}
switch (level) {
case DEBUG:
return "D";
case INFO:
return "I";
case WARNING:
return "W";
case ERROR:
return "E";
case CRITICAL:
return "F";
default:
return "W";
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 When minimumLevel=Level.DEBUG is configured, Verbose (V) logcat entries are silently dropped because logcatPriorityFor(Level.DEBUG) returns "D", launching logcat with "*:D" which excludes V-priority messages at the OS level. Since mapPriorityToLevel("V") returns Level.DEBUG, the SDK itself declares V as DEBUG-level, but the logcat filter never delivers those messages to the reader thread. Fix: change logcatPriorityFor(Level.DEBUG) to return "V" (and update the corresponding test assertion).

Extended reasoning...

The inconsistency between the two mapping functions

mapPriorityToLevel() maps both "V" and "D" to Level.DEBUG (lines 147–149 of LogcatTelemetryCapture.java). This is the SDK explicit declaration that Verbose logcat entries are semantically equivalent to DEBUG-level telemetry. The regex pattern [VDIWEF] in LOGCAT_LINE_PATTERN further confirms that V-priority lines are expected to be parsed and processed.

How the bug manifests

logcatPriorityFor(Level.DEBUG) returns "D" (line 169). This return value is used in start() to build the logcat command: logcat -v threadtime :D. Android logcat priority ordering is V(2) < D(3) < I(4) < W(5) < E(6) < F(7). The filter ":D" means "emit messages at priority D and above," so all V-priority messages are discarded by the Android OS-level logcat daemon before they ever reach the app BufferedReader in readLoop().

Why existing code does not prevent it

The software-level filter in processLine() (the level.level() < minimumLevel.level() check) is never reached for V messages because they are filtered out upstream at the OS/daemon level. The test suite encodes the broken behavior directly: assertEquals("D", LogcatTelemetryCapture.logcatPriorityFor(Level.DEBUG)) at line 48 of LogcatTelemetryCaptureTest passes with the current (buggy) implementation and will need to be updated to assert "V".

Concrete step-by-step proof

  1. User calls AndroidConfiguration.Builder().captureLogsAsTelemetry(true).minimumLogCaptureLevel(Level.DEBUG).build().
  2. initAutomaticCaptureOfLogTelemetryEvents() creates a LogcatTelemetryCapture with minimumLevel=Level.DEBUG.
  3. start() calls logcatPriorityFor(Level.DEBUG) which returns "D".
  4. ProcessBuilder launches: logcat -v threadtime *:D.
  5. Android logcat daemon applies filter: only D, I, W, E, F messages are emitted to stdout.
  6. An app calls Log.v("MyTag", "verbose detail") — this writes a V-priority line to logcat.
  7. The logcat daemon drops it silently. readLoop() / processLine() never sees the line.
  8. The user Rollbar telemetry is missing all Verbose logs despite configuring DEBUG as the minimum.

Impact

Any user who opts into captureLogsAsTelemetry and sets minimumLogCaptureLevel(Level.DEBUG) to capture maximum detail will silently lose all V-priority log output. Since Level.DEBUG is the lowest Rollbar level and is advertised by mapPriorityToLevel as the home for Verbose entries, this represents a complete failure to deliver on the feature implied contract.

Fix

Change logcatPriorityFor(Level.DEBUG) to return "V" instead of "D", and update the test assertion from assertEquals("D", ...) to assertEquals("V", ...). This aligns the OS-level filter with the SDK own semantic mapping and ensures V messages flow through to processLine().

Comment on lines +88 to +108
private void readLoop() {
Process currentProcess = this.process;
if (currentProcess == null) {
return;
}
BufferedReader reader = new BufferedReader(
new InputStreamReader(currentProcess.getInputStream(), Charset.forName("UTF-8")));
try {
String line;
while (running && (line = reader.readLine()) != null) {
processLine(line);
}
} catch (IOException e) {
// Process died or was destroyed — expected on stop().
} finally {
try {
reader.close();
} catch (IOException ignored) {
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 When the logcat process dies unexpectedly (e.g., killed by OS, not available on some ROMs), readLine() returns null and readLoop() exits silently while leaving running=true and the dead Process reference intact. Any subsequent call to start() hits the if (running) return guard and silently does nothing, permanently disabling telemetry capture with no indication to the caller. Fix by resetting running=false and cleaning up the dead process in readLoop()'s finally block (or by calling stop() when the loop exits with running still true).

Extended reasoning...

The bug

LogcatTelemetryCapture.readLoop() exits the while loop in two ways: (1) running becomes false (normal stop() path), or (2) reader.readLine() returns null because the logcat process has died. The finally block handles only reader cleanup; it does not reset the running flag or clean up the dead Process reference.

Concrete code path

After start() launches the logcat subprocess, the reader thread runs readLoop(). Suppose the OS kills the logcat process (memory pressure, permission restriction on custom ROMs, or logcat simply unavailable). currentProcess.getInputStream() reaches EOF and readLine() returns null. The while condition fails and the loop exits. Control falls into the finally block, which closes the BufferedReader and returns. At this point: running == true, process is the now-dead Process object, and thread holds the terminated Thread.

Why existing code doesn't prevent it

The stop() method correctly sets running=false, destroys the process, and nulls both fields — but stop() is only called externally (e.g., from Rollbar.close()). The readLoop() method has no symmetric cleanup path for the abnormal-exit case. The comment in the catch block says "expected on stop()", which is true for the IOException path but ignores the equally-possible null-readline path.

Impact

Once the logcat process dies unexpectedly, running stays true forever. Every future call to start() returns immediately at if (running) return without spawning a new process. Telemetry capture is silently disabled with no log message, no exception, and no recovery path short of the caller knowing to call stop() first. On standard Android this is uncommon, but on custom ROMs, restricted-permission environments, or any device where logcat is unavailable, the capture will silently fail on the very first process start.

Step-by-step proof

  1. start() is called → running = true, subprocess started, reader thread begins readLoop().
  2. OS kills the logcat process.
  3. readLine() returns null → while-loop exits normally.
  4. finally closes the BufferedReader only. running is still true.
  5. Reader thread exits. thread field still holds a reference to the now-dead thread.
  6. App calls start() again (e.g., after a configuration change or retry logic): hits synchronized void start() { if (running) return; } → returns immediately, no new process launched.
  7. Telemetry capture is permanently disabled for the lifetime of the LogcatTelemetryCapture instance.

Fix

In the finally block of readLoop(), after closing the reader, check whether running is still true (indicating unexpected exit rather than a clean stop() call) and if so call stop() — or manually replicate the cleanup: acquire the lock, set running=false, call process.destroy(), and null both process and thread. Care should be taken to synchronize this cleanup with the start()/stop() methods to avoid races.

Comment on lines +187 to +196
private static ProcessFactory defaultProcessFactory() {
return new ProcessFactory() {
@Override
public Process start(String priorityFilter) throws IOException {
return new ProcessBuilder(
"logcat", "-v", "threadtime", "*:" + priorityFilter)
.redirectErrorStream(true)
.start();
}
};
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The logcat command in defaultProcessFactory() is missing a -T flag, causing logcat to replay the entire existing ring buffer as fresh telemetry events on every app startup. This can flood the telemetry queue with thousands of stale, historical log entries, exhausting the telemetry quota and polluting telemetry data with messages that may be minutes or hours old. Fix: add "-T", "1" to the ProcessBuilder arguments to skip the existing buffer and stream only new messages going forward.

Extended reasoning...

What the bug is and how it manifests

The defaultProcessFactory() method in LogcatTelemetryCapture.java (lines 187–196) launches logcat with the command logcat -v threadtime *:<priority>. Android's logcat tool, when invoked without a -T (timestamp) flag or count filter, starts by replaying the entire existing in-memory ring buffer before entering streaming mode. The ring buffer can hold thousands of log lines spanning minutes or even hours of prior app activity. Every one of those lines is fed through processLine(), which records qualifying entries as fresh telemetry events via tracker.recordManualEventFor().

The specific code path that triggers it

  1. Rollbar.initAutomaticCaptureOfLogTelemetryEvents() creates a LogcatTelemetryCapture and calls start().
  2. start() calls processFactory.start(priority), which builds and launches the ProcessBuilder with no -T argument.
  3. The background thread calls readLoop(), which reads every line from the process stdout.
  4. Each line is passed to processLine(), which parses it and calls tracker.recordManualEventFor(level, Source.CLIENT, message) for every matching entry.
  5. Because logcat starts from the beginning of the ring buffer, all historical entries at or above the minimum level are recorded as if they were new events happening right now.

Why existing code doesn't prevent it

There is no timestamp comparison, no replay guard, and no duplicate-detection logic anywhere in processLine() or readLoop(). The code has no way to distinguish a line from the ring buffer that is 30 minutes old from a freshly emitted log line. The self-tag filter only discards lines tagged Rollbar; all other historic entries pass through.

What the impact would be

On a device that has been running for even a short time, the logcat ring buffer easily contains hundreds to thousands of entries. At app startup with captureLogsAsTelemetry enabled, the SDK will fire recordManualEventFor for every buffered entry at or above the configured minimum level. This burst of stale events can: (a) exhaust the in-memory telemetry queue, displacing real-time events that follow; (b) consume the user's Rollbar telemetry quota with meaningless historical data; and (c) make telemetry timestamps unreliable, since old messages appear with their original logcat timestamp but are recorded at the current wall-clock time of the recordManualEventFor call.

How to fix it

Add "-T", "1" to the ProcessBuilder argument list. The -T flag tells logcat to start printing only from the N-th most recent entry; -T 1 effectively means "start from the very last entry already in the buffer" so at most one stale line is replayed before the process enters streaming mode for new messages. The fixed factory call becomes:

return new ProcessBuilder(
    "logcat", "-v", "threadtime", "-T", "1", "*:" + priorityFilter)
    .redirectErrorStream(true)
    .start();

An alternative is -T $(date +%s.%3N) (epoch-based cutoff), but that requires a shell invocation and the format availability varies by Android version; -T 1 is simpler, universally supported since Android 5, and achieves the same goal.

Step-by-step proof

  1. Device has been running for 10 minutes; logcat ring buffer contains 5,000 entries, 200 of which are W/E/F level.
  2. App starts and calls Rollbar.init(ctx, new AndroidConfiguration.Builder().captureLogsAsTelemetry(true).build()).
  3. initAutomaticCaptureOfLogTelemetryEvents creates LogcatTelemetryCapture and calls start().
  4. ProcessBuilder("logcat", "-v", "threadtime", "*:W") is launched.
  5. logcat first outputs all 200 buffered W/E/F entries from the past 10 minutes.
  6. readLoop reads each of those 200 lines and processLine calls tracker.recordManualEventFor for each.
  7. The telemetry queue now holds 200 stale events from before the app even started, before the first real event is recorded.

Comment on lines +135 to +137
try {
tracker.recordManualEventFor(level, Source.CLIENT, message);
} catch (Exception e) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 LogcatTelemetryCapture.processLine() calls tracker.recordManualEventFor(), which classifies automatically-captured logcat entries as TelemetryType.MANUAL instead of TelemetryType.LOG. This misrepresents the data in the Rollbar UI and makes it impossible to filter automatic log capture separately from developer-placed manual instrumentation checkpoints. Fix: change the call on line 136 to tracker.recordLogEventFor(level, Source.CLIENT, message).

Extended reasoning...

What the bug is and how it manifests

In LogcatTelemetryCapture.processLine() (lines 135-137), after parsing a logcat line into its priority, tag, and message components, the code records the telemetry event via:

tracker.recordManualEventFor(level, Source.CLIENT, message);

The TelemetryEventTracker interface exposes two distinct methods: recordLogEventFor() which produces a TelemetryType.LOG event, and recordManualEventFor() which produces a TelemetryType.MANUAL event. The wrong method is being called here.

The specific code path that triggers it

Every time LogcatTelemetryCapture reads and parses a logcat line (in readLoop() -> processLine()), it unconditionally routes through recordManualEventFor(), stamping a "manual" type onto what is in fact automatically-captured log output. This fires on every logcat line at or above the configured minimum level throughout the lifetime of the capture session.

Why existing code does not prevent it

Both methods share the identical signature (Level, Source, String), so there is no compile-time warning. The tests in LogcatTelemetryCaptureTest verify calls to recordManualEventFor(), which means they pass despite the semantic error - the tests themselves inherit the wrong expectation from the production code. There is no type enforcement at the interface level to distinguish the two use cases.

What the impact would be

In the Rollbar payload, each captured logcat line will carry type: "manual" instead of type: "log". Users who try to filter their telemetry trail to see only automatic log output will see nothing; users looking at manual checkpoints will see their timeline polluted with automatically-captured logcat lines. The two categories - developer-placed instrumentation and automatic system log capture - become indistinguishable.

How to fix it

Replace the single call on line 136:

// Before (wrong)
tracker.recordManualEventFor(level, Source.CLIENT, message);
// After (correct)
tracker.recordLogEventFor(level, Source.CLIENT, message);

The corresponding test assertions that verify recordManualEventFor() should also be updated to verify recordLogEventFor().

Step-by-step proof

  1. Developer enables log capture: AndroidConfiguration.Builder().captureLogsAsTelemetry(true).build()
  2. An Android library logs: Log.w("NetworkLib", "Timeout connecting to api.example.com")
  3. logcat emits: 04-20 12:00:00.000 1234 5678 W NetworkLib: Timeout connecting to api.example.com
  4. LogcatTelemetryCapture.readLoop() reads this line and calls processLine()
  5. processLine() parses priority=W, tag=NetworkLib, message=Timeout..., maps W -> Level.WARNING
  6. processLine() calls tracker.recordManualEventFor(Level.WARNING, Source.CLIENT, "Timeout...")
  7. The resulting Rollbar telemetry event has type=manual - it appears in the UI as a developer checkpoint, not a captured log line

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant