[Feat] [SDK-399] Capture logcat output as telemetry events#369
[Feat] [SDK-399] Capture logcat output as telemetry events#369buongarzoni wants to merge 1 commit intomasterfrom
Conversation
|
@claude review |
| static String logcatPriorityFor(Level level) { | ||
| if (level == null) { | ||
| return "W"; | ||
| } | ||
| switch (level) { | ||
| case DEBUG: | ||
| return "D"; | ||
| case INFO: | ||
| return "I"; | ||
| case WARNING: | ||
| return "W"; | ||
| case ERROR: | ||
| return "E"; | ||
| case CRITICAL: | ||
| return "F"; | ||
| default: | ||
| return "W"; | ||
| } | ||
| } |
There was a problem hiding this comment.
🔴 When minimumLevel=Level.DEBUG is configured, Verbose (V) logcat entries are silently dropped because logcatPriorityFor(Level.DEBUG) returns "D", launching logcat with "*:D" which excludes V-priority messages at the OS level. Since mapPriorityToLevel("V") returns Level.DEBUG, the SDK itself declares V as DEBUG-level, but the logcat filter never delivers those messages to the reader thread. Fix: change logcatPriorityFor(Level.DEBUG) to return "V" (and update the corresponding test assertion).
Extended reasoning...
The inconsistency between the two mapping functions
mapPriorityToLevel() maps both "V" and "D" to Level.DEBUG (lines 147–149 of LogcatTelemetryCapture.java). This is the SDK explicit declaration that Verbose logcat entries are semantically equivalent to DEBUG-level telemetry. The regex pattern [VDIWEF] in LOGCAT_LINE_PATTERN further confirms that V-priority lines are expected to be parsed and processed.
How the bug manifests
logcatPriorityFor(Level.DEBUG) returns "D" (line 169). This return value is used in start() to build the logcat command: logcat -v threadtime :D. Android logcat priority ordering is V(2) < D(3) < I(4) < W(5) < E(6) < F(7). The filter ":D" means "emit messages at priority D and above," so all V-priority messages are discarded by the Android OS-level logcat daemon before they ever reach the app BufferedReader in readLoop().
Why existing code does not prevent it
The software-level filter in processLine() (the level.level() < minimumLevel.level() check) is never reached for V messages because they are filtered out upstream at the OS/daemon level. The test suite encodes the broken behavior directly: assertEquals("D", LogcatTelemetryCapture.logcatPriorityFor(Level.DEBUG)) at line 48 of LogcatTelemetryCaptureTest passes with the current (buggy) implementation and will need to be updated to assert "V".
Concrete step-by-step proof
- User calls AndroidConfiguration.Builder().captureLogsAsTelemetry(true).minimumLogCaptureLevel(Level.DEBUG).build().
- initAutomaticCaptureOfLogTelemetryEvents() creates a LogcatTelemetryCapture with minimumLevel=Level.DEBUG.
- start() calls logcatPriorityFor(Level.DEBUG) which returns "D".
- ProcessBuilder launches: logcat -v threadtime *:D.
- Android logcat daemon applies filter: only D, I, W, E, F messages are emitted to stdout.
- An app calls Log.v("MyTag", "verbose detail") — this writes a V-priority line to logcat.
- The logcat daemon drops it silently. readLoop() / processLine() never sees the line.
- The user Rollbar telemetry is missing all Verbose logs despite configuring DEBUG as the minimum.
Impact
Any user who opts into captureLogsAsTelemetry and sets minimumLogCaptureLevel(Level.DEBUG) to capture maximum detail will silently lose all V-priority log output. Since Level.DEBUG is the lowest Rollbar level and is advertised by mapPriorityToLevel as the home for Verbose entries, this represents a complete failure to deliver on the feature implied contract.
Fix
Change logcatPriorityFor(Level.DEBUG) to return "V" instead of "D", and update the test assertion from assertEquals("D", ...) to assertEquals("V", ...). This aligns the OS-level filter with the SDK own semantic mapping and ensures V messages flow through to processLine().
| private void readLoop() { | ||
| Process currentProcess = this.process; | ||
| if (currentProcess == null) { | ||
| return; | ||
| } | ||
| BufferedReader reader = new BufferedReader( | ||
| new InputStreamReader(currentProcess.getInputStream(), Charset.forName("UTF-8"))); | ||
| try { | ||
| String line; | ||
| while (running && (line = reader.readLine()) != null) { | ||
| processLine(line); | ||
| } | ||
| } catch (IOException e) { | ||
| // Process died or was destroyed — expected on stop(). | ||
| } finally { | ||
| try { | ||
| reader.close(); | ||
| } catch (IOException ignored) { | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
🔴 When the logcat process dies unexpectedly (e.g., killed by OS, not available on some ROMs), readLine() returns null and readLoop() exits silently while leaving running=true and the dead Process reference intact. Any subsequent call to start() hits the if (running) return guard and silently does nothing, permanently disabling telemetry capture with no indication to the caller. Fix by resetting running=false and cleaning up the dead process in readLoop()'s finally block (or by calling stop() when the loop exits with running still true).
Extended reasoning...
The bug
LogcatTelemetryCapture.readLoop() exits the while loop in two ways: (1) running becomes false (normal stop() path), or (2) reader.readLine() returns null because the logcat process has died. The finally block handles only reader cleanup; it does not reset the running flag or clean up the dead Process reference.
Concrete code path
After start() launches the logcat subprocess, the reader thread runs readLoop(). Suppose the OS kills the logcat process (memory pressure, permission restriction on custom ROMs, or logcat simply unavailable). currentProcess.getInputStream() reaches EOF and readLine() returns null. The while condition fails and the loop exits. Control falls into the finally block, which closes the BufferedReader and returns. At this point: running == true, process is the now-dead Process object, and thread holds the terminated Thread.
Why existing code doesn't prevent it
The stop() method correctly sets running=false, destroys the process, and nulls both fields — but stop() is only called externally (e.g., from Rollbar.close()). The readLoop() method has no symmetric cleanup path for the abnormal-exit case. The comment in the catch block says "expected on stop()", which is true for the IOException path but ignores the equally-possible null-readline path.
Impact
Once the logcat process dies unexpectedly, running stays true forever. Every future call to start() returns immediately at if (running) return without spawning a new process. Telemetry capture is silently disabled with no log message, no exception, and no recovery path short of the caller knowing to call stop() first. On standard Android this is uncommon, but on custom ROMs, restricted-permission environments, or any device where logcat is unavailable, the capture will silently fail on the very first process start.
Step-by-step proof
start()is called →running = true, subprocess started, reader thread beginsreadLoop().- OS kills the logcat process.
readLine()returnsnull→ while-loop exits normally.finallycloses theBufferedReaderonly.runningis stilltrue.- Reader thread exits.
threadfield still holds a reference to the now-dead thread. - App calls
start()again (e.g., after a configuration change or retry logic): hitssynchronized void start() { if (running) return; }→ returns immediately, no new process launched. - Telemetry capture is permanently disabled for the lifetime of the
LogcatTelemetryCaptureinstance.
Fix
In the finally block of readLoop(), after closing the reader, check whether running is still true (indicating unexpected exit rather than a clean stop() call) and if so call stop() — or manually replicate the cleanup: acquire the lock, set running=false, call process.destroy(), and null both process and thread. Care should be taken to synchronize this cleanup with the start()/stop() methods to avoid races.
| private static ProcessFactory defaultProcessFactory() { | ||
| return new ProcessFactory() { | ||
| @Override | ||
| public Process start(String priorityFilter) throws IOException { | ||
| return new ProcessBuilder( | ||
| "logcat", "-v", "threadtime", "*:" + priorityFilter) | ||
| .redirectErrorStream(true) | ||
| .start(); | ||
| } | ||
| }; |
There was a problem hiding this comment.
🔴 The logcat command in defaultProcessFactory() is missing a -T flag, causing logcat to replay the entire existing ring buffer as fresh telemetry events on every app startup. This can flood the telemetry queue with thousands of stale, historical log entries, exhausting the telemetry quota and polluting telemetry data with messages that may be minutes or hours old. Fix: add "-T", "1" to the ProcessBuilder arguments to skip the existing buffer and stream only new messages going forward.
Extended reasoning...
What the bug is and how it manifests
The defaultProcessFactory() method in LogcatTelemetryCapture.java (lines 187–196) launches logcat with the command logcat -v threadtime *:<priority>. Android's logcat tool, when invoked without a -T (timestamp) flag or count filter, starts by replaying the entire existing in-memory ring buffer before entering streaming mode. The ring buffer can hold thousands of log lines spanning minutes or even hours of prior app activity. Every one of those lines is fed through processLine(), which records qualifying entries as fresh telemetry events via tracker.recordManualEventFor().
The specific code path that triggers it
Rollbar.initAutomaticCaptureOfLogTelemetryEvents()creates aLogcatTelemetryCaptureand callsstart().start()callsprocessFactory.start(priority), which builds and launches theProcessBuilderwith no-Targument.- The background thread calls
readLoop(), which reads every line from the process stdout. - Each line is passed to
processLine(), which parses it and callstracker.recordManualEventFor(level, Source.CLIENT, message)for every matching entry. - Because logcat starts from the beginning of the ring buffer, all historical entries at or above the minimum level are recorded as if they were new events happening right now.
Why existing code doesn't prevent it
There is no timestamp comparison, no replay guard, and no duplicate-detection logic anywhere in processLine() or readLoop(). The code has no way to distinguish a line from the ring buffer that is 30 minutes old from a freshly emitted log line. The self-tag filter only discards lines tagged Rollbar; all other historic entries pass through.
What the impact would be
On a device that has been running for even a short time, the logcat ring buffer easily contains hundreds to thousands of entries. At app startup with captureLogsAsTelemetry enabled, the SDK will fire recordManualEventFor for every buffered entry at or above the configured minimum level. This burst of stale events can: (a) exhaust the in-memory telemetry queue, displacing real-time events that follow; (b) consume the user's Rollbar telemetry quota with meaningless historical data; and (c) make telemetry timestamps unreliable, since old messages appear with their original logcat timestamp but are recorded at the current wall-clock time of the recordManualEventFor call.
How to fix it
Add "-T", "1" to the ProcessBuilder argument list. The -T flag tells logcat to start printing only from the N-th most recent entry; -T 1 effectively means "start from the very last entry already in the buffer" so at most one stale line is replayed before the process enters streaming mode for new messages. The fixed factory call becomes:
return new ProcessBuilder(
"logcat", "-v", "threadtime", "-T", "1", "*:" + priorityFilter)
.redirectErrorStream(true)
.start();An alternative is -T $(date +%s.%3N) (epoch-based cutoff), but that requires a shell invocation and the format availability varies by Android version; -T 1 is simpler, universally supported since Android 5, and achieves the same goal.
Step-by-step proof
- Device has been running for 10 minutes; logcat ring buffer contains 5,000 entries, 200 of which are
W/E/Flevel. - App starts and calls
Rollbar.init(ctx, new AndroidConfiguration.Builder().captureLogsAsTelemetry(true).build()). initAutomaticCaptureOfLogTelemetryEventscreatesLogcatTelemetryCaptureand callsstart().ProcessBuilder("logcat", "-v", "threadtime", "*:W")is launched.- logcat first outputs all 200 buffered W/E/F entries from the past 10 minutes.
readLoopreads each of those 200 lines andprocessLinecallstracker.recordManualEventForfor each.- The telemetry queue now holds 200 stale events from before the app even started, before the first real event is recorded.
| try { | ||
| tracker.recordManualEventFor(level, Source.CLIENT, message); | ||
| } catch (Exception e) { |
There was a problem hiding this comment.
🟡 LogcatTelemetryCapture.processLine() calls tracker.recordManualEventFor(), which classifies automatically-captured logcat entries as TelemetryType.MANUAL instead of TelemetryType.LOG. This misrepresents the data in the Rollbar UI and makes it impossible to filter automatic log capture separately from developer-placed manual instrumentation checkpoints. Fix: change the call on line 136 to tracker.recordLogEventFor(level, Source.CLIENT, message).
Extended reasoning...
What the bug is and how it manifests
In LogcatTelemetryCapture.processLine() (lines 135-137), after parsing a logcat line into its priority, tag, and message components, the code records the telemetry event via:
tracker.recordManualEventFor(level, Source.CLIENT, message);
The TelemetryEventTracker interface exposes two distinct methods: recordLogEventFor() which produces a TelemetryType.LOG event, and recordManualEventFor() which produces a TelemetryType.MANUAL event. The wrong method is being called here.
The specific code path that triggers it
Every time LogcatTelemetryCapture reads and parses a logcat line (in readLoop() -> processLine()), it unconditionally routes through recordManualEventFor(), stamping a "manual" type onto what is in fact automatically-captured log output. This fires on every logcat line at or above the configured minimum level throughout the lifetime of the capture session.
Why existing code does not prevent it
Both methods share the identical signature (Level, Source, String), so there is no compile-time warning. The tests in LogcatTelemetryCaptureTest verify calls to recordManualEventFor(), which means they pass despite the semantic error - the tests themselves inherit the wrong expectation from the production code. There is no type enforcement at the interface level to distinguish the two use cases.
What the impact would be
In the Rollbar payload, each captured logcat line will carry type: "manual" instead of type: "log". Users who try to filter their telemetry trail to see only automatic log output will see nothing; users looking at manual checkpoints will see their timeline polluted with automatically-captured logcat lines. The two categories - developer-placed instrumentation and automatic system log capture - become indistinguishable.
How to fix it
Replace the single call on line 136:
// Before (wrong)
tracker.recordManualEventFor(level, Source.CLIENT, message);
// After (correct)
tracker.recordLogEventFor(level, Source.CLIENT, message);
The corresponding test assertions that verify recordManualEventFor() should also be updated to verify recordLogEventFor().
Step-by-step proof
- Developer enables log capture: AndroidConfiguration.Builder().captureLogsAsTelemetry(true).build()
- An Android library logs: Log.w("NetworkLib", "Timeout connecting to api.example.com")
- logcat emits: 04-20 12:00:00.000 1234 5678 W NetworkLib: Timeout connecting to api.example.com
- LogcatTelemetryCapture.readLoop() reads this line and calls processLine()
- processLine() parses priority=W, tag=NetworkLib, message=Timeout..., maps W -> Level.WARNING
- processLine() calls tracker.recordManualEventFor(Level.WARNING, Source.CLIENT, "Timeout...")
- The resulting Rollbar telemetry event has type=manual - it appears in the UI as a developer checkpoint, not a captured log line
Description of the change
Type of change
Related issues
Checklists
Development
Code review