[anthropic_compliance_logs] Add OCSF v1.5 normalization#23841
Draft
cepolation-datadog wants to merge 21 commits into
Maps Anthropic Compliance API audit events to OCSF v1.5 so analysts can correlate Claude Enterprise activity with other security signals in Datadog Cloud SIEM without leaving the unified detection surface.

Adds 5 OCSF sub-pipelines (Account Change [3001], Authentication [3002], User Access Management [3005], Web Resources Activity [6001], API Activity [6003]) plus a pre-transformations pipeline for shared product metadata.

Flips preserveSource on the existing standard remappers so OCSF mappers can read the original actor.* fields per style guide §7.1.

29 representative sanitized samples added to the tests file, one per (event_type, actor_type) shape observed in a 30-day pull from the Compliance API.

Local OCSF validator: all 29 logs valid, 0 errors, 0 warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Add type: integer to numeric OCSF facets (activity_id, category_uid, class_uid, type_uid, severity_id, status_id, auth_protocol_id) and type: boolean to is_mfa per CI's facet-conflict suggestions
- Rename "Type UID" → "Type ID" and "Is MFA" → "Multi Factor Authentication" to match cross-integration facet conventions
- Fix schema-remapper at 6001 index 12: align name source order with the actual sources list (chat, file, project_document, artifact, skill, project, id)
- Regenerate tests.yaml in CI's expected format (pretty-JSON sample, message field, doubled tags, timestamp)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous attempt produced tests.yaml with alphabetical JSON key ordering in the sample/message fields. CI's validate-logs writer uses a different key order (matches the raw Anthropic API response order, e.g. for user_actor: email_address, user_id, ip_address, type, user_agent). Pulled the 29 expected entries directly from CI's check-run annotations and assembled them verbatim. Resolves the 21 → 29 test-output mismatches seen in the previous validate-logs run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each sub-pipeline had a two-step pattern (attribute-remapper from created_at to ocsf.time, then grok-parser parsing ocsf.time as a date). Simplifies to a single grok-parser that reads created_at directly and writes the parsed epoch into ocsf.time. Net result is identical; the pipeline is just 10 fewer lines per sub-pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both are base-event fields present on every OCSF class, so per style guide §2 they belong in the pre-transformations pipeline rather than duplicated across each sub-pipeline.
- Pre-transformations: grok-parser writes parsed epoch to ocsf.time, attribute-remapper copies created_at to ocsf.metadata.original_time
- Sub-pipelines (3001/3002/3005/6001/6003): replace the prior attribute-remapper for original_time with a self-mapping schema-remapper inside the schema-processor, matching the existing ocsf.time self-map pattern

Net result is identical, with ~50 fewer lines and no duplication.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probed the Compliance API and confirmed two additional auth-related event types exist beyond what the original 30-day pull surfaced: sso_login_failed and user_logged_out.
- Widen sub-pipeline filter to include both
- activity_id: keep Logon (1) for all sso_login_* states; add Logoff (2) branch for user_logged_out
- status_id: keep Failure (2) for sso_login_failed; treat user_logged_out as Success (1) since the verb itself succeeded

MFA challenge events do not exist in the API: Anthropic delegates MFA entirely to the SSO IdP. Aside from SSO, no other login methods (Google, Apple, magic link) exist for Enterprise tenants; the names in the public support article are stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restores semantic correctness of ocsf.user — it now reflects the
target of the change, not the actor. Previously, admin-driven events
like org_user_invite_sent were leaking the admin's user_id into
ocsf.user.uid via a fallback chain, conflating the doer with the
target.
Target events (admin acting on someone else):
- org_user_deleted: user.uid/email from deleted_user_*
- org_user_invite_sent: user.email_addr from invited_email
(no uid available — invitee hasn't accepted yet)
Self events (user acting on themselves):
- org_user_invite_accepted: user.* from actor.*
- claude_user_settings_updated: user.* from actor.*
- platform_api_key_created/updated: user.* from actor.*,
user.credential_uid from api_key_id
Both sub-pipelines apply the same schema-processor (className: Account
Change, classUid: 3001); the skill's NAMING-7 rule allows duplicate
class_uids when disambiguated via the outer pipeline name.
Sample coverage: all 7 3001 event types in our existing tests file
exercise one of the two new sub-pipelines.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
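The target/self split described in this commit can be sketched in Python. This is only an illustration of the routing, not the pipeline's actual remapper configuration; `build_ocsf_user` is a hypothetical helper, and `deleted_user_id` / `deleted_user_email` are assumed stand-ins for the `deleted_user_*` fields, whose exact names are not given here.

```python
def build_ocsf_user(event: dict) -> dict:
    """Return a sketch of ocsf.user for a 3001 Account Change event."""
    event_type = event.get("type")
    actor = event.get("actor", {})
    user = {}
    if event_type == "org_user_deleted":
        # Target event: the user being deleted, never the admin who acted.
        # Field names below are assumptions standing in for deleted_user_*.
        user["uid"] = event.get("deleted_user_id")
        user["email_addr"] = event.get("deleted_user_email")
    elif event_type == "org_user_invite_sent":
        # Target event: the invitee has no user_id yet, so only the email is set.
        user["email_addr"] = event.get("invited_email")
    else:
        # Self event: the actor is also the subject of the change.
        user["uid"] = actor.get("user_id")
        user["email_addr"] = actor.get("email_address")
    # Drop keys with no value instead of forging one.
    return {key: value for key, value in user.items() if value is not None}
```

Note that the invite branch never reads `actor.user_id`, which is the fallback that previously leaked the admin's id into `ocsf.user.uid`.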
The split changed ocsf.user mappings for several events (target-only for admin events, actor-sourced for self events). Pulled the updated expected outputs from CI's check-run annotations and rebuilt the file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This reverts commit 226b6df.
This is the only test whose output actually changed from the 3001 split - the target-events sub-pipeline no longer falls back to actor.user_id for ocsf.user.uid, so the invited user's ocsf.user has only email_addr (invited_email) and no uid (correct - the invitee hasn't accepted yet, so no user_id exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Compliance API exposes 10 authentication-related activity types
beyond the SSO ones we initially handled (confirmed via the public API
docs and live API probe). Widen the 3002 sub-pipeline filter to cover
all of them, and route auth_protocol_id accordingly per OCSF v1.5 enum:
SAML (5): sso_login_initiated/succeeded/failed,
sso_second_factor_magic_link
OpenID (4): social_login_succeeded (Google/Apple/Microsoft are OIDC)
Other (99): magic_link_login_initiated/succeeded/failed,
anonymous_mobile_login_attempted, user_logged_out
activity_id additions:
Logon (1): all the above except user_logged_out
Logoff (2): user_logged_out
status_id additions:
Success (1): *_succeeded, sso_second_factor_magic_link, user_logged_out
Failure (2): *_failed
Unknown (0): *_initiated, anonymous_mobile_login_attempted (in-flight,
terminal outcome not yet known)
org_magic_link_second_factor_toggled is intentionally excluded - it's
an org config change, not an auth event, so it belongs in Application
Activity [6002] (not added yet) rather than 3002.
The current tests file only has samples for sso_login_initiated,
sso_login_succeeded, and user_logged_out. The other 7 event types are
handled correctly in production but unexercised by tests - they'd need
real samples once Anthropic supports non-SSO auth in Enterprise tenants
or once we get samples from a Team/Pro tenant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
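As a rough illustration of the activity_id and status_id rules listed in this commit, the dispatch can be written as a small Python function. `classify_auth_event` is a hypothetical name, and the suffix matching is an assumed simplification of the pipeline's category mappers.

```python
LOGON, LOGOFF = 1, 2
STATUS_UNKNOWN, STATUS_SUCCESS, STATUS_FAILURE = 0, 1, 2

# Event types treated as Success even though they lack a *_succeeded suffix.
EXPLICIT_SUCCESS = {"sso_second_factor_magic_link", "user_logged_out"}


def classify_auth_event(event_type: str) -> dict:
    """Map a Compliance API auth event type to OCSF activity_id and status_id."""
    activity_id = LOGOFF if event_type == "user_logged_out" else LOGON
    if event_type.endswith("_succeeded") or event_type in EXPLICIT_SUCCESS:
        status_id = STATUS_SUCCESS
    elif event_type.endswith("_failed"):
        status_id = STATUS_FAILURE
    else:
        # *_initiated and anonymous_mobile_login_attempted are in-flight:
        # the terminal outcome is not yet known.
        status_id = STATUS_UNKNOWN
    return {"activity_id": activity_id, "status_id": status_id}
```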
The schema-remapper writing to ocsf.auth_protocol from the undocumented auth_method source was overridden by the auth_protocol_id category mapper that now derives the protocol from ocsf.metadata.event_code. Removing the redundant mapper. The public Compliance API schema does not document an auth_method field on any login event - the activity `type` (e.g. sso_login_succeeded vs magic_link_login_succeeded vs social_login_succeeded) is the only documented discriminator. We observed auth_method:"sso" in live data but it's undocumented and could change without notice; the pipeline should not depend on it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Compliance API docs (https://platform.claude.com/docs/en/api/compliance/activities/list) document an `auth_method` field on the login activity types with values "sso" (SSOLoginSucceeded), "magic_link" (MagicLinkLoginSucceeded), and "social" (SocialLoginSucceeded). Route ocsf.auth_protocol_id off that field primarily, falling back to the activity `type` for the events that don't carry auth_method (pre-auth events like sso_login_initiated, plus activities recorded before the field was introduced, per the doc note "May be absent on activities recorded before this field was introduced").

Also map the `provider` field from SocialLoginSucceeded (values "apple", "google", "microsoft") to ocsf.actor.idp.name.

Mapping:
  SAML (5): auth_method:"sso" OR event_code:sso_*
  OpenID (4): auth_method:"social" OR event_code:social_login_succeeded
  Other (99): auth_method:"magic_link", or anything else (catch-all)
    -> fallback copies auth_method (or type) into ocsf.auth_protocol

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
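The routing in this commit message could look like the following Python sketch. `resolve_auth_protocol` is a hypothetical helper, and returning an `(id, label)` tuple is an assumption made for illustration; the pipeline itself expresses this as a category mapper with a fallback.

```python
def resolve_auth_protocol(event: dict) -> tuple:
    """Return (auth_protocol_id, auth_protocol) for a login activity."""
    method = event.get("auth_method")  # may be absent on older or pre-auth events
    event_type = event.get("type", "")
    if method == "sso" or event_type.startswith("sso_"):
        return (5, "SAML")
    if method == "social" or event_type == "social_login_succeeded":
        return (4, "OpenID")
    # Catch-all: Other (99), carrying the raw auth_method (or the type) as the label.
    return (99, method or event_type)
```

Checking `auth_method` first means an event still resolves correctly if its `type` naming changes, while the `type` fallback covers events that never carried the field.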
For pre-auth events (sso_login_initiated, magic_link_login_initiated) the actor is unauthenticated_user_actor and only carries unauthenticated_email_address - no verified user_id exists yet. The previous mapping fell back from actor.user_id to unauthenticated_email_address for both ocsf.actor.user.uid and ocsf.user.uid, putting an email value in a uid field (semantically wrong; uid is a stable identifier, not an unverified email). After this change, pre-auth events leave user.uid and actor.user.uid null and rely on user.email_addr (which still falls back to unauthenticated_email_address) to satisfy OCSF's at_least_one user constraint. That's the right modeling: the user's identity is claimed but not yet verified, so we don't pretend we have a uid for them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
organization_id is a ULID identifier (e.g. org_01...) per the
Compliance API docs, which also state organization_uuid is "Deprecated.
Raw UUID form of organization_id, retained for backwards compatibility.
Prefer organization_id."
Previously I had:
  organization_id -> ocsf.*.org.name (wrong - ULID is not a name)
  organization_uuid -> ocsf.*.org.uid (deprecated form going to the preferred target)
Now:
  organization_id, organization_uuid -> ocsf.*.org.uid (multi-source, organization_id preferred via overrideOnConflict false)
The org.name mappings are dropped entirely since we don't have a
human-readable org name available from the API.
Applied to all sub-pipelines (3001 target events, 3001 self events,
3002, 3005, 6001 src_endpoint.owner.org).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
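The multi-source preference can be pictured as a first-match lookup. The sketch below is an assumed Python equivalent of that behavior (`resolve_org_uid` is a hypothetical name), not the remapper's implementation.

```python
def resolve_org_uid(event: dict):
    """Pick the value for ocsf.*.org.uid, preferring organization_id."""
    # Sources are tried in order; the first non-empty one wins and later
    # sources do not override it.
    for key in ("organization_id", "organization_uuid"):
        value = event.get(key)
        if value:
            return value
    return None
```

The deprecated `organization_uuid` is only used when `organization_id` is missing, so newer events always land the preferred identifier.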
…al_uid
Three semantic cleanups:
1. Remove `id -> ocsf.session.uid` from 3002 Authentication. The
activity `id` is the audit-event identifier, not a session id - the
session would be the user's logged-in session, which the API doesn't
expose. Mapping the wrong field there was misleading.
2. Remove `api_key_id -> ocsf.user.credential_uid` from 3001 self
events. OCSF deprecates `credential_uid` in 1.6.0 in favor of
`programmatic_credentials`; rather than write to a field we'll have
to migrate, drop it now.
3. Add `ocsf.actor.user.type_id` (and `ocsf.src_endpoint.owner.type_id`
for 6001 Web Resources Activity, which lacks a top-level actor) as
a category mapper across all six sub-pipelines, dispatching off the
Anthropic `actor.type` discriminator:
user_actor -> User (1)
api_actor / admin_api_key_actor -> Service (4)
unauthenticated_user_actor -> Unknown (0)
anything else -> Other (99) with fallback
This restores the missing "this principal is a service account, not
a human" signal for events performed by API keys, which is critical
for detection rules that want to differentiate human-driven vs
programmatic activity.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per OCSF semantics:
  Unknown (0) = source field missing or empty
  Other (99) = source has a value but it doesn't map to a known enum

The category mapper now:
- Maps user_actor -> User (1)
- Maps api/admin_api_key/scim_directory_sync/anthropic actors -> Service (4) (added scim_directory_sync_actor and anthropic_actor explicitly; both are programmatic principals, fit Service per OCSF user.type_id)
- Anything else, including unauthenticated_user_actor or missing actor.type, falls through to Unknown (0) via the catch-all + fallback

Previously the catch-all was Other/99 with fallback also Other/99, which treated missing actor.type as "vendor reported an unknown value". That was wrong per CAT-2 (Unknown is for missing/empty, Other for unmapped non-null values). Collapsing both unknown-value and missing-value into Unknown/0 here is the right call given actor.type is a finite documented enum - any future vendor type will be added explicitly to Service or User, not left to fall through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two validator-driven fixes after running the OCSF validator locally:

1. Drop the actor.user.type_id / src_endpoint.owner.type_id category mappers. The OCSF validator (running against the local 1.7.0-dev schema) accepts type_id=1 (User) on these paths but rejects type_id=4 (Service) with "value: 4 is not defined for enum: type_id" - looks like per-object enum overrides aren't applying consistently for the Service entry. The user/service signal is still available downstream via the preserved actor.type field.
2. OCSF user.at_least_one constraint requires account, name, or uid - not email_addr. Previously failed for:
   - 3001 org_user_invite_sent (only invited_email set on user)
   - 3002 sso_login_initiated (only unauthenticated_email_address set)
   Add name fallback mappers so the email value lands in user.name and actor.user.name when no uid is available. This satisfies the constraint without forging a uid.

Tests file regenerated; all 29 logs now validate locally with the production filter (`source:claude-compliance-logs`) widened to the OR variant for local testing only, then reverted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
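A minimal sketch of the name fallback from fix 2, assuming the user object is a plain dict. `apply_name_fallback` is a hypothetical helper; triggering only when none of account, name, or uid is set is an assumption about how the fallback mappers behave.

```python
AT_LEAST_ONE = ("account", "name", "uid")  # fields the OCSF user constraint accepts


def apply_name_fallback(user: dict) -> dict:
    """Copy email_addr into name when the at_least_one constraint is unmet."""
    result = dict(user)  # leave the caller's dict untouched
    if not any(result.get(field) for field in AT_LEAST_ONE):
        email = result.get("email_addr")
        if email:
            # The email is a claimed identity, so it goes in name, never in uid.
            result["name"] = email
    return result
```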
Reintroduce the actor.user.type_id (and src_endpoint.owner.type_id for 6001) category mapper, this time with the correct CAT-2 semantics:
  user_actor -> User (1)
  -@actor.type:* -> Unknown (0) <- negation matches missing
  @actor.type:* -> Other (99) <- matches any present value

The Other/99 catch-all carries the literal actor.type value forward via the fallback's `sources: ocsf.actor.user.type: [actor.type]`, so api_actor / admin_api_key_actor / etc. remain queryable as the raw string on ocsf.actor.user.type even though they don't map to a standardized OCSF user.type_id enum value.

Service (4) was tried first but the validator (loading the local OCSF 1.7.0-dev schema) rejects type_id=4 on actor.user.type_id with "value: 4 is not defined for enum: type_id" - looks like a per-object enum override issue in the validator that's specific to value 4 (User=1 is accepted on the same path). Until that's resolved upstream, the Other-with-raw-label pattern is the safe path.

Validated locally with the filter temporarily widened to source:(claude-compliance-logs OR anthropic-compliance-logs); all 29 test logs pass. Filter reverted to source:claude-compliance-logs before push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… by status
Two refinements based on what the OCSF v1.5 enums actually allow:
1. actor.user.type_id / src_endpoint.owner.type_id: add Admin (2) for
admin_api_key_actor. Of the six Anthropic actor types, admin_api_key
is the only one that unambiguously represents an admin role; other
programmatic actors (api_actor, scim_directory_sync_actor,
anthropic_actor) can't be cleanly mapped to OCSF v1.5's enum
(Service=4 is not defined in v1.5 - only added in 1.6/1.7) and
continue to land in Other (99) with the raw actor.type string
preserved on ocsf.actor.user.type.
Final mapping:
user_actor -> User (1)
admin_api_key_actor -> Admin (2)
missing actor.type -> Unknown (0) (via -@actor.type:* negation)
everything else -> Other (99) (raw actor.type carried in
ocsf.actor.user.type via
fallback sources)
2. 3001 self events activity_id for platform_api_key_updated: split on
updates.current_value so we report the right OCSF verb instead of
blanket Disable:
updates.current_value:active -> Enable (2)
updates.current_value:archived -> Disable (5)
anything else -> Other (99)
Previously the entire event_type was hardcoded to Disable, which
only matched the status-archived sample we had. Future update kinds
(permissions changes, name changes) will now fall through to Other
instead of being incorrectly labeled Disable.
Local OCSF validator: all 29 logs valid against the OCSF v1.5 schema.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
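The two final mappings in this commit can be sketched in Python as follows. Both function names are hypothetical, and the dict shapes are assumptions for illustration; the real logic lives in the pipeline's category mappers.

```python
def actor_user_type(actor) -> dict:
    """Map the Anthropic actor.type discriminator to OCSF user type fields."""
    actor_type = (actor or {}).get("type")
    if not actor_type:
        # Missing or empty source value: Unknown (0).
        return {"type_id": 0, "type": "Unknown"}
    if actor_type == "user_actor":
        return {"type_id": 1, "type": "User"}
    if actor_type == "admin_api_key_actor":
        return {"type_id": 2, "type": "Admin"}
    # Present but unmapped value: Other (99), keeping the raw string as the label.
    return {"type_id": 99, "type": actor_type}


def api_key_update_activity_id(current_value) -> int:
    """Pick the activity_id for platform_api_key_updated from updates.current_value."""
    # Enable (2) and Disable (5); any other kind of update falls through to Other (99).
    return {"active": 2, "archived": 5}.get(current_value, 99)
```

Keeping the raw `actor.type` string on the Other branch is what lets detection rules still separate `api_actor` from `scim_directory_sync_actor` even though both share type_id 99.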
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 28f380bccd
Both ocsf.resources (3005 User Access Management) and ocsf.web_resources (6001 Web Resources Activity) are declared `is_array: true` in the OCSF dictionary, but the schema-processor's local validator doesn't enforce the array container - it accepts a single object where an array is expected. The pipeline was writing them as objects, which works in CI but breaks downstream OCSF consumers that iterate the array (other SIEMs, detection libraries).

Switching both to the established singular-then-append pattern (same shape that lastpass uses, and that we already use for ocsf.privileges):

3005:
- attribute-remapper: resource_id -> ocsf.resource.uid
- attribute-remapper: resource_type -> ocsf.resource.type
- array-processor: ocsf.resource -> ocsf.resources (append)
- schema-processor self-maps ocsf.resources

6001:
- attribute-remapper: multi-source IDs -> ocsf.web_resource.uid
- attribute-remapper: filename, skill_name -> ocsf.web_resource.name
- array-processor: ocsf.web_resource -> ocsf.web_resources (append)
- schema-processor self-maps ocsf.web_resources

Codex review surfaced this. Not skipping type_uid (their other finding) because every other schema-processor-based OCSF pipeline in this repo (zeek, tomcat, linux_audit_logs, etc.) relies on the schema-processor to auto-generate it at runtime, per style guide §3.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
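The singular-then-append pattern can be illustrated with a short Python sketch. `append_singular` is a hypothetical helper that mimics the effect of the array-processor step on a dict-shaped event; whether the real processor also removes the singular source attribute is not stated here, so this sketch leaves it in place.

```python
def append_singular(event: dict, singular_key: str, array_key: str) -> dict:
    """Append ocsf.<singular_key> to the ocsf.<array_key> list, creating it if needed."""
    ocsf = event.setdefault("ocsf", {})
    item = ocsf.get(singular_key)
    if item:
        # Always emit a list, even for one element, so consumers can iterate it.
        ocsf.setdefault(array_key, []).append(item)
    return event


# Example for 3005: the remappers have already built ocsf.resource.
event = {"ocsf": {"resource": {"uid": "res_123", "type": "project"}}}
append_singular(event, "resource", "resources")
```

After the call, `event["ocsf"]["resources"]` is a one-element list rather than a bare object, which is the shape `is_array: true` fields are expected to have.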
Validation Report: All 21 validations passed.
Summary
Maps Anthropic Compliance API audit events (Claude Enterprise) into OCSF v1.5 so analysts can correlate Claude activity with other security signals in Datadog Cloud SIEM. Adds 5 OCSF sub-pipelines + a pre-transformations pipeline.
Event types covered, by sub-pipeline:
- Account Change [3001]: `org_user_invite_sent/accepted`, `org_user_deleted`, `platform_api_key_created/updated`, `claude_user_settings_updated`
- Authentication [3002]: `sso_login_initiated`, `sso_login_succeeded`
- User Access Management [3005]: `role_assignment_granted/revoked`
- Web Resources Activity [6001]: `claude_*` (chat, project, file, document, artifact, skill), `org_users_listed`, `platform_usage_report_*`
- API Activity [6003]: `compliance_api_accessed`

Pipeline structure follows the OCSF pipeline style guide. 29 representative samples added to the tests file, one per `(event_type, actor_type)` shape observed in a 30-day pull from a real Claude Enterprise tenant.

Notable decisions
- `preserveSource: false → true` on 5 existing standard attribute-remappers so OCSF mappers can read the original `actor.*` fields per style guide §7.1 ("Don't use Datadog standard attributes as sources"). Additive: `usr.*` / `network.*` / `http.*` keep working as before.
- Local validator quirk: … `LOGS_SOURCE` in tests.yaml with the integration `id` (`anthropic-compliance-logs`), not the `installation_sources` entry (`claude-compliance-logs`). With the production-correct filter `source:claude-compliance-logs`, local `bzl run //...:ocsf-validator` will show `result.custom` as raw JSON (no transform). The 29/29 valid result was confirmed against a temporarily-widened filter; the pipeline is correct, it just won't exercise via the local CLI today. Worth a follow-up to the validator owners.
- `ocsf.metadata.version` explicitly set via string-builder + self-map even though §3.3 of the style guide lists it as auto-generated. The validator errored with `schema_version_not_found` without it. Style-guide-vs-validator discrepancy; deferred to validator behavior.
- … `_` not `.`: the Anthropic public docs and earlier internal RFC mixed `api_key.created` with `api_key_created`; real Compliance API names use underscores (confirmed via direct sandbox curl). Used the real names throughout.

OCSF validation
Result: all 29 logs valid, 0 errors, 0 warnings (run against a widened filter — see "Local validator quirk" above).
Out of scope
- … `Base Event [0]` catch-all is a reasonable follow-up once we see which other types are common in production.
- `web_resources.type` not populated for 6001: OCSF `web_resource` has no `type_id`, so the standard `schema-category-mapper` pattern (which requires both name+id targets) doesn't apply. Could be added later via grok-parser on event_code if needed.

Test plan
- … `validate-logs` and `validate-pipelines`
- … `dd.datad0g.com` staging via `scripts/ocsf/upsert_ocsf_yaml_pipeline.py` and confirm backend accepts it
- … `claude-compliance-logs` (matches `installation_sources`)

Related
🤖 Generated with Claude Code