[anthropic_compliance_logs] Add OCSF v1.5 normalization by cepolation-datadog · Pull Request #23841 · DataDog/integrations-core

cepolation-datadog · 2026-05-26T19:03:27Z

Summary

Maps Anthropic Compliance API audit events (Claude Enterprise) into OCSF v1.5 so analysts can correlate Claude activity with other security signals in Datadog Cloud SIEM. Adds 5 OCSF sub-pipelines + a pre-transformations pipeline.

OCSF Class	Events covered
Account Change [3001]	`org_user_invite_sent`/`accepted`, `org_user_deleted`, `platform_api_key_created`/`updated`, `claude_user_settings_updated`
Authentication [3002]	`sso_login_initiated`, `sso_login_succeeded`
User Access Management [3005]	`role_assignment_granted`/`revoked`
Web Resources Activity [6001]	All `claude_*` (chat, project, file, document, artifact, skill), `org_users_listed`, `platform_usage_report_*`
API Activity [6003]	`compliance_api_accessed`
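
The routing in the table above can be sketched as a lookup, assuming exact event names win over prefix matches (so `claude_user_settings_updated` lands in 3001 rather than 6001). This is an illustrative Python sketch, not the pipeline's actual filter syntax; `ocsf_class_uid` and the sample name `claude_chat_created` are hypothetical.

```python
from typing import Optional

# Exact event names per OCSF class, taken from the table in the PR summary.
EXACT_CLASS = {
    "org_user_invite_sent": 3001,
    "org_user_invite_accepted": 3001,
    "org_user_deleted": 3001,
    "platform_api_key_created": 3001,
    "platform_api_key_updated": 3001,
    "claude_user_settings_updated": 3001,
    "sso_login_initiated": 3002,
    "sso_login_succeeded": 3002,
    "role_assignment_granted": 3005,
    "role_assignment_revoked": 3005,
    "org_users_listed": 6001,
    "compliance_api_accessed": 6003,
}

# Prefix families routed to Web Resources Activity [6001].
PREFIXES_6001 = ("claude_", "platform_usage_report_")


def ocsf_class_uid(event_type: str) -> Optional[int]:
    """Return the OCSF class_uid for an event type, or None if no filter matches."""
    if event_type in EXACT_CLASS:  # exact names take precedence over prefixes
        return EXACT_CLASS[event_type]
    if event_type.startswith(PREFIXES_6001):
        return 6001
    return None  # falls through every sub-pipeline, no OCSF transformation
```

Event types that return `None` here correspond to the unmapped types described under "Out of scope".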

Pipeline structure follows the OCSF pipeline style guide. 29 representative samples added to the tests file, one per (event_type, actor_type) shape observed in a 30-day pull from a real Claude Enterprise tenant.

Notable decisions

preserveSource: false → true on 5 existing standard attribute-remappers so OCSF mappers can read the original actor.* fields per style guide §7.1 ("Don't use Datadog standard attributes as sources"). Additive — usr.*/network.*/http.* keep working as before.
Local validator quirk: the OCSF validator CLI substitutes LOGS_SOURCE in tests.yaml with the integration id (anthropic-compliance-logs), not the installation_sources entry (claude-compliance-logs). With the production-correct filter source:claude-compliance-logs, the filter does not match locally, so bzl run //...:ocsf-validator shows result.custom as raw JSON (no transform). The 29/29 valid result was confirmed against a temporarily widened filter; the pipeline is correct, but the local CLI cannot exercise it with the production filter today. Worth a follow-up to the validator owners.
ocsf.metadata.version explicitly set via string-builder + self-map even though §3.3 of the style guide lists it as auto-generated. The validator errored with schema_version_not_found without it. Style-guide-vs-validator discrepancy; deferred to validator behavior.
Event names use _ not .: the Anthropic public docs and earlier internal RFC mixed api_key.created with api_key_created — real Compliance API names use underscores (confirmed via direct sandbox curl). Used the real names throughout.

OCSF validation

bzl run //domains/event-platform/libs/ocsf-validator-cli:ocsf-validator -- \
  --input integrations-core/anthropic_compliance_logs/assets/logs/anthropic-compliance-logs_tests.yaml \
  --pipeline integrations-core/anthropic_compliance_logs/assets/logs/anthropic-compliance-logs.yaml

Result: all 29 logs valid, 0 errors, 0 warnings (run against a widened filter — see "Local validator quirk" above).

Out of scope

~150 event types ship from Anthropic; only ~27 surfaced organically in 30 days of tenant traffic. The remaining ~120 will fall through all 5 sub-pipeline filters with no OCSF transformation applied. A Base Event [0] catch-all is a reasonable follow-up once we see which other types are common in production.
web_resources.type not populated for 6001 — OCSF web_resource has no type_id, so the standard schema-category-mapper pattern (which requires both name+id targets) doesn't apply. Could be added later via grok-parser on event_code if needed.

Test plan

CI passes validate-logs and validate-pipelines
Round-trip the pipeline through dd.datad0g.com staging via scripts/ocsf/upsert_ocsf_yaml_pipeline.py and confirm backend accepts it
Verify in staging that real Claude Compliance API events (once the crawler ships next week) get OCSF-normalized as expected
Confirm with crawler team that the production source tag is claude-compliance-logs (matches installation_sources)

Maps Anthropic Compliance API audit events to OCSF v1.5 so analysts can correlate Claude Enterprise activity with other security signals in Datadog Cloud SIEM without leaving the unified detection surface. Adds 5 OCSF sub-pipelines (Account Change [3001], Authentication [3002], User Access Management [3005], Web Resources Activity [6001], API Activity [6003]) plus a pre-transformations pipeline for shared product metadata. Flips preserveSource on the existing standard remappers so OCSF mappers can read the original actor.* fields per style guide §7.1. 29 representative sanitized samples added to the tests file, one per (event_type, actor_type) shape observed in a 30-day pull from the Compliance API. Local OCSF validator: all 29 logs valid, 0 errors, 0 warnings. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Add type: integer to numeric OCSF facets (activity_id, category_uid, class_uid, type_uid, severity_id, status_id, auth_protocol_id) and type: boolean to is_mfa per CI's facet-conflict suggestions
- Rename "Type UID" → "Type ID" and "Is MFA" → "Multi Factor Authentication" to match cross-integration facet conventions
- Fix schema-remapper at 6001 index 12: align name source order with the actual sources list (chat, file, project_document, artifact, skill, project, id)
- Regenerate tests.yaml in CI's expected format (pretty-JSON sample, message field, doubled tags, timestamp)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Previous attempt produced tests.yaml with alphabetical JSON key ordering in the sample/message fields. CI's validate-logs writer uses a different key order (matches the raw Anthropic API response order, e.g. for user_actor: email_address, user_id, ip_address, type, user_agent). Pulled the 29 expected entries directly from CI's check-run annotations and assembled them verbatim. Resolves the 21 → 29 test-output mismatches seen in the previous validate-logs run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Each sub-pipeline had a two-step pattern (attribute-remapper from created_at to ocsf.time, then grok-parser parsing ocsf.time as a date). Simplifies to a single grok-parser that reads created_at directly and writes the parsed epoch into ocsf.time. Net result is identical; the pipeline is just 10 fewer lines per sub-pipeline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Both ocsf.time and ocsf.metadata.original_time are base-event fields present on every OCSF class, so per style guide §2 they belong in the pre-transformations pipeline rather than duplicated across each sub-pipeline.

- Pre-transformations: grok-parser writes parsed epoch to ocsf.time, attribute-remapper copies created_at to ocsf.metadata.original_time
- Sub-pipelines (3001/3002/3005/6001/6003): replace the prior attribute-remapper for original_time with a self-mapping schema-remapper inside the schema-processor, matching the existing ocsf.time self-map pattern

Net result is identical, with ~50 fewer lines and no duplication.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Probed the Compliance API and confirmed two additional auth-related event types exist beyond what the original 30-day pull surfaced: sso_login_failed and user_logged_out.

- Widen sub-pipeline filter to include both
- activity_id: keep Logon (1) for all sso_login_* states; add Logoff (2) branch for user_logged_out
- status_id: keep Failure (2) for sso_login_failed; treat user_logged_out as Success (1) since the verb itself succeeded

MFA challenge events do not exist in the API; Anthropic delegates MFA entirely to the SSO IdP. Aside from SSO, no other login methods (Google, Apple, magic link) exist for Enterprise tenants; the names in the public support article are stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Restores semantic correctness of ocsf.user: it now reflects the target of the change, not the actor. Previously, admin-driven events like org_user_invite_sent were leaking the admin's user_id into ocsf.user.uid via a fallback chain, conflating the doer with the target.

Target events (admin acting on someone else):
- org_user_deleted: user.uid/email from deleted_user_*
- org_user_invite_sent: user.email_addr from invited_email (no uid available; the invitee hasn't accepted yet)

Self events (user acting on themselves):
- org_user_invite_accepted: user.* from actor.*
- claude_user_settings_updated: user.* from actor.*
- platform_api_key_created/updated: user.* from actor.*, user.credential_uid from api_key_id

Both sub-pipelines apply the same schema-processor (className: Account Change, classUid: 3001); the skill's NAMING-7 rule allows duplicate class_uids when disambiguated via the outer pipeline name.

Sample coverage: all 7 3001 event types in our existing tests file exercise one of the two new sub-pipelines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The split changed ocsf.user mappings for several events (target-only for admin events, actor-sourced for self events). Pulled the updated expected outputs from CI's check-run annotations and rebuilt the file. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

This reverts commit 226b6df.

This is the only test whose output actually changed from the 3001 split - the target-events sub-pipeline no longer falls back to actor.user_id for ocsf.user.uid, so the invited user's ocsf.user has only email_addr (invited_email) and no uid (correct - the invitee hasn't accepted yet, so no user_id exists). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Compliance API exposes 10 authentication-related activity types, not just the SSO ones we initially handled (confirmed via the public API docs and live API probe). Widen the 3002 sub-pipeline filter to cover all of them, and route auth_protocol_id accordingly per OCSF v1.5 enum:

  SAML (5): sso_login_initiated/succeeded/failed, sso_second_factor_magic_link
  OpenID (4): social_login_succeeded (Google/Apple/Microsoft are OIDC)
  Other (99): magic_link_login_initiated/succeeded/failed, anonymous_mobile_login_attempted, user_logged_out

activity_id additions:

  Logon (1): all the above except user_logged_out
  Logoff (2): user_logged_out

status_id additions:

  Success (1): *_succeeded, sso_second_factor_magic_link, user_logged_out
  Failure (2): *_failed
  Unknown (0): *_initiated, anonymous_mobile_login_attempted (in-flight, terminal outcome not yet known)

org_magic_link_second_factor_toggled is intentionally excluded - it's an org config change, not an auth event, so it belongs in Application Activity [6002] (not added yet) rather than 3002.

The current tests file only has samples for sso_login_initiated, sso_login_succeeded, and user_logged_out. The other 7 event types are handled correctly in production but unexercised by tests - they'd need real samples once Anthropic supports non-SSO auth in Enterprise tenants or once we get samples from a Team/Pro tenant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
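
The activity_id and status_id rules in this commit message can be sketched in Python as follows. The function name is hypothetical and the real logic lives in category mappers in the pipeline YAML; this only illustrates the suffix-based dispatch described above.

```python
from typing import Tuple

# Event types treated as Success even though they lack a *_succeeded suffix.
EXPLICIT_SUCCESS = {"sso_second_factor_magic_link", "user_logged_out"}


def auth_activity_and_status(event_type: str) -> Tuple[int, int]:
    """Return (activity_id, status_id) for a 3002 Authentication event type."""
    # activity_id: Logoff (2) only for user_logged_out, Logon (1) otherwise.
    activity_id = 2 if event_type == "user_logged_out" else 1

    if event_type.endswith("_succeeded") or event_type in EXPLICIT_SUCCESS:
        status_id = 1  # Success
    elif event_type.endswith("_failed"):
        status_id = 2  # Failure
    else:
        status_id = 0  # Unknown: *_initiated / *_attempted are still in flight
    return activity_id, status_id
```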

The schema-remapper writing to ocsf.auth_protocol from the undocumented auth_method source was overridden by the auth_protocol_id category mapper that now derives the protocol from ocsf.metadata.event_code. Removing the redundant mapper. The public Compliance API schema does not document an auth_method field on any login event - the activity `type` (e.g. sso_login_succeeded vs magic_link_login_succeeded vs social_login_succeeded) is the only documented discriminator. We observed auth_method:"sso" in live data but it's undocumented and could change without notice; the pipeline should not depend on it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Compliance API docs (https://platform.claude.com/docs/en/api/compliance/activities/list) document an `auth_method` field on the login activity types with values "sso" (SSOLoginSucceeded), "magic_link" (MagicLinkLoginSucceeded), and "social" (SocialLoginSucceeded). Route ocsf.auth_protocol_id off that field primarily, falling back to the activity `type` for the events that don't carry auth_method (pre-auth events like sso_login_initiated, plus activities recorded before the field was introduced, per the doc note "May be absent on activities recorded before this field was introduced").

Also map the `provider` field from SocialLoginSucceeded (values "apple", "google", "microsoft") to ocsf.actor.idp.name.

Mapping:

  SAML (5): auth_method:"sso" OR event_code:sso_*
  OpenID (4): auth_method:"social" OR event_code:social_login_succeeded
  Other (99): auth_method:"magic_link", or anything else (catch-all) -> fallback copies auth_method (or type) into ocsf.auth_protocol

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
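
A minimal Python sketch of the mapping above, assuming the rules are evaluated in the listed order and that the raw event carries `auth_method` and `type` as top-level keys. The function name and return shape are hypothetical; the pipeline itself expresses this as a category mapper with a fallback.

```python
from typing import Optional, Tuple


def auth_protocol(event: dict) -> Tuple[int, Optional[str]]:
    """Return (auth_protocol_id, auth_protocol label for the Other case)."""
    method = event.get("auth_method")  # may be absent on older or pre-auth events
    event_type = event.get("type", "")

    if method == "sso" or event_type.startswith("sso_"):
        return 5, None  # SAML
    if method == "social" or event_type == "social_login_succeeded":
        return 4, None  # OpenID
    # Catch-all: Other (99), carrying auth_method (or type) as the raw label.
    return 99, method or event_type
```

For example, an `sso_login_initiated` event with no `auth_method` still resolves to SAML through the type fallback, while a magic-link login resolves to Other with the label `magic_link`.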

For pre-auth events (sso_login_initiated, magic_link_login_initiated) the actor is unauthenticated_user_actor and only carries unauthenticated_email_address - no verified user_id exists yet. The previous mapping fell back from actor.user_id to unauthenticated_email_address for both ocsf.actor.user.uid and ocsf.user.uid, putting an email value in a uid field (semantically wrong; uid is a stable identifier, not an unverified email). After this change, pre-auth events leave user.uid and actor.user.uid null and rely on user.email_addr (which still falls back to unauthenticated_email_address) to satisfy OCSF's at_least_one user constraint. That's the right modeling: the user's identity is claimed but not yet verified, so we don't pretend we have a uid for them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

organization_id is a ULID identifier (e.g. org_01...) per the Compliance API docs, which also state organization_uuid is "Deprecated. Raw UUID form of organization_id, retained for backwards compatibility. Prefer organization_id."

Previously I had:

  organization_id -> ocsf.*.org.name (wrong - ULID is not a name)
  organization_uuid -> ocsf.*.org.uid (deprecated form going to the preferred target)

Now:

  organization_id, organization_uuid -> ocsf.*.org.uid (multi-source, organization_id preferred via overrideOnConflict false)

The org.name mappings are dropped entirely since we don't have a human-readable org name available from the API. Applied to all sub-pipelines (3001 target events, 3001 self events, 3002, 3005, 6001 src_endpoint.owner.org).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
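
The multi-source preference described above amounts to "first non-empty source wins". A small Python sketch under that assumption (the helper name is hypothetical, and the actual behavior is configured on the remapper via overrideOnConflict):

```python
from typing import Optional


def org_uid(event: dict) -> Optional[str]:
    """Prefer organization_id; fall back to the deprecated organization_uuid."""
    for key in ("organization_id", "organization_uuid"):
        value = event.get(key)
        if value:  # skip missing or empty sources
            return value
    return None  # no org identifier available; org.name is never populated
```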

…al_uid

Three semantic cleanups:

1. Remove `id -> ocsf.session.uid` from 3002 Authentication. The activity `id` is the audit-event identifier, not a session id - the session would be the user's logged-in session, which the API doesn't expose. Mapping the wrong field there was misleading.

2. Remove `api_key_id -> ocsf.user.credential_uid` from 3001 self events. OCSF deprecates `credential_uid` in 1.6.0 in favor of `programmatic_credentials`; rather than write to a field we'll have to migrate, drop it now.

3. Add `ocsf.actor.user.type_id` (and `ocsf.src_endpoint.owner.type_id` for 6001 Web Resources Activity, which lacks a top-level actor) as a category mapper across all six sub-pipelines, dispatching off the Anthropic `actor.type` discriminator:

     user_actor -> User (1)
     api_actor / admin_api_key_actor -> Service (4)
     unauthenticated_user_actor -> Unknown (0)
     anything else -> Other (99) with fallback

   This restores the missing "this principal is a service account, not a human" signal for events performed by API keys, which is critical for detection rules that want to differentiate human-driven vs programmatic activity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per OCSF semantics:

  Unknown (0) = source field missing or empty
  Other (99) = source has a value but it doesn't map to a known enum

The category mapper now:

- Maps user_actor -> User (1)
- Maps api/admin_api_key/scim_directory_sync/anthropic actors -> Service (4) (added scim_directory_sync_actor and anthropic_actor explicitly; both are programmatic principals, fit Service per OCSF user.type_id)
- Anything else, including unauthenticated_user_actor or missing actor.type, falls through to Unknown (0) via the catch-all + fallback

Previously the catch-all was Other/99 with fallback also Other/99, which treated missing actor.type as "vendor reported an unknown value". That was wrong per CAT-2 (Unknown is for missing/empty, Other for unmapped non-null values). Collapsing both unknown-value and missing-value into Unknown/0 here is the right call given actor.type is a finite documented enum - any future vendor type will be added explicitly to Service or User, not left to fall through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two validator-driven fixes after running the OCSF validator locally:

1. Drop the actor.user.type_id / src_endpoint.owner.type_id category mappers. The OCSF validator (running against the local 1.7.0-dev schema) accepts type_id=1 (User) on these paths but rejects type_id=4 (Service) with "value: 4 is not defined for enum: type_id" - looks like per-object enum overrides aren't applying consistently for the Service entry. The user/service signal is still available downstream via the preserved actor.type field.

2. OCSF user.at_least_one constraint requires account, name, or uid - not email_addr. Previously failed for:
   - 3001 org_user_invite_sent (only invited_email set on user)
   - 3002 sso_login_initiated (only unauthenticated_email_address set)

   Add name fallback mappers so the email value lands in user.name and actor.user.name when no uid is available. This satisfies the constraint without forging a uid.

Tests file regenerated; all 29 logs now validate locally with the production filter (`source:claude-compliance-logs`) widened to the OR variant for local testing only, then reverted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
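
The name fallback in fix 2 can be illustrated with a short Python sketch. The helper names are hypothetical; the constraint keys (account, name, uid) are the ones quoted in the commit message above.

```python
from typing import Optional


def build_user(uid: Optional[str] = None, email: Optional[str] = None) -> dict:
    """Build an OCSF user object without forging a uid from an email."""
    user = {}
    if email:
        user["email_addr"] = email
    if uid:
        user["uid"] = uid
    elif email:
        # No uid available (invitee or unauthenticated user): reuse the email
        # as name so the at_least_one constraint is still met.
        user["name"] = email
    return user


def satisfies_at_least_one(user: dict) -> bool:
    """Check the constraint as described: account, name, or uid must be set."""
    return any(user.get(key) for key in ("account", "name", "uid"))
```

With this shape, an `org_user_invite_sent` user carries `email_addr` and `name` but no `uid`, which passes the constraint check.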

Reintroduce the actor.user.type_id (and src_endpoint.owner.type_id for 6001) category mapper, this time with the correct CAT-2 semantics:

  user_actor -> User (1)
  -@actor.type:* -> Unknown (0) <- negation matches missing
  @actor.type:* -> Other (99) <- matches any present value

The Other/99 catch-all carries the literal actor.type value forward via the fallback's `sources: ocsf.actor.user.type: [actor.type]`, so api_actor / admin_api_key_actor / etc. remain queryable as the raw string on ocsf.actor.user.type even though they don't map to a standardized OCSF user.type_id enum value.

Service (4) was tried first but the validator (loading the local OCSF 1.7.0-dev schema) rejects type_id=4 on actor.user.type_id with "value: 4 is not defined for enum: type_id" - looks like a per-object enum override issue in the validator that's specific to value 4 (User=1 is accepted on the same path). Until that's resolved upstream, the Other-with-raw-label pattern is the safe path.

Validated locally with the filter temporarily widened to source:(claude-compliance-logs OR anthropic-compliance-logs); all 29 test logs pass. Filter reverted to source:claude-compliance-logs before push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… by status

Two refinements based on what the OCSF v1.5 enums actually allow:

1. actor.user.type_id / src_endpoint.owner.type_id: add Admin (2) for admin_api_key_actor. Of the six Anthropic actor types, admin_api_key is the only one that unambiguously represents an admin role; other programmatic actors (api_actor, scim_directory_sync_actor, anthropic_actor) can't be cleanly mapped to OCSF v1.5's enum (Service=4 is not defined in v1.5 - only added in 1.6/1.7) and continue to land in Other (99) with the raw actor.type string preserved on ocsf.actor.user.type.

   Final mapping:

     user_actor -> User (1)
     admin_api_key_actor -> Admin (2)
     missing actor.type -> Unknown (0) (via -@actor.type:* negation)
     everything else -> Other (99) (raw actor.type carried in ocsf.actor.user.type via fallback sources)

2. 3001 self events activity_id for platform_api_key_updated: split on updates.current_value so we report the right OCSF verb instead of blanket Disable:

     updates.current_value:active -> Enable (2)
     updates.current_value:archived -> Disable (5)
     anything else -> Other (99)

   Previously the entire event_type was hardcoded to Disable, which only matched the status-archived sample we had. Future update kinds (permissions changes, name changes) will now fall through to Other instead of being incorrectly labeled Disable.

Local OCSF validator: all 29 logs valid against the OCSF v1.5 schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
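
Both final mappings in this commit message can be sketched in Python. The function names are hypothetical, and the pipeline implements these as category mappers with fallbacks rather than code; the sketch only shows the Unknown-vs-Other distinction and the split on updates.current_value.

```python
from typing import Optional, Tuple


def actor_user_type(actor_type: Optional[str]) -> Tuple[int, Optional[str]]:
    """Return (type_id, raw type label) for ocsf.actor.user."""
    if not actor_type:
        return 0, None  # Unknown: actor.type missing or empty
    if actor_type == "user_actor":
        return 1, None  # User
    if actor_type == "admin_api_key_actor":
        return 2, None  # Admin
    # Other (99): value present but unmapped, raw string preserved as the label.
    return 99, actor_type


def api_key_update_activity_id(current_value: Optional[str]) -> int:
    """Map updates.current_value on platform_api_key_updated to activity_id."""
    # Enable (2) / Disable (5); any other update kind is Other (99).
    return {"active": 2, "archived": 5}.get(current_value, 99)
```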

cepolation-datadog · 2026-05-27T01:08:02Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 28f380bccd


Both ocsf.resources (3005 User Access Management) and ocsf.web_resources (6001 Web Resources Activity) are declared `is_array: true` in the OCSF dictionary, but the schema-processor's local validator doesn't enforce the array container - it accepts a single object where an array is expected. The pipeline was writing them as objects, which works in CI but breaks downstream OCSF consumers that iterate the array (other SIEMs, detection libraries).

Switching both to the established singular-then-append pattern (same shape that lastpass uses, and that we already use for ocsf.privileges):

3005:
- attribute-remapper: resource_id -> ocsf.resource.uid
- attribute-remapper: resource_type -> ocsf.resource.type
- array-processor: ocsf.resource -> ocsf.resources (append)
- schema-processor self-maps ocsf.resources

6001:
- attribute-remapper: multi-source IDs -> ocsf.web_resource.uid
- attribute-remapper: filename, skill_name -> ocsf.web_resource.name
- array-processor: ocsf.web_resource -> ocsf.web_resources (append)
- schema-processor self-maps ocsf.web_resources

Codex review surfaced this. Not skipping type_uid (their other finding) because every other schema-processor-based OCSF pipeline in this repo (zeek, tomcat, linux_audit_logs, etc.) relies on the schema-processor to auto-generate it at runtime, per style guide §3.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
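
The singular-then-append pattern for 3005 can be sketched in Python as below. The helper names are hypothetical, and whether the real array-processor keeps or removes the singular object afterwards is not stated in the commit message, so this sketch simply leaves it in place.

```python
def append_to_array(ocsf: dict, singular: str, plural: str) -> dict:
    """Append the singular object to the plural array, creating it if needed."""
    item = ocsf.get(singular)
    if item:
        ocsf.setdefault(plural, []).append(item)
    return ocsf


def build_3005_resources(event: dict) -> dict:
    """Map resource_id/resource_type to a singular object, then append it."""
    resource = {}
    if event.get("resource_id") is not None:
        resource["uid"] = event["resource_id"]
    if event.get("resource_type") is not None:
        resource["type"] = event["resource_type"]
    ocsf = {"resource": resource} if resource else {}
    return append_to_array(ocsf, "resource", "resources")
```

Downstream consumers can then iterate `ocsf["resources"]` as a list even when there is only one element.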

dd-octo-sts · 2026-05-27T01:21:33Z

Validation Report

All 21 validations passed.


Validation	Description	Status
`agent-reqs`	Verify check versions match the Agent requirements file	✅
`ci`	Validate CI configuration and Codecov settings	✅
`codeowners`	Validate every integration has a CODEOWNERS entry	✅
`config`	Validate default configuration files against spec.yaml	✅
`dep`	Verify dependency pins are consistent and Agent-compatible	✅
`http`	Validate integrations use the HTTP wrapper correctly	✅
`imports`	Validate check imports do not use deprecated modules	✅
`integration-style`	Validate check code style conventions	✅
`jmx-metrics`	Validate JMX metrics definition files and config	✅
`labeler`	Validate PR labeler config matches integration directories	✅
`legacy-signature`	Validate no integration uses the legacy Agent check signature	✅
`license-headers`	Validate Python files have proper license headers	✅
`licenses`	Validate third-party license attribution list	✅
`metadata`	Validate metadata.csv metric definitions	✅
`models`	Validate configuration data models match spec.yaml	✅
`openmetrics`	Validate OpenMetrics integrations disable the metric limit	✅
`package`	Validate Python package metadata and naming	✅
`qa-label`	Validate the pull request declares whether it needs QA for the next Agent release	✅
`readmes`	Validate README files have required sections	✅
`saved-views`	Validate saved view JSON file structure and fields	✅
`version`	Validate version consistency between package and changelog	✅


temporal-github-worker-1 Bot added ecosystems/review-requested product/review-requested labels May 26, 2026

dd-octo-sts Bot added the integration/anthropic_compliance_logs label May 26, 2026


cepolation-datadog added the qa/skip-qa Automatically skip this PR for the next QA label May 26, 2026

cepolation-datadog and others added 19 commits May 26, 2026 14:18

Revert "Regenerate tests.yaml expected output after 3001 split"

86edcd0

This reverts commit 226b6df.

chatgpt-codex-connector Bot reviewed May 27, 2026


cepolation-datadog wants to merge 21 commits into master from andy.anske/anthropic-compliance-logs-ocsf
