[anthropic_compliance_logs] Add OCSF v1.5 normalization #23841

Draft
cepolation-datadog wants to merge 21 commits into master from andy.anske/anthropic-compliance-logs-ocsf

@cepolation-datadog
Contributor

Summary

Maps Anthropic Compliance API audit events (Claude Enterprise) into OCSF v1.5 so analysts can correlate Claude activity with other security signals in Datadog Cloud SIEM. Adds 5 OCSF sub-pipelines + a pre-transformations pipeline.

| OCSF Class | Events covered |
| --- | --- |
| Account Change [3001] | org_user_invite_sent/accepted, org_user_deleted, platform_api_key_created/updated, claude_user_settings_updated |
| Authentication [3002] | sso_login_initiated, sso_login_succeeded |
| User Access Management [3005] | role_assignment_granted/revoked |
| Web Resources Activity [6001] | All claude_* (chat, project, file, document, artifact, skill), org_users_listed, platform_usage_report_* |
| API Activity [6003] | compliance_api_accessed |
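
The class routing listed above can be read as an ordered set of filters. A minimal Python sketch of that reading, assuming the explicit event names are checked before the claude_* prefix (so claude_user_settings_updated lands in 3001 rather than 6001). The function name and the evaluation order are illustrative assumptions, not the pipeline's actual filter syntax:

```python
from typing import Optional

# Explicit event names per OCSF class, taken from the PR description.
ACCOUNT_CHANGE = {
    "org_user_invite_sent", "org_user_invite_accepted", "org_user_deleted",
    "platform_api_key_created", "platform_api_key_updated",
    "claude_user_settings_updated",
}
AUTHENTICATION = {"sso_login_initiated", "sso_login_succeeded"}
USER_ACCESS_MANAGEMENT = {"role_assignment_granted", "role_assignment_revoked"}
API_ACTIVITY = {"compliance_api_accessed"}


def route_class_uid(event_type: str) -> Optional[int]:
    """Return the OCSF class_uid for an event type, or None if no filter matches."""
    if event_type in ACCOUNT_CHANGE:
        return 3001
    if event_type in AUTHENTICATION:
        return 3002
    if event_type in USER_ACCESS_MANAGEMENT:
        return 3005
    if event_type in API_ACTIVITY:
        return 6003
    # Web Resources Activity: remaining claude_* events plus the two listed families.
    if (event_type.startswith("claude_")
            or event_type == "org_users_listed"
            or event_type.startswith("platform_usage_report_")):
        return 6001
    # No filter matched: the event would pass through without OCSF transformation.
    return None
```

Event types that return None here correspond to the untransformed fall-through case described under "Out of scope".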

Pipeline structure follows the OCSF pipeline style guide. 29 representative samples added to the tests file, one per (event_type, actor_type) shape observed in a 30-day pull from a real Claude Enterprise tenant.

Notable decisions

  • preserveSource: false → true on 5 existing standard attribute-remappers so OCSF mappers can read the original actor.* fields per style guide §7.1 ("Don't use Datadog standard attributes as sources"). Additive — usr.*/network.*/http.* keep working as before.
  • Local validator quirk: the OCSF validator CLI substitutes LOGS_SOURCE in tests.yaml with the integration id (anthropic-compliance-logs), not the installation_sources entry (claude-compliance-logs). With the production-correct filter source:claude-compliance-logs, local bzl run //...:ocsf-validator will show result.custom as raw JSON (no transform). The 29/29 valid result was confirmed against a temporarily-widened filter — pipeline is correct, just won't exercise via the local CLI today. Worth a follow-up to the validator owners.
  • ocsf.metadata.version explicitly set via string-builder + self-map even though §3.3 of the style guide lists it as auto-generated. The validator errored with schema_version_not_found without it. Style-guide-vs-validator discrepancy; deferred to validator behavior.
  • Event names use _ not .: the Anthropic public docs and earlier internal RFC mixed api_key.created with api_key_created — real Compliance API names use underscores (confirmed via direct sandbox curl). Used the real names throughout.

OCSF validation

bzl run //domains/event-platform/libs/ocsf-validator-cli:ocsf-validator -- \
  --input integrations-core/anthropic_compliance_logs/assets/logs/anthropic-compliance-logs_tests.yaml \
  --pipeline integrations-core/anthropic_compliance_logs/assets/logs/anthropic-compliance-logs.yaml

Result: all 29 logs valid, 0 errors, 0 warnings (run against a widened filter — see "Local validator quirk" above).

Out of scope

  • ~150 event types ship from Anthropic; only ~27 surfaced organically in 30 days of tenant traffic. The remaining ~120 will fall through all 5 sub-pipeline filters with no OCSF transformation applied. A Base Event [0] catch-all is a reasonable follow-up once we see which other types are common in production.
  • web_resources.type not populated for 6001 — OCSF web_resource has no type_id, so the standard schema-category-mapper pattern (which requires both name+id targets) doesn't apply. Could be added later via grok-parser on event_code if needed.

Test plan

  • CI passes validate-logs and validate-pipelines
  • Round-trip the pipeline through dd.datad0g.com staging via scripts/ocsf/upsert_ocsf_yaml_pipeline.py and confirm backend accepts it
  • Verify in staging that real Claude Compliance API events (once the crawler ships next week) get OCSF-normalized as expected
  • Confirm with crawler team that the production source tag is claude-compliance-logs (matches installation_sources)

Related

🤖 Generated with Claude Code

Maps Anthropic Compliance API audit events to OCSF v1.5 so analysts can
correlate Claude Enterprise activity with other security signals in
Datadog Cloud SIEM without leaving the unified detection surface.

Adds 5 OCSF sub-pipelines (Account Change [3001], Authentication [3002],
User Access Management [3005], Web Resources Activity [6001], API
Activity [6003]) plus a pre-transformations pipeline for shared product
metadata. Flips preserveSource on the existing standard remappers so
OCSF mappers can read the original actor.* fields per style guide §7.1.

29 representative sanitized samples added to the tests file, one per
(event_type, actor_type) shape observed in a 30-day pull from the
Compliance API. Local OCSF validator: all 29 logs valid, 0 errors,
0 warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cepolation-datadog cepolation-datadog added the qa/skip-qa Automatically skip this PR for the next QA label May 26, 2026
cepolation-datadog and others added 19 commits May 26, 2026 14:18
- Add type: integer to numeric OCSF facets (activity_id, category_uid,
  class_uid, type_uid, severity_id, status_id, auth_protocol_id) and
  type: boolean to is_mfa per CI's facet-conflict suggestions
- Rename "Type UID" → "Type ID" and "Is MFA" → "Multi Factor
  Authentication" to match cross-integration facet conventions
- Fix schema-remapper at 6001 index 12: align name source order with
  the actual sources list (chat, file, project_document, artifact,
  skill, project, id)
- Regenerate tests.yaml in CI's expected format (pretty-JSON sample,
  message field, doubled tags, timestamp)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous attempt produced tests.yaml with alphabetical JSON key ordering
in the sample/message fields. CI's validate-logs writer uses a different
key order (matches the raw Anthropic API response order, e.g. for
user_actor: email_address, user_id, ip_address, type, user_agent).

Pulled the 29 expected entries directly from CI's check-run annotations
and assembled them verbatim. Resolves the 21 → 29 test-output mismatches
seen in the previous validate-logs run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each sub-pipeline had a two-step pattern (attribute-remapper from
created_at to ocsf.time, then grok-parser parsing ocsf.time as a date).
This change simplifies it to a single grok-parser that reads created_at
directly and writes the parsed epoch into ocsf.time. The net result is
identical; each sub-pipeline is just 10 lines shorter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both are base-event fields present on every OCSF class, so per style
guide §2 they belong in the pre-transformations pipeline rather than
duplicated across each sub-pipeline.

- Pre-transformations: grok-parser writes parsed epoch to ocsf.time,
  attribute-remapper copies created_at to ocsf.metadata.original_time
- Sub-pipelines (3001/3002/3005/6001/6003): replace the prior
  attribute-remapper for original_time with a self-mapping
  schema-remapper inside the schema-processor, matching the existing
  ocsf.time self-map pattern

Net result is identical, with ~50 fewer lines and no duplication.
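
For illustration, the pre-transformation step can be sketched in Python. This assumes created_at is an ISO 8601 string and that ocsf.time holds epoch milliseconds; neither is stated in this commit, and the helper names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(created_at: str) -> int:
    """Parse an ISO 8601 timestamp (assumed format) into epoch milliseconds."""
    # Older Python versions do not accept a trailing "Z", so normalize it.
    parsed = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def pre_transform(event: dict) -> dict:
    """Mimic the two pre-transformation writes: parsed time and original_time."""
    ocsf = event.setdefault("ocsf", {})
    ocsf["time"] = to_epoch_ms(event["created_at"])
    ocsf.setdefault("metadata", {})["original_time"] = event["created_at"]
    return event
```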

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Probed the Compliance API and confirmed two additional auth-related
event types exist beyond what the original 30-day pull surfaced:
sso_login_failed and user_logged_out.

- Widen sub-pipeline filter to include both
- activity_id: keep Logon (1) for all sso_login_* states; add Logoff
  (2) branch for user_logged_out
- status_id: keep Failure (2) for sso_login_failed; treat
  user_logged_out as Success (1) since the verb itself succeeded

MFA challenge events do not exist in the API — Anthropic delegates MFA
entirely to the SSO IdP. Aside from SSO, no other login methods (Google,
Apple, magic link) exist for Enterprise tenants; the names in the
public support article are stale.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Restores semantic correctness of ocsf.user — it now reflects the
target of the change, not the actor. Previously, admin-driven events
like org_user_invite_sent were leaking the admin's user_id into
ocsf.user.uid via a fallback chain, conflating the doer with the
target.

Target events (admin acting on someone else):
  - org_user_deleted: user.uid/email from deleted_user_*
  - org_user_invite_sent: user.email_addr from invited_email
    (no uid available — invitee hasn't accepted yet)

Self events (user acting on themselves):
  - org_user_invite_accepted: user.* from actor.*
  - claude_user_settings_updated: user.* from actor.*
  - platform_api_key_created/updated: user.* from actor.*,
    user.credential_uid from api_key_id

Both sub-pipelines apply the same schema-processor (className: Account
Change, classUid: 3001); the skill's NAMING-7 rule allows duplicate
class_uids when disambiguated via the outer pipeline name.

Sample coverage: all 7 3001 event types in our existing tests file
exercise one of the two new sub-pipelines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The split changed ocsf.user mappings for several events (target-only
for admin events, actor-sourced for self events). Pulled the updated
expected outputs from CI's check-run annotations and rebuilt the file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This is the only test whose output actually changed from the 3001 split:
the target-events sub-pipeline no longer falls back to actor.user_id
for ocsf.user.uid, so the invited user's ocsf.user has only email_addr
(invited_email) and no uid. That is correct: the invitee hasn't accepted
yet, so no user_id exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Compliance API exposes 10 authentication-related activity types,
not just the SSO ones we initially handled (confirmed via the public API
docs and live API probe). Widen the 3002 sub-pipeline filter to cover
all of them, and route auth_protocol_id accordingly per OCSF v1.5 enum:

  SAML (5):  sso_login_initiated/succeeded/failed,
             sso_second_factor_magic_link
  OpenID (4): social_login_succeeded (Google/Apple/Microsoft are OIDC)
  Other (99): magic_link_login_initiated/succeeded/failed,
              anonymous_mobile_login_attempted, user_logged_out

activity_id additions:
  Logon (1): all the above except user_logged_out
  Logoff (2): user_logged_out

status_id additions:
  Success (1): *_succeeded, sso_second_factor_magic_link, user_logged_out
  Failure (2): *_failed
  Unknown (0): *_initiated, anonymous_mobile_login_attempted (in-flight,
               terminal outcome not yet known)

org_magic_link_second_factor_toggled is intentionally excluded - it's
an org config change, not an auth event, so it belongs in Application
Activity [6002] (not added yet) rather than 3002.
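
The activity_id and status_id rules above can be sketched as a small Python function. This is an illustration of the mapping logic only, not the pipeline's category-mapper YAML; the function name is hypothetical:

```python
from typing import Tuple


def classify_auth_event(event_code: str) -> Tuple[int, int]:
    """Return (activity_id, status_id) for a 3002 Authentication event."""
    # activity_id: Logoff (2) only for user_logged_out, Logon (1) otherwise.
    activity_id = 2 if event_code == "user_logged_out" else 1

    # status_id: Success (1), Failure (2), Unknown (0) for in-flight events.
    if (event_code.endswith("_succeeded")
            or event_code in {"sso_second_factor_magic_link", "user_logged_out"}):
        status_id = 1
    elif event_code.endswith("_failed"):
        status_id = 2
    else:
        # *_initiated and anonymous_mobile_login_attempted: outcome not yet known.
        status_id = 0
    return activity_id, status_id
```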

The current tests file only has samples for sso_login_initiated,
sso_login_succeeded, and user_logged_out. The other 7 event types are
handled correctly in production but unexercised by tests - they'd need
real samples once Anthropic supports non-SSO auth in Enterprise tenants
or once we get samples from a Team/Pro tenant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The schema-remapper writing to ocsf.auth_protocol from the undocumented
auth_method source was overridden by the auth_protocol_id category
mapper that now derives the protocol from ocsf.metadata.event_code.
Removing the redundant mapper.

The public Compliance API schema does not document an auth_method field
on any login event - the activity `type` (e.g. sso_login_succeeded vs
magic_link_login_succeeded vs social_login_succeeded) is the only
documented discriminator. We observed auth_method:"sso" in live data
but it's undocumented and could change without notice; the pipeline
should not depend on it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Compliance API docs (https://platform.claude.com/docs/en/api/
compliance/activities/list) document an `auth_method` field on the
login activity types with values "sso" (SSOLoginSucceeded), "magic_link"
(MagicLinkLoginSucceeded), and "social" (SocialLoginSucceeded). Route
ocsf.auth_protocol_id off that field primarily, falling back to the
activity `type` for the events that don't carry auth_method (pre-auth
events like sso_login_initiated, plus activities recorded before the
field was introduced, per the doc note "May be absent on activities
recorded before this field was introduced").

Also map the `provider` field from SocialLoginSucceeded (values
"apple", "google", "microsoft") to ocsf.actor.idp.name.

Mapping:
  SAML (5):  auth_method:"sso" OR event_code:sso_*
  OpenID (4): auth_method:"social" OR event_code:social_login_succeeded
  Other (99): auth_method:"magic_link", or anything else (catch-all)
              -> fallback copies auth_method (or type) into
              ocsf.auth_protocol
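
A rough Python sketch of this routing, assuming the branches are evaluated in the order listed (SAML, then OpenID, then the catch-all). The function name is hypothetical, and the real implementation is a category mapper with a fallback, not Python:

```python
from typing import Optional, Tuple


def auth_protocol(auth_method: Optional[str], event_code: str) -> Tuple[int, str]:
    """Return (auth_protocol_id, auth_protocol) per the mapping above."""
    if auth_method == "sso" or event_code.startswith("sso_"):
        return 5, "SAML"
    if auth_method == "social" or event_code == "social_login_succeeded":
        return 4, "OpenID"
    # Other (99): the fallback carries auth_method, or the activity type
    # when auth_method is absent, into ocsf.auth_protocol.
    return 99, auth_method or event_code
```

For example, a pre-auth sso_login_initiated event without auth_method still resolves to SAML through the event_code branch.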

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
For pre-auth events (sso_login_initiated, magic_link_login_initiated)
the actor is unauthenticated_user_actor and only carries
unauthenticated_email_address - no verified user_id exists yet. The
previous mapping fell back from actor.user_id to
unauthenticated_email_address for both ocsf.actor.user.uid and
ocsf.user.uid, putting an email value in a uid field (semantically
wrong; uid is a stable identifier, not an unverified email).

After this change, pre-auth events leave user.uid and actor.user.uid
null and rely on user.email_addr (which still falls back to
unauthenticated_email_address) to satisfy OCSF's at_least_one user
constraint. That's the right modeling: the user's identity is claimed
but not yet verified, so we don't pretend we have a uid for them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
organization_id is a ULID identifier (e.g. org_01...) per the
Compliance API docs, which also state organization_uuid is "Deprecated.
Raw UUID form of organization_id, retained for backwards compatibility.
Prefer organization_id."

Previously I had:
  organization_id  -> ocsf.*.org.name   (wrong - ULID is not a name)
  organization_uuid -> ocsf.*.org.uid   (deprecated form going to the
                                         preferred target)

Now:
  organization_id, organization_uuid -> ocsf.*.org.uid  (multi-source,
                                                         organization_id
                                                         preferred via
                                                         overrideOnConflict
                                                         false)

The org.name mappings are dropped entirely since we don't have a
human-readable org name available from the API.

Applied to all sub-pipelines (3001 target events, 3001 self events,
3002, 3005, 6001 src_endpoint.owner.org).
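
The multi-source preference amounts to "first non-empty source wins". A minimal Python sketch of that behavior (the function name is hypothetical; the pipeline expresses this through the remapper's source order and overrideOnConflict setting):

```python
from typing import Optional


def resolve_org_uid(event: dict) -> Optional[str]:
    """Prefer organization_id; fall back to the deprecated organization_uuid."""
    return event.get("organization_id") or event.get("organization_uuid")
```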

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…al_uid

Three semantic cleanups:

1. Remove `id -> ocsf.session.uid` from 3002 Authentication. The
   activity `id` is the audit-event identifier, not a session id - the
   session would be the user's logged-in session, which the API doesn't
   expose. Mapping the wrong field there was misleading.

2. Remove `api_key_id -> ocsf.user.credential_uid` from 3001 self
   events. OCSF deprecates `credential_uid` in 1.6.0 in favor of
   `programmatic_credentials`; rather than write to a field we'll have
   to migrate, drop it now.

3. Add `ocsf.actor.user.type_id` (and `ocsf.src_endpoint.owner.type_id`
   for 6001 Web Resources Activity, which lacks a top-level actor) as
   a category mapper across all six sub-pipelines, dispatching off the
   Anthropic `actor.type` discriminator:

     user_actor              -> User    (1)
     api_actor / admin_api_key_actor -> Service (4)
     unauthenticated_user_actor -> Unknown (0)
     anything else           -> Other   (99) with fallback

   This restores the missing "this principal is a service account, not
   a human" signal for events performed by API keys, which is critical
   for detection rules that want to differentiate human-driven vs
   programmatic activity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per OCSF semantics:
  Unknown (0) = source field missing or empty
  Other (99)  = source has a value but it doesn't map to a known enum

The category mapper now:
- Maps user_actor -> User (1)
- Maps api/admin_api_key/scim_directory_sync/anthropic actors -> Service (4)
  (added scim_directory_sync_actor and anthropic_actor explicitly; both
  are programmatic principals, fit Service per OCSF user.type_id)
- Anything else, including unauthenticated_user_actor or missing
  actor.type, falls through to Unknown (0) via the catch-all + fallback

Previously the catch-all was Other/99 with fallback also Other/99, which
treated missing actor.type as "vendor reported an unknown value". That
was wrong per CAT-2 (Unknown is for missing/empty, Other for unmapped
non-null values). Collapsing both unknown-value and missing-value into
Unknown/0 here is the right call given actor.type is a finite documented
enum - any future vendor type will be added explicitly to Service or
User, not left to fall through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two validator-driven fixes after running the OCSF validator locally:

1. Drop the actor.user.type_id / src_endpoint.owner.type_id category
   mappers. The OCSF validator (running against the local 1.7.0-dev
   schema) accepts type_id=1 (User) on these paths but rejects
   type_id=4 (Service) with "value: 4 is not defined for enum: type_id".
   It looks like per-object enum overrides aren't applying consistently
   for the Service entry. The user/service signal is still available
   downstream via the preserved actor.type field.

2. OCSF user.at_least_one constraint requires account, name, or uid -
   not email_addr. Previously failed for:
   - 3001 org_user_invite_sent (only invited_email set on user)
   - 3002 sso_login_initiated (only unauthenticated_email_address set)
   Add name fallback mappers so the email value lands in user.name and
   actor.user.name when no uid is available. This satisfies the
   constraint without forging a uid.
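
The name fallback in item 2 can be sketched in Python as follows. The helper names are hypothetical, and the constraint check only mirrors the account/name/uid rule described above:

```python
from typing import Optional


def build_user(uid: Optional[str], email: Optional[str]) -> dict:
    """Build an ocsf.user object; use the email as name only when uid is absent."""
    user = {}
    if uid:
        user["uid"] = uid
    if email:
        user["email_addr"] = email
        if not uid:
            # Fallback: satisfy at_least_one without forging a uid.
            user["name"] = email
    return user


def satisfies_at_least_one(user: dict) -> bool:
    """Check the user constraint: account, name, or uid must be present."""
    return any(user.get(key) for key in ("account", "name", "uid"))
```

With this shape, an org_user_invite_sent event that only carries invited_email yields a user with email_addr and name, which passes the constraint.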

Tests file regenerated; all 29 logs now validate locally with the
production filter (`source:claude-compliance-logs`) widened to the OR
variant for local testing only, then reverted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reintroduce the actor.user.type_id (and src_endpoint.owner.type_id for
6001) category mapper, this time with the correct CAT-2 semantics:

  user_actor              -> User    (1)
  -@actor.type:*          -> Unknown (0)   <- negation matches missing
  @actor.type:*           -> Other   (99)  <- matches any present value

The Other/99 catch-all carries the literal actor.type value forward via
the fallback's `sources: ocsf.actor.user.type: [actor.type]`, so
api_actor / admin_api_key_actor / etc. remain queryable as the raw
string on ocsf.actor.user.type even though they don't map to a
standardized OCSF user.type_id enum value.

Service (4) was tried first, but the validator (loading the local OCSF
1.7.0-dev schema) rejects type_id=4 on actor.user.type_id with
"value: 4 is not defined for enum: type_id". This looks like a per-object
enum override issue in the validator that's specific to value 4
(User=1 is accepted on the same path). Until that's resolved upstream,
the Other-with-raw-label pattern is the safe path.

Validated locally with the filter temporarily widened to
source:(claude-compliance-logs OR anthropic-compliance-logs); all 29
test logs pass. Filter reverted to source:claude-compliance-logs before
push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… by status

Two refinements based on what the OCSF v1.5 enums actually allow:

1. actor.user.type_id / src_endpoint.owner.type_id: add Admin (2) for
   admin_api_key_actor. Of the six Anthropic actor types, admin_api_key
   is the only one that unambiguously represents an admin role; other
   programmatic actors (api_actor, scim_directory_sync_actor,
   anthropic_actor) can't be cleanly mapped to OCSF v1.5's enum
   (Service=4 is not defined in v1.5 - only added in 1.6/1.7) and
   continue to land in Other (99) with the raw actor.type string
   preserved on ocsf.actor.user.type.

   Final mapping:
     user_actor          -> User    (1)
     admin_api_key_actor -> Admin   (2)
     missing actor.type  -> Unknown (0)   (via -@actor.type:* negation)
     everything else     -> Other   (99)  (raw actor.type carried in
                                           ocsf.actor.user.type via
                                           fallback sources)

2. 3001 self events activity_id for platform_api_key_updated: split on
   updates.current_value so we report the right OCSF verb instead of
   blanket Disable:

     updates.current_value:active   -> Enable  (2)
     updates.current_value:archived -> Disable (5)
     anything else                  -> Other   (99)

   Previously the entire event_type was hardcoded to Disable, which
   only matched the status-archived sample we had. Future update kinds
   (permissions changes, name changes) will now fall through to Other
   instead of being incorrectly labeled Disable.
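
Both final mappings can be summarized in a short Python sketch. This is illustrative only; the function names are hypothetical and the pipeline implements these as category mappers:

```python
from typing import Optional, Tuple


def map_actor_type(actor_type: Optional[str]) -> Tuple[int, str]:
    """Return (type_id, type) for ocsf.actor.user per the final mapping."""
    if not actor_type:
        return 0, "Unknown"          # missing actor.type
    if actor_type == "user_actor":
        return 1, "User"
    if actor_type == "admin_api_key_actor":
        return 2, "Admin"
    # Other (99): keep the raw actor.type string as the label.
    return 99, actor_type


def api_key_update_activity_id(current_value: Optional[str]) -> int:
    """Return activity_id for platform_api_key_updated from updates.current_value."""
    return {"active": 2, "archived": 5}.get(current_value, 99)
```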

Local OCSF validator: all 29 logs valid against the OCSF v1.5 schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cepolation-datadog
Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 28f380bccd

Comment thread anthropic_compliance_logs/assets/logs/anthropic-compliance-logs.yaml Outdated
Comment thread anthropic_compliance_logs/assets/logs/anthropic-compliance-logs.yaml Outdated
Both ocsf.resources (3005 User Access Management) and ocsf.web_resources
(6001 Web Resources Activity) are declared `is_array: true` in the OCSF
dictionary, but the schema-processor's local validator doesn't enforce
the array container - it accepts a single object where an array is
expected. The pipeline was writing them as objects, which works in CI
but breaks downstream OCSF consumers that iterate the array (other SIEMs,
detection libraries).

Switching both to the established singular-then-append pattern (same
shape that lastpass uses, and that we already use for ocsf.privileges):

3005:
  - attribute-remapper: resource_id   -> ocsf.resource.uid
  - attribute-remapper: resource_type -> ocsf.resource.type
  - array-processor:    ocsf.resource -> ocsf.resources (append)
  - schema-processor self-maps ocsf.resources

6001:
  - attribute-remapper: multi-source IDs -> ocsf.web_resource.uid
  - attribute-remapper: filename, skill_name -> ocsf.web_resource.name
  - array-processor:    ocsf.web_resource  -> ocsf.web_resources (append)
  - schema-processor self-maps ocsf.web_resources
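
As a rough illustration of the singular-then-append pattern for 3005, here is a Python sketch. The function name is hypothetical and the intermediate ocsf.resource attribute is folded into a local variable:

```python
def append_resource(event: dict) -> dict:
    """Build a single resource object, then append it to the ocsf.resources array."""
    ocsf = event.setdefault("ocsf", {})
    resource = {}
    if "resource_id" in event:
        resource["uid"] = event["resource_id"]
    if "resource_type" in event:
        resource["type"] = event["resource_type"]
    if resource:
        # Always write an array, even for one element, so consumers can iterate it.
        ocsf.setdefault("resources", []).append(resource)
    return event
```

The 6001 case follows the same shape with ocsf.web_resource and ocsf.web_resources.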

Codex review surfaced this. Leaving type_uid unchanged (their other
finding) because every other schema-processor-based OCSF pipeline in
this repo (zeek, tomcat, linux_audit_logs, etc.) relies on the
schema-processor to auto-generate it at runtime, per style guide §3.3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dd-octo-sts
Contributor

dd-octo-sts Bot commented May 27, 2026

Validation Report

All 21 validations passed.

| Validation | Description | Status |
| --- | --- | --- |
| agent-reqs | Verify check versions match the Agent requirements file | Passed |
| ci | Validate CI configuration and Codecov settings | Passed |
| codeowners | Validate every integration has a CODEOWNERS entry | Passed |
| config | Validate default configuration files against spec.yaml | Passed |
| dep | Verify dependency pins are consistent and Agent-compatible | Passed |
| http | Validate integrations use the HTTP wrapper correctly | Passed |
| imports | Validate check imports do not use deprecated modules | Passed |
| integration-style | Validate check code style conventions | Passed |
| jmx-metrics | Validate JMX metrics definition files and config | Passed |
| labeler | Validate PR labeler config matches integration directories | Passed |
| legacy-signature | Validate no integration uses the legacy Agent check signature | Passed |
| license-headers | Validate Python files have proper license headers | Passed |
| licenses | Validate third-party license attribution list | Passed |
| metadata | Validate metadata.csv metric definitions | Passed |
| models | Validate configuration data models match spec.yaml | Passed |
| openmetrics | Validate OpenMetrics integrations disable the metric limit | Passed |
| package | Validate Python package metadata and naming | Passed |
| qa-label | Validate the pull request declares whether it needs QA for the next Agent release | Passed |
| readmes | Validate README files have required sections | Passed |
| saved-views | Validate saved view JSON file structure and fields | Passed |
| version | Validate version consistency between package and changelog | Passed |
