AIP-76: Propagate partition_date to consumers of partitioned assets#67285
AIP-76: Propagate partition_date to consumers of partitioned assets#67285nathadfield wants to merge 1 commit into
partition_date to consumers of partitioned assets#67285Conversation
27b7c87 to
bb67204
Compare
7bb59e8 to
47fcfab
Compare
|
cc @Lee-W |
244b329 to
cb36160
Compare
|
so |
4338ce1 to
28c6aa9
Compare
Good point, @Lee-W. |
28c6aa9 to
27444f6
Compare
| ``partition_key``. ``RollupMapper`` also leaves ``partition_date`` | ||
| ``None`` — a rollup collapses many upstream partitions, each with its |
There was a problem hiding this comment.
I think RollupMapper should have partition_date in some cases. But I have another open PR handling it
| from airflow.partition_mappers.identity import IdentityMapper | ||
| from airflow.partition_mappers.temporal import _BaseTemporalMapper | ||
|
|
||
| if is_rollup(mapper): |
There was a problem hiding this comment.
I'm now working on another PR to avoid this type switch. #68266 adds a polymorphic PartitionMapper.to_partition_date(key) where composites delegate (RollupMapper → upstream_mapper, FanOut, Chain) and temporal mappers return the anchor. Adopting it here would support rollup/fan-out/chain, and remove the isinstance/is_rollup branching.
Keep from this PR (not in #68266)
- IdentityMapper passthrough (its key can't yield a date, so the threaded source date is needed)
- theexposure layer —
Context["partition_date"] - execution-API field + Cadwyn versioning
DAGRunResponse, docs. fix(scheduler): populate partition_date for temporal asset partitions #68266 only stampsDagRun.partition_datein the scheduler.
27444f6 to
f4808a4
Compare
|
Thanks @Lee-W , that's helpful context. Agreed on the type switch: I'll leave the From your list I'll keep on this side: the Same on |
f4808a4 to
1c1d9c9
Compare
|
Thanks for bearing with all the sudden change 🙏 #68266 was just merged. i should be able to take a look again tomorrow. Thanks! |
4f63855 to
0c5aade
Compare
|
@Lee-W rebased this onto main now that #68266 is in, and reworked it so it builds on your resolver instead of re-implementing the same thing. Quick rundown for when you get a chance to look again. The scheduler now leans entirely on your What's left on our side is just the bit #68266 can't do: the IdentityMapper case. Its key can't be turned back into a date, so we carry the producer's date onto the APDR and the scheduler falls back to that when the resolver doesn't return one. Plus the exposure layer that was always the point of this PR: One thing I changed in your code that I want to call out: A couple of other things while I was in here. Last one is more of a question. Thanks again for the steer on all this. |
|
The problem
|
|
@Lee-W one idea: rather than returning a tuple, we could pass the carried date into the resolver and keep the return type a plain |
|
Yep, sounds like a good idea! |
Consumers of partitioned assets receive partition_key (str) but
partition_date (datetime) is None on the consumer DagRun, so templates
have to parse the key string. Propagate the datetime form alongside
the string so consumers can use the canonical filter idiom
`{{ partition_date | ds }}` and friends.
The consumer's partition_date is computed alongside its target
partition_key at APDR creation (in assets/manager.py:_queue_partitioned_dags),
threaded in from the producer DagRun's partition_date via register_asset_change
rather than stored on AssetEvent, and persisted on AssetPartitionDagRun. The
scheduler copies apdr.partition_date into the consumer DagRun, so the date
stays consistent with the mapper that produced the key. IdentityMapper passes
the source date through; the StartOf*Mapper family normalizes via
to_downstream_normalized; other mappers leave partition_date None and consumers
fall back to partition_key.
closes: apache#67239
0c5aade to
26ef605
Compare
Great! I've just added it. |
What this does
Makes
partition_datea first-class template variable on the consumer side of AIP-76 partitioned assets, so authors can write:instead of slicing strings out of
partition_key.How
partition_dateis resolved by the scheduler. Temporal and composite mappers derive it from the partition key viaPartitionMapper.to_partition_date(the resolver landed in fix(scheduler): populate partition_date for temporal asset partitions #68266).IdentityMapperis the one case the scheduler can't resolve, since its key carries no datetime, so this PR carries the producer DagRun'spartition_dateonto theAssetPartitionDagRun(threaded in viaregister_asset_change, not stored onAssetEvent) and passes that carry into the scheduler's resolver, which returns it when no temporal mapper contributes a date.partition_dateis populated whenever the consumer's mapper can resolve the key to a datetime: directly for theStartOf*Mapperfamily, by delegation forRollupMapper/ChainMapper/FanOutMapperwhose effective child is temporal, and by carry-through forIdentityMapper. Mappers whose key carries no temporal meaning (ProductMapper,AllowedKeyMapper, custom) leave itNone, and those consumers fall back topartition_key.partition_dateis output-only on trigger payloads, so a manual trigger cannot create an inconsistent(partition_key, partition_date)pair.Context["partition_date"](coerced to tz-aware), the execution-APIDagRun(Cadwyn-versioned to strip the field for older Task SDK clients), the core-APIDAGRunResponse, OTel span attributes, and thestandardprovider'sPythonVirtualenvOperator/ExternalPythonOperatorserializable-context allow-list.Relationship to #68266
#68266 added the polymorphic
PartitionMapper.to_partition_dateand the scheduler's_resolve_partition_date, which now own temporal and composite resolution. This PR builds on that and adds the parts it doesn't cover: the IdentityMapper source-date carry and the exposure layer above._resolve_partition_dategained acarried_partition_dateparameter (the APDR's carried date) and keeps a plaindatetime | Nonereturn: the carry is returned only when no temporal mapper contributes, and is never substituted for a date the resolver deliberately suppressed (conflicting temporal mappers, or a mapper that raised).An earlier revision of this PR also stored the source
partition_dateonAssetEvent; that column was dropped, since the consumer DagRun is self-contained once created and the source date is threaded at APDR creation instead.Known limitations / follow-ups
ProductMapper,AllowedKeyMapper, custom non-temporal mappers),partition_dateisNoneand consumers fall back topartition_key.target_partition_keyto differentpartition_datevalues (two identity-mapped upstreams carrying different producer dates, or temporal mappers configured with different timezones), the carried date is suppressed toNonerather than picked by arrival order, so the consumer DagRun is not stamped with an unstable value.closes: #67239
Was generative AI tooling used to co-author this PR?
Generated-by: Claude Code (Opus 4.8)