Conversation
|
Thank you for the PR! The changelog has not been updated, so here is a friendly reminder to check if you need to add an entry. |
There was a problem hiding this comment.
Pull request overview
This PR introduces comprehensive typing for MusicBrainz API payloads through TypedDict models, refactors parsing logic into smaller helpers, normalizes API responses at the boundary (converting dashes to underscores and grouping relations), and replaces hand-written test dictionaries with factory-based data generation.
Changes:
- Introduced typed domain models (
Release,Recording,Medium, etc.) inbeetsplug/_utils/musicbrainz.py - Normalized MusicBrainz API responses by converting dashed keys to underscores and grouping relations
- Refactored parsing into focused helper methods (
_parse_artist_credits,_parse_release_group, etc.)
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
test/rsrc/mbpseudo/pseudo_release.json |
Updated test fixture to use underscored keys matching normalized payload format |
test/rsrc/mbpseudo/official_release.json |
Updated test fixture to use underscored keys matching normalized payload format |
test/plugins/utils/test_musicbrainz.py |
Renamed test from test_group_relations to test_normalize_data reflecting the renamed internal method |
test/plugins/test_musicbrainz.py |
Replaced manual test data construction with factory-based approach and updated assertions to match factory defaults |
test/plugins/test_mbpseudo.py |
Updated references from dashed keys to underscored keys |
test/plugins/factories/musicbrainz.py |
Added factory classes for generating typed MusicBrainz test data |
setup.cfg |
Enabled strict mypy checking for musicbrainz-related modules |
pyproject.toml |
Added pytest-factoryboy test dependency |
beetsplug/musicbrainz.py |
Refactored parsing logic into typed helper methods and updated to consume normalized payloads |
beetsplug/mbpseudo.py |
Updated to use typed Release structures and underscored keys |
beetsplug/_utils/musicbrainz.py |
Added comprehensive TypedDict models and renamed/enhanced _group_relations to _normalize_data |
Comments suppressed due to low confidence (2)
beetsplug/_utils/musicbrainz.py:1
- The length checks are redundant. If
map(int, date_str.split('-'))produces fewer than three parts, accessingparts[1]orparts[2]will raise anIndexErrorrather than returningNone. The current implementation doesn't actually prevent the error. Consider usingparts[i] if i < len(parts) else Noneor a more concise approach like padding the list.
"""Helpers for communicating with the MusicBrainz webservice.
beetsplug/musicbrainz.py:1
- Corrected spelling of 'attribute_credits' to 'attribute_values' to match the actual field name in ArtistRelation TypedDict.
# This file is part of beets.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "Recording name": "5d564c8f-97de-3572-94bb-7f40ad661499", | ||
| "Release group name": "156e24ca-8746-3cfc-99ae-0a867c765c67", | ||
| "Release name": "df187855-059b-3514-9d5e-d240de0b4228", | ||
| "Search hint": "abc2db8a-7386-354d-82f4-252c0213cbe4", |
There was a problem hiding this comment.
Corrected 'Search hint' type_id which appears to be a typo (should match the type_id pattern used elsewhere, likely '829662f2-a781-3ec8-8b46-fbcea6196f81' based on the JSON fixtures).
| "Search hint": "abc2db8a-7386-354d-82f4-252c0213cbe4", | |
| "Search hint": "829662f2-a781-3ec8-8b46-fbcea6196f81", |
| if not date_str: | ||
| return None, None, None | ||
|
|
||
| parts = list(map(int, date_str.split("-"))) |
There was a problem hiding this comment.
The function _get_date attempts to convert all date parts to integers without error handling. If the date string contains non-numeric values (e.g., malformed dates from the API), this will raise a ValueError. Consider wrapping the conversion in a try-except block or validating the input first.
| parts = list(map(int, date_str.split("-"))) | |
| raw_parts = date_str.split("-") | |
| parts: list[int | None] = [] | |
| for raw in raw_parts[:3]: | |
| try: | |
| parts.append(int(raw)) | |
| except ValueError: | |
| parts.append(None) |
❌ 9 Tests Failed:
View the top 3 failed test(s) by shortest run time
To view more test analytics, go to the Test Analytics Dashboard |
|
I also did a similar sort of refactoring recently, and this is what I ended up with: https://github.com/prTopi/beets-vocadb/tree/experimental/beetsplug/vocadb/vocadb_api_client Feel free to take inpiration! Here are some suggestions:
|
This way we only ever handle full Recording objects.
db30e6b to
f867887
Compare
| artist = factory.SubFactory(ArtistFactory) | ||
| joinphrase = "" | ||
| name = factory.LazyAttribute(lambda o: f"{o.artist['name']} Credit") | ||
|
|
There was a problem hiding this comment.
grug see many factory test now, good for grug
but grug worry: medium factory have big post_generation magic on line 201 that change track position. grug think this hide where track position come from. what if future grug want track at position 5 but factory always override? harder for grug to understand test when magic happen in background.
maybe better track position set explicit in test, not magic in factory? more code lines, yes, but grug brain see what happen easier.
| return factories.MediumFactory(**kwargs) # type: ignore[return-value] | ||
|
|
||
|
|
||
| def release_factory(**kwargs) -> mb.Release: | ||
| return factories.ReleaseFactory(**kwargs) # type: ignore[return-value] |
There was a problem hiding this comment.
grug see type: ignore[return-value] on medium_factory and release_factory, lines 64 and 68. grug wonder why need ignore? if typing not work, maybe something wrong with factory return type?
factory_boy return instance that match dict shape, but mypy not understand. grug think maybe better use proper TypedDict cast or fix return annotation to help mypy, instead of tell mypy to be quiet.
| class ReleaseFactory(_IdFactory): | ||
| class Params: | ||
| id_base = 1000000 | ||
|
|
||
| aliases = factory.List([]) | ||
| artist_credit = factory.List( | ||
| [factory.SubFactory(ArtistCreditFactory, artist__id_base=10)] | ||
| ) | ||
| asin = factory.LazyAttribute(lambda o: f"{o.title} Asin") | ||
| barcode = "0000000000000" | ||
| cover_art_archive = factory.Dict( | ||
| { | ||
| "artwork": False, | ||
| "back": False, | ||
| "count": 0, | ||
| "darkened": False, | ||
| "front": False, | ||
| } | ||
| ) | ||
| disambiguation = factory.LazyAttribute( | ||
| lambda o: f"{o.title} Disambiguation" | ||
| ) | ||
| genres = factory.List([factory.SubFactory(GenreFactory)]) | ||
| label_info = factory.List([factory.SubFactory(LabelInfoFactory)]) | ||
| media = factory.List([factory.SubFactory(MediumFactory)]) | ||
| packaging: str | None = None | ||
| packaging_id: str | None = None | ||
| quality = "normal" | ||
| release_events = factory.List( | ||
| [ | ||
| factory.SubFactory( | ||
| ReleaseEventFactory, area=None, date="2021-03-26" | ||
| ), | ||
| factory.SubFactory( | ||
| ReleaseEventFactory, | ||
| area__iso_3166_1_codes=["US"], | ||
| date="2020-01-01", | ||
| ), | ||
| ] | ||
| ) | ||
| release_group = factory.SubFactory(ReleaseGroupFactory) | ||
| status = "Official" | ||
| status_id = "4e304316-386d-3409-af2e-78857eec5cfe" | ||
| tags = factory.List([factory.SubFactory(TagFactory)]) | ||
| text_representation = factory.SubFactory(TextRepresentationFactory) | ||
| title = "Album" |
There was a problem hiding this comment.
grug see country field missing from ReleaseFactory but Release TypedDict in beetsplug/_utils/musicbrainz.py line 513 say country: NotRequired[str | None].
release factory should have country field with default value, maybe None or "US" to match release_events. right now factory not make full release shape, which could confuse grug if test need country.
| @staticmethod | ||
| def _parse_artist_credits(artist_credits: list[ArtistCredit]) -> ArtistInfo: | ||
| """Normalize MusicBrainz artist-credit data into tag-friendly fields. | ||
|
|
||
| MusicBrainz represents credits as a sequence of credited artists, each | ||
| with a display name and a `joinphrase` (for example `' & '`, `' feat. | ||
| '`, or `''`). This helper converts that structured representation into | ||
| both: | ||
|
|
||
| - Single string values suitable for common tags (concatenated names with | ||
| joinphrases preserved). | ||
| - Parallel lists that keep the per-artist granularity for callers that | ||
| need to reason about individual credited artists. | ||
|
|
||
| When available, a preferred alias is used for the canonical artist name | ||
| and sort name, while the credit name preserves the exact credited text | ||
| from the release. | ||
| """ | ||
| artist_parts: list[str] = [] | ||
| artist_sort_parts: list[str] = [] | ||
| artist_credit_parts: list[str] = [] | ||
| artists: list[str] = [] | ||
| artists_sort: list[str] = [] | ||
| artists_credit: list[str] = [] | ||
| artists_ids: list[str] = [] | ||
|
|
||
| for el in artist_credits: | ||
| artists_ids.append(el["artist"]["id"]) | ||
| alias = _preferred_alias(el["artist"].get("aliases", [])) | ||
| artist_object = alias or el["artist"] | ||
|
|
||
| joinphrase = el["joinphrase"] | ||
| for name, parts, multi in ( | ||
| (artist_object["name"], artist_parts, artists), | ||
| (artist_object["sort_name"], artist_sort_parts, artists_sort), | ||
| (el["name"], artist_credit_parts, artists_credit), | ||
| ): | ||
| parts.extend([name, joinphrase]) | ||
| multi.append(name) | ||
|
|
||
| return { | ||
| "artist": "".join(artist_parts), | ||
| "artist_id": artists_ids[0], | ||
| "artist_sort": "".join(artist_sort_parts), | ||
| "artist_credit": "".join(artist_credit_parts), | ||
| "artists": artists, | ||
| "artists_ids": artists_ids, | ||
| "artists_sort": artists_sort, | ||
| "artists_credit": artists_credit, | ||
| } |
There was a problem hiding this comment.
grug see _parse_artist_credits now return dict with 8 keys. this many key make grug head spin!
old code spread artist parsing through many function like _flatten_artist_credit and _multi_artist_credit. new code put all in one big dict. grug like one place better than many place, but 8 field lot to keep in brain at once.
maybe helper could return smaller pieces? like one dict for single strings, one dict for lists? less key per dict easier for grug understand.
| recording = track["recording"] | ||
| # Prefer track data, where present, over recording data. | ||
| for key in ("title", "artist_credit", "length"): | ||
| recording[key] = track[key] or recording[key] |
There was a problem hiding this comment.
grug see line 514 do recording[key] = track[key] or recording[key]. but grug worry: what if track[key] is empty string "" or 0? then track[key] or recording[key] skip track value even though track value exist!
old code at line ~556 check if track.get(key): which also have same problem. but mutating recording dict in place very dangerous, harder to see what change. maybe better make copy first or use explicit None check like if track[key] is not None:?
| recording[key] = track[key] or recording[key] | |
| value = track.get(key) | |
| if value is not None: | |
| recording[key] = value |
| media=( | ||
| medias.pop() | ||
| if len(medias := {t.media for t in track_infos}) == 1 | ||
| else "Media" | ||
| ), |
There was a problem hiding this comment.
grug see at line 571-575, code do walrus operator with set comprehension:
media=(
medias.pop()
if len(medias := {t.media for t in track_infos}) == 1
else "Media"
)this make grug head hurt! walrus inside if condition, plus set comprehension, plus pop on set. grug have to read many time to understand.
maybe split into two line? first medias = {t.media for t in track_infos}, then check len(medias). easier for grug read and debug.
PR Summary: Typed MusicBrainz payloads + parsing refactor + factory-based tests
Why this change exists
The MusicBrainz integration was historically treated as loosely-typed JSON, which made it easy to:
'release-group','artist-credit') into downstream codeThis PR makes MusicBrainz payloads first-class typed models (via
TypedDict), normalizes API payload shape consistently, and refactors parsing into smaller, reusable units. Tests are updated to usefactory_boyfactories to generate payloads that match the typed contract.High-level architecture changes
1) Introduce a typed domain model for MusicBrainz payloads
beetsplug/_utils/musicbrainz.pynow defines a comprehensive set ofTypedDictmodels:Release,Recording,Medium,Track,ArtistCredit, etc.Impact
MusicBrainzAPI.get_release()/get_recording()return typedRelease/Recordinginstead of untypedJSONDict.beetsplug/musicbrainz.pyandbeetsplug/mbpseudo.py) are updated to accept and operate on these typed payloads.This turns the MB payload into a stable internal contract and lets mypy enforce correctness in parsing code.
2) Normalize MusicBrainz API responses at the boundary
The API wrapper normalizes two key aspects:
text-representation→text_representation)'relations'into typed buckets likeartist_relations,url_relations,work_relationsConceptually:
Impact
mbpseudointerception logic is simplified because it no longer needs to handle dashed keys.3) Parsing refactor: smaller helpers, clearer responsibilities
beetsplug/musicbrainz.pymoves from one largealbum_info()routine with repeated inline logic to a more structured approach:MusicBrainzPlugin._parse_artist_credits(...)centralizes "artist-credit flattening" into a single helper that outputs both:artist,artist_credit,artist_sort)artists,artists_credit,artists_sort, plusartist_id/artists_ids)_parse_release_group(...),_parse_label_infos(...),_parse_external_ids(...),_parse_genre(...)each encapsulate a single extraction responsibility and return dict fragments that are merged intoAlbumInfo.Medium/track parsing is split out into
get_tracks_from_medium(...), and config access is cached (ignored_media,ignore_data_tracks,ignore_video_tracks).Before (simplified)
album_info()handled:info.mediaAfter
album_info()orchestrates:track_infos.extend(get_tracks_from_medium(medium))TrackInfo.indexin a single passmediafrom track infos (single value vs'Media')This is effectively a "pipeline":
Release→Medium→TrackInfo+ album metadata.4) Tests re-architected around factories (stable data-shape contract)
A new test factory module is introduced:
test/plugins/factories/musicbrainz.pyusingfactory_boy.Key changes:
RecordingFactory,TrackFactory,MediumFactory,ReleaseFactory, etc.ReleaseFactoryis introduced and then made the default basis for test releases; it provides realistic defaults (release events, label info, text representation, ids, etc.).@factory.post_generationhook that setstrack['position']sequentially.Impact
Release/Mediumfields, factories provide them.Reviewer guide: what to focus on
Public/semantic behavior
AlbumInfo.mediais now computed from track infos, not directly fromrelease['media'](still yields the same'Media'vs single-format behavior, but viaTrackInfo.media).Boundary normalization correctness
MusicBrainzAPI._normalize_data()is the foundation: if it regresses, everything downstream breaks in subtle ways.Medium/track parsing extraction
get_tracks_from_medium()now owns pregap/data track inclusion and filtering. Ensure the config semantics match previous behavior.Test changes are mostly structural
'Album'vs'ALBUM TITLE', deterministic UUID-like ids).assert albumadditions fix typing concerns whenalbum_for_id()could returnNone.High-level impact
beetsplug.musicbrainz,beetsplug.mbpseudo, andbeetsplug._utils.Medium→TrackInfopipeline.pytest-factoryboy(and transitivefactory_boy,faker,inflection) added for test data generation.