Filter placeholder records from RescueGroups API#95
Filter placeholder records from RescueGroups API#95MilesMorel wants to merge 1 commit intomasterfrom
Conversation
Some rescues publish placeholder entries pointing users at their website rather than a real adoptable animal. These were getting picked up and posted to social. Skip records whose name matches an entry in adoption_sources/rescue_groups_blocklist.json (case-insensitive). Closes #94 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
|
||
| _blocklist_path = __file__.replace(".py", "_blocklist.json") | ||
| with open(_blocklist_path) as _f: | ||
| _blocklist = json.loads(_f.read()) |
There was a problem hiding this comment.
I don't see the blocklist json file in the PR. Also I'm slightly opposed to loading the blocklist like this. I think we should just have a hardcoded list of strings in the code inside _is_placeholder_name. What do you think?
There was a problem hiding this comment.
Oh I see now it's taking the current file and replacing .py with _blocklist.json, that's a bit weird and fails the grep test principle (clean code concept: https://jamie-wong.com/2013/07/12/grep-test/)
if we do decide to keep the json, the full file name should be used instead of using file and a substitution
| return None | ||
|
|
||
| def _is_placeholder_name(self, name: str) -> bool: | ||
| lowered = (name or "").lower() |
There was a problem hiding this comment.
| lowered = (name or "").lower() | |
| lowered = name.lower() |
name is already typed as str, the or "" should be moved to the caller of the method if needed to guarantee a string type
|
|
||
| def _is_placeholder_name(self, name: str) -> bool: | ||
| lowered = (name or "").lower() | ||
| return any(needle in lowered for needle in PLACEHOLDER_NAME_SUBSTRINGS) |
There was a problem hiding this comment.
| return any(needle in lowered for needle in PLACEHOLDER_NAME_SUBSTRINGS) | |
| return lowered in PLACEHOLDER_NAME_SUBSTRINGS |
I think this can be simplified to this
binamkayastha
left a comment
There was a problem hiding this comment.
I get the concept of the blocklist.json file but I think this is way too over-engineered for this project haha. I think for now just hardcode the string and we can bring back this concept if we find ourselves with a gnarly hardcoded list.
Summary
adoption_sources/rescue_groups_blocklist.json(case-insensitive on both sides)more dogs soon(the example from the linked Instagram post)Background
Reported in #94. A rescue published a placeholder entry (
Name: More Dogs Soon!,Breed: Unknown) pointing users at their website rather than a real adoptable pet. The bot picked it up and posted it.Implementation notes
adoption_sources/manual.jsonsibling-file convention (loaded via__file__.replace(".py", "_blocklist.json")).SourceRescueGroups.fetch_petsafter parsing; an INFO log fires when a record is skipped.What this does NOT cover
Fetched N records+ per-platform success messages — no pet name. I could file a follow-up to log pet names so future incidents are traceable, but that's separable from this fix.Test plan
pytest tests/— 56/56 pass (2 new tests inPlaceholderNameTests)name="More Dogs Soon!",breed="Unknown") confirmed parser previously yielded it; with this change it's filtered.🤖 Generated with Claude Code