Filter placeholder records from RescueGroups API#95

Open
MilesMorel wants to merge 1 commit into master from filter-placeholder-names

@MilesMorel
Collaborator

Summary

  • Skips RescueGroups records whose name matches an entry in adoption_sources/rescue_groups_blocklist.json (case-insensitive on both sides)
  • Initial blocklist contains the one phrase we have evidence for: more dogs soon (the example from the linked Instagram post)
  • Adding new placeholder phrases later is a one-line JSON edit, no code change required

Background

Reported in #94. A rescue published a placeholder entry (Name: More Dogs Soon!, Breed: Unknown) pointing users at their website rather than a real adoptable pet. The bot picked it up and posted it.

Implementation notes

  • Blocklist file follows the existing adoption_sources/manual.json sibling-file convention (loaded via __file__.replace(".py", "_blocklist.json")).
  • Filter is applied in SourceRescueGroups.fetch_pets after parsing; an INFO log fires when a record is skipped.
  • I deliberately seeded the blocklist conservatively (one entry) rather than guessing other patterns. Future bad posts can be added in one line.
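A minimal sketch of the filter described in the notes above, assuming records are dicts with a `name` key and that blocklist phrases are matched as case-insensitive substrings (as in the quoted diff further down). The function names `is_placeholder_name` and `drop_placeholders` and the sample record "Biscuit" are hypothetical and not taken from the PR, where the check lives in `SourceRescueGroups.fetch_pets`.

```python
import logging

logger = logging.getLogger(__name__)


def is_placeholder_name(name: str, blocklist: list[str]) -> bool:
    """Return True if any blocklist phrase occurs in the name, ignoring case."""
    lowered = name.lower()
    # Lowercase both sides so entries in the JSON file need not be normalized.
    return any(phrase.lower() in lowered for phrase in blocklist)


def drop_placeholders(records: list[dict], blocklist: list[str]) -> list[dict]:
    """Keep only records whose name does not match the blocklist."""
    kept = []
    for record in records:
        name = record.get("name") or ""
        if is_placeholder_name(name, blocklist):
            # Mirrors the INFO log mentioned in the implementation notes.
            logger.info("Skipping placeholder record: %r", name)
            continue
        kept.append(record)
    return kept


# Hypothetical sample data: the first record is the one reported in #94.
records = [
    {"name": "More Dogs Soon!", "breed": "Unknown"},
    {"name": "Biscuit", "breed": "Beagle"},
]
kept = drop_placeholders(records, ["more dogs soon"])
```

With the single seeded phrase, only the placeholder record is dropped and the real one passes through unchanged.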

What this does NOT cover

Test plan

  • pytest tests/ — 56/56 pass (2 new tests in PlaceholderNameTests)
  • Local repro of the exact bad record (name="More Dogs Soon!", breed="Unknown") confirmed parser previously yielded it; with this change it's filtered.

🤖 Generated with Claude Code

Some rescues publish placeholder entries pointing users at their
website rather than a real adoptable animal. These were getting picked
up and posted to social. Skip records whose name matches an entry in
adoption_sources/rescue_groups_blocklist.json (case-insensitive).

Closes #94

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

_blocklist_path = __file__.replace(".py", "_blocklist.json")
with open(_blocklist_path) as _f:
    _blocklist = json.loads(_f.read())
Collaborator

I don't see the blocklist JSON file in the PR. I'm also slightly opposed to loading the blocklist this way; I think we should just hardcode a list of strings inside _is_placeholder_name. What do you think?

Collaborator

Oh, I see now: it takes the current file's path and replaces .py with _blocklist.json. That's a bit weird and fails the grep test (a clean code concept: https://jamie-wong.com/2013/07/12/grep-test/).
If we do decide to keep the JSON file, the full file name should be used instead of __file__ and a substitution.
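One way to act on this suggestion, sketched under assumptions: the blocklist file name is spelled out as a literal so it can be found with grep, and only the directory is derived from the module path. The helper `blocklist_path`, the constant `BLOCKLIST_FILENAME`, and the example path are hypothetical. The module name `rescue_groups.py` is inferred from the blocklist file name and is not confirmed by the PR.

```python
from pathlib import Path

# The full file name appears literally, so searching the repo for it finds this line.
BLOCKLIST_FILENAME = "rescue_groups_blocklist.json"


def blocklist_path(module_file: str) -> Path:
    """Return the blocklist path that sits next to the given module file."""
    return Path(module_file).with_name(BLOCKLIST_FILENAME)


# str.replace substitutes every occurrence, so a ".py" elsewhere in the path
# (here a hypothetical ".pyenv" directory) is rewritten as well.
fragile = "/home/u/.pyenv/x/rescue_groups.py".replace(".py", "_blocklist.json")
robust = blocklist_path("/home/u/.pyenv/x/rescue_groups.py")
```

In a module this would be called as `blocklist_path(__file__)`. Besides passing the grep test, it avoids the edge case where the substitution touches a directory name.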

        return None

    def _is_placeholder_name(self, name: str) -> bool:
        lowered = (name or "").lower()
Collaborator

Suggested change
- lowered = (name or "").lower()
+ lowered = name.lower()

name is already typed as str, so the or "" should move to the caller of the method if it is needed to guarantee a string.
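A small sketch of what moving the fallback to the caller could look like, assuming the parsed record is a dict whose `name` may be missing or null. The helper `name_from_record` is hypothetical, and the check is written as a free function with a single hardcoded phrase for brevity.

```python
from typing import Optional


def is_placeholder_name(name: str) -> bool:
    """Expects a real string; callers are responsible for missing names."""
    return "more dogs soon" in name.lower()


def name_from_record(record: dict) -> str:
    """Coerce a missing or null name to an empty string at the boundary."""
    raw: Optional[str] = record.get("name")
    return raw or ""
```

This keeps the `name: str` annotation truthful: the method never sees `None`, and the decision about how to treat a nameless record sits in one place.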


    def _is_placeholder_name(self, name: str) -> bool:
        lowered = (name or "").lower()
        return any(needle in lowered for needle in PLACEHOLDER_NAME_SUBSTRINGS)
Collaborator

Suggested change
- return any(needle in lowered for needle in PLACEHOLDER_NAME_SUBSTRINGS)
+ return lowered in PLACEHOLDER_NAME_SUBSTRINGS

I think this can be simplified to this
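The two forms are not equivalent, which may matter for the record from #94. The sketch below assumes `PLACEHOLDER_NAME_SUBSTRINGS` holds the single phrase named in the PR summary; its actual contents are not shown in the diff.

```python
# Assumed contents, based on the phrase listed in the PR summary.
PLACEHOLDER_NAME_SUBSTRINGS = ("more dogs soon",)

lowered = "More Dogs Soon!".lower()

# Original form: true when any phrase occurs anywhere inside the name.
substring_match = any(needle in lowered for needle in PLACEHOLDER_NAME_SUBSTRINGS)

# Suggested form: true only when the whole name equals one of the phrases.
exact_match = lowered in PLACEHOLDER_NAME_SUBSTRINGS
```

Under that assumption the substring check catches "More Dogs Soon!" while the membership check does not, because of the trailing exclamation mark. The simplification would work only if the list stored the exact lowercased names.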

Collaborator

@binamkayastha left a comment

I get the concept of the blocklist.json file but I think this is way too over-engineered for this project haha. I think for now just hardcode the string and we can bring back this concept if we find ourselves with a gnarly hardcoded list.
