
Fix OpenGraph extractor to find meta tags placed outside <head> #245

Open
danishashko wants to merge 1 commit into scrapinghub:master from danishashko:fix/opengraph-search-body

@danishashko

The extractor currently queries only head.xpath("meta[@property and @content]"), so it misses OG meta tags placed outside <head>. Some CMSs and page builders emit OG tags in <body>, and those tags are silently ignored.

Changing the query to document.xpath(".//meta[@property and @content]") searches the whole document for OG meta tags. The namespace detection logic is unchanged.
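
To illustrate the difference between the two queries, here is a minimal sketch using lxml directly. It is not the extractor's actual code: the sample HTML, the helper og_pairs, and the variable names are assumptions made up for this example, and only the two XPath expressions come from the description above.

```python
from lxml import html

# Hypothetical page: one OG tag in <head>, one emitted inside <body>.
SAMPLE = """<html>
<head>
<meta property="og:title" content="In head">
</head>
<body>
<p>Intro</p>
<meta property="og:description" content="In body">
</body>
</html>"""


def og_pairs(nodes):
    """Return (property, content) tuples for a list of <meta> elements."""
    return [(m.get("property"), m.get("content")) for m in nodes]


document = html.fromstring(SAMPLE)
head = document.xpath("//head")[0]

# Old query: only direct <meta> children of <head>.
head_only = og_pairs(head.xpath("meta[@property and @content]"))

# New query: every <meta> descendant of the document root.
whole_doc = og_pairs(document.xpath(".//meta[@property and @content]"))

print("head only:", head_only)
print("whole document:", whole_doc)
```

With this sample, the head-scoped query should return only the og:title tag, while the document-wide query should also return the og:description tag from <body>.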

