Fix: handle JSON-LD @type as array in _getJSONLD#1013
Open
JSap0914 wants to merge 1 commit into
Open
Conversation
The JSON-LD spec allows @type to be either a plain string ("NewsArticle") or an array of strings (["NewsArticle", "Article"]). Readability called .match() directly on the @type value, which throws a TypeError when that value is an Array (arrays have no .match()). The error was swallowed by the surrounding try/catch, so the entire JSON-LD block was silently ignored and all metadata (title, byline, excerpt, siteName, publishedTime) fell back to scraped HTML values. Fix: introduce a matchesArticleType() helper inside _getJSONLD that normalises @type to an array before testing each element against REGEXPS.jsonLdArticleTypes. Replace the three .match() call sites that operated directly on @type with this helper. Add a test page (jsonld-array-type) whose JSON-LD block carries @type: ["NewsArticle", "Article"] to prevent regressions.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The JSON-LD specification allows the
@typeproperty to be either a plain string ("NewsArticle") or an array of strings (["NewsArticle", "Article"]). The second form is common in the wild — for example when a page is both aBlogPostingand anArticle.Readability._getJSONLDcalls.match()directly on the@typevalue in three places.Array.prototype.matchdoes not exist, so when@typeis an array the call throws aTypeError. The surroundingtry/catchswallows the error silently, leaving the JSON-LD block completely ignored. Every metadata field — title, byline, excerpt, siteName, publishedTime — then falls back to (usually noisier) scraped HTML values.Fix
Add a small
matchesArticleType(type)helper inside_getJSONLDthat normalises@typeto an array before testing each element againstREGEXPS.jsonLdArticleTypes. Replace the three existing.match()call-sites that operated directly on@typewith this helper.Test
Add a new test page
test/test-pages/jsonld-array-type/whose JSON-LD block uses@type: ["NewsArticle", "Article"]. Without the fix, all five metadata fields fail; with the fix, all pass.