Update crawldata.py #1

Open
saraskardelly wants to merge 1 commit into saraskardelly-patch-1 from saraskardelly-patch-2

@saraskardelly
Contributor

Hi, I have now added Bloomberg and Washington Post. I will send you a detailed message on Discord. Thank you very much.

'article-links': {
'overview-urls': ['/markets', '/technology', '/politics', '/world'],
'find-tags': [ # Hierarchy
{'type': 'include', 'name': 'header', 'attrs': {'class': 'story-package-module_ _stories'}},
Owner

@ywcb00 May 11, 2023

This configuration does not seem to find any article links, because the HTML page of bloomberg.com has no container named 'header'. You might have forgotten to change the container name in this line (copy-paste mistake?).
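One quick way to check whether a tag name in a 'find-tags' entry can match at all is to count the tags that actually occur in the page. The following is a minimal sketch using only the standard library; `TagCounter` and `SAMPLE_HTML` are hypothetical names, and the markup is made up for illustration rather than copied from bloomberg.com.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each tag name is opened in a document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        self.counts[tag] += 1


# Hypothetical markup for illustration only.
SAMPLE_HTML = """
<section class="stories">
  <article><a href="/news/a">First story</a></article>
  <article><a href="/news/b">Second story</a></article>
</section>
"""

counter = TagCounter()
counter.feed(SAMPLE_HTML)

# A 'find-tags' entry whose 'name' has a count of zero can never match.
for name in ("header", "article", "a"):
    print(name, counter.counts[name])
```

In practice the HTML would come from the downloaded overview page instead of a literal string.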

'article-links': {
'overview-urls': ['/politics', '/world', '/business', '/technology', '/sports'],
'find-tags': [ # Hierarchy
{'type': 'include', 'name': 'header', 'attrs': {'class': 'headline'}},
Owner

Same here.

'overview-urls': ['/markets', '/technology', '/politics', '/world'],
'find-tags': [ # Hierarchy
{'type': 'include', 'name': 'header', 'attrs': {'class': 'story-package-module_ _stories'}},
{'type': 'include', 'name': 'a', 'attrs': {'class': 'story-package-module_ _story_ _headline-link'}}
Owner

The crawler cannot handle multiple class names separated by whitespace. Either try passing an array (untested) or reduce these specifications to a single class name each.
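To illustrate the difference between a whitespace-separated class string and an array of class names, here is a minimal sketch of a matcher that accepts both forms. `class_matches` is a hypothetical helper, not part of crawldata.py or the crawler, and it assumes that "match" means the element carries every requested class.

```python
def class_matches(element_class, wanted):
    """Return True if the element carries every class in `wanted`.

    element_class: raw value of the HTML class attribute, e.g. "card headline",
                   or None if the element has no class attribute.
    wanted: a single string (optionally whitespace separated) or a list of names.
    """
    if isinstance(wanted, str):
        # Treat "a b" the same as ["a", "b"].
        wanted = wanted.split()
    have = set((element_class or "").split())
    return set(wanted) <= have


# Order and extra classes on the element do not matter.
print(class_matches("card headline", "headline"))            # True
print(class_matches("card headline", "headline card"))       # True
print(class_matches("card", ["card", "headline"]))           # False
print(class_matches(None, "headline"))                       # False
```

Comparing token sets instead of raw strings avoids failures caused by class order or additional classes on the element.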
