@denilsonsa
Contributor

Closes: https://trello.com/c/Jaa1vC24

This is my second scraper. It's almost in good shape, but I left a TODO comment because I couldn't figure out how to make UrlScraper scrape the website reliably. I keep getting random errors.

Otherwise, it's ready for review.


If you’re adding a new scraper, please ensure that you have:

  • Tested the scraper on a local copy of DevDocs
  • Ensured that the docs are styled similarly to other docs on DevDocs
  • Added these files to the public/icons/your_scraper_name/ directory:
    • 16.png: a 16×16 pixel icon for the doc
    • 16@2x.png: a 32×32 pixel icon for the doc
    • SOURCE: A text file containing the URL to the page the image can be found on or the URL of the original image itself

@denilsonsa denilsonsa requested a review from a team as a code owner October 17, 2025 23:52
@denilsonsa
Contributor Author

If anyone can figure out a solution for the web scraper, feel free to modify this PR. It feels like 95% of the work is already done, and I would appreciate it if anyone else can contribute the remaining work.

Contributor

@simon04 simon04 left a comment

Awesome, thank you!

@simon04 simon04 merged commit 96a1059 into freeCodeCamp:main Oct 19, 2025
2 checks passed
@simon04
Contributor

simon04 commented Oct 19, 2025

I've implemented very basic retry logic in 8e07071 and cbaedce. It retries a request when the error code is 0, 500, 501, 502, 503, or 504, which works around sporadic errors such as the following:

ERROR:
  https://www.graphviz.org/docs/attr-types/pointList/
  RuntimeError: Error status code (0): Error in the HTTP2 framing layer
    https://www.graphviz.org/docs/attr-types/pointList/

  /Users/simon/src/devdocs/lib/docs/core/scrapers/url_scraper.rb:49:in 'Docs::UrlScraper#process_response?'
  /Users/simon/src/devdocs/lib/docs/core/scraper.rb:158:in 'Docs::Scraper#handle_response'
  /Users/simon/src/devdocs/lib/docs/core/scraper.rb:77:in 'block in Docs::Scraper#build_pages'
  /Users/simon/src/devdocs/lib/docs/core/requester.rb:59:in 'block (2 levels) in Docs::Requester#handle_response'
  /Users/simon/src/devdocs/lib/docs/core/requester.rb:58:in 'Array#each'
  /Users/simon/src/devdocs/lib/docs/core/requester.rb:58:in 'block in Docs::Requester#handle_response'
  /Users/simon/src/devdocs/lib/docs/core/instrumentable.rb:15:in 'Docs::Instrumentable::Methods#instrument'
  /Users/simon/src/devdocs/lib/docs/core/requester.rb:57:in 'Docs::Requester#handle_response'
  /Users/simon/src/devdocs/lib/docs/core/requester.rb:18:in 'Docs::Requester.run'
  /Users/simon/src/devdocs/lib/docs/core/scrapers/url_scraper.rb:38:in 'Docs::UrlScraper#request_all'
  /Users/simon/src/devdocs/lib/docs/core/scraper.rb:76:in 'Docs::Scraper#build_pages'
  /Users/simon/src/devdocs/lib/docs/core/doc.rb:115:in 'block in Docs::Doc.store_pages'
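
For readers who want to see the general idea, here is a minimal sketch of retrying on the status codes listed above. It is not the code from 8e07071 or cbaedce: the names `RETRYABLE_CODES`, `FakeResponse`, and `fetch_with_retry`, and the `max_attempts` and `base_delay` parameters, are all hypothetical, and the response object is a stand-in rather than the one the DevDocs requester actually uses.

```ruby
# Hypothetical sketch; not the implementation from the commits above.
# Status codes treated as transient (0 = no HTTP response, e.g. a framing error).
RETRYABLE_CODES = [0, 500, 501, 502, 503, 504].freeze

# Stand-in for an HTTP response object (assumption for illustration only).
FakeResponse = Struct.new(:code, :body)

# Calls the block until it returns a response whose code is not retryable,
# or until max_attempts is reached. Always returns the last response seen.
def fetch_with_retry(max_attempts: 3, base_delay: 0.0)
  attempt = 0
  loop do
    attempt += 1
    response = yield(attempt)
    return response unless RETRYABLE_CODES.include?(response.code)
    return response if attempt >= max_attempts
    # Linear backoff between attempts; skipped when base_delay is zero.
    sleep(base_delay * attempt) if base_delay > 0
  end
end

# Usage: simulate two transient failures followed by a success.
simulated_codes = [0, 503, 200]
result = fetch_with_retry(max_attempts: 5) do |n|
  FakeResponse.new(simulated_codes[n - 1], "page #{n}")
end
puts result.code
```

Capping the number of attempts keeps a persistently failing URL from stalling the whole scrape; the caller still receives the last failing response and can raise the usual error.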
