Conversation
bodinsamuel
left a comment
Thanks for your PR 🚀
Can you provide a test and a way to enable/disable the behavior via the API?
We probably won't use this feature on our side, so we most likely want it to be disabled by default.
If you prefer, I can take over this PR, but it will probably take longer.
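One way the requested opt-in switch could look is sketched below. The `RenderOptions` interface, the `includeShadowDom` flag, and `pickSerializer` are hypothetical names chosen for illustration; they are not the crawler's actual API. The only behavior taken from the comment above is that the feature stays off unless explicitly enabled.

```typescript
// Hypothetical option shape; not the crawler's real API.
interface RenderOptions {
  /** When true, serialize shadow roots into the returned HTML. */
  includeShadowDom?: boolean;
}

type Serializer = () => string;

// Choose which serializer to run for a request.
function pickSerializer(
  options: RenderOptions,
  plainSerializer: Serializer,
  shadowDomSerializer: Serializer
): Serializer {
  // Disabled by default: only an explicit `true` opts in.
  return options.includeShadowDom === true
    ? shadowDomSerializer
    : plainSerializer;
}
```

Keeping the decision in one small function makes the default easy to cover with a test: an empty options object must select the plain serializer.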
src/lib/browser/Page.ts
Outdated
const rootAttr = [...document.documentElement.attributes]
  .map(({ name, value }) => `${name}="${value}"`)
  .join(' ');
const innerContent = (document.documentElement as any).getInnerHTML();
Shouldn't it need `{includeShadowRoots: true}`?
Open-mode shadow roots can be obtained without passing this parameter.
In order to preserve encapsulation semantics, any closed shadow roots within an element will not be serialized by default.
The default behavior seems to be what we want.
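The diff above collects the root element's attributes and its inner content, but does not show how the two are joined. A minimal sketch of that step follows, written as a pure function so it can be tested without a browser. `RootAttribute`, `escapeAttr`, and `serializeDocument` are hypothetical names, and the `<!DOCTYPE html>` prefix is an assumption. The escaping is an addition: the snippet above interpolates `value` directly, so an attribute value containing a double quote would produce broken markup.

```typescript
// Hypothetical plain-data stand-in for a DOM attribute.
interface RootAttribute {
  name: string;
  value: string;
}

// Escape characters that would break a double-quoted attribute value.
function escapeAttr(value: string): string {
  return value.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Rebuild a full document string from the <html> attributes and the
// already-serialized inner content (including any shadow roots).
function serializeDocument(
  attrs: RootAttribute[],
  innerContent: string
): string {
  const rootAttr = attrs
    .map(({ name, value }) => `${name}="${escapeAttr(value)}"`)
    .join(' ');
  const openTag = rootAttr ? `<html ${rootAttr}>` : '<html>';
  return `<!DOCTYPE html>${openTag}${innerContent}</html>`;
}
```

In the browser context, `attrs` would come from `[...document.documentElement.attributes]` and `innerContent` from the serialization call shown in the diff.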
@bodinsamuel I would love to have ShadowDOM support for DocSearch, since my docs site uses WebComponents. If ShadowDOM is not supported by default, is it possible to enable this feature in the Crawler config? I'm glad you can take over this PR; I don't have much time to perfect it. Thank you for your team's work.
Ah, it's for DocSearch, I wasn't aware. In that case we might want to use it indeed ahah
I have several websites using WebComponents:
I used to use a fork of docsearch-scraper.
baacb1e to b724543
I'm not sure how DocSearch retrieves information from the HTML string. If it uses a parser to analyze the HTML and then queries through the DOM API, we need to remove the
Update: It seems that the Crawler is using Cheerio, so there is no need to remove the
Use `getInnerHTML`
return await promiseWithTimeout(
  (async (): Promise<string | null> => {
    const start = Date.now();
    const content = await this.#ref?.content();
@bodinsamuel It now uses the standard method.
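The diff above wraps the page read in `promiseWithTimeout`, whose implementation is not shown in this thread. As a rough illustration of the pattern only, here is a sketch of a timeout wrapper around a `content()` call. `withTimeout`, `PageLike`, and `renderHtml` are hypothetical names and do not reflect the crawler's real helper or the browser library's types.

```typescript
// Hypothetical helper: reject if `work` does not settle within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms} ms`)),
      ms
    );
    work.then(
      (value) => {
        clearTimeout(timer); // do not leave the timer pending
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}

// Minimal stand-in for a browser page object (assumption).
interface PageLike {
  content(): Promise<string>;
}

// Read the page HTML, or null when there is no page to read from.
async function renderHtml(
  page: PageLike | undefined,
  ms: number
): Promise<string | null> {
  if (!page) {
    return null;
  }
  return withTimeout(page.content(), ms);
}
```

Clearing the timer on both settle paths avoids keeping the process alive after the page read has finished.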
Use `getInnerHTML`
https://web.dev/declarative-shadow-dom/