Skip to content

strip ascii tab/newline from urls before protocol check#21433

Open
rootvector2 wants to merge 1 commit into
emberjs:mainfrom
rootvector2:strip-url-control-chars
Open

strip ascii tab/newline from urls before protocol check#21433
rootvector2 wants to merge 1 commit into
emberjs:mainfrom
rootvector2:strip-url-control-chars

Conversation

@rootvector2
Copy link
Copy Markdown

On the fastboot path findProtocolForURL uses node's url.parse, which keeps embedded ASCII tab/newline/CR and reports a null protocol for something like java\nscript:alert(1), so it slips past the badProtocols check and gets serialized into the href untouched. Browsers strip those characters before navigating, so the value runs as javascript:. The WHATWG URL parser used on the non-fastboot path already drops them, so strip them here too before parsing.


if (typeof url === 'string') {
protocol = nodeURL.parse(url).protocol;
// browsers strip ASCII tab/newline/CR from urls before navigating, so
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i didn't know this! do you have a link to a spec or something here? or... why don't we use URL's parse?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in the WHATWG URL parser itself: step 2 of the basic URL parser says "Remove all ASCII tab or newline from input," where ASCII tab or newline is U+0009, U+000A, U+000D (https://url.spec.whatwg.org/#url-parsing). So new URL('java\nscript:alert(1)').protocol already comes back as javascript:. Node's legacy url.parse predates that and keeps the characters, so it reports a null protocol instead.

On using URL's parse here: we can't on this branch. This is the fastboot path, and fastboot sets the URL global to node's require('url') module, so URL isn't the WHATWG constructor anymore (that's what the comment above and the weirdURL.parse object check are detecting). The non-fastboot else branch does use new URL() and already gets this for free. Stripping the chars here just mirrors that behavior until fastboot stops shadowing the global.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so URL isn't the WHATWG constructor anym

would that make node's URL constructor wrong? I already have an RFC around some phrasing around "SSR environments need to properly implement a rendering env"

emberjs/rfcs#1178

would love your feedback.

the main goal I have is to remove all non-standard codepaths in the codebase, so that we can:

  • be leaner
  • force rendering environments to not push their quirks back on to us

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node's WHATWG URL constructor isn't wrong, it strips those per spec. The non-standard bits are the legacy url.parse and the fact that fastboot swaps the URL global for the old require('url') module, so this branch can't reach the spec-compliant parser. If the RFC gets SSR envs to stop shadowing URL, this whole legacy branch and this patch go away with it, so the direction lines up with what you want. I'll leave some notes on the RFC. Happy to keep this as a stopgap until then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants