String#unescapeHTML() calls stripTags() first and then decodes entities. Because the decode runs after the strip, encoded markup that survives stripping (since it is not a real tag at that point) gets turned back into live markup. Any code that assumes the output is tag-free will be wrong.
Current implementation (around line 439 of src/prototype/lang/string.js):
function unescapeHTML() {
  return this.stripTags().replace(/&lt;/g,'<').replace(/&gt;/g,'>').replace(/&amp;/g,'&');
}
Reproduction:
'&lt;img src=x onerror=alert(1)&gt;'.unescapeHTML();
// stripTags() leaves the entity text alone (there is no real tag yet),
// then the decode step produces a live tag:
// => '<img src=x onerror=alert(1)>'
If a developer relies on unescapeHTML() to produce safe, tag-free text before inserting it into the page, the decode step reintroduces executable markup, which is a path to XSS.
Suggested fix: decode entities first and then strip, or use a single normalization pass that does not leave decoded markup behind. It would also help to document that the result is not safe to insert into the DOM as HTML.
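A minimal sketch of the ordering change, written as standalone functions so it runs without the library. The names `stripTagsSketch`, `decodeEntitiesSketch`, `unescapeStripFirst`, and `unescapeDecodeFirst` are hypothetical, and the tag-stripping regex is a simplified stand-in rather than the library's actual `stripTags()` pattern. It only illustrates why the order matters and is not a proposed patch.

```javascript
// Hypothetical stand-in for String#stripTags(): removes anything that looks
// like an opening or closing tag. This is NOT the library's actual regex.
function stripTagsSketch(text) {
  return text.replace(/<\/?[a-zA-Z][^>]*>/g, '');
}

// Decodes the same three entities, in the same order, as the current
// implementation quoted above.
function decodeEntitiesSketch(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&amp;/g, '&');
}

// Current order: strip, then decode. Encoded markup passes through the strip
// untouched and is then decoded into a live tag.
function unescapeStripFirst(text) {
  return decodeEntitiesSketch(stripTagsSketch(text));
}

// Suggested order: decode, then strip. The decoded tag is visible to the
// strip step and is removed.
function unescapeDecodeFirst(text) {
  return stripTagsSketch(decodeEntitiesSketch(text));
}

const payload = '&lt;img src=x onerror=alert(1)&gt;';
console.log(JSON.stringify(unescapeStripFirst(payload)));  // "<img src=x onerror=alert(1)>"
console.log(JSON.stringify(unescapeDecodeFirst(payload))); // ""
```

Two caveats with this sketch. Decoding first changes behavior for legitimate text: an input such as `a&lt;b&gt;c` decodes to `a<b>c` and is then stripped to `ac`. A single regex strip is also not a sanitizer (nested input such as `&lt;&lt;img&gt;img src=x&gt;` still yields `<img src=x>`), so the result should still be treated as untrusted text, not as HTML.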
Refs: CWE-79, CWE-116, OWASP ASVS V5.3.3.