Commit 9d72e92
Conform tokenizer-only U+0000 NUL handling to spec
This change brings the tokenizer’s handling of U+0000 NUL characters in
the DATA state and the CDATA section state into conformance with the
requirements in the HTML spec — for the case where only tokenization is
being performed, without tree construction; that is, the case where the
tokenizer() method is called, rather than parse() or parseFragment().
Specifically, the tokenization steps defined in the spec require that
when a U+0000 NUL is consumed in the DATA state or in the CDATA section
state, the parser must then emit a U+0000 NUL. But when performing tree
construction, the spec requires that when a U+0000 NUL is consumed, the
parser must instead emit a U+FFFD REPLACEMENT CHARACTER.
Without this change, the parser always emits a U+FFFD REPLACEMENT
CHARACTER — even when only tokenization is being performed. That causes
us to fail a number of tests in html5lib-tests suite.
For more background on the relevant behavior, see the following:
* https://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
* whatwg/html@d98f83e
* 9b9c263
Relates to #351 parent 8bcc5bc commit 9d72e92
File tree
5 files changed
+39
-2
lines changed- src/nu/validator/htmlparser
- common
- impl
- test-src/nu/validator/htmlparser/test
5 files changed
+39
-2
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
144 | 144 | | |
145 | 145 | | |
146 | 146 | | |
| 147 | + | |
| 148 | + | |
| 149 | + | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
147 | 158 | | |
148 | 159 | | |
149 | 160 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1578 | 1578 | | |
1579 | 1579 | | |
1580 | 1580 | | |
1581 | | - | |
| 1581 | + | |
1582 | 1582 | | |
1583 | 1583 | | |
1584 | 1584 | | |
| |||
3081 | 3081 | | |
3082 | 3082 | | |
3083 | 3083 | | |
3084 | | - | |
| 3084 | + | |
3085 | 3085 | | |
3086 | 3086 | | |
3087 | 3087 | | |
| |||
6177 | 6177 | | |
6178 | 6178 | | |
6179 | 6179 | | |
| 6180 | + | |
| 6181 | + | |
| 6182 | + | |
| 6183 | + | |
| 6184 | + | |
| 6185 | + | |
| 6186 | + | |
6180 | 6187 | | |
6181 | 6188 | | |
6182 | 6189 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1241 | 1241 | | |
1242 | 1242 | | |
1243 | 1243 | | |
| 1244 | + | |
| 1245 | + | |
| 1246 | + | |
| 1247 | + | |
| 1248 | + | |
| 1249 | + | |
| 1250 | + | |
1244 | 1251 | | |
1245 | 1252 | | |
1246 | 1253 | | |
| |||
Lines changed: 7 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
54 | 54 | | |
55 | 55 | | |
56 | 56 | | |
| 57 | + | |
| 58 | + | |
57 | 59 | | |
58 | 60 | | |
59 | 61 | | |
| |||
174 | 176 | | |
175 | 177 | | |
176 | 178 | | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
177 | 184 | | |
178 | 185 | | |
179 | 186 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
200 | 200 | | |
201 | 201 | | |
202 | 202 | | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
203 | 208 | | |
204 | 209 | | |
205 | 210 | | |
| |||
0 commit comments