Commit 36bd06b

gh-150875: Speed up JSON string encoding for long ASCII strings
ascii_escape_size() scans each string one character at a time to size the escaped output, and write_escaped_ascii() writes the string verbatim when nothing needs escaping.

For the one-byte representation, detect that no-escape case eight bytes at a time and return the verbatim size (input length plus two quotes) directly; a length guard keeps strings shorter than 16 characters on the original per-character loop. Strings that need escaping and non-Latin-1 strings keep the current path.

Output is byte-identical, verified against test_json and a 199-case dumps differential in both ensure_ascii modes. json.dumps of long ASCII strings runs up to 5.3x faster; short keys, escaped strings, and non-ASCII strings are unaffected.
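The fast path described above can be sketched outside CPython as follows. This is a minimal illustration, not the committed code: `is_plain`, `no_escape_size`, and `size_of` are hypothetical names, `is_plain` stands in for the `S_CHAR` macro, and the sketch leaves out the 16-character length guard and the overflow check. It returns -1 where the real function would fall through to the per-character loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for S_CHAR: printable ASCII except '"' and '\\'. */
static int is_plain(unsigned char c)
{
    return c >= 0x20 && c <= 0x7e && c != '"' && c != '\\';
}

/* Returns len + 2 (the input plus two quotes) when no byte needs escaping,
   or -1 to signal "use the per-character sizing loop instead". */
static long long no_escape_size(const unsigned char *s, size_t len)
{
    const uint64_t ones = 0x0101010101010101ULL;
    const uint64_t high = 0x8080808080808080ULL;
    size_t j = 0;

    assert(s != NULL);
    for (; j + 8 <= len; j += 8) {
        uint64_t w;
        memcpy(&w, s + j, 8);                 /* alignment-safe 8-byte load */
        uint64_t q = w ^ (0x22ULL * ones);    /* zero byte where c == '"'   */
        uint64_t b = w ^ (0x5cULL * ones);    /* zero byte where c == '\\'  */
        uint64_t d = w ^ (0x7fULL * ones);    /* zero byte where c == 0x7f  */
        uint64_t c = w & (0xe0ULL * ones);    /* zero byte where c < 0x20   */
        uint64_t hit = ((q - ones) & ~q & high)
                     | ((b - ones) & ~b & high)
                     | ((d - ones) & ~d & high)
                     | ((c - ones) & ~c & high)
                     | (w & high);            /* high bit set: c >= 0x80    */
        if (hit) {
            return -1;
        }
    }
    for (; j < len; j++) {                    /* tail of fewer than 8 bytes */
        if (!is_plain(s[j])) {
            return -1;
        }
    }
    return (long long)len + 2;
}

/* Convenience wrapper for NUL-terminated test strings. */
static long long size_of(const char *s)
{
    return no_escape_size((const unsigned char *)s, strlen(s));
}
```

For a 26-letter lowercase string, `size_of` returns 28; any string containing a quote, backslash, control character, DEL, or byte at or above 0x80 returns -1.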
1 parent 7a468a1 commit 36bd06b

2 files changed

Lines changed: 41 additions & 0 deletions

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
+Speed up :func:`json.dumps` encoding of strings made up of long runs of
+characters that need no escaping, by scanning eight bytes at a time. Short
+strings, strings that need escaping, and strings containing non-Latin-1
+characters are unaffected. Patch by Bernát Gábor.

Modules/_json.c

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -164,6 +164,43 @@ ascii_escape_size(const void *input, int kind, Py_ssize_t input_chars)
     Py_ssize_t i;
     Py_ssize_t output_size;
 
+    /* SWAR no-escape fast path (1-byte): when no byte needs escaping the
+       output is just the input plus the two surrounding quotes. needs-escape
+       is c < 0x20 || c > 0x7e || c == '"' || c == '\\'; skip 8 bytes at a time
+       and fall through to the per-character loop at the first such byte. The
+       length guard keeps short strings (the common dict key) on the original
+       loop, where the fast path's setup would not pay off. */
+    if (kind == PyUnicode_1BYTE_KIND && input_chars >= 16
+        && input_chars < PY_SSIZE_T_MAX - 2) {
+        const Py_UCS1 *p = (const Py_UCS1 *)input;
+        const uint64_t ones = 0x0101010101010101ULL;
+        const uint64_t high = 0x8080808080808080ULL;
+        const uint64_t bq = 0x22ULL * ones, bs = 0x5cULL * ones;
+        const uint64_t b7f = 0x7fULL * ones, bc = 0xE0ULL * ones;
+        Py_ssize_t j = 0;
+        int needs_escape = 0;
+        for (; j + 8 <= input_chars; j += 8) {
+            uint64_t w;
+            memcpy(&w, p + j, 8);
+            uint64_t mq = w ^ bq; mq = (mq - ones) & ~mq & high; /* == '"' */
+            uint64_t ms = w ^ bs; ms = (ms - ones) & ~ms & high; /* == '\\' */
+            uint64_t vc = w & bc; uint64_t mlo = (vc - ones) & ~vc & high; /* < 0x20 */
+            uint64_t m7 = w ^ b7f; m7 = (m7 - ones) & ~m7 & high; /* == 0x7f */
+            if (mq | ms | mlo | (w & high) | m7) { /* (w & high): >= 0x80 */
+                needs_escape = 1;
+                break;
+            }
+        }
+        if (!needs_escape) {
+            for (; j < input_chars; j++) {
+                if (!S_CHAR(p[j])) { needs_escape = 1; break; }
+            }
+        }
+        if (!needs_escape) {
+            return input_chars + 2;
+        }
+    }
+
     /* Compute the output size */
     for (i = 0, output_size = 2; i < input_chars; i++) {
         Py_UCS4 c = PyUnicode_READ(kind, input, i);
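
Each per-byte comparison in the hunk above relies on the zero-byte test `(x - ones) & ~x & high`. The sketch below isolates that primitive to show one property worth keeping in mind: the resulting mask is nonzero exactly when some byte matches, but a borrow from a matching byte can also flag the next higher byte when it differs from the target only in its lowest bit. That is harmless here because the patch only asks whether any bit is set. `eq_mask` is a hypothetical helper name, not part of `_json.c`, and the words are built from integer literals so the example does not depend on byte order.

```c
#include <stdint.h>

#define ONES 0x0101010101010101ULL
#define HIGH 0x8080808080808080ULL

/* Returns a mask whose bit 7 is set in every byte of w equal to target.
   A byte directly above a match may also be flagged through borrow
   propagation, so only "mask != 0" is exact, not the per-byte positions. */
static uint64_t eq_mask(uint64_t w, uint8_t target)
{
    uint64_t x = w ^ (target * ONES);   /* matching bytes become 0x00 */
    return (x - ONES) & ~x & HIGH;      /* high bit set where byte was 0x00 */
}
```

With `w = 0x6161616161612322` the low byte is `'"'` (0x22) and the next byte is `'#'` (0x23): `eq_mask(w, '"')` yields `0x8080`, flagging both. With `w = 0x6161616161612361`, which contains `'#'` but no quote, the mask is 0.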
