yes: use as_encoded_bytes #12306
Merging this PR will degrade performance by 12.08%
Warning: Please fix the performance issues or acknowledge them on CodSpeed.
Please add a test to make sure we don't regress.
I don't know which character corresponds to invalid UTF-8.
GNU testsuite comparison:
```rust
#[test]
#[cfg(any(unix, target_os = "wasi"))]
fn test_non_utf8() {
    // (rest of the test not shown in this review excerpt)
}
```
@ChrisDenton I can't write a test that works on Windows.
On Windows, `OsStr` does not support non-UTF-8 data. Well, it does support something called "WTF-8", but that's an extension of UTF-8 (so `\xFF` still isn't valid). It's also an internal encoding and not guaranteed.
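On Unix, by contrast, an `OsStr` is an arbitrary byte sequence, so a non-UTF-8 value such as `\xFF` can be built directly; this is why `test_non_utf8` is gated on `unix`/`wasi`. A minimal sketch, assuming a Unix target (`std::os::unix::ffi::OsStrExt` does not exist on Windows) and Rust 1.74+ for `as_encoded_bytes`; `non_utf8_arg` is a hypothetical helper:

```rust
use std::ffi::OsStr;
// Unix-only extension trait: lets an OsStr be built from raw bytes.
use std::os::unix::ffi::OsStrExt;

/// Hypothetical helper: an OsStr holding the single byte 0xFF (not valid UTF-8).
fn non_utf8_arg() -> &'static OsStr {
    OsStr::from_bytes(b"\xff")
}

fn main() {
    let arg = non_utf8_arg();
    // It cannot be viewed as a &str...
    assert!(arg.to_str().is_none());
    // ...a lossy conversion replaces the byte with U+FFFD...
    assert_eq!(arg.to_string_lossy(), "\u{FFFD}");
    // ...but on Unix the encoded bytes are exactly the raw bytes.
    assert_eq!(arg.as_encoded_bytes(), b"\xff");
}
```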
@sylvestre Would you merge this PR as a pure refactoring, without adding tests?
Does it cause a crash in production? Or are invalid arguments rejected by the OS?
> Does it cause a crash in production? Or are invalid arguments rejected by the OS?
It's undefined behavior (UB), so there's no telling what will happen. The standard library needs to be able to convert an `OsStr` (internally a `[u8]`) to a platform string (on Windows, a `[u16]`), so `OsStr` has to use a special encoding to make that possible.
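For `yes`, the argument bytes only need to be read out and written to stdout, so the safe `as_encoded_bytes` accessor is enough and nothing has to be converted back into an `OsStr`. A minimal sketch of building the repeated line, assuming Rust 1.74+; `build_line` is a hypothetical helper, not the code in this PR:

```rust
use std::ffi::OsString;

/// Hypothetical helper: the line `yes` repeats, i.e. the arguments joined
/// by single spaces plus a trailing newline, or "y\n" with no arguments.
fn build_line(args: &[OsString]) -> Vec<u8> {
    if args.is_empty() {
        return b"y\n".to_vec();
    }
    let mut line = Vec::new();
    for (i, arg) in args.iter().enumerate() {
        if i > 0 {
            line.push(b' ');
        }
        // Safe, read-only view of the platform encoding. On Unix these are
        // the raw argument bytes; on Windows they equal the UTF-8 bytes for
        // valid Unicode and are an unspecified encoding otherwise.
        line.extend_from_slice(arg.as_encoded_bytes());
    }
    line.push(b'\n');
    line
}

fn main() {
    let args = vec![OsString::from("hello"), OsString::from("world")];
    assert_eq!(build_line(&args), b"hello world\n");
}
```

Joining with the ASCII space is one of the UTF-8 operations that stays well-defined across platforms, which is what makes this approach safe to use without any `unsafe` block.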
Does `as_encoded_bytes` have defined behavior in production, at least for …?
It has defined cross-platform behaviour only for the UTF-8 subset. You can split on UTF-8, and you can join with UTF-8; anything else is platform-specific. On Unix it's fine to use the bytes directly. On Windows the encoding isn't guaranteed, except that it's a superset of UTF-8.
Production builds are where most issues will show up, since they're optimized for performance and the optimizer is free to assume UB never happens.
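Splitting on a UTF-8 separator and turning the pieces back into `OsStr`s is the documented cross-platform pattern. A minimal sketch, assuming Rust 1.74+; `split_at_eq` is a hypothetical helper for illustration (not something `yes` needs):

```rust
use std::ffi::OsStr;

/// Hypothetical helper: splits an argument such as `key=value` at the
/// first `=`. Returns None if there is no `=`.
fn split_at_eq(arg: &OsStr) -> Option<(&OsStr, &OsStr)> {
    let bytes = arg.as_encoded_bytes();
    // The encoding is a self-synchronizing superset of UTF-8, so the byte
    // b'=' can only ever be the character '=' itself.
    let pos = bytes.iter().position(|&b| b == b'=')?;
    // SAFETY: both slices come from `arg` and are split immediately before
    // and after the valid, non-empty UTF-8 substring "=", which the contract
    // of `from_encoded_bytes_unchecked` permits.
    let (key, value) = unsafe {
        (
            OsStr::from_encoded_bytes_unchecked(&bytes[..pos]),
            OsStr::from_encoded_bytes_unchecked(&bytes[pos + 1..]),
        )
    };
    Some((key, value))
}

fn main() {
    let (key, value) = split_at_eq(OsStr::new("name=yes")).unwrap();
    assert_eq!(key, "name");
    assert_eq!(value, "yes");
    assert!(split_at_eq(OsStr::new("no-separator")).is_none());
}
```

Cutting the bytes anywhere else (for example in the middle of a non-UTF-8 sequence) and passing them to `from_encoded_bytes_unchecked` is exactly the kind of platform-specific operation that can be UB on Windows.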
We only need `yes > binary`. But that's impossible to test, since `yes` doesn't take a file as input.
I think the byte 0xFF never appears in UTF-8 encoding: https://manpages.ubuntu.com/manpages/stonking/man7/utf-8.7.html
But I can't pass it as an argument.
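On Unix, `std::process::Command` accepts any `OsStr` as an argument, so a raw 0xFF byte can be passed without ever going through a `&str`. A minimal sketch, assuming a Unix target; `yes_with_ff_arg` is a hypothetical helper, and the command is only built, never spawned (`yes` would run forever without something like `head` reading its output):

```rust
use std::ffi::OsStr;
// Unix-only extension trait: OsStr <-> raw bytes.
use std::os::unix::ffi::OsStrExt;
use std::process::Command;

/// Hypothetical helper: a `yes` invocation whose only argument is 0xFF.
fn yes_with_ff_arg() -> Command {
    let mut cmd = Command::new("yes");
    cmd.arg(OsStr::from_bytes(b"\xff"));
    cmd
}

fn main() {
    // A lone 0xFF byte is rejected as UTF-8...
    assert!(std::str::from_utf8(b"\xff").is_err());
    // ...yet it is stored as a command argument unchanged.
    let cmd = yes_with_ff_arg();
    let args: Vec<&OsStr> = cmd.get_args().collect();
    assert_eq!(args.len(), 1);
    assert_eq!(args[0].as_bytes(), b"\xff");
}
```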
Closes #12236