Conversation

@arichardson
Contributor

@arichardson arichardson commented Nov 13, 2024

Instead of maintaining a per-OS list of targets where char is unsigned, follow the logic Clang uses and set the value based on architecture, with a special case for Darwin and Windows operating systems. This makes it easier to support new operating systems targeting Arm/AArch64 without having to modify this config statement for each new OS. The new list does not quite match Clang since I noticed a few bugs in the Clang implementation (llvm/llvm-project#115957).

Fixes #129945
Closes #131319
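
The decision described above can be modelled as a small runtime function. This is only a sketch to illustrate the shape of the logic: the function name, its string parameters, and the architecture list are assumptions made for illustration and are not the list used in this PR.

```rust
/// Hypothetical model of the selection logic: signedness is chosen by
/// architecture, except that Apple and Windows targets always use signed char.
/// The architecture list below is illustrative, NOT the list from the PR.
pub fn c_char_is_unsigned(arch: &str, os: &str, vendor: &str) -> bool {
    // Special case: these platforms use signed char on every architecture.
    if os == "windows" || vendor == "apple" {
        return false;
    }
    // Otherwise the architecture alone decides.
    matches!(
        arch,
        "aarch64" | "arm" | "powerpc" | "powerpc64" | "riscv32" | "riscv64" | "s390x"
    )
}
```

With this shape, a new OS on AArch64 gets unsigned char without any per-OS entry, which is the maintenance benefit the description refers to.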

@rustbot
Collaborator

rustbot commented Nov 13, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, to keep review lag to a minimum, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review or S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 13, 2024
@arichardson
Contributor Author

r? @maurer @joshtriplett

@rustbot
Collaborator

rustbot commented Nov 13, 2024

Failed to set assignee to maurer: invalid assignee

Note: Only org members with at least the repository "read" role, users with write permissions, or people who have commented on the PR may be assigned.

@beetrees
Contributor

I think r? @joshtriplett should work.

@rustbot rustbot assigned joshtriplett and unassigned cuviper Nov 13, 2024
Contributor

@maurer maurer left a comment

Some OSes are no longer conditioned on. For example, os=nto with arch=aarch64 and os=l4re with arch=x86_64 are missing.

I didn't manually verify everything, but at least l4re on x86_64 was flipped from u8, which appears to have been set manually, to i8.

Can we get a summary of what changed in terms of supported target triples? I would expect disagreements with clang to be rare, but this patch looks like it accidentally changes obscure targets at a minimum. It might also be clearer what changed if we didn't have the 4-branch if statement for 2 results.

@maurer
Contributor

maurer commented Nov 13, 2024

I'm pretty sure this still changes l4re-x86_64 from unsigned to signed.

@beetrees
Contributor

As Apple and Windows targets always use signed char, you could simplify the cfg statement further to cfg(all(not(any(windows, target_vendor = "apple")), any(/* target_arch/target_os that use unsigned char */)))
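
A minimal sketch of what that cfg shape could look like. The alias name `CCharSketch`, the helper function, and the architecture list are assumptions for illustration only; the real list in the standard library is longer and may differ.

```rust
// Hypothetical sketch of the suggested cfg structure. The architecture list
// is illustrative and is NOT the list used in the standard library.
#[cfg(all(
    not(any(windows, target_vendor = "apple")),
    any(
        target_arch = "aarch64",
        target_arch = "arm",
        target_arch = "powerpc",
        target_arch = "powerpc64",
        target_arch = "riscv32",
        target_arch = "riscv64",
        target_arch = "s390x",
    )
))]
pub type CCharSketch = u8;

// Everything else, including all Windows and Apple targets, is signed.
#[cfg(not(all(
    not(any(windows, target_vendor = "apple")),
    any(
        target_arch = "aarch64",
        target_arch = "arm",
        target_arch = "powerpc",
        target_arch = "powerpc64",
        target_arch = "riscv32",
        target_arch = "riscv64",
        target_arch = "s390x",
    )
)))]
pub type CCharSketch = i8;

/// Reports which branch was selected for the target being compiled.
pub fn sketch_is_unsigned() -> bool {
    CCharSketch::MIN == 0
}
```

Putting the Windows/Apple exclusion on the outside means there are only two branches for the two possible results, which also addresses the earlier remark about the 4-branch if statement.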

@arichardson
Contributor Author

arichardson commented Nov 13, 2024

> I'm pretty sure this still changes l4re-x86_64 from unsigned to signed.

I'm trying to figure out whether that value is intentional. It dates back to 2cf0a4a, but I can't see anything in L4RE that changes the default. Maybe this was actually intended to set it for aarch64?

@arichardson
Contributor Author

Looks like l4re should be setting this for all architectures. I spotted -funsigned-char in https://github.com/kernkonzept/mk/blob/926afa93e32e64dbdb33cf9ae724924ee1fb16e0/tool/kconfig/Makefile#L550, which suggests it applies to all targets. Will update to handle that.

@arichardson arichardson force-pushed the ffi-c-char branch 2 times, most recently from c19462d to 4d8abde Compare November 13, 2024 19:39
@joshtriplett
Member

r? libs-api

@rustbot rustbot added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Nov 14, 2024
@rustbot rustbot assigned BurntSushi and unassigned joshtriplett Nov 14, 2024
@arichardson arichardson requested a review from maurer November 14, 2024 00:44
@tgross35
Contributor

tgross35 commented Nov 15, 2024

See also an open issue about this, #129945, and an open PR that this would supersede, #131319 (cc @taiki-e).

I also wrote an issue requesting that we test our types against Clang at some point: #133058.
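
As a rough idea of what the Rust half of such a check could look like, here is a small sketch. The function names are hypothetical, and the expected value would have to come from the C compiler for the same target (for example from whether it predefines `__CHAR_UNSIGNED__`); obtaining that value is not shown here.

```rust
use core::ffi::c_char;

/// True when this target's `c_char` is an unsigned 8-bit type.
pub fn rust_c_char_is_unsigned() -> bool {
    c_char::MIN == 0
}

/// Hypothetical check: compares Rust's definition against the signedness
/// reported by a C compiler for the same target (supplied by the caller).
pub fn c_char_matches(c_compiler_says_unsigned: bool) -> bool {
    rust_c_char_is_unsigned() == c_compiler_says_unsigned
}
```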

@arichardson
Contributor Author

I have validated the new c_char definitions against my build of clang that has fixes for MSP430, Xtensa (although printing the builtin defines fails with error: unknown target triple 'xtensa-unknown-none-elf'), and CSKY (which fails with version 'abiv2' in target triple 'csky-unknown-linux-gnuabiv2' is invalid).

Unlike #131319 this PR does not have links to all the ABI docs, but it does have a correct definition for l4re (which clang does not support and treats as unknown OS).

@taiki-e
Member

taiki-e commented Nov 19, 2024

As for L4Re:
I think the L4Re code you linked is the configuration used when building the kernel. IIUC, userland uses signed for x86_64 and unsigned for aarch64 by default (i.e., the same as the ELF ABI's default).

$ ./toolchain-l4re-x86_64-gcc-14/bin/x86_64-l4re-gcc -E -dM -x c /dev/null | grep -F __CHAR_
#define __CHAR_BIT__ 8

$ ./toolchain-l4re-arm64-gcc-14/bin/aarch64-l4re-gcc -E -dM -x c /dev/null | grep -F __CHAR_
#define __CHAR_BIT__ 8
#define __CHAR_UNSIGNED__ 1

({x86_64,aarch64}-l4re-gcc are from https://l4re.org/download/snapshots/toolchain/.)
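
The same check can be automated by scanning the macro dump for `__CHAR_UNSIGNED__`. The helper below is a hypothetical sketch (the function name is made up for illustration); it only inspects text that has already been captured from a command like the ones above.

```rust
/// Returns true if a `-E -dM` macro dump contains a `#define` for
/// `__CHAR_UNSIGNED__`, i.e. plain `char` is unsigned for that toolchain.
pub fn dump_says_char_unsigned(dump: &str) -> bool {
    dump.lines().any(|line| {
        let mut parts = line.split_whitespace();
        // Match the first two tokens exactly so that other macros such as
        // `__CHAR_BIT__` are not mistaken for the one we want.
        parts.next() == Some("#define") && parts.next() == Some("__CHAR_UNSIGNED__")
    })
}
```

Applied to the two outputs above, the x86_64 dump has no such define (signed) and the aarch64 dump does (unsigned).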


As for Xtensa:
Section 2.17.1 "Data Types and Alignment" of Xtensa LX Microprocessor Overview handbook https://loboris.eu/ESP32/Xtensa_lx%20Overview%20handbook.pdf says "char type is unsigned by default".

However, it appears that Xtensa has multiple ABIs. (Section 1.4 "Calling convention" of Overview of Xtensa ISA https://dl.espressif.com/github_assets/espressif/xtensa-isa-doc/releases/download/latest/Xtensa.pdf)
(I have not checked whether the defaults are consistent among those ABIs or not -- this is why my PR has done nothing about Xtensa.)

@arichardson
Contributor Author

Rebased and fixed the l4re issue.

> As for L4Re: I think the L4Re code you linked is the configuration used when building the kernel. IIUC, userland uses signed for x86_64 and unsigned for aarch64 by default (i.e., the same as the ELF ABI's default).
>
> $ ./toolchain-l4re-x86_64-gcc-14/bin/x86_64-l4re-gcc -E -dM -x c /dev/null | grep -F __CHAR_
> #define __CHAR_BIT__ 8
>
> $ ./toolchain-l4re-arm64-gcc-14/bin/aarch64-l4re-gcc -E -dM -x c /dev/null | grep -F __CHAR_
> #define __CHAR_BIT__ 8
> #define __CHAR_UNSIGNED__ 1
>
> ({x86_64,aarch64}-l4re-gcc are from https://l4re.org/download/snapshots/toolchain/.)

Thanks for checking this, I've updated the change to remove l4re from the list. I've done this as a separate commit in case this needs to be reverted in the future.

> As for Xtensa: Section 2.17.1 "Data Types and Alignment" of Xtensa LX Microprocessor Overview handbook https://loboris.eu/ESP32/Xtensa_lx%20Overview%20handbook.pdf says "char type is unsigned by default".
>
> However, it appears that Xtensa has multiple ABIs. (Section 1.4 "Calling convention" of Overview of Xtensa ISA https://dl.espressif.com/github_assets/espressif/xtensa-isa-doc/releases/download/latest/Xtensa.pdf) (I have not checked whether the defaults are consistent among those ABIs or not -- this is why my PR has done nothing about Xtensa.)

Regarding Xtensa, my Clang change to make it unsigned by default was approved by Xtensa developers and has been merged now: llvm/llvm-project#115967.

Labels

  • relnotes: Marks issues that should be documented in the release notes of the next release.
  • S-waiting-on-review: Status: Awaiting review from the assignee but also interested parties.
  • T-libs: Relevant to the library team, which will review and decide on the PR/issue.
  • T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.

Development

Successfully merging this pull request may close these issues.

c_char signedness doesn't match with Clang's default on various no-std and tier 3 targets