```go
//
//go:linkname CompressCtx ZSTD_compressCCtx
//go:noescape
func CompressCtx(ctx *C.ZSTD_CCtx, dst unsafe.Pointer, dc C.size_t, src unsafe.Pointer, sc C.size_t, level C.int) C.size_t
```
Interesting idea 😅.
I haven't verified this, but if this completely bypasses the runtime's cgo machinery, I could imagine a number of bad things happening:
- Long GC STW pauses because the runtime is unable to preempt goroutines that are currently in a zstd call.
- The runtime incorrectly trying to do async preemption on a goroutine running C code, which might cause data corruption, crashes, etc.
- All Ps (GOMAXPROCS) being tied up by goroutines calling zstd, causing significant latency for other goroutines.
Generally speaking, I'd say there is no presumption of innocence here. This code is likely guilty until we can prove otherwise 🙈.
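For contrast, ordinary Go code can be interrupted by the runtime even in a tight loop: since Go 1.14 the runtime preempts long-running goroutines asynchronously using signals, which is exactly what becomes unsafe when C code runs on a goroutine without the cgo transition. A minimal sketch, assuming Go 1.19+ (for `atomic.Bool`), a platform that supports async preemption, and that it isn't disabled via `GODEBUG=asyncpreemptoff=1`; `spinAndCollect` is a hypothetical helper name. With a single P, the main goroutine still wakes from its sleep and a full GC cycle, including its stop-the-world phases, completes while another goroutine spins.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spinAndCollect occupies the only P with a spinning goroutine and reports
// whether a full GC cycle still completed on the main goroutine.
func spinAndCollect() bool {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	defer stop.Store(true) // let the spinner exit when we return

	go func() {
		// Tight loop that never blocks or yields voluntarily.
		for !stop.Load() {
		}
	}()

	// The spinner takes the only P while we sleep; with GOMAXPROCS(1) the
	// main goroutine can only resume after the runtime preempts the spinner.
	time.Sleep(20 * time.Millisecond)

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	runtime.GC() // its stop-the-world phases must interrupt the spinner
	runtime.ReadMemStats(&after)
	return after.NumGC > before.NumGC
}

func main() {
	fmt.Println("GC completed while spinning:", spinAndCollect())
}
```

If the spinner were instead running C code on its goroutine stack, the runtime could neither safely preempt it nor wait for it cheaply, which is where the STW and preemption concerns above come from.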
Thanks for these specific concerns! I'd definitely like to understand this more - obviously CGO exists for a reason 😄
It's not specific to cgo, but this talk explains the P-M-G abstraction in the scheduler and why/how non-Go function calls (syscalls and cgo calls, which are treated mostly the same) require special treatment: https://youtu.be/-K11rY57K7k?t=980&si=wFEQWUPe3NayCjBh
tl;dr: a syscall (or cgo call) might block off-CPU (network, sleep, etc.), so if many goroutines made such calls while holding on to their Ps, a lot of CPU would be wasted. Therefore cgo calls and syscalls get special treatment from the runtime so that they don't end up blocking other goroutines from executing.
I can try to explain this a bit more when I find time, but this might be a good starting point for finding articles on the subject.
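The special treatment described above can be observed without a C compiler by using a blocking syscall, which, as noted, the runtime treats mostly the same as a cgo call. A minimal sketch, assuming a Unix-like OS (it uses the raw `syscall.Pipe`, `syscall.Read`, and `syscall.Write` calls); `blockedReaderDoesNotStall` is a hypothetical helper name. With `GOMAXPROCS(1)`, one goroutine sits in a blocking `read(2)` on a raw pipe fd, yet the main goroutine can still run and write the byte that unblocks it, because the runtime hands the only P to another thread while the reader's thread is in the kernel.

```go
//go:build unix

package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// blockedReaderDoesNotStall parks one goroutine in a blocking read(2) while
// GOMAXPROCS is 1 and checks that the main goroutine can still run.
func blockedReaderDoesNotStall() (int, byte) {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	fds := make([]int, 2)
	if err := syscall.Pipe(fds); err != nil {
		panic(err)
	}
	defer syscall.Close(fds[0])
	defer syscall.Close(fds[1])

	started := make(chan struct{})
	got := make(chan byte)
	go func() {
		buf := make([]byte, 1)
		close(started)
		// Blocking read(2) on a raw fd: this goroutine's OS thread waits in
		// the kernel until something is written to the pipe.
		if _, err := syscall.Read(fds[0], buf); err != nil {
			panic(err)
		}
		got <- buf[0]
	}()
	<-started

	// With a single P, the main goroutine normally only gets here because the
	// runtime handed that P to another thread while the reader was blocked.
	sum := 0
	for i := 1; i <= 1000; i++ {
		sum += i
	}
	if _, err := syscall.Write(fds[1], []byte{42}); err != nil {
		panic(err)
	}
	return sum, <-got
}

func main() {
	sum, b := blockedReaderDoesNotStall()
	fmt.Println(sum, b) // 500500 42
}
```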
Just remembered that I had this bookmarked which you'll probably find interesting: https://words.filippo.io/rustgo/
I also realized I forgot to mention one of the most important issues (that might explain the crashes you get in CI) 😅:
C requires large immovable stacks. When using cgo, the Go runtime takes care of switching from a goroutine stack (which is small and movable) to a system stack (which is large and immovable).
Without this switch, you're running C code on a goroutine stack, which will lead to stack overflows and other nasty problems.
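A small experiment shows why a goroutine stack is a poor place to run C: the runtime grows a goroutine's stack by copying it to a new, larger allocation, so addresses of stack variables change. A minimal sketch; the helper names `grow` and `stackMoved` are hypothetical, `unsafe` is used only to read an address (never to dereference it), and the exact initial stack size and growth policy are runtime implementation details. C code holding raw pointers into its own stack would be broken by such a move.

```go
package main

import (
	"fmt"
	"unsafe"
)

// grow recurses n times; each frame carries a small local array so the total
// stack use is far larger than a goroutine's initial stack.
//
//go:noinline
func grow(n int) int {
	var pad [64]byte
	pad[n%len(pad)] = byte(n)
	if n == 0 {
		return int(pad[0])
	}
	return grow(n-1) + int(pad[n%len(pad)])
}

// stackMoved records the address of a stack variable in a fresh goroutine,
// forces that goroutine's stack to grow, and reports whether the variable's
// address changed, i.e. whether the stack was copied elsewhere.
func stackMoved() bool {
	result := make(chan bool)
	go func() {
		var x int
		before := uintptr(unsafe.Pointer(&x))
		_ = grow(100_000)
		after := uintptr(unsafe.Pointer(&x))
		result <- before != after
	}()
	return <-result
}

func main() {
	fmt.Println("goroutine stack moved:", stackMoved())
}
```

The experiment runs in a fresh goroutine so that it starts from a small stack every time it is called.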
So yeah, I think this is likely a dead-end 😞
> I haven't verified this, but if this completely bypasses the runtime's cgo machinery
In addition to the issues you already mentioned, this also bypasses the ABI translation. Go has an internal calling convention that is different from the C calling convention, and it is not guaranteed to be stable across releases.
I can cook up examples that break on my laptop: e.g. write a C function that takes 9 parameters and call it this way. On my arm64 laptop, the 9th argument is junk because the C and Go ABIs diverge in how they pass arguments after the 8th one.
```go
package zstd

/*
#cgo CFLAGS: -O3
```
👍 If we aren't already compiling (our external?) zstd library with -O3, this is probably a good idea. But I don't know if that's the case.
Testing some ideas to reduce the overhead of the Go wrapper:
It looks like linkname may not be reliable (panics in some tests?), and -O3 may not be relevant when linking against an external libzstd?
Benchmark results, M1 Max (MacBook Pro): without -O3 and with -O3