Fix array compound literal parsing #309
Implement proper array compound literal handling that emits element writes, counts initializers, and returns the temporary array pointer instead of collapsing to the first element. This restores correct pointer semantics and avoids discarding array literals during parsing. Struct and scalar compound literals are unchanged. The parser now tracks whether the closing brace was already consumed by the array helper to prevent double reads.
Introduce helpers to centralize array-literal scalar decay. Temporary array compound literals are detected and converted to a scalar only when a scalar is actually required; otherwise the expression yields the temporary array’s address, preserving pointer semantics. Update binary ops, direct/compound assignments, function-call arguments, and ternary results to use the unified helper instead of ad-hoc collapsing. This fixes cases where array literals were reduced to their first element in pointer contexts, while keeping struct and plain scalar behavior unchanged. Addresses sysprog21#299 (array compound literals).
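To illustrate the pointer semantics this change is meant to restore, here is a minimal sketch in standard C. It is not taken from shecc's test suite, and `sum_first`, `pick`, and `third_element` are hypothetical names. In each context the array literal should behave as the address of a temporary array rather than as its first element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n elements through a pointer parameter, so an
 * array literal passed as an argument must arrive as an address. */
int sum_first(const int *p, size_t n)
{
    assert(p != NULL);
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Hypothetical helper: array literals used as ternary results still decay
 * to pointers, so the selected array can be indexed afterwards. */
int pick(int use_first, size_t idx)
{
    assert(idx < 3);
    int *p = use_first ? (int[]){1, 2, 3} : (int[]){4, 5, 6};
    return p[idx];
}

/* Direct assignment to a pointer, then a read past element 0. */
int third_element(void)
{
    int *a = (int[]){10, 20, 30};
    return a[2];
}
```

For example, `sum_first((int[]){1, 2, 3, 4, 5}, 5)` only produces the full sum if the literal is passed as a pointer; a literal collapsed to its first element would hand the callee the value 1 where an address is expected.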
        }
        lex_expect(T_close_curly);
        var->array_size = count;
    }
Add a new blank line.
DrXiao
left a comment
Add some test cases to the test suite for validation.
jserv
left a comment
Don't use the static qualifier, since shecc does not support it. Check 'COMPLIANCE.md' carefully.
You MUST ensure bootstrapping is fully functional before submitting pull requests.
IIUC, compound literals are a feature supported only since C99, and the shecc README has stated from the very beginning that this project aims to support ANSI C. Therefore, IMO, this at least does not "fix" anything.
As far as I know, shecc is planned to fully support the C99 standard, so I think the term "fix" is acceptable.
Fine, but since we haven't claimed to fully support C99 and, IIUC, array compound literals were never supported before, this seems more like adding a new feature than fixing an existing problem.
In fact, shecc already has the ability to handle array compound literals, but it only captures the first element (#299). This pull request specifically aims to fix that.
Thanks, that resolves my doubt.
jserv
left a comment
Don't add tests/array_ptr.c. Instead, consolidate tests/driver.sh.
visitorckw
left a comment
Please rebase your branch to keep the git history clean.
Commits that fix problems introduced within the same pull request should be avoided.
        add_insn(parent, *bb, OP_read, scalar, array_var, NULL, elem_size, NULL);
        return scalar;
    }
Add a line break.
bf5f3d0 to b88294c
Looking at the changes, the core logic for handling array compound literals looks solid; good job centralizing the decay behavior to avoid ad-hoc fixes everywhere. But yeah, there are some style inconsistencies and comment opportunities that could make this even tighter. Here's what I'd suggest, focused on Style Fixes for
b88294c to 0de5d30
0de5d30 to 08f1e29
I defer to @ChAoSUnItY for confirmation.
            fatal("Unsupported truncation operation with invalid target size");
        }
        return;
    case OP_sign_ext: {
Why remove the curly braces when this case branch contains a variable declaration?
I didn’t really mean to drop those braces; they were left over from trying other tweaks. But it’s fine either way, because C99 lets us declare source_size right after the case label without changing behavior.
Yes, C99 allows this. However, in our compiler's standard workflow, stage 0 is compiled by GCC with the -Wpedantic option, so this makes the build emit a warning even though the change is harmless in this case.
I reverted it to the original order in my latest commit
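For context on the braces discussed in this thread: as far as I know, C standards before C23 define a `case` label as a prefix to a statement, and a declaration is not a statement, so GCC reports a declaration placed directly after the label under -Wpedantic. Wrapping the branch in braces avoids that. A minimal sketch, using a hypothetical `ext_kind` enum and `sign_extend` function rather than shecc's actual codegen:

```c
#include <assert.h>

/* Hypothetical stand-ins for the opcodes handled in the real switch. */
enum ext_kind { EXT_NONE, EXT_SIGN };

/* Sign-extends the low `source_size` bytes of `value`. */
int sign_extend(enum ext_kind kind, int value, int source_size)
{
    assert(source_size == 1 || source_size == 2);
    switch (kind) {
    case EXT_SIGN: {
        /* The braces make this branch a compound statement, so the
         * declarations below do not directly follow the case label. */
        int bits = source_size * 8;
        int sign_bit = 1 << (bits - 1);
        int low = value & ((1 << bits) - 1);
        return (low ^ sign_bit) - sign_bit;
    }
    default:
        return value;
    }
}
```

Dropping the braces around the `EXT_SIGN` branch would leave `int bits` as the first thing after the label, which is the pattern the pedantic build complains about.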
    var_t *expr_result = opstack_pop();

    /* Handle array compound literal to scalar assignment.
     * When assigning array compound literals to scalar
     * variables, use the first element value rather than array
     * address.
     */
    if (expr_result && expr_result->array_size > 0 &&
        !var->ptr_level && var->array_size == 0 && var->type &&
        (var->type->base_type == TYPE_int ||
         var->type->base_type == TYPE_short) &&
        expr_result->var_name[0] == '.') {
        var_t *first_elem = require_var(parent);
        first_elem->type = var->type;
        gen_name_to(first_elem->var_name);

        /* Extract first element from compound literal array */
        add_insn(parent, bb, OP_read, first_elem, expr_result,
                 NULL, var->type->size, NULL);
        expr_result = first_elem;
    }
    var_t *rhs = expr_result;
    if (!var->ptr_level && var->array_size == 0)
How about:
    var_t *rhs = opstack_pop();
Squashed together in the latest commit!
Fix IR test failure. Regenerate ARM IR snapshots to reflect the corrected lowering. Restore ARM/RISC-V codegen files to original structure after earlier brace experiment. Treat zero-length array compound literals as constant zero during lowering so scalar uses don't load garbage. Fix coding style for parser.c according to cubic's suggestion. Ignore comment modification in tests/driver.sh
08f1e29 to b13f5b6
1 issue found across 7 files
Prompt for AI agents (all 1 issues)
Understand the root cause of the following 1 issues and fix them.
<file name="src/parser.c">
<violation number="1" location="src/parser.c:1660">
Variadic call arguments shouldn’t be scalarized—array compound literals must still decay to pointers when passed through `...`, otherwise the callee receives the first element value instead of the array address and pointer-based code crashes.</violation>
</file>
        param = scalarize_array_literal(parent, bb, param,
                                        target->type);
    } else if (func->va_args) {
        param = scalarize_array_literal(parent, bb, param, TY_int);
Variadic call arguments shouldn’t be scalarized—array compound literals must still decay to pointers when passed through ..., otherwise the callee receives the first element value instead of the array address and pointer-based code crashes.
2 issues found across 7 files
Prompt for AI agents (all 2 issues)
Understand the root cause of the following 2 issues and fix them.
<file name="src/parser.c">
<violation number="1" location="src/parser.c:1344">
`scalarize_array_literal` reads using the destination type’s width, so assigning a `(short[])` literal to an `int` pulls 4 bytes, mixing multiple elements and producing incorrect values. The read width must match the literal’s element size.</violation>
<violation number="2" location="src/parser.c:2914">
Array literals are scalarized whenever the other operand isn’t pointer-like, so `(int[]){…} + 1` yields a scalar sum instead of pointer arithmetic. Pointer operators need the literal to remain pointer-like; don’t collapse it solely because the opposing operand is an integer.</violation>
</file>
    bool rs1_is_ptr_like = rs1 && (rs1->ptr_level || rs1->array_size);
    bool rs2_is_ptr_like = rs2 && (rs2->ptr_level || rs2->array_size);

    if (is_array_literal_placeholder(rs1) && !rs2_is_ptr_like)
Array literals are scalarized whenever the other operand isn’t pointer-like, so (int[]){…} + 1 yields a scalar sum instead of pointer arithmetic. Pointer operators need the literal to remain pointer-like; don’t collapse it solely because the opposing operand is an integer.
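A minimal sketch of the difference in standard C, with hypothetical function names: `element_at` shows the pointer arithmetic that `(int[]){…} + 1` should perform, while `scalarized_sum` shows what an incorrectly collapsed literal would compute instead.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer arithmetic on the decayed address: steps `offset` elements into
 * the temporary array and reads the element there. */
int element_at(size_t offset)
{
    assert(offset < 3);
    return *((int[]){10, 20, 30} + offset);
}

/* What a scalarized literal would yield: the first element plus the
 * integer operand, which is a different value. */
int scalarized_sum(int offset)
{
    int first = ((int[]){10, 20, 30})[0];
    return first + offset;
}
```

With an offset of 1, the pointer form reads the second element, whereas the scalarized form adds 1 to the first element.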
    if (!is_array_literal_placeholder(array_var))
        return array_var;

    type_t *elem_type = hint_type ? hint_type : array_var->type;
scalarize_array_literal reads using the destination type’s width, so assigning a (short[]) literal to an int pulls 4 bytes, mixing multiple elements and producing incorrect values. The read width must match the literal’s element size.
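The width mismatch can be reproduced outside the compiler. This is a minimal sketch in standard C with hypothetical function names; it uses `memcpy` to model a load of a given width from the start of a `short` array, and assumes the array holds at least `sizeof(int)` bytes.

```c
#include <assert.h>
#include <string.h>

/* Reads the first element using the literal's own element width. */
int read_with_element_width(const short *arr)
{
    assert(arr != NULL);
    short first;
    memcpy(&first, arr, sizeof(first));
    return first;
}

/* Reads sizeof(int) bytes from the same address, the way a load sized by
 * the destination type would. Where int is wider than short, this also
 * pulls in bytes belonging to the following element(s). */
int read_with_destination_width(const short *arr)
{
    assert(arr != NULL);
    int wide;
    memcpy(&wide, arr, sizeof(wide));
    return wide;
}
```

For `(short[]){1, 2, 3, 4}`, the element-width read returns 1. On a platform where `int` is wider than `short`, the destination-width read returns a value that mixes in the neighboring element; the exact number depends on byte order.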
Summary
Fix a segfault when evaluating array compound literals like (int[]){1,2,3,4,5} in expression context. The literal now yields the temporary array’s address (via decay) instead of collapsing to a single element, so indexing and reads work correctly.
Motivation
Previously, code like the snippet below crashed because the array literal was reduced to a scalar and later treated as a pointer.
Reproduction (manual)
Before: segfault
After: prints a[0..4] and Sum = 6 as expected
Approach (high level)
(type[]){…} produces a real temporary array object and the expression value decays to its address.
Scope
Tests
Compatibility
Issue
Summary by cubic
Fix parsing of array compound literals so they allocate a temporary array and decay to its address, preserving pointer semantics and preventing segfaults.
Bug Fixes
Refactors
Written for commit b13f5b6. Summary will update automatically on new commits.