Ambiguity in `ref` grammar

### Summary of issue:


We currently use the `ref` token as an introducer for three language constructs with very different semantics:
1. `ref` binding patterns, as in `(ref x: i32, ref y: i32)`.
2. Declaring reference components of a function return form, such as `fn F() -> (ref X, ref Y);`
3. Marking arguments to `ref` parameters, such as `F(ref x, ref y)`.

These aren't ambiguous in an absolute sense, but disambiguating them seems to require 3 tokens of lookahead (vs. the current maximum of 2), and may require unbounded lookahead under arguably-plausible future evolution.

### Details:


We currently use the `ref` token as an introducer for three semantically different language constructs:

1. A `ref` binding pattern, e.g. `ref x: i32`. This can only occur in a pattern context.
2. A `ref` return form, e.g. `-> (i32, ref i32)`. Currently, this can only appear in the right-hand side of `->` in a function declaration, but #5389 proposes to also allow it in a form literal (e.g. `form(ref i32)`). As of this writing, the proposal doesn't allow it to appear below the top level of a form literal (e.g. `form((i32, ref i32))` isn't allowed), but I'm not sure how durable that restriction will be.
3. A `ref` argument tag, e.g. `F(x, ref y)`. At present, it can only appear as an element of a function call's explicit argument list (or an element of a tuple or struct literal in such a position, recursively, as with (2)), and for semantic reasons it currently can't appear in a valid return type expression, but at the [10-30 open discussion](https://docs.google.com/document/d/1Yt-i5AmF76LSvD4TrWRIAE_92kii6j5yFiW-S7ahzlg/edit?tab=t.0#heading=h.dnsogn9b2goi) @zygoloid argued that we shouldn't bake either of those assumptions into our implementation strategy, and should be prepared to support code like `fn I() -> (ref a, ref b) * 3;`, where both `ref`s are argument tags for the `*` operator.

(1) can be distinguished from (2) using syntactic context, because pattern contexts and return type expressions are always marked by an introducer (if that's not the case, we may have an even bigger problem with `var`). I believe (1) can be distinguished from (3) using at most 3 tokens of lookahead, where the worst case is distinguishing e.g. the binding pattern `ref each x: i32` from the expression pattern `ref each x`. I don't think that's a problem for the toolchain, but it could be a problem for tools implemented using a parser generator with one token of lookahead (such as yacc and its descendants).

As best I can tell, (2) and (3) are not formally ambiguous, but if we want code like `fn I() -> (ref a, ref b) * 3;` to parse, I don't think they can be distinguished with any fixed amount of lookahead. For example, disambiguating code like `fn F() -> (((ref x,),),) * 2` requires lookahead proportional to the number of parentheses.

I see a couple of options that only resolve the (1)/(3) ambiguity:
- Accept 3 tokens of lookahead in the toolchain, and hope that yacc-based tools can muddle through by tweaking the structure of the grammar.
- Lex an each-name "`each` _identifier_" as a single token, so that we only need 2 tokens of lookahead.
- Require expression patterns to have an introducer (perhaps `==` as a unary prefix?). This would probably have substantial knock-on benefits for the implementation of the toolchain (and probably other tools).

I also see a couple of options that only resolve the (2)/(3) ambiguity:
- Disallow (3) in return type expressions, and/or in the operands of an operator. These are both the status quo for the language design, so we would just be affirming that the toolchain implementation can proceed on the assumption that one or both of them will not change.
- Accept the ambiguity. For the toolchain, that would mean representing (2) and (3) with the same kind of parse node, and disambiguating during checking. This would probably require some kind of two-phase checking for function return type expressions (in effect, unbounded lookahead, but during checking rather than parsing). It's possible we could rationalize this as part of the two-pass checking that we already do for patterns, since the return type expression behaves like a parameter pattern in some respects. I'm far from confident in that, though; the division of labor between the two passes is very different in the two cases.

Finally, I see one option that resolves both ambiguities simultaneously: change (3) to use a new keyword. My tentative suggestion would be `out`, e.g. `F(x, out y)`. `mut` is more appealing in some ways, but it would be misleading in cases where the corresponding parameter has a const type, so I think it would only be viable if we don't require reference-to-const arguments to be tagged. `obj` or even `addr` might also be possibilities.

### Any other information that you want to share?


_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Ambiguity in `ref` grammar #6342

Summary of issue:

Details:

Any other information that you want to share?

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Ambiguity in ref grammar #6342

Description

Summary of issue:

Details:

Any other information that you want to share?

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Ambiguity in `ref` grammar #6342