Support ${placeholder} syntax in tokenizer
#2239
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1920,6 +1920,21 @@ impl<'a> Tokenizer<'a> { | |
|
|
||
| chars.next(); | ||
|
|
||
| // Handle ${placeholder} syntax | ||
| if matches!(chars.peek(), Some('{')) { | ||
|
Contributor
wondering how this works with the
Author
I could add this as a separate dialect flag, if that makes sense? I was reluctant to introduce yet another flag and hoped we could fold it into this one, but you're right: that would mean the syntax is accepted under some dialects even though it isn't valid for them.
Contributor
I think if the syntax potentially conflicts, we might need to ensure that it doesn't, because in this case it's rather that
Author
So there is some existing behaviour, whereby The dollar-quoted string tags only allow alphanumeric characters and underscores (e.g., |
||
| chars.next(); // consume '{' | ||
| let placeholder = peeking_take_while(chars, |ch| ch != '}'); | ||
| if matches!(chars.peek(), Some('}')) { | ||
| chars.next(); // consume '}' | ||
| return Ok(Token::Placeholder(format!("${{{placeholder}}}"))); | ||
| } else { | ||
| return self.tokenizer_error( | ||
| chars.location(), | ||
| "Unterminated dollar-brace placeholder, expected '}'", | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| // If the dialect does not support dollar-quoted strings, then `$$` is rather a placeholder. | ||
| if matches!(chars.peek(), Some('$')) && !self.dialect.supports_dollar_placeholder() { | ||
| chars.next(); | ||
|
|
@@ -3218,6 +3233,36 @@ mod tests { | |
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn tokenize_dollar_brace_placeholder() { | ||
| let sql = String::from("SELECT ${name}, ${1}"); | ||
| let dialect = GenericDialect {}; | ||
| let tokens = Tokenizer::new(&dialect, &sql).tokenize().unwrap(); | ||
| assert_eq!( | ||
| tokens, | ||
| vec![ | ||
| Token::make_keyword("SELECT"), | ||
| Token::Whitespace(Whitespace::Space), | ||
| Token::Placeholder("${name}".into()), | ||
| Token::Comma, | ||
| Token::Whitespace(Whitespace::Space), | ||
| Token::Placeholder("${1}".into()), | ||
| ] | ||
| ); | ||
| } | ||
|
|
||
| #[test] | ||
| fn tokenize_dollar_brace_placeholder_unterminated() { | ||
| let sql = String::from("SELECT ${name"); | ||
| let dialect = GenericDialect {}; | ||
| let result = Tokenizer::new(&dialect, &sql).tokenize(); | ||
| assert!(result.is_err()); | ||
| let err = result.unwrap_err(); | ||
| assert!(err | ||
| .to_string() | ||
| .contains("Unterminated dollar-brace placeholder")); | ||
| } | ||
|
|
||
| #[test] | ||
| fn tokenize_nested_dollar_quoted_strings() { | ||
| let sql = String::from("SELECT $tag$dollar $nested$ string$tag$"); | ||
|
|
||
Could we include a reference to the dialect that defines this syntax, maybe here?
This is a tough one, as it's not defined by a specific dialect but is used in some of our other integrations around parametrised queries. Namely, we're using this with Perses.
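For readers following the thread, here is a minimal standalone sketch of the scanning logic in the diff, with the dialect gating discussed above added as a plain boolean. It is not the sqlparser-rs code: `DollarToken`, `scan_after_dollar`, `scan`, and the `allow_dollar_brace` switch are hypothetical names used only for illustration. It also shows why the new syntax need not collide with dollar-quoted strings, given that `{` is not a valid tag character.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Result of scanning the text that follows a `$` (illustrative type only).
#[derive(Debug, PartialEq)]
enum DollarToken {
    /// A `${...}` placeholder, stored with its delimiters, e.g. `${name}`.
    Placeholder(String),
    /// Anything else; a real tokenizer would go on to handle `$1`, `$tag$...`, etc.
    Other,
}

/// Scans what follows an already-consumed `$`.
/// `allow_dollar_brace` is a hypothetical per-dialect switch, not an existing flag.
fn scan_after_dollar(
    chars: &mut Peekable<Chars<'_>>,
    allow_dollar_brace: bool,
) -> Result<DollarToken, String> {
    // Only `{` directly after `$` starts a placeholder; nothing is consumed otherwise.
    if !allow_dollar_brace || chars.peek() != Some(&'{') {
        return Ok(DollarToken::Other);
    }
    chars.next(); // consume '{'

    // Collect everything up to, but not including, the closing '}'.
    let mut name = String::new();
    while let Some(&ch) = chars.peek() {
        if ch == '}' {
            break;
        }
        name.push(ch);
        chars.next();
    }

    // Either the next character is '}' or the input ended early.
    match chars.next() {
        Some('}') => Ok(DollarToken::Placeholder(format!("${{{name}}}"))),
        _ => Err("Unterminated dollar-brace placeholder, expected '}'".to_string()),
    }
}

/// Convenience wrapper for experimenting: expects `input` to start with `$`.
fn scan(input: &str, allow_dollar_brace: bool) -> Result<DollarToken, String> {
    let mut chars = input.chars().peekable();
    match chars.next() {
        Some('$') => scan_after_dollar(&mut chars, allow_dollar_brace),
        _ => Err("expected '$'".to_string()),
    }
}
```

With the switch on, `scan("${name}", true)` yields a placeholder and `scan("${name", true)` yields the unterminated error, while `scan("$tag$body$tag$", true)` falls through to `Other`, leaving dollar-quoted handling untouched. With the switch off, `${name}` is not treated as a placeholder at all, which is the behaviour a separate dialect flag would give.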