diff --git a/src/functions-reference/void_functions.Rmd b/src/functions-reference/void_functions.Rmd
index 78969b56e..b55d5e97f 100644
--- a/src/functions-reference/void_functions.Rmd
+++ b/src/functions-reference/void_functions.Rmd
@@ -39,8 +39,8 @@ Print the values denoted by the arguments x1 through xN on the output message stream. There are no spaces between items in the print, but a line feed (LF; Unicode U+000A; C++ literal `'\n'`) is inserted at the end of the printed line. The types `T1` through `TN` can be any of
-Stan's built-in numerical types or double quoted strings of ASCII
-characters.
+Stan's built-in numerical types or double quoted strings of characters
+(bytes).
 
 ## Reject statement
 
@@ -60,5 +60,4 @@ arguments x1 through xN on the output message stream. There are no spaces between items in the print, but a line feed (LF; Unicode U+000A; C++ literal `'\n'`) is inserted at the end of the printed line. The types `T1` through `TN` can be any of Stan's built-in
-numerical types or double quoted strings of ASCII characters.
-
+numerical types or double quoted strings of characters (bytes).
diff --git a/src/reference-manual/encoding.Rmd b/src/reference-manual/encoding.Rmd
index bcf2ed855..5ae2a6bb9 100644
--- a/src/reference-manual/encoding.Rmd
+++ b/src/reference-manual/encoding.Rmd
@@ -25,3 +25,19 @@ is convenient. Any content after a block comment open sequence in ASCII (`/*`) up to the closing block comment (`*/`) is ignored, and thus may also be written in whatever character set is convenient.
+
+## String literals
+
+The raw byte sequence within a string literal is escaped according
+to the C++ standard. In particular, this means that UTF-8 encoded
+strings are supported, however they are not tested for invalid byte
+sequences. A `print` or `reject` statement should properly display
+Unicode characters if your terminal supports the encoding used in the
+input.
In other words, Stan simply preserves any string of bytes between
+two double quotes (`"`) when passing to C++. On compliant terminals,
+this allows the use of glyphs and other characters from encodings such as
+UTF-8 that fall outside the ASCII-compatible range.
+
+ASCII is the recommended encoding for maximum portability, because it encodes
+the ASCII characters (Unicode code points 0--127) using the same sequence of
+bytes as the UTF-8 encoding of Unicode and common ISO-8859 extensions of Latin.
\ No newline at end of file
diff --git a/src/reference-manual/statements.Rmd b/src/reference-manual/statements.Rmd
index 56e965fa0..f31824a9e 100644
--- a/src/reference-manual/statements.Rmd
+++ b/src/reference-manual/statements.Rmd
@@ -211,14 +211,14 @@ statements of the forms listed in the table above. The compound form is legal whenever the corresponding long form would be legal and it has the same effect.*
- operation | compound | unfolded
-:-----------|:------------|:-------------
-addition | `x += y` | `x = x + y`
-subtraction | `x -= y` | `x = x - y`
-multiplication | `x *= y` | `x = x * y`
-division | `x /= y` | `x = x / y`
-elementwise multiplication | `x .*= y` | `x = x .* y`
-elementwise division | `x ./= y` | `x = x ./ y`
+ | operation                  | compound  | unfolded     |
+ | :------------------------- | :-------- | :----------- |
+ | addition                   | `x += y`  | `x = x + y`  |
+ | subtraction                | `x -= y`  | `x = x - y`  |
+ | multiplication             | `x *= y`  | `x = x * y`  |
+ | division                   | `x /= y`  | `x = x / y`  |
+ | elementwise multiplication | `x .*= y` | `x = x .* y` |
+ | elementwise division       | `x ./= y` | `x = x ./ y` |
 
 ## Increment log density {#increment-log-prob.section}
 
@@ -1321,16 +1321,14 @@ step, and the `generated quantities` block once per iteration. String literals begin and end with a double quote character (`"`).
 The characters between the double quote characters may be
-the space character or any visible ASCII character, with the exception
-of the backslash character (`\`) and double quote character
-(`"`). The full list of visible ASCII characters is as follows,
-
-```
-a b c d e f g h i j k l m n o p q r s t u v w x y z
-A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
-0 1 2 3 4 5 6 7 8 9 0 { } [ ] ( ) < >
-~ @ # $ ` ^ & * _ ' - + = | / ! ? . , ; :
-```
+any byte sequence, with the exception of the double quote character.
+
+The Stan interfaces preserve the byte sequences which they receive.
+The encoding of these byte sequences as characters and their rendering
+as glyphs will be handled by whatever display mechanism is being used to
+monitor Stan's output (e.g., a terminal, a Jupyter notebook, RStudio, etc.).
+Stan does not enforce a character encoding for strings, and no attempt is
+made to validate the bytes as legal ASCII, UTF-8, etc.
 
 ### Debug by `print` {-}