From c123f4dee14312061b65cde7cf71133c50820137 Mon Sep 17 00:00:00 2001
From: Brian Ward <bward@flatironinstitute.org>
Date: Fri, 27 Aug 2021 12:27:14 -0400
Subject: [PATCH 1/3] Update encoding notices for string literals

---
 src/functions-reference/void_functions.Rmd |  5 ++---
 src/reference-manual/encoding.Rmd          | 12 ++++++++++++
 src/reference-manual/statements.Rmd        | 14 ++++----------
 3 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/src/functions-reference/void_functions.Rmd b/src/functions-reference/void_functions.Rmd
index 78969b56e..f2a7eadb3 100644
--- a/src/functions-reference/void_functions.Rmd
+++ b/src/functions-reference/void_functions.Rmd
@@ -39,8 +39,7 @@ Print the values denoted by the arguments x1 through xN on the output
 message stream. There are no spaces between items in the print, but a
 line feed (LF; Unicode U+000A; C++ literal `'\n'`) is inserted at
 the end of the printed line. The types `T1` through `TN` can be any of
-Stan's built-in numerical types or double quoted strings of ASCII
-characters.
+Stan's built-in numerical types or double quoted strings of characters.
 
 ## Reject statement
 
@@ -60,5 +59,5 @@ arguments x1 through xN on the output message stream. There are no
 spaces between items in the print, but a line feed (LF; Unicode
 U+000A; C++ literal `'\n'`) is inserted at the end of the printed
 line. The types `T1` through `TN` can be any of Stan's built-in
-numerical types or double quoted strings of ASCII characters.
+numerical types or double quoted strings of characters.
 
diff --git a/src/reference-manual/encoding.Rmd b/src/reference-manual/encoding.Rmd
index bcf2ed855..4828f68dd 100644
--- a/src/reference-manual/encoding.Rmd
+++ b/src/reference-manual/encoding.Rmd
@@ -25,3 +25,15 @@ is convenient.
 Any content after a block comment open sequence in ASCII (`/*`)
 up to the closing block comment (`*/`) is ignored, and thus may
 also be written in whatever character set is convenient.
+
+## String literals
+
+String literals are escaped according to the C++ standard, 
+meaning that non-ASCII characters in a `print` or `reject` 
+statement should properly be displayed if your terminal supports
+the encoding used in the input. This has been tested with UTF-8 
+encoded characters on a compliant terminal, and may not work under 
+other conditions. 
+
+The recommended character encoding for portable code that should
+display properly on all systems is still ASCII.
\ No newline at end of file
diff --git a/src/reference-manual/statements.Rmd b/src/reference-manual/statements.Rmd
index 56e965fa0..5f2f39418 100644
--- a/src/reference-manual/statements.Rmd
+++ b/src/reference-manual/statements.Rmd
@@ -1321,17 +1321,11 @@ step, and the `generated quantities` block once per iteration.
 
 String literals begin and end with a double quote character
 (`"`).  The characters between the double quote characters may be
-the space character or any visible ASCII character, with the exception
-of the backslash character (`\`) and double quote character
-(`"`).  The full list of visible ASCII characters is as follows,
-
-```
-a b c d e f g h i j k l m n o p q r s t u v w x y z
-A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
-0 1 2 3 4 5 6 7 8 9 0 { } [ ] ( ) < >
-~ @ # $ ` ^ & * _ ' - + = | / ! ? . , ; :
-```
+any character, with the exception of the double quote character.
 
+Characters outside the ASCII character set will be escaped and 
+passed to C++ as encoded. The behavior of these strings may depend
+on your interface's encoding settings.
 
 ### Debug by `print` {-}
 

From 2f02f0a7effb8e71919fccce3bc95dddf6ceb5d2 Mon Sep 17 00:00:00 2001
From: Brian Ward <bward@flatironinstitute.org>
Date: Mon, 30 Aug 2021 10:15:41 -0400
Subject: [PATCH 2/3] Be more clear on character/byte distinction

---
 src/functions-reference/void_functions.Rmd |  6 +++---
 src/reference-manual/encoding.Rmd          | 14 +++++++------
 src/reference-manual/statements.Rmd        | 23 +++++++++++-----------
 3 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/src/functions-reference/void_functions.Rmd b/src/functions-reference/void_functions.Rmd
index f2a7eadb3..b55d5e97f 100644
--- a/src/functions-reference/void_functions.Rmd
+++ b/src/functions-reference/void_functions.Rmd
@@ -39,7 +39,8 @@ Print the values denoted by the arguments x1 through xN on the output
 message stream. There are no spaces between items in the print, but a
 line feed (LF; Unicode U+000A; C++ literal `'\n'`) is inserted at
 the end of the printed line. The types `T1` through `TN` can be any of
-Stan's built-in numerical types or double quoted strings of characters.
+Stan's built-in numerical types or double quoted strings of characters
+(bytes).
 
 ## Reject statement
 
@@ -59,5 +60,4 @@ arguments x1 through xN on the output message stream. There are no
 spaces between items in the print, but a line feed (LF; Unicode
 U+000A; C++ literal `'\n'`) is inserted at the end of the printed
 line. The types `T1` through `TN` can be any of Stan's built-in
-numerical types or double quoted strings of characters.
-
+numerical types or double quoted strings of characters (bytes).
diff --git a/src/reference-manual/encoding.Rmd b/src/reference-manual/encoding.Rmd
index 4828f68dd..6dd9d59ae 100644
--- a/src/reference-manual/encoding.Rmd
+++ b/src/reference-manual/encoding.Rmd
@@ -28,12 +28,14 @@ also be written in whatever character set is convenient.
 
 ## String literals
 
-String literals are escaped according to the C++ standard, 
-meaning that non-ASCII characters in a `print` or `reject` 
-statement should properly be displayed if your terminal supports
-the encoding used in the input. This has been tested with UTF-8 
-encoded characters on a compliant terminal, and may not work under 
-other conditions. 
+String literals are escaped according to the C++ standard.
+In particular, this means that bytes outside of the ASCII character
+range in a `print` or `reject` statement should properly be displayed
+if your terminal supports the encoding used in the input. In other
+words, Stan simply preserves any string of bytes between two double
+quotes (`"`) when passing to C++. On compliant terminals, this allows
+the use of glyphs and other characters from encodings such as UTF-8 that
+fall outside the ASCII-compatible range.
 
 The recommended character encoding for portable code that should
 display properly on all systems is still ASCII.
\ No newline at end of file
diff --git a/src/reference-manual/statements.Rmd b/src/reference-manual/statements.Rmd
index 5f2f39418..81b44c2b9 100644
--- a/src/reference-manual/statements.Rmd
+++ b/src/reference-manual/statements.Rmd
@@ -211,14 +211,14 @@ statements of the forms listed in the table above.  The compound
 form is legal whenever the corresponding long form would be legal
 and it has the same effect.*
 
- operation  |   compound  |   unfolded
-:-----------|:------------|:-------------
-addition | `x += y` | `x = x + y`
-subtraction | `x -= y` | `x = x - y`
-multiplication | `x *= y` | `x = x * y`
-division | `x /= y` | `x = x / y`
-elementwise multiplication | `x .*= y` | `x = x .* y`
-elementwise division | `x ./= y` | `x = x ./ y`
+ | operation                  | compound  | unfolded     |
+ | :------------------------- | :-------- | :----------- |
+ | addition                   | `x += y`  | `x = x + y`  |
+ | subtraction                | `x -= y`  | `x = x - y`  |
+ | multiplication             | `x *= y`  | `x = x * y`  |
+ | division                   | `x /= y`  | `x = x / y`  |
+ | elementwise multiplication | `x .*= y` | `x = x .* y` |
+ | elementwise division       | `x ./= y` | `x = x ./ y` |
 
 
 ## Increment log density {#increment-log-prob.section}
@@ -1323,9 +1323,10 @@ String literals begin and end with a double quote character
 (`"`).  The characters between the double quote characters may be
 any character, with the exception of the double quote character.
 
-Characters outside the ASCII character set will be escaped and 
-passed to C++ as encoded. The behavior of these strings may depend
-on your interface's encoding settings.
+Bytes with values greater than 127 (outside the ASCII character set) 
+appearing in string literals will be escaped and passed to C++. 
+The behavior of these strings may depend on your interface's encoding 
+settings.
 
 ### Debug by `print` {-}
 

From b00f29c2607b182f84db8823bf829f7fc57ebd51 Mon Sep 17 00:00:00 2001
From: Brian Ward <bward@flatironinstitute.org>
Date: Tue, 31 Aug 2021 11:20:45 -0400
Subject: [PATCH 3/3] Changes per review

---
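Reviewer note (not part of the commit message): a minimal Python sketch of the behavior the revised text describes, assuming the Stan source file is saved as UTF-8. The variable names and the sample literal are hypothetical and do not come from Stan or stanc. The sketch only mimics the idea that bytes are preserved and the display mechanism decides how to decode them.

```python
# Bytes that a UTF-8 encoded source file would hold between the double
# quotes. The sample literal (Greek mu, U+03BC) is a hypothetical example.
literal_bytes = "\u03bc = 0.5".encode("utf-8")

# The bytes are passed along unchanged; only the display decodes them.
shown_on_utf8_terminal = literal_bytes.decode("utf-8")
shown_on_latin1_terminal = literal_bytes.decode("latin-1")

# Nothing validates the bytes upstream, so an invalid UTF-8 byte only
# surfaces at display time (here as the U+FFFD replacement character).
invalid_shown = b"\xff".decode("utf-8", errors="replace")
```

Decoded as UTF-8, the two bytes `CE BC` give the intended Greek letter. Decoded as Latin-1, they give two unrelated characters, which is why ASCII remains the portable choice.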
 src/reference-manual/encoding.Rmd   | 24 +++++++++++++-----------
 src/reference-manual/statements.Rmd | 13 ++++++++-----
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/src/reference-manual/encoding.Rmd b/src/reference-manual/encoding.Rmd
index 6dd9d59ae..5ae2a6bb9 100644
--- a/src/reference-manual/encoding.Rmd
+++ b/src/reference-manual/encoding.Rmd
@@ -28,14 +28,16 @@ also be written in whatever character set is convenient.
 
 ## String literals
 
-String literals are escaped according to the C++ standard.
-In particular, this means that bytes outside of the ASCII character
-range in a `print` or `reject` statement should properly be displayed
-if your terminal supports the encoding used in the input. In other
-words, Stan simply preserves any string of bytes between two double
-quotes (`"`) when passing to C++. On compliant terminals, this allows
-the use of glyphs and other characters from encodings such as UTF-8 that
-fall outside the ASCII-compatible range.
-
-The recommended character encoding for portable code that should
-display properly on all systems is still ASCII.
\ No newline at end of file
+The raw byte sequence within a string literal is escaped according
+to the C++ standard. In particular, this means that UTF-8 encoded
+strings are supported, although they are not checked for invalid byte
+sequences. A `print` or `reject` statement should display Unicode
+characters properly if your terminal supports the encoding used in the
+input. In other words, Stan simply preserves any string of bytes between
+two double quotes (`"`) when passing it to C++. On compliant terminals,
+this allows the use of glyphs and other characters from encodings such as
+UTF-8 that fall outside the ASCII range.
+
+ASCII is the recommended encoding for maximum portability, because it encodes
+the ASCII characters (Unicode code points 0--127) using the same sequence of
+bytes as the UTF-8 encoding of Unicode and common ISO-8859 extensions of Latin.
\ No newline at end of file
diff --git a/src/reference-manual/statements.Rmd b/src/reference-manual/statements.Rmd
index 81b44c2b9..f31824a9e 100644
--- a/src/reference-manual/statements.Rmd
+++ b/src/reference-manual/statements.Rmd
@@ -1321,12 +1321,15 @@ step, and the `generated quantities` block once per iteration.
 
 String literals begin and end with a double quote character
 (`"`).  The characters between the double quote characters may be
-any character, with the exception of the double quote character.
+any byte sequence, with the exception of the double quote character.
+
+The Stan interfaces preserve the byte sequences that they receive.
+The decoding of these byte sequences into characters and their rendering
+as glyphs are handled by whatever display mechanism is being used to
+monitor Stan's output (e.g., a terminal, a Jupyter notebook, or RStudio).
+Stan does not enforce a character encoding for strings, and no attempt is
+made to validate the bytes as legal ASCII, UTF-8, or any other encoding.
 
-Bytes with values greater than 127 (outside the ASCII character set) 
-appearing in string literals will be escaped and passed to C++. 
-The behavior of these strings may depend on your interface's encoding 
-settings.
 
 ### Debug by `print` {-}