Add ShowSpecialCharacters and ShowStringCharacters options and round-trip FullForm output#1763

Merged
mmatera merged 12 commits into master from full_form_encodes_invertible_ascii
Mar 29, 2026
Contributor

@mmatera mmatera commented Mar 29, 2026

This PR adds support for the options ShowSpecialCharacters and ShowStringCharacters used in StyleBox, Style, and Cell builtin functions. These options control how strings are rendered.

In WMA, when the ShowSpecialCharacters option is set to False and ShowStringCharacters is set to True, strings are rendered using an ASCII representation in which any non-ASCII characters are represented by their character names. This provides an "invertible" representation of the original internal String. In WMA, this representation is used in FullForm.
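
As a rough illustration of what "invertible" means here, the following Python sketch encodes a string as ASCII using character names and decodes it back. The table `CHAR_TO_NAME` and the functions `to_ascii` and `from_ascii` are hypothetical names made up for this example; they are not the code in this PR, and the real name tables live in Mathics3-scanner. The escape forms `\[Name]`, `\:xxxx`, and `\|xxxxxx` follow Wolfram Language input syntax.

```python
import re

# Hypothetical, tiny excerpt of a character-name table (assumption for
# illustration only; the real tables are much larger).
CHAR_TO_NAME = {"\u03b1": "Alpha", "\u03c0": "Pi", "\u2192": "RightArrow"}
NAME_TO_CHAR = {name: ch for ch, name in CHAR_TO_NAME.items()}


def to_ascii(text):
    """Encode text as ASCII so that from_ascii() can recover it exactly."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\\":
            out.append("\\\\")  # double backslashes so escapes stay unambiguous
        elif code < 128:
            out.append(ch)  # ASCII passes through unchanged
        elif ch in CHAR_TO_NAME:
            out.append(f"\\[{CHAR_TO_NAME[ch]}]")  # named character
        elif code <= 0xFFFF:
            out.append(f"\\:{code:04x}")  # 4-digit hex escape
        else:
            out.append(f"\\|{code:06x}")  # 6-digit hex escape
    return "".join(out)


_ESCAPE = re.compile(
    r"\\(?:(\\)|\[([A-Za-z0-9]+)\]|:([0-9a-fA-F]{4})|\|([0-9a-fA-F]{6}))"
)


def from_ascii(text):
    """Invert to_ascii()."""

    def replace(match):
        backslash, name, hex4, hex6 = match.groups()
        if backslash:
            return "\\"
        if name:
            return NAME_TO_CHAR[name]
        return chr(int(hex4 or hex6, 16))

    return _ESCAPE.sub(replace, text)
```

With this sketch, `from_ascii(to_ascii(s)) == s` holds for any string `s`, which is the property that lets output be pasted from one front end into another without depending on the encoding.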

This would also provide better grounds for #1735

result = result[:-1]

for res in result:
    print("show", String(res).value)
Member

Sigh.

I often look at the PR inside GitHub's interface to spot stuff like this.

Contributor Author

Sorry, I usually run git diff origin/master | grep 'print(' but this one escaped my notice...

)

ascii_operator_to_symbol = NAMED_CHARACTERS_COLLECTION["ascii-operator-to-symbol"]
CHARACTER_TO_NAME = {
Member

Please move this to Mathics3-scanner. Thanks.

Contributor Author

I will, but first I wanted to be sure that this is the right approach. Thoughts?

Member

@rocky rocky Mar 29, 2026

I find this approach of putting in a PR, then discussing whatever it is that happens to be in it (you figure it out), really hard to follow.

Many people start with a problem and then go to a solution, instead of writing some code based on something that feels wrong (is this what is meant by "vibe" coding?) and then looking at what's been created and discussing that.

If that's the way you have to work, well. okay. But maybe after all the vibe coding, we can have a discussion (independent of the code) about what's wrong. Then discuss ways to address that.

I had thought we were going to start to do that for #1735, which I had imagined was taking that code and breaking it up into pieces. You know, like option information from a built-in (CharacterEncoding, of ToString ) is not filtering down to rendering routines. How do we do that? Do we add **kwargs parameters to the methods or split out the relevant ones (like encoding)?
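
The trade-off in that question can be sketched in Python as follows. The names `render_kwargs`, `render_explicit`, and `_encode` are hypothetical and only illustrate the two calling conventions; they are not Mathics3 rendering functions.

```python
def _encode(text, encoding):
    # Stand-in for a real renderer: only "ASCII" changes the output here.
    if encoding == "ASCII":
        return text.encode("ascii", errors="backslashreplace").decode("ascii")
    return text


def render_kwargs(text, **kwargs):
    """Variant A: options travel in an open-ended **kwargs dict."""
    encoding = kwargs.get("encoding", "UTF-8")
    return _encode(text, encoding)


def render_explicit(text, *, encoding="UTF-8"):
    """Variant B: the relevant option is split out as a named parameter."""
    return _encode(text, encoding)
```

Variant A lets any option filter down without changing signatures, but a misspelled key such as `encodign="ASCII"` is silently ignored. Variant B raises `TypeError` for the same mistake, at the cost of touching every signature along the call path.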

I admit that there are bigger issues we want to solve, but I offer this as a specific example of something where we can break off a small, isolated problem (independent of the larger issue) and create a PR for that.

Or decide to hold off on that until the bigger picture is decided.

Instead, we are now on to a related topic with code that is outside of #1735.

So be it.

Okay. Now that you've come across this other thing and written some code so you might be able to understand something about it, can we just forget about the code (for now), and describe what the problem is in human language, and then what the approaches for handling this are?

Member

@rocky rocky Mar 29, 2026

> Okay. Now that you've come across this other thing and written some code so you might be able to understand something about it, can we just forget about the code (for now), and describe what the problem is in human language, and then what the approaches for handling this are?

I get from the PR comment (probably written after the code) that we should add the option ShowSpecialCharacters, which is used in Style and StyleBox.

Instead of the code, though, describe in human language what the issues or approaches are and what implications those might have.

(I write "human language" because I understand English may be awkward for you (as it is for me)).

If you want to think and describe in Spanish, that's okay, I'll use Google Translate. The main thing is to express the idea independent of specific code.

Contributor Author

> I find this approach of putting in a PR, then discussing whatever it is that happens to be in it (you figure it out), really hard to follow.

OK, the wordy version of this PR would be: we need conversion tables to

  1. Take any str in the Mathics3 internal encoding and convert it into an ASCII representation in an invertible way. This is required for FullForm.
  2. Take any str in the Mathics3 internal encoding and convert it into an ASCII representation that is visually close to the character that the internal character represents.
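
To make the difference between the two tables concrete, here is a minimal Python sketch of the second kind of table, the "visually close" one. The table `VISUAL_ASCII` and the function `to_visual_ascii` are hypothetical names invented for this example and are not part of this PR or of Mathics3-scanner.

```python
# Hypothetical excerpt of a "looks similar in ASCII" table (assumption for
# illustration only).
VISUAL_ASCII = {
    "\u2192": "->",  # rightwards arrow
    "\u2260": "!=",  # not equal to
    "\u2264": "<=",  # less-than or equal to
    "\u03b1": "alpha",  # Greek small letter alpha
}


def to_visual_ascii(text, unknown="?"):
    """Replace non-ASCII characters with ASCII look-alikes (lossy)."""
    return "".join(
        ch if ord(ch) < 128 else VISUAL_ASCII.get(ch, unknown) for ch in text
    )
```

Unlike the first table, this mapping is not invertible: `to_visual_ascii("a \u2192 b")` and `to_visual_ascii("a -> b")` both give `"a -> b"`, so the original string cannot be recovered from the output. That is why FullForm needs the first table rather than this one.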

I believe I mentioned this earlier; if not, my apologies.

> Many people start with a problem and then go to a solution, instead of writing some code based on something that feels wrong (is this what is meant by "vibe" coding?) and then looking at what's been created and discussing that.
>
> If that's the way you have to work, well. okay. But maybe after all the vibe coding, we can have a discussion (independent of the code) about what's wrong. Then discuss ways to address that.

Again, I thought we had already had that discussion. Now I am just proposing an implementation for it, and I find it easier to show the code of the implementation than to figure out how to translate from Physicist-Spanish to Computer-Science English.

> I had thought we were going to start to do that for #1735, which I had imagined was taking that code and breaking it up into pieces. You know, like option information from a built-in (CharacterEncoding, of ToString ) is not filtering down to rendering routines. How do we do that? Do we add **kwargs parameters to the methods or split out the relevant ones (like encoding)?

I am doing the work of splitting it into pieces. I have now put up another of these pieces, related to MathMLForm. More are coming.

> I admit that there are bigger issues we want to solve, but I offer this as a specific example of something where we can break off a small, isolated problem (independent of the larger issue) and create a PR for that.

What this PR tries to achieve is FullForm output that can be copied from any front end, pasted into another front end, and produce exactly the same code. Then, in tests, we can compare actual and expected results regardless of the encoding.

> Or decide to hold off on that until the bigger picture is decided.

That is the bigger picture.

> Instead, we are now on to a related topic with code that is outside of #1735.
>
> So be it.
>
> Okay. Now that you've come across this other thing and written some code so you might be able to understand something about it, can we just forget about the code (for now), and describe what the problem is in human language, and then what the approaches for handling this are?

I am going to update the PR description to focus more on its central aspect.

return value
# value = expr.value
# return value
kwargs["System`ShowStringCharacters"] = SymbolTrue
Member

Is setting this to True unconditionally correct? Can't it be overwritten from outside via kwargs?

Contributor Author

This is because we do not pass through boxes here. The alternative would be to add the quotes, and then leave the render function to remove them.

Contributor Author

OK, I have now added a long comment explaining how this path works. Also, I handle the case where kwargs["System`ShowStringCharacters"] was already set to True.

)
# These characters are used in encoding
# in WMA, and differs from what we have
# in Mathics3-scanner tables:
Member

@rocky rocky Mar 29, 2026

Not totally accurate. As mentioned before,  is listed as the Wolfram-language encoding.

A number of these we choose not to use by default for input and output because they would need special code pages set up by users, which is generally not done. So instead, we often pick a Unicode symbol that is equivalent and commonly available to users.

However, we always note the corresponding WL unicode, and that too is available in JSON tables.

If the scanner is not accepting  as an acceptable character for DifferentialD, that can easily be fixed. In fact, we've done something like that recently.

Contributor Author

The point is that this character appears in some places. This is the kind of thing I would like to fix before moving this to Mathics3-scanner, to avoid making coordinated changes before this is settled.

Member

@rocky rocky Mar 29, 2026

Sure - that's fine and probably a good idea. But then, let's describe the problem and ideas without reference to specific code. (It's fine for you to have written this for yourself to gain some idea. Just do not hold on too tightly to the exact code until we've gone over the problem and ideas at a high level first.)

Contributor Author

OK, this part is not strictly needed for this PR. I will propose this for another round.

https://reference.wolfram.com/language/ref/ShowSpecialCharacters.html</url>
<dl>
<dt>'ShowSpecialCharacters'
<dd>is an option for 'Style' and 'Cell' that directs whether non-ANSI characters must be shown as special characters or by escaped sequences.
Member

ANSI -> ASCII.

Contributor Author

I usually mix them in my head: ASCII==7bits and ANSI==8bits, right?

Member

@rocky rocky Mar 29, 2026

See https://stackoverflow.com/questions/701882/what-is-ansi-format and decide what it is that you want to convey. Then pick the word that is appropriate. If you decide it is ANSI, then I think you would need to elaborate more on the code page aspect.

Also taking from the link:

> The name "ANSI" is a misnomer, since it doesn't correspond to any actual ANSI standard, but the name has stuck. ANSI is not the same as UTF-8.

Do you mean UTF-8?

Member

@rocky rocky Mar 29, 2026

> I usually mix them in my head: ASCII==7bits and ANSI==8bits, right?

While there is 8-bit ASCII, I am now getting the impression you mean either UTF-8 or Unicode.

Contributor Author

I meant the "default" (American-centric) interpretation of an 8-bit character. But OK, the right name is ASCII.

</ul>
"""

summary_text = "cell option directing whether show special characters in a reversible ANSI format."
Member

ANSI -> ASCII

Member

rocky commented Mar 29, 2026

Do I have this right that this PR:

  • Adds ShowSpecialCharacters and ShowStringCharacters options used in Cell, Style, and StyleBox built-in functions
  • Changes FullForm (and only that or other Forms as well?) so that the strings they produce are round-trip or invertible?

Is the round-trip or invertibility aspect something you find desirable for testing? Or is there a user impact as well? (Cut and paste output from x to feed into y for some x and y).

You indicate that it might also be what WMA does, but here, I'd be grateful to get some simple examples or documentation somewhere that show this.

@rocky rocky changed the title Improving FullForm and ShowSpecialCharacters option for Style and StyleBox Add ShowSpecialCharacters and ShowStringCharacters options and round-trip FullForm output Mar 29, 2026
@rocky rocky changed the title Add ShowSpecialCharacters and ShowStringCharacters options and round-trip FullForm output Add ShowSpecialCharacters and ShowStringCharacters options and round-trip FullForm output Mar 29, 2026
Member

rocky commented Mar 29, 2026

@mmatera I have edited both the PR title and the description. I make mistakes and might not have gotten this correct. Please check.

Getting these to be accurate is very helpful to me, especially now, as I will be going over PRs and change logs in order to get release notes done. (I have been putting this off, but I need to start doing it now.)

Member

rocky commented Mar 29, 2026

LGTM.

I am not 100% certain everything here is as it should be, but I guess, let's go with it for now, with the understanding that in the future we may need to make other adjustments or change things slightly.

Member

rocky commented Mar 29, 2026

Note: SYMBOLS_MANIFEST.txt will need updating.

Contributor Author

mmatera commented Mar 29, 2026

> LGTM.
>
> I am not 100% certain everything here is as it should be, but I guess, let's go with it for now, with the understanding that in the future we may need to make other adjustments or change things slightly.

@rocky, thanks for the review and the patience!

@mmatera mmatera merged commit c686c12 into master Mar 29, 2026
18 checks passed
@mmatera mmatera deleted the full_form_encodes_invertible_ascii branch March 29, 2026 17:34