Commit d892842

Merge pull request #80 from maxonfjvipon/bug/#79/update-xmir-tutorial
bug(#79): update XMIR tutorial
2 parents 9d4c64d + aadadad commit d892842

1 file changed: +119 -98 lines changed

_posts/2022/11/2022-11-25-xmir-guide.md

Lines changed: 119 additions & 98 deletions
@@ -5,9 +5,11 @@ title: "XMIR, a Quick Tour"
55
author: yegor256
66
---
77

8+
_Last updated at: 17.04.2025_
9+
810
XMIR is a dialect of [XML](https://en.wikipedia.org/wiki/XML),
911
which we use to represent a parsed
10-
[EO](https://www.eolang.org) program. It is a pretty simple format,
12+
[EO](https://www.eolang.org) object. It is a pretty simple format,
1113
which has a few
1214
important tricks, which I share below in this blog post. You may
1315
also want to check our [schema](https://en.wikipedia.org/wiki/XML_schema):
@@ -17,9 +19,10 @@ which may be more readable for some of you).
1719

1820
<!--more-->
1921

20-
Consider this simple EO program that prints `"Hello, world!"`:
22+
Consider this simple EO object that prints `"Hello, world!"`:
2123

2224
```
25+
# App.
2326
[] > app
2427
[x] > foo
2528
QQ.io.stdout > @
@@ -34,113 +37,119 @@ If we parse it using `EoSyntax` class from [eo-parser],
3437
we will get this XMIR (or very similar):
3538

3639
```xml
37-
<program xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
38-
dob="2024-12-27T11:00:08" ms="98" name="app" revision="27abe8b"
39-
source="app.eo" time="2025-01-13T09:32:04.455112Z" version="0.50.0"
40-
xsi:noNamespaceSchemaLocation="https://www.eolang.org/xsd/XMIR-0.50.0.xsd">
41-
<listing># Simple app.
40+
<object
41+
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
42+
dob="2024-12-27T11:00:08"
43+
ms="98"
44+
revision="27abe8b"
45+
time="2025-04-17T09:32:04.455112Z"
46+
version="0.56.0"
47+
xsi:noNamespaceSchemaLocation="https://www.eolang.org/xsd/XMIR-0.56.0.xsd">
48+
<listing># App.
4249
[] &gt; app
4350
[x] &gt; foo
4451
QQ.io.stdout &gt; @
45-
QQ.txt.sprintf
52+
QQ.txt.sprintf *1
4653
"Hello, %s\n"
47-
* x
54+
x
4855
foo &gt; @
4956
"world!"
5057
</listing>
51-
<objects>
52-
<o line="2" name="app" pos="0">
53-
<o line="3" name="foo" pos="2">
54-
<o base="" line="3" name="x" pos="3"/>
55-
<o base=".stdout" line="4" name="@" pos="9">
56-
<o base=".io" line="4" pos="6">
57-
<o base="QQ" line="4" pos="4"/>
58+
<o line="2" name="app" pos="0">
59+
<o line="3" name="foo" pos="2">
60+
<o base="" line="3" name="x" pos="3"/>
61+
<o base=".stdout" line="4" name="@" pos="9">
62+
<o base=".io" line="4" pos="6">
63+
<o base="QQ" line="4" pos="4"/>
64+
</o>
65+
<o base=".sprintf" line="5" pos="12">
66+
<o base=".txt" line="5" pos="8">
67+
<o base="QQ" line="5" pos="6"/>
5868
</o>
59-
<o base=".sprintf" line="5" pos="12">
60-
<o base=".txt" line="5" pos="8">
61-
<o base="QQ" line="5" pos="6"/>
62-
</o>
63-
<o base="string" line="6" pos="8">48-65-6C-6C-6F-2C-20-25-73-0A</o>
64-
<o base="tuple" line="7" pos="8">
65-
<o base=".empty">
66-
<o base="tuple"/>
67-
</o>
68-
<o base="x" line="7" pos="10"/>
69+
<o base="string" line="6" pos="8">48-65-6C-6C-6F-2C-20-25-73-0A</o>
70+
<o base="tuple" line="7" pos="8">
71+
<o base=".empty">
72+
<o base="tuple"/>
6973
</o>
74+
<o base="x" line="7" pos="10"/>
7075
</o>
7176
</o>
7277
</o>
73-
<o base="foo" line="8" name="@" pos="2">
74-
<o base="string" line="9" pos="4">77-6F-72-6C-64-21</o>
75-
</o>
7678
</o>
77-
</objects>
78-
</program>
79+
<o base="foo" line="8" name="@" pos="2">
80+
<o base="string" line="9" pos="4">77-6F-72-6C-64-21</o>
81+
</o>
82+
</o>
83+
</object>
7984
```
8085

81-
The `<program>` is the root element, it will always be there, with
86+
The `<object>` is the root element; it is always present, with
8287
a few mandatory attributes:
8388

84-
* `ms` is how much time in milliseconds it took to parse the program
89+
* `ms` is how much time in milliseconds it took to parse the object
8590
and generate this XMIR file,
86-
* `name` is the name of the program, as it was given to the parser,
8791
* `time` is the time in [ISO 8601] format when the file was generated,
8892
* `version` is the version of the parser.
8993

90-
The `<listing>` element contains the source code of the EO program,
91-
which was parsed, without any modifiations, "as is."
94+
The `<listing>` element contains the source code of the EO object,
95+
which was parsed, without any modifications, "as is."
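
To make the root element more tangible, here is a minimal Python sketch that reads the mandatory attributes and the `<listing>` from an XMIR document. It is only an illustration: the `summarize` function and the trimmed-down `XMIR` string are my assumptions, not a part of [eo-parser], and the standard `xml.etree.ElementTree` module stands in for whatever XML tooling you prefer.

```python
import xml.etree.ElementTree as ET

# A trimmed-down XMIR document modeled on the example above (hypothetical).
XMIR = """<object ms="98" time="2025-04-17T09:32:04.455112Z" version="0.56.0">
  <listing># App.
[] &gt; app
</listing>
  <o line="2" name="app" pos="0"/>
</object>"""


def summarize(xmir: str) -> dict:
    """Collect the mandatory root attributes and the listing of an XMIR."""
    root = ET.fromstring(xmir)
    if root.tag != "object":
        raise ValueError(f"unexpected root element: {root.tag}")
    # The three attributes the text above describes as mandatory.
    missing = [a for a in ("ms", "time", "version") if a not in root.attrib]
    if missing:
        raise ValueError(f"missing mandatory attributes: {missing}")
    return {
        "ms": int(root.get("ms")),
        "time": root.get("time"),
        "version": root.get("version"),
        # XML entities such as &gt; are already unescaped by the parser.
        "listing": root.findtext("listing", default=""),
    }


summary = summarize(XMIR)
print(summary["version"], summary["ms"])
```

The listing comes back with `&gt;` already turned into `>`, so it can be compared with the original source text directly.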
9296

9397
## Errors and Warnings
9498

9599
The `<errors>` element may have a list of problems discovered by the
96-
parser or any other optimizers, as `<error>` elements.
100+
parser or any other optimizers, as `<error>` elements. If there are no
101+
errors, the `<errors>` element should not exist in `<object>`.
97102
For example, it may look like this:
98103

99104
```xml
100-
<program>
101-
[..]
105+
<object>
106+
[...]
102107
<errors>
103108
<error severity="warning" line="3">There is an extra bracket</error>
104109
<error severity="error" line="12">The object 'x' is not found</error>
110+
[...]
105111
</errors>
106-
</program>
112+
</object>
107113
```
108114

109115
The errors with the `warning` severity may more or less safely be ignored. The
110-
errors with the `error` severity will lead to failures in further compilation
111-
and processing. There could also be elements with the `critical` severity,
112-
which must stop the processing of the document immediately.
116+
errors with the `error` severity will lead to failures in further compilation
117+
and processing. There could also be elements with the `critical` severity,
118+
which must stop the processing of the document immediately.
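
The severity rules above can be sketched in a few lines of Python. This is an assumption about how a consumer of XMIR might behave, not the way the EO toolchain does it; the `triage` function is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The same two problems as in the XML example above.
XMIR = """<object>
  <errors>
    <error severity="warning" line="3">There is an extra bracket</error>
    <error severity="error" line="12">The object 'x' is not found</error>
  </errors>
</object>"""


def triage(xmir: str) -> dict:
    """Group <error> messages by severity; stop at once on a critical one."""
    groups = {"warning": [], "error": []}
    root = ET.fromstring(xmir)
    # If <errors> is absent, the loop body simply never runs.
    for err in root.iterfind("errors/error"):
        severity = err.get("severity")
        message = f"line {err.get('line')}: {err.text}"
        if severity == "critical":
            raise RuntimeError(f"processing stopped: {message}")
        groups.setdefault(severity, []).append(message)
    return groups


report = triage(XMIR)
print(report)
```

A caller may then ignore `report["warning"]` and refuse to compile further when `report["error"]` is not empty.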
113119

114120
## Sheets
115121

116-
The `<sheets>` element will rarely be empty. It contains a list of all
117-
post-processors that were applied to the document after is parsing.
118-
We process our XMIR documents using dozens of XSL stylesheets. That's why
119-
the name of the XML element. You may find something like this over there:
122+
The `<sheets>` element contains a list of all
123+
post-processors that were applied to the document after its parsing.
124+
We process our XMIR documents using dozens of XSL stylesheets, which explains
125+
the name of the XML element. You may find something like this over there:
120126

121127
```xml
122-
<program>
123-
[..]
128+
<object>
129+
[...]
124130
<sheets>
125-
<sheet>not-empty-atoms</sheet>
126-
<sheet>middle-varargs</sheet>
127-
<sheet>duplicate-names</sheet>
128-
<sheet>many-free-attributes</sheet>
131+
<sheet>move-voids-up</sheet>
132+
<sheet>const-to-dataized</sheet>
133+
<sheet>stars-to-tuples</sheet>
134+
<sheet>wrap-method-calls</sheet>
129135
[...]
130136
</sheets>
131-
</program>
137+
</object>
132138
```
133139

134140
The names you see in the `<sheet>` elements are the names of the files.
135-
For example, `not-empty-atoms` represents the
136-
[`not-empty-atoms.xsl`] file
137-
in the [objectionary/eo](https://github.com/objectionary/eo) GitHub repository.
141+
For example, `wrap-method-calls` represents the
142+
[`wrap-method-calls.xsl`] file
143+
in the [objectionary/eo](https://github.com/objectionary/eo) GitHub repository.
144+
145+
If no XSL stylesheets are applied to XMIR, the `<sheets>` element should not exist
146+
in `<object>`.
138147

139148
## Metas
140149

141150
There may be an optional element `<metas>` with a list of `<meta>` elements.
142-
For example, if my source code would have this meta at the 3rd
143-
line of the source file:
151+
For example, if my source code had this meta on the 3rd
152+
line of the source file:
144153

145154
```
146155
+alias foo com.example.foo
@@ -149,77 +158,89 @@ There may be an optional element `<metas>` with a list of `<meta>` elements.
149158
We would see the following in the XMIR:
150159

151160
```xml
152-
<program>
153-
[..]
154-
<metas>
161+
<object>
162+
[...]
163+
<metas>
155164
<meta line="3">
156165
<head>alias</head>
157-
<tail>foo com.example.foo</tail>
166+
<tail>foo Q.com.example.foo</tail>
158167
<part>foo</part>
159-
<part>com.example.foo</part>
168+
<part>Q.com.example.foo</part>
160169
</meta>
161-
[..]
170+
[...]
162171
</metas>
163-
</program>
172+
</object>
164173
```
165174

166175
Each `<meta>` element contains parts of the meta. The `<head>`
167-
contains everything that goes after the `+` until the first space.
168-
The `<tail>` contains everything after the first space. There could
169-
be a number of `<part>` elements, each of which containing the parts
170-
of the `<tail>` separated by spaces.
176+
contains everything that goes after the `+` until the first space.
177+
The `<tail>` contains everything after the first space. There could
178+
be a number of `<part>` elements, each containing one of the parts
179+
of the `<tail>`, which is split by spaces.
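
The splitting rule is simple enough to sketch in Python. The `parse_meta` function below is hypothetical and only mirrors the description above; it does not add the `Q.` prefix visible in the XMIR example, which I assume is done elsewhere.

```python
def parse_meta(line: str) -> dict:
    """Split a meta line such as '+alias foo com.example.foo' into its parts."""
    if not line.startswith("+"):
        raise ValueError(f"a meta must start with '+': {line!r}")
    # <head> is everything after '+' until the first space,
    # <tail> is everything after that space.
    head, _, tail = line[1:].partition(" ")
    # Each <part> is a piece of the tail, split by spaces.
    return {"head": head, "tail": tail, "parts": tail.split()}


meta = parse_meta("+alias foo com.example.foo")
print(meta)
```

A meta without a tail, like `+rt`, would give an empty `tail` and no `parts` under this sketch.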
171180

172181
## Objects
173182

174-
The `<objects/>` element contains object, as they were found in the source
175-
code, where each object is represented by the `<o/>` element.
176-
Each `<o/>` element may have a few optional attributes:
183+
The `<object>` element must contain exactly one `<o/>` element, which represents the
184+
object being parsed. The `<o/>` element may have a few optional attributes:
177185

178186
* `line` and `pos` are the number of the line where the object
179187
was found by the parser and the position in the line;
180188
* `name` is the name of the object, if the object has it;
181189
* `base` may refer to object formation that is being copied;
182-
* `loc` may contain a "locator" of the object.
190+
* `as` is the name of the attribute to which the current object is bound during
191+
application.
183192

184193
No other attributes are allowed.
185194

186-
## Data Objects
187-
188-
Data literals found in the source code are presented with `<o/>` XML elements
189-
that contain text, for example:
195+
## Special cases
190196

197+
1. The `<o/>` elements that have a nested `<o>` element whose `name` is
198+
`λ` are **atoms**. Atoms must not have a `base` attribute:
191199
```xml
192-
<o base="string" line="6" pos="8">48-65-6C-6C-6F-2C-20-25-73-0A</o>
200+
<o name="try">
201+
<o name="λ"/>
202+
</o>
193203
```
194204

195-
The value of the `base` attribute is the "type" of the data found in the
196-
sources. It may be one of the following three:
197-
`string`, `number`, and `bytes`.
198-
199-
## Locators
200-
201-
If you apply [`set-locators.xsl`] optimization XSL stylesheet to the following
202-
XMIR document:
203-
205+
2. The `<o/>` elements whose `base` attribute is empty, as in `base=""`, are **void** attributes.
206+
Void attributes must also have a `name` attribute:
204207
```xml
205-
<o base=".times" name="x">
206-
<o base="a"/>
207-
<o base="b"/>
208+
<o name="foo">
209+
<o name="bar" base=""/>
208210
</o>
209211
```
210212

211-
You will get additional attribute `loc` added to each `<o>` element:
213+
3. **Data literals** found in the source code are represented by nested `<o/>` XML elements
214+
that contain text. Only an element whose `base` attribute equals `Q.org.eolang.bytes` may contain
215+
a nested `<o>` element with text.
212216

213217
```xml
214-
```xml
215-
<o base=".times" name="x" loc="Φ.x">
216-
<o base="a" loc="Φ.x.ρ"/>
217-
<o base="b" loc="Φ.x.α0"/>
218+
<o base="Q.org.eolang.bytes" line="6" pos="8">
219+
<o>48-65-6C-6C-6F-2C-20-25-73-0A</o>
218220
</o>
219221
```
220222

221-
Locators are absolute and unique coordinates of any object
222-
in the entire object "Universe."
223+
4. The `name` attribute of an `<o/>` element may be **auto-generated** by the EO parser.
224+
In such a case it looks like this:
225+
```xml
226+
<o name="a🌵104"/>
227+
```
228+
229+
Such a `name` consists of several parts:
230+
- char `a` (ASCII 97) that stands for "auto-generated"
231+
- char `🌵` that is just a pretty character prohibited by EO grammar
232+
- number `104`, which is the line and the position of the place where
233+
the object is found, joined together.
234+
235+
Such names are unique throughout the entire XMIR.
236+
237+
5. If an object is bound to a specific attribute not by name but by position, the
238+
`as` attribute may look like this:
239+
```xml
240+
<o base="Q.org.eolang.number" as="α2"/>
241+
```
242+
Here the first character is `α` (alpha), the number `2` is the position of the
243+
attribute.
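
A few of the special cases above are easy to handle with small helpers. The Python sketch below is only an illustration with hypothetical function names: it decodes the dash-separated hex text of a data literal, extracts the numeric suffix of an auto-generated name, and reads the position out of a positional `as` attribute.

```python
import re

# 'a', then the cactus, then digits: the shape of an auto-generated name.
AUTO_NAME = re.compile(r"^a🌵(\d+)$")


def decode_bytes(text: str) -> bytes:
    """Turn dash-separated hex pairs, e.g. '77-6F', into raw bytes."""
    return bytes.fromhex(text.strip().replace("-", ""))


def auto_name_suffix(name: str):
    """Return the digits of an auto-generated name, or None for other names.

    The digits are the line and position joined together, so they are
    returned as one string and not split any further.
    """
    match = AUTO_NAME.match(name)
    return match.group(1) if match else None


def positional_index(as_attr: str):
    """Return the position from a value like 'α2', or None if it is a name."""
    digits = as_attr[1:]
    if as_attr.startswith("α") and digits.isdecimal():
        return int(digits)
    return None


print(decode_bytes("77-6F-72-6C-64-21").decode("utf-8"))
print(auto_name_suffix("a🌵104"), positional_index("α2"))
```

The two hex strings from the XMIR example at the top of this post decode to `Hello, %s\n` and `world!`, which matches the listing.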
223244

224245
<hr/>
225246
