Commit c01af8f

author
Hgh
committed
Reformatted the file and changed some sentences
1 parent 8fee708 commit c01af8f

1 file changed

+123
-60
lines changed

encoding/data-structure-encoding/README.md

Lines changed: 123 additions & 60 deletions
@@ -1,41 +1,78 @@
11
# Data Structure Encoding
22

3-
In computing, serialization is the process of translating a data structure or object state into a format that can be stored (for example, in a file or memory data buffer) or transmitted (for example, across a computer network) and reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked.
4-
5-
This process of serializing an object is also called marshalling an object in some situations.The opposite operation, extracting a data structure from a series of bytes, is de-serialization, (also called un-serialization or un-marshalling).
3+
## Table of contents
4+
5+
- ### [Definition](#definition)
6+
- ### [Text-based encoding formats](#text-based-encoding-formats)
7+
- #### [PEM](#pem)
8+
- ##### [Advantage and disadvantage](#advantage-and-disadvantage)
9+
- ##### [File content](#file-content)
10+
- ##### [Public key](#public-key)
11+
- ### [Base64](#base64)
12+
- ### [Examples](#examples)
13+
- #### [Manual encoding](#manual-encoding)
14+
- #### [Create a binary file](#create-a-binary-file)
15+
- #### [Encode to standard Base64](#encode-to-standard-base64)
16+
- #### [Decode from standard Base64](#decode-from-standard-base64)
17+
- ### [Text-to-binary decoding](#text-to-binary-decoding)
18+
19+
## Definition
20+
21+
In computing, serialization is the process of translating a data structure or object state into a format that can be
22+
stored (for example, in a file or memory data buffer) or transmitted (for example, across a computer network) and
23+
reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread
24+
according to the serialization format, it can be used to create a semantically identical clone of the original object.
25+
For many complex objects, such as those that make extensive use of references, this process is not straightforward.
26+
Serialization of object-oriented objects does not include any of their associated methods with which they were
27+
previously linked.
28+
29+
This process of serializing an object is also called marshalling an object in some situations. The opposite operation,
30+
extracting a data structure from a series of bytes, is de-serialization (also called un-serialization or
31+
un-marshalling).
632

733
[Comparison of data-serialization formats](https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats)
834
[Serialization](https://en.wikipedia.org/wiki/Serialization)
935

1036
## Text-based encoding formats
1137

12-
### PEM [RFC 7468](https://tools.ietf.org/html/rfc7468)
13-
14-
Several security-related standards used on the Internet define ASN.1
15-
data formats that are normally encoded using the Basic Encoding Rules
16-
(BER) or Distinguished Encoding Rules (DER) [X.690](https://en.wikipedia.org/wiki/X.690), which are
17-
binary, octet-oriented encodings.
18-
A disadvantage of a binary data format is that it cannot be
19-
interchanged in textual transports, such as email or text documents.
20-
One advantage with text-based encodings is that they are easy to
21-
modify using common text editors; for example, a user may concatenate
22-
several certificates to form a certificate chain with copy-and-paste
23-
operations.
24-
The content of a PEM file begins with a header such as `-----BEGIN CERTIFICATE-----` in a stand-alone line and ends with a
25-
footer like `-----END CERTIFICATE-----` in the same way. The contents between header and footer tags are base64 encoded string of the related object in DER-encoded format. Except the header, the last line of content and footer lines, each line has the length of 64 characters. So, to parse a PEM file, you need to know the exact definition of the encoded object in ASN.1 syntax. you can use this online tool to check the content of a PEM file [PEM Parser](https://8gwifi.org/PemParserFunctions.jsp) or [Decode PEM data](https://report-uri.com/home/pem_decoder).
26-
27-
38+
### PEM
39+
40+
[RFC 7468](https://tools.ietf.org/html/rfc7468)
41+
42+
Several security-related standards used on the Internet define [ASN.1](https://en.wikipedia.org/wiki/ASN.1) data formats
43+
that are normally encoded using the Basic Encoding Rules (BER) or Distinguished Encoding Rules (DER)
44+
[X.690](https://en.wikipedia.org/wiki/X.690), which are binary, octet-oriented encodings.
45+
46+
#### Advantage and disadvantage
47+
48+
A disadvantage of a binary data format is that it cannot be interchanged in textual transports, such as email or text
49+
documents. One advantage of text-based encodings is that they are easy to modify using common text editors; for
50+
example, a user may concatenate several certificates to form a certificate chain with copy-and-paste operations.
51+
52+
#### File content
53+
54+
The content of a PEM file begins with a header such as `-----BEGIN CERTIFICATE-----` on a line of its own and ends with
55+
a footer such as `-----END CERTIFICATE-----`, also on its own line. The content between the header and footer is the
56+
Base64 encoding of the related object in DER-encoded format. Apart from the header, the footer and the last line of
57+
content, each line is exactly 64 characters long. To parse a PEM file, you need to know the exact definition of the
58+
encoded object in ASN.1 syntax. You can use an online tool to check the content of a PEM
59+
file, such as [PEM Parser](https://8gwifi.org/PemParserFunctions.jsp)
60+
or [Decode PEM data](https://report-uri.com/home/pem_decoder).
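
The layout described above can be checked with a few lines of Python. This is only a minimal sketch using the standard library: the helper name `pem_to_der` is made up for illustration, it handles a single PEM block, and the sample text is the EC public key shown in the Public Key section of this README.

```python
import base64

def pem_to_der(pem_text):
    """Return (label, DER bytes) for a single PEM block (illustrative helper)."""
    lines = [ln.strip() for ln in pem_text.strip().splitlines() if ln.strip()]
    header, footer = lines[0], lines[-1]
    if not (header.startswith("-----BEGIN ") and header.endswith("-----")):
        raise ValueError("missing PEM header")
    label = header[len("-----BEGIN "):-len("-----")]
    if footer != f"-----END {label}-----":
        raise ValueError("footer does not match header")
    # Everything between header and footer is Base64 text of the DER object.
    return label, base64.b64decode("".join(lines[1:-1]), validate=True)

PEM = """-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEk1qnJZfju7Cs3mcFHkaNv30Y14EX
wLpQUpi1k2W+KWVSb1dnBTkavBRZ8bp0Ip1NR59PwuN/9Nf1pKu77a3PaQ==
-----END PUBLIC KEY-----
"""

label, der = pem_to_der(PEM)
print(label, len(der), der[:4].hex())  # PUBLIC KEY 91 30593013
```

The decoded bytes are the raw DER object; interpreting them still requires the ASN.1 definition of the encoded type.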
61+
2862
#### Public Key
2963

30-
a PEM file which contains a public key begins with the line `-----BEGIN PUBLIC KEY-----` and ends with the line `-----END PUBLIC KEY-----`. Between these two tags is the base64 encoded string of `SubjectPublicKeyInfo` object in DER-encoded format:
64+
A PEM file which contains a public key begins with the line `-----BEGIN PUBLIC KEY-----` and ends with the
65+
line `-----END PUBLIC KEY-----`. Between these two lines is the Base64 encoding of the `SubjectPublicKeyInfo` object in
66+
DER-encoded format:
3167

3268
```
3369
-----BEGIN PUBLIC KEY-----
3470
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEk1qnJZfju7Cs3mcFHkaNv30Y14EX
3571
wLpQUpi1k2W+KWVSb1dnBTkavBRZ8bp0Ip1NR59PwuN/9Nf1pKu77a3PaQ==
3672
-----END PUBLIC KEY-----
3773
```
38-
To parse the `SubjectPublicKeyInfo` object, you need to follow these steps :
74+
75+
To parse the `SubjectPublicKeyInfo` object, you need to follow these steps:
3976

4077
- decode the base64 string (e.g. use this online tool [Cryptii](https://cryptii.com/pipes/base64-to-hex)):
4178

@@ -49,7 +86,8 @@ or you can use the following `OpenSSL` command :
4986
$ openssl ec -pubin -inform DER -in certificate.cer -outform PEM -out certificate.pem
5087
```
5188

52-
- parse the resulting byte array(`DER` formatted) according to the ASN.1 syntax of `SubjectPublicKeyInfo` [RFC5280](https://datatracker.ietf.org/doc/html/rfc5280#section-4.1.1.2):
89+
- parse the resulting byte array (`DER` formatted) according to the ASN.1 syntax
90+
of `SubjectPublicKeyInfo` [RFC5280](https://datatracker.ietf.org/doc/html/rfc5280#section-4.1.1.2):
5391

5492
```
5593
SubjectPublicKeyInfo ::= SEQUENCE {
@@ -63,7 +101,7 @@ AlgorithmIdentifier ::= SEQUENCE {
63101
parameters ANY DEFINED BY algorithm OPTIONAL }
64102
```
65103

66-
you can use the [ASN.1 Javascript decoder](https://lapo.it/asn1js/) online tool to check the parser result :
104+
You can use the [ASN.1 JavaScript decoder](https://lapo.it/asn1js/) online tool to check the parser result:
67105

68106
```
69107
SEQUENCE (2 elem)
@@ -73,25 +111,33 @@ SEQUENCE (2 elem)
73111
BIT STRING (520 bit) 0000010010010011010110101010011100100101100101111110001110111011101100…
74112
```
75113

76-
We know that the `SEQUENCE` tag is `0x30` so the byte array is started with this value. Here, `0x59` equals to the length of the `SEQUENCE` object in bytes. The next `0x30` means that there is another `SEQUENCE` as we expect from the `AlgorithmIdentifier` definition syntax. The `SubjectPublicKey` contains the public key bytes and included as a `BIT STRING` in the object. `BIT STRING` tag is `0x03` which you can find it in the byte array easily followed by `0x42`(it's length in bytes).
114+
The `SEQUENCE` tag is `0x30`, so the byte array starts with this value. The next byte, `0x59`, is the
115+
length of the `SEQUENCE` content in bytes. The following `0x30` opens another `SEQUENCE`, as we expect from
116+
the `AlgorithmIdentifier` definition. The `subjectPublicKey` field contains the public key bytes and is included as
117+
a `BIT STRING` in the object. The `BIT STRING` tag is `0x03`, which appears in the byte array followed
118+
by `0x42` (its length in bytes).
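
The walk through the tag and length bytes can be sketched in Python. This is an illustration, not a full DER parser: `read_tlv` is a hypothetical helper that assumes single-byte tags and definite lengths, and the Base64 text is the public key shown earlier.

```python
import base64

SPKI_B64 = (
    "MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEk1qnJZfju7Cs3mcFHkaNv30Y14EX"
    "wLpQUpi1k2W+KWVSb1dnBTkavBRZ8bp0Ip1NR59PwuN/9Nf1pKu77a3PaQ=="
)

def read_tlv(buf, pos):
    """Read one element at pos and return (tag, value, next_pos)."""
    tag = buf[pos]
    length = buf[pos + 1]
    pos += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, buf[pos:pos + length], pos + length

der = base64.b64decode(SPKI_B64)
tag, spki, end = read_tlv(der, 0)       # outer SEQUENCE
alg_tag, alg, pos = read_tlv(spki, 0)   # AlgorithmIdentifier SEQUENCE
key_tag, key, _ = read_tlv(spki, pos)   # subjectPublicKey BIT STRING

print(f"{tag:#04x} {len(spki):#04x}")     # 0x30 0x59
print(f"{alg_tag:#04x} {len(alg):#04x}")  # 0x30 0x13
print(f"{key_tag:#04x} {len(key):#04x}")  # 0x03 0x42
```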
77119

78-
Sometimes, the cryptographic objects such as `Certificate`s, `Public Key`s, ... may be stored or transmitted in `DER` format (`.der`).
79-
For example the following command will export the former public key (an EC Public Key) from `PEM` format to its equivalent `DER` one:
120+
Sometimes cryptographic objects such as `Certificate`s and `Public Key`s are stored or transmitted in `DER`
121+
format (`.der`). For example, the following command exports the public key above (an EC Public Key) from `PEM`
122+
format to its `DER` equivalent:
80123

81124
```
82125
$ openssl ec -pubin -inform PEM -in public-key.pem -outform DER -out public-key.der
83126
```
84127

85-
The resulting `.der` file contains the base64 decoded of `.pem` file. Please note that the `OpenSSL` command for a `RSA Public Key` is as the following one:
128+
The resulting `.der` file contains the Base64-decoded content of the `.pem` file. Note that the `OpenSSL` command for
129+
an `RSA Public Key` is slightly different:
86130

87131
For `RSA Public Key`
132+
88133
```
89134
$ openssl rsa -pubin -inform PEM -in public-key.pem -outform DER -out public-key.der
90135
```
91136

92137
#### Certificate
93138

94139
For `Certificate`
140+
95141
```
96142
$ openssl x509 -inform PEM -in certificate.pem -outform DER -out certificate.der
97143
```
@@ -102,19 +148,28 @@ Note: `Certificate` in `DER` format may be stored in `.der`, `.cer` or `.crt` fi
102148

103149
### [ASN.1](https://en.wikipedia.org/wiki/ASN.1)
104150

105-
Abstract Syntax Notation One (ASN.1) is a standard interface description language for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and especially in cryptography.
151+
Abstract Syntax Notation One (ASN.1) is a standard interface description language for defining data structures that can
152+
be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer
153+
networking, and especially in cryptography.
106154

107-
Protocol developers define data structures in ASN.1 modules, which are generally a section of a broader standards document written in the ASN.1 language. The advantage is that the ASN.1 description of the data encoding is independent of a particular computer or programming language. Because ASN.1 is both human-readable and machine-readable, an ASN.1 compiler can compile modules into libraries of code, codecs, that decode or encode the data structures. Some ASN.1 compilers can produce code to encode or decode several encodings.
155+
Protocol developers define data structures in ASN.1 modules, which are generally a section of a broader standards
156+
document written in the ASN.1 language. The advantage is that the ASN.1 description of the data encoding is independent
157+
of a particular computer or programming language. Because ASN.1 is both human-readable and machine-readable, an ASN.1
158+
compiler can compile modules into libraries of code, codecs, that decode or encode the data structures. Some ASN.1
159+
compilers can produce code to encode or decode several encodings.
108160

109161
[X.690](https://en.wikipedia.org/wiki/X.690) is an ITU-T standard specifying several ASN.1 encoding formats:
110162

111163
- Basic Encoding Rules (BER)
112164
- Canonical Encoding Rules (CER)
113165
- Distinguished Encoding Rules (DER)
114166

115-
Any ASN.1 encoding begins with two common bytes (or octets, groupings of eight bits) that are universally applied regardless of the type. The first byte is the type indicator, which also includes some modification bits we shall briefly touch upon. The second byte is the length header.
167+
Any ASN.1 encoding begins with two common bytes (or octets, groupings of eight bits) that are universally applied
168+
regardless of the type. The first byte is the type indicator, which also includes some modification bits we shall
169+
briefly touch upon. The second byte is the length header.
116170

117-
We will use the [asn1parse](https://www.openssl.org/docs/manmaster/man1/openssl-asn1parse.html) command of `OpenSSL` with [ASN1_generate_nconf](https://www.openssl.org/docs/manmaster/man3/ASN1_generate_nconf.html) formatted file.
171+
We will use the [asn1parse](https://www.openssl.org/docs/manmaster/man1/openssl-asn1parse.html) command of `OpenSSL`
172+
with an [ASN1_generate_nconf](https://www.openssl.org/docs/manmaster/man3/ASN1_generate_nconf.html) formatted file.
118173

119174
Some of the more applicable data types are:
120175

@@ -142,22 +197,24 @@ Some of the more applicable data types are:
142197

143198
- SET, SET OF : Constructed, tag = 0x11
144199

145-
The header byte is always placed at the start of any ASN.1 encoding and is divides into three parts: the classification, the constructed bit, and the primitive type. The header byte is broken as shown here :
200+
The header byte is always placed at the start of any ASN.1 encoding and is divided into three parts: the classification,
201+
the constructed bit, and the primitive type. The header byte is laid out as follows:
146202

147203
- bits 8,7 : Classification
148-
- bit 6 : Constructed
204+
- bit 6 : Constructed
149205
- bits 5..1 : Primitive Type
150206

151207
The classification bits select the tag class:
152208

153-
| Class | Bit 8 | Bit 7 |
209+
| Class | Bit 8 | Bit 7 |
154210
| :---------------| :-----| :-----|
155-
|universal | 0 | 0 |
156-
|application | 0 | 1 |
157-
|context-specific | 1 | 0 |
158-
|private | 1 | 1 |
211+
|universal | 0 | 0 |
212+
|application | 0 | 1 |
213+
|context-specific | 1 | 0 |
214+
|private | 1 | 1 |
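
The bit layout above can be expressed as a small Python sketch. The names `CLASSES` and `decode_identifier` are illustrative only, and the function assumes a single-byte identifier (tag numbers below 31).

```python
CLASSES = ("universal", "application", "context-specific", "private")

def decode_identifier(octet):
    """Split a single-byte ASN.1 identifier into (class, constructed, tag number)."""
    cls = CLASSES[octet >> 6]         # bits 8,7
    constructed = bool(octet & 0x20)  # bit 6
    number = octet & 0x1F             # bits 5..1
    return cls, constructed, number

for octet in (0x02, 0x30, 0x31, 0x81, 0xA1):
    print(f"{octet:#04x}", decode_identifier(octet))
```

For example, `0x30` decodes to the universal class, constructed, tag number 16 (`SEQUENCE`), while `0x81` decodes to the context-specific class, primitive, tag number 1.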
159215

160-
`Primitive` method applies to simple types and types derived from simple types by implicit tagging. It requires that the length of the value be known in advance.
216+
The `Primitive` method applies to simple types and types derived from simple types by implicit tagging. It requires that the
217+
length of the value be known in advance.
161218

162219
Simple integer: put the following in the `int.cnf` file:
163220

@@ -172,8 +229,8 @@ openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der
172229
000000 02 01 04
173230
```
174231

175-
As we expected, `0x02` refers to the `INTEGER` tag, `0x01` is the length of it and `0x04` is its value.
176-
Now, change the value in `int.cnf` file to `65889` and run it again:
232+
As expected, `0x02` is the `INTEGER` tag, `0x01` is its length and `0x04` is its value. Now, change the
233+
value in the `int.cnf` file to `65889` and run the command again:
177234

178235
```
179236
openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der
@@ -192,19 +249,21 @@ asn1=NULL
192249
openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der
193250
000000 05 00
194251
```
252+
195253
`0x05` is the tag for `NULL`.
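
The primitive encodings used so far can be reproduced without `OpenSSL`. The following Python sketch is an illustration under stated assumptions: the function names are invented here, and the bytes for `65889` are computed by the sketch rather than copied from the command output.

```python
def der_length(n):
    """Encode a definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def der_integer(value):
    """Encode an int as INTEGER (tag 0x02) in minimal two's complement."""
    magnitude = ~value if value < 0 else value
    size = magnitude.bit_length() // 8 + 1  # leaves room for the sign bit
    body = value.to_bytes(size, "big", signed=True)
    return b"\x02" + der_length(len(body)) + body

def der_null():
    """NULL has tag 0x05 and an empty value."""
    return b"\x05\x00"

print(der_integer(4).hex(" "))      # 02 01 04
print(der_integer(65889).hex(" "))  # 02 03 01 01 61
print(der_null().hex(" "))          # 05 00
```

The value `65889` is `0x010161`, so it needs three content bytes and the length byte becomes `0x03`.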
196254

197-
Tagging is useful to distinguish types within an application; it is also commonly used to distinguish component types within a structured type. For instance, optional components of a SET or SEQUENCE type are typically given distinct context-specific tags to avoid ambiguity.
198-
There are two ways to tag a type: implicitly and explicitly.
255+
Tagging is useful to distinguish types within an application; it is also commonly used to distinguish component types
256+
within a structured type. For instance, optional components of a SET or SEQUENCE type are typically given distinct
257+
context-specific tags to avoid ambiguity. There are two ways to tag a type: implicitly and explicitly.
199258

200-
Implicitly tagged types are derived from other types by changing the tag of the underlying type.
259+
Implicitly tagged types are derived from other types by changing the tag of the underlying type.
201260

202261
[[class] number] IMPLICIT Type
203262

204263
class = UNIVERSAL | APPLICATION | PRIVATE
205264

206-
where Type is a type, class is an optional class name, and number is the tag number within the class, a nonnegative integer.
207-
If the class name is absent, then the tag is context-specific.
265+
where Type is a type, class is an optional class name, and number is the tag number within the class, a nonnegative
266+
integer. If the class name is absent, then the tag is context-specific.
208267

209268
Next, put an `IMPLICIT` tag on it:
210269

@@ -217,7 +276,8 @@ openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der
217276
000000 81 01 04
218277
```
219278

220-
`8` octet shows that it has context-specific class and is a `primitive` not a `constructed`. and `1` is the tag number of it.
279+
The high nibble `8` shows that the tag has the context-specific class and is `primitive`, not `constructed`; the low nibble `1` is the tag
280+
number.
221281

222282
Now, let's try this one:
223283

@@ -232,27 +292,30 @@ openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der
232292

233293
The high nibble `4` shows that the tag has the application class.
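
Implicit tagging only rewrites the identifier byte, which a short Python sketch makes visible. The names `CLASS_BITS` and `implicit_tag` are hypothetical, only tag numbers below 31 are handled, and tag number 1 for the application-class case is an assumption made for illustration.

```python
CLASS_BITS = {"universal": 0x00, "application": 0x40, "context": 0x80, "private": 0xC0}

def implicit_tag(encoded, number, cls="context"):
    """Replace the class and tag number of an encoded element.

    The constructed bit, the length and the value are left untouched.
    """
    if not 0 <= number < 31:
        raise ValueError("only tag numbers 0..30 are handled")
    constructed = encoded[0] & 0x20
    return bytes([CLASS_BITS[cls] | constructed | number]) + encoded[1:]

integer_4 = bytes.fromhex("020104")  # INTEGER 4
print(implicit_tag(integer_4, 1).hex(" "))                 # 81 01 04
print(implicit_tag(integer_4, 1, "application").hex(" "))  # 41 01 04
```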
234294

235-
A real example : KCS #8's `PrivateKeyInfo` type has an optional attributes component with an implicit, context-specific tag:
295+
A real example: PKCS #8's `PrivateKeyInfo` type has an optional attributes component with an implicit, context-specific
296+
tag:
236297

237-
PrivateKeyInfo ::= SEQUENCE {
238-
version Version,
239-
privateKeyAlgorithm PrivateKeyAlgorithmIdentifier,
240-
privateKey PrivateKey,
241-
attributes [0] IMPLICIT Attributes OPTIONAL }
298+
PrivateKeyInfo ::= SEQUENCE { version Version, privateKeyAlgorithm PrivateKeyAlgorithmIdentifier, privateKey PrivateKey,
299+
attributes [0] IMPLICIT Attributes OPTIONAL }
242300

243-
Here the underlying type is Attributes, the class is absent (i.e., context-specific), and the tag number within the class is 0.
301+
Here the underlying type is Attributes, the class is absent (i.e., context-specific), and the tag number within the
302+
class is 0.
244303

245-
`Constructed, definite-length` method applies to simple string types, structured types, types derived simple string types and structured types by implicit tagging, and types derived from anything by explicit tagging. It requires that the length of the value be known in advance.
304+
The `Constructed, definite-length` method applies to simple string types, structured types, types derived from simple string
305+
types and structured types by implicit tagging, and types derived from anything by explicit tagging. It requires that
306+
the length of the value be known in advance.
246307

247-
For example a `SEQUENCE` will be shown by `0x30` tag, because it's a constructed type so the `6`th bit will be `1` and makes the `0x10` tag to `0x30`. The same approach cause that a `SET` will be started by `0x31`.
308+
For example, a `SEQUENCE` is encoded with the tag byte `0x30`: it is a constructed type, so bit 6 is set to `1`, which
309+
turns the tag number `0x10` into `0x30`. In the same way, a `SET` (tag number `0x11`) starts with `0x31`.
248310

249311
Explicit tagging denotes a type derived from another type by adding an outer tag to the underlying type.
250312

251313
[[`class`] `number`] EXPLICIT `Type`
252314

253315
`class` = UNIVERSAL | APPLICATION | PRIVATE
254316

255-
where `Type` is a type, `class` is an optional class name, and `number` is the tag number within the class, a nonnegative integer.
317+
where `Type` is a type, `class` is an optional class name, and `number` is the tag number within the class, a
318+
nonnegative integer.
256319

257320
If the `class` name is absent, then the tag is `context-specific`.
258321

@@ -269,10 +332,10 @@ openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der
269332
000000 a1 03 02 01 04
270333
```
271334

272-
We do not specified the class in `int.cnf` file, so its class is `context-specific` as the default : `1 0` in bits 8,7 and `constructed` `1` in bit 6.
273-
The Tag number is also appeared in second octet of byte `1`.
335+
We did not specify the class in the `int.cnf` file, so the class defaults to `context-specific`: `1 0` in bits 8,7
336+
and `constructed` `1` in bit 6. The tag number `1` appears in the low nibble of the first byte (`a1`).
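
Explicit tagging wraps the original encoding in a new constructed element. The Python sketch below illustrates this under simplifying assumptions: `explicit_tag` is a made-up helper that handles only tag numbers below 31 and inner encodings shorter than 128 bytes.

```python
CLASS_BITS = {"universal": 0x00, "application": 0x40, "context": 0x80, "private": 0xC0}

def explicit_tag(encoded, number, cls="context"):
    """Wrap an encoded element in an outer constructed tag."""
    if not 0 <= number < 31:
        raise ValueError("only tag numbers 0..30 are handled")
    if len(encoded) >= 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    outer = CLASS_BITS[cls] | 0x20 | number  # bit 6 set: constructed
    return bytes([outer, len(encoded)]) + encoded

integer_4 = bytes.fromhex("020104")  # INTEGER 4
print(explicit_tag(integer_4, 1).hex(" "))                 # a1 03 02 01 04
print(explicit_tag(integer_4, 1, "application").hex(" "))  # 61 03 02 01 04
```

Unlike implicit tagging, the inner `02 01 04` stays intact, so a decoder can still see the underlying `INTEGER` type.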
274337

275-
No try to determine the class of object an set it to `Application` :
338+
Now try to specify the class of the object and set it to `Application`:
276339

277340
```
278341
asn1=EXPLICIT:1A, INTEGER:4
