encoding/binary-to-text/README.md
Lines changed: 84 additions & 39 deletions
@@ -40,46 +40,71 @@ In [src/java/cryptography-in-use/cryptolib](https://github.com/KeyvanArj/cryptog
40
40
41
41
# Binary-to-text Encoding
42
42
43
-
The purpose of encoding is to transform data so that it can be properly (and safely) consumed by a different type of system, e.g. binary data being sent over the response of an API calling, or viewing special characters on a debug console or unit test function. The goal is not to keep information secret, but rather to ensure that it’s able to be properly consumed.
44
-
Encoding transforms data into another format using a scheme that is publicly available so that it can easily be reversed. It does not require a key as the only thing required to decode it is the algorithm that was used to encode it.
To understand the purpose of Encoding, please check [here](../../README.md#purpose)
45
60
46
61
## Hexadecimal (Base16)
47
62
48
-
Base16 can also refer to a binary to text encoding belonging to the same family as Base32, Base58, and Base64.
63
+
Hexadecimal is a numeral system made up of 16 symbols used to write and share numerical values. Base16 can also refer to
64
+
a binary to text encoding belonging to the same family as Base32, Base58, and Base64.
49
65
50
-
In this case, data is broken into 4-bit sequences, and each value (between 0 and 15 inclusively) is encoded using 16 symbols from the ASCII character set. Although any 16 symbols from the ASCII character set can be used, in practice the ASCII digits '0'–'9' and the letters 'A'–'F' (or the lowercase 'a'–'f') are always chosen in order to align with standard written notation for hexadecimal numbers.
66
+
In this case, data is broken into 4-bit sequences, and each value (between 0 and 15 inclusive) is encoded using 16
67
+
symbols from the ASCII character set. Although any 16 symbols from the ASCII character set can be used, in practice the
68
+
ASCII digits '0'–'9' and the letters 'A'–'F' (or the lowercase 'a'–'f') are always chosen in order to align with
69
+
standard written notation for hexadecimal numbers.
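
The nibble-by-nibble process described above can be sketched in Python as follows. This is only an illustration, not the repository's implementation: the names `HEX_ALPHABET`, `hex_encode`, and `hex_decode` are hypothetical, and the uppercase alphabet is an assumption (the lowercase one works equally well).

```python
# Hypothetical helper names; the uppercase alphabet is an assumption.
HEX_ALPHABET = "0123456789ABCDEF"


def hex_encode(data: bytes) -> str:
    """Encode bytes as Base16 by splitting each byte into two 4-bit values."""
    out = []
    for byte in data:
        high = byte >> 4     # upper nibble, a value between 0 and 15
        low = byte & 0x0F    # lower nibble, a value between 0 and 15
        out.append(HEX_ALPHABET[high])
        out.append(HEX_ALPHABET[low])
    return "".join(out)


def hex_decode(text: str) -> bytes:
    """Decode Base16 text, accepting both uppercase and lowercase letters."""
    if len(text) % 2 != 0:
        raise ValueError("Base16 input must have an even number of symbols")
    # str.index raises ValueError for any symbol outside the alphabet.
    values = [HEX_ALPHABET.index(ch) for ch in text.upper()]
    return bytes((values[i] << 4) | values[i + 1] for i in range(0, len(values), 2))


print(hex_encode(b"\xff\xe2"))  # FFE2
print(hex_decode("ffe2"))       # b'\xff\xe2'
```

In practice, Python's built-in `bytes.hex()` and `bytes.fromhex()` do the same job; the manual version is shown only to make the 4-bit grouping visible.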
70
+
71
+
### Advantages
51
72
52
73
There are several advantages of Base16 encoding:
53
74
54
-
- Most programming languages already have facilities to parse ASCII-encoded hexadecimal
55
-
- Being exactly half a byte, 4-bits is easier to process than the 5 or 6 bits of Base32 and Base64 respectively
56
-
The symbols 0-9 and A-F are universal in hexadecimal notation, so it is easily understood at a glance without needing to rely on a symbol lookup table
57
-
- Many CPU architectures have dedicated instructions that allow access to a half-byte (otherwise known as a "nibble"), making it more efficient in hardware than Base32 and Base64
75
+
- Most programming languages already have facilities to parse ASCII-encoded hexadecimal.
76
+
- Being exactly half a byte, 4 bits is easier to process than the 5 or 6 bits of Base32 and Base64 respectively.
77
+
- The symbols 0-9 and A-F are universal in hexadecimal notation, so the output is easily understood at a glance without
78
+
needing to rely on a symbol lookup table.
79
+
- Many CPU architectures have dedicated instructions that allow access to a half-byte (otherwise known as a "nibble"),
80
+
making Base16 more efficient in hardware than Base32 and Base64.
81
+
82
+
### Disadvantages
58
83
59
84
The main disadvantages of Base16 encoding are:
60
85
61
-
- Space efficiency is only 50%, since each 4-bit value from the original data will be encoded as an 8-bit byte. In contrast, Base32 and Base64 encodings have a space efficiency of 63% and 75% respectively.
86
+
- Space efficiency is only 50%, since each 4-bit value from the original data will be encoded as an 8-bit byte. In
87
+
contrast, Base32 and Base64 encodings have a space efficiency of 63% and 75% respectively.
62
88
- Possible added complexity of having to accept both uppercase and lowercase letters.
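
The space-efficiency figures above can be checked with a short Python sketch using the standard `base64` module. The 15-byte sample is an arbitrary choice for illustration: its length is a multiple of both 3 and 5, so neither Base32 nor Base64 needs padding and the ratios come out exact.

```python
import base64

# 15 bytes: a multiple of 3 and of 5, so no padding characters are added.
data = bytes(range(15))

encoded = {
    "Base16": base64.b16encode(data),
    "Base32": base64.b32encode(data),
    "Base64": base64.b64encode(data),
}

for name, text in encoded.items():
    efficiency = len(data) / len(text)  # input bytes per output symbol
    print(f"{name}: {len(text)} symbols, efficiency {efficiency:.1%}")
```

For this input the script reports 30, 24, and 20 output symbols, that is 50.0%, 62.5%, and 75.0%.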
63
89
64
90
## Base64
65
91
66
-
Here, we are talking about the `Base64` encoding from [RFC4648 - The Base16, Base32, and Base64 Data Encodings](https://tools.ietf.org/html/rfc4648).
92
+
Here, we are talking about the `Base64` encoding
93
+
from [RFC4648 - The Base16, Base32, and Base64 Data Encodings](https://tools.ietf.org/html/rfc4648).
67
94
68
95
There are two different versions defined in RFC 4648:
69
96
70
97
* Standard
71
98
* With URL and Filename Safe Alphabet
72
99
73
-
The encoding process represents 24-bit groups of input bits as output
74
-
strings of 4 encoded characters. Proceeding from left to right, a
75
-
24-bit input group is formed by concatenating 3 8-bit input groups.
76
-
These 24 bits are then treated as 4 concatenated 6-bit groups, each
77
-
of which is translated into a single character in the base 64
78
-
alphabet.
100
+
The encoding process takes 24-bit groups as input and produces strings of 4 encoded characters as output.
79
101
80
-
Each 6-bit group is used as an index into an array of 64 printable
81
-
characters. The character referenced by the index is placed in the
82
-
output string.
102
+
The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from
103
+
left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4
104
+
concatenated 6-bit groups, each of which is translated into a single character in the base 64 alphabet.
105
+
106
+
Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is
107
+
placed in the output string.
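
As a minimal sketch of the step described above, the following Python function encodes exactly one 24-bit group. The names `ALPHABET` and `encode_group` are hypothetical and used only for illustration; the alphabet string is the standard one from RFC 4648.

```python
import base64

# Standard Base64 alphabet from RFC 4648: index 0 is 'A', index 63 is '/'.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes (one 24-bit group) as 4 Base64 characters."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Concatenate the three 8-bit inputs into one 24-bit integer.
    group = int.from_bytes(three_bytes, "big")
    # Read the 24 bits as four 6-bit indices, from left to right.
    indices = [(group >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(b"Man"))                 # TWFu
print(base64.b64encode(b"Man").decode())    # TWFu
```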
83
108
84
109
The Base 64 Alphabet Table
85
110
@@ -102,23 +127,22 @@ The Base 64 Alphabet Table
102
127
15 P 32 g 49 x
103
128
16 Q 33 h 50 y
104
129
105
-
Special processing is performed if fewer than 24 bits are available
106
-
at the end of the data being encoded. A full encoding quantum is
107
-
always completed at the end of a quantity. When fewer than 24 input
108
-
bits are available in an input group, bits with value zero are added
109
-
(on the right) to form an integral number of 6-bit groups.
110
-
Since it encodes by group of 3 bytes, when last group of 3 bytes miss one byte then = is used, when it miss 2 bytes then == is used for padding.
130
+
Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full
131
+
encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input
132
+
group, bits with value zero are added (on the right) to form an integral number of 6-bit groups. Because the input is
133
+
encoded in groups of 3 bytes, a final group that is one byte short is padded with `=`, and a final group that is two bytes
134
+
short is padded with `==`.
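
The padding rule can be sketched in Python as follows. This is an illustrative encoder under the assumptions stated in the comments, with the hypothetical names `ALPHABET` and `b64_encode`; its output is compared with the standard `base64` module.

```python
import base64

# Standard Base64 alphabet from RFC 4648.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def b64_encode(data: bytes) -> str:
    """Encode bytes as standard Base64, padding the final group with '='."""
    out = []
    for start in range(0, len(data), 3):
        chunk = data[start:start + 3]
        missing = 3 - len(chunk)  # 0, 1, or 2 bytes missing in this group
        # Add zero bits on the right to complete the 24-bit group.
        group = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(group >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters built only from the added zero bits are replaced with '='.
        out.append("".join(chars[:4 - missing]) + "=" * missing)
    return "".join(out)


for sample in (b"\xff\xe2\x01", b"\xff\xe2", b"\xff"):
    print(b64_encode(sample), base64.b64encode(sample).decode())
```

For the two-byte input `[0xff, 0xe2]`, both columns show `/+I=`.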
111
135
112
-
In `URL/Filename safe` version, the `-` is used for `62` instead of `+` ,
113
-
and the `_` is used for `63` instead of `/` . This encoding may be referred to as "base64url".
114
-
This encoding should not be regarded as the same as the "base64" encoding and
115
-
should not be referred to as only "base64".
136
+
In the `URL/Filename safe` version, `-` is used for value `62` instead of `+`, and `_` is used for value `63` instead of `/`.
137
+
This encoding may be referred to as "base64url".
138
+
This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64".
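
The difference between the two alphabets can be seen with Python's standard `base64` module. This is a small sketch that uses the byte array `[0xff, 0xe2]` as sample input.

```python
import base64

data = bytes([0xFF, 0xE2])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # /+I=
print(url_safe)  # _-I=

# The two alphabets differ only at values 62 and 63, so one output maps to the other.
assert url_safe == standard.translate(str.maketrans("+/", "-_"))
```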
116
140
117
-
In `OpenSSL` , the `Standard` version has been implemented since OpenSSL 1.1.1j 16 Feb 2021.
141
+
In `OpenSSL`, the `Standard` version has been implemented since OpenSSL 1.1.1j (16 Feb 2021).
118
142
119
-
### Example
143
+
### Examples
120
144
121
-
#### manual encoding
145
+
#### Manual encoding
122
146
123
147
Suppose that the input byte array is [0xff, 0xe2].
124
148
@@ -140,31 +164,30 @@ The output length is not the multiplier of 4, so add `=` as the padding characte
140
164
141
165
`/``+``I``=`
142
166
143
-
If we try to do same one for `base64url`:
167
+
If we do the same for `base64url`:
144
168
145
169
`_``-``I``=`
146
170
147
-
##### create a binary file
171
+
#### Create a binary file
148
172
149
-
You can use `echo` in command line interface :
173
+
You can use `echo` in the command-line interface:
150
174
151
175
```
152
176
$ echo -n -e \\xff\\xe2 > data_binary.bin
153
177
```
154
178
155
-
To check the content of the binary file:
179
+
To check the content of the binary file:
156
180
157
181
```
158
182
$ hexdump data_binary.bin
159
183
```
160
184
161
-
##### encode to standard Base64
185
+
#### Encode to standard Base64
162
186
163
187
```
164
188
$ openssl enc -base64 -e -in data_binary.bin
165
189
```
166
-
167
-
##### decode from standard Base64
190
+
#### Decode from standard Base64
168
191
169
192
```
170
193
$ openssl enc -base64 -d <<< /+I= | od -vt x1
@@ -177,3 +200,25 @@ In [src/python/cryptography-in-use/cryptolib](https://github.com/KeyvanArj/crypt
177
200
178
201
##### Java
179
202
In the [src/java/cryptography-in-use/cryptolib](https://github.com/KeyvanArj/cryptography-in-use/tree/main/src/java/cryptography-in-use/src/main/java/cryptolib) folder, you can find the `BinaryEncoder.java` source file, which contains the `hex` and `base64` encoder/decoder implementations. Their unit tests are also available in the [src/java/cryptography-in-use/test](https://github.com/KeyvanArj/cryptography-in-use/tree/main/src/java/cryptography-in-use/test/java/cryptolib) folder as `BinaryEncoderTest.java`.
204
+
In many situations, we have text values that must be converted to equivalent byte arrays for use as the input of
205
+
a cryptographic process. For example, assume that we have a text message for an authorized party and we need to encrypt
206
+
it before transmission. The encryption process accepts a byte array as its input, so we need to convert the message to a
207
+
byte array:
208
+
209
+
```
210
+
$ echo -n 'Hello, World' | od -t x1
211
+
0000000 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64
212
+
```
213
+
214
+
or, in another representation:
215
+
216
+
```
217
+
$ echo -n 'Hello, World' | xxd -ps
218
+
48656c6c6f2c20576f726c64
219
+
```
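
The same conversion can be sketched in Python, assuming the message contains only ASCII characters. `bytes.hex` with a separator argument requires Python 3.8 or later.

```python
message = "Hello, World"

# Convert the text to bytes using the ASCII table.
encoded = message.encode("ascii")

print(encoded.hex(" "))  # 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64
print(encoded.hex())     # 48656c6c6f2c20576f726c64

# Each byte is the ASCII code of one character.
print([hex(ord(ch)) for ch in message[:2]])  # ['0x48', '0x65']
```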
220
+
221
+
But what does this really mean? It is important to understand exactly what happens in this conversion. Take a
222
+
look at the `ASCII Table` again. `0x48` is the hexadecimal representation of the `H` character, `0x65` is that of the `e`
223
+
character, and so on. So, every character in the `Hello, World` message is converted to a hexadecimal value
224
+
from the `ASCII Table`. In other words, we have applied the `ASCII` character encoding. Did we have any other option? Yes,