Commit 312d1f5

author
Hgh
committed
Updated text of the encoding -> binary to text read me file regarding some text modification
1 parent 24a9b6f commit 312d1f5

File tree

1 file changed

+84
-39
lines changed

encoding/binary-to-text/README.md

Lines changed: 84 additions & 39 deletions
@@ -40,46 +40,71 @@ In [src/java/cryptography-in-use/cryptolib](https://github.com/KeyvanArj/cryptog
4040

4141
# Binary-to-text Encoding
4242

43-
The purpose of encoding is to transform data so that it can be properly (and safely) consumed by a different type of system, e.g. binary data being sent over the response of an API calling, or viewing special characters on a debug console or unit test function. The goal is not to keep information secret, but rather to ensure that it’s able to be properly consumed.
44-
Encoding transforms data into another format using a scheme that is publicly available so that it can easily be reversed. It does not require a key as the only thing required to decode it is the algorithm that was used to encode it.
43+
## Table of contents
44+
45+
- ### [Purpose](#purpose)
46+
- ### [Hexadecimal (Base16)](#hexadecimal-base16)
47+
- #### [Advantages](#advantages)
48+
- #### [Disadvantages](#disadvantages)
49+
- ### [Base64](#base64)
50+
- ### [Examples](#examples)
51+
- #### [Manual encoding](#manual-encoding)
52+
- #### [Create a binary file](#create-a-binary-file)
53+
- #### [Encode to standard Base64](#encode-to-standard-base64)
54+
- #### [Decode from standard Base64](#decode-from-standard-base64)
55+
- ### [Text-to-binary decoding](#text-to-binary-decoding)
56+
57+
## Purpose
58+
59+
To understand the purpose of Encoding, please check [here](../../README.md#purpose)
4560

4661
## Hexadecimal (Base16)
4762

48-
Base16 can also refer to a binary to text encoding belonging to the same family as Base32, Base58, and Base64.
63+
Hexadecimal is a numeral system that uses 16 symbols to write and share numerical values. Base16 can also refer to
64+
a binary-to-text encoding belonging to the same family as Base32, Base58, and Base64.
4965

50-
In this case, data is broken into 4-bit sequences, and each value (between 0 and 15 inclusively) is encoded using 16 symbols from the ASCII character set. Although any 16 symbols from the ASCII character set can be used, in practice the ASCII digits '0'–'9' and the letters 'A'–'F' (or the lowercase 'a'–'f') are always chosen in order to align with standard written notation for hexadecimal numbers.
66+
In this case, data is broken into 4-bit sequences, and each value (between 0 and 15 inclusively) is encoded using 16
67+
symbols from the ASCII character set. Although any 16 symbols from the ASCII character set can be used, in practice the
68+
ASCII digits '0'–'9' and the letters 'A'–'F' (or the lowercase 'a'–'f') are always chosen in order to align with
69+
standard written notation for hexadecimal numbers.
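
The nibble-by-nibble mapping described above can be sketched in Python. This is only an illustration: the `base16_encode` helper and the `HEX_ALPHABET` constant are hypothetical names, not part of the repository's `cryptolib` code.

```python
# Lowercase alphabet chosen for this sketch; 'A'-'F' would be equally valid.
HEX_ALPHABET = "0123456789abcdef"


def base16_encode(data: bytes) -> str:
    """Encode each byte as two hexadecimal symbols, high nibble first."""
    symbols = []
    for byte in data:
        symbols.append(HEX_ALPHABET[byte >> 4])    # upper 4 bits
        symbols.append(HEX_ALPHABET[byte & 0x0F])  # lower 4 bits
    return "".join(symbols)


encoded = base16_encode(bytes([0xFF, 0xE2]))
print(encoded)  # ffe2
```

The result should match what the built-in `bytes.hex()` method returns for the same input.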
70+
71+
### Advantages
5172

5273
There are several advantages of Base16 encoding:
5374

54-
- Most programming languages already have facilities to parse ASCII-encoded hexadecimal
55-
- Being exactly half a byte, 4-bits is easier to process than the 5 or 6 bits of Base32 and Base64 respectively
56-
The symbols 0-9 and A-F are universal in hexadecimal notation, so it is easily understood at a glance without needing to rely on a symbol lookup table
57-
- Many CPU architectures have dedicated instructions that allow access to a half-byte (otherwise known as a "nibble"), making it more efficient in hardware than Base32 and Base64
75+
- Most programming languages already have facilities to parse ASCII-encoded hexadecimal.
76+
- Being exactly half a byte, a 4-bit group is easier to process than the 5 or 6 bits of Base32 and Base64 respectively.
77+
- The symbols 0-9 and A-F are universal in hexadecimal notation, so the output is easily understood at a glance without
78+
needing to rely on a symbol lookup table.
79+
- Many CPU architectures have dedicated instructions that allow access to a half-byte (otherwise known as a "nibble"),
80+
making Base16 more efficient in hardware than Base32 and Base64.
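
As a small illustration of the first point, Python's standard library can parse ASCII-encoded hexadecimal directly. The variable names below are only for this sketch.

```python
# Parse a hexadecimal string into raw bytes using the standard library.
raw = bytes.fromhex("ffe2")

# Parse a single hexadecimal number into an integer.
value = int("ff", 16)

# Split the first byte into its two nibbles with a shift and a mask.
high_nibble = raw[0] >> 4
low_nibble = raw[0] & 0x0F

print(raw, value, high_nibble, low_nibble)
```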
81+
82+
### Disadvantages
5883

5984
The main disadvantages of Base16 encoding are:
6085

61-
- Space efficiency is only 50%, since each 4-bit value from the original data will be encoded as an 8-bit byte. In contrast, Base32 and Base64 encodings have a space efficiency of 63% and 75% respectively.
86+
- Space efficiency is only 50%, since each 4-bit value from the original data will be encoded as an 8-bit byte. In
87+
contrast, Base32 and Base64 encodings have a space efficiency of 63% and 75% respectively.
6288
- Possible added complexity of having to accept both uppercase and lowercase letters.
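
Both points can be checked with a short Python sketch that uses the standard `base64` module. The 240-byte payload is an arbitrary choice for this example: its length is divisible by both 3 and 5, so no padding distorts the ratios.

```python
import base64

# 240 bytes: divisible by 3 (Base64) and 5 (Base32), so no padding is added.
payload = bytes(range(240))

encoded_sizes = {
    "base16": len(base64.b16encode(payload)),
    "base32": len(base64.b32encode(payload)),
    "base64": len(base64.b64encode(payload)),
}

# Space efficiency = original size / encoded size.
efficiency = {name: len(payload) / size for name, size in encoded_sizes.items()}
print(efficiency)  # {'base16': 0.5, 'base32': 0.625, 'base64': 0.75}

# The standard-library Base16 decoder only accepts lowercase letters
# when casefold=True is passed explicitly.
decoded = base64.b16decode("ffe2", casefold=True)
```

Base32 comes out at exactly 62.5%, which the list above rounds to 63%.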
6389

6490
## Base64
6591

66-
Here, we are talking about the `Base64` encoding from [RFC4648 - The Base16, Base32, and Base64 Data Encodings](https://tools.ietf.org/html/rfc4648).
92+
Here, we are talking about the `Base64` encoding
93+
from [RFC4648 - The Base16, Base32, and Base64 Data Encodings](https://tools.ietf.org/html/rfc4648).
6794

6895
There are two different versions defined in RFC 4648:
6996

7097
* Standard
7198
* With URL and Filename Safe Alphabet
7299

73-
The encoding process represents 24-bit groups of input bits as output
74-
strings of 4 encoded characters. Proceeding from left to right, a
75-
24-bit input group is formed by concatenating 3 8-bit input groups.
76-
These 24 bits are then treated as 4 concatenated 6-bit groups, each
77-
of which is translated into a single character in the base 64
78-
alphabet.
100+
The encoding process takes 24-bit groups as input and produces strings of 4 encoded characters as output.
79101

80-
Each 6-bit group is used as an index into an array of 64 printable
81-
characters. The character referenced by the index is placed in the
82-
output string.
102+
The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from
103+
left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4
104+
concatenated 6-bit groups, each of which is translated into a single character in the base 64 alphabet.
105+
106+
Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is
107+
placed in the output string.
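
The following Python sketch shows one way to encode a single full 24-bit group by hand. `encode_quantum` and `B64_ALPHABET` are hypothetical names used only for this illustration, and the sketch deliberately ignores padding.

```python
import base64
import string

# Standard Base64 alphabet from RFC 4648: A-Z, a-z, 0-9, '+', '/'.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_quantum(group: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(group) != 3:
        raise ValueError("this sketch handles full 24-bit groups only")
    # Concatenate the three 8-bit groups into one 24-bit integer.
    bits = int.from_bytes(group, "big")
    # Split it into four 6-bit indexes, most significant first.
    indexes = [(bits >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(B64_ALPHABET[i] for i in indexes)


quantum = encode_quantum(b"Man")
print(quantum)  # TWFu

# Cross-check against the standard library.
assert quantum == base64.b64encode(b"Man").decode("ascii")
```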
83108

84109
The Base 64 Alphabet Table
85110

@@ -102,23 +127,22 @@ The Base 64 Alphabet Table
102127
15 P 32 g 49 x
103128
16 Q 33 h 50 y
104129

105-
Special processing is performed if fewer than 24 bits are available
106-
at the end of the data being encoded. A full encoding quantum is
107-
always completed at the end of a quantity. When fewer than 24 input
108-
bits are available in an input group, bits with value zero are added
109-
(on the right) to form an integral number of 6-bit groups.
110-
Since it encodes by group of 3 bytes, when last group of 3 bytes miss one byte then = is used, when it miss 2 bytes then == is used for padding.
130+
Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full
131+
encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input
132+
group, bits with value zero are added (on the right) to form an integral number of 6-bit groups. Since Base64 encodes
133+
in groups of 3 bytes, a final group that is missing one byte is padded with `=`, and one that is missing two bytes is
134+
padded with `==`.
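
To see the padding rule in action, the sketch below encodes inputs of 3, 2, and 1 bytes with Python's standard `base64` module and counts the `=` characters. The sample byte values are arbitrary.

```python
import base64

samples = (b"\xff\xe2\x01", b"\xff\xe2", b"\xff")

# Map each input length to (encoded text, number of '=' padding characters).
padding_by_length = {}
for data in samples:
    encoded = base64.b64encode(data).decode("ascii")
    padding_by_length[len(data)] = (encoded, encoded.count("="))

for length, (encoded, pad) in padding_by_length.items():
    print(f"{length} byte(s) -> {encoded} ({pad} padding character(s))")
```

In every case the output length is a multiple of 4.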
111135

112-
In `URL/Filename safe` version, the `-` is used for `62` instead of `+` ,
113-
and the `_` is used for `63` instead of `/` . This encoding may be referred to as "base64url".
114-
This encoding should not be regarded as the same as the "base64" encoding and
115-
should not be referred to as only "base64".
136+
In the `URL/Filename safe` version, `-` is used for `62` instead of `+`, and `_` is used for `63` instead of `/`.
137+
This encoding may be referred to as "base64url".
138+
This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only
139+
"base64".
116140

117-
In `OpenSSL` , the `Standard` version has been implemented since OpenSSL 1.1.1j 16 Feb 2021.
141+
In `OpenSSL` , the `Standard` version has been implemented since OpenSSL 1.1.1j 16 Feb 2021.
118142

119-
### Example
143+
### Examples
120144

121-
#### manual encoding
145+
#### Manual encoding
122146

123147
Suppose that the input byte array is [0xff, 0xe2].
124148

@@ -140,31 +164,30 @@ The output length is not the multiplier of 4, so add `=` as the padding characte
140164

141165
`/` `+` `I` `=`
142166

143-
If we try to do same one for `base64url` :
167+
If we do the same for `base64url`:
144168

145169
`_` `-` `I` `=`
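
The manual result can be checked against Python's standard `base64` module, which provides both alphabets. This is only a quick verification sketch.

```python
import base64

data = bytes([0xFF, 0xE2])

# Standard alphabet: index 62 is '+', index 63 is '/'.
standard = base64.b64encode(data).decode("ascii")

# URL and filename safe alphabet: index 62 is '-', index 63 is '_'.
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # /+I=
print(url_safe)  # _-I=

# Each variant must be decoded with its own alphabet.
assert base64.b64decode(standard) == data
assert base64.urlsafe_b64decode(url_safe) == data
```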
146170

147-
##### create a binary file
171+
#### Create a binary file
148172

149-
You can use `echo` in command line interface :
173+
You can use `echo` in the command-line interface:
150174

151175
```
152176
$ echo -n -e \\xff\\xe2 > data_binary.bin
153177
```
154178

155-
To check the content of the binary file :
179+
To check the content of the binary file:
156180

157181
```
158182
$ hexdump data_binary.bin
159183
```
160184

161-
##### encode to standard Base64
185+
#### Encode to standard Base64
162186

163187
```
164188
$ openssl enc -base64 -e -in data_binary.bin
165189
```
166-
167-
##### decode from standard Base64
190+
#### Decode from standard Base64
168191

169192
```
170193
$ openssl enc -base64 -d <<< /+I= | od -vt x1
@@ -177,3 +200,25 @@ In [src/python/cryptography-in-use/cryptolib](https://github.com/KeyvanArj/crypt
177200

178201
##### Java
179202
In the [src/java/cryptography-in-use/cryptolib](https://github.com/KeyvanArj/cryptography-in-use/tree/main/src/java/cryptography-in-use/src/main/java/cryptolib) folder, you can find the `BinaryEncoder.java` source file, which contains the `hex` and `base64` encoder/decoder implementations. Its unit tests are also available in the [src/java/cryptography-in-use/test](https://github.com/KeyvanArj/cryptography-in-use/tree/main/src/java/cryptography-in-use/test/java/cryptolib) folder as `BinaryEncoderTest.java`.
203+
=======
204+
In many situations, we have text values that must be decoded to equivalent byte arrays to use as the input of
205+
a cryptographic process. For example, assume that we have a text message for an authorized party and we need to encrypt
206+
it before transmission. The encryption process accepts a byte array as input, so we need to convert the message to a
207+
byte array:
208+
209+
```
210+
$ echo -n 'Hello, World' | od -t x1
211+
0000000 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64
212+
```
213+
214+
or, in another representation:
215+
216+
```
217+
$ echo -n 'Hello, World' | xxd -ps
218+
48656c6c6f2c20576f726c64
219+
```
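
The same conversion can be sketched in Python, assuming the message contains only ASCII characters. The variable names are illustrative.

```python
message = "Hello, World"

# Convert the text to a byte array using the ASCII table.
data = message.encode("ascii")

# Space-separated hexadecimal, similar to the `od -t x1` output.
spaced = " ".join(f"{byte:02x}" for byte in data)

# Plain hexadecimal string, similar to the `xxd -ps` output.
plain = data.hex()

print(spaced)  # 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64
print(plain)   # 48656c6c6f2c20576f726c64
```

Note that the comma is encoded as `2c` and the space as `20`.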
220+
221+
But what does this really mean? It is very important to understand exactly what happens in this conversion. Take a
222+
look at the `ASCII Table` again. `0x48` is the hexadecimal representation of the `H` character, `0x65` of the `e`
223+
character, and so on. So, every character in the `Hello, World` message is converted to a hexadecimal value
224+
from `ASCII Table`. It means that we have done the `ASCII` decoding process. Did we have any other option? Yes,
