Commit 6348860

Merge pull request #345 from nspcc-dev/feature/container-policy-ec
2 parents d2d285d + b59c35d commit 6348860

6 files changed, +165 -6 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@
 ## [Unreleased]

 ### Added
+- EC rules to container storage policy (#345)
+- GET/RANGE query for EC part object (#345)

 ### Changed

netmap/types.proto

Lines changed: 47 additions & 0 deletions
@@ -130,6 +130,53 @@ message PlacementPolicy {
     json_name = "subnetId",
     deprecated = true
   ];
+
+  // Erasure coding rule for container objects.
+  //
+  // For each original object, the payload is split into `data_part_num` data
+  // and `parity_part_num` parity parts. Each part is the same size. Data parts
+  // contain the original payload. If its length is not divisible by
+  // `data_part_num`, the last part is aligned with zero bytes. Both
+  // `data_part_num` and `parity_part_num` MUST NOT be zero or exceed 64,
+  // including in total.
+  //
+  // For each payload part, a part object is created. Original object's ID,
+  // signature and header is written in `header.split.parent`,
+  // `header.split.parent_signature` and `header.split.parent_header` fields
+  // correspondingly. Part index is written in the `__NEOFS__EC_PART_IDX`
+  // attribute as base-10 integer. Rule index in `PlacementPolicy.ec_rules`
+  // list is written in the `__NEOFS__EC_RULE_IDX` attribute as base-10
+  // integer.
+  //
+  // Each part object is stored in the container in one copy. Storage nodes are
+  // selected from the network map similar to `PlacementPolicy.replicas` rules.
+  // Optional `selector` acts the same way. The object for the `i`-th part is
+  // placed in the `i`-th node. If it is unavailable, the backup nodes with
+  // indexes `m * n + i` (`n = data_part_num + parity_part_num`,
+  // `m = 1, ..., CBF-1`). If all nodes for the `i`-th part
+  // are unavailable, nodes for the `i+1`-th (0 for the last) part are tried,
+  // and so on.
+  //
+  // Once part objects are stored in the container, the original object remains
+  // available if at least `data_part_num` of any part objects are available.
+  // In other words, unavailability (including complete loss) of any of
+  // `parity_part_num` part objects does not violate availability of the
+  // original one.
+  //
+  // Objects of TOMBSTONE and LOCK types are not encoded and stored as they are
+  // because they have no payload.
+  message ECRule {
+    // Number of data parts
+    uint32 data_part_num = 1;
+
+    // Number of parity parts
+    uint32 parity_part_num = 2;
+
+    // Name of the linked selector
+    string selector = 3 [json_name = "selector"];
+  }
+  // Erasure coding rules. Limited to 4 items.
+  repeated ECRule ec_rules = 6 [json_name = "ecRules"];
 }

 // NeoFS node description
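
Note on the `ECRule` placement comment: the node selection order it describes can be sketched as below. This is a minimal illustration, not code from the commit. The function name `candidate_node_indexes` is hypothetical, `cbf` is assumed to stand for the container backup factor, the selected node list is assumed to hold `n * cbf` entries, and the sentence about backup nodes is read as "the backup nodes with indexes `m * n + i` are tried".

```python
def candidate_node_indexes(part_idx: int, data_part_num: int,
                           parity_part_num: int, cbf: int) -> list[int]:
    """Return node indexes to try for one EC part, in order of preference."""
    n = data_part_num + parity_part_num  # total number of parts in the rule
    if not 0 <= part_idx < n:
        raise ValueError("part index out of range")
    if cbf < 1:
        raise ValueError("backup factor must be positive")

    order = []
    # Start with the part's own nodes, then move on to the nodes of the
    # following parts, wrapping around to part 0 after the last one.
    for shift in range(n):
        p = (part_idx + shift) % n
        # m = 0 is the primary node; m = 1, ..., cbf-1 are the backups.
        for m in range(cbf):
            order.append(m * n + p)
    return order
```

Under these assumptions, for a 2+1 rule with `cbf = 2`, part 1 would try nodes 1 and 4 first, then 2 and 5, then 0 and 3.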

object/service.proto

Lines changed: 22 additions & 2 deletions
@@ -262,7 +262,17 @@ service ObjectService {
   rpc Replicate(ReplicateRequest) returns (ReplicateResponse);
 }

-// GET object request
+// GET object request.
+//
+// The query for a parent object's EC part locally stored on the server is
+// specified as follows:
+// - `body.address` is an address of the parent;
+// - `meta_header.x_headers` includes `__NEOFS__EC_RULE_IDX` and
+// `__NEOFS__EC_PART_IDX` by object attribute format. Rule index MUST NOT
+// exceed container's `PlacementPolicy.ec_rules` list. Part index MUST NOT
+// exceed total part number in the indexed rule.
+// In this case, if `body.address` refers to TOMBSTONE or LOCK object (which
+// cannot have EC parts), the query applies to it.
 message GetRequest {
   // GET Object request body
   message Body {
@@ -653,7 +663,17 @@ message Range {
   uint64 length = 2;
 }

-// Request part of object's payload
+// Request part of object's payload.
+//
+// The query for a parent object's EC part locally stored on the server is
+// specified as follows:
+// - `body.address` is an address of the parent;
+// - `meta_header.x_headers` includes `__NEOFS__EC_RULE_IDX` and
+// `__NEOFS__EC_PART_IDX` by object attribute format. Rule index MUST NOT
+// exceed container's `PlacementPolicy.ec_rules` list. Part index MUST NOT
+// exceed total part number in the indexed rule.
+// In this case, if `body.address` refers to TOMBSTONE or LOCK object (which
+// cannot have EC parts), the query applies to it.
 message GetRangeRequest {
   // Byte range of object's payload request body
   message Body {
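
Note on the EC part query: the X-header checks described in the `GetRequest` and `GetRangeRequest` comments could look roughly like the sketch below. It is an illustration only, not the server implementation. The helper names are hypothetical, EC rules are modelled as plain `(data_part_num, parity_part_num)` tuples, indexes are assumed to be zero-based (so "MUST NOT exceed" is read as "must be less than the count"), and rejecting a request that carries only one of the two headers is an assumption.

```python
from typing import Optional

EC_RULE_IDX = "__NEOFS__EC_RULE_IDX"
EC_PART_IDX = "__NEOFS__EC_PART_IDX"


def _parse_index(value: str) -> int:
    """Parse a non-negative base-10 integer written with ASCII digits."""
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"not a base-10 integer: {value!r}")
    return int(value)


def ec_part_query(x_headers: dict[str, str],
                  ec_rules: list[tuple[int, int]]) -> Optional[tuple[int, int]]:
    """Return (rule_idx, part_idx) for an EC part query, None for a regular one."""
    has_rule = EC_RULE_IDX in x_headers
    has_part = EC_PART_IDX in x_headers
    if not has_rule and not has_part:
        return None  # ordinary GET/RANGE request
    if has_rule != has_part:
        raise ValueError("both EC X-headers are required")  # assumption

    rule_idx = _parse_index(x_headers[EC_RULE_IDX])
    part_idx = _parse_index(x_headers[EC_PART_IDX])
    if rule_idx >= len(ec_rules):
        raise ValueError("rule index exceeds the container's EC rule list")
    data_num, parity_num = ec_rules[rule_idx]
    if part_idx >= data_num + parity_num:
        raise ValueError("part index exceeds the total part number of the rule")
    return rule_idx, part_idx
```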

object/types.proto

Lines changed: 8 additions & 1 deletion
@@ -228,6 +228,12 @@ message Header {
   // * __NEOFS__TICK_TOPIC \
   // UTF-8 string topic ID that is used for object notification.
   // DEPRECATED: attribute ignored by servers.
+  // * __NEOFS__EC_RULE_IDX \
+  // Index of EC rule in container's `PlacementPolicy.ec_rules` according to
+  // which the part was created. Base-10 integer.
+  // * __NEOFS__EC_PART_IDX \
+  // Index in the EC parts into which the parent object is divided according
+  // to `__NEOFS__EC_RULE_IDX` EC rule. Base-10 integer.
   //
   // And some well-known attributes used by applications only:
   //
@@ -264,7 +270,8 @@ message Header {
   // the original one is in the `Split` headers. Parent and children objects
   // must be within the same container.
   message Split {
-    // Identifier of the origin object. Known only to the minor child.
+    // Identifier of the origin object. If the origin object is split to comply
+    // with the object size limit, `parent` is known only to the minor child.
     neo.fs.v2.refs.ObjectID parent = 1 [json_name = "parent"];

     // Identifier of the left split neighbor

proto-docs/netmap.md

Lines changed: 48 additions & 0 deletions
@@ -33,6 +33,7 @@
 - [NodeInfo](#neo.fs.v2.netmap.NodeInfo)
 - [NodeInfo.Attribute](#neo.fs.v2.netmap.NodeInfo.Attribute)
 - [PlacementPolicy](#neo.fs.v2.netmap.PlacementPolicy)
+- [PlacementPolicy.ECRule](#neo.fs.v2.netmap.PlacementPolicy.ECRule)
 - [Replica](#neo.fs.v2.netmap.Replica)
 - [Selector](#neo.fs.v2.netmap.Selector)

@@ -486,6 +487,53 @@ storage policy definition languages.
 | selectors | [Selector](#neo.fs.v2.netmap.Selector) | repeated | Set of Selectors to form the container's nodes subset |
 | filters | [Filter](#neo.fs.v2.netmap.Filter) | repeated | List of named filters to reference in selectors |
 | subnet_id | [neo.fs.v2.refs.SubnetID](#neo.fs.v2.refs.SubnetID) | | DEPRECATED. Was used for subnetwork ID to select nodes from, currently ignored. |
+| ec_rules | [PlacementPolicy.ECRule](#neo.fs.v2.netmap.PlacementPolicy.ECRule) | repeated | Erasure coding rules. Limited to 4 items. |
+
+
+<a name="neo.fs.v2.netmap.PlacementPolicy.ECRule"></a>
+
+### Message PlacementPolicy.ECRule
+Erasure coding rule for container objects.
+
+For each original object, the payload is split into `data_part_num` data
+and `parity_part_num` parity parts. Each part is the same size. Data parts
+contain the original payload. If its length is not divisible by
+`data_part_num`, the last part is aligned with zero bytes. Both
+`data_part_num` and `parity_part_num` MUST NOT be zero or exceed 64,
+including in total.
+
+For each payload part, a part object is created. Original object's ID,
+signature and header is written in `header.split.parent`,
+`header.split.parent_signature` and `header.split.parent_header` fields
+correspondingly. Part index is written in the `__NEOFS__EC_PART_IDX`
+attribute as base-10 integer. Rule index in `PlacementPolicy.ec_rules`
+list is written in the `__NEOFS__EC_RULE_IDX` attribute as base-10
+integer.
+
+Each part object is stored in the container in one copy. Storage nodes are
+selected from the network map similar to `PlacementPolicy.replicas` rules.
+Optional `selector` acts the same way. The object for the `i`-th part is
+placed in the `i`-th node. If it is unavailable, the backup nodes with
+indexes `m * n + i` (`n = data_part_num + parity_part_num`,
+`m = 1, ..., CBF-1`). If all nodes for the `i`-th part
+are unavailable, nodes for the `i+1`-th (0 for the last) part are tried,
+and so on.
+
+Once part objects are stored in the container, the original object remains
+available if at least `data_part_num` of any part objects are available.
+In other words, unavailability (including complete loss) of any of
+`parity_part_num` part objects does not violate availability of the
+original one.
+
+Objects of TOMBSTONE and LOCK types are not encoded and stored as they are
+because they have no payload.
+
+
+| Field | Type | Label | Description |
+| ----- | ---- | ----- | ----------- |
+| data_part_num | [uint32](#uint32) | | Number of data parts |
+| parity_part_num | [uint32](#uint32) | | Number of parity parts |
+| selector | [string](#string) | | Name of the linked selector |


 <a name="neo.fs.v2.netmap.Replica"></a>
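
Note on the `ECRule` limits and payload alignment: the documented constraints can be checked, and the zero-byte alignment of data parts illustrated, with the sketch below. It is not taken from the commit. The names `validate_ec_rules` and `split_payload` are hypothetical, rules are modelled as `(data_part_num, parity_part_num)` tuples, padding is assumed to be appended to the end of the payload before slicing, and parity computation is left out entirely.

```python
MAX_EC_RULES = 4       # "Limited to 4 items"
MAX_TOTAL_PARTS = 64   # limit on data_part_num + parity_part_num


def validate_ec_rules(rules: list[tuple[int, int]]) -> None:
    """Check (data_part_num, parity_part_num) pairs against the documented limits."""
    if len(rules) > MAX_EC_RULES:
        raise ValueError(f"too many EC rules: {len(rules)} > {MAX_EC_RULES}")
    for idx, (data_num, parity_num) in enumerate(rules):
        if data_num < 1 or parity_num < 1:
            raise ValueError(f"rule #{idx}: part numbers must not be zero")
        if data_num + parity_num > MAX_TOTAL_PARTS:
            raise ValueError(f"rule #{idx}: more than {MAX_TOTAL_PARTS} parts in total")


def split_payload(payload: bytes, data_part_num: int) -> list[bytes]:
    """Split a payload into equally sized data parts, zero-padding the tail."""
    if data_part_num < 1:
        raise ValueError("data_part_num must not be zero")
    part_len = -(-len(payload) // data_part_num)  # ceiling division
    padded = payload.ljust(part_len * data_part_num, b"\x00")
    return [padded[i * part_len:(i + 1) * part_len] for i in range(data_part_num)]
```

For example, a 7-byte payload split by a rule with `data_part_num = 3` yields three 3-byte data parts, the last one ending in two zero bytes.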

proto-docs/object.md

Lines changed: 38 additions & 3 deletions
@@ -149,6 +149,9 @@ Please refer to detailed `XHeader` description.
 Statuses:
 - **OK** (0, SECTION_SUCCESS): \
 object has been successfully saved in the container;
+- **INCOMPLETE** (1, SECTION_SUCCESS): \
+object was put to some nodes, but the number of replicas is not sufficient
+to satisfy placement policy;
 - Common failures (SECTION_FAILURE_COMMON);
 - **ACCESS_DENIED** (2048, SECTION_OBJECT): \
 write access to the container is denied;
@@ -188,6 +191,9 @@ Please refer to detailed `XHeader` description.
 Statuses:
 - **OK** (0, SECTION_SUCCESS): \
 object has been successfully marked to be removed from the container;
+- **INCOMPLETE** (1, SECTION_SUCCESS): \
+some nodes have accepted the deletion mark, but some may still store
+the object;
 - Common failures (SECTION_FAILURE_COMMON);
 - **ACCESS_DENIED** (2048, SECTION_OBJECT): \
 delete access to the object is denied;
@@ -250,6 +256,9 @@ Please refer to detailed `XHeader` description.
 Statuses:
 - **OK** (0, SECTION_SUCCESS): \
 objects have been successfully selected;
+- **INCOMPLETE** (1, SECTION_SUCCESS): \
+some nodes were unable to process the request, so the result may
+not contain all data;
 - Common failures (SECTION_FAILURE_COMMON);
 - **ACCESS_DENIED** (2048, SECTION_OBJECT): \
 access to operation SEARCH of the object is denied;
@@ -474,7 +483,17 @@ Get hash of object's payload part response body.
 <a name="neo.fs.v2.object.GetRangeRequest"></a>

 ### Message GetRangeRequest
-Request part of object's payload
+Request part of object's payload.
+
+The query for a parent object's EC part locally stored on the server is
+specified as follows:
+- `body.address` is an address of the parent;
+- `meta_header.x_headers` includes `__NEOFS__EC_RULE_IDX` and
+`__NEOFS__EC_PART_IDX` by object attribute format. Rule index MUST NOT
+exceed container's `PlacementPolicy.ec_rules` list. Part index MUST NOT
+exceed total part number in the indexed rule.
+In this case, if `body.address` refers to TOMBSTONE or LOCK object (which
+cannot have EC parts), the query applies to it.


 | Field | Type | Label | Description |
@@ -528,7 +547,17 @@ chunks.
 <a name="neo.fs.v2.object.GetRequest"></a>

 ### Message GetRequest
-GET object request
+GET object request.
+
+The query for a parent object's EC part locally stored on the server is
+specified as follows:
+- `body.address` is an address of the parent;
+- `meta_header.x_headers` includes `__NEOFS__EC_RULE_IDX` and
+`__NEOFS__EC_PART_IDX` by object attribute format. Rule index MUST NOT
+exceed container's `PlacementPolicy.ec_rules` list. Part index MUST NOT
+exceed total part number in the indexed rule.
+In this case, if `body.address` refers to TOMBSTONE or LOCK object (which
+cannot have EC parts), the query applies to it.


 | Field | Type | Label | Description |
@@ -950,6 +979,12 @@ that affect system behaviour:
 * __NEOFS__TICK_TOPIC \
 UTF-8 string topic ID that is used for object notification.
 DEPRECATED: attribute ignored by servers.
+* __NEOFS__EC_RULE_IDX \
+Index of EC rule in container's `PlacementPolicy.ec_rules` according to
+which the part was created. Base-10 integer.
+* __NEOFS__EC_PART_IDX \
+Index in the EC parts into which the parent object is divided according
+to `__NEOFS__EC_RULE_IDX` EC rule. Base-10 integer.

 And some well-known attributes used by applications only:

@@ -990,7 +1025,7 @@ must be within the same container.

 | Field | Type | Label | Description |
 | ----- | ---- | ----- | ----------- |
-| parent | [neo.fs.v2.refs.ObjectID](#neo.fs.v2.refs.ObjectID) | | Identifier of the origin object. Known only to the minor child. |
+| parent | [neo.fs.v2.refs.ObjectID](#neo.fs.v2.refs.ObjectID) | | Identifier of the origin object. If the origin object is split to comply with the object size limit, `parent` is known only to the minor child. |
 | previous | [neo.fs.v2.refs.ObjectID](#neo.fs.v2.refs.ObjectID) | | Identifier of the left split neighbor |
 | parent_signature | [neo.fs.v2.refs.Signature](#neo.fs.v2.refs.Signature) | | `signature` field of the parent object. Used to reconstruct parent. |
 | parent_header | [Header](#neo.fs.v2.object.Header) | | `header` field of the parent object. Used to reconstruct parent. |
