- `samples_split_option`: sets the strategy used to split the dataset samples among partners. You can either instantiate a `Splitter` and pass it to `Scenario`, or pass the splitter's string identifier. In the latter case, the selected `Splitter` is used with its default parameters.

  How the samples of the original dataset are split among partners:

  - `RandomSplitter`: the dataset is shuffled and partners receive randomly selected data samples. String identifier: `'random'`.
  - `StratifiedSplitter`: the dataset is stratified per class and each partner receives certain classes only (note: depending on the `amounts_per_partner` specified, there may be some overlap of classes between partners). String identifier: `'stratified'`.
  - `AdvancedSplitter`: in some cases it can be useful to split the dataset among partners in a more elaborate way. For this, the data samples of the initial dataset are considered as split into clusters by label. The advanced split is configured by giving, for each partner in sequence, the following 2 elements: `[[nb of clusters (int), 'shared' or 'specific']]`. In practice, you can either instantiate an `AdvancedSplitter` object and pass this list to its keyword argument `description`, or use the string identifier and pass the list to the `Scenario` via the keyword argument `samples_split_configuration`. String identifier: `'advanced'`. Configuration:
    - `nb of clusters (int)`: the given partner receives data samples from that many different clusters (clusters of data samples per label/class)
    - `'shared'` or `'specific'`:
      - `'shared'`: all partners with option `'shared'` receive data samples picked from the same clusters, which they all share
      - `'specific'`: each partner with option `'specific'` receives data samples picked from cluster(s) that no other partner receives data from
  - `FlexibleSplitter`: in other cases one might want to specify the split among partners in detail, partner by partner and class by class. For this, the `FlexibleSplitter` can be used. It is configured by giving, for each partner in sequence, the list of the percentages of samples it receives for each class, expressed as fractions (e.g. `0.5` for 50%): `[[% for class 1, ..., % for class n]]`. As above, it can be instantiated separately and then passed to the `Scenario` instance, or the string identifier `'flexible'` can be passed to the parameter `samples_split_option`, together with the split configuration passed to the keyword argument `samples_split_configuration`. String identifier: `'flexible'`.

    Example: `samples_split_option='flexible', samples_split_configuration=[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.0]]` (this gives partner 1 50% of each of the last 3 classes, and partner 2 50% or 100% of each of the first 9 classes).

    Note: the order of the percentages in each list should not be read as any human-readable order of the classes (e.g. alphabetical or numerical). The implementation uses the order in which the samples appear in the dataset. As a consequence, one can enforce a given order, if desired, by sorting the dataset beforehand.
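
To make the two ways of setting `samples_split_option` concrete, here is a toy Python sketch of how a string identifier could be resolved to a splitter built with default parameters, while a pre-built instance is used as-is. The classes below are hypothetical stand-ins that only mirror the names used above; their single `description` argument (the keyword mentioned for `AdvancedSplitter`) and the helper `resolve_splitter` are assumptions, not the library's code.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class Splitter:
    """Hypothetical base class; `description` holds the split configuration."""
    description: list = field(default_factory=list)


# Hypothetical subclasses mirroring the names used in this README.
class RandomSplitter(Splitter):
    pass


class StratifiedSplitter(Splitter):
    pass


class AdvancedSplitter(Splitter):
    pass


class FlexibleSplitter(Splitter):
    pass


# String identifier -> splitter class it selects.
SPLITTERS_BY_ID = {
    "random": RandomSplitter,
    "stratified": StratifiedSplitter,
    "advanced": AdvancedSplitter,
    "flexible": FlexibleSplitter,
}


def resolve_splitter(option: Union[str, Splitter],
                     configuration: Optional[list] = None) -> Splitter:
    """Turn a `samples_split_option` value into a Splitter instance (hypothetical)."""
    if isinstance(option, Splitter):
        # A pre-built instance already carries its own parameters.
        return option
    if option not in SPLITTERS_BY_ID:
        raise ValueError(f"unknown samples_split_option: {option!r}")
    splitter_class = SPLITTERS_BY_ID[option]
    if configuration is None:
        # String identifier alone: the splitter gets its default parameters.
        return splitter_class()
    # String identifier plus samples_split_configuration (e.g. 'advanced', 'flexible').
    return splitter_class(description=configuration)
```

Keeping the identifier-to-class mapping in one dictionary means both entry points end up with the same kind of object, which is the behaviour the parameter description implies.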
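
The difference between `RandomSplitter` and `StratifiedSplitter`, including the possible overlap of classes, can be illustrated with a minimal sketch on a list of labels. It assumes that `amounts_per_partner` holds fractions summing to 1 and that a stratified split amounts to sorting samples by class before cutting; `random_split` and `stratified_split` are hypothetical functions and may differ from the library's implementation.

```python
import random


def _cut(indices, amounts_per_partner):
    """Cut `indices` into consecutive chunks sized by the fractions given."""
    chunks, start = [], 0
    for i, amount in enumerate(amounts_per_partner):
        if i == len(amounts_per_partner) - 1:
            end = len(indices)  # last partner takes the remainder left by rounding
        else:
            end = start + round(amount * len(indices))
        chunks.append(indices[start:end])
        start = end
    return chunks


def random_split(labels, amounts_per_partner, seed=0):
    """Shuffle all sample indices, then cut: each partner gets random samples."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    return _cut(indices, amounts_per_partner)


def stratified_split(labels, amounts_per_partner):
    """Sort sample indices by class, then cut: each partner gets a block of classes."""
    indices = sorted(range(len(labels)), key=lambda i: labels[i])
    return _cut(indices, amounts_per_partner)
```

With `labels = [0]*4 + [1]*4 + [2]*4` and `amounts_per_partner=[0.5, 0.5]`, `stratified_split` gives the first partner classes 0 and 1 and the second partner classes 1 and 2: class 1 straddles the cut, which is the kind of overlap mentioned for `StratifiedSplitter`.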
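
The `'shared'`/`'specific'` semantics of the `AdvancedSplitter` configuration can be sketched as a cluster-assignment step, using labels as clusters. This is a simplified, hypothetical model (`assign_clusters` is not part of the library): it only decides which clusters each partner draws from, assigns clusters in a fixed order rather than at random, and ignores how many samples are taken from each cluster.

```python
def assign_clusters(clusters, description):
    """Return, for each partner, the list of clusters it draws samples from.

    clusters: ordered list of cluster ids (e.g. class labels).
    description: [[nb of clusters, 'shared' or 'specific'], ...], one entry per partner.
    """
    nb_specific = sum(nb for nb, kind in description if kind == "specific")
    nb_shared = max((nb for nb, kind in description if kind == "shared"), default=0)
    if nb_specific + nb_shared > len(clusters):
        raise ValueError("not enough clusters for this configuration")
    # The shared pool comes after the clusters reserved for 'specific' partners.
    shared_pool = clusters[nb_specific:nb_specific + nb_shared]
    assignment, next_specific = [], 0
    for nb, kind in description:
        if kind == "specific":
            # Reserve `nb` clusters that no other partner receives from.
            assignment.append(clusters[next_specific:next_specific + nb])
            next_specific += nb
        elif kind == "shared":
            # Every 'shared' partner draws from the same pool of clusters.
            assignment.append(shared_pool[:nb])
        else:
            raise ValueError(f"unknown option {kind!r}")
    return assignment
```

For example, with 10 clusters `0` to `9` and the description `[[3, 'specific'], [2, 'specific'], [4, 'shared'], [4, 'shared']]`, the first two partners get `[0, 1, 2]` and `[3, 4]` for themselves, while the last two both draw from `[5, 6, 7, 8]`.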

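
A `FlexibleSplitter` configuration can be read as a per-partner, per-class table of fractions. The sketch below is a hypothetical illustration (`flexible_split` is not the library's function); it interprets the note on ordering as "classes are numbered in order of their first appearance in the dataset", hands out each class's samples in dataset order, and rejects configurations that allocate more than 100% of a class.

```python
def flexible_split(labels, configuration):
    """Return, for each partner, the sample indices it receives (hypothetical sketch).

    configuration[p][c] is the fraction of class c's samples given to partner p,
    where classes are numbered in order of first appearance in `labels`.
    """
    # Class order = order of first appearance in the dataset, not sorted order.
    class_order = list(dict.fromkeys(labels))
    by_class = {c: [i for i, y in enumerate(labels) if y == c] for c in class_order}
    taken = {c: 0 for c in class_order}  # samples of each class already handed out
    partners = []
    for fractions in configuration:
        if len(fractions) != len(class_order):
            raise ValueError("one fraction per class is expected")
        indices = []
        for c, fraction in zip(class_order, fractions):
            n = round(fraction * len(by_class[c]))
            if taken[c] + n > len(by_class[c]):
                raise ValueError(f"class {c!r} is over-allocated")
            indices += by_class[c][taken[c]:taken[c] + n]
            taken[c] += n
        partners.append(indices)
    return partners
```

With `labels = ['b', 'a', 'b', 'a', 'c', 'c']`, class 1 is `'b'` because it appears first, even though `'a'` sorts before it alphabetically: `flexible_split(labels, [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]])` gives the first partner samples `[0, 2, 1]` and the second `[3, 4, 5]`.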