Joining a node after committing a plan - transfers freeze & cluster state is stuck #996

@martinsumner

Description

To replicate:

  • Start six nodes
  • Join three nodes to the first, but not nodes 5 and 6
  • Plan/Commit cluster changes
  • Attempt to join nodes 5 and 6 (i.e. without planning, just riak admin cluster join)
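
The steps above could be scripted roughly as follows. This is only a sketch: it assumes the devrel-style layout seen in the output below (dev/devN/riak/bin/riak) and six nodes that are already started. `DEV_ROOT`, `DRY_RUN`, `riak_admin` and `repro` are hypothetical names introduced for illustration, and by default the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the reproduction steps from this report.
# DEV_ROOT and DRY_RUN are hypothetical knobs, not part of riak.
DEV_ROOT="${DEV_ROOT:-dev}"
DRY_RUN="${DRY_RUN:-1}"

riak_admin() {
    # $1 = dev node number; remaining args = "riak admin" subcommand
    node="$1"; shift
    bin="$DEV_ROOT/dev$node/riak/bin/riak"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$bin admin $*"
    else
        "$bin" admin "$@"
    fi
}

repro() {
    # Stage joins for nodes 2-4 only, then plan and commit
    for n in 2 3 4; do
        riak_admin "$n" cluster join dev1@127.0.0.1
    done
    riak_admin 4 cluster plan
    riak_admin 4 cluster commit
    # Stage joins for nodes 5 and 6 with no further plan/commit
    for n in 5 6; do
        riak_admin "$n" cluster join dev1@127.0.0.1
    done
    # Inspect the resulting state
    riak_admin 4 cluster status
    riak_admin 4 transfers
}
```

Calling `repro` with `DRY_RUN=1` lists the nine commands in order; setting `DRY_RUN=0` would run them against a local dev cluster.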

The transfers stop at the point the additional nodes join, and the cluster remains stuck in that state:

$ dev/dev4/riak/bin/riak admin cluster plan
=============================== Staged Changes ================================
Action         Details(s)
-------------------------------------------------------------------------------
join           'dev2@127.0.0.1'
join           'dev3@127.0.0.1'
join           'dev4@127.0.0.1'
-------------------------------------------------------------------------------


NOTE: Applying these changes will result in 1 cluster transition

###############################################################################
                         After cluster transition 1/1
###############################################################################

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
valid     100.0%     25.0%    dev1@127.0.0.1
valid       0.0%     25.0%    dev2@127.0.0.1
valid       0.0%     25.0%    dev3@127.0.0.1
valid       0.0%     25.0%    dev4@127.0.0.1
-------------------------------------------------------------------------------
Valid:4 / Leaving:0 / Exiting:0 / Joining:0 / Down:0

Transfers resulting from cluster changes: 48
  16 transfers from 'dev1@127.0.0.1' to 'dev4@127.0.0.1'
  16 transfers from 'dev1@127.0.0.1' to 'dev3@127.0.0.1'
  16 transfers from 'dev1@127.0.0.1' to 'dev2@127.0.0.1'

$ dev/dev4/riak/bin/riak admin cluster commit
Cluster changes committed
$ dev/dev4/riak/bin/riak admin cluster status
---- Cluster Status ----
Ring ready: true

+--------------------+------+-------+-----+-------+
|        node        |status| avail |ring |pending|
+--------------------+------+-------+-----+-------+
| (C) dev1@127.0.0.1 |valid |  up   |100.0|  25.0 |
|     dev2@127.0.0.1 |valid |  up   |  0.0|  25.0 |
|     dev3@127.0.0.1 |valid |  up   |  0.0|  25.0 |
|     dev4@127.0.0.1 |valid |  up   |  0.0|  25.0 |
+--------------------+------+-------+-----+-------+

Key: (C) = Claimant; availability marked with '!' is unexpected
$ dev/dev5/riak/bin/riak admin cluster join dev1@127.0.0.1
Success: staged join request for 'dev5@127.0.0.1' to 'dev1@127.0.0.1'
$ dev/dev6/riak/bin/riak admin cluster join dev1@127.0.0.1
Success: staged join request for 'dev6@127.0.0.1' to 'dev1@127.0.0.1'
$ dev/dev4/riak/bin/riak admin cluster status
---- Cluster Status ----
Ring ready: true

+--------------------+-------+-------+-----+-------+
|        node        |status | avail |ring |pending|
+--------------------+-------+-------+-----+-------+
|     dev5@127.0.0.1 |joining|  up   |  0.0|   0.0 |
|     dev6@127.0.0.1 |joining|  up   |  0.0|   0.0 |
| (C) dev1@127.0.0.1 | valid |  up   | 71.9|  25.0 |
|     dev2@127.0.0.1 | valid |  up   |  9.4|  25.0 |
|     dev3@127.0.0.1 | valid |  up   | 10.9|  25.0 |
|     dev4@127.0.0.1 | valid |  up   |  7.8|  25.0 |
+--------------------+-------+-------+-----+-------+

Key: (C) = Claimant; availability marked with '!' is unexpected
$ dev/dev4/riak/bin/riak admin cluster status
---- Cluster Status ----
Ring ready: false

+--------------------+-------+-------+-----+-------+
|        node        |status | avail |ring |pending|
+--------------------+-------+-------+-----+-------+
|     dev5@127.0.0.1 |joining|  up   |  0.0|   0.0 |
|     dev6@127.0.0.1 |joining|  up   |  0.0|   0.0 |
| (C) dev1@127.0.0.1 | valid |  up   | 71.9|  25.0 |
|     dev2@127.0.0.1 | valid |  up   |  9.4|  25.0 |
|     dev3@127.0.0.1 | valid |  up   | 10.9|  25.0 |
|     dev4@127.0.0.1 | valid |  up   |  7.8|  25.0 |
+--------------------+-------+-------+-----+-------+

Key: (C) = Claimant; availability marked with '!' is unexpected


$ dev/dev4/riak/bin/riak admin cluster status
---- Cluster Status ----
Ring ready: true

+--------------------+-------+-------+-----+-------+
|        node        |status | avail |ring |pending|
+--------------------+-------+-------+-----+-------+
|     dev5@127.0.0.1 |joining|  up   |  0.0|   0.0 |
|     dev6@127.0.0.1 |joining|  up   |  0.0|   0.0 |
| (C) dev1@127.0.0.1 | valid |  up   | 71.9|  25.0 |
|     dev2@127.0.0.1 | valid |  up   |  9.4|  25.0 |
|     dev3@127.0.0.1 | valid |  up   | 10.9|  25.0 |
|     dev4@127.0.0.1 | valid |  up   |  7.8|  25.0 |
+--------------------+-------+-------+-----+-------+

Key: (C) = Claimant; availability marked with '!' is unexpected
$ dev/dev4/riak/bin/riak admin cluster status
---- Cluster Status ----
Ring ready: true

+--------------------+-------+-------+-----+-------+
|        node        |status | avail |ring |pending|
+--------------------+-------+-------+-----+-------+
|     dev5@127.0.0.1 |joining|  up   |  0.0|   0.0 |
|     dev6@127.0.0.1 |joining|  up   |  0.0|   0.0 |
| (C) dev1@127.0.0.1 | valid |  up   | 71.9|  25.0 |
|     dev2@127.0.0.1 | valid |  up   |  9.4|  25.0 |
|     dev3@127.0.0.1 | valid |  up   | 10.9|  25.0 |
|     dev4@127.0.0.1 | valid |  up   |  7.8|  25.0 |
+--------------------+-------+-------+-----+-------+

Key: (C) = Claimant; availability marked with '!' is unexpected
$ dev/dev4/riak/bin/riak admin transfers
'dev6@127.0.0.1' waiting to handoff 30 partitions
'dev5@127.0.0.1' waiting to handoff 30 partitions
'dev4@127.0.0.1' waiting to handoff 27 partitions
'dev3@127.0.0.1' waiting to handoff 30 partitions
'dev2@127.0.0.1' waiting to handoff 11 partitions

Active Transfers:


$ dev/dev4/riak/bin/riak admin transfers
'dev6@127.0.0.1' waiting to handoff 30 partitions
'dev5@127.0.0.1' waiting to handoff 30 partitions
'dev4@127.0.0.1' waiting to handoff 27 partitions
'dev3@127.0.0.1' waiting to handoff 30 partitions
'dev2@127.0.0.1' waiting to handoff 11 partitions

Active Transfers:
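
One way to check for this stuck state without reading the output by eye is to total the "waiting to handoff" counts from two samples taken some time apart. The sketch below assumes the transfers output keeps the line format shown in this report. `pending_handoffs` and `is_stalled` are hypothetical helper names, and an unchanged total is only a heuristic for a stall, not proof of one.

```shell
#!/bin/sh
# Sum the "waiting to handoff N partitions" counts read from stdin.
# Assumes lines shaped like: 'dev6@127.0.0.1' waiting to handoff 30 partitions
pending_handoffs() {
    awk '/waiting to handoff/ { total += $(NF-1) } END { print total + 0 }'
}

# Succeeds when two samples show the same non-zero pending total.
# $1 = earlier total, $2 = later total
is_stalled() {
    [ "$1" -gt 0 ] && [ "$1" -eq "$2" ]
}

# Example usage against a dev node (not run here):
#   before=$(dev/dev4/riak/bin/riak admin transfers | pending_handoffs)
#   sleep 60
#   after=$(dev/dev4/riak/bin/riak admin transfers | pending_handoffs)
#   is_stalled "$before" "$after" && echo "handoffs appear stalled at $after"
```

For the two transfers listings above, both samples total 128 pending partitions with no active transfers listed, which is the pattern this check flags.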

