This repository was archived by the owner on Apr 22, 2020. It is now read-only.

Commit 203890f

tomasonjo authored and jexp committed
Add LPA docs (#436)
1 parent 0f4b15b commit 203890f

5 files changed, with 92 additions and 17 deletions

doc/betweenness-centrality.adoc

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -99,9 +99,7 @@ YIELD nodes, minCentrality, maxCentrality, sumCentrality, loadMillis, computeMil
9999
| stats | boolean | true | yes | if stats about centrality should be returned
100100
| writeProperty | string | 'centrality' | yes | property name written back to
101101
| graph | string | 'heavy' | yes | use 'heavy' when describing the subset of the graph with label and relationship-type parameter, 'cypher' for describing the subset with cypher node-statement and relationship-statement
102-
| concurrency | int | 1 | yes | if concurrency is set and > 1 parallel BC is used. It spawns N(given by the concurrency param) concurrent threads for calculation where each one calculates the BC for one node at a time
103-
104-
102+
| concurrency | int | available CPUs | yes | number of concurrent threads
105103
|===
106104

107105
.Results
@@ -134,7 +132,7 @@ YIELD nodeId, centrality - yields centrality for each node
134132
| name | type | default | optional | description
135133
| label | string | null | yes | label to load from the graph, if null load all nodes
136134
| relationship | string | null | yes | relationship-type to load from the graph, if null load all relationships
137-
| concurrency | int | 1 | yes | if concurrency is set and > 1 parallel BC is used. It spawns N(given by the concurrency param) concurrent threads for calculation where each one calculates the BC for one node at a time
135+
| concurrency | int | available CPUs | yes | number of concurrent threads
138136
| direction | string | outgoing | yes | relationship direction to load from the graph, if 'both' treats the relationships as undirected
139137
|===
140138

doc/label-propagation.adoc

Lines changed: 46 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -6,14 +6,14 @@ The intuition of label flooding is that a single label can quickly become domina
66
The labels are expected to be trapped inside a densely connected group of nodes.
77
Based on the idea of local trapping of labels, the author proposed a local community detection method where a node is initialized with a label and propagates step by step through the neighbours until the end of the community is reached.
88
The end of the community indicates the number of edges proceeding outward from the community and drops below a threshold value.
9-
LPA is a method which is effective, simple and near-linear time algorithm.
9+
LPA is a method which is effective, simple and near-linear time algorithm.[1]
1010
Community detection is an important methodology for understanding the organization of various real-world networks and has applications in problems as diverse as consensus formation in social communities or the identification of functional modules in biochemical networks.
1111

1212
== History, Explanation
1313

1414

1515
Many community detection algorithms have been proposed and utilized with varying degrees of effectiveness in the community detection research.
16-
_Label Propagation_ which was first proposed by Raghavan et al. (2007) uses unique identifiers of nodes as labels and propagates the labels based on an agreement with the majority of the neighbour nodes and each node selects a label from its neighbourhood to adopt it as its label.
16+
_Label Propagation_ which was first proposed by Raghavan et al. (2007)https://arxiv.org/pdf/0709.2938.pdf[[2\]] uses unique identifiers of nodes as labels and propagates the labels based on an agreement with the majority of the neighbour nodes and each node selects a label from its neighbourhood to adopt it as its label.
1717
LPA works as follows: Node x has neighbours and each neighbour carries a label denoting the community to which they belong to.
1818
Each node in the network chooses to join the community to which the maximum number of its neighbours belongs to, with ties broken uniformly and randomly.
1919
At the beginning, every node is initialized with unique label (called as identifier) and the labels propagate through the network.
@@ -23,7 +23,7 @@ At end of the propagation, most labels disappear and only few labels exist in th
2323
Label propagation algorithm reaches convergence when each node has the majority label of its neighbours.
2424
At the end of the convergence, nodes connected with the same label form a community.
2525
According to LPA, communities are defined as nodes having identical labels at convergence.
26-
The agreement between the nodes to spread the labels forms the basis for _Label Propagation_ algorithms.
26+
The agreement between the nodes to spread the labels forms the basis for _Label Propagation_ algorithms.[1]
2727

2828
== When to use it / use-cases
2929

@@ -40,6 +40,11 @@ _Label Propagation_ algorithms (LPA) are such type of algorithms that detects co
4040

4141
== Constraints / when not to use it
4242

43+
In contrast with other algorithms, label propagation can produce different community structures from the same initial condition.
44+
The range of solutions can be narrowed if some nodes are given preliminary labels while others are left unlabelled.
45+
Consequently, unlabelled nodes will be more likely to adopt the labels of the labelled ones.
46+
To find communities more accurately, Jaccard's index is used to aggregate multiple community structures into one that contains all important information.[2]
47+
4348
== Algorithm explanation on simple sample graph
4449

4550
image::{img}/label_propagation.png[]
@@ -56,6 +61,39 @@ include::scripts/label-propagation.cypher[tag=create-sample-graph]
5661
include::scripts/label-propagation.cypher[tag=write-sample-graph]
5762
----
5863

64+
.Results
65+
[opts="header",cols="1,1"]
66+
|===
67+
| name | partition
68+
| Alice | 5
69+
| Charles | 4
70+
| Bridget | 5
71+
| Michael | 5
72+
| Doug | 4
73+
| Mark | 4
74+
|===
75+
76+
Our algorithm found two communities with 3 members each.
77+
78+
=== Using pre-defined labels
79+
80+
_At the beginning of the algorithm, every node is initialized with a unique label (called an identifier) and the labels propagate through the network._
81+
82+
It is possible to define preliminary labels (identifiers) of nodes using the `partitionProperty` parameter.
83+
We need to save the preliminary set of labels that we would like to run the LPA algorithm with as a node property (it must be a number).
84+
In our example graph, we saved them as the property _predefined_label_.
85+
The algorithm first checks if there is a _predefined label_ assigned to the node, and loads it if there is one.
86+
If there isn't one, it assigns the node a new unique label (the node id is used).
87+
Using this preliminary set of labels (identifiers), it then sequentially updates each node's label to a new one, which is the most frequent label among its neighbors at every label propagation step (iteration).
88+
89+
This can be used as a semi-supervised way of finding communities, where we hand-pick some initial communities.
90+
91+
.Running algorithm with pre-defined labels
92+
[source,cypher]
93+
----
94+
include::scripts/label-propagation.cypher[tag=write-existing-label-sample-graph]
95+
----
96+
5997
== Example Usage
6098

6199
== Syntax
@@ -79,7 +117,7 @@ partitionProperty - simple label propagation kernel
79117
| concurrency | int | available CPUs | yes | number of concurrent threads
80118
| iterations | int | 1 | yes | the maximum number of iterations to run
81119
| weightProperty | string | 'weight' | yes | property name that contains weight. Must be numeric.
82-
| partitionProperty | string | 'partition' | yes | property name written back the partition of the graph in which the node reside
120+
| partitionProperty | string | 'partition' | yes | name of the property to which the node's partition is written back; can also be used to define the initial set of labels (must be a number)
83121
| write | boolean | true | yes | if result should be written back as node property
84122

85123
|===
@@ -116,9 +154,11 @@ We support the following versions of the label propagation algorithms:
116154

117155
== References
118156

119-
* http://cpb.iphy.ac.cn/fileup/PDF/2014-9-098902.pdf
157+
* [1] http://shodhganga.inflibnet.ac.in/bitstream/10603/36003/4/chapter3.pdf
120158

121-
* http://shodhganga.inflibnet.ac.in/bitstream/10603/36003/4/chapter3.pdf
159+
* [2] https://arxiv.org/pdf/0709.2938.pdf
160+
161+
* http://cpb.iphy.ac.cn/fileup/PDF/2014-9-098902.pdf
122162

123163
ifdef::implementation[]
124164
// tag::implementation[]
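
Note on the "Using pre-defined labels" section added above: the seeding and update rule it describes can be sketched outside Neo4j in plain Python. This is only an illustration under stated assumptions, not the library's implementation. The function name `label_propagation`, the six-node example graph, the undirected neighbourhood, and the deterministic tie-break (keep the current label if it is among the most frequent, otherwise take the smallest) are all hypothetical choices made here; the documentation itself says ties are broken randomly.

```python
from collections import Counter


def label_propagation(nodes, edges, predefined=None, iterations=10):
    """Sequential label propagation with optional predefined labels.

    Assumption: relationships are treated as undirected, and ties are
    broken deterministically so that the sketch is reproducible.
    """
    predefined = predefined or {}
    # Seed step: a predefined label is loaded if present,
    # otherwise the node id is used as a unique label.
    labels = {n: predefined.get(n, n) for n in nodes}

    neighbours = {n: [] for n in nodes}
    for source, target in edges:
        neighbours[source].append(target)
        neighbours[target].append(source)

    for _ in range(iterations):
        changed = False
        for n in nodes:  # fixed visiting order, labels updated in place
            if not neighbours[n]:
                continue
            counts = Counter(labels[m] for m in neighbours[n])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Assumed tie-break: keep the current label when it is tied
            # for most frequent, otherwise pick the smallest candidate.
            new_label = labels[n] if labels[n] in candidates else min(candidates)
            if new_label != labels[n]:
                labels[n] = new_label
                changed = True
        if not changed:
            break  # every node already holds a majority label
    return labels


# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
nodes = [0, 1, 2, 3, 4, 5]
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
# Hand-picked seeds; nodes 2 and 3 start unlabelled and fall back to their ids.
seeds = {0: 100, 1: 100, 4: 200, 5: 200}
communities = label_propagation(nodes, edges, predefined=seeds)
```

In this sketch the two unlabelled nodes adopt the seed label of the triangle they belong to, which is the semi-supervised behaviour the section describes.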

doc/scripts/label-propagation.cypher

Lines changed: 20 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
11
// tag::create-sample-graph[]
22

3-
CREATE (nAlice:User {id:'Alice'})
4-
,(nBridget:User {id:'Bridget'})
5-
,(nCharles:User {id:'Charles'})
6-
,(nDoug:User {id:'Doug'})
7-
,(nMark:User {id:'Mark'})
8-
,(nMichael:User {id:'Michael'})
3+
CREATE (nAlice:User {id:'Alice',predefined_label:52})
4+
,(nBridget:User {id:'Bridget',predefined_label:21})
5+
,(nCharles:User {id:'Charles',predefined_label:43})
6+
,(nDoug:User {id:'Doug',predefined_label:21})
7+
,(nMark:User {id:'Mark',predefined_label: 19})
8+
,(nMichael:User {id:'Michael', predefined_label: 52})
99
CREATE (nAlice)-[:FOLLOW]->(nBridget)
1010
,(nAlice)-[:FOLLOW]->(nCharles)
1111
,(nMark)-[:FOLLOW]->(nDoug)
@@ -24,4 +24,17 @@ CREATE (nAlice)-[:FOLLOW]->(nBridget)
2424
CALL algo.labelPropagation('User', 'FOLLOW','OUTGOING', {iterations:10,partitionProperty:'partition', write:true})
2525
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, write, partitionProperty;
2626

27-
// tag::write-sample-graph[]
27+
// end::write-sample-graph[]
28+
29+
30+
// tag::write-existing-label-sample-graph[]
31+
32+
CALL algo.labelPropagation('User', 'FOLLOW','OUTGOING', {iterations:10,partitionProperty:'predefined_label', write:true})
33+
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, write, partitionProperty;
34+
35+
// end::write-existing-label-sample-graph[]
36+
37+
38+
39+
40+

doc/scripts/single-shortest-path.cypher

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -64,3 +64,14 @@ YIELD sourceNodeId, targetNodeId, distance
6464
RETURN sourceNodeId, targetNodeId, distance LIMIT 20
6565

6666
// end::all-pairs-sample-graph[]
67+
68+
69+
// tag::all-pairs-bidirected-graph[]
70+
71+
CALL algo.allShortestPaths.stream('cost', {
72+
nodeQuery:'MATCH (n:Loc) RETURN id(n) as id',
73+
relationshipQuery:'MATCH (n:Loc)-[r]-(p:Loc) RETURN id(n) as source, id(p) as target, r.cost as weight',
74+
graph:'cypher', defaultValue:1.0})
75+
76+
77+
// end::all-pairs-bidirected-graph[]

doc/single-shortest-path.adoc

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -84,6 +84,19 @@ include::scripts/single-shortest-path.cypher[tag=delta-write-sample-graph]
8484
include::scripts/single-shortest-path.cypher[tag=all-pairs-sample-graph]
8585
----
8686

87+
For now, only single-source shortest path supports loading relationships as undirected, but we can use cypher loading to work around this.
88+
An undirected graph can be represented as a https://en.wikipedia.org/wiki/Bidirected_graph[bidirected graph], that is, a directed graph in which the reverse of every relationship is also a relationship.
89+
90+
We do not have to save this reversed relationship; we can project it using *cypher loading*.
91+
Note that the relationship query does not specify the direction of the relationship.
92+
This is applicable to all other algorithms that use *cypher loading*.
93+
94+
.Running all pairs shortest path treating the graph as undirected
95+
[source,cypher]
96+
----
97+
include::scripts/single-shortest-path.cypher[tag=all-pairs-bidirected-graph]
98+
----
99+
87100
== Example Usage
88101

89102
== Syntax
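
Note on the undirected projection added above: a relationship query without a direction returns each stored relationship once per orientation, which is what makes the loaded graph behave as undirected. The Python sketch below imitates that effect on a hypothetical three-node graph. The helper names `project_undirected` and `all_pairs_shortest_paths` are made up for this note, and Floyd-Warshall is used only because it is short; it is not a claim about how `algo.allShortestPaths` is implemented.

```python
import math


def project_undirected(relationships):
    """Return each (source, target, weight) row together with its reverse."""
    rows = []
    for source, target, weight in relationships:
        rows.append((source, target, weight))
        rows.append((target, source, weight))
    return rows


def all_pairs_shortest_paths(nodes, rows):
    """Floyd-Warshall over (source, target, weight) rows (illustrative only)."""
    dist = {(a, b): (0.0 if a == b else math.inf) for a in nodes for b in nodes}
    for source, target, weight in rows:
        dist[source, target] = min(dist[source, target], weight)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via_k = dist[i, k] + dist[k, j]
                if via_k < dist[i, j]:
                    dist[i, j] = via_k
    return dist


# Hypothetical stored relationships: A -> B (cost 2) and B -> C (cost 3).
nodes = ["A", "B", "C"]
stored = [("A", "B", 2.0), ("B", "C", 3.0)]

directed = all_pairs_shortest_paths(nodes, stored)
undirected = all_pairs_shortest_paths(nodes, project_undirected(stored))
```

With the stored direction only, C cannot reach A; after the projection, the distance is the same in both directions.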
