This repository was archived by the owner on Apr 22, 2020. It is now read-only.

Commit 74d499a

tomasonjo authored and jexp committed
Update docs (#422)
1 parent 1f10609 commit 74d499a

22 files changed: +597 -257 lines changed

doc/betweenness-centrality.adoc

Lines changed: 14 additions & 24 deletions
Original file line number | Diff line number | Diff line change
@@ -31,6 +31,10 @@ betweenness centrality as an analysis measure, it indicates a potential gate kee
3131

3232
It differs from the other centrality measures. A node can have quite low degree, be connected to others that have low degree, even be a long way from others on average, and still have high betweenness. Consider a node A that lies on a bridge between two groups of vertices within a network. Since any path between nodes in different groups must go through this bridge, node A acquires high betweenness even though it is not well connected (it lies at the periphery of both groups).
3333

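The bridge argument above can be checked on a toy graph. The following is a minimal sketch, assuming an unweighted, undirected graph and using Brandes' algorithm; the node names (`p`, `q`, `r`, `A`, `x`, `y`, `z`) and the helper `betweenness` are made up for illustration and are not part of the library described here, whose scores may be scaled differently.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph {node: [neighbours]}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma)
        # and remember the predecessors on those paths.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was visited from both ends, so halve the totals.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical graph: two triangles joined only through node A.
edges = [("p", "q"), ("p", "r"), ("q", "r"),
         ("r", "A"), ("A", "x"),
         ("x", "y"), ("x", "z"), ("y", "z")]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

scores = betweenness(adj)
```

In this sketch `A` has only two neighbours, yet all nine shortest paths between the two triangles pass through it, so it ends up with the highest score in the graph.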
34+
Node betweenness centrality is relevant to problems such as identifying important nodes that control flows of information between separate parts of the network and identifying causal nodes to influence other entities' behavior, such as genes in genomics or customers in marketing studies.
35+
Betweenness centrality has been used to analyze social networks and protein networks, to identify significant nodes in wireless ad hoc networks, to study the importance and activity of nodes in mobile phone call networks and the interaction patterns of players in massively multiplayer online games, to study online expertise-sharing communities such as those of physicians, to identify and analyze the linking behavior of key bloggers in dynamic networks of blog posts, and to measure network traffic in communication networks.[1]
36+
37+
3438
== Constraints / when not to use it
3539

3640

@@ -42,33 +46,19 @@ People with high betweenness tend to be the innovators and brokers in any networ
4246
.Create sample graph
4347
[source,cypher]
4448
----
45-
CREATE (nAlice:User {id:'Alice'})
46-
,(nBridget:User {id:'Bridget'})
47-
,(nCharles:User {id:'Charles'})
48-
,(nDoug:User {id:'Doug'})
49-
,(nMark:User {id:'Mark'})
50-
,(nMichael:User {id:'Michael'})
51-
CREATE (nAlice)-[:MANAGE]->(nBridget)
52-
,(nAlice)-[:MANAGE]->(nCharles)
53-
,(nAlice)-[:MANAGE]->(nDoug)
54-
,(nMark)-[:MANAGE]->(nAlice)
55-
,(nCharles)-[:MANAGE]->(nMichael)
56-
49+
include::scripts/betweenness-centrality.cypher[tag=create-sample-graph]
5750
----
5851

5952
.Running algorithm and streaming results
6053
[source,cypher]
6154
----
62-
CALL algo.betweenness.stream('User','MANAGE',{direction:'out'})
63-
YIELD nodeId, centrality
64-
RETURN nodeId,centrality order by centrality desc limit 20
55+
include::scripts/betweenness-centrality.cypher[tag=stream-sample-graph]
6556
----
6657

6758
.Running algorithm and writing back results
6859
[source,cypher]
6960
----
70-
CALL algo.betweenness('User','MANAGE', {direction:'out',write:true, writeProperty:'centrality'})
71-
YIELD nodes, minCentrality, maxCentrality, sumCentrality, loadMillis, computeMillis, writeMillis
61+
include::scripts/betweenness-centrality.cypher[tag=write-sample-graph]
7262
----
7363

7464
.Results
@@ -164,10 +154,7 @@ Set `graph:'cypher'` in the config.
164154

165155
[source,cypher]
166156
----
167-
CALL algo.betweenness(
168-
'MATCH (p:User) RETURN id(p) as id',
169-
'MATCH (p1:User)-[:MANAGE]->(p2:User) RETURN id(p1) as source, id(p2) as target',
170-
{graph:'cypher', write: true});
157+
include::scripts/betweenness-centrality.cypher[tag=cypher-loading]
171158
----
172159

173160
== Versions
@@ -176,14 +163,14 @@ We support the following versions of the betweenness centrality algorithm:
176163

177164
* [x] directed, unweighted
178165

179-
- loading incoming relationships: 'INCOMING','IN','I' or '<'
180-
- loading outgoing relationships: 'OUTGOING','OUT','O' or '>'
166+
** loading incoming relationships: 'INCOMING','IN','I' or '<'
167+
** loading outgoing relationships: 'OUTGOING','OUT','O' or '>'
181168

182169
* [ ] directed, weighted
183170

184171
* [x] undirected, unweighted
185172

186-
- direction:'both' or '<>'
173+
** direction:'both' or '<>'
187174

188175
* [ ] undirected, weighted
189176

@@ -221,6 +208,9 @@ We support the following versions of the betweenness centrality algorithm:
221208

222209
* http://iima.org/wp/wp-content/uploads/2017/04/Curriculum-Structure-and-Assessment-Placement_Lightfoot.pdf
223210

211+
* [1] https://arxiv.org/pdf/1702.06087.pdf
212+
213+
224214
ifdef::implementation[]
225215
// tag::implementation[]
226216

doc/closeness-centrality.adoc

Lines changed: 9 additions & 22 deletions
@@ -24,7 +24,10 @@ Our algorithm returns *normalized closeness centrality* score.
2424

2525
== When to use it / use-cases
2626

27-
The most central nodes according to closeness centrality can quickly interact to all others because they are close to all others.
27+
As an example, consider choosing the location within a city for a new public service so that it is easily accessible to everyone.
28+
Similarly, consider identifying central people whose position in a social network is ideal for disseminating information or exerting influence.
29+
In such applications, we need to select the nodes that can reach the entire network fastest.
30+
2831
This measure is preferable to degree centrality because it takes into account not only direct connections among nodes but also indirect connections.
2932
For closeness centrality in directed networks, we split the concept into in-closeness
3033
and out-closeness.
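To make the idea of "close to all others" concrete, here is a minimal sketch that computes a normalized closeness score as (n - 1) divided by the sum of shortest-path distances. It assumes a connected, unweighted graph; the helper `closeness` and the exact normalization are assumptions for illustration and may not match what the library computes. The graph is the five-node path A-B-C-D-E with links in both directions, as in this document's sample graph.

```python
from collections import deque


def closeness(adj):
    """Normalized closeness for each node of a connected graph {node: [neighbours]}."""
    n = len(adj)
    result = {}
    for s in adj:
        # Breadth-first search gives hop distances from s to every reachable node.
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        # (n - 1) / sum of distances; 0.0 for a node that reaches nothing.
        result[s] = (n - 1) / total if total > 0 else 0.0
    return result


# Path A - B - C - D - E, each link usable in both directions.
path = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores = closeness(path)
```

The middle node `C` scores highest (4/6), the end nodes `A` and `E` lowest (4/10). Feeding the function only outgoing neighbours would give an out-closeness in the sense described above; only incoming neighbours, an in-closeness.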
@@ -45,33 +48,20 @@ image::{img}/closeness_centrality.png[]
4548
.Create sample graph
4649
[source,cypher]
4750
----
48-
CREATE (a:Node{id:"A"}),
49-
(b:Node{id:"B"}),
50-
(c:Node{id:"C"}),
51-
(d:Node{id:"D"}),
52-
(e:Node{id:"E"})
53-
CREATE (a)-[:LINK]->(b),
54-
(b)-[:LINK]->(a),
55-
(b)-[:LINK]->(c),
56-
(c)-[:LINK]->(b),
57-
(c)-[:LINK]->(d),
58-
(d)-[:LINK]->(c),
59-
(d)-[:LINK]->(e),
60-
(e)-[:LINK]->(d)
51+
include::scripts/closeness-centrality.cypher[tag=create-sample-graph]
6152
----
6253

6354
.Running algorithm and streaming results
6455
[source,cypher]
6556
----
66-
CALL algo.closeness.stream('Node', 'LINKS') YIELD nodeId, centrality
67-
RETURN nodeId,centrality order by centrality desc limit 20 - yields centrality for each node
57+
include::scripts/closeness-centrality.cypher[tag=stream-sample-graph]
58+
- yields centrality for each node
6859
----
6960

7061
.Running algorithm and writing back results
7162
[source,cypher]
7263
----
73-
CALL algo.closeness('Node', 'LINK', {write:true, writeProperty:'centrality'})
74-
YIELD nodes,loadMillis, computeMillis, writeMillis
64+
include::scripts/closeness-centrality.cypher[tag=write-sample-graph]
7565
- calculates closeness centrality and potentially writes back
7666
----
7767

@@ -170,10 +160,7 @@ Set `graph:'cypher'` in the config.
170160

171161
[source,cypher]
172162
----
173-
CALL algo.closeness(
174-
'MATCH (p:Node) RETURN id(p) as id',
175-
'MATCH (p1:Node)-[:LINK]->(p2:Node) RETURN id(p1) as source, id(p2) as target',
176-
{graph:'cypher', write: true});
163+
include::scripts/closeness-centrality.cypher[tag=cypher-loading]
177164
----
178165

179166
== Versions

doc/connected-components.adoc

Lines changed: 34 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,4 @@
1-
= Connected Components
1+
= Community detection: Connected Components
22

33
_Connected Components_ or _UnionFind_ basically finds sets of connected nodes where each node is reachable from any other node in the same set.
44
In graph theory, a connected component of an undirected graph is a subgraph in which any two vertices are connected to each other by paths, and which is connected to no additional vertices in the graph.
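The union-find idea behind this definition can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the helper `connected_components` is hypothetical, relationship direction and weights are ignored, and the data mirrors the six-user FRIEND sample graph used in this document.

```python
def connected_components(nodes, edges):
    """Return {node: representative}; nodes sharing a representative are connected."""
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets
    return {v: find(v) for v in nodes}


users = ["Alice", "Bridget", "Charles", "Doug", "Mark", "Michael"]
friends = [("Alice", "Bridget"), ("Alice", "Charles"),
           ("Mark", "Doug"), ("Mark", "Michael")]

set_of = connected_components(users, friends)
```

Here Alice, Bridget and Charles end up in one set and Mark, Doug and Michael in another, because no FRIEND relationship links the two groups.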
@@ -37,32 +37,20 @@ image::{img}/connected_components.png[]
3737
.Create sample graph
3838
[source,cypher]
3939
----
40-
CREATE (nAlice:User {id:'Alice'})
41-
,(nBridget:User {id:'Bridget'})
42-
,(nCharles:User {id:'Charles'})
43-
,(nDoug:User {id:'Doug'})
44-
,(nMark:User {id:'Mark'})
45-
,(nMichael:User {id:'Michael'})
46-
CREATE (nAlice)-[:FRIEND{weight:0.5}]->(nBridget)
47-
,(nAlice)-[:FRIEND{weight:4}]->(nCharles)
48-
,(nMark)-[:FRIEND{weight:1}]->(nDoug)
49-
,(nMark)-[:FRIEND{weight:2}]->(nMichael)
50-
40+
include::scripts/connected-components.cypher[tag=create-sample-graph]
5141
----
5242

5343
=== Unweighted version:
5444

5545
.Running algorithm and streaming results
5646
[source,cypher]
5747
----
58-
CALL algo.unionFind.stream('User', 'FRIEND', {})
59-
YIELD nodeId,setId
48+
include::scripts/connected-components.cypher[tag=unweighted-stream-sample-graph]
6049
----
6150
.Running algorithm and writing back results
6251
[source,cypher]
6352
----
64-
CALL algo.unionFind('User', 'FRIEND', {write:true, partitionProperty:"partition"})
65-
YIELD nodes, setCount, loadMillis, computeMillis, writeMillis
53+
include::scripts/connected-components.cypher[tag=unweighted-write-sample-graph]
6654
----
6755
.Results
6856
[opts="header",cols="1,1"]
@@ -83,9 +71,7 @@ Results show us, that we have two distinct group of users that have no link betw
8371
.We can also easily check the number and size of partitions using Cypher.
8472
[source,cypher]
8573
----
86-
MATCH (u:User)
87-
RETURN u.partition as partition,count(*) as size_of_partition
88-
ORDER by size_of_partition DESC LIMIT 20
74+
include::scripts/connected-components.cypher[tag=check-results-sample-graph]
8975
----
9076
=== Weighted version:
9177

@@ -94,14 +80,12 @@ If you define the property that holds the weight(weightProperty) and the thresho
9480
.Running algorithm and streaming results
9581
[source,cypher]
9682
----
97-
CALL algo.unionFind.stream('User', 'FRIEND', {weightProperty:'weight', defaultValue:0.0, threshold:1.0})
98-
YIELD nodeId,setId
83+
include::scripts/connected-components.cypher[tag=weighted-stream-sample-graph]
9984
----
10085
.Running algorithm and writing back results
10186
[source,cypher]
10287
----
103-
CALL algo.unionFind('User', 'FRIEND', {write:true, partitionProperty:"partition",weightProperty:'weight', defaultValue:0.0, threshold:1.0})
104-
YIELD nodes, setCount, loadMillis, computeMillis, writeMillis
88+
include::scripts/connected-components.cypher[tag=weighted-write-sample-graph]
10589
----
10690

10791
.Results
@@ -120,6 +104,32 @@ We can observe, that because the weight of the relationship betwen Bridget and A
120104

121105
== Example Usage
122106

107+
As mentioned, finding connected components is an essential step in preprocessing your data.
108+
One reason is that most centrality measures suffer when the graph has disconnected components; another is that you may simply want to find disconnected groups of nodes.
109+
Yelp's social network will be used to demonstrate how to proceed when dealing with real world data.
110+
A typical social network consists of one big component and a number of small disconnected components.
111+
112+
.Get the count of connected components
113+
[source,cypher]
114+
----
115+
include::scripts/connected-components.cypher[tag=count-component-yelp]
116+
----
117+
118+
The query returns 18512 disconnected components, not counting users without friends.
119+
Let's now check the sizes of the top 20 components to get a better picture.
120+
121+
.Get the size of top 20 components
122+
[source,cypher]
123+
----
124+
include::scripts/connected-components.cypher[tag=top-20-component-yelp]
125+
----
126+
127+
The biggest component has 8938630 out of a total of 8981389 (99.5%).
128+
This share is high but not surprising: in a friendship network we can expect the small-world effect and the six degrees of separation rule, under which you can reach any person in the network and only the length of the path varies.
129+
130+
We can now move on to the next step of the analysis and run centralities only on the biggest component, so that our results are more accurate.
131+
We simply write the results back to the nodes and then either run centralities with Cypher loading or set a new label for the biggest component.
132+
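The size check described above amounts to counting how many nodes share each partition id. The sketch below shows that step on a made-up partition mapping (the user ids and partition numbers are hypothetical) and then redoes the percentage arithmetic with the two figures quoted in the text.

```python
from collections import Counter


def component_sizes(partition_of, top=20):
    """Return (partition id, size) pairs, largest first, limited to `top` entries."""
    return Counter(partition_of.values()).most_common(top)


def largest_component_share(partition_of):
    """Fraction of all nodes that belong to the biggest component."""
    counts = Counter(partition_of.values())
    return max(counts.values()) / len(partition_of)


# Hypothetical write-back result: node -> partition id.
partition_of = {"u1": 0, "u2": 0, "u3": 0, "u4": 0, "u5": 4, "u6": 4, "u7": 6}

sizes = component_sizes(partition_of)
share = largest_component_share(partition_of)

# The figures quoted above: biggest component versus total.
reported_percent = round(100 * 8938630 / 8981389, 1)
```

With the toy mapping, partition 0 holds four of the seven nodes; with the quoted figures the share rounds to 99.5 percent.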
123133
== Syntax
124134

125135
.Running algorithm and writing back results
@@ -193,9 +203,7 @@ Set `graph:'cypher'` in the config.
193203

194204
[source,cypher]
195205
----
196-
CALL algo.unionFind('MATCH (p:User) RETURN id(p) as id',
197-
'MATCH (p1:User)-[:FRIEND]->(p2:User) RETURN id(p1) as source, id(p2) as target',
198-
{graph:'cypher',write:true});
206+
include::scripts/connected-components.cypher[tag=cypher-loading]
199207
----
200208
== Implementations
201209

doc/label-propagation.adoc

Lines changed: 4 additions & 21 deletions
Original file line numberDiff line numberDiff line change
@@ -38,7 +38,6 @@ Moreover, it is computationally expensive to determine the community size.
3838
Hence, algorithms that are able to detect communities from complex networks without having any knowledge about the number of communities or the size of communities are essential.
3939
_Label Propagation_ algorithms (LPA) are one such type of algorithm: they detect community structure using the network structure alone as a guide and require neither a pre-defined objective function nor prior information about the communities.
4040

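The following is a minimal sketch of the label propagation idea, assuming an undirected, unweighted graph. Every node starts with its own label and repeatedly adopts the label most common among its neighbours. To keep the run deterministic, nodes are visited in sorted order and ties go to the largest label; both choices are arbitrary assumptions made for this example (ties are usually broken at random), and the helper `label_propagation` is not the library procedure.

```python
from collections import Counter


def label_propagation(adj, max_iterations=10):
    """Asynchronous label propagation on {node: [neighbours]}; returns {node: label}."""
    labels = {v: v for v in adj}  # every node starts in its own community
    for _ in range(max_iterations):
        changed = False
        for v in sorted(adj):
            if not adj[v]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[w] for w in adj[v])
            best = max(counts.values())
            # Deterministic tie-break (assumption): take the largest label.
            new_label = max(l for l, c in counts.items() if c == best)
            if new_label != labels[v]:
                labels[v] = new_label
                changed = True
        if not changed:
            break  # converged: no node changed its label in this pass
    return labels


# Hypothetical graph: two triangles joined by the single edge c - d.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for x, y in edges:
    adj.setdefault(x, []).append(y)
    adj.setdefault(y, []).append(x)

communities = label_propagation(adj)
```

On this graph the run converges after one pass, with `a`, `b`, `c` sharing one label and `d`, `e`, `f` another, without being told how many communities to look for. A different visiting order or tie-break can give a different split.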
41-
4241
== Constraints / when not to use it
4342

4443
== Algorithm explanation on simple sample graph
@@ -48,29 +47,13 @@ image::{img}/label_propagation.png[]
4847
.Create sample graph
4948
[source,cypher]
5049
----
51-
CREATE (nAlice:User {id:'Alice'})
52-
,(nBridget:User {id:'Bridget'})
53-
,(nCharles:User {id:'Charles'})
54-
,(nDoug:User {id:'Doug'})
55-
,(nMark:User {id:'Mark'})
56-
,(nMichael:User {id:'Michael'})
57-
CREATE (nAlice)-[:FOLLOW]->(nBridget)
58-
,(nAlice)-[:FOLLOW]->(nCharles)
59-
,(nMark)-[:FOLLOW]->(nDoug)
60-
,(nBridget)-[:FOLLOW]->(nMichael)
61-
,(nDoug)-[:FOLLOW]->(nMark)
62-
,(nMichael)-[:FOLLOW]->(nAlice)
63-
,(nAlice)-[:FOLLOW]->(nMichael)
64-
,(nBridget)-[:FOLLOW]->(nAlice)
65-
,(nMichael)-[:FOLLOW]->(nBridget)
66-
,(nCharles)-[:FOLLOW]->(nDoug)
50+
include::scripts/label-propagation.cypher[tag=create-sample-graph]
6751
----
6852

6953
.Running algorithm and writing back results
7054
[source,cypher]
7155
----
72-
CALL algo.labelPropagation('User', 'FOLLOW','OUTGOING', {iterations:10,partitionProperty:'partition', write:true})
73-
YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, write, partitionProperty
56+
include::scripts/label-propagation.cypher[tag=write-sample-graph]
7457
----
7558

7659
== Example Usage
@@ -120,11 +103,11 @@ We support the following versions of the label propagation algorithms:
120103

121104
* [x] directed, unweighted:
122105

123-
- weightProperty: null
106+
** weightProperty: null
124107

125108
* [x] directed, weighted
126109

127-
- weightProperty : 'weight'
110+
** weightProperty : 'weight'
128111

129112
* [ ] undirected, unweighted
130113

doc/louvain.adoc

Lines changed: 8 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -88,36 +88,19 @@ image::{img}/louvain.png[]
8888
.Create sample graph
8989
[source,cypher]
9090
----
91-
CREATE (nAlice:User {id:'Alice'})
92-
,(nBridget:User {id:'Bridget'})
93-
,(nCharles:User {id:'Charles'})
94-
,(nDoug:User {id:'Doug'})
95-
,(nMark:User {id:'Mark'})
96-
,(nMichael:User {id:'Michael'})
97-
CREATE (nAlice)-[:FRIEND]->(nBridget)
98-
,(nAlice)-[:FRIEND]->(nCharles)
99-
,(nMark)-[:FRIEND]->(nDoug)
100-
,(nBridget)-[:FRIEND]->(nMichael)
101-
,(nCharles)-[:FRIEND]->(nMark)
102-
,(nAlice)-[:FRIEND]->(nMichael)
103-
,(nCharles)-[:FRIEND]->(nDoug)
91+
include::scripts/louvain.cypher[tag=create-sample-graph]
10492
----
10593

10694
.Running algorithm and streaming results
10795
[source,cypher]
10896
----
109-
CALL algo.clustering.louvain.stream('User', 'FRIEND',
110-
{})
111-
YIELD nodeId, setId
112-
RETURN nodeId, setId LIMIT 20
97+
include::scripts/louvain.cypher[tag=stream-sample-graph]
11398
----
11499

115100
.Running algorithm and writing back results
116101
[source,cypher]
117102
----
118-
CALL algo.clustering.louvain('User', 'FRIEND',
119-
{write:true, writeProperty:'community'})
120-
YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis
103+
include::scripts/louvain.cypher[tag=write-sample-graph]
121104
----
122105

123106
.Results
@@ -134,6 +117,9 @@ YIELD nodes, communityCount, iterations, loadMillis, computeMillis, writeMillis
134117

135118
== Example Usage
136119

120+
121+
122+
137123
== Syntax
138124

139125
.Running algorithm and writing back results
@@ -201,11 +187,11 @@ YIELD nodeId, setId - yields a setId to each node id
201187

202188
* [x] undirected, unweighted
203189

204-
- weightProperty: null
190+
** weightProperty: null
205191

206192
* [x] undirected, weighted
207193

208-
- weightProperty : 'weight'
194+
** weightProperty : 'weight'
209195

210196
== References
211197
