This repository was archived by the owner on Apr 22, 2020. It is now read-only.

Commit 023a400

Similarity Function Examples (#804)
* explaining how to use pearson function
* update result
* Add cosine function example
* typo
* jaccard examples
* update jaccard examples
* euclidean + better explanatory text on the others
* address david's feedback
1 parent 278455a commit 023a400

8 files changed: +230 -0 lines changed
doc/asciidoc/scripts/similarity-cosine.cypher

Lines changed: 17 additions & 0 deletions
@@ -43,6 +43,23 @@ MERGE (karin)-[:LIKES {score: 10}]->(portuguese)

 // end::create-sample-graph[]

+// tag::function-cypher[]
+MATCH (p1:Person {name: 'Michael'})-[likes1:LIKES]->(cuisine)
+MATCH (p2:Person {name: "Arya"})-[likes2:LIKES]->(cuisine)
+RETURN p1.name AS from,
+       p2.name AS to,
+       algo.similarity.cosine(collect(likes1.score), collect(likes2.score)) AS similarity
+// end::function-cypher[]
+
+// tag::function-cypher-all[]
+MATCH (p1:Person {name: 'Michael'})-[likes1:LIKES]->(cuisine)
+MATCH (p2:Person)-[likes2:LIKES]->(cuisine) WHERE p2 <> p1
+RETURN p1.name AS from,
+       p2.name AS to,
+       algo.similarity.cosine(collect(likes1.score), collect(likes2.score)) AS similarity
+ORDER BY similarity DESC
+// end::function-cypher-all[]
+
 // tag::stream[]
 MATCH (p:Person), (c:Cuisine)
 OPTIONAL MATCH (p)-[likes:LIKES]->(c)

doc/asciidoc/scripts/similarity-euclidean.cypher

Lines changed: 25 additions & 0 deletions
@@ -9,6 +9,8 @@ MERGE (italian:Cuisine {name:'Italian'})
 MERGE (indian:Cuisine {name:'Indian'})
 MERGE (lebanese:Cuisine {name:'Lebanese'})
 MERGE (portuguese:Cuisine {name:'Portuguese'})
+MERGE (british:Cuisine {name:'British'})
+MERGE (mauritian:Cuisine {name:'Mauritian'})

 MERGE (zhen:Person {name: "Zhen"})
 MERGE (praveena:Person {name: "Praveena"})
@@ -18,23 +20,46 @@ MERGE (karin:Person {name: "Karin"})

 MERGE (praveena)-[:LIKES {score: 9}]->(indian)
 MERGE (praveena)-[:LIKES {score: 7}]->(portuguese)
+MERGE (praveena)-[:LIKES {score: 8}]->(british)
+MERGE (praveena)-[:LIKES {score: 1}]->(mauritian)

 MERGE (zhen)-[:LIKES {score: 10}]->(french)
 MERGE (zhen)-[:LIKES {score: 6}]->(indian)
+MERGE (zhen)-[:LIKES {score: 2}]->(british)

 MERGE (michael)-[:LIKES {score: 8}]->(french)
 MERGE (michael)-[:LIKES {score: 7}]->(italian)
 MERGE (michael)-[:LIKES {score: 9}]->(indian)
+MERGE (michael)-[:LIKES {score: 3}]->(portuguese)

 MERGE (arya)-[:LIKES {score: 10}]->(lebanese)
 MERGE (arya)-[:LIKES {score: 10}]->(italian)
 MERGE (arya)-[:LIKES {score: 7}]->(portuguese)
+MERGE (arya)-[:LIKES {score: 9}]->(mauritian)

 MERGE (karin)-[:LIKES {score: 9}]->(lebanese)
 MERGE (karin)-[:LIKES {score: 7}]->(italian)
+MERGE (karin)-[:LIKES {score: 10}]->(portuguese)

 // end::create-sample-graph[]

+// tag::function-cypher[]
+MATCH (p1:Person {name: 'Zhen'})-[likes1:LIKES]->(cuisine)
+MATCH (p2:Person {name: "Praveena"})-[likes2:LIKES]->(cuisine)
+RETURN p1.name AS from,
+       p2.name AS to,
+       algo.similarity.euclideanDistance(collect(likes1.score), collect(likes2.score)) AS similarity
+// end::function-cypher[]
+
+// tag::function-cypher-all[]
+MATCH (p1:Person {name: 'Zhen'})-[likes1:LIKES]->(cuisine)
+MATCH (p2:Person)-[likes2:LIKES]->(cuisine) WHERE p2 <> p1
+RETURN p1.name AS from,
+       p2.name AS to,
+       algo.similarity.euclideanDistance(collect(likes1.score), collect(likes2.score)) AS similarity
+ORDER BY similarity DESC
+// end::function-cypher-all[]
+
 // tag::stream[]
 MATCH (p:Person), (c:Cuisine)
 OPTIONAL MATCH (p)-[likes:LIKES]->(c)

doc/asciidoc/scripts/similarity-jaccard.cypher

Lines changed: 22 additions & 0 deletions
@@ -35,6 +35,28 @@ MERGE (karin)-[:LIKES]->(italian)

 // end::create-sample-graph[]

+
+// tag::function-cypher[]
+MATCH (p1:Person {name: 'Karin'})-[:LIKES]->(cuisine1)
+WITH p1, collect(id(cuisine1)) AS p1Cuisine
+MATCH (p2:Person {name: "Arya"})-[:LIKES]->(cuisine2)
+WITH p1, p1Cuisine, p2, collect(id(cuisine2)) AS p2Cuisine
+RETURN p1.name AS from,
+       p2.name AS to,
+       algo.similarity.jaccard(p1Cuisine, p2Cuisine) AS similarity
+// end::function-cypher[]
+
+// tag::function-cypher-all[]
+MATCH (p1:Person {name: 'Karin'})-[:LIKES]->(cuisine1)
+WITH p1, collect(id(cuisine1)) AS p1Cuisine
+MATCH (p2:Person)-[:LIKES]->(cuisine2) WHERE p1 <> p2
+WITH p1, p1Cuisine, p2, collect(id(cuisine2)) AS p2Cuisine
+RETURN p1.name AS from,
+       p2.name AS to,
+       algo.similarity.jaccard(p1Cuisine, p2Cuisine) AS similarity
+ORDER BY similarity DESC
+// end::function-cypher-all[]
+
 // tag::stream[]
 MATCH (p:Person)-[:LIKES]->(cuisine)
 WITH {item:id(p), categories: collect(id(cuisine))} as userData

doc/asciidoc/scripts/similarity-pearson.cypher

Lines changed: 22 additions & 0 deletions
@@ -45,6 +45,28 @@ MERGE (karin)-[:RATED {score: 9}]->(gruffalo)

 // end::create-sample-graph[]

+// tag::function-cypher[]
+MATCH (p1:Person {name: 'Arya'})-[rated:RATED]->(movie)
+WITH p1, algo.similarity.asVector(movie, rated.score) AS p1Vector
+MATCH (p2:Person {name: 'Karin'})-[rated:RATED]->(movie)
+WITH p1, p2, p1Vector, algo.similarity.asVector(movie, rated.score) AS p2Vector
+RETURN p1.name AS from,
+       p2.name AS to,
+       algo.similarity.pearson(p1Vector, p2Vector, {vectorType: "maps"}) AS similarity
+// end::function-cypher[]
+
+// tag::function-cypher-all[]
+MATCH (p1:Person {name: 'Arya'})-[rated:RATED]->(movie)
+WITH p1, algo.similarity.asVector(movie, rated.score) AS p1Vector
+MATCH (p2:Person)-[rated:RATED]->(movie) WHERE p2 <> p1
+WITH p1, p2, p1Vector, algo.similarity.asVector(movie, rated.score) AS p2Vector
+RETURN p1.name AS from,
+       p2.name AS to,
+       algo.similarity.pearson(p1Vector, p2Vector, {vectorType: "maps"}) AS similarity
+ORDER BY similarity DESC
+// end::function-cypher-all[]
+
+
 // tag::stream[]
 MATCH (p:Person), (m:Movie)
 OPTIONAL MATCH (p)-[rated:RATED]->(m)

doc/asciidoc/similarity-cosine.adoc

Lines changed: 35 additions & 0 deletions
@@ -79,6 +79,41 @@ image::cosine-similarity2.png[role="middle"]
 include::scripts/similarity-cosine.cypher[tag=create-sample-graph]
 ----

+.The following will return the Cosine similarity between Michael and Arya:
+[source, cypher]
+----
+include::scripts/similarity-cosine.cypher[tag=function-cypher]
+----
+
+// tag::function-cypher[]
+.Results
+[opts="header"]
+|===
+| `from` | `to` | `similarity`
+| "Michael" | "Arya" | 0.9788908326303921
+
+|===
+// end::function-cypher[]
+
+.The following will return the Cosine similarity between Michael and each other person who has a cuisine in common with him:
+[source, cypher]
+----
+include::scripts/similarity-cosine.cypher[tag=function-cypher-all]
+----
+
+// tag::function-cypher-all[]
+.Results
+[opts="header"]
+|===
+| `from` | `to` | `similarity`
+| "Michael" | "Arya" | 0.9788908326303921
+| "Michael" | "Zhen" | 0.9542262139256075
+| "Michael" | "Praveena" | 0.9429903335828894
+| "Michael" | "Karin" | 0.8498063272285821
+
+|===
+// end::function-cypher-all[]
+
 .The following will return a stream of node pairs along with their Cosine similarities:
 [source, cypher]
 ----
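
Reviewer note: the calculation behind the `similarity` column can be sketched outside the database. The snippet below is a minimal Python illustration of the cosine formula, not the library's implementation. The function name, the zero-vector handling and the input scores are all assumptions made for illustration, and none of the values come from the sample graph.

```python
import math


def cosine_similarity(xs, ys):
    """Cosine of the angle between two equal-length score lists."""
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    if norm_x == 0 or norm_y == 0:
        # Assumption for this sketch: an all-zero list has no similarity.
        return 0.0
    return dot / (norm_x * norm_y)


# Hypothetical scores, not taken from the sample graph.
parallel = cosine_similarity([1, 2, 3], [2, 4, 6])  # same direction, close to 1
orthogonal = cosine_similarity([1, 0], [0, 1])      # no overlap, exactly 0
```

Because both lists in the Cypher queries are collected from the same matched rows, position `i` in each list refers to the same cuisine, which is what the formula requires.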

doc/asciidoc/similarity-euclidean.adoc

Lines changed: 33 additions & 0 deletions
@@ -73,6 +73,39 @@ These two lists of numbers have a euclidean distance of 8.42.
 include::scripts/similarity-euclidean.cypher[tag=create-sample-graph]
 ----

+.The following will return the Euclidean distance between Zhen and Praveena:
+[source, cypher]
+----
+include::scripts/similarity-euclidean.cypher[tag=function-cypher]
+----
+
+// tag::function-cypher[]
+.Results
+[opts="header"]
+|===
+| `from` | `to` | `similarity`
+| "Zhen" | "Praveena" | 6.708203932499369
+
+|===
+// end::function-cypher[]
+
+.The following will return the Euclidean distance between Zhen and each other person who has a cuisine in common with him:
+[source, cypher]
+----
+include::scripts/similarity-euclidean.cypher[tag=function-cypher-all]
+----
+
+// tag::function-cypher-all[]
+.Results
+[opts="header"]
+|===
+| `from` | `to` | `similarity`
+| "Zhen" | "Praveena" | 6.708203932499369
+| "Zhen" | "Michael" | 3.605551275463989
+
+|===
+// end::function-cypher-all[]
+
 .The following will return a stream of node pairs, along with their intersection and euclidean similarities:
 [source, cypher]
 ----
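
Reviewer note: both rows in the results table can be checked by hand against the scores in `similarity-euclidean.cypher`. The sketch below is a plain Python recomputation, not the library's code, and the function name is an assumption. Zhen and Praveena share Indian (6 vs 9) and British (2 vs 8). Zhen and Michael share French (10 vs 8) and Indian (6 vs 9).

```python
import math


def euclidean_distance(xs, ys):
    """Euclidean distance between two equal-length score lists."""
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)))


# Scores from the sample graph, restricted to the cuisines both people like.
zhen_vs_praveena = euclidean_distance([6, 2], [9, 8])   # sqrt(9 + 36) = sqrt(45)
zhen_vs_michael = euclidean_distance([10, 6], [8, 9])   # sqrt(4 + 9) = sqrt(13)
```

Zhen has no cuisine in common with Arya or Karin in the sample graph, so the shared-cuisine `MATCH` produces no rows for them, which is why the table has only two entries. Note that the column is named `similarity` but holds a distance, so `ORDER BY similarity DESC` lists the most distant person first.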

doc/asciidoc/similarity-jaccard.adoc

Lines changed: 35 additions & 0 deletions
@@ -79,6 +79,41 @@ J(A,B) = 2 / 3 + 4 - 2
 include::scripts/similarity-jaccard.cypher[tag=create-sample-graph]
 ----

+.The following will return the Jaccard similarity between Karin and Arya:
+[source, cypher]
+----
+include::scripts/similarity-jaccard.cypher[tag=function-cypher]
+----
+
+// tag::function-cypher[]
+.Results
+[opts="header"]
+|===
+| `from` | `to` | `similarity`
+| "Karin" | "Arya" | 0.66
+
+|===
+// end::function-cypher[]
+
+.The following will return the Jaccard similarity between Karin and every other person who likes at least one cuisine:
+[source, cypher]
+----
+include::scripts/similarity-jaccard.cypher[tag=function-cypher-all]
+----
+
+// tag::function-cypher-all[]
+.Results
+[opts="header"]
+|===
+| `from` | `to` | `similarity`
+| "Karin" | "Arya" | 0.66
+| "Karin" | "Michael" | 0.25
+| "Karin" | "Praveena" | 0.0
+| "Karin" | "Zhen" | 0.0
+
+|===
+// end::function-cypher-all[]
+
 .The following will return a stream of node pairs along with their intersection and Jaccard similarities:
 [source, cypher]
 ----
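
Reviewer note: the Jaccard score is the size of the intersection divided by the size of the union. The sketch below is a minimal Python illustration, not the library's implementation. The function name, the empty-input handling and the ids are hypothetical. The first call mirrors the worked formula in the hunk header, with sets of size 3 and 4 that share 2 elements.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of ids."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Assumption for this sketch: two empty collections score 0.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical node ids, not taken from the sample graph.
overlapping = jaccard([1, 2, 3], [1, 2, 4, 5])  # 2 / (3 + 4 - 2)
disjoint = jaccard([1, 2], [3])                 # no shared ids
```

A score of 0.0, as in the Praveena and Zhen rows, means the two people share no cuisine ids. Those rows still appear because the second `MATCH` does not require a cuisine in common with Karin.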

doc/asciidoc/similarity-pearson.adoc

Lines changed: 41 additions & 0 deletions
@@ -66,6 +66,47 @@ include::scripts/similarity-pearson.cypher[tag=function]
 include::scripts/similarity-pearson.cypher[tag=create-sample-graph]
 ----

+.The following will return the Pearson similarity between Arya and Karin:
+[source, cypher]
+----
+include::scripts/similarity-pearson.cypher[tag=function-cypher]
+----
+
+// tag::function-cypher[]
+.Results
+[opts="header"]
+|===
+| `from` | `to` | `similarity`
+| "Arya" | "Karin" | 0.8194651785206903
+
+|===
+// end::function-cypher[]
+
+
+In this example, we pass `vectorType: "maps"` as an extra parameter and use the `algo.similarity.asVector` function to construct a vector of maps, each containing a movie and its rating.
+We do this because the Pearson similarity algorithm needs to compute the average over *all* the movies that a user has reviewed, not just the ones they have in common with the user we're comparing them to.
+Therefore, we can't simply pass in collections of the ratings for movies that both people have reviewed.
+
+.The following will return the Pearson similarity between Arya and every other person who has rated at least one movie:
+[source, cypher]
+----
+include::scripts/similarity-pearson.cypher[tag=function-cypher-all]
+----
+
+// tag::function-cypher-all[]
+.Results
+[opts="header"]
+|===
+| `from` | `to` | `similarity`
+| "Arya" | "Karin" | 0.8194651785206903
+| "Arya" | "Zhen" | 0.4839533792540704
+| "Arya" | "Praveena" | 0.09262336892949784
+| "Arya" | "Michael" | -0.9551953674747637
+
+|===
+// end::function-cypher-all[]
+
+
 .The following will return a stream of node pairs along with their Pearson similarities:
 [source, cypher]
 ----
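
Reviewer note: the explanation of `vectorType: "maps"` can be illustrated with a small Python sketch. It is not the library's implementation. The function name, the `use_all_ratings` flag and the ratings are hypothetical, and summing only over shared movies is an assumption of this sketch. The point is that taking each person's mean over all of their ratings, rather than over the shared ones, can change the result.

```python
import math


def pearson(a, b, use_all_ratings=True):
    """Pearson-style score for two {movie: rating} maps.

    Sums run over movies rated by both people. The means come from every
    rating a person gave (use_all_ratings=True) or only the shared ones.
    """
    shared = sorted(set(a) & set(b))
    if not shared:
        return 0.0
    if use_all_ratings:
        mean_a = sum(a.values()) / len(a)
        mean_b = sum(b.values()) / len(b)
    else:
        mean_a = sum(a[m] for m in shared) / len(shared)
        mean_b = sum(b[m] for m in shared) / len(shared)
    num = sum((a[m] - mean_a) * (b[m] - mean_b) for m in shared)
    den = math.sqrt(sum((a[m] - mean_a) ** 2 for m in shared)) * \
        math.sqrt(sum((b[m] - mean_b) ** 2 for m in shared))
    return num / den if den else 0.0


# Hypothetical ratings, not taken from the sample graph.
person_1 = {"A": 10, "B": 8, "C": 3}
person_2 = {"A": 7, "B": 9, "D": 2}

with_all_ratings = pearson(person_1, person_2)                        # about 0.6
shared_only = pearson(person_1, person_2, use_all_ratings=False)      # -1.0
```

With these made-up ratings the sign flips. Relative to everything each person has rated, both score movies A and B above their own average. Relative to the shared movies alone, their preferences look opposed.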
