This repository was archived by the owner on Apr 22, 2020. It is now read-only.

Commit 975c8ef

Source target ids docs (#852)
* jaccard - source and target ids
* remove unnecessary fields
* same explanation of topK as in the earlier example
* Cosine - added source and target ids, splitting functions and procedures
* overlap and euclidean
* overlap source and target ids
* add sourceIds and targetIds to the Syntax section
* add sourceIds and targetIds for pearson
* put cypher projection last
* edit similarity doc
* fix include
1 parent 3b45068 commit 975c8ef

11 files changed: 591 additions and 102 deletions.

doc/asciidoc/scripts/similarity-cosine.cypher

Lines changed: 17 additions & 0 deletions
@@ -144,3 +144,20 @@ RETURN algo.asNode(item1).name AS from, algo.asNode(item2).name AS to, similarit
 ORDER BY similarity DESC
 
 // end::embedding-graph-stream[]
+
+// tag::source-target-ids[]
+MATCH (p:Person), (c:Cuisine)
+OPTIONAL MATCH (p)-[likes:LIKES]->(c)
+WITH {item:id(p), name: p.name, weights: collect(coalesce(likes.score, algo.NaN()))} as userData
+WITH collect(userData) as personCuisines
+
+// create sourceIds list containing ids for Praveena and Arya
+WITH personCuisines,
+[value in personCuisines WHERE value.name IN ["Praveena", "Arya"] | value.item ] AS sourceIds
+
+CALL algo.similarity.cosine.stream(personCuisines, {sourceIds: sourceIds, topK: 1})
+YIELD item1, item2, similarity
+WITH algo.getNodeById(item1) AS from, algo.getNodeById(item2) AS to, similarity
+RETURN from.name AS from, to.name AS to, similarity
+ORDER BY similarity DESC
+// end::source-target-ids[]
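
Note on the query added above: the `weights` lists are padded with `algo.NaN()` wherever a person has no `LIKES` score for a cuisine, so that every list has the same length. The following is a minimal Python sketch of a cosine similarity that ignores such padded positions. It assumes a position is skipped whenever either list holds NaN there, and that the result is 0.0 when no shared positions remain; both are assumptions made for illustration, not confirmed library behaviour. The helper name `cosine_skip_nan` is hypothetical and is not part of the library.

```python
import math


def cosine_skip_nan(weights1, weights2):
    """Cosine similarity over the positions where both lists hold a real number.

    Assumption: a position is ignored when either value is NaN.
    Assumption: 0.0 is returned when no usable positions remain.
    """
    if len(weights1) != len(weights2):
        raise ValueError("weight lists must have the same length")

    dot = norm1 = norm2 = 0.0
    for w1, w2 in zip(weights1, weights2):
        if math.isnan(w1) or math.isnan(w2):
            continue  # padded position: neither list contributes here
        dot += w1 * w2
        norm1 += w1 * w1
        norm2 += w2 * w2

    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / math.sqrt(norm1 * norm2)


nan = float("nan")
# Two lists pointing in the same direction.
print(cosine_skip_nan([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
# Only the first position is rated by both sides, so only it is compared.
print(cosine_skip_nan([9.0, nan, 7.0], [9.0, 5.0, nan]))
```

Skipping a position on both sides keeps the two vectors the same length, which is why the query pads with NaN rather than dropping missing scores.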

doc/asciidoc/scripts/similarity-euclidean.cypher

Lines changed: 16 additions & 0 deletions
@@ -145,3 +145,19 @@ ORDER BY similarity DESC
 
 // end::embedding-graph-stream[]
 
+// tag::source-target-ids[]
+MATCH (p:Person), (c:Cuisine)
+OPTIONAL MATCH (p)-[likes:LIKES]->(c)
+WITH {item:id(p), name: p.name, weights: collect(coalesce(likes.score, algo.NaN()))} as userData
+WITH collect(userData) as personCuisines
+
+// create sourceIds list containing ids for Praveena and Arya
+WITH personCuisines,
+[value in personCuisines WHERE value.name IN ["Praveena", "Arya"] | value.item ] AS sourceIds
+
+CALL algo.similarity.euclidean.stream(personCuisines, {sourceIds: sourceIds, topK: 1})
+YIELD item1, item2, similarity
+WITH algo.getNodeById(item1) AS from, algo.getNodeById(item2) AS to, similarity
+RETURN from.name AS from, to.name AS to, similarity
+ORDER BY similarity DESC
+// end::source-target-ids[]

doc/asciidoc/scripts/similarity-jaccard.cypher

Lines changed: 87 additions & 0 deletions
@@ -36,6 +36,53 @@ MERGE (karin)-[:LIKES]->(italian)
 // end::create-sample-graph[]
 
 
+// tag::create-sample-graph-procedure[]
+
+MERGE (french:Cuisine {name:'French'})
+MERGE (italian:Cuisine {name:'Italian'})
+MERGE (indian:Cuisine {name:'Indian'})
+MERGE (lebanese:Cuisine {name:'Lebanese'})
+MERGE (portuguese:Cuisine {name:'Portuguese'})
+
+MERGE (zhen:Person {name: "Zhen"})
+MERGE (praveena:Person {name: "Praveena"})
+MERGE (michael:Person {name: "Michael"})
+MERGE (arya:Person {name: "Arya"})
+MERGE (karin:Person {name: "Karin"})
+
+MERGE (shrimp:Recipe {title: "Shrimp Bolognese"})
+MERGE (saltimbocca:Recipe {title: "Saltimbocca alla roman"})
+MERGE (periperi:Recipe {title: "Peri Peri Naan"})
+
+MERGE (praveena)-[:LIKES]->(indian)
+MERGE (praveena)-[:LIKES]->(portuguese)
+
+MERGE (zhen)-[:LIKES]->(french)
+MERGE (zhen)-[:LIKES]->(indian)
+
+MERGE (michael)-[:LIKES]->(french)
+MERGE (michael)-[:LIKES]->(italian)
+MERGE (michael)-[:LIKES]->(indian)
+
+MERGE (arya)-[:LIKES]->(lebanese)
+MERGE (arya)-[:LIKES]->(italian)
+MERGE (arya)-[:LIKES]->(portuguese)
+
+MERGE (karin)-[:LIKES]->(lebanese)
+MERGE (karin)-[:LIKES]->(italian)
+
+MERGE (shrimp)-[:TYPE]->(italian)
+MERGE (shrimp)-[:TYPE]->(indian)
+
+MERGE (saltimbocca)-[:TYPE]->(italian)
+MERGE (saltimbocca)-[:TYPE]->(french)
+
+MERGE (periperi)-[:TYPE]->(portuguese)
+MERGE (periperi)-[:TYPE]->(indian)
+
+// end::create-sample-graph-procedure[]
+
+
 // tag::function-cypher[]
 MATCH (p1:Person {name: 'Karin'})-[:LIKES]->(cuisine1)
 WITH p1, collect(id(cuisine1)) AS p1Cuisine
@@ -102,3 +149,43 @@ MATCH (p:Person {name: "Praveena"})-[:SIMILAR]->(other),
 WHERE not((p)-[:LIKES]->(cuisine))
 RETURN cuisine.name AS cuisine
 // end::query[]
+
+// tag::source-target-ids[]
+// compute categories for recipes
+MATCH (recipe:Recipe)-[:TYPE]->(cuisine)
+WITH {item:id(recipe), categories: collect(id(cuisine))} as data
+WITH collect(data) AS recipeCuisines
+
+// compute categories for people
+MATCH (person:Person)-[:LIKES]->(cuisine)
+WITH recipeCuisines, {item:id(person), categories: collect(id(cuisine))} as data
+WITH recipeCuisines, collect(data) AS personCuisines
+
+// create sourceIds and targetIds lists
+WITH recipeCuisines, personCuisines,
+[value in recipeCuisines | value.item] AS sourceIds,
+[value in personCuisines | value.item] AS targetIds
+
+CALL algo.similarity.jaccard.stream(recipeCuisines + personCuisines, {sourceIds: sourceIds, targetIds: targetIds})
+YIELD item1, item2, similarity
+WITH algo.getNodeById(item1) AS from, algo.getNodeById(item2) AS to, similarity
+RETURN from.title AS from, to.name AS to, similarity
+ORDER BY similarity DESC
+LIMIT 10
+// end::source-target-ids[]
+
+// tag::source-target-ids-2[]
+MATCH (person:Person)-[:LIKES]->(cuisine)
+WITH {item:id(person), name: person.name, categories: collect(id(cuisine))} as data
+WITH collect(data) AS personCuisines
+
+// create sourceIds list containing ids for Praveena and Arya
+WITH personCuisines,
+[value in personCuisines WHERE value.name IN ["Praveena", "Arya"] | value.item ] AS sourceIds
+
+CALL algo.similarity.jaccard.stream(personCuisines, {sourceIds: sourceIds, topK: 1})
+YIELD item1, item2, similarity
+WITH algo.getNodeById(item1) AS from, algo.getNodeById(item2) AS to, similarity
+RETURN from.name AS from, to.name AS to, similarity
+ORDER BY similarity DESC
+// end::source-target-ids-2[]
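
Note on the two queries added above: `sourceIds` restricts which items similarities are computed from, `targetIds` restricts which items they are computed to, and `topK` keeps the best matches per source. The following is a rough Python sketch of that filtering, using the `LIKES` data from the sample graph in this file. It is an illustration under assumptions, not the library's implementation: person names stand in for node ids, self-pairs are assumed to be excluded, and the function names `jaccard` and `similar_pairs` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(data, source_ids=None, target_ids=None, top_k=None):
    """Score every (source, target) pair, keeping at most top_k targets per source.

    Assumption: omitting source_ids or target_ids means "all items in data".
    Assumption: an item is never compared with itself.
    """
    categories = {row["item"]: row["categories"] for row in data}
    sources = list(categories) if source_ids is None else source_ids
    targets = list(categories) if target_ids is None else target_ids

    results = []
    for source in sources:
        scored = [
            (source, target, jaccard(categories[source], categories[target]))
            for target in targets
            if target != source
        ]
        scored.sort(key=lambda row: row[2], reverse=True)
        results.extend(scored if top_k is None else scored[:top_k])
    # Mirror the ORDER BY similarity DESC of the queries.
    return sorted(results, key=lambda row: row[2], reverse=True)


# LIKES relationships from the sample graph; names stand in for node ids.
people = [
    {"item": "Praveena", "categories": ["Indian", "Portuguese"]},
    {"item": "Zhen", "categories": ["French", "Indian"]},
    {"item": "Michael", "categories": ["French", "Italian", "Indian"]},
    {"item": "Arya", "categories": ["Lebanese", "Italian", "Portuguese"]},
    {"item": "Karin", "categories": ["Lebanese", "Italian"]},
]

pairs = similar_pairs(people, source_ids=["Praveena", "Arya"], top_k=1)
for source, target, similarity in pairs:
    print(source, target, round(similarity, 4))
```

Passing both `source_ids` and `target_ids` gives the recipe-to-person comparison of the first query: only pairs that start at a source and end at a target are scored.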

doc/asciidoc/scripts/similarity-overlap.cypher

Lines changed: 16 additions & 0 deletions
@@ -83,3 +83,19 @@ MATCH path = (fantasy:Genre {name: "Fantasy"})-[:NARROWER_THAN*]->(genre)
 RETURN [node in nodes(path) | node.name] AS hierarchy
 ORDER BY length(path)
 // end::query[]
+
+
+// tag::source-target-ids[]
+MATCH (book:Book)-[:HAS_GENRE]->(genre)
+WITH {item:id(genre), name: genre.name, categories: collect(id(book))} as userData
+WITH collect(userData) as data
+
+// create sourceIds list containing ids for Fantasy and Classics
+WITH data,
+[value in data WHERE value.name IN ["Fantasy", "Classics"] | value.item ] AS sourceIds
+
+CALL algo.similarity.overlap.stream(data, {sourceIds: sourceIds})
+YIELD item1, item2, count1, count2, intersection, similarity
+RETURN algo.getNodeById(item1).name AS from, algo.getNodeById(item2).name AS to, similarity
+ORDER BY similarity DESC
+// end::source-target-ids[]

doc/asciidoc/scripts/similarity-pearson.cypher

Lines changed: 18 additions & 1 deletion
@@ -148,4 +148,21 @@ YIELD item1, item2, count1, count2, similarity
 RETURN algo.asNode(item1).name AS from, algo.asNode(item2).name AS to, similarity
 ORDER BY similarity DESC
 
-// end::embedding-graph-stream[]
+// end::embedding-graph-stream[]
+
+// tag::source-target-ids[]
+MATCH (p:Person), (m:Movie)
+OPTIONAL MATCH (p)-[rated:RATED]->(m)
+WITH {item:id(p), name: p.name, weights: collect(coalesce(rated.score, algo.NaN()))} as userData
+WITH collect(userData) as personCuisines
+
+// create sourceIds list containing ids for Praveena and Arya
+WITH personCuisines,
+[value in personCuisines WHERE value.name IN ["Praveena", "Arya"] | value.item ] AS sourceIds
+
+CALL algo.similarity.pearson.stream(personCuisines, {sourceIds: sourceIds, topK: 1})
+YIELD item1, item2, similarity
+WITH algo.getNodeById(item1) AS from, algo.getNodeById(item2) AS to, similarity
+RETURN from.name AS from, to.name AS to, similarity
+ORDER BY similarity DESC
+// end::source-target-ids[]

doc/asciidoc/similarity-cosine.adoc

Lines changed: 64 additions & 16 deletions
@@ -1,3 +1,4 @@
+:procedure-name: Cosine Similarity
 [[algorithms-similarity-cosine]]
 = The Cosine Similarity algorithm
 
@@ -15,9 +16,11 @@ This section includes:
 
 * <<algorithms-similarity-cosine-context, History and explanation>>
 * <<algorithms-similarity-cosine-usecase, Use-cases - when to use the Cosine Similarity algorithm>>
-* <<algorithms-similarity-cosine-sample, Cosine Similarity algorithm sample>>
-* <<algorithms-similarity-cosine-cypher-projection, Cypher projection>>
+* <<algorithms-similarity-cosine-function-sample, Cosine Similarity function algorithm sample>>
+* <<algorithms-similarity-cosine-procedure-sample, Cosine Similarity procedures algorithm sample>>
+* <<algorithms-similarity-cosine-source-target-ids, Specifying source and target ids>>
 * <<algorithms-similarity-cosine-skipping-values, Skipping values>>
+* <<algorithms-similarity-cosine-cypher-projection, Cypher projection>>
 * <<algorithms-similarity-cosine-syntax, Syntax>>
 
 
@@ -36,10 +39,6 @@ The library contains both procedures and functions to calculate similarity betwe
 The function is best used when calculating the similarity between small numbers of sets.
 The procedures parallelize the computation and are therefore more appropriate for computing similarities on bigger datasets.
 
-Cosine similarity is only calculated over non-NULL dimensions.
-When calling the function, we should provide lists that contain the overlapping items.
-The procedures expect to receive the same length lists for all items, so we need to pad those lists with `algo.NaN()` where necessary.
-
 // end::explanation[]
 
 [[algorithms-similarity-cosine-usecase]]
@@ -52,8 +51,14 @@ For example, to get movie recommendations based on the preferences of users who
 // end::use-case[]
 
 
-[[algorithms-similarity-cosine-sample]]
-== Cosine algorithm sample
+[[algorithms-similarity-cosine-function-sample]]
+== Cosine Similarity algorithm function sample
+
+The Cosine Similarity function computes the similarity of two lists of numbers.
+
+include::similarity.adoc[tag=weighted-function-note]
+
+We can use it to compute the similarity of two hardcoded lists.
 
 .The following will return the cosine similarity of two lists of numbers:
 [source, cypher]
@@ -82,6 +87,8 @@ image::cosine-similarity2.png[role="middle"]
 
 // end::function-explanation[]
 
+We can also use it to compute the similarity of nodes based on lists computed by a Cypher query.
+
 .The following will create a sample graph:
 [source, cypher]
 ----
@@ -123,6 +130,21 @@ include::scripts/similarity-cosine.cypher[tag=function-cypher-all]
 |===
 // end::function-cypher-all[]
 
+
+[[algorithms-similarity-cosine-procedure-sample]]
+== Cosine Similarity algorithm procedures sample
+
+
+include::similarity.adoc[tag=computation]
+include::similarity.adoc[tag=weighted-note]
+
+.The following will create a sample graph:
+[source, cypher]
+----
+include::scripts/similarity-cosine.cypher[tag=create-sample-graph]
+----
+
+
 .The following will return a stream of node pairs along with their Cosine similarities:
 [source, cypher]
 ----
@@ -236,18 +258,29 @@ include::scripts/similarity-cosine.cypher[tag=query]
 // end::query[]
 
 
-[[algorithms-similarity-cosine-cypher-projection]]
-== Cypher projection
+[[algorithms-similarity-cosine-source-target-ids]]
+== Specifying source and target ids
 
-include::projected-graph-model/cypher-projection.adoc[tag=similarity-explanation]
+include::similarity.adoc[tag=source-target-ids]
 
-.Set `graph:'cypher'` in the config:
+We could use this technique to compute the similarity of a subset of items to all other items.
 
-[source,cypher]
+.The following will find the most similar person (i.e. `k=1`) to Arya and Praveena:
+[source, cypher]
 ----
-include::scripts/similarity-cosine.cypher[tag=cypher-projection]
+include::scripts/similarity-cosine.cypher[tag=source-target-ids]
 ----
 
+// tag::source-target-ids[]
+.Results
+[opts="header",cols="1,1,1"]
+|===
+| `from` | `to` | `similarity`
+| Praveena | Karin | 1.0
+| Arya | Michael | 0.9788908326303921
+|===
+// end::source-target-ids[]
+
 [[algorithms-similarity-cosine-skipping-values]]
 == Skipping values
 
@@ -267,14 +300,25 @@ include::scripts/similarity-cosine.cypher[tag=create-sample-embedding-graph]
 include::scripts/similarity-cosine.cypher[tag=embedding-graph-stream]
 ----
 
+[[algorithms-similarity-cosine-cypher-projection]]
+== Cypher projection
+
+include::projected-graph-model/cypher-projection.adoc[tag=similarity-explanation]
+
+.Set `graph:'cypher'` in the config:
+
+[source,cypher]
+----
+include::scripts/similarity-cosine.cypher[tag=cypher-projection]
+----
 
 [[algorithms-similarity-cosine-syntax]]
 == Syntax
 
 .The following will run the algorithm and write back results:
 [source, cypher]
 ----
-CALL algo.similarity.cosine(userData:List<Map>> or String, {
+CALL algo.similarity.cosine(userData:List<Map> or String, {
 topK: 1, similarityCutoff: 0.1, write:true, writeProperty: "cosineSimilarity"
 })
 YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min, max, mean, stdDev, p25, p50, p75, p90, p95, p99, p999, p100
@@ -297,6 +341,8 @@ YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min,
 | `writeBatchSize` | int | 10000 | yes | The batch size to use when storing results.
 | `writeRelationshipType` | string | SIMILAR | yes | The relationship type to use when storing results.
 | `writeProperty` | string | score | yes | The property to use when storing results.
+| `sourceIds` | long[] | null | yes | The ids of items from which we need to compute similarities. Defaults to all the items provided in the `data` parameter.
+| `targetIds` | long[] | null | yes | The ids of items to which we need to compute similarities. Defaults to all the items provided in the `data` parameter.
 |===
 
 .Results
@@ -325,7 +371,7 @@ YIELD nodes, similarityPairs, write, writeRelationshipType, writeProperty, min,
 .The following will run the algorithm and stream results:
 [source,cypher]
 ----
-CALL algo.similarity.cosine.stream(userData:List<Map>> or String, {
+CALL algo.similarity.cosine.stream(userData:List<Map> or String, {
 degreeCutoff: 10, similarityCutoff: 0.1, concurrency:4
 })
 YIELD item1, item2, count1, count2, intersection, similarity
@@ -344,6 +390,8 @@ YIELD item1, item2, count1, count2, intersection, similarity
 | `skipValue` | double | null | yes | Value to skip when executing similarity computation. A value of `null` means that skipping is disabled.
 | `concurrency` | int | available CPUs | yes | The number of concurrent threads.
 | `graph` | string | dense | yes | The graph name ('dense' or 'cypher').
+| `sourceIds` | long[] | null | yes | The ids of items from which we need to compute similarities. Defaults to all the items provided in the `data` parameter.
+| `targetIds` | long[] | null | yes | The ids of items to which we need to compute similarities. Defaults to all the items provided in the `data` parameter.
 |===
 
 .Results
