include::questions/1-embeddings.adoc[leveloffset=+1]
[.summary]
== Lesson Summary

In this lesson, you learned about vectors and embeddings, and how they can be used in RAG to find relevant information.

In the next lesson, you will use a vector index in Neo4j to find relevant data.
The _names_ would be the node and relationship identifiers.

If you wanted to construct a knowledge graph based on the link:https://en.wikipedia.org/wiki/Neo4j[Neo4j Wikipedia page^], you would:

. **Gather** the text from the page.
+
Neo4j is a graph database management system (GDBMS) developed by
Neo4j Inc.

The data elements Neo4j stores are nodes, edges connecting them
and attributes of nodes and edges. Described by its developers
as an ACID-compliant transactional database with native graph
storage and processing...

. Split the text into **chunks**.
+
Neo4j is a graph database management system (GDBMS) developed
by Neo4j Inc.
+
{sp}
+
The data elements Neo4j stores are nodes, edges connecting them
and attributes of nodes and edges.
+
{sp}
+
Described by its developers as an ACID-compliant transactional
database with native graph storage and processing...

. Generate **embeddings** and **vectors** for each chunk.
+
[0.21972137987, 0.12345678901, 0.98765432109, ...]
+
{sp}
+
[0.34567890123, 0.23456789012, 0.87654321098, ...]
+
{sp}
+
[0.45678901234, 0.34567890123, 0.76543210987, ...]

. **Extract** the entities and relationships using an **LLM**.
:order: 3
:branch: main

The graph created by the `SimpleKGPipeline` is based on chunks of text extracted from the documents.

By default, the chunk size is quite large, which may result in fewer, larger chunks.

The larger the chunk size, the more context the LLM has when extracting entities and relationships, but it may also lead to less granular data.

In this lesson, you will modify the `SimpleKGPipeline` to use a different chunk size.

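The chunk size is controlled by the text splitter component. As a minimal sketch (the values shown are illustrative, and `llm`, `driver`, and `embedder` are assumed to have been created already), a `FixedSizeSplitter` can be passed to the `SimpleKGPipeline`:

[source, python]
.A possible text splitter configuration (illustrative values)
----
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import (
    FixedSizeSplitter,
)
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline

# Split the text into 500 character chunks that overlap by 100 characters,
# so context is not lost at chunk boundaries
text_splitter = FixedSizeSplitter(chunk_size=500, chunk_overlap=100)

kg_builder = SimpleKGPipeline(
    llm=llm,                      # the LLM used for entity extraction
    driver=driver,                # an open neo4j.Driver instance
    embedder=embedder,            # the embedding model for chunk vectors
    text_splitter=text_splitter,  # overrides the default splitter
    from_pdf=True,
)
----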

Run the modified pipeline to recreate the knowledge graph with the new chunk size.

== Explore

You can view the documents and their associated chunks using the following Cypher query:

[source, cypher]
.View the documents and chunks
----
MATCH (d:Document)<-[:FROM_DOCUMENT]-(c:Chunk)
RETURN d.path, c.index, c.text, size(c.text)
ORDER BY d.path, c.index
----

View the entities extracted from each chunk using the following Cypher query:

[source, cypher]
.View the entities extracted from each chunk
----
MATCH p = (c:Chunk)-[*..3]-(e:__Entity__)
RETURN p
----

[TIP]
====
You can experiment with different chunk sizes to see how they affect the entities extracted and the structure of the knowledge graph.
====

[.quiz]
== Check your understanding

:order: 4
:branch: main

The knowledge graph you created is unconstrained, meaning that any entity or relationship can be created based on the data extracted from the text.

This can lead to graphs that are non-specific and may be difficult to analyze and query.

In this lesson, you will modify the `SimpleKGPipeline` to use a custom schema for the knowledge graph.


== Schema

When you provide a schema to the `SimpleKGPipeline`, it will pass this information to the LLM, instructing it to identify only those nodes and relationships.

This allows you to create a more structured and meaningful knowledge graph.

You define a schema by expressing the desired nodes, relationships, or patterns you want to extract from the text.
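As an illustrative sketch (the labels, types, and patterns here are examples, not the course solution, and the exact shape of the `schema` argument may vary between `neo4j-graphrag` versions), a schema can be expressed as lists of labels, relationship types, and tuples:

[source, python]
.A possible schema definition (illustrative names)
----
# Node labels the LLM is allowed to extract
NODE_TYPES = ["Technology", "Concept"]

# Relationship types the LLM is allowed to extract
RELATIONSHIP_TYPES = ["RELATED_TO", "USES"]

# Patterns constrain which node types a relationship may connect
PATTERNS = [
    ("Technology", "USES", "Technology"),
    ("Concept", "RELATED_TO", "Technology"),
]

kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    embedder=embedder,
    schema={
        "node_types": NODE_TYPES,
        "relationship_types": RELATIONSHIP_TYPES,
        "patterns": PATTERNS,
    },
    from_pdf=True,
)
----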

You can also provide a description for each node label and associated properties.

[source, python]
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=node_types]
----

Recreate the knowledge graph with the defined nodes:

. Delete any existing nodes and relationships.
+
[source, cypher]
.Delete the existing graph
----
MATCH (n) DETACH DELETE n
----
. Run the program.
+
The graph will be constrained to only include the defined node labels.

View the entities and chunks in the graph using the following Cypher query:

[source, cypher]
.Entities and Chunks
----
MATCH p = (c:Chunk)-[*..3]-(e:__Entity__)
RETURN p
----

== Relationships

You can define required relationship types by providing a list to the `SimpleKGPipeline`.

[source, python]
.RELATIONSHIP_TYPES
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=relationship_types]
----

You can also describe patterns that define how nodes are connected by relationships.

Nodes, relationships and patterns are all passed to the `SimpleKGPipeline` as the schema.

[source, python]
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=kg_builder]
----


[%collapsible]
.Reveal the complete code
====
[source, python]
.All PDFs
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tags=**;!simple_nodes;!all_documents]
----
====

Review the `data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf` PDF document and experiment by creating a set of `NODES`, `RELATIONSHIPS` and `PATTERNS` relevant to the data.

Recreate the knowledge graph:

. Delete any existing nodes and relationships.
. Run the program.


[%collapsible]
.Process all the documents?
====
In the next lesson, you will add structured data to the knowledge graph, and process all of the documents.

Optionally, you could modify the program now to process the documents from the `data` directory without the structured data:

[source, python]
.All PDFs
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=all_documents]
----
====

[TIP]
.OpenAI Rate Limiting?
====
When using a free OpenAI API key, you may encounter rate limiting issues when processing multiple documents. You can add a `sleep` between document processing to mitigate this.
====
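
One way to combine the loop over the documents with such a pause is sketched below; the directory path and the 10 second delay are assumptions, and `kg_builder` is the pipeline created earlier:

[source, python]
.Throttled document processing (sketch)
----
import asyncio
from pathlib import Path

async def process_all() -> None:
    # kg_builder is the SimpleKGPipeline created earlier
    for pdf in sorted(Path("genai-graphrag-python/data").glob("*.pdf")):
        print(f"Processing {pdf.name}")
        await kg_builder.run_async(file_path=str(pdf))
        # Pause between documents to stay under the API rate limit
        await asyncio.sleep(10)

asyncio.run(process_all())
----
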
== Explore

Review the knowledge graph and observe how the defined schema has influenced the structure of the graph:

[source, cypher]
.Entities and Chunks
----
MATCH p = (c:Chunk)-[*..3]-(e:__Entity__)
RETURN p
----

View the counts of documents, chunks and entities in the graph:

[source, cypher]
.Documents, Chunks, and Entity counts
----
RETURN
  count { (d:Document) } AS documents,
  count { (c:Chunk) } AS chunks,
  count { (e:__Entity__) } AS entities
----

Combining the structured and unstructured data can enhance the knowledge graph's capabilities.
.Lexical and Domain Graphs
The unstructured part of your graph is known as the link:https://graphrag.com/reference/knowledge-graph/lexical-graph/[Lexical Graph], while the structured part is known as the link:https://graphrag.com/reference/knowledge-graph/domain-graph/[Domain Graph].

== Structured data source

The repository includes a sample CSV file, `genai-graphrag-python/data/docs.csv`, which contains metadata about the lessons the documents were created from.

Expand All @@ -24,6 +24,8 @@ genai-fundamentals_1-generative-ai_2-considerations.pdf,genai-fundamentals,1-gen
...
----

=== Load from CSV file

You can use the CSV file as an input and structured data source when creating the knowledge graph.

Open `genai-graphrag-python/kg_structured_builder.py` and review the code.
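
The essence of the approach is to read each CSV row and merge the structured nodes alongside the documents. The sketch below is a simplified illustration, not the contents of `kg_structured_builder.py`; the connection details and column names are assumptions, so check `docs.csv` for the actual headers:

[source, python]
.Merging CSV metadata into the graph (simplified sketch)
----
import csv

import neo4j

driver = neo4j.GraphDatabase.driver(
    "bolt://localhost:7687", auth=("neo4j", "password")  # illustrative
)

with open("genai-graphrag-python/data/docs.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Link each Lesson to the Document created from its PDF.
        # The column names used here are assumptions.
        driver.execute_query(
            """
            MERGE (l:Lesson {name: $lesson, url: $url})
            MERGE (d:Document {path: $path})
            MERGE (d)-[:PDF_OF]->(l)
            """,
            lesson=row["lesson"],
            url=row["url"],
            path=row["filename"],
        )

driver.close()
----
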
image::images/kg-builder-structured-model.svg["A data model showing Lesson nodes"]

Run the program to create the knowledge graph with the structured data.

[NOTE]
.Remember to delete the existing graph before re-running the pipeline
====

[source, cypher]
.Delete the existing graph
----
MATCH (n) DETACH DELETE n
----
====

[TIP]
.OpenAI Rate Limiting?
====
When using a free OpenAI API key, you may encounter rate limiting issues when processing multiple documents. You can add a `sleep` between document processing to mitigate this.
====

== Explore the structured data

The structured data allows you to query the knowledge graph in new ways.
The knowledge graph allows you to summarize the content of each lesson:

[source, cypher]
.Summarize lesson content
----
MATCH (lesson:Lesson)<-[:PDF_OF]-(:Document)<-[:FROM_DOCUMENT]-(c:Chunk)
RETURN
lesson.name,
lesson.url,
[ (c)<-[:FROM_CHUNK]-(tech:Technology) | tech.name ] AS technologies,
[ (c)<-[:FROM_CHUNK]-(concept:Concept) | concept.name ] AS concepts
----
----

The chunks in the knowledge graph include vector embeddings that allow for similarity search based on vector distance.

In this lesson, you will create a vector retriever that uses these embeddings to find the most relevant chunks for a given query.

The retriever can then use the structured and unstructured data in the knowledge graph to provide additional context.
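
A minimal sketch of such a pipeline is shown below, assuming illustrative connection details and index name; the retrieval query runs after the vector search, with `node` bound to each matched chunk:

[source, python]
.A vector retriever with a retrieval query (sketch)
----
import neo4j
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import VectorCypherRetriever

driver = neo4j.GraphDatabase.driver(
    "bolt://localhost:7687", auth=("neo4j", "password")  # illustrative
)

# Runs after the vector search; `node` is each matched Chunk
retrieval_query = """
MATCH (node)<-[:FROM_CHUNK]-(entity)
RETURN node.text AS text, collect(entity.name) AS entities
"""

retriever = VectorCypherRetriever(
    driver,
    index_name="chunkEmbeddings",  # illustrative - use your index name
    retrieval_query=retrieval_query,
    embedder=OpenAIEmbeddings(),
)

rag = GraphRAG(retriever=retriever, llm=OpenAILLM(model_name="gpt-4o"))

response = rag.search(
    query_text="What are vector embeddings?",
    retriever_config={"top_k": 5},
)
print(response.answer)
----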

The retrieval query includes additional context relating to technologies and concepts.

Experiment by asking different questions relating to the knowledge graph, such as _"What technologies and concepts support knowledge graphs?"_.

=== Generalize entity retrieval

The retriever currently uses the knowledge graph to add additional context related to technologies and concepts.
The specific entities allow for targeted retrieval; however, you may also want to generalize the retrieval to include all related entities.

You can use the node labels and relationship types to create a response that includes details about the entities.

This Cypher query retrieves all the relationships between entities extracted from the chunks:

[source, cypher]
.Related entities
----
MATCH (c:Chunk)<-[:FROM_CHUNK]-(entity)-[r]->(other)-[:FROM_CHUNK]->()
RETURN DISTINCT
labels(entity)[2], entity.name, entity.type, entity.description,
type(r),
labels(other)[2], other.name, other.type, other.description
----

The query uses the node labels, properties, and relationship types to produce rows that form statements such as:

* `Concept` "Semantic Search" `RELATED_TO` `Technology` "Vector Indexes"
* `Technology` "Retrieval Augmented Generation" `HAS_CHALLENGE` "Understanding what the user is asking for and finding the correct information to pass to the LLM"

These statements can be used to create additional context for the LLM to generate responses.

Modify the `retrieval_query` to include all entities associated with the chunk:

[source, python]
.Enhanced retrieval query with all related entities
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/vector_cypher_rag.py[tag=advanced_retrieval_query]
----

[TIP]
.Format the context
====
The Cypher functions `reduce` and `coalesce` are used to format the associated entities into readable statements. The `reduce` function adds space characters between the values, and `coalesce` replaces null values with empty strings.
====
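
To see the two functions in isolation, you can run a self-contained query like this sketch (the list values are illustrative):

[source, cypher]
.reduce and coalesce in isolation
----
// coalesce() turns the null into an empty string instead of
// making the whole concatenation null
WITH ["Concept", "Semantic Search", "RELATED_TO", null, "Vector Indexes"] AS parts
RETURN reduce(s = "", p IN parts | s + coalesce(p, "") + " ") AS statement
----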

== Experiment

Experiment by running the code with different queries to see how the additional context changes the responses.

[.quiz]
== Check your understanding

The `Text2CypherRetriever` allows you to create `GraphRAG` pipelines that generate Cypher queries from natural language questions.

Using text to Cypher retrieval can help you get precise information from the knowledge graph based on user questions, for example, how many lessons are in a course, what concepts are covered in a module, or how technologies relate to each other.

In this lesson, you will create a text to Cypher retriever and use it to answer questions about the data in the knowledge graph.

== Create a Text2CypherRetriever GraphRAG pipeline

Open `genai-graphrag-python/text2cypher_rag.py` and review the code.
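
The core pieces of such a pipeline are sketched below; the connection details and schema string are illustrative, so treat the repository file as the authoritative version:

[source, python]
.A Text2CypherRetriever pipeline (sketch)
----
import neo4j
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import Text2CypherRetriever

driver = neo4j.GraphDatabase.driver(
    "bolt://localhost:7687", auth=("neo4j", "password")  # illustrative
)

# Describing the graph schema helps the LLM write valid Cypher
schema = """
(:Lesson)<-[:PDF_OF]-(:Document)<-[:FROM_DOCUMENT]-(:Chunk)
(:Chunk)<-[:FROM_CHUNK]-(:__Entity__)
"""

retriever = Text2CypherRetriever(
    driver=driver,
    llm=OpenAILLM(model_name="gpt-4o"),
    neo4j_schema=schema,
)

# The LLM converts the question into a Cypher query and runs it
result = retriever.search(query_text="How many lessons are in the course?")
for item in result.items:
    print(item.content)
----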