include::questions/1-embeddings.adoc[leveloffset=+1]
[.summary]
== Lesson Summary

In this lesson, you learned about vectors and embeddings, and how they can be used in RAG to find relevant information.

In the next lesson, you will use a vector index in Neo4j to find relevant data.
The _names_ would be the node and relationship identifiers.

If you wanted to construct a knowledge graph based on the link:https://en.wikipedia.org/wiki/Neo4j[Neo4j Wikipedia page^], you would:

. **Gather** the text from the page.
+
Neo4j is a graph database management system (GDBMS) developed by
Neo4j Inc.

The data elements Neo4j stores are nodes, edges connecting them
and attributes of nodes and edges. Described by its developers
as an ACID-compliant transactional database with native graph
storage and processing...

. Split the text into **chunks**.
+
Neo4j is a graph database management system (GDBMS) developed
by Neo4j Inc.
+
{sp}
+
The data elements Neo4j stores are nodes, edges connecting them
and attributes of nodes and edges.
+
{sp}
+
Described by its developers as an ACID-compliant transactional
database with native graph storage and processing...

. Generate **embeddings** and **vectors** for each chunk.
+
[0.21972137987, 0.12345678901, 0.98765432109, ...]
+
{sp}
+
[0.34567890123, 0.23456789012, 0.87654321098, ...]
+
{sp}
+
[0.45678901234, 0.34567890123, 0.76543210987, ...]

. **Extract** the entities and relationships using an **LLM**.
:order: 3
:branch: main

The graph created by the `SimpleKGPipeline` is based on chunks of text extracted from the documents.

By default, the chunk size is quite large, which may result in fewer, larger chunks.

The larger the chunk size, the more context the LLM has when extracting entities and relationships, but it may also lead to less granular data.

In this lesson, you will modify the `SimpleKGPipeline` to use a different chunk size.

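The chunk size is controlled by the text splitter component. As a minimal sketch (the values shown are illustrative, and `llm`, `driver`, and `embedder` are assumed to have been created already), a `FixedSizeSplitter` can be passed to the `SimpleKGPipeline`:

[source, python]
.A possible text splitter configuration (illustrative values)
----
from neo4j_graphrag.experimental.components.text_splitters.fixed_size_splitter import (
    FixedSizeSplitter,
)
from neo4j_graphrag.experimental.pipeline.kg_builder import SimpleKGPipeline

# Split the text into 500 character chunks that overlap by 100 characters,
# so context is not lost at chunk boundaries
text_splitter = FixedSizeSplitter(chunk_size=500, chunk_overlap=100)

kg_builder = SimpleKGPipeline(
    llm=llm,                      # the LLM used for entity extraction
    driver=driver,                # an open neo4j.Driver instance
    embedder=embedder,            # the embedding model for chunk vectors
    text_splitter=text_splitter,  # overrides the default splitter
    from_pdf=True,
)
----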

Run the modified pipeline to recreate the knowledge graph with the new chunk size.

== Explore

You can view the documents and their associated chunks using the following Cypher query:

[source, cypher]
.View the documents and chunks
----
MATCH (d:Document)<-[:FROM_DOCUMENT]-(c:Chunk)
RETURN d.path, c.index, c.text, size(c.text)
ORDER BY d.path, c.index
----

View the entities extracted from each chunk using the following Cypher query:

[source, cypher]
.View the entities extracted from each chunk
----
MATCH p = (c:Chunk)-[*..3]-(e:__Entity__)
RETURN p
----

[TIP]
====
You can experiment with different chunk sizes to see how they affect the entities extracted and the structure of the knowledge graph.
====

[.quiz]
== Check your understanding

:order: 4
:branch: main

The knowledge graph you created is unconstrained, meaning that any entity or relationship can be created based on the data extracted from the text.

This can lead to graphs that are non-specific and may be difficult to analyze and query.

In this lesson, you will modify the `SimpleKGPipeline` to use a custom schema for the knowledge graph.


== Schema

When you provide a schema to the `SimpleKGPipeline`, it will pass this information to the LLM, instructing it to identify only those nodes and relationships.

This allows you to create a more structured and meaningful knowledge graph.

You define a schema by expressing the desired nodes, relationships, or patterns you want to extract from the text.
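As an illustrative sketch (the labels, types, and patterns here are examples, not the course solution, and the exact shape of the `schema` argument may vary between `neo4j-graphrag` versions), a schema can be expressed as lists of labels, relationship types, and tuples:

[source, python]
.A possible schema definition (illustrative names)
----
# Node labels the LLM is allowed to extract
NODE_TYPES = ["Technology", "Concept"]

# Relationship types the LLM is allowed to extract
RELATIONSHIP_TYPES = ["RELATED_TO", "USES"]

# Patterns constrain which node types a relationship may connect
PATTERNS = [
    ("Technology", "USES", "Technology"),
    ("Concept", "RELATED_TO", "Technology"),
]

kg_builder = SimpleKGPipeline(
    llm=llm,
    driver=driver,
    embedder=embedder,
    schema={
        "node_types": NODE_TYPES,
        "relationship_types": RELATIONSHIP_TYPES,
        "patterns": PATTERNS,
    },
    from_pdf=True,
)
----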

You can also provide a description for each node label and associated properties.

[source, python]
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=node_types]
----

Recreate the knowledge graph with the defined nodes:

. Delete any existing nodes and relationships.
+
[source, cypher]
.Delete the existing graph
----
MATCH (n) DETACH DELETE n
----
. Run the program.
+
The graph will be constrained to only include the defined node labels.

View the entities and chunks in the graph using the following Cypher query:

[source, cypher]
.Entities and Chunks
----
MATCH p = (c:Chunk)-[*..3]-(e:__Entity__)
RETURN p
----

== Relationships

You can define required relationship types by providing a list to the `SimpleKGPipeline`.

[source, python]
.RELATIONSHIP_TYPES
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=relationship_types]
----

You can also describe patterns that define how nodes are connected by relationships.

Nodes, relationships and patterns are all passed to the `SimpleKGPipeline` as the schema.

[source, python]
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=kg_builder]
----


[%collapsible]
.Reveal the complete code
====
[source, python]
.All PDFs
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tags=**;!simple_nodes;!all_documents]
----
====

Review the `data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf` PDF document and experiment by creating a set of `NODES`, `RELATIONSHIPS` and `PATTERNS` relevant to the data.

Recreate the knowledge graph:

. Delete any existing nodes and relationships.
. Run the program.


[%collapsible]
.Process all the documents?
====
In the next lesson, you will add structured data to the knowledge graph, and process all of the documents.

Optionally, you could modify the program now to process the documents from the `data` directory without the structured data:

[source, python]
.All PDFs
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=all_documents]
----
====

[TIP]
.OpenAI Rate Limiting?
====
When using a free OpenAI API key, you may encounter rate limiting issues when processing multiple documents. You can add a `sleep` between document processing to mitigate this.
====
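
One way to combine the loop over the documents with such a pause is sketched below; the directory path and the 10 second delay are assumptions, and `kg_builder` is the pipeline created earlier:

[source, python]
.Throttled document processing (sketch)
----
import asyncio
from pathlib import Path

async def process_all() -> None:
    # kg_builder is the SimpleKGPipeline created earlier
    for pdf in sorted(Path("genai-graphrag-python/data").glob("*.pdf")):
        print(f"Processing {pdf.name}")
        await kg_builder.run_async(file_path=str(pdf))
        # Pause between documents to stay under the API rate limit
        await asyncio.sleep(10)

asyncio.run(process_all())
----
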
== Explore

Review the knowledge graph and observe how the defined schema has influenced the structure of the graph:

[source, cypher]
.Entities and Chunks
----
MATCH p = (c:Chunk)-[*..3]-(e:__Entity__)
RETURN p
----

View the counts of documents, chunks and entities in the graph:

[source, cypher]
.Documents, Chunks, and Entity counts
----
RETURN
  count { (d:Document) } AS documents,
  count { (c:Chunk) } AS chunks,
  count { (e:__Entity__) } AS entities
----

Combining the structured and unstructured data can enhance the knowledge graph's capabilities.
.Lexical and Domain Graphs
The unstructured part of your graph is known as the link:https://graphrag.com/reference/knowledge-graph/lexical-graph/[Lexical Graph], while the structured part is known as the link:https://graphrag.com/reference/knowledge-graph/domain-graph/[Domain Graph].

== Structured data source

The repository includes a sample CSV file, `genai-graphrag-python/data/docs.csv`, which contains metadata about the lessons the documents were created from.

Expand All @@ -24,6 +24,8 @@ genai-fundamentals_1-generative-ai_2-considerations.pdf,genai-fundamentals,1-gen
...
----

=== Load from CSV file

You can use the CSV file as an input and structured data source when creating the knowledge graph.

Open `genai-graphrag-python/kg_structured_builder.py` and review the code.
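
The essence of the approach is to read each CSV row and merge the structured nodes alongside the documents. The sketch below is a simplified illustration, not the contents of `kg_structured_builder.py`; the connection details and column names are assumptions, so check `docs.csv` for the actual headers:

[source, python]
.Merging CSV metadata into the graph (simplified sketch)
----
import csv

import neo4j

driver = neo4j.GraphDatabase.driver(
    "bolt://localhost:7687", auth=("neo4j", "password")  # illustrative
)

with open("genai-graphrag-python/data/docs.csv", newline="") as f:
    for row in csv.DictReader(f):
        # Link each Lesson to the Document created from its PDF.
        # The column names used here are assumptions.
        driver.execute_query(
            """
            MERGE (l:Lesson {name: $lesson, url: $url})
            MERGE (d:Document {path: $path})
            MERGE (d)-[:PDF_OF]->(l)
            """,
            lesson=row["lesson"],
            url=row["url"],
            path=row["filename"],
        )

driver.close()
----
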
image::images/kg-builder-structured-model.svg["A data model showing Lesson nodes"]

Run the program to create the knowledge graph with the structured data.

[NOTE]
.Remember to delete the existing graph before re-running the pipeline
====

[source, cypher]
.Delete the existing graph
----
MATCH (n) DETACH DELETE n
----
====

[TIP]
.OpenAI Rate Limiting?
====
When using a free OpenAI API key, you may encounter rate limiting issues when processing multiple documents. You can add a `sleep` between document processing to mitigate this.
====

== Explore the structured data

The structured data allows you to query the knowledge graph in new ways.
The knowledge graph allows you to summarize the content of each lesson:

[source, cypher]
.Summarize lesson content
----
MATCH (lesson:Lesson)<-[:PDF_OF]-(:Document)<-[:FROM_DOCUMENT]-(c:Chunk)
RETURN
lesson.name,
lesson.url,
[ (c)<-[:FROM_CHUNK]-(tech:Technology) | tech.name ] AS technologies,
[ (c)<-[:FROM_CHUNK]-(concept:Concept) | concept.name ] AS concepts
----
----

The chunks in the knowledge graph include vector embeddings that allow for similarity search based on vector distance.

In this lesson, you will create a vector retriever that uses these embeddings to find the most relevant chunks for a given query.

The retriever can then use the structured and unstructured data in the knowledge graph to provide additional context.
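
A minimal sketch of such a pipeline is shown below, assuming illustrative connection details and index name; the retrieval query runs after the vector search, with `node` bound to each matched chunk:

[source, python]
.A vector retriever with a retrieval query (sketch)
----
import neo4j
from neo4j_graphrag.embeddings import OpenAIEmbeddings
from neo4j_graphrag.generation import GraphRAG
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import VectorCypherRetriever

driver = neo4j.GraphDatabase.driver(
    "bolt://localhost:7687", auth=("neo4j", "password")  # illustrative
)

# Runs after the vector search; `node` is each matched Chunk
retrieval_query = """
MATCH (node)<-[:FROM_CHUNK]-(entity)
RETURN node.text AS text, collect(entity.name) AS entities
"""

retriever = VectorCypherRetriever(
    driver,
    index_name="chunkEmbeddings",  # illustrative - use your index name
    retrieval_query=retrieval_query,
    embedder=OpenAIEmbeddings(),
)

rag = GraphRAG(retriever=retriever, llm=OpenAILLM(model_name="gpt-4o"))

response = rag.search(
    query_text="What are vector embeddings?",
    retriever_config={"top_k": 5},
)
print(response.answer)
----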

The retrieval query includes additional context relating to technologies and concepts.

Experiment by asking different questions relating to the knowledge graph, such as _"What technologies and concepts support knowledge graphs?"_.

=== Generalize entity retrieval

The retriever currently uses the knowledge graph to add additional context related to technologies and concepts.
The specific entities allow for targeted retrieval; however, you may also want to generalize the retrieval to include all related entities.

You can use the node labels and relationship types to create a response that includes details about the entities.

This Cypher query retrieves all the relationships between entities extracted from the chunks:

[source, cypher]
.Related entities
----
MATCH (c:Chunk)<-[:FROM_CHUNK]-(entity)-[r]->(other)-[:FROM_CHUNK]->()
RETURN DISTINCT
labels(entity)[2], entity.name, entity.type, entity.description,
type(r),
labels(other)[2], other.name, other.type, other.description
----

The query uses the node labels, properties, and relationship types to produce rows that form statements such as:

* `Concept` "Semantic Search" `RELATED_TO` `Technology` "Vector Indexes"
* `Technology` "Retrieval Augmented Generation" `HAS_CHALLENGE` "Understanding what the user is asking for and finding the correct information to pass to the LLM"

These statements can be used to create additional context for the LLM to generate responses.

Modify the `retrieval_query` to include all entities associated with the chunk:

[source, python]
.Enhanced retrieval query with all related entities
----
include::{repository-raw}/{branch}/genai-graphrag-python/solutions/vector_cypher_rag.py[tag=advanced_retrieval_query]
----

[TIP]
.Format the context
====
The Cypher functions `reduce` and `coalesce` are used to format the associated entities into readable statements. The `reduce` function adds space characters between the values, and `coalesce` replaces null values with empty strings.
====
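
To see the two functions in isolation, you can run a self-contained query like this sketch (the list values are illustrative):

[source, cypher]
.reduce and coalesce in isolation
----
// coalesce() turns the null into an empty string instead of
// making the whole concatenation null
WITH ["Concept", "Semantic Search", "RELATED_TO", null, "Vector Indexes"] AS parts
RETURN reduce(s = "", p IN parts | s + coalesce(p, "") + " ") AS statement
----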

== Experiment

Experiment by running the code with different queries to see how the additional context changes the responses.

[.quiz]
== Check your understanding

The `Text2CypherRetriever` allows you to create `GraphRAG` pipelines that generate Cypher queries from natural language questions.

Using text to Cypher retrieval can help you get precise information from the knowledge graph based on user questions, for example, how many lessons are in a course, what concepts are covered in a module, or how technologies relate to each other.

In this lesson, you will create a text to Cypher retriever and use it to answer questions about the data in the knowledge graph.

== Create a Text2CypherRetriever GraphRAG pipeline

Open `genai-graphrag-python/text2cypher_rag.py` and review the code.
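
The core pieces of such a pipeline are sketched below; the connection details and schema string are illustrative, so treat the repository file as the authoritative version:

[source, python]
.A Text2CypherRetriever pipeline (sketch)
----
import neo4j
from neo4j_graphrag.llm import OpenAILLM
from neo4j_graphrag.retrievers import Text2CypherRetriever

driver = neo4j.GraphDatabase.driver(
    "bolt://localhost:7687", auth=("neo4j", "password")  # illustrative
)

# Describing the graph schema helps the LLM write valid Cypher
schema = """
(:Lesson)<-[:PDF_OF]-(:Document)<-[:FROM_DOCUMENT]-(:Chunk)
(:Chunk)<-[:FROM_CHUNK]-(:__Entity__)
"""

retriever = Text2CypherRetriever(
    driver=driver,
    llm=OpenAILLM(model_name="gpt-4o"),
    neo4j_schema=schema,
)

# The LLM converts the question into a Cypher query and runs it
result = retriever.search(query_text="How many lessons are in the course?")
for item in result.items:
    print(item.content)
----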