diff --git a/asciidoc/courses/genai-fundamentals/modules/2-rag/lessons/2-vector-search/lesson.adoc b/asciidoc/courses/genai-fundamentals/modules/2-rag/lessons/2-vector-search/lesson.adoc index 96cceedc4..1d7480c0b 100644 --- a/asciidoc/courses/genai-fundamentals/modules/2-rag/lessons/2-vector-search/lesson.adoc +++ b/asciidoc/courses/genai-fundamentals/modules/2-rag/lessons/2-vector-search/lesson.adoc @@ -154,6 +154,6 @@ include::questions/1-embeddings.adoc[leveloffset=+1] [.summary] == Lesson Summary -In this lesson, you learned about vectors and embeddings, and how they can be used in RAG to find relevent information. +In this lesson, you learned about vectors and embeddings, and how they can be used in RAG to find relevant information. In the next lesson, you will use a vector index in Neo4j to find relevant data. diff --git a/asciidoc/courses/genai-graphrag-python/modules/1-introduction/lessons/1-knowledge-graph-construction/lesson.adoc b/asciidoc/courses/genai-graphrag-python/modules/1-introduction/lessons/1-knowledge-graph-construction/lesson.adoc index a20252881..bf7ba9240 100644 --- a/asciidoc/courses/genai-graphrag-python/modules/1-introduction/lessons/1-knowledge-graph-construction/lesson.adoc +++ b/asciidoc/courses/genai-graphrag-python/modules/1-introduction/lessons/1-knowledge-graph-construction/lesson.adoc @@ -67,22 +67,44 @@ The _names_ would be the node and relationship identifiers. If you wanted to construct a knowledge graph based on the link:https://en.wikipedia.org/wiki/Neo4j[Neo4j Wikipedia page^], you would: -. **Gather** the text from the page. + -+ image::images/neo4j-wiki.png["A screenshot of the Neo4j wiki page"] + +. **Gather** the text from the page. ++ + Neo4j is a graph database management system (GDBMS) developed by + Neo4j Inc. + + The data elements Neo4j stores are nodes, edges connecting them + and attributes of nodes and edges. 
Described by its developers + as an ACID-compliant transactional database with native graph + storage and processing... + . Split the text into **chunks**. + - Neo4j is a graph database management system (GDBMS) developed + Neo4j is a graph database management system (GDBMS) developed by Neo4j Inc. + {sp} + - The data elements Neo4j stores are nodes, edges connecting them, - and attributes of nodes and edges... + The data elements Neo4j stores are nodes, edges connecting them + and attributes of nodes and edges. ++ +{sp} ++ + Described by its developers as an ACID-compliant transactional + database with native graph storage and processing... . Generate **embeddings** and **vectors** for each chunk. + [0.21972137987, 0.12345678901, 0.98765432109, ...] ++ +{sp} ++ + [0.34567890123, 0.23456789012, 0.87654321098, ...] ++ +{sp} ++ + [0.45678901234, 0.34567890123, 0.76543210987, ...] . **Extract** the entities and relationships using an **LLM**. + diff --git a/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/2-create-a-graph/images/entities-graph.png b/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/2-create-a-graph/images/entities-graph.png index 58c527cc5..c4ec1a270 100644 Binary files a/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/2-create-a-graph/images/entities-graph.png and b/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/2-create-a-graph/images/entities-graph.png differ diff --git a/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/3-chunk-size/lesson.adoc b/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/3-chunk-size/lesson.adoc index e231dd2ad..5da5d6498 100644 --- a/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/3-chunk-size/lesson.adoc +++ 
b/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/3-chunk-size/lesson.adoc @@ -3,7 +3,11 @@ :order: 3 :branch: main -The graph created by the `SimpleKGPipeline` is based on chunks of text extracted from the documents. By default, the chunk size is quite large, which may result in fewer, larger chunks. The larger the chunk size, the more context the LLM has when extracting entities and relationships, but it may also lead to less granular data. +The graph created by the `SimpleKGPipeline` is based on chunks of text extracted from the documents. + +By default, the chunk size is quite large, which may result in fewer, larger chunks. + +The larger the chunk size, the more context the LLM has when extracting entities and relationships, but it may also lead to less granular data. In this lesson, you will modify the `SimpleKGPipeline` to use a different chunk size. @@ -51,15 +55,19 @@ include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_sp Run the modified pipeline to recreate the knowledge graph with the new chunk size. +== Explore + +You can view the documents and the associated chunks using the following Cypher query: + [source, cypher] .View the documents and chunks ---- MATCH (d:Document)<-[:FROM_DOCUMENT]-(c:Chunk) -RETURN d.path, c.index, c.text +RETURN d.path, c.index, c.text, size(c.text) ORDER BY d.path, c.index ---- -You can experiment with different chunk sizes to see how it affects the entities extracted and the structure of the knowledge graph. +View the entities extracted from each chunk using the following Cypher query: [source, cypher] .View the entities extracted from each chunk @@ -68,6 +76,11 @@ MATCH p = (c:Chunk)-[*..3]-(e:__Entity__) RETURN p ---- +[TIP] +==== +You can experiment with different chunk sizes to see how they affect the entities extracted and the structure of the knowledge graph.
+==== + [.quiz] == Check your understanding diff --git a/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/4-define-a-schema/lesson.adoc b/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/4-define-a-schema/lesson.adoc index 3f1fe4ef1..908ee3423 100644 --- a/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/4-define-a-schema/lesson.adoc +++ b/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/4-define-a-schema/lesson.adoc @@ -3,14 +3,18 @@ :order: 4 :branch: main -The knowledge graph you created is unconstrained, meaning that any entity or relationship can be created based on the data extracted from the text. This can lead to graphs that are non-specific and may be difficult to analyze and query. +The knowledge graph you created is unconstrained, meaning that any entity or relationship can be created based on the data extracted from the text. + +This can lead to graphs that are non-specific and may be difficult to analyze and query. In this lesson, you will modify the `SimpleKGPipeline` to use a custom schema for the knowledge graph. == Schema -When you provide a schema to the `SimpleKGPipeline`, it will pass this information to the LLM instructing it to only identify those nodes and relationships. This allows you to create a more structured and meaningful knowledge graph. +When you provide a schema to the `SimpleKGPipeline`, it will pass this information to the LLM instructing it to only identify those nodes and relationships. + +This allows you to create a more structured and meaningful knowledge graph. You define a schema by expressing the desired nodes, relationships, or patterns you want to extract from the text. 
@@ -62,22 +66,23 @@ You can also provide a description for each node label and associated properties include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=node_types] ---- -Run the program to create the knowledge graph with the defined nodes. +Recreate the knowledge graph with the defined nodes: -[TIP] -.Remember to delete the existing graph before re-running the pipeline -==== +. Delete any existing nodes and relationships. ++ [source, cypher] .Delete the existing graph ---- MATCH (n) DETACH DELETE n ---- -==== +. Run the program ++ +The graph will be constrained to only include the defined node labels. -The graph created will be constrained to only include the defined node labels. +View the entities and chunks in the graph using the following Cypher query: [source, cypher] -.View the entities extracted from each chunk +.Entities and Chunks ---- MATCH p = (c:Chunk)-[*..3]-(e:__Entity__) RETURN p @@ -85,7 +90,7 @@ RETURN p == Relationships -You express required relationship types by providing a list of relationship types to the `SimpleKGPipeline`. +You can define required relationship types by providing a list to the `SimpleKGPipeline`. [source, python] .RELATIONSHIP_TYPES @@ -93,7 +98,7 @@ You express required relationship types by providing a list of relationship type include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=relationship_types] ---- -You can also provide patterns that define how nodes types are connected by relationships. +You can also describe patterns that define how nodes are connected by relationships. 
[source, python] .PATTERNS @@ -109,36 +114,49 @@ Nodes, relationships and patterns are all passed to the `SimpleKGPipeline` as th include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=kg_builder] ---- -Review the `data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf` PDF document and experiment by creating a set of nodes, relationships and patterns relevant to the data. - -== Process all the documents - -When you are happy with the schema, you can modify the program to process all the PDF documents from the link:https://graphacademy.neo4j.com/courses/genai-fundamentals[Neo4j and Generative AI Fundamentals course^]: - +[%collapsible] +.Reveal the complete code +==== [source, python] -.All PDFs ---- -include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=all_documents] +include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tags=**;!simple_nodes;!all_documents] ---- +==== + +Review the `data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf` PDF document and experiment by creating a set of `NODES`, `RELATIONSHIPS` and `PATTERNS` relevant to the data. + +Recreate the knowledge graph: + +. Delete any existing nodes and relationships. +. Run the program. -You can run the program to create a knowledge graph based on all the documents using the defined schema. [%collapsible] -.Reveal the complete code +.Process all the documents? ==== +In the next lesson, you will add structured data to the knowledge graph, and process all of the documents. 
+ +Optionally, you could modify the program now to process the documents from the `data` directory without the structured data: + [source, python] +.All PDFs ---- -include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=**,!simple_nodes] +include::{repository-raw}/{branch}/genai-graphrag-python/solutions/kg_builder_schema.py[tag=all_documents] ---- ==== -[TIP] -.OpenAI Rate Limiting? -==== -When using a free OpenAI API key, you may encounter rate limiting issues when processing multiple documents. You can add a `sleep` between document processing to mitigate this. -==== +== Explore + +Review the knowledge graph and observe how the defined schema has influenced the structure of the graph: + +[source, cypher] +.Entities and Chunks +---- +MATCH p = (c:Chunk)-[*..3]-(e:__Entity__) +RETURN p +---- -Review the knowledge graph and observe how the defined schema has influenced the structure of the graph. +View the counts of documents, chunks and entities in the graph: [source, cypher] .Documents, Chunks, and Entity counts diff --git a/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/5-structured-data/lesson.adoc b/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/5-structured-data/lesson.adoc index a5c0baa1e..c27fe7ff8 100644 --- a/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/5-structured-data/lesson.adoc +++ b/asciidoc/courses/genai-graphrag-python/modules/2-knowledge-graph-pipeline/lessons/5-structured-data/lesson.adoc @@ -11,7 +11,7 @@ Combining the structured and unstructured data can enhance the knowledge graph's .Lexical and Domain Graphs The unstructured part of your graph is known as the link:https://graphrag.com/reference/knowledge-graph/lexical-graph/[Lexical Graph], while the structured part is known as the link:https://graphrag.com/reference/knowledge-graph/domain-graph/[Domain Graph]. 
-== Load from CSV file +== Structured data source The repository contains a sample CSV file `genai-graphrag-python/data/docs.csv` which contains metadata about the lessons the documents were created from. @@ -24,6 +24,8 @@ genai-fundamentals_1-generative-ai_2-considerations.pdf,genai-fundamentals,1-gen ... ---- +=== Load from CSV file + You can use the CSV file as input and a structured data source when creating the knowledge graph. Open `genai-graphrag-python/kg_structured_builder.py` and review the code. @@ -73,18 +75,22 @@ image::images/kg-builder-structured-model.svg["A data model showing Lesson nodes Run the program to create the knowledge graph with the structured data. -[TIP] -.Clear the graph before importing +[NOTE] +.Remember to delete the existing graph before re-running the pipeline ==== -Remember to clear the database before running the program to avoid inconsistent data. - [source, cypher] -.Delete all +.Delete the existing graph ---- MATCH (n) DETACH DELETE n ---- ==== +[TIP] +.OpenAI Rate Limiting? +==== +When using a free OpenAI API key, you may encounter rate limiting issues when processing multiple documents. You can add a `sleep` between document processing to mitigate this. +==== + == Explore the structured data The structured data allows you to query the knowledge graph in new ways. 
@@ -106,7 +112,9 @@ The knowledge graph allows you to summarize the content of each lesson by specif .Summarize lesson content ---- MATCH (lesson:Lesson)<-[:PDF_OF]-(:Document)<-[:FROM_DOCUMENT]-(c:Chunk) -RETURN lesson.name, +RETURN + lesson.name, + lesson.url, [ (c)<-[:FROM_CHUNK]-(tech:Technology) | tech.name ] AS technologies, [ (c)<-[:FROM_CHUNK]-(concept:Concept) | concept.name ] AS concepts ---- diff --git a/asciidoc/courses/genai-graphrag-python/modules/3-retrieval/lessons/1-vector-cypher-retriever/lesson.adoc b/asciidoc/courses/genai-graphrag-python/modules/3-retrieval/lessons/1-vector-cypher-retriever/lesson.adoc index 7083ef8c5..3bfaa03e4 100644 --- a/asciidoc/courses/genai-graphrag-python/modules/3-retrieval/lessons/1-vector-cypher-retriever/lesson.adoc +++ b/asciidoc/courses/genai-graphrag-python/modules/3-retrieval/lessons/1-vector-cypher-retriever/lesson.adoc @@ -5,7 +5,7 @@ The chunks in the knowledge graph include vector embeddings that allow for similarity search based on vector distance. -You can create a vector retriever that uses these embeddings to find the most relevant chunks for a given query. +In this lesson, you will create a vector retriever that uses these embeddings to find the most relevant chunks for a given query. The retriever can then use the structured and unstructured data in the knowledge graph to provide additional context. @@ -114,6 +114,50 @@ The retrieval query includes additional context relating to technologies and con Experiment asking different questions relating to the knowledge graph such as _"What technologies and concepts support knowledge graphs?"_. +=== Generalize entity retrieval + +The retriever currently uses the knowledge graph to add additional context related to technologies and concepts. +The specific entities allow for targeted retrieval; however, you may also want to generalize the retrieval to include all related entities.
+ +You can use the node labels and relationship types to create a response that includes details about the entities. + +This Cypher query retrieves all related entities between the chunks: + +[source, cypher] +.Related entities +---- +MATCH (c:Chunk)<-[:FROM_CHUNK]-(entity)-[r]->(other)-[:FROM_CHUNK]->() +RETURN DISTINCT + labels(entity)[2], entity.name, entity.type, entity.description, + type(r), + labels(other)[2], other.name, other.type, other.description +---- + +The output combines the node labels, properties, and relationship types into rows that form statements such as: + +* `Concept` "Semantic Search" `RELATED_TO` `Technology` "Vector Indexes" +* `Technology` "Retrieval Augmented Generation" `HAS_CHALLENGE` "Understanding what the user is asking for and finding the correct information to pass to the LLM" + +These statements can be used to create additional context for the LLM to generate responses. + +Modify the `retrieval_query` to include all entities associated with the chunk: + +[source, python] +.Enhanced retrieval query with all related entities +---- +include::{repository-raw}/{branch}/genai-graphrag-python/solutions/vector_cypher_rag.py[tag=advanced_retrieval_query] +---- + +[TIP] +.Format the context +==== +The Cypher functions `reduce` and `coalesce` are used to format the associated entities into readable statements. The `reduce` function adds space characters between the values, and `coalesce` replaces null values with empty strings. +==== + +== Experiment + +Experiment by running the code with different queries to see how the additional context changes the responses.
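The `reduce`/`coalesce` formatting of entity rows can be illustrated with a short, library-free Python sketch. This is only an analogy of what those Cypher functions do, using a hypothetical `format_statement` helper and made-up row values, not code from the solution file:

```python
# Format one (label, name, relationship, other label, other name) row into a
# readable statement. None values become empty strings (like coalesce), and
# the values are joined with spaces (like reduce with s + ' ' + x).
def format_statement(label, name, rel, other_label, other_name):
    values = [label, name, rel, other_label, other_name]
    return " ".join(v if v is not None else "" for v in values)

print(format_statement("Concept", "Semantic Search", "RELATED_TO",
                       "Technology", "Vector Indexes"))
# → Concept Semantic Search RELATED_TO Technology Vector Indexes
```

As in the Cypher version, a missing value simply becomes an empty string rather than appearing as `null` in the statement.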
+ [.quiz] == Check your understanding diff --git a/asciidoc/courses/genai-graphrag-python/modules/3-retrieval/lessons/2-text-to-cypher-retriever/lesson.adoc b/asciidoc/courses/genai-graphrag-python/modules/3-retrieval/lessons/2-text-to-cypher-retriever/lesson.adoc index 224c24708..16a577d90 100644 --- a/asciidoc/courses/genai-graphrag-python/modules/3-retrieval/lessons/2-text-to-cypher-retriever/lesson.adoc +++ b/asciidoc/courses/genai-graphrag-python/modules/3-retrieval/lessons/2-text-to-cypher-retriever/lesson.adoc @@ -8,6 +8,8 @@ The `Text2CypherRetriever` retriever allows you to create `GraphRAG` pipelines t Using text to cypher retrieval can help you get precise information from the knowledge graph based on user questions. For example, how many lessons are in a course, what concepts are covered in a module, or how technologies relate to each other. +In this lesson, you will create a text to cypher retriever and use it to answer questions about the data in the knowledge graph. + == Create a Text2CypherRetriever GraphRAG pipeline Open `genai-graphrag-python/text2cypher_rag.py` and review the code.
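The text to cypher idea itself can be sketched without any libraries: something maps a natural-language question to a Cypher query, and that query is then run against the graph. In the lesson this mapping is done by an LLM via the `Text2CypherRetriever`; the lookup table below is purely a hypothetical stand-in for illustration:

```python
# Toy text-to-Cypher sketch: map question phrases to Cypher queries.
# A real retriever uses an LLM and the graph schema, not a lookup table.
TEMPLATES = {
    "how many lessons": "MATCH (l:Lesson) RETURN count(l) AS lessons",
    "what concepts": "MATCH (c:Concept) RETURN c.name",
}

def text_to_cypher(question: str) -> str:
    q = question.lower()
    for phrase, cypher in TEMPLATES.items():
        if phrase in q:
            return cypher
    raise ValueError("No matching template for question")

print(text_to_cypher("How many lessons are in this course?"))
# → MATCH (l:Lesson) RETURN count(l) AS lessons
```

The generated Cypher is then executed and its results passed to the LLM as context, which is what gives text to cypher retrieval its precision.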
diff --git a/asciidoc/courses/genai-graphrag-python/modules/4-customisation/lessons/1-loading-data/lesson.adoc b/asciidoc/courses/genai-graphrag-python/modules/4-customisation/lessons/1-loading-data/lesson.adoc index 4edc5f0b7..0ab2110c8 100644 --- a/asciidoc/courses/genai-graphrag-python/modules/4-customisation/lessons/1-loading-data/lesson.adoc +++ b/asciidoc/courses/genai-graphrag-python/modules/4-customisation/lessons/1-loading-data/lesson.adoc @@ -46,6 +46,7 @@ You can run the custom loader directly to verify that it is working: [source,python] ---- +include::{repository-raw}/{branch}/genai-graphrag-python/examples/data_loader_custom_pdf.py[tag=pdf_file] include::{repository-raw}/{branch}/genai-graphrag-python/examples/data_loader_custom_pdf.py[tag=run_loader] ---- @@ -63,7 +64,7 @@ This example code shows how to create and use the `CustomPDFLoader` in a `Simple [source, python] ---- -include::{repository-raw}/{branch}/genai-graphrag-python/examples/data_loader_custom_pdf.py[tag=**,!run_loader] +include::{repository-raw}/{branch}/genai-graphrag-python/examples/data_loader_custom_pdf.py[tag=**;!run_loader] ---- ==== @@ -92,6 +93,7 @@ You can run the text loader directly to verify that it is working: [source,python] ---- +include::{repository-raw}/{branch}/genai-graphrag-python/examples/data_loader_text_file.py[tag=pdf_file] include::{repository-raw}/{branch}/genai-graphrag-python/examples/data_loader_text_file.py[tag=run_loader] ---- @@ -102,7 +104,7 @@ This example code shows how to create and use the `TextLoader` in a `SimpleKGPip [source, python] ---- -include::{repository-raw}/{branch}/genai-graphrag-python/examples/data_loader_text_file.py[tag=**,!run_loader] +include::{repository-raw}/{branch}/genai-graphrag-python/examples/data_loader_text_file.py[tag=**;!run_loader] ---- ==== diff --git a/asciidoc/courses/llm-knowledge-graph-construction/modules/2-llm-graph-builder/lessons/1-construction-process/lesson.adoc 
b/asciidoc/courses/llm-knowledge-graph-construction/modules/2-llm-graph-builder/lessons/1-construction-process/lesson.adoc index 9f05319f1..525a97095 100644 --- a/asciidoc/courses/llm-knowledge-graph-construction/modules/2-llm-graph-builder/lessons/1-construction-process/lesson.adoc +++ b/asciidoc/courses/llm-knowledge-graph-construction/modules/2-llm-graph-builder/lessons/1-construction-process/lesson.adoc @@ -67,9 +67,18 @@ The _names_ would be the node and relationship identifiers. If you wanted to construct a knowledge graph based on the link:https://en.wikipedia.org/wiki/Neo4j[Neo4j Wikipedia page^], you would: -. **Gather** the text from the page. + +image::images/neo4j-wiki.png["A screenshot of the Neo4j wiki page"] + +. **Gather** the text from the page. + -image::images/neo4j-wiki.png[A screenshot of the Neo4j wiki page] + Neo4j is a graph database management system (GDBMS) developed by + Neo4j Inc. + + The data elements Neo4j stores are nodes, edges connecting them + and attributes of nodes and edges. Described by its developers + as an ACID-compliant transactional database with native graph + storage and processing... + . Split the text into **chunks**. + Neo4j is a graph database management system (GDBMS) developed @@ -77,25 +86,38 @@ image::images/neo4j-wiki.png[A screenshot of the Neo4j wiki page] + {sp} + - The data elements Neo4j stores are nodes, edges connecting them, - and attributes of nodes and edges... + The data elements Neo4j stores are nodes, edges connecting them + and attributes of nodes and edges. ++ +{sp} ++ + Described by its developers as an ACID-compliant transactional + database with native graph storage and processing... . Generate **embeddings** and **vectors** for each chunk. + [0.21972137987, 0.12345678901, 0.98765432109, ...] ++ +{sp} ++ + [0.34567890123, 0.23456789012, 0.87654321098, ...] ++ +{sp} ++ + [0.45678901234, 0.34567890123, 0.76543210987, ...] . **Extract** the entities and relationships using an **LLM**. 
+ Send the text to the LLM with an appropriate prompt, for example: + - Your task is to identify the entities and relations requested - with the user prompt from a given text. You must generate the + Your task is to identify the entities and relations requested + with the user prompt from a given text. You must generate the output in a JSON format containing a list with JSON objects. Text: {text} + -Parse the entities and relationships output by the LLM.: +Parse the entities and relationships output by the LLM. + [source, json] ---- diff --git a/asciidoc/courses/workshop-genai2/course.adoc b/asciidoc/courses/workshop-genai2/course.adoc new file mode 100644 index 000000000..aaf6c3a88 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/course.adoc @@ -0,0 +1,30 @@ += Neo4j and Generative AI Workshop +:status: active +:duration: 3 hours +:caption: Learn how Neo4j and GraphRAG can support your Generative AI projects +:usecase: blank-sandbox +:key-points: Generative AI, GraphRAG, Knowledge Graph Construction, Vectors and Text to Cypher Retrievers, Agents +:categories: workshops +:repository: neo4j-graphacademy/workshop-genai2 + + +== Course Description + + +Welcome to this Neo4j course. 
+ +=== Prerequisites + +To take this course we recommend that you have taken these beginner courses in GraphAcademy: + +* link:/courses/neo4j-fundamentals/[Neo4j Fundamentals^] + + +=== Duration + +{duration} + + +=== What you will learn + +* Generative AI diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/confused-llm.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/confused-llm.svg new file mode 100644 index 000000000..7dd0f1976 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/confused-llm.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/genai-model-process.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/genai-model-process.svg new file mode 100644 index 000000000..013292b00 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/genai-model-process.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-blackbox.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-blackbox.svg new file mode 100644 index 000000000..b40be224a --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-blackbox.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-missing-data.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-missing-data.svg new file mode 100644 index 000000000..7804baee7 --- /dev/null +++ 
b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-missing-data.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-prompt-document-results.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-prompt-document-results.svg new file mode 100644 index 000000000..6ca24a39a --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-prompt-document-results.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-prompt-document.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-prompt-document.svg new file mode 100644 index 000000000..744beda38 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-prompt-document.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-prompt-interaction.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-prompt-interaction.svg new file mode 100644 index 000000000..4b457cbfb --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/images/llm-prompt-interaction.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/lesson.adoc new file mode 100644 index 000000000..03bfdd019 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/1-what-is-genai/lesson.adoc @@ -0,0 +1,167 @@ += What is Generative AI +:order: 1 
+:type: lesson +:slides: true + +[.slide] +== GenAI + +Generative AI (or GenAI) refers to artificial intelligence systems designed to create new content that resembles human-made data. The data could be text, images, audio, or code. + +[.transcript-only] +==== +These models, like GPT (for text) or DALL-E (for images), are trained on large datasets and use patterns learned from this data to generate new output. +==== + +image::images/genai-model-process.svg[A diagram showing the process of Generative AI, where a model is trained on a large dataset, learns patterns, and generates new content based on those patterns.] + +[.transcript-only] +==== +Generative AI is widely used in applications such as chatbots, content creation, image synthesis, and code generation. +==== + +[.slide.discrete] +== GenAI + +Generative AI models are not "intelligent" in the way humans are: + +. They do not understand or comprehend the content they generate +. They rely on statistical patterns and correlations learned from their training data. + +While Generative AI models can produce coherent and contextually relevant outputs, they lack understanding. + +[.slide] +== Large Language Models (LLMs) + +This course will focus on text-generating models, specifically Large Language Models (LLMs) + +LLMs are a type of generative AI model designed to understand and generate human-like text. + +These models are trained on vast amounts of text data and can perform various tasks, including answering questions, summarizing data, and analyzing text. + +[.slide.discrete] +== LLM Responses + +The response generated by an LLM is a probabilistic continuation of the instructions it receives. + +The LLM provides the most likely response based on the patterns it has learned from its training data. + +If presented with the instruction: + + "Continue this sequence - A B C" + +An LLM could respond: + + "D E F" + +[.slide.col-2] +== Prompts + +[.col] +==== +To get an LLM to perform a task, you provide a **prompt**. 
+ +A prompt should specify your requirements and provide clear instructions on how to respond. +==== + +[.col] +image::images/llm-prompt-interaction.svg["A user asks an LLM the question 'What is an LLM? Give the response using simple language avoiding jargon.', the LLM responds with a simple definition of an LLM."] + +[.slide.discrete.col-2] +== Caution + +[.col] +==== +While GenAI and LLMs provide a lot of potential, you should also be cautious. + +At their core, LLMs are highly complex predictive text machines. LLMs don’t know or understand the information they output; they simply predict the next word in a sequence. + +The words are based on the patterns and relationships from other text in the training data. +==== + +[.col] +image::images/llm-blackbox.svg["An LLM as a black box, responding to the question 'How did you determine that answer?' with 'I don't know.'"] + + +[.slide] +== Access to Data + +The sources for this training data are often the internet, books, and other publicly available text. +The data could be of questionable quality or even incorrect. + +Training happens at a point in time, the data is static, and may not reflect the current state of the world or include any private information. + +[.slide.discrete] +== Access to Data + +When prompted to provide a response relating to new data, or to data not in the training set, the LLM may provide a response that is not accurate. + +image::images/llm-missing-data.svg["A diagram of an LLM returning out-of-date information."] + +[.slide.col-2] +== Accuracy + +[.col] +==== +LLMs are designed to create human-like text and are often fine-tuned to be as helpful as possible, even if that means occasionally generating misleading or baseless content, a phenomenon known as **hallucination**. + +For example, when asked to _"Describe the moon."_ an LLM may respond with _"The moon is made of cheese."_. +While this is a common saying, it is not true.
+==== + +[.col] +image::images/confused-llm.svg["A diagram of a confused LLM with a question mark thinking about the moon and cheese."] + +[.transcript-only] +==== +While LLMs can represent the essence of words and phrases, they don't possess a genuine understanding or ethical judgment of the content. +==== + +[.slide.discrete] +== Improving LLM responses + +You can improve the accuracy of responses from LLMs by providing _context_ in your prompts. + +The context could include relevant information, data, or details that help the model generate more accurate and relevant responses. + +[.slide.col-2] +== Avoiding hallucination + +[.col] +==== +Providing context can help minimize hallucinations by anchoring the model’s response to the facts and details you supply. + +If you ask a model to summarize a company's performance, the model is more likely to produce an accurate summary if you include a relevant stock market report in your prompt. +==== + +[.col] +image::images/llm-prompt-document.svg["A diagram of an LLM being passed a stock market report and being asked to summarise a company's performance."] + +[.slide.col-2] +== Access to data + +[.col] +==== +LLMs have a fixed knowledge cutoff and cannot access real-time or proprietary data unless it is provided in the prompt. + +If you need the model to answer questions about recent events or organization-specific information, you must supply that data as part of your prompt. This ensures that the model’s responses are up-to-date and relevant to your particular use case. + +You could also provide statistics or data points in the prompt to help the model include useful facts in its response. +==== + +[.col] +image::images/llm-prompt-document-results.svg["A diagram of an LLM being passed a stock market report and the annual results, being asked to summarize a company's performance. 
The response includes a specific profit figure from the annual results."] + +[.slide] +== Supplying context + +Supplying context in your prompts helps LLMs generate more *accurate*, *relevant*, and *trustworthy* responses by *reducing hallucinations* and *compensating for the lack of access to data*. + +read::Continue[] + +[.summary] +== Lesson Summary + +In this lesson, you learned about Generative AI models, their capabilities, constraints, and how providing context in your prompts can help improve the accuracy of LLM responses. + +In the next lesson, you will learn about Retrieval-Augmented Generation (RAG), GraphRAG, and how they can be used to provide context to LLMs. diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-news-agency-knowledge-graph.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-news-agency-knowledge-graph.svg new file mode 100644 index 000000000..35a15f919 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-news-agency-knowledge-graph.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-news-agency.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-news-agency.svg new file mode 100644 index 000000000..77f805538 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-news-agency.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-rag-process.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-rag-process.svg new file mode 100644 index 000000000..22235d46d --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-rag-process.svg @@ 
-0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-rag-vector+graph-process.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-rag-vector+graph-process.svg new file mode 100644 index 000000000..d988c79e4 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-rag-vector+graph-process.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-text-to-cypher-process.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-text-to-cypher-process.svg new file mode 100644 index 000000000..0b7254062 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/images/llm-text-to-cypher-process.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/lesson.adoc new file mode 100644 index 000000000..69c9a3b1f --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/2-graphrag/lesson.adoc @@ -0,0 +1,215 @@ += GraphRAG? +:order: 1 +:type: lesson + +[.slide] +== Retrieval-Augmented Generation + +Retrieval-Augmented Generation (RAG) is an approach that enhances the responses of LLMs by providing them with relevant, up-to-date information retrieved from external sources. + +RAG helps generate more accurate and tailored answers, especially when the required information is not present in the model’s training data. + +[.slide.col-2] +== The Retrieval-Augmented Generation (RAG) Process + +[.col] +==== +The RAG process typically involves three main steps: + +. 
**Understanding the User Query**
++
+The system first interprets the user's input or question to determine what information is needed.
+. **Information Retrieval**
++
+A _retriever_ searches external data sources (such as documents, databases, or knowledge graphs) to find relevant information based on the user's query.
+. **Response Generation**
++
+The retrieved information is inserted into the prompt, and the language model uses this context to generate a more accurate and relevant response.
+====
+
+[.col]
+image::images/llm-rag-process.svg["A diagram showing the RAG process. A question from a user is sent to a retriever, which searches for relevant information. The retrieved information is then combined with the original question and sent to a language model, which generates a response."]
+
+[.slide.col-2.discrete]
+== The RAG Process
+
+[.col]
+====
+RAG systems can provide responses that are both contextually aware and grounded in real, up-to-date information.
+
+If you are building a chatbot for a news agency, you could use RAG to pull real-time headlines or results from a news API.
+
+When a user asks, "What's the latest news on the Olympics?", the chatbot can provide a current headline or summary from the most recent articles, ensuring the response is timely and accurate.
+
+[NOTE]
+.Grounding
+=====
+The process of providing context to an LLM to improve the accuracy of its responses and reduce the likelihood of hallucinations is known as _grounding_.
+=====
+====
+
+[.col]
+image::images/llm-news-agency.svg["A news agency chatbot, showing the user asking a question, the chatbot grounding the question with a news API, and the chatbot responding with the latest news."]
+
+
+[.slide]
+== Retrievers
+
+The retriever is a key component of the RAG process. A retriever is responsible for searching and retrieving relevant information from external data sources based on the user's query.
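As an illustrative sketch only (this is not the `neo4j-graphrag` API), a minimal keyword-based retriever over an in-memory list of documents could look like this:

[source,python]
----
def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    # Score each document by how many query words it contains,
    # then return the k highest-scoring documents.
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: -len(query_words & set(doc.lower().split())),
    )
    return scored[:k]
----

Full-text, vector, and graph-based retrievers replace the word-overlap score with far better relevance measures, but they follow the same shape.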
+
+A retriever typically takes an *unstructured input* (like a question or prompt) and searches for structured data that can provide context or answers.
+
+Neo4j supports various methods for building retrievers, including:
+
+* Full-text search
+* Vector search
+* Text to Cypher
+
+You will explore these methods in the rest of the course.
+
+[.slide]
+== Data sources
+
+The data sources used in the RAG process can vary widely, depending on the application and the type of information needed. Common data sources include:
+
+* **Documents**
++
+Textual data sources, such as articles, reports, or manuals, that can be searched for relevant information.
+* **APIs**
++
+External services that can provide real-time data or specific information based on user queries.
+* **Knowledge Graphs**
++
+Graph-based representations of information that can provide context and relationships between entities.
+
+[.slide.col-2.discrete]
+== Data sources
+
+[.col]
+====
+The news agency chatbot could use the following data sources:
+
+* A news API to retrieve the latest articles or headlines.
+* A knowledge graph to understand the relationships between different news topics, such as how they relate to each other or their historical context. This would help the chatbot provide more in-depth and contextual responses.
+* A document database to store and retrieve articles, reports, or other textual data that can be used to answer user queries.
+====
+
+[.col]
+image::images/llm-news-agency-knowledge-graph.svg["A news agency chatbot, showing the user asking a question, the chatbot grounding with the addition of data from a knowledge graph"]
+
+[.transcript-only]
+====
+[TIP]
+.Learn more about knowledge graphs
+=====
+You will learn more about knowledge graphs and their construction in the next module.
+=====
+====
+
+[.slide.discrete]
+== GraphRAG
+
+GraphRAG (Graph Retrieval Augmented Generation) is an approach that uses the strengths of graph databases to provide relevant and useful context to LLMs.
+
+GraphRAG can be used in conjunction with vector RAG.
+
+While vector RAG uses embeddings to find contextually relevant information, GraphRAG enhances this process by leveraging the relationships and structure within a graph.
+
+[.slide.discrete]
+== GraphRAG
+
+Benefits of GraphRAG:
+
+* *Richer Context*
++
+Graphs capture relationships between entities, enabling retrieval of more relevant and connected information.
+* *Improved Accuracy*
++
+By combining vector similarity with graph traversal, results are more precise and context-aware.
+* *Explainability*
++
+Graphs provide clear paths and connections, making it easier to understand why certain results were retrieved.
+* *Flexible Queries*
++
+GraphRAG supports complex queries, such as combining full-text, vector, and text-to-Cypher searches.
+* *Enhanced Reasoning*
++
+Graphs enable reasoning over data, supporting advanced use cases like recommendations and knowledge discovery.
+
+[.slide.col-60-40]
+== Graph-Enhanced Vector Search
+
+[.col]
+====
+A common approach to GraphRAG is to use a combination of vector search and graph traversal.
+
+This allows for the retrieval of relevant documents based on semantic similarity, followed by a graph traversal to find related entities or concepts.
+
+The high-level process is as follows:
+
+. A user submits a query.
+. The system uses a vector search to find nodes similar to the user's query.
+. The graph is then traversed to find related nodes or entities.
+. The entities and relationships are added to the context for the LLM.
+. The related data could also be scored based on its relevance to the user query.
+==== + +[.col] +image::images/llm-rag-vector+graph-process.svg[A diagram showing a user question being passed to a vector search to find semantically similar data. The results are then used to find related nodes or entities in the graph. The most relevant results are used as context for the LLM.] + +[.slide] +== Full Text Search + +Full text search is another powerful technique that can be combined with graph-enhanced search to further improve information retrieval. + +While vector search excels at finding semantically similar content, full text search allows users to match specific keywords or phrases within documents or nodes. + +If the user is looking for a movie or actor by name, full text search can quickly locate those entities based on exact text matches. + +Full text search can be used as a replacement for or in conjunction with vector search. + +When used in conjunction with vector search, full text search can refine results by filtering out irrelevant content based on specific keywords or phrases. + +[TIP] +.Learn more about full text search +Full text search is available in Neo4j using link:https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/full-text-indexes/[full-text indexes^]. + +[.slide.col-2] +== Text to Cypher + +[.col] +==== +Text to Cypher is an alternative approach in GraphRAG that allows users to express their information needs in natural language, which is then automatically translated into Cypher queries. + +You leverage the power of LLMs to interpret user intent and generate precise graph queries, enabling direct access to structured data and relationships within the graph. + +You can use text to Cypher to turn users' queries into complex searches, aggregations, or traversals, making advanced graph querying more accessible and flexible. + +Text to Cypher works by passing the user's query and the graph schema to an LLM, which generates a Cypher query that can be executed against the graph database. 
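A minimal sketch of that prompt assembly (illustrative only; in practice the `neo4j-graphrag` package's `Text2CypherRetriever` handles this for you):

[source,python]
----
def build_text2cypher_prompt(schema: str, question: str) -> str:
    # Combine the graph schema and the user's question into a single
    # instruction for the LLM.
    return (
        "Task: Generate a Cypher statement to query a graph database.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "Return only the Cypher statement, with no explanation."
    )
----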
+==== + +[.col] +image::images/llm-text-to-cypher-process.svg[A diagram showing a user question being passed to an LLM, which generates a Cypher query based on the graph schema. The generated Cypher query is then executed against the graph database to retrieve relevant data.] + +[.transcript-only] +==== +[IMPORTANT] +.Exercise caution with LLM-generated queries +===== +Caution should always be taken when executing LLM-generated Cypher queries, as they may not always be safe or efficient. + +You are trusting the generation of Cypher to the LLM. +It may generate invalid Cypher queries that could corrupt data in the graph or provide access to sensitive information. + +In a production environment, you should ensure that access to data is limited, and sufficient security is in place to prevent malicious queries. +===== +==== + +read::Continue[] + +[.summary] +== Lesson Summary + +In this lesson, you learned about RAG and GraphRAG techniques, and how they can be used to enhance information retrieval. + +In the next lesson, you will learn about knowledge graphs and how they represent real-world entities and their relationships. 
\ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/generic-knowledge-graph.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/generic-knowledge-graph.svg new file mode 100644 index 000000000..1d672273d --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/generic-knowledge-graph.svg @@ -0,0 +1 @@ +sourcesourcesourcesourcesourcesourceaboutaboutaboutrelates toSourceAchunk1chunk2chunk3SourceBchunk4chunk5chunk6topic itopic j \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/neo4j-google-knowledge-graph.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/neo4j-google-knowledge-graph.svg new file mode 100644 index 000000000..1e16b5dae --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/neo4j-google-knowledge-graph.svg @@ -0,0 +1,1417 @@ + + + + + + + + + + + + + + + + + FOUNDED + + + + + + + + + + + + + + LOCATED + + + + + + + + + + + + + + LOCATED + + + + + + + + + + + + + + CAPITAL_OF + + + + + + + + + + + + + + FOUNDER + + + + + + + + + + + + + + WORKS_AT + + + + + + + + + + + + + + ABOUT + + + + + + + + + + + + + + WRITTEN_BY + + + + + + + + + + + + + + OWNER_OF + + + + + + + + + + + + + + PUBLISHED ON + + + + + + + + + + + + + REFERS_TO + + + + + + + + + + + + + + + + + + + PUBLISHED_ON + + + + + + + + + + + + + Neo4j + + + + + + + Company + + + + + + + + + + + + + + + Malmo + + + + + + + Location + + + + + + + + + + + + + + + London + + + + + + + Location + + + + + + + + + + + + + + + Blog + Post + + + + + + + Article + + + + + + + + + + + + + + + United + Kingdom + + + + + + + Location + + + + + + + + + + + + + + + Emil + Eifrém + + + + + + + Person + + + + + + + + + + + + + + + neo4j.com + + + + + + + 
Website + + + + + + + + + + + + + + + Database + + + + + + + Technology + + + + + + + + + + + + + + + Documentation + + + + + + + Article + + + + + + + + + + + REFERS_TO + + + + + diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/org-principles-with-data.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/org-principles-with-data.svg new file mode 100644 index 000000000..2f18e8b7a --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/org-principles-with-data.svg @@ -0,0 +1 @@ +HAS_MODULEHAS_LESSONTYPE_OFTYPE_OFTYPE_OFTYPE_OFHAS_LESSONTYPE_OFTYPE_OFTYPE_OFTYPE_OFTYPE_OFCoursename:Neo4j & LLMModulename:Generative AIorder:1Lessonname:Neo4j & Generative AIorder:1Typename:quizTypename:challengeTypename:videoTypename:practicalContentLessonname:Grounding LLMSorder:2 \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/org-principles.svg b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/org-principles.svg new file mode 100644 index 000000000..4bf0f05c9 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/images/org-principles.svg @@ -0,0 +1 @@ +TYPE_OFTYPE_OFTYPE_OFTYPE_OFTypename:quizTypename:challengeTypename:videoTypename:practicalContent \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/lesson.adoc new file mode 100644 index 000000000..56e9bff8a --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/3-knowledge-graph/lesson.adoc @@ -0,0 +1,98 @@ += What is a Knowledge Graph +:order: 1 +:type: lesson + + +[.slide.discrete] +== Knowledge Graphs +[quote] +A 
knowledge graph is an organized representation of real-world entities and their relationships. + +Knowledge graphs provide a structured way to represent entities, their attributes, and their relationships, allowing for a comprehensive and interconnected understanding of the information. + +Knowledge graphs are useful for Generative AI applications because they provide structured, interconnected data that enhances context, reasoning, and accuracy in generated responses. + +[.slide.discrete.col-2] +== Search Engines + +[.col] +==== +Search engines typically use knowledge graphs to provide information about people, places, and things. + +This knowledge graph could represent Neo4j: +==== + +[.col] +image::images/neo4j-google-knowledge-graph.svg["An example of a knowledge graph of Neo4j showing the relationships between people, places, and things", width=90%] + +[.slide.discrete.col-2] +== Data sources + +[.col] +==== +Knowledge graphs can break down sources of information and integrate them, allowing you to see the relationships between the data. + +This integration from diverse sources gives knowledge graphs a more holistic view and facilitates complex queries, analytics, and insights. + +Knowledge graphs can readily adapt and evolve as they grow, taking on new information and structure changes. +==== + +[.col] +image::images/generic-knowledge-graph.svg[a diagram of an abstract knowledge graph showing how sources contain chunks of data about topics which can be related to other topics] + +[.transcript-only] +==== +Neo4j is well-suited for representing and querying complex, interconnected data in Knowledge Graphs. +Unlike traditional relational databases, which use tables and rows, Neo4j uses a graph-based model with nodes and relationships. +==== + +[.slide.col-2] +== Organizing principles + +[.col] +==== +A knowledge graph stores data and relationships alongside frameworks known as organizing principles. 
+ +The organizing principles are the rules or categories around the data that provide structure to the data. +Organizing principles can range from simple data descriptions, for example, describing a GraphAcademy course as `course -> modules -> lessons`, to a complex vocabulary of the complete solution. + +Knowledge graphs are inherently flexible, and you can change the organizing principles as the data grows and changes. + +The organizing principles describing the content in GraphAcademy could look like this: +==== + +[.col] +image::images/org-principles.svg[A Graph showing 4 types of content] + +[.slide.discrete.col-40-60] +== Organizing principles + +[.col] +==== +The organizing principles are stored as nodes in the graph and can be stored alongside the actual data. + +This integration of organizing principles and data allows for complex queries and analytics to be performed. + +Mapping the organizing principles to the lesson content in GraphAcademy could look like this: +==== + +[.col] +image::images/org-principles-with-data.svg[A Graph showing the organizing principles and the lesson content] + +[.slide] +== Generative AI applications + +In Generative AI applications, knowledge graphs play a crucial role by capturing and organizing important domain-specific or proprietary company information. They are not limited to strictly structured data—knowledge graphs can also integrate and represent less organized or unstructured information. + +GraphRAG can use knowledge graphs for context, forming the foundation for applications that leverage proprietary or domain-specific data. By grounding responses in a knowledge graph, these applications can provide more accurate answers and greater _explainability_, thanks to the rich context and relationships present in the data. + +read::Continue[] + +[.summary] +== Lesson Summary + +In this lesson, you learned about knowledge graphs and how they are an organized representation of real-world entities and their relationships. 
+
+You can learn more in the Neo4j blog post link:https://neo4j.com/blog/what-is-knowledge-graph[What Is a Knowledge Graph?^].
+
+In the next lesson, you will set up your development environment to use Neo4j's GraphRAG capabilities.
diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/4-neo4j-graphrag/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/4-neo4j-graphrag/lesson.adoc
new file mode 100644
index 000000000..f7bdc1ebd
--- /dev/null
+++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/lessons/4-neo4j-graphrag/lesson.adoc
@@ -0,0 +1,92 @@
+= GraphRAG for Python
+:order: 1
+:type: lesson
+:branch: new-workshop
+:repository-dir-name: workshop-genai2
+
+The link:https://neo4j.com/docs/neo4j-graphrag-python/current/[GraphRAG for Python^] package (`neo4j-graphrag`) allows you to access Neo4j Generative AI functions including:
+
+- Retrievers
+- GraphRAG pipelines
+- Knowledge graph construction
+
+The purpose is to provide a first-party package for developers, one that Neo4j is committed to maintaining long term while quickly shipping new features and high-performing patterns and methods.
+
+You will use the `neo4j-graphrag` package to create a knowledge graph, build retrievers, and implement simple applications that use GraphRAG to provide context to LLM queries.
+
+You must set up a development environment to run the code examples and exercises.
+
+include::../../../../../../shared/courses/codespace/get-started.adoc[]
+
+[%collapsible]
+.Develop on your local machine
+====
+You will need link:https://python.org[Python] installed and the ability to install packages using `pip`.
+
+You may want to set up a virtual environment using link:https://docs.python.org/3/library/venv.html[`venv`^] or link:https://virtualenv.pypa.io/en/latest/[`virtualenv`^] to keep your dependencies separate from other projects.
+
+Clone the link:{repository-link}[github.com/neo4j-graphacademy/{repository-dir-name}] repository:
+
+[source,bash]
+[subs="verbatim,attributes"]
+----
+git clone https://github.com/neo4j-graphacademy/{repository-dir-name}
+----
+
+Install the required packages using `pip`:
+
+[source,bash]
+[subs="verbatim,attributes"]
+----
+cd {repository-dir-name}
+pip install -r requirements.txt
+----
+
+You do not need to create a Neo4j database as you will use the provided sandbox instance.
+
+The sandbox uses Neo4j's GenAI functions; you can find out more about how to configure them in the link:https://neo4j.com/docs/cypher-manual/current/genai-integrations/[Neo4j GenAI integration documentation^].
+====
+
+== Set up the environment
+
+Create a copy of the `.env.example` file and name it `.env`.
+Fill in the required values.
+
+[source]
+[subs="verbatim,attributes"]
+.Create a .env file
+----
+OPENAI_API_KEY="sk-..."
+NEO4J_URI="{instance-scheme}://{instance-ip}:{instance-boltPort}"
+NEO4J_USERNAME="{instance-username}"
+NEO4J_PASSWORD="{instance-password}"
+NEO4J_DATABASE="{instance-database}"
+----
+
+Add your OpenAI API key (`OPENAI_API_KEY`), which you can get from link:https://platform.openai.com[platform.openai.com^].
+
+// Update the Neo4j sandbox connection details:
+
+// NEO4J_URI:: [copy]#{instance-scheme}://{instance-ip}:{instance-boltPort}#
+// NEO4J_USERNAME:: [copy]#{instance-username}#
+// NEO4J_PASSWORD:: [copy]#{instance-password}#
+// NEO4J_DATABASE:: [copy]#{instance-database}#
+
+== Test your setup
+
+You can test your setup by running `test_environment.py`, which will attempt to connect to the Neo4j sandbox and the OpenAI API.
+
+You will see an `OK` message if you have set up your environment correctly. If any tests fail, check the contents of the `.env` file.
+
+== Continue
+
+When you are ready, you can move on to the next task.
+
+read::Success - let's get started![]
+
+[.summary]
+== Summary
+
+You have set up your environment and are ready to start coding.
+
+In the next module, you will use the `neo4j-graphrag` package to create a knowledge graph from structured and unstructured data using an LLM.
diff --git a/asciidoc/courses/workshop-genai2/modules/1-generative-ai/module.adoc b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/module.adoc
new file mode 100644
index 000000000..03a16f7b5
--- /dev/null
+++ b/asciidoc/courses/workshop-genai2/modules/1-generative-ai/module.adoc
@@ -0,0 +1,19 @@
+= Generative AI
+:order: 1
+
+== Module Overview
+
+In this module, you will learn:
+
+* What Generative AI is and how it works, including:
+** What Large Language Models (LLMs) are and how they differ from other AI models.
+** The limitations of Generative AI models, including _hallucination_ and _access to data_.
+** How providing context can improve the responses from Generative AI models.
+* How you can use Retrieval-Augmented Generation (RAG) to improve Generative AI model responses.
+* The benefits of GraphRAG techniques for enhancing information retrieval.
+* How knowledge graphs structure data to represent real-world entities and their relationships.
+
+
+If you are ready, let's get going!
+
+link:./1-what-is-genai/[Ready? 
Let's go →, role=btn] diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/1-knowledge-graph-construction/images/neo4j-wiki.png b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/1-knowledge-graph-construction/images/neo4j-wiki.png new file mode 100644 index 000000000..ad16f46d1 Binary files /dev/null and b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/1-knowledge-graph-construction/images/neo4j-wiki.png differ diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/1-knowledge-graph-construction/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/1-knowledge-graph-construction/lesson.adoc new file mode 100644 index 000000000..148803336 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/1-knowledge-graph-construction/lesson.adoc @@ -0,0 +1,218 @@ += Constructing knowledge graphs +:type: lesson +:order: 1 + +[.slide.discrete] +== Lesson Overview + +In this lesson, you will review the process of constructing knowledge graphs from unstructured text using an LLM. + +[.slide] +== The construction process + +When constructing a knowledge graph from unstructured text, you typically follow these steps: + +. Gather the data +. Chunk the data +. _Vectorize_ the data +. Pass the data to an LLM to extract nodes and relationships +. Use the output to generate the graph + +[.transcript-only] +=== Gather your data sources + +The first step is to gather your unstructured data. +The data can be in the form of text documents, PDFs, publicly available data, or any other source of information. + +Depending on the format, you may need to reformat the data into a format (typically text) that the LLM can process. + +The data sources should contain the information you want to include in your knowledge graph. 
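For example, if a source is an HTML page, reformatting can be as simple as stripping the markup. A minimal sketch using only Python's standard library (illustrative; real pipelines often use dedicated document loaders):

[source,python]
----
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collects visible text, skipping <script> and <style> content.
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)
----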
+ +[.transcript-only] +=== Chunk the data + +The next step is to break down the data into _right-sized_ parts. +This process is known as _chunking_. + +The size of the chunks depends on the LLM you are using, the complexity of the data, and what you want to extract from the data. + +You may not need to chunk the data if the LLM can process the entire document at once and it fits your requirements. + +[.transcript-only] +=== Vectorize the data + +Depending on your requirements for querying and searching the data, you may need to create *vector embeddings*. +You can use any embedding model to create embeddings for each data chunk, but the same model must be used for all embeddings. + +Placing these vectors into a link:https://neo4j.com/docs/cypher-manual/current/indexes/semantic-indexes/vector-indexes/[Vector index^] allows you to perform semantic searches, similarity searches, and clustering on the data. + +[TIP] +.Chunking, Vectors, and Similarity Search +You can learn more about how to chunk documents, vectors, similarity search, and embeddings in the GraphAcademy course link:https://graphacademy.neo4j.com/courses/llm-vectors-unstructured/1-introduction/2-semantic-search/[Introduction to Vector Indexes and Unstructured Data^]. + +[.transcript-only] +=== Extract nodes and relationships + +The next step is to pass the unstructured text data to the LLM to extract the nodes and relationships. + +You should provide a suitable prompt that will instruct the LLM to: + +- Identify the entities in the text. +- Extract the relationships between the entities. +- Format the output so you can use it to generate the graph, for example, as JSON or another structured format. + +Optionally, you may also provide additional context or constraints for the extraction, such as the type of entities or relationships you are interested in extracting. 
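The extraction step above can be sketched as a prompt template plus a parser for the LLM's reply (illustrative only; the JSON field names here are assumptions, not a fixed format):

[source,python]
----
import json

EXTRACTION_PROMPT = """\
Identify the entities in the text below and the relationships between them.
Return JSON with two keys: "nodes" (objects with "id" and "label") and
"relationships" (objects with "start", "type", "end").

Text:
{text}
"""

def build_prompt(text: str) -> str:
    # Insert the source text into the extraction prompt template.
    return EXTRACTION_PROMPT.format(text=text)

def parse_llm_output(raw: str):
    # Parse the JSON returned by the LLM into node and relationship lists.
    data = json.loads(raw)
    return data.get("nodes", []), data.get("relationships", [])
----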
+ +[.transcript-only] +=== Generate the graph + +Finally, you can use the output from the LLM to generate the graph by creating the nodes and relationships within Neo4j. + +The entity and relationship types would become labels and relationship types in the graph. +The _names_ would be the node and relationship identifiers. + +[.slide.col-40-60] +== Example + +[.col] +==== +If you wanted to construct a knowledge graph based on the link:https://en.wikipedia.org/wiki/Neo4j[Neo4j Wikipedia page^], you would: +==== + +[.col] +image::images/neo4j-wiki.png["A screenshot of the Neo4j wiki page", width=80%] + +[.slide.discrete] +== Gather + +. **Gather** the text from the page. + + Neo4j is a graph database management system (GDBMS) developed by + Neo4j Inc. + + The data elements Neo4j stores are nodes, edges connecting them + and attributes of nodes and edges. Described by its developers + as an ACID-compliant transactional database with native graph + storage and processing... + +[.slide.discrete] +== Chunk +[start=2] +. Split the text into **chunks**. ++ + Neo4j is a graph database management system (GDBMS) developed + by Neo4j Inc. ++ +{sp} ++ + The data elements Neo4j stores are nodes, edges connecting them + and attributes of nodes and edges. ++ +{sp} ++ + Described by its developers as an ACID-compliant transactional + database with native graph storage and processing... + +[.slide.discrete] +== Embeddings +[start=3] +. Generate **embeddings** and **vectors** for each chunk. ++ + [0.21972137987, 0.12345678901, 0.98765432109, ...] ++ +{sp} ++ + [0.34567890123, 0.23456789012, 0.87654321098, ...] ++ +{sp} ++ + [0.45678901234, 0.34567890123, 0.76543210987, ...] + +[.slide.discrete] +== Extract +[start=4] +. **Extract** the entities and relationships using an **LLM**. ++ +Send the text to the LLM with an appropriate prompt, for example: ++ + Your task is to identify the entities and relations requested + with the user prompt from a given text. 
You must generate the
+ output in a JSON format containing a list with JSON objects.
+
+ Text:
+ {text}
++
+Parse the entities and relationships output by the LLM.
++
+[.transcript-only]
+====
+[source, json]
+----
+{
+  "node_types": [
+    {
+      "label": "GraphDatabase",
+      "properties": [
+        {"name": "Neo4j", "type": "STRING"}
+      ]
+    },
+    {
+      "label": "Company",
+      "properties": [
+        {"name": "Neo4j Inc", "type": "STRING"}
+      ]
+    },
+    {
+      "label": "ProgrammingLanguage",
+      "properties": [
+        {"name": "Java", "type": "STRING"}
+      ]
+    }
+  ],
+  "relationship_types": [
+    {"label": "DEVELOPED_BY"},
+    {"label": "IMPLEMENTED_IN"}
+  ],
+  "patterns": [
+    ["Neo4j", "DEVELOPED_BY", "Neo4j Inc"],
+    ["Neo4j", "IMPLEMENTED_IN", "Java"]
+  ]
+}
+----
+====
+
+[.slide.discrete]
+== Generate
+[start=5]
+. **Generate** the graph.
++
+Use the data to construct the graph in Neo4j by creating nodes and relationships based on the entities and relationships extracted by the LLM.
++
+[source, cypher, role=noplay nocopy]
+.Generate the graph
+----
+MERGE (neo4jInc:Company {id: 'Neo4j Inc'})
+MERGE (neo4j:GraphDatabase {id: 'Neo4j'})
+MERGE (java:ProgrammingLanguage {id: 'Java'})
+MERGE (neo4j)-[:DEVELOPED_BY]->(neo4jInc)
+MERGE (neo4j)-[:IMPLEMENTED_IN]->(java)
+----
+
+read::Continue[]
+
+[.summary]
+== Lesson Summary
+
+In this lesson, you learned how to construct a knowledge graph.
+
+In the next lesson, you will extract a graph schema from a piece of text and review the results.
diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/2-extract-schema/images/neo4j_graphdatabase.json b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/2-extract-schema/images/neo4j_graphdatabase.json new file mode 100644 index 000000000..cdd281bf0 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/2-extract-schema/images/neo4j_graphdatabase.json @@ -0,0 +1,97 @@ +{ + "style": { + "font-family": "sans-serif", + "background-color": "#ffffff", + "background-image": "", + "background-size": "100%", + "node-color": "#ffffff", + "border-width": 4, + "border-color": "#000000", + "radius": 50, + "node-padding": 5, + "node-margin": 2, + "outside-position": "auto", + "node-icon-image": "", + "node-background-image": "", + "icon-position": "inside", + "icon-size": 64, + "caption-position": "inside", + "caption-max-width": 200, + "caption-color": "#000000", + "caption-font-size": 50, + "caption-font-weight": "normal", + "label-position": "inside", + "label-display": "pill", + "label-color": "#000000", + "label-background-color": "#ffffff", + "label-border-color": "#000000", + "label-border-width": 4, + "label-font-size": 40, + "label-padding": 5, + "label-margin": 4, + "directionality": "directed", + "detail-position": "inline", + "detail-orientation": "parallel", + "arrow-width": 5, + "arrow-color": "#000000", + "margin-start": 5, + "margin-end": 5, + "margin-peer": 20, + "attachment-start": "normal", + "attachment-end": "normal", + "relationship-icon-image": "", + "type-color": "#000000", + "type-background-color": "#ffffff", + "type-border-color": "#000000", + "type-border-width": 0, + "type-font-size": 16, + "type-padding": 5, + "property-position": "outside", + "property-alignment": "colon", + "property-color": "#000000", + "property-font-size": 16, + "property-font-weight": "normal" + }, + "nodes": [ + { + "id": "n0", + "position": { + "x": -223, + 
"y": 0 + }, + "caption": "Neo4j", + "style": { + "radius": 100 + }, + "labels": [ + "GraphDatabase" + ], + "properties": {} + }, + { + "id": "n1", + "position": { + "x": 286.5, + "y": 0 + }, + "caption": "Neo4j", + "style": { + "radius": 100 + }, + "labels": [ + "Company" + ], + "properties": {} + } + ], + "relationships": [ + { + "id": "n0", + "type": "DEVELOPED_BY", + "style": {}, + "properties": {}, + "fromId": "n0", + "toId": "n1" + } + ] +} \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/2-extract-schema/images/neo4j_graphdatabase.svg b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/2-extract-schema/images/neo4j_graphdatabase.svg new file mode 100644 index 000000000..73b4ebbf5 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/2-extract-schema/images/neo4j_graphdatabase.svg @@ -0,0 +1 @@ +DEVELOPED_BYNeo4jGraphDatabaseNeo4jCompany \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/2-extract-schema/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/2-extract-schema/lesson.adoc new file mode 100644 index 000000000..bfe15c839 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/2-extract-schema/lesson.adoc @@ -0,0 +1,102 @@ += Extracting a schema from text +:type: lesson +:order: 2 +:branch: new-workshop + +[.slide.discrete] +== Overview + +The link:https://neo4j.com/docs/neo4j-graphrag-python/current/[GraphRAG for Python^] package (`neo4j-graphrag`) allows you to access Neo4j Generative AI functions. + +During this course you will use the `neo4j_graphrag` package to build a knowledge graph and retrievers to extract information from the graph using LLMs. + +In this lesson you will review how a graph schema can be extracted from text using an LLM. 
+
+[.slide.discrete]
+== Using the SchemaFromTextExtractor
+
+Open `workshop-genai/extract_schema.py` and review the code.
+
+[.transcript-only]
+====
+[source, python]
+.extract_schema.py
+----
+include::{repository-raw}/{branch}/workshop-genai/extract_schema.py[]
+----
+====
+
+The code uses the `SchemaFromTextExtractor` class to extract a schema from a given text input.
+
+The extractor:
+
+. Creates a prompt instructing the LLM to:
+.. Identify entities and relationships in any given text
+.. Format the output as JSON
+. Passes the prompt and text to the LLM for processing
+. Parses the JSON response to create a schema object
+
+[.slide.discrete.col-40-60]
+== Output
+
+[.col]
+====
+Given the text, _"Neo4j is a graph database management system (GDBMS) developed by Neo4j Inc."_, a simplified version of the extracted schema would be:
+====
+
+[.col]
+====
+[source,text]
+.Extracted Schema
+----
+node_types=(
+    NodeType(label='GraphDatabase'),
+    NodeType(label='Company')
+)
+relationship_types=(
+    RelationshipType(label='DEVELOPED_BY'),
+)
+patterns=(
+    ('GraphDatabase', 'DEVELOPED_BY', 'Company')
+)
+----
+====
+
+[.slide.discrete.col-40-60]
+== Execute
+
+[.col]
+====
+Run the program and observe the output. You will see a more detailed schema based on the text provided.
+
+This schema can be used to store the data held within the text.
+====
+
+[.col]
+image::images/neo4j_graphdatabase.svg["a graph schema with a Neo4j GraphDatabase node connected to a Neo4j Inc Company node via a DEVELOPED_BY relationship"]
+
+[.slide.discrete]
+== Experiment
+
+Experiment with different text inputs to see how the schema extraction varies based on the content provided, for example:
+
+* "Python is a programming language created by Guido van Rossum."
+* "The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France."
+* "Large Language Models (LLMs) are a type of artificial intelligence model designed to understand and generate human-like text."
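The extractor's parsing step can be sketched with the standard library. The `NodeType` and `RelationshipType` classes below are simplified stand-ins for the package's own schema types, and the JSON response is a hand-written example:

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeType:
    label: str

@dataclass(frozen=True)
class RelationshipType:
    label: str

def parse_schema(llm_response: str):
    """Parse the LLM's JSON output into a simplified schema object."""
    data = json.loads(llm_response)
    node_types = tuple(NodeType(n["label"]) for n in data.get("node_types", []))
    relationship_types = tuple(
        RelationshipType(r["label"]) for r in data.get("relationship_types", [])
    )
    patterns = tuple(tuple(p) for p in data.get("patterns", []))
    return node_types, relationship_types, patterns

# A hand-written example of the kind of JSON an LLM might return
response = """{
  "node_types": [{"label": "GraphDatabase"}, {"label": "Company"}],
  "relationship_types": [{"label": "DEVELOPED_BY"}],
  "patterns": [["GraphDatabase", "DEVELOPED_BY", "Company"]]
}"""

nodes, rels, patterns = parse_schema(response)
print(nodes, rels, patterns)
```

A production extractor also has to cope with malformed LLM output (missing keys, stray prose around the JSON), which is why the real components wrap this step in validation and retries.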
+ +[.transcript-only] +==== +When you have experimented with the schema extraction, you can continue. +==== + +read::Continue[] + +[.summary] +== Lesson Summary + +In this lesson, you: + +* Learned how to extract a graph schema from unstructured text using an LLM. +* Explored how different text inputs can lead to different schema extractions. + +In the next lesson, you will create a knowledge graph construction pipeline using the `SimpleKGPipeline` class. diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/entities-graph.png b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/entities-graph.png new file mode 100644 index 000000000..c4ec1a270 Binary files /dev/null and b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/entities-graph.png differ diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/kg-builder-default-model.json b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/kg-builder-default-model.json new file mode 100644 index 000000000..7ce5bca1d --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/kg-builder-default-model.json @@ -0,0 +1,130 @@ +{ + "style": { + "font-family": "sans-serif", + "background-color": "#ffffff", + "background-image": "", + "background-size": "100%", + "node-color": "#ffffff", + "border-width": 4, + "border-color": "#000000", + "radius": 50, + "node-padding": 5, + "node-margin": 2, + "outside-position": "auto", + "node-icon-image": "", + "node-background-image": "", + "icon-position": "inside", + "icon-size": 64, + "caption-position": "inside", + "caption-max-width": 200, + "caption-color": "#000000", + "caption-font-size": 50, + "caption-font-weight": "normal", + 
"label-position": "inside", + "label-display": "pill", + "label-color": "#000000", + "label-background-color": "#ffffff", + "label-border-color": "#000000", + "label-border-width": 4, + "label-font-size": 40, + "label-padding": 5, + "label-margin": 4, + "directionality": "directed", + "detail-position": "inline", + "detail-orientation": "parallel", + "arrow-width": 5, + "arrow-color": "#000000", + "margin-start": 5, + "margin-end": 5, + "margin-peer": 20, + "attachment-start": "normal", + "attachment-end": "normal", + "relationship-icon-image": "", + "type-color": "#000000", + "type-background-color": "#ffffff", + "type-border-color": "#000000", + "type-border-width": 0, + "type-font-size": 16, + "type-padding": 5, + "property-position": "outside", + "property-alignment": "colon", + "property-color": "#000000", + "property-font-size": 16, + "property-font-weight": "normal" + }, + "nodes": [ + { + "id": "n0", + "position": { + "x": 3.8117408928910077e-32, + "y": 47.03686581561153 + }, + "caption": "Document", + "style": {}, + "labels": [], + "properties": { + "path": "", + "createdAt": "" + } + }, + { + "id": "n1", + "position": { + "x": -1.1723460390943958e-32, + "y": 378.9796534081454 + }, + "caption": "Chunk", + "style": {}, + "labels": [], + "properties": { + "index": "", + "text": "", + "embedding": "" + } + }, + { + "id": "n2", + "position": { + "x": 331.942787592534, + "y": 378.9796534081454 + }, + "caption": "Entity", + "style": { + "property-position": "outside" + }, + "labels": [ + "__Entity__" + ], + "properties": {} + } + ], + "relationships": [ + { + "id": "n0", + "type": "FROM_DOCUMENT", + "style": {}, + "properties": {}, + "fromId": "n1", + "toId": "n0" + }, + { + "id": "n1", + "type": "FROM_CHUNK", + "style": {}, + "properties": {}, + "fromId": "n2", + "toId": "n1" + }, + { + "id": "n2", + "type": "", + "style": { + "attachment-start": "right", + "attachment-end": "top" + }, + "properties": {}, + "fromId": "n2", + "toId": "n2" + } + ] +} \ No newline 
at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/kg-builder-default-model.svg b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/kg-builder-default-model.svg new file mode 100644 index 000000000..8c3c63f56 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/kg-builder-default-model.svg @@ -0,0 +1,444 @@ + + + + + + + + + + + + + + + + + FROM_DOCUMENT + + + + + + + + + + + + + + FROM_CHUNK + + + + + + + + + + + + + + + Document + + + + + + + + + path: + + createdAt: + + + + + + + + + + + + + + + + Chunk + + + + + + + + + index: + + text: + + embedding: + + + + + + + + + + + + + + + + Entity + + + + + + + __Entity__ + + + + + + + + diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/kg_builder_pipeline.png b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/kg_builder_pipeline.png new file mode 100644 index 000000000..5237da994 Binary files /dev/null and b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/images/kg_builder_pipeline.png differ diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/lesson.adoc new file mode 100644 index 000000000..c854a4206 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/3-create-a-graph/lesson.adoc @@ -0,0 +1,148 @@ += Create a graph +:type: lesson +:order: 3 +:branch: new-workshop + +[.slide.discrete] +== Overview + +In this lesson, you will learn how to create a knowledge graph from unstructured data using the `SimpleKGPipeline` class. 
+
+The `SimpleKGPipeline` class provides a pipeline which implements a series of steps to create a knowledge graph from unstructured data:
+
+1. Load the text
+2. Split the text into chunks
+3. Create embeddings for each chunk
+4. Extract entities from the chunks
+5. Write the data to a Neo4j database
+
+[.slide.discrete]
+== Overview
+
+image::images/kg_builder_pipeline.png["Pipeline showing these steps"]
+
+[.slide-only]
+====
+**Continue with the lesson to create the knowledge graph.**
+====
+
+[.transcript-only]
+=== Create the knowledge graph
+
+Open `workshop-genai/kg_builder.py` and review the code.
+
+[source, python]
+.kg_builder.py
+----
+include::{repository-raw}/{branch}/workshop-genai/kg_builder.py[]
+----
+
+The code loads a single PDF file, `data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf`, and runs the pipeline to create a knowledge graph in Neo4j.
+
+The PDF document contains the content from the link:https://graphacademy.neo4j.com/courses/genai-fundamentals/[Neo4j & Generative AI Fundamentals^] course, specifically the link:https://graphacademy.neo4j.com/courses/genai-fundamentals/1-generative-ai/1-what-is-genai/[What is Generative AI?^] lesson.
+
+Breaking down the code, you can see the following steps:
+
+. Create a connection to Neo4j:
++
+[source, python]
+.Neo4j connection
+----
+include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder.py[tag=neo4j_driver]
+----
+. Instantiate an LLM model:
++
+[source, python]
+.LLM
+----
+include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder.py[tag=llm]
+----
++
+The model parameters, `model_params`, lower the temperature of the model to make it more deterministic and set the response format to `json`.
+. Create an embedding model:
++
+[source, python]
+.Embedding model
+----
+include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder.py[tag=embedder]
+----
+. 
Set up the `SimpleKGPipeline`:
++
+[source, python]
+.kg_builder
+----
+include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder.py[tag=kg_builder]
+----
+. Run the pipeline to create the graph from a single PDF file:
++
+[source, python]
+.kg_builder
+----
+include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder.py[tag=run_one_doc]
+----
+
+When you run the program, the pipeline will process the PDF document and create the graph in Neo4j.
+
+A summary of the results will be returned, for example:
+
+    {'resolver': {'number_of_nodes_to_resolve': 12, 'number_of_created_nodes': 10}}
+
+[.slide.col-2]
+== Explore the Knowledge Graph
+
+[.col]
+====
+The `SimpleKGPipeline` creates the following default graph model.
+
+The `Entity` nodes represent the entities extracted from the text chunks. Relevant properties are extracted from the chunk and associated with the entity nodes.
+====
+
+[.col]
+image::images/kg-builder-default-model.svg["a graph model showing (Document)<-[:FROM_DOCUMENT]-(Chunk)<-[:FROM_CHUNK]-(Entity)", width=90%]
+
+[.slide.discrete]
+== View documents and chunks
+
+You can view the documents and chunks created in the graph using the following Cypher query:
+
+[source, cypher]
+.View the documents and chunks
+----
+MATCH (d:Document)<-[:FROM_DOCUMENT]-(c:Chunk)
+RETURN d.path, c.text
+----
+
+[NOTE]
+.Chunk size
+The default chunk size is greater than the length of the document, so only a single chunk is created.
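The five pipeline stages can be sketched as a simple composition of functions. Every helper below is a minimal stub standing in for the pipeline's real components (loader, splitter, embedder, LLM extractor, and Neo4j writer):

```python
# A toy end-to-end sketch of the SimpleKGPipeline stages.
# Each stage is a deliberately minimal stand-in for the real component.

def load_text(path: str) -> str:
    # Stand-in for the PDF loader
    return "Neo4j is a graph database management system developed by Neo4j Inc."

def split(text: str, chunk_size: int = 200) -> list[str]:
    # Stand-in for the text splitter
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(chunk: str) -> list[float]:
    # Stand-in for a real embedding model
    return [float(len(chunk))]

def extract_entities(chunk: str) -> list[tuple[str, str, str]]:
    # Stand-in for the LLM extraction step
    return [("Neo4j", "DEVELOPED_BY", "Neo4j Inc")]

def run_pipeline(path: str) -> dict:
    text = load_text(path)
    chunks = split(text)
    records = [
        {"chunk": c, "embedding": embed(c), "entities": extract_entities(c)}
        for c in chunks
    ]
    # A real pipeline would now write Document, Chunk, and Entity nodes to Neo4j
    return {"chunks": len(chunks), "entities": sum(len(r["entities"]) for r in records)}

print(run_pipeline("data/example.pdf"))  # {'chunks': 1, 'entities': 1}
```

The sketch mirrors why a single `Document` node ends up linked to `Chunk` and `Entity` nodes: every downstream record is derived from a chunk, so chunking decisions ripple through the whole graph.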
+
+[.slide.discrete.col-2]
+== Entities and relationships
+
+[.col]
+====
+The extracted entities and the relationships between them can be found using a variable-length path query:
+
+[source, cypher]
+.View the entities extracted from each chunk
+----
+MATCH p = (c:Chunk)-[*..3]-(e:__Entity__)
+RETURN p
+----
+====
+
+[.col]
+image::images/entities-graph.png["A graph showing entities extracted from a chunk", width=90%]
+
+read::Continue[]
+
+
+[.summary]
+== Lesson Summary
+
+In this lesson, you:
+
+* Learned how to use the `SimpleKGPipeline` class.
+* Explored the graph model created by the pipeline.
+
+In the next lesson, you will modify the chunk size used when splitting the text and define a custom schema for the knowledge graph.
diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/4-chunk-size/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/4-chunk-size/lesson.adoc
new file mode 100644
index 000000000..32dc6ec5c
--- /dev/null
+++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/4-chunk-size/lesson.adoc
@@ -0,0 +1,108 @@
+= Chunk size
+:type: lesson
+:order: 4
+:branch: new-workshop
+
+[.slide.discrete]
+== Overview
+
+The graph created by the `SimpleKGPipeline` is based on chunks of text extracted from the documents.
+
+By default, the chunk size is quite large, which may result in fewer, larger chunks.
+
+The larger the chunk size, the more context the LLM has when extracting entities and relationships, but it may also lead to less granular data.
+
+In this lesson, you will modify the `SimpleKGPipeline` to use a different chunk size.
+
+[.slide]
+== Delete the existing graph
+
+You will be re-importing the data and modifying the existing graph. 
To ensure a clean state, you can delete the graph at any time using: + +[source, cypher] +.Delete the existing graph +---- +MATCH (n) DETACH DELETE n +---- + +[.slide-only] +==== +**Continue with the lesson to modify the chunk size.** +==== + +[.transcript-only] +=== Text Splitter Chunk Size + +To modify the chunk size you will need to create a `FixedSizeSplitter` object and pass it to the `SimpleKGPipeline` when creating the pipeline instance: + +. Modify the `workshop-genai/kg_builder.py` file to import the `FixedSizeSplitter` class and create an instance with a chunk size of 500 characters: ++ +[source, python] +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_split.py[tag=import_text_splitter] + +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_split.py[tag=text_splitter] +---- ++ +[NOTE] +.Chunk size and overlap +The `chunk_size` parameter defines the maximum number of characters in each text chunk. The `chunk_overlap` parameter ensures that there is some overlap between consecutive chunks, which can help maintain context. +. Update the `SimpleKGPipeline` instantiation to use the custom text splitter: ++ +[source, python] +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_split.py[tag=kg_builder] +---- + +[%collapsible] +.Reveal the complete code +==== +[source, python] +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_split.py[tag=**] +---- +==== + +Run the modified pipeline to recreate the knowledge graph with the new chunk size. 
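The splitting behaviour can be sketched in plain Python. This is a simplified approximation of a fixed-size splitter, not the `FixedSizeSplitter` implementation itself:

```python
def fixed_size_split(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    with consecutive chunks sharing chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new chunk starts (chunk_size - chunk_overlap) characters
    # after the previous one, so the tail of one chunk repeats at the
    # head of the next and context is preserved across the boundary.
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "Neo4j is a graph database management system developed by Neo4j Inc."
for chunk in fixed_size_split(text, chunk_size=40, chunk_overlap=10):
    print(repr(chunk))
```

Smaller values of `chunk_size` produce more, finer-grained chunks; the overlap trades a little duplicated storage for continuity between neighbouring chunks.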
+
+[.slide.discrete]
+== Explore
+
+You can view the documents and the associated chunks using the following Cypher query:
+
+[source, cypher]
+.View the documents and chunks
+----
+MATCH (d:Document)<-[:FROM_DOCUMENT]-(c:Chunk)
+RETURN d.path, c.index, c.text, size(c.text)
+ORDER BY d.path, c.index
+----
+
+View the entities extracted from each chunk using the following Cypher query:
+
+[source, cypher]
+.View the entities extracted from each chunk
+----
+MATCH p = (c:Chunk)-[*..3]-(e:__Entity__)
+RETURN p
+----
+
+[.transcript-only]
+====
+[TIP]
+=====
+You can experiment with different chunk sizes to see how they affect the entities extracted and the structure of the knowledge graph.
+=====
+====
+
+read::Continue[]
+
+[.summary]
+== Lesson Summary
+
+In this lesson, you:
+
+* Learned about the impact of chunk size on entity extraction
+* Modified the `SimpleKGPipeline` to use a custom chunk size with the `FixedSizeSplitter`
+
+In the next lesson, you will define a custom schema for the knowledge graph.
diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/4-chunk-size/questions/1-chunk-size.adoc b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/4-chunk-size/questions/1-chunk-size.adoc
new file mode 100644
index 000000000..0326aa7d7
--- /dev/null
+++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/4-chunk-size/questions/1-chunk-size.adoc
@@ -0,0 +1,19 @@
+[.question]
+= What is the primary trade-off when increasing the chunk size in the SimpleKGPipeline?
+
+* [ ] Larger chunks process faster but use more memory
+* [x] Larger chunks provide more context for entity extraction but result in less granular data
+* [ ] Larger chunks create more entities but fewer relationships
+* [ ] Larger chunks improve accuracy but require more computational power
+
+[TIP,role=hint]
+.Hint
+====
+Consider what happens to the level of detail and context when you make text chunks bigger or smaller.
+====
+
+[TIP,role=solution]
+.Solution
+====
+**Larger chunks provide more context for entity extraction but result in less granular data**. The larger the chunk size, the more context the LLM has when extracting entities and relationships, but it may also lead to less granular data. This is the key trade-off: more context versus granularity of the extracted information.
+====
diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/5-define-a-schema/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/5-define-a-schema/lesson.adoc
new file mode 100644
index 000000000..6b8df3a92
--- /dev/null
+++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/5-define-a-schema/lesson.adoc
@@ -0,0 +1,191 @@
+= Define a schema
+:type: lesson
+:order: 5
+:branch: new-workshop
+
+[.slide.discrete]
+== Overview
+
+The knowledge graph you created is unconstrained, meaning that any entity or relationship can be created based on the data extracted from the text.
+
+This can lead to graphs that are non-specific and may be difficult to analyze and query.
+
+In this lesson, you will modify the `SimpleKGPipeline` to use a custom schema for the knowledge graph.
+
+[.slide]
+== Schema
+
+When you provide a schema to the `SimpleKGPipeline`, it will pass this information to the LLM, instructing it to identify only those nodes and relationships.
+
+This allows you to create a more structured and meaningful knowledge graph.
+ +You define a schema by expressing the desired nodes, relationships, or patterns you want to extract from the text. + +For example, you might want to extract the following information: + +* nodes - `Person`, `Organization`, `Location` +* relationships - `WORKS_AT`, `LOCATED_IN` +* patterns - `(Person)-[WORKS_AT]->(Organization)`, `(Organization)-[LOCATED_IN]->(Location)` + +[.transcript-only] +==== +[TIP] +.Iterate your schema +===== +You don't have to define nodes, relationships, and patterns all at once. You can start with just nodes or just relationships and expand your schema as needed. + +For example, if you only define nodes, the LLM will find any relationships between those nodes based on the text. + +This approach can help you iteratively build and refine your knowledge graph schema. +===== +==== + +[.slide-only] +==== +**Continue with the lesson to define the schema.** +==== + +[.transcript-only] +=== Nodes + +Open `workshop-genai/kg_builder_schema.py` and review the code: + +[source, python] +.kg_builder_schema.py +---- +include::{repository-raw}/{branch}/workshop-genai/kg_builder_schema.py[] +---- + +You define the `NODES` as a list of node labels and pass the list to the `SimpleKGPipeline` when creating the pipeline instance. + +[source, python] +.NODES +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_schema.py[tag=simple_nodes] +---- + +[TIP] +.Define relevant nodes +==== +You should define the node labels that are relevant to your domain and the information you want to extract from the text. +==== + +You can also provide a description for each node label and associated properties to help guide the LLM when extracting entities. + +[source, python] +.Node descriptions and properties +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_schema.py[tag=node_types] +---- + +Recreate the knowledge graph with the defined nodes: + +. Delete any existing nodes and relationships. 
++ +[source, cypher] +.Delete the existing graph +---- +MATCH (n) DETACH DELETE n +---- +. Run the program ++ +The graph will be constrained to only include the defined node labels. + +View the entities and chunks in the graph using the following Cypher query: + +[source, cypher] +.Entities and Chunks +---- +MATCH p = (c:Chunk)-[*..3]-(e:__Entity__) +RETURN p +---- + +[.transcript-only] +=== Relationships + +You can define required relationship types by providing a list to the `SimpleKGPipeline`. + +[source, python] +.RELATIONSHIP_TYPES +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_schema.py[tag=relationship_types] +---- + +You can also describe patterns that define how nodes are connected by relationships. + +[source, python] +.PATTERNS +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_schema.py[tag=patterns] +---- + +Nodes, relationships and patterns are all passed to the `SimpleKGPipeline` as the `schema` when creating the pipeline: + +[source, python] +.schema +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_schema.py[tag=kg_builder] +---- + +[%collapsible] +.Reveal the complete code +==== +[source, python] +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_schema.py[tags=**;!simple_nodes;!all_documents] +---- +==== + +Review the `data/genai-fundamentals_1-generative-ai_1-what-is-genai.pdf` PDF document and experiment by creating a set of `NODES`, `RELATIONSHIPS` and `PATTERNS` relevant to the data. + +Recreate the knowledge graph: + +. Delete any existing nodes and relationships. +. Run the program. + + +[%collapsible] +.Process all the documents? +==== +In the next lesson, you will add structured data to the knowledge graph, and process all of the documents. 
+ +Optionally, you could modify the program now to process the documents from the `data` directory without the structured data: + +[source, python] +.All PDFs +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_builder_schema.py[tag=all_documents] +---- +==== + +[.slide] +== Explore + +Review the knowledge graph and observe how the defined schema has influenced the structure of the graph: + +[source, cypher] +.Entities and Chunks +---- +MATCH p = (c:Chunk)-[*..3]-(e:__Entity__) +RETURN p +---- + +View the counts of documents, chunks and entities in the graph: + +[source, cypher] +.Documents, Chunks, and Entity counts +---- +RETURN + count{ (:Document) } as documents, + count{ (:Chunk) } as chunks, + count{ (:__Entity__) } as entities +---- + +read::Continue[] + +[.summary] +== Lesson Summary + +In this lesson, you learned how to define a custom schema for the knowledge graph. + +In the next lesson, you will learn how to add structured data to the knowledge graph. diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/6-structured-data/images/kg-builder-structured-model.json b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/6-structured-data/images/kg-builder-structured-model.json new file mode 100644 index 000000000..7aa3cb453 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/6-structured-data/images/kg-builder-structured-model.json @@ -0,0 +1,154 @@ +{ + "style": { + "font-family": "sans-serif", + "background-color": "#ffffff", + "background-image": "", + "background-size": "100%", + "node-color": "#ffffff", + "border-width": 4, + "border-color": "#000000", + "radius": 50, + "node-padding": 5, + "node-margin": 2, + "outside-position": "auto", + "node-icon-image": "", + "node-background-image": "", + "icon-position": "inside", + "icon-size": 64, + "caption-position": "inside", + "caption-max-width": 200, + "caption-color": 
"#000000", + "caption-font-size": 50, + "caption-font-weight": "normal", + "label-position": "inside", + "label-display": "pill", + "label-color": "#000000", + "label-background-color": "#ffffff", + "label-border-color": "#000000", + "label-border-width": 4, + "label-font-size": 40, + "label-padding": 5, + "label-margin": 4, + "directionality": "directed", + "detail-position": "inline", + "detail-orientation": "parallel", + "arrow-width": 5, + "arrow-color": "#000000", + "margin-start": 5, + "margin-end": 5, + "margin-peer": 20, + "attachment-start": "normal", + "attachment-end": "normal", + "relationship-icon-image": "", + "type-color": "#000000", + "type-background-color": "#ffffff", + "type-border-color": "#000000", + "type-border-width": 0, + "type-font-size": 16, + "type-padding": 5, + "property-position": "outside", + "property-alignment": "colon", + "property-color": "#000000", + "property-font-size": 16, + "property-font-weight": "normal" + }, + "nodes": [ + { + "id": "n0", + "position": { + "x": 3.8117408928910077e-32, + "y": 47.03686581561153 + }, + "caption": "Document", + "style": {}, + "labels": [], + "properties": { + "path": "", + "createdAt": "" + } + }, + { + "id": "n1", + "position": { + "x": -1.1723460390943958e-32, + "y": 378.9796534081454 + }, + "caption": "Chunk", + "style": {}, + "labels": [], + "properties": { + "index": "", + "text": "", + "embedding": "" + } + }, + { + "id": "n2", + "position": { + "x": 331.942787592534, + "y": 378.9796534081454 + }, + "caption": "Entity", + "style": { + "property-position": "outside" + }, + "labels": [ + "__Entity__" + ], + "properties": {} + }, + { + "id": "n4", + "position": { + "x": 331.942787592534, + "y": 47.03686581561152 + }, + "caption": "Lesson", + "style": {}, + "labels": [], + "properties": { + "name": "", + "module": "", + "course": "", + "url": "" + } + } + ], + "relationships": [ + { + "id": "n0", + "type": "FROM_DOCUMENT", + "style": {}, + "properties": {}, + "fromId": "n1", + "toId": "n0" 
+ }, + { + "id": "n1", + "type": "FROM_CHUNK", + "style": {}, + "properties": {}, + "fromId": "n2", + "toId": "n1" + }, + { + "id": "n2", + "type": "", + "style": { + "attachment-start": "right", + "attachment-end": "top" + }, + "properties": {}, + "fromId": "n2", + "toId": "n2" + }, + { + "id": "n3", + "type": "PDF_OF", + "style": {}, + "properties": {}, + "fromId": "n0", + "toId": "n4" + } + ] +} \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/6-structured-data/images/kg-builder-structured-model.svg b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/6-structured-data/images/kg-builder-structured-model.svg new file mode 100644 index 000000000..7b39dedec --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/6-structured-data/images/kg-builder-structured-model.svg @@ -0,0 +1,622 @@ + + + + + + + + + + + + + + + + + FROM_DOCUMENT + + + + + + + + + + + + + + FROM_CHUNK + + + + + + + + + + + + + + + + PDF_OF + + + + + + + + + + + + + Document + + + + + + + + + path: + + createdAt: + + + + + + + + + + + + + + + + Chunk + + + + + + + + + index: + + text: + + embedding: + + + + + + + + + + + + + + + + Entity + + + + + + + __Entity__ + + + + + + + + + + + + + + + Lesson + + + + + + + + + name: + + module: + + course: + + url: + + + + + + + + + diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/6-structured-data/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/6-structured-data/lesson.adoc new file mode 100644 index 000000000..ff89411e1 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/6-structured-data/lesson.adoc @@ -0,0 +1,151 @@ += Add structured data to the knowledge graph +:type: lesson +:order: 6 +:branch: new-workshop + +[.slide.discrete] +== Overview + +The knowledge graph you 
created is solely based on unstructured data extracted from documents. + +You may have access to structured data sources, such as databases, CSV files, or APIs, that contain valuable information relevant to your domain. + +Combining structured and unstructured data can enhance the knowledge graph's richness and usefulness. + +[NOTE] +.Lexical and Domain Graphs +The unstructured part of your graph is known as the link:https://graphrag.com/reference/knowledge-graph/lexical-graph/[Lexical Graph], while the structured part is known as the link:https://graphrag.com/reference/knowledge-graph/domain-graph/[Domain Graph]. + +[.slide] +== Structured data source + +The repository includes a sample CSV file, `workshop-genai/data/docs.csv`, containing metadata about the lessons the documents were created from. + +[source, csv] +.Sample docs.csv +---- +filename,course,module,lesson,url +genai-fundamentals_1-generative-ai_1-what-is-genai.pdf,genai-fundamentals,1-generative-ai,1-what-is-genai,https://graphacademy.neo4j.com/courses/genai-fundamentals/1-generative-ai/1-what-is-genai +genai-fundamentals_1-generative-ai_2-considerations.pdf,genai-fundamentals,1-generative-ai,2-considerations,https://graphacademy.neo4j.com/courses/genai-fundamentals/1-generative-ai/2-considerations +... +---- + +You can use this CSV file as a structured data source when creating the knowledge graph. + +[.slide-only] +==== +**Continue with the lesson to load the structured data.** +==== + +[.transcript-only] +=== Load from CSV file + +Open `workshop-genai/kg_structured_builder.py` and review the code. + +[source, python] +.kg_structured_builder.py +---- +include::{repository-raw}/{branch}/workshop-genai/kg_structured_builder.py[] +---- + +The key differences are: + +. 
The `docs.csv` file is loaded using `csv.DictReader` to read each row as a dictionary: ++ +[source, python] +.Load docs.csv +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_structured_builder.py[tag=load_csv] +---- +. The path of the PDF document is constructed using the `filename` field from the CSV: ++ +[source, python] +.PDF path +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_structured_builder.py[tag=pdf_path] +---- +. A `cypher` statement is defined to create `Lesson` nodes with properties from the CSV data: ++ +[source, python] +.Cypher statement +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_structured_builder.py[tag=cypher] +---- ++ +The `pdf_path` is used as the key to match the `Document` nodes created from the PDF files. +. A `Lesson` node is created for each document using the `cypher` statement and the CSV data: ++ +[source, python] +.Lesson nodes +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/kg_structured_builder.py[tag=create_structured_graph] +---- + +The resulting knowledge graph will now contain `Lesson` nodes connected to the `Document` nodes created from the PDF files: + +image::images/kg-builder-structured-model.svg["A data model showing Lesson nodes connected to Document nodes using a PDF_OF relationship."] + +Run the program to create the knowledge graph with the structured data. + +[NOTE] +.Remember to delete the existing graph before re-running the pipeline +==== +[source, cypher] +.Delete the existing graph +---- +MATCH (n) DETACH DELETE n +---- +==== + +[TIP] +.OpenAI Rate Limiting? +==== +When using a free OpenAI API key, you may encounter rate limiting issues when processing multiple documents. You can add a `sleep` between document processing to mitigate this. +==== + +[.slide] +== Explore the structured data + +The structured data allows you to query the knowledge graph in new ways. 
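The steps above are pulled in from the repository's solution files, which are not reproduced inline in this lesson. As a rough sketch of what the CSV-loading step might look like, the outline below uses hypothetical names, an invented sample row, and an assumed PDF directory layout; it is not the actual `kg_structured_builder.py` code:

```python
import csv
import io

# Sample row mirroring the docs.csv shown earlier (here as an inline string
# so the sketch is self-contained; the real code would open the file).
SAMPLE_CSV = """filename,course,module,lesson,url
genai-fundamentals_1-generative-ai_1-what-is-genai.pdf,genai-fundamentals,1-generative-ai,1-what-is-genai,https://graphacademy.neo4j.com/courses/genai-fundamentals/1-generative-ai/1-what-is-genai
"""

# Cypher that creates a Lesson node and links it to its source Document.
# Property names follow the data model diagram; the MERGE keys are assumptions.
CREATE_LESSON = """
MERGE (l:Lesson {url: $url})
SET l.name = $lesson, l.module = $module, l.course = $course
WITH l
MATCH (d:Document {path: $pdf_path})
MERGE (d)-[:PDF_OF]->(l)
"""

def lesson_params(csv_file, pdf_dir="data/pdfs"):
    """Yield one parameter dict per CSV row, ready to pass to a Cypher query."""
    for row in csv.DictReader(csv_file):  # each row becomes a dict keyed by header
        yield {
            "url": row["url"],
            "lesson": row["lesson"],
            "module": row["module"],
            "course": row["course"],
            # The PDF path is rebuilt from the filename column (directory assumed).
            "pdf_path": f"{pdf_dir}/{row['filename']}",
        }

params = list(lesson_params(io.StringIO(SAMPLE_CSV)))
```

In the real pipeline, each parameter dict would be passed to something like `driver.execute_query(CREATE_LESSON, **p)` against a live Neo4j instance.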
+ +You can find all lessons that cover a specific technology or concept: + +[source, cypher] +.Find lessons about Knowledge Graphs +---- +MATCH (kg:Technology) +MATCH (kg)-[:FROM_CHUNK]->(c)-[:FROM_DOCUMENT]-(d)-[:PDF_OF]-(l) +WHERE toLower(kg.name) CONTAINS "knowledge graph" +RETURN DISTINCT toLower(kg.name), l.name, l.url +---- + +[.slide.discrete] +== Explore the knowledge graph + +The knowledge graph allows you to summarize the content of each lesson by specific categories, such as technologies and concepts: + +[source, cypher] +.Summarize lesson content +---- +MATCH (lesson:Lesson)<-[:PDF_OF]-(:Document)<-[:FROM_DOCUMENT]-(c:Chunk) +RETURN + lesson.name, + lesson.url, + [ (c)<-[:FROM_CHUNK]-(tech:Technology) | tech.name ] AS technologies, + [ (c)<-[:FROM_CHUNK]-(concept:Concept) | concept.name ] AS concepts +---- + +Spend some time exploring the knowledge graph and experiment with adding additional data. + +read::Continue[] + +[.summary] +== Lesson Summary + +In this lesson, you learned: + +* The benefits of adding structured data to a knowledge graph. +* How to load structured data from a CSV file. +* How to create nodes from structured data and connect them to unstructured data nodes. + +In the next module, you will create retrievers to query the knowledge graph. diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/7-create-a-kg/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/7-create-a-kg/lesson.adoc new file mode 100644 index 000000000..35dccd514 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/lessons/7-create-a-kg/lesson.adoc @@ -0,0 +1,28 @@ += Create your own knowledge graph +:type: challenge +:order: 7 + +[.slide.discrete] +== Overview + +In this optional challenge, you will apply what you have learned to create your own knowledge graph. + +You should: + +. 
Find a dataset of documents that you want to create a knowledge graph from. This could be a collection of PDFs, text files, or any other text-based documents. ++ +Try searching Kaggle for link:https://www.kaggle.com/datasets?search=pdf[open PDF datasets^], for example the link:https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset[Resume Dataset^]. +. Use the `SimpleKGPipeline` to load the documents, split the text into chunks, extract entities, and create the knowledge graph in Neo4j. +. Load any associated structured data, if available, to enrich the knowledge graph. +. Query the knowledge graph to explore the data and extract insights. + +You can come back to this challenge at any time. + +read::Complete[] + +[.summary] +== Lesson Summary + +In this lesson, you applied what you had learned to create your own knowledge graph. + +In the next module, you will create retrievers to query the knowledge graph. \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/module.adoc b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/module.adoc new file mode 100644 index 000000000..ec0522ff6 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/2-knowledge-graph-construction/module.adoc @@ -0,0 +1,17 @@ += Knowledge Graph Construction +:order: 2 + + + +== Module Overview + +In this module, you will learn: + +* The process of creating knowledge graphs from unstructured text. +* How to use the `SimpleKGPipeline` class to create a knowledge graph from unstructured data. +* The stages of the pipeline and how they work together. +* How to modify the configuration to define the schema and data model. + +If you are ready, let's get going! + +link:./1-knowledge-graph-construction/[Ready? 
Let's go →, role=btn] diff --git a/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/apple-embedding.adoc b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/apple-embedding.adoc new file mode 100644 index 000000000..2d135976e --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/apple-embedding.adoc @@ -0,0 +1 @@ +`0.0077788467, -0.02306925, -0.007360777, -0.027743412, -0.0045747845, 0.01289164, -0.021863015, -0.008587573, 0.01892967, -0.029854324, -0.0027962727, 0.020108491, -0.004530236, 0.009129008, -0.021451797, 0.002030382, 0.030813828, 9.744976e-05, 0.0019172973, -0.02568733, -0.020985752, -0.008066699, 0.02134214, -0.01222684, 0.0009980568, 0.005105939, 0.009999417, -0.000107408916, 0.015845545, -0.012980737, 0.020574536, -0.016160812, -0.018518453, 0.005263572, -0.019286057, -0.009293495, -0.012096621, -0.008854863, -0.005753605, -0.006157968, 0.010540851, 0.007724018, -0.0065554776, 0.00052944134, -0.023453051, 0.011089141, -0.021671113, -0.00061425474, -0.012754567, 0.015489157, -0.0054520466, -0.0020355221, -0.015050527, -0.0052944133, -0.0028082666, 0.0027431573, -0.019450543, 0.0063807103, -0.010725899, 0.0049243183, 0.005266999, 0.01513277, -0.027921606, 0.0055754115, -0.009183837, 0.00380718, -0.013624975, -0.0084710615, 0.012905347, 0.015667351, 0.033363372, 0.013268588, 0.014036193, 0.0063464423, 0.004454846, 0.0014820931, -0.03396649, -0.0062779062, -0.00314238, 0.01818948, 0.0075389706, -0.02637269, 0.009574492, 0.024974553, 0.024823774, 0.009882905, -0.021657405, 0.010109074, -0.007970748, 0.0028887964, 0.011849891, 0.0054726074, 0.0078336755, 0.016448664, -0.026975807, 0.016599443, -0.012713445, 0.026345275, 0.004667308, -0.03736588, 0.0009834929, 0.006089432, -0.028730331, -0.011198798, -0.020396343, 0.0019738395, 0.012459862, -0.003738644, 0.015448036, -0.019902883, 0.0064389664, 0.00926608, 0.021945259, -0.051648803, -0.016448664, -0.01744929, -0.009499103, 
0.0021743076, -0.022795105, -0.035556525, 0.034021318, 0.025892938, 0.038407627, -0.008752059, 0.013446782, -0.0032640316, -0.01779197, -0.009567639, -0.0011205651, -0.013947096, 0.04707059, 0.008100967, 0.019491665, 0.016448664, -0.017846799, 0.019573908, -0.02223311, 0.015489157, -0.0057433248, -0.033445615, 0.010554559, 0.014694139, -0.01239818, 0.0070660715, -0.011226213, 0.023686076, 0.02360383, 0.022753984, -0.005215597, 0.0070866323, 0.010753313, -0.024110999, -0.003909984, 0.005462327, 0.0017459571, 0.0057981536, -0.016983245, -0.0021777344, -0.0039373985, 0.003772912, -0.006634294, 0.008614987, -0.006579465, -0.008841156, 0.0017699447, 0.024412557, 0.011856745, 0.013522171, -0.016051153, -0.00951281, -0.016133398, 0.004177275, -0.010691631, 0.01296703, 0.00886857, 0.016078569, 0.004434285, 0.012734006, -0.0067850733, 0.0006545197, 0.0011317023, -0.0046090526, 0.023096664, 0.01946425, -0.016640564, 0.014899747, 0.004701576, -0.010568266, 0.005530863, -0.019231228, 0.032047477, 0.02041005, -0.00397852, -0.014419994, -0.684703, -0.020643072, 0.00603803, -0.00033582686, 0.033993904, 0.03188299, 0.022287939, -0.0012739147, -0.018381381, -0.010396926, 0.0018042127, 0.0032863058, 0.00886857, 0.009519664, 5.9969083e-05, -0.022287939, 0.016284177, -0.023658661, -0.010431194, 0.02489231, -0.012261108, -0.014351458, -0.008841156, -0.029717252, 0.0036564006, 0.019628737, 0.019957712, -0.014022485, -0.019560201, 0.021767065, -0.008238039, -0.00048146606, 0.027291073, 0.0060140425, 0.037393294, 0.0072031436, -0.04416466, 0.013940242, 0.009663589, 0.03415839, -0.02065678, -0.020423757, 0.013563293, -0.0065246364, -0.015872959, -0.0009278074, 0.013254881, 0.005637094, -0.00071491714, -0.025344647, 0.03484375, 4.8269758e-05, 0.010787581, 0.008409379, 0.021780772, 0.008738352, 0.023124078, -0.008745206, -0.001522358, 0.016448664, -0.022370182, -0.0034011037, -0.034734093, -0.02523499, -0.020547122, 0.010636802, -0.009190691, 0.0076417746, 0.005434912, -0.01951908, 
0.021492919, 0.022438718, -0.02306925, -0.007059218, -0.0031115387, 0.01705178, 0.023576416, -0.00148809, -0.027071757, 0.0047461246, -0.0023867695, -0.009389445, 0.0049414523, -0.027537804, 0.03158143, 0.0054246318, -0.024042463, -0.011301602, 0.013926535, -0.02371349, 0.034130976, 0.023932805, 0.0028682356, -0.019148985, -0.014570774, -0.0053423885, -0.032376453, -0.019244935, -0.0021434664, -0.019930298, 0.016530907, -0.0056302403, 0.00943742, 0.0067679393, 0.024028756, 0.013474196, -0.019477958, 0.014570774, 0.03673535, -0.020437464, -0.0076623354, -0.012631202, 0.008587573, -0.00869723, 0.025824402, -0.03125246, 0.010629948, -0.00761436, 0.021067996, -0.032952156, 0.025399476, -0.00438631, 0.011863599, 0.003027582, -0.01059568, 0.018463625, -0.0045405165, -0.030978315, -0.0034884873, -0.0059420797, 0.008018723, 0.0052190237, 0.007299094, -0.006250492, 0.02390539, 0.0004050055, 0.009965149, -0.020670487, 0.011993817, -0.02508421, -0.016969537, 0.007991308, 0.000463047, -0.00052258774, 0.0012704879, -0.01232279, -0.028511016, -0.016887294, -0.010862971, 0.0052361577, -0.008861717, 0.005530863, -0.0017579509, 0.021506626, 0.022589497, -0.015900373, 0.0028596686, -0.0233571, 0.0009406579, 0.016229348, 0.010205025, 0.028182043, -0.009026204, 0.0042218235, 0.0150368195, -0.035803255, 0.0068193413, 0.0018727488, -0.017846799, -0.029251205, 0.01340566, -0.016887294, -0.008190064, 0.008286014, -0.014748968, 0.0039888006, -0.0149682835, 0.007477288, 0.01015705, 0.002385056, 0.0054314854, 0.008861717, 0.0021023448, 0.0016602869, 0.030896071, 0.020053662, 0.0016157385, 0.04767371, -0.020218149, 0.0008228615, -0.013467343, 0.019820638, 0.0053252545, 0.0016525766, -0.013816877, -0.008477915, -0.0059592137, 0.013398807, -0.0009586486, 0.01150721, 0.023973927, -0.0029007902, 0.011246773, -0.0022873923, 0.013775756, -0.03292474, 0.003995654, -0.005369803, 0.011294749, 0.03459702, -0.0022771119, -0.028593259, -0.0066068796, -0.020451171, 0.012357058, 0.034185804, 0.002359355, 
0.012185718, 0.0009329476, -0.007984455, 0.0016688539, -0.0047666854, 0.00047204236, -0.0036769616, -0.0074567273, 0.0034833471, 0.010115928, 0.03328113, -0.003368549, -0.026071131, -0.0035535966, -0.004986001, -0.00934147, -0.0125215445, 0.004143007, 0.014872333, 0.004146434, -0.010979483, 0.02223311, -0.0009552218, -0.0140499, 0.014502238, 0.026687955, -0.0020286685, 0.007621214, -0.0132617345, 0.045946598, 0.008169503, -0.004143007, -0.0022634047, -0.003240044, -0.025769573, -0.030759, 0.010479169, -0.00090467645, -0.024618166, 0.02350788, 0.022397596, 0.022877349, 0.0408201, 0.0032965862, -0.0034679265, -0.012946469, 0.0059763477, -0.020286685, -0.00019372156, -0.001281625, -0.013672951, 0.0028082666, 0.004146434, 0.013316563, -0.0002972753, 0.024933431, -0.010218732, 0.0067473785, 0.00096807233, -0.017600069, 0.0047495514, 0.0053458153, -0.012453008, -0.021698527, -0.02745556, 0.009060472, 0.003961386, -0.006867317, 0.008950814, -0.028949646, -0.0059455065, -0.005777593, 0.014748968, -0.0032948728, 0.021629991, 0.008320282, 0.020094784, 0.020423757, -0.01380317, 0.031362116, -0.0109863365, 0.005198463, -0.0062025166, 0.00017980016, 0.004968867, -0.019477958, -0.003947679, 0.03942196, -0.0048317946, -0.00595236, -0.024357729, 0.012679177, -0.002345648, -0.025413183, 0.0046227598, -0.015996324, -0.01809353, -0.0029864605, 0.016558321, -0.0055034487, -0.017161438, 0.04071044, -0.0025855242, -0.012644909, -0.01788792, -0.014255508, 0.007943333, 0.06513671, 0.02542689, -0.0109520685, 0.023727197, -0.0055925455, 0.027674876, -0.011945842, -0.006791927, 0.029059304, -0.00075818057, -0.0014101302, -0.008806888, 0.014776383, -0.018449917, 0.023891684, 0.011294749, -0.002393623, -0.020135906, -0.0056816423, -0.008203771, 0.00051230734, -0.014598188, 0.010650509, 0.0055205827, 0.01720256, 0.0057638856, 0.018751476, 0.029196376, -0.005195036, -0.024535922, -0.0060825786, -0.006243638, 0.015297256, -0.006226504, -0.001954992, 0.022301646, 0.017161438, 0.015955202, 
0.0059489333, 0.0052601453, 0.012178864, 0.010616241, -0.0037249369, -0.02637269, 0.007792554, -0.011459235, -0.014611895, 0.032568354, -0.0012088054, -0.013810024, 0.024672994, 0.01627047, 0.0050511104, -0.0055891187, -0.00022102891, 0.026729077, -0.0074704345, 0.0031526603, 0.010307829, -0.025659915, -0.0055377167, -0.019998834, 0.0032880192, 0.014502238, -0.0012936188, -0.005650801, -0.011376992, -0.018669233, -0.0068536093, -0.011616868, -0.000986063, -0.026358983, -0.011390699, 0.0077308714, 0.033144057, 0.008217478, 0.020889802, 0.0057261907, 0.0069838283, 0.03489858, -0.008306575, -0.014803797, 0.004742698, -0.014474823, -0.022973299, 0.019094156, -0.001972126, -0.013145223, 0.011671697, 0.008649255, 0.013755195, -0.0060448837, 0.02958018, 0.0045028217, -0.0120897675, -0.00046004856, 0.017833091, 0.011986963, -0.019327179, -0.011829331, 0.00795704, -0.010410633, -0.0026334994, -0.008005016, 0.014666725, 0.014653017, 0.019738395, 0.012535252, -0.025276111, 0.0037146565, 0.02760634, -0.004441139, 0.014831211, -0.0109863365, 0.01222684, 0.0138305845, -0.008786327, -0.0074156057, -0.0052190237, -0.015900373, -0.02099946, -0.04997652, 0.014255508, 0.02094463, -0.014104729, 0.020464879, -0.004986001, -0.007970748, -0.020889802, 0.012219986, -0.008710938, -0.0025820974, -0.0013553012, -0.013857999, -0.033555273, -0.027016928, -0.01646237, 0.020862387, 0.0009629321, -0.017435582, -0.020272978, 0.018271724, 0.008155796, -0.024878602, -0.02834653, -0.049181502, 0.011431821, 0.003176648, 0.0035056213, 0.02952535, -0.015283549, 0.017572654, -0.006905012, 0.014214386, -0.026208203, -0.022164574, -0.028428772, 0.00012647052, 0.03829797, 0.018258017, 0.020423757, 0.014077314, 0.016640564, -0.00020646499, 0.0044616996, -0.008587573, 0.0029898873, 0.012219986, -0.018518453, 0.013679804, 0.014557066, 0.015859252, 0.0027071757, 0.012919054, -0.0039750934, 0.012788836, 0.0042560915, -0.0023353675, -0.027990142, -0.005404071, -0.004451419, -0.009444274, -0.019848052, 0.01008166, 
0.0092455195, -0.024316607, 0.019162692, 0.009087887, 0.0017819385, -0.02922379, 0.025043089, -0.009972002, 0.021328432, 0.01141126, 0.0053903637, -0.026701663, -0.006685696, 0.008827449, -0.007477288, 0.015146477, -0.0068775974, 0.007792554, -0.014515945, -0.0074361665, 0.0058358484, 0.041149072, -0.025591379, -0.022356475, 0.0068570366, -0.04188926, -0.0053766565, -0.006411552, -0.009663589, -0.016092276, 0.001164257, 0.013556439, 9.952459e-06, 0.0003868006, -0.0058358484, -0.017367046, 0.0061682486, 0.020135906, 0.029991396, 0.0025769572, 0.035227552, 0.021602577, -0.0034576461, -0.019573908, 0.0022548377, -0.009533371, -0.011610014, 0.026454933, 0.01488604, 0.012315936, -0.007209997, -0.0028511016, 0.0045370897, -0.010239293, -0.0096430285, 0.035008237, 0.01769602, 0.016188227, -0.027976435, -0.031115387, -0.01946425, 0.026729077, -0.0048352215, -0.002503281, -0.015091648, -0.03829797, -0.01116453, 0.026331568, -0.01232279, 0.019505372, 0.004180702, -0.013912828, 0.01513277, -0.011849891, -0.02489231, 0.00088068884, -0.0026095118, 0.02740073, -0.02405617, 0.018203188, -0.0012859085, 0.005318401, -0.006349869, -0.007758286, 0.004674162, 0.03169109, -0.02785307, -0.0008571296, 0.0026369262, 0.015077941, 0.010623095, -0.012103475, -0.022260524, -0.009204398, -0.0028733758, -0.027976435, 0.010013124, 0.0077788467, -0.021013167, -0.011150823, 0.008244893, -0.006247065, -0.0062402114, 0.0027979861, 0.01372778, -0.0007671759, -0.013426221, 0.016928416, -0.0016191653, 0.0033668356, 0.026975807, -0.0121240355, -0.010705338, 0.023768319, -0.020793851, 0.00081129605, 0.0079022115, 0.0023096665, -0.024028756, 0.009937734, -0.0037592049, -0.0038483017, 0.020204442, -0.019546494, -0.012267961, -0.004338335, 0.0074361665, 0.016201934, 0.0024775798, 0.0061339806, 0.013248027, -0.008532744, -0.0019669859, -0.012713445, -0.030183297, 7.549679e-05, -0.012473569, -0.002210289, 0.02075273, -0.003116679, -0.0025872376, -0.003793473, 0.007299094, 0.0136592435, -0.024522215, 
-0.03391166, -0.021410676, 0.020506, -0.01463931, 0.00017551666, -0.020643072, -0.002201722, -0.022109745, 0.003632413, -0.0009286641, 0.00044891142, 0.0027191697, 0.014666725, 0.013391953, 0.02386427, -0.009039911, 0.0021348994, -0.013837438, -0.021410676, -0.021602577, -0.0059146653, 0.0048729163, 0.017983872, 0.01961503, -0.021917844, -0.028839989, -0.00808726, -0.03983318, -0.03254094, -0.005739898, 0.013248027, -0.00070206664, 0.006140834, 0.010013124, 0.0055411435, 0.0063841376, 0.016791344, -0.047564052, -0.0010725899, 0.004989428, -0.020917216, 0.022370182, -0.022959592, -0.020451171, -0.023233736, 0.001032325, 0.008094113, 0.0010777301, 0.01116453, 0.00038637224, -0.0033188604, -0.00886857, 0.022150867, 0.006394418, -0.00013310995, 0.009300348, -0.01883372, -0.009553932, 0.0032109162, -0.0007637491, -0.023727197, 0.0063258815, 0.009122155, 0.008327136, 0.008066699, 0.0013090394, -0.0051539144, 0.00975954, -0.020026248, -0.005873543, -0.011308456, -0.018765183, 0.014310337, -0.024412557, -0.017942749, -0.012535252, 0.010342097, -0.0243029, -0.010198171, 0.026838735, -0.0081078205, -0.0144337015, -0.010568266, 0.022301646, -0.03489858, -0.008066699, -0.0028802294, -0.023110371, -0.024193242, 0.03829797, 0.0029898873, -0.008361404, -0.0076280674, 0.014611895, 0.009560785, -0.0039716666, -0.004297213, 0.013446782, -0.022507254, -0.013337124, 0.008423086, -0.018600697, -0.023850562, 0.003947679, 0.0113838455, -0.0022788253, -0.0041909823, 0.20747247, -0.007059218, 0.016599443, 0.03988801, -0.0005011702, -0.0007568955, 0.015543986, 0.013145223, -0.0038825697, 0.0050339764, -0.014817504, 0.011767647, -0.015242428, 0.007299094, 0.010890386, -0.007580092, -0.03489858, -0.0089713745, -0.016393835, -0.0060825786, 0.023658661, -0.011459235, -0.011610014, -0.011514064, 0.02897706, 0.003108112, -0.02927862, 0.009889758, 0.018641818, 0.010150196, -0.00020453741, -0.004146434, -0.0039339717, -0.002090351, -0.008361404, -0.0001941499, -0.0075389706, 0.024165828, 
0.02745556, 0.026920978, -0.0015789003, -0.00090638985, -0.007888504, -0.0035570234, -0.028127214, 0.0142966295, -0.008457354, -0.007360777, 0.023041835, 0.021753358, -0.047838196, -0.003755778, 0.025221283, 0.025111625, 0.0014692425, 0.0071346075, 0.0026900417, 0.012727153, -0.00223599, -0.0020423757, -0.00744302, 0.018998206, 0.0012841951, 0.019094156, -0.024330314, -0.0043074936, -0.034240633, 0.005839275, -0.009300348, -0.008738352, 0.0038654357, -0.020739023, -0.007545824, 0.00035017662, -0.030128468, -0.0408201, 0.024083585, 0.026098546, 0.014598188, 0.022493547, -0.006867317, 0.009252373, -0.006140834, -0.0022942459, -0.006147688, -0.016667979, 0.03223938, -0.00544862, -0.0058872504, -0.003844875, -0.005582265, -0.015448036, 0.004454846, -0.02603001, 0.0056987763, 0.017421875, -0.015790716, 0.01946425, -0.01042434, -0.00070120994, -0.0040641907, -0.017956456, 0.01769602, -0.010095367, -0.008080406, 0.024069877, 0.0029898873, 0.009403152, 0.0057913, 0.006870744, -0.012809397, -0.011424967, 0.01256952, -0.011178237, 0.033829417, 0.009725272, -0.002683188, -0.029086718, 0.017956456, -0.0010940074, 0.0075526778, -0.01868294, 0.0020612231, 0.017517826, -0.01439258, -0.021150239, -0.020780144, 0.00021256898, 0.0167091, -0.028483601, -0.003478207, -0.0048043802, 0.004454846, 0.0034936275, 0.008752059, 0.0024930006, 0.004828368, -0.017654898, -0.0015009405, -0.009320909, 0.0013458775, 0.013816877, 0.020560829, 0.007319655, 0.0035433162, -0.0028168336, 0.002784279, -0.00032833073, -0.023343394, -0.021314725, -0.018792598, 4.789495e-05, -0.018792598, -0.006689123, 0.04213599, -0.01769602, -0.034076147, -0.027592633, -0.01084241, 0.013734634, -0.022753984, -0.01479009, 0.023110371, -0.011795062, -0.04150546, -0.007340216, -0.18016769, 0.027565219, -0.0068775974, 0.0007757429, 0.018299138, 0.0038003265, 0.01676393, 0.009807515, -0.0063601495, 0.0019224375, 0.021259896, 0.0033102934, -0.028922232, -0.011054873, 0.024015049, -0.011596307, -0.004824941, 0.015996324, 
0.025166454, 0.011123409, 0.01642125, -0.010047392, 0.01414585, -0.019957712, 0.009999417, 0.023453051, -0.025673622, 0.0014469683, -0.012007524, -0.016284177, -0.014159557, -0.015297256, 0.011260481, 0.0115826, 0.0128299575, -0.007621214, -0.014022485, -0.012363912, 0.0014512518, 0.023644954, 0.02158887, 0.01971098, 0.0078336755, 0.004705003, 0.0062607722, 0.020190734, 0.02006737, -0.019107863, 0.011952695, -0.019327179, 0.019628737, -0.013556439, -0.0066137332, 0.027825655, 0.00047289906, 0.009649882, -0.015406914, -0.0034216645, -0.020684194, -0.0065554776, -0.01266547, -0.010753313, 0.02016332, -0.018806305, -0.0072579724, -0.016818758, -0.013762048, -0.0081078205, -0.032952156, 0.01661315, -0.012219986, -0.011514064, 0.03169109, -0.024261778, 0.0005153058, -0.0007594656, -0.01818948, 0.026098546, 0.007648628, -0.0021006314, -0.005918092, 0.02143809, -0.017380754, -0.00031376682, -0.0059455065, 0.012219986, -0.0068604634, 0.004283506, -0.027291073, -0.030238125, 0.017750848, -0.019327179, -0.003810607, -0.021602577, 0.021465505, 0.036707934, 0.011801915, 0.004382883, -0.0028151202, 0.0036461202, -0.0018761756, -0.0021880148, -0.030046225, 0.015763301, 0.03563877, -0.0028408212, -0.006127127, 0.01971098, 0.018902255, -0.0025152748, -0.002325087, 0.020889802, 0.031142801, 0.028894817, -0.007429313, 0.0017313932, 0.011438674, -0.025509134, 0.005842702, -0.011856745, 0.025056796, 0.0007873084, 0.019546494, 0.014611895, -0.005088805, -0.011116555, -0.09907578, -0.04421949, 0.009972002, 0.0136935115, 0.015297256, 0.025015675, -0.005164195, 0.022959592, -0.012487276, 0.038709186, 0.0028562418, -0.021396969, -0.00061596814, 0.0077308714, 0.0115826, -0.00037137998, -0.027674876, -0.011555186, -0.022630619, 0.013638683, -0.013851145, -0.016873587, -0.010444901, -0.019217521, -8.918393e-07, 0.00072348415, -0.035254966, 0.028894817, 0.03662569, 0.007038657, 0.030238125, -0.02153404, 0.021301018, -0.038078655, 0.0019464251, 0.007991308, -0.018724062, 0.00628476, 
0.019930298, -0.028593259, -0.001396423, 0.0003814462, 0.015516572, -0.03001881, 0.010773874, -0.02213716, 0.00027500108, 0.0010991476, 0.012007524, -0.013241174, -0.013097248, 0.018710354, -0.0021211922, -0.014735261, 0.0070146695, -0.020862387, -0.014063607, 0.0059832013, -0.018737769, 0.004228677, 0.006229931, -0.019628737, -0.00041314415, 0.013556439, 0.022260524, 0.0019738395, -0.0149682835, -0.001852188, 0.004776966, -0.018614404, -0.0011445528, -0.012219986, -0.02681132, 0.0461385, -0.021136532, -0.0007084919, -0.019724688, -0.020204442, 0.01365239, -0.032869913, -0.0044308584, -0.030594513, 0.0014675291, -0.008190064, 0.012377619, -0.0052258773, -0.003896277, 0.0078062615, 0.0057124835, -0.034624435, 0.03328113, 0.0022394168, 0.025892938, -0.011925281, -0.025097918, -0.002141753, -0.011445528, -0.0019190107, 0.032020062, -0.01739446, -0.0038174605, -0.0042526647, -0.08059845, 0.021109117, -0.002631786, -0.0049071843, 0.0144337015, 0.0035673038, 0.015982617, -0.036762763, -0.0062402114, -0.0041361535, -0.022041209, 0.010760167, -0.0057810196, -0.010019978, -0.00223599, -0.024878602, 0.019532787, 0.005465754, 0.030621927, 0.016010031, 0.012761421, 0.011308456, 0.019286057, -0.001992687, -0.013028712, 0.00768975, -0.016654272, 0.0029367716, 0.0019464251, -0.020423757, 0.00803243, -0.006428686, -0.014419994, 0.04268428, -0.0003623846, -0.008190064, -0.0047975266, 0.0011676837, -0.00454737, 0.006805634, -0.0066582817, -0.01710661, 0.01788792, -0.018011287, -0.011013751, -0.012014378, -0.011246773, 0.011692258, 0.016476078, -0.013056126, 0.015955202, 0.025796987, -0.016325299, -0.017682312, -0.017983872, -0.054691803, 0.023987634, -0.0020166747, -0.0060311765, -0.016476078, -0.0011616868, 0.033198886, 0.015763301, -0.0074498737, 0.008251746, -0.008477915, -0.016489785, -0.015173892, 0.03234904, -0.019985126, 0.000744045, -0.021410676, 0.016791344, -0.015242428, -0.002912784, -0.0014058467, -0.004824941, -0.0035673038, -0.008320282, 0.025344647, 0.013076687, 
-0.004735844, -0.034130976, 0.017312218, 0.016832465, 0.017380754, -0.02508421, -0.00808726, 0.013522171, 0.012439301, 0.014707847, 0.017147731, 0.006517783, -0.0010854404, 0.013782609, 0.008512183, -0.009451128, -0.014378873, 0.010636802, 0.023891684, 0.01809353, -0.012946469, -0.014337751, -0.011644282, -0.0018453344, 0.012069207, 0.0038585821, -0.020478586, -0.011843038, 0.02208233, 0.022109745, 0.005753605, -0.005650801, 0.022904763, -0.02119136, 0.017462997, -0.0059283725, -0.008662962, -0.015585108, 0.035227552, 0.05249865, 0.007634921, 0.015489157, -0.012781982, 0.021026874, 0.013741488, 0.0053423885, -0.024330314, 0.018724062, -0.008450501, 0.008025576, -0.01824431, -0.014762675, -0.014173265, -0.020793851, -0.0004604769, 0.014214386, 0.020670487, -0.019656152, 0.072593436, -0.0074224593, -0.0040539103, 0.00272431, 0.006336162, 0.021013167, 0.006805634, 0.016681686, -0.019203814, -0.009848637, 0.012857372, 0.015077941, 0.011959549, -0.017929042, -0.009320909, -0.0033120068, -0.023192614, 0.008985083, -0.022603204, 0.0060003353, 0.025207575, 0.02445368, 0.008827449, -0.006007189, -0.027647462, -0.010602534, 0.011150823, -0.0067131105, -0.0045884917, -0.041286144, 0.019395715, -0.006212797, -0.053293668, -0.01912157, 0.018326553, -0.016530907, -0.011198798, 0.0027448707, 0.027784534, -0.0013390239, -0.024508508, 0.023754612, -0.021259896, -0.017257389, 0.022027502, -0.012103475, -0.013535879, -0.015667351, 0.0061511146 diff --git a/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/3d-vector.svg b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/3d-vector.svg new file mode 100644 index 000000000..5d01727c8 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/3d-vector.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/llm-rag-create-vector.svg 
b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/llm-rag-create-vector.svg new file mode 100644 index 000000000..ea6fba21d --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/llm-rag-create-vector.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/llm-rag-vector-process.svg b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/llm-rag-vector-process.svg new file mode 100644 index 000000000..9a8ffa067 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/llm-rag-vector-process.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/vector-distance.svg b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/vector-distance.svg new file mode 100644 index 000000000..a4f44b66d --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/images/vector-distance.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/lesson.adoc new file mode 100644 index 000000000..086ad7893 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/1-vectors/lesson.adoc @@ -0,0 +1,157 @@ += Vector RAG +:order: 2 +:type: lesson + +[.slide.discrete] +== Vector RAG +You previously learned about **Retrieval Augmented Generation** (RAG) and the role of retrievers in finding relevant information. + +One of the challenges of RAG is understanding what the user is asking for and finding the correct information to pass to the LLM. 
+ +In this lesson, you will learn about semantic search and how vector indexes can help you find relevant information from a user's question. + +[.slide] +== Semantic Search + +Semantic search aims to understand the intent and contextual meaning of a search phrase, rather than focusing on individual keywords. + +Traditional keyword search often depends on exact-match keywords or proximity-based algorithms that find similar words. + +For example, if you input "apple" in a traditional search, you might predominantly get results about the fruit. + +However, in a semantic search, the engine tries to gauge the context: are you searching for the fruit, the tech company, or something else? + +The results are tailored based on the term and the perceived intent. + +[.slide.col-2] +== Vectors + +[.col] +==== +You can represent data as **vectors** to perform semantic search. + +A vector is simply a list of numbers. +For example, the vector `[1, 2, 3]` is a list of three numbers and could represent a point in three-dimensional space. + +You can use vectors to represent many different types of data, including text, images, and audio. + +The number of dimensions in a vector is called its **dimensionality**. +A vector with three numbers has a dimensionality of 3. +Higher dimensionality captures more fine-grained meaning but is more expensive to compute; lower dimensionality is faster and cheaper but captures less nuance. +==== + +[.col] +image::images/3d-vector.svg["A diagram showing a 3d representation of the x,y,z coordinates 1,1,1 and 1,2,3", width=95%] + +[.slide] +== Embeddings + +When referring to vectors in the context of machine learning and NLP, the term "embedding" is typically used. +Embeddings are numerical representations of data objects, such as images, text, or audio, expressed as vectors. +This allows algorithms to compare two pieces of text by comparing their numerical representations. 
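To make that comparison concrete, here is a minimal sketch of measuring similarity between two embeddings using cosine similarity; the three-dimensional vectors are invented for illustration (real embeddings have hundreds or thousands of dimensions):

```python
import math

# Toy "embeddings" invented for illustration; real embedding models
# produce vectors with far more dimensions.
apple = [0.9, 0.1, 0.2]
fruit = [0.8, 0.2, 0.1]
laptop = [0.1, 0.9, 0.7]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "apple" points in a much closer direction to "fruit" than to "laptop".
print(cosine_similarity(apple, fruit))   # high similarity
print(cosine_similarity(apple, laptop))  # low similarity
```

The same idea, applied to real embedding vectors, is what a vector index uses to rank results by semantic closeness.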
+ +Each dimension in a vector can represent a particular semantic aspect of the word or phrase. +When multiple dimensions are combined, they can convey the overall meaning of the word or phrase. + +For example, the word "apple" might be represented by an embedding with the following dimensions: + +* fruit +* technology +* color +* taste +* shape + +When applied in a search context, the vector for "apple" can be compared to the vectors for other words or phrases to determine the most relevant results. + +You can create embeddings in various ways, but one of the most common methods is to use an *embedding model*. + +[.transcript-only] +==== +For example, the embedding for the word "apple" is `0.0077788467, -0.02306925, -0.007360777, -0.027743412, -0.0045747845, 0.01289164, -0.021863015, -0.008587573, 0.01892967, -0.029854324, -0.0027962727, 0.020108491, -0.004530236, 0.009129008,` ... and so on. + +[%collapsible] +.Reveal the completed embeddings for the word "apple"! +===== +[source] +---- +include::apple-embedding.adoc[] +---- +===== + +[NOTE] +.Embedding models +===== +OpenAI's `text-embedding-ada-002` embedding model created this embedding - a vector of 1,536 dimensions. + +LLM providers typically expose API endpoints that convert a _chunk_ of text into a vector embedding. +Depending on the provider, the shape and size of the vector may differ. +===== +==== + +[.slide.discrete] +== Embeddings + +While it is possible to create embeddings for individual words, embedding entire sentences or paragraphs is more common, as the meaning of a word can change based on its context. + +For example, the word _bank_ will have a different vector in _river bank_ than in _savings bank_. + +Semantic search systems can use these contextual embeddings to understand user intent. + +Embeddings can represent more than just text. +They can also represent entire documents, images, audio, or other data types. + +[.slide.col-2] +== How are vectors used in semantic search? 
+ +[.col] +==== +You can use the _distance_ or _angle_ between vectors to gauge the semantic similarity between words or phrases. + +Words with similar meanings or contexts will have vectors that are close together, while unrelated words will be farther apart. +==== + +[.col] +image::images/vector-distance.svg[A 3 dimensional chart illustrating the distance between vectors. The vectors are for the words "apple" and "fruit", width=90%] + +[.slide] +== RAG + +Semantic search is employed in vector-based RAG to find contextually relevant results for a user's question. + +An embedding model is used to create a vector representation of the source data. + +image::images/llm-rag-create-vector.svg["A diagram showing data being processed by an embedding model to create a vector representation of the data. The data is then stored in a vector index."] + +[.slide.discrete.col-2] +== RAG + +[.col] +==== +When a user submits a question, the system: + +. Creates an embedding of the question. +. Compares the question vector to the vectors of the indexed data. +. The results are scored based on their similarity. +. The most relevant results are used as context for the LLM. +==== + +[.col] +==== +image::images/llm-rag-vector-process.svg[A diagram showing a user question being processed by an embedding model to create a vector representation of the question. The question vector is then compared to the vectors of the indexed data. The most relevant results are used as context for the LLM.] +==== + +[.transcript-only] +==== +[TIP] +.Learn more about vectors and embeddings +===== +You can learn more about vectors, embeddings, and semantic search in the GraphAcademy course link:https://graphacademy.neo4j.com/courses/llm-vectors-unstructured/[Introduction to Vector Indexes and Unstructured Data^] +===== +==== + + +[.summary] +== Lesson Summary + +In this lesson, you learned about vectors and embeddings, and how they can be used in RAG to find relevant information. 
+

In the next lesson, you will use a vector index in Neo4j to find relevant data. diff --git a/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/2-vector-cypher-retriever/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/2-vector-cypher-retriever/lesson.adoc new file mode 100644 index 000000000..50481536d --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/2-vector-cypher-retriever/lesson.adoc @@ -0,0 +1,194 @@ = Vector + Cypher retriever
:type: lesson
:order: 2
:branch: new-workshop

[.slide]
== Overview

The chunks in the knowledge graph include vector embeddings that allow for similarity search based on vector distance.

In this lesson, you will create a vector retriever that uses these embeddings to find the most relevant chunks for a given query.

The retriever can then use the structured and unstructured data in the knowledge graph to provide additional context.

[.slide]
== Create the Vector Index

You will need to create a vector index on the `embedding` property of the `Chunk` nodes:

[source, cypher]
----
CREATE VECTOR INDEX chunkEmbedding IF NOT EXISTS
FOR (n:Chunk)
ON n.embedding
OPTIONS {indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}};
----

A vector index named `chunkEmbedding` will be created for nodes with a `Chunk` label, indexing the `embedding` property.
The index is configured to use 1536 dimensions (as produced by the `text-embedding-ada-002` embedding model) and cosine similarity for distance calculations.

[.slide]
== Search the Vector Index

You can search the vector index by creating an embedding for a search term:

[source, cypher]
----
WITH genai.vector.encode(
    "Retrieval Augmented Generation",
    "OpenAI",
    { token: "sk-..."
}) AS userEmbedding +CALL db.index.vector.queryNodes('chunkEmbedding', 5, userEmbedding) +YIELD node, score +RETURN node.text, score +---- + +[.transcript-only] +==== +[IMPORTANT] +.OpenAI token +===== +You will need to update the `$token` with your OpenAI API key. +===== +==== + +[.slide] +== Create a Vector + Cypher GraphRAG pipeline + +The `neo4j_graphrag` package includes a `VectorCypherRetriever` class that combines vector similarity search with Cypher retrieval. + +You can use this retriever to create a `GraphRAG` pipeline to: + +. Perform a vector similarity search to find the most relevant chunks for a given query. +. Use a Cypher query to add additional information to the context. +. Pass the context to an LLM to generate a response to the original query. + +[.slide-only] +==== +**Continue with the lesson to create the Vector + Cypher retriever.** +==== + +[.transcript-only] +=== Retriever + +Open `workshop-genai/vector_cypher_rag.py` and review the code: + +[source, python] +.vector_cypher_rag.py +---- +include::{repository-raw}/{branch}/workshop-genai/vector_cypher_rag.py[] +---- + +The retriever is configured to use the `chunkEmbedding` vector index you just created. + +[source, python] +.Retriever initialization +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/vector_cypher_rag.py[tag=retriever] +---- + +When you run the code: + +. The `VectorCypherRetriever` uses the vector index to find chunks similar to the query: ++ +_"Where can I learn more about knowledge graphs?"_ +. The `GraphRAG` pipeline passes the text from those chunks as context to the LLM. +. 
The response from the LLM is printed:
+
_You can learn more about knowledge graphs in the Neo4j blog post linked here: link:https://neo4j.com/blog/what-is-knowledge-graph[What Is a Knowledge Graph?^]_

You can print the context passed to the LLM by adding the following to the end of the code:

[source, python]
.Print context
----
include::{repository-raw}/{branch}/workshop-genai/solutions/vector_cypher_rag.py[tag=print_context]
----

[.transcript-only]
=== Retrieval Cypher Query

The `VectorCypherRetriever` also allows you to define a Cypher query to retrieve additional context from the knowledge graph.

Adding additional context can help the LLM generate more accurate responses.

Update the `retrieval_query` to add additional information about the lessons, technologies, and concepts related to the chunks:

[source, python]
.Enhanced retrieval query
----
include::{repository-raw}/{branch}/workshop-genai/solutions/vector_cypher_rag.py[tag=retrieval_query]
----

The retriever will execute the Cypher query, adding more context.

Running the code again for the same query, _"Where can I learn more about knowledge graphs?"_, will produce a more detailed response:

_You can learn more about knowledge graphs in the Neo4j blog post linked here: link:https://neo4j.com/blog/what-is-knowledge-graph[What Is a Knowledge Graph?^]. Additionally, you can explore further lessons on knowledge graphs on the GraphAcademy website, specifically in the course "GenAI Fundamentals," including the sections "What is a Knowledge Graph" and "Creating Knowledge Graphs."_

The retrieval query includes additional context relating to technologies and concepts mentioned in the chunks.

Experiment by asking different questions relating to the knowledge graph, such as _"What technologies and concepts support knowledge graphs?"_.
+

[.transcript-only]
=== Generalize entity retrieval

The retriever currently uses the knowledge graph to add additional context related to technologies and concepts.
The specific entities allow for targeted retrieval; however, you may also want to generalize the retrieval to include all related entities.

You can use the node labels and relationship types to create a response that includes details about the entities.

This Cypher query retrieves all related entities between the chunks:

[source, cypher]
.Related entities
----
MATCH (c:Chunk)<-[:FROM_CHUNK]-(entity)-[r]->(other)-[:FROM_CHUNK]->()
RETURN DISTINCT
    labels(entity)[2], entity.name, entity.type, entity.description,
    type(r),
    labels(other)[2], other.name, other.type, other.description
----

The output uses the node labels, properties, and relationship types to produce rows that form statements such as:

* `Concept` "Semantic Search" `RELATED_TO` `Technology` "Vector Indexes"
* `Technology` "Retrieval Augmented Generation" `HAS_CHALLENGE` "Understanding what the user is asking for and finding the correct information to pass to the LLM"

These statements can be used to create additional context for the LLM to generate responses.

Modify the `retrieval_query` to include all entities associated with the chunk:

[source, python]
.Enhanced retrieval query with all related entities
----
include::{repository-raw}/{branch}/workshop-genai/solutions/vector_cypher_rag.py[tag=advanced_retrieval_query]
----

[TIP]
.Format the context
====
The Cypher functions `reduce` and `coalesce` are used to format the associated entities into readable statements. The `reduce` function adds space characters between the values, and `coalesce` replaces null values with empty strings.
====

[.slide]
== Experiment

Experiment by running the code with different queries to see how the additional context changes the responses.
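As a rough Python analog of that statement formatting (the lesson itself does this in Cypher with `reduce` and `coalesce`; the rows below are made up for illustration), rows of labels, names, and relationship types can be joined into readable statements:

```python
# Hypothetical rows as returned by the related-entities query:
# (entity label, entity name, relationship type, other label, other name)
rows = [
    ("Concept", "Semantic Search", "RELATED_TO", "Technology", "Vector Indexes"),
    ("Technology", "Retrieval Augmented Generation", "HAS_CHALLENGE", None,
     "Understanding what the user is asking for"),
]

def to_statement(row):
    # Mirror coalesce(): replace missing values with empty strings,
    # then join the remaining parts with spaces, as reduce() does in the lesson.
    parts = [part if part is not None else "" for part in row]
    return " ".join(p for p in parts if p)

context = [to_statement(row) for row in rows]
print(context[0])  # Concept Semantic Search RELATED_TO Technology Vector Indexes
```

Each resulting line is a small natural-language fact the LLM can use as context.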
+

read::Continue[]

[.summary]
== Lesson Summary

In this lesson, you:

* Created a vector index on the `Chunk` nodes in the knowledge graph.
* Used the `VectorCypherRetriever` to perform vector similarity search and Cypher retrieval.
* Created a `GraphRAG` pipeline to generate responses with context from the knowledge graph.

In the next lesson, you will use the `Text2CypherRetriever` retriever to get information from the knowledge graph based on natural language questions. diff --git a/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/3-text-to-cypher-retriever/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/3-text-to-cypher-retriever/lesson.adoc new file mode 100644 index 000000000..28bbf70ff --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/3-retrieval/lessons/3-text-to-cypher-retriever/lesson.adoc @@ -0,0 +1,87 @@ = Text to Cypher retriever
:type: lesson
:order: 3
:branch: new-workshop

[.slide.discrete]
== Overview

The `Text2CypherRetriever` retriever allows you to create `GraphRAG` pipelines that can answer natural language questions by generating and executing Cypher queries against the knowledge graph.

Using text to Cypher retrieval can help you get precise information from the knowledge graph based on user questions. For example, how many lessons are in a course, what concepts are covered in a module, or how technologies relate to each other.

In this lesson, you will create a text to Cypher retriever and use it to answer questions about the data in the knowledge graph.

[.slide-only]
====
**Continue with the lesson to create the text to Cypher retriever.**
====

[.transcript-only]
=== Create a Text2CypherRetriever GraphRAG pipeline

Open `workshop-genai/text2cypher_rag.py` and review the code.
+ +[source, python] +---- +include::{repository-raw}/{branch}/workshop-genai/text2cypher_rag.py[] +---- + +The retriever is configured to use your database connection and given an example of how to query nodes by name: + +[source, python] +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/text2cypher_rag.py[tag=examples] + +include::{repository-raw}/{branch}/workshop-genai/solutions/text2cypher_rag.py[tag=retriever] +---- + +The response includes the Cypher statement that the LLM generated and the results from executing the query: + +[source, python] +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/text2cypher_rag.py[tag=print_response] +---- + +Running the code for the query, _"How many technologies are mentioned in the knowledge graph?"_, will produce a response similar to: + +_114 technologies are mentioned in the knowledge graph._ + +The context shows that the LLM generated and executed the following Cypher query: + +[source, cypher] +---- +MATCH (t:Technology) RETURN count(t) AS technologyCount +---- + +The following data was returned from the knowledge graph and passed as context to the LLM: + +[source, python] +---- +[RetrieverResultItem(content='', metadata=None)] +---- + +[.slide] +== Experiment with Different Questions + +The `Text2CypherRetriever` passed the graph schema to the LLM to help it generate accurate Cypher queries. + +Try asking different questions about the knowledge graph such as: + +* _How does Neo4j relate to other technologies?_ +* _What entities exist in the knowledge graph?_ +* _Which lessons cover Generative AI concepts?_ + +Review the responses, the generated Cypher queries, and the results passed to the LLM. + +read::Continue[] + +[.summary] +== Lesson Summary + +In this lesson, you: + +* Created a `GraphRAG` pipeline using the `Text2CypherRetriever`. +* Used natural language questions to generate and execute Cypher queries against the knowledge graph. 
+

In the next module, you will explore how to customize the `SimpleKGPipeline` to create knowledge graphs for different types of data, scenarios, and use cases. diff --git a/asciidoc/courses/workshop-genai2/modules/3-retrieval/module.adoc b/asciidoc/courses/workshop-genai2/modules/3-retrieval/module.adoc new file mode 100644 index 000000000..87cd37b7d --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/3-retrieval/module.adoc @@ -0,0 +1,17 @@ = Retrieval
:order: 3


== Module Overview

In this module, you will learn:

* How to use retrieval techniques to find relevant information from a knowledge graph.
* How vector, vector + Cypher, and text to Cypher retrievers can be used to get relevant data from a knowledge graph.

[TIP]
The link:https://graphacademy.neo4j.com/courses/genai-fundamentals[GraphAcademy Neo4j and Generative AI course^] includes more information on retrievers and how to use them with graphs.

If you are ready, let's get going!

link:./1-vectors/[Ready? Let's go →, role=btn] diff --git a/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/1-what-is-an-agent/images/agent-process.svg b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/1-what-is-an-agent/images/agent-process.svg new file mode 100644 index 000000000..24961a411 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/1-what-is-an-agent/images/agent-process.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/1-what-is-an-agent/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/1-what-is-an-agent/lesson.adoc new file mode 100644 index 000000000..4616c18ae --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/1-what-is-an-agent/lesson.adoc @@ -0,0 +1,58 @@ = What is an Agent?
+:order: 1
:type: lesson

[.slide.discrete]
== Definition

An **AI agent** is a system that combines a Large Language Model (LLM) with the ability to take actions in the real world.

Unlike a simple chatbot that can only respond with text, an agent can interact with external systems, retrieve information, and execute tasks autonomously.

[.slide]
== Key Components

**LLM**: The agent uses an LLM for reasoning, planning, and decision-making.

**Tools**: Agents have access to a set of _tools_ that extend their capabilities beyond text generation. Tools are typically functions, such as retrievers, that can:

* Access databases
* Make API calls to external services
* Perform file system operations
* Search the web
* Execute code

**Decision Making**: The agent uses the LLM to analyze tasks, determine which tools are needed, and coordinate their use to achieve goals.

[.slide.col-2]
== How Agents Work

[.col]
====
1. **Receive a task** or query from a user
2. **Plan and reason** using the LLM to break down complex tasks
3. **Select and execute tools** based on what's needed
4. **Observe and process results** and determine next steps
5. **Iterate** until the task is complete or provide a final response

This autonomous capability makes agents particularly powerful for complex, multi-step tasks that require both reasoning and real-world interaction.
====

[.col]
image::./images/agent-process.svg["A flowchart showing the agent process from user input to LLM reasoning, tool selection, tool execution, and final output.", width=90%]

[.slide]
== Creating an Agent

You will use Neo4j, Python, and link:https://www.langchain.com/[LangChain^] to build an agent that will use the retrievers you've already learned about as tools.

You will progressively add more tools to the agent, starting with a simple schema introspection tool, then adding document retrieval, and finally text-to-Cypher capabilities.
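The agent loop described above can be sketched in a few lines of plain Python. This is an illustrative toy only: keyword overlap stands in for LLM reasoning, and the tool names and outputs are invented:

```python
def get_schema(_query: str) -> str:
    # Stand-in for a real schema introspection tool.
    return "Nodes: Chunk, Technology, Concept. Relationships: RELATED_TO, FROM_CHUNK."

def search_lessons(query: str) -> str:
    # Stand-in for a real document retrieval tool.
    return f"Lesson chunks matching '{query}'"

# Tools are plain functions paired with a description the "reasoning" step can match on.
tools = {
    "get_schema": (get_schema, "schema structure entities"),
    "search_lessons": (search_lessons, "lesson content topics"),
}

def run_agent(task: str) -> str:
    # 1. Receive a task.
    # 2. "Plan": pick the tool whose description overlaps the task the most.
    words = set(task.lower().split())
    name = max(tools, key=lambda n: len(words & set(tools[n][1].split())))
    tool, _description = tools[name]
    # 3. Execute the selected tool, 4. observe the result, 5. respond.
    result = tool(task)
    return f"[{name}] {result}"

print(run_agent("what entities are in the schema"))
```

A real agent replaces the keyword matching with LLM reasoning and may loop through several tool calls before producing a final answer.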
+

read::Continue[]

[.summary]
== Lesson Summary

In this lesson, you learned how agents use reasoning and tools to perform complex tasks autonomously.

In the next lesson, you will review a simple Python agent that uses LangChain to interact with Neo4j. diff --git a/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/2-langchain-agent/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/2-langchain-agent/lesson.adoc new file mode 100644 index 000000000..75ab9d0ab --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/2-langchain-agent/lesson.adoc @@ -0,0 +1,111 @@ = LangChain Agent
:order: 2
:type: challenge
:branch: new-workshop

[.slide]
== Overview

You will update a LangChain agent, adding a set of tools, to interact with Neo4j.

In this lesson, you will:

* Review the agent code
* Investigate how the agent works
* Experiment with different queries

[.slide-only]
====
**Continue with the lesson to review the agent code.**
====

[.transcript-only]
=== Agent

Open `workshop-genai/agent.py`.

[source, python]
.agent.py
----
include::{repository-raw}/{branch}/workshop-genai/agent.py[]
----

Review the code and try to answer the following questions:

. What is the agent's function?
. What do you think the response to the `query` will be?
. How could you extend the agent?

Run the agent to see what it does.

[.transcript-only]
=== Review

This program is a LangChain agent that uses a Neo4j database.
The agent has access to a single tool, which retrieves the database schema.
The agent uses this tool to answer questions about the database structure.

The code:

. Creates an `OpenAI` LLM `model`.
+
[source, python]
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent.py[tag=model]
----
. Connects to your Neo4j database.
+
[source, python]
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent.py[tag=driver]
----
. Defines a `Get-graph-database-schema` tool.
+
[source, python]
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent.py[tag=tools]
----
+
The tool uses the Neo4j driver to get the database schema and return it as a string.
+
[NOTE]
.Determine what tool to use
====
The agent will use the tool's name (`Get-graph-database-schema`) and docstring (`Get the schema of the graph database.`) to determine whether it should execute the tool to resolve a user's query.
====
. Creates a ReAct (Reasoning and Acting) agent using the `model` and `tools`.
+
[source, python]
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent.py[tag=agent]
----
. Runs the agent, passing the `query`, and streams the response.
+
[source, python]
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent.py[tag=run]
----
+
When you run the agent, you will see:

** The messages between `Human`, `AI`, and `Tool`
** The context of the database schema
** The agent's final response

[.slide]
== Experiment

Experiment with the agent; modify the `query` to ask different questions, for example:

* `"What questions can I answer using this graph database?"`
* `"How are concepts related to other entities?"`
* `"How does the graph model relate technologies to benefits?"`

read::Continue[]

[.summary]
== Lesson Summary

In this lesson, you reviewed an agent that can answer questions about a Neo4j database schema.

In the next lesson, you will add a vector + Cypher tool to the agent to enable semantic search capabilities.
diff --git a/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/3-agent-search-lesson/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/3-agent-search-lesson/lesson.adoc new file mode 100644 index 000000000..d58d157d4 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/3-agent-search-lesson/lesson.adoc @@ -0,0 +1,126 @@ = Search lesson content
:order: 3
:type: challenge
:branch: new-workshop

[.slide.discrete]
== Overview

In this lesson, you will enhance your agent by adding a search lesson tool using a vector + Cypher retriever.

The agent will decide which tool is best for each question:

**Schema Tool** - to understand the database structure:

- "What entities exist in the graph?"
- "How are technologies related to concepts?"

**Search Lesson Tool** - for finding content within the lessons:

- "What are the benefits of using GraphRAG?"
- "How are Knowledge Graphs associated with other technologies?"

[.slide]
== Search lessons tool

You will modify the `agent.py` code to:

. Create a `VectorCypherRetriever` retriever that uses the `Chunk` vector index.
. Define a new `tool` function that uses this retriever to search for lesson content.
. Add the new tool to the agent's list of available tools.

[.slide-only]
====
**Continue with the lesson to create the search lessons tool.**
====

[.transcript-only]
=== Update the agent

Open `workshop-genai/agent.py` and make the following changes:

. Add an embedding model for the retriever to convert the user query into a vector.
+
[source,python]
.embedder
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent_vector.py[tag=import_embedder]

include::{repository-raw}/{branch}/workshop-genai/solutions/agent_vector.py[tag=embedder]
----
. Create a `retrieval_query` that the retriever will use to add additional context to the vector search results.
++ +[source,python] +.retrieval_query +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/agent_vector.py[tag=retrieval_query] +---- +. Create a `VectorCypherRetriever` using the `chunkEmbedding` index, Neo4j `driver`, and `embedder`. ++ +[source,python] +.retriever +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/agent_vector.py[tag=import_retriever] + +include::{repository-raw}/{branch}/workshop-genai/solutions/agent_vector.py[tag=retriever] +---- +. Define a tool function to search for lesson content using the retriever. ++ +[source,python] +.Search-lesson-content tool +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/agent_vector.py[tag=search_lessons] +---- ++ +[NOTE] +.Tool description +==== +The tool name `search-lesson-content` and docstring `Search for lesson content related to the query.` help the agent decide when to use this tool. +==== +. Update the `tools` list to include the new lesson search tool. ++ +[source,python] +.tools +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/agent_vector.py[tag=tools] +---- +. Modify the `query` variable to test the new lesson search tool. ++ +[source,python] +.query +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/agent_vector.py[tag=query] +---- + +[%collapsible] +.Reveal the complete code +==== +[source, python] +---- +include::{repository-raw}/{branch}/workshop-genai/solutions/agent_vector.py[tag=**;!example_queries] +---- +==== + +Run the agent. The agent should decide to use the new tool based on the query. 
+

[.slide]
== Experiment

Experiment with the agent; modify the `query` to ask different questions, for example:

* `"How are Knowledge Graphs associated with other technologies?"`
* `"Summarize what concepts are associated with Knowledge Graphs?"`
* `"How would you minimize hallucinations in LLMs?"`

Asking questions related to the graph schema should still use the schema tool, for example:

* `"What entities exist in the graph?"`

read::Continue[]

[.summary]
== Lesson Summary

In this lesson, you modified the agent to use an additional tool for searching lesson content using a vector + Cypher retriever.

In the next lesson, you will add a tool that can query the database directly using a text to Cypher retriever. diff --git a/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/4-agent-query-db/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/4-agent-query-db/lesson.adoc new file mode 100644 index 000000000..822abbdee --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/4-agent-query-db/lesson.adoc @@ -0,0 +1,127 @@ = Query database
:order: 4
:type: challenge
:branch: new-workshop

[.slide.discrete]
== Overview

In this lesson, you will add a tool to your agent so it can query the database directly.

The tool will use a `Text2CypherRetriever` retriever to convert user queries into Cypher statements and return the results as context.

Text to Cypher tools are useful for "catch all" scenarios where the user may ask a question not covered by other tools, such as:

* Finding specific data in the graph.
* Performing aggregations or calculations.
* Exploring relationships between entities.

[.slide]
== Query database tool

You will modify the `agent.py` code to:

. Create a `Text2CypherRetriever` retriever that uses an `llm` to convert user queries into Cypher.
. Define a new `tool` function that uses this retriever to query the database.
.
Add the new tool to the agent's list of available tools.

[.slide-only]
====
**Continue with the lesson to create the text to Cypher retriever.**
====

[.transcript-only]
=== Modify the agent

Open `workshop-genai/agent.py` and make the following changes:

. Instantiate an `llm` that will be used to generate the Cypher.
+
[source,python]
.Cypher generating llm
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent_text2cypher.py[tag=import_llm]

include::{repository-raw}/{branch}/workshop-genai/solutions/agent_text2cypher.py[tag=llm]
----
. Create a `Text2CypherRetriever` using the `llm` and Neo4j `driver`.
+
[source,python]
.Text2CypherRetriever
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent_text2cypher.py[tag=import_retriever]

include::{repository-raw}/{branch}/workshop-genai/solutions/agent_text2cypher.py[tag=retriever]
----
. Define a tool function to query the database using the retriever.
+
[source,python]
.Query-database tool
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent_text2cypher.py[tag=query_database]
----
. Update the `tools` list to include the new database query tool.
+
[source,python]
.tools
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent_text2cypher.py[tag=tools]
----
. Modify the `query` variable to test the new database query tool.
+
[source,python]
.query
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent_text2cypher.py[tag=query]
----

[%collapsible]
.Reveal the complete code
====
[source, python]
----
include::{repository-raw}/{branch}/workshop-genai/solutions/agent_text2cypher.py[tag=**;!example_queries]
----
====

Run the agent. The agent should use the new database query tool to answer the question.

You can see the generated Cypher query in the tool context's metadata.
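To make "text to Cypher" concrete, here is a deliberately naive, self-contained sketch. The real `Text2CypherRetriever` generates Cypher with an LLM and the graph schema, not hard-coded templates; the patterns and label names below are assumptions for illustration:

```python
import re

def singular(word: str) -> str:
    """Naive singularization: 'technologies' -> 'technology', 'lessons' -> 'lesson'."""
    if word.endswith("ies"):
        return word[:-3] + "y"
    if word.endswith("s"):
        return word[:-1]
    return word

# Toy templates mapping question shapes to Cypher -- a real text to Cypher
# retriever would generate these with an LLM instead.
TEMPLATES = [
    (re.compile(r"how many (\w+)", re.IGNORECASE),
     lambda m: f"MATCH (n:{singular(m.group(1)).title()}) RETURN count(n)"),
    (re.compile(r"list (?:all )?(\w+)", re.IGNORECASE),
     lambda m: f"MATCH (n:{singular(m.group(1)).title()}) RETURN n.name"),
]

def text_to_cypher(question: str) -> str:
    for pattern, build in TEMPLATES:
        match = pattern.search(question)
        if match:
            return build(match)
    return "// no template matched"

print(text_to_cypher("How many technologies are mentioned?"))
# MATCH (n:Technology) RETURN count(n)
```

The "no template matched" fallback is exactly the gap the LLM-backed retriever closes: it can generate a query for question shapes no one anticipated.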
+

[.slide]
== Experiment

Experiment with the agent; modify the `query` to ask different questions, for example:

* `"Each lesson is part of a module. How many lessons are in each module?"`
* `"Search the graph and return a list of challenges."`
* `"What benefits are associated with the technologies described in the knowledge graph?"`

Asking questions related to the graph schema or lesson content should still use the other tools, for example:

* `"What entities exist in the graph?"`
* `"What are the benefits of using GraphRAG?"`

You may find that the agent will execute multiple tools to answer some questions.

[.transcript-only]
====
[TIP]
.Specific tools
=====
You can create multiple Text to Cypher tools that are specialized for different types of queries.

For example, you could create one tool for querying lessons and another for querying technologies.

Each tool could have different prompt templates or examples to help the LLM generate more accurate Cypher for specific domains.
=====
====

read::Continue[]

[.summary]
== Lesson Summary

In this lesson, you added a query database tool to your agent using a text to Cypher retriever.

In the next optional challenge, you will create your own agent with a custom set of tools. \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/5-create-an-agent/lesson.adoc b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/5-create-an-agent/lesson.adoc new file mode 100644 index 000000000..fc73e86f2 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/4-agents/lessons/5-create-an-agent/lesson.adoc @@ -0,0 +1,29 @@ = Create an agent
:type: challenge
:order: 5

[.slide.discrete]
== Overview

In this optional challenge, you will apply what you have learned to create an agent with a custom set of tools.

You should:

.
Use either the lesson knowledge graph or link:../../2-knowledge-graph-construction/7-create-a-kg[your own knowledge graph^] from the previous challenge.
. Create an agent using the example code from this workshop.
. Define a set of tools that the agent can use to answer user queries based on the knowledge graph. These could include:
** Cypher query tools that run Cypher queries against the graph and return the results.
** Vector + Cypher retrievers to semantically search the knowledge graph.
** Text to Cypher tools to query specific data from the graph.
. Test the agent with different user queries to see how it uses the tools to provide answers.

You can come back to this challenge at any time.

read::Complete[]

[.summary]
== Lesson Summary

In this lesson, you applied what you have learned throughout the workshop to create your own agent to interact with a knowledge graph.

Congratulations on completing the workshop. \ No newline at end of file diff --git a/asciidoc/courses/workshop-genai2/modules/4-agents/module.adoc b/asciidoc/courses/workshop-genai2/modules/4-agents/module.adoc new file mode 100644 index 000000000..32943fb05 --- /dev/null +++ b/asciidoc/courses/workshop-genai2/modules/4-agents/module.adoc @@ -0,0 +1,11 @@ = Agents
:order: 4

In this module, you will:

* Learn about agents and their capabilities.
* Review how agents process user input and make decisions.
* Create a simple agent using Neo4j, Python, and LangChain.
* Integrate schema, vector + Cypher, and text to Cypher retrievers into your agent.

link:./1-what-is-an-agent/[Ready? Let's go →, role=btn] diff --git a/asciidoc/courses/workshop-genai2/summary.adoc b/asciidoc/courses/workshop-genai2/summary.adoc new file mode 100644 index 000000000..213bc9a6b --- /dev/null +++ b/asciidoc/courses/workshop-genai2/summary.adoc @@ -0,0 +1,19 @@ = Course Summary

Congratulations on completing the Neo4j and Generative AI workshop.
+

You have learned:

* The basics of Generative AI and Large Language Models (LLMs)
* What Retrieval-Augmented Generation (RAG) is and why it is important
* How GraphRAG can improve the quality of LLM-generated content
* How to construct knowledge graphs from structured and unstructured data
* How to use vectors in Neo4j for similarity search
* How to build different types of retrievers using the `neo4j-graphrag` Python package
* How to create a simple agent using Neo4j, Python, and LangChain

Continue your learning with the following resources:

* link:https://graphacademy.neo4j.com[GraphAcademy^] - Free online training for Neo4j
* link:https://graphacademy.neo4j.com/courses/genai-fundamentals[Neo4j and Generative AI Fundamentals^] - Learn how Neo4j and GraphRAG can support your Generative AI projects
* link:https://graphacademy.neo4j.com/courses/genai-integration-langchain[Using Neo4j with LangChain^] - Learn how to use Neo4j in your GenAI applications with LangChain