DEV-185072: introduce context specification MCP tools #77
martin-anderson-collibra wants to merge 8 commits into
✅ Snyk checks have passed. No issues have been found so far.
Hey @martin-anderson-collibra, thank you for contributing.

@nevers thanks, I added a new skill. Let me know if you see any issues with it.
```go
return &chip.Tool[Input, Output]{
	Name:        "get_context",
	Title:       "Get Context",
	Description: "Context generation execution tool. Executes a Context Specification against a specific asset and returns the generated context as structured YAML. This is the final step in the context workflow: use list_context_specifications to discover specs, optionally use get_context_specification to inspect the mapping, then call this tool to generate the context.",
	// ... (remaining fields not shown in the review excerpt)
}
```
@martin-anderson-collibra, any reason this couldn't live as a functional parameter for get_asset_details? Context is a very broad term, and I worry an LLM will get confused figuring out when to use these tools vs. others.
I've discussed this with our PM, and we agree that it doesn't make sense to combine it with get_asset_details, but we came up with a more specific name: get_asset_context_from_specification. Is that okay?
We also have a new skill describing in more detail how these tools should be used, so that will hopefully help as well.
I worry about tool sprawl when an LLM is solving user problems. What is the difference between details and context? Before releasing impactful tools such as this one, I would prefer that performance tests be conducted so we understand how the tools behave. Can you explain a bit more of the reasoning for the separation?

As it stands, I still believe this should be an available option on get_asset_details. Happy to chat with you and/or your PM to work through this, and I can be convinced if performance testing bears out the approach of having multiple tools of this nature.
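To make the reviewer's alternative concrete, here is a minimal sketch of what a functional parameter on get_asset_details might look like. `GetAssetDetailsInput`, its `contextSpecificationId` field, and the `detailsMode` helper are hypothetical names chosen for illustration; they are not part of chip's actual tool schema.

```go
package main

import (
	"errors"
	"fmt"
)

// GetAssetDetailsInput is a hypothetical input for get_asset_details that
// accepts an optional Context Specification ID (names are assumptions).
type GetAssetDetailsInput struct {
	AssetID                string `json:"assetId"`
	ContextSpecificationID string `json:"contextSpecificationId,omitempty"`
}

// detailsMode reports which behavior a single combined tool would choose.
func detailsMode(in GetAssetDetailsInput) (string, error) {
	if in.AssetID == "" {
		return "", errors.New("assetId is required")
	}
	if in.ContextSpecificationID != "" {
		// Hypothetical: run the specification against the asset.
		return "context", nil
	}
	// Default: return the asset's standard details.
	return "details", nil
}

func main() {
	for _, in := range []GetAssetDetailsInput{
		{AssetID: "asset-1"},
		{AssetID: "asset-1", ContextSpecificationID: "spec-1"},
	} {
		mode, err := detailsMode(in)
		fmt.Println(mode, err)
	}
}
```

With this shape, one tool covers both cases and the LLM only decides whether to pass a specification ID; whether that is easier for a model than choosing between two tools is exactly what the requested performance testing would need to show.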
```go
return &chip.Tool[Input, Output]{
	Name:        "get_context_specification",
	Title:       "Get Context Specification",
	Description: "Inspection tool. Returns the full Context Specification including the mappingYaml configuration, so you can understand which fields and metrics will be populated before executing get_context. Use this after list_context_specifications to examine a specific spec before running it against an asset.",
	// ... (remaining fields not shown in the review excerpt)
}
```
@martin-anderson-collibra this should be more specific. What is a context specification in the context of Collibra? What types of fields and metrics?
I've updated the description to be more specific.
```go
return &chip.Tool[Input, Output]{
	Name:        "list_context_specifications",
	Title:       "List Context Specifications",
	Description: "Primary discovery tool for Context Specifications. Returns all Context Specifications applicable to a given asset or asset type. Use assetId to find specs that match the type of a specific asset, or assetTypePublicId to filter by a known asset type. Call this first to discover which Context Specifications are available before calling get_context.",
	// ... (remaining fields not shown in the review excerpt)
}
```
@martin-anderson-collibra, same comment: please be specific about what a 'Context Specification' is.
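The `list_context_specifications` description quoted above says the tool returns all applicable specifications, filtered by `assetId` or `assetTypePublicId`, and the PR description notes that `semantic_blueprint_client.go` fetches them in pages. A minimal sketch of that combination follows. It assumes exactly one filter must be supplied and that the API uses offset/limit paging with a reported total; `ListInput`, `Page`, `fetchPageFunc`, and `listAll` are hypothetical names, not the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// ListInput mirrors the two filters named in the tool description.
// Requiring exactly one of them is an assumption of this sketch.
type ListInput struct {
	AssetID           string `json:"assetId,omitempty"`
	AssetTypePublicID string `json:"assetTypePublicId,omitempty"`
}

// Validate rejects inputs that set neither or both filters.
func (in ListInput) Validate() error {
	if (in.AssetID == "") == (in.AssetTypePublicID == "") {
		return errors.New("provide exactly one of assetId or assetTypePublicId")
	}
	return nil
}

// Spec and Page are hypothetical shapes for one specification and one page.
type Spec struct {
	ID   string
	Name string
}

type Page struct {
	Results []Spec
	Total   int // total matches reported by the API (assumed)
}

// fetchPageFunc stands in for one HTTP call to the Semantic Blueprint API.
type fetchPageFunc func(in ListInput, offset, limit int) (Page, error)

// listAll requests pages until it has collected every reported result.
func listAll(in ListInput, limit int, fetch fetchPageFunc) ([]Spec, error) {
	if err := in.Validate(); err != nil {
		return nil, err
	}
	if limit <= 0 {
		return nil, errors.New("limit must be positive")
	}
	var all []Spec
	for offset := 0; ; offset += limit {
		page, err := fetch(in, offset, limit)
		if err != nil {
			return nil, err
		}
		all = append(all, page.Results...)
		// Stop on an empty page or once the reported total is reached.
		if len(page.Results) == 0 || len(all) >= page.Total {
			return all, nil
		}
	}
}

func main() {
	data := []Spec{{"s1", "A"}, {"s2", "B"}, {"s3", "C"}, {"s4", "D"}, {"s5", "E"}}
	// fake slices the in-memory data the way a paged endpoint would.
	fake := func(_ ListInput, offset, limit int) (Page, error) {
		if offset > len(data) {
			offset = len(data)
		}
		end := offset + limit
		if end > len(data) {
			end = len(data)
		}
		return Page{Results: data[offset:end], Total: len(data)}, nil
	}
	specs, err := listAll(ListInput{AssetID: "asset-1"}, 2, fake)
	fmt.Println(len(specs), err)
}
```

Validating the filters before the first request keeps an LLM's malformed call from turning into an unfiltered listing.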
| Tool | Purpose | Returns | When to use |
|---|---|---|---|
| `listContextSpecifications` | Discover which Context Specifications (Knowledge Graph blueprints) are available for an asset or asset type | List of spec names, descriptions, and IDs | Always, entry point |
| `getContextSpecification` | Inspect a spec's blueprint: which relations it defines, what fields it extracts from the Knowledge Graph, what transforms it applies | Complete YAML mapping and spec metadata | Optional, only when user asks what a spec covers |
| `getContext` | Execute a spec's blueprint against an asset to extract and shape its governed metadata subset | Structured metadata (JSON, YAML, etc.) shaped for the target system | Always, output step |
Is getContext supposed to be the renamed get_asset_context_from_specification here?
Indeed, looks like I missed those. Fixed, and applied a few other changes suggested by AI.
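The skill table quoted in this thread describes a three-step flow: discovery always, inspection only on request, generation always. The sketch below walks through that flow against a hypothetical Go interface whose method names mirror the table as quoted; `ContextTools`, `runWorkflow`, and the `fakeTools` stand-in are illustrative assumptions, not chip's API, and the fake returns made-up data so the sketch runs offline.

```go
package main

import "fmt"

// SpecSummary is what the discovery step returns: name, description, and ID.
type SpecSummary struct {
	ID, Name, Description string
}

// ContextTools is a hypothetical Go view of the three tools in the table.
type ContextTools interface {
	ListContextSpecifications(assetID string) ([]SpecSummary, error)
	GetContextSpecification(specID string) (mappingYAML string, err error)
	GetContext(specID, assetID string) (string, error)
}

// runWorkflow: discover (always), inspect (only if asked), generate (always).
func runWorkflow(t ContextTools, assetID, specName string, inspect bool) (string, error) {
	specs, err := t.ListContextSpecifications(assetID)
	if err != nil {
		return "", err
	}
	var chosen *SpecSummary
	for i := range specs {
		if specs[i].Name == specName {
			chosen = &specs[i]
			break
		}
	}
	if chosen == nil {
		return "", fmt.Errorf("no specification %q applies to asset %s", specName, assetID)
	}
	if inspect {
		mapping, err := t.GetContextSpecification(chosen.ID)
		if err != nil {
			return "", err
		}
		fmt.Println("mapping:", mapping)
	}
	return t.GetContext(chosen.ID, assetID)
}

// fakeTools is an in-memory stand-in so the sketch runs without Collibra.
type fakeTools struct{ inspected bool }

func (f *fakeTools) ListContextSpecifications(string) ([]SpecSummary, error) {
	return []SpecSummary{{ID: "spec-1", Name: "snowflake-export", Description: "example"}}, nil
}

func (f *fakeTools) GetContextSpecification(string) (string, error) {
	f.inspected = true
	return "relations: []", nil
}

func (f *fakeTools) GetContext(specID, assetID string) (string, error) {
	return "asset: " + assetID + "\nspec: " + specID + "\n", nil
}

func main() {
	out, err := runWorkflow(&fakeTools{}, "asset-42", "snowflake-export", true)
	fmt.Print(out)
	if err != nil {
		fmt.Println(err)
	}
}
```

Keeping inspection optional matches the table's "only when user asks" guidance, so the common path costs two tool calls rather than three.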
Branch updated from c5b057f to b604dc6.
```go
return &chip.Tool[Input, Output]{
	Name:        "list_context_specifications",
	Title:       "List Context Specifications",
	Description: "Retrieve a list of available Context Specifications. A Context Specification defines how to extract governed metadata from Collibra. Starting from an asset (e.g., a Data Product), it specifies which relations to traverse, what fields to pull (name, status, description), and what shape to return for a target system (Snowflake, Databricks, custom for AI agents). Use this to discover which Contexts are available for querying metadata about specific asset or asset type.",
	// ... (remaining fields not shown in the review excerpt)
}
```
An LLM may struggle to tell how this extraction differs from keyword search, get_asset_details, or some other mechanism for getting metadata.
Specifically, I would be careful with this sentence, even though there is more detail after it; please be very specific:

> A Context Specification defines how to extract governed metadata from Collibra
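The updated description names three ingredients: a starting asset (for example a Data Product), the relations to traverse, and the fields to pull. As a rough illustration only, the sketch below applies a toy specification built from those ingredients to an in-memory graph. `ContextSpec`, `Asset`, and `pick` are hypothetical and far simpler than Collibra's actual mapping YAML; traversal is limited to one hop, and target-specific shaping (Snowflake, Databricks, and so on) is omitted.

```go
package main

import "fmt"

// Asset is a node in a toy in-memory knowledge graph.
type Asset struct {
	Type      string
	Attrs     map[string]string   // e.g. name, status, description
	Relations map[string][]*Asset // relation name -> related assets
}

// ContextSpec is a hypothetical, simplified model of a Context Specification.
type ContextSpec struct {
	StartType string   // asset type the spec applies to, e.g. "Data Product"
	Traverse  []string // relations to follow from the start asset (one hop)
	Fields    []string // attributes to keep on every visited asset
}

// Apply extracts the selected fields from the start asset and its related
// assets, producing a nested map a serializer could render as YAML or JSON.
func (s ContextSpec) Apply(a *Asset) (map[string]any, error) {
	if a.Type != s.StartType {
		return nil, fmt.Errorf("spec applies to %q, not %q", s.StartType, a.Type)
	}
	out := pick(a, s.Fields)
	for _, rel := range s.Traverse {
		var related []map[string]any
		for _, t := range a.Relations[rel] {
			related = append(related, pick(t, s.Fields))
		}
		out[rel] = related
	}
	return out, nil
}

// pick copies only the requested attributes that the asset actually has.
func pick(a *Asset, fields []string) map[string]any {
	m := map[string]any{}
	for _, f := range fields {
		if v, ok := a.Attrs[f]; ok {
			m[f] = v
		}
	}
	return m
}

func main() {
	table := &Asset{Type: "Table", Attrs: map[string]string{"name": "orders", "status": "Approved", "owner": "x"}}
	product := &Asset{
		Type:      "Data Product",
		Attrs:     map[string]string{"name": "Sales", "status": "Candidate"},
		Relations: map[string][]*Asset{"contains": {table}},
	}
	spec := ContextSpec{StartType: "Data Product", Traverse: []string{"contains"}, Fields: []string{"name", "status", "description"}}
	ctx, err := spec.Apply(product)
	fmt.Println(ctx, err)
}
```

In this toy model, what separates a specification from a generic details lookup is that the traversal and field selection are fixed up front, so every run of the same spec yields the same shape; wording along those lines may help an LLM distinguish the tools.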
🎯 What does this PR do?
This PR introduces a new experimental feature gate, `context-specifications`, which adds three new Model Context Protocol (MCP) tools. These tools integrate with Collibra's Semantic Blueprint and Context Engine APIs to allow LLMs to discover, inspect, and generate structured YAML context for assets.

🚀 Key Changes
- **New experimental feature gate:** Added `context-specifications` to `knownExperimentalFeatures` in `cmd/chip/experimental.go`, so these tools are not exposed unless the user explicitly opts in.
- **API client implementations (`pkg/clients`):**
  - `semantic_blueprint_client.go`: Handles fetching paged context specifications (`ListContextSpecifications`) and single specification details (`GetContextSpecification`).
  - `context_engine_client.go`: Handles context generation (`GenerateContext`), returning either raw `application/yaml` or the full JSON envelope (`application/json`) depending on the `includeMetadata` flag.
  - `error.go`: Introduces a specialized `executeCollibraRequest` error wrapper that extracts machine-readable error codes and human-readable user messages from the `collibraStandardError` envelope, helping downstream LLMs understand exact API failures.
- **New MCP tools:**
  - `list_context_specifications`: Primary discovery tool, used to find specs matching an asset ID or asset type.
  - `get_context_specification`: Inspection tool that fetches a specification's blueprint mapping YAML configuration.
  - `get_context`: Execution tool that processes the specification against an asset and surfaces structured semantic context.
- Updated `README.md` to track and document the new experimental context tools.