diff --git a/supporting-blog-content/elasticsearch-chatgpt-connector/elasticsearch-mcp-server-for-chatgpt.ipynb b/supporting-blog-content/elasticsearch-chatgpt-connector/elasticsearch-mcp-server-for-chatgpt.ipynb new file mode 100644 index 00000000..ca5a08cc --- /dev/null +++ b/supporting-blog-content/elasticsearch-chatgpt-connector/elasticsearch-mcp-server-for-chatgpt.ipynb @@ -0,0 +1,1180 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "tvOHZz8TX_cp" + }, + "source": [ + "# Elasticsearch MCP Server for ChatGPT\n", + "\n", + "This notebook demonstrates how to deploy an MCP (Model Context Protocol) server that connects ChatGPT to Elasticsearch, enabling natural language queries over internal GitHub issues and pull requests.\n", + "\n", + "## What You'll Build\n", + "An MCP server that allows ChatGPT to search and retrieve information from your Elasticsearch index using natural language queries, combining semantic and keyword search for optimal results.\n", + "\n", + "## Steps\n", + "- **Install Dependencies**: Set up required Python packages (fastmcp, elasticsearch, pyngrok, pandas)\n", + "- **Configure Environment**: Set up Elasticsearch credentials and ngrok token\n", + "- **Initialize Elasticsearch**: Connect to your Elasticsearch cluster\n", + "- **Create Index**: Define mappings with semantic_text field for ELSER\n", + "- **Load Sample Data**: Import GitHub issues/PRs dataset\n", + "- **Ingest Documents**: Bulk index documents into Elasticsearch\n", + "- **Define MCP Tools**: Create search and fetch functions for ChatGPT\n", + "- **Deploy Server**: Start MCP server with ngrok tunnel\n", + "- **Connect to ChatGPT**: Get public URL for ChatGPT connector setup\n", + "\n", + "## Prerequisites\n", + "- Elasticsearch cluster with ELSER model deployed\n", + "- Ngrok account with auth token\n", + "- Python 3.8+" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LuAQtd-BYfTZ" + }, + "source": [ + "## Install Dependencies\n", + "\n", + "This cell installs all required Python packages: `fastmcp` for the MCP server framework, `elasticsearch` for connecting to Elasticsearch, `pyngrok` for creating a public tunnel, and `pandas` for data manipulation.\n", + "\n", + "**Alternative:** You can also install dependencies using the provided 'requirements.txt' file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "yDRrpVfMNZpA", + "outputId": "8b8a3d59-92d9-4763-9ca0-30fd46763c94" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Dependencies installed\n" + ] + } + ], + "source": [ + "!pip install fastmcp elasticsearch pyngrok pandas -q\n", + "print(\"Dependencies installed\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "So0cgu4cYy4e" + }, + "source": [ + "## Import Libraries\n", + "\n", + "Import all necessary Python libraries for building and running the MCP server, including FastMCP for the server framework, Elasticsearch client for database connections, and pyngrok for tunneling." 
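As an optional sanity check before moving on (not part of the original cells), you can confirm that the packages installed above are importable and see which versions were resolved:

```python
# Optional sanity check: print the installed versions of the packages
# pulled in by the pip install cell above.
import importlib.metadata as md

for pkg in ("fastmcp", "elasticsearch", "pyngrok", "pandas"):
    try:
        print(f"{pkg}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed - re-run the install cell")
```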
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "EVWlHusQpy9n", + "outputId": "ea0eb9a3-1524-4016-e01f-1b2a510f5eb2" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Libraries imported successfully\n" + ] + } + ], + "source": [ + "import os\n", + "import json\n", + "import logging\n", + "import threading\n", + "import time\n", + "import pandas as pd\n", + "from typing import Dict, List, Any\n", + "from getpass import getpass\n", + "from fastmcp import FastMCP\n", + "from elasticsearch import Elasticsearch\n", + "from elasticsearch.helpers import bulk\n", + "from pyngrok import ngrok\n", + "from pyngrok.conf import PyngrokConfig\n", + "\n", + "logging.basicConfig(level=logging.INFO)\n", + "logger = logging.getLogger(__name__)\n", + "\n", + "print(\"Libraries imported successfully\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ycdj_n8DY8p8" + }, + "source": [ + "## Setup Configuration\n", + "\n", + "Load required credentials from environment variables or prompt for manual input. You'll need:\n", + "- **ELASTICSEARCH_URL**: Your Elasticsearch cluster endpoint\n", + "- **ELASTICSEARCH_API_KEY**: API key with read/write access \n", + "- **NGROK_TOKEN**: Free token from [ngrok.com](https://dashboard.ngrok.com/)\n", + "- **ELASTICSEARCH_INDEX**: Index name (defaults to 'github_internal')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "os.environ[\"ELASTICSEARCH_URL\"] = os.environ.get(\"ELASTICSEARCH_URL\") or getpass(\"Enter your Elasticsearch URL: \")\n", + "os.environ[\"ELASTICSEARCH_API_KEY\"] = os.environ.get(\"ELASTICSEARCH_API_KEY\") or getpass(\"Enter your Elasticsearch API key: \")\n", + "os.environ[\"NGROK_TOKEN\"] = os.environ.get(\"NGROK_TOKEN\") or getpass(\"Enter your Ngrok Token: \")\n", + "os.environ[\"ELASTICSEARCH_INDEX\"] = os.environ.get(\"ELASTICSEARCH_INDEX\") or getpass(\"Enter your Elasticsearch Index name (default: github_internal): \") or \"github_internal\"\n", + "\n", + "ELASTICSEARCH_URL = os.environ[\"ELASTICSEARCH_URL\"]\n", + "ELASTICSEARCH_API_KEY = os.environ[\"ELASTICSEARCH_API_KEY\"]\n", + "NGROK_TOKEN = os.environ[\"NGROK_TOKEN\"]\n", + "INDEX_NAME = os.environ[\"ELASTICSEARCH_INDEX\"]\n", + "\n", + "print(\"Configuration loaded successfully\")\n", + "print(f\"Index name: {INDEX_NAME}\")\n", + "print(f\"Elasticsearch URL: {ELASTICSEARCH_URL[:30]}...\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "s5nVMK5EhdCL" + }, + "source": [ + "## Initialize Elasticsearch Client\n", + "\n", + "Create an Elasticsearch client using your credentials and verify the connection by pinging the cluster. This ensures your credentials are valid before proceeding." 
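Because the index mapping in the next section depends on the `.elser-2-elasticsearch` inference endpoint, it can be worth verifying that endpoint once the client below is connected. A hedged sketch, assuming a recent 8.x Python client that exposes the inference API:

```python
# Optional check (assumes an elasticsearch-py version that exposes the inference API):
# confirm the ELSER endpoint referenced by the semantic_text mapping exists.
try:
    resp = es_client.inference.get(inference_id=".elser-2-elasticsearch")
    print("ELSER inference endpoint is available:", resp)
except Exception as e:
    print("ELSER endpoint not found - make sure ELSER is deployed:", e)
```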
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Z4tv4KLWhe8X", + "outputId": "aa70e0ef-f46e-4df6-f188-c024ef0e4aed" + }, + "outputs": [], + "source": [ + "es_client = Elasticsearch(\n", + " ELASTICSEARCH_URL,\n", + " api_key=ELASTICSEARCH_API_KEY\n", + ")\n", + "\n", + "if es_client.ping():\n", + " print(\"Elasticsearch connection successful\")\n", + " cluster_info = es_client.info()\n", + " print(f\"Cluster: {cluster_info['cluster_name']}\")\n", + " print(f\"Version: {cluster_info['version']['number']}\")\n", + "else:\n", + " print(\"ERROR: Could not connect to Elasticsearch\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rYfKl1uhZyLo" + }, + "source": [ + "## Create Index with Mappings\n", + "\n", + "Create an Elasticsearch index with optimized mappings for hybrid search. The key field is `text_semantic` which uses ELSER (`.elser-2-elasticsearch`) for semantic search, while other fields enable traditional keyword search." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "YqRm1fRqZ2E-", + "outputId": "a1e4b28f-8b19-4d59-ad02-2d7a4bfdefe0" + }, + "outputs": [], + "source": [ + "try:\n", + " es_client.indices.create(\n", + " index=INDEX_NAME,\n", + " body={\n", + " \"mappings\": {\n", + " \"properties\": {\n", + " \"id\": {\"type\": \"keyword\"},\n", + " \"title\": {\"type\": \"text\"},\n", + " \"text\": {\"type\": \"text\"},\n", + " \"text_semantic\": {\n", + " \"type\": \"semantic_text\",\n", + " \"inference_id\": \".elser-2-elasticsearch\"\n", + " },\n", + " \"url\": {\"type\": \"keyword\"},\n", + " \"type\": {\"type\": \"keyword\"},\n", + " \"status\": {\"type\": \"keyword\"},\n", + " \"priority\": {\"type\": \"keyword\"},\n", + " \"assignee\": {\"type\": \"keyword\"},\n", + " \"created_date\": {\"type\": \"date\", \"format\": \"iso8601\"},\n", + " \"resolved_date\": {\"type\": \"date\", \"format\": \"iso8601\"},\n", + " \"labels\": {\"type\": \"keyword\"},\n", + " \"related_pr\": {\"type\": \"keyword\"}\n", + " }\n", + " }\n", + " }\n", + " )\n", + " print(f\"Index '{INDEX_NAME}' created successfully\")\n", + "except Exception as e:\n", + " if 'resource_already_exists_exception' in str(e):\n", + " print(f\"Index '{INDEX_NAME}' already exists\")\n", + " else:\n", + " print(f\"Error creating index: {e}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sFw3zTwpcYJq" + }, + "source": [ + "## Load Sample Dataset\n", + "\n", + "Load the sample GitHub dataset containing 15 documents with issues, pull requests, and RFCs. The dataset includes realistic content with descriptions, comments, assignees, priorities, and relationships between issues and PRs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Loaded 15 documents from dataset\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
idtitletexturltypestatuspriorityassigneecreated_dateresolved_datelabelsrelated_pr
0ISSUE-1712Migrate from Elasticsearch 7.x to 8.xDescription: Current Elasticsearch cluster run...https://internal-git.techcorp.com/issues/1712issuein_progressmediumdavid_data2025-09-01None[infrastructure, elasticsearch, migration, upg...PR-598
1RFC-038API Versioning Strategy and Deprecation PolicyAbstract: Establishes a formal API versioning ...https://internal-git.techcorp.com/rfcs/038rfcclosedmediumsarah_dev2025-09-032025-09-25[api, architecture, design, rfc]None
2ISSUE-1834Add rate limiting per user endpointDescription: Currently rate limiting is implem...https://internal-git.techcorp.com/issues/1834issueclosedmediumjohn_backend2025-09-052025-09-12[feature, api, redis, rate-limiting]PR-543
3ISSUE-1756Implement OAuth2 support for external API inte...Description: Product team requesting OAuth2 au...https://internal-git.techcorp.com/issues/1756issueopenhighsarah_dev2025-09-08None[feature, api, security, oauth]None
4PR-543Implement per-user rate limiting with RedisDescription: Implements sliding window rate li...https://internal-git.techcorp.com/pulls/543pull_requestclosedmediumjohn_backend2025-09-102025-09-12[feature, redis, rate-limiting]None
5RFC-045Design Proposal: Microservices Migration Archi...Abstract: This RFC proposes a phased approach ...https://internal-git.techcorp.com/rfcs/045rfcopenhightech_lead_mike2025-09-14None[architecture, microservices, design, rfc]None
6ISSUE-1847API Gateway returning 429 errors during peak h...Description: Users are experiencing 429 rate l...https://internal-git.techcorp.com/issues/1847issueclosedcriticaljohn_backend2025-09-152025-09-18[bug, api, production, performance]PR-567
7PR-567Fix connection pool exhaustion in API middlewareDescription: Implements exponential backoff an...https://internal-git.techcorp.com/pulls/567pull_requestclosedcriticaljohn_backend2025-09-162025-09-18[bug-fix, api, performance]None
8ISSUE-1889SQL injection vulnerability in search endpointDescription: Security audit identified SQL inj...https://internal-git.techcorp.com/issues/1889issueclosedcriticalsarah_dev2025-09-182025-09-19[security, vulnerability, bug, sql]PR-578
9PR-578Security hotfix: Patch SQL injection vulnerabi...Description: CRITICAL SECURITY FIX for ISSUE-1...https://internal-git.techcorp.com/pulls/578pull_requestclosedcriticalsarah_dev2025-09-192025-09-19[security, hotfix, sql]None
10PR-598Elasticsearch 8.x migration - Application code...Description: Updates application code for Elas...https://internal-git.techcorp.com/pulls/598pull_requestin_progressmediumdavid_data2025-09-20None[infrastructure, elasticsearch, migration]None
11ISSUE-1923PostgreSQL query timeout in analytics serviceDescription: The analytics dashboard is timing...https://internal-git.techcorp.com/issues/1923issuein_progresshighdavid_data2025-09-22None[bug, database, performance, postgresql]PR-589
12PR-589Add database index for analytics query optimiz...Description: Resolves ISSUE-1923 by adding com...https://internal-git.techcorp.com/pulls/589pull_requestopenhighdavid_data2025-09-23None[database, performance, postgresql]None
13ISSUE-1998Memory leak in notification microserviceDescription: Notification service consuming in...https://internal-git.techcorp.com/issues/1998issuein_progresscriticaljohn_backend2025-09-28None[bug, production, memory-leak, microservices]PR-612
14PR-612Fix memory leak in WebSocket notification serviceDescription: Resolves ISSUE-1998 memory leak c...https://internal-git.techcorp.com/pulls/612pull_requestin_progresscriticaljohn_backend2025-09-29None[bug-fix, memory-leak, websocket]None
\n", + "
" + ], + "text/plain": [ + " id title \\\n", + "0 ISSUE-1712 Migrate from Elasticsearch 7.x to 8.x \n", + "1 RFC-038 API Versioning Strategy and Deprecation Policy \n", + "2 ISSUE-1834 Add rate limiting per user endpoint \n", + "3 ISSUE-1756 Implement OAuth2 support for external API inte... \n", + "4 PR-543 Implement per-user rate limiting with Redis \n", + "5 RFC-045 Design Proposal: Microservices Migration Archi... \n", + "6 ISSUE-1847 API Gateway returning 429 errors during peak h... \n", + "7 PR-567 Fix connection pool exhaustion in API middleware \n", + "8 ISSUE-1889 SQL injection vulnerability in search endpoint \n", + "9 PR-578 Security hotfix: Patch SQL injection vulnerabi... \n", + "10 PR-598 Elasticsearch 8.x migration - Application code... \n", + "11 ISSUE-1923 PostgreSQL query timeout in analytics service \n", + "12 PR-589 Add database index for analytics query optimiz... \n", + "13 ISSUE-1998 Memory leak in notification microservice \n", + "14 PR-612 Fix memory leak in WebSocket notification service \n", + "\n", + " text \\\n", + "0 Description: Current Elasticsearch cluster run... \n", + "1 Abstract: Establishes a formal API versioning ... \n", + "2 Description: Currently rate limiting is implem... \n", + "3 Description: Product team requesting OAuth2 au... \n", + "4 Description: Implements sliding window rate li... \n", + "5 Abstract: This RFC proposes a phased approach ... \n", + "6 Description: Users are experiencing 429 rate l... \n", + "7 Description: Implements exponential backoff an... \n", + "8 Description: Security audit identified SQL inj... \n", + "9 Description: CRITICAL SECURITY FIX for ISSUE-1... \n", + "10 Description: Updates application code for Elas... \n", + "11 Description: The analytics dashboard is timing... \n", + "12 Description: Resolves ISSUE-1923 by adding com... \n", + "13 Description: Notification service consuming in... \n", + "14 Description: Resolves ISSUE-1998 memory leak c... 
\n", + "\n", + " url type status \\\n", + "0 https://internal-git.techcorp.com/issues/1712 issue in_progress \n", + "1 https://internal-git.techcorp.com/rfcs/038 rfc closed \n", + "2 https://internal-git.techcorp.com/issues/1834 issue closed \n", + "3 https://internal-git.techcorp.com/issues/1756 issue open \n", + "4 https://internal-git.techcorp.com/pulls/543 pull_request closed \n", + "5 https://internal-git.techcorp.com/rfcs/045 rfc open \n", + "6 https://internal-git.techcorp.com/issues/1847 issue closed \n", + "7 https://internal-git.techcorp.com/pulls/567 pull_request closed \n", + "8 https://internal-git.techcorp.com/issues/1889 issue closed \n", + "9 https://internal-git.techcorp.com/pulls/578 pull_request closed \n", + "10 https://internal-git.techcorp.com/pulls/598 pull_request in_progress \n", + "11 https://internal-git.techcorp.com/issues/1923 issue in_progress \n", + "12 https://internal-git.techcorp.com/pulls/589 pull_request open \n", + "13 https://internal-git.techcorp.com/issues/1998 issue in_progress \n", + "14 https://internal-git.techcorp.com/pulls/612 pull_request in_progress \n", + "\n", + " priority assignee created_date resolved_date \\\n", + "0 medium david_data 2025-09-01 None \n", + "1 medium sarah_dev 2025-09-03 2025-09-25 \n", + "2 medium john_backend 2025-09-05 2025-09-12 \n", + "3 high sarah_dev 2025-09-08 None \n", + "4 medium john_backend 2025-09-10 2025-09-12 \n", + "5 high tech_lead_mike 2025-09-14 None \n", + "6 critical john_backend 2025-09-15 2025-09-18 \n", + "7 critical john_backend 2025-09-16 2025-09-18 \n", + "8 critical sarah_dev 2025-09-18 2025-09-19 \n", + "9 critical sarah_dev 2025-09-19 2025-09-19 \n", + "10 medium david_data 2025-09-20 None \n", + "11 high david_data 2025-09-22 None \n", + "12 high david_data 2025-09-23 None \n", + "13 critical john_backend 2025-09-28 None \n", + "14 critical john_backend 2025-09-29 None \n", + "\n", + " labels related_pr \n", + "0 [infrastructure, elasticsearch, migration, upg... PR-598 \n", + "1 [api, architecture, design, rfc] None \n", + "2 [feature, api, redis, rate-limiting] PR-543 \n", + "3 [feature, api, security, oauth] None \n", + "4 [feature, redis, rate-limiting] None \n", + "5 [architecture, microservices, design, rfc] None \n", + "6 [bug, api, production, performance] PR-567 \n", + "7 [bug-fix, api, performance] None \n", + "8 [security, vulnerability, bug, sql] PR-578 \n", + "9 [security, hotfix, sql] None \n", + "10 [infrastructure, elasticsearch, migration] None \n", + "11 [bug, database, performance, postgresql] PR-589 \n", + "12 [database, performance, postgresql] None \n", + "13 [bug, production, memory-leak, microservices] PR-612 \n", + "14 [bug-fix, memory-leak, websocket] None " + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "file_path = 'github_internal_dataset.json'\n", + "df = pd.read_json(file_path)\n", + "\n", + "documents = df.to_dict('records')\n", + "print(f\"Loaded {len(documents)} documents from dataset\")\n", + "\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "80BHi4HgdSxh" + }, + "source": [ + "## Ingest Documents to Elasticsearch\n", + "\n", + "Bulk index all documents into Elasticsearch. The code copies the `text` field to `text_semantic` for ELSER processing, then waits 15 seconds for semantic embeddings to be generated before verifying the document count." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "u1JgCgwldWhN", + "outputId": "1cae2312-5508-4757-a3c8-c2a2870ba129" + }, + "outputs": [], + "source": [ + "def generate_actions():\n", + " for doc in documents:\n", + " doc['text_semantic'] = doc['text']\n", + " yield {\n", + " '_index': INDEX_NAME,\n", + " '_source': doc\n", + " }\n", + "\n", + "try:\n", + " success, errors = bulk(es_client, generate_actions())\n", + " print(f\"Successfully indexed {success} documents\")\n", + "\n", + " if errors:\n", + " print(f\"Errors during indexing: {errors}\")\n", + "\n", + " print(\"Waiting 15 seconds for ELSER to process documents...\")\n", + " time.sleep(15)\n", + "\n", + " count = es_client.count(index=INDEX_NAME)['count']\n", + " print(f\"Total documents in index: {count}\")\n", + "\n", + "except Exception as e:\n", + " print(f\"Error during bulk indexing: {str(e)}\")\n", + " print(\"If you see timeout errors, wait a few seconds and try again\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OTg5ePAbdbCW" + }, + "source": [ + "## Define MCP Server\n", + "\n", + "Define the MCP server with two tools that ChatGPT will use:\n", + "1. **search(query)**: Hybrid search combining semantic (ELSER) and keyword (BM25) search using RRF (Reciprocal Rank Fusion). Returns top 10 results with id, title, and url.\n", + "2. **fetch(id)**: Retrieves complete document details by ID, returning all fields including full text content and metadata." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "5aLcJzLxdeLS", + "outputId": "3c29e137-1c9f-43ee-bb33-c18945c34680" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "MCP server defined successfully\n" + ] + } + ], + "source": [ + "server_instructions = \"\"\"\n", + "This MCP server provides access to TechCorp's internal GitHub issues and pull requests.\n", + "Use search to find relevant issues/PRs, then fetch to get complete details.\n", + "\"\"\"\n", + "\n", + "def create_server():\n", + " mcp = FastMCP(\n", + " name=\"Elasticsearch GitHub Issues MCP\",\n", + " instructions=server_instructions\n", + " )\n", + "\n", + " @mcp.tool()\n", + " async def search(query: str) -> Dict[str, List[Dict[str, Any]]]:\n", + " \"\"\"\n", + " Search for internal issues and PRs using hybrid search.\n", + " Returns list with id, title, and url.\n", + " \"\"\"\n", + " if not query or not query.strip():\n", + " return {\"results\": []}\n", + "\n", + " logger.info(f\"Searching for: '{query}'\")\n", + "\n", + " try:\n", + " # Hybrid search using RRF: combines semantic (ELSER) + keyword (multi_match) results\n", + " response = es_client.search(\n", + " index=INDEX_NAME,\n", + " size=10,\n", + " source=[\"id\", \"title\", \"url\", \"type\", \"priority\"],\n", + " retriever={\n", + " \"rrf\": {\n", + " \"retrievers\": [\n", + " {\n", + " # Semantic retriever using ELSER embeddings\n", + " \"standard\": {\n", + " \"query\": {\n", + " \"semantic\": {\n", + " \"field\": \"text_semantic\",\n", + " \"query\": query\n", + " }\n", + " }\n", + " }\n", + " },\n", + " {\n", + " # Keyword retriever with fuzzy matching\n", + " \"standard\": {\n", + " \"query\": {\n", + " \"multi_match\": {\n", + " \"query\": query,\n", + " \"fields\": [\n", + " \"title^3\",\n", + " \"text^2\",\n", + " \"assignee^2\",\n", + " \"type\",\n", + " \"labels\",\n", + " \"priority\"\n", + " 
],\n", + " \"type\": \"best_fields\",\n", + " \"fuzziness\": \"AUTO\"\n", + " }\n", + " }\n", + " }\n", + " }\n", + " ],\n", + " \"rank_window_size\": 50,\n", + " \"rank_constant\": 60\n", + " }\n", + " }\n", + " )\n", + "\n", + " # Extract and format search results\n", + " results = []\n", + " if response and 'hits' in response:\n", + " for hit in response['hits']['hits']:\n", + " source = hit['_source']\n", + " results.append({\n", + " \"id\": source.get('id', hit['_id']),\n", + " \"title\": source.get('title', 'Unknown'),\n", + " \"url\": source.get('url', '')\n", + " })\n", + "\n", + " logger.info(f\"Found {len(results)} results\")\n", + " return {\"results\": results}\n", + "\n", + " except Exception as e:\n", + " logger.error(f\"Search error: {e}\")\n", + " raise ValueError(f\"Search failed: {str(e)}\")\n", + "\n", + " @mcp.tool()\n", + " async def fetch(id: str) -> Dict[str, Any]:\n", + " \"\"\"\n", + " Retrieve complete issue/PR details by ID.\n", + " Returns id, title, text, url, and metadata.\n", + " \"\"\"\n", + " if not id:\n", + " raise ValueError(\"ID is required\")\n", + "\n", + " logger.info(f\"Fetching: {id}\")\n", + "\n", + " try:\n", + " # Query by ID to get full document\n", + " response = es_client.search(\n", + " index=INDEX_NAME,\n", + " body={\n", + " \"query\": {\n", + " \"term\": {\n", + " \"id\": id\n", + " }\n", + " },\n", + " \"size\": 1\n", + " }\n", + " )\n", + "\n", + " if not response or not response['hits']['hits']:\n", + " raise ValueError(f\"Document with id '{id}' not found\")\n", + "\n", + " hit = response['hits']['hits'][0]\n", + " source = hit['_source']\n", + "\n", + " # Return all document fields\n", + " result = {\n", + " \"id\": source.get('id', id),\n", + " \"title\": source.get('title', 'Unknown'),\n", + " \"text\": source.get('text', ''),\n", + " \"url\": source.get('url', ''),\n", + " \"type\": source.get('type', ''),\n", + " \"status\": source.get('status', ''),\n", + " \"priority\": source.get('priority', ''),\n", + " \"assignee\": source.get('assignee', ''),\n", + " \"created_date\": source.get('created_date', ''),\n", + " \"resolved_date\": source.get('resolved_date', ''),\n", + " \"labels\": source.get('labels', ''),\n", + " \"related_pr\": source.get('related_pr', '')\n", + " }\n", + "\n", + " logger.info(f\"Fetched: {result['title']}\")\n", + " return result\n", + "\n", + " except Exception as e:\n", + " logger.error(f\"Fetch error: {e}\")\n", + " raise ValueError(f\"Failed to fetch '{id}': {str(e)}\")\n", + "\n", + " return mcp\n", + "\n", + "print(\"MCP server defined successfully\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0aTU4xeedpHc" + }, + "source": [ + "## Start Ngrok Tunnel\n", + "\n", + "Create a public HTTPS tunnel using ngrok to expose your local MCP server on port 8000. This allows ChatGPT to connect to your server from anywhere. Copy the displayed URL (ending in `/sse`) to use in ChatGPT's connector settings." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "SN-wlTOGdtIs", + "outputId": "c78552d6-914d-4327-943f-6430d4f12569" + }, + "outputs": [], + "source": [ + "ngrok.set_auth_token(NGROK_TOKEN)\n", + "\n", + "pyngrok_config = PyngrokConfig(region=\"us\")\n", + "public_url = ngrok.connect(\n", + " 8000,\n", + " \"http\",\n", + " pyngrok_config=pyngrok_config,\n", + " bind_tls=True\n", + ")\n", + "\n", + "print(\"=\"*70)\n", + "print(\"MCP SERVER IS READY!\")\n", + "print(\"=\"*70)\n", + "print(f\"\\nPublic URL (use in ChatGPT): {public_url}/sse\")\n", + "print(\"\\nIMPORTANT: Copy the URL above (including /sse at the end)\")\n", + "print(\"\\nTo connect in ChatGPT:\")\n", + "print(\"1. Go to Settings > Connectors\")\n", + "print(\"2. Click 'Create' or 'Add Custom Connector'\")\n", + "print(\"3. Paste the URL above\")\n", + "print(\"4. Save and start using!\")\n", + "print(\"\\nKeep this notebook running while using the connector\")\n", + "print(\"=\"*70)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "SFfSlDVAdxe1" + }, + "source": [ + "## Run MCP Server\n", + "\n", + "Start the MCP server in a background thread using SSE (Server-Sent Events) transport. The server runs on `0.0.0.0:8000` and stays alive to handle requests from ChatGPT via the ngrok tunnel. Keep this cell running while using the connector." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "okQYfcJfdzp-", + "outputId": "d226ff48-41fc-4b5b-b63d-f651a20d23d5" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Starting MCP server...\n", + "Server is running. To stop: Runtime > Interrupt execution\n", + "\n", + "Server started successfully!\n", + "Your ngrok URL is ready to use in ChatGPT\n", + "Keep this cell running...\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n",
+       "\n",
+       "                 ╭──────────────────────────────────────────────────────────────────────────────╮                  \n",
+       "                                                                                                                 \n",
+       "                                                                                          \n",
+       "                                                                                               \n",
+       "                                                                                                                 \n",
+       "                                                FastMCP 2.13.0.2                                                 \n",
+       "                                                                                                                 \n",
+       "                                                                                                                 \n",
+       "                                🖥  Server name: Elasticsearch GitHub Issues MCP                                  \n",
+       "                                                                                                                 \n",
+       "                                📦 Transport:   SSE                                                              \n",
+       "                                🔗 Server URL:  http://0.0.0.0:8000/sse                                          \n",
+       "                                                                                                                 \n",
+       "                                📚 Docs:        https://gofastmcp.com                                            \n",
+       "                                🚀 Hosting:     https://fastmcp.cloud                                            \n",
+       "                                                                                                                 \n",
+       "                 ╰──────────────────────────────────────────────────────────────────────────────╯                  \n",
+       "\n",
+       "\n",
+       "
\n" + ], + "text/plain": [ + "\n", + "\n", + " \u001b[2m╭──────────────────────────────────────────────────────────────────────────────╮\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[38;2;0;198;255m \u001b[0m\u001b[38;2;0;195;255m▄\u001b[0m\u001b[38;2;0;192;255m▀\u001b[0m\u001b[38;2;0;189;255m▀\u001b[0m\u001b[38;2;0;186;255m \u001b[0m\u001b[38;2;0;184;255m▄\u001b[0m\u001b[38;2;0;181;255m▀\u001b[0m\u001b[38;2;0;178;255m█\u001b[0m\u001b[38;2;0;175;255m \u001b[0m\u001b[38;2;0;172;255m█\u001b[0m\u001b[38;2;0;169;255m▀\u001b[0m\u001b[38;2;0;166;255m▀\u001b[0m\u001b[38;2;0;163;255m \u001b[0m\u001b[38;2;0;160;255m▀\u001b[0m\u001b[38;2;0;157;255m█\u001b[0m\u001b[38;2;0;155;255m▀\u001b[0m\u001b[38;2;0;152;255m \u001b[0m\u001b[38;2;0;149;255m█\u001b[0m\u001b[38;2;0;146;255m▀\u001b[0m\u001b[38;2;0;143;255m▄\u001b[0m\u001b[38;2;0;140;255m▀\u001b[0m\u001b[38;2;0;137;255m█\u001b[0m\u001b[38;2;0;134;255m \u001b[0m\u001b[38;2;0;131;255m█\u001b[0m\u001b[38;2;0;128;255m▀\u001b[0m\u001b[38;2;0;126;255m▀\u001b[0m\u001b[38;2;0;123;255m \u001b[0m\u001b[38;2;0;120;255m█\u001b[0m\u001b[38;2;0;117;255m▀\u001b[0m\u001b[38;2;0;114;255m█\u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[38;2;0;198;255m \u001b[0m\u001b[38;2;0;195;255m█\u001b[0m\u001b[38;2;0;192;255m▀\u001b[0m\u001b[38;2;0;189;255m \u001b[0m\u001b[38;2;0;186;255m \u001b[0m\u001b[38;2;0;184;255m█\u001b[0m\u001b[38;2;0;181;255m▀\u001b[0m\u001b[38;2;0;178;255m█\u001b[0m\u001b[38;2;0;175;255m \u001b[0m\u001b[38;2;0;172;255m▄\u001b[0m\u001b[38;2;0;169;255m▄\u001b[0m\u001b[38;2;0;166;255m█\u001b[0m\u001b[38;2;0;163;255m \u001b[0m\u001b[38;2;0;160;255m \u001b[0m\u001b[38;2;0;157;255m█\u001b[0m\u001b[38;2;0;155;255m \u001b[0m\u001b[38;2;0;152;255m \u001b[0m\u001b[38;2;0;149;255m█\u001b[0m\u001b[38;2;0;146;255m \u001b[0m\u001b[38;2;0;143;255m▀\u001b[0m\u001b[38;2;0;140;255m \u001b[0m\u001b[38;2;0;137;255m█\u001b[0m\u001b[38;2;0;134;255m \u001b[0m\u001b[38;2;0;131;255m█\u001b[0m\u001b[38;2;0;128;255m▄\u001b[0m\u001b[38;2;0;126;255m▄\u001b[0m\u001b[38;2;0;123;255m \u001b[0m\u001b[38;2;0;120;255m█\u001b[0m\u001b[38;2;0;117;255m▀\u001b[0m\u001b[38;2;0;114;255m▀\u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[1;34mFastMCP 2.13.0.2\u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[1m🖥 \u001b[0m\u001b[1m \u001b[0m\u001b[36mServer name:\u001b[0m\u001b[36m \u001b[0m\u001b[1;2;34mElasticsearch GitHub Issues MCP\u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[1m \u001b[0m\u001b[36m \u001b[0m\u001b[1;2;34m \u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[1m📦\u001b[0m\u001b[1m \u001b[0m\u001b[36mTransport: \u001b[0m\u001b[36m \u001b[0m\u001b[2mSSE \u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[1m🔗\u001b[0m\u001b[1m \u001b[0m\u001b[36mServer URL: \u001b[0m\u001b[36m \u001b[0m\u001b[2mhttp://0.0.0.0:8000/sse \u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[36m \u001b[0m\u001b[36m \u001b[0m\u001b[2m \u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[1m📚\u001b[0m\u001b[1m \u001b[0m\u001b[36mDocs: \u001b[0m\u001b[36m \u001b[0m\u001b[2mhttps://gofastmcp.com \u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[1m🚀\u001b[0m\u001b[1m \u001b[0m\u001b[36mHosting: \u001b[0m\u001b[36m 
\u001b[0m\u001b[2mhttps://fastmcp.cloud \u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m│\u001b[0m \u001b[2m│\u001b[0m \n", + " \u001b[2m╰──────────────────────────────────────────────────────────────────────────────╯\u001b[0m \n", + "\n", + "\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/html": [ + "
[11/13/25 11:36:01] INFO     Starting MCP server 'Elasticsearch GitHub Issues MCP' with transport    server.py:2050\n",
+       "                             'sse' on http://0.0.0.0:8000/sse                                                      \n",
+       "
\n" + ], + "text/plain": [ + "\u001b[2;36m[11/13/25 11:36:01]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m Starting MCP server \u001b[32m'Elasticsearch GitHub Issues MCP'\u001b[0m with transport \u001b[2mserver.py\u001b[0m\u001b[2m:\u001b[0m\u001b[2m2050\u001b[0m\n", + "\u001b[2;36m \u001b[0m \u001b[32m'sse'\u001b[0m on \u001b[4;94mhttp://0.0.0.0:8000/sse\u001b[0m \u001b[2m \u001b[0m\n" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO: Started server process [47952]\n", + "INFO: Waiting for application startup.\n", + "INFO: Application startup complete.\n", + "INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)\n", + "INFO:pyngrok.process.ngrok:t=2025-11-13T11:37:09-0300 lvl=info msg=\"join connections\" obj=join id=2f547f1e02b9 l=127.0.0.1:8000 r=191.233.196.115:8612\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO: 191.233.196.115:0 - \"POST /sse HTTP/1.1\" 405 Method Not Allowed\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:pyngrok.process.ngrok:t=2025-11-13T11:37:10-0300 lvl=info msg=\"join connections\" obj=join id=f157e39aac9d l=127.0.0.1:8000 r=191.233.196.120:47762\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO: 191.233.196.120:0 - \"GET /sse HTTP/1.1\" 200 OK\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:pyngrok.process.ngrok:t=2025-11-13T11:37:10-0300 lvl=info msg=\"join connections\" obj=join id=5a9192136cfb l=127.0.0.1:8000 r=191.233.196.117:53796\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "INFO: 191.233.196.117:0 - \"POST /messages/?session_id=a8b8863d0264414f8cadb3694f26e121 HTTP/1.1\" 202 Accepted\n", + "INFO: 191.233.196.117:0 - \"POST /messages/?session_id=a8b8863d0264414f8cadb3694f26e121 HTTP/1.1\" 202 Accepted\n", + "INFO: 191.233.196.117:0 - \"POST /messages/?session_id=a8b8863d0264414f8cadb3694f26e121 HTTP/1.1\" 202 Accepted\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "INFO:mcp.server.lowlevel.server:Processing request of type ListToolsRequest\n", + "INFO:pyngrok.process.ngrok:t=2025-11-13T11:47:43-0300 lvl=info msg=\"received stop request\" obj=app stopReq=\"{err: restart:false}\"\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Server stopped\n" + ] + } + ], + "source": [ + "server = create_server()\n", + "\n", + "print(\"Starting MCP server...\")\n", + "print(\"Server is running. 
To stop: Runtime > Interrupt execution\")\n", + "print()\n", + "\n", + "def run_server():\n", + " server.run(transport=\"sse\", host=\"0.0.0.0\", port=8000)\n", + "\n", + "server_thread = threading.Thread(target=run_server, daemon=True)\n", + "server_thread.start()\n", + "\n", + "print(\"Server started successfully!\")\n", + "print(\"Your ngrok URL is ready to use in ChatGPT\")\n", + "print(\"Keep this cell running...\")\n", + "print()\n", + "\n", + "try:\n", + " while True:\n", + " time.sleep(1)\n", + "except KeyboardInterrupt:\n", + " print(\"\\nServer stopped\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Example: ChatGPT Interaction\n", + "\n", + "Here's an example of ChatGPT using the Elasticsearch connector to search through GitHub issues:\n", + " - **Search tool:**\n", + "\n", + " ![Search Example](./images/chatgpt-search-example.png)\n", + " - **Fetch tool:**\n", + "\n", + " ![Fetch Example](./images/chatgpt-fetch-example.png)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oNzxwv__hy8D" + }, + "source": [ + "## Cleanup (Optional)\n", + "\n", + "Delete the Elasticsearch index to remove all demo data. \n", + "**WARNING**: This permanently deletes all documents in the index. Only run this if you want to start fresh or clean up after the demo." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "7C6sin_gh2Be" + }, + "outputs": [], + "source": [ + "try:\n", + " result = es_client.options(ignore_status=[400, 404]).indices.delete(index=INDEX_NAME)\n", + " if result.get('acknowledged', False):\n", + " print(f\"Index '{INDEX_NAME}' deleted successfully\")\n", + " else:\n", + " print(f\"Error deleting index: {result}\")\n", + "except Exception as e:\n", + " print(f\"Error: {e}\")" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": ".venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.14.0" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/supporting-blog-content/elasticsearch-chatgpt-connector/github_internal_dataset.json b/supporting-blog-content/elasticsearch-chatgpt-connector/github_internal_dataset.json new file mode 100644 index 00000000..719e6347 --- /dev/null +++ b/supporting-blog-content/elasticsearch-chatgpt-connector/github_internal_dataset.json @@ -0,0 +1,212 @@ +[ + { + "id": "ISSUE-1712", + "title": "Migrate from Elasticsearch 7.x to 8.x", + "text": "Description: Current Elasticsearch cluster running 7.17 which reaches EOL in Q1 2026. Need to upgrade to 8.x to get security updates and new features.\n\nPlanning:\n- 3-node cluster with 500GB data per node\n- 15M documents across 8 indices\n- Zero-downtime upgrade required\n\nMigration Steps:\n1. Set up parallel 8.x cluster\n2. Configure cross-cluster replication\n3. Update application code for breaking changes\n4. Gradual traffic migration using feature flags\n5. 
Decommission 7.x cluster\n\nComments:\n- @david_data: Main breaking change is removal of mapping types\n- @sarah_dev: API client library needs upgrade from elasticsearch-py 7.x to 8.x\n- @john_backend: Testing pagination changes, from/size behavior different in 8.x\n- @alex_devops: Provisioned new cluster in staging, ready for testing\n- @maria_frontend: No frontend changes needed, API contract stays same\n\nCurrent Status: Staging migration successful. PR-598 contains all code changes. Planning production migration for next weekend.", + "url": "https://internal-git.techcorp.com/issues/1712", + "type": "issue", + "status": "in_progress", + "priority": "medium", + "assignee": "david_data", + "created_date": "2025-09-01", + "resolved_date": null, + "labels": ["infrastructure", "elasticsearch", "migration", "upgrade"], + "related_pr": "PR-598" + }, + { + "id": "RFC-038", + "title": "API Versioning Strategy and Deprecation Policy", + "text": "Abstract: Establishes a formal API versioning strategy and deprecation policy to maintain backward compatibility while enabling API evolution.\n\nProblem Statement:\n- Current API has breaking changes in minor releases\n- No clear deprecation timeline for endpoints\n- Customers complain about unexpected breaking changes\n- Support burden from maintaining old API behavior\n\nProposed Versioning Strategy:\n- Semantic versioning: MAJOR.MINOR.PATCH\n- URL-based versioning: /api/v1, /api/v2, etc.\n- Header-based version override: X-API-Version: 2.1\n- Maintain 2 major versions simultaneously\n\nDeprecation Policy:\n1. Announce deprecation 6 months in advance\n2. Add deprecation warnings in response headers\n3. Update documentation with migration guide\n4. Send email notifications to affected users\n5. Sunset endpoint after 6-month grace period\n\nImplementation:\n- API version middleware to route requests\n- Automated deprecation header injection\n- Deprecation tracking dashboard\n- User notification system integration\n\nResponse Headers:\n- X-API-Version: 2.0\n- X-API-Deprecated: true\n- X-API-Sunset-Date: 2026-03-01\n- X-API-Migration-Guide: https://docs.techcorp.com/migration/v2\n\nDocumentation Requirements:\n- Changelog for each version\n- Migration guides with code examples\n- API version compatibility matrix\n- Deprecation timeline calendar\n\nDiscussion:\n- @sarah_dev: URL versioning is clearer than header-only approach\n- @maria_frontend: 6 months feels right for enterprise customers\n- @tech_lead_mike: Need to track which users are on old versions\n- @john_backend: I'll implement version routing middleware\n- @product_manager_lisa: This will improve customer trust\n\nStatus: Approved. Implementation tracked in ISSUE-2034. Target: Q4 2025 release.", + "url": "https://internal-git.techcorp.com/rfcs/038", + "type": "rfc", + "status": "closed", + "priority": "medium", + "assignee": "sarah_dev", + "created_date": "2025-09-03", + "resolved_date": "2025-09-25", + "labels": ["api", "architecture", "design", "rfc"], + "related_pr": null + }, + { + "id": "ISSUE-1834", + "title": "Add rate limiting per user endpoint", + "text": "Description: Currently rate limiting is implemented globally at gateway level. 
Need per-user rate limiting to prevent abuse while allowing legitimate high-volume users.\n\nRequirements:\n- Implement sliding window rate limiter using Redis\n- Different limits for free tier (100 req/min) and premium (1000 req/min)\n- Return remaining quota in response headers\n- Graceful degradation if Redis is unavailable\n\nComments:\n- @john_backend: I'll implement using Redis sorted sets with ZREMRANGEBYSCORE\n- @sarah_dev: Need to coordinate with billing service for tier information\n- @maria_frontend: Should we show rate limit info in user dashboard?\n- @tech_lead_mike: Yes, add a usage widget showing current consumption\n\nImplementation completed in PR-543. Deployed to production 2025-09-12. Monitoring shows 300+ users hitting limits daily with appropriate 429 responses. No performance impact on API response times.", + "url": "https://internal-git.techcorp.com/issues/1834", + "type": "issue", + "status": "closed", + "priority": "medium", + "assignee": "john_backend", + "created_date": "2025-09-05", + "resolved_date": "2025-09-12", + "labels": ["feature", "api", "redis", "rate-limiting"], + "related_pr": "PR-543" + }, + { + "id": "ISSUE-1756", + "title": "Implement OAuth2 support for external API integrations", + "text": "Description: Product team requesting OAuth2 authentication support for third-party integrations. Currently only supporting API key authentication which doesn't meet security requirements for enterprise customers.\n\nRequirements:\n- Support OAuth2 authorization code flow\n- Implement refresh token rotation\n- Add scopes for granular permission control\n- Support PKCE for mobile apps\n\nComments:\n- @sarah_dev: I can handle the backend implementation, estimated 3-4 days\n- @maria_frontend: Frontend changes needed for OAuth flow UI, will coordinate\n- @security_team_alice: Please ensure refresh tokens are stored encrypted in PostgreSQL\n- @john_backend: Let's use the authlib library for OAuth2, it's well maintained\n- @tech_lead_mike: Approved, targeting for Q4 release\n\nImplementation Notes:\n- Using PostgreSQL for token storage with encryption at rest\n- Redis for temporary authorization codes (5 minute TTL)\n- Integration tests covering all OAuth2 flows\n- Documentation for API consumers\n\nCurrent Status: Design phase complete, implementation starting next sprint.", + "url": "https://internal-git.techcorp.com/issues/1756", + "type": "issue", + "status": "open", + "priority": "high", + "assignee": "sarah_dev", + "created_date": "2025-09-08", + "resolved_date": null, + "labels": ["feature", "api", "security", "oauth"], + "related_pr": null + }, + { + "id": "PR-543", + "title": "Implement per-user rate limiting with Redis", + "text": "Description: Implements sliding window rate limiter for ISSUE-1834 using Redis sorted sets.\n\nImplementation:\n- RateLimiter class using Redis ZSET with timestamp scores\n- Sliding window of 60 seconds\n- Automatic cleanup of old entries using ZREMRANGEBYSCORE\n- Fallback to in-memory limiter if Redis unavailable\n\nCode Structure:\n- New module: middleware/rate_limiter.py (180 lines)\n- Configuration in config/rate_limits.yaml\n- Integration in main.py request pipeline\n\nRate Limit Tiers:\n- Free tier: 100 requests/minute\n- Professional: 500 requests/minute \n- Enterprise: 1000 requests/minute\n\nResponse Headers Added:\n- X-RateLimit-Limit: user's tier limit\n- X-RateLimit-Remaining: requests left in window\n- X-RateLimit-Reset: timestamp when limit resets\n\nTesting:\n- Unit tests with mocked Redis (coverage: 95%)\n- Load 
testing: 10k requests with mixed tiers, all limits enforced correctly\n- Latency impact: +0.8ms per request (acceptable)\n\nComments:\n- @sarah_dev: Clean implementation, like the fallback strategy\n- @tech_lead_mike: Approved, deploy to staging first\n- @alex_devops: Deployed to production 2025-09-12, monitoring looks good\n\nPost-Deployment: Running smoothly for 2 weeks, no issues reported.", + "url": "https://internal-git.techcorp.com/pulls/543", + "type": "pull_request", + "status": "closed", + "priority": "medium", + "assignee": "john_backend", + "created_date": "2025-09-10", + "resolved_date": "2025-09-12", + "labels": ["feature", "redis", "rate-limiting"], + "related_pr": null + }, + { + "id": "RFC-045", + "title": "Design Proposal: Microservices Migration Architecture", + "text": "Abstract: This RFC proposes a phased approach to migrate our monolithic application to a microservices architecture over 18 months.\n\nCurrent State:\n- Single Django monolith (~250k lines of code)\n- PostgreSQL database with 50+ tables\n- Deployed as single unit, scaling limitations\n- Deployment takes 30 minutes, high risk\n\nProposed Architecture:\n1. API Gateway (Kong) - routing and authentication\n2. User Service - authentication and profile management\n3. Billing Service - subscriptions and payments\n4. Notification Service - emails, SMS, WebSocket\n5. Analytics Service - reporting and data warehouse\n6. Search Service - Elasticsearch integration\n\nCommunication:\n- Synchronous: REST APIs with circuit breakers\n- Asynchronous: RabbitMQ for events\n- Service mesh: Istio for observability\n\nData Strategy:\n- Each service owns its database\n- Event sourcing for data synchronization\n- Saga pattern for distributed transactions\n- Read replicas for cross-service queries\n\nMigration Phases:\nPhase 1 (Months 1-4): Extract notification service\nPhase 2 (Months 5-8): Extract billing service\nPhase 3 (Months 9-12): Extract user service\nPhase 4 (Months 13-18): Extract analytics and search\n\nInfrastructure Requirements:\n- Kubernetes cluster (3 nodes minimum per environment)\n- RabbitMQ cluster (3 nodes)\n- Service mesh (Istio)\n- Monitoring (Prometheus + Grafana)\n- Distributed tracing (Jaeger)\n\nEstimated Costs:\n- Infrastructure: +$5k/month\n- Engineering time: 2.5 FTE for 18 months\n- Risk mitigation: 3-month buffer\n\nDiscussion:\n- @tech_lead_mike: Strong proposal, phased approach reduces risk\n- @alex_devops: Infrastructure costs are manageable, need dedicated DevOps\n- @sarah_dev: Concerned about distributed transaction complexity\n- @john_backend: Event sourcing will help with debugging\n- @cto_robert: Approved in principle, need detailed Phase 1 plan\n\nStatus: Approved for Phase 1 implementation. Kickoff meeting scheduled 2025-10-01.", + "url": "https://internal-git.techcorp.com/rfcs/045", + "type": "rfc", + "status": "open", + "priority": "high", + "assignee": "tech_lead_mike", + "created_date": "2025-09-14", + "resolved_date": null, + "labels": ["architecture", "microservices", "design", "rfc"], + "related_pr": null + }, + { + "id": "ISSUE-1847", + "title": "API Gateway returning 429 errors during peak hours", + "text": "Description: Users are experiencing 429 rate limit errors during peak hours (2-4 PM EST). 
The API gateway is rejecting requests even though we're within our configured limits.\n\nInvestigation:\n- @john_backend: Checked Redis cache, TTL is set correctly at 300s\n- @sarah_dev: Found the issue - middleware is not properly handling connection pooling\n- Root cause: Connection pool exhausted due to long-running queries in user service\n\nComments:\n- @maria_frontend: This is affecting the dashboard heavily, marking as critical\n- @john_backend: PR-567 ready with fix implementing exponential backoff\n- @alex_devops: Added monitoring alerts for connection pool utilization\n\nResolution: Deployed PR-567 to production on 2025-09-18. Monitoring shows 429 errors reduced by 95%. Added connection pool metrics to Grafana dashboard.", + "url": "https://internal-git.techcorp.com/issues/1847", + "type": "issue", + "status": "closed", + "priority": "critical", + "assignee": "john_backend", + "created_date": "2025-09-15", + "resolved_date": "2025-09-18", + "labels": ["bug", "api", "production", "performance"], + "related_pr": "PR-567" + }, + { + "id": "PR-567", + "title": "Fix connection pool exhaustion in API middleware", + "text": "Description: Implements exponential backoff and proper connection pool management to resolve ISSUE-1847 (429 errors during peak hours).\n\nChanges:\n- Refactored middleware/connection_pool.py to use contextlib for proper cleanup\n- Increased pool size from 10 to 50 connections\n- Added exponential backoff with max retry of 3 attempts\n- Implemented connection health checks before reuse\n\nTechnical Details:\n- Using asyncpg connection pool with proper async context managers\n- Added metrics for pool utilization (current: 45%, max: 85%)\n- Timeout handling improved with graceful degradation\n\nTesting:\n- Unit tests for backoff logic (test_exponential_backoff.py)\n- Load testing with 500 concurrent users showed no 429 errors\n- Staging deployment ran for 48 hours without issues\n\nCode Review Comments:\n- @tech_lead_mike: LGTM, good use of async context managers\n- @sarah_dev: Consider adding alerts for pool utilization > 80%\n- @alex_devops: Approved for production deployment\n\nMetrics After Deployment:\n- 429 errors: 1200/hour → 60/hour (95% reduction)\n- Average response time: 145ms → 132ms\n- Connection pool utilization: stable at 45-55%", + "url": "https://internal-git.techcorp.com/pulls/567", + "type": "pull_request", + "status": "closed", + "priority": "critical", + "assignee": "john_backend", + "created_date": "2025-09-16", + "resolved_date": "2025-09-18", + "labels": ["bug-fix", "api", "performance"], + "related_pr": null + }, + { + "id": "ISSUE-1889", + "title": "SQL injection vulnerability in search endpoint", + "text": "Description: Security audit identified SQL injection vulnerability in /api/v1/search endpoint. 
User input from query parameter is not properly sanitized before being used in raw SQL query.\n\nSeverity: HIGH - Immediate action required\n\nAffected Code:\n- File: services/search/query_builder.py\n- Line: 145-152\n- Issue: String concatenation used instead of parameterized queries\n\nInvestigation:\n- @security_team_alice: Confirmed exploitable with UNION-based injection\n- @sarah_dev: Checking all other endpoints for similar patterns\n- @john_backend: Found 3 more instances in legacy codebase\n\nRemediation:\n- Rewrite using SQLAlchemy ORM or parameterized queries\n- Add input validation and sanitization\n- Implement WAF rules as additional layer\n- Security regression tests\n\nComments:\n- @tech_lead_mike: Stop all other work, this is P0\n- @sarah_dev: PR-578 ready with fixes for all 4 vulnerable endpoints\n- @alex_devops: Deployed hotfix to production 2025-09-19 at 14:30 UTC\n- @security_team_alice: Verified fix, conducting full pentest next week\n\nResolution: All vulnerable endpoints patched. Added pre-commit hooks to catch raw SQL queries. Security training scheduled for team.", + "url": "https://internal-git.techcorp.com/issues/1889", + "type": "issue", + "status": "closed", + "priority": "critical", + "assignee": "sarah_dev", + "created_date": "2025-09-18", + "resolved_date": "2025-09-19", + "labels": ["security", "vulnerability", "bug", "sql"], + "related_pr": "PR-578" + }, + { + "id": "PR-578", + "title": "Security hotfix: Patch SQL injection vulnerabilities", + "text": "Description: CRITICAL SECURITY FIX for ISSUE-1889. Patches SQL injection vulnerabilities in search and filter endpoints.\n\nVulnerabilities Fixed:\n1. services/search/query_builder.py - search endpoint\n2. services/filters/advanced_filter.py - filter endpoint \n3. services/export/csv_export.py - export functionality\n4. services/admin/user_lookup.py - admin search\n\nChanges Applied:\n- Replaced all string concatenation with parameterized queries\n- Migrated to SQLAlchemy ORM where possible\n- Added input validation using Pydantic models\n- Implemented query whitelisting for column names\n- Added SQL injection detection in WAF rules\n\nSecurity Testing:\n- Attempted UNION-based injection: blocked ✓\n- Attempted boolean-based injection: blocked ✓\n- Attempted time-based injection: blocked ✓\n- Tested with sqlmap: all attacks blocked ✓\n\nCode Review:\n- @security_team_alice: Verified all fixes, running full pentest\n- @tech_lead_mike: APPROVED for immediate deployment\n- @john_backend: Reviewed query patterns, all look safe now\n\nDeployment:\n- Hotfix deployed 2025-09-19 at 14:30 UTC\n- No user-facing changes\n- API performance unchanged\n- Added security regression tests to CI pipeline\n\nFollow-up Actions:\n- Security training for team scheduled\n- Pre-commit hooks added to catch raw SQL\n- Code audit of entire codebase planned", + "url": "https://internal-git.techcorp.com/pulls/578", + "type": "pull_request", + "status": "closed", + "priority": "critical", + "assignee": "sarah_dev", + "created_date": "2025-09-19", + "resolved_date": "2025-09-19", + "labels": ["security", "hotfix", "sql"], + "related_pr": null + }, + { + "id": "PR-598", + "title": "Elasticsearch 8.x migration - Application code changes", + "text": "Description: Updates application code for Elasticsearch 8.x compatibility as part of ISSUE-1712 migration.\n\nBreaking Changes Addressed:\n1. Removed mapping types (was using _doc type)\n2. Updated elasticsearch-py from 7.17.0 to 8.10.0\n3. Changed query DSL for better performance\n4. 
Updated index templates to new format\n\nCode Changes:\n- services/search/elasticsearch_client.py - major refactor\n- Updated all queries to use new Python client syntax\n- Removed deprecated query parameters\n- Added new security context for API keys\n\nIndex Template Updates:\n- Migrated from legacy to composable templates\n- Updated field mappings for text/keyword types\n- Added new runtime fields for computed values\n- Optimized for better search performance\n\nConfiguration Changes:\n- Added Elasticsearch API key authentication\n- Updated connection pool settings\n- Configured request compression\n- Added retry logic for transient failures\n\nTesting:\n- All 450 search integration tests passing\n- Performance testing shows 15% improvement in query speed\n- Backward compatibility maintained with feature flags\n- Staging cluster validated with production traffic replay\n\nComments:\n- @david_data: Composable templates much cleaner than legacy\n- @sarah_dev: Nice performance improvements in aggregation queries\n- @john_backend: API key auth is more secure than basic auth\n- @alex_devops: Staging migration complete, ready for production\n\nDeployment Plan:\n- Deploy code with feature flag (pointing to 7.x)\n- Switch traffic gradually to 8.x cluster\n- Monitor for 48 hours before decommissioning 7.x", + "url": "https://internal-git.techcorp.com/pulls/598", + "type": "pull_request", + "status": "in_progress", + "priority": "medium", + "assignee": "david_data", + "created_date": "2025-09-20", + "resolved_date": null, + "labels": ["infrastructure", "elasticsearch", "migration"], + "related_pr": null + }, + { + "id": "ISSUE-1923", + "title": "PostgreSQL query timeout in analytics service", + "text": "Description: The analytics dashboard is timing out when users request reports for date ranges longer than 30 days. Query timeout set at 30s but queries are taking 45-60s.\n\nInvestigation:\n- @david_data: Analyzed query execution plan - missing index on transactions.created_at column\n- @sarah_dev: Confirmed table has 12M rows, full table scan happening\n- @john_backend: Tested with index in staging, query time reduced to 3.2s\n\nComments:\n- @maria_frontend: Users are complaining, this affects monthly/quarterly reports\n- @david_data: CREATE INDEX idx_transactions_created_at ON transactions(created_at, user_id)\n- @alex_devops: Index creation will take ~15 minutes in production, scheduling maintenance window\n\nStatus: PR-589 opened with migration. 
Waiting for approval from @tech_lead_mike before production deployment.", + "url": "https://internal-git.techcorp.com/issues/1923", + "type": "issue", + "status": "in_progress", + "priority": "high", + "assignee": "david_data", + "created_date": "2025-09-22", + "resolved_date": null, + "labels": ["bug", "database", "performance", "postgresql"], + "related_pr": "PR-589" + }, + { + "id": "PR-589", + "title": "Add database index for analytics query optimization", + "text": "Description: Resolves ISSUE-1923 by adding composite index on transactions table to eliminate full table scans.\n\nDatabase Changes:\n- Added index: idx_transactions_created_at_user_id\n- Columns: (created_at DESC, user_id)\n- Index size: ~450MB\n- Creation time in production: estimated 12-15 minutes\n\nMigration Script:\nCREATE INDEX CONCURRENTLY idx_transactions_created_at_user_id \nON transactions(created_at DESC, user_id)\nWHERE deleted_at IS NULL;\n\nANALYZE transactions;\n\nPerformance Testing Results:\n- Before: 45-60s query time\n- After: 2.8-3.5s query time (94% improvement)\n- No impact on INSERT performance (tested with 10k inserts)\n- Disk usage increase: acceptable 450MB for 12M row table\n\nComments:\n- @david_data: Used CONCURRENTLY to avoid locking production table\n- @john_backend: Tested with full month query, works great\n- @tech_lead_mike: Approved, schedule for maintenance window\n- @alex_devops: Deployment scheduled for 2025-09-24 at 02:00 UTC\n\nStatus: Awaiting production deployment window.", + "url": "https://internal-git.techcorp.com/pulls/589", + "type": "pull_request", + "status": "open", + "priority": "high", + "assignee": "david_data", + "created_date": "2025-09-23", + "resolved_date": null, + "labels": ["database", "performance", "postgresql"], + "related_pr": null + }, + { + "id": "ISSUE-1998", + "title": "Memory leak in notification microservice", + "text": "Description: Notification service consuming increasing memory over time, requiring restart every 48 hours. Heap usage grows from 512MB to 4GB before OOMKiller terminates the process.\n\nInvestigation:\n- @alex_devops: Monitoring shows steady memory growth, not correlated with traffic\n- @john_backend: Heap dump analysis reveals WebSocket connections not being properly closed\n- Root cause: Event listeners not being removed when WebSocket disconnects\n\nTechnical Details:\n- Node.js v20.5.1, using ws library for WebSocket connections\n- Found 3000+ orphaned event listeners after 24 hours runtime\n- Memory profile shows listeners retaining references to large user objects\n\nComments:\n- @sarah_dev: We should implement connection cleanup in the disconnect handler\n- @john_backend: Also need to add heartbeat mechanism to detect stale connections\n- @alex_devops: Current workaround: Auto-restart service every 24 hours via k8s\n- @tech_lead_mike: This is affecting 15k active WebSocket users, prioritize fix\n\nResolution: PR-612 implements proper cleanup with WeakMap for connection tracking. 
Testing in staging for 72 hours before production release.", + "url": "https://internal-git.techcorp.com/issues/1998", + "type": "issue", + "status": "in_progress", + "priority": "critical", + "assignee": "john_backend", + "created_date": "2025-09-28", + "resolved_date": null, + "labels": ["bug", "production", "memory-leak", "microservices"], + "related_pr": "PR-612" + }, + { + "id": "PR-612", + "title": "Fix memory leak in WebSocket notification service", + "text": "Description: Resolves ISSUE-1998 memory leak caused by orphaned event listeners in WebSocket connections.\n\nRoot Cause Analysis:\n- Event listeners registered on connection but not removed on disconnect\n- Each listener retained reference to full user object (~8KB)\n- After 24 hours: 3000+ orphaned listeners = 24MB+ leaked memory\n- Compounded by other retained objects in closure scope\n\nImplementation:\n- Refactored connection manager to use WeakMap for listener tracking\n- Implemented explicit cleanup in disconnect handler\n- Added heartbeat mechanism (30s interval) to detect stale connections\n- Automatic connection timeout after 5 minutes of inactivity\n\nCode Changes:\n- services/notifications/websocket_manager.js - 250 lines refactored\n- Added cleanup middleware in disconnect pipeline\n- Implemented connection pool monitoring\n- Memory profiling instrumentation added\n\nTesting:\n- Load test: 5000 concurrent WebSocket connections for 72 hours\n- Memory usage: stable at 512MB (previously grew to 4GB)\n- No connection drops or data loss\n- Heap snapshots show proper cleanup\n\nComments:\n- @john_backend: Used WeakMap to prevent memory retention\n- @alex_devops: Running in staging, memory is flat, looking good!\n- @tech_lead_mike: Excellent fix, approve for production after 72h staging test\n- @sarah_dev: Should we add memory usage alerts? \n- @alex_devops: Added CloudWatch alert for >2GB usage\n\nStatus: In staging testing, pending 72-hour validation before production.", + "url": "https://internal-git.techcorp.com/pulls/612", + "type": "pull_request", + "status": "in_progress", + "priority": "critical", + "assignee": "john_backend", + "created_date": "2025-09-29", + "resolved_date": null, + "labels": ["bug-fix", "memory-leak", "websocket"], + "related_pr": null + } +] diff --git a/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-fetch-example.png b/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-fetch-example.png new file mode 100644 index 00000000..d5445175 Binary files /dev/null and b/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-fetch-example.png differ diff --git a/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-search-example.png b/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-search-example.png new file mode 100644 index 00000000..dac4335f Binary files /dev/null and b/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-search-example.png differ diff --git a/supporting-blog-content/elasticsearch-chatgpt-connector/requirements.txt b/supporting-blog-content/elasticsearch-chatgpt-connector/requirements.txt new file mode 100644 index 00000000..3153d657 --- /dev/null +++ b/supporting-blog-content/elasticsearch-chatgpt-connector/requirements.txt @@ -0,0 +1,4 @@ +fastmcp>=2.13.0 +elasticsearch>=8.0.0 +pyngrok>=7.0.0 +pandas>=2.0.0 \ No newline at end of file