diff --git a/supporting-blog-content/elasticsearch-chatgpt-connector/elasticsearch-mcp-server-for-chatgpt.ipynb b/supporting-blog-content/elasticsearch-chatgpt-connector/elasticsearch-mcp-server-for-chatgpt.ipynb
new file mode 100644
index 00000000..ca5a08cc
--- /dev/null
+++ b/supporting-blog-content/elasticsearch-chatgpt-connector/elasticsearch-mcp-server-for-chatgpt.ipynb
@@ -0,0 +1,1180 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "tvOHZz8TX_cp"
+ },
+ "source": [
+ "# Elasticsearch MCP Server for ChatGPT\n",
+ "\n",
+ "This notebook demonstrates how to deploy an MCP (Model Context Protocol) server that connects ChatGPT to Elasticsearch, enabling natural language queries over internal GitHub issues and pull requests.\n",
+ "\n",
+ "## What You'll Build\n",
+ "An MCP server that allows ChatGPT to search and retrieve information from your Elasticsearch index using natural language queries, combining semantic and keyword search for optimal results.\n",
+ "\n",
+ "## Steps\n",
+ "- **Install Dependencies**: Set up required Python packages (fastmcp, elasticsearch, pyngrok, pandas)\n",
+ "- **Configure Environment**: Set up Elasticsearch credentials and ngrok token\n",
+ "- **Initialize Elasticsearch**: Connect to your Elasticsearch cluster\n",
+ "- **Create Index**: Define mappings with semantic_text field for ELSER\n",
+ "- **Load Sample Data**: Import GitHub issues/PRs dataset\n",
+ "- **Ingest Documents**: Bulk index documents into Elasticsearch\n",
+ "- **Define MCP Tools**: Create search and fetch functions for ChatGPT\n",
+ "- **Deploy Server**: Start MCP server with ngrok tunnel\n",
+ "- **Connect to ChatGPT**: Get public URL for ChatGPT connector setup\n",
+ "\n",
+ "## Prerequisites\n",
+ "- Elasticsearch cluster with ELSER model deployed\n",
+ "- Ngrok account with auth token\n",
+ "- Python 3.8+"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "LuAQtd-BYfTZ"
+ },
+ "source": [
+ "## Install Dependencies\n",
+ "\n",
+ "This cell installs all required Python packages: `fastmcp` for the MCP server framework, `elasticsearch` for connecting to Elasticsearch, `pyngrok` for creating a public tunnel, and `pandas` for data manipulation.\n",
+ "\n",
+ "**Alternative:** You can also install dependencies using the provided 'requirements.txt' file."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "yDRrpVfMNZpA",
+ "outputId": "8b8a3d59-92d9-4763-9ca0-30fd46763c94"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Dependencies installed\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install fastmcp elasticsearch pyngrok pandas -q\n",
+ "print(\"Dependencies installed\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "So0cgu4cYy4e"
+ },
+ "source": [
+ "## Import Libraries\n",
+ "\n",
+ "Import all necessary Python libraries for building and running the MCP server, including FastMCP for the server framework, Elasticsearch client for database connections, and pyngrok for tunneling."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "EVWlHusQpy9n",
+ "outputId": "ea0eb9a3-1524-4016-e01f-1b2a510f5eb2"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Libraries imported successfully\n"
+ ]
+ }
+ ],
+ "source": [
+ "import os\n",
+ "import json\n",
+ "import logging\n",
+ "import threading\n",
+ "import time\n",
+ "import pandas as pd\n",
+ "from typing import Dict, List, Any\n",
+ "from getpass import getpass\n",
+ "from fastmcp import FastMCP\n",
+ "from elasticsearch import Elasticsearch\n",
+ "from elasticsearch.helpers import bulk\n",
+ "from pyngrok import ngrok\n",
+ "from pyngrok.conf import PyngrokConfig\n",
+ "\n",
+ "logging.basicConfig(level=logging.INFO)\n",
+ "logger = logging.getLogger(__name__)\n",
+ "\n",
+ "print(\"Libraries imported successfully\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ycdj_n8DY8p8"
+ },
+ "source": [
+ "## Setup Configuration\n",
+ "\n",
+ "Load required credentials from environment variables or prompt for manual input. You'll need:\n",
+ "- **ELASTICSEARCH_URL**: Your Elasticsearch cluster endpoint\n",
+ "- **ELASTICSEARCH_API_KEY**: API key with read/write access \n",
+ "- **NGROK_TOKEN**: Free token from [ngrok.com](https://dashboard.ngrok.com/)\n",
+ "- **ELASTICSEARCH_INDEX**: Index name (defaults to 'github_internal')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "os.environ[\"ELASTICSEARCH_URL\"] = os.environ.get(\"ELASTICSEARCH_URL\") or getpass(\"Enter your Elasticsearch URL: \")\n",
+ "os.environ[\"ELASTICSEARCH_API_KEY\"] = os.environ.get(\"ELASTICSEARCH_API_KEY\") or getpass(\"Enter your Elasticsearch API key: \")\n",
+ "os.environ[\"NGROK_TOKEN\"] = os.environ.get(\"NGROK_TOKEN\") or getpass(\"Enter your Ngrok Token: \")\n",
+ "os.environ[\"ELASTICSEARCH_INDEX\"] = os.environ.get(\"ELASTICSEARCH_INDEX\") or getpass(\"Enter your Elasticsearch Index name (default: github_internal): \") or \"github_internal\"\n",
+ "\n",
+ "ELASTICSEARCH_URL = os.environ[\"ELASTICSEARCH_URL\"]\n",
+ "ELASTICSEARCH_API_KEY = os.environ[\"ELASTICSEARCH_API_KEY\"]\n",
+ "NGROK_TOKEN = os.environ[\"NGROK_TOKEN\"]\n",
+ "INDEX_NAME = os.environ[\"ELASTICSEARCH_INDEX\"]\n",
+ "\n",
+ "print(\"Configuration loaded successfully\")\n",
+ "print(f\"Index name: {INDEX_NAME}\")\n",
+ "print(f\"Elasticsearch URL: {ELASTICSEARCH_URL[:30]}...\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "s5nVMK5EhdCL"
+ },
+ "source": [
+ "## Initialize Elasticsearch Client\n",
+ "\n",
+ "Create an Elasticsearch client using your credentials and verify the connection by pinging the cluster. This ensures your credentials are valid before proceeding."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Z4tv4KLWhe8X",
+ "outputId": "aa70e0ef-f46e-4df6-f188-c024ef0e4aed"
+ },
+ "outputs": [],
+ "source": [
+ "es_client = Elasticsearch(\n",
+ " ELASTICSEARCH_URL,\n",
+ " api_key=ELASTICSEARCH_API_KEY\n",
+ ")\n",
+ "\n",
+ "if es_client.ping():\n",
+ " print(\"Elasticsearch connection successful\")\n",
+ " cluster_info = es_client.info()\n",
+ " print(f\"Cluster: {cluster_info['cluster_name']}\")\n",
+ " print(f\"Version: {cluster_info['version']['number']}\")\n",
+ "else:\n",
+ " print(\"ERROR: Could not connect to Elasticsearch\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rYfKl1uhZyLo"
+ },
+ "source": [
+ "## Create Index with Mappings\n",
+ "\n",
+ "Create an Elasticsearch index with optimized mappings for hybrid search. The key field is `text_semantic` which uses ELSER (`.elser-2-elasticsearch`) for semantic search, while other fields enable traditional keyword search."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "YqRm1fRqZ2E-",
+ "outputId": "a1e4b28f-8b19-4d59-ad02-2d7a4bfdefe0"
+ },
+ "outputs": [],
+ "source": [
+ "try:\n",
+ " es_client.indices.create(\n",
+ " index=INDEX_NAME,\n",
+ " body={\n",
+ " \"mappings\": {\n",
+ " \"properties\": {\n",
+ " \"id\": {\"type\": \"keyword\"},\n",
+ " \"title\": {\"type\": \"text\"},\n",
+ " \"text\": {\"type\": \"text\"},\n",
+ " \"text_semantic\": {\n",
+ " \"type\": \"semantic_text\",\n",
+ " \"inference_id\": \".elser-2-elasticsearch\"\n",
+ " },\n",
+ " \"url\": {\"type\": \"keyword\"},\n",
+ " \"type\": {\"type\": \"keyword\"},\n",
+ " \"status\": {\"type\": \"keyword\"},\n",
+ " \"priority\": {\"type\": \"keyword\"},\n",
+ " \"assignee\": {\"type\": \"keyword\"},\n",
+ " \"created_date\": {\"type\": \"date\", \"format\": \"iso8601\"},\n",
+ " \"resolved_date\": {\"type\": \"date\", \"format\": \"iso8601\"},\n",
+ " \"labels\": {\"type\": \"keyword\"},\n",
+ " \"related_pr\": {\"type\": \"keyword\"}\n",
+ " }\n",
+ " }\n",
+ " }\n",
+ " )\n",
+ " print(f\"Index '{INDEX_NAME}' created successfully\")\n",
+ "except Exception as e:\n",
+ " if 'resource_already_exists_exception' in str(e):\n",
+ " print(f\"Index '{INDEX_NAME}' already exists\")\n",
+ " else:\n",
+ " print(f\"Error creating index: {e}\")"
+ ]
+ },
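+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before moving on, you can optionally sanity-check this setup. The snippet below is a small sketch that is not part of the original flow; it assumes an `elasticsearch-py` 8.x client and prints the `text_semantic` mapping plus, if your client version exposes the Inference API, the `.elser-2-elasticsearch` endpoint.\n",
+ "\n",
+ "```python\n",
+ "# Optional sanity checks (sketch, assumes elasticsearch-py 8.x)\n",
+ "mapping = es_client.indices.get_mapping(index=INDEX_NAME)\n",
+ "print(mapping[INDEX_NAME][\"mappings\"][\"properties\"][\"text_semantic\"])\n",
+ "\n",
+ "# The Inference API helper is only available in recent client versions\n",
+ "print(es_client.inference.get(inference_id=\".elser-2-elasticsearch\"))\n",
+ "```"
+ ]
+ },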
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "sFw3zTwpcYJq"
+ },
+ "source": [
+ "## Load Sample Dataset\n",
+ "\n",
+ "Load the sample GitHub dataset containing 15 documents with issues, pull requests, and RFCs. The dataset includes realistic content with descriptions, comments, assignees, priorities, and relationships between issues and PRs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Loaded 15 documents from dataset\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " id | \n",
+ " title | \n",
+ " text | \n",
+ " url | \n",
+ " type | \n",
+ " status | \n",
+ " priority | \n",
+ " assignee | \n",
+ " created_date | \n",
+ " resolved_date | \n",
+ " labels | \n",
+ " related_pr | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | 0 | \n",
+ " ISSUE-1712 | \n",
+ " Migrate from Elasticsearch 7.x to 8.x | \n",
+ " Description: Current Elasticsearch cluster run... | \n",
+ " https://internal-git.techcorp.com/issues/1712 | \n",
+ " issue | \n",
+ " in_progress | \n",
+ " medium | \n",
+ " david_data | \n",
+ " 2025-09-01 | \n",
+ " None | \n",
+ " [infrastructure, elasticsearch, migration, upg... | \n",
+ " PR-598 | \n",
+ "
\n",
+ " \n",
+ " | 1 | \n",
+ " RFC-038 | \n",
+ " API Versioning Strategy and Deprecation Policy | \n",
+ " Abstract: Establishes a formal API versioning ... | \n",
+ " https://internal-git.techcorp.com/rfcs/038 | \n",
+ " rfc | \n",
+ " closed | \n",
+ " medium | \n",
+ " sarah_dev | \n",
+ " 2025-09-03 | \n",
+ " 2025-09-25 | \n",
+ " [api, architecture, design, rfc] | \n",
+ " None | \n",
+ "
\n",
+ " \n",
+ " | 2 | \n",
+ " ISSUE-1834 | \n",
+ " Add rate limiting per user endpoint | \n",
+ " Description: Currently rate limiting is implem... | \n",
+ " https://internal-git.techcorp.com/issues/1834 | \n",
+ " issue | \n",
+ " closed | \n",
+ " medium | \n",
+ " john_backend | \n",
+ " 2025-09-05 | \n",
+ " 2025-09-12 | \n",
+ " [feature, api, redis, rate-limiting] | \n",
+ " PR-543 | \n",
+ "
\n",
+ " \n",
+ " | 3 | \n",
+ " ISSUE-1756 | \n",
+ " Implement OAuth2 support for external API inte... | \n",
+ " Description: Product team requesting OAuth2 au... | \n",
+ " https://internal-git.techcorp.com/issues/1756 | \n",
+ " issue | \n",
+ " open | \n",
+ " high | \n",
+ " sarah_dev | \n",
+ " 2025-09-08 | \n",
+ " None | \n",
+ " [feature, api, security, oauth] | \n",
+ " None | \n",
+ "
\n",
+ " \n",
+ " | 4 | \n",
+ " PR-543 | \n",
+ " Implement per-user rate limiting with Redis | \n",
+ " Description: Implements sliding window rate li... | \n",
+ " https://internal-git.techcorp.com/pulls/543 | \n",
+ " pull_request | \n",
+ " closed | \n",
+ " medium | \n",
+ " john_backend | \n",
+ " 2025-09-10 | \n",
+ " 2025-09-12 | \n",
+ " [feature, redis, rate-limiting] | \n",
+ " None | \n",
+ "
\n",
+ " \n",
+ " | 5 | \n",
+ " RFC-045 | \n",
+ " Design Proposal: Microservices Migration Archi... | \n",
+ " Abstract: This RFC proposes a phased approach ... | \n",
+ " https://internal-git.techcorp.com/rfcs/045 | \n",
+ " rfc | \n",
+ " open | \n",
+ " high | \n",
+ " tech_lead_mike | \n",
+ " 2025-09-14 | \n",
+ " None | \n",
+ " [architecture, microservices, design, rfc] | \n",
+ " None | \n",
+ "
\n",
+ " \n",
+ " | 6 | \n",
+ " ISSUE-1847 | \n",
+ " API Gateway returning 429 errors during peak h... | \n",
+ " Description: Users are experiencing 429 rate l... | \n",
+ " https://internal-git.techcorp.com/issues/1847 | \n",
+ " issue | \n",
+ " closed | \n",
+ " critical | \n",
+ " john_backend | \n",
+ " 2025-09-15 | \n",
+ " 2025-09-18 | \n",
+ " [bug, api, production, performance] | \n",
+ " PR-567 | \n",
+ "
\n",
+ " \n",
+ " | 7 | \n",
+ " PR-567 | \n",
+ " Fix connection pool exhaustion in API middleware | \n",
+ " Description: Implements exponential backoff an... | \n",
+ " https://internal-git.techcorp.com/pulls/567 | \n",
+ " pull_request | \n",
+ " closed | \n",
+ " critical | \n",
+ " john_backend | \n",
+ " 2025-09-16 | \n",
+ " 2025-09-18 | \n",
+ " [bug-fix, api, performance] | \n",
+ " None | \n",
+ "
\n",
+ " \n",
+ " | 8 | \n",
+ " ISSUE-1889 | \n",
+ " SQL injection vulnerability in search endpoint | \n",
+ " Description: Security audit identified SQL inj... | \n",
+ " https://internal-git.techcorp.com/issues/1889 | \n",
+ " issue | \n",
+ " closed | \n",
+ " critical | \n",
+ " sarah_dev | \n",
+ " 2025-09-18 | \n",
+ " 2025-09-19 | \n",
+ " [security, vulnerability, bug, sql] | \n",
+ " PR-578 | \n",
+ "
\n",
+ " \n",
+ " | 9 | \n",
+ " PR-578 | \n",
+ " Security hotfix: Patch SQL injection vulnerabi... | \n",
+ " Description: CRITICAL SECURITY FIX for ISSUE-1... | \n",
+ " https://internal-git.techcorp.com/pulls/578 | \n",
+ " pull_request | \n",
+ " closed | \n",
+ " critical | \n",
+ " sarah_dev | \n",
+ " 2025-09-19 | \n",
+ " 2025-09-19 | \n",
+ " [security, hotfix, sql] | \n",
+ " None | \n",
+ "
\n",
+ " \n",
+ " | 10 | \n",
+ " PR-598 | \n",
+ " Elasticsearch 8.x migration - Application code... | \n",
+ " Description: Updates application code for Elas... | \n",
+ " https://internal-git.techcorp.com/pulls/598 | \n",
+ " pull_request | \n",
+ " in_progress | \n",
+ " medium | \n",
+ " david_data | \n",
+ " 2025-09-20 | \n",
+ " None | \n",
+ " [infrastructure, elasticsearch, migration] | \n",
+ " None | \n",
+ "
\n",
+ " \n",
+ " | 11 | \n",
+ " ISSUE-1923 | \n",
+ " PostgreSQL query timeout in analytics service | \n",
+ " Description: The analytics dashboard is timing... | \n",
+ " https://internal-git.techcorp.com/issues/1923 | \n",
+ " issue | \n",
+ " in_progress | \n",
+ " high | \n",
+ " david_data | \n",
+ " 2025-09-22 | \n",
+ " None | \n",
+ " [bug, database, performance, postgresql] | \n",
+ " PR-589 | \n",
+ "
\n",
+ " \n",
+ " | 12 | \n",
+ " PR-589 | \n",
+ " Add database index for analytics query optimiz... | \n",
+ " Description: Resolves ISSUE-1923 by adding com... | \n",
+ " https://internal-git.techcorp.com/pulls/589 | \n",
+ " pull_request | \n",
+ " open | \n",
+ " high | \n",
+ " david_data | \n",
+ " 2025-09-23 | \n",
+ " None | \n",
+ " [database, performance, postgresql] | \n",
+ " None | \n",
+ "
\n",
+ " \n",
+ " | 13 | \n",
+ " ISSUE-1998 | \n",
+ " Memory leak in notification microservice | \n",
+ " Description: Notification service consuming in... | \n",
+ " https://internal-git.techcorp.com/issues/1998 | \n",
+ " issue | \n",
+ " in_progress | \n",
+ " critical | \n",
+ " john_backend | \n",
+ " 2025-09-28 | \n",
+ " None | \n",
+ " [bug, production, memory-leak, microservices] | \n",
+ " PR-612 | \n",
+ "
\n",
+ " \n",
+ " | 14 | \n",
+ " PR-612 | \n",
+ " Fix memory leak in WebSocket notification service | \n",
+ " Description: Resolves ISSUE-1998 memory leak c... | \n",
+ " https://internal-git.techcorp.com/pulls/612 | \n",
+ " pull_request | \n",
+ " in_progress | \n",
+ " critical | \n",
+ " john_backend | \n",
+ " 2025-09-29 | \n",
+ " None | \n",
+ " [bug-fix, memory-leak, websocket] | \n",
+ " None | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " id title \\\n",
+ "0 ISSUE-1712 Migrate from Elasticsearch 7.x to 8.x \n",
+ "1 RFC-038 API Versioning Strategy and Deprecation Policy \n",
+ "2 ISSUE-1834 Add rate limiting per user endpoint \n",
+ "3 ISSUE-1756 Implement OAuth2 support for external API inte... \n",
+ "4 PR-543 Implement per-user rate limiting with Redis \n",
+ "5 RFC-045 Design Proposal: Microservices Migration Archi... \n",
+ "6 ISSUE-1847 API Gateway returning 429 errors during peak h... \n",
+ "7 PR-567 Fix connection pool exhaustion in API middleware \n",
+ "8 ISSUE-1889 SQL injection vulnerability in search endpoint \n",
+ "9 PR-578 Security hotfix: Patch SQL injection vulnerabi... \n",
+ "10 PR-598 Elasticsearch 8.x migration - Application code... \n",
+ "11 ISSUE-1923 PostgreSQL query timeout in analytics service \n",
+ "12 PR-589 Add database index for analytics query optimiz... \n",
+ "13 ISSUE-1998 Memory leak in notification microservice \n",
+ "14 PR-612 Fix memory leak in WebSocket notification service \n",
+ "\n",
+ " text \\\n",
+ "0 Description: Current Elasticsearch cluster run... \n",
+ "1 Abstract: Establishes a formal API versioning ... \n",
+ "2 Description: Currently rate limiting is implem... \n",
+ "3 Description: Product team requesting OAuth2 au... \n",
+ "4 Description: Implements sliding window rate li... \n",
+ "5 Abstract: This RFC proposes a phased approach ... \n",
+ "6 Description: Users are experiencing 429 rate l... \n",
+ "7 Description: Implements exponential backoff an... \n",
+ "8 Description: Security audit identified SQL inj... \n",
+ "9 Description: CRITICAL SECURITY FIX for ISSUE-1... \n",
+ "10 Description: Updates application code for Elas... \n",
+ "11 Description: The analytics dashboard is timing... \n",
+ "12 Description: Resolves ISSUE-1923 by adding com... \n",
+ "13 Description: Notification service consuming in... \n",
+ "14 Description: Resolves ISSUE-1998 memory leak c... \n",
+ "\n",
+ " url type status \\\n",
+ "0 https://internal-git.techcorp.com/issues/1712 issue in_progress \n",
+ "1 https://internal-git.techcorp.com/rfcs/038 rfc closed \n",
+ "2 https://internal-git.techcorp.com/issues/1834 issue closed \n",
+ "3 https://internal-git.techcorp.com/issues/1756 issue open \n",
+ "4 https://internal-git.techcorp.com/pulls/543 pull_request closed \n",
+ "5 https://internal-git.techcorp.com/rfcs/045 rfc open \n",
+ "6 https://internal-git.techcorp.com/issues/1847 issue closed \n",
+ "7 https://internal-git.techcorp.com/pulls/567 pull_request closed \n",
+ "8 https://internal-git.techcorp.com/issues/1889 issue closed \n",
+ "9 https://internal-git.techcorp.com/pulls/578 pull_request closed \n",
+ "10 https://internal-git.techcorp.com/pulls/598 pull_request in_progress \n",
+ "11 https://internal-git.techcorp.com/issues/1923 issue in_progress \n",
+ "12 https://internal-git.techcorp.com/pulls/589 pull_request open \n",
+ "13 https://internal-git.techcorp.com/issues/1998 issue in_progress \n",
+ "14 https://internal-git.techcorp.com/pulls/612 pull_request in_progress \n",
+ "\n",
+ " priority assignee created_date resolved_date \\\n",
+ "0 medium david_data 2025-09-01 None \n",
+ "1 medium sarah_dev 2025-09-03 2025-09-25 \n",
+ "2 medium john_backend 2025-09-05 2025-09-12 \n",
+ "3 high sarah_dev 2025-09-08 None \n",
+ "4 medium john_backend 2025-09-10 2025-09-12 \n",
+ "5 high tech_lead_mike 2025-09-14 None \n",
+ "6 critical john_backend 2025-09-15 2025-09-18 \n",
+ "7 critical john_backend 2025-09-16 2025-09-18 \n",
+ "8 critical sarah_dev 2025-09-18 2025-09-19 \n",
+ "9 critical sarah_dev 2025-09-19 2025-09-19 \n",
+ "10 medium david_data 2025-09-20 None \n",
+ "11 high david_data 2025-09-22 None \n",
+ "12 high david_data 2025-09-23 None \n",
+ "13 critical john_backend 2025-09-28 None \n",
+ "14 critical john_backend 2025-09-29 None \n",
+ "\n",
+ " labels related_pr \n",
+ "0 [infrastructure, elasticsearch, migration, upg... PR-598 \n",
+ "1 [api, architecture, design, rfc] None \n",
+ "2 [feature, api, redis, rate-limiting] PR-543 \n",
+ "3 [feature, api, security, oauth] None \n",
+ "4 [feature, redis, rate-limiting] None \n",
+ "5 [architecture, microservices, design, rfc] None \n",
+ "6 [bug, api, production, performance] PR-567 \n",
+ "7 [bug-fix, api, performance] None \n",
+ "8 [security, vulnerability, bug, sql] PR-578 \n",
+ "9 [security, hotfix, sql] None \n",
+ "10 [infrastructure, elasticsearch, migration] None \n",
+ "11 [bug, database, performance, postgresql] PR-589 \n",
+ "12 [database, performance, postgresql] None \n",
+ "13 [bug, production, memory-leak, microservices] PR-612 \n",
+ "14 [bug-fix, memory-leak, websocket] None "
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "file_path = 'github_internal_dataset.json'\n",
+ "df = pd.read_json(file_path)\n",
+ "\n",
+ "documents = df.to_dict('records')\n",
+ "print(f\"Loaded {len(documents)} documents from dataset\")\n",
+ "\n",
+ "df"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "80BHi4HgdSxh"
+ },
+ "source": [
+ "## Ingest Documents to Elasticsearch\n",
+ "\n",
+ "Bulk index all documents into Elasticsearch. The code copies the `text` field to `text_semantic` for ELSER processing, then waits 15 seconds for semantic embeddings to be generated before verifying the document count."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "u1JgCgwldWhN",
+ "outputId": "1cae2312-5508-4757-a3c8-c2a2870ba129"
+ },
+ "outputs": [],
+ "source": [
+ "def generate_actions():\n",
+ " for doc in documents:\n",
+ " doc['text_semantic'] = doc['text']\n",
+ " yield {\n",
+ " '_index': INDEX_NAME,\n",
+ " '_source': doc\n",
+ " }\n",
+ "\n",
+ "try:\n",
+ " success, errors = bulk(es_client, generate_actions())\n",
+ " print(f\"Successfully indexed {success} documents\")\n",
+ "\n",
+ " if errors:\n",
+ " print(f\"Errors during indexing: {errors}\")\n",
+ "\n",
+ " print(\"Waiting 15 seconds for ELSER to process documents...\")\n",
+ " time.sleep(15)\n",
+ "\n",
+ " count = es_client.count(index=INDEX_NAME)['count']\n",
+ " print(f\"Total documents in index: {count}\")\n",
+ "\n",
+ "except Exception as e:\n",
+ " print(f\"Error during bulk indexing: {str(e)}\")\n",
+ " print(\"If you see timeout errors, wait a few seconds and try again\")"
+ ]
+ },
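+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick check that the ELSER embeddings are searchable before wiring up the MCP tools, you can run a single semantic query. This is a minimal sketch and the query text is just an example.\n",
+ "\n",
+ "```python\n",
+ "# Minimal semantic-search check against the ingested documents\n",
+ "resp = es_client.search(\n",
+ "    index=INDEX_NAME,\n",
+ "    size=3,\n",
+ "    source=[\"id\", \"title\"],\n",
+ "    query={\n",
+ "        \"semantic\": {\n",
+ "            \"field\": \"text_semantic\",\n",
+ "            \"query\": \"rate limit errors during peak hours\"\n",
+ "        }\n",
+ "    }\n",
+ ")\n",
+ "for hit in resp[\"hits\"][\"hits\"]:\n",
+ "    print(hit[\"_source\"][\"id\"], \"-\", hit[\"_source\"][\"title\"])\n",
+ "```"
+ ]
+ },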
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "OTg5ePAbdbCW"
+ },
+ "source": [
+ "## Define MCP Server\n",
+ "\n",
+ "Define the MCP server with two tools that ChatGPT will use:\n",
+ "1. **search(query)**: Hybrid search combining semantic (ELSER) and keyword (BM25) search using RRF (Reciprocal Rank Fusion). Returns top 10 results with id, title, and url.\n",
+ "2. **fetch(id)**: Retrieves complete document details by ID, returning all fields including full text content and metadata."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "5aLcJzLxdeLS",
+ "outputId": "3c29e137-1c9f-43ee-bb33-c18945c34680"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "MCP server defined successfully\n"
+ ]
+ }
+ ],
+ "source": [
+ "server_instructions = \"\"\"\n",
+ "This MCP server provides access to TechCorp's internal GitHub issues and pull requests.\n",
+ "Use search to find relevant issues/PRs, then fetch to get complete details.\n",
+ "\"\"\"\n",
+ "\n",
+ "def create_server():\n",
+ " mcp = FastMCP(\n",
+ " name=\"Elasticsearch GitHub Issues MCP\",\n",
+ " instructions=server_instructions\n",
+ " )\n",
+ "\n",
+ " @mcp.tool()\n",
+ " async def search(query: str) -> Dict[str, List[Dict[str, Any]]]:\n",
+ " \"\"\"\n",
+ " Search for internal issues and PRs using hybrid search.\n",
+ " Returns list with id, title, and url.\n",
+ " \"\"\"\n",
+ " if not query or not query.strip():\n",
+ " return {\"results\": []}\n",
+ "\n",
+ " logger.info(f\"Searching for: '{query}'\")\n",
+ "\n",
+ " try:\n",
+ " # Hybrid search using RRF: combines semantic (ELSER) + keyword (multi_match) results\n",
+ " response = es_client.search(\n",
+ " index=INDEX_NAME,\n",
+ " size=10,\n",
+ " source=[\"id\", \"title\", \"url\", \"type\", \"priority\"],\n",
+ " retriever={\n",
+ " \"rrf\": {\n",
+ " \"retrievers\": [\n",
+ " {\n",
+ " # Semantic retriever using ELSER embeddings\n",
+ " \"standard\": {\n",
+ " \"query\": {\n",
+ " \"semantic\": {\n",
+ " \"field\": \"text_semantic\",\n",
+ " \"query\": query\n",
+ " }\n",
+ " }\n",
+ " }\n",
+ " },\n",
+ " {\n",
+ " # Keyword retriever with fuzzy matching\n",
+ " \"standard\": {\n",
+ " \"query\": {\n",
+ " \"multi_match\": {\n",
+ " \"query\": query,\n",
+ " \"fields\": [\n",
+ " \"title^3\",\n",
+ " \"text^2\",\n",
+ " \"assignee^2\",\n",
+ " \"type\",\n",
+ " \"labels\",\n",
+ " \"priority\"\n",
+ " ],\n",
+ " \"type\": \"best_fields\",\n",
+ " \"fuzziness\": \"AUTO\"\n",
+ " }\n",
+ " }\n",
+ " }\n",
+ " }\n",
+ " ],\n",
+ " \"rank_window_size\": 50,\n",
+ " \"rank_constant\": 60\n",
+ " }\n",
+ " }\n",
+ " )\n",
+ "\n",
+ " # Extract and format search results\n",
+ " results = []\n",
+ " if response and 'hits' in response:\n",
+ " for hit in response['hits']['hits']:\n",
+ " source = hit['_source']\n",
+ " results.append({\n",
+ " \"id\": source.get('id', hit['_id']),\n",
+ " \"title\": source.get('title', 'Unknown'),\n",
+ " \"url\": source.get('url', '')\n",
+ " })\n",
+ "\n",
+ " logger.info(f\"Found {len(results)} results\")\n",
+ " return {\"results\": results}\n",
+ "\n",
+ " except Exception as e:\n",
+ " logger.error(f\"Search error: {e}\")\n",
+ " raise ValueError(f\"Search failed: {str(e)}\")\n",
+ "\n",
+ " @mcp.tool()\n",
+ " async def fetch(id: str) -> Dict[str, Any]:\n",
+ " \"\"\"\n",
+ " Retrieve complete issue/PR details by ID.\n",
+ " Returns id, title, text, url, and metadata.\n",
+ " \"\"\"\n",
+ " if not id:\n",
+ " raise ValueError(\"ID is required\")\n",
+ "\n",
+ " logger.info(f\"Fetching: {id}\")\n",
+ "\n",
+ " try:\n",
+ " # Query by ID to get full document\n",
+ " response = es_client.search(\n",
+ " index=INDEX_NAME,\n",
+ " body={\n",
+ " \"query\": {\n",
+ " \"term\": {\n",
+ " \"id\": id\n",
+ " }\n",
+ " },\n",
+ " \"size\": 1\n",
+ " }\n",
+ " )\n",
+ "\n",
+ " if not response or not response['hits']['hits']:\n",
+ " raise ValueError(f\"Document with id '{id}' not found\")\n",
+ "\n",
+ " hit = response['hits']['hits'][0]\n",
+ " source = hit['_source']\n",
+ "\n",
+ " # Return all document fields\n",
+ " result = {\n",
+ " \"id\": source.get('id', id),\n",
+ " \"title\": source.get('title', 'Unknown'),\n",
+ " \"text\": source.get('text', ''),\n",
+ " \"url\": source.get('url', ''),\n",
+ " \"type\": source.get('type', ''),\n",
+ " \"status\": source.get('status', ''),\n",
+ " \"priority\": source.get('priority', ''),\n",
+ " \"assignee\": source.get('assignee', ''),\n",
+ " \"created_date\": source.get('created_date', ''),\n",
+ " \"resolved_date\": source.get('resolved_date', ''),\n",
+ " \"labels\": source.get('labels', ''),\n",
+ " \"related_pr\": source.get('related_pr', '')\n",
+ " }\n",
+ "\n",
+ " logger.info(f\"Fetched: {result['title']}\")\n",
+ " return result\n",
+ "\n",
+ " except Exception as e:\n",
+ " logger.error(f\"Fetch error: {e}\")\n",
+ " raise ValueError(f\"Failed to fetch '{id}': {str(e)}\")\n",
+ "\n",
+ " return mcp\n",
+ "\n",
+ "print(\"MCP server defined successfully\")"
+ ]
+ },
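+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Before exposing anything through ngrok, you can smoke-test the tools in memory. This sketch assumes FastMCP 2.x, where `fastmcp.Client` can connect directly to a `FastMCP` instance without any HTTP transport.\n",
+ "\n",
+ "```python\n",
+ "from fastmcp import Client\n",
+ "\n",
+ "async def smoke_test():\n",
+ "    # In-memory transport: talks to the server object directly, no ngrok needed\n",
+ "    async with Client(create_server()) as client:\n",
+ "        result = await client.call_tool(\"search\", {\"query\": \"SQL injection vulnerability\"})\n",
+ "        print(result)\n",
+ "\n",
+ "# In a notebook cell you can run it with: await smoke_test()\n",
+ "```"
+ ]
+ },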
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0aTU4xeedpHc"
+ },
+ "source": [
+ "## Start Ngrok Tunnel\n",
+ "\n",
+ "Create a public HTTPS tunnel using ngrok to expose your local MCP server on port 8000. This allows ChatGPT to connect to your server from anywhere. Copy the displayed URL (ending in `/sse`) to use in ChatGPT's connector settings."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "SN-wlTOGdtIs",
+ "outputId": "c78552d6-914d-4327-943f-6430d4f12569"
+ },
+ "outputs": [],
+ "source": [
+ "ngrok.set_auth_token(NGROK_TOKEN)\n",
+ "\n",
+ "pyngrok_config = PyngrokConfig(region=\"us\")\n",
+ "public_url = ngrok.connect(\n",
+ " 8000,\n",
+ " \"http\",\n",
+ " pyngrok_config=pyngrok_config,\n",
+ " bind_tls=True\n",
+ ")\n",
+ "\n",
+ "print(\"=\"*70)\n",
+ "print(\"MCP SERVER IS READY!\")\n",
+ "print(\"=\"*70)\n",
+ "print(f\"\\nPublic URL (use in ChatGPT): {public_url}/sse\")\n",
+ "print(\"\\nIMPORTANT: Copy the URL above (including /sse at the end)\")\n",
+ "print(\"\\nTo connect in ChatGPT:\")\n",
+ "print(\"1. Go to Settings > Connectors\")\n",
+ "print(\"2. Click 'Create' or 'Add Custom Connector'\")\n",
+ "print(\"3. Paste the URL above\")\n",
+ "print(\"4. Save and start using!\")\n",
+ "print(\"\\nKeep this notebook running while using the connector\")\n",
+ "print(\"=\"*70)"
+ ]
+ },
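+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you want to confirm the tunnel from code rather than from the printed output, pyngrok keeps a registry of active tunnels; a small sketch using its `get_tunnels()` helper:\n",
+ "\n",
+ "```python\n",
+ "# List active ngrok tunnels and their public HTTPS URLs\n",
+ "for t in ngrok.get_tunnels():\n",
+ "    print(t.public_url)\n",
+ "```"
+ ]
+ },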
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SFfSlDVAdxe1"
+ },
+ "source": [
+ "## Run MCP Server\n",
+ "\n",
+ "Start the MCP server in a background thread using SSE (Server-Sent Events) transport. The server runs on `0.0.0.0:8000` and stays alive to handle requests from ChatGPT via the ngrok tunnel. Keep this cell running while using the connector."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "id": "okQYfcJfdzp-",
+ "outputId": "d226ff48-41fc-4b5b-b63d-f651a20d23d5"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Starting MCP server...\n",
+ "Server is running. To stop: Runtime > Interrupt execution\n",
+ "\n",
+ "Server started successfully!\n",
+ "Your ngrok URL is ready to use in ChatGPT\n",
+ "Keep this cell running...\n",
+ "\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ " ╭──────────────────────────────────────────────────────────────────────────────╮ \n",
+ " │ │ \n",
+ " │ ▄▀▀ ▄▀█ █▀▀ ▀█▀ █▀▄▀█ █▀▀ █▀█ │ \n",
+ " │ █▀ █▀█ ▄▄█ █ █ ▀ █ █▄▄ █▀▀ │ \n",
+ " │ │ \n",
+ " │ FastMCP 2.13.0.2 │ \n",
+ " │ │ \n",
+ " │ │ \n",
+ " │ 🖥 Server name: Elasticsearch GitHub Issues MCP │ \n",
+ " │ │ \n",
+ " │ 📦 Transport: SSE │ \n",
+ " │ 🔗 Server URL: http://0.0.0.0:8000/sse │ \n",
+ " │ │ \n",
+ " │ 📚 Docs: https://gofastmcp.com │ \n",
+ " │ 🚀 Hosting: https://fastmcp.cloud │ \n",
+ " │ │ \n",
+ " ╰──────────────────────────────────────────────────────────────────────────────╯ \n",
+ "\n",
+ "\n",
+ "
\n"
+ ],
+ "text/plain": [
+ "\n",
+ "\n",
+ " \u001b[2m╭──────────────────────────────────────────────────────────────────────────────╮\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[38;2;0;198;255m \u001b[0m\u001b[38;2;0;195;255m▄\u001b[0m\u001b[38;2;0;192;255m▀\u001b[0m\u001b[38;2;0;189;255m▀\u001b[0m\u001b[38;2;0;186;255m \u001b[0m\u001b[38;2;0;184;255m▄\u001b[0m\u001b[38;2;0;181;255m▀\u001b[0m\u001b[38;2;0;178;255m█\u001b[0m\u001b[38;2;0;175;255m \u001b[0m\u001b[38;2;0;172;255m█\u001b[0m\u001b[38;2;0;169;255m▀\u001b[0m\u001b[38;2;0;166;255m▀\u001b[0m\u001b[38;2;0;163;255m \u001b[0m\u001b[38;2;0;160;255m▀\u001b[0m\u001b[38;2;0;157;255m█\u001b[0m\u001b[38;2;0;155;255m▀\u001b[0m\u001b[38;2;0;152;255m \u001b[0m\u001b[38;2;0;149;255m█\u001b[0m\u001b[38;2;0;146;255m▀\u001b[0m\u001b[38;2;0;143;255m▄\u001b[0m\u001b[38;2;0;140;255m▀\u001b[0m\u001b[38;2;0;137;255m█\u001b[0m\u001b[38;2;0;134;255m \u001b[0m\u001b[38;2;0;131;255m█\u001b[0m\u001b[38;2;0;128;255m▀\u001b[0m\u001b[38;2;0;126;255m▀\u001b[0m\u001b[38;2;0;123;255m \u001b[0m\u001b[38;2;0;120;255m█\u001b[0m\u001b[38;2;0;117;255m▀\u001b[0m\u001b[38;2;0;114;255m█\u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[38;2;0;198;255m \u001b[0m\u001b[38;2;0;195;255m█\u001b[0m\u001b[38;2;0;192;255m▀\u001b[0m\u001b[38;2;0;189;255m \u001b[0m\u001b[38;2;0;186;255m \u001b[0m\u001b[38;2;0;184;255m█\u001b[0m\u001b[38;2;0;181;255m▀\u001b[0m\u001b[38;2;0;178;255m█\u001b[0m\u001b[38;2;0;175;255m \u001b[0m\u001b[38;2;0;172;255m▄\u001b[0m\u001b[38;2;0;169;255m▄\u001b[0m\u001b[38;2;0;166;255m█\u001b[0m\u001b[38;2;0;163;255m \u001b[0m\u001b[38;2;0;160;255m \u001b[0m\u001b[38;2;0;157;255m█\u001b[0m\u001b[38;2;0;155;255m \u001b[0m\u001b[38;2;0;152;255m \u001b[0m\u001b[38;2;0;149;255m█\u001b[0m\u001b[38;2;0;146;255m \u001b[0m\u001b[38;2;0;143;255m▀\u001b[0m\u001b[38;2;0;140;255m \u001b[0m\u001b[38;2;0;137;255m█\u001b[0m\u001b[38;2;0;134;255m \u001b[0m\u001b[38;2;0;131;255m█\u001b[0m\u001b[38;2;0;128;255m▄\u001b[0m\u001b[38;2;0;126;255m▄\u001b[0m\u001b[38;2;0;123;255m \u001b[0m\u001b[38;2;0;120;255m█\u001b[0m\u001b[38;2;0;117;255m▀\u001b[0m\u001b[38;2;0;114;255m▀\u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[1;34mFastMCP 2.13.0.2\u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[1m🖥 \u001b[0m\u001b[1m \u001b[0m\u001b[36mServer name:\u001b[0m\u001b[36m \u001b[0m\u001b[1;2;34mElasticsearch GitHub Issues MCP\u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[1m \u001b[0m\u001b[36m \u001b[0m\u001b[1;2;34m \u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[1m📦\u001b[0m\u001b[1m \u001b[0m\u001b[36mTransport: \u001b[0m\u001b[36m \u001b[0m\u001b[2mSSE \u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[1m🔗\u001b[0m\u001b[1m \u001b[0m\u001b[36mServer URL: \u001b[0m\u001b[36m \u001b[0m\u001b[2mhttp://0.0.0.0:8000/sse \u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[1m \u001b[0m\u001b[1m \u001b[0m\u001b[36m \u001b[0m\u001b[36m \u001b[0m\u001b[2m \u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[1m📚\u001b[0m\u001b[1m \u001b[0m\u001b[36mDocs: \u001b[0m\u001b[36m \u001b[0m\u001b[2mhttps://gofastmcp.com \u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[1m🚀\u001b[0m\u001b[1m \u001b[0m\u001b[36mHosting: \u001b[0m\u001b[36m \u001b[0m\u001b[2mhttps://fastmcp.cloud \u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m│\u001b[0m \u001b[2m│\u001b[0m \n",
+ " \u001b[2m╰──────────────────────────────────────────────────────────────────────────────╯\u001b[0m \n",
+ "\n",
+ "\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "[11/13/25 11:36:01] INFO Starting MCP server 'Elasticsearch GitHub Issues MCP' with transport server.py:2050\n",
+ " 'sse' on http://0.0.0.0:8000/sse \n",
+ "
\n"
+ ],
+ "text/plain": [
+ "\u001b[2;36m[11/13/25 11:36:01]\u001b[0m\u001b[2;36m \u001b[0m\u001b[34mINFO \u001b[0m Starting MCP server \u001b[32m'Elasticsearch GitHub Issues MCP'\u001b[0m with transport \u001b[2mserver.py\u001b[0m\u001b[2m:\u001b[0m\u001b[2m2050\u001b[0m\n",
+ "\u001b[2;36m \u001b[0m \u001b[32m'sse'\u001b[0m on \u001b[4;94mhttp://0.0.0.0:8000/sse\u001b[0m \u001b[2m \u001b[0m\n"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "INFO: Started server process [47952]\n",
+ "INFO: Waiting for application startup.\n",
+ "INFO: Application startup complete.\n",
+ "INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)\n",
+ "INFO:pyngrok.process.ngrok:t=2025-11-13T11:37:09-0300 lvl=info msg=\"join connections\" obj=join id=2f547f1e02b9 l=127.0.0.1:8000 r=191.233.196.115:8612\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "INFO: 191.233.196.115:0 - \"POST /sse HTTP/1.1\" 405 Method Not Allowed\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "INFO:pyngrok.process.ngrok:t=2025-11-13T11:37:10-0300 lvl=info msg=\"join connections\" obj=join id=f157e39aac9d l=127.0.0.1:8000 r=191.233.196.120:47762\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "INFO: 191.233.196.120:0 - \"GET /sse HTTP/1.1\" 200 OK\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "INFO:pyngrok.process.ngrok:t=2025-11-13T11:37:10-0300 lvl=info msg=\"join connections\" obj=join id=5a9192136cfb l=127.0.0.1:8000 r=191.233.196.117:53796\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "INFO: 191.233.196.117:0 - \"POST /messages/?session_id=a8b8863d0264414f8cadb3694f26e121 HTTP/1.1\" 202 Accepted\n",
+ "INFO: 191.233.196.117:0 - \"POST /messages/?session_id=a8b8863d0264414f8cadb3694f26e121 HTTP/1.1\" 202 Accepted\n",
+ "INFO: 191.233.196.117:0 - \"POST /messages/?session_id=a8b8863d0264414f8cadb3694f26e121 HTTP/1.1\" 202 Accepted\n"
+ ]
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "INFO:mcp.server.lowlevel.server:Processing request of type ListToolsRequest\n",
+ "INFO:pyngrok.process.ngrok:t=2025-11-13T11:47:43-0300 lvl=info msg=\"received stop request\" obj=app stopReq=\"{err: restart:false}\"\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Server stopped\n"
+ ]
+ }
+ ],
+ "source": [
+ "server = create_server()\n",
+ "\n",
+ "print(\"Starting MCP server...\")\n",
+ "print(\"Server is running. To stop: Runtime > Interrupt execution\")\n",
+ "print()\n",
+ "\n",
+ "def run_server():\n",
+ " server.run(transport=\"sse\", host=\"0.0.0.0\", port=8000)\n",
+ "\n",
+ "server_thread = threading.Thread(target=run_server, daemon=True)\n",
+ "server_thread.start()\n",
+ "\n",
+ "print(\"Server started successfully!\")\n",
+ "print(\"Your ngrok URL is ready to use in ChatGPT\")\n",
+ "print(\"Keep this cell running...\")\n",
+ "print()\n",
+ "\n",
+ "try:\n",
+ " while True:\n",
+ " time.sleep(1)\n",
+ "except KeyboardInterrupt:\n",
+ " print(\"\\nServer stopped\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Example: ChatGPT Interaction\n",
+ "\n",
+ "Here's an example of ChatGPT using the Elasticsearch connector to search through GitHub issues:\n",
+ " - **Search tool:**\n",
+ "\n",
+ " \n",
+ " - **Fetch tool:**\n",
+ "\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "oNzxwv__hy8D"
+ },
+ "source": [
+ "## Cleanup (Optional)\n",
+ "\n",
+ "Delete the Elasticsearch index to remove all demo data. \n",
+ "**WARNING**: This permanently deletes all documents in the index. Only run this if you want to start fresh or clean up after the demo."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "7C6sin_gh2Be"
+ },
+ "outputs": [],
+ "source": [
+ "try:\n",
+ " result = es_client.options(ignore_status=[400, 404]).indices.delete(index=INDEX_NAME)\n",
+ " if result.get('acknowledged', False):\n",
+ " print(f\"Index '{INDEX_NAME}' deleted successfully\")\n",
+ " else:\n",
+ " print(f\"Error deleting index: {result}\")\n",
+ "except Exception as e:\n",
+ " print(f\"Error: {e}\")"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": ".venv",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.14.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
diff --git a/supporting-blog-content/elasticsearch-chatgpt-connector/github_internal_dataset.json b/supporting-blog-content/elasticsearch-chatgpt-connector/github_internal_dataset.json
new file mode 100644
index 00000000..719e6347
--- /dev/null
+++ b/supporting-blog-content/elasticsearch-chatgpt-connector/github_internal_dataset.json
@@ -0,0 +1,212 @@
+[
+ {
+ "id": "ISSUE-1712",
+ "title": "Migrate from Elasticsearch 7.x to 8.x",
+ "text": "Description: Current Elasticsearch cluster running 7.17 which reaches EOL in Q1 2026. Need to upgrade to 8.x to get security updates and new features.\n\nPlanning:\n- 3-node cluster with 500GB data per node\n- 15M documents across 8 indices\n- Zero-downtime upgrade required\n\nMigration Steps:\n1. Set up parallel 8.x cluster\n2. Configure cross-cluster replication\n3. Update application code for breaking changes\n4. Gradual traffic migration using feature flags\n5. Decommission 7.x cluster\n\nComments:\n- @david_data: Main breaking change is removal of mapping types\n- @sarah_dev: API client library needs upgrade from elasticsearch-py 7.x to 8.x\n- @john_backend: Testing pagination changes, from/size behavior different in 8.x\n- @alex_devops: Provisioned new cluster in staging, ready for testing\n- @maria_frontend: No frontend changes needed, API contract stays same\n\nCurrent Status: Staging migration successful. PR-598 contains all code changes. Planning production migration for next weekend.",
+ "url": "https://internal-git.techcorp.com/issues/1712",
+ "type": "issue",
+ "status": "in_progress",
+ "priority": "medium",
+ "assignee": "david_data",
+ "created_date": "2025-09-01",
+ "resolved_date": null,
+ "labels": ["infrastructure", "elasticsearch", "migration", "upgrade"],
+ "related_pr": "PR-598"
+ },
+ {
+ "id": "RFC-038",
+ "title": "API Versioning Strategy and Deprecation Policy",
+ "text": "Abstract: Establishes a formal API versioning strategy and deprecation policy to maintain backward compatibility while enabling API evolution.\n\nProblem Statement:\n- Current API has breaking changes in minor releases\n- No clear deprecation timeline for endpoints\n- Customers complain about unexpected breaking changes\n- Support burden from maintaining old API behavior\n\nProposed Versioning Strategy:\n- Semantic versioning: MAJOR.MINOR.PATCH\n- URL-based versioning: /api/v1, /api/v2, etc.\n- Header-based version override: X-API-Version: 2.1\n- Maintain 2 major versions simultaneously\n\nDeprecation Policy:\n1. Announce deprecation 6 months in advance\n2. Add deprecation warnings in response headers\n3. Update documentation with migration guide\n4. Send email notifications to affected users\n5. Sunset endpoint after 6-month grace period\n\nImplementation:\n- API version middleware to route requests\n- Automated deprecation header injection\n- Deprecation tracking dashboard\n- User notification system integration\n\nResponse Headers:\n- X-API-Version: 2.0\n- X-API-Deprecated: true\n- X-API-Sunset-Date: 2026-03-01\n- X-API-Migration-Guide: https://docs.techcorp.com/migration/v2\n\nDocumentation Requirements:\n- Changelog for each version\n- Migration guides with code examples\n- API version compatibility matrix\n- Deprecation timeline calendar\n\nDiscussion:\n- @sarah_dev: URL versioning is clearer than header-only approach\n- @maria_frontend: 6 months feels right for enterprise customers\n- @tech_lead_mike: Need to track which users are on old versions\n- @john_backend: I'll implement version routing middleware\n- @product_manager_lisa: This will improve customer trust\n\nStatus: Approved. Implementation tracked in ISSUE-2034. Target: Q4 2025 release.",
+ "url": "https://internal-git.techcorp.com/rfcs/038",
+ "type": "rfc",
+ "status": "closed",
+ "priority": "medium",
+ "assignee": "sarah_dev",
+ "created_date": "2025-09-03",
+ "resolved_date": "2025-09-25",
+ "labels": ["api", "architecture", "design", "rfc"],
+ "related_pr": null
+ },
+ {
+ "id": "ISSUE-1834",
+ "title": "Add rate limiting per user endpoint",
+ "text": "Description: Currently rate limiting is implemented globally at gateway level. Need per-user rate limiting to prevent abuse while allowing legitimate high-volume users.\n\nRequirements:\n- Implement sliding window rate limiter using Redis\n- Different limits for free tier (100 req/min) and premium (1000 req/min)\n- Return remaining quota in response headers\n- Graceful degradation if Redis is unavailable\n\nComments:\n- @john_backend: I'll implement using Redis sorted sets with ZREMRANGEBYSCORE\n- @sarah_dev: Need to coordinate with billing service for tier information\n- @maria_frontend: Should we show rate limit info in user dashboard?\n- @tech_lead_mike: Yes, add a usage widget showing current consumption\n\nImplementation completed in PR-543. Deployed to production 2025-09-12. Monitoring shows 300+ users hitting limits daily with appropriate 429 responses. No performance impact on API response times.",
+ "url": "https://internal-git.techcorp.com/issues/1834",
+ "type": "issue",
+ "status": "closed",
+ "priority": "medium",
+ "assignee": "john_backend",
+ "created_date": "2025-09-05",
+ "resolved_date": "2025-09-12",
+ "labels": ["feature", "api", "redis", "rate-limiting"],
+ "related_pr": "PR-543"
+ },
+ {
+ "id": "ISSUE-1756",
+ "title": "Implement OAuth2 support for external API integrations",
+ "text": "Description: Product team requesting OAuth2 authentication support for third-party integrations. Currently only supporting API key authentication which doesn't meet security requirements for enterprise customers.\n\nRequirements:\n- Support OAuth2 authorization code flow\n- Implement refresh token rotation\n- Add scopes for granular permission control\n- Support PKCE for mobile apps\n\nComments:\n- @sarah_dev: I can handle the backend implementation, estimated 3-4 days\n- @maria_frontend: Frontend changes needed for OAuth flow UI, will coordinate\n- @security_team_alice: Please ensure refresh tokens are stored encrypted in PostgreSQL\n- @john_backend: Let's use the authlib library for OAuth2, it's well maintained\n- @tech_lead_mike: Approved, targeting for Q4 release\n\nImplementation Notes:\n- Using PostgreSQL for token storage with encryption at rest\n- Redis for temporary authorization codes (5 minute TTL)\n- Integration tests covering all OAuth2 flows\n- Documentation for API consumers\n\nCurrent Status: Design phase complete, implementation starting next sprint.",
+ "url": "https://internal-git.techcorp.com/issues/1756",
+ "type": "issue",
+ "status": "open",
+ "priority": "high",
+ "assignee": "sarah_dev",
+ "created_date": "2025-09-08",
+ "resolved_date": null,
+ "labels": ["feature", "api", "security", "oauth"],
+ "related_pr": null
+ },
+ {
+ "id": "PR-543",
+ "title": "Implement per-user rate limiting with Redis",
+ "text": "Description: Implements sliding window rate limiter for ISSUE-1834 using Redis sorted sets.\n\nImplementation:\n- RateLimiter class using Redis ZSET with timestamp scores\n- Sliding window of 60 seconds\n- Automatic cleanup of old entries using ZREMRANGEBYSCORE\n- Fallback to in-memory limiter if Redis unavailable\n\nCode Structure:\n- New module: middleware/rate_limiter.py (180 lines)\n- Configuration in config/rate_limits.yaml\n- Integration in main.py request pipeline\n\nRate Limit Tiers:\n- Free tier: 100 requests/minute\n- Professional: 500 requests/minute \n- Enterprise: 1000 requests/minute\n\nResponse Headers Added:\n- X-RateLimit-Limit: user's tier limit\n- X-RateLimit-Remaining: requests left in window\n- X-RateLimit-Reset: timestamp when limit resets\n\nTesting:\n- Unit tests with mocked Redis (coverage: 95%)\n- Load testing: 10k requests with mixed tiers, all limits enforced correctly\n- Latency impact: +0.8ms per request (acceptable)\n\nComments:\n- @sarah_dev: Clean implementation, like the fallback strategy\n- @tech_lead_mike: Approved, deploy to staging first\n- @alex_devops: Deployed to production 2025-09-12, monitoring looks good\n\nPost-Deployment: Running smoothly for 2 weeks, no issues reported.",
+ "url": "https://internal-git.techcorp.com/pulls/543",
+ "type": "pull_request",
+ "status": "closed",
+ "priority": "medium",
+ "assignee": "john_backend",
+ "created_date": "2025-09-10",
+ "resolved_date": "2025-09-12",
+ "labels": ["feature", "redis", "rate-limiting"],
+ "related_pr": null
+ },
+ {
+ "id": "RFC-045",
+ "title": "Design Proposal: Microservices Migration Architecture",
+ "text": "Abstract: This RFC proposes a phased approach to migrate our monolithic application to a microservices architecture over 18 months.\n\nCurrent State:\n- Single Django monolith (~250k lines of code)\n- PostgreSQL database with 50+ tables\n- Deployed as single unit, scaling limitations\n- Deployment takes 30 minutes, high risk\n\nProposed Architecture:\n1. API Gateway (Kong) - routing and authentication\n2. User Service - authentication and profile management\n3. Billing Service - subscriptions and payments\n4. Notification Service - emails, SMS, WebSocket\n5. Analytics Service - reporting and data warehouse\n6. Search Service - Elasticsearch integration\n\nCommunication:\n- Synchronous: REST APIs with circuit breakers\n- Asynchronous: RabbitMQ for events\n- Service mesh: Istio for observability\n\nData Strategy:\n- Each service owns its database\n- Event sourcing for data synchronization\n- Saga pattern for distributed transactions\n- Read replicas for cross-service queries\n\nMigration Phases:\nPhase 1 (Months 1-4): Extract notification service\nPhase 2 (Months 5-8): Extract billing service\nPhase 3 (Months 9-12): Extract user service\nPhase 4 (Months 13-18): Extract analytics and search\n\nInfrastructure Requirements:\n- Kubernetes cluster (3 nodes minimum per environment)\n- RabbitMQ cluster (3 nodes)\n- Service mesh (Istio)\n- Monitoring (Prometheus + Grafana)\n- Distributed tracing (Jaeger)\n\nEstimated Costs:\n- Infrastructure: +$5k/month\n- Engineering time: 2.5 FTE for 18 months\n- Risk mitigation: 3-month buffer\n\nDiscussion:\n- @tech_lead_mike: Strong proposal, phased approach reduces risk\n- @alex_devops: Infrastructure costs are manageable, need dedicated DevOps\n- @sarah_dev: Concerned about distributed transaction complexity\n- @john_backend: Event sourcing will help with debugging\n- @cto_robert: Approved in principle, need detailed Phase 1 plan\n\nStatus: Approved for Phase 1 implementation. Kickoff meeting scheduled 2025-10-01.",
+ "url": "https://internal-git.techcorp.com/rfcs/045",
+ "type": "rfc",
+ "status": "open",
+ "priority": "high",
+ "assignee": "tech_lead_mike",
+ "created_date": "2025-09-14",
+ "resolved_date": null,
+ "labels": ["architecture", "microservices", "design", "rfc"],
+ "related_pr": null
+ },
+ {
+ "id": "ISSUE-1847",
+ "title": "API Gateway returning 429 errors during peak hours",
+ "text": "Description: Users are experiencing 429 rate limit errors during peak hours (2-4 PM EST). The API gateway is rejecting requests even though we're within our configured limits.\n\nInvestigation:\n- @john_backend: Checked Redis cache, TTL is set correctly at 300s\n- @sarah_dev: Found the issue - middleware is not properly handling connection pooling\n- Root cause: Connection pool exhausted due to long-running queries in user service\n\nComments:\n- @maria_frontend: This is affecting the dashboard heavily, marking as critical\n- @john_backend: PR-567 ready with fix implementing exponential backoff\n- @alex_devops: Added monitoring alerts for connection pool utilization\n\nResolution: Deployed PR-567 to production on 2025-09-18. Monitoring shows 429 errors reduced by 95%. Added connection pool metrics to Grafana dashboard.",
+ "url": "https://internal-git.techcorp.com/issues/1847",
+ "type": "issue",
+ "status": "closed",
+ "priority": "critical",
+ "assignee": "john_backend",
+ "created_date": "2025-09-15",
+ "resolved_date": "2025-09-18",
+ "labels": ["bug", "api", "production", "performance"],
+ "related_pr": "PR-567"
+ },
+ {
+ "id": "PR-567",
+ "title": "Fix connection pool exhaustion in API middleware",
+ "text": "Description: Implements exponential backoff and proper connection pool management to resolve ISSUE-1847 (429 errors during peak hours).\n\nChanges:\n- Refactored middleware/connection_pool.py to use contextlib for proper cleanup\n- Increased pool size from 10 to 50 connections\n- Added exponential backoff with max retry of 3 attempts\n- Implemented connection health checks before reuse\n\nTechnical Details:\n- Using asyncpg connection pool with proper async context managers\n- Added metrics for pool utilization (current: 45%, max: 85%)\n- Timeout handling improved with graceful degradation\n\nTesting:\n- Unit tests for backoff logic (test_exponential_backoff.py)\n- Load testing with 500 concurrent users showed no 429 errors\n- Staging deployment ran for 48 hours without issues\n\nCode Review Comments:\n- @tech_lead_mike: LGTM, good use of async context managers\n- @sarah_dev: Consider adding alerts for pool utilization > 80%\n- @alex_devops: Approved for production deployment\n\nMetrics After Deployment:\n- 429 errors: 1200/hour → 60/hour (95% reduction)\n- Average response time: 145ms → 132ms\n- Connection pool utilization: stable at 45-55%",
+ "url": "https://internal-git.techcorp.com/pulls/567",
+ "type": "pull_request",
+ "status": "closed",
+ "priority": "critical",
+ "assignee": "john_backend",
+ "created_date": "2025-09-16",
+ "resolved_date": "2025-09-18",
+ "labels": ["bug-fix", "api", "performance"],
+ "related_pr": null
+ },
+ {
+ "id": "ISSUE-1889",
+ "title": "SQL injection vulnerability in search endpoint",
+ "text": "Description: Security audit identified SQL injection vulnerability in /api/v1/search endpoint. User input from query parameter is not properly sanitized before being used in raw SQL query.\n\nSeverity: HIGH - Immediate action required\n\nAffected Code:\n- File: services/search/query_builder.py\n- Line: 145-152\n- Issue: String concatenation used instead of parameterized queries\n\nInvestigation:\n- @security_team_alice: Confirmed exploitable with UNION-based injection\n- @sarah_dev: Checking all other endpoints for similar patterns\n- @john_backend: Found 3 more instances in legacy codebase\n\nRemediation:\n- Rewrite using SQLAlchemy ORM or parameterized queries\n- Add input validation and sanitization\n- Implement WAF rules as additional layer\n- Security regression tests\n\nComments:\n- @tech_lead_mike: Stop all other work, this is P0\n- @sarah_dev: PR-578 ready with fixes for all 4 vulnerable endpoints\n- @alex_devops: Deployed hotfix to production 2025-09-19 at 14:30 UTC\n- @security_team_alice: Verified fix, conducting full pentest next week\n\nResolution: All vulnerable endpoints patched. Added pre-commit hooks to catch raw SQL queries. Security training scheduled for team.",
+ "url": "https://internal-git.techcorp.com/issues/1889",
+ "type": "issue",
+ "status": "closed",
+ "priority": "critical",
+ "assignee": "sarah_dev",
+ "created_date": "2025-09-18",
+ "resolved_date": "2025-09-19",
+ "labels": ["security", "vulnerability", "bug", "sql"],
+ "related_pr": "PR-578"
+ },
+ {
+ "id": "PR-578",
+ "title": "Security hotfix: Patch SQL injection vulnerabilities",
+ "text": "Description: CRITICAL SECURITY FIX for ISSUE-1889. Patches SQL injection vulnerabilities in search and filter endpoints.\n\nVulnerabilities Fixed:\n1. services/search/query_builder.py - search endpoint\n2. services/filters/advanced_filter.py - filter endpoint \n3. services/export/csv_export.py - export functionality\n4. services/admin/user_lookup.py - admin search\n\nChanges Applied:\n- Replaced all string concatenation with parameterized queries\n- Migrated to SQLAlchemy ORM where possible\n- Added input validation using Pydantic models\n- Implemented query whitelisting for column names\n- Added SQL injection detection in WAF rules\n\nSecurity Testing:\n- Attempted UNION-based injection: blocked ✓\n- Attempted boolean-based injection: blocked ✓\n- Attempted time-based injection: blocked ✓\n- Tested with sqlmap: all attacks blocked ✓\n\nCode Review:\n- @security_team_alice: Verified all fixes, running full pentest\n- @tech_lead_mike: APPROVED for immediate deployment\n- @john_backend: Reviewed query patterns, all look safe now\n\nDeployment:\n- Hotfix deployed 2025-09-19 at 14:30 UTC\n- No user-facing changes\n- API performance unchanged\n- Added security regression tests to CI pipeline\n\nFollow-up Actions:\n- Security training for team scheduled\n- Pre-commit hooks added to catch raw SQL\n- Code audit of entire codebase planned",
+ "url": "https://internal-git.techcorp.com/pulls/578",
+ "type": "pull_request",
+ "status": "closed",
+ "priority": "critical",
+ "assignee": "sarah_dev",
+ "created_date": "2025-09-19",
+ "resolved_date": "2025-09-19",
+ "labels": ["security", "hotfix", "sql"],
+ "related_pr": null
+ },
+ {
+ "id": "PR-598",
+ "title": "Elasticsearch 8.x migration - Application code changes",
+ "text": "Description: Updates application code for Elasticsearch 8.x compatibility as part of ISSUE-1712 migration.\n\nBreaking Changes Addressed:\n1. Removed mapping types (was using _doc type)\n2. Updated elasticsearch-py from 7.17.0 to 8.10.0\n3. Changed query DSL for better performance\n4. Updated index templates to new format\n\nCode Changes:\n- services/search/elasticsearch_client.py - major refactor\n- Updated all queries to use new Python client syntax\n- Removed deprecated query parameters\n- Added new security context for API keys\n\nIndex Template Updates:\n- Migrated from legacy to composable templates\n- Updated field mappings for text/keyword types\n- Added new runtime fields for computed values\n- Optimized for better search performance\n\nConfiguration Changes:\n- Added Elasticsearch API key authentication\n- Updated connection pool settings\n- Configured request compression\n- Added retry logic for transient failures\n\nTesting:\n- All 450 search integration tests passing\n- Performance testing shows 15% improvement in query speed\n- Backward compatibility maintained with feature flags\n- Staging cluster validated with production traffic replay\n\nComments:\n- @david_data: Composable templates much cleaner than legacy\n- @sarah_dev: Nice performance improvements in aggregation queries\n- @john_backend: API key auth is more secure than basic auth\n- @alex_devops: Staging migration complete, ready for production\n\nDeployment Plan:\n- Deploy code with feature flag (pointing to 7.x)\n- Switch traffic gradually to 8.x cluster\n- Monitor for 48 hours before decommissioning 7.x",
+ "url": "https://internal-git.techcorp.com/pulls/598",
+ "type": "pull_request",
+ "status": "in_progress",
+ "priority": "medium",
+ "assignee": "david_data",
+ "created_date": "2025-09-20",
+ "resolved_date": null,
+ "labels": ["infrastructure", "elasticsearch", "migration"],
+ "related_pr": null
+ },
+ {
+ "id": "ISSUE-1923",
+ "title": "PostgreSQL query timeout in analytics service",
+ "text": "Description: The analytics dashboard is timing out when users request reports for date ranges longer than 30 days. Query timeout set at 30s but queries are taking 45-60s.\n\nInvestigation:\n- @david_data: Analyzed query execution plan - missing index on transactions.created_at column\n- @sarah_dev: Confirmed table has 12M rows, full table scan happening\n- @john_backend: Tested with index in staging, query time reduced to 3.2s\n\nComments:\n- @maria_frontend: Users are complaining, this affects monthly/quarterly reports\n- @david_data: CREATE INDEX idx_transactions_created_at ON transactions(created_at, user_id)\n- @alex_devops: Index creation will take ~15 minutes in production, scheduling maintenance window\n\nStatus: PR-589 opened with migration. Waiting for approval from @tech_lead_mike before production deployment.",
+ "url": "https://internal-git.techcorp.com/issues/1923",
+ "type": "issue",
+ "status": "in_progress",
+ "priority": "high",
+ "assignee": "david_data",
+ "created_date": "2025-09-22",
+ "resolved_date": null,
+ "labels": ["bug", "database", "performance", "postgresql"],
+ "related_pr": "PR-589"
+ },
+ {
+ "id": "PR-589",
+ "title": "Add database index for analytics query optimization",
+ "text": "Description: Resolves ISSUE-1923 by adding composite index on transactions table to eliminate full table scans.\n\nDatabase Changes:\n- Added index: idx_transactions_created_at_user_id\n- Columns: (created_at DESC, user_id)\n- Index size: ~450MB\n- Creation time in production: estimated 12-15 minutes\n\nMigration Script:\nCREATE INDEX CONCURRENTLY idx_transactions_created_at_user_id \nON transactions(created_at DESC, user_id)\nWHERE deleted_at IS NULL;\n\nANALYZE transactions;\n\nPerformance Testing Results:\n- Before: 45-60s query time\n- After: 2.8-3.5s query time (94% improvement)\n- No impact on INSERT performance (tested with 10k inserts)\n- Disk usage increase: acceptable 450MB for 12M row table\n\nComments:\n- @david_data: Used CONCURRENTLY to avoid locking production table\n- @john_backend: Tested with full month query, works great\n- @tech_lead_mike: Approved, schedule for maintenance window\n- @alex_devops: Deployment scheduled for 2025-09-24 at 02:00 UTC\n\nStatus: Awaiting production deployment window.",
+ "url": "https://internal-git.techcorp.com/pulls/589",
+ "type": "pull_request",
+ "status": "open",
+ "priority": "high",
+ "assignee": "david_data",
+ "created_date": "2025-09-23",
+ "resolved_date": null,
+ "labels": ["database", "performance", "postgresql"],
+ "related_pr": null
+ },
+ {
+ "id": "ISSUE-1998",
+ "title": "Memory leak in notification microservice",
+ "text": "Description: Notification service consuming increasing memory over time, requiring restart every 48 hours. Heap usage grows from 512MB to 4GB before OOMKiller terminates the process.\n\nInvestigation:\n- @alex_devops: Monitoring shows steady memory growth, not correlated with traffic\n- @john_backend: Heap dump analysis reveals WebSocket connections not being properly closed\n- Root cause: Event listeners not being removed when WebSocket disconnects\n\nTechnical Details:\n- Node.js v20.5.1, using ws library for WebSocket connections\n- Found 3000+ orphaned event listeners after 24 hours runtime\n- Memory profile shows listeners retaining references to large user objects\n\nComments:\n- @sarah_dev: We should implement connection cleanup in the disconnect handler\n- @john_backend: Also need to add heartbeat mechanism to detect stale connections\n- @alex_devops: Current workaround: Auto-restart service every 24 hours via k8s\n- @tech_lead_mike: This is affecting 15k active WebSocket users, prioritize fix\n\nResolution: PR-612 implements proper cleanup with WeakMap for connection tracking. Testing in staging for 72 hours before production release.",
+ "url": "https://internal-git.techcorp.com/issues/1998",
+ "type": "issue",
+ "status": "in_progress",
+ "priority": "critical",
+ "assignee": "john_backend",
+ "created_date": "2025-09-28",
+ "resolved_date": null,
+ "labels": ["bug", "production", "memory-leak", "microservices"],
+ "related_pr": "PR-612"
+ },
+ {
+ "id": "PR-612",
+ "title": "Fix memory leak in WebSocket notification service",
+ "text": "Description: Resolves ISSUE-1998 memory leak caused by orphaned event listeners in WebSocket connections.\n\nRoot Cause Analysis:\n- Event listeners registered on connection but not removed on disconnect\n- Each listener retained reference to full user object (~8KB)\n- After 24 hours: 3000+ orphaned listeners = 24MB+ leaked memory\n- Compounded by other retained objects in closure scope\n\nImplementation:\n- Refactored connection manager to use WeakMap for listener tracking\n- Implemented explicit cleanup in disconnect handler\n- Added heartbeat mechanism (30s interval) to detect stale connections\n- Automatic connection timeout after 5 minutes of inactivity\n\nCode Changes:\n- services/notifications/websocket_manager.js - 250 lines refactored\n- Added cleanup middleware in disconnect pipeline\n- Implemented connection pool monitoring\n- Memory profiling instrumentation added\n\nTesting:\n- Load test: 5000 concurrent WebSocket connections for 72 hours\n- Memory usage: stable at 512MB (previously grew to 4GB)\n- No connection drops or data loss\n- Heap snapshots show proper cleanup\n\nComments:\n- @john_backend: Used WeakMap to prevent memory retention\n- @alex_devops: Running in staging, memory is flat, looking good!\n- @tech_lead_mike: Excellent fix, approve for production after 72h staging test\n- @sarah_dev: Should we add memory usage alerts? \n- @alex_devops: Added CloudWatch alert for >2GB usage\n\nStatus: In staging testing, pending 72-hour validation before production.",
+ "url": "https://internal-git.techcorp.com/pulls/612",
+ "type": "pull_request",
+ "status": "in_progress",
+ "priority": "critical",
+ "assignee": "john_backend",
+ "created_date": "2025-09-29",
+ "resolved_date": null,
+ "labels": ["bug-fix", "memory-leak", "websocket"],
+ "related_pr": null
+ }
+]
diff --git a/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-fetch-example.png b/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-fetch-example.png
new file mode 100644
index 00000000..d5445175
Binary files /dev/null and b/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-fetch-example.png differ
diff --git a/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-search-example.png b/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-search-example.png
new file mode 100644
index 00000000..dac4335f
Binary files /dev/null and b/supporting-blog-content/elasticsearch-chatgpt-connector/images/chatgpt-search-example.png differ
diff --git a/supporting-blog-content/elasticsearch-chatgpt-connector/requirements.txt b/supporting-blog-content/elasticsearch-chatgpt-connector/requirements.txt
new file mode 100644
index 00000000..3153d657
--- /dev/null
+++ b/supporting-blog-content/elasticsearch-chatgpt-connector/requirements.txt
@@ -0,0 +1,4 @@
+fastmcp>=2.13.0
+elasticsearch>=8.0.0
+pyngrok>=7.0.0
+pandas>=2.0.0
\ No newline at end of file