
feat: Bug Recommendation system using vector cosine-similarity #1

Open
SuhaniBhati wants to merge 3 commits into main from ai-service-backend

Conversation

@SuhaniBhati
Collaborator

@SuhaniBhati SuhaniBhati commented Apr 8, 2026

Summary by CodeRabbit

  • New Features
    • Added a bug recommendation system that finds similar bugs using vector similarity.
    • Introduced an embedding service and utilities to generate embeddings from bug data.
    • Added API endpoints to create/update bug vectors and to fetch recommendations by bug ID.
  • Chores
    • Updated ignore rules to exclude Python bytecode artifacts.

@coderabbitai

coderabbitai Bot commented Apr 8, 2026

📝 Walkthrough

Walkthrough

Adds a Python Flask microservice that returns text embeddings and updates the Node.js core service with vector creation, an external embedding integration, cosine similarity utilities, model schema changes, and recommendation endpoints for similar bugs.
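The pipeline starts by flattening a bug document into a single string for the embedding model. A minimal sketch of that text-building step, assuming the field names (title, description, tags, techStack, difficulty) seen in this review's payloads; the real `build_text` in ai-service/utils.py may differ in separators and field order:

```python
def build_text(data):
    """Compose one embedding input string from a bug's fields.

    Hedged approximation of ai-service/utils.py's build_text;
    normalization of None/str/list inputs is an assumption here.
    """
    def as_list(value):
        # Normalize None -> [], str -> [str], list -> list of strings.
        if value is None:
            return []
        if isinstance(value, str):
            return [value]
        return [str(v) for v in value]

    parts = [
        str(data.get("title", "")),
        str(data.get("description", "")),
        " ".join(as_list(data.get("tags"))),
        " ".join(as_list(data.get("techStack"))),
        str(data.get("difficulty", "")),
    ]
    # Drop empty fields so missing data does not add stray spaces.
    return " ".join(p for p in parts if p)
```

The resulting string is what the Node service would POST to /embed in the flow below.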

Changes

Cohort / File(s) Summary
AI service
ai-service/app.py, ai-service/model.py, ai-service/utils.py, ai-service/requirements.txt, ./.gitignore
New Flask app exposing POST /embed, SentenceTransformer model loaded at import, text-building helpers (build_text, safe_join), requirements file added, and .gitignore updated to ignore Python bytecode.
Vector creation & embedding integration
core-services/src/controllers/vector.Controller.js, core-services/src/routes/vector.Routes.js, core-services/src/services/embedding.Service.js, core-services/src/app.js
New createVector controller and POST /api/vectors route; embedding service posts to external AI service via axios; app mounts vector router.
Similarity & recommendations
core-services/src/controllers/recommendation.Controller.js, core-services/src/routes/recommendation.Routes.js, core-services/src/utils/similarityScore.js
Added cosineSimilarity util, getSimilarBugs controller computing top-5 similar bugs by cosine similarity, and GET /api/recommendations/:id router.
Models & packaging
core-services/src/models/Vector.js, core-services/src/models/Bug.js, core-services/package.json
Converted models to CommonJS exports; Vector schema adds modelVersion and updatedAt; simplified Bug schema formatting; added axios dependency.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant CoreAPI as Core API (Node.js)
    participant AIService as AI Service (Flask)
    participant DB as Database (MongoDB)

    Client->>CoreAPI: POST /api/vectors (bugId)
    CoreAPI->>DB: Bug.findById(bugId)
    DB-->>CoreAPI: bug document
    CoreAPI->>AIService: POST /embed (built text)
    AIService-->>CoreAPI: {embedding: [...]}
    CoreAPI->>DB: Vector.findOneAndUpdate(... upsert)
    DB-->>CoreAPI: saved vector
    CoreAPI-->>Client: 200 {data: vector}
sequenceDiagram
    participant Client
    participant CoreAPI as Core API (Node.js)
    participant DB as Database (MongoDB)

    Client->>CoreAPI: GET /api/recommendations/:id
    CoreAPI->>DB: Vector.findById(id)
    DB-->>CoreAPI: target vector
    CoreAPI->>DB: Vector.find(...) (other vectors)
    DB-->>CoreAPI: other vectors
    Note over CoreAPI: compute cosineSimilarity for each vector\nsort desc, pick top-5
    CoreAPI->>DB: Bug.find({_id: {$in: topIds}})
    DB-->>CoreAPI: bug documents
    CoreAPI-->>Client: 200 {recommendations: [{bug, similarityScore}, ...]}
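The note in the diagram above reduces to scoring every stored vector against the target, sorting descending, and keeping the top five. A pure-Python sketch of that ranking step (the PR's actual implementation is JavaScript in core-services; names here are illustrative):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|), with a zero-norm guard.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_similar(target, others, k=5):
    """Rank (bug_id, vector) pairs by similarity to target, keep top k."""
    scored = [(bug_id, cosine_similarity(target, vec)) for bug_id, vec in others]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

The controller would then fetch the bug documents for the surviving ids, as the diagram's final Bug.find step shows.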

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰
I nibble lines and weave the text,
I hum till vectors are perplexed,
I hop through scores to find a friend,
Five nearest bugs — my joyous end.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

- Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

- Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
- Title Check: ✅ Passed. The title accurately describes the main feature: a bug recommendation system using vector embeddings and cosine similarity, matching the substantial changes across the codebase.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (5)
ai-service/requirements.txt (1)

1-4: Use pinned or bounded versions for runtime dependencies.

Current bare package names make deployments non-reproducible across environments.

♻️ Suggested change
-flask
-sentence-transformers
-numpy
-scikit-learn
+flask>=3.0,<4.0
+sentence-transformers>=3.0,<4.0
+numpy>=2.0,<3.0
+scikit-learn>=1.5,<2.0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ai-service/requirements.txt` around lines 1 - 4, The requirements.txt
currently lists unpinned runtime dependencies ("flask", "sentence-transformers",
"numpy", "scikit-learn"); update each entry to use either exact pinned versions
(e.g., "package==x.y.z") or conservative version bounds (e.g.,
"package>=x.y,<x+1") to make deployments reproducible. Determine compatible
versions used in CI or local environment and replace the bare names in
requirements.txt with those pinned/bounded specifications so installs are
deterministic.
core-services/src/models/Vector.js (1)

4-25: Explicit _id: ObjectId field declaration is unnecessary.

Mongoose auto-generates _id as an ObjectId by default. Declaring it explicitly without auto: false has no effect and may cause confusion. Consider removing line 5.

🧹 Proposed cleanup
 const vectorSchema = new mongoose.Schema({
-  _id: ObjectId,
   bugId: {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core-services/src/models/Vector.js` around lines 4 - 25, Remove the explicit
_id declaration from the mongoose schema to avoid confusion since Mongoose
auto-generates an ObjectId; update the vectorSchema by deleting the "_id:
ObjectId" entry so the schema only contains bugId, vector, modelVersion, and
updatedAt, leaving default Mongoose _id behavior intact (refer to the
vectorSchema constant to locate the schema definition).
core-services/src/services/embedding.Service.js (1)

5-18: Consider adding a request timeout.

The axios request has no timeout configured. If the AI service is unresponsive, this call will hang indefinitely, potentially exhausting connection pools or blocking the request thread.

⏱️ Proposed fix: add timeout
     const res = await axios.post(
       `${process.env.AI_SERVICE_URL}/embed`,
       {
         title: bug.title,
         description: bug.description,
-        tags: Array.isArray(bug.tags)
-          ? bug.tags.join(", ")
-          : bug.tags,
-        techStack: Array.isArray(bug.techStack)
-          ? bug.techStack.join(", ")
-          : bug.techStack,
+        tags: Array.isArray(bug.tags) ? bug.tags : [],
+        techStack: Array.isArray(bug.techStack) ? bug.techStack : [],
         difficulty: bug.difficulty
-      }
+      },
+      { timeout: 10000 } // 10 second timeout
     );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core-services/src/services/embedding.Service.js` around lines 5 - 18, The
axios.post call in embedding.Service.js that sends payload from bug (title,
description, tags, techStack, difficulty) lacks a timeout and can hang; add an
axios request timeout (e.g., pass a config object with a timeout value in
milliseconds as the third argument to axios.post) to the call that assigns to
res so the request fails fast when the AI service is unresponsive, and choose a
sensible timeout constant (e.g., DEFAULT_EMBED_TIMEOUT) used by the module or
service.
core-services/src/controllers/recommendation.Controller.js (1)

15-22: Full collection scan doesn't scale.

Vector.find({ bugId: { $ne: bugId } }) loads all vectors into memory and MongoDB cannot efficiently use indexes for $ne queries. This is acceptable for a small dataset but will degrade as the collection grows.

For future scalability, consider:

  • Using MongoDB Atlas Vector Search or a dedicated vector database
  • Implementing pagination or limiting candidates
  • Pre-computing nearest neighbors offline
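A low-effort version of the second option above is to cap the candidate set and avoid a full sort. A sketch using heapq.nlargest over a bounded candidate list; CANDIDATE_LIMIT and the scorer parameter are illustrative, not the PR's actual code (which is JavaScript):

```python
import heapq

CANDIDATE_LIMIT = 1000  # illustrative cap on candidates pulled from the DB

def top_k_similar(target, candidates, scorer, k=5):
    """Score at most CANDIDATE_LIMIT candidates and keep the k best.

    candidates: iterable of (bug_id, vector) pairs.
    scorer: similarity function taking (target, vector).
    heapq.nlargest runs in O(n log k), cheaper than sorting all n scores.
    """
    capped = list(candidates)[:CANDIDATE_LIMIT]
    scored = ((scorer(target, vec), bug_id) for bug_id, vec in capped)
    return heapq.nlargest(k, scored)
```

In the Mongo version the cap would live in the query itself (a .limit() on Vector.find), so the bound applies before documents ever reach application memory.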
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core-services/src/controllers/recommendation.Controller.js` around lines 15 -
22, The current Vector.find({ bugId: { $ne: bugId } }) causes a full collection
scan and will not scale; change the recommendation logic in the recommendation
controller to avoid loading all vectors into memory by limiting candidate
vectors (e.g., add a .limit(N) and sensible sort/filters or sample recent /
relevant documents) or delegate to a proper vector search (MongoDB Atlas Vector
Search or an external vector DB) instead of mapping all results; update the code
around the Vector.find call and the downstream similarities/top5 computation so
it queries a bounded candidate set (use
Vector.find(query).sort(...).limit(CANDIDATE_LIMIT) or replace with a
vector-search API call) and ensure cosineSimilarity and top5 logic operate only
on that limited result set.
core-services/src/controllers/vector.Controller.js (1)

24-35: Consider validating the embedding before storing.

If the AI service returns an unexpected response (null, empty array, wrong dimensions), this will store an invalid vector. A quick validation could prevent corrupted data.

🛡️ Proposed validation
     const embedding = await getEmbedding(bug);

+    if (!Array.isArray(embedding) || embedding.length === 0) {
+      return res.status(502).json({ message: "Invalid embedding from AI service" });
+    }
+
     const vectorDoc = await Vector.findOneAndUpdate(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@core-services/src/controllers/vector.Controller.js` around lines 24 - 35, The
code calls getEmbedding(bug) then upserts it via Vector.findOneAndUpdate without
validation; add a validation step after getEmbedding to ensure the returned
embedding is non-null, is an Array of numbers, and matches the expected
dimensionality for "all-MiniLM-L6-v2" (e.g., expectedLength) before calling
Vector.findOneAndUpdate; if validation fails, log or throw a clear error
(including bug._id and the embedding shape) and skip the DB write so invalid
vectors are not stored. Use the symbols getEmbedding, Vector.findOneAndUpdate,
and modelVersion in your checks and error messages.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ai-service/app.py`:
- Around line 9-12: Check that request.json is present and valid before calling
build_text; if request.json is None or missing required fields, return a 400 Bad
Request instead of proceeding. Modify the handler around request.json /
build_text(data) to validate data (e.g., ensure data is not None and has
expected keys) and return an appropriate error response via jsonify with a 400
status; only call build_text(data) and get_embedding(text) when validation
passes.
- Line 15: The app currently calls app.run(port=3000, debug=True) which
hardcodes Flask debug mode; remove the hardcoded debug=True and make debug
configurable via environment/config (e.g., read from an env var or app config)
so debug is only enabled in development; update the app.run call in app.py to
use that config value (e.g., app.config or os.environ) and ensure the default is
False in production.

In `@ai-service/utils.py`:
- Around line 5-9: Normalize the incoming tags and techStack values before
joining: ensure tags and tech_stack are lists (convert None to [], wrap a string
into a single-item list rather than iterating its characters, and leave existing
lists untouched) before constructing tags_text and tech_text. Update the
variables tags and tech_stack in ai-service/utils.py so they handle str, None,
and list types safely (e.g., check isinstance(tags, str) /
isinstance(tech_stack, str) and convert accordingly) and then use " ".join on
the normalized lists to produce tags_text and tech_text.

In `@core-services/src/controllers/recommendation.Controller.js`:
- Line 7: The controller reads const bugId = req.params.id without validating
it; add an ObjectId format check (e.g., using mongoose.Types.ObjectId.isValid)
at the start of the relevant handler in recommendation.Controller.js (where
bugId is used) and return a 400 Bad Request with a clear message when invalid,
so invalid ids do not trigger Mongoose cast errors; keep the existing variable
name bugId and perform the check before any DB calls that use bugId.

In `@core-services/src/services/embedding.Service.js`:
- Around line 10-15: The payload currently converts arrays to comma-separated
strings for the tags and techStack fields, which breaks the Python build_text()
logic that expects arrays; update the assignments in embedding.Service.js so
tags and techStack are sent as arrays (e.g., for tags: use bug.tags directly if
Array.isArray(bug.tags) otherwise wrap/split into an array) and do the same for
techStack, ensuring the service sends arrays to match ai-service/utils.py's
build_text().

In `@core-services/src/utils/similarityScore.js`:
- Around line 1-13: In cosineSimilarity, validate inputs and guard against NaNs:
check that A and B are arrays and have the same length (or explicitly iterate to
Math.min(A.length, B.length)), coerce entries to numbers (skip/ treat undefined
as 0), compute norms and if either norm is zero return 0 (or a defined fallback)
instead of performing the division, and ensure you never divide by
Math.sqrt(normA)*Math.sqrt(normB); update exports.cosineSimilarity accordingly
to return a deterministic numeric value when vectors are zero-length,
zero-magnitude, or length-mismatched.
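The guards the prompt above asks for can be sketched as follows, in Python for clarity (the actual util is JavaScript in core-services/src/utils/similarityScore.js; this is an approximation of the requested behavior, not that file's code):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity with the guards the review asks for:
    sequence inputs only, min-length iteration, None entries treated
    as 0, and a deterministic 0.0 for zero-norm or empty vectors."""
    if not isinstance(a, (list, tuple)) or not isinstance(b, (list, tuple)):
        return 0.0
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    dot = norm_a = norm_b = 0.0
    for i in range(n):
        x = float(a[i] or 0)  # treat None/missing entries as 0
        y = float(b[i] or 0)
        dot += x * y
        norm_a += x * x
        norm_b += y * y
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # avoid division by zero; defined fallback
    return dot / (math.sqrt(norm_a) * math.sqrt(norm_b))
```

Returning 0.0 for degenerate inputs keeps the downstream sort stable: invalid vectors simply rank last instead of producing NaN scores.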

---

Nitpick comments:
In `@ai-service/requirements.txt`:
- Around line 1-4: The requirements.txt currently lists unpinned runtime
dependencies ("flask", "sentence-transformers", "numpy", "scikit-learn"); update
each entry to use either exact pinned versions (e.g., "package==x.y.z") or
conservative version bounds (e.g., "package>=x.y,<x+1") to make deployments
reproducible. Determine compatible versions used in CI or local environment and
replace the bare names in requirements.txt with those pinned/bounded
specifications so installs are deterministic.

In `@core-services/src/controllers/recommendation.Controller.js`:
- Around line 15-22: The current Vector.find({ bugId: { $ne: bugId } }) causes a
full collection scan and will not scale; change the recommendation logic in the
recommendation controller to avoid loading all vectors into memory by limiting
candidate vectors (e.g., add a .limit(N) and sensible sort/filters or sample
recent / relevant documents) or delegate to a proper vector search (MongoDB
Atlas Vector Search or an external vector DB) instead of mapping all results;
update the code around the Vector.find call and the downstream similarities/top5
computation so it queries a bounded candidate set (use
Vector.find(query).sort(...).limit(CANDIDATE_LIMIT) or replace with a
vector-search API call) and ensure cosineSimilarity and top5 logic operate only
on that limited result set.

In `@core-services/src/controllers/vector.Controller.js`:
- Around line 24-35: The code calls getEmbedding(bug) then upserts it via
Vector.findOneAndUpdate without validation; add a validation step after
getEmbedding to ensure the returned embedding is non-null, is an Array of
numbers, and matches the expected dimensionality for "all-MiniLM-L6-v2" (e.g.,
expectedLength) before calling Vector.findOneAndUpdate; if validation fails, log
or throw a clear error (including bug._id and the embedding shape) and skip the
DB write so invalid vectors are not stored. Use the symbols getEmbedding,
Vector.findOneAndUpdate, and modelVersion in your checks and error messages.

In `@core-services/src/models/Vector.js`:
- Around line 4-25: Remove the explicit _id declaration from the mongoose schema
to avoid confusion since Mongoose auto-generates an ObjectId; update the
vectorSchema by deleting the "_id: ObjectId" entry so the schema only contains
bugId, vector, modelVersion, and updatedAt, leaving default Mongoose _id
behavior intact (refer to the vectorSchema constant to locate the schema
definition).

In `@core-services/src/services/embedding.Service.js`:
- Around line 5-18: The axios.post call in embedding.Service.js that sends
payload from bug (title, description, tags, techStack, difficulty) lacks a
timeout and can hang; add an axios request timeout (e.g., pass a config object
with a timeout value in milliseconds as the third argument to axios.post) to the
call that assigns to res so the request fails fast when the AI service is
unresponsive, and choose a sensible timeout constant (e.g.,
DEFAULT_EMBED_TIMEOUT) used by the module or service.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 649e3593-65d8-42f9-9903-02378c34ea97

📥 Commits

Reviewing files that changed from the base of the PR and between bb17ae6 and 3fc0322.

⛔ Files ignored due to path filters (1)
  • core-services/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (15)
  • .gitignore
  • ai-service/app.py
  • ai-service/model.py
  • ai-service/requirements.txt
  • ai-service/utils.py
  • core-services/package.json
  • core-services/src/app.js
  • core-services/src/controllers/recommendation.Controller.js
  • core-services/src/controllers/vector.Controller.js
  • core-services/src/models/Bug.js
  • core-services/src/models/Vector.js
  • core-services/src/routes/recommendation.Routes.js
  • core-services/src/routes/vector.Routes.js
  • core-services/src/services/embedding.Service.js
  • core-services/src/utils/similarityScore.js

Comment thread ai-service/app.py Outdated
Comment on lines +9 to +12
data = request.json
text = build_text(data)
embedding = get_embedding(text)
return jsonify({"embedding": embedding})

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -name "app.py" -path "*/ai-service/*" | head -20

Repository: call-0f-code/Bug-Graveyard

Length of output: 89


🏁 Script executed:

cat -n ai-service/app.py

Repository: call-0f-code/Bug-Graveyard

Length of output: 549


🏁 Script executed:

# Search for build_text function definition
rg "def build_text" -A 5

Repository: call-0f-code/Bug-Graveyard

Length of output: 388


🏁 Script executed:

# Check Flask version if available
if [ -f "requirements.txt" ]; then cat requirements.txt; elif [ -f "setup.py" ]; then grep -i flask setup.py; elif [ -f "pyproject.toml" ]; then grep -i flask pyproject.toml; fi

Repository: call-0f-code/Bug-Graveyard

Length of output: 52


Validate request body before building text.

If request JSON is missing or invalid, request.json returns None and the call to build_text(data) will raise an AttributeError when attempting to call .get() on None, resulting in a 500 error instead of a proper 400 Bad Request.

🐛 Proposed fix
 @app.route("/embed", methods=["POST"])
 def embed():
-    data = request.json
+    data = request.get_json(silent=True)
+    if not isinstance(data, dict):
+        return jsonify({"error": "Invalid JSON payload"}), 400
     text = build_text(data)
     embedding = get_embedding(text)
     return jsonify({"embedding": embedding})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ai-service/app.py` around lines 9 - 12, Check that request.json is present
and valid before calling build_text; if request.json is None or missing required
fields, return a 400 Bad Request instead of proceeding. Modify the handler
around request.json / build_text(data) to validate data (e.g., ensure data is
not None and has expected keys) and return an appropriate error response via
jsonify with a 400 status; only call build_text(data) and get_embedding(text)
when validation passes.

Comment thread ai-service/app.py Outdated
Comment thread ai-service/utils.py Outdated
Comment on lines +5 to +9
tags = data.get("tags", [])
tech_stack = data.get("techStack", [])
difficulty = data.get("difficulty", "")
tags_text = " ".join(tags)
tech_text = " ".join(tech_stack)

⚠️ Potential issue | 🟠 Major

Normalize tags/techStack before joining.

Line 8 and Line 9 break when values are strings or null: strings become character-spaced text, and None raises TypeError. This is likely with the current upstream payload shape from core-services/src/services/embedding.Service.js:10-15.

🐛 Proposed fix
 def build_text(data):
     # Safe extraction
     title = data.get("title", "")
     description = data.get("description", "")
     tags = data.get("tags", [])
     tech_stack = data.get("techStack", [])
     difficulty = data.get("difficulty", "")
-    tags_text = " ".join(tags)
-    tech_text = " ".join(tech_stack)
+    def normalize_terms(value):
+        if value is None:
+            return ""
+        if isinstance(value, (list, tuple, set)):
+            return " ".join(str(v).strip() for v in value if v is not None)
+        return str(value).replace(",", " ").strip()
+
+    tags_text = normalize_terms(tags)
+    tech_text = normalize_terms(tech_stack)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

tags = data.get("tags", [])
tech_stack = data.get("techStack", [])
difficulty = data.get("difficulty", "")

def normalize_terms(value):
    if value is None:
        return ""
    if isinstance(value, (list, tuple, set)):
        return " ".join(str(v).strip() for v in value if v is not None)
    return str(value).replace(",", " ").strip()

tags_text = normalize_terms(tags)
tech_text = normalize_terms(tech_stack)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ai-service/utils.py` around lines 5 - 9, Normalize the incoming tags and
techStack values before joining: ensure tags and tech_stack are lists (convert
None to [], wrap a string into a single-item list rather than iterating its
characters, and leave existing lists untouched) before constructing tags_text
and tech_text. Update the variables tags and tech_stack in ai-service/utils.py
so they handle str, None, and list types safely (e.g., check isinstance(tags,
str) / isinstance(tech_stack, str) and convert accordingly) and then use "
".join on the normalized lists to produce tags_text and tech_text.

Comment thread core-services/src/controllers/recommendation.Controller.js
Comment thread core-services/src/services/embedding.Service.js Outdated
Comment thread core-services/src/utils/similarityScore.js

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
ai-service/app.py (1)

20-22: Consider using the logging module instead of print().

For production services, using Python's logging module provides better control over log levels, formatting, and output destinations.

♻️ Suggested improvement
+import logging
 import os
 from dotenv import load_dotenv
...
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
 app = Flask(__name__)
...
     except Exception as e:
-        print("Error in /embed:", str(e))
+        logger.exception("Error in /embed")
         return jsonify({"error": "Internal server error"}), 500
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ai-service/app.py` around lines 20 - 22, Replace the ad-hoc print in the
/embed exception handler with structured logging: import and configure Python's
logging (or use Flask's app.logger) and call logger.exception("Error in /embed")
(or app.logger.exception) to record the error and stack trace instead of
print(str(e)); keep the existing jsonify return and status code. Reference the
exception handler block for /embed and the exception variable e when making the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ai-service/app.py`:
- Around line 24-27: The DEBUG assignment currently compares a string to a list
(DEBUG = os.getenv("FLASK_DEBUG", "false").lower() == ["true", "1", "yes"])
which is always False; change it to check membership instead by evaluating the
lowered env value against the list of truthy strings (use "in" with the list) so
DEBUG correctly reflects FLASK_DEBUG; update the DEBUG expression near PORT and
app.run references (symbols: PORT, DEBUG, app.run, os.getenv("FLASK_DEBUG")) to
use membership testing.
- Around line 16-19: The duplicate call to build_text(data) is unnecessary and
wastes CPU; remove the second invocation so the handler only calls
build_text(data) once to produce text, then passes that text to
get_embedding(text) and returns jsonify({"embedding": embedding}), 200; locate
the duplicate call in the same block that calls build_text and get_embedding and
delete the redundant build_text(data) invocation.

---

Nitpick comments:
In `@ai-service/app.py`:
- Around line 20-22: Replace the ad-hoc print in the /embed exception handler
with structured logging: import and configure Python's logging (or use Flask's
app.logger) and call logger.exception("Error in /embed") (or
app.logger.exception) to record the error and stack trace instead of
print(str(e)); keep the existing jsonify return and status code. Reference the
exception handler block for /embed and the exception variable e when making the
change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fbdb4142-eeb7-47f5-95c3-5850846874a8

📥 Commits

Reviewing files that changed from the base of the PR and between 3fc0322 and 56de14b.

📒 Files selected for processing (6)
  • ai-service/app.py
  • ai-service/requirements.txt
  • ai-service/utils.py
  • core-services/src/controllers/recommendation.Controller.js
  • core-services/src/services/embedding.Service.js
  • core-services/src/utils/similarityScore.js
✅ Files skipped from review due to trivial changes (1)
  • ai-service/requirements.txt
🚧 Files skipped from review as they are similar to previous changes (4)
  • core-services/src/utils/similarityScore.js
  • core-services/src/services/embedding.Service.js
  • ai-service/utils.py
  • core-services/src/controllers/recommendation.Controller.js

Comment thread ai-service/app.py

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
ai-service/app.py (1)

19-21: Replace print() with proper logging.

Using print() for error output is not production-ready. The logging module provides better control over log levels, formatting, and output destinations, making it easier to monitor and debug the service in production environments.

📋 Proposed refactor to use logging module

Add at the top of the file after imports:

+import logging
+
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)

Then update the exception handler:

     except Exception as e:
-        print("Error in /embed:", str(e))
+        logger.exception("Error in /embed endpoint")
         return jsonify({"error": "Internal server error"}), 500
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ai-service/app.py` around lines 19 - 21, Replace the use of print() in the
/embed exception handler with the logging module: import and configure a logger
(or use Flask's app.logger) at top of ai-service/app.py, and inside the except
Exception as e block replace print("Error in /embed:", str(e)) with a
logger.error call that includes context ("Error in /embed") and the exception
(e) so the stacktrace and message are recorded at error level; ensure logging is
initialized before the route is registered.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ai-service/app.py`:
- Around line 23-26: Validate parsing of the PORT environment variable in the
__main__ block: replace the direct int(os.getenv("PORT", 3000)) usage with code
that attempts to parse os.getenv("PORT") inside a try/except (or use str.isdigit
checks), on ValueError log a clear error (use the app or module logger), fall
back to the default port 3000 (or exit if you prefer strict behavior), and then
call app.run(port=PORT, debug=DEBUG); update the section around the PORT
variable assignment and the app.run call to use the validated PORT.

---

Nitpick comments:
In `@ai-service/app.py`:
- Around line 19-21: Replace the use of print() in the /embed exception handler
with the logging module: import and configure a logger (or use Flask's
app.logger) at top of ai-service/app.py, and inside the except Exception as e
block replace print("Error in /embed:", str(e)) with a logger.error call that
includes context ("Error in /embed") and the exception (e) so the stacktrace and
message are recorded at error level; ensure logging is initialized before the
route is registered.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d5c2f541-861f-4870-96c6-765f7709a279

📥 Commits

Reviewing files that changed from the base of the PR and between 56de14b and d76494e.

📒 Files selected for processing (1)
  • ai-service/app.py

Comment thread ai-service/app.py
Comment on lines +23 to +26
if __name__ == "__main__":
PORT = int(os.getenv("PORT", 3000))
DEBUG = os.getenv("FLASK_DEBUG", "false").lower() in ["true", "1", "yes"]
    app.run(port=PORT, debug=DEBUG)

⚠️ Potential issue | 🟡 Minor

Add error handling for PORT environment variable parsing.

If the PORT environment variable contains a non-numeric value, int() will raise a ValueError and crash the application at startup. While this is an edge case, adding validation would make the service more robust.

🛡️ Proposed fix to handle invalid PORT values
 if __name__ == "__main__":
-    PORT = int(os.getenv("PORT", 3000))
+    try:
+        PORT = int(os.getenv("PORT", "3000"))
+    except ValueError:
+        PORT = 3000
     DEBUG = os.getenv("FLASK_DEBUG", "false").lower() in ["true", "1", "yes"]
     app.run(port=PORT, debug=DEBUG)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ai-service/app.py` around lines 23 - 26, Validate parsing of the PORT
environment variable in the __main__ block: replace the direct
int(os.getenv("PORT", 3000)) usage with code that attempts to parse
os.getenv("PORT") inside a try/except (or use str.isdigit checks), on ValueError
log a clear error (use the app or module logger), fall back to the default port
3000 (or exit if you prefer strict behavior), and then call app.run(port=PORT,
debug=DEBUG); update the section around the PORT variable assignment and the
app.run call to use the validated PORT.

@Harish-Naruto Harish-Naruto self-requested a review April 12, 2026 17:01