A monorepo powering an end-to-end B2B lead discovery, scoring, and notification system built for HPCL's Direct Sales division. The system scrapes public web signals (tenders, news, acquisitions), infers product-need fit against HPCL's portfolio, scores and ranks leads, and delivers actionable dossiers to sales officers via a web dashboard, mobile app, and WhatsApp alerts.
Built for the IIT Roorkee E-Summit Productathon 2026. See PROBLEM.md for the full problem statement.
The modules fit together as follows:

         pipeline.py (orchestrator)
                      |
        +-------------+-------------+
        |             |             |
    scraper/        llm/   deal_scoring_model/
    (discover)   (recommend)     (score)
        |             |             |
        +------+------+------+------+
               |             |
        MongoDB (PRISM)      |
               |             |
       fastapi_backend/      |
          (REST API)         |
             /   \           |
     monorepo/  hpclMobile/  |
     (Next.js)    (Expo)     |
                             |
                         whatsapp/
                      (Twilio alerts)

`pipeline.py` is the top-level orchestrator. A single `python pipeline.py` invocation runs the entire data pipeline in three sequential steps: scrape, recommend, score. Each step is also independently importable.
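As a rough illustration of the three-step sequence described above, an orchestrator of this kind could be structured as below. This is only a sketch: the step functions `scrape`, `recommend`, and `score` are hypothetical stand-ins, not the repository's actual entry points, and the return values are placeholders.

```python
from typing import Callable

# Hypothetical stand-ins for the real module entry points (assumed names).
def scrape() -> int:
    return 0  # would return the number of leads discovered

def recommend() -> int:
    return 0  # would return the number of leads given recommendations

def score() -> int:
    return 0  # would return the number of leads scored

# Steps run strictly in this order; each one reads what the previous one stored.
STEPS: list[tuple[str, Callable[[], int]]] = [
    ("scrape", scrape),
    ("recommend", recommend),
    ("score", score),
]

def run_pipeline(steps=STEPS) -> dict[str, int]:
    """Run each step in order and stop at the first failure."""
    results: dict[str, int] = {}
    for name, step in steps:
        try:
            results[name] = step()
        except Exception as exc:
            # Later steps depend on earlier ones, so do not continue.
            raise RuntimeError(f"pipeline step '{name}' failed") from exc
    return results

if __name__ == "__main__":
    print(run_pipeline())
```

Keeping the steps in a plain list makes it straightforward to run a subset (for example, only scoring) when a module is imported on its own.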
`scraper/` is the web scraping engine that discovers B2B leads from public sources. It searches DuckDuckGo (text and news), RSS feeds, and the eProcure.gov.in government tender portal for signals related to petroleum products, industrial fuels, solvents, bitumen, marine bunker fuels, and other HPCL product categories. Extracted results are enriched and validated via an LLM (Ollama / Qwen) and stored in MongoDB.
The scraper also handles robots.txt compliance, domain blocklisting, URL deduplication, and provenance logging.
Stack: Python, BeautifulSoup4, feedparser, pymongo, Ollama.
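The robots.txt, blocklist, and deduplication checks mentioned above can be sketched with the standard library alone. This is an illustrative sketch, not the scraper's actual code: the tracking-parameter list, the blocklist entry, and all function names are assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Assumed values for illustration only.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}
BLOCKED_DOMAINS = {"example-blocked.com"}

def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different forms deduplicate to one key."""
    parts = urlsplit(url.strip())
    query = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS)
    path = parts.path.rstrip("/") or "/"
    # Drop the fragment; lowercase the scheme and host.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, urlencode(query), ""))

def is_blocked(url: str) -> bool:
    """True if the host is a blocklisted domain or one of its subdomains."""
    host = urlsplit(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

def allowed_by_robots(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check a URL against already-downloaded robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

seen: set[str] = set()

def should_fetch(url: str, robots_txt: str) -> bool:
    """Combine the three checks and record the URL as seen if it passes."""
    key = normalize_url(url)
    if key in seen or is_blocked(url) or not allowed_by_robots(robots_txt, url):
        return False
    seen.add(key)
    return True
```

Passing the robots.txt body in as a string keeps the check testable offline; a real scraper would fetch and cache it per domain.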
`llm/` is the LLM-powered product recommendation engine. It analyzes each scraped lead and recommends the three most relevant HPCL Direct Sales products, each with a confidence score, reason codes, and uncertainty flags. It falls back to heuristic keyword matching when the LLM is unavailable, and supports both CLI usage (JSON file or stdin) and batch processing of MongoDB documents that do not yet have recommendations.
The module also contains the full HPCL product catalog covering fuels, lubricants, solvents, bitumen, gases, and specialty chemicals.
Stack: Python, pymongo, Ollama.
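To make the keyword fallback concrete, here is a minimal sketch of how a heuristic recommender might rank products when no LLM is reachable. The keyword map, the confidence formula, and the output field names are all assumptions for illustration; the real catalog and scoring logic live in the module itself.

```python
import re
from collections import Counter

# Illustrative keyword map; not the actual HPCL catalog.
KEYWORDS = {
    "Bitumen": ["road", "highway", "asphalt", "paving"],
    "Marine Bunker Fuel": ["vessel", "shipping", "port", "bunker"],
    "Furnace Oil": ["boiler", "furnace", "kiln"],
    "Industrial Solvents": ["paint", "solvent", "coating"],
}

def recommend_by_keywords(text: str, top_n: int = 3) -> list[dict]:
    """Rank products by the fraction of their keywords found in the lead text."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    scored = []
    for product, words in KEYWORDS.items():
        hits = [w for w in words if tokens[w] > 0]
        if not hits:
            continue
        scored.append({
            "product": product,
            "confidence": round(len(hits) / len(words), 2),
            "reason_codes": hits,          # which keywords matched
            "uncertain": len(hits) < 2,    # a single match is weak evidence
        })
    scored.sort(key=lambda r: r["confidence"], reverse=True)
    return scored[:top_n]

if __name__ == "__main__":
    lead = "Tender for highway road paving and asphalt supply near the port"
    for rec in recommend_by_keywords(lead):
        print(rec)
```

Returning the matched keywords as reason codes keeps the fallback explainable in the same shape as an LLM-produced recommendation.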
`deal_scoring_model/` is the lead scoring model. It computes a composite deal score (0-100) for each lead using a scikit-learn RandomForestRegressor. Feature engineering covers intent strength, signal freshness decay, a company size proxy, geographic proximity to HPCL depots/DSROs, and a source trust score. The model integrates product recommendations from the `llm/` module to generate full lead dossiers, and supports online learning from sales officer feedback (accepted / rejected / converted).
It also runs batch precomputation of partial scores and triggers WhatsApp alerts for high-scoring leads.
Stack: Python, scikit-learn, numpy, pandas, pymongo.
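Two of the features listed above, freshness decay and geographic proximity, lend themselves to a short worked example. The sketch below is an assumption about how such features could be computed; the README does not state the actual formulas, and the 14-day half-life and 200 km scale are made-up defaults.

```python
import math
from datetime import datetime, timezone

def freshness(signal_time: datetime, now: datetime, half_life_days: float = 14.0) -> float:
    """Exponential decay: 1.0 for a brand-new signal, 0.5 after one half-life."""
    age_days = max((now - signal_time).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def proximity(distance_km: float, scale_km: float = 200.0) -> float:
    """Map distance to (0, 1]: 1.0 at the depot, 0.5 at scale_km away."""
    return 1.0 / (1.0 + distance_km / scale_km)

def feature_vector(lead: dict, depot: tuple[float, float], now: datetime) -> list[float]:
    """Assemble features in a fixed order (field names are hypothetical)."""
    dist = haversine_km(lead["lat"], lead["lon"], depot[0], depot[1])
    return [
        lead["intent_strength"],
        freshness(lead["signal_time"], now),
        proximity(dist),
        lead["source_trust"],
    ]

if __name__ == "__main__":
    now = datetime(2026, 1, 15, tzinfo=timezone.utc)
    lead = {"lat": 19.0760, "lon": 72.8777, "intent_strength": 0.8,
            "signal_time": datetime(2026, 1, 1, tzinfo=timezone.utc), "source_trust": 0.9}
    print(feature_vector(lead, depot=(19.0760, 72.8777), now=now))
```

A vector like this is the kind of input a regressor such as RandomForestRegressor would be trained on, with the 0-100 deal score as the target.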
`fastapi_backend/` is the REST API that connects the ML pipeline to the frontend applications. It serves endpoints for fetching top-scored leads by Indian state, heatmap data, lead dossiers, feedback submission, and WhatsApp notification triggering. It reads from the same MongoDB database populated by the scraper and scoring model.
Key routes: `/api/computeTopTen`, `/api/heatmap`, `/api/dossier`, `/api/feedback`, `/api/notify`.
Stack: Python, FastAPI, Uvicorn, pymongo, pydantic, geopandas.
`monorepo/` is the Next.js progressive web app that serves as the HPCL Lead Intelligence dashboard. It provides sign-up/sign-in authentication and a protected dashboard with tabbed navigation for viewing and acting on B2B leads, and it proxies API calls to the FastAPI backend. The UI is styled with Tailwind CSS and ShadCN UI components, and the app is configured as a PWA with offline support.
Stack: Next.js 16, React 19, TypeScript, Tailwind CSS v4, ShadCN/Radix UI, MongoDB, JWT auth, Bun.
`hpclMobile/` is the React Native mobile app, built with Expo, that mirrors the web dashboard on iOS and Android. It provides authentication flows, tabbed navigation, lead detail views, and one-tap actions (call, email, schedule), and it consumes the same API endpoints as the Next.js frontend.
Stack: Expo 54, React Native 0.81, React 19, TypeScript, Expo Router, AsyncStorage.
A company-profiling agent builds and manages normalized company entity profiles in MongoDB. It searches for company details using DuckDuckGo, extracts structured information via an LLM, and inserts deduplicated records conforming to a rich schema (identifiers, industry classification, headquarters, coordinates, officer assignments, depot associations).
Stack: Python, pymongo, Ollama.
`whatsapp/` is the WhatsApp notification module. It sends structured lead alerts to assigned sales officers via the Twilio API, using Twilio Content Templates for compliant WhatsApp Business messaging, and formats lead details (company name, sector, deadline) into approved template variables.
Stack: Python, Twilio SDK, python-dotenv.
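The step of formatting lead details into template variables can be sketched as follows. The placeholder numbering, the lead field names, and the fallback strings are assumptions; the actual approved template defines which variable goes where.

```python
import json

def build_content_variables(lead: dict) -> str:
    """Map lead fields onto numbered template placeholders as a JSON string."""
    variables = {
        "1": lead.get("company_name", "Unknown company"),
        "2": lead.get("sector", "Unspecified sector"),
        "3": lead.get("deadline", "No deadline listed"),
    }
    # Coerce values to strings and collapse newlines and repeated spaces,
    # so each variable is a single clean line of text.
    cleaned = {key: " ".join(str(value).split()) for key, value in variables.items()}
    return json.dumps(cleaned)

if __name__ == "__main__":
    lead = {"company_name": "Acme Roads Ltd", "sector": "Infrastructure", "deadline": "2026-03-31"}
    print(build_content_variables(lead))
```

With the Twilio Python SDK, a JSON string like this is typically passed as `content_variables` together with a `content_sid` when creating a message; the sender and recipient numbers would come from configuration such as `TWILIO_WHATSAPP_FROM`.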
All modules share a single MongoDB database (PRISM). Collections include scraped leads, company profiles, scoring results, feedback records, and model weights.
Run the full end-to-end pipeline (scrape -> recommend -> score):

    python pipeline.py

Individual modules can also be run independently; refer to each folder's own README for details.
Start the FastAPI backend:

    cd fastapi_backend && uvicorn app.main:app --reload

Start the Next.js web app:

    cd monorepo && bun dev

Start the Expo mobile app:

    cd hpclMobile && bun expo start

The system expects the following environment variables (typically via .env files):
| Variable | Used By |
|---|---|
| MONGODB_URI | scraper, llm, deal_scoring_model, fastapi_backend, monorepo |
| TWILIO_ACCOUNT_SID | |
| TWILIO_AUTH_TOKEN | |
| TWILIO_WHATSAPP_FROM | |
| JWT_SECRET | monorepo, fastapi_backend |
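
A small startup check can make missing configuration obvious before any module runs. The sketch below is an assumption about how one might validate the variables listed above; the helper name is hypothetical, and in practice each module only needs its own subset.

```python
import os

REQUIRED_VARS = [
    "MONGODB_URI",
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_WHATSAPP_FROM",
    "JWT_SECRET",
]

def missing_env_vars(environ=os.environ, required=REQUIRED_VARS) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]

if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("Environment looks complete.")
```

If the variables live in a .env file, loading it first (for example with python-dotenv's `load_dotenv()`) before calling the check would be the usual order.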