The primary inspiration for the Smart Screen Explainer came from the desire to create a frictionless, on-demand learning tool. I was particularly inspired by advanced mobile features like "Circle to Search" and the common, frustrating experience of reading technical content.
When analyzing a research paper, a complex diagram, or an unfamiliar piece of code, I constantly found myself stopping to search for context. The goal was to eliminate this friction. Instead of cropping an image, opening a new tab, and typing a query, I wanted a solution where a user could simply take a screenshot and instantly ask the extension to explain it, all within seconds and customized to their learning style.
The Smart Screen Explainer is a Chrome extension that turns any visual content on the user's screen into an immediately understandable explanation.
-
Universal Screenshot Capture: Captures the current visible screen or entire display using the secure
navigator.mediaDevices.getDisplayMediaAPI. -
Multimodal Analysis: Sends the captured image data directly to the Gemini API (gemini-2.5-flash), which is adept at multimodal understanding.
-
Personalized Explanation Tones: The user can select from a curated list of personalities (e.g., "Explain like I'm 5," "Teacher," "Sarcastic," "College Student") to instantly change the tone, complexity, and focus of the AI's explanation.
The application effectively serves as an instant "on-screen tutor" for diagrams, complex tables, interface elements, or any visual information a user needs clarified.
-
The project was built as a Chrome Extension, using a modern web stack tailored for browser environments:
-
Front-end & Logic: The popup interface and local scripting were built using TypeScript (TS), ensuring strong typing and maintainability.
-
Styling: Custom CSS was used for the responsive and polished user interface.
-
Core AI Integration: The entire backend logic runs within the Chrome Service Worker, which communicates with the Gemini API via fetch. The image is converted into a Base64 string and included in the API payload for multimodal analysis. This architecture ensures the API calls don't block the UI thread and that the sensitive API key is managed securely in the extension's local storage.
-
We faced several specific challenges inherent to building a secure and functional browser extension:
-
TypeScript Compilation Pipeline: Setting up the development environment, specifically configuring
tsconfig.jsonand a build tool likeesbuild, to correctly compile the TypeScript files (.ts) into runnable JavaScript (.js) required for the Chrome extension environment was a significant initial hurdle. -
API Key Management: Securely storing the user-provided API key was critical. We had to implement logic to save the key to
chrome.storage.localand retrieve it only in the background service worker, ensuring it was never exposed or hardcoded. -
Chrome Extension Architecture: Understanding the separation of concerns between the Popup Script (UI interaction) and the Background Service Worker (network calls and API integration) was crucial. This is vastly different from traditional single-page applications (SPAs) and required careful message passing (
chrome.runtime.sendMessage) to coordinate the screenshot data transfer and the Gemini API response. -
Dependency Handling: Resolving issues around using NPM dependencies versus external CDN dependencies within the Service Worker context, given Chrome Extension security policies, required careful selection of libraries and direct API integration.
We are most proud of the successful end-to-end integration of a complex multimodal workflow. Successfully obtaining the complex, contextualized text output from the Gemini API—an output that correctly interprets a visual element and tailors its explanation based on a user-defined tone—was a major accomplishment. This required seamless choreography between the screen capture API, the Service Worker, the external API call, and the final rendering in the popup.
This project provided invaluable hands-on experience in:
-
Chrome Extension Development: Gaining a deep understanding of the
manifest.jsonfile, its security permissions, and the life cycles of different extension scripts (e.g., popup vs. service worker). We learned how extensions are fundamentally different from standard webpages in terms of security and execution model. -
TypeScript for Reliability: The use of TypeScript proved essential for managing the asynchronous nature of the Chrome APIs and the API communication, making the codebase much more robust.
-
Multimodal Gemini API Implementation: Mastering the correct payload structure for sending image data (
inlineDatawith Base64 encoding) alongside a natural language prompt to achieve tailored and contextual image analysis.
The immediate next steps for the Smart Screen Explainer are focused on increasing utility and user control:
-
Custom Search Points (Bounding Boxes): Implementing a feature to allow the user to draw a bounding box on the captured screenshot, enabling them to specify a precise region of interest for the AI to analyze, rather than the entire screen.
-
Search Grounding Integration: Integrating the Gemini API's Google Search grounding tool to ensure explanations of current events, product names, or recent research are based on the latest available web information.
-
Expanded Features: Adding capabilities like text extraction (OCR) from the image and the ability to ask follow-up questions within the popup interface.
git clone https://github.com/F7Z7/Smart-Screen-Explainer.gitcd Smart-Screen-Explainer
After cloning the repository, install all the required packages using npm:
npm install
npm run watch
-
Open Google Chrome.
-
Go to chrome://extensions/.
-
Enable Developer mode (toggle in the top-right corner).
-
Click "Load unpacked".
-
Select the project folder — the one containing the manifest.json file.
-
The Screen Explainer extension will now appear in your extensions bar.
-
Click on the Screen Explainer icon in the toolbar.
-
Enter your Gemini API key in the input box.
-
Click Save Key — it will be securely stored in Chrome’s local storage.
- If you don’t have one, create a free Gemini API key from Google AI Studio
-
Open any webpage.
-
Click the Screen Explainer icon.
-
The extension will capture the visible area and select a personality for explanation and wait for some time.
- The personalities include
- Explain like I'm 5
- Teacher
- Sarcastic
- Ai Friend
- College Student
- The personalities include
-
Gemini will return the explanation with the personality choosen by user
-
You’ll see explanations or summaries appear inside the popup interface.
-
All source files are inside the src/ folder.
-
Built JS files are generated in dist .
-
manifest.json handles permissions and entry points.
-
The extension uses:
-
TypeScript for source code
-
esbuild for bundling
-
Gemini API for AI-powered analysis
-
| Interface | Explanation |
|---|---|
![]() |
![]() |

