GitHub - F7Z7/Smart-Screen-Explainer: AI Study Buddy: Instantly explains math and diagrams on your screen in simple terms

Inspiration

The primary inspiration for the Smart Screen Explainer came from the desire to create a frictionless, on-demand learning tool. I was particularly inspired by advanced mobile features like "Circle to Search" and the common, frustrating experience of reading technical content.

When analyzing a research paper, a complex diagram, or an unfamiliar piece of code, I constantly found myself stopping to search for context. The goal was to eliminate this friction. Instead of cropping an image, opening a new tab, and typing a query, I wanted a solution where a user could simply take a screenshot and instantly ask the extension to explain it, all within seconds and customized to their learning style.

What it does

The Smart Screen Explainer is a Chrome extension that turns any visual content on the user's screen into an immediately understandable explanation.

Key Functionality:

Universal Screenshot Capture: Captures the current visible screen or entire display using the secure navigator.mediaDevices.getDisplayMedia API.
Multimodal Analysis: Sends the captured image data directly to the Gemini API (gemini-2.5-flash), which is adept at multimodal understanding.
Personalized Explanation Tones: The user can select from a curated list of personalities (e.g., "Explain like I'm 5," "Teacher," "Sarcastic," "College Student") to instantly change the tone, complexity, and focus of the AI's explanation.

The application effectively serves as an instant "on-screen tutor" for diagrams, complex tables, interface elements, or any visual information a user needs clarified.

How I built it

The project was built as a Chrome Extension, using a modern web stack tailored for browser environments:
- Front-end & Logic: The popup interface and local scripting were built using TypeScript (TS), ensuring strong typing and maintainability.
- Styling: Custom CSS was used for the responsive and polished user interface.
- Core AI Integration: The entire backend logic runs within the Chrome Service Worker, which communicates with the Gemini API via fetch. The image is converted into a Base64 string and included in the API payload for multimodal analysis. This architecture ensures the API calls don't block the UI thread and that the sensitive API key is managed securely in the extension's local storage.

Challenges I ran into

We faced several specific challenges inherent to building a secure and functional browser extension:

TypeScript Compilation Pipeline: Setting up the development environment, specifically configuring tsconfig.json and a build tool like esbuild, to correctly compile the TypeScript files (.ts) into runnable JavaScript (.js) required for the Chrome extension environment was a significant initial hurdle.
API Key Management: Securely storing the user-provided API key was critical. We had to implement logic to save the key to chrome.storage.local and retrieve it only in the background service worker, ensuring it was never exposed or hardcoded.
Chrome Extension Architecture: Understanding the separation of concerns between the Popup Script (UI interaction) and the Background Service Worker (network calls and API integration) was crucial. This is vastly different from traditional single-page applications (SPAs) and required careful message passing (chrome.runtime.sendMessage) to coordinate the screenshot data transfer and the Gemini API response.
Dependency Handling: Resolving issues around using NPM dependencies versus external CDN dependencies within the Service Worker context, given Chrome Extension security policies, required careful selection of libraries and direct API integration.

Accomplishments that I'm proud of

We are most proud of the successful end-to-end integration of a complex multimodal workflow. Successfully obtaining the complex, contextualized text output from the Gemini API—an output that correctly interprets a visual element and tailors its explanation based on a user-defined tone—was a major accomplishment. This required seamless choreography between the screen capture API, the Service Worker, the external API call, and the final rendering in the popup.

What I learned

This project provided invaluable hands-on experience in:

Chrome Extension Development: Gaining a deep understanding of the manifest.json file, its security permissions, and the life cycles of different extension scripts (e.g., popup vs. service worker). We learned how extensions are fundamentally different from standard webpages in terms of security and execution model.
TypeScript for Reliability: The use of TypeScript proved essential for managing the asynchronous nature of the Chrome APIs and the API communication, making the codebase much more robust.
Multimodal Gemini API Implementation: Mastering the correct payload structure for sending image data (inlineData with Base64 encoding) alongside a natural language prompt to achieve tailored and contextual image analysis.

What's next for Smart Screen

The immediate next steps for the Smart Screen Explainer are focused on increasing utility and user control:

Custom Search Points (Bounding Boxes): Implementing a feature to allow the user to draw a bounding box on the captured screenshot, enabling them to specify a precise region of interest for the AI to analyze, rather than the entire screen.
Search Grounding Integration: Integrating the Gemini API's Google Search grounding tool to ensure explanations of current events, product names, or recent research are based on the latest available web information.
Expanded Features: Adding capabilities like text extraction (OCR) from the image and the ability to ask follow-up questions within the popup interface.

How to use

Clone the repo

git clone https://github.com/F7Z7/Smart-Screen-Explainer.git

Go to repo

cd Smart-Screen-Explainer

Install Dependencies

After cloning the repository, install all the required packages using npm:

npm install

Build the Extension

npm run watch

Load the Extension in Chrome

Open Google Chrome.
Go to chrome://extensions/.
Enable Developer mode (toggle in the top-right corner).
Click "Load unpacked".
Select the project folder — the one containing the manifest.json file.
The Screen Explainer extension will now appear in your extensions bar.

Add Your Gemini API Key

Click on the Screen Explainer icon in the toolbar.
Enter your Gemini API key in the input box.
Click Save Key — it will be securely stored in Chrome’s local storage.
- If you don’t have one, create a free Gemini API key from Google AI Studio

Use the Extension

Open any webpage.
Click the Screen Explainer icon.
The extension will capture the visible area and select a personality for explanation and wait for some time.
- The personalities include
  - Explain like I'm 5
  - Teacher
  - Sarcastic
  - Ai Friend
  - College Student
Gemini will return the explanation with the personality choosen by user
You’ll see explanations or summaries appear inside the popup interface.

Development Notes

All source files are inside the src/ folder.
Built JS files are generated in dist .
manifest.json handles permissions and entry points.
The extension uses:
- TypeScript for source code
- esbuild for bundling
- Gemini API for AI-powered analysis

Ouput

Sample Output Video

Sample Screenshots

Interface	Explanation

Name		Name	Last commit message	Last commit date
Latest commit History 60 Commits
.idea		.idea
assets		assets
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
manifest.json		manifest.json
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Inspiration

What it does

Key Functionality:

How I built it

Challenges I ran into

Accomplishments that I'm proud of

What I learned

What's next for Smart Screen

How to use

Clone the repo

Go to repo

Install Dependencies

Build the Extension

Load the Extension in Chrome

Add Your Gemini API Key

Use the Extension

Development Notes

Ouput

Sample Screenshots

About

Uh oh!

Releases

Packages

Languages

License

F7Z7/Smart-Screen-Explainer

Folders and files

Latest commit

History

Repository files navigation

Inspiration

What it does

Key Functionality:

How I built it

Challenges I ran into

Accomplishments that I'm proud of

What I learned

What's next for Smart Screen

How to use

Clone the repo

Go to repo

Install Dependencies

Build the Extension

Load the Extension in Chrome

Add Your Gemini API Key

Use the Extension

Development Notes

Ouput

Sample Screenshots

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages