as567-code/Video-Understanding-Chat-System-MVP-: Multimodal video analysis with Qwen2-VL + BLIP frame captioning and Flan-T5 conversational Q&A over timestamped captions. Streamlit UI.

Video Understanding & Chat System (MVP)

A simple Streamlit app that:

Accepts an uploaded video
Extracts frames every N seconds (default 2s)
Sends each sampled frame to a FREE vision model (BLIP-2) to generate a caption
Stores the captions with their timestamps
Lets users ask questions in a chat, answered by a small FREE LLM that uses the collected captions as context
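The frame-extraction step could look roughly like the sketch below. It is not the code in video_processor.py: the names `frame_indices` and `extract_frames` are hypothetical, and the sketch assumes OpenCV (`opencv-python`) reports a usable FPS for the uploaded file.

```python
# Hypothetical sketch of interval-based frame sampling, not the repo's video_processor.py
from typing import Any, Iterator, List, Tuple


def frame_indices(fps: float, frame_count: int, interval_s: float = 2.0) -> List[int]:
    """Return the indices of the frames to keep, one every `interval_s` seconds."""
    if fps <= 0 or interval_s <= 0:
        raise ValueError("fps and interval_s must be positive")
    step = max(1, round(fps * interval_s))  # number of frames between samples
    return list(range(0, frame_count, step))


def extract_frames(path: str, interval_s: float = 2.0) -> Iterator[Tuple[float, Any]]:
    """Yield (timestamp_seconds, RGB frame as a NumPy array) pairs."""
    import cv2  # lazy import: frame_indices above does not need OpenCV

    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"Cannot open video: {path}")
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        for idx in frame_indices(fps, frame_count, interval_s):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the sampled frame
            ok, frame_bgr = cap.read()
            if not ok:  # the reported frame count can overshoot the real one
                break
            # OpenCV decodes to BGR; captioning models typically expect RGB
            yield idx / fps, cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    finally:
        cap.release()
```

Seeking with `CAP_PROP_POS_FRAMES` keeps the loop short, but it can be slow or imprecise for some codecs; reading every frame sequentially and keeping every `step`-th one is the more robust alternative.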

Tech Stack

Python 3.10+
Streamlit (web UI)
OpenCV (video processing)
Hugging Face Transformers (BLIP-2 + small LLM)
PyTorch (CPU by default)

Project Structure

video-chat-system/
├── app.py                # Streamlit UI
├── video_processor.py    # Frame extraction (OpenCV)
├── vision_analyzer.py    # BLIP-2 image captioning
├── chat_handler.py       # Q&A over frame descriptions
├── requirements.txt
├── .env                  # Optional environment variables
└── README.md
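After frames are sampled, vision_analyzer.py has to caption each one and keep the caption together with its timestamp. A minimal sketch of that hand-off, assuming a `FrameCaption` record and an injected captioner callable (both hypothetical names). `make_blip2_captioner` assumes the `Salesforce/blip2-opt-2.7b` checkpoint, which is a multi-gigabyte download and slow on CPU; the repository may use a different model.

```python
# Hypothetical sketch: caption sampled frames and keep them with their timestamps
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class FrameCaption:
    timestamp_s: float  # position of the frame in the video, in seconds
    text: str           # caption produced by the vision model


def caption_frames(
    frames: Iterable[Tuple[float, Any]],
    captioner: Callable[[Any], str],
) -> List[FrameCaption]:
    """Run `captioner` on each (timestamp, image) pair and return captions in time order."""
    captions = [FrameCaption(ts, captioner(img).strip()) for ts, img in frames]
    return sorted(captions, key=lambda c: c.timestamp_s)


def make_blip2_captioner(model_id: str = "Salesforce/blip2-opt-2.7b") -> Callable[[Any], str]:
    """Build a BLIP-2 captioner with Hugging Face Transformers (assumed checkpoint)."""
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    processor = Blip2Processor.from_pretrained(model_id)
    model = Blip2ForConditionalGeneration.from_pretrained(model_id)  # float32 on CPU

    def caption(frame_rgb: Any) -> str:
        image = Image.fromarray(frame_rgb)  # NumPy RGB array -> PIL image
        inputs = processor(images=image, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=30)
        return processor.batch_decode(output_ids, skip_special_tokens=True)[0]

    return caption
```

Passing the captioner in as a function keeps the storage logic testable with a stub and makes it easy to swap in a lighter captioning model.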

Setup

Create and activate a virtual environment

python3 -m venv .venv
source .venv/bin/activate  # macOS/Linux
# or on Windows: .venv\Scripts\activate

Install dependencies (CPU by default)

pip install --upgrade pip
pip install -r requirements.txt

(Optional) Configure the environment: create a .env file if needed. Example options:

# Cache directories (optional)
HF_HOME=.cache/huggingface
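# TRANSFORMERS_CACHE is deprecated in recent transformers releases; HF_HOME alone is usually enough
TRANSFORMERS_CACHE=.cache/transformers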

# Use an alternative HF endpoint if needed
# HF_ENDPOINT=https://huggingface.co
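The README does not say how .env is read. A common choice is python-dotenv's `load_dotenv()`; the stdlib-only sketch below (a hypothetical `load_env_file` helper) shows the same idea. Ordering matters: Hugging Face libraries typically read `HF_HOME` and related cache variables when they are imported, so the file should be loaded before `transformers` is imported.

```python
# Hypothetical minimal .env loader (python-dotenv is the usual library for this)
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Copy KEY=VALUE lines into os.environ without overriding existing variables.

    Blank lines and lines starting with '#' are skipped; surrounding quotes are
    stripped. Returns the key/value pairs found in the file.
    """
    found = {}
    env_path = Path(path)
    if not env_path.is_file():
        return found  # the .env file is optional
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # values may themselves contain '='
        key, value = key.strip(), value.strip().strip('"').strip("'")
        found[key] = value
        os.environ.setdefault(key, value)  # real environment variables win
    return found


if __name__ == "__main__":
    load_env_file()
    # Import Hugging Face libraries only after this point so HF_HOME takes effect.
```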

Running the App

streamlit run app.py

Then open the local URL that Streamlit prints (by default http://localhost:8501) in your browser.

Notes on Models (Free/Open-Source)

Vision: BLIP-2 (Salesforce) via transformers pipeline or model classes
LLM for chat: start with google/flan-t5-base (free, CPU-friendly). You can swap to other small open-source models later.
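Question answering over the stored captions could be sketched as follows with `google/flan-t5-base`. The prompt wording, the `max_captions` cap, the 512-token truncation limit, and the helper names are assumptions, not the repository's chat_handler.py. The question is placed first so that, if truncation kicks in, it drops the latest captions rather than the question itself.

```python
# Hypothetical sketch of Q&A over timestamped captions with Flan-T5
from functools import lru_cache
from typing import List, Tuple


def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS, e.g. 74.0 -> '01:14'."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def build_prompt(question: str, captions: List[Tuple[float, str]], max_captions: int = 40) -> str:
    """Combine the question and up to `max_captions` timestamped captions into one prompt."""
    lines = [f"[{format_timestamp(ts)}] {text}" for ts, text in captions[:max_captions]]
    return (
        f"Question: {question}\n\n"
        "Answer using these timestamped video frame descriptions:\n" + "\n".join(lines)
    )


@lru_cache(maxsize=1)
def load_model(model_id: str = "google/flan-t5-base"):
    """Load tokenizer and model once; later calls reuse the cached pair."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    return AutoTokenizer.from_pretrained(model_id), AutoModelForSeq2SeqLM.from_pretrained(model_id)


def answer_question(question: str, captions: List[Tuple[float, str]]) -> str:
    """Generate an answer on CPU from the caption context."""
    tokenizer, model = load_model()
    prompt = build_prompt(question, captions)
    # Truncation cuts from the end, i.e. the latest captions, not the question
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

`lru_cache` keeps the model in memory after the first call; inside the Streamlit app, `st.cache_resource` serves the same purpose across reruns.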

Roadmap

Project scaffold
Requirements
Implement frame extraction in video_processor.py
Add BLIP-2 captioning in vision_analyzer.py
Implement chat handler in chat_handler.py
Build Streamlit UI in app.py

License

This project uses only FREE and open-source components. Verify model licenses before distribution.

