This is the official repository for Audio-Visual Contrastive Decoding (AVCD), a simple, training-free method that mitigates hallucinations in audio-visual large language models (AV-LLMs) at decoding time, without relying on external tools.
- ✅ AVCD code released!
- ✅ Accepted at NeurIPS 2025
- Reformulates conventional contrastive decoding (CD) from a single pairwise interaction (e.g., video–text) to trimodal audio–video–text interactions
- Dynamically detects the dominant modality and masks the less dominant modalities before applying CD
- Introduces entropy-guided adaptive gating that skips unnecessary forward passes, improving inference speed (see the sketch after this list)
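For intuition, below is a minimal PyTorch sketch of a decoding step in this style. It is a simplification under stated assumptions: the names (`detect_dominant_modality`, `avcd_step`, `masked_inputs`), the attention pooling, and the default `alpha`/`entropy_threshold` values are illustrative, not the repository's actual API.

```python
import torch.nn.functional as F

def detect_dominant_modality(attn_by_modality):
    # Pick the modality whose tokens receive the largest aggregate attention
    # mass; how that mass is pooled from the attention maps is an assumption.
    return max(attn_by_modality, key=attn_by_modality.get)

def avcd_step(model, inputs, masked_inputs, alpha=1.0, entropy_threshold=1.0):
    # Full trimodal forward pass (audio + video + text).
    logits = model(**inputs).logits[:, -1, :]
    probs = F.softmax(logits, dim=-1)

    # Entropy-guided adaptive gating: when the next-token distribution is
    # already low-entropy (i.e., the model is confident), skip the second
    # forward pass entirely.
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)
    if entropy.max().item() < entropy_threshold:
        return logits

    # "Amateur" pass: the same model, with the less dominant modalities
    # masked out (masked_inputs is assumed to be built from the dominance
    # detection above).
    masked_logits = model(**masked_inputs).logits[:, -1, :]

    # Contrastive decoding: amplify what the full input supports relative
    # to the modality-masked input.
    return (1 + alpha) * logits - alpha * masked_logits
```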
Follow the setup guide in the [VideoLLaMA2](https://github.com/DAMO-NLP-SG/VideoLLaMA2) repository (audio-visual branch).
We evaluate AVCD on the AVHBench and MUSIC-AVQA datasets. The corresponding repositories are listed below:
| Dataset | Link |
|---|---|
| AVHBench | GitHub |
| MUSIC-AVQA | GitHub |
The data and scripts for running inference and evaluation are organized as follows:
| Purpose | Path |
|---|---|
| Data QA files | json/ |
| Inference scripts | videollama2/inference/ |
| Evaluation scripts | videollama2/eval/ |
```bash
git clone https://github.com/kaistmm/AVCD.git
cd AVCD
```

This stage saves the generated answers from inference. An example command for running inference with the original model is shown below:
```bash
python videollama2/inference/inference_AVH_val.py
```

To enable AVCD, add the `--use-AVCD` argument:
```bash
python videollama2/inference/inference_AVH_val.py --use-AVCD True
```

The inference step generates a JSON file that includes the question, the answer, and the prediction.
During evaluation, these JSON files can be used to directly measure accuracy or compute scores using GPT-based evaluation.
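To illustrate direct accuracy measurement, here is a minimal sketch that consumes such a prediction file. The file name, field names (`question`, `answer`, `prediction`), and the exact-match rule are assumptions based on the description above; the `eval_acc.py` script below is the authoritative implementation.

```python
import json

# Assumed schema: [{"question": ..., "answer": ..., "prediction": ...}, ...]
with open("preds.json") as f:
    records = json.load(f)

# Direct accuracy via exact match after light normalization (an assumption;
# the released eval script may use a different matching rule).
correct = sum(
    r["answer"].strip().lower() == r["prediction"].strip().lower()
    for r in records
)
print(f"Accuracy: {correct / len(records):.4f}")
```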
Accuracy (AVH)
- AVH: Audio-driven Video Hallucination, Video-driven Audio Hallucination, AV Matching
```bash
python videollama2/eval/eval_acc.py --pred-path <path_to_preds>.json
```

Captioning Score (AVH_cap)
- AVH_cap: AV Captioning
```bash
python videollama2/eval/eval_caption.py --pred-path <path_to_preds>.json --output-dir <dir>
```

Open-ended QA (MUSIC-AVQA)
```bash
python videollama2/eval/eval_gpt.py --pred-path <path_to_preds>.json --output-dir <dir>
```

```bibtex
@inproceedings{jung2025avcd,
  author    = {Jung, Chaeyoung and Jang, Youngjoon and Chung, Joon Son},
  title     = {AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding},
  booktitle = {NeurIPS},
  year      = {2025}
}
```
