AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding

This is the official repository for Audio-Visual Contrastive Decoding (AVCD), a simple, training-free method that mitigates hallucinations in audio-visual large language models (AV-LLMs) at decoding time, without relying on external tools.


🚀 Updates

  • ✅ AVCD code released!
  • ✅ Accepted at NeurIPS 2025

[Figure: Overview of AVCD]


📖 Overview

  • Reformulates conventional contrastive decoding (CD) from single-pair interactions (e.g., video–text) to three-modality (audio–video–text) interactions
  • Dynamically detects the dominant modality and masks the less dominant modalities before applying CD
  • Introduces entropy-guided adaptive gating to skip unnecessary forward passes and speed up inference (see the sketch below)
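
As a rough illustration of the decoding rule, the sketch below contrasts the full audio-visual logits against the logits from a masked forward pass, and uses next-token entropy to decide whether the extra pass is needed at all. This is a minimal sketch, not the released implementation: the contrast weight alpha, the entropy threshold tau, and the use of raw logits are assumptions for exposition, and dominant-modality detection and the actual masking are omitted.

import torch
import torch.nn.functional as F

def avcd_combine(full_logits, masked_logits, alpha=1.0):
    # Contrastive-decoding combination: amplify tokens that the full
    # audio-visual input supports beyond what the masked (degraded)
    # input already predicts. alpha is an assumed hyperparameter.
    return (1 + alpha) * full_logits - alpha * masked_logits

def needs_contrast(full_logits, tau=2.0):
    # Entropy-guided adaptive gating: if next-token entropy is low,
    # the model is confident and the masked forward pass is skipped.
    # tau is an assumed threshold, not a value from the paper.
    probs = F.softmax(full_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1)
    return entropy.item() >= tau

# Toy usage: random logits stand in for the two forward passes.
vocab_size = 32000
full = torch.randn(vocab_size)
masked = torch.randn(vocab_size)  # dominant modality kept, others masked
logits = avcd_combine(full, masked) if needs_contrast(full) else full
next_token = logits.argmax(-1)

In the actual method, the masked pass follows the dominant-modality detection described above: the less dominant modalities are masked, so the subtracted distribution reflects what the model would predict without them.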

⚙️ Setup

1. Environment

Follow the setup guide in the VideoLLaMA2 repository (https://github.com/DAMO-NLP-SG/VideoLLaMA2, audio-visual branch).

2. Datasets

We use the AVHBench and MUSIC-AVQA datasets for AVCD; their repositories are linked below:

Dataset      Link
AVHBench     GitHub
MUSIC-AVQA   GitHub

3. Repository layout

The data and scripts for running inference and evaluation are organized as follows:

Purpose              Path
QA data files        json/
Inference scripts    videollama2/inference/
Evaluation scripts   videollama2/eval/

4. Usage

git clone https://github.com/kaistmm/AVCD.git
cd AVCD

5. Inference

This stage runs inference and saves the generated answers. An example command for running inference with the original model is shown below:

python videollama2/inference/inference_AVH_val.py

To enable AVCD, add the --use-AVCD argument:

python videollama2/inference/inference_AVH_val.py --use-AVCD True

6. Evaluation

The inference step produces a JSON file containing, for each sample, the question, the reference answer, and the model's prediction.
During evaluation, these JSON files are used either to measure accuracy directly or to compute scores with GPT-based evaluation.
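
For reference, a minimal exact-match accuracy computation over such a file could look like the sketch below. The field names ("answer", "pred") and the top-level list layout are assumptions about the schema; check the generated JSON or eval_acc.py for the actual keys.

import json

# Minimal sketch of exact-match accuracy over the predictions file.
# The schema (a list of records with "answer" and "pred" keys) is an
# assumption; consult videollama2/eval/eval_acc.py for the real one.
with open("preds.json") as f:
    records = json.load(f)

correct = sum(r["answer"].strip().lower() == r["pred"].strip().lower()
              for r in records)
print(f"accuracy: {correct / len(records):.4f}")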

Accuracy (AVH)

  • AVH covers three subtasks: Audio-driven Video Hallucination, Video-driven Audio Hallucination, and AV Matching
python videollama2/eval/eval_acc.py --pred-path <path_to_preds>.json

Captioning Score (AVH_cap)

  • AVH_cap: AV Captioning
python videollama2/eval/eval_caption.py --pred-path <path_to_preds>.json --output-dir <dir>

Open-ended QA (MUSIC-AVQA)

python videollama2/eval/eval_gpt.py --pred-path <path_to_preds>.json --output-dir <dir>
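
GPT-based evaluation generally asks a judge model to score each prediction against the reference answer. The sketch below shows that pattern with the official openai Python client; it is not eval_gpt.py, and the judge model, prompt, and 0-5 scoring scale are placeholders.

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(question, answer, pred, model="gpt-4o-mini"):
    # Placeholder judge prompt and scale; the actual prompt and model
    # used by videollama2/eval/eval_gpt.py may differ.
    prompt = (
        f"Question: {question}\n"
        f"Reference answer: {answer}\n"
        f"Predicted answer: {pred}\n"
        "Rate the prediction's correctness from 0 to 5. "
        "Reply with the number only."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return int(resp.choices[0].message.content.strip())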

📝 Citation

@inproceedings{jung2025avcd,
  author    = {Jung, Chaeyoung and Jang, Youngjoon and Chung, Joon Son},
  title     = {AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding},
  booktitle = {NeurIPS},
  year      = {2025}
}
