A quick start guide and additional materials are available on our project page. To learn more, refer to our arXiv preprint.
To set up the environment, navigate to the root directory containing `environment.yml` and run:
```bash
conda env create --name interpretation_env --file environment.yml
conda activate interpretation_env
```

Given a feature extractor E and an image i, we can obtain its features as f = E(i). The reconstruction model is trained on pairs (i, f). To generate such dataset pairs, follow the steps below (a minimal per-image sketch is shown first).
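As an illustration only, here is what a single (i, f) pair could look like when features are taken from a SigLIP checkpoint through Hugging Face `transformers`. The model/processor classes, the use of `last_hidden_state`, and the example file name are assumptions; `dataset_generation/generation.py` is the authoritative pipeline and may extract and store features differently.

```python
# Hedged sketch of one (i, f) pair; dataset_generation/generation.py is the
# reference implementation and may differ in model class, pooling, and storage.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

MODEL_NAME = "google/siglip2-base-patch16-512"  # needs a recent transformers release

processor = AutoImageProcessor.from_pretrained(MODEL_NAME)
extractor = AutoModel.from_pretrained(MODEL_NAME).eval()

image = Image.open("coco_subsets/val2017/000000000139.jpg").convert("RGB")  # example image
pixel_values = processor(images=image, return_tensors="pt")["pixel_values"]

with torch.no_grad():
    # f = E(i): patch-level features from the vision tower
    features = extractor.vision_model(pixel_values=pixel_values).last_hidden_state

pair = (pixel_values, features)  # one (i, f) training pair for the reconstructor
```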
### Generate Validation Split

```bash
# Prepare Data
mkdir coco_subsets
cd coco_subsets
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/zips/train2017.zip
unzip train2017.zip # Extract archive
unzip val2017.zip   # Extract archive
cd - # Get back to the project root
```

### Generate Validation Split Features
Generated dataset size: ~300 MB.

```bash
# Run Validation Dataset Generation
VISION_MODEL="google/siglip2-base-patch16-512"
python dataset_generation/generation.py \
    --vision_model_name "$VISION_MODEL" \
    --coco_images_path "./coco_subsets/val2017" \
    --split val \
    --max_count 1000
```

### Generate Train Split Features
Generated dataset size: ~30 GB.

You can limit the number of processed images with the `--max_count` parameter (e.g. `--max_count 1000`).

```bash
VISION_MODEL="google/siglip2-base-patch16-512"
python dataset_generation/generation.py \
    --vision_model_name "$VISION_MODEL" \
    --coco_images_path "./coco_subsets/train2017" \
    --split train
```

This script will:

- Create a `feature_extractor_weights` directory for storing pretrained weights
- Generate datasets in the `generated_datasets` directory
- Use images from `coco_subsets/val2017` by default (configurable via script flags)
### Run reconstructor training

Training can take from 6 to 24 hours, depending on the image resolution supported by the model.

```bash
python training/train.py --vision_model_name $VISION_MODEL
```

This will:

- Train a reconstructor for `google/siglip2-base-patch16-512` by default
- Use the generated dataset from the previous step
- Create `training/samples` for training logs
- Save weights in `training/checkpoint`

Supported feature extractors:

- `google/siglip-base-patch16-{224,256,384,512}`
- `google/siglip2-base-patch16-{224,256,384,512}`

Modify the script arguments to use different extractors; a sketch for looping over all of them follows.
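If you want to cover every supported extractor in one go, the hedged Python sketch below simply loops the documented commands over all eight checkpoints with `subprocess`; it introduces nothing beyond the flags shown above, and paths should be adjusted to your setup.

```python
# Hypothetical convenience loop: generate train-split features and train a
# reconstructor for each supported extractor, using only the documented CLI flags.
import subprocess

MODELS = [
    f"google/{family}-base-patch16-{res}"
    for family in ("siglip", "siglip2")
    for res in (224, 256, 384, 512)
]

for model in MODELS:
    subprocess.run(
        ["python", "dataset_generation/generation.py",
         "--vision_model_name", model,
         "--coco_images_path", "./coco_subsets/train2017",
         "--split", "train"],
        check=True,
    )
    subprocess.run(
        ["python", "training/train.py", "--vision_model_name", model],
        check=True,
    )
```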
To compute CLIP similarity metrics:

- Generate dataset for your target feature extractor
- Train reconstructor or use precomputed weights
- Place weights in `metrics_calculation/precalculated_weights/` following the pattern:
  - `models--google--siglip-base-patch16-512.pt`
  - `models--google--siglip2-base-patch16-512.pt`
- Run:

```bash
bash metrics_calculation/siglip_vs_siglip2/calculate_similarity.sh
```
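To inspect the metric outside the provided script, the snippet below computes a CLIP image-image cosine similarity between an original and a reconstructed image with Hugging Face `transformers`. The CLIP checkpoint, file names, and exact protocol are assumptions; `calculate_similarity.sh` remains the reference implementation.

```python
# Hedged sketch of CLIP similarity between an original and a reconstructed image.
# Checkpoint and file paths are placeholders; the repo script may use a different setup.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

original = Image.open("original.png").convert("RGB")               # placeholder path
reconstruction = Image.open("reconstruction.png").convert("RGB")   # placeholder path

inputs = processor(images=[original, reconstruction], return_tensors="pt")
with torch.no_grad():
    embeddings = clip.get_image_features(**inputs)

embeddings = torch.nn.functional.normalize(embeddings, dim=-1)
print(f"CLIP similarity: {(embeddings[0] @ embeddings[1]).item():.4f}")
```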
For the SigLIP vs SigLIP2 comparison:

- Compute metrics for all 8 models
- Run the analysis notebook: `metrics_calculation/siglip_vs_siglip2/understanding_graphs_for_article.ipynb`
Example output:
To study orthogonal transformations in feature space:

- Generate dataset for `google/siglip2-base-patch16-512`
- Train reconstructor or use precomputed weights
- Place weights at: `metrics_calculation/precalculated_weights/models--google--siglip2-base-patch16-512.pt`
- Run the analysis notebook: `metrics_calculation/rb_swap/understanding_rgb-to-bgr_rotation.ipynb`
Example output:
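As a rough illustration of what an orthogonal transformation in feature space can mean here, the sketch below fits an orthogonal map between features of RGB images and features of their BGR-swapped counterparts via orthogonal Procrustes (SVD). This is an assumption-laden stand-in for the analysis in the notebook, which may proceed differently; the feature arrays are placeholders.

```python
# Hedged sketch: fit an orthogonal matrix Q so that features of BGR-swapped images
# are approximated by Q applied to features of the original RGB images.
# feats_rgb / feats_bgr are placeholder arrays of shape (num_samples, feature_dim),
# e.g. pooled features extracted as in the dataset-generation sketch above.
import numpy as np

rng = np.random.default_rng(0)
feats_rgb = rng.normal(size=(512, 768))              # placeholder features E(i)
feats_bgr = feats_rgb @ rng.normal(size=(768, 768))  # placeholder features E(bgr(i))

# Orthogonal Procrustes: Q = argmin over orthogonal Q of ||feats_rgb @ Q - feats_bgr||_F
u, _, vt = np.linalg.svd(feats_rgb.T @ feats_bgr)
Q = u @ vt

residual = np.linalg.norm(feats_rgb @ Q - feats_bgr) / np.linalg.norm(feats_bgr)
print(f"relative Procrustes residual: {residual:.3f}")
```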
To study linear transformations in feature space:

- Generate dataset for `google/siglip2-base-patch16-512`
- Train reconstructor or use precomputed weights
- Place weights at: `metrics_calculation/precalculated_weights/models--google--siglip2-base-patch16-512.pt`
- Run the analysis notebook: `metrics_calculation/b_channel_suppression/understanding_b_suppression.ipynb`
Example output:
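Analogously to the orthogonal case, a general linear map in feature space can be fitted by least squares. The sketch below is only illustrative and uses placeholder feature arrays (features of original images vs. images with a suppressed blue channel); the notebook is the reference analysis.

```python
# Hedged sketch: fit an unconstrained linear map W so that features of
# B-channel-suppressed images are approximated by feats_orig @ W.
# Both arrays are placeholders of shape (num_samples, feature_dim).
import numpy as np

rng = np.random.default_rng(0)
feats_orig = rng.normal(size=(512, 768))                       # placeholder E(i)
feats_b_suppressed = feats_orig @ rng.normal(size=(768, 768))  # placeholder E(suppress_b(i))

# Least-squares solution of feats_orig @ W ~= feats_b_suppressed
W, *_ = np.linalg.lstsq(feats_orig, feats_b_suppressed, rcond=None)

residual = np.linalg.norm(feats_orig @ W - feats_b_suppressed) / np.linalg.norm(feats_b_suppressed)
print(f"relative least-squares residual: {residual:.3f}")
```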
To study linear transformations in feature space:

- Generate dataset for `google/siglip2-base-patch16-512`
- Train reconstructor or use precomputed weights
- Place weights at: `metrics_calculation/precalculated_weights/models--google--siglip2-base-patch16-512.pt`
- Run the analysis notebook: `metrics_calculation/colorization/understanding_colorization.ipynb`
Example output:
If you find this work useful, please cite it as follows:
```bibtex
@misc{allakhverdov2025imagereconstructiontoolfeature,
      title={Image Reconstruction as a Tool for Feature Analysis},
      author={Eduard Allakhverdov and Dmitrii Tarasov and Elizaveta Goncharova and Andrey Kuznetsov},
      year={2025},
      eprint={2506.07803},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2506.07803},
}
```


