VALUEFLOW

Toward Pluralistic and Steerable Value-based Alignment in Large Language Models

ICML 2026

VALUEFLOW is a unified framework for value-based alignment that spans extraction, evaluation, and intensity-controlled steering of LLMs. It integrates three components:

HIVES — a hierarchical value embedding space capturing intra- and cross-theory value structure.
VIDB (Value Intensity DataBase) — a large-scale resource of value-labeled texts with intensity estimates derived from ranking-based aggregation.
Anchor-based evaluator — produces calibrated intensity scores by ranking model outputs against VIDB panels.

0) Environment Setup

The conda environment specification will be uploaded later.

# TODO: conda env create -f environment.yml
# conda activate valueflow

1) Embedding Model (HIVES)

HIVES is a hierarchical value embedding model trained on top of Qwen/Qwen3-Embedding-0.6B.

Download

# TODO: model link (Hugging Face)

Usage — encode texts and save per-theory embeddings:

python embedding/save_embedding.py \
  --model_name_or_path <hives-model-path-or-hf-name> \
  --input_csv data/values.csv \
  --output_dir ./embeddings \
  --batch_size 64

The input CSV needs at least text and theory columns; one NPZ of (text, embedding) pairs is written per theory.

2) VIDB Download (Value Intensity DataBase)

VIDB is the reference anchor set used for evaluation and steering. Two versions are released:

Full set — complete value-labeled corpus with intensity estimates.
Filtered set — a curated subset where LLM and human ratings agree.

# TODO: full set link
# TODO: filtered set link

Place the downloaded files under data/ (e.g. data/final_ratings/) before running evaluation.

3) Text Value Evaluation

Score open-ended responses by ranking them against VIDB anchors, producing calibrated per-value intensities in [-10, 10].

First download VIDB (Section 2), then run:

bash scripts/evaluate_responses.sh

Key settings inside the script:

Variable	Meaning
`VALUES`	Target values to evaluate (e.g. benevolence, power, …).
`INTENSITIES`	Steering intensity levels of the responses being scored (`-2, -1, 0, 1, 2`).
`EVAL_LLM`	Judge model used for ranking (default `google/gemma-3-27b-it`).
`THEORY`	Value theory: `pvq` \| `mft` \| `duty` \| `right`.
`K`	Texts per ranking window (1 response + `K-1` sampled VIDB anchors).
`M`	Number of ranking windows (comparisons) per response.
`ITER`	Epochs for the 1-D Plackett–Luce (PL) utility optimization.
`LR`	Learning rate for the PL utility ascent.
`SAMPLING_METHOD`	Anchor sampling strategy (`bucket` \| `random` \| `fixed`).
`PROMPT_FORMAT`	Ranking prompt format (`default` \| `oneshot`).

The script loops over every value × intensity, ranks each response against VIDB panels, runs PL optimization, and writes calibrated intensity scores to the output directory.

4) Profiling Prompts → OpinionQA Generation + Evaluation

Use demographic value profiles to steer generation on OpinionQA and evaluate behavior-prediction accuracy.

bash scripts/run_opinionqa_final.sh {qwen3|phi4|glm4}

Each preset runs profling/generate_response_opinionqa.py with a model-specific configuration:

Preset	Model	Theory	`num_prompts`	`threshold`
`qwen3`	`Qwen/Qwen3-32B`	duty	4	0.65
`phi4`	`microsoft/phi-4`	pvq	6	0.17
`glm4`	`zai-org/GLM-4-32B-0414`	duty	4	0.5

Main arguments:

--profile_dir — directory of per-group profile prompts (profling/prompts/, e.g. POLPARTY_Democrat.json).
--relevance_csv — value/duty relevance scores per question (used with --threshold).
--theory — value theory used to build the profile prompt (pvq | mft | duty | rights).
--num_prompts — number of profile prompts combined per query.
--threshold — relevance cutoff for selecting value dimensions.

Outputs are written under outputs/opinionqa_final/.

Data Creation

Value categorization. Each text is mapped onto a theory hierarchy via human–LLM collaboration. At each level, a panel of seven LLMs votes on the best category; a label is accepted if ≥5 agree or the leader is ahead by ≥2 votes, otherwise a Neutral option is re-prompted. The process descends the chosen child node until a neutral stop or leaf is reached, and the final label is the root-to-leaf path. Per-theory scripts live in categorization/ (categorize_values_pvq.py, categorize_values_mft.py, categorize_duties.py, categorize_rights.py).

Embedding Model Training

HIVES is trained in two stages:

Stage 1 — Intra-theory alignment (embedding/train_stage1.py): a hierarchical contrastive loss pulls together texts that share hierarchy prefix and direction label, preserving the taxonomy structure within each theory.
Stage 2 — Inter-theory & anchor alignment (embedding/train_stage2.py): an InfoNCE objective over cross-theory anchors and user-friendly value instances unifies the heterogeneous theories into a shared semantic space.

python embedding/train_stage1.py   # intra-theory
python embedding/train_stage2.py   # cross-theory

Related Resources

For a curated, continually updated collection of recent work on human values and pluralistic alignment in LLMs, check out Awesome-LLM-Values-and-Pluralistic-Alignment.

License

The VALUEFLOW code is released under the MIT License.

VIDB is built on top of several source corpora, each of which retains its original license. When using VIDB, please comply with the terms of the underlying dataset licenses:

Dataset	License
MFRC	Creative Commons Attribution 4.0 International (CC BY 4.0)
Social Chemistry	Creative Commons Attribution–ShareAlike 4.0 International (CC BY-SA 4.0)
ValueNet	Creative Commons Attribution–NonCommercial–ShareAlike (CC BY-NC-SA)
ValueEval	Creative Commons Attribution 4.0 International (CC BY 4.0)
ValuePrism	AI2 ImpACT License, Medium Risk Artifacts ("MR Agreement")

Note that ValueNet (CC BY-NC-SA) restricts use to non-commercial purposes, and ValuePrism is governed by the AI2 ImpACT Medium Risk Agreement.

Citation

@inproceedings{kim2026valueflow,
  title     = {VALUEFLOW: Toward Pluralistic and Steerable Value-based Alignment in Large Language Models},
  author    = {Kim, Woojin and Hyeon, Sieun and Oh, Jusang and Do, Jaeyoung},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning (ICML)},
  year      = {2026}
}

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
categorization		categorization
data		data
docs		docs
embedding		embedding
evaluation		evaluation
generation		generation
models		models
profling		profling
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VALUEFLOW

0) Environment Setup

1) Embedding Model (HIVES)

2) VIDB Download (Value Intensity DataBase)

3) Text Value Evaluation

4) Profiling Prompts → OpinionQA Generation + Evaluation

Data Creation

Embedding Model Training

Related Resources

License

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

VALUEFLOW

0) Environment Setup

1) Embedding Model (HIVES)

2) VIDB Download (Value Intensity DataBase)

3) Text Value Evaluation

4) Profiling Prompts → OpinionQA Generation + Evaluation

Data Creation

Embedding Model Training

Related Resources

License

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages