A research paper titled "Embeddings for Preferences, Not Semantics" was submitted to arXiv on 8 May 2026, addressing a fundamental challenge in using AI for collective decision-making [1]. Authored by Carter Blair, Ariel D. Procaccia, and Milind Tambe, the 28-page paper argues that standard text embeddings fail to capture participant preferences in deliberation systems [1].

The paper identifies a core problem: modern AI enables collective decision-making in which participants express their views as free-form text rather than voting on a fixed slate of candidates [1]. While embedding these opinions in a vector space appears a natural way to apply facility-location and fair-clustering algorithms, standard text embeddings measure semantic similarity rather than what the authors call preferential similarity [1].

Preferential similarity, according to the paper, requires that a participant's agreement with a piece of text be inversely related to their embedding distance from it [1]. Off-the-shelf embeddings inherit only a coarse preference signal through the correlation between semantic and preferential similarity, and they fail when that correlation breaks [1].
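The gap between the two notions of similarity can be seen in a toy calculation. The sketch below is purely illustrative and not from the paper: the three-dimensional "embeddings" and their coordinate meanings are invented for the example.

```python
# Illustrative sketch (not from the paper): two hypothetical "embeddings"
# that share a topic and a register but take opposite stances.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented 3-d embeddings with dims = (topic, style, stance).
pro_tax  = [0.9, 0.5,  0.3]   # e.g. "We should raise the carbon tax."
anti_tax = [0.9, 0.5, -0.3]   # e.g. "We should scrap the carbon tax."

sim = cosine(pro_tax, anti_tax)
print(f"semantic cosine similarity: {sim:.2f}")  # high, despite opposite stances
```

Because topic and style dominate the vectors, semantic similarity stays high even though a participant who agrees with one text should be far from the other in a preference-faithful geometry.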

The researchers formalise this as an invariance problem [1]. Text embedding models encode both preference-relevant signals, such as stance and values, and semantic nuisance, such as style and wording [1]. Because these dimensions are observationally correlated, a geometry that relies on nuisance can appear preference-correct even when it is not [1].
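The failure mode described above can be made concrete with synthetic data. The following sketch is a hypothetical illustration, not the paper's formalisation: a scorer that judges agreement from a nuisance attribute (style) looks accurate while stance and style are correlated, and fails as soon as that correlation breaks.

```python
# Hypothetical illustration (not the paper's construction): a scorer that
# leans on a nuisance attribute can look preference-correct on correlated
# data and fail once the correlation is broken.

def agree_via_style(a, b):
    # Nuisance-dominated scorer: judges agreement from style alone.
    return a["style"] == b["style"]

def truly_agree(a, b):
    # Ground truth: agreement depends only on stance.
    return a["stance"] == b["stance"]

# Observationally correlated data: formal texts happen to be "pro".
correlated = [
    ({"stance": "pro", "style": "formal"}, {"stance": "pro",  "style": "formal"}),
    ({"stance": "pro", "style": "formal"}, {"stance": "anti", "style": "casual"}),
]
accuracy = sum(agree_via_style(a, b) == truly_agree(a, b) for a, b in correlated)
print(f"accuracy while correlated: {accuracy}/{len(correlated)}")  # 2/2

# Break the correlation: a formal text that is "anti".
a, b = {"stance": "pro", "style": "formal"}, {"stance": "anti", "style": "formal"}
print("scorer:", agree_via_style(a, b), "| truth:", truly_agree(a, b))  # True | False
```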

The paper demonstrates that synthetic training data designed to break this correlation provably shifts the optimal scorer away from nuisance-dominated cosine similarity [1]. The approach significantly improves preference prediction across 11 online deliberation datasets [1].
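One way such decorrelating pairs could be constructed is to vary stance and style independently, so that a trained scorer cannot profit from wording cues. This is a hedged sketch of the general idea; the paper's actual data-generation procedure may differ.

```python
# Hedged sketch (the paper's actual construction may differ): build
# training pairs in which stance and style vary independently, so the
# semantic-preferential correlation is broken by design.
from itertools import product

stances = ["supports the proposal", "opposes the proposal"]
styles  = ["terse", "flowery"]

texts = list(product(stances, styles))  # (stance, style) tuples

# Positive pairs: same stance, different wording -> should score "agree".
positives = [(a, b) for a in texts for b in texts
             if a[0] == b[0] and a[1] != b[1]]
# Negative pairs: same wording, flipped stance -> should score "disagree".
negatives = [(a, b) for a in texts for b in texts
             if a[0] != b[0] and a[1] == b[1]]

print(len(positives), "positive and", len(negatives), "negative pairs")
```

On data like this, style similarity carries no information about agreement, so any scorer that still tracks the labels must be reading stance rather than surface form.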

The research paper is classified under Artificial Intelligence (cs.AI) and carries the identifier arXiv:2605.08360 [1]. It has been assigned DOI 10.48550/arXiv.2605.08360 via DataCite, pending registration [1]. Version 1 was submitted on Friday, 8 May 2026 at 18:15:14 UTC [1].

How this was made. This article was assembled by Startupniti's editorial AI from the source listed in the right rail. The synthesis ran through our 4-model cascade (Gemini Flash Lite → GPT-4o-mini → DeepSeek → Llama 3.3 70B), logged to ops.llm_calls. Every fact traces to a citation. If a fact looks wrong, write to corrections.