PromptCD: Test-Time Behavior Enhancement via Polarity-Prompt Contrastive Decoding

Publication
arXiv preprint arXiv:2602.20696

Abstract:

We present PromptCD, a test-time method for controlling LLM behavior without additional training. For a target behavior, it constructs a pair of guiding prompts, one positive and one negative, and contrasts the model's responses under the two: at the token-probability level for LLMs and through visual attention patterns for VLMs. The method targets the 3H alignment objectives (helpfulness, honesty, and harmlessness) and demonstrates that post-trained models can achieve meaningful self-enhancement purely at test time. For vision-language models, PromptCD improves VQA performance by reinforcing behavior-consistent visual grounding. PromptCD is a simple, general, and cost-efficient strategy for reliable behavior control across modalities.
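
To make the token-level contrast concrete, here is a minimal sketch of one decoding step. The abstract does not give the scoring rule, so this uses a generic contrastive-decoding form as an assumption: the score is (1 + alpha) * log p_pos - alpha * log p_neg, restricted to tokens that are plausible under the positive prompt. The function name `contrast_next_token`, the parameters `alpha` and `beta`, and the toy logits are all hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrast_next_token(pos_logits, neg_logits, alpha=1.0, beta=0.1):
    """Pick the next token by contrasting two prompt conditions.

    pos_logits: next-token logits under the positive guiding prompt.
    neg_logits: next-token logits under the negative guiding prompt.
    alpha:      contrast strength (assumed hyperparameter).
    beta:       plausibility threshold relative to the most likely
                token under the positive prompt (assumed hyperparameter).
    """
    lp_pos = log_softmax(pos_logits)
    lp_neg = log_softmax(neg_logits)

    # Keep only tokens whose probability under the positive prompt is
    # at least beta times that of the most likely token.
    cutoff = math.log(beta) + max(lp_pos)

    scores = [
        (1 + alpha) * p - alpha * n if p >= cutoff else float("-inf")
        for p, n in zip(lp_pos, lp_neg)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy 4-token vocabulary (made-up numbers for illustration only).
pos = [2.0, 1.8, 0.1, -3.0]
neg = [2.0, 0.2, 0.1, 4.0]
token, scores = contrast_next_token(pos, neg)
```

In this toy case, greedy decoding under the positive prompt alone would pick token 0, while the contrast picks token 1, which the negative prompt makes much less likely. Token 3 is excluded by the plausibility cutoff, so a token that is implausible under the positive prompt cannot win merely because of how the two distributions differ.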
