Direct Preference Optimization (DPO) for Improving Logical Consistency and Decision-Making in LLM Reasoning