Direct Preference Optimization (DPO) for Improving Logical Consistency and Decision-Making in LLM Reasoning