(1) Vincent Kanka; Debabrata Das; Akhil Reddy Bairi. Direct Preference Optimization (DPO) for Improving Logical Consistency and Decision-Making in LLM Reasoning. J. of Artificial Int. Research and App. 2024, 4 (1), 733-769.