VINCENT KANKA; DEBABRATA DAS; AKHIL REDDY BAIRI. Direct Preference Optimization (DPO) for Improving Logical Consistency and Decision-Making in LLM Reasoning. Journal of Artificial Intelligence Research and Applications, London, U.K., v. 4, n. 1, p. 733–769, 2024. Available at: https://aimlstudies.co.uk/index.php/jaira/article/view/353. Accessed: 15 Jan. 2025.