Post-Training Logical Reasoning Evaluation Frameworks for Advanced LLM Applications

Authors

  • Sayantan Bhattacharyya, EY Parthenon, USA
  • Vincent Kanka, Homesite, USA
  • Akhil Reddy Bairi, BetterCloud, USA

Keywords:

logical reasoning evaluation, large language models, scenario-based challenges

Abstract

The rapid advancement of large language models (LLMs) has transformed natural language processing (NLP), enabling new capabilities in text generation, comprehension, and application. Despite their strong performance on diverse tasks, however, these models still fall short of consistent and robust logical reasoning. This paper proposes a comprehensive framework for the post-training evaluation and enhancement of logical reasoning capabilities in advanced LLMs. The framework employs scenario-based logic challenges and reasoning puzzles designed to identify and address logical inconsistencies in model outputs. Through a structured, feedback-driven refinement strategy, it iteratively evaluates and improves the logical coherence and reasoning accuracy of the models.

Central to this study is the development of modular evaluation protocols that use tools such as the Hugging Face libraries to quantify reasoning performance across multiple dimensions, including deductive reasoning, inductive reasoning, and the ability to resolve ambiguous or conflicting scenarios. These protocols emphasize real-world applicability by introducing task-specific logical challenges inspired by domains such as legal reasoning, scientific inquiry, and ethical decision-making. The paper also presents methodologies for diagnosing reasoning errors that are prevalent in LLM outputs, such as failure to recognize flawed premises, circular logic, and overgeneralization.
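
As an illustration of what scoring "across multiple dimensions" could look like, the following is a minimal sketch and not the protocol used in the paper. The `Challenge` record, the `score_by_dimension` function, the `answer_fn` callable (a stand-in for a model, for example one wrapped around a Hugging Face pipeline), and exact-match grading are all assumptions introduced here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Challenge:
    """One logic challenge (hypothetical record layout)."""
    dimension: str  # e.g. "deductive", "inductive", "ambiguity"
    prompt: str
    expected: str   # gold answer; exact-match grading is an assumption


def score_by_dimension(
    challenges: List[Challenge],
    answer_fn: Callable[[str], str],
) -> Dict[str, float]:
    """Return accuracy per reasoning dimension."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for ch in challenges:
        total[ch.dimension] += 1
        prediction = answer_fn(ch.prompt).strip().lower()
        if prediction == ch.expected.strip().lower():
            correct[ch.dimension] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


# Toy stand-in for a model: it always answers "yes".
def always_yes(prompt: str) -> str:
    return "yes"


toy_suite = [
    Challenge("deductive", "All A are B. x is A. Is x B?", "yes"),
    Challenge("deductive", "All A are B. x is B. Must x be A?", "no"),
    Challenge("inductive", "2, 4, 6, 8. Is the next term 10?", "yes"),
]

scores = score_by_dimension(toy_suite, always_yes)
```

Reporting one score per dimension, rather than a single aggregate, keeps a weakness in one kind of reasoning from being masked by strength in another. In this toy suite the always-"yes" stand-in gets every inductive item right but only half of the deductive items.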

To enhance reasoning capabilities, the feedback-driven refinement process integrates adversarial retraining and reinforcement learning-based fine-tuning methods, supported by curated datasets specifically designed to target logical inadequacies. The iterative nature of this approach ensures continuous improvement while preserving the general linguistic proficiency of the models. The proposed framework is experimentally validated using state-of-the-art LLMs, including models with billions of parameters, demonstrating measurable improvements in reasoning metrics across diverse benchmarks.
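
The control flow of such a feedback-driven loop might be sketched as follows. This is a hedged illustration only: `refine_loop`, `refine_fn`, the exact-match `accuracy` metric, and the stopping rule are hypothetical, and the toy `memorize_one` step merely stands in for the adversarial retraining or reinforcement learning-based fine-tuning described above, which this sketch does not implement.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]      # (prompt, expected answer)
Model = Callable[[str], str]   # stand-in for an LLM


def accuracy(model: Model, examples: List[Example]) -> float:
    """Fraction of examples answered with an exact match."""
    if not examples:
        return 0.0
    hits = sum(model(prompt) == expected for prompt, expected in examples)
    return hits / len(examples)


def refine_loop(
    model: Model,
    refine_fn: Callable[[Model, List[Example]], Model],
    logic_set: List[Example],
    general_set: List[Example],
    target: float = 0.9,
    max_rounds: int = 5,
) -> Tuple[Model, List[float]]:
    """Evaluate, collect failures, refine, and re-evaluate."""
    history = [accuracy(model, logic_set)]
    baseline_general = accuracy(model, general_set)
    for _ in range(max_rounds):
        if history[-1] >= target:
            break
        failures = [(p, e) for p, e in logic_set if model(p) != e]
        candidate = refine_fn(model, failures)
        # Stop rather than accept an update that hurts general performance.
        if accuracy(candidate, general_set) < baseline_general:
            break
        model = candidate
        history.append(accuracy(model, logic_set))
    return model, history


def make_lookup_model(table: Dict[str, str]) -> Model:
    """Toy model that answers from a fixed table."""
    return lambda prompt: table.get(prompt, "unknown")


def memorize_one(model: Model, failures: List[Example]) -> Model:
    """Placeholder refinement: learn only the first failed example."""
    prompt, expected = failures[0]
    return lambda q: expected if q == prompt else model(q)


logic_set = [("p1", "yes"), ("p2", "no"), ("p3", "yes")]
general_set = [("g1", "hello")]
base = make_lookup_model({"p1": "yes", "g1": "hello"})
final_model, history = refine_loop(
    base, memorize_one, logic_set, general_set, target=1.0
)
```

The check against `general_set` is one simple way to express the goal of improving reasoning while preserving general linguistic proficiency: a candidate update is accepted only if it does not lower the score on a held-out general set.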

This research contributes to the broader field of NLP by addressing a critical limitation of LLMs, thereby enhancing their applicability in high-stakes domains where logical consistency is paramount. The findings underscore the importance of post-training interventions and pave the way for future research in reasoning-specific model architectures, evaluation techniques, and ethical considerations associated with logical inference in artificial intelligence systems.

Published

13-08-2024

How to Cite

[1]
Sayantan Bhattacharyya, Vincent Kanka, and Akhil Reddy Bairi, “Post-Training Logical Reasoning Evaluation Frameworks for Advanced LLM Applications”, J. of Artificial Int. Research and App., vol. 4, no. 2, pp. 243–282, Aug. 2024, Accessed: Jan. 15, 2025. [Online]. Available: https://aimlstudies.co.uk/index.php/jaira/article/view/356
