Cross-modal Learning for Image Understanding: Investigating cross-modal learning techniques for understanding images through multiple modalities such as text or audio descriptions

Dr. Paulo Leitão

Authors

Dr. Paulo Leitão, Professor of Informatics, University of Minho, Portugal

Keywords:

Image understanding, Assistive technologies

Abstract

Cross-modal learning, which aims to leverage information from multiple modalities, has emerged as a promising approach for enhancing image understanding. By integrating textual or auditory information with visual data, cross-modal learning enables machines to better comprehend the content and context of images. This paper provides a comprehensive review of cross-modal learning techniques for image understanding, focusing on the fusion of textual and visual information. We discuss the challenges and opportunities in cross-modal learning, explore various methodologies, and highlight their applications in real-world scenarios. Additionally, we present a critical analysis of existing evaluation metrics and datasets, emphasizing the need for standardized benchmarks to facilitate comparative studies. Our findings suggest that cross-modal learning holds great potential for advancing image understanding, with implications for diverse fields such as multimedia retrieval, image captioning, and assistive technologies.


