Multimodal Recognition of Users States at Human-AI Interaction Adaptation

Authors

  • Izabella Krzeminska, Orange Innovation Poland, Warszawa, Al. Jerozolimskie 160

DOI:

https://doi.org/10.47577/technium.v26i.12398

Keywords:

Multimodal Emotion Recognition (MMER); Human-AI Interaction; Deep Learning; Adaptive AI Systems; AI Privacy & Ethics; Affective Computing; Decision Trees; Data Fusion

Abstract

The study investigates advancements in multimodal emotion recognition (MMER) within the evolving landscape of human-AI interaction. By synthesizing insights from psychology, computer science, and cognitive neuroscience, this paper examines the integration of multiple modalities—such as facial expressions, vocal tone, textual sentiment, and physiological signals—to achieve adaptive and emotionally intelligent AI systems. Leveraging a systematic literature review, it identifies state-of-the-art methodologies, including deep learning-based data fusion techniques, and their applications across sectors such as healthcare, education, and transportation. The review provides practical tools to support the implementation of MMER systems: a decision tree for selecting the most appropriate theoretical approach based on specific application needs, and another for choosing the optimal fusion method. The proposed simplifications can help system designers and researchers address key challenges such as integrating heterogeneous data types and real-time processing. The article also addresses ethical issues such as privacy and bias mitigation.
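To make the fusion terminology concrete, the sketch below is a minimal, purely illustrative example (not taken from any of the reviewed systems) of feature-level (early) fusion for emotion classification. It assumes PyTorch and hypothetical embedding dimensions for the facial, vocal, and textual modalities; the modality-specific encoders that would produce these embeddings are assumed to exist upstream.

# Illustrative sketch only: feature-level fusion of three hypothetical
# modality embeddings, assuming PyTorch. Dimensions and class count are
# placeholder assumptions, not values from the reviewed literature.
import torch
import torch.nn as nn

class EarlyFusionMMER(nn.Module):
    """Concatenate per-modality embeddings and classify the fused vector."""
    def __init__(self, face_dim=512, audio_dim=128, text_dim=768, n_emotions=7):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(face_dim + audio_dim + text_dim, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, n_emotions),
        )

    def forward(self, face_feat, audio_feat, text_feat):
        # Early (feature-level) fusion: concatenate modality embeddings
        fused = torch.cat([face_feat, audio_feat, text_feat], dim=-1)
        return self.classifier(fused)  # logits over emotion classes

# Usage with random stand-in features for a batch of 4 samples
model = EarlyFusionMMER()
logits = model(torch.randn(4, 512), torch.randn(4, 128), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 7])

By contrast, late (decision-level) fusion would combine per-modality predictions rather than features; weighing such trade-offs against application constraints is the kind of choice the proposed fusion-method decision tree is intended to support.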

Published

2025-01-22

How to Cite

Krzeminska, I. (2025). Multimodal Recognition of Users States at Human-AI Interaction Adaptation. Technium: Romanian Journal of Applied Sciences and Technology, 26, 102–140. https://doi.org/10.47577/technium.v26i.12398