
The original article is published under the Creative Commons Attribution 4.0 International (CC BY 4.0) license

Using Gaussian elliptical models and pixel-based labeling to recognize human-object interactions

Human-Object Interaction (HOI) detection requires fine-grained detail from image sequences because of its importance in many computer vision-based applications, and incorporating semantics into scene understanding has deepened the analysis of human-centered actions. This paper proposes a semantic HOI detection system based on multiview sensors. In the proposed system, RGB and depth images are denoised with bilateral filtering (BLF) and segmented into multiple clusters with the simple linear iterative clustering (SLIC) algorithm. A skeleton is then extracted from the segmented RGB and depth images via the Euclidean Distance Transform (EDT), and the human joints recovered from this skeleton provide precise pixel-level annotations. An elliptical human model is then generated by a Gaussian Mixture Model (GMM), and a Conditional Random Field (CRF) is trained to assign a label to each pixel of the different human body parts and the interaction object. Two types of semantic features are extracted from each labeled body part and labeled object: datum points and a 3D point cloud. The feature descriptors are quantized using Fisher Linear Discriminant Analysis (FLDA) and classified using K-ary Tree Hashing (KATH). In the experimental phase, the detection accuracy achieved is 92.88% on the Sports dataset, 93.5% on the Sun Yat-sen University (SYSU) 3D HOI dataset, and 94.16% on the Nanyang Technological University (NTU) RGB+D dataset. The proposed system has been validated through extensive experiments and should be applicable to many computer vision-based applications such as health monitoring, security, and assisted-living systems.
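The denoising and superpixel-segmentation stage can be illustrated with a minimal sketch using scikit-image. This is not the authors' code; the input file name and the filter and SLIC parameters (sigma_color, sigma_spatial, n_segments, compactness) are assumptions chosen only for demonstration.

```python
import numpy as np
from skimage import io, img_as_float
from skimage.restoration import denoise_bilateral
from skimage.segmentation import slic

# Load an RGB frame and scale to floats in [0, 1]; the file name is hypothetical.
rgb = img_as_float(io.imread("frame_0001.png"))

# Edge-preserving denoising with a bilateral filter (BLF).
denoised = denoise_bilateral(rgb, sigma_color=0.05, sigma_spatial=5,
                             channel_axis=-1)

# Over-segment the denoised frame into superpixel clusters with SLIC.
segments = slic(denoised, n_segments=400, compactness=10, start_label=0)
print(segments.max() + 1, "superpixels")
```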
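Skeleton extraction via the Euclidean Distance Transform might be approximated as below. The sketch assumes a binary foreground mask of the segmented person is already available, and it pairs the EDT with morphological thinning, which is one common skeletonization route rather than necessarily the paper's exact procedure.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

def edt_skeleton(mask: np.ndarray) -> np.ndarray:
    """Skeletonize a binary human mask; return skeleton pixels weighted
    by their EDT value (distance to the nearest background pixel)."""
    edt = distance_transform_edt(mask)    # Euclidean distance transform
    skeleton = skeletonize(mask)          # one-pixel-wide medial curve
    return np.where(skeleton, edt, 0.0)   # keep EDT values on the skeleton only

# Usage with a toy mask:
mask = np.zeros((64, 64), dtype=bool)
mask[8:56, 24:40] = True                  # a crude torso-like blob
print(np.count_nonzero(edt_skeleton(mask)), "skeleton pixels")
```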
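The elliptical human model can be prototyped with scikit-learn's GaussianMixture: fitting a mixture to foreground pixel coordinates yields one Gaussian ellipse per body region. The number of components and the covariance type below are assumptions, not values from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_elliptical_model(mask: np.ndarray, n_parts: int = 5):
    """Fit a GMM to the (row, col) coordinates of foreground pixels.
    Each component's mean/covariance defines one elliptical body part."""
    coords = np.column_stack(np.nonzero(mask)).astype(float)
    gmm = GaussianMixture(n_components=n_parts,
                          covariance_type="full", random_state=0)
    gmm.fit(coords)
    # Eigendecompose each covariance to recover ellipse axes and orientation.
    ellipses = []
    for mean, cov in zip(gmm.means_, gmm.covariances_):
        vals, vecs = np.linalg.eigh(cov)
        angle = np.degrees(np.arctan2(vecs[1, -1], vecs[0, -1]))
        ellipses.append((mean, 2.0 * np.sqrt(vals), angle))  # center, axes, tilt
    return gmm, ellipses
```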
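Pixel-level part labeling with a CRF might look like the following sketch, here using the pydensecrf package (a fully connected CRF). The paper does not state which CRF implementation was used, and the unary potentials are assumed to come from per-pixel class probabilities such as the GMM posteriors.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_label(rgb: np.ndarray, probs: np.ndarray, n_iters: int = 5):
    """Refine per-pixel class probabilities with a dense CRF.
    rgb:   H x W x 3 uint8 image.
    probs: C x H x W softmax scores (e.g. GMM posteriors per body part)."""
    n_labels, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, n_labels)
    d.setUnaryEnergy(unary_from_softmax(probs))
    # Smoothness term: nearby pixels prefer the same label.
    d.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance term: similarly colored pixels prefer the same label.
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(rgb), compat=10)
    q = d.inference(n_iters)
    return np.argmax(q, axis=0).reshape(h, w)  # per-pixel label map
```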
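For the 3D point-cloud features, a labeled depth region can be back-projected with the standard pinhole camera model, as in the sketch below. The camera intrinsics (fx, fy, cx, cy) are assumptions; the paper does not specify its back-projection step.

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project a depth map (meters) into an N x 3 point cloud
    using the pinhole camera model."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels
```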
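Finally, feature quantization with FLDA could be prototyped as below. K-ary Tree Hashing (KATH) has no widely available reference implementation, so this sketch plainly swaps in a nearest-centroid classifier for the hashing step; the descriptor matrix and labels are synthetic placeholders, not data from the paper.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

# Hypothetical descriptor matrix: one row per interaction sample, columns
# concatenating datum-point and 3D point-cloud features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = rng.integers(0, 4, size=200)  # four toy interaction classes

# FLDA projects descriptors onto at most (n_classes - 1) discriminant axes;
# NearestCentroid stands in for the KATH classifier used in the paper.
model = make_pipeline(LinearDiscriminantAnalysis(n_components=3),
                      NearestCentroid())
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```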