In wearable-based human activity recognition (HAR) research, a major challenge is large intra-class variability. The collected activity signal is often, if not always, coupled with noise or bias caused by personal, environmental, or ...
Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AMs) that can later be fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR). Yet, few work ...
Face recognition has become a popular authentication tool in recent years. Modern state-of-the-art (SOTA) face recognition methods rely on deep neural networks, which extract discriminative features from face images. Although these methods have high recogn ...
The human face plays an essential role in social interactions, as it conveys information about someone's identity, state of mind, or mood. People are, by nature, very good at picking up this nonverbal information. Therefore, scientists have been interested in ...
Spatial self-attention layers, in the form of Non-Local blocks, introduce long-range dependencies in Convolutional Neural Networks by computing pairwise similarities among all possible positions. Such pairwise functions underpin the effectiveness of non-lo ...
Emotion recognition is usually achieved by collecting features (physiological signals, events, facial expressions, etc.) to predict an emotional ground truth. This ground truth is arguably unreliable due to its subjective nature. In this paper, we introduc ...