Publications related to Long-Term Spectral Statistics for Voice Presentation Attack Detection

Novel Methods For Detection And Analysis Of Atypical Aspects In Speech

Atypical aspects in speech concern speech that deviates from what is commonly considered normal or healthy. In this thesis, we propose novel methods for detection and analysis of these aspects, e.g. to monitor the temporary state of a speaker, diseases tha ...

EPFL, 2023

Multilingual Training and Adaptation in Speech Recognition

Sibo Tong

State-of-the-art acoustic models for Automatic Speech Recognition (ASR) are based on Hidden Markov Models (HMM) and Deep Neural Networks (DNN) and often require thousands of hours of transcribed speech data during training. Therefore, building multilingual ...

EPFL, 2020

Trustworthy speaker recognition with minimal prior knowledge using neural networks

Hannah Muckenhirn

The performance of speaker recognition systems has considerably improved in the last decade. This is mainly due to the development of Gaussian mixture model-based systems and in particular to the use of i-vectors. These systems handle relatively well noise ...

EPFL, 2019

End-to-End Acoustic Modeling using Convolutional Neural Networks for HMM-based Automatic Speech Recognition

Mathew Magimai Doss, Dimitri Palaz, Ronan Collobert

In a hidden Markov model (HMM) based automatic speech recognition (ASR) system, modeling the statistical relationship between the acoustic speech signal and the HMM states that represent linguistically motivated subword units such as phonemes is a crucial st ...

ELSEVIER SCIENCE BV, 2019

Visual speech recognition : from traditional to deep learning frameworks

Marina Zimmermann

Speech is the most natural means of communication for humans. Therefore, since the beginning of computers it has been a goal to interact with machines via speech. While there have been gradual improvements in this field over the decades, and with recent dr ...

EPFL, 2018

Phonetic aware techniques for Speaker Verification

Subhadeep Dey

The goal of this thesis is to improve current state-of-the-art techniques in speaker verification (SV), typically based on "identity-vectors" (i-vectors) and deep neural network (DNN), by exploiting diverse (phonetic) information extracted using variou ...

EPFL, 2018

Analysis of Language Dependent Front-End for Speaker Recognition

Petr Motlicek, Subhadeep Dey

In Deep Neural Network (DNN) i-vector based speaker recognition systems, acoustic models trained for Automatic Speech Recognition are employed to estimate sufficient statistics for i-vector modeling. The DNN based acoustic model is typically trained on a w ...

ISCA-INT SPEECH COMMUNICATION ASSOC, 2018

On Learning to Identify Genders from Raw Speech Signal Using CNNs

Hannah Muckenhirn, Selen Hande Kabil

Automatic Gender Recognition (AGR) is the task of identifying the gender of a speaker given a speech signal. Standard approaches extract features like fundamental frequency and cepstral features from the speech signal and train a binary classifier. Inspire ...

ISCA-INT SPEECH COMMUNICATION ASSOC, 2018

Cross-lingual Adaptation of a CTC-based multilingual Acoustic Model

Hervé Bourlard, Philip Neil Garner, Sibo Tong

Multilingual models for Automatic Speech Recognition (ASR) are attractive as they have been shown to benefit from more training data, and better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual context-de ...

ELSEVIER SCIENCE BV, 2018

Template-matching for text-dependent speaker verification

Petr Motlicek, Subhadeep Dey

In the last decade, i-vector and Joint Factor Analysis (JFA) approaches to speaker modeling have become ubiquitous in the area of automatic speaker recognition. Both of these techniques involve the computation of posterior probabilities, using either Gauss ...

2017

On Modeling the Synergy Between Acoustic and Lexical Information for Pronunciation Lexicon Development

Marzieh Razavi

State-of-the-art automatic speech recognition (ASR) and text-to-speech systems require a pronunciation lexicon that maps each word to a sequence of phones. Manual development of lexicons is costly as it needs linguistic knowledge and human expertise. To fa ...

EPFL, 2017

Impact of score fusion on voice biometrics and presentation attack detection in cross-database evaluations

Sébastien Marcel

Research in the area of automatic speaker verification (ASV) has been advanced enough for the industry to start using ASV systems in practical applications. However, these systems are highly vulnerable to spoofing or presentation attacks, limiting their wi ...

2017

Towards End-to-End Speech Recognition

Dimitri Palaz

Standard automatic speech recognition (ASR) systems follow a divide and conquer approach to convert speech into text. Alternately, the end goal is achieved by a combination of sub-tasks, namely, feature extraction, acoustic modeling and sequence decoding, ...

EPFL, 2016

Presentation Attack Detection Using Long-Term Spectral Statistics for Trustworthy Speaker Verification

Mathew Magimai Doss, Sébastien Marcel, Hannah Muckenhirn

In recent years, there has been a growing interest in developing countermeasures against non zero-effort attacks for speaker verification systems. Until now, the focus has been on logical access attacks, where the spoofed samples are injected into the syst ...

IEEE, 2016