Publication

Multi-party Speech Recovery Exploiting Structured Sparsity Models

Related publications (30)

Unsupervised Visual Entity Abstraction towards 2D and 3D Compositional Models

Object-centric learning has gained significant attention over the last years as it can serve as a powerful tool to analyze complex scenes as a composition of simpler entities. Well-established tasks in computer vision, such as object detection or instance ...

EPFL, 2022

Generating Higher-Fidelity Synthetic Datasets with Privacy Guarantees

Boi Faltings, Aleksei Triastcyn

We consider the problem of enhancing user privacy in common data analysis and machine learning development tasks, such as data annotation and inspection, by substituting the real data with samples from a generative adversarial network. We propose employing ...

MDPI, 2022

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

Mathew Magimai Doss, Eklavya Sarkar

Voice activity detection (VAD) is an important pre-processing step for speech technology applications. The task consists of deriving segment boundaries of audio signals which contain voicing information. In recent years, it has been shown that voice source ...

ISCA, 2022

Learning of Continuous and Piecewise-Linear Functions With Hessian Total-Variation Regularization

Michaël Unser, Joaquim Gonçalves Garcia Barreto Campos, Shayan Aziznejad

We develop a novel 2D functional learning framework that employs a sparsity-promoting regularization based on second-order derivatives. Motivated by the nature of the regularizer, we restrict the search space to the span of piecewise-linear box splines shi ...

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2022

Discriminative clustering with representation learning with any ratio of labeled to unlabeled data

We present a discriminative clustering approach in which the feature representation can be learned from data and moreover leverage labeled data. Representation learning can give a similarity-based clustering method the ability to automatically adapt to an ...

2022

Leveraging topology, geometry, and symmetries for efficient Machine Learning

Michaël Defferrard

When learning from data, leveraging the symmetries of the domain the data lies on is a principled way to combat the curse of dimensionality: it constrains the set of functions to learn from. It is more data efficient than augmentation and gives a generaliz ...

EPFL, 2022

Controllability and Interpretability in Affective Speech Synthesis

Bastian Schnell

Thanks to Deep Learning, Text-To-Speech (TTS) has achieved high audio quality with large databases. But at the same time, the complex models lost any ability to control or interpret the generation process. For the big challenge of affective TTS it is infeasi ...

EPFL, 2022

Semantic Segmentation of Remote Sensing Images With Sparse Annotations

Devis Tuia

Training convolutional neural networks (CNNs) for very high-resolution images requires a large quantity of high-quality pixel-level annotations, which is extremely labor-intensive and time-consuming to produce. Moreover, professional photograph interpreter ...

2022

Novel Methods for Incorporating Prior Knowledge for Automatic Speech Assessment

Subrahmanya Pavankumar Dubagunta

A speech signal conveys several kinds of information, such as a message, speaker identity, emotional state of the speaker and social state of the speaker. Automatic speech assessment is a broad area that refers to using automatic methods to predict human judg ...

EPFL, 2021

Non-Intrusive Speech Quality Assessment with Transfer Learning and Subject-specific Scaling

Paolo Prandoni, Milos Cernak, Natalia Nessler

In communication systems, it is crucial to estimate the perceived quality of audio and speech. The industrial standards for many years have been PESQ, 3QUEST, and POLQA, which are intrusive methods. This restricts the possibilities of using these metrics i ...

ISCA-INT SPEECH COMMUNICATION ASSOC, 2021