Prompt–RSVQA: Prompting visual context to a language model for Remote Sensing Visual Question Answering

Related publications (30)

Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN

Maria Brbic, Ziang Li

Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, interspecies genomic differences limit the joint ...

Nature Portfolio, 2024

Incorporating Projective Geometry into Deep Learning

Michal Jan Tyszkiewicz

In this thesis we explore the applications of projective geometry, a mathematical theory of the relation between 3D scenes and their 2D images, in modern learning-based computer vision systems. This is an interesting research question which contradicts the ...

EPFL, 2024

Efficient local linearity regularization to overcome catastrophic overfitting

Volkan Cevher, Grigorios Chrysos, Fanghui Liu, Elias Abad Rocamora

Catastrophic overfitting (CO) in single-step adversarial training (AT) results in abrupt drops in the adversarial test accuracy (even down to 0%). For models trained with multi-step AT, it has been observed that the loss function behaves locally linearly w ...

2024

Robust machine learning for neuroscientific inference

Steffen Schneider

Modern neuroscience research is generating increasingly large datasets, from recording thousands of neurons over long timescales to behavioral recordings of animals spanning weeks, months, or even years. Despite a great variety in recording setups and expe ...

EPFL, 2024

Visual complexity of urban streetscapes: human vs computer vision

Pietro Florio

Understanding visual complexity of urban environments may improve urban design strategies and limit visual pollution due to advertising, road signage, telecommunication systems and machinery. This paper aims at quantifying visual complexity specifically in ...

Springer, 2024

Fast and Future: Towards Efficient Forecasting in Video Semantic Segmentation

Evann Pierre Guy Courdier

Deep learning has revolutionized the field of computer vision, a success largely attributable to the growing size of models, datasets, and computational power. Simultaneously, a critical pain point arises as several computer vision applications are deploye ...

EPFL, 2024

Text Representation Learning for Low Cost Natural Language Understanding

Jan Frederik Jonas Florian Mai

Natural language processing and other artificial intelligence fields have witnessed impressive progress over the past decade. Although some of this progress is due to algorithmic advances in deep learning, the majority has arguably been enabled by scaling ...

EPFL, 2023

Deep Generative Models for Autonomous Driving: from Motion Forecasting to Realistic Image Synthesis

Saeed Saadatnejad

Forecasting is a capability inherent in humans when navigating. Humans routinely plan their paths, considering the potential future movements of those around them. Similarly, to achieve comparable sophistication and safety, autonomous systems must embrace ...

EPFL, 2023

Improving Generalization of Pretrained Language Models

Rabeeh Karimi Mahabadi

In this dissertation, we propose multiple methods to improve transfer learning for pretrained language models (PLMs). Broadly, transfer learning is a powerful technique in natural language processing, where a language model is first pre-trained on a data-r ...

EPFL, 2023

Multi-task prompt-RSVQA to explicitly count objects on aerial images

Devis Tuia, Christel Marie Tartini-Chappuis, Nicola Antonio Santacroce, Sylvain Lobry, Javiera Francisca Castillo Navarro

Introduced to enable a wider use of Earth Observation images using natural language, Remote Sensing Visual Question Answering (RSVQA) remains a challenging task, in particular for questions related to counting. To address this specific challenge, we propos ...

2023
