Publication
Remote sensing visual question answering (RSVQA) aims at predicting an answer to a question (both in natural language) about an overhead image. Through natural language processing, this task allows end users to extract high-level information from remote sensing data. In this chapter, we discuss some of the works that have been proposed for RSVQA. We first systematically review eight existing datasets that can be used to train and evaluate RSVQA models. We then examine contributions on RSVQA models on the visual, language, fusion of modalities and answer prediction parts. Finally, we discuss new research directions that could be pursued to advance the field.