
The potential of synthetic data for more informative evaluation in Visual Question Answering


If you have a question about this talk, please contact Andrew Caines.

Visual Question Answering (VQA) combines language and scene understanding in a straightforward task. However, the popular VQA Dataset exhibits characteristics that make it easy for even a simple system to achieve surprisingly good performance. At the same time, sophisticated model improvements are often barely, if at all, reflected in corresponding performance gains. I will discuss various reasons for this, and why relying on this type of monolithic real-world dataset as the sole benchmark might thus be a dead end for VQA. Various synthetic abstract VQA datasets have recently been published to help overcome this problem. I will introduce “ShapeWorld”, our framework for automatically generating abstract multimodal data, discuss some of the design choices of the generation system, and contrast it with related synthetic datasets. Central to our system is an evaluation methodology akin to “unit-testing”, in contrast to real-world datasets, which serve more as application benchmarks. Finally, I will present some experimental results, focusing on quantifier statements, which illustrate this approach and show how it enables more targeted and detailed analysis of deep learning models.

This talk is part of the NLIP Seminar Series.

