Humans often have to read multiple documents to address their information needs. However, most existing reading comprehension tasks focus only on questions whose contexts provide all the information required to answer them, so they do not evaluate a system's ability to recognize when the available information is insufficient and to locate the sources that supply what is missing. IIRC is a crowdsourced dataset of information-seeking questions that require models to identify the information missing from the original context and then retrieve it.

Find more details in the links below.

  • Paper describing the dataset and our baseline models.
  • Dataset with about 10.8k questions in the train set and 1.3k questions in each of the dev and test sets. The data is distributed under the CC BY 4.0 license.
  • Code for the baseline described in the paper.
  • Citation:

    @inproceedings{Ferguson2020IIRC,
      author={James Ferguson and Matt Gardner and Hannaneh Hajishirzi and Tushar Khot and Pradeep Dasigi},
      title={{IIRC}: A Dataset of Incomplete Information Reading Comprehension Questions},
      booktitle={Proc. of EMNLP},
      year={2020}
    }
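After downloading the data, a quick sanity check is to count the questions in each split and compare against the approximate sizes listed above. The following is a minimal Python sketch under an assumption this page does not confirm: that each split is a JSON file holding a list of passage records, each with a `questions` list. The function name `count_questions` and the filename `train.json` are hypothetical; check the layout of the released files before relying on this.

```python
import json
from pathlib import Path


def count_questions(path):
    """Count questions in an IIRC-style JSON split (hypothetical helper).

    ASSUMPTION: the file contains a JSON list of passage records, and each
    record stores its questions under a "questions" key. This layout is not
    confirmed here; adjust the keys to match the released files.
    """
    with Path(path).open(encoding="utf-8") as f:
        passages = json.load(f)
    # A passage may carry several questions, so sum them over all passages.
    # Records without a "questions" key are counted as having none.
    return sum(len(p.get("questions", [])) for p in passages)


if __name__ == "__main__":
    # Hypothetical filename; if the assumed layout holds, the train split
    # should come out at roughly 10.8k questions.
    print(count_questions("train.json"))
```

If the assumed layout does not match, `json.load` will still succeed but the count will be wrong or the record access will fail, which is itself a useful signal that the keys need adjusting.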