Machine comprehension of texts longer than a single sentence often requires coreference resolution. However, most current reading comprehension benchmarks do not contain complex coreferential phenomena and hence fail to evaluate the ability of models to resolve coreference. We present a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference among entities in about 4.7K English paragraphs from Wikipedia.

AllenNLP makes it easy to get started with this dataset: it provides a dataset reader that can be used with any model you design, as well as a reference implementation of the baseline models described in the paper.
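
As a minimal sketch (not an official example), the reader can be pulled from AllenNLP's dataset-reader registry. The registered name "quoref" and the local file name used below are assumptions; check the documentation linked below for the exact identifiers in your AllenNLP version.

    # Minimal sketch: load Quoref through AllenNLP's dataset-reader registry
    # and peek at a few instances. The registered name "quoref" and the local
    # file name are assumptions, not confirmed identifiers.
    from itertools import islice

    from allennlp.data.dataset_readers import DatasetReader

    reader = DatasetReader.by_name("quoref")()          # instantiate with default settings
    instances = reader.read("quoref-train-v0.1.json")   # hypothetical local path

    for instance in islice(instances, 3):
        # Each instance pairs a question with its paragraph and gold answer span(s).
        print(instance.fields.keys())

Find more details in the links below.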

  • Paper, describing the dataset and our baseline models for it.
  • Data, with over 19K questions in the train set and over 2.4K questions in the dev set (and a similar number in a hidden test set). The data is distributed under the CC BY-SA 4.0 license.
  • Leaderboard with an automated docker-based evaluation on a hidden test set.
  • Citation:

    @inproceedings{Dasigi2019Quoref,
      author={Pradeep Dasigi and Nelson F. Liu and Ana Marasovi\'{c} and Noah A. Smith and Matt Gardner},
      title={Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning},
      booktitle={Proc. of EMNLP-IJCNLP},
      year={2019}
    }