With system performance on existing reading comprehension benchmarks nearing or surpassing human performance, we need a new, hard dataset that pushes systems to actually read paragraphs of text. DROP is a crowdsourced, adversarially-created, 96k-question benchmark, in which a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than was necessary for prior datasets.
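To make the discrete operations concrete, here is a toy sketch (not the NAQANet model) of the kind of reasoning a DROP question demands: pull the numbers out of a passage, then count, add, or sort them. The passage and helper function here are invented for illustration.

```python
import re

def extract_numbers(passage):
    """Pull all numeric mentions out of a passage (toy heuristic)."""
    return [float(m) for m in re.findall(r"\d+(?:\.\d+)?", passage)]

passage = "The Bears scored on a 25-yard run, a 41-yard pass, and a 3-yard plunge."
numbers = extract_numbers(passage)

count = len(numbers)    # counting: "How many scoring plays were there?" -> 3
total = sum(numbers)    # addition: "How many total yards were the scoring plays?" -> 69.0
longest = max(numbers)  # sorting/max: "How long was the longest scoring play?" -> 41.0
```

A real system must additionally resolve which numbers in the passage the question refers to; this sketch blindly uses all of them.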

AllenNLP provides an easy way for you to get started with this dataset, with a dataset reader that can be used with any model you design, and a reference implementation of the NAQANet model that was introduced in the DROP paper. Find more details in the links below.

  • Paper, describing the dataset and our initial model for it, Numerically-Augmented QANet (NAQANet), which adds some rudimentary numerical reasoning capability on top of QANet.
  • Data, with about 77k questions in the train set and 9.5k questions in the dev set (and a similar number in a hidden test set).
  • Code for the NAQANet model lives in AllenNLP: dataset reader, NAQANet model. Code for the other baselines in the paper may get added to AllenNLP in the future; open an issue on GitHub if there's something in particular you'd like to see.
  • Leaderboard with an automated docker-based evaluation on a hidden test set.
  • NAQANet demo - see how well current NLP systems understand paragraphs! The examples in the select box at the top should give you some sense of what kinds of questions are in DROP, what the system can do well, and a bit of what it can't. Change the paragraphs, input your own, try your own complex questions, and see what you find. If you find something interesting, let us know on Twitter!
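If you download the data, each file is a JSON object mapping a passage id to the passage text and its question-answer pairs, where an answer can be a number, one or more spans, or a date. A minimal sketch of iterating over that structure, with a tiny invented example in the shape of the released files (field names are our reading of the data and may differ in detail):

```python
import json

# A tiny, invented example in the shape of a DROP data file: each passage
# id maps to a passage string and a list of qa_pairs; an answer holds a
# number, a list of spans, and a date, with unused fields left empty.
raw = json.loads("""
{
  "nfl_0": {
    "passage": "The Bears scored on a 25-yard run and a 41-yard pass.",
    "qa_pairs": [
      {
        "query_id": "q1",
        "question": "How many yards was the longest scoring play?",
        "answer": {"number": "41", "spans": [], "date": {"day": "", "month": "", "year": ""}}
      }
    ]
  }
}
""")

# Flatten into (passage, question, answer) triples for model consumption.
examples = []
for passage_id, entry in raw.items():
    for qa in entry["qa_pairs"]:
        examples.append((entry["passage"], qa["question"], qa["answer"]))
```

The AllenNLP dataset reader linked above does this flattening (plus tokenization and answer-type bookkeeping) for you, so you only need code like this if you are building a pipeline outside AllenNLP.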


    @inproceedings{Dua2019DROP,
      author={Dheeru Dua and Yizhong Wang and Pradeep Dasigi and Gabriel Stanovsky and Sameer Singh and Matt Gardner},
      title={{DROP}: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs},
      booktitle={Proc. of NAACL},
      year={2019}
    }