See how our models compare to the competition. AllenNLP provides strong performance with reasonable runtimes, along with the infrastructure to easily run them.
Machine Comprehension (MC) models answer natural language questions by selecting an answer span within an evidence text. The AllenNLP MC model is a reimplementation of BiDAF (Seo et al., 2017), or Bi-Directional Attention Flow, a widely used MC baseline that achieves near state-of-the-art accuracy on the SQuAD dataset. The AllenNLP BiDAF model achieves an EM score of 68.3 on the SQuAD dev set, slightly ahead of the original BiDAF system's score of 67.7, while also training roughly 10x faster (4 hours on a p2.xlarge).
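To make the EM (exact match) number concrete, here is a minimal sketch of how SQuAD-style exact match is computed: the predicted span and each gold answer are normalized (lowercased, punctuation and articles removed, whitespace collapsed) before comparison. This follows the normalization used by the official SQuAD evaluation script; the function names are illustrative, not AllenNLP API.

```python
import re
import string

def normalize_answer(s):
    """Lowercase, strip punctuation and articles, and collapse whitespace,
    mirroring the official SQuAD evaluation script's normalization."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def exact_match(prediction, gold_answers):
    """EM is 1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize_answer(prediction) == normalize_answer(g)
                     for g in gold_answers))

print(exact_match("The Eiffel Tower.", ["Eiffel Tower"]))  # 1.0
print(exact_match("a tower", ["Eiffel Tower"]))            # 0.0
```

The dev-set EM score is simply this metric averaged over all questions, so a 68.3 EM means the model's span matched a gold answer exactly (after normalization) on 68.3% of questions.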
SRL, or Semantic Role Labeling, models recover the latent predicate-argument structure of a sentence. SRL builds representations that answer basic questions about sentence meaning, such as "who" did "what" to "whom." The AllenNLP SRL model is a reimplementation of a deep BiLSTM model (He et al., 2017). The AllenNLP SRL model closely matches the published model, achieving an F1 of 78.9 on the CoNLL 2012 dataset.
Textual Entailment (TE) models take a pair of sentences and predict whether the facts in the first necessarily imply the facts in the second. The AllenNLP TE model is a reimplementation of the decomposable attention model (Parikh et al, 2017), a widely used TE baseline that is relatively simple and achieves near state-of-the-art performance on the SNLI dataset. The AllenNLP TE model achieves an accuracy of 84.7% on the SNLI 1.0 test set, slightly below the original system's score of 86.3%.
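Concretely, an SNLI-style TE model scores each premise/hypothesis pair over three classes: entailment, contradiction, and neutral. A minimal sketch of the final decision step, assuming hypothetical logits from the model's classifier head:

```python
import math

LABELS = ["entailment", "contradiction", "neutral"]  # the three SNLI classes

def predict_label(logits):
    """Softmax over the three NLI class logits and return the argmax label
    with the full probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]   # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    return LABELS[probs.index(max(probs))], probs

# e.g. "A dog runs in the park." -> "An animal is outside."
label, probs = predict_label([2.0, -0.5, 0.3])  # hypothetical logits
print(label)  # entailment
```

Test-set accuracy (the 84.7% figure) is then the fraction of pairs for which this argmax label matches the human-annotated gold label.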