Using the included models is fine, but at some point you’ll probably want to implement your own models, which is what this tutorial is for.
Generally speaking, in order to implement a new model you’ll need to implement a `DatasetReader` subclass to read in your datasets and a `Model` subclass corresponding to the model you want to implement. (If there’s already a `DatasetReader` for the dataset you want to use, of course you can reuse that one.) In this tutorial we’ll also implement a custom PyTorch `Module`, but you won’t need to do that in general.
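For orientation, the overall shape is just two registered subclasses. Here is a hypothetical skeleton (the names `my_reader`, `my_model`, `MyDatasetReader`, and `MyModel` are placeholders, not anything from this tutorial):

```python
from allennlp.data.dataset_readers import DatasetReader
from allennlp.models import Model

# Registering under a name lets the configuration file refer to these
# classes by "type"; both names here are hypothetical placeholders.
@DatasetReader.register("my_reader")
class MyDatasetReader(DatasetReader):
    ...  # _read() yields Instances built from your data files

@Model.register("my_model")
class MyModel(Model):
    ...  # forward() computes outputs and (during training) a "loss"
```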
Our simple tagger model uses an LSTM to capture dependencies between the words in the input sentence, but doesn’t have a great way to capture dependencies between the tags. This can be a problem for tasks like named-entity recognition where you’d never want to (for example) have a “start of a place” tag followed by an “inside a person” tag.
We’ll try to build an NER model that can outperform our simple tagger on the CoNLL 2003 dataset, which (for licensing reasons) you’ll have to obtain yourself.
The simple tagger gets about 88% span-based F1 on the validation dataset. We’d like to do better.
One way to approach this is to add a Conditional Random Field layer at the end of our tagging model. (If you’re not familiar with conditional random fields, this overview paper is helpful, as is this PyTorch tutorial.)
The “linear-chain” conditional random field we’ll implement has a `num_tags` x `num_tags` matrix of transition costs, where `transitions[i, j]` represents the likelihood of transitioning from the `i`-th tag to the `j`-th tag. In addition to whatever tags we’re trying to predict, we’ll have special “start” and “end” tags that we’ll stick before and after each sentence in order to capture the “transition” inherent in being the tag at the beginning or end of a sentence.
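In equations (a sketch of the standard linear-chain scoring, writing $\mathrm{logits}[t, y_t]$ for the model’s score of tag $y_t$ at position $t$), a tag sequence $y_1, \ldots, y_n$ for an $n$-token sentence is scored as

$$
s(y) = \mathrm{start\_transitions}[y_1] + \sum_{t=1}^{n} \mathrm{logits}[t, y_t] + \sum_{t=1}^{n-1} \mathrm{transitions}[y_t, y_{t+1}] + \mathrm{end\_transitions}[y_n],
$$

and the log-likelihood of a gold sequence $y$ is $s(y) - \log \sum_{y'} \exp s(y')$, where the sum ranges over all possible tag sequences $y'$.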
As this is just a component of our model, we’ll implement it as a `torch.nn.Module`. To implement a PyTorch module, we just need to inherit from `torch.nn.Module` and implement

```python
def forward(self, *input): ...
```

to compute the log-likelihood of the provided inputs.
To initialize this module, we just need the number of tags.
```python
def __init__(self, num_tags: int) -> None:
    super().__init__()
    self.num_tags = num_tags

    # transitions[i, j] is the logit for transitioning from state i to state j.
    self.transitions = torch.nn.Parameter(torch.randn(num_tags, num_tags))

    # Also need logits for transitioning from "start" state and to "end" state.
    self.start_transitions = torch.nn.Parameter(torch.randn(num_tags))
    self.end_transitions = torch.nn.Parameter(torch.randn(num_tags))
```
I’m not going to get into the exact mechanics of how the log-likelihood is calculated; you should read the aforementioned overview paper (and look at our implementation) if you want the details. The key points are:

* `forward()` accepts a `(sequence_length, num_tags)` tensor of logits representing the likelihood of each tag at each position in some sequence, and a `(sequence_length,)` tensor of gold tags, and returns the log-likelihood of those tags. (In fact, we actually provide batches consisting of multiple sequences, but I’m glossing over that detail.)
* The module also has a `viterbi_tags()` method that accepts some input logits, gets the transition probabilities, and uses the Viterbi algorithm to compute the most likely sequence of tags for a given input.

Both are sketched below.
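Here is a minimal, unbatched sketch of both computations, assuming a `crf` holding the three parameter tensors defined above (the real implementation is batched, masked, and more careful, so treat this purely as an illustration of the math):

```python
import torch

def crf_log_likelihood(crf, logits: torch.Tensor, tags: torch.Tensor) -> torch.Tensor:
    """Score one (sequence_length, num_tags) logits tensor against one
    (sequence_length,) gold tag sequence. Unbatched, unmasked sketch."""
    sequence_length, num_tags = logits.shape

    # Numerator: the score of the gold tag sequence.
    score = crf.start_transitions[tags[0]] + logits[0, tags[0]]
    for t in range(1, sequence_length):
        score = score + crf.transitions[tags[t - 1], tags[t]] + logits[t, tags[t]]
    score = score + crf.end_transitions[tags[-1]]

    # Denominator: log-sum-exp over the scores of *all* possible tag
    # sequences, computed efficiently with the forward algorithm.
    alpha = crf.start_transitions + logits[0]  # (num_tags,)
    for t in range(1, sequence_length):
        # alpha[i] + transitions[i, j] for every pair (i, j), then sum out i.
        alpha = torch.logsumexp(alpha.unsqueeze(1) + crf.transitions, dim=0) + logits[t]
    log_partition = torch.logsumexp(alpha + crf.end_transitions, dim=0)

    return score - log_partition

def crf_viterbi_tags(crf, logits: torch.Tensor) -> list:
    """Most likely tag sequence for one input, via the Viterbi algorithm."""
    sequence_length, num_tags = logits.shape
    best_score = crf.start_transitions + logits[0]  # best path ending in each tag
    backpointers = []
    for t in range(1, sequence_length):
        # candidate[i, j]: best path ending in tag i, then moving to tag j.
        candidate = best_score.unsqueeze(1) + crf.transitions
        best_score, best_prev = candidate.max(dim=0)
        best_score = best_score + logits[t]
        backpointers.append(best_prev)
    best_score = best_score + crf.end_transitions
    # Trace the backpointers from the best final tag to recover the path.
    path = [int(best_score.argmax())]
    for best_prev in reversed(backpointers):
        path.append(int(best_prev[path[-1]]))
    return list(reversed(path))
```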
The `CrfTagger` model is not terribly different from the `SimpleTagger` model, so we can take that as a starting point. We need to make the following changes:

* give the model a `crf` attribute containing an appropriately initialized `ConditionalRandomField` module
* use the negated CRF log-likelihood as the training loss and the `viterbi_tags()` output as the predictions, in place of the softmax-based versions (as sketched below)

We can then register the new model as `"crf_tagger"`.
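Schematically, the tagger’s forward pass changes along these lines (a hedged sketch, not the exact AllenNLP code; the attribute names mirror the `SimpleTagger` it is based on):

```python
from allennlp.nn.util import get_text_field_mask

# Hypothetical sketch of CrfTagger.forward().
def forward(self, tokens, tags=None):
    embedded = self.text_field_embedder(tokens)
    mask = get_text_field_mask(tokens)
    encoded = self.encoder(embedded, mask)
    logits = self.tag_projection_layer(encoded)  # (batch, seq_len, num_tags)

    # Decode with Viterbi rather than a per-token argmax over softmax scores.
    predicted_tags = self.crf.viterbi_tags(logits, mask)

    output = {"logits": logits, "tags": predicted_tags}
    if tags is not None:
        # The negated CRF log-likelihood replaces softmax + cross entropy.
        output["loss"] = -self.crf(logits, tags, mask)
    return output
```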
The CoNLL data is formatted like this:

```
U.N.      NNP  I-NP  I-ORG
official  NN   I-NP  O
Ekeus     NNP  I-NP  I-PER
heads     VBZ  I-VP  O
for       IN   I-PP  O
Baghdad   NNP  I-NP  I-LOC
.         .    O     O
```
where each line contains a token, a part-of-speech tag, a syntactic chunk tag, and a named-entity tag. An empty line indicates the end of a sentence, and a line
```
-DOCSTART- -X- O O
```
indicates the end of a document. (Our reader is concerned only with sentences and doesn’t care about documents.)
You can poke at the code yourself, but at a high level we use `itertools.groupby` to chunk our input into groups of either “divider” lines or “sentence” lines. Then for each sentence we split each row into four columns, create a `TextField` for the tokens, and create a `SequenceLabelField` for the tags (which for us will be the NER tags).
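In outline, the chunking looks something like this (a simplified sketch of the idea, not the actual reader; `_is_divider` and `_read_sentences` are illustrative helpers):

```python
import itertools

def _is_divider(line: str) -> bool:
    # Empty lines and -DOCSTART- lines separate sentences and documents.
    return line.strip() == "" or line.startswith("-DOCSTART-")

def _read_sentences(file_path: str):
    with open(file_path) as data_file:
        # groupby alternates between runs of divider lines and runs of
        # sentence lines; we keep only the sentence runs.
        for is_divider, lines in itertools.groupby(data_file, _is_divider):
            if not is_divider:
                # Each row has four columns: token, POS tag, chunk tag, NER tag.
                rows = [line.strip().split() for line in lines]
                tokens, pos_tags, chunk_tags, ner_tags = zip(*rows)
                yield list(tokens), list(ner_tags)
```

Each `(tokens, ner_tags)` pair then becomes an `Instance` containing the `TextField` and `SequenceLabelField` described above.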
As the `CrfTagger` model is quite similar to the `SimpleTagger` model, we can get away with a similar configuration file. We need to make only a couple of changes:

* change the `"model.type"` to `"crf_tagger"`
* add a `"dataset_reader.tag_label"` field with value `"ner"` (to indicate that the NER labels are what we’re predicting)

We don’t need to, but we also make a few other changes (sketched below):

* apply regularization to the `transitions` parameters to help avoid overfitting
* specify a `test_data_path` and set `evaluate_on_test` to true. The first is mostly to ensure that our token embedding layer loads the GloVe vectors corresponding to tokens in the test data set, so that they are not treated as out-of-vocabulary at evaluation time. The second flag just evaluates the model on the test set when training stops. Use this flag cautiously; when you’re doing real science you don’t want to evaluate on your test set too often.
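Put together, the deltas relative to the simple tagger’s configuration might look roughly like this (an illustrative sketch showing only the changed fields; the reader’s registered name, the regularizer syntax, and the test path are assumptions to check against the actual crf_tagger.json):

```json
{
  "dataset_reader": {
    "type": "conll2003",
    "tag_label": "ner"
  },
  "model": {
    "type": "crf_tagger",
    "regularizer": [["transitions$", {"type": "l2", "alpha": 0.01}]]
  },
  "test_data_path": "/path/to/conll2003/test.txt",
  "evaluate_on_test": true
}
```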
At this point we’re ready to train the model.
In this case our new classes are part of the allennlp library, which means we can just use the included `allennlp/run.py` script, but if you were to create your own model they wouldn’t be. In that case `allennlp/run.py` never loads the modules in which you’ve defined your classes, they never get registered, and AllenNLP is unable to instantiate them based on the configuration file. In such a case you’ll need to create your own run script. You can just copy that one; the only change you need to make is to import all of your custom classes at the top:
```python
from myallennlp.data.dataset_readers import Conll2003DatasetReader
from myallennlp.models import CrfTagger
```
and so on. After that, you’re ready to train:
```
$ my_run.py train tutorials/getting_started/crf_tagger.json -s /tmp/crf_model
```