I am using spaCy's built-in model en_core_web_lg and want to train it further on my custom entities. While doing that, I am facing two issues:
1. The new training overwrites what the model already knew, so the original entities are no longer recognized. For example, before training it recognizes PERSON and ORG, but after training it does not.
2. During training, it gives me the following warning:
UserWarning: [W030] Some entities could not be aligned in the text "('I work in Google.',)"
Here is my complete code:
import spacy
import random
from spacy.util import minibatch, compounding
from pathlib import Path
from spacy.training.example import Example
sentence = ""
body1 = "James work in Facebook and love to have tuna fishes in the breafast."
nlp_lg = spacy.load("en_core_web_lg")
print(nlp_lg.pipe_names)
doc = nlp_lg(body1)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
train = [
    ('I had tuna fish in breakfast', {'entities': [(6,14,'FOOD')]}),
    ('I love prawns the most', {'entities': [(6,12,'FOOD')]}),
    ('fish is the rich source of protein', {'entities': [(0,4,'FOOD')]}),
    ('I work in Google.', {'entities': [(9,15,'ORG')]})
]
ner = nlp_lg.get_pipe("ner")
for _, annotations in train:
    for ent in annotations.get("entities"):
        ner.add_label(ent[2])
disable_pipes = [pipe for pipe in nlp_lg.pipe_names if pipe != 'ner']
with nlp_lg.disable_pipes(*disable_pipes):
    optimizer = nlp_lg.resume_training()
    for interation in range(30):
        random.shuffle(train)
        losses = {}
        batches = minibatch(train, size=compounding(1.0,4.0,1.001))
        for batch in batches:
            text, annotation = zip(*batch)
            doc1 = nlp_lg.make_doc(str(text))
            example = Example.from_dict(doc1, annotations)
            nlp_lg.update(
                [example],
                drop = 0.5,
                losses = losses,
                sgd = optimizer
            )
        print("Losses",losses)
doc = nlp_lg(body1)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
Expected output:
James 0 5 PERSON
Facebook 14 22 ORG
tuna fishes 40 51 FOOD
Currently, no entities are recognized at all.
Please let me know where I am going wrong. Thanks!
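
One thing that can be checked without spaCy is which characters each annotated offset actually selects. The sketch below uses only the standard library. `span_report` is a hypothetical helper, and its boundary test is a rough alphanumeric check rather than spaCy's tokenizer, so treat it as a sanity check and not as the library's alignment logic. It also prints what `str()` returns for the one-element tuple that `zip(*batch)` produces, since the text quoted in the W030 warning has that same shape.

```python
# Training pairs copied from the `train` list in the question.
train = [
    ("I had tuna fish in breakfast", {"entities": [(6, 14, "FOOD")]}),
    ("I love prawns the most", {"entities": [(6, 12, "FOOD")]}),
    ("fish is the rich source of protein", {"entities": [(0, 4, "FOOD")]}),
    ("I work in Google.", {"entities": [(9, 15, "ORG")]}),
]


def span_report(text, entities):
    """Return (start, end, label, selected_text, looks_aligned) for each span.

    Hypothetical helper: "aligned" is approximated as "does not cut through
    an alphanumeric run and has no surrounding whitespace". This is an
    assumption for illustration, not spaCy's own tokenization.
    """
    rows = []
    for start, end, label in entities:
        selected = text[start:end]
        # The character just before the span must not be part of a word.
        starts_clean = start == 0 or not text[start - 1].isalnum()
        # The character just after the span must not be part of a word.
        ends_clean = end == len(text) or not text[end].isalnum()
        # The span itself must be non-empty and free of edge whitespace.
        trimmed = selected != "" and selected == selected.strip()
        rows.append((start, end, label, selected,
                     starts_clean and ends_clean and trimmed))
    return rows


# Show what every annotated offset selects in its own sentence.
for text, annotations in train:
    for row in span_report(text, annotations["entities"]):
        print(repr(text), row)

# A one-item batch, unpacked the same way as in the training loop.
texts, batch_annotations = zip(*train[3:])
print(repr(str(texts)))  # string form of the tuple, not the sentence itself
```

With the offsets as posted, three of the four spans cut through a word (`'tuna fis'`, `' prawn'`, `' Googl'`), and only `(0, 4)` selects a whole word. The last line shows that calling `str()` on the tuple wraps the sentence in parentheses and quotes, which shifts every character position relative to the annotated offsets.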