Update spaCy's built-in NER model instead of overwriting it

Zubair Bin Hasan
Jan 04, 2025

I am using spaCy's built-in en_core_web_lg model and want to train it on my custom entities. While doing that, I am facing two issues:

1. Training on the new data overwrites what the model already knew, so it no longer recognizes the other entities. For example, before training it recognizes PERSON and ORG, but after training it recognizes neither.

2. During training, it gives me the following warning:

UserWarning: [W030] Some entities could not be aligned in the text "('I work in Google.',)"

Here is my complete code:

import spacy
import random
from spacy.util import minibatch, compounding
from pathlib import Path
from spacy.training.example import Example
sentence = ""
body1 = "James work in Facebook and love to have tuna fishes in the breafast."
nlp_lg = spacy.load("en_core_web_lg")
print(nlp_lg.pipe_names)
doc = nlp_lg(body1)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)


train = [
    ('I had tuna fish in breakfast', {'entities': [(6,14,'FOOD')]}),
    ('I love prawns the most', {'entities': [(6,12,'FOOD')]}),
    ('fish is the rich source of protein', {'entities': [(0,4,'FOOD')]}),
    ('I work in Google.', {'entities': [(9,15,'ORG')]})
    ]


ner = nlp_lg.get_pipe("ner")

for _, annotations in train:
    for ent in annotations.get("entities"):
        ner.add_label(ent[2])

disable_pipes = [pipe for pipe in nlp_lg.pipe_names if pipe != 'ner']

with nlp_lg.disable_pipes(*disable_pipes):
    optimizer = nlp_lg.resume_training()
    for iteration in range(30):
        random.shuffle(train)
        losses = {}

        batches = minibatch(train, size=compounding(1.0,4.0,1.001))
        for batch in batches:
            text, annotation = zip(*batch)
            doc1 = nlp_lg.make_doc(str(text))
            example = Example.from_dict(doc1, annotations)
            nlp_lg.update(
                [example],
                drop = 0.5,
                losses = losses,
                sgd = optimizer
                )
            print("Losses",losses)

doc = nlp_lg(body1)
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)

Expected output:

James 0 5 PERSON
Facebook 14 22 ORG
tuna fishes 40 51 FOOD

Currently, no entities are recognized at all.

Please let me know where I am going wrong. Thanks!
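
One thing that can be checked without spaCy at all: W030 is raised when an entity's character offsets do not line up with token boundaries in the text. The sketch below is plain Python and only inspects the training data from the code above. The helper name find_span is hypothetical (it is not a spaCy function), and the "intended" phrases passed to it are my assumption about what the annotations were meant to cover. It also shows that calling str() on the one-element tuple produced by zip(*batch) gives exactly the text quoted in the warning.

```python
# Training data copied verbatim from the question.
train = [
    ("I had tuna fish in breakfast", {"entities": [(6, 14, "FOOD")]}),
    ("I love prawns the most", {"entities": [(6, 12, "FOOD")]}),
    ("fish is the rich source of protein", {"entities": [(0, 4, "FOOD")]}),
    ("I work in Google.", {"entities": [(9, 15, "ORG")]}),
]


def find_span(text, phrase):
    """Hypothetical helper: (start, end) offsets of the first occurrence of phrase."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in {text!r}")
    return start, start + len(phrase)


# 1) Show what each annotated span actually covers.
for text, annotations in train:
    for start, end, label in annotations["entities"]:
        print(repr(text[start:end]), label)
# 'tuna fis' FOOD
# ' prawn' FOOD
# 'fish' FOOD
# ' Googl' ORG

# 2) Recompute offsets from the phrases that were presumably intended (assumption).
print(find_span("I had tuna fish in breakfast", "tuna fish"))  # (6, 15)
print(find_span("I love prawns the most", "prawns"))           # (7, 13)
print(find_span("I work in Google.", "Google"))                # (10, 16)

# 3) str() on the tuple from zip(*batch) is not the original sentence.
print(str(("I work in Google.",)))  # ('I work in Google.',)
```

Three of the four spans are off by one character, so they cannot be aligned to tokens even when the right text is used. Separately, the training loop builds its Doc from str(text), which is the tuple's printed form rather than the sentence, and passes annotations (left over from the earlier label loop) instead of the batch's own annotation. Building one Example per (text, annotation) pair in the batch would avoid both problems.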
