
Update spaCy's built-in NER model instead of overwriting it

  • Zubair Bin Hasan
  • Jan 04, 2025

I am using spaCy's built-in en_core_web_lg model and want to train it further on my custom entities. While doing that, I am facing two issues:

  1. The newly trained data overwrites what the model already knew, so it no longer recognizes the other entities. For example, before training it recognizes PERSON and ORG, but after training it recognizes neither.

  2. During training, it gives me the following warning:

    UserWarning: [W030] Some entities could not be aligned in the text "('I work in Google.',)" 

    Here is my whole code:

    import spacy
    import random
    from spacy.util import minibatch, compounding
    from pathlib import Path
    from spacy.training.example import Example
    sentence = ""
    body1 = "James work in Facebook and love to have tuna fishes in the breafast."
    nlp_lg = spacy.load("en_core_web_lg")
    print(nlp_lg.pipe_names)
    doc = nlp_lg(body1)
    for ent in doc.ents:
        print(ent.text, ent.start_char, ent.end_char, ent.label_)
    
    
    train = [
        ('I had tuna fish in breakfast', {'entities': [(6,14,'FOOD')]}),
        ('I love prawns the most', {'entities': [(6,12,'FOOD')]}),
        ('fish is the rich source of protein', {'entities': [(0,4,'FOOD')]}),
        ('I work in Google.', {'entities': [(9,15,'ORG')]})
        ]
    
    
    ner = nlp_lg.get_pipe("ner")
    
    for _, annotations in train:
        for ent in annotations.get("entities"):
            ner.add_label(ent[2])
    
    disable_pipes = [pipe for pipe in nlp_lg.pipe_names if pipe != 'ner']
    
    with nlp_lg.disable_pipes(*disable_pipes):
        optimizer = nlp_lg.resume_training()
        for interation in range(30):
            random.shuffle(train)
            losses = {}
    
            batches = minibatch(train, size=compounding(1.0,4.0,1.001))
            for batch in batches:
                text, annotation = zip(*batch)
                doc1 = nlp_lg.make_doc(str(text))
                example = Example.from_dict(doc1, annotations)
                nlp_lg.update(
                    [example],
                    drop = 0.5,
                    losses = losses,
                    sgd = optimizer
                    )
                print("Losses",losses)
    
    doc = nlp_lg(body1)
    for ent in doc.ents:
        print(ent.text, ent.start_char, ent.end_char, ent.label_)
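One thing worth checking for the W030 warning, offered as a sketch rather than a confirmed diagnosis: the warning typically appears when the character offsets in the annotations do not land on token boundaries. The plain-Python check below needs no spaCy. It slices each training text with its offsets and also shows what `str(text)` produces after `zip(*batch)`. The names `check_offsets` and `as_passed_to_make_doc` are hypothetical and used only for illustration.

```python
# Training data copied from the question.
train = [
    ("I had tuna fish in breakfast", {"entities": [(6, 14, "FOOD")]}),
    ("I love prawns the most", {"entities": [(6, 12, "FOOD")]}),
    ("fish is the rich source of protein", {"entities": [(0, 4, "FOOD")]}),
    ("I work in Google.", {"entities": [(9, 15, "ORG")]}),
]


def check_offsets(data):
    """Return (text, label, span_text) for every annotated span.

    Hypothetical helper: slicing the text with the given offsets shows
    exactly which characters each annotation covers.
    """
    return [
        (text, label, text[start:end])
        for text, annotations in data
        for start, end, label in annotations["entities"]
    ]


for text, label, span in check_offsets(train):
    print(f"{label}: {span!r}")

# zip(*batch) yields a tuple of texts, so str(text) is the tuple's repr,
# not the sentence itself.
batch = train[3:]
text, annotation = zip(*batch)
as_passed_to_make_doc = str(text)
print(as_passed_to_make_doc)
```

Three of the four spans come out cut off or with a leading space (`'tuna fis'`, `' prawn'`, `' Googl'`), so those offsets would need adjusting; `text.find("Google")` gives a start index, and adding `len("Google")` gives the end. The last line prints `('I work in Google.',)`, the same string quoted in the warning, which suggests the tuple rather than the sentence reaches `make_doc`. The loop also passes `annotations` (left over from the earlier label loop) instead of the per-batch `annotation`. Building one `Example` per text, each with its own annotation dict, would avoid both problems.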

     

    Expected output:

    James 0 5 PERSON
    Facebook 14 22 ORG
    tuna fishes 40 51 FOOD

     

    Currently, no entities are recognized at all.

    Please let me know what I am doing wrong. Thanks!
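On the first issue, a common explanation (an assumption here, not something verified against this model) is that further training on data with few or no examples of the original labels lets the model drift away from them. A quick plain-Python tally of the labels in `train` shows how the question's data is distributed. The names `label_counts` and `missing_labels` are hypothetical helpers.

```python
from collections import Counter

# Annotations copied from the question's training data.
train = [
    ("I had tuna fish in breakfast", {"entities": [(6, 14, "FOOD")]}),
    ("I love prawns the most", {"entities": [(6, 12, "FOOD")]}),
    ("fish is the rich source of protein", {"entities": [(0, 4, "FOOD")]}),
    ("I work in Google.", {"entities": [(9, 15, "ORG")]}),
]


def label_counts(data):
    """Count how many annotated spans each label has."""
    return Counter(
        label
        for _, annotations in data
        for _, _, label in annotations["entities"]
    )


def missing_labels(data, expected):
    """Return the expected labels that have no example in the data."""
    counts = label_counts(data)
    return sorted(label for label in expected if counts[label] == 0)


counts = label_counts(train)
# Labels the question expects the model to recognize after training.
missing = missing_labels(train, {"PERSON", "ORG", "FOOD"})
print(counts)
print(missing)
```

With three FOOD examples, one ORG example, and no PERSON example, the updates are dominated by the new label. A commonly suggested way to reduce this forgetting is to mix sentences annotated with PERSON and ORG into the training data alongside the FOOD examples, for instance texts labelled with the original model's own predictions.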


Tags nlp python