Jul 7, 2023 | 5 minute read
Varshith S
Software Developer, SofTronicLabs
How to Detect Person Names in Text Using SpaCy and Named Entity Recognition Algorithm
Named Entity Recognition (NER) is a core task in Natural Language Processing (NLP) that involves identifying named entities in text and assigning them to predefined classes. These entities can include names of persons, organizations, and locations, as well as time expressions, quantities, monetary values, percentages, and more. In this blog post, we train an NER model to recognize person names using spaCy, a widely used library for NLP tasks.
Setting Up the Environment:
We start by setting up the environment. We import spaCy and its Example class, create a blank English pipeline, add an NER component to it, and register the PERSON label.
import spacy
from spacy.training.example import Example
nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("PERSON")
Preparing the Training Data:
Next, we prepare the training data. It consists of sentences paired with entity annotations. Each annotation gives the start character index, the end character index (exclusive), and the label of the entity, so that text[start:end] is exactly the annotated span.
train_data = [
    ("John met Alice at the park.", {"entities": [(0, 4, "PERSON"), (9, 14, "PERSON")]}),
    ("Mary loves watching movies.", {"entities": [(0, 4, "PERSON")]}),
    ("Emily invited David to her birthday party.", {"entities": [(0, 5, "PERSON"), (14, 19, "PERSON")]}),
    ("David surprised Alice with a bouquet of flowers.", {"entities": [(0, 5, "PERSON"), (16, 21, "PERSON")]}),
    ("Sarah called John to discuss the project.", {"entities": [(0, 5, "PERSON"), (13, 17, "PERSON")]}),
    ("Robert Downey Jr. impressed everyone with his acting skills.", {"entities": [(0, 17, "PERSON")]}),
    ("Alice introduced Mary to her favorite book.", {"entities": [(0, 5, "PERSON"), (17, 21, "PERSON")]}),
    ("John helped Emily with her homework.", {"entities": [(0, 4, "PERSON"), (12, 17, "PERSON")]}),
    ("David congratulated Sarah on her promotion.", {"entities": [(0, 5, "PERSON"), (20, 25, "PERSON")]}),
    ("Mary asked Robert Downey Jr. for an autograph.", {"entities": [(0, 4, "PERSON"), (11, 28, "PERSON")]}),
    ("My friend, Sarah, is coming over.", {"entities": [(11, 16, "PERSON")]}),
    ("The actor, Robert Downey Jr., starred in the movie.", {"entities": [(11, 28, "PERSON")]}),
    ("Our neighbor, Mary, baked delicious cookies.", {"entities": [(14, 18, "PERSON")]}),
    ("The author, Emily, is signing books at the bookstore.", {"entities": [(12, 17, "PERSON")]}),
    ("My classmate, David, won the science fair.", {"entities": [(14, 19, "PERSON")]}),
    ("The singer, John, performed a beautiful song.", {"entities": [(12, 16, "PERSON")]}),
    ("Our colleague, Alice, organized the charity event.", {"entities": [(15, 20, "PERSON")]}),
    ("The chef, Sarah, prepared a fantastic meal.", {"entities": [(10, 15, "PERSON")]}),
    ("The artist, Mary, created stunning paintings.", {"entities": [(12, 16, "PERSON")]}),
    ("The architect, David, designed the new building.", {"entities": [(15, 20, "PERSON")]}),
    ("After the meeting, John discussed the agenda.", {"entities": [(19, 23, "PERSON")]}),
    ("In the morning, Emily will give a presentation.", {"entities": [(16, 21, "PERSON")]}),
    ("During the break, David grabbed a cup of coffee.", {"entities": [(18, 23, "PERSON")]}),
    ("Following the instructions, Alice completed the task.", {"entities": [(28, 33, "PERSON")]}),
    ("After the rain stopped, Mary went for a walk.", {"entities": [(24, 28, "PERSON")]}),
    ("In the evening, Robert Downey Jr. attended the premiere.", {"entities": [(16, 33, "PERSON")]}),
    ("During the interview, Sarah answered questions confidently.", {"entities": [(22, 27, "PERSON")]}),
    ("Before the event started, John welcomed the guests.", {"entities": [(26, 30, "PERSON")]}),
    ("In the afternoon, David will meet with the clients.", {"entities": [(18, 23, "PERSON")]}),
    ("After the conference call, Alice sent out the report.", {"entities": [(27, 32, "PERSON")]}),
    ("Hello, Sarah, how are you?", {"entities": [(7, 12, "PERSON")]}),
    ("Robert, could you pass the salt?", {"entities": [(0, 6, "PERSON")]}),
    ("Mary, please join us for dinner.", {"entities": [(0, 4, "PERSON")]}),
    ("John, don't forget to bring your laptop.", {"entities": [(0, 4, "PERSON")]}),
    ("David, could you share your thoughts on this matter?", {"entities": [(0, 5, "PERSON")]}),
    ("Emily, could you help me with this crossword puzzle?", {"entities": [(0, 5, "PERSON")]}),
    ("Sarah, I appreciate your assistance.", {"entities": [(0, 5, "PERSON")]}),
    ("Alice, please take a seat at the conference table.", {"entities": [(0, 5, "PERSON")]}),
    ("Mary, congratulations on your achievement.", {"entities": [(0, 4, "PERSON")]}),
    ("John, let's catch up over a cup of coffee.", {"entities": [(0, 4, "PERSON")]}),
    ("Mary said, 'David is coming to the party.'", {"entities": [(0, 4, "PERSON"), (12, 17, "PERSON")]}),
    ("'Alice,' John called out, 'come here.'", {"entities": [(1, 6, "PERSON"), (9, 13, "PERSON")]}),
    ("Sarah exclaimed, 'I can't believe it!'", {"entities": [(0, 5, "PERSON")]}),
    ("'Robert Downey Jr.,' the fan said, 'you're my favorite actor.'", {"entities": [(1, 18, "PERSON")]}),
    ("'Emily,' her teacher asked, 'can you solve this math problem?'", {"entities": [(1, 6, "PERSON")]}),
    ("'David,' his friend remarked, 'you're always so helpful.'", {"entities": [(1, 6, "PERSON")]}),
    ("'John,' his colleague inquired, 'did you receive my email?'", {"entities": [(1, 5, "PERSON")]}),
    ("'Alice,' her sister pleaded, 'please lend me your dress.'", {"entities": [(1, 6, "PERSON")]}),
    ("'Mary,' her mother advised, 'wear a jacket; it's chilly outside.'", {"entities": [(1, 5, "PERSON")]}),
    ("'Sarah,' her coworker noted, 'you did an excellent job.'", {"entities": [(1, 6, "PERSON")]}),
    ("Professor Smith will lecture on history.", {"entities": [(0, 15, "PERSON")]}),
    ("The director, Alice Johnson, will make the announcement.", {"entities": [(14, 27, "PERSON")]}),
    ("Our CEO, John, will address the company.", {"entities": [(9, 13, "PERSON")]}),
    ("The manager, Mary, is organizing the team-building event.", {"entities": [(13, 17, "PERSON")]}),
    ("David is the lead developer for the project.", {"entities": [(0, 5, "PERSON")]}),
    ("Emily is the head of the marketing department.", {"entities": [(0, 5, "PERSON")]}),
    ("Robert Downey Jr. is a renowned actor in Hollywood.", {"entities": [(0, 17, "PERSON")]}),
    ("Sarah works as a financial analyst at the bank.", {"entities": [(0, 5, "PERSON")]}),
    ("Alice is the editor-in-chief of the magazine.", {"entities": [(0, 5, "PERSON")]}),
    ("John was promoted to senior manager.", {"entities": [(0, 4, "PERSON")]}),
    ("The winner of the competition is Emily Wilson.", {"entities": [(33, 45, "PERSON")]}),
    ("We are pleased to announce that David Brown will lead the project.", {"entities": [(32, 43, "PERSON")]}),
    ("John Smith received the Employee of the Month award.", {"entities": [(0, 10, "PERSON")]}),
    ("Mary Davis was elected as the new team captain.", {"entities": [(0, 10, "PERSON")]}),
    ("The company picnic will be held at the park on Saturday.", {"entities": []}),
    ("Emily Wilson's book signing event is scheduled for next week.", {"entities": [(0, 12, "PERSON")]}),
    ("David Brown's presentation received a standing ovation.", {"entities": [(0, 11, "PERSON")]}),
    ("Alice Johnson was recognized for her outstanding contributions.", {"entities": [(0, 13, "PERSON")]}),
    ("The annual charity gala will take place on Friday night.", {"entities": []}),
    ("John Smith achieved a record-breaking sales performance.", {"entities": [(0, 10, "PERSON")]}),
    ("Allmark IT Company was known for its challenging projects and high salaries.", {"entities": [(0, 18, "ORG")]}),
    ("Ashutosh, the project manager, was responsible for overseeing the project's progress.", {"entities": [(0, 8, "PERSON"), (14, 29, "JOB_TITLE")]}),
    ("Gunavanth, the team lead, worked closely with Ashutosh to ensure the team met project goals.", {"entities": [(0, 9, "PERSON"), (15, 24, "JOB_TITLE"), (46, 54, "PERSON")]}),
    ("Matti and Lantti, two software engineers, were known for their coding skills.", {"entities": [(0, 5, "PERSON"), (10, 16, "PERSON"), (22, 40, "JOB_TITLE")]}),
    ("Rutta, another software engineer, was also a part of the team.", {"entities": [(0, 5, "PERSON"), (15, 32, "JOB_TITLE")]}),
    ("Naayaki, the test engineer, ensured the software was thoroughly tested for quality.", {"entities": [(0, 7, "PERSON"), (13, 26, "JOB_TITLE")]}),
    ("Due to schedule invariance and defect leakage, stress in the work culture began to rise.", {"entities": []}),
    ("Despite the challenges, the company continued to offer competitive salaries.", {"entities": []}),
    ("Unfortunately, the project faced a setback, leading to its failure.", {"entities": []}),
    ("In response to the failure, Rutta, a talented software engineer, decided to write a negative review.", {"entities": [(28, 33, "PERSON"), (46, 63, "JOB_TITLE")]}),
]
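Character offsets are easy to get wrong when counted by hand, and spaCy skips spans that do not line up with token boundaries. One way to reduce that risk is to derive the offsets from the names themselves. The sketch below uses only the Python standard library; find_entities and check_offsets are hypothetical helper names introduced here for illustration, not part of spaCy, and the names must be listed in the order they appear in the sentence.

```python
def find_entities(text, names, label="PERSON"):
    """Return (start, end, label) tuples for each name found in text.

    Hypothetical helper, not part of spaCy. Names are matched once each,
    scanning left to right, so a repeated name must be listed repeatedly.
    """
    entities = []
    cursor = 0
    for name in names:
        start = text.find(name, cursor)
        if start == -1:
            raise ValueError(f"{name!r} not found in {text!r}")
        end = start + len(name)  # end index is exclusive
        entities.append((start, end, label))
        cursor = end
    return entities


def check_offsets(text, entities, names):
    """Confirm that every annotated span slices out the intended name."""
    return [text[start:end] for start, end, _ in entities] == list(names)


text = "John met Alice at the park."
entities = find_entities(text, ["John", "Alice"])
print(entities)  # [(0, 4, 'PERSON'), (9, 14, 'PERSON')]
print(check_offsets(text, entities, ["John", "Alice"]))  # True
```

Running check_offsets over hand-written annotations before training is a cheap way to catch spans that start or end one character off.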
Training the NER Model:
We proceed to train the NER model using a training loop. This loop involves shuffling the training data and updating the model using spaCy's update() method. We iterate through the training data, create training examples, and update the model accordingly.
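import random

# Register every label that appears in the training data (PERSON, ORG, JOB_TITLE).
for _, annotations in train_data:
    for _, _, label in annotations["entities"]:
        ner.add_label(label)

# In spaCy v3, nlp.initialize() replaces the deprecated nlp.begin_training().
optimizer = nlp.initialize()
for _ in range(20):
    random.shuffle(train_data)
    losses = {}
    for text, annotations in train_data:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], drop=0.5, sgd=optimizer, losses=losses)
    print("Losses=", losses)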
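The loop above updates the model one example at a time. A common variation is to group examples into small batches with spaCy's minibatch utility, which tends to give smoother loss values. The following is a minimal sketch under a few assumptions: sample_data is a four-sentence stand-in for the full training list, and the batch size of 2, the dropout of 0.2, and the 5 epochs are arbitrary illustrative choices rather than tuned values.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Small illustrative sample; a real run would use the full train_data list.
sample_data = [
    ("John met Alice at the park.", {"entities": [(0, 4, "PERSON"), (9, 14, "PERSON")]}),
    ("Mary loves watching movies.", {"entities": [(0, 4, "PERSON")]}),
    ("Hello, Sarah, how are you?", {"entities": [(7, 12, "PERSON")]}),
    ("The project faced a setback.", {"entities": []}),
]

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("PERSON")
optimizer = nlp.initialize()

for epoch in range(5):
    random.shuffle(sample_data)
    losses = {}
    # Each batch is a list of (text, annotations) pairs.
    for batch in minibatch(sample_data, size=2):
        examples = [
            Example.from_dict(nlp.make_doc(text), annotations)
            for text, annotations in batch
        ]
        nlp.update(examples, drop=0.2, sgd=optimizer, losses=losses)
    print(f"epoch {epoch}: {losses}")
```

The loss dictionary is keyed by component name, so the NER loss for each epoch is available as losses["ner"].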
Saving the Trained Model:
Once the model is trained, we save it to a directory for later use.
nlp.to_disk("ner_model")
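A saved pipeline can be read back with spacy.load, which takes the directory path. The sketch below shows the round trip on an untrained pipeline so that it runs on its own; the call to nlp.initialize() stands in for the training loop, and the temporary directory is an assumption made only to keep the example from leaving files behind.

```python
import tempfile
from pathlib import Path

import spacy

nlp = spacy.blank("en")
ner = nlp.add_pipe("ner")
ner.add_label("PERSON")
nlp.initialize()  # stands in for the training loop

with tempfile.TemporaryDirectory() as tmp_dir:
    model_dir = Path(tmp_dir) / "ner_model"
    nlp.to_disk(model_dir)
    # The reloaded pipeline lives in memory, so it outlives the directory.
    reloaded = spacy.load(model_dir)

print(reloaded.pipe_names)
print(reloaded.get_pipe("ner").labels)
```

In a real project the directory would be a permanent path such as "ner_model", and a separate script would call spacy.load("ner_model") before processing new text.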
Input Text Processing:
Finally, we test our trained model with sample text to recognize person names.
test_text = """Michael, the project manager, was known for his exceptional management
skills and forward-thinking. When Michael assigned some tasks to Sarah, Sarah collaborated with
an engineer named Alex to complete the assigned tasks.
"""
doc = nlp(test_text)
Extracting Named Entities and Displaying:
Once we have processed the sample text and obtained a Doc object, we can extract named entities recognized by the NER model. Named entities are specific pieces of information that the model identifies as entities of interest, such as names of persons, organizations, locations, and more.
for ent in doc.ents:
    print(ent.text, ent.label_)
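Since the goal is person names specifically, the entity list can be filtered by label and de-duplicated. The sketch below is self-contained, so it does not rely on a trained model: it assumes a hypothetical known_names set and assigns the PERSON spans by hand to stand in for the model's predictions. With a trained pipeline, only the final loop would be needed.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Michael assigned some tasks to Sarah, and Sarah asked Alex for help.")

# Assumption: these hand-assigned spans stand in for model predictions.
known_names = {"Michael", "Sarah", "Alex"}
doc.ents = [
    Span(doc, token.i, token.i + 1, label="PERSON")
    for token in doc
    if token.text in known_names
]

# Keep PERSON entities only, preserving first-seen order without repeats.
person_names = []
for ent in doc.ents:
    if ent.label_ == "PERSON" and ent.text not in person_names:
        person_names.append(ent.text)

print(person_names)  # ['Michael', 'Sarah', 'Alex']
```

Each entity also exposes ent.start_char and ent.end_char, which is useful when the names need to be highlighted or redacted in the original text.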
Conclusion:
In this post, we trained a Named Entity Recognizer (NER) with spaCy to identify person names. We set up the environment, prepared annotated training sentences, trained the model in a loop, and saved it for later use. Running the model on sample text shows how it picks out person names, although with a training set this small its results on unseen names and sentence structures should be checked before relying on them. The same workflow extends to other entity types and larger datasets, which makes spaCy a practical starting point for many NLP applications.