Word prediction/Spell correction using the Python library Spello


When we search for something on Google, we do not always know the exact spelling of a word. So we type something close to it, and Google suggests something like “this is what you are trying to search, right?”. Medical terms are a good example: most of us do not use them in everyday life unless we work in healthcare or a healthcare-related domain.

When I was developing a web application for the healthcare domain, I wanted to get a disease name from the user. When a user types a disease name with minor spelling mistakes, I need to correct the spelling or predict the intended word from the input. To achieve this, I tried the Spello library in Python.


Prerequisites:

1. Install Python and pandas. In my example, I’m using Python 3.8.10 and pandas 1.3.4.

2. Install Jupyter Notebook: pip install notebook==6.4.6

3. Install Spello: pip install spello==1.3.0
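Before moving on, it can help to confirm which versions are actually installed. The following is a minimal sketch using only the standard library (importlib.metadata, available from Python 3.8). The helper name installed_version is my own and is not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as used with pip for the prerequisites above.
for name in ("pandas", "notebook", "spello"):
    print(name, installed_version(name) or "not installed")
```

If a package shows up as "not installed", rerun the matching pip command from the list above.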


What is Spello?

Spello is a spell-checking library in Python. It is designed to provide an easy-to-use and flexible interface for users to check the spelling of words in their text.

Official documentation: https://pypi.org/project/spello/

One of the key features of Spello is its ability to use multiple dictionaries, which allows users to customize the spell-checking process to fit their specific needs.

For example, a user could use a standard English dictionary for general text, and then switch to a specialized dictionary for technical terms.

Spello also supports fuzzy matching, which can help to identify misspelled words that are similar to words in the dictionary. This feature can be especially useful for catching typos or correcting mistakes in the text.


How does Spello handle this?

  1. It is built with a combination of two models, Phoneme and Symspell.
  2. The Phoneme model uses the Soundex algorithm in the background and suggests correct spellings by identifying similar-sounding words.
  3. The Symspell model uses the concept of edit distance to suggest correct spellings.

Spello gets you the best of both, taking into consideration the context of the word as well.
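To make the two ideas concrete, here is a small standalone sketch of a classic Soundex code and a plain Levenshtein edit distance. This is not Spello's internal code: the function names are my own, and SymSpell in particular relies on a precomputed index of deletions rather than comparing word pairs one by one. The sketch only illustrates what "sounds similar" and "few edits apart" mean.

```python
def soundex(word):
    """Classic 4-character Soundex code (first letter plus three digits)."""
    codes = {}
    groups = (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
              ("l", "4"), ("mn", "5"), ("r", "6"))
    for letters, digit in groups:
        for ch in letters:
            codes[ch] = digit

    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""

    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        digit = codes.get(ch, "")  # vowels map to "" and reset prev
        if digit and digit != prev:
            result += digit
        prev = digit
    return (result + "000")[:4]


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


print(soundex("diabetis"), soundex("diabetes"))    # D132 D132
print(soundex("nemonia"), soundex("pneumonia"))    # N550 P555
print(edit_distance("nemonia", "pneumonia"))       # 2
```

Note that plain Soundex keeps the first letter, so "nemonia" and "pneumonia" get different codes even though they sound alike, while their edit distance is only 2. A pair like "diabetis" and "diabetes" matches on both measures. This is one way to see why combining a phonetic signal with an edit-distance signal is useful.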

Currently, this module is available for English (en) and Hindi (hi).

Let’s dive into the real stuff:

  1. Open the command prompt and move to the folder you would like to work in.
  2. Install all the prerequisites.
  3. Open Jupyter Notebook by typing “jupyter notebook”.

[Figure: Opening Jupyter Notebook]

After that, the web browser opens the Jupyter Notebook home page.

[Figure: Jupyter Notebook home page]

To open a new notebook, click the New button in the top right corner and select Python 3 (ipykernel).

[Figure: Opening a new notebook]

An empty notebook opens.

[Figure: New notebook]

Jupyter Notebook can be used much like a normal command prompt: we can install packages and run our code for instant results. Here I’m installing Spello from within the notebook.

[Figure: Installing Spello in Jupyter Notebook]

Import the required packages and train the model:
import re

import pandas as pd
from spello.model import SpellCorrectionModel

# Define the model
sp = SpellCorrectionModel(language='en')

# Read the keywords from the CSV file
df = pd.read_csv('disease_name.csv')

# Create a list of keywords
disease_name_list = df['disease_name'].tolist()

# Replace a leading non-word character and any whitespace with a space,
# then lowercase and trim; keywords of two characters or fewer are skipped
list_words = [re.sub(r'^\W|\s', ' ', w).lower().strip() for w in disease_name_list if len(w) > 2]

# Train the model
sp.train(list_words)

In the code above, we import the required packages and define the model. We then read the CSV file of keywords into a DataFrame.

In the next step, we create a list from the disease_name column and remove unwanted spaces and characters.

Finally, we train the model on our custom keywords.
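The cleaning step can be tried on its own, without Spello or pandas. The sketch below applies the same regular expression to a few made-up rows; the sample CSV text and the helper name clean_keywords are illustrative assumptions, not the contents of my real file.

```python
import csv
import io
import re

# Hypothetical sample data with the same column name as the real file.
SAMPLE_CSV = """disease_name
Pneumonia
 Type 2 Diabetes
-Asthma
Flu
TB
"""


def clean_keywords(names):
    """Apply the same cleaning as the training code above."""
    return [re.sub(r'^\W|\s', ' ', w).lower().strip()
            for w in names if len(w) > 2]


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
names = [row['disease_name'] for row in rows]
cleaned = clean_keywords(names)
print(cleaned)  # ['pneumonia', 'type 2 diabetes', 'asthma', 'flu']
```

Two things are worth noticing: "TB" is dropped because of the len(w) > 2 filter, and the pattern only removes a non-word character at the start of a keyword, so trailing punctuation would remain.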

The CSV file contains data like the sample below.

[Figure: CSV file data samples]

If everything goes well, you will see a screen like this:

[Figure: Training the model]

Spell-checking
# Read a keyword from the user
keyword = input('Enter any keyword.. ')
# Ask the trained model for a correction
corrected_keyword = sp.spell_correct(keyword)
print('Corrected keyword is : ', corrected_keyword)

In the code above, we read a keyword from the user and pass it to the spell_correct() method to get the corrected keyword.

[Figure: Getting input from the user]

[Figure: Result]

In the above image, we can see that the user entered “nemonia” and our model predicted the correct keyword as “Pneumonia”.
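As far as I can tell from the Spello README, spell_correct() returns a dictionary rather than a plain string, with the keys original_text, spell_corrected_text and correction_dict. Treat those key names as an assumption to verify against your installed version. Under that assumption, a small helper (corrected_text is my own name, not part of Spello) can pull out just the corrected string. The sample dictionary below is hand-written to show the shape and is not real model output.

```python
def corrected_text(result):
    """Return the corrected string from a spell_correct() result.

    Assumes the result is a dict with a 'spell_corrected_text' key;
    anything else is converted to a string and returned unchanged.
    """
    if isinstance(result, dict) and "spell_corrected_text" in result:
        return result["spell_corrected_text"]
    return str(result)


# Hand-written stand-in for a spell_correct() result; not real model output.
sample_result = {
    "original_text": "nemonia",
    "spell_corrected_text": "pneumonia",
    "correction_dict": {"nemonia": "pneumonia"},
}

print("Corrected keyword is:", corrected_text(sample_result))
```

In the notebook, the same helper could be applied to the value returned by sp.spell_correct(keyword).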

Conclusion:

Most spell correction and word prediction libraries ship with their own dictionary, and common dictionary words work fine with them. But as we discussed in the first paragraph of our story, if we need spell correction and word prediction in a particular domain or field, we should have the option to set our own dictionary of keywords. The Spello library gives us that option.
