Using Knowledge Engineering to perform NLP analysis over student feedback data
Universities receive large volumes of free-text feedback from students, and this feedback can be very useful. The challenge is extracting insights from it. Analyzing free-text data is a Natural Language Processing (NLP) problem, and there are broadly two approaches to it: Knowledge Engineering and Machine Learning.
Machine Learning
Applying machine learning algorithms to free-text data normally involves the following steps:
1. Split the sentences into words.
2. Remove the “stop words”.
3. Apply stemming.
4. Transform the free text data to “bag of words”.
5. Label the text with “positive” or “negative” (if not labeled).
6. Build the model using the “training” and “testing” datasets; a Naïve Bayes model is commonly used.
7. Predict the sentiment of new feedback using the trained model.
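The steps above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the pipeline used in our work: the stop-word list, the `crude_stem` helper, and the training comments are all made up for the example, and a real project would normally rely on an established NLP library and keep a separate testing set.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (assumption); real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "was", "were", "and", "to", "of", "i", "it", "this", "very"}

def crude_stem(word):
    """Very rough suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def to_bag_of_words(text):
    """Steps 1-4: split into words, drop stop words, stem, count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)

def train_naive_bayes(labelled_comments):
    """Step 6: collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labelled_comments:
        label_counts[label] += 1
        word_counts[label].update(to_bag_of_words(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary

def predict(model, text):
    """Step 7: choose the label with the highest log-probability (Laplace smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    bag = to_bag_of_words(text)
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word, n in bag.items():
            score += n * math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Step 5: artificial, hand-labelled comments.
TRAINING = [
    ("The lectures were engaging and helpful", "positive"),
    ("Great tutorials and helpful tutors", "positive"),
    ("The assignments were confusing and stressful", "negative"),
    ("Feedback was slow and the marking was confusing", "negative"),
]

model = train_naive_bayes(TRAINING)
print(predict(model, "helpful and engaging tutor"))
```

Even in this toy version, step 5 is where the effort goes: every training comment has to be labelled by hand before the model can learn anything.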
Using machine learning for NLP has both pros and cons. The most important drawback is that most feedback carries no “positive” or “negative” label, so it must be labelled before any model can be built, and this usually takes a lot of work.
Knowledge Engineering
Knowledge Engineering, on the other hand, is rule-based: the rules are developed by experienced language engineers. SPSS Text Analytics and SAS Text Miner are examples. In our work, we chose SPSS and SAS to build our NLP framework. The results show that Knowledge Engineering is quite powerful for analyzing student feedback data. For confidentiality reasons, the following examples only show how Knowledge Engineering works; all the data used in this article are artificial.
1. Sentiment Analysis
The feedback can be classified as “positive”, “negative”, “mixed” or “neutral”, based on rules that are either predefined or defined by users; see the following picture.
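To give a feel for what rule-based classification means, here is a minimal Python sketch. It does not reproduce the rules shipped with SPSS or SAS; the word lists and the single negation rule are assumptions invented for the example.

```python
import re

# Hypothetical mini-lexicons; commercial tools ship far larger rule sets.
POSITIVE_TERMS = {"good", "great", "helpful", "clear", "enjoyed", "excellent"}
NEGATIVE_TERMS = {"bad", "boring", "confusing", "slow", "unfair", "poor"}
NEGATORS = {"not", "never", "no"}

def classify_sentiment(comment):
    """Label a comment as positive, negative, mixed or neutral."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    positive = negative = 0
    for i, token in enumerate(tokens):
        # Rule: a negator directly before an opinion word flips its polarity.
        negated = i > 0 and tokens[i - 1] in NEGATORS
        if token in POSITIVE_TERMS:
            if negated:
                negative += 1
            else:
                positive += 1
        elif token in NEGATIVE_TERMS:
            if negated:
                positive += 1
            else:
                negative += 1
    if positive and negative:
        return "mixed"
    if positive:
        return "positive"
    if negative:
        return "negative"
    return "neutral"

for text in [
    "The lecturer was great and very helpful",
    "The tutorials were not helpful",
    "Great lectures but the marking was slow",
    "The course runs on Mondays",
]:
    print(classify_sentiment(text), "-", text)
```

Adding or editing entries in the word lists plays the role of the user-defined rules mentioned above.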
2. Extract top concepts
We can extract the concepts that appear most often in the feedback; see the following picture.
The concepts extracted from real student feedback will be different; the top concepts may include “assignment” or “international flight” (during the pandemic).
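A rough way to approximate concept extraction is to count single words and two-word phrases across comments. The Python sketch below assumes a small made-up stop-word list and artificial comments; the commercial tools use linguistic resources that are far richer than simple counting.

```python
import re
from collections import Counter

# Illustrative stop-word list (assumption), just large enough for the sample data.
STOP_WORDS = {"the", "was", "were", "too", "i", "could", "not", "an", "my", "a"}

def candidate_terms(comment):
    """Return the distinct words and two-word phrases in one comment."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    terms = {t for t in tokens if t not in STOP_WORDS}
    for first, second in zip(tokens, tokens[1:]):
        if first not in STOP_WORDS and second not in STOP_WORDS:
            terms.add(f"{first} {second}")
    return terms

def top_concepts(comments, n=5):
    """Count in how many comments each candidate term appears."""
    counts = Counter()
    for comment in comments:
        counts.update(candidate_terms(comment))
    return counts.most_common(n)

# Artificial comments.
COMMENTS = [
    "The assignment deadline was too tight",
    "I could not book an international flight home",
    "The assignment instructions were unclear",
    "International flight prices delayed my arrival",
    "The assignment was marked quickly",
]

for term, count in top_concepts(COMMENTS):
    print(count, term)
```

Counting each term once per comment keeps one long comment from dominating the ranking.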
3. Mining for Text Links
We can detect the connections between concepts using this feature; see the following picture.
We can find similar connections in student feedback: students “like” or “dislike” certain things, such as a class, a tutorial, or even a certain person.
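The idea of a link between an opinion and a concept can be illustrated with a simple pattern: an opinion verb followed, within a few words, by a known concept. The verb list, the concept list, and the window size in this Python sketch are assumptions chosen for the example, not the patterns used by the software.

```python
import re

# Hypothetical opinion verbs mapped to the relation they express.
OPINION_VERBS = {
    "like": "like", "liked": "like", "love": "like", "loved": "like",
    "enjoy": "like", "enjoyed": "like",
    "dislike": "dislike", "disliked": "dislike", "hate": "dislike", "hated": "dislike",
}

# Hypothetical surface forms mapped to the concept they refer to.
CONCEPT_FORMS = {
    "class": "class", "classes": "class",
    "tutorial": "tutorial", "tutorials": "tutorial",
    "lecture": "lecture", "lectures": "lecture",
    "lecturer": "lecturer", "tutor": "tutor",
}

def extract_links(comment, window=3):
    """Return (relation, concept) pairs found in one comment."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    links = []
    for i, token in enumerate(tokens):
        relation = OPINION_VERBS.get(token)
        if relation is None:
            continue
        # Look at the next few words for the first known concept.
        for following in tokens[i + 1 : i + 1 + window]:
            if following in CONCEPT_FORMS:
                links.append((relation, CONCEPT_FORMS[following]))
                break
    return links

print(extract_links("I really liked the weekly tutorials"))
print(extract_links("I disliked the lecture but liked the tutorial"))
```

This sketch ignores negation ("did not like"), which a production rule set would need to handle.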
4. Extract Topics
The software supports automatic topic detection as well as user-defined topics. Comments are grouped into topics according to these definitions. The framework looks something like this:
The topics extracted from the comments may look something like this:
If we define our own topics, such as “online course” or “teaching quality”, the comments related to these topics will be grouped together.
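User-defined topics can be thought of as named keyword sets. The Python sketch below groups artificial comments under the two topics named above; the keywords themselves are assumptions made up for the example, and a comment may fall under more than one topic.

```python
import re
from collections import defaultdict

# Hypothetical topic definitions: topic name -> trigger keywords.
TOPIC_KEYWORDS = {
    "online course": {"online", "zoom", "recording", "recordings"},
    "teaching quality": {"lecturer", "teaching", "explained", "tutor"},
}

def group_by_topic(comments):
    """Return {topic: [comments]}; unmatched comments go under 'other'."""
    grouped = defaultdict(list)
    for comment in comments:
        tokens = set(re.findall(r"[a-z]+", comment.lower()))
        matched = [topic for topic, keywords in TOPIC_KEYWORDS.items() if tokens & keywords]
        for topic in matched or ["other"]:
            grouped[topic].append(comment)
    return dict(grouped)

# Artificial comments.
COMMENTS = [
    "The online recordings were easy to follow",
    "The lecturer explained every concept clearly",
    "Zoom sessions made it hard to ask the tutor questions",
    "The library closes too early",
]

for topic, items in group_by_topic(COMMENTS).items():
    print(topic, len(items))
```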
After classifying the feedback into topics and sentiment categories using this framework, we have converted unstructured data into structured data, and we can then analyze the new dataset. We can analyze how topics differ between faculties, or between postgraduates and undergraduates. We can build reports in Power BI or Tableau that show how sentiment differs between male and female students. Furthermore, we can perform correlation analysis between attributes and build predictive models. Through this work, we extracted many interesting insights from the feedback, which allow universities to respond accordingly.
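Once every comment carries a topic and a sentiment, comparisons between groups become simple counting. The Python sketch below computes the share of each sentiment per faculty; the records and field names are artificial and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Artificial records: one row per comment after topic and sentiment tagging.
RECORDS = [
    {"faculty": "Science", "level": "undergraduate", "topic": "online course", "sentiment": "negative"},
    {"faculty": "Science", "level": "postgraduate", "topic": "online course", "sentiment": "negative"},
    {"faculty": "Science", "level": "undergraduate", "topic": "teaching quality", "sentiment": "positive"},
    {"faculty": "Science", "level": "undergraduate", "topic": "online course", "sentiment": "negative"},
    {"faculty": "Arts", "level": "undergraduate", "topic": "teaching quality", "sentiment": "positive"},
    {"faculty": "Arts", "level": "postgraduate", "topic": "online course", "sentiment": "positive"},
]

def sentiment_share(records, group_field):
    """Return {group: {sentiment: share of that group's comments}}."""
    counts = defaultdict(Counter)
    for row in records:
        counts[row[group_field]][row["sentiment"]] += 1
    shares = {}
    for group, sentiment_counts in counts.items():
        total = sum(sentiment_counts.values())
        shares[group] = {s: n / total for s, n in sentiment_counts.items()}
    return shares

for faculty, shares in sentiment_share(RECORDS, "faculty").items():
    print(faculty, shares)
```

Passing `"level"` or `"topic"` as `group_field` gives the same breakdown for postgraduates versus undergraduates or per topic, and the resulting table is the kind of structured data that can be loaded into a reporting tool.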
The real framework is much more complicated; this article is only a demo of how we can use NLP to better understand our students.