Why Deep Learning Is the State of the Art for Natural Language Processing
Natural language processing (NLP) is the science of how computers handle natural language received as input and produce natural language as output. It is a fast-growing field with notable applications such as OpenAI's ChatGPT and Google's Bard, chatbots whose answers to questions analysts say will change how modern businesses operate. At the heart of these systems are deep learning models, a family of machine learning algorithms inspired by the human brain.
Unlike earlier machine learning approaches to natural language processing, such as binary classifiers and support vector machines, deep learning models do not just match inputs to outputs: they learn useful representations of the data used for prediction. This is a key point, because the vast scope of natural language makes any attempt to hand-write rules for how computers ought to represent and interpret language a difficult one.
Another challenge is that "language is symbolic and discrete." Words such as "hamburger" or "pizza" carry nothing within the words themselves that tells us how they are related to each other. Most neural networks therefore include an embedding layer, which performs a lookup to map a word such as "pizza" to a continuous vector. Once words are represented as vectors, all vector operations become available, such as addition, subtraction, and distance calculations, which are powerful tools for answering the question of whether "hamburger" and "pizza" are related.
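As a minimal sketch of this idea, the toy table below maps words to hand-set vectors (in a real model these vectors are learned during training) and uses cosine similarity, one common distance-style operation, to compare them:

```python
import numpy as np

# Toy embedding table. The vectors here are made up for illustration;
# a trained model would learn them from data.
embeddings = {
    "pizza":     np.array([0.9, 0.8, 0.1]),
    "hamburger": np.array([0.8, 0.9, 0.2]),
    "galaxy":    np.array([0.1, 0.0, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1.0 means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The two food words point in nearly the same direction,
# while "galaxy" points somewhere else entirely.
food_sim = cosine_similarity(embeddings["pizza"], embeddings["hamburger"])
far_sim = cosine_similarity(embeddings["pizza"], embeddings["galaxy"])
```

With the toy vectors above, `food_sim` comes out much larger than `far_sim`, which is exactly the kind of relatedness signal the raw symbols "pizza" and "hamburger" could never give us.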
Another advantage of neural networks for language processing is that they come in several architectures that can be combined in novel ways. A feed-forward neural network, for example, can learn to combine input features into a representation that can easily be used to solve classification problems and, more recently, language modeling problems. Convolutional and recurrent neural networks are "specialized architectures": the former is useful for extracting local patterns, while the latter was long the go-to model for sequential inputs.
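To make the feed-forward case concrete, here is a minimal forward pass, assuming made-up layer sizes and randomly initialized weights (an untrained network, shown only to illustrate how layers combine inputs into class probabilities):

```python
import numpy as np

def relu(x):
    """Nonlinearity that lets layers learn non-trivial feature combinations."""
    return np.maximum(0.0, x)

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
# Hypothetical sizes: a 3-dimensional input vector, 4 hidden units, 2 classes.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def forward(x):
    h = relu(W1 @ x + b1)         # hidden layer: combine input features
    return softmax(W2 @ h + b2)   # output layer: probabilities over classes

probs = forward(np.array([0.9, 0.8, 0.1]))
```

Stacking more such layers, or swapping in convolutional or recurrent layers, is what gives the architecture family its flexibility.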
Deep learning models are suitable for classification, sentence generation, summarization, and more because they can, in theory, model whatever relationship exists in the input data. The model is trained with a gradient descent algorithm and a loss function, which together adjust its parameters (often referred to as weights and biases) until it correctly matches inputs to the desired outputs. There are several loss functions, and the right choice depends on the type of problem at hand. In natural language processing, commonly used loss functions include hinge loss (binary and multi-class), log loss, binary cross-entropy loss, categorical cross-entropy loss, and ranking loss.
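The training loop can be sketched on the simplest possible model. The example below fits a single weight and bias by gradient descent on a binary cross-entropy loss, using a toy dataset invented for illustration; the gradients are derived by hand for this one-parameter model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, p):
    """Average binary cross-entropy between labels y and predicted probabilities p."""
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

# Toy data: the label is 1 when the feature is positive.
X = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b, lr = 0.0, 0.0, 0.5
losses = []
for _ in range(100):
    p = sigmoid(w * X + b)                 # forward pass: predict probabilities
    losses.append(binary_cross_entropy(y, p))
    grad_w = ((p - y) * X).mean()          # gradient of the loss w.r.t. w
    grad_b = (p - y).mean()                # gradient of the loss w.r.t. b
    w -= lr * grad_w                       # gradient descent step
    b -= lr * grad_b
```

The loss falls steadily across iterations; a deep network does the same thing at scale, with millions of weights and the gradients computed automatically by backpropagation.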
I hope I have given you enough reasons to learn deep neural networks if you are interested in natural language processing.
References:
Yoav Goldberg, Neural Network Methods for Natural Language Processing.