Semantic Based Big Data Model for Document Retrieval
Abstract
Tag-based document retrieval addresses the challenge of finding relevant text documents given a set of tags. Tag-based approaches have also attracted wide attention as a possible way to handle big content. Probabilistic topic model methods, including those based on the Dirichlet distribution and non-negative matrix factorization, are used for the tagging process. Both face several challenges in semantic tagging applications: the number of iterations required, the semantic coherence of the resulting topics, and scalability to large-scale data. In light of this, this thesis has two major goals. The first is to propose a tagging model called semantic non-negative matrix factorization (#SNNMF), which uses a knowledge-based semantic text representation to extract the term-topic matrix and the topic-document matrix semantically. The second is to propose a distributed version of the #SNNMF algorithm (named #DistSNNMF), in which the training task is split into many sub-batch tasks that are distributed across multiple worker nodes so that the whole training process is accelerated. The results showed that the proposed model (#SNNMF) is able to generate additional topics with semantic coherence, owing to its sensitivity to the disambiguation of meaning and to the additional dimensionality reduction it provides. In addition, the results for the proposed model (#DistSNNMF) showed a strong relation between the characteristics of a distributed job and its need for an accelerator (#GPU). These characteristics include the core function, the frequency of the core function, and the number of I/O files per distributed job.
#Semantic_tagging #Semantically_dimensionality_reduction #Semantic_topic_model #Distributed_topic_model #Scalability_topic_model #Machine_Learning #NMF #topic_model #Tensorflow #HPC #NLP #Natural_language_processing
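
To make the term-topic and topic-document matrices mentioned in the abstract concrete, here is a minimal sketch of plain non-negative matrix factorization with standard multiplicative updates. It is not the thesis's #SNNMF or #DistSNNMF algorithm: the semantic, knowledge-based representation and the distributed sub-batch training are omitted. The function name `nmf`, its parameters, and the toy matrix `V` are assumptions made purely for illustration.

```python
import numpy as np

def nmf(V, n_topics, n_iter=200, eps=1e-9, seed=0):
    """Plain NMF sketch (illustrative, not the thesis's SNNMF).

    Factors a non-negative term-document matrix V (terms x docs) into
    W (terms x topics) and H (topics x docs) so that V is roughly W @ H.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = V.shape
    # Random non-negative starting points for both factors.
    W = rng.random((n_terms, n_topics)) + eps
    H = rng.random((n_topics, n_docs)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical toy counts: rows are terms, columns are documents.
V = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
], dtype=float)

W, H = nmf(V, n_topics=2)          # term-topic and topic-document factors
error = np.linalg.norm(V - W @ H)  # Frobenius reconstruction error
```

In this sketch, each column of `W` weights the terms that make up one topic, and each column of `H` gives the topic mixture for one document. A distributed variant along the lines the abstract describes would presumably split the document columns of `V` into sub-batches handled by separate workers, but that design is an assumption here and is not shown.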