Document Retrieval in Brief

A search engine accepts queries from its users, processes them, and returns ranked lists of results, for example a list of related documents. To process a query it uses the inverted index together with collection statistics associated with the index, such as the number of documents containing each term and the length of every document. When many simultaneous users must be supported, query throughput, measured in queries per second, becomes a crucial factor in system performance.

In information retrieval, an index term is a term that captures the topic of a document. Index terms serve as the document's keywords and can be words, numbers, or phrases.

A document retrieval system consists of a database of documents and a clustering algorithm used to build a full-text index. There are two main classes of indexing and clustering schemes for such systems. The first is form-based (or word-based) indexing, which relies on attributes such as document type, author, and year of printing. The second is content-based indexing, which exploits semantic connections between the parts of a document in addition to semantic connections between queries and documents. The weight given to the topics in a document determines the class to which the document is assigned.

Another idea is to extract the semantic content of documents in order to link related documents. In a first step, all chains of semantically associated phrases are detected, primarily on the basis of WordNet’s synsets. In a second step, this information is compared for each pair of documents.

As the amount of data grows and big data emerges, analyzing it requires different technology than before. Scalable algorithms address the analysis of big data characterized by high volume and velocity.
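To make the role of the inverted index more concrete, here is a minimal Python sketch of how a query might be ranked using the number of documents containing each term and the length of every document. The function names (`tokenize`, `build_index`, `search`), the sample documents, and the length-normalized TF-IDF score are all illustrative assumptions; the article does not name a specific ranking function, and real search engines use more elaborate tokenization and scoring.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index and record the length of every document."""
    postings = defaultdict(dict)  # term -> {doc_id: term frequency}
    doc_lengths = {}              # doc_id -> number of tokens
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        doc_lengths[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            postings[term][doc_id] = tf
    return postings, doc_lengths


def search(query, postings, doc_lengths):
    """Rank documents with a length-normalized TF-IDF score (assumed scoring)."""
    n_docs = len(doc_lengths)
    scores = defaultdict(float)
    for term in tokenize(query):
        matches = postings.get(term, {})
        if not matches:
            continue  # term occurs in no document
        # Terms found in fewer documents receive a higher weight.
        idf = math.log(1 + n_docs / len(matches))
        for doc_id, tf in matches.items():
            scores[doc_id] += (tf / doc_lengths[doc_id]) * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample collection, for illustration only.
docs = {
    "d1": "inverted index maps each term to documents",
    "d2": "ranking documents by query terms",
    "d3": "big data needs scalable algorithms",
}
postings, doc_lengths = build_index(docs)
results = search("inverted index", postings, doc_lengths)
```

In this sketch the query only touches the postings of its own terms, which is the main reason an inverted index helps query throughput: documents that share no term with the query are never examined.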


#WordNet #Document Retrieval #Document_indexing

More articles by Fatma Gadelrab
