Text Summarization – Maturity Model

Automatic text summarization is the task of producing a concise and fluent summary while preserving key information content and overall meaning. Numerous approaches have been developed for it and applied widely in various domains. For example, search engines generate snippets as previews of documents. Other examples include news websites, which produce condensed descriptions of news topics, usually as headlines, to facilitate browsing, and knowledge extraction approaches.

Automatic text summarization is very challenging. When we as humans summarize a piece of text, we usually read it entirely to develop our understanding, then write a summary that highlights its main points and contexts, adding something relevant where needed to enrich the summary. Computers lack this human knowledge and language capability, which makes automatic text summarization a difficult and non-trivial task.

Here is our effort to identify the stages through which automatic summarization may approach human capability, or at least come close to what humans expect of a text summary.

Keyword/Structure Based

Bayesian Topic Models, Extract & Abstract to Summarization

An extractive summarization technique produces a text summary by selecting a subset of the sentences in the original text or paragraph. These extracts consist of key sentences, i.e. the sentences in which the extracted key phrases appear most often. The input can be a single section with multiple paragraphs.

Key steps involved

1. Extract the key phrases, domain terms, part-of-speech (POS) tags and TF-IDF weights.

2. Primary sentence selection: identify candidate sentences from the input section, i.e. build an intermediate representation of the input text that expresses its main aspects.

3. Score the sentences based on this representation, in line with the extracted key phrases and POS tags, possibly using topic modelling techniques.

4. Refine the selected sentences and construct the summary from the sentences filtered in steps 2 and 3.

5. Limit the number of sentences in the summary to, say, 1/10 of the input or at least 5-10 sentences, whichever is suitable.
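
As a rough illustration of the five steps above, here is a minimal Python sketch. It rests on loud assumptions: plain term frequency with a tiny stop-word list stands in for key-phrase extraction, POS tagging and TF-IDF, and the sentence splitter is naive. The names `split_sentences`, `tokenize`, `extract_key_terms` and `summarize`, and the parameters `ratio`, `min_sentences` and `top_n`, are hypothetical and not part of any library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use fuller lists).
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
              "it", "of", "on", "or", "the", "to", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    words = re.findall(r"[a-z0-9]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def extract_key_terms(sentences, top_n=5):
    """Step 1 (simplified): the most frequent terms act as key phrases."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    return [term for term, _ in counts.most_common(top_n)]


def summarize(text, ratio=0.1, min_sentences=5, top_n=5):
    sentences = split_sentences(text)
    key_terms = set(extract_key_terms(sentences, top_n))
    # Steps 2-3: score each sentence by how many key-term tokens it contains.
    scores = [sum(1 for w in tokenize(s) if w in key_terms) for s in sentences]
    # Step 5: keep about `ratio` of the sentences, but at least `min_sentences`.
    limit = max(min_sentences, round(len(sentences) * ratio))
    # Step 4: keep the best-scoring sentences, then restore document order.
    best = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))[:limit]
    return [sentences[i] for i in sorted(best)]
```

For example, `summarize(text, min_sentences=2)` on a three-sentence input returns the two sentences that share the most frequent terms, in their original order. A fuller implementation would swap the frequency count for TF-IDF weights or a topic model, as the steps suggest.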

Lexical Based - Context and Relationships

A context- and relationship-aided summarization technique produces the text summary by identifying the context and subject (topic), identifying the relationships between sentences within the sections, and considering the key terms highlighted in the summary.

Key processes involved

1. All the processes mentioned in the previous stage.

2. Identify the context within each paragraph and between sentences.

3. Identify the relationships between sentences for the identified topic(s), separating subjects from context.

4. Consider the highlighted key terms separately from the rest of the input text.

5. Summarize, either filtering out or retaining sentences that are out of context yet relevant.

6. Rank the summaries.
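
One simple way to approximate the sentence-relationship step is to treat word overlap between sentences as a proxy for how strongly they are related, and to favour sentences that relate to many others. The sketch below is only an illustration under that assumption: Jaccard overlap is a crude stand-in for real context and relationship analysis, and the function names `tokens`, `jaccard`, `relationship_scores` and `rank_by_relationship` are hypothetical.

```python
import re


def tokens(sentence):
    """Set of lowercase word tokens in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def relationship_scores(sentences):
    """Score each sentence by its total overlap with every other sentence."""
    token_sets = [tokens(s) for s in sentences]
    return [
        sum(jaccard(a, b) for j, b in enumerate(token_sets) if j != i)
        for i, a in enumerate(token_sets)
    ]


def rank_by_relationship(sentences):
    """Most connected sentences first; ties keep document order."""
    scores = relationship_scores(sentences)
    order = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return [sentences[i] for i in order]
```

A sentence that shares no words with the rest of the section scores zero and ends up last, which makes it a candidate for the "out of context" filtering in step 5. Whether it is dropped or kept as "yet relevant" would need a separate judgement that this sketch does not attempt.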

Semantic/Concept Based - Multiple Sections, Context and Relationship Summarization

This stage considers multiple sections, from the same document or from other documents, to arrive at the summary.

The key process builds on the previous stages. On top of these, it attempts to consider relevant sections from reference documents or free-flowing text, and to rank the short summaries where available. For instance, a summary of the history of black holes could draw on references describing the significant contributions made by scientists.

Federated - Emphasized, Context Across Heterogeneous Sections

This stage identifies the top topics using machine learning techniques and extends the context across heterogeneous sections of the input text, document or references.

The key process includes all the prerequisites of the previous stages and adds the following:

1. Emphasize the topics, contexts and named entities (persons, locations, events, etc.).

2. Extend the context from previously ignored sections or documents to enrich the positive or negative aspects. For instance, in the popular anecdote usually told about Thomas Edison, he was sent home with a note saying he was not fit to study at his school, yet he went on to be regarded as a genius; bringing in this context alongside his great inventions highlights the power of his mother's belief.

3. Rank the set of summaries.
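
The article does not say how a set of summaries should be ranked. As one possible reading, the sketch below orders candidate summaries by how many of the emphasized terms they cover, preferring the shorter summary on ties. This scoring rule is an assumption made for illustration, and `term_coverage` and `rank_summaries` are hypothetical names.

```python
import re


def term_coverage(summary, key_terms):
    """Fraction of (lowercase, single-word) key terms present in the summary."""
    if not key_terms:
        return 0.0
    words = set(re.findall(r"[a-z0-9]+", summary.lower()))
    return sum(1 for term in key_terms if term in words) / len(key_terms)


def rank_summaries(candidates, key_terms):
    """Highest coverage first; on ties, the shorter summary wins."""
    terms = {t.lower() for t in key_terms}
    return sorted(candidates, key=lambda s: (-term_coverage(s, terms), len(s)))
```

Usage: `rank_summaries(candidates, ["relativity", "observed", "astronomers"])` puts the candidate that mentions all three terms first. Multi-word key phrases and synonyms are not handled here and would need a richer matching step.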

Inference and Reasoning Based - Close to Human Interpretation, Long Text, Multiple Contexts & Relationships Extracted with References

This stage includes all the prerequisites of the previous stages and performs the following additional processes:

1. Consider multiple relevant contexts when selecting sentences, key phrases, words or domain terms while preparing the corpus from which the summary is generated.

2. Generate natural language from the set of refined sentences, keeping context and relationships in mind.

3. Remove unrelated, ambiguous and out-of-context sentences without losing the gist.

4. Refine the summary so that its interpretation is unambiguous.

5. Rank the set of summaries.
