Getting Started with Text Summarization

Anirban Saha

Published Jun 26, 2020

What is Text Summarization?

“Text summarization is the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks).”

— Page 1, Advances in Automatic Text Summarization, 1999.

Across the industry, we often have to read a 50-page document only to realise that it contains just 5 takeaway points of value. Text summarization is the task of condensing such a document into those 5 takeaway points while preserving its key information and the crux of the content.

In this age of information, when content is being generated every second, manual text summarization is infeasible. Advances in Artificial Intelligence, Deep Learning and Natural Language Processing have given us tools that perform this task for us.

“Automatic text summarization is the task of producing a concise and fluent summary while preserving key information content and overall meaning”

-Text Summarization Techniques: A Brief Survey, 2017

Benefits of Automatic Text Summarization:

1. Summaries reduce reading time and present actionable insights in a concise form.

2. When selecting articles for further analysis, summarization speeds up the process by cutting redundant material.

3. Automatic text summarization algorithms can be less biased than human summarizers.

4. Using Automatic Text Summarization enables commercial abstract services to increase the number of text documents they can process.

Types of Text Summarization

Based on the type of output, there are two main approaches:

Extractive Summarization

This approach selects the sentences from the source text that best summarize the document and arranges them to form the summary. Sentences are selected using scoring functions, and different scoring functions give rise to different extractive summarizers, such as:

1. NLTK Summarizer: Scores sentences based on word frequencies.

2. Gensim Summarizer: Scores sentences using the TextRank algorithm, which builds on the PageRank algorithm used by Google.

3. Summa Summarizer: An improvement on the Gensim implementation that refines how the similarity between sentences is scored.

4. Extractive-BERT summarizer: Uses BERT embeddings followed by clustering and chooses the sentences closest to the cluster centres as the summary sentences. It uses coreference resolution to resolve words in the summary that need more context.
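To make the idea of a scoring function concrete, here is a minimal sketch of the word-frequency scoring described in the first item above. It uses only the Python standard library rather than NLTK itself, and the stop-word list, the sentence splitter and the function names are illustrative assumptions, not the API of any particular library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller list).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def frequency_summary(text, num_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))
    if not freq:
        return []
    top = max(freq.values())
    # A sentence's score is the sum of the normalized frequencies of its words.
    scores = [sum(freq[w] / top for w in tokenize(s)) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


if __name__ == "__main__":
    sample = "Cats sleep a lot. Cats and dogs are pets. The weather is nice."
    print(frequency_summary(sample, num_sentences=2))
```

Because the score is a plain sum, longer sentences tend to score higher; dividing by sentence length is a common adjustment if that bias is unwanted.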

Abstractive Summarization

This approach interprets the text using advanced natural language techniques, typically involving Deep Learning, to generate novel sentences. It paraphrases the source and forms its own words and sentences to produce a more coherent summary, closer to what a human would write.

Out-of-the-box implementations include the Abstractive-BERT summarizer and Google’s T5 summarizer. Alternatively, a custom abstractive summarizer can be built by training Recurrent Neural Networks in an encoder-decoder architecture with attention layers.
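As a rough illustration of what an attention layer computes inside such an encoder-decoder model, the sketch below performs a single attention step in plain Python. It assumes dot-product scoring, which is only one of several scoring variants, and the vectors are hand-written toy values rather than the states of a trained network. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(decoder_state, encoder_states):
    """One dot-product attention step.

    Scores each encoder state by its dot product with the current decoder
    state, turns the scores into weights with softmax, and returns the
    weights together with the weighted sum of encoder states (the context).
    """
    scores = [
        sum(d * e for d, e in zip(decoder_state, enc)) for enc in encoder_states
    ]
    weights = softmax(scores)
    dim = len(decoder_state)
    context = [
        sum(w * enc[i] for w, enc in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy encoder states for a two-token input (assumed values).
    encoder_states = [[1.0, 0.0], [0.0, 1.0]]
    decoder_state = [2.0, 0.0]
    weights, context = attention_context(decoder_state, encoder_states)
    print(weights, context)
```

In a real summarizer the decoder would combine this context vector with its own state to predict the next summary word, repeating the step for every word it generates.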

Our Experience with Text Summarization

Having implemented each of these techniques, we observed that abstractive summarizers work well for documents whose crux is a conversation, or whose content reads as an elaborate dialogue. They pick up the topic of the discussion and use their natural language understanding to produce novel sentences that express the emotion of the document in a clear, concise and coherent way. However, for documents like resumes, where each statement is itself a summary of experience, extractive summarizers that use algorithms like TextRank to pick the most important sentences do a better job than abstractive summarizers.
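For readers curious about how a TextRank-style ranking picks the most important sentences, here is a minimal standard-library sketch. It is not the Gensim or Summa implementation: the word-overlap similarity, the naive sentence splitter, the damping factor of 0.85 and the fixed iteration count are simplifying assumptions, and the function names are hypothetical.

```python
import math
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap between two token sets, normalized by their log sizes."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(a & b) / denom if denom > 0 else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """PageRank-style scores over a graph whose edges are sentence similarities."""
    n = len(sentences)
    tokens = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of the score of every sentence linked to it.
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def textrank_summary(text, num_sentences=2):
    """Return the top-ranked sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


if __name__ == "__main__":
    sample = "The cat sat on the mat. The cat ate the fish. Stock prices fell sharply today."
    print(textrank_summary(sample, num_sentences=2))
```

Sentences that share vocabulary with many other sentences reinforce each other, while an isolated sentence keeps only the baseline score, which is why the ranking favours statements central to the document.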

Summary

Text summarization is slowly gaining traction within the wider field of Natural Language Processing (NLP), driven by the pace at which textual data is generated today and the ever-growing need to condense this data for downstream applications such as news digests, report generation, news summarization and headline generation.

Extractive summarizers are easier to implement and preserve the accuracy of the text, since the summary sentences are extracted directly from the source. However, paraphrasing and generalization require abstractive methods. As research in Deep Learning and NLP continues, we expect significant advances in abstractive summarization that will soon make it possible to generate grammatically correct, human-like summaries with speed, accuracy and simpler implementations.

Anirban Saha 5y

You can try it out at https://colab.research.google.com/drive/1IsB4fWe7gt2YgbUjCEiRW36wqr1AS-oM?usp=sharing

Anirban Saha 5y

https://github.com/anirbansaha96/AI-ML-Playground/blob/master/Google_T5_Abstractive_Summarizer.ipynb

Anirban Saha 5y

You can find more about these at https://github.com/anirbansaha96/AI-ML-Playground/blob/master/abstractive_bert.ipynb
