Generating and Language Translating Articles using AI: A Step-by-Step Guide
Generating articles using AI has never been easier, thanks to the power of pre-trained models like GPT-2 and mBART. In this post, we'll walk you through the step-by-step process of how to use these models to generate an article on a given topic, and then translate it into any language of your choice.
GPT-2 (Generative Pre-trained Transformer 2) is a pre-trained transformer-based model developed by OpenAI. It is trained on a large dataset of web pages and performs well on a range of natural language processing tasks such as summarization, question answering, and text generation, often without task-specific fine-tuning.
In this guide, we are using the GPT-2 model to generate an article on the topic 'Benefits of Sleeping Early'. Because the model is pre-trained on a large dataset of web pages, it is able to generate coherent and meaningful text on a given topic.
mBART (Multilingual BART) is a pre-trained sequence-to-sequence transformer developed by Facebook AI. It is a denoising autoencoder trained on monolingual corpora from many different languages, and the mBART-50 variants are fine-tuned on multilingual machine translation tasks. This allows it to perform well on tasks that involve translating between multiple languages.
In this guide, we are using the mBART-50 one-to-many model to translate the generated article from English into another language of your choice. The model is pre-trained on monolingual data from many languages and fine-tuned for translation, so it can translate the text accurately and fluently.
Generating an article using AI and translating it into any language (GPT-2 and mBART)
Step 1: Install the transformers library (and PyTorch)
pip install transformers torch
Step 2: Import PyTorch and the GPT-2 classes
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
Step 3: Load the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained("gpt2-large")
model = GPT2LMHeadModel.from_pretrained("gpt2-large",
    pad_token_id=tokenizer.eos_token_id)
Step 4: Set the topic (pick any)
topic = 'Benefits of Sleeping Early'
Step 5: Encode the input i.e. topic
input_ids = tokenizer.encode(topic, return_tensors='pt')
Step 6: Generate Blog
# Generate Blog
# max_length - maximum number of tokens in the generated article
# num_beams - number of candidate word sequences (beams) explored in parallel during beam search
# no_repeat_ngram_size - length of word sequences that may not repeat; e.g. with a value of 4,
# no 4-word phrase can appear twice in the output
output = model.generate(input_ids, max_length=200, num_beams=30,
    no_repeat_ngram_size=4, early_stopping=True)
Step 7: Output
print(tokenizer.decode(output[0], skip_special_tokens=True))
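The idea behind no_repeat_ngram_size can be illustrated with a small stand-alone sketch (plain Python; would_repeat_ngram is a hypothetical helper, not part of transformers): before the decoder appends a token, it checks whether the resulting n-gram has already appeared in the sequence.

```python
def would_repeat_ngram(tokens, candidate, n):
    """Return True if appending `candidate` would create an n-gram already in `tokens`."""
    if len(tokens) < n - 1:
        return False
    # The n-gram that appending `candidate` would create
    new_ngram = tuple(tokens[-(n - 1):]) + (candidate,)
    # All n-grams already present in the sequence
    existing = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return new_ngram in existing

seq = ["benefits", "of", "sleeping", "early", "improves", "benefits", "of"]
print(would_repeat_ngram(seq, "sleeping", 3))  # True: "benefits of sleeping" already occurred
print(would_repeat_ngram(seq, "health", 3))    # False: "benefits of health" is new
```

During beam search, any candidate token that would trigger such a repeat is simply excluded from that beam's next step.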
Step 8: Save the output in a variable (‘article_en’ in this case)
article_en = tokenizer.decode(output[0], skip_special_tokens=True)
Step 9: Import the mBART model and tokenizer
from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-50-one-to-many-mmt")
tokenizer = MBart50TokenizerFast.from_pretrained("facebook/mbart-large-50-one-to-many-mmt", src_lang="en_XX")
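mBART-50 identifies languages with locale-style codes such as "en_XX" (English), "pl_PL" (Polish), and "hi_IN" (Hindi); the full mapping is available as tokenizer.lang_code_to_id once the tokenizer is loaded. As a minimal sketch (the dict below is an illustrative subset and target_code a hypothetical helper):

```python
# Illustrative subset of mBART-50 language codes; the authoritative list
# lives in tokenizer.lang_code_to_id after loading the tokenizer.
MBART50_CODES = {
    "English": "en_XX",
    "Polish": "pl_PL",
    "Hindi": "hi_IN",
    "French": "fr_XX",
}

def target_code(language):
    """Look up the mBART-50 code for a language name; raise if unlisted."""
    try:
        return MBART50_CODES[language]
    except KeyError:
        raise ValueError(f"No mBART-50 code listed for {language!r}")

print(target_code("Polish"))  # pl_PL
```

The code returned here is what gets passed (via lang_code_to_id) as forced_bos_token_id in the steps below, telling the one-to-many model which language to produce.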
Step 10: Tokenize the input article
model_inputs = tokenizer(article_en, return_tensors="pt")
Step 11: Generate the translated tokens (English to Polish)
# translate from English to Polish
generated_tokens = model.generate(
**model_inputs,
forced_bos_token_id=tokenizer.lang_code_to_id["pl_PL"]
)
Step 12: Decode the translation into Polish
translation = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(translation)
Translated into Hindi
# translate from English to Hindi
generated_tokens = model.generate(
**model_inputs,
forced_bos_token_id=tokenizer.lang_code_to_id["hi_IN"]
)
translation = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
print(translation)