How to Draw Heatmap of Multiple Sequence Alignment in R

How to Draw Heatmap of Multiple Sequence Alignment in R

Recently I had the privilege to help a few fellow bioinformaticians in a telegram group and one question among them was how to draw a heatmap of MSA. My first response was to convert the distance to a matrix and plot a heatmap. But then it struck me. I am pretty sure someone already thought about this and have definitely made a package to do the same. And there it was. The key was to make it work. Unfortunately, i faced a few simple easy-to-miss mistakes anyone may face. That's why I am writing this today.

Ingredients :

A newly aligned fasta sequence.

R packages: Biostrings, ggmsa and pheatmap

This tutorial 😄

★ Remember it's not a regular fasta file. It should be an aligned fasta file.

This is a regular fasta file.

No alt text provided for this image

This is an aligned fasta file.

No alt text provided for this image

You can either use online MSA tools like clustalo, Muscle or go for offline tools of the same using Unipro UGENE or MEGA to align your fasta sequences.

Download the aligned sequences as fasta.

If you are using Unipro UGENE, then import the sequences, then perform MSA.

No alt text provided for this image

Then export the aligned files as FASTA.

No alt text provided for this image
No alt text provided for this image

Then use this file to create the heatmap.

Now for the R part.

library(Biostrings
library(ggmsa)

protein_sequences <- system.file("extdata", "MSA.fasta", package = "ggmsa")


aln = readAAMultipleAlignment(protein_sequences)

ggmsa(protein_sequences, start = 265, end = 300)


aln = unmasked(aln)
names(aln)[1]


ref = aln[1]

bm = sapply(1:length(aln),function(i){
  as.numeric(as.matrix(aln[i])==as.matrix(ref))
})

bm = t(bm)
rownames(bm) = names(aln)

library(pheatmap)
pheatmap(bm[nrow(bm):1,265:300],cluster_rows=FALSE,cluster_cols=FALSE))        

You can download the files used from this GitHub repository

The result :

No alt text provided for this image

★ Now, the ggmsa package sometimes shows problems importing a fasta file.

The only workaround is to paste your aligned fasta file inside the extdata of the package installation files itself. Usually, the file location in windows OS is “C:\Users\#user-name#\AppData\Local\R\win-library\4.2\ggmsa\extdata”.

Now you can directly call your fasta file and everything works just fine.

Thank you.

To view or add a comment, sign in

More articles by Ambu Vijayan

Others also viewed

Explore content categories