Spatial Transcriptomic Analysis

Spatial Transcriptomic Analysis

Background

Spatial Transcriptomics (ST) has emerged as a transformative technology in molecular biology, enabling the simultaneous interrogation of gene expression and spatial context within intact tissue sections. Unlike traditional bulk or single-cell RNA sequencing, which dissociates cells and loses critical spatial information, ST preserves the architectural organization of tissues. This allows researchers to map transcriptional activity to specific anatomical regions, providing unprecedented insights into cellular heterogeneity, tissue microenvironments, and spatially regulated biological processes such as development, disease progression, and immune responses. As a result, ST has become indispensable in fields like oncology, neuroscience, and developmental biology, where spatial organization is central to biological function.

The ability to study tissue architecture with ST reveals how gene expression varies across different regions, offering deep insights into tissue organization and function. For example, in disease research, ST helps identify spatial patterns of gene expression in conditions like cancer, uncovering tumor microenvironments, immune cell infiltration, and disease progression. In developmental biology, it tracks gene expression changes during tissue development and differentiation, while also illuminating cell-cell interactions, such as signaling pathways in the brain or immune system. By bridging the gap between molecular biology and tissue morphology, ST offers a powerful tool to explore the spatial complexity of biological systems, advancing our understanding of health and disease and guiding precision medicine through the identification of spatially resolved biomarkers and therapeutic targets.

However, the complexity of ST data—combining high-dimensional gene expression profiles with spatial coordinates—poses significant computational and analytical challenges. Researchers must navigate preprocessing raw data, correcting technical artifacts, normalizing across spatial domains, and identifying spatially variable genes or patterns. Additionally, integrating ST datasets with other omics layers or single-cell references requires robust statistical frameworks. To address these challenges, the R programming language has become a cornerstone of ST analysis due to its rich ecosystem of bioinformatics packages, flexibility in handling matrix-based data, and advanced visualization capabilities.

ST encompasses a variety of technologies that preserve spatial context while capturing gene expression data. These can be broadly classified into imaging-based, array-based, and sequencing-based methods, each generating distinct types of data. Imaging-based methods, such as single-molecule fluorescence in situ hybridization (smFISH), multiplexed error-robust fluorescence in situ hybridization (MERFISH), and sequential fluorescence in situ hybridization (seqFISH), provide subcellular resolution by labeling RNA molecules with fluorescent probes. The data from these methods typically include high-resolution RNA spot images, gene expression matrices, and spatial coordinate files mapping detected transcripts to specific locations.

Array-based approaches, such as 10x Genomics Visium and Slide-Seq, rely on spatially barcoded arrays to capture mRNA from tissue sections. These methods produce gene expression matrices linked to specific spatial spots, along with metadata files detailing spot locations, image scaling factors, and tissue histology images. In Visium data, key files include filtered_feature_bc_matrix.h5 (containing gene expression counts), tissue_positions_list.csv (spatial barcode coordinates), and high- and low-resolution tissue images in .png format. Slide-Seq improves spatial resolution by using barcoded beads instead of spots, capturing transcriptomic data at a finer scale.

Sequencing-based spatial transcriptomics methods, such as STARmap and Stereo-seq, offer near-single-cell or subcellular resolution. These technologies sequence transcripts directly within the tissue, generating highly detailed gene expression matrices, spatial coordinate files, and large-scale histological images. Technologies like GeoMx Digital Spatial Profiling (DSP) further refine spatial analysis by allowing targeted profiling of specific regions of interest within a sample.

Overall, spatial transcriptomic datasets include a combination of gene expression matrices, spatial coordinates, and tissue histology images. The choice of technology influences the resolution, number of detectable genes, and data structure, making integration with histopathology and single-cell RNA sequencing a powerful approach for understanding tissue organization and cellular heterogeneity.

This article outlines a comprehensive workflow for analysing Spatial Transcriptomics data (Synthetic and 10X Visium data) in R, leveraging tools from CRAN, Bioconductor, and specialized spatial packages. The workflow encompasses data import and quality control (e.g., SpatialExperiment), normalization (e.g., scran), spatial pattern detection (e.g., SPARK, Giotto), and clustering (e.g., Seurat). Visualization techniques using ggplot2 and spatial plotting libraries enable intuitive exploration of gene expression gradients and tissue architecture. By integrating these steps, researchers can uncover spatially resolved molecular signatures, infer cell-cell interactions, and generate hypotheses about regional biological processes.

The adoption of R for ST analysis not only enhances reproducibility but also empowers researchers to customize pipelines for diverse experimental designs. This guide aims to equip biologists and bioinformaticians with the foundational tools to harness the full potential of Spatial Transcriptomics, bridging the gap between spatial biology and computational innovation.

 Spatial Transcriptomic Analysis with 10X Visium Data

1. Install and Load Required Packages

# Load libraries
library(SpatialExperiment)
library(scran)
library(scater)
library(GEOquery)
library(Seurat)
library(SPARK)
library(ggplot2)
library(dplyr)
library(patchwork)
library(Matrix)
library(SpatialCPie)
library(STexampleData)
library(ggExtra)
library(ggspavis)
library(hdf5r)
library(png)        

2. Import Data - Data Acquisition from Databases - Run smoothly

Data link: https://www.10xgenomics.com/datasets/human-ovarian-cancer-11-mm-capture-area-ffpe-2-standard

# Define the path to the extracted data directory
data_dir <- "F:/DGE_mySITE2025/SpatialTranscriptomics_2025"

# Load the spatial transcriptomic data
seurat_obj <- Load10X_Spatial(
  data.dir = data_dir,  # Path to the directory containing the data
  filename = "CytAssist_11mm_FFPE_Human_Ovarian_Carcinoma_filtered_feature_bc_matrix.h5",  # HDF5 file with gene expression data
  assay = "Spatial",  # Name of the assay
  slice = "lowres"  # Name of the image slice
)        

3. Filtering & Normalization

3.1. Filtering

Filtering involves removing low-quality cells and genes that may not contribute meaningful information to the analysis. Common filtering steps include:

Removing cells with low total counts (likely to be empty droplets or low-quality cells). Removing genes that are expressed in very few cells (likely to be noise).

# Filter cells based on the number of detected genes and total counts
seurat_obj <- subset(seurat_obj, subset = nFeature_Spatial > 200 & nCount_Spatial > 500)

# Filter genes: Keep genes that are expressed in at least 3 cells
seurat_obj <- seurat_obj[rowSums(GetAssayData(seurat_obj, assay = "Spatial", layer = "counts") > 0) >= 3, ]

# Check the dimensions of the filtered data
dim(seurat_obj)        

3.2. Normalization

Normalization is performed to account for differences in sequencing depth and other technical variations across cells. Seurat uses a global-scaling normalization method called "LogNormalize" by default, which normalizes the feature expression measurements for each cell by the total expression, multiplies by a scale factor (10,000 by default), and log-transforms the result.

# Normalize the data
seurat_obj <- NormalizeData(seurat_obj, assay = "Spatial", normalization.method = "LogNormalize", scale.factor = 10000)

# Check the normalized data
head(GetAssayData(seurat_obj, assay = "Spatial", slot = "data")[, 1:5])        

4. Visualization of Filtered and Normalized Data

Visualizing filtered and normalized data is an important step to ensure that the quality of your data is acceptable and to gain insights into the distribution of gene expression and spatial patterns. Below, I'll provide the code for visualizing filtered and normalized data using the Seurat package.

4.1. Visualize the Distribution of Gene Counts and UMI Counts

You can use violin plots to visualize the distribution of the number of genes (nFeature_Spatial) and UMI counts (nCount_Spatial) per cell after filtering:

# Violin plots for nFeature_Spatial and nCount_Spatial
VlnPlot(seurat_obj, features = c("nFeature_Spatial", "nCount_Spatial"), pt.size = 0.1)        


Article content
Figure1: Violin plots of genes (


4.2. Visualize Spatial Distribution of Gene Counts and UMI Counts

You can overlay the total gene counts or UMI counts on the spatial coordinates to see if there are any spatial biases:

# Spatial plot for nFeature_Spatial (number of genes per spot)
SpatialFeaturePlot(seurat_obj, features = "nFeature_Spatial") + theme(legend.position = "right")

# Spatial plot for nCount_Spatial (total UMI counts per spot)
SpatialFeaturePlot(seurat_obj, features = "nCount_Spatial") + theme(legend.position = "right")        


Article content
Figure 2: Spatial plot for nFeature_Spatial (number of genes per spot)



Article content
Figure 3: Spatial plot for nCount_Spatial (total UMI counts per spot)


4.3. Visualize the Distribution of Normalized Gene Expression

After normalization, you can visualize the distribution of expression for specific genes or the overall expression levels:

# Violin plot for a specific gene (e.g., a housekeeping gene like ACTB)
VlnPlot(seurat_obj, features = "ACTB", assay = "Spatial", pt.size = 0.1)

# Violin plot for the top highly variable genes (if identified)
if (length(VariableFeatures(seurat_obj)) > 0) {
  VlnPlot(seurat_obj, features = VariableFeatures(seurat_obj)[1:3], assay = "Spatial", pt.size = 0.1)
}        


Article content
Figure 4: Violin plot for a specific gene - ACTB


4.4. Visualize Spatial Distribution of Normalized Gene Expression

You can overlay the normalized expression of specific genes on the spatial coordinates to explore their spatial patterns:

# Spatial plot for a specific gene (e.g., ACTB)
SpatialFeaturePlot(seurat_obj, features = "ACTB") + theme(legend.position = "right")

# Spatial plot for the top highly variable genes (if identified)
if (length(VariableFeatures(seurat_obj)) > 0) {
  SpatialFeaturePlot(seurat_obj, features = VariableFeatures(seurat_obj)[1:3]) + theme(legend.position = "right")
}        


Article content
Figure 5: Spatial plot for a specific gene - ACTB


5.Quality Control Metrics Visualization

5.1. Scatter Plot of nFeature_Spatial vs. nCount_Spatial

You can create a scatter plot to visualize the relationship between the number of genes (nFeature_Spatial) and UMI counts (nCount_Spatial):

# Scatter plot of nFeature_Spatial vs. nCount_Spatial
FeatureScatter(seurat_obj, feature1 = "nCount_Spatial", feature2 = "nFeature_Spatial") +
  theme(legend.position = "right")        


Article content
Figure 6: Scatter plot of nFeature_Spatial vs. nCount_Spatial


6. Spatially Variable Gene (SVG) Detection

6.1. Identify Highly Variable Genes

Identifying highly variable genes (HVGs) can help focus the analysis on genes that are likely to be biologically relevant. Seurat provides a function to select HVGs based on their expression variance.

# Find highly variable genes
seurat_obj <- FindVariableFeatures(seurat_obj, assay = "Spatial", selection.method = "vst", nfeatures = 2000)

# Plot the variable features
VariableFeaturePlot(seurat_obj)        
Article content
Figure 7: Plot the variable features

6.2. Heatmap of Highly Variable Genes

If you have identified highly variable genes (HVGs), you can visualize their expression across cells using a heatmap:

seurat_obj <- ScaleData(seurat_obj, assay = "Spatial", features = rownames(seurat_obj))
# Heatmap of highly variable genes
DoHeatmap(seurat_obj, features = VariableFeatures(seurat_obj)[1:20], assay = "Spatial") + NoLegend()
# Save a spatial plot
spatial_plot <- SpatialFeaturePlot(seurat_obj, features = "ACTB") + theme(legend.position = "right")
ggsave("spatial_plot_ACTB.png", plot = spatial_plot, width = 8, height = 6)

# Save a violin plot
violin_plot <- VlnPlot(seurat_obj, features = "nFeature_Spatial", pt.size = 0.1)
ggsave("violin_plot_nFeature.png", plot = violin_plot, width = 8, height = 6)        
Article content
Figure 8: Heatmap of highly variable genes

7. Dimensionality Reduction & Clustering

7.1. Scale the Data

Scaling the data ensures that all genes have equal weight during dimensionality reduction. This step also regresses out unwanted sources of variation (e.g., technical noise or batch effects):

seurat_obj <- ScaleData(seurat_obj, assay = "Spatial", features = rownames(seurat_obj))        

7.2.Dimensionality Reduction (PCA)

Principal Component Analysis (PCA) is a common method for reducing the dimensionality of the data. It projects the data into a lower-dimensional space while preserving the most important variation:

seurat_obj <- RunPCA(seurat_obj, assay = "Spatial", features = VariableFeatures(seurat_obj))

# Visualize PCA results
DimPlot(seurat_obj, reduction = "pca")

VizDimLoadings(seurat_obj, dims = 1:2, reduction = "pca")        


Article content
Figure 9: PCA results



Article content
Figure 10: PCA results


7.3. Determine Significant PCs

To decide how many principal components (PCs) to use for downstream analysis, you can use an elbow plot or a more quantitative approach:

# Elbow plot
ElbowPlot(seurat_obj, ndims = 50)        


Article content
Figure 11: Elbow plot

Choose the number of PCs where the curve starts to flatten (the "elbow"). For example, if the elbow is around PC10, you might use 10 PCs for clustering.

7.4. Clustering (Louvain Algorithm)

Clustering groups cells with similar expression profiles. Seurat uses the Louvain algorithm for clustering:

# Find neighbors (k-nearest neighbors graph)
seurat_obj <- FindNeighbors(seurat_obj, reduction = "pca", dims = 1:10)  # Use the number of PCs determined earlier

# Perform clustering
seurat_obj <- FindClusters(seurat_obj, resolution = 0.5)  # Adjust resolution for more or fewer clusters

# View cluster assignments
head(seurat_obj@meta.data$seurat_clusters)        

7.5.Visualization (UMAP/t-SNE)

# Run UMAP
seurat_obj <- RunUMAP(seurat_obj, reduction = "pca", dims = 1:10)

# Plot UMAP
DimPlot(seurat_obj, reduction = "umap", label = TRUE)        


Article content
Figure 12: Plot UMAP

Alternatively, you can use t-SNE:

# Run t-SNE
seurat_obj <- RunTSNE(seurat_obj, reduction = "pca", dims = 1:10)

# Plot t-SNE
DimPlot(seurat_obj, reduction = "tsne", label = TRUE)        
Article content
Figure 13: Plot t-SNE

7.6. Spatial Visualization of Clusters

To visualize the clusters in the spatial context, use SpatialDimPlot:

SpatialDimPlot(seurat_obj, label = TRUE, label.size = 3)
saveRDS(seurat_obj, file = "seurat_obj_processed.rds")        
Article content
Figure 14: Clusters in the spatial context

8. Differential Expression Analysis

Differential expression analysis helps identify genes that are significantly upregulated or downregulated in each cluster compared to all other clusters. These genes are called marker genes and can provide insights into the biological identity of each cluster.

# Find marker genes for each cluster
markers <- FindAllMarkers(seurat_obj, assay = "Spatial", only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)

# View the top 5 marker genes for each cluster
top_markers <- markers %>%
  group_by(cluster) %>%
  slice_max(n = 5, order_by = avg_log2FC)

print(top_markers)        

8.1.Visualize Marker Genes

You can visualize the expression of top marker genes in the clusters using a heatmap:

# Heatmap of top marker genes
DoHeatmap(seurat_obj, features = top_markers$gene, assay = "Spatial") + NoLegend()        
Article content
Figure 15: Heatmap of top marker genes


Alternatively, you can visualize the expression of specific marker genes on the UMAP or spatial plot:

# UMAP plot for a specific marker gene
FeaturePlot(seurat_obj, features = top_markers$gene[1], reduction = "umap")

# Spatial plot for a specific marker gene
SpatialFeaturePlot(seurat_obj, features = top_markers$gene[1])        


Article content
Figure 16: UMAP plot for a specific marker gene




Article content
Figure 17: Spatial plot for a specific marker gene

9. Spatial Pattern Detection

Spatial pattern detection helps identify genes that exhibit spatial trends or gradients across the tissue. This can reveal genes associated with specific anatomical regions or functional zones.

# Extract counts and spatial coordinates
counts <- as.matrix(GetAssayData(seurat_obj, assay = "Spatial", layer = "counts"))
coordinates <- GetTissueCoordinates(seurat_obj)

# Create a SPARK object
spark_obj <- CreateSPARKObject(counts = counts, location = coordinates[, 1:2])

# Perform SPARK analysis to detect spatially variable genes
spark_obj <- spark.vc(spark_obj, 
                      covariates = NULL, 
                      lib_size = colSums(counts), 
                      num_core = 8)  # Adjust num_core based on your system

spark_obj <- spark.test(spark_obj, check_positive = TRUE, verbose = TRUE)        

10. Identify Spatially Variable Genes

Extract the top spatially variable genes (SVGs) from the SPARK results:

# Get the top spatially variable genes
svgs <- spark_obj@res_vc[order(spark_obj@res_vc$adjusted_pvalue), ]
top_svgs <- head(svgs, 10)  # Adjust the number of genes as needed

print(top_svgs)

# Spatial plot for the top spatially variable gene
SpatialFeaturePlot(seurat_obj, features = rownames(top_svgs)[1])

# Save marker genes
write.csv(markers, file = "marker_genes.csv")

# Save spatially variable genes
write.csv(svgs, file = "spatially_variable_genes.csv")        

11. Cell-cell intreaction

# Install and load CellChat
if (!require("CellChat")) {
  devtools::install_github("sqjin/CellChat")
}
library(CellChat)

# Prepare data for CellChat
# Ensure your Seurat object has cell type annotations (e.g., in seurat_obj@meta.data$seurat_clusters)
data.input <- GetAssayData(seurat_obj, assay = "Spatial", slot = "data")  # Normalized data
meta <- seurat_obj@meta.data  # Metadata
celltypes <- meta$seurat_clusters  # Cell type annotations (replace with your column name if different)

# Create a CellChat object
cellchat <- createCellChat(object = data.input, meta = meta, group.by = "seurat_clusters")

# Set ligand-receptor interaction database (use human or mouse database)
CellChatDB <- CellChatDB.human  # For human data (use CellChatDB.mouse for mouse data)
cellchat@DB <- CellChatDB

# Preprocess the data
cellchat <- subsetData(cellchat)  # Subset to genes involved in ligand-receptor interactions
cellchat <- identifyOverExpressedGenes(cellchat)
cellchat <- identifyOverExpressedInteractions(cellchat)

# Infer cell-cell communication networks
cellchat <- computeCommunProb(cellchat)

# Filter out weak interactions
cellchat <- filterCommunication(cellchat, min.cells = 10)

# Aggregate the communication network
cellchat <- aggregateNet(cellchat)

# Visualize the aggregated network
groupSize <- as.numeric(table(cellchat@idents))
netVisual_circle(cellchat@net$count, vertex.size = groupSize, vertex.label.cex = 0.8)

# Explore specific signaling pathways
cellchat <- computeCommunProbPathway(cellchat)

# Visualize a specific signaling pathway (e.g., "TGFb")
netVisual_aggregate(cellchat, signaling = "TGFb", layout = "circle")

# Save CellChat results (optional)
saveRDS(cellchat, file = "cellchat_results.rds")        

Conclusion

Spatial Transcriptomics (ST) has emerged as a ground-breaking technology that bridges the gap between molecular biology and tissue architecture, enabling researchers to explore gene expression within the spatial context of intact tissues. By preserving the spatial organization of cells, ST provides unprecedented insights into cellular heterogeneity, tissue microenvironments, and spatially regulated biological processes such as development, disease progression, and immune responses. This technology has become indispensable in fields like oncology, neuroscience, and developmental biology, where understanding the spatial organization of cells is critical to unraveling biological function.

The analysis of spatial transcriptomic data, however, presents unique computational and analytical challenges due to its high-dimensional nature and the integration of gene expression profiles with spatial coordinates. This article outlined a comprehensive workflow for analyzing ST data, focusing on 10x Visium datasets, and demonstrated how to leverage R and its rich ecosystem of bioinformatics tools to address these challenges. From data import and quality control to normalization, clustering, and spatial pattern detection, each step of the workflow is designed to extract meaningful biological insights from complex ST datasets. Advanced techniques such as differential expression analysis, spatially variable gene detection, and cell-cell interaction inference further enhance the utility of ST in uncovering spatially resolved molecular signatures and regional biological processes.

The integration of ST with single-cell RNA sequencing and other omics layers offers a powerful approach to understanding tissue organization and cellular heterogeneity at an unprecedented resolution. Tools like Seurat, SPARK, and CellChat enable researchers to map cell types, identify spatially variable genes, and infer cell-cell communication networks, providing a holistic view of tissue biology. Visualization techniques, including spatial feature plots, UMAPs, and heatmaps, facilitate the exploration of gene expression gradients and tissue architecture, making the data more interpretable and actionable.

Despite its transformative potential, spatial transcriptomics still faces challenges, such as balancing spatial resolution with transcriptomic depth, managing data complexity, and integrating multi-omics datasets. Future advancements in experimental technologies and computational tools will address these challenges, further enhancing the utility of ST in biomedical research. Emerging technologies, such as multi-omics spatial profiling and improved resolution methods, promise to unlock new insights into tissue organization and function, driving innovation in spatial biology.

In conclusion, spatial transcriptomics represents a powerful tool for exploring the spatial complexity of biological systems. By combining advanced computational workflows with cutting-edge experimental techniques, researchers can uncover spatially resolved molecular signatures, infer cell-cell interactions, and generate hypotheses about regional biological processes. As the field continues to evolve, spatial transcriptomics will play an increasingly important role in advancing our understanding of health and disease, guiding precision medicine, and driving innovation in spatial biology. This guide equips researchers with the foundational tools and methodologies to harness the full potential of spatial transcriptomics, paving the way for new discoveries in biology and medicine.

Incredibly insightful article! Spatial transcriptomics is revolutionizing our understanding of cellular interactions, and this write-up highlights its significance beautifully. Thanks for sharing! 👏

Like
Reply

To view or add a comment, sign in

More articles by Dr Suhirthakumar P.

Explore content categories