Conventional result and visualization Instructions with Real-World Examples
Last Updated: 24, April, 2024 at 09:43
Conventional result and visualization Instructions with Real-World Examples.Rmd
1.Utilizing Real-World Examples in scPagwas for Comprehensive Testing
To showcase the visualization effects accurately, we have opted to use real-world examples from the referenced article instead of the software’s provided demo dataset. In this scenario, we will be utilizing the BMMC scRNA-seq data in conjunction with the GWAS data for monocyte count. It is important to note that due to the large size of the data, the computation time for this analysis may take several minutes. Hence, we recommend directly downloading the result data for convenient examination and exploration. The processed monocytecount gwas data can be download from: - Onedrive: https://1drv.ms/t/s!As-aKqXDnDUHi6sx7Hqblj2Sgl7P8w?e=pwSq1Q
BMMC example scRNA-seq data can be obtained from: - Ondrive: https://1drv.ms/u/s!As-aKqXDnDUHjfcMjN4VGtAw1Utm0w?e=bOOBJl
We can download the result files from: - Ondrive: https://1drv.ms/u/s!As-aKqXDnDUHjLNvq64d2GAK_pIGVA?e=bZSawC
Code: 1234
If you want to run the deno data, please go to Conventional Usage Instructions for scPagwas with Demo Example Data
2. Computing the singlecell and celltype result for Monocytecount trait
If we running scPagwas in multi-core in server environment, there may
cause an error:
Error: Two levels of parallelism are used. See ?assert_cores\
`
add this code before call in R environment:
export OPENBLAS_NUM_THREADS=1
There is no need to run this code in window system.
In this example, we run the scPagwas for usual step, both running singlecell and celltype process.
library(scPagwas)
Pagwas<-scPagwas_main(Pagwas =NULL,
gwas_data="./monocytecount_prune_gwas_data.txt",
Single_data ="./Seu_Hema_data.rds",
output.prefix="mono",
output.dirs="monocytecount_bmmc",
Pathway_list=Genes_by_pathway_kegg,
n.cores=2,
assay="RNA",
singlecell=T,
iters_singlecell = 100,
celltype=T,
block_annotation = block_annotation,
chrom_ld = chrom_ld)
save(Pagwas,file="./BMMC_scPagwas.RData")
Sometimes, we need to remove the objects in cache folder:
3. Result interpretation
Data Results and Output in scPagwas:
- By default (seurat_return = TRUE), scPagwas returns the data results in Seurat format, which ensures consistency with the input single-cell data format. The scPagwas results in Seurat format include two additional assays, five new columns in the meta.data, and five new columns in the misc data.
- In the output folder, there will be five result files along with an empty folder named “scPagwas_cache”. Generally, the “scPagwas_cache” folder is unnecessary and can be deleted without affecting the results.
3.1 Pagwas
> print(Pagwas)
An object of class Seurat
16478 features across 35582 samples within 3 assays
Active assay: RNA (15860 features, 5000 variable features)
2 other assays present: scPagwasPaHeritability, scPagwasPaPca
3 dimensional reductions calculated: pca, tsne, umap
> names(Pagwas@misc)
[1] "Pathway_list" "pca_cell_df" "lm_results"
[4] "bootstrap_results" "scPathways_rankPvalue" "GeneticExpressionIndex"
[7] "Merged_celltype_pvalue"
-
In this Seruat result, two new assays were added:
scPagwasPaPca: An assay for S4 type of data; the svd score result for pathways in each cells;
scPagwasPaHeritability: An assay for S4 type of data; the gPas score matrix for pathways in each cells;
These two files are intermediate files generated by scPagwas.
-
In meta.data, four columns were added:
scPagwas.TRS.Score1: inheritance related score, enrichment socre for genetics top genes;
scPagwas.upTRS.Score: Trait relevant score based on genetic relevant genes
scPagwas.downTRS.Score: Trait relevant score based on anti-genetic relevant genes
scPagwas.gPAS.score: Inheritance pathway regression effects score for each cells;
Random_Correct_BG_p: CellpValue for each cells;
Random_Correct_BG_adjp: fdr for each cells, adjust p value.
Random_Correct_BG_z:z score for eahc cells.
-
A new element names misc is added in scPagwas’s result,
Pagwas@misc
is a list including:Pathway_list: filtered pathway gene list;
pca_cell_df: a pahtway and celltype data frame for pathway svd(1’st pca) result for each celltype;
lm_results: the regression result for each celltype;
PCC: pearson correlation coefficients,heritability correlation value for each gene;
PCC_up: pearson correlation coefficients, high heritability correlation value for high gPas score of cells
PCC_down: pearson correlation coefficients, Low heritability correlation value for Low gPas score of cells
bootstrap_results: The bootstrap data frame results for celltypes including bootstrap pvalue and confidence interval.
-
Merged_celltype_pvalue: The resulting Merged_celltype_pvalue represents p-values at the cell type level, which differ from the bootstrap_results. The bootstrap_results are based on merging cell-level expression data into cell type-level expression data and performing calculations. In contrast, the Merged_celltype_pvalue is derived from the Random_Correct_BG_p results, combining the p-values of all cells within a specific cell type into a single p-value result for that cell type.
Note: Although both the bootstrap and merged strategies yield cell type-level results, they may not be identical. This is because the bootstrap approach calculates p-values based on pseudo-bulk data, which can differ from the results obtained from single-cell calculations. In contrast, the merged strategy directly integrates the results from single-cell calculations, resulting in greater consistency with single-cell results. The primary focus in scPagwas paper is on the bootstrap_results; however, during the later stages of development, we discovered that discrepancies between cell type-level and single-cell-level calculation strategies posed some challenges in downstream analyses. Hence, in the new version development, we directly utilize the integration of single-cell results to obtain cell type-level p-values. Additionally, the major advantage of the bootstrap_results is its speed, as the calculation of Random_Correct_BG_p for single cells requires at least 100 iterations, making it computationally demanding. To avoid calculating Random_Correct_BG_p, setting the parameter iters_singlecell=0 is a good option.
3.2 Pagwas : output files result
In monocytecount_bmmc folder, There is several result files :
- *_cellytpes_bootstrap_results.csv : The result of cellytpes for bootstrap p results;
- *_gene_PCC.csv : The result of gene heritability correlation with gPAS score.
- *_Pathway_singlecell_lm_results.txt : The regression result for all pahtway and single cell matrix;
- *_singlecell_scPagwas_score_pvalue.Result.csv : The data frame result for each cell inlcuding scPagwas.TRS.Score, scPagwas.downTRS.Score, scPagwas.gPAS.score, pValueHighScale, qValueHighScale;
- *_Merged_celltype_pvalue.csv : The merged singlecell pvalue to celltype pvalue.
scPagwas_cache is a temporary folder to save the SOAR data, you can remove it when finish the scPagwas.
4. Result Visualization
4.1 Visualize the scPagwas Score of single cell data in UMAP or TSNE plot.
- DimPlot of singlecell data.
require("RColorBrewer")
require("Seurat")
require("SeuratObject")
require("ggsci")
require("dplyr")
require("ggplot2")
require("ggpubr")
#check the objects
color26 <- c("#D9DD6B","#ECEFA4","#D54C4C","#8D2828","#FDD2BF","#E98580","#DF5E5E","#492F10","#334257","#476072","#548CA8","#00A19D","#ECD662","#5D8233","#284E78","#3E215D","#835151","#F08FC0","#C6B4CE","#BB8760","#FFDADA","#3C5186","#558776","#E99497","#FFBD9B","#0A1D37")
Seurat::DimPlot(Pagwas,pt.size=1,reduction="tsne",label = T, repel=TRUE)+
scale_colour_manual(name = "celltypes", values = color26)+
umap_theme()+ggtitle("Monocyte BMMC")+labs(x="TSNE",y="")+theme(aspect.ratio=1)
scPagwas.TRS.Score1 and positive(p<0.05) cells showing in dimplot.
scPagwas_Visualization(Single_data=Pagwas,
p_thre = 0.05,
FigureType = "tsne",
width = 7,
height = 7,
lowColor = "white",
highColor = "red",
output.dirs="figure",
size = 0.5,
do_plot = T)
- scPagwas.gPAS.score dimplot.
- scPagwas.TRS.Score dimplot.
4)Random_Correct_BG_adjp dimplot.
Positive cells(q value<0.05): red dot; Negative cells: other dot.
4.2 Plot the barplot of the proportion of positive Cell.
Plot the barplot of the proportion of positive Cells in celltypes:
plot_bar_positie_nagtive(seurat_obj=Pagwas,
var_ident="celltypes",
var_group="positiveCells",
vec_group_colors=c("#E8D0B3","#7EB5A6"),
do_plot = T)
Plot the barplot of the proportion of celltypes in positive and negative Cells:
plot_bar_positie_nagtive(seurat_obj=Pagwas,
var_ident="positiveCells",
var_group="celltypes",
p_thre = 0.01,
vec_group_colors=NULL,
f_color=colorRampPalette(brewer.pal(n=10, name="RdYlBu")),
do_plot = T)
4.3 Plot the heritability correlated genes
heritability_cor_scatterplot(gene_heri_cor=Pagwas@misc$PCC,
topn_genes_label=10,
color_low="#035397",
color_high ="#F32424",
color_mid = "white",
text_size=2,
do_plot=T,
max.overlaps =20,
width = 7,
height = 7)
4.5 celltypes bootstrap_results reuslt
Barplot for celltypes
Bootstrap_P_Barplot(p_results=Pagwas@misc$bootstrap_results$bp_value[-1],
p_names=rownames(Pagwas@misc$bootstrap_results)[-1],
figurenames = "Bootstrap_P_Barplot.pdf",
width = 5,
height = 7,
do_plot=T,
title = "monocytecount_bmmc")
Forest plot for estimate value.
Bootstrap_estimate_Plot(bootstrap_results=Pagwas@misc$bootstrap_results,
figurenames = "estimateplot.pdf",
width = 9,
height = 7,
do_plot=T)
4.6 Visualize the heritability correlated Pathways
Plot the specific genetics pathway for each celltypes
library(tidyverse)
library("rhdf5")
library(ggplot2)
library(grDevices)
library(stats)
library(FactoMineR)
library(scales)
library(reshape2)
library(ggdendro)
library(grImport2)
library(gridExtra)
library(grid)
library(sisal)
source(system.file("extdata", "plot_scpathway_contri_dot.R", package = "scPagwas"))
plot_scpathway_dot(Pagwas=Pagwas,
celltypes=c("01_HSC","02_Early.Eryth","05_CMP.LMPP","11_CD14.Mono.1","12_CD14.Mono.2","13_CD16.Mono","15_CLP.2","19_CD8.N","20_CD4.N1","21_CD4.N2"),
topn_path_celltype=5,
filter_p=0.05,
max_logp=15,
display_max_sizes=F,
size_var ="logrankPvalue" ,
col_var="proportion",
shape.scale = 8,
cols.use=c("lightgrey", "#E45826"),
dend_x_var = "logrankPvalue",
dist_method="euclidean",
hclust_method="ward.D",
do_plot = T,
width = 7,
height = 7)
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: aarch64-apple-darwin20 (64-bit)
#> Running under: macOS Ventura 13.4
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Asia/Shanghai
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices datasets utils methods base
#>
#> other attached packages:
#> [1] knitr_1.46
#>
#> loaded via a namespace (and not attached):
#> [1] vctrs_0.6.5 cli_3.6.2 rlang_1.1.3
#> [4] xfun_0.43 purrr_1.0.2 renv_1.0.7
#> [7] textshaping_0.3.7 jsonlite_1.8.8 htmltools_0.5.8.1
#> [10] ragg_1.3.0 sass_0.4.9 rmarkdown_2.26
#> [13] evaluate_0.23 jquerylib_0.1.4 fastmap_1.1.1
#> [16] yaml_2.3.8 lifecycle_1.0.4 memoise_2.0.1
#> [19] BiocManager_1.30.22 compiler_4.3.1 fs_1.6.3
#> [22] htmlwidgets_1.6.4 rstudioapi_0.16.0 systemfonts_1.0.6
#> [25] digest_0.6.35 R6_2.5.1 magrittr_2.0.3
#> [28] bslib_0.7.0 tools_4.3.1 pkgdown_2.0.9
#> [31] cachem_1.0.8 desc_1.4.3