Main wrapper functions for scPagwas — scPagwas

Usage

scPagwas_main(
  Pagwas = NULL,
  gwas_data = NULL,
  output.prefix = "Test",
  output.dirs = "scPagwastest_output",
  block_annotation = block_annotation,
  Single_data = NULL,
  assay = "RNA",
  Pathway_list = Genes_by_pathway_kegg,
  chrom_ld = chrom_ld,
  run_split = FALSE,
  n.cores = 1,
  marg = 10000,
  maf_filter = 0.01,
  min_clustercells = 10,
  min.pathway.size = 5,
  max.pathway.size = 300,
  iters_celltype = 200,
  iters_singlecell = 100,
  n_topgenes = 1000,
  singlecell = TRUE,
  celltype = TRUE,
  seurat_return = TRUE,
  remove_outlier = TRUE
)

Arguments

Pagwas: (list) Default is NULL; usually no input is needed. When seurat_return = FALSE, all intermediate data are stored in the "Pagwas" list and returned as the result, which can be passed back in for later runs. For example, when running two computations with the same single-cell data but different GWAS data, pass the list from the first run as the Pagwas argument of the second; this skips the single-cell calculations and speeds up the run considerably. When seurat_return = TRUE, the result cannot be reused this way, because it is the final Seurat result with many intermediate data removed.
gwas_data: (data.frame) GWAS summary data. It must contain columns such as:
  chrom | pos       | rsid       | se    | beta  | maf
  6     | 119968580 | rs1159767  | 0.032 | 0.019 | 0.5275
  10    | 130566523 | rs559109   | 0.033 | 0.045 | 0.4047
  5     | 133328825 | rs6893145  | 0.048 | 0.144 | 0.1222
  7     | 146652932 | rs13228798 | 0.035 | 0.003 | 0.3211
output.prefix: = "Test": This parameter sets the prefix for the output result files.
output.dirs: = "scPagwastest_output": This parameter specifies the directory for the output result files.
block_annotation: (data.frame) Start and end points for block traits, usually genes.
Single_data: (character or seurat) The single-cell data as a Seurat object, or the path to a Seurat object saved in rds format. Idents should be the cell type annotation.
assay: (character) Assay of the single-cell data to use. Default is "RNA".
Pathway_list: (list,character) pathway gene sets list
chrom_ld: (list, numeric) LD data for the 22 chromosomes.
run_split: (logical) Whether the input single-cell data is a split subset. If TRUE, only one result (the gPas score) is returned; if FALSE, the whole function runs. Default is FALSE.
n.cores: Number of cores used for the regression. Default is 1.
marg: (integer) Distance to the TSS. Default is 10000, which gives a gene-TSS window size of 20000.
maf_filter: (numeric) Threshold for filtering on maf. Default is 0.01.
min_clustercells: (integer) Only used when FilterSingleCell is TRUE. Threshold for the total number of cells in each cluster. Default is 10.
min.pathway.size: (integer) Minimum number of genes in a pathway. Default is 5.
max.pathway.size: (integer) Maximum number of genes in a pathway. Default is 300.
iters_celltype: (integer) Number of bootstrap iterations for cell types. Default is 200.
iters_singlecell: (integer) Number of bootstrap iterations for single cells. Default is 100. This parameter is used to calculate the significance p-value for individual cells. This step requires a large amount of memory, so we do not recommend starting with a large value. To skip the p-value calculation, set it to 0.
n_topgenes: (integer) Number of top associated genes selected to calculate the scPagwas score. Default is 1000.
singlecell: (logical) Whether to produce the single-cell result.
celltype: (logical) Whether to produce the cell type result.
seurat_return: (logical) Whether to return the result in Seurat format; if not, a list is returned.
remove_outlier: (logical) Whether to remove outliers from the scPagwas score.
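
The argument descriptions above can be made concrete with a small pre-check in base R before calling scPagwas_main. This is only a sketch under stated assumptions: the direction of the maf comparison (keep maf strictly above maf_filter) and the inclusive pathway size bounds are guesses about how the thresholds are applied, and toy_pathways is a hypothetical stand-in for a real pathway list.

```r
# Toy GWAS summary table using the six columns listed for gwas_data.
gwas_data <- data.frame(
  chrom = c(6, 10, 5, 7),
  pos   = c(119968580, 130566523, 133328825, 146652932),
  rsid  = c("rs1159767", "rs559109", "rs6893145", "rs13228798"),
  se    = c(0.032, 0.033, 0.048, 0.035),
  beta  = c(0.019, 0.045, 0.144, 0.003),
  maf   = c(0.5275, 0.4047, 0.1222, 0.3211),
  stringsAsFactors = FALSE
)

# Fail early if any expected column is missing.
required_cols <- c("chrom", "pos", "rsid", "se", "beta", "maf")
missing_cols <- setdiff(required_cols, colnames(gwas_data))
if (length(missing_cols) > 0) {
  stop("gwas_data is missing columns: ", paste(missing_cols, collapse = ", "))
}

# Assumption: SNPs with maf at or below the threshold are dropped.
maf_filter <- 0.01
gwas_kept <- gwas_data[gwas_data$maf > maf_filter, , drop = FALSE]

# marg is the distance on each side of the TSS, so the window is twice marg.
marg <- 10000
window_size <- 2 * marg

# Hypothetical pathway list: a named list of character vectors of gene names.
toy_pathways <- list(
  too_small = c("A", "B"),
  in_range  = paste0("G", 1:20),
  too_large = paste0("G", 1:400)
)

# Assumption: both size bounds are inclusive.
min.pathway.size <- 5
max.pathway.size <- 300
sizes <- lengths(toy_pathways)
kept_pathways <- toy_pathways[sizes >= min.pathway.size & sizes <= max.pathway.size]
```

Running these checks first is cheap and catches malformed inputs before the much more expensive regression and bootstrap steps start.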

Value

Returns a Seurat object with the following entries (when seurat_return = TRUE):

assay:
  scPagwasPaPca: An assay (S4 data); the SVD result for pathways in each cell.
  scPagwasPaHeritability: An assay (S4 data); the gPas matrix for pathways in each cell.
meta.data:
  scPagwas.TRS.Score1: A column in "meta.data"; enrichment score for the inheritance-associated top genes.
  scPagwas.gPAS.score: A column in "meta.data"; inheritance regression effects for each cell.
  Random_Correct_BG_p: p-value for each cell.
  Random_Correct_BG_adjp: FDR-adjusted p-value for each cell.
  Random_Correct_BG_z: z-score for each cell.
misc: elements of the result stored in Pagwas@misc
  Pathway_list: A list of pathway gene sets intersected with the single-cell data.
  pca_cell_df: A data frame of the pathway PCA result for each cell type.
  lm_results: The regression result for each cell.
  PCC: Heritability correlation value for each gene. In the previous version, this was referred to as Pearson correlation coefficients.
  bootstrap_results: A data frame of bootstrap results for cell types, including the bootstrap p-value and confidence interval.
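
The entries listed above can be pulled out of the returned object with ordinary slot access. The helper below is a minimal sketch, not part of scPagwas: the name summarise_scpagwas and the 0.05 cutoff are assumptions for illustration, and it relies only on the meta.data and misc entries named in this section.

```r
# Hypothetical helper: collect a few documented entries from a result that
# has a `meta.data` data frame slot and a `misc` list slot.
summarise_scpagwas <- function(result, fdr_cutoff = 0.05) {
  meta <- result@meta.data
  list(
    trs_score     = meta$scPagwas.TRS.Score1,
    gpas_score    = meta$scPagwas.gPAS.score,
    # Count cells whose adjusted p-value falls below the (assumed) cutoff.
    n_significant = sum(meta$Random_Correct_BG_adjp < fdr_cutoff, na.rm = TRUE),
    bootstrap     = result@misc$bootstrap_results
  )
}
```

With a real result this would be called as summarise_scpagwas(Pagwas_data), assuming the run used seurat_return = TRUE and both singlecell and celltype were enabled so that every entry is present.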

Main Pagwas wrapper functions.

scPagwas is a single-cell pathway-based principal component (PC)-scoring algorithm that integrates polygenic signals from GWAS with single-cell data to infer disease-relevant cell populations and to score the associations of individual cells with complex diseases. This function is the entry point for Pagwas analysis and includes the data input functions and the main processing functions. It can also write the run log and parameter log for scPagwas and create the output folder.

1. When running the package on a Linux server, run export OPENBLAS_NUM_THREADS=1 before entering the R environment; this can help avoid core errors.
2. If the run is killed by an error, run SOAR::Objects() to check the stored objects, or SOAR::Remove(SOAR::Objects()) to remove them.
3. If your data is too large to run the function (bus error), split it into several subsets and run merge_pagwas to merge the results.

library(scPagwas)
Pagwas_data <- scPagwas_main(
  Pagwas = NULL,
  gwas_data = system.file("extdata", "GWAS_summ_example.txt",
    package = "scPagwas"
  ),
  Single_data = system.file("extdata", "scRNAexample.rds",
    package = "scPagwas"
  ),
  output.prefix = "test",
  output.dirs = "scPagwastest_output",
  block_annotation = block_annotation,
  assay = "RNA",
  Pathway_list = Genes_by_pathway_kegg,
  chrom_ld = chrom_ld,
  singlecell = TRUE,
  celltype = TRUE
)
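
The reuse pattern described for the Pagwas argument (same single-cell data, two different GWAS inputs) might look roughly like the sketch below. It is an illustration only: run_two_traits, the "trait1"/"trait2" prefixes, and the output directory names are hypothetical, and whether Single_data must be supplied again on the second call is not stated in this page, so passing it again is an assumption. Only arguments documented above are used.

```r
# Hypothetical wrapper: run scPagwas_main twice on the same single-cell data,
# feeding the list returned by the first run into the second.
run_two_traits <- function(single_data, gwas_first, gwas_second,
                           block_annotation, pathway_list, chrom_ld) {
  # seurat_return = FALSE keeps the intermediate data in a reusable list.
  first <- scPagwas::scPagwas_main(
    Pagwas = NULL,
    gwas_data = gwas_first,
    Single_data = single_data,
    output.prefix = "trait1",
    output.dirs = "trait1_output",
    block_annotation = block_annotation,
    assay = "RNA",
    Pathway_list = pathway_list,
    chrom_ld = chrom_ld,
    seurat_return = FALSE
  )

  # Pass the first result as `Pagwas` so the single-cell steps can be skipped.
  # Assumption: Single_data is passed again unchanged.
  second <- scPagwas::scPagwas_main(
    Pagwas = first,
    gwas_data = gwas_second,
    Single_data = single_data,
    output.prefix = "trait2",
    output.dirs = "trait2_output",
    block_annotation = block_annotation,
    assay = "RNA",
    Pathway_list = pathway_list,
    chrom_ld = chrom_ld,
    seurat_return = FALSE
  )

  list(first = first, second = second)
}
```

After library(scPagwas), the bundled block_annotation, Genes_by_pathway_kegg, and chrom_ld objects from the example above could be passed as the last three arguments.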

Polygenic regression identifies trait-relevant cell subpopulations through pathway activity transformed single-cell RNA sequencing data.(2022)

Chunyu Deng
