Package 'scMappR' reference manual

Title:	Single Cell Mapper
Description:	The single cell mapper (scMappR) R package contains a suite of bioinformatic tools that provide experimentally relevant cell-type specific information to a list of differentially expressed genes (DEG). The function "scMappR_and_pathway_analysis" reranks DEGs to generate cell-type specificity scores called cell-weighted fold-changes. Users input a list of DEGs, normalized counts, and a signature matrix into this function. scMappR then re-weights bulk DEGs by cell-type specific expression from the signature matrix, cell-type proportions from RNA-seq deconvolution and the ratio of cell-type proportions between the two conditions to account for changes in cell-type proportion. With cwFold-changes calculated, scMappR uses two approaches to utilize cwFold-changes to complete cell-type specific pathway analysis. The "process_dgTMatrix_lists" function in the scMappR package contains an automated scRNA-seq processing pipeline where users input scRNA-seq count data, which is made compatible for scMappR and other R packages that analyze scRNA-seq data. We further used this to store hundreds up regularly updating signature matrices. The functions "tissue_by_celltype_enrichment", "tissue_scMappR_internal", and "tissue_scMappR_custom" combine these consistently processed scRNAseq count data with gene-set enrichment tools to allow for cell-type marker enrichment of a generic gene list (e.g. GWAS hits). Reference: Sokolowski,D.J., Faykoo-Martinez,M., Erdman,L., Hou,H., Chan,C., Zhu,H., Holmes,M.M., Goldenberg,A. and Wilson,M.D. (2021) Single-cell mapper (scMappR): using scRNA-seq to infer cell-type specificities of differentially expressed genes. NAR Genomics and Bioinformatics. 3(1). Iqab011. <doi:10.1093/nargab/lqab011>.
Authors:	Dustin Sokolowski [aut, cre], Mariela Faykoo-Martinez [aut], Lauren Erdman [aut], Houyun Hou [aut], Cadia Chan [aut], Helen Zhu [aut], Melissa Holmes [aut], Anna Goldenberg [aut], Michael Wilson [aut]
Maintainer:	Dustin Sokolowski <[email protected]>
License:	GPL-3
Version:	1.0.11
Built:	2025-03-01 05:25:58 UTC
Source:	https://github.com/cran/scMappR

Fisher's Exact Cell-Type Identification.

Description

This function uses the CellMarker and Panglao datasets to identify cell-type differentially expressed genes.

Usage

cellmarker_enrich(
  gene_list,
  p_thresh,
  gmt = "cellmarker_list.Rdata",
  fixed_length = 13000,
  min_genes = 5,
  max_genes = 3000,
  isect_size = 3
)
cellmarker_enrich(
  gene_list,
  p_thresh,
  gmt = "cellmarker_list.Rdata",
  fixed_length = 13000,
  min_genes = 5,
  max_genes = 3000,
  isect_size = 3
)

Arguments

`gene_list`	A character vector of gene symbols with the same designation (e.g. mouse symbol - mouse, human symbol - human) as the gene set database.
`p_thresh`	The Fisher's test cutoff for a cell-marker to be enriched.
`gmt`	Either a path to an rda file containing an object called "gmt", which is a named list where each element of the list is a vector of gene symbols website for more detail on the file type (https://software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats). The gmt list may also be inputted.
`fixed_length`	Estimated number of genes in your background.
`min_genes`	Minimum number of genes in the cell-type markers.
`max_genes`	Maximum number of genes in the cell-type markers.
`isect_size`	Number of genes in your list and the cell-type.

Details

Complete a Fisher's exact test of an input list of genes against a gene set saved in an *.RData object. The RData object is storing a named list of genes called "gmt".

Value

cellmarker_enrich Gene set enrichment of cell-types on your inputted gene list.

Examples



data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- POA_Rank_signature
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
genes <- rownames(Signature)[1:100]
data(gmt)
enriched <- cellmarker_enrich(gene_list = genes, 
                              p_thresh = 0.05, gmt = gmt)

 
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- POA_Rank_signature
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
genes <- rownames(Signature)[1:100]
data(gmt)
enriched <- cellmarker_enrich(gene_list = genes, 
                              p_thresh = 0.05, gmt = gmt)

Identify co-expressed cell-types

Description

This function identifies genes with similar cell-type markers and if those markers are driving enrichment.

Usage

coEnrich(
  sig,
  gene_list_heatmap,
  background_heatmap,
  study_name,
  outDir,
  toSave = FALSE,
  path = NULL
)
coEnrich(
  sig,
  gene_list_heatmap,
  background_heatmap,
  study_name,
  outDir,
  toSave = FALSE,
  path = NULL
)

Arguments

`sig`	A The number of combinations of significant cell-types to enrich.
`gene_list_heatmap`	Signature matrix of inputted genes in heatmap and the cell-type preferences – output of heatmap generation.
`background_heatmap`	Signature matrix of background matrix in heatmap and cell-type preferences – output of heatmap generation.
`study_name`	Name of the outputted table.
`outDir`	Name of the directory this table will be printed in.
`toSave`	Allow scMappR to write files in the current directory (T/F).
`path`	If toSave == TRUE, path to the directory where files will be saved.

Details

This function takes significantly enriched cell-types from the single CT_enrich before testing to see if the genes driving their enrichment are overlapping to a significant proportion using Fisher's exact test. To save computational time and to not complete this with an incredible number of permutations, scMappR stops at overlapping 5 cell-types.

Value

coEnrich Enrichment of cell-types that are expressed by the same genes, up to 4 sets of cell-types.

Examples


# load in signature matrices
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
sig <- get_gene_symbol(POA_Rank_signature)
Signature <- POA_Rank_signature
rownames(Signature) <- sig$rowname
genes <- rownames(Signature)[1:60]
heatmap_test <- tissue_scMappR_custom(gene_list = genes, signature_matrix = Signature,
output_directory =  "scMappR_test", toSave = FALSE)
group_preferences <- heatmap_test$group_celltype_preferences

# load in signature matrices
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
sig <- get_gene_symbol(POA_Rank_signature)
Signature <- POA_Rank_signature
rownames(Signature) <- sig$rowname
genes <- rownames(Signature)[1:60]
heatmap_test <- tissue_scMappR_custom(gene_list = genes, signature_matrix = Signature,
output_directory =  "scMappR_test", toSave = FALSE)
group_preferences <- heatmap_test$group_celltype_preferences

compare_deconvolution_methods

Description

This function calculates cell-type proportions of an inputted bulk sample using DeconRNA-seq, WGCNA, and DCQ methods. Outputted cell-type proportions are then compared.

Usage

compare_deconvolution_methods(
  count_file,
  signature_matrix,
  print_plot = FALSE,
  order_celltype = NULL,
  useWGCNA = TRUE
)
compare_deconvolution_methods(
  count_file,
  signature_matrix,
  print_plot = FALSE,
  order_celltype = NULL,
  useWGCNA = TRUE
)

Arguments

`count_file`	Normalized (CPM, TPM, RPKM) RNA-seq count matrix where rows are gene symbols and columns are individuals. Either the object itself of the path of a .tsv file.
`signature_matrix`	Signature matrix (odds ratios) of cell-type specificity of genes. Either the object itself or a pathway to an .RData file containing an object named "wilcoxon_rank_mat_or" - generally internal.
`print_plot`	print the barplot of estimated cell-type proportions from each method into the R console (logical: TRUE/FALSE)
`order_celltype`	Specify the order that cell-type are placed on the barplot. NULL = alphabetical, otherwise a character vector of cell-type labels (i.e. column names of the signature matrix).
`useWGCNA`	specify if WGCNA is installed = TRUE/FALSE.

Value

List with the following elements:

`cellWeighted_Foldchange`	data frame of cellweightedFold-changes for each gene.
`cellType_Proportions`	data frame of cell-type proportions from DeconRNA-seq.
`leave_one_out_proportions`	data frame of average cell-type proportions for case and control when gene is removed.
`processed_signature_matrix`	signature matrix used in final analysis.

Examples


 
data(PBMC_example)
norm_counts <- PBMC_example$bulk_normalized
signature <- PBMC_example$odds_ratio_in
tst <- compare_deconvolution_methods(count_file = norm_counts, 
 signature_matrix = signature, print_plot = FALSE,
 order_celltype = c("I_mono", "C_mono", "CD8_CM", "CD8_TE", 
 "B_SM", "B_NSM", "B_naive"), useWGCNA = FALSE)

 
data(PBMC_example)
norm_counts <- PBMC_example$bulk_normalized
signature <- PBMC_example$odds_ratio_in
tst <- compare_deconvolution_methods(count_file = norm_counts, 
 signature_matrix = signature, print_plot = FALSE,
 order_celltype = c("I_mono", "C_mono", "CD8_CM", "CD8_TE", 
 "B_SM", "B_NSM", "B_naive"), useWGCNA = FALSE)

Measure cell-type specificity of cell-weighted Fold-changes

Description

This function normalizes cwFold-changes by each gene to help visualize the cell-type specificity of DEGs. It then tests if a cell-type has a large change in correlation from bulk DEGs. Finally, it identifies genes that may be specific to each cell-type.

Usage

cwFoldChange_evaluate(
  cwFC,
  celltype_prop,
  DEG_list,
  gene_cutoff = NULL,
  sd_cutoff = 3
)
cwFoldChange_evaluate(
  cwFC,
  celltype_prop,
  DEG_list,
  gene_cutoff = NULL,
  sd_cutoff = 3
)

Arguments

`cwFC`	A matrix or data frame of cell-weighted fold-changes of DEGs. Rows are DEGs and columns are cell-types.
`celltype_prop`	A matrix or data frame of cell-type proportions. Rows are different cell-types and columns are different samples. These cell-type proportions can come from any source (not just scMappR).
`DEG_list`	An object with the first column as gene symbols within the bulk dataset (doesn't have to be in signature matrix), second column is the adjusted p-value, and the third the log2FC path to a .tsv file containing this info is also acceptable.
`gene_cutoff`	Additional cut-off of normalized cwFold-change to see if a gene is cut-off.
`sd_cutoff`	Number of standard deviations or median absolute deviations to calculate outliers.

Details

cwFold-changes and re-normalized and re-processed to interrogate cell-type specificity at the level of the cell-type and at the level of the gene. At the level of the cell-type, cwFold-changes are correlated to bulk DEGs. The difference in rank between bulk DEGs and cwFold-changes are also compared. At the level of the gene, cwFold-changes are re-normalized so that each gene sums to 1. Normalization of their distributions are tested with a Shapiro test. Then, outlier cell-types for each gene are measured by testing for ‘sd_cutoff'’s mad or sd's greater than the median or mean depending on if the cwFold-change is non-normally or normally distributed respectively. Cell-types considered outliers are then further filtered so their normalized cwFold-changes are greater than the cell-type proportions of that gene and 'gene_cutoff' if the user sets it.

Value

List with the following elements:

`gene_level_investigation`	data frame of genes showing the Euclidian distances between cwFold-change and null vector as well as if cwFold-changes are distributed.
`celltype_level_investigation`	data frame of Spearman's and Pearson's correlation between bulk DEGs and cwFold-changes.
`cwFoldchange_vs_bulk_rank_change`	data frame of the change in rank of DEG between the bulk fold-change and cwFold-change.
`cwFoldChange_normalized`	cwFold-change normalized such that each gene sums to 1.
`cwFoldchange_gene_assigned`	List of cell-types where genes are designated to cell-type specific differential expression.
`cwFoldchange_gene_flagged_FP`	Mapped cwFoldchanges that are flagged as false-positives. These are genes that are driven by the reciprical ratio of cell-type proportions between case and control. These genes may be DE in a non-cell-type specific manner but are falsely assigned to cell-types with very large differences in proportion between condition.

Examples


data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
toOut <- scMappR_and_pathway_analysis(count_file = bulk_normalized, 
                                      signature_matrix = odds_ratio_in, 
                                      DEG_list = bulk_DE_cors, case_grep = case_grep,
                                      control_grep = control_grep, rda_path = "", 
                                      max_proportion_change = 10, print_plots = TRUE, 
                                       plot_names = "tst1", theSpecies = "human", 
                                       output_directory = "tester",
                                       sig_matrix_size = 3000, up_and_downregulated = FALSE, 
                                       internet = FALSE)
                                       
cwFC1 <- toOut$cellWeighted_Foldchange
prop1 <- toOut$cellType_Proportions
DE <- bulk_DE_cors
eval_test <- cwFoldChange_evaluate(cwFC = cwFC1, celltype_prop = prop1,
                                    DEG_list = DE)

data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
toOut <- scMappR_and_pathway_analysis(count_file = bulk_normalized, 
                                      signature_matrix = odds_ratio_in, 
                                      DEG_list = bulk_DE_cors, case_grep = case_grep,
                                      control_grep = control_grep, rda_path = "", 
                                      max_proportion_change = 10, print_plots = TRUE, 
                                       plot_names = "tst1", theSpecies = "human", 
                                       output_directory = "tester",
                                       sig_matrix_size = 3000, up_and_downregulated = FALSE, 
                                       internet = FALSE)
                                       
cwFC1 <- toOut$cellWeighted_Foldchange
prop1 <- toOut$cellType_Proportions
DE <- bulk_DE_cors
eval_test <- cwFoldChange_evaluate(cwFC = cwFC1, celltype_prop = prop1,
                                    DEG_list = DE)

DeconRNASeq CRAN compatible

Description

This function runs DeconRNAseq with default parameters such that it is compatible with CRAN and scMappR

Usage

DeconRNAseq_CRAN(
  datasets,
  signatures,
  proportions = NULL,
  checksig = FALSE,
  known.prop = FALSE,
  use.scale = TRUE,
  fig = FALSE
)
DeconRNAseq_CRAN(
  datasets,
  signatures,
  proportions = NULL,
  checksig = FALSE,
  known.prop = FALSE,
  use.scale = TRUE,
  fig = FALSE
)

Arguments

`datasets`	Normalized RNA-seq dataset
`signatures`	Signature matrix of odds ratios
`proportions`	If cell-type proportion is already inputted - always NULL for scMappR
`checksig`	Check to see if plotting is significant - always false for scMappR
`known.prop`	If proportions were known - always false for scMappR
`use.scale`	Scale and center value - always TRUE for scMappR
`fig`	Make figures - always FALSE for scMappR

Details

This is the exact same function as the primary function in the Bioconductor package, DeconRNAseq (PMID: 23428642) except it is now compatible with CRAN packages.

Value

DeconRNAseq_CRAN Estimated cell-type proportions with DeconRNAseq.

Examples


data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
out <- DeconRNAseq_CRAN(datasets = as.data.frame(bulk_normalized),
                       signatures = as.data.frame(odds_ratio_in))

                                      
data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
out <- DeconRNAseq_CRAN(datasets = as.data.frame(bulk_normalized),
                       signatures = as.data.frame(odds_ratio_in))

Generate cell weighted Fold-Changes (cwFold-changes)

Description

This function takes a count matrix, signature matrix, and differentially expressed genes (DEGs) before generating cwFold-changes for each cell-type.

Usage

deconvolute_and_contextualize(
  count_file,
  signature_matrix,
  DEG_list,
  case_grep,
  control_grep,
  max_proportion_change = -9,
  print_plots = TRUE,
  plot_names = "scMappR",
  theSpecies = "human",
  FC_coef = TRUE,
  sig_matrix_size = 3000,
  drop_unknown_celltype = TRUE,
  toSave = FALSE,
  path = NULL,
  deconMethod = "DeconRNASeq",
  rareCT_filter = TRUE
)
deconvolute_and_contextualize(
  count_file,
  signature_matrix,
  DEG_list,
  case_grep,
  control_grep,
  max_proportion_change = -9,
  print_plots = TRUE,
  plot_names = "scMappR",
  theSpecies = "human",
  FC_coef = TRUE,
  sig_matrix_size = 3000,
  drop_unknown_celltype = TRUE,
  toSave = FALSE,
  path = NULL,
  deconMethod = "DeconRNASeq",
  rareCT_filter = TRUE
)

Arguments

`count_file`	Normalized (e.g. CPM, TPM, RPKM) RNA-seq count matrix where rows are gene symbols and columns are individuals. Either the matrix itself of class "matrix" or data.frame" or a path to a tsv file containing these DEGs. The gene symbols in the count file, signature matrix, and DEG list must match.
`signature_matrix`	Signature matrix (fold-change ratios) of cell-type specificity of genes. Either the object itself or a pathway to an .RData file containing an object named "wilcoxon_rank_mat_or". We strongly recommend inputting the signature matrix directly.
`DEG_list`	An object with the first column as gene symbols within the bulk dataset (doesn't have to be in signature matrix), second column is the adjusted P-value, and the third the log2FC. Path to a tsv file containing this info is also acceptable.
`case_grep`	Tag in the column name for cases (i.e. samples representing upregulated) OR an index of cases.
`control_grep`	Tag in the column name for control (i.e. samples representing downregulated) OR an index of cases.
`max_proportion_change`	Maximum cell-type proportion change. May be useful if a cell-type does not exist in one condition, thus preventing infinite values.
`print_plots`	Whether boxplots of the estimated CT proportion for the leave-one-out method of CT deconvolution should be printed (T/F).
`plot_names`	If plots are being printed, the pre-fix of their .pdf files.
`theSpecies`	internal species designation to be passed from 'scMappR_and_pathway_analysis'. It only impacts this function if data are taken directly from the PanglaoDB database (i.e. not reprocessed by scMappR or the user).
`FC_coef`	Making cwFold-changes based on fold-change (TRUE) or rank := (-log10(Pval)) (FALSE) rank. After testing, we strongly recommend to keep true (T/F).
`sig_matrix_size`	Number of genes in signature matrix for cell-type deconvolution.
`drop_unknown_celltype`	Whether or not to remove "unknown" cell-types from the signature matrix (T/F).
`toSave`	Allow scMappR to write files in the current directory (T/F).
`path`	If toSave == TRUE, path to the directory where files will be saved.
`deconMethod`	Which RNA-seq deconvolution method to use to estimate cell-type proporitons. Options are "WGCNA", "DCQ", or "DeconRNAseq"
`rareCT_filter`	option to keep cell-types rarer than 0.1 percent of the population (T/F). Setting to FALSE may lead to false-positives.

Details

This function completes the pre-processing, normalization, and scaling steps in the scMappR algorithm before calculating cwFold-changes. cwFold-changes scales bulk fold-changes by the cell-type specificity of the gene, cell-type gene-normalized cell-type proportions, and the reciprocal ratio of cell-type proportions between the two conditions. cwFold-changes are generated for genes that are in both the count matrix and in the list of DEGs. It does not have to also be in the signature matrix. First, this function will estimate cell-type proportions with all genes included before estimating changes in cell-type proportion between case/control using a t-test. Then, it takes a leave-one-out approach to cell-type deconvolution such that estimated cell-type proportions are computed for every inputted DEG. Optionally, the differences between cell-type proportions before and after a gene is removed is plotted in boxplots. Then, for every gene, cwFold-changes are computed with the following formula (the example for upreguatled genes) val <- cell-preferences * cell-type_proportion * cell-type_proportion_fold-change * sign*2^abs(gene_DE$log2fc). A matrix of cwFold-changes for all DEGs are returned.

Value

List with the following elements:

`cellWeighted_Foldchange`	data frame of cellweightedFold changes for each gene.
`cellType_Proportions`	data frame of cell-type proportions from DeconRNA-seq.
`leave_one_out_proportions`	data frame of average cell-type proportions for case and control when gene is removed.
`processed_signature_matrix`	signature matrix used in final analysis.

Examples


data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
cwFC <- deconvolute_and_contextualize(count_file = bulk_normalized,
                                    signature_matrix = odds_ratio_in, DEG_list = bulk_DE_cors,
                                    case_grep = case_grep, control_grep = control_grep,
                                     max_proportion_change = max_proportion_change,
                                      print_plots = print_plots, 
                                     theSpecies = theSpecies, toSave = FALSE)

                                      
data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
cwFC <- deconvolute_and_contextualize(count_file = bulk_normalized,
                                    signature_matrix = odds_ratio_in, DEG_list = bulk_DE_cors,
                                    case_grep = case_grep, control_grep = control_grep,
                                     max_proportion_change = max_proportion_change,
                                      print_plots = print_plots, 
                                     theSpecies = theSpecies, toSave = FALSE)

Extract Markers

Description

Extracting cell-type markers from a signature matrix.

Usage

extract_genes_cell(
  geneHeat,
  cellTypes = "ALL",
  val = 1,
  isMax = FALSE,
  isPvalue = FALSE
)
extract_genes_cell(
  geneHeat,
  cellTypes = "ALL",
  val = 1,
  isMax = FALSE,
  isPvalue = FALSE
)

Arguments

`geneHeat`	The heatmap of ranks from your scRNA-seq dataset with your genes subsetted.
`cellTypes`	The cell-types that you're interested in extracting. They need to be colnames (not case sensitive).
`val`	How associated a gene is with a particular cell type to include in your list - default is slightly associated.
`isMax`	If you are taking the single best CT marker (T/F) – TRUE not recommended.
`isPvalue`	If the signature matrix is raw p-value (T/F) – TRUE not recommended.

Details

This function takes a signature matrix and extracts cell-type markers above a p-value or fold-change threshold.

Value

extract_genes_cell A vector of genes above the threshold for each sample.

Examples



data(POA_example)
Signature <- POA_example$POA_Rank_signature
RowName <- get_gene_symbol(Signature)
rownames(Signature) <-RowName$rowname
# extract genes with a -log10(Padj > 1)
Signat <- extract_genes_cell(Signature)



data(POA_example)
Signature <- POA_example$POA_Rank_signature
RowName <- get_gene_symbol(Signature)
rownames(Signature) <-RowName$rowname
# extract genes with a -log10(Padj > 1)
Signat <- extract_genes_cell(Signature)

Generate signature matrix

Description

Convert a list of cell-type markers from FindMarkers in Seurat to a signature matrix defined by odds ratio and rank.

Usage

generes_to_heatmap(
  generes = generes,
  species = "human",
  naming_preference = -9,
  rda_path = "",
  make_names = TRUE,
  internal = FALSE
)
generes_to_heatmap(
  generes = generes,
  species = "human",
  naming_preference = -9,
  rda_path = "",
  make_names = TRUE,
  internal = FALSE
)

Arguments

`generes`	A list of cell-type markers with fold-changes and p-values (FindMarkers output in Seurat).
`species`	The species of gene symbols, if not internal, "human" or "mouse".
`naming_preference`	Likely cell-types given tissues (to be passed into human_mouse_ct_marker_enrich).
`rda_path`	Path to output direcotry, if toSave is true.
`make_names`	Identify names of cell-type markers using the Fisher's exact test method (T/F).
`internal`	If this function is pre-processing from Panglao (T/F).

Details

Take a list of compiled differentially expressed genes from different cell-types, identify what the cell-types are using the Fisher's exact test, and then convert into a signature matrix for both the adjusted p-value and odds ratio.

Value

List with the following elements:

`pVal`	A dataframe containing the signature matrix of ranks (-log10(Padj) * sign(fold-change)).
`OR`	A dataframe containing the signature matrix of odds ratios.
`cellname`	A vector of the cell-labels returned from the GSVA method.
`topGenes`	the top 30 mos expressed genes in each cell-type.

Examples


data(POA_example)
 POA_generes <- POA_example$POA_generes
signature <- generes_to_heatmap(POA_generes,species = -9, make_names = FALSE)



data(POA_example)
 POA_generes <- POA_example$POA_generes
signature <- generes_to_heatmap(POA_generes,species = -9, make_names = FALSE)

Internal – get gene symbol from Panglao.db assigned gene-names (symbol-ensembl).

Description

Internal – removes Ensembl signature appended to signature matrix from Panglao and figure out species by pre-fix Ensembl of the Ensembl ID that is appended to gene names.

Usage

get_gene_symbol(wilcoxon_rank_mat_t)
get_gene_symbol(wilcoxon_rank_mat_t)

Arguments

wilcoxon_rank_mat_t

Matrix where row names are "GeneSymbol-Ensembl" (human or mouse).

Details

Internal: This function removes the ENGMUS/ENGS tag from Panglao created gene names (symbol-ENGS). From the ENSG/ENSMUS, this function determines if the species is mouse/human and returns the gene symbols.

Value

List with the following elements:

`rowname`	Genes in the signature matrix excluding the ensemble name.
`species`	"mouse" or "human" depending on appended ensembl symbols.

Examples


 # load signature
data(POA_example) 
POA_OR_signature <- POA_example$POA_OR_signature
symbols <- get_gene_symbol(POA_OR_signature)


# load signature
data(POA_example) 
POA_OR_signature <- POA_example$POA_OR_signature
symbols <- get_gene_symbol(POA_OR_signature)

Get signature matrices.

Description

This function downloads and returns signature matrices and associated cell-type labels from the scMappR_data repo.

Usage

get_signature_matrices(type = "all")
get_signature_matrices(type = "all")

Arguments

type

a character vector that can be 'all', 'pVal', or 'OR'

Value

get_signature_matrices Returns the signature matrices currently stored in scMappR_Data. Associated cell-type labels from different methods for each signature matrix is also provided.

Examples


 
signatures <- get_signature_matrices(type = "all")
 

 
signatures <- get_signature_matrices(type = "all")

gmt_example

Description

Markers of 5 glial cell-types

Usage

data(gmt)
data(gmt)

Format

A list with 5 character vectors, each containing genes.

Astrocytes_panglao: astrocyte markers identified by panglao
Schwann_panglao: Schwann markers identified by panglao
Bergmann glia_panglao: Bergmann glia markers identified by panglao
Kupffer_panglao: Kupffer markers identified by panglao
Oligodendrocyte progenitor_panglao: Oligodendrocyte progenitor markers identified by panglao

Details

A named list containing the cell-type markers of 5 glial cell types. Used for testing cell-type naming functions.

Examples

data(gmt)
data(gmt)

Pathway enrichment for cwFold-changes

Description

This function runs through each list of cell weighted Fold changes (cwFold-changes) and completes both pathway and transcription factor (TF) enrichment.

Usage

gProfiler_cellWeighted_Foldchange(
  cellWeighted_Foldchange_matrix,
  species,
  background,
  gene_cut,
  newGprofiler
)
gProfiler_cellWeighted_Foldchange(
  cellWeighted_Foldchange_matrix,
  species,
  background,
  gene_cut,
  newGprofiler
)

Arguments

`cellWeighted_Foldchange_matrix`	Matrix of cell weighted Fold changes from the deconvolute_and_contextualize functions.
`species`	Human, mouse, or a name that is compatible with gProfileR (e.g. "mmusculus").
`background`	A list of background genes to test against.
`gene_cut`	The top number of genes in pathway analysis.
`newGprofiler`	Using gProfileR or gprofiler2, (T/F).

Details

This function takes a matrix of cellWeighted_Foldchange and a species (human, mouse, or a character directly compatible with g:ProfileR). Before completing pathway analysis with g:ProfileR. Enriched pathways are stored in a list and returned.

Value

List with the following elements:

`BP`	gprofiler enrichment of biological pathways for each cell-type
`TF`	gprofiler enrichment of transcription factors for eachc cell-type.

Examples



data(PBMC_example)

bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in

case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
norm <- deconvolute_and_contextualize(count_file = bulk_normalized,
                                     signature_matrix = odds_ratio_in,
                                      DEG_list = bulk_DE_cors, case_grep = case_grep, 
                                      control_grep = control_grep,
                                       max_proportion_change = max_proportion_change,
                                       print_plots = print_plots,
                                       theSpecies = theSpecies)
background = rownames(bulk_normalized)
STVs <- gProfiler_cellWeighted_Foldchange(
 cellWeighted_Foldchange_matrix = norm$cellWeighted_Foldchange,
 species = theSpecies, background = background, gene_cut = -9,
 newGprofiler = TRUE)

 
 
data(PBMC_example)

bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in

case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
norm <- deconvolute_and_contextualize(count_file = bulk_normalized,
                                     signature_matrix = odds_ratio_in,
                                      DEG_list = bulk_DE_cors, case_grep = case_grep, 
                                      control_grep = control_grep,
                                       max_proportion_change = max_proportion_change,
                                       print_plots = print_plots,
                                       theSpecies = theSpecies)
background = rownames(bulk_normalized)
STVs <- gProfiler_cellWeighted_Foldchange(
 cellWeighted_Foldchange_matrix = norm$cellWeighted_Foldchange,
 species = theSpecies, background = background, gene_cut = -9,
 newGprofiler = TRUE)

Cell-type naming with GSVA

Description

This function computes the mean expression of every cell-type before predicting the most likely cell-type using the GSVA method.

Usage

gsva_cellIdentify(
  pbmc,
  theSpecies,
  naming_preference = -9,
  rda_path = "",
  toSave = FALSE
)
gsva_cellIdentify(
  pbmc,
  theSpecies,
  naming_preference = -9,
  rda_path = "",
  toSave = FALSE
)

Arguments

`pbmc`	Processed Seurat object without named cells.
`theSpecies`	"human" or "mouse" – it will determine which species cell-type markers will originate from.
`naming_preference`	Once top cell-type markers are identified, naming_preferences will then extract CT markers within a more appropriate tissue type.
`rda_path`	Path to pre-computed cell-type .gmt files (rda objects).
`toSave`	If scMappR is allowed to write files and directories.

Details

This function inputs a Seurat object and uses the average normalized expression of each gene in each cluster to identify cell-types using the GSVA method.

Value

List with the following elements:

`cellMarker`	Most likely cell-types predicted from CellMarker database.
`panglao`	Most likely cell-types predicted from Panglao database.
`avg_expression`	Average expression of each gene in each cell-type.

Examples



data(sm)
toProcess <- list(example = sm)
tst1 <- process_from_count(countmat_list = toProcess,name = "testProcess",
                            theSpecies = "mouse")
cellnames <- gsva_cellIdentify(pbmc = tst1, theSpecies = "mouse",
 naming_preference = "brain", rda_path = "")


 
data(sm)
toProcess <- list(example = sm)
tst1 <- process_from_count(countmat_list = toProcess,name = "testProcess",
                            theSpecies = "mouse")
cellnames <- gsva_cellIdentify(pbmc = tst1, theSpecies = "mouse",
 naming_preference = "brain", rda_path = "")

Generate Heatmap

Description

This function takes an inputted signature matrix as well as a list of genes and overlaps them. Then, if there is overlap, it prints a heatmap or barplot (depending on the number of overlapping genes). Then, for every cell-type, genes considered over-represented are saved in a list.

Usage

heatmap_generation(
  genesIn,
  comp,
  reference,
  cex = 0.8,
  rd_path = "",
  cellTypes = "ALL",
  pVal = 0.01,
  isPval = TRUE,
  isMax = FALSE,
  isBackground = FALSE,
  which_species = "human",
  toSave = FALSE,
  path = NULL
)
heatmap_generation(
  genesIn,
  comp,
  reference,
  cex = 0.8,
  rd_path = "",
  cellTypes = "ALL",
  pVal = 0.01,
  isPval = TRUE,
  isMax = FALSE,
  isBackground = FALSE,
  which_species = "human",
  toSave = FALSE,
  path = NULL
)

Arguments

`genesIn`	A list of gene symbols (all caps) to have their cell type enrichment.
`comp`	The name of the comparison.
`reference`	Path to signature matrix or the signature matrix itself.
`cex`	The size of the genes in the column label for the heatmap.
`rd_path`	The directory to RData files – if they are not in this directory, then the files will be downloaded.
`cellTypes`	Colnames of the cell-types you will extract (passed to extract_genes_cell).
`pVal`	The level of association a gene has within a cell type (passed to extract_genes_cell).
`isPval`	If the signature matrix is raw p-value (T/F) – TRUE not recommended
`isMax`	If you are taking the single best CT marker (T/F) – TRUE not recommended
`isBackground`	If the heatmap is from the entire signature matrix or just the inputted gene list (T/F). isBackground == TRUE is used for internal.
`which_species`	Species of gene symbols – "human" or "mouse" .
`toSave`	Allow scMappR to write files in the path directory (T/F).
`path`	If toSave == TRUE, path to the directory where files will be saved.

Value

List with the following elements:

`genesIn`	Vector of genes intersecting gene list and signature matrix.
`genesNoIn`	Vector of inputted genes not in signature matrix.
`geneHeat`	Signature matrix subsetted by inputted gene list
`preferences`	Cell-markers mapping to cell-types.

Examples



# load in signature matrices
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- POA_Rank_signature
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
genes <- rownames(Signature)[1:100]
heatmap_test <- heatmap_generation(genesIn = genes, "scMappR_test",
                                   reference = Signature, which_species = "mouse")

 

# load in signature matrices
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- POA_Rank_signature
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
genes <- rownames(Signature)[1:100]
heatmap_test <- heatmap_generation(genesIn = genes, "scMappR_test",
                                   reference = Signature, which_species = "mouse")

Consensus cell-type naming (Fisher's Exact)

Description

This function completes the Fisher's exact test cell-type naming for all cell-types.

Usage

human_mouse_ct_marker_enrich(
  gene_lists,
  theSpecies = "human",
  cell_marker_path = "",
  naming_preference = -9
)
human_mouse_ct_marker_enrich(
  gene_lists,
  theSpecies = "human",
  cell_marker_path = "",
  naming_preference = -9
)

Arguments

`gene_lists`	A named list of vectors containing cell-type markers (mouse or human gene-symbols) which will be named as a cell-type via the Fisher's exact test method.
`theSpecies`	The species of the gene symbols: "human" or "mouse".
`cell_marker_path`	If local, path to cell-type marker rda files, otherwise, we will try to download data files.
`naming_preference`	Either -9 if there is no expected cell-type or one of the categories from get_naming_preference_options(). This is useful if you previously have an idea of which cell-type you were going to enrich.

Details

Fisher's exact test method of cell-type identification using the Panglao and CellMarker databases. It extracts significant pathways (pFDR < 0.05). Then, if naming_preference != -9, it will extract the enriched cell-types within the cell-types identified with the naming preferences option. Generally, this method seems to be biased to cell-types with a greater number of markers.

Value

List with the following elements:

`cellTypes`	most likely marker for each cell-type from each database.
`marker_sets`	all enriched cell-types for each cluster from each dataset.

Examples



data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- POA_Rank_signature
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
genes <- rownames(Signature)[1:100]
enriched <- human_mouse_ct_marker_enrich(gene_lists = genes, theSpecies = "mouse", 
                                         cell_marker_path = "", naming_preference = "brain")
 
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- POA_Rank_signature
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
genes <- rownames(Signature)[1:100]
enriched <- human_mouse_ct_marker_enrich(gene_lists = genes, theSpecies = "mouse", 
                                         cell_marker_path = "", naming_preference = "brain")

Plot g:profileR Barplot (TF)

Description

Make a barplot of the top transcription factors enriched by gprofileR.

Usage

make_TF_barplot(ordered_back_all_tf, top_tf = 5)
make_TF_barplot(ordered_back_all_tf, top_tf = 5)

Arguments

`ordered_back_all_tf`	Output of the g:profileR function.
`top_tf`	The number of transcription factors to be plotted.

Details

This function takes a gprofileR output and prints the top "top_tfs" most significantly enriched fdr adjusted p-values before plotting the rank of their p-values.

Value

make_TF_barplot A barplot of the number of "top_tf" tf names (not motifs), ranked by -log10(Pfdr).

Examples


data(POA_example)
 POA_generes <- POA_example$POA_generes
 POA_OR_signature <- POA_example$POA_OR_signature
 POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- as.data.frame(POA_Rank_signature)
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
ordered_back_all <- gprofiler2::gost(query = rowname$rowname[1:100], organism = "mmusculus", 
ordered_query = TRUE, significant = TRUE, exclude_iea = FALSE, multi_query = FALSE, 
measure_underrepresentation = FALSE, evcodes = FALSE, user_threshold = 0.05, 
correction_method = "fdr",   numeric_ns = "", sources = c("GO:BP", "KEGG", "REAC"))  
ordered_back_all <- ordered_back_all$result
ordered_back_all <- ordered_back_all[ordered_back_all$term_size > 15 &
 ordered_back_all$term_size < 2000 & ordered_back_all$intersection_size > 2,]
ordered_back_all_tf <- gprofiler2::gost(query = rowname$rowname[1:150], organism = "mmusculus",
 ordered_query = TRUE, significant = TRUE, exclude_iea = FALSE, multi_query = FALSE,
 measure_underrepresentation = FALSE, evcodes = FALSE, user_threshold = 0.05,
  correction_method = "fdr",  numeric_ns = "", sources = c("TF"))  
ordered_back_all_tf <- ordered_back_all_tf$result
ordered_back_all_tf <- ordered_back_all_tf[ordered_back_all_tf$term_size > 15
 & ordered_back_all_tf$term_size < 5000 & ordered_back_all_tf$intersection_size > 2,]
TF = ordered_back_all_tf
BP <- ordered_back_all
bp <- plotBP(BP)
tf <- make_TF_barplot(TF)
 
data(POA_example)
 POA_generes <- POA_example$POA_generes
 POA_OR_signature <- POA_example$POA_OR_signature
 POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- as.data.frame(POA_Rank_signature)
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
ordered_back_all <- gprofiler2::gost(query = rowname$rowname[1:100], organism = "mmusculus", 
ordered_query = TRUE, significant = TRUE, exclude_iea = FALSE, multi_query = FALSE, 
measure_underrepresentation = FALSE, evcodes = FALSE, user_threshold = 0.05, 
correction_method = "fdr",   numeric_ns = "", sources = c("GO:BP", "KEGG", "REAC"))  
ordered_back_all <- ordered_back_all$result
ordered_back_all <- ordered_back_all[ordered_back_all$term_size > 15 &
 ordered_back_all$term_size < 2000 & ordered_back_all$intersection_size > 2,]
ordered_back_all_tf <- gprofiler2::gost(query = rowname$rowname[1:150], organism = "mmusculus",
 ordered_query = TRUE, significant = TRUE, exclude_iea = FALSE, multi_query = FALSE,
 measure_underrepresentation = FALSE, evcodes = FALSE, user_threshold = 0.05,
  correction_method = "fdr",  numeric_ns = "", sources = c("TF"))  
ordered_back_all_tf <- ordered_back_all_tf$result
ordered_back_all_tf <- ordered_back_all_tf[ordered_back_all_tf$term_size > 15
 & ordered_back_all_tf$term_size < 5000 & ordered_back_all_tf$intersection_size > 2,]
TF = ordered_back_all_tf
BP <- ordered_back_all
bp <- plotBP(BP)
tf <- make_TF_barplot(TF)

Internal - Pathway enrichment for cellWeighted_Foldchanges and bulk gene list

Description

This function completes pathway enrichment of cellWeighted_Foldchanges and bulk gene list.

Usage

pathway_enrich_internal(
  DEGs,
  theSpecies,
  scMappR_vals,
  background_genes,
  output_directory,
  plot_names,
  number_genes = -9,
  newGprofiler = FALSE,
  toSave = FALSE,
  path = NULL
)
pathway_enrich_internal(
  DEGs,
  theSpecies,
  scMappR_vals,
  background_genes,
  output_directory,
  plot_names,
  number_genes = -9,
  newGprofiler = FALSE,
  toSave = FALSE,
  path = NULL
)

Arguments

`DEGs`	Differentially expressed genes (gene_name, padj, log2fc).
`theSpecies`	Human, mouse, or a character that is compatible with g:ProfileR.
`scMappR_vals`	cell weighted Fold-changes of differentially expressed genes.
`background_genes`	A list of background genes to test against.
`output_directory`	Path to the directory where files will be saved.
`plot_names`	Names of output.
`number_genes`	Number of genes to if there are many, many DEGs.
`newGprofiler`	Whether to use g:ProfileR or gprofiler2 (T/F).
`toSave`	Allow scMappR to write files in the current directory (T/F).
`path`	If toSave == TRUE, path to the directory where files will be saved.

Details

Internal: Pathway analysis of differentially expressed genes (DEGs) and cell weighted Fold-changes (cellWeighted_Foldchanges) for each cell-type. Returns .RData objects of differential analysis as well as plots of the top bulk pathways. It is a wrapper for making barplots, bulk pathway analysis, and gProfiler_cellWeighted_Foldchange.

Value

List with the following elements:

`BPs`	Enriched biological pathways for each cell-type.
`TFs`	Enriched transcription factors for each cell-type.

Examples



data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
toOut <- scMappR_and_pathway_analysis(bulk_normalized, odds_ratio_in, 
                                      bulk_DE_cors, case_grep = case_grep,
                                      control_grep = control_grep, rda_path = "", 
                                      max_proportion_change = 10, print_plots = TRUE, 
                                       plot_names = "tst1", theSpecies = "human", 
                                       output_directory = "tester",
                                       sig_matrix_size = 3000, up_and_downregulated = FALSE, 
                                       internet = FALSE)


data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
toOut <- scMappR_and_pathway_analysis(bulk_normalized, odds_ratio_in, 
                                      bulk_DE_cors, case_grep = case_grep,
                                      control_grep = control_grep, rda_path = "", 
                                      max_proportion_change = 10, print_plots = TRUE, 
                                       plot_names = "tst1", theSpecies = "human", 
                                       output_directory = "tester",
                                       sig_matrix_size = 3000, up_and_downregulated = FALSE, 
                                       internet = FALSE)

PBMC_scMappR

Description

Toy example of data where cell-weighted fold-changes and related downsteam analyses can be completed.

Usage

data(PBMC_example)
data(PBMC_example)

Format

A list containing three data frames, normalized count data, a signature matrix, and a list of differentially expressed genes.

bulk_normalized: A 3231 x 9 matrix where rows are genes, columns are samples, and the matrix is filled with CPM normalized counts.
odds_ratio_in: A 2336 x 7 matrix where rows are genes, columns are cell-types and matrix is filled with the odds-ratio that a gene is in each cell-type.
bulk_DE_cors: A 59 x 3 matrix of sex-specific genes found between male and female PBMC samples (female biased = upregulated). row and rownames are genes, columns are gene name, FDR adjusted p-value, and log2 fold-change. DEGs were computed with DESeq2 and genes with a log2FC > 1 were kept.

Details

A named list called "PBMC_example" containing the count data, signature matrix, and DEGs. The count data and signature matrix are shortened to fit the size of the package and do not reflect biologically relevant data.

Examples

data(PBMC_example)
data(PBMC_example)

Plot gProfileR Barplot

Description

Make a barplot of the top biological factors enriched by g:ProfileR.

Usage

plotBP(ordered_back_all, top_bp = 10)
plotBP(ordered_back_all, top_bp = 10)

Arguments

`ordered_back_all`	Output of the g:ProfileR function.
`top_bp`	The number of pathways you want to plot.

Details

This function takes a gProfileR output and prints the top "top_bp" most significantly enriched FDR adjusted p-values before plotting the rank of their p-values.

Value

plotBP A barplot of the number of "top_bp" pathways, ranked by -log10(Pfdr).

Examples


data(POA_example)
 POA_generes <- POA_example$POA_generes
 POA_OR_signature <- POA_example$POA_OR_signature
 POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- as.data.frame(POA_Rank_signature)
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
ordered_back_all <- gprofiler2::gost(query = rowname$rowname[1:100], organism = "mmusculus",
 ordered_query = TRUE, significant = TRUE, exclude_iea = FALSE, multi_query = FALSE,
  measure_underrepresentation = FALSE, evcodes = FALSE, user_threshold = 0.05,
   correction_method = "fdr",   numeric_ns = "", sources = c("GO:BP", "KEGG", "REAC"))  
ordered_back_all <- ordered_back_all$result
ordered_back_all <- ordered_back_all[ordered_back_all$term_size > 15
 & ordered_back_all$term_size < 2000 & ordered_back_all$intersection_size > 2,]
ordered_back_all_tf <- gprofiler2::gost(query = rowname$rowname[1:150], organism = "mmusculus",
 ordered_query = TRUE, significant = TRUE, exclude_iea = FALSE, multi_query = FALSE,
  measure_underrepresentation = FALSE, evcodes = FALSE, user_threshold = 0.05,
   correction_method = "fdr",  numeric_ns = "", sources = c("TF"))  
ordered_back_all_tf <- ordered_back_all_tf$result
ordered_back_all_tf <- ordered_back_all_tf[ordered_back_all_tf$term_size > 15 
& ordered_back_all_tf$term_size < 5000 & ordered_back_all_tf$intersection_size > 2,]
TF = ordered_back_all_tf
BP <- ordered_back_all
bp <- plotBP(ordered_back_all = BP)
tf <- make_TF_barplot(ordered_back_all_tf = TF)

 
data(POA_example)
 POA_generes <- POA_example$POA_generes
 POA_OR_signature <- POA_example$POA_OR_signature
 POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- as.data.frame(POA_Rank_signature)
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
ordered_back_all <- gprofiler2::gost(query = rowname$rowname[1:100], organism = "mmusculus",
 ordered_query = TRUE, significant = TRUE, exclude_iea = FALSE, multi_query = FALSE,
  measure_underrepresentation = FALSE, evcodes = FALSE, user_threshold = 0.05,
   correction_method = "fdr",   numeric_ns = "", sources = c("GO:BP", "KEGG", "REAC"))  
ordered_back_all <- ordered_back_all$result
ordered_back_all <- ordered_back_all[ordered_back_all$term_size > 15
 & ordered_back_all$term_size < 2000 & ordered_back_all$intersection_size > 2,]
ordered_back_all_tf <- gprofiler2::gost(query = rowname$rowname[1:150], organism = "mmusculus",
 ordered_query = TRUE, significant = TRUE, exclude_iea = FALSE, multi_query = FALSE,
  measure_underrepresentation = FALSE, evcodes = FALSE, user_threshold = 0.05,
   correction_method = "fdr",  numeric_ns = "", sources = c("TF"))  
ordered_back_all_tf <- ordered_back_all_tf$result
ordered_back_all_tf <- ordered_back_all_tf[ordered_back_all_tf$term_size > 15 
& ordered_back_all_tf$term_size < 5000 & ordered_back_all_tf$intersection_size > 2,]
TF = ordered_back_all_tf
BP <- ordered_back_all
bp <- plotBP(ordered_back_all = BP)
tf <- make_TF_barplot(ordered_back_all_tf = TF)

Preoptic_Area

Description

Toy data for tissue_scMappR_custom, tissue_scMappR_internal, generes_to_heatmap.

Usage

data(POA_example)
data(POA_example)

Format

A list containing three objects: summary statistics of cell-type markers, a signature matrix of odds ratios, and a signature matrix of ranks.

POA_generes: A list of 27 data frames containing (up to 30) cell-type markers. Each element of the list is a dataframe where rows are genes, and columns are p-value, log2FC, percentage of cells expressing gene in cell-type, percentage of cells expressing gene in other cell-types, and FDR adjusted p-value.
POA_OR_signature: A 266 x 27 matrix where rows are genes, columns are cell-types and matrix is filled with the odds-ratio that a gene is in each cell-type.
POA_Rank_signature: A 266 x 27 matrix of matrix where rows are genes, columns are cell-types and matrix is filled with the rank := -log10(P_fdr) that a gene is in each cell-type.

Details

A named list called POA_example (pre-optic area example) containing three objects, POA_generes: a list of truncated dataframes containing summary statistics for each cell-type marker, POA_OR_signature a truncated signature matrix of odds ratio's for cell-types in the POA, and POA_Rank_signature a truncated signature matrix of -log10(Padj) for cell-type markers in the POA.

Examples

data(POA_example)
data(POA_example)

Count Matrix To Signature Matrix

Description

This function takes a list of count matrices, processes them, calls cell-types, and generates signature matrices.

Usage

process_dgTMatrix_lists(
  dgTMatrix_list,
  name,
  species_name,
  naming_preference = -9,
  rda_path = "",
  panglao_set = FALSE,
  haveUMAP = FALSE,
  saveSCObject = FALSE,
  internal = FALSE,
  toSave = FALSE,
  path = NULL,
  use_sctransform = FALSE,
  test_ctname = "wilcox",
  genes_integrate = 2000,
  genes_include = FALSE
)
process_dgTMatrix_lists(
  dgTMatrix_list,
  name,
  species_name,
  naming_preference = -9,
  rda_path = "",
  panglao_set = FALSE,
  haveUMAP = FALSE,
  saveSCObject = FALSE,
  internal = FALSE,
  toSave = FALSE,
  path = NULL,
  use_sctransform = FALSE,
  test_ctname = "wilcox",
  genes_integrate = 2000,
  genes_include = FALSE
)

Arguments

`dgTMatrix_list`	A list of matrices in the class of dgTMatrix object – sparce object – compatible with Seurat rownames should be of the same species for each.
`name`	The name of the outputted signature matrices, cell-type preferences, and Seurat objects if you choose to save them.
`species_name`	Mouse or human symbols, -9 if internal as Panglao objects have gene symbol and ensembl combined.
`naming_preference`	For cell-type naming, see if cell-types given the inputted tissues are more likely to be named within one of the categories. These categories are: "brain", "epithelial", "endothelial", "blood", "connective","eye", "epidermis", "Digestive", "Immune", "pancreas", "liver", "reproductive", "kidney", "respiratory".
`rda_path`	If saved, directory to where data from scMappR_data is downloaded.
`panglao_set`	If the inputted matrices are from Panglao (i.e. if they're internal).
`haveUMAP`	Save the UMAPs - requires additional packages (see Seurat for details).
`saveSCObject`	Save the Seurat object as an RData object (T/F).
`internal`	Was this used as part of the internal processing of Panglao datasets (T/F).
`toSave`	Allow scMappR to write files in the current directory (T/F)
`path`	If toSave == TRUE, path to the directory where files will be saved.
`use_sctransform`	If you should use sctransform or the Normalize/VariableFeatures/ScaleData pipeline (T/F).
`test_ctname`	statistical test for calling CT markers – must be in Seurat
`genes_integrate`	The number of genes to include in the integration anchors feature when combining datasets.
`genes_include`	TRUE or FALSE – include 2000 genes in signature matrix or all matrix.

Details

This function is a one line wrapper to process count matrices into a signature matrix. It combines process_from_count, two methods of identifying cell-type identities (GSVA and Fisher's test). Then, it takes the output of cell-type markers and converts it into a signature matrix of p-value ranks and odds ratios. It saves the Seurat object (if chosen with saveSCObject), cell-type identities from GSVA (its own object), and the signature matrices. Cell-type marker outputs are also saved in the generes .RData list. This is a list of cell-types containing all of the cell-type markers found with the FindMarkers function. Names of the generes lists and the signature matrices are kept.

Value

List with the following elements:

`wilcoxon_rank_mat_t`	A dataframe containing the signature matrix of ranks (-log10(Padj) * sign(fold-change)).
`wilcoxon_rank_mat_or`	A dataframe containing the signature matrix of odds-ratios.
`generes`	All cell-type markers for each cell-type with p-value and fold changes.
`cellLabel`	matrix where each row is a cluster and each column provides information on the cell-type. Columns provide info on the cluster from seurat, the cell-type label from CellMarker and Panglao using the fisher's exact test and GSVA, and the top 30 markers per cluser.

Examples


data(sm)
toProcess <- list(example = sm)
tst1 <- process_dgTMatrix_lists(dgTMatrix_list = toProcess, name = "testProcess",
                               species_name = "mouse", naming_preference = "eye", rda_path = "")


data(sm)
toProcess <- list(example = sm)
tst1 <- process_dgTMatrix_lists(dgTMatrix_list = toProcess, name = "testProcess",
                               species_name = "mouse", naming_preference = "eye", rda_path = "")

Count Matrix To Seurat Object

Description

This function processes a list of count matrices (same species/gene symbols in each list) and converts them to a Seurat object.

Usage

process_from_count(
  countmat_list,
  name,
  theSpecies = -9,
  haveUmap = FALSE,
  saveALL = FALSE,
  panglao_set = FALSE,
  toSave = FALSE,
  path = NULL,
  use_sctransform = FALSE,
  genes_integrate = 2000,
  genes_include = FALSE
)
process_from_count(
  countmat_list,
  name,
  theSpecies = -9,
  haveUmap = FALSE,
  saveALL = FALSE,
  panglao_set = FALSE,
  toSave = FALSE,
  path = NULL,
  use_sctransform = FALSE,
  genes_integrate = 2000,
  genes_include = FALSE
)

Arguments

`countmat_list`	A list of count matrices that will be integrated using the IntegrationAnchors features they should have the same rownames. A dgCMatrix or matrix object is also acceptable, and no samples will be integrated.
`name`	The output of the normalized and fused Seurat object if you choose to keep it.
`theSpecies`	Gene symbols for human, mouse, or -9 if internal. If your species is not human or mouse gene symbols, make sure that you have "MT-" before your mitochondrial gene names then pick "human".
`haveUmap`	Write a UMAP (T/F).
`saveALL`	Save the Seurat object generated (T/F).
`panglao_set`	If the function is being used from internal (T/F).
`toSave`	Allows scMappR to print files and make directories locally (T/F).
`path`	If toSave == TRUE, path to the directory where files will be saved.
`use_sctransform`	If you should use sctransform or the Normalize/VariableFeatures/ScaleData pipeline (T/F).
`genes_integrate`	The number of genes to include in the integration anchors feature when combining datasets
`genes_include`	TRUE or FALSE – include 2000 genes in signature matrix or all matrix.

Details

This function takes a list of count matrices and returns a Seurat object of the count matrices integrated using Seurat v4 (and IntegrationAnchors feature). Different normalization features such as the SCTransform pipeline are also available in this function. Different options are used when the function is being ran internally (i.e. reprocessing count matrices from PanglaoDB) or if it is running from custom scRNA-seq data. Larger scRNA-seq datasets can take considerable amounts of memory and run-time. See Seurat for details.

Value

process_from_count A processed and integrated Seurat object that has been scaled and clustered. It can be returned as an internal object or also stored as an RData object if necessary.

Examples



data(sm)
toProcess <- list(example = sm)
tst1 <- process_from_count(countmat_list = toProcess, name = "testProcess", 
                           theSpecies = "mouse")


data(sm)
toProcess <- list(example = sm)
tst1 <- process_from_count(countmat_list = toProcess, name = "testProcess", 
                           theSpecies = "mouse")

Generate cellWeighted_Foldchanges, visualize, and enrich.

Description

This function generates cell weighted Fold-changes (cellWeighted_Foldchange), visualizes them in a heatmap, and completes pathway enrichment of cellWeighted_Foldchanges and the bulk gene list using g:ProfileR.

Usage

scMappR_and_pathway_analysis(
  count_file,
  signature_matrix,
  DEG_list,
  case_grep,
  control_grep,
  rda_path = "",
  max_proportion_change = -9,
  print_plots = T,
  plot_names = "scMappR",
  theSpecies = "human",
  output_directory = "scMappR_analysis",
  sig_matrix_size = 3000,
  drop_unknown_celltype = TRUE,
  internet = TRUE,
  up_and_downregulated = FALSE,
  gene_label_size = 0.4,
  number_genes = -9,
  toSave = FALSE,
  newGprofiler = TRUE,
  path = NULL,
  deconMethod = "DeconRNASeq",
  rareCT_filter = TRUE
)
scMappR_and_pathway_analysis(
  count_file,
  signature_matrix,
  DEG_list,
  case_grep,
  control_grep,
  rda_path = "",
  max_proportion_change = -9,
  print_plots = T,
  plot_names = "scMappR",
  theSpecies = "human",
  output_directory = "scMappR_analysis",
  sig_matrix_size = 3000,
  drop_unknown_celltype = TRUE,
  internet = TRUE,
  up_and_downregulated = FALSE,
  gene_label_size = 0.4,
  number_genes = -9,
  toSave = FALSE,
  newGprofiler = TRUE,
  path = NULL,
  deconMethod = "DeconRNASeq",
  rareCT_filter = TRUE
)

Arguments

`count_file`	Normalized (i.e. TPM, RPKM, CPM) RNA-seq count matrix where rows are gene symbols and columns are individuals. Inputted data should be a data.frame or matrix. A character vector to a tsv file where this data can be loaded is also acceptable. Gene symbols from the count file, signature matrix, and DEG list should all match (case sensitive, gene symbol or ensembl, etc.)
`signature_matrix`	Signature matrix: a gene by cell-type matrix populated with the fold-change of gene expression in cell-type marker "i" vs all other cell-types. Object should be a data.frame or matrix.
`DEG_list`	An object with the first column as gene symbols within the bulk dataset (doesn't have to be in signature matrix), second column is the adjusted p-value, and the third the log2FC path to a .tsv file containing this info is also acceptable.
`case_grep`	A character representing what designates the "cases" (i.e. upregulated is 'case' biased) in the columns of the count file. A numeric vector of the index of "cases" is also acceptable. Tag in the column name for cases (i.e. samples representing upregulated) OR an index of cases.
`control_grep`	A character representing what designates the "control" (i.e. downregulated is 'control biased) in the columns of the count file. A numeric vector of the index of "control" is also acceptable. Tag in the column name for cases (i.e. samples representing upregulated) OR an index of cases.
`rda_path`	If downloaded, path to where data from scMappR_data is stored.
`max_proportion_change`	Maximum cell-type proportion change – may be useful if there are many rare cell-type. Alternatively, if a cell-type is only present in one condition but not the other, it will prevent possible infinite or 0 cwFold-changes.
`print_plots`	Whether boxplots of the estimated CT proportion for the leave-one-out method of CT deconvolution should be printed. The same name of the plots will be completed for top pathways.
`plot_names`	The prefix of plot pdf files.
`theSpecies`	human, mouse, or a species directly compatible with gProfileR (i.e. g:ProfileR).
`output_directory`	The name of the directory that will contain output of the analysis.
`sig_matrix_size`	Maximum number of genes in signature matrix for cell-type deconvolution.
`drop_unknown_celltype`	Whether or not to remove "unknown" cell-types from the signature matrix.
`internet`	Whether you have stable Wifi (T/F).
`up_and_downregulated`	Whether you are additionally splitting up/downregulated genes (T/F).
`gene_label_size`	The size of the gene label on the plot.
`number_genes`	The number of genes to cut-off for pathway analysis (good with many DEGs).
`toSave`	Allow scMappR to write files in the current directory (T/F).
`newGprofiler`	Whether to use gProfileR or gprofiler2 (T/F).
`path`	If toSave == TRUE, path to the directory where files will be saved.
`deconMethod`	Which RNA-seq deconvolution method to use to estimate cell-type proporitons. Options are "WGCNA", "DCQ", or "DeconRNAseq"
`rareCT_filter`	option to keep cell-types rarer than 0.1 percent of the population (T/F). Setting to FALSE may lead to false-positives.

Details

This function generates cellWeighted_Foldchanges for every cell-type (see deconvolute_and_contextualize), as well as accompanying data such as cell-type proportions with the DeconRNA-seq, WGCNA, or DCQ methods. Then, it generates heatmaps of all cellWeighted_Foldchanges, cellWeighted_Foldchanges overlapping with the signature matrix, the entire signature matrix, the cell-type preference values from the signature matrix that overlap with inputted differentially expressed genes. Then, assuming there is available internet, it will complete gProfileR of the reordered cellWeighted_Foldchanges as well as a the ordered list of genes. This function is a wrapper for deconvolute_and_contextualize and pathway_enrich_internal and the primary function within the package.

Value

List with the following elements:

`cellWeighted_Foldchanges`	Cellweighted Fold-changes for all differentially expressed genes.
`paths`	Enriched biological pathways for each cell-type.
`TFs`	Enriched TFs for each cell-type.

Examples


data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
toOut <- scMappR_and_pathway_analysis(count_file = bulk_normalized, 
                                      signature_matrix = odds_ratio_in, 
                                      DEG_list = bulk_DE_cors, case_grep = case_grep,
                                      control_grep = control_grep, rda_path = "", 
                                      max_proportion_change = 10, print_plots = TRUE, 
                                      plot_names = "tst1", theSpecies = "human", 
                                      output_directory = "tester",
                                      sig_matrix_size = 3000,
                                      up_and_downregulated = FALSE, 
                                      internet = FALSE)



data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"
toOut <- scMappR_and_pathway_analysis(count_file = bulk_normalized, 
                                      signature_matrix = odds_ratio_in, 
                                      DEG_list = bulk_DE_cors, case_grep = case_grep,
                                      control_grep = control_grep, rda_path = "", 
                                      max_proportion_change = 10, print_plots = TRUE, 
                                      plot_names = "tst1", theSpecies = "human", 
                                      output_directory = "tester",
                                      sig_matrix_size = 3000,
                                      up_and_downregulated = FALSE, 
                                      internet = FALSE)

scMappR_tissues

Description

Tissues available in scMappR.

Usage

data(scMappR_tissues)
data(scMappR_tissues)

Format

A vector of tissue names available for tissue_scMappR_internal or to download and use in scMappR_and_pathway_analysis.

scMappR_tissues: A list of 174 tissue names from PanglaoDB.

Details

A vector of tissues available in scMappR.

Examples

data(scMappR_tissues)
data(scMappR_tissues)

Identify all cell-type markers

Description

Takes processed Seurat matrix and identifies cell-type markers with FindMarkers in Seurat.

Usage

seurat_to_generes(pbmc, test = "wilcox")
seurat_to_generes(pbmc, test = "wilcox")

Arguments

`pbmc`	Processed Seurat object.
`test`	statistical test for calling CT markers – must be in Seurat.

Details

Internal: This function runs the FindMarkers function from Seurat in a loop, will use the Seurat v2 or Seurat v3 object after identifying which Seurat object is inputted. It then takes the output of the FindMarkers and puts it in a list, returning it.

Value

seurat_to_generes A list of genes where their over-representation in the i'th cell-type is computed. Each element contains the gene name, adjusted p-value, and the log2Fold-Change of each gene being present in that cell-type.

Examples

  
 
data(sm)
toProcess <- list(example = sm)
tst1 <- process_from_count(countmat_list = toProcess,name = "testProcess",
                          theSpecies = "mouse")
generes <- seurat_to_generes(pbmc = tst1)


data(sm)
toProcess <- list(example = sm)
tst1 <- process_from_count(countmat_list = toProcess,name = "testProcess",
                          theSpecies = "mouse")
generes <- seurat_to_generes(pbmc = tst1)

Single cell-type gene preferences

Description

Measure enrichment of individual cell-types in a signature matrix.

Internal function as part of tissue_scMappR_internal(). This function takes genes preferentially expressed within a gene list, each cell-type and the background (i.e. all genes within the signature matrix) before completing the cell-type specific enrichment of the inputted gene list on each cell type. This function then returns a table describing the cell-type enrichments (p-value and odds ratio) of each cell-type.

Usage

single_gene_preferences(
  hg_short,
  hg_full,
  study_name,
  outDir,
  toSave = FALSE,
  path = NULL
)
single_gene_preferences(
  hg_short,
  hg_full,
  study_name,
  outDir,
  toSave = FALSE,
  path = NULL
)

Arguments

`hg_short`	A list with two objects: a "preferences" and a "genesIn". Preferences is a list of gene symbols over-represented in each cell-type and genesIn were all the inputted genes.
`hg_full`	The same as hg_short but for every gene in the signature matrix.
`study_name`	Name of output table.
`outDir`	Directory where table is outputted.
`toSave`	Allow scMappR to write files in the current directory (T/F).
`path`	If toSave == TRUE, path to the directory where files will be saved.

Value

single_gene_preferences A gene-set enrichment table of individual cell-type enrichment.

Examples



# load in signature matrices
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
sig <- get_gene_symbol(POA_Rank_signature)
Signature <- POA_Rank_signature
rownames(Signature) <- sig$rowname
genes <- rownames(Signature)[1:60]
heatmap_test <- tissue_scMappR_custom(gene_list = genes, signature_matrix = Signature,
                                      output_directory =  "scMappR_test", toSave = FALSE)
single_preferences <- heatmap_test$single_celltype_preferences
 
# load in signature matrices
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
sig <- get_gene_symbol(POA_Rank_signature)
Signature <- POA_Rank_signature
rownames(Signature) <- sig$rowname
genes <- rownames(Signature)[1:60]
heatmap_test <- tissue_scMappR_custom(gene_list = genes, signature_matrix = Signature,
                                      output_directory =  "scMappR_test", toSave = FALSE)
single_preferences <- heatmap_test$single_celltype_preferences

single_cell_process

Description

Example data for processing scRNA-seq count data with Seurat.

Usage

data(sm)
data(sm)

Format

A 752 x 236 matrix of class dgCMatrix where rows are genes and columns are cells. Data matrix is filled with counts detected from scRNAseq.

TCTCTAACACAGGCCT: Barcode of one of the sequenced cells present. Each column is the count from a scRNA-seq dataset reprocessed by PanglaoDB.

Details

A dgCMatrix object containing count data for scRNA-seq processing.

Examples

data(sm)
data(sm)

tissue_by_celltype_enrichment

Description

This function uses a Fisher's-exact-test to rank gene-set enrichment.

Usage

tissue_by_celltype_enrichment(
  gene_list,
  species,
  name = "CT_Tissue_example",
  p_thresh = 0.05,
  rda_path = "",
  isect_size = 3,
  return_gmt = FALSE
)
tissue_by_celltype_enrichment(
  gene_list,
  species,
  name = "CT_Tissue_example",
  p_thresh = 0.05,
  rda_path = "",
  isect_size = 3,
  return_gmt = FALSE
)

Arguments

`gene_list`	A character vector of gene symbols with the same designation (e.g. mouse symbol - mouse, human symbol - human) as the gene set database.
`species`	Species of cell-type marker to use ('human' or 'mouse').
`name`	Name of the pdf to be printed.
`p_thresh`	The Fisher's test cut-off for a cell-marker to be enriched.
`rda_path`	Path to a .rda file containing an object called "gmt". Either human or mouse cell-type markers split by experiment. If the correct file isn't present they will be downloaded from https://github.com/wilsonlabgroup/scMappR_Data.
`isect_size`	Number of genes in your list and the cell-type.
`return_gmt`	Return .gmt file – recommended if downloading from online as it may have updated (T/F).

Details

Complete a Fisher's-exact test of an input list of genes against one of the two curated tissue by cell-type marker datasets from scMappR.

Value

List with the following elements:

`enriched`	Data frame of enriched cell-types from tissues.
`gmt`	Cell-markers in enriched cell-types from tissues.

Examples



data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- POA_Rank_signature
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
genes <- rownames(Signature)[1:100]

enriched <- tissue_by_celltype_enrichment(gene_list = genes, 
species = "mouse",p_thresh = 0.05, isect_size = 3)


 
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
Signature <- POA_Rank_signature
rowname <- get_gene_symbol(Signature)
rownames(Signature) <- rowname$rowname
genes <- rownames(Signature)[1:100]

enriched <- tissue_by_celltype_enrichment(gene_list = genes, 
species = "mouse",p_thresh = 0.05, isect_size = 3)

Gene List Visualization and Enrichment with Custom Signature Matrix

Description

This function visualizes signature matrix, clusters subsetted genes, completes enrichment of individual cell-types and co-enrichment.

Usage

tissue_scMappR_custom(
  gene_list,
  signature_matrix,
  output_directory = "custom_test",
  toSave = FALSE,
  path = NULL,
  gene_cutoff = 1,
  is_pvalue = TRUE
)
tissue_scMappR_custom(
  gene_list,
  signature_matrix,
  output_directory = "custom_test",
  toSave = FALSE,
  path = NULL,
  gene_cutoff = 1,
  is_pvalue = TRUE
)

Arguments

`gene_list`	A list of gene symbols matching that of the signature_matrix. Any gene symbol is acceptable.
`signature_matrix`	Pre-computed signature matrix with matching gene names.
`output_directory`	Directory made containing output of functions.
`toSave`	Allow scMappR to write files in the current directory (T/F).
`path`	If toSave == TRUE, path to the directory where files will be saved.
`gene_cutoff`	Value cut-off (generally rank := log10(Padj)) for a gene to be considered a marker.
`is_pvalue`	If signature matrix is p-value before rank is applied (not recommended) (T/F).

Details

This function is roughly the same as tissue_scMappR_internal, however now there is a custom signature matrix. It generates a heatmap of the signature matrix and your inputted gene list, as well as single cell-type and co-celltype enrichment.

Value

List with the following elements:

`background_heatmap`	Data frame of the entire gene by cell-type signature matrix inputted.
`gene_list_heatmap`	Data frame of inputted signature matrix subsetted by input genes.
`single_celltype_preferences`	Data frame of enriched cell-types.
`group_celtype_preference`	Data frame of groups of cell-types enriched by the same genes.

Examples




# load in signature matrices
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
sig <- get_gene_symbol(POA_Rank_signature)
Signature <- POA_Rank_signature
rownames(Signature) <- sig$rowname
genes <- rownames(Signature)[1:60]
heatmap_test <- tissue_scMappR_custom(gene_list =  genes, signature_matrix = Signature,
                                      output_directory =  "scMappR_test", toSave = FALSE)



# load in signature matrices
data(POA_example)
POA_generes <- POA_example$POA_generes
POA_OR_signature <- POA_example$POA_OR_signature
POA_Rank_signature <- POA_example$POA_Rank_signature
sig <- get_gene_symbol(POA_Rank_signature)
Signature <- POA_Rank_signature
rownames(Signature) <- sig$rowname
genes <- rownames(Signature)[1:60]
heatmap_test <- tissue_scMappR_custom(gene_list =  genes, signature_matrix = Signature,
                                      output_directory =  "scMappR_test", toSave = FALSE)

Gene List Visualization and Enrichment (Internal)

Description

This function loops through every signature matrix in a particular tissue and generates heatmaps, cell-type preferences, and co-enrichment.

Usage

tissue_scMappR_internal(
  gene_list,
  species,
  output_directory,
  tissue,
  rda_path = "",
  cluster = "Pval",
  genecex = 0.01,
  raw_pval = FALSE,
  path = NULL,
  toSave = FALSE,
  drop_unkown_celltype = FALSE
)
tissue_scMappR_internal(
  gene_list,
  species,
  output_directory,
  tissue,
  rda_path = "",
  cluster = "Pval",
  genecex = 0.01,
  raw_pval = FALSE,
  path = NULL,
  toSave = FALSE,
  drop_unkown_celltype = FALSE
)

Arguments

`gene_list`	A list of gene symbols, mouse or human.
`species`	"mouse", "human" or "-9" if using a precomputed signature matrix.
`output_directory`	If toSave = TRUE, the name of the output directory that would be built.
`tissue`	Name of the tissue in "get_tissues".
`rda_path`	Path to the .rda file containing all of the signature matrices.
`cluster`	'Pval' or 'OR' depending on if you want to cluster odds ratios or p-values of cell-type preferences.
`genecex`	The size of the gene names of the rows in the heatmap.
`raw_pval`	If the inputted signature matrix are raw (untransformed) p-values – recommended to generate rank first (T/F).
`path`	If toSave == TRUE, path to the directory where files will be saved.
`toSave`	Allow scMappR to write files in the current directory (T/F).
`drop_unkown_celltype`	Whether or not to remove "unknown" cell-types from the signature matrix (T/F).

Details

This function takes a list of genes and a tissue that is contained in current signature matrices before and generating heatmaps of cell-type preferences. It then completes cell-type enrichment of each individual cell-type, then, if more than two cell-types are significantly enriched, co-enrichment. of those enriched cell-types is then computed.

Value

List with the following elements:

`background_heatmap`	Data frame of the entire gene by cell-type signature matrix inputted.
`gene_list_heatmap`	Data frame of inputted signature matrix subsetted by input genes.
`single_celltype_preferences`	Data frame of enriched cell-types.
`group_celtype_preference`	Data frame of groups of cell-types enriched by the same genes.

Examples



data(POA_example) # region to preoptic area
Signature <- POA_example$POA_Rank_signature # signature matrix
rowname <- get_gene_symbol(Signature) # get signature
rownames(Signature) <- rowname$rowname
genes <- rownames(Signature)[1:60]
rda_path1 = "" # data directory (if it exists)

# set toSave = TRUE and path = output directory of your choice
internal <- tissue_scMappR_internal(gene_list = genes, species = "mouse",
                                   output_directory = "scMappR_TesInternal",
                                   tissue = "hypothalamus", toSave = FALSE) 

 
data(POA_example) # region to preoptic area
Signature <- POA_example$POA_Rank_signature # signature matrix
rowname <- get_gene_symbol(Signature) # get signature
rownames(Signature) <- rowname$rowname
genes <- rownames(Signature)[1:60]
rda_path1 = "" # data directory (if it exists)

# set toSave = TRUE and path = output directory of your choice
internal <- tissue_scMappR_internal(gene_list = genes, species = "mouse",
                                   output_directory = "scMappR_TesInternal",
                                   tissue = "hypothalamus", toSave = FALSE)

To Character.

Description

This function checks if your vector is not a character and if not, will convert it to a character.

Usage

tochr(x)
tochr(x)

Arguments

`x`	A character, factor or numeric vector.

Value

tochr Returns a character vector.

Examples


 
 # vector of factors
 fact <- factor(c("a", "b", "c", "d"))
 # convert to character
 char <- tochr(x = fact)
 
# vector of factors
 fact <- factor(c("a", "b", "c", "d"))
 # convert to character
 char <- tochr(x = fact)

To Numeric.

Description

This function checks if your vector is not a character and if it is, then converts it to a numeric.

Usage

toNum(x)
toNum(x)

Arguments

`x`	A character, factor, or numeric vector.

Value

toNum Returns a numeric vector.

Examples


 
 # vector of factors
fact <- factor(c("1", "2", "3", "4"))
# convert to numeric
num <- toNum(x = fact)
 
# vector of factors
fact <- factor(c("1", "2", "3", "4"))
# convert to numeric
num <- toNum(x = fact)

Extract Top Markers

Description

Internal – Extracts strongest cell-type markers from a Seurat object.

Usage

topgenes_extract(generes, padj = 0.05, FC = 1.5, topNum = 30)
topgenes_extract(generes, padj = 0.05, FC = 1.5, topNum = 30)

Arguments

`generes`	A list of cell-type markers with fold-changes and p-values (FindMarkers output in Seurat).
`padj`	The p-value (FDR) cutoff.
`FC`	The fold-change cutoff.
`topNum`	The number of genes to extract.

Details

Internal, this function runs through a list of outputs from FindMarkers objects in Seurat and will extract genes past a padj and fold-change threshold. Then it extracts the topNum number of genes. if you have not used the FindMarkers function, then a list of summary statistics with fold-change designated by avg_logFC and p-val by p_val_adj.

Value

topgenes_extract Returns a list of character vectors with the top (topNum) of gene markers for each cell-type.

Examples

 
 
 # load generes object
 data(POA_example)
 topGenes <- topgenes_extract(generes = POA_example$POA_generes)


# load generes object
 data(POA_example)
 topGenes <- topgenes_extract(generes = POA_example$POA_generes)

two_method_pathway_enrichment

Description

Pathway analysis of each cell-type based on cell-type specificity and rank improvement by scMappR.

Usage

two_method_pathway_enrichment(
  DEG_list,
  theSpecies,
  scMappR_vals,
  background_genes = NULL,
  output_directory = "output",
  plot_names = "reweighted",
  number_genes = -9,
  newGprofiler = TRUE,
  toSave = FALSE,
  path = NULL
)
two_method_pathway_enrichment(
  DEG_list,
  theSpecies,
  scMappR_vals,
  background_genes = NULL,
  output_directory = "output",
  plot_names = "reweighted",
  number_genes = -9,
  newGprofiler = TRUE,
  toSave = FALSE,
  path = NULL
)

Arguments

`DEG_list`	Differentially expressed genes (gene_name, padj, log2fc).
`theSpecies`	Human, mouse, or a character that is compatible with g:ProfileR.
`scMappR_vals`	cell weighted Fold-changes of differentially expressed genes.
`background_genes`	A list of background genes to test against. NULL assumes all genes in g:profileR gene set databases.
`output_directory`	Path to the directory where files will be saved.
`plot_names`	Names of output.
`number_genes`	Number of genes to if there are many, many DEGs.
`newGprofiler`	Whether to use g:ProfileR or gprofiler2 (T/F).
`toSave`	Allow scMappR to write files in the current directory (T/F).
`path`	If toSave == TRUE, path to the directory where files will be saved.

Details

This function re-ranks cwFoldChanges based on their absolute cell-type specificity scores (per-celltype) as well as their rank increase in cell-type specificity before completing an ordered pathway analysis. In the second method, only genes with a rank increase in cell-type specificity were included.

Value

List with the following elements:

`rank_increase`	A list containing the degree of rank change between bulk DE genes and cwFold-changes. Pathway enrichment and TF enrichment of these reranked genes.
`non_rank_increase`	list of DFs containing the pathway and TF enrichment of cwFold-changes.

Examples



# load data for scMappR
data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"

# calculate cwFold-changes
toOut <- scMappR_and_pathway_analysis(count_file = bulk_normalized, 
                                      signature_matrix = odds_ratio_in, 
                                      DEG_list = bulk_DE_cors, case_grep = case_grep,
                                      control_grep = control_grep, rda_path = "", 
                                      max_proportion_change = 10, print_plots = TRUE, 
                                      plot_names = "tst1", theSpecies = "human", 
                                      output_directory = "tester",
                                      sig_matrix_size = 3000,
                                      up_and_downregulated = FALSE, 
                                      internet = FALSE)

# complete pathway enrichment using both methods
twoOutFiles <- two_method_pathway_enrichment(DEG_list = bulk_DE_cors,theSpecies = "human",
scMappR_vals = toOut$cellWeighted_Foldchange, background_genes = rownames(bulk_normalized), 
output_directory = "newfun_test",plot_names = "nonreranked_", toSave = FALSE)


 
# load data for scMappR
data(PBMC_example)
bulk_DE_cors <- PBMC_example$bulk_DE_cors
bulk_normalized <- PBMC_example$bulk_normalized
odds_ratio_in <- PBMC_example$odds_ratio_in
case_grep <- "_female"
control_grep <- "_male"
max_proportion_change <- 10
print_plots <- FALSE
theSpecies <- "human"

# calculate cwFold-changes
toOut <- scMappR_and_pathway_analysis(count_file = bulk_normalized, 
                                      signature_matrix = odds_ratio_in, 
                                      DEG_list = bulk_DE_cors, case_grep = case_grep,
                                      control_grep = control_grep, rda_path = "", 
                                      max_proportion_change = 10, print_plots = TRUE, 
                                      plot_names = "tst1", theSpecies = "human", 
                                      output_directory = "tester",
                                      sig_matrix_size = 3000,
                                      up_and_downregulated = FALSE, 
                                      internet = FALSE)

# complete pathway enrichment using both methods
twoOutFiles <- two_method_pathway_enrichment(DEG_list = bulk_DE_cors,theSpecies = "human",
scMappR_vals = toOut$cellWeighted_Foldchange, background_genes = rownames(bulk_normalized), 
output_directory = "newfun_test",plot_names = "nonreranked_", toSave = FALSE)

Package 'scMappR'

Help Index

Fisher's Exact Cell-Type Identification.

Description

Usage

Arguments

Details

Value

Examples

Identify co-expressed cell-types

Description

Usage

Arguments

Details

Value

Examples

compare_deconvolution_methods

Description

Usage

Arguments

Value

Examples

Measure cell-type specificity of cell-weighted Fold-changes

Description

Usage

Arguments

Details

Value

Examples

DeconRNASeq CRAN compatible

Description

Usage

Arguments

Details

Value

Examples

Generate cell weighted Fold-Changes (cwFold-changes)

Description

Usage

Arguments

Details

Value

Examples

Extract Markers

Description

Usage

Arguments

Details

Value

Examples

Generate signature matrix

Description

Usage

Arguments

Details

Value

Examples

Internal – get gene symbol from Panglao.db assigned gene-names (symbol-ensembl).

Description

Usage

Arguments

Details

Value

Examples

Get signature matrices.

Description

Usage

Arguments

Value

Examples

gmt_example

Description

Usage

Format

Details

Examples

Pathway enrichment for cwFold-changes

Description

Usage

Arguments