We designed a method to perform statistical testing on TF-target gene pairs from the Random Forest regulatory weight inference. The idea is to first build a biologically relevant network from the strongest importances given by a prior GENIE3 run, and then to refine it by statistical testing. The tests are performed by the rfPermute package, which provides empirical p-values for the observed importance values by permuting the response variable.
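The permutation principle can be sketched in base R as follows. This is only an illustration under stated assumptions: the absolute Pearson correlation stands in for the random forest importance, the data are simulated, the function name `importance` is hypothetical, and the "+1" correction is a common convention for empirical p-values rather than a statement about what rfPermute does internally.

```r
set.seed(1)

# Simulated data: the target gene truly depends on the regulator.
n <- 50
regulator <- rnorm(n)
target <- 2 * regulator + rnorm(n)

# Stand-in importance measure (assumption: the actual procedure uses
# random forest importances, not a correlation).
importance <- function(x, y) abs(cor(x, y))

observed <- importance(regulator, target)

# Null distribution: shuffle the response variable and recompute the
# importance, nShuffle times.
nShuffle <- 1000
null_values <- replicate(nShuffle, importance(regulator, sample(target)))

# Empirical p-value: share of null importances at least as large as the
# observed one, with a +1 correction so that it is never exactly zero.
p_value <- (sum(null_values >= observed) + 1) / (nShuffle + 1)
```

With a larger nShuffle the smallest attainable p-value decreases, at the cost of proportionally more computation.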

test_edges(
  mat,
  normalized_counts,
  nGenes,
  nRegulators,
  density = 0.02,
  nTrees = 1000,
  nShuffle = 1000,
  nCores = ifelse(is.na(parallel::detectCores()), 1, max(parallel::detectCores() - 1,
    1)),
  verbose = TRUE
)

Arguments

mat

matrix containing the importance values for each target and regulator (preferably computed with GENIE3 and the OOB importance metric)

normalized_counts

normalized expression data containing the genes present in the mat argument, as used for the first network inference step.

nGenes

total number of genes in the network, i.e. the union of the target genes and the regulators

nRegulators

number of regulators used for the network inference step

density

approximate desired density used to build a first network, whose edges are the ones to be statistically tested. Default is 0.02. Biological networks are known to have densities (ratio of edges over total possible edges in the graph) between 0.001 and 0.1. The numbers of genes and regulators (nGenes, nRegulators) are needed to compute the density.

nTrees

number of trees used for random forest importance computations

nShuffle

number of times the response variable (target gene expression) is randomized in order to estimate the null distribution of the importances of the predictive variables (regulators).

nCores

Number of CPU cores to use during the procedure. Default is the detected number of cores minus one.

verbose

If set to TRUE, feedback on the progress of the calculations is given. Default: TRUE
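To illustrate how the density, nGenes and nRegulators arguments relate to the size of the first network, here is a minimal sketch. It assumes that the number of possible edges is counted as nRegulators * (nGenes - 1), meaning each regulator may target every gene except itself; the package may count possible edges differently. The helper name `edges_for_density` and the toy matrix are hypothetical and not part of the package.

```r
# Hypothetical helper: converts a target density into a number of edges.
edges_for_density <- function(density, nGenes, nRegulators) {
  # Assumption: each regulator can target every gene except itself.
  possible_edges <- nRegulators * (nGenes - 1)
  round(density * possible_edges)
}

n_edges <- edges_for_density(density = 0.02, nGenes = 1000, nRegulators = 100)

# Toy importance matrix (regulators in rows, targets in columns),
# filled with random values for illustration only.
set.seed(42)
toy_mat <- matrix(runif(12), nrow = 3,
                  dimnames = list(paste0("reg", 1:3), paste0("gene", 1:4)))

# Keep the k strongest importances as the edges of the first network.
k <- 5
strongest <- order(toy_mat, decreasing = TRUE)[seq_len(k)]
idx <- arrayInd(strongest, dim(toy_mat))
first_network <- data.frame(from = rownames(toy_mat)[idx[, 1]],
                            to = colnames(toy_mat)[idx[, 2]],
                            importance = toy_mat[strongest])
```

Only the edges retained at this stage are submitted to the permutation tests, so a lower density means fewer tests to run.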

Value

named list containing the edge p-values, as well as graphics intended to guide the choice of a p-value threshold for the final network:

  • links: a data frame containing the links of the network before testing, as built from the user-defined prior density. Each edge is associated with its p-value and its FDR-adjusted p-value.

  • fdr_nEdges_curve: relation between the FDR threshold and the number of edges in the final network
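The relation between an FDR threshold and the number of retained edges can be sketched with base R. The data frame below is made up for illustration, and its column names (from, to, pval, fdr) are assumptions that may not match the columns actually returned in links. The Benjamini-Hochberg adjustment via `p.adjust` is used here as one standard way to obtain FDR-adjusted p-values.

```r
# Made-up edge list with raw p-values (column names are assumptions).
links <- data.frame(
  from = c("reg1", "reg1", "reg2", "reg2", "reg3", "reg3"),
  to   = c("geneA", "geneB", "geneA", "geneC", "geneB", "geneC"),
  pval = c(0.001, 0.004, 0.02, 0.03, 0.2, 0.6)
)

# FDR adjustment (Benjamini-Hochberg).
links$fdr <- p.adjust(links$pval, method = "fdr")

# Number of edges kept in the final network for several FDR thresholds.
thresholds <- c(0.01, 0.05, 0.1)
n_edges_kept <- sapply(thresholds, function(t) sum(links$fdr <= t))

# Final network for one chosen threshold.
final_network <- links[links$fdr <= 0.05, ]
```

Plotting n_edges_kept against thresholds gives the same kind of information as the fdr_nEdges_curve graphic: a stricter threshold yields a sparser final network.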

Examples

if (FALSE) {
data("abiotic_stresses")
data("gene_annotations")
data("regulators_per_organism")

genes <- get_locus(abiotic_stresses$heat_DEGs)
regressors <- intersect(genes, regulators_per_organism$`Arabidopsis thaliana`)
data <- aggregate_splice_variants(abiotic_stresses$normalized_counts)

r <- DIANE::group_regressors(data, genes, regressors)

mat <- DIANE::network_inference(r$counts,
  conds = abiotic_stresses$conditions,
  targets = r$grouped_genes,
  regressors = r$grouped_regressors,
  importance_metric = "MSEincrease_oob",
  verbose = TRUE)

res <- DIANE::test_edges(mat,
  normalized_counts = r$counts,
  density = 0.02,
  nGenes = length(r$grouped_genes),
  nRegulators = length(r$grouped_regressors),
  nTrees = 1000,
  verbose = TRUE)
}