Performs expression-based clustering of genes. Uses the coseq package to fit either Poisson or Gaussian mixtures to gene clusters, estimating their multivariate distribution parameters via an EM algorithm. Several numbers of clusters can be tested and compared in terms of likelihood to help the user choose among them.

run_coseq(
  conds,
  genes,
  data,
  K = 6:12,
  transfo = "none",
  model = "Poisson",
  seed = NULL
)

Arguments

conds

Condition names to be used for clustering. Must be a vector of unique condition names to consider for gene clustering, without the replicate information (that is, the string before the underscore in sample names).
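As an illustration of the naming convention described above, the condition names can be recovered from sample names by dropping everything from the first underscore onward. This is a minimal base R sketch; the sample names below are hypothetical and only assume the "condition_replicate" pattern.

```r
# Hypothetical sample names following the "condition_replicate" convention
samples <- c("C_1", "C_2", "H_1", "H_2", "SH_1", "SH_2")

# Drop the underscore and the replicate suffix, then keep each condition once
conds <- unique(sub("_.*$", "", samples))
print(conds)
```

The resulting vector has one entry per condition and could be passed as conds, or subset first if only some conditions should drive the clustering.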

genes

Genes used as an input for the clustering. They must be present in the row names of data.
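Since every gene must appear in the row names of data, it can be worth checking the gene list before clustering. The sketch below uses a toy matrix with hypothetical gene and sample names; it only relies on base R.

```r
# Toy normalized count matrix; gene and sample names are hypothetical
data <- matrix(
  c(10, 12, 50, 48,
    0, 1, 3, 2,
    100, 90, 20, 25),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("C_1", "C_2", "H_1", "H_2"))
)

# Candidate genes, one of which is absent from the matrix
genes <- c("geneA", "geneC", "geneX")

# Genes that would violate the requirement
missing_genes <- setdiff(genes, rownames(data))

# Genes that are safe to pass on for clustering
genes_ok <- intersect(genes, rownames(data))

print(missing_genes)
print(genes_ok)
```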

data

Normalized counts, with genes as row names and samples as columns.

K

Range of numbers of clusters to test.

transfo

Transformation to apply to normalized counts before fitting "Normal" mixture models. It must be one of "arcsin", "logit", "logMedianRef", "profile", "logclr", "clr", "alr", "ilr", or "none". For "Poisson", no transformation is applied and this argument is ignored.

model

Type of mixture model to fit: either "Poisson" or "Normal".

seed

Seed for the random number generator, to ensure reproducible runs.
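To show how model, transfo and seed fit together, here is a hedged sketch of a Gaussian run with an arcsin transformation and a fixed seed. It reuses the abiotic_stresses dataset from the Examples section, assumes the function is exported by the DIANE package, and only runs the clustering if that package is installed. The seed value is arbitrary.

```r
# Stays NULL when the DIANE package (assumed to export run_coseq) is unavailable
clustering <- NULL

if (requireNamespace("DIANE", quietly = TRUE)) {
  data("abiotic_stresses", package = "DIANE")

  clustering <- DIANE::run_coseq(
    conds = unique(abiotic_stresses$conditions),
    genes = abiotic_stresses$heat_DEGs,
    data = abiotic_stresses$normalized_counts,
    K = 6:9,
    model = "Normal",    # Gaussian mixtures, so transfo is taken into account
    transfo = "arcsin",  # one of the documented transformations
    seed = 123           # arbitrary value, fixed for reproducibility
  )
}
```

With model = "Poisson" the transfo argument would be ignored, so it only needs to be set for "Normal" runs.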

Value

Named list containing the coseq run result as "model", and the cluster membership for each gene as "membership".
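Once the clustering has run, the "membership" element can be summarized with base R. The sketch below uses a mock vector in place of a real result and assumes that membership is a vector of cluster labels named by gene identifier; that structure is an assumption to check against the actual output.

```r
# Mock stand-in for clustering$membership (assumed: cluster labels named by gene)
membership <- c(geneA = 1, geneB = 2, geneC = 1, geneD = 3)

# Number of genes assigned to each cluster
cluster_sizes <- table(membership)
print(cluster_sizes)

# Gene identifiers grouped by cluster label
genes_by_cluster <- split(names(membership), membership)
print(genes_by_cluster[["1"]])
```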

Examples

data("abiotic_stresses")
genes <- abiotic_stresses$heat_DEGs
clustering <- run_coseq(conds = unique(abiotic_stresses$conditions),
                        data = abiotic_stresses$normalized_counts,
                        genes = genes, K = 6:9)
#> ****************************************
#> coseq analysis: Poisson approach & none transformation
#> K = 6 to 9
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> Running g = 6 ...
#> Running g = 7 ...
#> Running g = 8 ...
#> Running g = 9 ...