Performs expression-based clustering of genes. Uses the coseq package to fit either Poisson or Gaussian mixtures to gene clusters, estimating their multivariate distribution parameters via an EM algorithm. Several numbers of clusters can be tested and compared in terms of likelihood, to help the user choose among them.
run_coseq( conds, genes, data, K = 6:12, transfo = "none", model = "Poisson", seed = NULL )
conds | Condition names to be used for clustering. Must be a vector of unique condition names to consider for gene clustering, without the replicate information (that is, the string before the underscore in sample names). |
---|---|
genes | Genes used as an input for the clustering. They must be present in the row names of data. |
data | Normalized counts, with genes as row names and samples as columns. |
K | Range of numbers of clusters to test. |
transfo | Transformation to apply to normalized counts before modeling with "Normal" mixture models. It must be one of "arcsin", "logit", "logMedianRef", "profile", "logclr", "clr", "alr", "ilr", or "none". For "Poisson", no transformation is applied and this argument is ignored. |
model | Model to use for the mixture: either "Poisson" or "Normal". |
seed | Seed for the random state, to ensure reproducible runs. |
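The `conds` and `genes` arguments described above can be prepared from the count matrix itself. The following is a minimal sketch in base R, assuming sample names follow a `<condition>_<replicate>` pattern with a single underscore; the matrix `toy_counts` and its gene and sample names are made up for illustration and are not part of the package.

```r
# Toy normalized-count matrix (hypothetical values and names).
# Columns are samples named "<condition>_<replicate>".
toy_counts <- matrix(
  c(10, 12, 30, 28,
     5,  7,  6,  4,
    50, 45, 20, 22),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("C_1", "C_2", "H_1", "H_2"))
)

# Keep the text before the first underscore, then drop duplicates,
# so that each condition appears once and without replicate information.
conds <- unique(sub("_.*$", "", colnames(toy_counts)))

# Genes passed for clustering must all be present in the row names.
genes <- c("geneA", "geneC")
genes_ok <- all(genes %in% rownames(toy_counts))
```

If condition names themselves contain underscores, the pattern would need adjusting (for example, stripping only the last underscore-delimited field).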
Named list containing the coseq run result as "model", and the cluster membership for each gene as "membership".
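Once the cluster membership is available, it can be summarized with base R. The sketch below uses a hypothetical stand-in for the "membership" element, assuming it is a vector of cluster labels named by gene identifier; the exact structure returned by the function may differ.

```r
# Hypothetical stand-in for the "membership" element of the result:
# assumed to be a vector of cluster labels named by gene.
membership <- c(geneA = 1, geneB = 2, geneC = 1, geneD = 3, geneE = 2)

# Number of genes assigned to each cluster.
cluster_sizes <- table(membership)

# Gene identifiers grouped by cluster; list names are the cluster labels.
genes_by_cluster <- split(names(membership), membership)

# Genes belonging to cluster 1.
cluster_1_genes <- genes_by_cluster[["1"]]
```

Grouping genes this way makes it straightforward to pass one cluster at a time to downstream analyses.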
data("abiotic_stresses")
genes <- abiotic_stresses$heat_DEGs
clustering <- run_coseq(conds = unique(abiotic_stresses$conditions),
                        data = abiotic_stresses$normalized_counts,
                        genes = genes, K = 6:9)
#> ****************************************
#> coseq analysis: Poisson approach & none transformation
#> K = 6 to 9
#> Use seed argument in coseq for reproducible results.
#> ****************************************
#> Running g = 6 ...
#> Running g = 7 ...
#> Running g = 8 ...
#> Running g = 9 ...