The function corrects for differences in sequencing depth between samples. It relies on the TCC package to build a TCC-class object containing the raw counts and the condition of each sample. The function calcNormFactors is then applied with the method chosen by the user. Optionally, a first step removes potentially differentially expressed genes (DEGs), so that the normalization factors computed in the second normalization step are less biased. It returns a TCC object, with an element norm_factors containing the computed normalization factors.
normalize( data, conditions = stringr::str_split_fixed(colnames(data), "_", 2)[, 1], norm_method = "tmm", deg_method = "deseq2", fdr = 0.01, iteration = TRUE )
data | raw counts to be normalized (data frame or matrix), with genes as rownames and conditions as columns. |
---|---|
conditions | condition of each column of the data argument. By default, conditions are taken from the column names of data, as the prefix before the first underscore. |
norm_method | method used for normalization, either tmm or deseq2 |
deg_method | method used for DEG detection when iteration is TRUE, either edgeR or deseq2 |
fdr | threshold applied to the adjusted p-values for DEG detection when iteration is TRUE |
iteration | whether or not to perform a prior removal of DEGs (TRUE or FALSE) |
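
The default value of conditions can be illustrated on a toy matrix. This is a minimal sketch, not the package's own code: the gene and sample names are made up, and it uses a base R `sub()` call that should be equivalent to the `stringr::str_split_fixed(colnames(data), "_", 2)[, 1]` default shown in the usage.

```r
# Toy count matrix; gene and sample names are hypothetical.
counts <- matrix(
  c(10, 12, 30, 28,
     0,  1,  5,  4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("C_1", "C_2", "H_1", "H_2"))
)

# Keep what precedes the first underscore of each column name,
# so that replicates "C_1" and "C_2" share the condition "C".
conditions <- sub("_.*$", "", colnames(counts))
print(conditions)
```

If the column names do not follow this condition_replicate pattern, passing conditions explicitly is likely the safer choice.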
A TCC-class object
Filtering low counts is highly recommended after normalization; consider using the DIANE::filter_low_counts function just after this function.
You can get the normalized expression matrix with TCC::getNormalizedData(tcc), tcc being the result of DIANE::normalize() or DIANE::filter_low_counts().
data("abiotic_stresses")
tcc_object <- DIANE::normalize(abiotic_stresses$raw_counts, abiotic_stresses$conditions, iteration = FALSE)