SETGEN.generate.sets {SETGEN} | R Documentation |
For each variable (gene) the function creates a set of variables which correlate with this variable. The usual centered Pearson correlation coefficient is used.
SETGEN.generate.sets(X1, X2 = NULL, r2min1 = 0.5, r2min2 = NULL, rposonly1 = TRUE, rposonly2 = FALSE, pivdom = TRUE)
X1 |
An n1 by p matrix with variables in columns and samples in rows. The columns (variables) should be named. |
X2 |
NULL or an n2 by p matrix with variables in columns and samples in rows. The names
of the variables must be identical to those in X1. |
r2min1 |
Lower bound for the squared correlation between the source variable (center) of a set
and each of its other members. The parameter is used when creating the sets of variables.
The correlation is computed in the matrix X1. |
r2min2 |
Like r2min1, but the correlation is computed in X2. |
rposonly1 |
A logical value. If TRUE, a variable is included in a set
only if its correlation with the source variable of the set is positive.
If FALSE, this correlation may be positive or negative.
The setting applies to the correlations computed in X1. |
rposonly2 |
Like rposonly1, but for the correlations computed in X2. |
pivdom |
A logical value. If TRUE, sets of variables are created
such that their source variables dominate the other members of the sets with respect
to their total sum of squares (variance). This setting reduces the number of sets created.
The dominance condition applies only to X1. |
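The input requirements stated for X1 and X2 can be checked before calling the function. The following is a minimal sketch, not part of the package; check_inputs is a hypothetical helper name introduced only for this illustration.

```r
# Hypothetical helper (not part of SETGEN): verifies the documented
# requirements on X1 and X2 before sets are generated.
check_inputs <- function(X1, X2 = NULL) {
  # X1 must be a matrix with named columns (variables)
  stopifnot(is.matrix(X1), !is.null(colnames(X1)))
  if (!is.null(X2)) {
    # X2 must contain the same variables, with identical names
    stopifnot(is.matrix(X2),
              ncol(X2) == ncol(X1),
              identical(colnames(X2), colnames(X1)))
  }
  invisible(TRUE)
}

# Toy matrices: 2 samples in rows, 3 named variables in columns
X1_ok  <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
X2_ok  <- matrix(6:1, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
X2_bad <- matrix(6:1, nrow = 2, dimnames = list(NULL, c("a", "b", "z")))

check_inputs(X1_ok, X2_ok)                                  # passes silently
bad_result <- try(check_inputs(X1_ok, X2_bad), silent = TRUE)  # fails: names differ
```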
For each variable, the function creates a set of variables that correlate with it. The condition on the correlations can be checked in one data set, X1, or simultaneously in two data sets if X2 is also given. The lower thresholds for the squared correlations are specified in r2min1 and r2min2. Whether the correlations are required to be positive is specified in rposonly1 and rposonly2.
The returned list of sets can be tested for correlation with a response using the function SETGEN.
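The correlation conditions described above can be illustrated in base R. This is a rough sketch under stated assumptions, not the package's implementation: make_sets is a hypothetical function written for this example, it ignores pivdom entirely, it assumes the source variable is always a member of its own set, and it assumes that a NULL r2min2 falls back to r2min1.

```r
# Illustrative sketch only; make_sets() is hypothetical and does not
# reproduce SETGEN.generate.sets (in particular, pivdom is not modeled).
make_sets <- function(X1, X2 = NULL, r2min1 = 0.5, r2min2 = NULL,
                      rposonly1 = TRUE, rposonly2 = FALSE) {
  stopifnot(!is.null(colnames(X1)))
  # ok[i, j] is TRUE if variable j meets the condition for source variable i
  qualifies <- function(X, r2min, rposonly) {
    r <- cor(X)                      # centered Pearson correlations
    ok <- r^2 >= r2min               # threshold on the squared correlation
    if (rposonly) ok <- ok & r > 0   # optionally require a positive sign
    ok
  }
  keep <- qualifies(X1, r2min1, rposonly1)
  if (!is.null(X2)) {
    stopifnot(identical(colnames(X1), colnames(X2)))
    if (is.null(r2min2)) r2min2 <- r2min1   # assumption, see lead-in
    # both conditions must hold simultaneously
    keep <- keep & qualifies(X2, r2min2, rposonly2)
  }
  sets <- lapply(seq_len(ncol(X1)),
                 function(i) colnames(X1)[which(keep[i, ])])
  names(sets) <- colnames(X1)        # components named by source variable
  sets
}

# Small deterministic toy data: b follows a, c mirrors a, d is unrelated
X <- cbind(a = c(1, 2, 3, 4, 5, 6),
           b = c(2, 1, 4, 3, 6, 5),
           c = -c(1, 2, 3, 4, 5, 6),
           d = c(1, -1, -1, 1, -1, 1))

sets_pos <- make_sets(X, rposonly1 = TRUE)    # positive correlations only
sets_any <- make_sets(X, rposonly1 = FALSE)   # either sign allowed
sets_two <- make_sets(X, X2 = X, r2min2 = 0.9)  # stricter threshold in X2
```

In this toy example, requiring positive correlations keeps c out of the set centered on a, while allowing either sign admits it. Passing the same matrix as X2 with a stricter r2min2 shrinks the sets further.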
A list of the names of the variables belonging to each set. The names of the components of the list are the names of the respective source variables. The list has the same format as the component grpdata returned by SETGEN.
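To make the described format concrete, the sketch below builds a small list by hand in that shape and inspects it with base R. The object genesets_example and its contents are invented for illustration and are not output of the package; whether a source variable is listed as a member of its own set is an assumption here.

```r
# Hand-written stand-in for the returned value: a named list of character
# vectors, one component per source variable (contents are made up).
genesets_example <- list(
  v1 = c("v1", "v2", "v3"),
  v2 = c("v2", "v1"),
  v4 = "v4"
)

source_vars <- names(genesets_example)      # names of the source variables
set_sizes   <- lengths(genesets_example)    # number of members in each set

# Sets that contain a given variable, here "v1"
sets_with_v1 <- names(genesets_example)[
  vapply(genesets_example, function(s) "v1" %in% s, logical(1))
]
```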
Rosolowski, M., Laeuter, J., Beck, M.; <maciej.rosolowski@imise.uni-leipzig.de>
Laeuter, J., Glimm, E., Eszlinger, M. 2005, Search for relevant sets of variables in a high-dimensional setup keeping the familywise error rate. Statistica Neerlandica Vol. 59, No. 3, pp. 298-312.
Laeuter, J. 2007, Hochdimensionale Statistik, Anwendung in der Genexpressionsanalyse (in German; English title: High Dimensional Statistics, Application to Gene Expression Analysis). Leipzig Bioinformatics Working Paper, No. 15. http://www.izbi.uni-leipzig.de/izbi/Working%20Paper/2007/WP_15_Statistik.pdf.
Laeuter, J., Horn, F., Rosolowski, M., Glimm, E. 2009, High-dimensional data analysis: Selection of variables and representation of results - Application to gene expression. Biometrical Journal
# generate data with two subsets coming
# from one-factor models
set.seed(100)
y <- c(rep(0, 10), rep(1, 10))
mu2 <- 0.5  # difference between the means of the two groups of samples
f1 <- matrix(rep(c(rnorm(10), rnorm(10, mean = mu2)), 10), ncol = 10)
f2 <- matrix(
  rep(c(rnorm(5), rnorm(5, mean = mu2), rnorm(5), rnorm(5, mean = mu2)), 10),
  ncol = 10
)
x <- matrix(rnorm(20 * 100), nrow = 20)
x[, 1:10] <- x[, 1:10] + f1
x[, 11:20] <- x[, 11:20] + f2
colnames(x) <- paste("v", 1:100, sep = "")

# looking for significant single variables (e.g., differentially expressed genes)
res1 <- SETGEN(x = x, y = y, resp.type = "Two class unpaired",
               pivdom = FALSE, r2min = 1, nres = 1000)
res1$overview[1:10, ]

# looking for significant sets of correlated variables (genes)
res2 <- SETGEN(x = x, y = y, resp.type = "Two class unpaired",
               pivdom = FALSE, r2min = 0.5, nres = 1000)
res2$overview[1:10, ]

# the same using SETGEN.generate.sets first
genesets <- SETGEN.generate.sets(X1 = x, r2min1 = 0.5, pivdom = FALSE)
res3 <- SETGEN(x = x, y = y, grpdata = genesets,
               resp.type = "Two class unpaired", nres = 1000)
res3$overview[1:10, ]