SETGEN.generate.sets {SETGEN}    R Documentation

Function to create sets of correlated variables (genes)

Description

For each variable (gene), the function creates a set of variables that correlate with it. Correlation is measured with the usual centered Pearson correlation coefficient.

Usage

SETGEN.generate.sets(X1, X2 = NULL, r2min1 = 0.5, r2min2 = NULL, rposonly1 = TRUE, rposonly2 = FALSE, pivdom = TRUE)

Arguments

X1 An n1 by p matrix with variables in columns and samples in rows. The columns (variables) should be named.
X2 NULL or an n2 by p matrix with variables in columns and samples in rows. The names of the variables must be identical to those in X1.
r2min1 Lower bound for the squared correlation between the source variable (center) of a set and each of its other members. The parameter is used for creating sets of variables. The correlation is computed in the matrix X1.
r2min2 Like r2min1, but the correlation is computed in X2.
rposonly1 A logical variable. If TRUE, a variable is included in a set only if its correlation with the source variable of the set is positive. If FALSE, this correlation can be positive or negative. This argument applies to the correlations computed in X1.
rposonly2 Like rposonly1, but for the correlations computed in X2.
pivdom A logical variable. If TRUE, sets of variables are created such that their source variables dominate the other members of the sets with respect to their total sum of squares (variance). This setting reduces the number of created sets. The dominance condition applies only to X1.
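
The dominance condition behind pivdom can be pictured with a few lines of base R. This is only a sketch of one possible reading (a source variable dominates another variable when its centered sum of squares is at least as large). The matrix X1, the column names g1 to g3, and the object names ss and dominates are invented for the illustration, and the package may apply the condition differently.

```r
# Illustration only: one possible reading of the dominance condition,
# not the package's actual code.
set.seed(2)
X1 <- cbind(g1 = rnorm(20, sd = 3),
            g2 = rnorm(20, sd = 1),
            g3 = rnorm(20, sd = 2))

# Total sum of squares of each centered column (proportional to the variance)
ss <- colSums(scale(X1, center = TRUE, scale = FALSE)^2)

# dominates[i, j] is TRUE when variable i has a sum of squares at least as
# large as variable j, i.e. i could act as a dominating source for j
dominates <- outer(ss, ss, ">=")
dominates
```

Under this reading, a variable with a small sum of squares dominates few others, so it would be the source of a small or empty set. That is consistent with the statement that the setting reduces the number of created sets.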

Details

For each variable, the function creates a set of variables that correlate with it. The condition on the correlations can be checked in one data set, X1, or simultaneously in two data sets if X2 is also given. The lower thresholds for the squared correlations are specified in r2min1 and r2min2. Whether the correlations are required to be positive is specified in rposonly1 and rposonly2. The returned list of sets can be tested for correlation with a response using the function SETGEN.
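
The thresholding described above can be sketched in base R. The helper sketch_sets below is hypothetical and is not the package's implementation: it ignores pivdom, it assumes that a source variable is a member of its own set, and it assumes that r2min1 is reused for X2 when r2min2 is NULL.

```r
# Hypothetical helper, NOT part of the SETGEN package: for every column of X1,
# collect the columns whose correlation with it passes the thresholds in X1
# and, if X2 is supplied, in X2 as well.
sketch_sets <- function(X1, X2 = NULL, r2min1 = 0.5, r2min2 = NULL,
                        rposonly1 = TRUE, rposonly2 = FALSE) {
  passes <- function(X, r2min, rposonly) {
    r <- cor(X)                      # centered Pearson correlations
    ok <- r^2 >= r2min               # squared-correlation threshold
    if (rposonly) ok <- ok & r > 0   # optionally require a positive sign
    ok
  }
  keep <- passes(X1, r2min1, rposonly1)
  if (!is.null(X2)) {
    # Assumption: fall back to r2min1 when r2min2 is not given
    if (is.null(r2min2)) r2min2 <- r2min1
    keep <- keep & passes(X2[, colnames(X1), drop = FALSE], r2min2, rposonly2)
  }
  sets <- lapply(seq_len(ncol(X1)), function(i) colnames(X1)[keep[i, ]])
  names(sets) <- colnames(X1)        # list names are the source variables
  sets
}

# Toy data: a and b share a common factor, c is independent noise
set.seed(1)
f <- rnorm(30)
X <- cbind(a = f + rnorm(30, sd = 0.3),
           b = f + rnorm(30, sd = 0.3),
           c = rnorm(30))
sets <- sketch_sets(X, r2min1 = 0.5)
sets$a
```

When two data sets are given, the logical matrices are combined with &, so a variable enters a set only if it passes the thresholds in both X1 and X2.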

Value

A list in which each component holds the names of the variables belonging to one set. The names of the components are the names of the respective source variables. The list has the same format as the grpdata component returned by SETGEN.
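
A small hand-made list shows the shape of such a value and how it can be inspected. The variable names and set contents are invented for the illustration, and including each source variable in its own set is an assumption.

```r
# Hand-made stand-in for the returned value; names and contents are invented.
genesets <- list(
  v1 = c("v1", "v4", "v7"),
  v2 = c("v2"),
  v3 = c("v3", "v9")
)

names(genesets)                     # source variables of the sets
lengths(genesets)                   # number of members in each set
multi <- genesets[lengths(genesets) > 1]  # sets with more than one member
names(multi)
```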

Author(s)

Rosolowski, M., Laeuter, J., Beck, M.; <maciej.rosolowski@imise.uni-leipzig.de>

References

Laeuter, J., Glimm, E., Eszlinger, M. 2005, Search for relevant sets of variables in a high-dimensional setup keeping the familywise error rate. Statistica Neerlandica Vol. 59, No. 3, pp. 298-312.

Laeuter, J. 2007, Hochdimensionale Statistik, Anwendung in der Genexpressionsanalyse (in German; English title: High Dimensional Statistics, Application to Gene Expression Analysis). Leipzig Bioinformatics Working Paper, No. 15. http://www.izbi.uni-leipzig.de/izbi/Working%20Paper/2007/WP_15_Statistik.pdf.

Laeuter, J., Horn, F., Rosolowski, M., Glimm, E. 2009, High-dimensional data analysis: Selection of variables and representation of results - Application to gene expression. Biometrical Journal.

Examples

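# generate data with two subsets of variables coming
# from one-factor models
set.seed(100)
y <- c(rep(0, 10), rep(1, 10))  # two groups of 10 samples each
mu2 <- 0.5  # difference between the means of the two groups of samples
# factor shared by variables 1-10; shifted by mu2 in samples 11-20
f1 <- matrix(rep(c(rnorm(10), rnorm(10, mean = mu2)), 10), ncol = 10)
# factor shared by variables 11-20; shifted by mu2 in samples 6-10 and 16-20
f2 <- matrix(rep(c(rnorm(5), rnorm(5, mean = mu2), rnorm(5), rnorm(5, mean = mu2)), 10), ncol = 10)
# 20 samples by 100 variables of standard normal noise
x <- matrix(rnorm(20 * 100), nrow = 20)
x[, 1:10] <- x[, 1:10] + f1
x[, 11:20] <- x[, 11:20] + f2
colnames(x) <- paste("v", 1:100, sep = "")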

# looking for significant single variables (e.g., differentially expressed genes)
res1 <- SETGEN(x = x, y = y, resp.type = "Two class unpaired", pivdom = FALSE, r2min = 1, nres = 1000)
res1$overview[1:10,]

# looking for significant sets of correlated variables (genes)
res2 <- SETGEN(x = x, y = y, resp.type = "Two class unpaired", pivdom = FALSE, r2min = 0.5, nres = 1000)
res2$overview[1:10,]

# the same using SETGEN.generate.sets first
genesets <- SETGEN.generate.sets(X1 = x, r2min1 = 0.5, pivdom = FALSE)
res3 <- SETGEN(x = x, y = y, grpdata = genesets, resp.type = "Two class unpaired", nres = 1000)
res3$overview[1:10,]

[Package SETGEN version 0.1 Index]