Consider the following matrix:
testMat <- matrix(c(1,2,1,3,
3,2,3,3,
1,3,1,3,
2,3,3,3,
3,3,3,3,
3,2,3,1), byrow = T, ncol = 4)
and the following condition:
cond <- c(0,0) # binary
Problem: If cond[1] is 0 and there is a 1 in either the first or third column, the corresponding rows will be removed. Similarly if cond[2] is 0 and there is a 1 in either the second or fourth column, the corresponding rows will be removed. For example the new matrix will be:
newMat <- testMat[-c(1,3,6),] # for cond <- c(0,0)
newMat <- testMat[-c(1,3),] # for cond <- c(0,1)
newMat <- testMat[-c(6),] # for cond <- c(1,0)
newMat <- testMat # for cond <- c(1,1)
I tried the following, which is both wrong and clumsy.
newMat <- testMat[-(cond[1] == 0 & testMat[,c(1,3)] == 1),]
newMat <- newMat[-(cond[2] == 0 & newMat[,c(2,4)] == 1),]
Can you help to find a base R solution?
This is ugly, but seems to work:
(generalized for length of cond, assuming that the matrix has length(cond)*2 columns)
keepRows <- apply(testMat, 1,
function(rw, cond) {
logicrow <- logical(length(cond)) # one flag per element of cond, not a fixed 2
for (b in 1:length(cond)) {
logicrow[b] <- ifelse(cond[b]==0, all(rw[b]!=1) & all(rw[length(cond)+b]!=1), TRUE)
}
all(logicrow)
}, cond = cond)
newMat <- testMat[keepRows, ]
(edited according to comment)
Assuming 1) cond can be of arbitrary length, 2) testMat has an even number of columns, and 3) the rule is to look at the i-th and (i+2)-th column of testMat
cond=c(0,0)
unlist(
sapply(
1:length(cond),
function(i){
j=rowSums(testMat[,c(i,i+2)]==1)
if (cond[i]==0 & sum(j)>0) which(j>0) else nrow(testMat)+1
}
)
)
[1] 1 3 6
This returns the indices of the rows that meet the removal conditions; you can then drop them:
testMat[-.Last.value,]
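A vectorized alternative, as a minimal sketch under the same assumptions: the number of columns is a multiple of length(cond), and column k is governed by cond[((k - 1) %% length(cond)) + 1]. The helper name drop_rows is made up for illustration.

```r
testMat <- matrix(c(1,2,1,3,
                    3,2,3,3,
                    1,3,1,3,
                    2,3,3,3,
                    3,3,3,3,
                    3,2,3,1), byrow = TRUE, ncol = 4)

# drop_rows() is a hypothetical helper name, not part of base R
drop_rows <- function(mat, cond) {
  # recycle cond across the columns: columns 1 and 3 follow cond[1],
  # columns 2 and 4 follow cond[2]
  active <- rep(cond == 0, length.out = ncol(mat))
  # a row is dropped if any active column contains a 1
  bad <- rowSums(mat[, active, drop = FALSE] == 1) > 0
  mat[!bad, , drop = FALSE]
}

drop_rows(testMat, c(0, 1))   # same rows as testMat[-c(1, 3), ]
```

Because the result is built from a logical keep/drop vector rather than negative indices, the case where no row is dropped (cond <- c(1,1)) needs no special handling.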
I am new to R and am trying to vectorize the code below.
What is a better way to do this? Thanks so much!
l_mat <- data.frame(matrix(ncol = 4, nrow = 4))
datax <- data.frame("var1"= c(1,1,1,1), "Var2" = c(2,2,2,2), "Var3"=c(3,3,3,3), "Var4"=c(4,4,4,4))
for (i in 1:4) {
for (j in 1:4) {
if (datax[i, 2] == datax[j, 2]) {
l_mat[i, j] <- 100
} else {
l_mat[i, j] <- 1
}
}
}
This can be done more simply with outer. Since every value in the second column is compared against every other value in that column, build the logical matrix with outer, convert it to a numeric index (1 or 2), and use that index to pick 1 or 100:
out <- 1 + (outer(datax[,2], datax[,2], `==`))
out[] <- c(1, 100)[out]
Or in a single line
ifelse(outer(datax[,2], datax[,2], `==`), 100, 1)
Or use a variation with pmax and outer
do.call(pmax, list(outer(datax[,2], datax[,2], `==`) * 100, 1))
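In the question's data every value in the second column is 2, so the result is all 100s and the pattern is hard to see. As a quick check, here is a sketch with made-up values in Var2 (an assumption for illustration only) comparing the double loop against the outer version:

```r
# Var2 values are hypothetical, chosen so that some pairs differ
datax <- data.frame(var1 = c(1, 1, 1, 1), Var2 = c(2, 5, 2, 5))

# reference result from the original double loop
l_mat <- matrix(NA_real_, nrow = 4, ncol = 4)
for (i in 1:4) {
  for (j in 1:4) {
    l_mat[i, j] <- if (datax[i, 2] == datax[j, 2]) 100 else 1
  }
}

# vectorized version: TRUE where the two values match
out <- ifelse(outer(datax[, 2], datax[, 2], `==`), 100, 1)

all(out == l_mat)   # the two approaches agree
```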
I want to create two lists of data frames in a for loop, but I cannot use assign:
dat <- data.frame(name = c(rep("a", 10), rep("b", 13)),
x = c(1,3,4,4,5,3,7,6,5,7,8,6,4,3,9,1,2,3,5,4,6,3,1),
y = c(1.1,3.2,4.3,4.1,5.5,3.7,7.2,6.2,5.9,7.3,8.6,6.3,4.2,3.6,9.7,1.1,2.3,3.2,5.7,4.8,6.5,3.3,1.2))
a <- dat[dat$name == "a",]
b <- dat[dat$name == "b",]
samp <- vector(mode = "list", length = 100)
h <- list(a,b)
hname <- c("a", "b")
for (j in 1:length(h)) {
for (i in 1:100) {
samp[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
assign(paste("samp", hname[j], sep="_"), samp[[i]])
}
}
Instead of lists named samp_a and samp_b, I get vectors that contain only the result of the 100th sample. I want samp_a and samp_b to be lists that hold all the different samples for dat[dat$name == "a",] and dat[dat$name == "b",], respectively.
How could I do this?
How about creating two different lists and avoiding using assign:
Option 1:
# create empty list
samp_a <-list()
samp_b <- list()
for (j in seq(h)) {
# fill samp_a list
if(j == 1){
for (i in 1:100) {
samp_a[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
}
# fill samp_b list
} else if(j == 2){
for (i in 1:100) {
samp_b[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
}
}
}
You could use assign too, shorter answer:
Option 2:
for (j in seq(hname)) {
l = list()
for (i in 1:100) {
l[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
}
assign(paste0('samp_', hname[j]), l)
rm(l)
}
You could easily use lapply for this, together with the rep function. Note that this samples row indices, so each x stays paired with its y; it is not the right approach if you want a random x paired with a random y.
dat <- data.frame(name = c(rep("a", 10), rep("b", 13)),
x = c(1,3,4,4,5,3,7,6,5,7,8,6,4,3,9,1,2,3,5,4,6,3,1),
y = c(1.1,3.2,4.3,4.1,5.5,3.7,7.2,6.2,5.9,7.3,8.6,6.3,4.2,3.6,9.7,1.1,2.3,3.2,5.7,4.8,6.5,3.3,1.2))
a <- dat[dat$name == "a",]
b <- dat[dat$name == "b",]
h <- list(a,b)
hname <- c("a", "b")
testfunc <- function(df) {
#df[sample(nrow(df), nrow(df)*0.5), ] #gives you the values in your data frame
sample(nrow(df), nrow(df)*0.5) # just gives you the indices
}
lapply(h, testfunc) # plain lapply: one sample for a and one for b
samp <- lapply(rep(h, 100), testfunc) # rep(h, 100) repeats the list 100 times (a, b, a, b, ...), giving 200 samples in one list
samp_a <- samp[c(TRUE, FALSE)] # the recycled T/F index selects the odd elements, which are the samples from `a`
samp_b <- samp[c(FALSE, TRUE)] # and here the even elements, which are the samples from `b`
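Another way to get there, as a rough sketch rather than the method above: split the data by name and use replicate, which returns one named list holding a list of samples per group. The x values here are placeholders, and floor() is written out because half of 13 rows is not a whole number.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

# placeholder data with the same group sizes as in the question
dat <- data.frame(name = c(rep("a", 10), rep("b", 13)),
                  x = seq_len(23))

groups <- split(dat, dat$name)   # named list: groups$a, groups$b
n_rep <- 100                     # number of samples per group

samp <- lapply(groups, function(df) {
  # simplify = FALSE keeps each sample as its own list element
  replicate(n_rep, sample(nrow(df), floor(nrow(df) * 0.5)), simplify = FALSE)
})

length(samp$a)   # number of samples drawn for group a
samp$b[[1]]      # first sample of row indices for group b
```

Keeping the results in one named list (samp$a, samp$b) avoids assign altogether and extends to more than two groups without changing the code.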
I have the following triple for loop in R :
res <- data.frame()
for (i in c(1,3,5,7,9,11,13)) {
for (k in c(3,5,7,9,11,13)) {
for (j in 1:41) {
tmp <- chisq.test(matrix(c(counts[j,i],counts[j,(i+1)],counts[j,k],counts[j,(k+1)]),nrow=2,ncol=2,byrow=TRUE))
res<-c(res,c(tmp$p.value))
}}}
Here counts is a 41 x 14 data frame (41 rows, 14 columns) of genotype counts (numbers like 600, 400, 240; you can generate your own random values to reproduce this on your computer).
I want to save the resulting p.value of each chisq.test in a vector (res, for "results"). For now, the output is the following:
Which is OK, but now I also want three more columns that record the j, i and k values, so that I can later track where each p-value came from. So, the desired output would be:
res
(p value 1) j1 i1 k1
(p value 2) j2 i2 k2
...
So, I modified the loop adding j,i,k to the final line, like this:
res <- data.frame()
for (i in c(1,3,5,7,9,11,13)) {
for (k in c(3,5,7,9,11,13)) {
for (j in 1:41) {
tmp <- chisq.test(matrix(c(counts[j,i],counts[j,(i+1)],counts[j,k],counts[j,(k+1)]),nrow=2,ncol=2,byrow=TRUE))
res<-c(res,c(tmp$p.value,j,i,k))
}}}
And it produces the following result (which I don't like):
Any thoughts? Thank you!
You could add a new row to a dataframe instead of concatenating the results together.
Maybe something like this:
res <- NULL
for (i in c(1,3,5,7,9,11,13)) {
for (k in c(3,5,7,9,11,13)) {
for (j in 1:41) {
tmp <- chisq.test(matrix(c(counts[j,i],counts[j,(i+1)],counts[j,k],counts[j,(k+1)]),nrow=2,ncol=2,byrow=TRUE))
res<-rbind(res, data.frame(p.value = tmp$p.value,
i = i,
j = j,
k = k,
stringsAsFactors = FALSE))
}}}
Data:
counts <- matrix(data = sample(1:1000, 574, replace = FALSE),
nrow = 41, ncol = 14)
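Calling rbind inside the loop copies res on every iteration. A possible alternative, sketched here under the assumption that counts looks like the random matrix above, is to build all (i, k, j) combinations first with expand.grid and then fill in the p-values in one pass:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
counts <- matrix(sample(1:1000, 574, replace = FALSE), nrow = 41, ncol = 14)

# one row per combination; expand.grid varies its first argument fastest,
# so listing j first reproduces the loop order (i outermost, j innermost)
res <- expand.grid(j = 1:41,
                   k = c(3, 5, 7, 9, 11, 13),
                   i = c(1, 3, 5, 7, 9, 11, 13))

# run the 2x2 test for every row of the grid
res$p.value <- mapply(function(j, k, i) {
  tab <- matrix(c(counts[j, i], counts[j, i + 1],
                  counts[j, k], counts[j, k + 1]),
                nrow = 2, ncol = 2, byrow = TRUE)
  chisq.test(tab)$p.value
}, res$j, res$k, res$i)

head(res)
```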
I have written the code below to generate a matrix containing what is, to me, a fairly complex pattern. In this case I determined that there are 136 rows in the finished matrix by trial and error.
I could write a function to calculate the number of matrix rows in advance, but the function would be a little complex. In this example the number of rows in the matrix = ((4 * 3 + 1) + (3 * 3 + 1) + (2 * 3 + 1) + (1 * 3 + 1)) * 4.
Is there an easy and efficient way to create matrices in R without hard-wiring the number of rows in the matrix statement? In other words, is there an easy way to let R simply add a row to a matrix as needed when using for-loops?
I have presented one solution that employs rbind at each pass through the loops, but that seems a little convoluted and I was wondering if there might be a much easier solution.
Sorry if this question is redundant with an earlier question. I could not locate a similar question using the search feature on this site or using an internet search engine today, although I think I have found a similar question somewhere in the past.
Below are 2 sets of example code, one using rbind and the other where I used trial and error to set nrow=136 in advance.
Thanks for any suggestions.
v1 <- 5
v2 <- 2
v3 <- 2
v4 <- (v1-1)
my.matrix <- matrix(0, nrow=136, ncol=(v1+4) )
i = 1
for(a in 1:v2) {
for(b in 1:v3) {
for(c in 1:v4) {
for(d in (c+1):v1) {
if(d == (c+1)) l.s = 4
else l.s = 3
for(e in 1:l.s) {
my.matrix[i,c] = 1
if(d == (c+1)) my.matrix[i,d] = (e-1)
else my.matrix[i,d] = e
my.matrix[i,(v1+1)] = a
my.matrix[i,(v1+2)] = b
my.matrix[i,(v1+3)] = c
my.matrix[i,(v1+4)] = d
i <- i + 1
}
}
}
}
}
my.matrix2 <- matrix(0, nrow=1, ncol=(v1+4) )
my.matrix3 <- matrix(0, nrow=1, ncol=(v1+4) )
i = 1
for(a in 1:v2) {
for(b in 1:v3) {
for(c in 1:v4) {
for(d in (c+1):v1) {
if(d == (c+1)) l.s = 4
else l.s = 3
for(e in 1:l.s) {
my.matrix2[1,c] = 1
if(d == (c+1)) my.matrix2[1,d] = (e-1)
else my.matrix2[1,d] = e
my.matrix2[1,(v1+1)] = a
my.matrix2[1,(v1+2)] = b
my.matrix2[1,(v1+3)] = c
my.matrix2[1,(v1+4)] = d
i <- i+1
if(i == 2) my.matrix3 <- my.matrix2
else my.matrix3 <- rbind(my.matrix3, my.matrix2)
my.matrix2 <- matrix(0, nrow=1, ncol=(v1+4) )
}
}
}
}
}
all.equal(my.matrix, my.matrix3)
If you have some upper bound on the size of the matrix, you can create a matrix large enough to hold all the data
my.matrix <- matrix(0, nrow=v1*v2*v3*v4*4, ncol=(v1+4) )
and truncate it at the end.
my.matrix <- my.matrix[1:(i-1),]
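As a small self-contained illustration of this preallocate-then-truncate idea (the loop and the bound max_rows are made up for the example):

```r
max_rows <- 20                         # hypothetical upper bound on the row count
m <- matrix(0, nrow = max_rows, ncol = 2)

i <- 1
for (a in 1:3) {
  for (b in a:3) {                     # inner range depends on a
    m[i, ] <- c(a, b)                  # fill the next free row
    i <- i + 1
  }
}

# keep only the rows that were actually filled
m <- m[seq_len(i - 1), , drop = FALSE]
dim(m)
```

Using seq_len(i - 1) instead of 1:(i - 1) also behaves sensibly if the loop never runs, and drop = FALSE keeps the result a matrix even if only one row was filled.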
This is the generic form to do it. You can adapt it to your problem
matrix <- NULL
for(...){
...
matrix <- rbind(matrix,vector)
}
where vector contains the row elements
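A related variant, which is a different technique from growing the matrix with rbind on every pass: collect the rows in a list and bind them once at the end. This is only a sketch with a made-up loop.

```r
rows <- list()                          # grows by one element per row
for (a in 1:3) {
  for (b in a:3) {
    rows[[length(rows) + 1]] <- c(a, b) # append the new row vector
  }
}

# a single rbind over all collected rows
m <- do.call(rbind, rows)
dim(m)
```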
I stumbled upon this solution today: convert the matrix to a data.frame. As new rows are needed by the for-loop those rows are automatically added to the data.frame. Then you can convert the data.frame back to a matrix at the end if you want. I am not sure whether this constitutes something similar to iterative use of rbind. Perhaps it becomes very slow with large data.frames. I do not know.
my.data <- matrix(0, ncol = 3, nrow = 2)
my.data <- as.data.frame(my.data)
j <- 1
for(i1 in 0:2) {
for(i2 in 0:2) {
for(i3 in 0:2) {
my.data[j,1] <- i1
my.data[j,2] <- i2
my.data[j,3] <- i3
j <- j + 1
}
}
}
my.data
my.data <- as.matrix(my.data)
dim(my.data)
class(my.data)
EDIT: July 27, 2015
You can also delete the first matrix statement, create an empty data.frame then convert the data.frame to a matrix at the end:
my.data <- data.frame(NULL,NULL,NULL)
j <- 1
for(i1 in 0:2) {
for(i2 in 0:2) {
for(i3 in 0:2) {
my.data[j,1] <- i1
my.data[j,2] <- i2
my.data[j,3] <- i3
j <- j + 1
}
}
}
my.data
my.data <- as.matrix(my.data)
dim(my.data)
class(my.data)
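For this particular example, where the rows are simply all combinations of the loop variables, the loops may not be needed at all. A sketch using expand.grid, with column names i1, i2, i3 chosen here for illustration:

```r
# expand.grid varies its first argument fastest, so list the innermost
# loop variable first and then reorder the columns
grid <- expand.grid(i3 = 0:2, i2 = 0:2, i1 = 0:2)
my.data <- as.matrix(grid[, c("i1", "i2", "i3")])

dim(my.data)
class(my.data)
```

This does not cover patterns like the one in the question, where the number of rows per combination varies, but it avoids growing anything row by row when the structure is a full grid.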