I have a large matrix, called train$X, which contains binary data (1s and 0s).
I want to extract the values into two separate lists using a for loop: list1 holding the 1s and list2 holding the 0s.
My R code is not working:
X <- c(0,1,0,1,0,1)
Y <- c(1,1,1,1,1,0)
train<- as.matrix (cbind(X,Y))
list1 <- list()
list2 <- list()
for(i in 1:length(train)) {
if(train[i]== 1)
list1 = train[i]
else
list2 = train[i]
}
The expected result is:
list1 contains (1,1,1,1,1,1,1,1)
list2 contains (0,0,0,0)
I don't think a loop is the best way to do this, and what you want is a vector rather than a list. I suggest using == on the matrix for logical indexing, as follows:
X <- c(0,1,0,1,0,1)
Y <- c(1,1,1,1,1,0)
train <- cbind(X,Y)
v1 <- train[train == 1]
v2 <- train[train == 0]
If you really want a loop:
v1 <- c() # start empty; values are appended inside the loop
v2 <- c() # start empty; values are appended inside the loop
for(i in 1:nrow(train)){
for(j in 1:ncol(train)){
if(train[i, j] == 1){
v1 <- c(v1, train[i, j])
}else{
v2 <- c(v2, train[i, j])
}
}
}
Last but not least, since the values are known in advance, you can simply count them:
v1 <- rep(1, sum(train == 1))
v2 <- rep(0, sum(train == 0))
So there are many different ways to do it.
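As one more sketch alongside the approaches above, base R's split() can produce both groups in a single call. This assumes the matrix holds only 0s and 1s; the names values, groups, ones, zeros and counts are illustrative and not from the original posts.

```r
X <- c(0, 1, 0, 1, 0, 1)
Y <- c(1, 1, 1, 1, 1, 0)
train <- cbind(X, Y)

# Flatten the matrix (column by column) and group its values by themselves.
values <- as.vector(train)
groups <- split(values, values)

zeros <- groups[["0"]]  # every 0 in the matrix
ones  <- groups[["1"]]  # every 1 in the matrix

# If only the counts matter, table() gives them directly.
counts <- table(values)
```

Because split() names its result after the grouping values, the two vectors are retrieved by the character names "0" and "1" rather than by position.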
Related
Consider the following matrix:
testMat <- matrix(c(1,2,1,3,
3,2,3,3,
1,3,1,3,
2,3,3,3,
3,3,3,3,
3,2,3,1), byrow = T, ncol = 4)
and following condition:
cond <- c(0,0) # binary
Problem: If cond[1] is 0 and there is a 1 in either the first or third column, the corresponding rows will be removed. Similarly if cond[2] is 0 and there is a 1 in either the second or fourth column, the corresponding rows will be removed. For example the new matrix will be:
newMat <- testMat[-c(1,3,6),] # for cond <- c(0,0)
newMat <- testMat[-c(1,3),] # for cond <- c(0,1)
newMat <- testMat[-c(6),] # for cond <- c(1,0)
newMat <- testMat # for cond <- c(1,1)
I tried the following, which is both wrong and clumsy:
newMat <- testMat[-(cond[1] == 0 & testMat[,c(1,3)] == 1),]
newMat <- newMat[-(cond[2] == 0 & newMat[,c(2,4)] == 1),]
Can you help to find a base R solution?
This is ugly, but seems to work:
(generalized for length of cond, assuming that the matrix has length(cond)*2 columns)
keepRows <- apply(testMat, 1,
function(rw, cond) {
logicrow <- vector(length=length(cond))
for (b in 1:length(cond)) {
logicrow[b] <- ifelse(cond[b]==0, all(rw[b]!=1) & all(rw[length(cond)+b]!=1), TRUE)
}
all(logicrow)
}, cond = cond)
newMat <- testMat[keepRows, ]
(edited according to comment)
Assuming 1) cond can be of arbitrary length, 2) testMat has an even number of columns, and 3) the rule is to look at the i-th and (i+2)-th column of testMat
cond=c(0,0)
unlist(
sapply(
1:length(cond),
function(i){
j=rowSums(testMat[,c(i,i+2)]==1)
if (cond[i]==0 & sum(j)>0) which(j>0) else nrow(testMat)+1
}
)
)
[1] 1 3 6
This returns the indices of the rows that satisfy your conditions; you can then remove them (note that .Last.value is only available in an interactive session):
testMat[-.Last.value,]
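A loop-free sketch of the same rule, under the same assumption as the answers above (column k is governed by cond[((k - 1) %% length(cond)) + 1], so columns 1 and 3 follow cond[1] and columns 2 and 4 follow cond[2]). The function name drop_rows is hypothetical.

```r
testMat <- matrix(c(1, 2, 1, 3,
                    3, 2, 3, 3,
                    1, 3, 1, 3,
                    2, 3, 3, 3,
                    3, 3, 3, 3,
                    3, 2, 3, 1), byrow = TRUE, ncol = 4)

drop_rows <- function(mat, cond) {
  # Recycle cond across the columns: TRUE marks columns whose condition is 0.
  active <- rep(cond == 0, length.out = ncol(mat))
  # Nothing to check when every condition is 1.
  if (!any(active)) return(mat)
  # A row is dropped if any of its "active" columns contains a 1.
  hit <- rowSums(mat[, active, drop = FALSE] == 1) > 0
  mat[!hit, , drop = FALSE]
}

newMat00 <- drop_rows(testMat, c(0, 0))
newMat01 <- drop_rows(testMat, c(0, 1))
newMat10 <- drop_rows(testMat, c(1, 0))
newMat11 <- drop_rows(testMat, c(1, 1))
```

Logical indexing with !hit avoids the pitfall of negative indexing, where an empty set of rows to remove would return an empty matrix.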
My R-Code:
l <- list()
for(i in 1:5){
n <- 1
mat <- matrix(0L,500,10)
repeat{
a <- rnorm(10)
b <- rnorm(10)
c <- a+b
mat[n,] <- c
mat <- mat[mat[,10] >= 0 + (i/10) & mat[,1] >= 0 +(i/10),]
n <- n +1
if(mat[500,] != 0){
break
}
}
l[[i]] <- mat
}
l
I would like to get 5 Matrices, which are stored in a list. Each matrix should have exactly 500 rows and should not have negative values in its rows at position [,1] or [,10].
I tried to build a repeat loop:
1. Calculate a vector
2. Store the vector in the matrix
3. Delete the row if the condition is met
4. Repeat if there aren't 500 rows
Unfortunately, there's something wrong and it doesn't work. What can I do? Thanks!
If you add an if-clause that tests your condition before adding the line to your matrix, it should work:
l <- list()
for(i in 1:5){
n <- 1
mat <- matrix(0L,500,10)
repeat{
a <- rnorm(10)
b <- rnorm(10)
c <- a+b
if(!any(c[c(1,10)] < 0 + i/10)){
mat[n,] <- c
n <- n +1
}
if(n==501){
break
}
}
l[[i]] <- mat
}
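The original loop most likely fails because subsetting mat inside repeat removes the all-zero placeholder rows, so the matrix shrinks and mat[n, ] and mat[500, ] no longer exist. As an alternative sketch, candidates can be generated in batches and filtered before being stored. The function name make_matrix and its parameters are hypothetical.

```r
make_matrix <- function(threshold, n_rows = 500, n_cols = 10) {
  out <- matrix(numeric(0), nrow = 0, ncol = n_cols)
  while (nrow(out) < n_rows) {
    # One batch of candidate rows; each entry is the sum of two normal draws.
    cand <- matrix(rnorm(n_rows * n_cols) + rnorm(n_rows * n_cols),
                   ncol = n_cols)
    # Keep only rows whose first and last entries reach the threshold.
    keep <- cand[, 1] >= threshold & cand[, n_cols] >= threshold
    out <- rbind(out, cand[keep, , drop = FALSE])
  }
  # Trim any surplus so exactly n_rows rows remain.
  out[seq_len(n_rows), , drop = FALSE]
}

set.seed(1)  # only to make the sketch reproducible
l <- lapply(1:5, function(i) make_matrix(i / 10))
```

Each element of l is a 500 x 10 matrix, and the threshold for matrix i is i/10, as in the answer above.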
I'm having trouble with indexing a function correctly so that it would return the correct values. The function is only applied when an if condition is met, and the ifelse statement is in a for loop.
I've simplified my function and dataset to make the example reproducible here - the way I wrote it currently is not very efficient, but I just want to make sure the indexing works so that I can try applying this back to my more complicated dataset, then work on optimizing later on.
First I start with some data:
var1 <- seq(100, 500, 100)
matrix1 <- matrix(1:12, ncol=2)
matrix2 <- matrix(c(1,2,5,6,8,9,11,12,14,15,17,20), ncol=2)
matrix3 <- matrix(seq(1,34,3), ncol=2)
list1 <- list(matrix1, matrix2, matrix3)
I have a function that takes in a data frame like below:
func1 <- function(df1) {
data1 <- df1[,1]
data2 <- df1[,2]
data3 <- vector()
data3[i] <- sum(data1, data2)
return(data3)
}
And my for loop looks like below:
for (i in (1:3)) {
vec1 <- var1
vec2 <- diff(unlist(list1[[i]])[,1])
df1 <- matrix()
df1 <- cbind(vec1, vec2)
len <- length(which(vec2<3))
if (len>1) {
func1(df1)
}
else data3 <- NA
print(data3)
}
I'm trying to only apply func1 if my df1 passes the length test, and if it doesn't meet the length requirement, I want to give it an NA value.
When i=1, len=5 and data3=1505; when i=2, len=4 and data3=1508; when i=3, len=0 so data3=NA. In short, the result I want to return from the for loop is data3 = c(1505, 1508, NA).
However, I can't get that result now, and since individually running for i=1 i=2 and i=3 works, I'm suspecting that I'm having an index issue (probably inside func1 I suppose) but can't figure it out. Can anyone give me some suggestions?
I think you had a typo in your function func1 (data3[i] instead of data3). It should be as follows, with the loop collecting each result:
func1 <- function(df1) {
data1 <- df1[,1]
data2 <- df1[,2]
data3 <- vector()
data3 <- sum(data1, data2)
return(data3)
}
data3 <- c()
for (i in (1:3)) {
vec1 <- var1
vec2 <- diff(unlist(list1[[i]])[,1])
df1 <- matrix()
df1 <- cbind(vec1, vec2)
len <- length(which(vec2<3))
if (len>1) {
r <- func1(df1)
}
else r <- NA
data3 <- c(data3,r)
}
After fixing that, you will get the output
> data3
[1] 1505 1508 NA
I adjusted the loop syntax a bit:
data3 <- vector(length = 3, mode = "numeric")
for (i in 1:3) {
vec1 <- var1
vec2 <- diff(unlist(list1[[i]])[,1])
df1 <- matrix()
df1 <- cbind(vec1, vec2)
len <- length(which(vec2<3))
if (len>1) {
v <- func1(df1)
v <- v[!is.na(v)] #when i = 2, func1 returns a vector of length 2
data3[[i]] <- v
}
else {
data3[[i]] <- NA
}
}
data3
[1] 1505 1508 NA
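For comparison, the same computation can be sketched without an explicit loop or index by using vapply(), which returns one numeric value per matrix. This assumes the data from the question; the anonymous function stands in for func1 plus the length test.

```r
var1 <- seq(100, 500, 100)
matrix1 <- matrix(1:12, ncol = 2)
matrix2 <- matrix(c(1, 2, 5, 6, 8, 9, 11, 12, 14, 15, 17, 20), ncol = 2)
matrix3 <- matrix(seq(1, 34, 3), ncol = 2)
list1 <- list(matrix1, matrix2, matrix3)

data3 <- vapply(list1, function(m) {
  vec2 <- diff(m[, 1])
  # Apply the sum only when more than one difference is below 3.
  if (sum(vec2 < 3) > 1) sum(var1, vec2) else NA_real_
}, numeric(1))
```

Since each call returns its own value, there is no shared index i inside the function, which sidesteps the indexing problem from the question.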
I want to create two lists of data frames in a for loop, but I cannot use assign:
dat <- data.frame(name = c(rep("a", 10), rep("b", 13)),
x = c(1,3,4,4,5,3,7,6,5,7,8,6,4,3,9,1,2,3,5,4,6,3,1),
y = c(1.1,3.2,4.3,4.1,5.5,3.7,7.2,6.2,5.9,7.3,8.6,6.3,4.2,3.6,9.7,1.1,2.3,3.2,5.7,4.8,6.5,3.3,1.2))
a <- dat[dat$name == "a",]
b <- dat[dat$name == "b",]
samp <- vector(mode = "list", length = 100)
h <- list(a,b)
hname <- c("a", "b")
for (j in 1:length(h)) {
for (i in 1:100) {
samp[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
assign(paste("samp", hname[j], sep="_"), samp[[i]])
}
}
Instead of lists named samp_a and samp_b, I get vectors that contain only the result of the 100th sample. I want samp_a and samp_b to be lists holding all the different samples for dat[dat$name == "a",] and dat[dat$name == "b",], respectively.
How could I do this?
How about creating two different lists and avoiding using assign:
Option 1:
# create empty list
samp_a <-list()
samp_b <- list()
for (j in seq(h)) {
# fill samp_a list
if(j == 1){
for (i in 1:100) {
samp_a[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
}
# fill samp_b list
} else if(j == 2){
for (i in 1:100) {
samp_b[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
}
}
}
You could use assign too, shorter answer:
Option 2:
for (j in seq(hname)) {
l = list()
for (i in 1:100) {
l[[i]] <- sample(1:nrow(h[[j]]), nrow(h[[j]])*0.5)
}
assign(paste0('samp_', hname[j]), l)
rm(l)
}
You could easily use lapply for this, together with the rep function. This keeps each x paired with its y, so it does not apply if you want a random x paired with a random y.
dat <- data.frame(name = c(rep("a", 10), rep("b", 13)),
x = c(1,3,4,4,5,3,7,6,5,7,8,6,4,3,9,1,2,3,5,4,6,3,1),
y = c(1.1,3.2,4.3,4.1,5.5,3.7,7.2,6.2,5.9,7.3,8.6,6.3,4.2,3.6,9.7,1.1,2.3,3.2,5.7,4.8,6.5,3.3,1.2))
a <- dat[dat$name == "a",]
b <- dat[dat$name == "b",]
h <- list(a,b)
hname <- c("a", "b")
testfunc <- function(df) {
#df[sample(nrow(df), nrow(df)*0.5), ] #gives you the values in your data frame
sample(nrow(df), nrow(df)*0.5) # just gives you the indices
}
lapply(h, testfunc) # Standard lapply call: returns one sample of indices for a and one for b
samp <- lapply(rep(h, 100), testfunc) # Repeat the list 100 times to get 100 samples for a and 100 for b, alternating a, b, a, b, ...
samp_a <- samp[c(TRUE, FALSE)] # A recycled T/F vector selects the odd elements, which in this case are the samples for `a`.
samp_b <- samp[c(FALSE, TRUE)] # And here, the even elements, which are the samples for `b`.
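Another sketch that avoids both assign and the odd/even bookkeeping: split the data frame by name and let replicate() draw the 100 samples per group. The result is one named list, so samp$a and samp$b play the role of samp_a and samp_b. The floor() call is an assumption made here to keep the sample size an explicit whole number.

```r
dat <- data.frame(name = c(rep("a", 10), rep("b", 13)),
                  x = c(1,3,4,4,5,3,7,6,5,7,8,6,4,3,9,1,2,3,5,4,6,3,1),
                  y = c(1.1,3.2,4.3,4.1,5.5,3.7,7.2,6.2,5.9,7.3,8.6,6.3,
                        4.2,3.6,9.7,1.1,2.3,3.2,5.7,4.8,6.5,3.3,1.2))

# One data frame per name, then 100 samples of row indices for each.
samp <- lapply(split(dat, dat$name), function(d) {
  replicate(100, sample(nrow(d), floor(nrow(d) * 0.5)), simplify = FALSE)
})

samp_a <- samp$a  # list of 100 index vectors for the "a" rows
samp_b <- samp$b  # list of 100 index vectors for the "b" rows
```

This scales to any number of groups without changing the code, because the list names come from the data.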
I have a large list that stored measurements (a product of other lapply() runs). I now want to gather these measurements and calculate median/mean/sd etc but I don't know how to access them. The structure of this list is like this:
foo[[i]][[j]][[k]][[1]]
foo[[i]][[j]][[k]][[2]]$bar
I can't figure out a function that would return, e.g., the mean of $bar (but not of $x) while keeping the relation to the values of the indices i, j, k.
A sample list can be generated with the following R code:
library(purrr)
metrics <- function(y){
tt10r <- median(y)
list(y, flatten(list(bar = tt10r)))
}
example_list <- list()
for (i in 1:10)
{
v <- list()
for (j in 1:10)
{
w <- 1:10
v[j] <- list(w)
}
example_list[[i]] <- v
}
foo <- list()
for (i in 1:length(example_list))
{
u <- list()
values <- list()
for (j in 1:length(example_list[[i]]))
{
u[[j]] <- lapply(example_list[[i]][[j]], function(x) mean(x))
values[[j]] <- lapply(u[[j]], function(x) metrics(x))
}
foo[[i]] <- values
}
The following code works and gives the anticipated result, but I am not sure whether it is efficient (loops!):
final <- matrix(nrow = tail(cumsum(unlist(lapply(foo, function(x) lengths(x) -2))), n=1), ncol = 3)
final <- data.frame(final)
j=1
i=1
all_js <- c(0, cumsum(lengths(foo)))
starts <- c(0, cumsum(unlist(lapply(foo, function(x) lengths(x) -2)))) + 1
ends <- c(0, cumsum(unlist(lapply(foo, function(x) lengths(x) -2))))
for (i in 1:length(foo))
{
a <- foo[[i]]
for (j in 1:length(a))
{
b <- a[[j]]
data <- unlist(lapply(lapply(b[1], '[', 2), '[[', 1))
for (k in 2:c(length(b)-2))
{
data <- rbind(data,unlist(lapply(lapply(b[k], '[', 2), '[[', 1)))
}
row.names(data) <- NULL
colnames(final) <- c("i", "j", colnames(data))
first <- starts[all_js[i] + j]
last <- ends[all_js[i] + j+1]
final[first:last,] <- data.frame(cbind(i = i, j = j, data))
}
}
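One possible simplification, sketched on a small stand-in list with the same shape (foo[[i]][[j]][[k]][[2]]$bar): first flatten the nested list into a long data frame that records i, j and k next to each bar value, then summarise with aggregate(). The toy foo and the helper name collect_bar are hypothetical and not taken from the question's data.

```r
# Toy list with the same nesting as in the question (2 x 3 x 4 leaves).
foo <- lapply(1:2, function(i) lapply(1:3, function(j) lapply(1:4, function(k) {
  y <- i * 100 + j * 10 + k
  list(y, list(bar = y))
})))

collect_bar <- function(foo) {
  rows <- list()
  for (i in seq_along(foo)) {
    for (j in seq_along(foo[[i]])) {
      for (k in seq_along(foo[[i]][[j]])) {
        # Record the indices together with the value stored under $bar.
        rows[[length(rows) + 1]] <- data.frame(
          i = i, j = j, k = k, bar = foo[[i]][[j]][[k]][[2]]$bar)
      }
    }
  }
  do.call(rbind, rows)
}

long <- collect_bar(foo)

# Summaries per (i, j) pair; swap mean for median or sd as needed.
per_ij <- aggregate(bar ~ i + j, data = long, FUN = mean)
```

Keeping the indices as ordinary columns means any later grouping (by i alone, by i and j, and so on) is a one-line aggregate() call.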