Split data frames into K fold after randomizing in R - r

I have a data frame with 6 rows. I want to split it into 5 folds, so that ultimately there would be 4 data frames with 1 element each and a last data frame with 2 elements. I have tried the following code, but it does not help. I am new to R. Any help is appreciated.
a = matrix(1:12,6,2)
d <- split(a,rep(1:6,each=4))
Warning message:
In split.default(a, rep(1:6, each = 4)) :
data length is not a multiple of split variable

split expects a vector of group labels as its second argument. In your case:
ngroups <- 5
floor(seq(1, ngroups, length.out = nrow(a)))
Also, split doesn't work that well with matrices, so first convert to a data.frame:
split(as.data.frame(a), floor(seq(1, ngroups, length.out = nrow(a))))
Edit: Following a suggestion from @IShouldByABoat, the following also works for matrix objects:
split.data.frame(a, floor(seq(1, ngroups, length.out = nrow(a))))
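To see why the original call warns and what the grouping vector looks like, here is a small sketch using the matrix from the question. The original attempt passes 24 labels (rep(1:6, each = 4)) for a matrix that split treats as a plain vector of 12 values, hence the warning. The names g and parts are only illustrative.

```r
a <- matrix(1:12, 6, 2)
ngroups <- 5

# One group label per row: rows 1 and 2 share group 1, the rest get their own group
g <- floor(seq(1, ngroups, length.out = nrow(a)))

# Split row-wise by converting to a data frame first
parts <- split(as.data.frame(a), g)

# Number of rows in each of the 5 pieces
part_sizes <- sapply(parts, nrow)
```

Here the two-row piece is the first one; reverse g with rev(g) if the larger piece should come last.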

Not sure about the "1 element each" aspect, which seems to be problematic with R's version of matrix objects, but here is a way to split the elements of a 12-element matrix that satisfies the requirements:
split( matrix(1:12,ncol=2), findInterval(1:6, c(sort(sample(1:6,5)),Inf)))
$`1`
[1] 1 7
$`2`
[1] 2 3 8 9
$`3`
[1] 4 10
$`4`
[1] 5 11
$`5`
[1] 6 12
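If you want to form them back into two-column matrices (the grouping below differs from the one above because sample() draws a new random split on every call):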
lapply( split( matrix(1:12,ncol=2), findInterval(1:6, c(sort(sample(1:6,5)),Inf))) ,
matrix, ncol=2)
$`1`
[,1] [,2]
[1,] 1 7
$`2`
[,1] [,2]
[1,] 2 8
$`3`
[,1] [,2]
[1,] 3 9
$`4`
[,1] [,2]
[1,] 4 10
[2,] 5 11
$`5`
[,1] [,2]
[1,] 6 12

I solved a similar problem using the modulo operator on the 1:6 sequence. For your example, try this:
a = matrix(1:12, 6, 2)
d = split(as.data.frame(a), 1:6%%5)
Simple, and it gets the job done.
For splitting into K folds, you might find the following useful:
nfolds = 5
a = matrix(1:12, 6, 2)
folds = 1:nrow(a) %% nfolds # fold labels 0..nfolds-1; use sample(1:nrow(a) %% nfolds) to randomize
fold = 1 # whichever fold you want to test with
train = a[folds != fold, , drop = FALSE] # drop = FALSE keeps a one-row fold as a matrix
test = a[folds == fold, , drop = FALSE]
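Since the title asks for folds after randomizing, here is a minimal sketch that shuffles the fold labels before splitting. It assumes fold sizes may differ by at most one row; the names k, fold_id and fold_sizes are only illustrative.

```r
set.seed(123)                       # only to make the shuffle reproducible
a <- matrix(1:12, 6, 2)
k <- 5

# Fold labels 1..k, recycled to one label per row, then shuffled
fold_id <- sample(rep_len(seq_len(k), nrow(a)))

# One data frame per fold
folds <- split(as.data.frame(a), fold_id)
fold_sizes <- sapply(folds, nrow)

# Use fold 1 as the test set and all other folds as the training set
test  <- folds[["1"]]
train <- do.call(rbind, folds[names(folds) != "1"])
```

With 6 rows and 5 folds this gives four folds of one row and one fold of two rows, and which rows land in which fold changes with the seed.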

Related

Access an element of a list in the same manner how you access an element of a matrix

I have a matrix:
mat <- matrix(c(3,9,5,1,-2,8), nrow = 2)
[,1] [,2] [,3]
[1,] 3 5 -2
[2,] 9 1 8
I have a list:
lst <- as.list(data.frame(matrix(c(3,9,5,1,-2,8), nrow = 2)))
$X1
[1] 3 9
$X2
[1] 5 1
$X3
[1] -2 8
I can access my matrix with mat[i,j].
I can access my list with lst[[c(i,j)]].
But if I do mat[1,2] on the matrix I get 5, while the same numbers on the list, lst[[c(1,2)]], give 9.
Is there a way to get the same numbers when I access the list, maybe by manipulating the list in some manner? When I use lst[[c(1,2)]] I want to get 5 instead of 9, i.e. the same numbers I get when using mat[i,j].
You can try
> list2DF(lst)[1, 2]
[1] 5
You can use transpose() from purrr to transpose a list.
lst2 <- purrr::transpose(lst)
lst2[[c(1,2)]]
# [1] 5
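The mismatch comes from how [[ treats a vector index: lst[[c(i, j)]] is recursive indexing, i.e. lst[[i]][[j]], so the first number picks the list element (a matrix column here) and the second picks the position inside it (the matrix row). A base-R sketch that simply swaps the two indices, assuming the list was built column-wise as in the question; mat_like is a hypothetical helper name, not an existing function:

```r
mat <- matrix(c(3, 9, 5, 1, -2, 8), nrow = 2)
lst <- as.list(data.frame(mat))

# Hypothetical helper: row i, column j of the original matrix
# corresponds to position i inside list element j
mat_like <- function(l, i, j) l[[c(j, i)]]

value <- mat_like(lst, 1, 2)   # same cell as mat[1, 2]
```

This leaves the list untouched, which may be preferable when other code still relies on its column-wise layout.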

How to vectorize this operation?

I have an n x 3 x m array, call it I. It contains 3 columns, n rows (say n=10), and m slices. I have a computation that must be done to replace the third column in each slice based on the other 2 columns in the slice.
I've written a function insertNewRows(I[,,simIndex]) that takes a given slice and replaces the third column. The following for-loop does what I want, but it's slow. Is there a way to speed this up by using one of the apply functions? I cannot figure out how to get them to work in the way I'd like.
for(simIndex in 1:m){
I[,, simIndex] = insertNewRows(I[,,simIndex])
}
I can provide more details on insertNewRows if needed, but the short version is that it takes a probability based on the columns I[,1:2, simIndex] of a given slice of the array, and generates a binomial RV based on the probability.
It seems like one of the apply functions should work just by using
I = apply(FUN = insertNewRows, MARGIN = c(1,2,3)) but that just produces gibberish..?
Thank you in advance!
IK
The question does not define the input, the transformation, or the expected result, so we can't really answer it, but here is an example of adding a row of ones to a[,,i] for each i, which may suggest how you could solve the problem yourself.
It shows how you could use sapply, apply, plyr::aaply, reshaping with matrix/aperm, and abind::abind.
# input array and function
a <- array(1:24, 2:4)
f <- function(x) rbind(x, 1) # append a row of 1's
aa <- array(sapply(1:dim(a)[3], function(i) f(a[,,i])), dim(a) + c(1,0,0))
aa2 <- array(apply(a, 3, f), dim(a) + c(1,0,0))
aa3 <- aperm(plyr::aaply(a, 3, f), c(2, 3, 1))
aa4 <- array(rbind(matrix(a, dim(a)[1]), 1), dim(a) + c(1,0,0))
aa5 <- abind::abind(a, array(1, dim(a)[2:3]), along = 1)
dimnames(aa3) <- dimnames(aa5) <- NULL
sapply(list(aa2, aa3, aa4, aa5), identical, aa)
## [1] TRUE TRUE TRUE TRUE
aa[,,1]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
## [3,] 1 1 1
aa[,,2]
## [,1] [,2] [,3]
## [1,] 7 9 11
## [2,] 8 10 12
## [3,] 1 1 1
aa[,,3]
## [,1] [,2] [,3]
## [1,] 13 15 17
## [2,] 14 16 18
## [3,] 1 1 1
aa[,,4]
## [,1] [,2] [,3]
## [1,] 19 21 23
## [2,] 20 22 24
## [3,] 1 1 1
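Applied to the shape described in the question, the same ideas might look like the sketch below. The body of insertNewRows is not shown in the question, so the version here is a hypothetical stand-in that assumes the probability is the product of the first two columns and that one Bernoulli draw per row is wanted.

```r
set.seed(1)
n <- 10; m <- 4
I <- array(runif(n * 3 * m), dim = c(n, 3, m))

# Hypothetical stand-in for insertNewRows (the real one is not given)
insertNewRows <- function(s) {
  p <- s[, 1] * s[, 2]                           # assumed probability
  s[, 3] <- rbinom(nrow(s), size = 1, prob = p)  # one draw per row
  s
}

# Option 1: apply over the slice margin, then restore the dimensions
I_apply <- array(apply(I, 3, insertNewRows), dim = dim(I))

# Option 2: no per-slice call at all, work on all slices at once
p_all <- I[, 1, ] * I[, 2, ]                     # n x m matrix of probabilities
I_vec <- I
I_vec[, 3, ] <- rbinom(length(p_all), size = 1, prob = p_all)
```

MARGIN = c(1,2,3) produces gibberish because apply then calls the function once per single element rather than once per slice. Option 2 only works when the per-slice computation can be written with vectorized functions such as rbinom.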

How to use data from every row to create a matrix by loop

I have a data frame like
df<-data.frame(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9),d=c(10,11,12))
a b c d
1 1 4 7 10
2 2 5 8 11
3 3 6 9 12
I want to use every row to create 3 (nrow(df)) 2*2 matrices: the 1st uses 1,4,7,10, the 2nd uses 2,5,8,11, and the 3rd uses 3,6,9,12, so that I get 3 matrices. Thank you.
We can use split to split the dataset into a list and then call matrix on each element
lapply(split.default(as.matrix(df), row(df)), matrix, 2)
If we need the matrix columns to be 1, 7 followed by 4, 10 (i.e. filled row by row), use byrow=TRUE
lapply(split.default(as.matrix(df), row(df)), matrix, 2, byrow=TRUE)
Or use apply with MARGIN = 1 and wrap it with list to get a list output
do.call("c", apply(df, 1, function(x) list(matrix(x, ncol=2))))
If we need a for loop, preallocate a as a list with length equal to the number of rows of 'df'
a <- vector("list", nrow(df))
for(i in 1:nrow(df)){ a[[i]] <- matrix(unlist(df[i,]), ncol=2)}
a
Or if it can be stored as array
array(t(df), c(2, 2, 3))
Or using Map:
m <- matrix(c(t(df)), ncol = 2, byrow = TRUE)
p <- 2 # number of rows
Map(function(i,j) m[i:j,], seq(1,nrow(m),p), seq(p,nrow(m),p))
# [[1]]
# [,1] [,2]
# [1,] 1 4
# [2,] 7 10
# [[2]]
# [,1] [,2]
# [1,] 2 5
# [2,] 8 11
# [[3]]
# [,1] [,2]
# [1,] 3 6
# [2,] 9 12
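Note that the answers fill the matrices in two different orders: the Map version fills each matrix row by row, while the for loop and the array version fill column by column. A small sketch checking that the loop-style list and the array agree with each other; mats and arr are illustrative names:

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9), d = c(10, 11, 12))

# List of 2 x 2 matrices, one per row of df, filled column by column
mats <- lapply(seq_len(nrow(df)), function(i) matrix(unlist(df[i, ]), ncol = 2))

# The same data as a 2 x 2 x nrow(df) array
arr <- array(t(df), c(2, 2, nrow(df)))

# Each array slice should hold the same values as the matching list element
same <- vapply(seq_len(nrow(df)), function(i) all(mats[[i]] == arr[, , i]), logical(1))
```

If the row-wise layout is wanted instead, pass byrow = TRUE to matrix or transpose each slice with t().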

Reshape each row of a data.frame to be a matrix in R

I am working with the hand-written zip codes dataset. I have loaded the dataset like this:
digits <- read.table("./zip.train",
                     quote = "",
                     comment.char = "",
                     stringsAsFactors = FALSE)
Then I get only the ones:
ones <- digits[digits$V1 == 1, -1]
Right now, ones has 442 rows and 256 columns. I need to transform each row of ones into a 16x16 matrix. I think what I am looking for is a list of 16x16 matrices like the ones in this question:
How to create a list of matrix in R
But I tried it with my data and it did not work.
At first I tried ones <- apply(ones, 1, matrix, nrow = 16, ncol = 16), but it does not work the way I thought it would. I also tried lapply with no luck.
An alternative is to just change the dims of your matrix.
Consider the following matrix "M":
M <- matrix(1:12, ncol = 4)
M
# [,1] [,2] [,3] [,4]
# [1,] 1 4 7 10
# [2,] 2 5 8 11
# [3,] 3 6 9 12
We are looking to create a three dimensional array from this, so you can specify the dimensions as "row", "column", "third-dimension". However, since the matrix is constructed by column, you first need to transpose it before changing the dimensions.
`dim<-`(t(M), c(2, 2, nrow(M)))
# , , 1
#
# [,1] [,2]
# [1,] 1 7
# [2,] 4 10
#
# , , 2
#
# [,1] [,2]
# [1,] 2 8
# [2,] 5 11
#
# , , 3
#
# [,1] [,2]
# [1,] 3 9
# [2,] 6 12
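Though there are probably simpler ways, you can try lapply (unlist turns the one-row data frame ones[i, ] into a plain vector, so that matrix() returns a numeric matrix rather than a list matrix):
ones_matrix <- lapply(seq_len(nrow(ones)), function(i){matrix(unlist(ones[i, ]), nrow=16)})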
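Both ideas can be tried without the zip.train file. The sketch below uses a small random data frame as a stand-in for ones (3 rows instead of 442, purely an assumption for illustration) and checks that the array and the list hold the same values. Each matrix is filled column by column; whether the digit images need byrow = TRUE or a transpose to display upright depends on how the pixels are stored, which is not stated here.

```r
set.seed(42)
# Stand-in for the real data: 3 rows of 256 pixel values
ones <- as.data.frame(matrix(runif(3 * 256), nrow = 3))

# Array version: transpose, then reinterpret as 16 x 16 x nrow(ones)
ones_array <- array(t(as.matrix(ones)), c(16, 16, nrow(ones)))

# List version: one 16 x 16 numeric matrix per row
ones_list <- lapply(seq_len(nrow(ones)),
                    function(i) matrix(unlist(ones[i, ]), nrow = 16))

# Slice i of the array and element i of the list should match
agree <- vapply(seq_len(nrow(ones)),
                function(i) all(ones_array[, , i] == ones_list[[i]]), logical(1))
```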

Create a special matrix with data from two lists in R

Here is some minimal sample data to illustrate the matrix I want to end up with:
test <- list( c(1, 2, 3, 4) )
test2 <- list( c(2, 3) )
and my matrix should be:
2 4 6 8
3 6 9 12
It's like a nested for loop: I go over each row and, within each row, multiply its value by each column value.
After a few hours I have this:
sapply(2, function(j) lapply(seq_along(test), function(i) test[[i]] * test2[[i]][j]))
It gives the second row of the desired matrix (the row index is the '2' passed to sapply):
[[1]]
[1] 3 6 9 12
Going over the rows could be done with seq_along(test2), but I don't know how to save the data after each row. My last attempt was this, and it failed:
a=matrix(data=0, nrow=2, ncol=4)
lapply(seq_along(test2), function(k) a[k,]<-unlist(sapply(2, function(j) lapply(seq_along(test), function(i) test[[i]] * test2[[i]][j])) ) )
output:
[1] 3 6 9 12
Later on, I would like to have more vectors in the input lists and repeat the whole procedure described above.
We can use outer after unlisting the list
t(outer(unlist(test), unlist(test2)))
# [,1] [,2] [,3] [,4]
#[1,] 2 4 6 8
#[2,] 3 6 9 12
You mean matrix multiplication? Quick example:
> t(matrix(unlist(test)) %*% matrix(unlist(test2), nrow = 1))
[,1] [,2] [,3] [,4]
[1,] 2 4 6 8
[2,] 3 6 9 12
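For the follow-up wish (more vectors in the input lists), one possible sketch pairs the lists element by element with Map and builds one matrix per pair. The second pair of vectors below is made up for illustration, and res is just an illustrative name.

```r
test  <- list(c(1, 2, 3, 4), c(10, 20))
test2 <- list(c(2, 3),       c(1, 2, 3))

# For each pair: rows come from test2, columns from test
res <- Map(function(cols, rows) outer(rows, cols), test, test2)

first  <- res[[1]]   # 2 x 4 matrix, same as the example in the question
second <- res[[2]]   # 3 x 2 matrix
```

Calling outer(rows, cols) directly avoids the extra t() because the row vector is passed first.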
