Shuffle Chunks of a Data Frame in R


I have a data frame of 90 rows and 90 columns.
My goal is to divide this data frame into 9 pieces, each with 30 rows and 30 columns. Then, I would like to randomly shuffle these 9 pieces for a new data frame.
I'm not sure how to approach this as I'm fairly new to R.
Any help is appreciated!

Use sample on a list of vectors, then unlist and subset:
If I understand rightly, you're trying to do something a bit like one of those 3*3 sliding picture puzzles, right? Where you split the df into chunks of 30 and then 'shuffle' those chunks. This will do that:
### Create a 90 by 90 data frame
set.seed(2)
df <- as.data.frame(array(runif(90*90), dim = c(90,90)))
### Function to randomise subsetting
rand_dims <- function(){
myDims <- list(1:30, 31:60, 61:90)
unlist(sample(myDims))
}
### Shuffle 'chunks' of df
shuffled <- df[rand_dims(), rand_dims()]
### Checking the leading row/col names
### Shows we've succeeded
colnames(shuffled)[1:5]
#> [1] "V61" "V62" "V63" "V64" "V65"
rownames(shuffled)[1:5]
#> [1] "31" "32" "33" "34" "35"
Created on 2021-01-21 by the reprex package (v0.3.0)
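The approach above permutes the three row bands and the three column bands independently, so blocks that share a band before the shuffle still share one afterwards. If every one of the nine blocks should be free to land in any position, as in a sliding puzzle, a minimal sketch under that assumption could look like the following. The names bands, grid and perm are illustrative and not part of the original answer.

```r
# Sketch: permute all nine 30 x 30 blocks, not only row and column bands.
set.seed(42)
df <- as.data.frame(matrix(runif(90 * 90), nrow = 90))

bands <- list(1:30, 31:60, 61:90)
# Each (row band, column band) pair identifies one block: 9 pairs in total.
grid <- expand.grid(r = 1:3, c = 1:3)
perm <- sample(nrow(grid))  # random order of the 9 blocks

m <- as.matrix(df)
shuffled <- matrix(NA_real_, nrow = 90, ncol = 90)
for (k in seq_len(nrow(grid))) {
  src <- grid[perm[k], ]  # block to copy from
  dst <- grid[k, ]        # position to copy to
  shuffled[bands[[dst$r]], bands[[dst$c]]] <- m[bands[[src$r]], bands[[src$c]]]
}
shuffled <- as.data.frame(shuffled)
dim(shuffled)
```

Working on a matrix copy keeps the block assignment simple; the original row and column names are not carried over in this sketch.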

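Perhaps you can try the following code using split + sample. Note that sample is applied to the individual row and column indices before they are split into three groups, so each chunk is a random 30 x 30 selection of rows and columns rather than a contiguous block.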
set.seed(1)
mat <- matrix(data = rnorm(90 * 90), ncol = 90, nrow = 90)
rrnd <- split(sample(seq(nrow(mat))), gl(3, nrow(mat) / 3))
crnd <- split(sample(seq(ncol(mat))), gl(3, ncol(mat) / 3))
mat.shuffle <- c()
for (i in rrnd) {
for (j in crnd) {
mat.shuffle <- c(mat.shuffle, list(mat[i, j]))
}
}
Note that mat.shuffle is a list of 9 chunks. If you want to transform back to the matrix of size 90x90, you can run the code below afterwards
out <- do.call(
rbind,
lapply(
split(
mat.shuffle,
gl(3, length(mat.shuffle) / 3)
),
function(x) do.call(cbind, x)
)
)
and you will see
> str(out)
num [1:90, 1:90] -0.639 -0.984 -0.108 -0.939 1.743 ...
> str(mat.shuffle)
List of 9
$ : num [1:30, 1:30] -0.639 -0.984 -0.108 -0.939 1.743 ...
$ : num [1:30, 1:30] -0.718 -0.154 -0.29 -0.796 1.7 ...
$ : num [1:30, 1:30] -0.886 0.887 -0.381 0.371 0.327 ...
$ : num [1:30, 1:30] -0.4555 -0.901 0.0312 -3.0223 0.8688 ...
$ : num [1:30, 1:30] 0.705 1.559 -1.278 -0.676 1.933 ...
$ : num [1:30, 1:30] -0.1916 -0.98649 -1.65769 0.00104 -0.19544 ...
$ : num [1:30, 1:30] 1.668 -0.869 1.469 -0.222 -1.869 ...
$ : num [1:30, 1:30] 0.9931 0.4116 1.995 -0.4456 -0.0692 ...
$ : num [1:30, 1:30] -1.462 -0.206 1.197 -1.611 -1.708 ...
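The reassembly step (split the list into groups, cbind within each group, then rbind the groups) is easier to follow on a tiny matrix. The sketch below assumes contiguous 2 x 2 blocks of a 4 x 4 matrix and a hypothetical helper named reassemble; it first checks that unshuffled blocks rebuild the original matrix, then shuffles the list of blocks with sample.

```r
m <- matrix(1:16, nrow = 4)
idx <- split(1:4, gl(2, 2))  # two contiguous bands: 1:2 and 3:4

# Collect the blocks in row-major order: (band 1, band 1), (band 1, band 2), ...
blocks <- list()
for (i in idx) {
  for (j in idx) {
    blocks <- c(blocks, list(m[i, j]))
  }
}

# Hypothetical helper: cbind each group of `per_row` blocks, then rbind the groups.
reassemble <- function(blocks, per_row) {
  groups <- split(blocks, gl(length(blocks) / per_row, per_row))
  do.call(rbind, lapply(groups, function(g) do.call(cbind, g)))
}

rebuilt <- reassemble(blocks, per_row = 2)           # same layout as m
set.seed(1)
shuffled <- reassemble(sample(blocks), per_row = 2)  # blocks in random positions
```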

Related

how to partition the data with specific row size

Say I have the following matrix:
X = matrix(rnorm(4000), nrow=400, ncol=10)
size = c(80,80,79,80,81)
I want to partition the matrix row-wise according to the above size in a concise way. To illustrate,
x1 = X[1:80,]
x2 = X[81:160,]
.
.
x5 = X[320:400,]
I usually have different sizes and different matrices, so I won't be able to do this manually every time. Your help is very much appreciated.

Try the code below
e <- cumsum(size)
b <- c(1, head(e + 1, -1))
out <- Map(function(rb, re) X[rb:re, ], b, e)
and you will see
> str(out)
List of 5
$ : num [1:80, 1:10] 0.253 -0.368 0.804 -0.603 -0.119 ...
$ : num [1:80, 1:10] 0.467 -0.743 -0.401 1.48 0.853 ...
$ : num [1:79, 1:10] -1.123 -0.873 -1.039 -0.247 -0.774 ...
$ : num [1:80, 1:10] -1.409 -0.683 -0.514 0.485 -0.347 ...
$ : num [1:81, 1:10] 0.58 0.529 -0.803 0.49 -0.847 ...
Furthermore, if you want to create variables, try
list2env(setNames(out,paste0("x",seq_along(out))),envir = .GlobalEnv)

You can get start and end points using cumsum and use Map to subset them and create list of matrices.
X = matrix(rnorm(4000), nrow=400, ncol=10)
size = c(80,80,79,80,81)
val <- cumsum(size)
result <- Map(function(x, y) X[x:y, ], c(1, val[-length(val)] + 1), val)
where c(1, val[-length(val)] + 1) creates the starting row numbers and val are the ending ones.
c(1, val[-length(val)] + 1)
#[1] 1 81 161 240 320
val
#[1] 80 160 239 319 400
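As an alternative sketch that avoids computing start and end points, the row indices can be labelled with a group number and split directly. This assumes the sizes sum to nrow(X); the names groups and parts are illustrative.

```r
set.seed(1)
X <- matrix(rnorm(4000), nrow = 400, ncol = 10)
size <- c(80, 80, 79, 80, 81)
stopifnot(sum(size) == nrow(X))

# Label each row with the index of the partition it belongs to.
groups <- rep(seq_along(size), times = size)

# Split the row indices by label and subset X once per partition.
parts <- lapply(split(seq_len(nrow(X)), groups),
                function(rows) X[rows, , drop = FALSE])
sapply(parts, nrow)
```

Using drop = FALSE keeps each piece a matrix even if a partition has a single row.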

Using map2
library(purrr)
size1 <- cumsum(size)
map2(c(1, size1[-length(size1)] + 1), size1, ~ X[.x:.y, ])

Using rollapply to output to lists of lists

I would like to use rollapply or rollapplyr to apply the modwt function to my time series data.
I'm familiar with how rollapply/r works but I need some help setting up the output so that I can correctly store my results when using rollapply.
The modwt function in the waveslim package takes a time series and decomposes it into J levels. For my particular problem J = 4, which means I will have 4 sets of coefficients from my single time series, stored in a list of 5. Of this list I am only concerned with d1, d2, d3 and d4.
The output of the modwt function looks as follows
> str(ar1.modwt)
List of 5
$ d1: num [1:200] -0.223 -0.12 0.438 -0.275 0.21 ...
$ d2: num [1:200] 0.1848 -0.4699 -1.183 -0.9698 -0.0937 ...
$ d3: num [1:200] 0.5912 0.6997 0.5416 0.0742 -0.4989 ...
$ d4: num [1:200] 1.78 1.86 1.85 1.78 1.65 ...
$ s4: num [1:200] 4.64 4.42 4.19 3.94 3.71 ...
- attr(*, "class")= chr "modwt"
- attr(*, "wavelet")= chr "la8"
- attr(*, "boundary")= chr "periodic"
In the example above I have applied the modwt function to the full length time series of length 200 but I wish to apply it to a small rolling window of 30 using rollapply.
I have already tried the following but the output is a large matrix and I cannot easily identify which values belong to d1,d2,d3 or d4
roller <- rollapplyr(ar1, 30,FUN=modwt,wf="la8",n.levels=4,boundary="periodic")
The output of this is a large matrix with the following structure:
> str(roller)
List of 855
$ : num [1:30] 0.117 -0.138 0.199 -1.267 1.872 ...
$ : num [1:30] -0.171 0.453 -0.504 -0.189 0.849 ...
$ : num [1:30] 0.438 -0.3868 0.1618 -0.0973 -0.0247 ...
$ : num [1:30] -0.418 0.407 0.639 -2.013 1.349 ...
...lots of rows omitted...
$ : num [1:30] 0.307 -0.658 -0.105 1.128 -0.978 ...
[list output truncated]
- attr(*, "dim")= int [1:2] 171 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "d1" "d2" "d3" "d4" ...
How can I set up a variable such that it will store the (200-30)+1 lists with lists within this for each of the scales d1,d2,d3 and d4?
For a reproducible example please use the following:
library(waveslim)
data(ar1)
ar1.modwt <- modwt(ar1, "la8", 4)

1) Define modwt2, which invokes modwt, takes the first 4 components and strings them out into a numeric vector. Then use rollapplyr with it, giving rollr, where each row of rollr is the result of one call to modwt2. Finally, reshape each row of rollr into a separate 30 x 4 matrix and create a list, L, of those matrices:
library(zoo) # provides rollapplyr
library(waveslim) # provides modwt and the ar1 data
data(ar1)
modwt2 <- function(...) unlist(head(modwt(...), 4))
rollr <- rollapplyr(ar1, 30, FUN = modwt2, wf = "la8", n.levels = 4, boundary = "periodic")
L <- lapply(seq_len(nrow(rollr)), function(i) matrix(rollr[i, ], ncol = 4))
If a 30 x 4 x 171 array is desired then the following will simplify it into a 3d array:
simplify2array(L)
or as a list of lists:
lapply(L, function(x) as.list(as.data.frame(x)))
2) This is an alternate solution that just uses lapply directly and returns a list each of whose components is the list consisting of d1, d2, d3 and d4.
lapply(1:(200-30+1), function(i, ...) head(modwt(ar1[seq(i, length = 30)], ...), 4),
wf = "la8", n.levels = 4, boundary = "periodic")
Updates: Code improvements, expand (1) and add (2).
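To see the shape of the list-of-lists result without installing waveslim or zoo, the sketch below replaces modwt with a hypothetical stand-in, fake_decomp, that returns four named vectors of the window's length. It is not a wavelet transform; it only illustrates how lapply over the window start positions yields one inner list per window.

```r
# Hypothetical stand-in for modwt(): returns a named list of four vectors.
fake_decomp <- function(x) {
  list(d1 = x - mean(x), d2 = cumsum(x), d3 = rev(x), d4 = x^2)
}

set.seed(1)
series <- rnorm(200)
width <- 30

# One window per start position: (200 - 30) + 1 windows in total.
starts <- seq_len(length(series) - width + 1)
res <- lapply(starts, function(i) {
  fake_decomp(series[seq(i, length.out = width)])
})

length(res)
names(res[[1]])
```

Each element of res can then be addressed by window and scale, for example res[[5]]$d2.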

How to split list at every 10th item in R?

I have a list of 100 items.
I want to split it after each 10th item in Code 1.
Code 2 is about a list of two former lists and splitting it to 20 lists of 10 items each.
Code 1
Expected output: ten lists of 10 items.
A <- 100
a <- rnorm(A) # [1:100]
n <- 10
str(a)
# Not resulting in equal size of chunks with vectors so reject
# http://stackoverflow.com/a/3321659/54964
#d <- split(d, ceiling(seq_along(d)/(length(d)/n)))
# Works for vectors but not with lists
# http://stackoverflow.com/a/16275428/54964
#d <- function(d,n) split(d, cut(seq_along(d), n, labels = FALSE))
str(d)
Test code 2
Input: a list of two lists
aa <- list(a, rnorm(a))
Expected output: 20 lists of 10 item size
Testing Loki's answer
segmentLists <- function(A, segmentSize) {
res <- lapply(A, function(x) split(unlist(x), cut(seq_along(unlist(x)), segmentSize, labels = F)))
#print(res)
res <- unlist(res, recursive = F)
}
segmentLists(aa, 10)
Output: loop going on, never stopping
OS: Debian 8.5
R: 3.3.1

You can use lapply.
aa <- list(a, rnorm(a))
aa
n <- 10
x <- lapply(aa, function(x) split(unlist(x), cut(seq_along(unlist(x)), n, labels = F)))
y <- unlist(x, recursive = F)
str(y)
# List of 20
# $ 1 : num [1:10] 1.0895 -0.0477 0.225 -0.6308 -0.1558 ...
# $ 2 : num [1:10] -0.469 -0.381 0.709 -0.798 1.183 ...
# $ 3 : num [1:10] 0.757 -1.128 -1.394 -0.712 0.494 ...
# $ 4 : num [1:10] 1.135 0.324 0.75 -0.83 0.794 ...
# $ 5 : num [1:10] -0.786 -0.068 -0.179 0.354 -0.597 ...
# $ 6 : num [1:10] -0.115 0.164 -0.365 -1.827 -2.036 ...
...
length(y)
# [1] 20
To remove the names of the list elements in y ($ 1, $ 2 etc.) you can use unname()
str(unname(y))
# List of 20
# $ : num [1:10] 1.0895 -0.0477 0.225 -0.6308 -0.1558 ...
# $ : num [1:10] -0.469 -0.381 0.709 -0.798 1.183 ...
# $ : num [1:10] 0.757 -1.128 -1.394 -0.712 0.494 ...
# $ : num [1:10] 1.135 0.324 0.75 -0.83 0.794 ...
# $ : num [1:10] -0.786 -0.068 -0.179 0.354 -0.597 ...
...
Using a function, you have to return res at the end of the function.
segmentLists <- function(A, segmentSize)
{
res <- lapply(A, function(x) split(unlist(x), cut(seq_along(unlist(x)), segmentSize, labels = F)))
#print(res)
res <- unlist(res, recursive = F)
res <- unname(res)
res
}
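Note that cut(seq_along(x), n) fixes the number of chunks, which only equals 10 items per chunk here because each vector has 100 elements. If the chunk size is what should be fixed, a minimal sketch could use ceiling instead. The helper name chunk_by_size is hypothetical.

```r
# Hypothetical helper: split x into consecutive chunks of `size` elements
# (the last chunk is shorter if length(x) is not a multiple of size).
chunk_by_size <- function(x, size) {
  unname(split(x, ceiling(seq_along(x) / size)))
}

set.seed(1)
a <- rnorm(100)
aa <- list(a, rnorm(100))

# Chunk each vector, then flatten one level to get a single list of chunks.
chunks <- unlist(lapply(aa, chunk_by_size, size = 10), recursive = FALSE)
length(chunks)
```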

apply create columns function to a list r

I am new to using apply and functions together, and I am stuck and frustrated. I have two different lists, and I need to add a certain number of columns to the data frames in the first one, determined by the second one. Below is the structure of the first list, which has one data frame per station; every data frame has 2 or more columns, one per pressure:
> str(KDzlambdaEG)
List of 3
$ 176:'data.frame': 301 obs. of 3 variables:
..$ 0 : num [1:301] 0.186 0.182 0.18 0.181 0.177 ...
..$ 5 : num [1:301] 0.127 0.127 0.127 0.127 0.127 ...
..$ 20: num [1:301] 0.245 0.241 0.239 0.236 0.236 ...
$ 177:'data.frame': 301 obs. of 2 variables:
..$ 0 : num [1:301] 0.132 0.132 0.132 0.13 0.13 ...
..$ 25: num [1:301] 0.09 0.092 0.0902 0.0896 0.0896 ...
$ 199:'data.frame': 301 obs. of 2 variables:
..$ 0 : num [1:301] 0.181 0.182 0.181 0.182 0.179 ...
..$ 10: num [1:301] 0.186 0.186 0.185 0.183 0.184 ...
On the other hand, the second list holds the number of columns that I need to add after every column of each data frame in the first list:
> str(dif)
List of 3
[[176]]
[1] 4 15 28
[[177]]
[1] 24 67
[[199]]
[1] 9 53
I've tried tons of things, including this, using the append_col function that appears in:
How to add a new column between other dataframe columns?
for (i in 1:length(dif)){
A<-lapply(KDzlambdaEG,append_col,rep(list(NA),dif[[i]][1]),after=1)
}
but nothing seems to work so far. I have searched for answers here, but it's difficult to find specific ones as a newcomer.

Try:
indxlst <- lapply(dif, function(x) c(1, x[-length(x)]+1, x[length(x)]))
newdflist <- lapply(indxlst, function(x) data.frame(matrix(0, 2, sum(x))))
for(i in 1:length(newdflist)) {
newdflist[[i]][indxlst[[i]]] <- KDzlambdaEG[[i]]
}
Reproducible Data Test
df1 <- data.frame(x=1:2, y=c("Jan", "Feb"), z=c("A", "B"))
df3 <- df2 <- df1[,-3]
KDzlambdaEG <- list(df1,df2,df3)
x1 <- c(4,15,28)
x2 <- c(24,67)
x3 <- c(9, 53)
dif <- list(x1,x2,x3)
indxlst <- lapply(dif, function(x) c(1, x[-length(x)]+1, x[length(x)]))
newdflist <- lapply(indxlst, function(x) data.frame(matrix(0, 2, sum(x))))
for(i in 1:length(newdflist)) {
newdflist[[i]][indxlst[[i]]] <- KDzlambdaEG[[i]]
}
newdflist
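Another way to read the question is that dif gives, for each existing column, how many empty columns to insert directly after it. Under that assumption, a minimal sketch could build each column plus its NA padding and bind the pieces together. The helper pad_columns, the "_pad" naming scheme and the small example data are all hypothetical.

```r
# Hypothetical helper: after column k of df, insert counts[k] columns of NA.
pad_columns <- function(df, counts) {
  stopifnot(length(counts) == ncol(df))
  pieces <- lapply(seq_along(df), function(k) {
    filler <- as.data.frame(matrix(NA, nrow(df), counts[k]))
    names(filler) <- sprintf("%s_pad%d", names(df)[k], seq_len(counts[k]))
    cbind(df[k], filler)
  })
  do.call(cbind, pieces)
}

# Small made-up data with the same shape as the lists in the question.
stations <- list(
  `176` = data.frame(`0` = 1:3, `5` = 4:6, check.names = FALSE),
  `177` = data.frame(`0` = 1:3, `25` = 7:9, check.names = FALSE)
)
dif <- list(`176` = c(4, 2), `177` = c(1, 3))

# Map pairs each data frame with its vector of counts.
padded <- Map(pad_columns, stations, dif)
sapply(padded, ncol)
```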

Building a list in a loop in R - getting item names correct

I have a function which contains a loop over two lists and builds up some calculated data. I would like to return these data as a lists of lists, indexed by some value, but I'm getting the assignment wrong.
A minimal example of what I'm trying to do, and where I'm going wrong, would be:
mybiglist <- list()
for(i in 1:5){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
name <- paste('item:',i,sep='')
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[name]] <- append(mybiglist, tmp)
}
If you run this and look at the output mybiglist, you will see that something is going very wrong in the way each item is being named.
Any ideas on how I might achieve what I actually want?
Thanks
ps. I know that in R there is a sense in which one has failed if one has to resort to loops, but in this case I do feel justified ;-)

It works if you don't use the append command:
mybiglist <- list()
for(i in 1:5){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
name <- paste('item:',i,sep='')
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[name]] <- tmp
}
# List of 5
# $ item:1:List of 3
# ..$ uniform : num [1:10] 0.737 0.987 0.577 0.814 0.452 ...
# ..$ normal : num [1:16] -0.403 -0.104 2.147 0.32 1.713 ...
# ..$ binomial: num [1:8] 0 0 0 0 1 0 0 1
# $ item:2:List of 3
# ..$ uniform : num [1:10] 0.61 0.62 0.49 0.217 0.862 ...
# ..$ normal : num [1:16] 0.945 -0.154 -0.5 -0.729 -0.547 ...
# ..$ binomial: num [1:8] 1 2 2 0 2 1 0 2
# $ item:3:List of 3
# ..$ uniform : num [1:10] 0.66 0.094 0.432 0.634 0.949 ...
# ..$ normal : num [1:16] -0.607 0.274 -1.455 0.828 -0.73 ...
# ..$ binomial: num [1:8] 2 2 3 1 1 1 2 0
# $ item:4:List of 3
# ..$ uniform : num [1:10] 0.455 0.442 0.149 0.745 0.24 ...
# ..$ normal : num [1:16] 0.0994 -0.5332 -0.8131 -1.1847 -0.8032 ...
# ..$ binomial: num [1:8] 2 3 1 1 2 2 2 1
# $ item:5:List of 3
# ..$ uniform : num [1:10] 0.816 0.279 0.583 0.179 0.321 ...
# ..$ normal : num [1:16] -0.036 1.137 0.178 0.29 1.266 ...
# ..$ binomial: num [1:8] 3 4 3 4 4 2 2 3

Change
mybiglist[[name]] <- append(mybiglist, tmp)
to
mybiglist[[name]] <- tmp

To show that an explicit for loop is not required
unif_norm <- replicate(5, list(uniform = runif(10),
normal = rnorm(16)), simplify=F)
binomials <- lapply(seq_len(5)/10, function(prob) {
list(binomial = rbinom(n = 8, size = 5, prob = prob))})
biglist <- setNames(mapply(c, unif_norm, binomials, SIMPLIFY = F),
paste0('item:',seq_along(unif_norm)))
In general, if you go down the for loop path, it is better to preallocate the list. This is more memory efficient than growing the list on each iteration.
mybiglist <- vector('list', 5)
names(mybiglist) <- paste0('item:', seq_along(mybiglist))
for(i in seq_along(mybiglist)){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[i]] <- tmp
}
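The same structure can also be produced with a single lapply call plus setNames, which keeps the naming in one place. This is a minimal sketch; the helper name make_item is hypothetical, and it uses the same draws as the question (10 uniform, 16 normal, and 8 binomial values with size 5).

```r
# Hypothetical helper: build the three samples for item i.
make_item <- function(i) {
  list(uniform  = runif(10),
       normal   = rnorm(16),
       binomial = rbinom(n = 8, size = 5, prob = i / 10))
}

set.seed(1)
# Build the items, then attach the names "item:1" ... "item:5".
mybiglist <- setNames(lapply(1:5, make_item), paste0("item:", 1:5))
str(mybiglist, max.level = 1)
```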
