how to partition the data with specific row size

Say I have the following matrix:
X = matrix(rnorm(4000), nrow=400, ncol=10)
size = c(80,80,79,80,81)
I want to partition the matrix row-wise according to the above size in a concise way. To illustrate,
x1 = X[1:80,]
x2 = X[81:160,]
.
.
x5 = X[320:400,]
I usually have different sizes and different matrices, so I won't be able to do this manually every time. Your help is very much appreciated.

Try the code below
e <- cumsum(size)
b <- c(1, head(e + 1, -1))
out <- Map(function(rb, re) X[rb:re, ], b, e)
and you will see
> str(out)
List of 5
$ : num [1:80, 1:10] 0.253 -0.368 0.804 -0.603 -0.119 ...
$ : num [1:80, 1:10] 0.467 -0.743 -0.401 1.48 0.853 ...
$ : num [1:79, 1:10] -1.123 -0.873 -1.039 -0.247 -0.774 ...
$ : num [1:80, 1:10] -1.409 -0.683 -0.514 0.485 -0.347 ...
$ : num [1:81, 1:10] 0.58 0.529 -0.803 0.49 -0.847 ...
Furthermore, if you want to create variables, try
list2env(setNames(out,paste0("x",seq_along(out))),envir = .GlobalEnv)

You can get the start and end rows with cumsum, then use Map to subset X over each start/end pair and build a list of matrices.
X = matrix(rnorm(4000), nrow=400, ncol=10)
size = c(80,80,79,80,81)
val <- cumsum(size)
result <- Map(function(x, y) X[x:y, ], c(1, val[-length(val)] + 1), val)
where c(1, val[-length(val)] + 1) creates the starting row numbers and val are the ending ones.
c(1, val[-length(val)] + 1)
#[1] 1 81 161 240 320
val
#[1] 80 160 239 319 400
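Since the sizes and matrices change from case to case, the same idea can be wrapped in a small helper. This is only a sketch: the function name partition_rows is hypothetical, and it assumes the sizes add up to nrow(X).

```r
# Hypothetical helper: split the rows of X into consecutive groups of the given sizes
partition_rows <- function(X, size) {
  stopifnot(sum(size) == nrow(X))                 # assumption: sizes cover every row
  groups <- rep(seq_along(size), times = size)    # group label for each row
  lapply(split(seq_len(nrow(X)), groups),
         function(idx) X[idx, , drop = FALSE])    # drop = FALSE keeps 1-row groups as matrices
}

X <- matrix(rnorm(4000), nrow = 400, ncol = 10)
size <- c(80, 80, 79, 80, 81)
parts <- partition_rows(X, size)
vapply(parts, nrow, integer(1))
```

The result is a list named "1" to "5", so parts[["3"]] is the 79-row block.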

Using map2
library(purrr)
size1 <- cumsum(size)
# pass the start rows first and the end rows second, so .x:.y counts upwards
map2(c(1, size1[-length(size1)] + 1), size1, ~ X[.x:.y, ])

Related

Shuffle Chunks of a Data Frame in R

I have a data frame of 90 rows and 90 columns.
My goal is to divide this data frame into 9 pieces, each with 30 rows and 30 columns. Then, I would like to randomly shuffle these 9 pieces for a new data frame.
I'm not sure how to approach this as I'm fairly new to R.
Any help is appreciated!

Use sample on a list of vectors, then unlist and subset:
If I understand rightly, you're trying to do something a bit like one of those 3*3 sliding picture puzzles, right? Where you split the df into chunks of 30 and then 'shuffle' those chunks. This will do that:
### Create a 90 by 90 data frame
set.seed(2)
df <- as.data.frame(array(runif(90*90), dim = c(90,90)))
### Function to randomise subsetting
rand_dims <- function(){
myDims <- list(1:30, 31:60, 61:90)
unlist(sample(myDims))
}
### Shuffle 'chunks' of df
df[rand_dims(), rand_dims()] -> shuffled
### Checking the leading row/col names
### Shows we've succeeded
colnames(shuffled)[1:5]
#> [1] "V61" "V62" "V63" "V64" "V65"
rownames(shuffled)[1:5]
#> [1] "31" "32" "33" "34" "35"
Created on 2021-01-21 by the reprex package (v0.3.0)

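Perhaps you can try the following code using split + sample. Note that it shuffles the individual row and column indices before grouping them into three sets each, so every chunk is made of randomly chosen rows and columns rather than a contiguous 30 x 30 block.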
set.seed(1)
mat <- matrix(data = rnorm(90 * 90), ncol = 90, nrow = 90)
rrnd <- split(sample(seq(nrow(mat))), gl(3, nrow(mat) / 3))
crnd <- split(sample(seq(ncol(mat))), gl(3, ncol(mat) / 3))
mat.shuffle <- c()
for (i in rrnd) {
for (j in crnd) {
mat.shuffle <- c(mat.shuffle, list(mat[i, j]))
}
}
Note that mat.shuffle is a list of 9 chunks. If you want to transform back to the matrix of size 90x90, you can run the code below afterwards
out <- do.call(
rbind,
lapply(
split(
mat.shuffle,
gl(3, length(mat.shuffle) / 3)
),
function(x) do.call(cbind, x)
)
)
and you will see
> str(out)
num [1:90, 1:90] -0.639 -0.984 -0.108 -0.939 1.743 ...
> str(mat.shuffle)
List of 9
$ : num [1:30, 1:30] -0.639 -0.984 -0.108 -0.939 1.743 ...
$ : num [1:30, 1:30] -0.718 -0.154 -0.29 -0.796 1.7 ...
$ : num [1:30, 1:30] -0.886 0.887 -0.381 0.371 0.327 ...
$ : num [1:30, 1:30] -0.4555 -0.901 0.0312 -3.0223 0.8688 ...
$ : num [1:30, 1:30] 0.705 1.559 -1.278 -0.676 1.933 ...
$ : num [1:30, 1:30] -0.1916 -0.98649 -1.65769 0.00104 -0.19544 ...
$ : num [1:30, 1:30] 1.668 -0.869 1.469 -0.222 -1.869 ...
$ : num [1:30, 1:30] 0.9931 0.4116 1.995 -0.4456 -0.0692 ...
$ : num [1:30, 1:30] -1.462 -0.206 1.197 -1.611 -1.708 ...
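If the nine contiguous 30 x 30 pieces should stay intact and only their positions change, one possible sketch is to cut the blocks first, shuffle the list of blocks, and then reassemble. The names block, starts, grid and blocks are illustrative, and the matrix is filled with 1..8100 only so that the pieces are easy to recognise.

```r
set.seed(1)
mat <- matrix(seq_len(90 * 90), nrow = 90, ncol = 90)
block <- 30
starts <- seq(1, nrow(mat), by = block)           # 1, 31, 61

# all 9 (row start, column start) combinations
grid <- expand.grid(r = starts, c = starts)

# cut out the 9 contiguous blocks
blocks <- Map(function(r0, c0) mat[r0:(r0 + block - 1), c0:(c0 + block - 1)],
              grid$r, grid$c)

# shuffle the order of the blocks, keeping each block intact
blocks <- sample(blocks)

# reassemble: 3 blocks side by side per band, then stack the 3 bands
bands <- split(blocks, rep(1:3, each = 3))
shuffled <- do.call(rbind, unname(lapply(bands, function(b) do.call(cbind, b))))
dim(shuffled)
```

Unlike shuffling row bands and column bands separately, this lets any block land in any of the nine positions.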

Translating a formula into R with summation operator

Edit:
What I want to know in its simplest form:
How do I get the following cumsum data-set with a for-loop:
1 = 1
1+2 = 3
1+2+3 = 6
1+2+3+4= 10
1+2+3+4+5 = 15
When I try this code, I do not get the result I want:
test <- c(0,0,0,0,0)
for (i in 1:5) {test[i] <- sum(i)}
test
[1] 1 2 3 4 5
Actual Problem:
I want to translate the following formula into R:
What I tried is the following, but it gives me wrong values unfortunately:
P <- rnorm(20,100,6)
exp <- function( theta){
for (t in (3:20)){
for (k in (1:(t-1))){
X[t] = (1-theta) * sum( theta^(k-1)*(P[t-k] - P[t-k-1])) + theta^(t-1)}
}
I am sure I am using the sum operator wrong, but my brain is totally blocked right now and I do not know how to use it properly here.
I would appreciate your help.

The convoluted way:
for(i in 1:5){
if(!exists("x")){
print(x <- 1)
} else {
print(x <- x + i)
}
}; rm(i, x)
The simple way:
cumsum(1:5)
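Since the question asks for a for-loop specifically, here is a minimal sketch of the loop version. The original attempt calls sum(i), which sums the single number i; summing the sequence 1..i gives the running total instead.

```r
n <- 5
test <- numeric(n)
for (i in seq_len(n)) {
  test[i] <- sum(seq_len(i))   # sum of 1, 2, ..., i rather than sum(i)
}
test
#> [1]  1  3  6 10 15
```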

You can create your function exp so that it will depend on theta and t and then Vectorize it. Try out:
set.seed(1)
P <- rnorm(20,100,6)
exp <- function(theta, t){
for (k in 1:(t-1)) {
X = (1-theta) * sum(theta^(k-1)*(P[t-k] - P[t-k-1])) + theta^(t-1)
}
return(X)
}
exp <- Vectorize(exp)
exp(0.6, 3:20) # setting theta to 0.6 for instance
# output
[1] 3.600000e-01 2.160000e-01 1.296000e-01
[4] 7.776000e-02 4.665600e-02 2.799360e-02
[7] 1.679616e-02 1.007770e-02 6.046618e-03
[10] 3.627971e-03 2.176782e-03 1.306069e-03
[13] 7.836416e-04 4.701850e-04 2.821110e-04
[16] 1.692666e-04 1.015600e-04 6.093597e-05
You can also use a for loop for different theta values:
my.list <- list()
for (i in seq(0.1, 0.9, 0.1)) {
val = exp(i, 3:20)
my.list[[paste0("theta", i)]] <- val
}
str(my.list)
# my.list
List of 9
$ theta0.1: num [1:18] 1e-02 1e-03 1e-04 1e-05 1e-06 ...
$ theta0.2: num [1:18] 4.0e-02 8.0e-03 1.6e-03 3.2e-04 6.4e-05 ...
$ theta0.3: num [1:18] 0.09 0.027 0.0081 0.00243 0.000729 ...
$ theta0.4: num [1:18] 0.16 0.064 0.0256 0.0102 0.0041 ...
$ theta0.5: num [1:18] 0.25 0.125 0.0625 0.0312 0.0156 ...
$ theta0.6: num [1:18] 0.36 0.216 0.1296 0.0778 0.0467 ...
$ theta0.7: num [1:18] 0.49 0.343 0.24 0.168 0.118 ...
$ theta0.8: num [1:18] 0.64 0.512 0.41 0.328 0.262 ...
$ theta0.9: num [1:18] 0.81 0.729 0.656 0.59 0.531 ...
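One caveat about the function above: the loop overwrites X on every pass and sum() only ever sees one term. On the last pass (k = t - 1) the index P[0] is empty, so the sum is 0 and the result is just theta^(t-1), which is what the printed values show (0.6^2 = 0.36, 0.6^3 = 0.216, and so on). Below is a hedged sketch of a vectorised sum. The formula itself is not shown in the post, so this assumes X_t = (1 - theta) * sum over k of theta^(k-1) * (P[t-k] - P[t-k-1]) + theta^(t-1), with k running from 1 to t-2 so that P[0] is never needed. The name x_t is hypothetical and is chosen to avoid masking base::exp.

```r
set.seed(1)
P <- rnorm(20, 100, 6)

# Hypothetical helper; the summation range 1..(t - 2) is an assumption
x_t <- function(theta, t, P) {
  k <- seq_len(t - 2)                              # all k at once instead of a loop
  (1 - theta) * sum(theta^(k - 1) * (P[t - k] - P[t - k - 1])) + theta^(t - 1)
}

X <- vapply(3:20, function(t) x_t(0.6, t, P), numeric(1))
length(X)
```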

generate multiple data for each row with tidyverse

I have a vector of parameters named theta and want to draw n random values from a distribution for each theta.
S = 5
n = 5
test = tibble(
s = 1:S,
theta = rgamma(S, shape = 10*s, rate = 50)
) %>%
mutate(data = rexp(n, theta))
Ideally, I want my result to look something like this:
S theta data
1 some value [a list with n number]
...
then expand it into a tibble:
S theta d1 d2 d3 .. dn
...
Hope this is clear.
Thanks.

If we need a list column, we can use map to loop over each element of 'theta' and store the rexp draws in a list
library(tidyverse)
test1 <- test %>%
pull(theta) %>%
map(~rexp(n, .)) %>%
mutate(test, data = .)
str(test1$data)
#List of 5
# $ : num [1:5] 5.88 7.94 1.64 3.3 11.25
# $ : num [1:5] 4.5942 0.5424 1.7479 0.0469 0.9573
# $ : num [1:5] 1.192 2.447 0.239 1.497 2.359
# $ : num [1:5] 1.2323 0.0996 1.5778 0.1278 0.6982
# $ : num [1:5] 0.15 0.733 0.19 3.548 2.08
The list column can be unnested
test1 %>%
unnest(data)
data
S <- 5
n <- 5
test <- tibble(
s = 1:S,
theta = rgamma(S, shape = 10*s, rate = 50)
)
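The question also asks for a wide layout with one column per draw (d1 ... dn). A minimal base R sketch of that shape is below; the names draws and wide are illustrative, and a tidyverse pipeline could reach the same layout by widening the list column instead.

```r
set.seed(42)
S <- 5
n <- 5
theta <- rgamma(S, shape = 10 * seq_len(S), rate = 50)

# one row of n exponential draws per theta (vapply returns n x S, so transpose)
draws <- t(vapply(theta, function(th) rexp(n, rate = th), numeric(n)))
colnames(draws) <- paste0("d", seq_len(n))

wide <- data.frame(s = seq_len(S), theta = theta, draws)
names(wide)
```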

How to split list at every 10th item in R?

I have a list of 100 items.
I want to split it after each 10th item in Code 1.
Code 2 is about a list of two former lists and splitting it to 20 lists of 10 items each.
Code 1
Expected output: ten lists of 10 items.
A <- 100
a <- rnorm(A) # [1:100]
n <- 10
str(a)
# Not resulting in equal size of chunks with vectors so reject
# http://stackoverflow.com/a/3321659/54964
#d <- split(d, ceiling(seq_along(d)/(length(d)/n)))
# Works for vectors but not with lists
# http://stackoverflow.com/a/16275428/54964
#d <- function(d,n) split(d, cut(seq_along(d), n, labels = FALSE))
str(d)
Test code 2
Input: a list of two lists
aa <- list(a, rnorm(a))
Expected output: 20 lists of 10 item size
Testing Loki's answer
segmentLists <- function(A, segmentSize) {
res <- lapply(A, function(x) split(unlist(x), cut(seq_along(unlist(x)), segmentSize, labels = F)))
#print(res)
res <- unlist(res, recursive = F)
}
segmentLists(aa, 10)
Output: loop going on, never stopping
OS: Debian 8.5
R: 3.3.1

You can use lapply.
aa <- list(a, rnorm(a))
aa
n <- 10
x <- lapply(aa, function(x) split(unlist(x), cut(seq_along(unlist(x)), n, labels = F)))
y <- unlist(x, recursive = F)
str(y)
# List of 20
# $ 1 : num [1:10] 1.0895 -0.0477 0.225 -0.6308 -0.1558 ...
# $ 2 : num [1:10] -0.469 -0.381 0.709 -0.798 1.183 ...
# $ 3 : num [1:10] 0.757 -1.128 -1.394 -0.712 0.494 ...
# $ 4 : num [1:10] 1.135 0.324 0.75 -0.83 0.794 ...
# $ 5 : num [1:10] -0.786 -0.068 -0.179 0.354 -0.597 ...
# $ 6 : num [1:10] -0.115 0.164 -0.365 -1.827 -2.036 ...
...
length(y)
# [1] 20
To remove the names of the list elements in y ($ 1, $ 2 etc.) you can use unname()
str(unname(y))
# List of 20
# $ : num [1:10] 1.0895 -0.0477 0.225 -0.6308 -0.1558 ...
# $ : num [1:10] -0.469 -0.381 0.709 -0.798 1.183 ...
# $ : num [1:10] 0.757 -1.128 -1.394 -0.712 0.494 ...
# $ : num [1:10] 1.135 0.324 0.75 -0.83 0.794 ...
# $ : num [1:10] -0.786 -0.068 -0.179 0.354 -0.597 ...
...
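If you wrap this in a function, return res explicitly at the end. When the last expression is an assignment, the function returns its value invisibly, so calling it prints nothing.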
segmentLists <- function(A, segmentSize)
{
res <- lapply(A, function(x) split(unlist(x), cut(seq_along(unlist(x)), segmentSize, labels = F)))
#print(res)
res <- unlist(res, recursive = F)
res <- unname(res)
res
}
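For Code 1 (a single list of 100 items cut after every 10th item), a shorter sketch is to group on the item position directly. Here a is built with as.list so that it really is a list, which is an assumption about the intended input.

```r
a <- as.list(rnorm(100))     # assumed input: a list of 100 items
n <- 10                      # items per chunk

# positions 1..10 get label 1, 11..20 get label 2, and so on
chunks <- split(a, ceiling(seq_along(a) / n))
lengths(chunks)
```

Here n is the chunk size rather than the number of chunks, so a last chunk would simply be shorter if the length were not a multiple of 10.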

apply create columns function to a list r

I am new to using apply and functions together, and I am stuck and frustrated. I have 2 different lists, and I need to add a certain number of columns to the data frames in the first one, with the number determined by the second one. Below is the structure of the first list, which has one data frame per station; every data frame has 2 or more columns, one per pressure:
> str(KDzlambdaEG)
List of 3
$ 176:'data.frame': 301 obs. of 3 variables:
..$ 0 : num [1:301] 0.186 0.182 0.18 0.181 0.177 ...
..$ 5 : num [1:301] 0.127 0.127 0.127 0.127 0.127 ...
..$ 20: num [1:301] 0.245 0.241 0.239 0.236 0.236 ...
$ 177:'data.frame': 301 obs. of 2 variables:
..$ 0 : num [1:301] 0.132 0.132 0.132 0.13 0.13 ...
..$ 25: num [1:301] 0.09 0.092 0.0902 0.0896 0.0896 ...
$ 199:'data.frame': 301 obs. of 2 variables:
..$ 0 : num [1:301] 0.181 0.182 0.181 0.182 0.179 ...
..$ 10: num [1:301] 0.186 0.186 0.185 0.183 0.184 ...
The second list holds the number of columns that I need to add after each column of the corresponding data frame in the first list:
> str(dif)
List of 3
[[176]]
[1] 4 15 28
[[177]]
[1] 24 67
[[199]]
[1] 9 53
I've tried tons of things, including this, using the append_col function that appears in:
How to add a new column between other dataframe columns?
for (i in 1:length(dif)){
A<-lapply(KDzlambdaEG,append_col,rep(list(NA),dif[[i]][1]),after=1)
}
but nothing seems to work so far. I have searched for answers here, but it's difficult to find specific ones as a newcomer.

Try:
indxlst <- lapply(dif, function(x) c(1, x[-length(x)]+1, x[length(x)]))
newdflist <- lapply(indxlst, function(x) data.frame(matrix(0, 2, sum(x))))
for(i in 1:length(newdflist)) {
newdflist[[i]][indxlst[[i]]] <- KDzlambdaEG[[i]]
}
Reproducible Data Test
df1 <- data.frame(x=1:2, y=c("Jan", "Feb"), z=c("A", "B"))
df3 <- df2 <- df1[,-3]
KDzlambdaEG <- list(df1,df2,df3)
x1 <- c(4,15,28)
x2 <- c(24,67)
x3 <- c(9, 53)
dif <- list(x1,x2,x3)
indxlst <- lapply(dif, function(x) c(1, x[-length(x)]+1, x[length(x)]))
newdflist <- lapply(indxlst, function(x) data.frame(matrix(0, 2, sum(x))))
for(i in 1:length(newdflist)) {
newdflist[[i]][indxlst[[i]]] <- KDzlambdaEG[[i]]
}
newdflist
