I would like to create several datasets with a for loop.
Basically, I want to create 29 datasets: the 1st should contain the 44th and 45th columns of DF, the 2nd the 46th and 47th columns, and so on.
I tried the following, without success:
data. <- data.frame(matrix( nrow=1442, ncol=2))
for (i in 1:29){
assign(paste("data",i, sep="_"), data.)
data_[i][,1] <- DF[,c(43+i)]
data_[i][,2] <- DF[,c(44+i)]
}
Can you help me please?
Like this?
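data <- list()
# 102 columns so that the last pair (columns 100 and 101) exists
DF <- data.frame(matrix(runif(10200), ncol = 102))
for (i in 1:29){
  # i = 1 -> columns 44:45, i = 2 -> columns 46:47, ...
  data[[i]] <- data.frame(DF[, c(42, 43) + 2 * i])
}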
str(data, list.len = 3)
One solution using purrr
DF <- data.frame(matrix(runif(10000),ncol=100))
library(purrr)
res <- 0:28 %>%
# create the indices to subset
map( ~ c(44, 45) + .x) %>%
# subset the df for each indice group
map( ~ DF[, .x])
length(res)
#> [1] 29
str(head(res))
#> List of 6
#> $ :'data.frame': 100 obs. of 2 variables:
#> ..$ X44: num [1:100] 0.477 0.0593 0.2616 0.7349 0.1202 ...
#> ..$ X45: num [1:100] 0.43 0.105 0.557 0.341 0.111 ...
#> $ :'data.frame': 100 obs. of 2 variables:
#> ..$ X45: num [1:100] 0.43 0.105 0.557 0.341 0.111 ...
#> ..$ X46: num [1:100] 0.78 0.877 0.518 0.162 0.565 ...
#> $ :'data.frame': 100 obs. of 2 variables:
#> ..$ X46: num [1:100] 0.78 0.877 0.518 0.162 0.565 ...
#> ..$ X47: num [1:100] 0.931 0.985 0.59 0.656 0.713 ...
#> $ :'data.frame': 100 obs. of 2 variables:
#> ..$ X47: num [1:100] 0.931 0.985 0.59 0.656 0.713 ...
#> ..$ X48: num [1:100] 0.82 0.899 0.359 0.809 0.329 ...
#> $ :'data.frame': 100 obs. of 2 variables:
#> ..$ X48: num [1:100] 0.82 0.899 0.359 0.809 0.329 ...
#> ..$ X49: num [1:100] 0.7982 0.0966 0.2716 0.3364 0.7295 ...
#> $ :'data.frame': 100 obs. of 2 variables:
#> ..$ X49: num [1:100] 0.7982 0.0966 0.2716 0.3364 0.7295 ...
#> ..$ X50: num [1:100] 0.83057 0.64207 0.94392 0.00904 0.26966 ...
Created on 2018-11-04 by the reprex package (v0.2.1)
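As the output shows, adding 0:28 to c(44, 45) yields overlapping pairs (44-45, 45-46, ...). If non-overlapping pairs are wanted, as the question describes, the start index can step by two instead. A minimal sketch in base R (lapply swapped in for purrr::map), assuming a toy DF with 102 columns so that column 101 exists:

```r
# Toy data (assumption): 100 rows, 102 columns named X1..X102
set.seed(1)
DF <- data.frame(matrix(runif(100 * 102), ncol = 102))

# First column of each pair: 44, 46, ..., 100
starts <- seq(44, by = 2, length.out = 29)

# One two-column data frame per start index
res <- lapply(starts, function(s) DF[, c(s, s + 1)])
names(res) <- paste0("data_", seq_along(res))

names(res[[1]])   # "X44" "X45"
names(res[[2]])   # "X46" "X47"
```

Naming the list elements gives the same labels the original loop tried to create, without separate variables.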
Give this a try.
n = 1000
k = 120
DF = matrix(runif(n*k), n, k)
for (i in 1:29){
tmp = DF[,c(43, 43) + c(2*i-1, 2*i)]
assign(paste0("data_", i), tmp)
}
ls()
all(data_1 == DF[,c(44, 45)])
all(data_2 == DF[,c(46, 47)])
Doing data_[i] will make R look for the object called data_, so you can't just subscript the object name like that.
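To illustrate the point: a variable whose name is built at run time has to be created with assign() and read back with get(), whereas a list can simply be indexed. A small sketch, assuming a toy matrix in place of the real DF:

```r
set.seed(1)
DF <- matrix(runif(10 * 50), nrow = 10)   # toy matrix (assumption)

# assign() creates a variable whose name is constructed at run time ...
for (i in 1:3) {
  assign(paste0("data_", i), DF[, c(2 * i - 1, 2 * i)])
}
# ... and get() is needed to look it up again by that constructed name
second <- get(paste0("data_", 2))

# A list avoids name construction: elements are addressed by index
data_list <- lapply(1:3, function(i) DF[, c(2 * i - 1, 2 * i)])
identical(second, data_list[[2]])
```

The list version is usually easier to loop over later, since data_list[[i]] works where data_[i] does not.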
I have a dataset and would like to take a lot of subsets based on various columns, values, and conditional operators. I think the most desirable output is a list containing all of these subsetted data frames as separate elements in the list. I attempted to do this by building a data frame that contains the subset conditions I would like to use, building a function, then using apply to feed that data frame to the function, but that didn't work. I'm sure there's probably a better method that uses an anonymous function or something like that, but I'm not sure how I would implement that. Below is an example code that should produce 8 subsets of data.
Original dataset, where x1 and x2 are scores on items that won't be used for subsetting, and RT and LS are the variables to subset on:
df <- data.frame(x1 = rnorm(100),
x2 = rnorm(100),
RT = abs(rnorm(100)),
LS = sample(1:10, 100, replace = T))
Data frame containing the conditions for subsetting. E.g., the first subset should be all observations with values greater than or equal to 0.5 in the RT column, the second subset all observations with values greater than or equal to 1 in the RT column, etc. There should be 8 subsets: 4 on the RT variable (using >=) and 4 on the LS variable (using <=).
subsetConditions <- data.frame(column = rep(c("RT", "LS"), each = 4),
operator = rep(c(">=", "<="), each = 4),
value = c(0.5, 1, 1.5, 2,
9, 8, 7, 6))
And this is the ugly function I wrote to attempt to do this:
subsetFun <- function(x){
subset(df, eval(parse(text = paste(x))))
}
subsets <- apply(subsetConditions, 1, subsetFun)
Thanks for any help!
Consider Map (wrapper to mapply) without any eval + parse. Since ==, <=, >=, and other operators can be used as functions with two arguments where 4 <= 5 can be written as `<=`(4,5) or "<="(4, 5), simply pass arguments elementwise and use get to reference the function by string:
sub_data <- function(col, op, val) {
df[get(op)(df[[col]], val),]
}
sub_dfs <- with(subsetConditions, Map(sub_data, column, operator, value))
Output
str(sub_dfs)
List of 8
$ RT:'data.frame': 62 obs. of 4 variables:
..$ x1: num [1:62] -1.12 -0.745 -1.377 0.848 1.63 ...
..$ x2: num [1:62] -0.257 -2.385 0.805 -0.313 0.662 ...
..$ RT: num [1:62] 0.693 1.662 0.731 2.145 0.543 ...
..$ LS: int [1:62] 5 5 1 2 9 1 5 9 3 10 ...
$ RT:'data.frame': 36 obs. of 4 variables:
..$ x1: num [1:36] -0.745 0.848 0.908 -0.761 0.74 ...
..$ x2: num [1:36] -2.3849 -0.3131 -2.4645 -0.0784 0.8512 ...
..$ RT: num [1:36] 1.66 2.15 1.74 1.65 1.13 ...
..$ LS: int [1:36] 5 2 1 5 9 10 2 7 1 3 ...
$ RT:'data.frame': 14 obs. of 4 variables:
..$ x1: num [1:14] -0.745 0.848 0.908 -0.761 -1.063 ...
..$ x2: num [1:14] -2.3849 -0.3131 -2.4645 -0.0784 -2.9886 ...
..$ RT: num [1:14] 1.66 2.15 1.74 1.65 2.63 ...
..$ LS: int [1:14] 5 2 1 5 5 6 9 4 8 4 ...
$ RT:'data.frame': 3 obs. of 4 variables:
..$ x1: num [1:3] 0.848 -1.063 0.197
..$ x2: num [1:3] -0.313 -2.989 0.709
..$ RT: num [1:3] 2.15 2.63 2.05
..$ LS: int [1:3] 2 5 6
$ LS:'data.frame': 92 obs. of 4 variables:
..$ x1: num [1:92] -1.12 -0.745 -1.377 0.848 0.612 ...
..$ x2: num [1:92] -0.257 -2.385 0.805 -0.313 0.958 ...
..$ RT: num [1:92] 0.693 1.662 0.731 2.145 0.489 ...
..$ LS: int [1:92] 5 5 1 2 1 9 1 5 9 3 ...
$ LS:'data.frame': 78 obs. of 4 variables:
..$ x1: num [1:78] -1.12 -0.745 -1.377 0.848 0.612 ...
..$ x2: num [1:78] -0.257 -2.385 0.805 -0.313 0.958 ...
..$ RT: num [1:78] 0.693 1.662 0.731 2.145 0.489 ...
..$ LS: int [1:78] 5 5 1 2 1 1 5 3 5 2 ...
$ LS:'data.frame': 75 obs. of 4 variables:
..$ x1: num [1:75] -1.12 -0.745 -1.377 0.848 0.612 ...
..$ x2: num [1:75] -0.257 -2.385 0.805 -0.313 0.958 ...
..$ RT: num [1:75] 0.693 1.662 0.731 2.145 0.489 ...
..$ LS: int [1:75] 5 5 1 2 1 1 5 3 5 2 ...
$ LS:'data.frame': 62 obs. of 4 variables:
..$ x1: num [1:62] -1.12 -0.745 -1.377 0.848 0.612 ...
..$ x2: num [1:62] -0.257 -2.385 0.805 -0.313 0.958 ...
..$ RT: num [1:62] 0.693 1.662 0.731 2.145 0.489 ...
..$ LS: int [1:62] 5 5 1 2 1 1 5 3 5 2 ...
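The key idea in this answer, that comparison operators are ordinary functions that can be looked up by name, can be checked in isolation. A short sketch using match.fun (an alternative to get that only returns functions); the data frame d is a made-up example:

```r
# Operators are ordinary functions and can be called in prefix form
`<=`(4, 5)                 # same as 4 <= 5

# Look up the function from its string name
ge <- match.fun(">=")
ge(3, 2)                   # same as 3 >= 2

# Hypothetical small example of filtering with a looked-up operator
d <- data.frame(RT = c(0.2, 0.7, 1.4), LS = c(3L, 9L, 6L))
keep <- match.fun(">=")(d[["RT"]], 0.5)
d[keep, ]                  # rows where RT >= 0.5
```

Because no string is parsed as code, this approach avoids the usual pitfalls of eval(parse(...)).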
You were actually pretty close with your function; it just needed one adjustment. apply passes each row as a character vector of three elements, so paste needs collapse = "" to join them into a single string. That string can then be parsed and evaluated as one expression.
subsetFun <- function(x){
subset(df, eval(parse(text = paste(x, collapse = ""))))
}
subsets <- apply(subsetConditions, 1, subsetFun)
Output
This returns the 8 subsets:
str(subsets)
List of 8
$ :'data.frame': 67 obs. of 4 variables:
..$ x1: num [1:67] -1.208 0.606 -0.17 0.728 -0.424 ...
..$ x2: num [1:67] 0.4058 -0.3041 -0.3357 0.7904 -0.0264 ...
..$ RT: num [1:67] 1.972 0.883 0.598 0.633 1.517 ...
..$ LS: int [1:67] 8 9 2 10 8 5 3 4 7 2 ...
$ :'data.frame': 35 obs. of 4 variables:
..$ x1: num [1:35] -1.2083 -0.4241 -0.0906 0.9851 -0.8236 ...
..$ x2: num [1:35] 0.4058 -0.0264 1.0054 0.0653 1.4647 ...
..$ RT: num [1:35] 1.97 1.52 1.05 1.63 1.47 ...
..$ LS: int [1:35] 8 8 5 4 7 3 1 6 8 6 ...
$ :'data.frame': 16 obs. of 4 variables:
..$ x1: num [1:16] -1.208 -0.424 0.985 0.99 0.939 ...
..$ x2: num [1:16] 0.4058 -0.0264 0.0653 0.3486 -0.7562 ...
..$ RT: num [1:16] 1.97 1.52 1.63 1.85 1.8 ...
..$ LS: int [1:16] 8 8 4 6 10 2 6 6 3 9 ...
$ :'data.frame': 7 obs. of 4 variables:
..$ x1: num [1:7] 0.963 0.423 -0.444 0.279 0.417 ...
..$ x2: num [1:7] 0.6612 0.0354 0.0555 0.1253 -0.3056 ...
..$ RT: num [1:7] 2.71 2.15 2.05 2.01 2.07 ...
..$ LS: int [1:7] 2 6 9 9 7 7 4
$ :'data.frame': 91 obs. of 4 variables:
..$ x1: num [1:91] -0.952 -1.208 0.606 -0.17 -0.048 ...
..$ x2: num [1:91] -0.645 0.406 -0.304 -0.336 -0.897 ...
..$ RT: num [1:91] 0.471 1.972 0.883 0.598 0.224 ...
..$ LS: int [1:91] 6 8 9 2 1 8 4 5 3 4 ...
$ :'data.frame': 75 obs. of 4 variables:
..$ x1: num [1:75] -0.952 -1.208 -0.17 -0.048 -0.424 ...
..$ x2: num [1:75] -0.6448 0.4058 -0.3357 -0.8968 -0.0264 ...
..$ RT: num [1:75] 0.471 1.972 0.598 0.224 1.517 ...
..$ LS: int [1:75] 6 8 2 1 8 4 5 3 4 1 ...
$ :'data.frame': 65 obs. of 4 variables:
..$ x1: num [1:65] -0.9517 -0.1698 -0.048 0.2834 -0.0906 ...
..$ x2: num [1:65] -0.645 -0.336 -0.897 -2.072 1.005 ...
..$ RT: num [1:65] 0.471 0.598 0.224 0.486 1.053 ...
..$ LS: int [1:65] 6 2 1 4 5 3 4 1 7 4 ...
$ :'data.frame': 58 obs. of 4 variables:
..$ x1: num [1:58] -0.9517 -0.1698 -0.048 0.2834 -0.0906 ...
..$ x2: num [1:58] -0.645 -0.336 -0.897 -2.072 1.005 ...
..$ RT: num [1:58] 0.471 0.598 0.224 0.486 1.053 ...
..$ LS: int [1:58] 6 2 1 4 5 3 4 1 4 2 ...
I'm trying to run psych::alpha on a grouped dataset. group_map works, but as expected the resulting list doesn't name the groups; it just indexes the countries ([[1]] etc.), which is not useful to me, so it is not a viable alternative.
The examples on the reference website imply that group_map and group_modify take the same arguments, but using group_modify gives me this error:
Number of categories should be increased in order to count frequencies.
Error: The result of .f should be a data frame.
Backtrace:
1. `%>%`(...)
3. dplyr:::group_modify.grouped_df(., ~psych::alpha(.x, check.keys = TRUE))
5. dplyr:::group_map.data.frame(.data, fun, .keep = .keep)
6. dplyr:::map2(chunks, group_keys, .f, ...)
7. base::mapply(.f, .x, .y, MoreArgs = list(...), SIMPLIFY = FALSE)
>
This happens both with my own dataset, where I run:
df%>% select(groupVar, vars1:var4)%>% group_by(groupVar)%>%
group_modify(~ psych::alpha(.x, check.keys = TRUE))
and with the example code adapted from the tidyverse website, replacing head() with psych::alpha:
iris %>% group_by(Species) %>%
group_modify(~ psych::alpha(.x, check.keys = TRUE))
The issue is that group_modify expects a data.frame as output. According to ?group_modify
group_modify() is good for "data frame in, data frame out".
We could use group_map as the output of alpha is a list
group_map() returns a list of results from calling .f on each group.
library(dplyr)
out <- iris %>%
group_by(Species) %>%
group_map(~ psych::alpha(.x, check.keys = TRUE))
Check the structure of the first element of the list returned by group_map:
str(out[[1]])
List of 14
$ total :'data.frame': 1 obs. of 9 variables:
..$ raw_alpha: num 0.663
..$ std.alpha: num 0.672
..$ G6(smc) : num 0.68
..$ average_r: num 0.338
..$ S/N : num 2.05
..$ ase : num 0.0535
..$ mean : num 2.54
..$ sd : num 0.196
..$ median_r : num 0.273
$ alpha.drop :'data.frame': 4 obs. of 8 variables:
..$ raw_alpha: num [1:4] 0.34 0.425 0.69 0.691
..$ std.alpha: num [1:4] 0.496 0.553 0.683 0.663
..$ G6(smc) : num [1:4] 0.404 0.454 0.672 0.664
..$ average_r: num [1:4] 0.247 0.292 0.418 0.396
..$ S/N : num [1:4] 0.986 1.239 2.153 1.965
..$ alpha se : num [1:4] 0.1243 0.1086 0.0426 0.0562
..$ var.r : num [1:4] 0.00608 0.00119 0.07961 0.09217
..$ med.r : num [1:4] 0.233 0.278 0.278 0.267
$ item.stats :'data.frame': 4 obs. of 7 variables:
..$ n : num [1:4] 50 50 50 50
..$ raw.r : num [1:4] 0.905 0.888 0.472 0.445
..$ std.r : num [1:4] 0.806 0.758 0.626 0.649
..$ r.cor : num [1:4] 0.796 0.729 0.393 0.424
..$ r.drop: num [1:4] 0.73 0.66 0.273 0.328
..$ mean : num [1:4] 5.006 3.428 1.462 0.246
..$ sd : num [1:4] 0.352 0.379 0.174 0.105
$ response.freq: NULL
$ keys : Named num [1:4] 1 1 1 1
..- attr(*, "names")= chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
$ scores : num [1:50] 2.55 2.38 2.35 2.35 2.55 ...
$ nvar : int 4
$ boot.ci : NULL
$ boot : NULL
$ Unidim :List of 1
..$ Unidim: num 0.625
$ var.r : num 0.0418
$ Fit :List of 1
..$ Fit.off: num 0.926
$ call : language psych::alpha(x = .x, check.keys = TRUE)
$ title : NULL
- attr(*, "class")= chr [1:2] "psych" "alpha"
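If the list returned by group_map should carry the group labels rather than [[1]], [[2]], ..., one option is to set its names from group_keys(). A minimal sketch, using colMeans as a stand-in for psych::alpha so that it runs without the psych package:

```r
library(dplyr)

grouped <- iris %>% group_by(Species)

# colMeans is only a placeholder here; any function of .x works the same way
out <- grouped %>% group_map(~ colMeans(.x))

# group_keys() returns one row per group, in the same order as the list
names(out) <- as.character(group_keys(grouped)$Species)

names(out)
```

The elements can then be addressed by group, e.g. out$setosa or out[["versicolor"]].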
Update
There is a summary method for alpha which can return a data.frame
out <- iris %>%
group_by(Species) %>%
group_modify(~ psych::alpha(.x, check.keys = TRUE) %>%
summary)
-output
out
# A tibble: 3 x 10
# Groups: Species [3]
Species raw_alpha std.alpha `G6(smc)` average_r `S/N` ase mean sd median_r
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa 0.663 0.672 0.680 0.338 2.05 0.0535 2.54 0.196 0.273
2 versicolor 0.833 0.877 0.877 0.640 7.10 0.0308 3.57 0.323 0.612
3 virginica 0.774 0.785 0.818 0.477 3.65 0.0418 4.28 0.364 0.429
I have several lists (ListA, ListB, ListC...) with the same internal structure as the example below. I would like to combine all of them, keeping their structure, and have one list with all lists (ListAll). How can I do this?
Example:
I have:
ListA
$ data :'data.frame': 1 obs. of 2 variables:
..$ mean: num -0.128
..$ sd : num 1.11
$ simulations :'data.frame': 1000 obs. of 2 variables:
..$ mean: num [1:1000] -0.0116 -0.0156 0.0336 -0.0502 -0.0427 ...
..$ sd : num [1:1000] 1.003 1.014 0.963 1.036 1.051 ...
$ values:'data.frame': 35 obs. of 2 variables:
..$ C: num [1:35] 3.45 2.91 2.62 2.06 1.87 ...
..$ D: num [1:35] 5.42 2.89 3.34 1.68 1.43 ...
and several lists with the same structure.
I would like to get:
ListAll
$ ListA
$ data :'data.frame': 1 obs. of 2 variables:
..$ mean: num -0.128
..$ sd : num 1.11
$ simulations :'data.frame': 1000 obs. of 2 variables:
..$ mean: num [1:1000] -0.0116 -0.0156 0.0336 -0.0502 -0.0427 ...
..$ sd : num [1:1000] 1.003 1.014 0.963 1.036 1.051 ...
$ values:'data.frame': 35 obs. of 2 variables:
..$ C: num [1:35] 3.45 2.91 2.62 2.06 1.87 ...
..$ D: num [1:35] 5.42 2.89 3.34 1.68 1.43 ...
$ ListB
$ data :'data.frame': 1 obs. of 2 variables:
..$ mean: num -0.132
..$ sd : num 1.01
$ simulations :'data.frame': 1000 obs. of 2 variables:
..$ mean: num [1:1000] -0.0114 -0.0123 0.0378 -0.0102 -0.0340 ...
..$ sd : num [1:1000] 1.013 1.011 0.876 1.012 1.023 ...
$ values:'data.frame': 35 obs. of 2 variables:
..$ C: num [1:35] 4.41 1.61 1.42 1.96 2.07 ...
..$ D: num [1:35] 2.41 2.19 2.54 2.08 2.53 ...
and names(ListAll) would be:
ListA, ListB, ListC...
You can create a list of lists in base R. Naming the arguments keeps the object names as element names:
ListAll <- list(ListA = ListA, ListB = ListB, ListC = ListC)
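When there are many such lists, typing each name can be avoided with mget(), which looks objects up by name and returns them as a named list. A small sketch; the two lists below are simplified stand-ins for the real ones:

```r
# Hypothetical stand-ins for the real lists (same idea, toy values)
ListA <- list(data = data.frame(mean = -0.128, sd = 1.11))
ListB <- list(data = data.frame(mean = -0.132, sd = 1.01))

# mget() returns the named objects as a list, using the names as element names
ListAll <- mget(c("ListA", "ListB"))
names(ListAll)

# Or collect every object in the workspace whose name matches a pattern
ListAll2 <- mget(ls(pattern = "^List[A-Z]$"))
```

The pattern in the last line is an assumption about how the lists are named; it should be adjusted so that it does not pick up unrelated objects.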
I have a matrix "Mat.return" with 390 rows and 2749 columns and I want to create 2499 sub-matrices from it, each with 250 columns and 80 rows.
The first sub-matrix would be:
B1=(Mat.return)[sample(nrow((Mat.return)),size=80,replace=TRUE),][,c(1:250)]
The second one, would start from the second column of "Mat.return" and would select 250 following columns. It would thus be:
B2=(Mat.return)[sample(nrow((Mat.return)),size=80,replace=TRUE),][,c(2:251)]
The third one would start from the third column and would select the 250 following column, and so on [until matrix n°2499]
Is there a function or a code that could do this, instead of computing it manually?
Thank you!
Just wrap your expression in a loop (here lapply) from 1 to 2499. This code gives you a list of 2499 matrices, each with 80 rows and 250 columns:
Mat.return <- matrix(rnorm(390*2749), nrow = 390, ncol = 2749)
lmat <- lapply(1:2499, function(i){
  # resample 80 rows with replacement, then keep columns i to i + 249 (250 columns)
  Mat.return[sample(nrow(Mat.return), size = 80, replace = TRUE), i:(i + 249)]
})
str(lmat, list.len = 10)
#> List of 2499
#> $ : num [1:80, 1:250] 0.493 -0.295 2.299 -1.427 -0.174 ...
#> $ : num [1:80, 1:250] -0.4 -1.632 1.21 0.529 -1.045 ...
#> $ : num [1:80, 1:250] -1.71 -1.458 0.186 0.808 -1.179 ...
#> $ : num [1:80, 1:250] 0.237 -0.952 -0.632 -0.204 -1.702 ...
#> $ : num [1:80, 1:250] -1.828 -0.895 -1.31 1.009 -0.451 ...
#> $ : num [1:80, 1:250] 0.128 0.461 -0.393 0.358 1.549 ...
#> $ : num [1:80, 1:250] -0.44814 0.52248 0.28651 0.39365 -0.00774 ...
#> $ : num [1:80, 1:250] 0.136 0.615 -0.435 -0.846 0.788 ...
#> $ : num [1:80, 1:250] 0.761 0.11 -1.486 -0.488 0.118 ...
#> $ : num [1:80, 1:250] -0.9064 -1.3382 -0.9678 0.0654 -0.5952 ...
#> [list output truncated]
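The same rolling-window idea can be wrapped in a small helper so that the window width and number of sampled rows are explicit. A sketch on a deliberately small matrix; the function name window_sample and its parameters are made up for illustration:

```r
set.seed(42)
# Small hypothetical matrix standing in for Mat.return
M <- matrix(rnorm(20 * 12), nrow = 20, ncol = 12)

# Resample n_rows rows with replacement, then keep `width` consecutive
# columns starting at column `start`
window_sample <- function(m, start, width, n_rows) {
  rows <- sample(nrow(m), size = n_rows, replace = TRUE)
  m[rows, start:(start + width - 1), drop = FALSE]
}

width <- 5
n_windows <- ncol(M) - width + 1   # number of windows that fit in the matrix
subs <- lapply(seq_len(n_windows), window_sample,
               m = M, width = width, n_rows = 8)

length(subs)      # one sub-matrix per window
dim(subs[[1]])    # n_rows x width
```

Computing the number of windows from ncol(M) guards against running past the last column when the width changes.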
I want to add a column to each of the data frames in my list table after running this code:
#list of my dataframes
df <- list(df1,df2,df3,df4)
#compute stats
stats <- function(d) do.call(rbind, lapply(split(d, d[,2]), function(x) data.frame(Nb= length(x$Year), Mean=mean(x$A), SD=sd(x$A) )))
#Apply to list of dataframes
table <- lapply(df, stats)
This column, which I'll call Source for example, should contain the name of the originating data frame, alongside the Nb, Mean and SD variables. So the variable Source should contain df1, df1, df1... for table[[1]], and so on.
Is there any way I can add it in my code above?
Here's a different way of doing things:
First, let's start with some reproducible data:
set.seed(1)
n = 10
dat <- list(data.frame(a=rnorm(n), b=sample(1:3,n,TRUE)),
data.frame(a=rnorm(n), b=sample(1:3,n,TRUE)),
data.frame(a=rnorm(n), b=sample(1:3,n,TRUE)),
data.frame(a=rnorm(n), b=sample(1:3,n,TRUE)))
Then, you want a function that adds columns to a data.frame. The obvious candidate is within. The particular things you want to calculate are constant values for each observation within a particular category. To do that, use ave for each of the columns you want to add. Here's your new function:
stat <- function(d){
within(d, {
Nb = ave(a, b, FUN=length)
Mean = ave(a, b, FUN=mean)
SD = ave(a, b, FUN=sd)
})
}
Then just lapply it to your list of data.frames:
lapply(dat, stat)
As you can see, columns are added as appropriate:
> str(lapply(dat, stat))
List of 4
$ :'data.frame': 10 obs. of 5 variables:
..$ a : num [1:10] -0.626 0.184 -0.836 1.595 0.33 ...
..$ b : int [1:10] 3 1 2 1 1 2 1 2 3 2
..$ SD : num [1:10] 0.85 0.643 0.738 0.643 0.643 ...
..$ Mean: num [1:10] -0.0253 0.649 -0.3058 0.649 0.649 ...
..$ Nb : num [1:10] 2 4 4 4 4 4 4 4 2 4
$ :'data.frame': 10 obs. of 5 variables:
..$ a : num [1:10] -0.0449 -0.0162 0.9438 0.8212 0.5939 ...
..$ b : int [1:10] 2 3 2 1 1 1 1 2 2 2
..$ SD : num [1:10] 1.141 NA 1.141 0.136 0.136 ...
..$ Mean: num [1:10] -0.0792 -0.0162 -0.0792 0.7791 0.7791 ...
..$ Nb : num [1:10] 5 1 5 4 4 4 4 5 5 5
$ :'data.frame': 10 obs. of 5 variables:
..$ a : num [1:10] 1.3587 -0.1028 0.3877 -0.0538 -1.3771 ...
..$ b : int [1:10] 2 3 2 1 3 1 3 1 1 1
..$ SD : num [1:10] 0.687 0.668 0.687 0.635 0.668 ...
..$ Mean: num [1:10] 0.873 -0.625 0.873 0.267 -0.625 ...
..$ Nb : num [1:10] 2 3 2 5 3 5 3 5 5 5
$ :'data.frame': 10 obs. of 5 variables:
..$ a : num [1:10] -0.707 0.365 0.769 -0.112 0.881 ...
..$ b : int [1:10] 3 3 2 2 1 1 3 1 2 2
..$ SD : num [1:10] 0.593 0.593 1.111 1.111 0.297 ...
..$ Mean: num [1:10] -0.318 -0.318 0.24 0.24 0.54 ...
..$ Nb : num [1:10] 3 3 4 4 3 3 3 3 4 4
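The Source column asked for in the question is not covered above. One way to get it is to give the list names and iterate over the data frames and their names in parallel with Map(). A minimal sketch, assuming a named list and columns called a and b as in the reproducible data:

```r
set.seed(1)
# Hypothetical named list; the names play the role of df1, df2, ...
dfs <- list(df1 = data.frame(a = rnorm(6), b = rep(1:2, 3)),
            df2 = data.frame(a = rnorm(6), b = rep(1:2, 3)))

# Per-group summary for one data frame, in the style of the question
stats <- function(d) {
  do.call(rbind, lapply(split(d, d$b), function(x) {
    data.frame(Nb = nrow(x), Mean = mean(x$a), SD = sd(x$a))
  }))
}

# Map() passes each data frame together with its name
tab <- Map(function(d, nm) cbind(Source = nm, stats(d)), dfs, names(dfs))
tab$df1
```

Because the list is named, the result is named too, and each summary table carries its origin in the Source column.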