I am looking for the right R idiom to run a function over a set of parameters and create a long data frame from the results. Imagine that you have the following toy function:
fun <- function(sd, mean, foobar = "foobar") {
list(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
Now you want to run fun over different values of sd and mean:
par_sd <- rep(1:5, 3)
par_mean <- rep(0:2, each = 5)
pars <- data.frame(sd = par_sd, mean = par_mean)
I want to run fun for the parameters in each row of pars, and collect the results in a data frame with columns sd, mean, pos, value. Here is a rather clumsy solution:
set.seed(42)
## Run fun
res <- lapply(seq_len(nrow(pars)), function(x) {
do.call(fun, as.list(pars[x, ]))
})
## Select the result we need
res <- lapply(res, "[[", "random")
## Make it a single data frame
res <- do.call(rbind, res)
## Together with the parameters
res <- as.data.frame(cbind(sd = par_sd, mean = par_mean, res))
colnames(res) <- c("sd", "mean", 1:10)
## Make it a long data frame
res <- reshape2::melt(res, id.vars=c("sd", "mean"),
variable.name = "pos", value.name="value")
## Done
res[1:5,]
#> sd mean pos value
#> 1 1 0 1 2.37095845
#> 2 2 0 1 3.60973931
#> 3 3 0 1 0.08008422
#> 4 4 0 1 2.82180049
#> 5 5 0 1 2.02999300
Is there a simpler way to do this? Does anyone know a package that does things like this? My quick search did not turn up any good results.
If you're willing to amend fun() to return a data.frame, I find the most elegant solution is plyr's mdply.
fun <- function(sd, mean, foobar = "foobar") {
data.frame(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
par_sd <- rep(1:5, 3)
par_mean <- rep(0:2, each = 5)
pars <- data.frame(sd = par_sd, mean = par_mean)
library(plyr)  # provides mdply()
results <- mdply(pars, fun, foobar = "stuff")
str(results)
mapply would seem a good fit:
> str(with(pars, mapply(fun, sd=sd, mean=mean) ) )
List of 30
$ : num [1:10] 3.16 2.28 2.84 1.49 3.43 ...
$ : chr "foobar"
$ : num [1:10] 3.429 0.157 0.583 1.542 6.485 ...
$ : chr "foobar"
$ : num [1:10] -4.56 -1.51 -1.33 7.16 3.21 ...
$ : chr "foobar"
$ : num [1:10] -2.275 2.225 4.196 0.962 15.739 ...
$ : chr "foobar"
$ : num [1:10] 6.23 10.08 2.85 6.81 4.51 ...
$ : chr "foobar"
$ : num [1:10] 1.65 3.15 5.62 5.91 6.14 ...
$ : chr "foobar"
$ : num [1:10] 4.26 1.95 7.33 2.72 6.29 ...
$ : chr "foobar"
$ : num [1:10] 7.53 6.74 3.6 6.43 3.08 ...
$ : chr "foobar"
$ : num [1:10] -0.4181 -0.0584 5.5812 1.038 8.2482 ...
$ : chr "foobar"
$ : num [1:10] 0.2377 4.8557 5.2177 -0.0706 2.0434 ...
$ : chr "foobar"
$ : num [1:10] 2.95 4.3 5.26 8.58 5.81 ...
$ : chr "foobar"
$ : num [1:10] -0.85 4.83 8.19 5.17 6.58 ...
$ : chr "foobar"
$ : num [1:10] 3.59 11.46 6.29 6.57 2.97 ...
$ : chr "foobar"
$ : num [1:10] 0.117 3.142 10.473 10.196 5.56 ...
$ : chr "foobar"
$ : num [1:10] 13.03 2.64 -1.07 5.29 1.97 ...
$ : chr "foobar"
- attr(*, "dim")= int [1:2] 2 15
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "random" "foobar"
..$ : NULL
By default mapply will attempt to simplify its result. If you want to keep the results as separate objects, turn that default off with SIMPLIFY=FALSE:
> str(with(pars, mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) )
List of 15
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] 1.08 0.68 3.16 3.38 5.96 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] 0.0927 5.1506 -1.0109 2.7136 2.1263 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
$ :'data.frame': 10 obs. of 2 variables:
..$ random: num [1:10] -0.331 2.9 -1.705 5.471 4.712 ...
..$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1
snipped
And if you need them in one stacked dataframe, it's just:
> str(do.call( rbind, with(pars, mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) ))
'data.frame': 150 obs. of 2 variables:
$ random: num 1 3.34 2.5 4.72 4.25 ...
$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1 ...
If you want these "labeled" with the sd and mean values, just make this modification to the constructor function:
fun <- function(sd, mean, foobar = "foobar") {
data.frame(random = rnorm(10) * sd + mean + 1:10,
sd=sd, mean=mean, foobar = foobar)
}
str(do.call( rbind, with(pars,
mapply(fun, sd=sd, mean=mean, SIMPLIFY=FALSE) ) ))
#---------------
'data.frame': 150 obs. of 4 variables:
$ random: num 1.42 1.13 3.73 4.5 5.63 ...
$ sd : int 1 1 1 1 1 1 1 1 1 1 ...
$ mean : int 0 0 0 0 0 0 0 0 0 0 ...
$ foobar: Factor w/ 1 level "foobar": 1 1 1 1 1 1 1 1 1 1 ...
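To get the exact sd, mean, pos, value layout asked for in the question, the same idea can be sketched in base R without changing fun(). This is only a sketch: the names pieces and long are made up for illustration, and the rows come out grouped by parameter row rather than by pos as they do with melt().

```r
# Original toy function from the question (returns a list, not a data frame)
fun <- function(sd, mean, foobar = "foobar") {
  list(random = rnorm(10) * sd + mean + 1:10, foobar = foobar)
}
pars <- data.frame(sd = rep(1:5, 3), mean = rep(0:2, each = 5))

set.seed(42)
# One small long-format data frame per parameter row
pieces <- Map(function(sd, mean) {
  r <- fun(sd, mean)$random
  data.frame(sd = sd, mean = mean, pos = seq_along(r), value = r)
}, pars$sd, pars$mean)

# Stack the pieces into a single long data frame
long <- do.call(rbind, pieces)
head(long)
```

Building each piece already in long form avoids the wide intermediate table and the reshape step.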
I have multiple lists that I want to combine, but I get the wrong results.
The code I used:
hiv.Scatter <- list(predictions = predictdata, labels = L)
for (k in 1:2){
hiv.Scatter <-
list(predictions = append(
list(hiv.Scatter$predictions),
list(predictdata)
),
labels = append(list(hiv.Scatter$labels), list(L)))
}
But with the code above I got very strange results.
The result I expected is:
> str(hiv.Scatter)
List of 2
$ predictions:List of 3
..$ : num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
..$ : num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
..$ : num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
$ labels :List of 3
..$ : num [1:6] 1 1 1 1 1 1
..$ : num [1:6] 1 1 1 1 1 1
..$ : num [1:6] 1 1 1 1 1 1
The data I used
> dput(L)
c(1, 1, 1, 1, 1, 1)
> dput(predictdata)
c(0.0287037037037037, 0.00648148148148148, 0.00925925925925926,
0.0435185185185185, 0.012962962962963, 0.00833333333333333)
Thanks for your help
See this,
hiv.Scatter <- list(predictions = list(predictions = predictdata),
labels = list(labels = L))
for (k in 1:2){
hiv.Scatter[[1]] <- append(hiv.Scatter[[1]],
list(predictions = predictdata))
hiv.Scatter[[2]] <- append(hiv.Scatter[[2]], list(labels = L))
}
OR, this
hiv.Scatter <- list(predictions = list(predictions = predictdata),
labels = list(labels = L))
for (k in 1:2){
hiv.Scatter$predictions <- append(hiv.Scatter$predictions,
list(predictions = predictdata))
hiv.Scatter$labels <- append(hiv.Scatter$labels, list(labels = L))
}
Which seems to give the desired output
str(hiv.Scatter)
# List of 2
# $ predictions:List of 3
# ..$ predictions: num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
# ..$ predictions: num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
# ..$ predictions: num [1:6] 0.0287 0.00648 0.00926 0.04352 0.01296 ...
# $ labels :List of 3
# ..$ labels: num [1:6] 1 1 1 1 1 1
# ..$ labels: num [1:6] 1 1 1 1 1 1
# ..$ labels: num [1:6] 1 1 1 1 1 1
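The original loop goes wrong because append(list(hiv.Scatter$predictions), ...) wraps the list built so far in a new list on every iteration, so the nesting gets one level deeper each time. If the same vectors are simply repeated, as in the example, a loop-free sketch with replicate() may be enough (n is a hypothetical name for the number of copies wanted):

```r
L <- c(1, 1, 1, 1, 1, 1)
predictdata <- c(0.0287037037037037, 0.00648148148148148, 0.00925925925925926,
                 0.0435185185185185, 0.012962962962963, 0.00833333333333333)

n <- 3  # number of copies wanted (assumed from the expected output)
# simplify = FALSE keeps each copy as its own list element
hiv.Scatter <- list(predictions = replicate(n, predictdata, simplify = FALSE),
                    labels      = replicate(n, L, simplify = FALSE))
str(hiv.Scatter)
```

When the vectors differ between iterations, the append() versions above are the better fit.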
I would like to plot a 3D surface graph like on the figure:
My attempt with the plotly package is below:
library(plotly)
packageVersion("plotly")
# [1] ‘4.5.2’
# random data
a <- 0; s <- c(1:16)
x <- seq(a-3*max(s), a+3*max(s), len=10)
f <- sapply(s, function(ss) dnorm(x, a, ss))
df0=data.frame(x=rep(x,length(s)),
y=rep(s,each=length(x)),
z=f,
col=rep(seq(1,31,2),each=length(x)))
df0 %>% group_by(y) %>%
plot_ly(x = ~x, y = ~y, z = ~f, type = 'scatter3d', mode = 'lines',
line = list(width = 6,color = ~col,colorscale = 'Viridis'))
I have the error message:
Error in function_list[[i]](value) : could not find function "group_by"
The group argument is deprecated and I have not had success with group_by.
Question: how should I rewrite the group_by call?
There is a problem in the construction of the dataset 'df0'. If we look at its structure,
str(df0)
#'data.frame': 160 obs. of 19 variables:
# $ x : num -48 -37.33 -26.67 -16 -5.33 ...
# $ y : int 1 1 1 1 1 1 1 1 1 1 ...
# $ z.1 : num 0.00 8.83e-304 1.53e-155 1.03e-56 2.66e-07 ...
# $ z.2 : num 1.67e-126 4.33e-77 4.97e-40 2.53e-15 5.70e-03 ...
# $ z.3 : num 3.42e-57 3.13e-35 9.26e-19 8.85e-08 2.74e-02 ...
# $ z.4 : num 5.37e-33 1.21e-20 2.23e-11 3.35e-05 4.10e-02 ...
# $ z.5 : num 7.76e-22 6.25e-14 5.31e-08 4.77e-04 4.52e-02 ...
# $ z.6 : num 8.42e-16 2.60e-10 3.42e-06 1.90e-03 4.48e-02 ...
# $ z.7 : num 3.51e-12 3.79e-08 4.02e-05 4.18e-03 4.26e-02 ...
# $ z.8 : num 7.59e-10 9.31e-07 1.93e-04 6.75e-03 3.99e-02 ...
# $ z.9 : num 2.95e-08 8.13e-06 5.50e-04 9.13e-03 3.72e-02 ...
# $ z.10: num 3.96e-07 3.75e-05 1.14e-03 1.11e-02 3.46e-02 ...
# $ z.11: num 2.66e-06 1.14e-04 1.92e-03 1.26e-02 3.22e-02 ...
# $ z.12: num 1.12e-05 2.63e-04 2.81e-03 1.37e-02 3.01e-02 ...
# $ z.13: num 3.36e-05 4.97e-04 3.74e-03 1.44e-02 2.82e-02 ...
# $ z.14: num 7.98e-05 8.14e-04 4.64e-03 1.48e-02 2.65e-02 ...
# $ z.15: num 0.000159 0.001201 0.005477 0.015058 0.024967 ...
# $ z.16: num 0.000277 0.001639 0.006217 0.015123 0.023586 ...
# $ col : num 1 1 1 1 1 1 1 1 1 1 ...
it will be evident: f is a matrix (sapply returns one column per value of s), so it has to be converted to a vector to create the 'z' column:
df0 <- data.frame(x=rep(x,length(s)),
y=rep(s,each=length(x)),
z=c(f), ######
col=rep(seq(1,31,2),each=length(x)))
str(df0)
#'data.frame': 160 obs. of 4 variables:
#$ x : num -48 -37.33 -26.67 -16 -5.33 ...
#$ y : int 1 1 1 1 1 1 1 1 1 1 ...
#$ z : num 0.00 8.83e-304 1.53e-155 1.03e-56 2.66e-07 ...
#$ col: num 1 1 1 1 1 1 1 1 1 1 ...
The other error mentioned concerns group_by. It is a dplyr function, so once we have loaded
library(dplyr)
that error message is gone as well.
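Putting both fixes together, a minimal sketch could look like the following. Two things here are assumptions rather than part of the answer above: the plot call maps z = ~z (the data frame column) instead of ~f, and the packages are called through their namespaces and only if they are installed.

```r
a <- 0
s <- 1:16
x <- seq(a - 3 * max(s), a + 3 * max(s), length.out = 10)
f <- sapply(s, function(ss) dnorm(x, a, ss))   # 10 x 16 matrix, one column per sd

df0 <- data.frame(x   = rep(x, length(s)),
                  y   = rep(s, each = length(x)),
                  z   = as.vector(f),           # column-major, matches rep(s, each = ...)
                  col = rep(seq(1, 31, 2), each = length(x)))

# Sanity check: the z values for y == 3 are exactly the third column of f
stopifnot(isTRUE(all.equal(df0$z[df0$y == 3], f[, 3])))

# Build the plot only when dplyr and plotly are available
if (requireNamespace("dplyr", quietly = TRUE) &&
    requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(dplyr::group_by(df0, y),
                       x = ~x, y = ~y, z = ~z,
                       type = "scatter3d", mode = "lines",
                       line = list(width = 6, color = ~col, colorscale = "Viridis"))
}
```

Flattening with as.vector(f) (or c(f)) works because R stores matrices column by column, which is the same order that rep(s, each = length(x)) produces.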
Here is my example data frame. It includes a column named "dta", a list column holding the n values I want to keep for each of my scenarios:
set.seed(777)
df <- data.frame(theo = numeric(),
size = numeric(),
dta = I(list()))
df[ 1: 5,"theo"] <- qlnorm(0.1, meanlog=0, sdlog=1, lower.tail = TRUE, log.p = FALSE)
df[ 6:10,"theo"] <- qlnorm(0.2, meanlog=0, sdlog=1, lower.tail = TRUE, log.p = FALSE)
df[ 1: 5,"size"] <- 10
df[ 6:10,"size"] <- 20
for(i in 1:10){
df$dta[i] <- list(rlnorm(df$size[i], meanlog = 0, sdlog = 1))
}
df
str(df)
This should give a df like:
theo size dta
1 0.2776062 10 1.631967....
2 0.2776062 10 0.737667....
3 0.2776062 10 0.131252....
4 0.2776062 10 1.937334....
5 0.2776062 10 0.739868....
6 0.4310112 20 4.631176....
7 0.4310112 20 2.610180....
8 0.4310112 20 0.175918....
9 0.4310112 20 3.501670....
10 0.4310112 20 0.588178....
or:
'data.frame': 10 obs. of 4 variables:
$ theo: num 0.278 0.278 0.278 0.278 0.278 ...
$ size: num 10 10 10 10 10 20 20 20 20 20
$ dta :List of 10
..$ : num 1.632 0.671 1.667 0.671 5.148 ...
..$ : num 0.738 1.056 0.152 0.967 10.089 ...
..$ : num 0.131 1.256 0.457 3.574 4.211 ...
..$ : num 1.937 2.359 3.496 0.297 4.587 ...
..$ : num 0.74 0.66 0.481 0.434 1.874 ...
..$ : num 4.631 0.298 10.28 0.933 1.286 ...
..$ : num 2.61 0.472 0.251 1.61 0.303 ...
..$ : num 0.176 0.566 2.156 0.407 3.52 ...
..$ : num 3.502 1.748 1.283 0.648 1.359 ...
..$ : num 0.588 0.392 2.447 1.926 0.86 ...
..- attr(*, "class")= chr "AsIs"
Now, I want to modify that list in such a way that:
for each list element, each value is compared with the fixed value "theo" stored in the same row of the data frame
when a value is below or equal to "theo", it is recoded as NA
Here is working code that gives me exactly what I want:
df$dta2 <- df$dta
for(i in 1:10){
df$dta2[[i]] [ df$dta2[[i]] <= df$theo[i] ] <- NA
}
However, I was wondering whether there is a way to get the same result with a single line of code and no for loop, that is, a conditional replacement of values in a list that is nested in a data frame.
We can use Map
df$dta3 <- Map(function(x,y) replace(x, x<=y, NA), df$dta, df$theo)
all.equal(df$dta2, df$dta3, check.attributes=FALSE)
#[1] TRUE
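Map() walks over the list column and the theo column in parallel, and replace() returns a modified copy of each vector. A tiny self-contained sketch with made-up values (toy is a hypothetical name, not part of the question) shows the effect:

```r
# Toy data frame with a list column; the values are invented for illustration
toy <- data.frame(theo = c(2, 5))
toy$dta <- I(list(c(1, 2, 3), c(4, 6)))

# Pair each list element with the threshold from the same row
toy$dta2 <- Map(function(x, y) replace(x, x <= y, NA), toy$dta, toy$theo)

toy$dta2[[1]]  # values <= 2 become NA
toy$dta2[[2]]  # values <= 5 become NA
```

The original dta column is left untouched, which makes it easy to compare before and after.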
Trying to use dplyr to group_by the stud_ID variable in the following data frame, as in this SO question:
> str(df)
'data.frame': 4136 obs. of 4 variables:
$ stud_ID : chr "ABB112292" "ABB112292" "ABB112292" "ABB112292" ...
$ behavioral_scale: num 3.5 4 3.5 3 3.5 2 NA NA 1 2 ...
$ cognitive_scale : num 3.5 3 3 3 3.5 2 NA NA 1 1 ...
$ affective_scale : num 2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...
I tried the following to obtain scale scores by student (rather than scale scores for observations across all students):
scaled_data <-
df %>%
group_by(stud_ID) %>%
mutate(behavioral_scale_ind = scale(behavioral_scale),
cognitive_scale_ind = scale(cognitive_scale),
affective_scale_ind = scale(affective_scale))
Here is the result:
> str(scaled_data)
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 4136 obs. of 7 variables:
$ stud_ID : chr "ABB112292" "ABB112292" "ABB112292" "ABB112292" ...
$ behavioral_scale : num 3.5 4 3.5 3 3.5 2 NA NA 1 2 ...
$ cognitive_scale : num 3.5 3 3 3 3.5 2 NA NA 1 1 ...
$ affective_scale : num 2.5 3.5 3 3 2.5 2 NA NA 1 1.5 ...
$ behavioral_scale_ind: num [1:12, 1] 0.64 1.174 0.64 0.107 0.64 ...
..- attr(*, "scaled:center")= num 2.9
..- attr(*, "scaled:scale")= num 0.937
$ cognitive_scale_ind : num [1:12, 1] 1.17 0.64 0.64 0.64 1.17 ...
..- attr(*, "scaled:center")= num 2.4
..- attr(*, "scaled:scale")= num 0.937
$ affective_scale_ind : num [1:12, 1] 0 1.28 0.64 0.64 0 ...
..- attr(*, "scaled:center")= num 2.5
..- attr(*, "scaled:scale")= num 0.782
The three scaled variables (behavioral_scale_ind, cognitive_scale_ind, and affective_scale_ind) have only 12 observations, which is the number of observations for the first student, ABB112292.
What's going on here? How can I obtain scaled scores by individual?
The problem seems to be in the base scale() function, which expects a matrix and returns a one-column matrix even when given a plain vector. Try writing your own.
scale_this <- function(x){
(x - mean(x, na.rm=TRUE)) / sd(x, na.rm=TRUE)
}
Then this works:
library("dplyr")
# reproducible sample data
set.seed(123)
n = 1000
df <- data.frame(stud_ID = sample(LETTERS, size=n, replace=TRUE),
behavioral_scale = runif(n, 0, 10),
cognitive_scale = runif(n, 1, 20),
affective_scale = runif(n, 0, 1) )
scaled_data <-
df %>%
group_by(stud_ID) %>%
mutate(behavioral_scale_ind = scale_this(behavioral_scale),
cognitive_scale_ind = scale_this(cognitive_scale),
affective_scale_ind = scale_this(affective_scale))
Or, if you're open to a data.table solution:
library("data.table")
setDT(df)
cols_to_scale <- c("behavioral_scale","cognitive_scale","affective_scale")
df[, lapply(.SD, scale_this), .SDcols = cols_to_scale, keyby = factor(stud_ID)]
This was a known problem in dplyr. A fix has been merged into the development version, which you can install via
# install.packages("devtools")
devtools::install_github("hadley/dplyr")
In the stable version, the following should work, too:
scale_this <- function(x) as.vector(scale(x))
df <- df %>% mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))
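To see why the wrapper helps, it may be useful to look at what scale() returns and then apply the wrapped version per group. The sketch below uses base R's ave() instead of dplyr and a made-up data frame (scores, id, score are hypothetical names), so it illustrates the idea rather than reproducing the answers above:

```r
# scale() returns a one-column matrix carrying "scaled:center"/"scaled:scale" attributes
m <- scale(c(1, 2, 3))
is.matrix(m)   # TRUE

# Dropping to a plain vector removes the matrix shape and the attributes
scale_this <- function(x) as.vector(scale(x))

# Invented example data: two students, three observations each
scores <- data.frame(id    = c("a", "a", "a", "b", "b", "b"),
                     score = c(1, 2, 3, 10, 20, 30))

# ave() applies FUN within each level of id and returns a vector aligned with the rows
scores$score_ind <- ave(scores$score, scores$id, FUN = scale_this)
scores
```

Both groups end up with the same standardized values because each is centered and scaled on its own mean and standard deviation.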
I have a function which contains a loop over two lists and builds up some calculated data. I would like to return these data as a list of lists, indexed by some value, but I'm getting the assignment wrong.
A minimal example of what I'm trying to do, and where I'm going wrong, would be:
mybiglist <- list()
for(i in 1:5){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
name <- paste('item:',i,sep='')
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[name]] <- append(mybiglist, tmp)
}
If you run this and look at the output mybiglist, you will see that something is going very wrong in the way each item is being named.
Any ideas on how I might achieve what I actually want?
Thanks
ps. I know that in R there is a sense in which one has failed if one has to resort to loops, but in this case I do feel justified ;-)
It works if you don't use the append command:
mybiglist <- list()
for(i in 1:5){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
name <- paste('item:',i,sep='')
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[name]] <- tmp
}
# List of 5
# $ item:1:List of 3
# ..$ uniform : num [1:10] 0.737 0.987 0.577 0.814 0.452 ...
# ..$ normal : num [1:16] -0.403 -0.104 2.147 0.32 1.713 ...
# ..$ binomial: num [1:8] 0 0 0 0 1 0 0 1
# $ item:2:List of 3
# ..$ uniform : num [1:10] 0.61 0.62 0.49 0.217 0.862 ...
# ..$ normal : num [1:16] 0.945 -0.154 -0.5 -0.729 -0.547 ...
# ..$ binomial: num [1:8] 1 2 2 0 2 1 0 2
# $ item:3:List of 3
# ..$ uniform : num [1:10] 0.66 0.094 0.432 0.634 0.949 ...
# ..$ normal : num [1:16] -0.607 0.274 -1.455 0.828 -0.73 ...
# ..$ binomial: num [1:8] 2 2 3 1 1 1 2 0
# $ item:4:List of 3
# ..$ uniform : num [1:10] 0.455 0.442 0.149 0.745 0.24 ...
# ..$ normal : num [1:16] 0.0994 -0.5332 -0.8131 -1.1847 -0.8032 ...
# ..$ binomial: num [1:8] 2 3 1 1 2 2 2 1
# $ item:5:List of 3
# ..$ uniform : num [1:10] 0.816 0.279 0.583 0.179 0.321 ...
# ..$ normal : num [1:16] -0.036 1.137 0.178 0.29 1.266 ...
# ..$ binomial: num [1:8] 3 4 3 4 4 2 2 3
Change
mybiglist[[name]] <- append(mybiglist, tmp)
to
mybiglist[[name]] <- tmp
To show that an explicit for loop is not required:
unif_norm <- replicate(5, list(uniform = runif(10),
normal = rnorm(16)), simplify = FALSE)
binomials <- lapply(seq_len(5)/10, function(prob) {
# n = 8 draws with size = 5, matching rbinom(8, 5, i/10) in the question
list(binomial = rbinom(n = 8, size = 5, prob = prob))})
biglist <- setNames(mapply(c, unif_norm, binomials, SIMPLIFY = FALSE),
paste0('item:', seq_along(unif_norm)))
In general, if you go down the for loop path, it is better to preallocate the list beforehand, so that it is not regrown on every iteration. This is more memory efficient.
mybiglist <- vector('list', 5)
names(mybiglist) <- paste0('item:', seq_along(mybiglist))
for(i in seq_along(mybiglist)){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[i]] <- tmp
}
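For comparison, the same list of lists can be sketched with lapply(), which allocates the result for you and keeps each item's construction in one place. The name ids is introduced here only for illustration:

```r
set.seed(1)
ids <- 1:5

# One inner list per id; lapply() returns them already collected in a list
mybiglist <- lapply(ids, function(i) {
  list(uniform  = runif(10),
       normal   = rnorm(16),
       binomial = rbinom(8, 5, i / 10))
})
names(mybiglist) <- paste0("item:", ids)

str(mybiglist, max.level = 1)
```

Because the names contain a colon, items are easiest to access with double brackets, for example mybiglist[["item:3"]]$normal.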