Using Reduce() to calculate percentiles or variance in R

In the same way that I calculate the average at each position across the parallel vectors combined in a list, I would like to compute percentiles (0.05 and 0.95), the variance, or the standard error.
LOC_GI_1950a <- rnorm(100,5,2)
LOC_GI_1951a <- rnorm(100,7,3)
LOC_GI_1952a <- rnorm(100,1,2)
LOC_GI_1953a <- rnorm(100,2,3)
LOC_GI_1954a <- rnorm(100,5,2)
LOC_GI_1955a <- rnorm(100,7,3)
LOC_GI_1956a <- rnorm(100,8,2)
LOC_GI_1957a <- rnorm(100,2,5)
LOC_GI_1958a <- rnorm(100,5,1)
LOC_GI_1959a <- rnorm(100,7,1)
LOC_GI_1960a <- rnorm(100,1,2)
LOC_GI_1961a <- rnorm(100,6,3)
LOC_GI_Annuala <- list(LOC_GI_1950a,LOC_GI_1951a,LOC_GI_1952a,LOC_GI_1953a,LOC_GI_1954a,
LOC_GI_1955a,LOC_GI_1956a,LOC_GI_1957a,LOC_GI_1958a,LOC_GI_1959a,
LOC_GI_1960a,LOC_GI_1961a)
LOC_GI_AnnualAvga <- Reduce("+",LOC_GI_Annuala)/length(LOC_GI_Annuala)

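We can convert the list to an array and then use apply to get the mean, variance, etc. of each corresponding element. Here v1 stands for the list from the question; each result is a 10 x 10 matrix holding the 100 positions.
v1 <- LOC_GI_Annuala
apply(array(unlist(v1), c(10, 10, 12)), c(1,2), mean)
apply(array(unlist(v1), c(10, 10, 12)), c(1,2), var)
As @RuiBarradas mentioned, quantile can be used with apply in the same way; wrapping the result in c() flattens it back to a vector of length 100
c(apply(array(unlist(v1), c(10, 10, 12)), c(1,2), quantile, probs = 0.95))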
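The same per-position statistics can also be sketched with a plain matrix instead of a 3-d array. This is only an illustration under assumptions: the toy list `yearly` and the names `m`, `pos_mean`, `pos_var`, `pos_se` and `pos_q` are made up here and stand in for the yearly vectors from the question.

```r
# Toy data: three parallel vectors of equal length (hypothetical stand-ins
# for the yearly vectors in the question)
set.seed(1)
yearly <- list(rnorm(100, 5, 2), rnorm(100, 7, 3), rnorm(100, 1, 2))

# One column per vector, one row per position
m <- do.call(cbind, yearly)

pos_mean <- rowMeans(m)              # same values as Reduce("+", yearly) / length(yearly)
pos_var  <- apply(m, 1, var)         # variance across the vectors at each position
pos_se   <- sqrt(pos_var / ncol(m))  # standard error of the mean at each position

# apply() returns one column per row of m, so transpose to get a 100 x 2 matrix
pos_q <- t(apply(m, 1, quantile, probs = c(0.05, 0.95)))
```

Binding the vectors as columns keeps each position on its own row, so any function that takes a numeric vector (var, sd, quantile) can be applied row-wise, which Reduce() with a binary operator cannot do directly.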

Related

Separating Parameters in Repeated Function Calls

With a vector of values, I want each value to be called on a function
values = 1:10
rnorm(100, mean=values, sd=1)
Passing mean = values recycles the sequence (1,2,3,4,5,6,7,8,9,10) across the 100 draws. How can I instead get a matrix in which each group of 100 observations is drawn using a single element from my vector?
ie:
rnorm(100, mean=1, sd=1)
rnorm(100, mean=2, sd=1)
rnorm(100, mean=3, sd=1)
rnorm(100, mean=4, sd=1)
# ...
It's not clear from your question, but I took it that you wanted a single matrix with 10 rows and 100 columns. That being the case you can do:
matrix(rnorm(1000, rep(1:10, each = 100)), nrow = 10, byrow = TRUE)
Or modify akrun's answer by using sapply instead of lapply
An option is lapply from base R
lapply(1:10, function(i) rnorm(100, mean = i, sd = 1))
Or Map from base R:
Map(function(i) rnorm(100, mean = i, sd = 1), 1:10)
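The sapply variant mentioned above might look like the following sketch. It assumes the goal is one column of 100 draws per mean; the names `values` and `m` are only illustrative.

```r
set.seed(42)
values <- 1:10

# sapply() simplifies the ten length-100 vectors into a 100 x 10 matrix,
# one column per element of 'values'
m <- sapply(values, function(mu) rnorm(100, mean = mu, sd = 1))

dim(m)           # 100 rows, 10 columns
round(colMeans(m), 1)  # each column mean should be close to its target mean

# Transpose if one row per mean (10 x 100) is preferred
m_rows <- t(m)
```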
With purrr's map functions, a function can be applied to each value of the vector values
library(purrr)
values = 1:10
map_dfc(
.x = values,
.f = ~rnorm(100,mean = .x,sd = 1)
)
This yields a 100 x 10 data frame, with one column per mean

Stacking lapply results

I am using the following code to generate data, and I am estimating regression models across a list of variables (covar1 and covar2). I have also created confidence intervals for the coefficients and merged them together.
I have been examining all sorts of examples here and on other sites, but I can't seem to accomplish what I want. I want to stack the results for each covar into a single data frame, labeling each cluster of results by the covar it is attributable to (i.e., "covar1" and "covar2"). Here is the code for generating data and results using lapply:
##creating a fake dataset (N=1000, 500 at treated, 500 at control group)
#outcome variable
outcome <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 70, sd = 10))
#running variable
running.var <- seq(0, 1, by = .0001)
running.var <- sample(running.var, size = 1000, replace = TRUE)
##Put negative values for the running variable in the control group
running.var[1:500] <- -running.var[1:500]
#treatment indicator (just a binary variable indicating treated and control groups)
treat.ind <- c(rep(0,500), rep(1,500))
#create covariates
set.seed(123)
covar1 <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 50, sd = 20))
covar2 <- c(rnorm(500, mean = 10, sd = 20), rnorm(500, mean = 10, sd = 30))
data <- data.frame(cbind(outcome, running.var, treat.ind, covar1, covar2))
data$treat.ind <- as.factor(data$treat.ind)
#Bundle the covariates names together
covars <- c("covar1", "covar2")
#loop over them using a convenient feature of the "as.formula" function
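models <- lapply(covars, function(x){
# fit <covariate> ~ running.var + treat.ind on the data frame built above
regres <- lm(as.formula(paste(x," ~ running.var + treat.ind",sep = "")), data = data)
# 95% confidence intervals for the coefficients
ci <-confint(regres, level=0.95)
# coefficient table with the CI columns appended (returned as the list element)
regres_ci <- cbind(summary(regres)$coefficient, ci)
})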
names(models) <- covars
print(models)
Any nudge in the right direction, or link to a post I just haven't come across, is greatly appreciated.
You can use do.call, where the second argument is a list:
do.call(rbind, models)
I made a (possible) improvement to your lapply function. This way you can save the estimated parameters and the variable names in a data.frame:
models <- lapply(covars, function(x){
regres <- lm(as.formula(paste(x," ~ running.var + treat.ind",sep = "")), data = data)
ci <-confint(regres, level=0.95)
regres_ci <- data.frame(covar=x,param=rownames(summary(regres)$coefficient),
summary(regres)$coefficient, ci)
})
do.call(rbind,models)
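The stacking pattern can be checked on a built-in data set. The sketch below is an assumption-laden illustration, not the poster's data: it uses mtcars, treats mpg and hp as the "covariates", and uses wt and factor(am) as stand-ins for the running variable and treatment indicator.

```r
# Hypothetical stand-ins: mpg and hp play the role of covar1 and covar2
covars <- c("mpg", "hp")

models <- lapply(covars, function(x) {
  fit   <- lm(as.formula(paste(x, "~ wt + factor(am)")), data = mtcars)
  coefs <- summary(fit)$coefficients
  ci    <- confint(fit, level = 0.95)
  # Label every row with its covariate and parameter name;
  # row.names = NULL avoids duplicated row names after stacking
  data.frame(covar = x, param = rownames(coefs), coefs, ci, row.names = NULL)
})

# One data frame: three coefficient rows per covariate
stacked <- do.call(rbind, models)
stacked
```

Carrying the label inside each data frame means the grouping survives rbind, whereas stacking bare matrices would leave only repeated row names such as "(Intercept)".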

Monte Carlo Simulations in list-columns in R and purrr

I have the following single case Monte Carlo simulation:
runs <- 100000
sim <- rnorm(n=runs,mean = 0,sd=1)
summary(sim)
But I would like to do the above with purrr in a list column. For example if I had the following data.
a <- tribble(
~group, ~mean, ~sd,~n,
1, 10, 5,1e5,
2, 20, 6,1e5,
3, 30, 7,1e5)
How can I produce another column holding, say, 1e5 rnorm draws per group? In the end I may want to make histograms or compute summary statistics from that list column.
I have tried the following.
a %>%
pmap_df(rnorm)
But got the following error, which I did not understand.
Error in .f(group = .l[[c(1L, i)]], mean = .l[[c(2L, i)]], sd = .l[[c(3L, : unused argument (group = .l[[c(1, i)]])
EDIT:
For some additional clarity: I have been successful at implementing what I want with lists, but not with data frames, as follows:
mu <- list(10, 20, 30)
sigma <- list(5, 6, 7)
n <- list(1e5, 1e5, 1e5)
args2 <- list(mean = mu, sd = sigma, n = n)
args2 %>%
pmap(rnorm) %>% map(quantile)
where quantile is just an example of a function; I might choose mean or sd at a later date.
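The error message suggests that every column, including group, is passed to rnorm as a named argument. One way around this, sketched here in base R so it runs without extra packages, is to pass only n, mean and sd and store the draws in a list column. The smaller n and the column names `sim` and `sim_mean` are assumptions made for illustration; with purrr, the analogous idea would presumably be to keep only the n, mean and sd columns before calling pmap.

```r
# Same layout as the tribble in the question, with a smaller n for speed
a <- data.frame(group = 1:3,
                mean  = c(10, 20, 30),
                sd    = c(5, 6, 7),
                n     = rep(1e3, 3))

set.seed(123)
# Map() calls rnorm(n = , mean = , sd = ) once per row; 'group' is never passed
a$sim <- Map(rnorm, n = a$n, mean = a$mean, sd = a$sd)

# Summaries computed from the list column
a$sim_mean <- sapply(a$sim, mean)
quantiles  <- lapply(a$sim, quantile)
```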

Using for loop to perform actions separately for different categories in R

What is the best way to calculate values such as mean and standard deviation for each column in a data frame?
For example, if I have a data frame:
s <- data.frame(
sample = c("s_1", "s_2", "s_3", "s_4", "s_5", "s_6", "s_7", "s_8"),
flavor = c("original", "chicken", "original", "original", "cheese", "chicken", "cheese", "original"),
age = c(23, 25, 11, 5, 6, 44, 50, 2),
scale = c( 4, 3, 2, 5, 4, 3, 1, 5))
How do I use a for loop to find the mean and sd values of only one of the columns (for example, age) based on another column (for example, flavor)?
I've got the code for finding the mean and standard deviations individually but was wondering if there was a way to use loops instead.
print(paste("mean =",
mean(s[s$flavor == "original", "age"]),
"sd =",
sd(s[s$flavor == "original", "age"])))
If we need a for loop, then loop through the unique elements of 'flavor', subset 'age' based on the values of 'flavor', and get the mean and sd for each category, collected in a vector 'v1'
v1 <- c()
for(un1 in unique(s$flavor)){
tmp <- s$age[s$flavor == un1]
v1 <- c(v1, paste("mean =", mean(tmp), "sd =", sd(tmp)))
}
v1
#[1] "mean = 10.25 sd = 9.28708781050335" "mean = 34.5 sd = 13.4350288425444"
#[3] "mean = 28 sd = 31.1126983722081"
Instead of growing the vector from an empty one (v1 <- c()), we can also pre-allocate a character vector with one slot per unique element of 'flavor' (should be more efficient than the above) and fill it by index inside the loop
nm1 <- unique(s$flavor)
v1 <- character(length(nm1))
for(i in seq_along(nm1)){
tmp <- s$age[s$flavor == nm1[i] ]
v1[i] <- paste("mean =", mean(tmp), "sd =", sd(tmp))
}
But this can be done as a group by operation with base R
do.call(data.frame, aggregate(age~flavor, s, FUN = function(x) c(Mean = mean(x), SD= sd(x))))
Or a more efficient approach with data.table
library(data.table)
setDT(s)[, paste("mean =", mean(age), "sd =", sd(age)), flavor]$V1
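For comparison, the group-wise summary can also be sketched with split and sapply in base R. The data below are the flavor and age columns copied from the question; the names `df` and `stats` are only illustrative.

```r
df <- data.frame(
  flavor = c("original", "chicken", "original", "original",
             "cheese", "chicken", "cheese", "original"),
  age    = c(23, 25, 11, 5, 6, 44, 50, 2))

# split() groups 'age' by 'flavor'; sapply() returns a matrix with
# one column per flavor and one row per statistic
stats <- sapply(split(df$age, df$flavor),
                function(x) c(mean = mean(x), sd = sd(x)))
stats
```

Note that split orders the groups alphabetically (cheese, chicken, original), unlike unique(), which keeps the order of first appearance.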
Since an explicit loop is rarely the most convenient way to compute grouped summaries, you may use dplyr as Patronus suggested, or use plyr as follows:
library(plyr)
s.summary <- ddply(s, c("flavor"), summarise,
N= length(age),
mean= round(mean(age),2),
sd= round(sd(age),2),
se = round(sd/sqrt(N),2)
)
s.summary

Quantiles by factor levels in R

I have a data frame and I'm trying to create a new variable in the data frame that has the quantiles of a continuous variable var1, for each level of a factor strata.
# some data
set.seed(472)
dat <- data.frame(var1 = rnorm(50, 10, 3)^2,
strata = factor(sample(LETTERS[1:5], size = 50, replace = TRUE))
)
# function to get quantiles
qfun <- function(x, q = 5) {
quantile <- cut(x, breaks = quantile(x, probs = 0:q/q),
include.lowest = TRUE, labels = 1:q)
quantile
}
I tried using two methods, neither of which produce a usable result. Firstly, I tried using aggregate to apply qfun to each level of strata:
qdat <- with(dat, aggregate(var1, list(strata), FUN = qfun))
This returns the quantiles by factor level, but the output is hard to coerce back into a data frame (e.g., using unlist does not line the new variable values up with the correct rows in the data frame).
A second approach was to do this in steps:
tmp1 <- with(dat, split(var1, strata))
tmp2 <- lapply(tmp1, qfun)
tmp3 <- unlist(tmp2)
dat$quintiles <- tmp3
Again, this calculates the quantiles correctly for each factor level, but obviously, as with aggregate they aren't in the correct order in the data frame. We can check this by putting the quantile "bins" into the data frame.
# get quantile bins
qfun2 <- function(x, q = 5) {
quantile <- cut(x, breaks = quantile(x, probs = 0:q/q),
include.lowest = TRUE)
quantile
}
tmp11 <- with(dat, split(var1, strata))
tmp22 <- lapply(tmp11, qfun2)
tmp33 <- unlist(tmp22)
dat$quintiles2 <- tmp33
Many of the values of var1 are outside of the bins in quintiles2. I feel like I'm missing something simple. Any suggestions would be greatly appreciated.
I think your issue is that you don't really want to aggregate, but rather to use ave (or data.table or plyr)
qdat <- transform(dat, qq = ave(var1, strata, FUN = qfun))
#using plyr
library(plyr)
qdat <- ddply(dat, .(strata), mutate, qq = qfun(var1))
#using data.table (my preference)
library(data.table)
setDT(dat) # convert to a data.table by reference so that := is available
dat[, qq := qfun(var1), by = strata]
Aggregate usually implies returning an object that is smaller than the original. (In this case you were getting a data.frame where x was a list with one element for each stratum.)
Use ave on your dat data frame. Full example with your simulated data and qfun function:
# some data
set.seed(472)
dat <- data.frame(var1 = rnorm(50, 10, 3)^2,
strata = factor(sample(LETTERS[1:5], size = 50, replace = TRUE))
)
# function to get quantiles
qfun <- function(x, q = 5) {
quantile <- cut(x, breaks = quantile(x, probs = 0:q/q),
include.lowest = TRUE, labels = 1:q)
quantile
}
And my addition...
dat$q <- ave(dat$var1,dat$strata,FUN=qfun)
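The reason ave lines up where split plus unlist does not is that ave writes each group's result back into that group's original row positions. The sketch below checks this on simulated data. It makes two assumptions that differ from the question: strata are balanced (10 rows each) so every stratum has distinct quantile breaks, and the factor returned by qfun is converted to integer codes before ave stores it.

```r
set.seed(472)
dat <- data.frame(
  var1   = rnorm(50, 10, 3)^2,
  strata = factor(sample(rep(LETTERS[1:5], each = 10)))  # 10 rows per stratum, shuffled
)

# Quintile label (1..q) of each value within its own vector
qfun <- function(x, q = 5) {
  cut(x, breaks = quantile(x, probs = 0:q / q),
      include.lowest = TRUE, labels = 1:q)
}

# ave() applies the function per stratum and returns the results in row order
dat$q <- ave(dat$var1, dat$strata, FUN = function(x) as.integer(qfun(x)))

# Each stratum should contribute two rows to each quintile
table(dat$strata, dat$q)
```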
