Separating Parameters in Repeated Function Calls - r

Given a vector of values, I want to call a function once for each value:
values = 1:10
rnorm(100, mean=values, sd=1)
mean = values recycles the sequence (1,2,3,4,5,6,7,8,9,10) across the 100 draws. How can I get a matrix made of 10 sets of 100 observations, each set using a single element of my vector as the mean?
i.e.:
rnorm(100, mean=1, sd=1)
rnorm(100, mean=2, sd=1)
rnorm(100, mean=3, sd=1)
rnorm(100, mean=4, sd=1)
# ...

It's not clear from your question, but I took it that you wanted a single matrix with 10 rows and 100 columns. That being the case you can do:
matrix(rnorm(1000, rep(1:10, each = 100)), nrow = 10, byrow = TRUE)
Or modify akrun's answer by using sapply instead of lapply
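The sapply variant mentioned above could look like the sketch below. It assumes one column per mean is acceptable (sapply binds the ten length-100 vectors column-wise); the name draws and the seed are only illustrative.

```r
values <- 1:10
set.seed(42)  # only to make the sketch reproducible

# sapply() calls rnorm() once per element and binds the results as columns
draws <- sapply(values, function(m) rnorm(100, mean = m, sd = 1))

dim(draws)       # 100 rows (observations) by 10 columns (one per mean)
colMeans(draws)  # each column mean should sit near its target 1, 2, ..., 10
```

Use t(draws) if you would rather have one row per mean.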

An option is lapply from base R
lapply(1:10, function(i) rnorm(100, mean = i, sd = 1))

Or Map from base R:
Map(function(i) rnorm(100, mean = i, sd = 1), 1:10)

Using map I can apply a function for each value from the vector values
library(purrr)
values = 1:10
map_dfc(
.x = values,
.f = ~rnorm(100,mean = .x,sd = 1)
)
In this case I will have a data.frame 100x10

Related

How do I generate 5000 synthetic Gaussian data sets in R with 1000 observations each?

For each data set, I need to set σ² = 10 and µ_j = j, where j = 1, ..., 5000 is the index of the data set.
We can use lapply to loop over 1 to 5000 with a small anonymous function that passes each index to rnorm as the mean.
lapply(1:5000, function(x) rnorm(n = 1000, mean = x, sd = sqrt(10)))
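You can use purrr::map().
library(purrr)
# n = 1000 and sd = sqrt(10) match the question (1000 observations, variance 10)
map(1:5000, ~ rnorm(n = 1000, mean = .x, sd = sqrt(10)))
If you want to iterate over two different arguments to rnorm:
# illustrative sample sizes: the first 2500 data sets get 10000 draws, the rest 20000
n_arg <- c(rep(10000, 2500), rep(20000, 2500))
map2(1:5000, n_arg, ~ rnorm(n = .y, mean = .x, sd = sqrt(10)))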
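For readers without purrr, a base R sketch of the same two-argument iteration is shown here. It uses Map, which walks several vectors in parallel; the small sizes and the names means, sizes and sims are assumptions chosen to keep the example quick.

```r
set.seed(1)
means <- 1:5
sizes <- c(3, 3, 4, 4, 5)

# Map() pairs up means[i] with sizes[i], much like purrr::map2()
sims <- Map(function(m, n) rnorm(n = n, mean = m, sd = sqrt(10)), means, sizes)

lengths(sims)  # one vector per pair, with the requested number of draws
```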

Function that will generate iter samples of size n from a gamma distribution with shape parameter alpha and rate parameter beta

The function needs to return the mean and standard deviation of each sample.
This is what I have:
sample_gamma <- function(alpha, beta, n, iter) {
mean = alpha/beta
var = alpha/(beta)^2
sd = sqrt(var)
gamma = rgamma(n,shape = alpha, scale = 1/beta)
sample_gamma = data.frame(mean = replicate(n = iter, expr = mean))
}
I'm very lost on this. I also need the function to return its results as a data frame.
Thank you for your time.
Edit:
sample_gamma <- function(alpha, beta, n, iter) {
output <- rgamma(iter, alpha, 1/beta)
output_1 <- matrix(output, ncol = iter)
means <- apply(output_1, 2, mean)
sds <- apply(output_1, 2, sd)
mystats <- data.frame(means, sds)
return(mystats)
}
This works except for the sds. It's returning NAs.
It's not really clear to me what you want. But say you want to create 10 samples of size 1000, with alpha = 1 and beta = 2. Then you can draw a single stream of rgamma realizations, reshape it into a matrix with one column per sample, get your stats with apply, and finally create a data frame from those vectors:
# beta is a rate parameter, so pass it as rate (the third positional argument of rgamma is rate, not scale)
output <- rgamma(10*1000, shape = 1, rate = 2)
output <- matrix(output, ncol = 10)
means <- apply(output, 2, mean)
sds <- apply(output, 2, sd)
mystats <- data.frame(means, sds)
You could wrap your function around that code, replacing the hard values with parameters.
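One possible way to wrap this into the requested function is sketched below. It assumes beta is a rate, as the question states, and draws n * iter values so that each column holds a full sample of size n. The edited attempt in the question draws only iter values, which leaves one observation per column, and sd() of a single value is NA.

```r
sample_gamma <- function(alpha, beta, n, iter) {
  # one column per sample, n observations in each column
  draws <- matrix(rgamma(n * iter, shape = alpha, rate = beta),
                  nrow = n, ncol = iter)
  data.frame(mean = colMeans(draws),
             sd   = apply(draws, 2, sd))
}

set.seed(1)
stats <- sample_gamma(alpha = 1, beta = 2, n = 1000, iter = 10)
stats  # 10 rows; means and sds should both be close to alpha / beta = 0.5
```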

Stacking lapply results

I am using the following code to generate data, and I am estimating regression models across a list of variables (covar1 and covar2). I have also created confidence intervals for the coefficients and merged them together.
I have been examining all sorts of examples here and on other sites, but I can't seem to accomplish what I want. I want to stack the results for each covar into a single data frame, labeling each cluster of results by the covar it is attributable to (i.e., "covar1" and "covar2"). Here is the code for generating data and results using lapply:
##creating a fake dataset (N=1000, 500 at treated, 500 at control group)
#outcome variable
outcome <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 70, sd = 10))
#running variable
running.var <- seq(0, 1, by = .0001)
running.var <- sample(running.var, size = 1000, replace = T)
##Put negative values for the running variable in the control group
running.var[1:500] <- -running.var[1:500]
#treatment indicator (just a binary variable indicating treated and control groups)
treat.ind <- c(rep(0,500), rep(1,500))
#create covariates
set.seed(123)
covar1 <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 50, sd = 20))
covar2 <- c(rnorm(500, mean = 10, sd = 20), rnorm(500, mean = 10, sd = 30))
data <- data.frame(cbind(outcome, running.var, treat.ind, covar1, covar2))
data$treat.ind <- as.factor(data$treat.ind)
#Bundle the covariates names together
covars <- c("covar1", "covar2")
#loop over them using a convenient feature of the "as.formula" function
models <- lapply(covars, function(x){
regres <- lm(as.formula(paste(x," ~ running.var + treat.ind",sep = "")), data = data)
ci <-confint(regres, level=0.95)
regres_ci <- cbind(summary(regres)$coefficient, ci)
})
names(models) <- covars
print(models)
Any nudge in the right direction, or link to a post I just haven't come across, is greatly appreciated.
You can use do.call, where the second argument is a list:
do.call(rbind, models)
I also made a possible improvement to your lapply function. This way you can save the estimated parameters and the variable names in a data.frame:
models <- lapply(covars, function(x){
regres <- lm(as.formula(paste(x," ~ running.var + treat.ind",sep = "")), data = data)
ci <-confint(regres, level=0.95)
regres_ci <- data.frame(covar=x,param=rownames(summary(regres)$coefficient),
summary(regres)$coefficient, ci)
})
do.call(rbind,models)
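A compact, self-contained sketch of the same stacking pattern follows, so it can be run without the full simulated data set. The toy data frame d, the predictor x, and the use of reformulate() to build the formula are assumptions made for illustration, not part of the original code.

```r
set.seed(123)
# toy data: one predictor and two covariates to be modelled in turn
d <- data.frame(x = rnorm(50), covar1 = rnorm(50), covar2 = rnorm(50))
covars <- c("covar1", "covar2")

fits <- lapply(covars, function(v) {
  fit <- lm(reformulate("x", response = v), data = d)  # builds covar ~ x
  est <- cbind(coef(summary(fit)), confint(fit, level = 0.95))
  # label every coefficient row with the covariate it belongs to
  data.frame(covar = v, param = rownames(est), est,
             row.names = NULL, check.names = FALSE)
})

stacked <- do.call(rbind, fits)  # one data frame, two rows per covariate
stacked
```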

Using Reduce() to calculate percentiles or variance in R

In the same way that I calculate the average of each position in the parallel vectors combined in the list, I would like to look for percentiles (0.05 and 0.95), variance or standard error.
LOC_GI_1950a <- rnorm(100,5,2)
LOC_GI_1951a <- rnorm(100,7,3)
LOC_GI_1952a <- rnorm(100,1,2)
LOC_GI_1953a <- rnorm(100,2,3)
LOC_GI_1954a <- rnorm(100,5,2)
LOC_GI_1955a <- rnorm(100,7,3)
LOC_GI_1956a <- rnorm(100,8,2)
LOC_GI_1957a <- rnorm(100,2,5)
LOC_GI_1958a <- rnorm(100,5,1)
LOC_GI_1959a <- rnorm(100,7,1)
LOC_GI_1960a <- rnorm(100,1,2)
LOC_GI_1961a <- rnorm(100,6,3)
LOC_GI_Annuala <- list(LOC_GI_1950a,LOC_GI_1951a,LOC_GI_1952a,LOC_GI_1953a,LOC_GI_1954a,
LOC_GI_1955a,LOC_GI_1956a,LOC_GI_1957a,LOC_GI_1958a,LOC_GI_1959a,
LOC_GI_1960a,LOC_GI_1961a)
LOC_GI_AnnualAvga <- Reduce("+",LOC_GI_Annuala)/length(LOC_GI_Annuala)
We can convert the list to an array and then use apply to get the mean, variance, etc. of each corresponding element across the 12 vectors (each vector of 100 values is laid out as a 10 x 10 slice):
apply(array(unlist(LOC_GI_Annuala), c(10, 10, 12)), c(1,2), mean)
apply(array(unlist(LOC_GI_Annuala), c(10, 10, 12)), c(1,2), var)
As @RuiBarradas mentioned, quantile can be used with apply as well:
c(apply(array(unlist(LOC_GI_Annuala), c(10, 10, 12)), c(1,2), quantile, probs = 0.95))
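An alternative sketch that avoids the three-dimensional array: bind the vectors into a 100 x 12 matrix and summarise each row. The list years stands in for LOC_GI_Annuala, and the standard error formula (sd divided by the square root of the number of vectors) is an assumption about what is wanted.

```r
set.seed(1)
# stand-in for LOC_GI_Annuala: 12 vectors of 100 values each
years <- replicate(12, rnorm(100, 5, 2), simplify = FALSE)

m <- do.call(cbind, years)  # 100 rows (positions) by 12 columns (years)

avg  <- rowMeans(m)                 # same result as Reduce("+", years) / 12
vars <- apply(m, 1, var)            # variance at each position
se   <- sqrt(vars / ncol(m))        # standard error of the mean at each position
q    <- t(apply(m, 1, quantile, probs = c(0.05, 0.95)))  # 100 x 2 matrix

head(q)
```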

Histogram of sum instead of frequency - R

I want to plot a histogram where the y-axis represents the sum of a column.
I found this example for categorical data:
R histogram that sums rather than frequency.
However, this is not what I am looking for, as it does not apply to continuous data, where I would have to define the bins.
Let's say I have x and y:
set.seed(1)
mydata <- data.frame(y = runif (100, min= 0, max = 1),
x = rpois(100, 15) * 10)
A traditional histogram will be like:
hist (mydata$x)
Now how can I get the cumulative sum of y in the y-axis?
This is one way to solve this problem that leverages the hist() function for most of the heavy lifting, and has the advantage that the barplot of the cumulative sum of y matches the bins and dimensions of the histogram of x:
set.seed(1)
mydata <- data.frame(y = runif (100, min= 0, max = 1), x = rpois(100, 15) * 10)
mx <- mydata$x
my <- mydata$y
h <- hist(mydata$x)
breaks <- data.frame(
"beg"=h$breaks[-length(h$breaks)],
"end"=h$breaks[-1]
)
sums <- apply(breaks, MARGIN=1, FUN=function(x) { sum(my[ mx >= x[1] & mx < x[2] ]) })
h$counts <- sums
plot(h, ylab="Sum", main="Sum of y Within x Bins")
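A variant of the same idea, sketched under the assumption that the bars should follow hist()'s default binning exactly. By default hist() uses right-closed intervals and includes the lowest value in the first bin, so cut() with the same breaks and include.lowest = TRUE assigns every x to the bar it is drawn in. A left-closed comparison such as mx >= x[1] & mx < x[2] can place values that fall exactly on a break in a neighbouring bar.

```r
set.seed(1)
mydata <- data.frame(y = runif(100, min = 0, max = 1),
                     x = rpois(100, 15) * 10)

h <- hist(mydata$x, plot = FALSE)  # only compute the breaks

# assign each x to a histogram bin using hist()'s own breaks
bins <- cut(mydata$x, breaks = h$breaks, include.lowest = TRUE)

sums <- tapply(mydata$y, bins, sum)  # sum of y per bin
sums[is.na(sums)] <- 0               # empty bins come back as NA

h$counts <- as.vector(sums)
plot(h, ylab = "Sum", main = "Sum of y Within x Bins")
```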
Summarizing all comments, this is what I wanted to have. Thanks @Alex A.
set.seed(1)
mydata <- data.frame(y = runif (100, min= 0, max = 1), x = rpois(100, 15) * 10)
a <- aggregate(mydata$y, by=list(bin=cut(mydata$x, nclass.Sturges(mydata$x))), FUN=sum)
a$bin<- gsub (']','',as.character (a$bin))
a$bin<- gsub (',',' ',as.character (a$bin))
ab2=sapply(strsplit(as.character(a$bin), " "), "[", 2)
barplot(a$x, names.arg=ab2)
