Given a vector of values, I want to call a function once for each value.
values = 1:10
rnorm(100, mean=values, sd=1)
mean = values recycles the sequence (1,2,3,4,5,6,7,8,9,10) across the 100 draws. How can I get a matrix in which each row (or column) holds 100 observations generated from a single element of my vector?
i.e.:
rnorm(100, mean=1, sd=1)
rnorm(100, mean=2, sd=1)
rnorm(100, mean=3, sd=1)
rnorm(100, mean=4, sd=1)
# ...
It's not clear from your question, but I took it that you wanted a single matrix with 10 rows and 100 columns. That being the case you can do:
matrix(rnorm(1000, rep(1:10, each = 100)), nrow = 10, byrow = TRUE)
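To see why this layout works, here is a quick check (a sketch, not part of the original answer). rnorm pairs each draw with the corresponding element of the mean vector, and byrow = TRUE fills the matrix one row at a time. Setting sd = 0 is an assumption made purely for illustration: every draw then equals its mean, so the row structure is directly visible.

```r
# sd = 0 is an illustrative assumption: each "draw" then equals its mean exactly
means <- rep(1:10, each = 100)   # 100 ones, then 100 twos, ..., 100 tens
m <- matrix(rnorm(1000, mean = means, sd = 0), nrow = 10, byrow = TRUE)

dim(m)             # 10 rows, 100 columns
all(m[3, ] == 3)   # every entry in row 3 was drawn with mean 3
```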
Or modify akrun's answer by using sapply instead of lapply.
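A minimal sketch of that sapply variant, assuming you want one row per mean. sapply binds the ten vectors as columns, so the result is transposed here; the seed and the names m_cols and m_rows are arbitrary choices for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# sapply binds the ten length-100 vectors as columns: 100 rows, 10 columns
m_cols <- sapply(1:10, function(i) rnorm(100, mean = i, sd = 1))

# transpose to get one row per mean: 10 rows, 100 columns
m_rows <- t(m_cols)

dim(m_rows)
round(rowMeans(m_rows), 1)  # should be close to 1, 2, ..., 10
```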
An option is lapply from base R
lapply(1:10, function(i) rnorm(100, mean = i, sd = 1))
Or Map from base R:
Map(function(i) rnorm(100, mean = i, sd = 1), 1:10)
Using purrr's map_dfc, I can apply a function to each value of the vector values:
library(purrr)
values = 1:10
map_dfc(
.x = values,
.f = ~rnorm(100,mean = .x,sd = 1)
)
In this case I get a data frame (a tibble) with 100 rows and 10 columns.
For each data set, I need to set σ² = 10 and µ_j = j, where j = 1, ..., 5000 is the index of the data set.
We can use lapply to loop over 1 to 5000 with a small function that passes each index to rnorm as the mean.
lapply(1:5000, function(x) rnorm(n = 1000, mean = x, sd = sqrt(10)))
You can use purrr::map().
library(purrr)
# sd = sqrt(10) because the variance, not the standard deviation, is 10
map(1:5000, ~ rnorm(n = 10000, mean = .x, sd = sqrt(10)))
If you want to iterate over two different arguments to rnorm:
n_arg <- c(rep(10000, 2500), rep(20000, 2500))
map2(1:5000, n_arg, ~ rnorm(n = .y, mean = .x, sd = sqrt(10)))
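For comparison, base R's Map walks two vectors in parallel in much the same way. This is a scaled-down sketch: the sizes, the seed, and the names means and samples are hypothetical, chosen only so the example runs quickly.

```r
set.seed(1)  # arbitrary seed

means <- 1:6                         # hypothetical, instead of 1:5000
n_arg <- c(rep(10, 3), rep(20, 3))   # hypothetical sample sizes

# Map pairs means[i] with n_arg[i]; sd = sqrt(10) corresponds to a variance of 10
samples <- Map(function(m, n) rnorm(n = n, mean = m, sd = sqrt(10)), means, n_arg)

lengths(samples)   # 10 10 10 20 20 20
```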
The function needs to return the mean and standard deviation of each sample.
This is what I have:
sample_gamma <- function(alpha, beta, n, iter) {
mean = alpha/beta
var = alpha/(beta)^2
sd = sqrt(var)
gamma = rgamma(n,shape = alpha, scale = 1/beta)
sample_gamma = data.frame(mean = replicate(n = iter, expr = mean))
}
I'm very lost on this. I also need the function to return its results as a data frame.
Thank you for your time.
Edit:
sample_gamma <- function(alpha, beta, n, iter) {
output <- rgamma(iter, alpha, 1/beta)
output_1 <- matrix(output, ncol = iter)
means <- apply(output_1, 2, mean)
sds <- apply(output_1, 2, sd)
mystats <- data.frame(means, sds)
return(mystats)
}
This works except for the sds. It's returning NAs.
It's not really clear to me what you want. But say you want to create 10 samples of size 1000, alpha = 1, beta = 2. Then you can create a single stream of rgamma realizations, dimension them into a matrix, then get your stats with apply, and finally create a data frame with those vectors:
output <- rgamma(10*1000, 1, 1/2)
output <- matrix(output, ncol = 10)
means <- apply(output, 2, mean)
sds <- apply(output, 2, sd)
mystats <- data.frame(means, sds)
You could wrap your function around that code, replacing the hard values with parameters.
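One way that wrapping might look, as a sketch rather than a definitive version. It assumes beta is a rate (matching scale = 1/beta in the question's first attempt) and that n is the size of each sample and iter the number of samples. The NAs in the edited attempt arise because only iter values are drawn, so each column holds a single observation, and the sd of a single value is NA.

```r
# Assumption: beta is a rate, as implied by `scale = 1/beta` in the question
sample_gamma <- function(alpha, beta, n, iter) {
  # draw all n * iter values at once, one column per sample
  draws <- matrix(rgamma(n * iter, shape = alpha, rate = beta),
                  nrow = n, ncol = iter)
  data.frame(
    mean = colMeans(draws),       # sample mean of each column
    sd   = apply(draws, 2, sd)    # sample sd of each column
  )
}

set.seed(1)  # arbitrary seed
stats <- sample_gamma(alpha = 1, beta = 2, n = 1000, iter = 10)
dim(stats)   # one row per sample, two columns
```

With these parameters the theoretical mean and standard deviation are both alpha/beta = 0.5 and sqrt(alpha)/beta = 0.5, so the returned values should sit near 0.5.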
I am using the following code to generate data, and I am estimating regression models across a list of variables (covar1 and covar2). I have also created confidence intervals for the coefficients and merged them together.
I have been examining all sorts of examples here and on other sites, but I can't seem to accomplish what I want. I want to stack the results for each covar into a single data frame, labeling each cluster of results by the covar it is attributable to (i.e., "covar1" and "covar2"). Here is the code for generating data and results using lapply:
##creating a fake dataset (N=1000, 500 at treated, 500 at control group)
#outcome variable
outcome <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 70, sd = 10))
#running variable
running.var <- seq(0, 1, by = .0001)
running.var <- sample(running.var, size = 1000, replace = T)
##Put negative values for the running variable in the control group
running.var[1:500] <- -running.var[1:500]
#treatment indicator (just a binary variable indicating treated and control groups)
treat.ind <- c(rep(0,500), rep(1,500))
#create covariates
set.seed(123)
covar1 <- c(rnorm(500, mean = 50, sd = 10), rnorm(500, mean = 50, sd = 20))
covar2 <- c(rnorm(500, mean = 10, sd = 20), rnorm(500, mean = 10, sd = 30))
data <- data.frame(cbind(outcome, running.var, treat.ind, covar1, covar2))
data$treat.ind <- as.factor(data$treat.ind)
#Bundle the covariates names together
covars <- c("covar1", "covar2")
#loop over them using a convenient feature of the "as.formula" function
models <- lapply(covars, function(x){
regres <- lm(as.formula(paste(x," ~ running.var + treat.ind",sep = "")), data = data)
ci <-confint(regres, level=0.95)
regres_ci <- cbind(summary(regres)$coefficient, ci)
})
names(models) <- covars
print(models)
Any nudge in the right direction, or link to a post I just haven't come across, is greatly appreciated.
You can use do.call, where the second argument is a list (as here):
do.call(rbind, models)
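To make the behavior concrete, here is a small sketch with two stand-in matrices (hypothetical values, not model output). do.call(rbind, lst) is equivalent to calling rbind with each list element as an argument. For matrices, the row names come from the matrices themselves, so the list names are not carried into the result.

```r
# Two small stand-in matrices (hypothetical values, not model output)
m1 <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("est", "se")))
m2 <- matrix(5:8, nrow = 2, dimnames = list(c("a", "b"), c("est", "se")))
lst <- list(covar1 = m1, covar2 = m2)

stacked <- do.call(rbind, lst)   # same rows as rbind(m1, m2)
dim(stacked)                     # 4 rows, 2 columns
rownames(stacked)                # "a" "b" "a" "b": the list names are dropped
```

This is why a label column for the covariate has to be added before stacking if you want to tell the blocks apart.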
I made a possible improvement to your lapply function. This way you can save the estimated parameters and the variable names in a data.frame:
models <- lapply(covars, function(x){
regres <- lm(as.formula(paste(x," ~ running.var + treat.ind",sep = "")), data = data)
ci <-confint(regres, level=0.95)
regres_ci <- data.frame(covar=x,param=rownames(summary(regres)$coefficient),
summary(regres)$coefficient, ci)
})
do.call(rbind,models)
In the same way that I calculate the average at each position across the parallel vectors combined in the list, I would like to compute percentiles (0.05 and 0.95), the variance, or the standard error.
LOC_GI_1950a <- rnorm(100,5,2)
LOC_GI_1951a <- rnorm(100,7,3)
LOC_GI_1952a <- rnorm(100,1,2)
LOC_GI_1953a <- rnorm(100,2,3)
LOC_GI_1954a <- rnorm(100,5,2)
LOC_GI_1955a <- rnorm(100,7,3)
LOC_GI_1956a <- rnorm(100,8,2)
LOC_GI_1957a <- rnorm(100,2,5)
LOC_GI_1958a <- rnorm(100,5,1)
LOC_GI_1959a <- rnorm(100,7,1)
LOC_GI_1960a <- rnorm(100,1,2)
LOC_GI_1961a <- rnorm(100,6,3)
LOC_GI_Annuala <- list(LOC_GI_1950a,LOC_GI_1951a,LOC_GI_1952a,LOC_GI_1953a,LOC_GI_1954a,
LOC_GI_1955a,LOC_GI_1956a,LOC_GI_1957a,LOC_GI_1958a,LOC_GI_1959a,
LOC_GI_1960a,LOC_GI_1961a)
LOC_GI_AnnualAvga <- Reduce("+",LOC_GI_Annuala)/length(LOC_GI_Annuala)
We can convert the list to an array and then use apply procedures to get the mean, var, etc. of each corresponding element
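# each length-100 vector is reshaped to 10 x 10; the third dimension indexes the 12 list elements
apply(array(unlist(LOC_GI_Annuala), c(10, 10, 12)), c(1,2), mean)
apply(array(unlist(LOC_GI_Annuala), c(10, 10, 12)), c(1,2), var)
As @RuiBarradas mentioned, quantile can be used with apply
c(apply(array(unlist(LOC_GI_Annuala), c(10, 10, 12)), c(1,2), quantile, probs = 0.95))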
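An alternative sketch that avoids the 10 x 10 reshape: bind the vectors as columns of a matrix and summarize across each row. The list yearly below is a hypothetical stand-in for LOC_GI_Annuala, and the standard error shown is the standard error of the mean across the 12 vectors, which is an assumption about what is wanted.

```r
set.seed(1)  # arbitrary seed

# Stand-in for LOC_GI_Annuala: 12 vectors of length 100 (hypothetical parameters)
yearly <- lapply(1:12, function(i) rnorm(100, mean = 5, sd = 2))

m <- do.call(cbind, yearly)            # 100 positions x 12 vectors
pos_mean <- rowMeans(m)                # same as Reduce("+", yearly) / length(yearly)
pos_var  <- apply(m, 1, var)           # variance at each position
pos_se   <- sqrt(pos_var / ncol(m))    # standard error of the mean at each position

# 5th and 95th percentiles at each position, as a 100 x 2 matrix
pos_q <- t(apply(m, 1, quantile, probs = c(0.05, 0.95)))
```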
I want to plot a histogram where the y-axis represents the sum of a column.
I found this example for categorical data:
R histogram that sums rather than frequency.
However, this is not what I am looking for, as it does not apply to continuous data, where I would have to define the bins.
Let's say I have x and y:
set.seed(1)
mydata <- data.frame(y = runif (100, min= 0, max = 1),
x = rpois(100, 15) * 10)
A traditional histogram will be like:
hist (mydata$x)
Now how can I get the sum of y within each bin on the y-axis?
This is one way to solve this problem that leverages the hist() function for most of the heavy lifting, and has the advantage that the barplot of the cumulative sum of y matches the bins and dimensions of the histogram of x:
set.seed(1)
mydata <- data.frame(y = runif (100, min= 0, max = 1), x = rpois(100, 15) * 10)
mx <- mydata$x
my <- mydata$y
h <- hist(mydata$x)
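# Bin y with the same intervals hist() uses by default: right-closed, (beg, end],
# with the lowest break included in the first bin
bins <- cut(mx, breaks = h$breaks, include.lowest = TRUE)
# split() keeps empty bins, so their sum is 0
sums <- sapply(split(my, bins), sum)
h$counts <- unname(sums)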
plot(h, ylab="Sum", main="Sum of y Within x Bins")
Summarizing all comments, this is what I wanted to have. Thanks @Alex A.
set.seed(1)
mydata <- data.frame(y = runif (100, min= 0, max = 1), x = rpois(100, 15) * 10)
a <- aggregate(mydata$y, by=list(bin=cut(mydata$x, nclass.Sturges(mydata$x))), FUN=sum)
a$bin<- gsub (']','',as.character (a$bin))
a$bin<- gsub (',',' ',as.character (a$bin))
ab2=sapply(strsplit(as.character(a$bin), " "), "[", 2)
barplot(a$x, names.arg=ab2)