Loop over to create new variables from uniform dataframe - r

My problem is the following: I want to create variables e_1, e_2, e_3, ..., e_50, each consisting of 100 draws from the uniform distribution U[-1,1].
That is, e_1 is a vector of 100 draws from U[-1,1], and likewise for e_2, ..., e_50.
Here is what I thought I could do:
periods <- c(1:50)
people <- c(1:100)
for (t in periods){
sprint('e_', t) <- runif(100, -1,1)
}
This did not work, and I am not sure how to change it to obtain what I want.
Thank you so much for your help!!

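It is better not to create many objects in the global environment. As for the problem in the code: a call such as sprint('e_', t) on the left-hand side of <- does not create a variable with that name (and the formatting function is sprintf(), not sprint()). To assign to a name built at run time, use assign():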
for(t in periods) {
assign(sprintf('e_%d', t), runif(100, -1, 1))
}
An approach that does not create multiple objects in the global environment is to build a named list with replicate():
lst1 <- replicate(length(periods), runif(100, -1, 1), simplify = FALSE)
names(lst1) <- sprintf('e_%d', periods)
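Once the draws live in a named list, individual vectors can be pulled out by name and the whole set can be turned into a table. The following is a minimal sketch of that usage; the seed and the name draws are assumptions added for illustration and are not part of the original answer.

```r
set.seed(123)  # assumption: fixed seed only so the sketch is reproducible

periods <- 1:50

# One list element per period, each holding 100 draws from U[-1, 1]
lst1 <- replicate(length(periods), runif(100, -1, 1), simplify = FALSE)
names(lst1) <- sprintf("e_%d", periods)

# Access a single vector by name instead of a separate global variable
e_3 <- lst1[["e_3"]]

# Collect all vectors as columns of a 100 x 50 data frame
draws <- as.data.frame(lst1)
```

Keeping the vectors in one list means later steps can use lapply() or sapply() over lst1 rather than reconstructing variable names with get().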

Related

Change title: mcmc_trace function with ggplot

I used the mcmc_trace function from the bayesplot package to draw trace plots from an mcmc list. The result is a ggplot object, so it can be edited further with ggplot2 functions.
In the plot the function produces, I need to change the panel titles from k[1] ... k[20] to subject 1 ... subject 20. Is there a way to do this with ggplot2 functions?
Below is a simple reproducible model.
library(R2jags)
library(bayesplot)
library(ggplot2)
# data
dlist <- list(
NSubjects = 20,
k = rep (5,20),
n = rep (10,20)
)
# monitor
parameter <- 'theta'
# model
minimodel <- function(){
for (i in 1:NSubjects){
theta [i] ~ dbeta (1,1)
k[i] ~ dbin(theta[i],n[i])
}
}
samples <- jags(dlist, inits=NULL, parameter,
model.file = minimodel,
n.chains=1, n.iter=10, n.burnin=1, n.thin=1, DIC=T)
# mcmc list
codaSamples = as.mcmc.list(samples$BUGSoutput)
# select subjects
colstheta <- sprintf("theta[%d]",1:20)
# plot (this is where I need to change the titles; in this example, theta[1]...theta[20] to subject 1...subject 20)
mcmc_trace(codaSamples[,colstheta]) +
labs (x='Iteration',y='theta value',
title='Traceplot - theta')
Use colnames<- to modify the column names. Since the object is a 1-element list containing a matrix-like object, you need to use [[1]]; if you have multiple chains you'll need to lapply() (or use a for loop) to apply the solution to every chain (i.e., every element in the list).
cc <- codaSamples[,colstheta]
colnames(cc[[1]]) <- gsub("theta\\[([0-9]+)\\]","subject \\1",colnames(cc[[1]]))
mcmc_trace(cc, ...)
The code above finds the numerical element in each name and inserts it into the new name; since you happen to know in this case that these are elements 1:20, you could simplify considerably, e.g.
colnames(cc[[1]]) <- paste("subject",seq(ncol(cc[[1]])))
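For several chains, the lapply() route mentioned above might look like the following sketch. It uses a plain list of matrices as a stand-in for a multi-chain mcmc.list, so it runs without JAGS; the names chains and renamed, and the sizes used, are made up for illustration.

```r
# Assumption: a plain list of matrices stands in for a 3-chain mcmc.list
chains <- lapply(1:3, function(ch) {
  m <- matrix(runif(10 * 4), nrow = 10)
  colnames(m) <- sprintf("theta[%d]", 1:4)
  m
})

# Apply the same renaming to every chain and return the modified matrices
renamed <- lapply(chains, function(m) {
  colnames(m) <- gsub("theta\\[([0-9]+)\\]", "subject \\1", colnames(m))
  m
})

colnames(renamed[[2]])
```

With a real mcmc.list, lapply() returns a plain list, so the result would probably need to be wrapped again (for example with coda's mcmc.list()) before plotting. A for loop over seq_along(cc) that assigns colnames(cc[[i]]) in place avoids that step.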

Unpacking lists programmatically in R

I am prototyping an application in R. I'm using the parallel library and parApply to run a function on the columns of a data frame. I understand the same question applies to the non-parallel apply functions. I have a line similar to:
myBigList <- parApply(myCluster, myInputData, 2, myFunction)
where myFunction is a function I have written that takes a vector as input. The function itself performs quite a few operations that I can't go into. It returns a list of variables of various classes. For the purposes of a MWE, say:
myFunction <- function(vectorIn){
# CODE GOES HERE
return(list(
mean = mean(vectorIn),
sd = sd(vectorIn),
vectorOut = sumUserFunction(vectorIn),
plot1 = aPlotGeneratingFunction(vectorIn)
))
}
What is returned to me is a list containing the results from the function. I can address elements of the list, e.g.:
myBigList$Column1$mean
But that isn't really helpful for my purposes. I'd like to know how to unpack the list so that I can look at all the mean values, e.g.:
listOfMeans <- myBigList$*ALL_ITEMS*$mean
so that listOfMeans is a vector with row.names, or a data.frame with col.names.
Is this possible? I can think of a solution using a for loop, but that doesn't seem very elegant.
I'd also like to do something similar with the plots that I return, so that I can automatically build a pdf containing all of them. I'm guessing that learning the above will help.
tl;dr: What is the best method of extracting elements with a common name from a list?
EDIT: An actual MWE
library('ggplot2')
exampleData <- data.frame(Col1 = rnorm(100), Col2 = rnorm(100), Col3 = rnorm(100))
myFunction <- function(xIn){
meanX <- mean(xIn)
sdX <- sd(xIn)
vecX <- xIn^2 + xIn
plotX <-
ggplot(data.frame(xIn, vecX), aes(x = xIn, y = vecX)) +
geom_point()
return(list(
mean = meanX,
sd = sdX,
vect = vecX,
plot = plotX
))
}
myBigList <- apply(exampleData,
2,
myFunction)
From @docendo discusimus's comment:
mymeans <- sapply(myBigList, '[[', 'mean')
returns a vector of all the values stored in mean. To return a list instead, which is useful for objects such as the plots, use:
myplots <- lapply(myBigList, '[[', 'plot')
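The same extraction pattern can be used to assemble a summary table of the scalar results. The sketch below is a cut-down version of the MWE without ggplot2, so it runs in base R; summarise_column and summaryTable are hypothetical names, and the seed is an assumption added for reproducibility.

```r
set.seed(1)  # assumption: fixed seed only so the sketch is reproducible
exampleData <- data.frame(Col1 = rnorm(100), Col2 = rnorm(100), Col3 = rnorm(100))

# Hypothetical reduced version of myFunction (no plot element)
summarise_column <- function(xIn) {
  list(mean = mean(xIn), sd = sd(xIn), vect = xIn^2 + xIn)
}

# lapply() over a data frame iterates over its columns and keeps their names
myBigList <- lapply(exampleData, summarise_column)

# Pull the same named element out of every list entry
means <- sapply(myBigList, `[[`, "mean")
sds <- vapply(myBigList, function(el) el$sd, numeric(1))

# One row per column of the input data
summaryTable <- data.frame(column = names(myBigList), mean = means, sd = sds,
                           row.names = NULL)
```

For the plots, one common approach is to open a device with pdf(), call print() on each element of myplots inside lapply(), and close the device with dev.off(), which puts each plot on its own page.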

Add elements to a previous subplot within an active base R graphics device?

Let's say I generate 9 groups of data in a list data and plot them each with a for loop. I could use *apply here too, whichever you prefer.
data = list()
layout(mat = matrix(1:9, nrow = 3))
for(i in 1:9){
data[[i]] = rnorm(n = 100, mean = i, sd = 1)
plot(data[[i]])
}
After creating all the data, I want to decide which one is best:
best_data = which.min(sapply(data, sd))
Now I want to highlight that best data on the plot to distinguish it. Is there a plotting function that lets me go back to a specified sub-plot in the active device and add an element (maybe a title)?
I know I could make a second for loop: for loop 1 generates the data, then I assess which is best, then for loop 2 creates the plots, but this seems less efficient and more verbose.
Does such a plotting function exist for base R graphics?
@rawr's answer is simple and easy. But I thought I'd point out another option that lets you select the "best" data set before you plot, in case you want more flexibility to plot the "best" data set differently from the rest.
For example:
# Create the data
data = lapply(1:9, function(i) rnorm(n = 100, mean = i, sd = 1))
par(mar=c(4,4,1,1))
layout(mat = matrix(1:9, nrow = 3))
rng = range(data)
# Select the data set with the lowest SD (computed once, before plotting)
best = which.min(sapply(data, sd))
# Plot each data set, highlighting the one with the lowest SD with red plus symbols
invisible(lapply(1:9, function(i) {
plot(data[[i]], col=ifelse(best==i,"red","black"), pch=ifelse(best==i, 3, 1), ylim=rng)
}))

Two lists in apply family

I am trying to extract two regions ex1 and ex2 (in exlist) from a list of rasters (rasterlist) using the apply family and extract from the raster package. I could use a nested for loop, but I was wondering whether there is a way to achieve this with one of the apply family members, since nested for loops are considered more or less bad practice in R. Here is the dummy code:
library(raster)
ras1 <- raster(matrix(runif(25), nrow = 5, ncol = 5))
ras2 <- ras1 * 2
ras3 <- ras1 * 0.5
rasterlist <- list(ras1, ras2, ras3)
ex1 <- extent(0, 0.4, 0, 0.4)
ex2 <- extent(0.6, 1, 0.4, 1)
exlist <- list(ex1, ex2)
At the moment I've got this as a (rather unsatisfying) solution:
out1 <- lapply(rasterlist, function(i) extract(i, ex1))
out2 <- lapply(rasterlist, function(i) extract(i, ex2))
N.B. The solution does not need to be a member of the apply family (although that was the task I set myself); if there is a better, faster, more elegant way, please share.
You could start with combining the regions into a single SpatialPolygons object (maybe they are to begin with?). With your example data that can be done like this:
ex <- do.call(bind, sapply(exlist, function(x) as(x, 'SpatialPolygons')))
In this example (with RasterLayer objects that can be stacked) you can then do
s <- stack(rasterlist)
extract(s, ex)
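When the rasters cannot be stacked, the two-list pattern the question asks about can also be written as one lapply() nested inside another. The sketch below only illustrates that pattern in base R: plain matrices stand in for the RasterLayer objects, index ranges stand in for the Extent objects, and extract_region is a hypothetical replacement for raster's extract().

```r
# Assumptions: matrices instead of rasters, index ranges instead of extents
base <- matrix(seq_len(25), nrow = 5)
rasterlist <- list(r1 = base, r2 = base * 2, r3 = base * 0.5)
exlist <- list(ex1 = list(rows = 1:2, cols = 1:2),
               ex2 = list(rows = 3:5, cols = 4:5))

# Hypothetical stand-in for raster::extract(): values inside one region
extract_region <- function(m, ex) as.vector(m[ex$rows, ex$cols])

# Outer lapply() walks the regions, inner lapply() walks the rasters
out <- lapply(exlist, function(ex) {
  lapply(rasterlist, extract_region, ex = ex)
})

# Result is indexed as out$<region>$<raster>
out$ex1$r2
```

With the real objects, the inner call would presumably become lapply(rasterlist, extract, y = ex), which replaces out1 and out2 with a single nested list.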

How to include variable values in histogram titles in R - using by()

I want to produce histograms using by(). How can I access the values of the factors so that I can include them in the histogram headings? For example:
a <- runif(500, 0, 10)
b <- LETTERS[1:5]
c <- c("Condition1", "Condition2")
x <- data.frame("Variable1" = b, "Variable2"= c, "Value"=a)
head(x)
by(x$Value, x$Variable2, hist)
or using two variables
by(x$Value, list(x$Variable2, x$Variable1), hist)
Is there a way of passing the variable value (eg Condition1) to the title of the histogram using the options within hist(), eg putting function(x) hist(x, main=...) into by()?
Pass the split up dataframe rather than just the Values. Then you will have more to work with:
by(x, x$Variable2, function(x) hist(x$Value, main=unique(x$Variable2) ) )
This produces two plots labeled Condition1 and Condition2.
This doesn't really answer your question, since you're specifying the use of by(), but I usually use split() and lapply() for these types of problems. My approach is usually along the lines of:
temp <- split(x$Value, list(x$Variable2, x$Variable1))
lapply(names(temp), function(x) hist(temp[[x]], main = x, xlab = "Value"))
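The by() approach should extend to the two-variable case as well, since each subset passed to the function still carries both grouping columns. The following is a sketch under that assumption; the " / " separator, the seed, and the names titles and ttl are illustrative choices, and pdf(NULL) is used only so the plots are drawn without writing a file.

```r
set.seed(42)  # assumption: fixed seed only so the sketch is reproducible
x <- data.frame(Variable1 = LETTERS[1:5],
                Variable2 = c("Condition1", "Condition2"),
                Value = runif(500, 0, 10))

pdf(NULL)  # draw to a null device; drop this line to see the plots

# Pass the whole data frame so both factor values are available per subset
titles <- by(x, list(x$Variable2, x$Variable1), function(d) {
  ttl <- paste(unique(as.character(d$Variable2)),
               unique(as.character(d$Variable1)), sep = " / ")
  hist(d$Value, main = ttl, xlab = "Value")
  ttl  # return the title so the headings can be inspected afterwards
})

dev.off()
```

Converting the grouping columns with as.character() keeps the title readable even when the columns are stored as factors.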
