I'm trying to plot from a rather complex array in R. I want to produce an image with 3 by 3 graphs, each with red and blue points on it.
I've got a structure of apply loops which works, but I'd like to change the y maximum value by each row.
I would normally do this using a counter, like i, in other languages. But the apply thing in R is completely baffling me!
par(mfrow=c(3,3),pty="s") # a 3 by 3 graphic
set.seed(1001)
x <- 1:54 # with 1 to 54 along the x axis
y <- array(rexp(20), dim=c(54,6,3,2)) # and the y axis coming
# from an array with dimensions as shown.
ymax <- c(1,0.1,0.3) # three different y maximum values I want
# on the graphic, one for each row of graphs
counter <- 1 # a counter, starting at 1,
# as I would use in a traditional loop
apply(y[,3:5,,], 2, function(i) # my first apply, which only considers
# the 3rd, 4th and 5th columns
{
yy <- ymax[counter] # using the counter to select my ylimit maximum
apply(i, 2, function (ii) # my second apply, considering the 3rd
# dimension of y
{
plot(x,ii[,1], col="blue", ylim=c(0,yy))
# plotting the 4th dimension
points(x,ii[,2], col="red")
# adding points in a different
# colour from the 4th dim.
})
})
Thank you in advance for your thoughts, they are very much appreciated!
Cheers
Kate
I think it might be easier to use loops in this case.
Also, your code never updates the counter; there is no line like counter <- counter + 1. From inside apply you would have to assign to the variable outside the function using <<- (note the doubled < sign), because a plain <- only creates a local copy. An example using lapply:
Single lapply usage
counter <- 0
lapply(1:3, function(x) {
counter <<- counter + 1
cat("outer", counter, "\n")
plot(1:10, main=counter)
})
Or nested usage of lapply
counter <- 0
lapply(1:3, function(x) {
counter <<- counter + 1
cat("outer", counter, "\n")
lapply(1:3, function(x) {
counter <<- counter + 1
cat("inner", counter, "\n")
plot(1:10, main=counter)
})
})
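For comparison, here is a minimal sketch of the plain-loop version that this answer recommends but does not show. It assumes the array layout from the question (columns 3 to 5 of the second dimension correspond to the three ymax values, and the third dimension gives the three panels in each row). The names cols, k and panel are illustrative only, and the array is filled with prod(dims) random values rather than the recycled rexp(20) of the question.

```r
set.seed(1001)
x <- 1:54
dims <- c(54, 6, 3, 2)
y <- array(rexp(prod(dims)), dim = dims)
ymax <- c(1, 0.1, 0.3)
cols <- 3:5                          # columns of interest in the 2nd dimension

op <- par(mfrow = c(3, 3), pty = "s")
for (k in seq_along(cols)) {         # k plays the role of the counter
  for (panel in seq_len(dims[3])) {  # one panel per slice of the 3rd dimension
    plot(x, y[, cols[k], panel, 1], col = "blue", ylim = c(0, ymax[k]))
    points(x, y[, cols[k], panel, 2], col = "red")
  }
}
par(op)                              # restore the previous graphics settings
```

Because mfrow fills the grid row by row, each pass of the outer loop draws one row of graphs with its own y maximum.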
The key thing here is to use lapply on the index rather than on the array itself, so then you can use the index to subset both your y limits and the array ahead of the inner loop. This also avoids having to use the <<- construct.
Simplified your data a bit:
par(mfrow=c(3,3),pty="s") # a 3 by 3 graphic
set.seed(1001)
x <- 1:10 # with 1 to 10 along the x axis
dims <- c(10,6,3,2)
y <- array(rexp(prod(dims)), dim=dims) # and the y axis coming from this array
ymax <- c(1,0.1,0.3)
lapply(1:3, function(counter, arr) {
apply(
arr[ ,counter + 2, , ], 2,
function(ii) {
plot(x, ii[,1], col="blue", ylim=c(0,ymax[counter]))
points(x, ii[,2], col="red")
} )
},
arr=y
)
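A related sketch, under the same simplified data: Map() can walk over the column indices and the y limits in parallel, so the limit arrives as an argument and no indexing by counter is needed. The argument names col and top and the result name used are assumptions made for this illustration.

```r
set.seed(1001)
x <- 1:10
dims <- c(10, 6, 3, 2)
y <- array(rexp(prod(dims)), dim = dims)
ymax <- c(1, 0.1, 0.3)

op <- par(mfrow = c(3, 3), pty = "s")
used <- Map(function(col, top) {
  for (panel in seq_len(dims[3])) {
    plot(x, y[, col, panel, 1], col = "blue", ylim = c(0, top))
    points(x, y[, col, panel, 2], col = "red")
  }
  top                                # return the limit used for this row
}, 3:5, ymax)
par(op)
```

Map() pairs 3 with 1, 4 with 0.1 and 5 with 0.3, which is the pairing the counter was meant to provide.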
I am not going to rewrite your code as I must say it is difficult to comprehend, but this will help: you can update a variable outside of the scope of apply by using <<- assignment, e.g. to update some external "counter"
I had a vector like this :
x= c(0.542949849, 0.242292905, 0.163459552, 0.069668097, 0.042969073, 0.035829825)
and I want to plot the points (x[i], x[i+1]). Using Excel I got this:
How can I get this plot in R? I tried this:
for(i in 1:5){
plot(x[i], x[i+1])
par(new = TRUE)
}
but it doesn't give the expected result.
Here are two solutions.
The first uses base R only.
x <- c(0.542949849, 0.242292905, 0.163459552, 0.069668097, 0.042969073, 0.035829825)
plot(range(x), range(x), type = "n")
for(i in seq_along(x)[-length(x)]){
points(x[i], x[i+1])
}
The second uses package tsDyn.
tsDyn::autopairs(x, type = "points")
Try this:
plot(embed(rev(x), 2))
or
plot(embed(x, 2)[, 2:1])
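To see why the embed() one-liners work, here is a short sketch that builds the same pairs by hand. embed(x, 2) returns a matrix whose first column is x[i + 1] and whose second column is x[i], so swapping the columns gives the (x[i], x[i + 1]) pairs. The names lagged, pairs_xy and by_hand are illustrative.

```r
x <- c(0.542949849, 0.242292905, 0.163459552,
       0.069668097, 0.042969073, 0.035829825)

lagged <- embed(x, 2)        # column 1 is x[i + 1], column 2 is x[i]
pairs_xy <- lagged[, 2:1]    # reorder to (x[i], x[i + 1])

# The same pairs built by dropping the last and the first element
by_hand <- cbind(head(x, -1), tail(x, -1))

plot(pairs_xy, xlab = "x[i]", ylab = "x[i + 1]")
```

A single vectorised plot() call like this also avoids the par(new = TRUE) trick from the question, which redraws the axes on every iteration.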
You can get what you want but you have to add a few intermediate steps.
Wrapping the vector in as.numeric() makes the type explicit, although it is not strictly required here: c() applied to decimal literals already returns a numeric (double) vector, so no values are truncated to integers.
You need to redefine the sub-components of x to 2 new vectors. Vector 'a' has an index of elements from 1 to 5 of the x array. It appears on the x-axis. Vector 'b' has an index of elements from 2 to 6 of the x array. It appears on the y-axis. The first elements in vectors a and b index position 1 are equivalent to x[i],x[i+1] where i is 1.
You need to bind the 2 vectors together and then plot the result.
x <- as.numeric(c(0.542949849, 0.242292905, 0.163459552, 0.069668097, 0.042969073, 0.035829825))
a <- x[1:5]
b <- x[2:6]
xy <- cbind(a,b) # named xy rather than c, to avoid masking the c() function
plot(xy)
and the result graph is as follows
I'm using a for loop to assign ggplots to a list, which is then passed to plot_grid() (package cowplot). plot_grid places multiple ggplots side by side in a single figure. This works fine manually, but when I use a for loop, the last plot generated is repeated in each subframe of the figure (shown below). In other words, all the subframes show the same ggplot.
Here is a toy example:
require(cowplot)
dfrm <- data.frame(A=1:10, B=10:1)
v <- c("A","B")
dfmsize <- nrow(dfrm)
myplots <- vector("list",2)
count = 1
for(i in v){
myplots[[count]] <- ggplot(dfrm, aes(x=1:dfmsize, y=dfrm[,i])) + geom_point() + labs(y=i)
count = count +1
}
plot_grid(plotlist=myplots)
Expected Figure:
Figure from for loop:
I tried converting the list elements to grobs, as described in this question, like this:
mygrobs <- lapply(myplots, ggplotGrob)
plot_grid(plotlist=mygrobs)
But I got the same result.
I think the problem lies in the loop assignment, not plot_grid(), but I can't see what I'm doing wrong.
The answers so far are very close, but unsatisfactory in my opinion. The problem is the following - after your for loop:
myplots[[1]]$mapping
#* x -> 1:dfmsize
#* y -> dfrm[, i]
myplots[[1]]$plot_env
#<environment: R_GlobalEnv>
myplots[[2]]$mapping
#* x -> 1:dfmsize
#* y -> dfrm[, i]
myplots[[2]]$plot_env
#<environment: R_GlobalEnv>
i
#[1] "B"
As the other answers mention, ggplot doesn't actually evaluate those expressions until plotting, and since these are all in the global environment, and the value of i is "B", you get the undesirable results.
There are several ways of avoiding this issue, the simplest of which in fact simplifies your expressions:
myplots = lapply(v, function(col)
ggplot(dfrm, aes(x=1:dfmsize, y=dfrm[,col])) + geom_point() + labs(y=col))
The reason this works, is because the environment is different for each of the values in the lapply loop:
myplots[[1]]$mapping
#* x -> 1:dfmsize
#* y -> dfrm[, col]
myplots[[1]]$plot_env
#<environment: 0x000000000bc27b58>
myplots[[2]]$mapping
#* x -> 1:dfmsize
#* y -> dfrm[, col]
myplots[[2]]$plot_env
#<environment: 0x000000000af2ef40>
eval(quote(dfrm[, col]), env = myplots[[1]]$plot_env)
#[1] 1 2 3 4 5 6 7 8 9 10
eval(quote(dfrm[, col]), env = myplots[[2]]$plot_env)
#[1] 10 9 8 7 6 5 4 3 2 1
So even though the expressions are the same, the results are different.
And in case you're wondering what exactly is stored/copied to the environment of lapply - unsurprisingly it's just the column name:
ls(myplots[[1]]$plot_env)
#[1] "col"
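The mechanism described in this answer can be imitated without ggplot2. The sketch below is only an analogy, not ggplot2's actual internals: it stores unevaluated expressions with quote() and evaluates them later, once from a for loop (one shared variable i) and once from lapply (one environment per call). The names here, stored and captured are assumptions for the illustration.

```r
dfrm <- data.frame(A = 1:10, B = 10:1)
v <- c("A", "B")
here <- environment()              # the environment the for loop runs in

# for loop: both stored expressions refer to the same variable i
stored <- list()
for (i in v) stored[[i]] <- quote(dfrm[, i])
# evaluated only after the loop has finished, when i is "B"
from_loop <- lapply(stored, eval, envir = here)

# lapply: every call has its own environment holding its own col
captured <- lapply(v, function(col) {
  list(expr = quote(dfrm[, col]), env = environment())
})
from_lapply <- lapply(captured, function(p) eval(p$expr, p$env))
```

Both elements of from_loop hold column B, while from_lapply holds column A and then column B, mirroring the repeated plot and the fixed one.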
I believe the problem here is that the non-standard evaluation of the aes method delays evaluating i until the plot is actually plotted. By the time of plotting, i is the last value (in the toy example "B") and thus the y aesthetic mapping for all plots refers to that last value. Meanwhile, the labs call uses standard evaluation and so the labels correctly refer to each iteration of i in the loop.
This can be fixed by simply using the standard evaluation version of the mapping function, aes_q:
require(cowplot)
dfrm <- data.frame(A=1:10, B=10:1)
v <- c("A","B")
dfmsize <- nrow(dfrm)
myplots <- vector("list",2)
count = 1
for(i in v){
myplots[[count]] <- ggplot(dfrm, aes_q(x=1:dfmsize, y=dfrm[,i])) + geom_point() + labs(y=i)
count = count +1
}
plot_grid(plotlist=myplots)
There is a nice explanation of what happens with ggplot2's lazy evaluation and for loops in [this answer](https://stackoverflow.com/a/26246791/2461552).
I usually switch to aes_string or aes_ for situations like this so I can use variables as strings in ggplot2.
I find lapply loops easier than a for loop in your case as initializing the list and using the counter can be avoided.
First, I add the x variable to the dataset.
dfrm$index = 1:nrow(dfrm)
Now, the lapply loop, looping through the columns in v.
myplots = lapply(v, function(x) {
ggplot(dfrm, aes_string(x = "index", y = x)) +
geom_point() +
labs(y = x)
})
plot_grid(plotlist = myplots)
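As far as I know, aes_string() and aes_() are marked as deprecated in more recent ggplot2 releases (3.0.0 and later), where the .data pronoun is the suggested way to refer to a column by its name as a string. A minimal sketch of the same lapply loop written that way, assuming such a ggplot2 version is installed:

```r
library(ggplot2)

dfrm <- data.frame(A = 1:10, B = 10:1)
dfrm$index <- seq_len(nrow(dfrm))  # explicit x variable, as in the answer above
v <- c("A", "B")

myplots <- lapply(v, function(col) {
  ggplot(dfrm, aes(x = index, y = .data[[col]])) +
    geom_point() +
    labs(y = col)
})
```

The resulting list can be passed to plot_grid(plotlist = myplots) exactly as before. Note that this still relies on lapply giving each plot its own col; inside a plain for loop the mapping would again be evaluated lazily.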
I think ggplot is getting confused by looking for your x and y variables inside of dfrm even though you are actually defining them on the fly. If you change the for loop slightly to build a new sub data.frame as the first line it works just fine.
myplots <- list()
count = 1
for(i in v){
df <- data.frame(x = 1:dfmsize, y = dfrm[,i])
myplots[[count]] <- ggplot(df, aes(x=x, y=y)) + geom_point() + labs(y=i)
count = count + 1
}
plot_grid(plotlist=myplots)
I have a for loop in R in which I want to store the result of each calculation (for all the values looped through). In the for loop a function is called and the output is currently stored in a variable r. However, this is overwritten in each successive iteration. How could I store the result of each pass through the function and access it afterwards?
Thanks,
example
for (par1 in 1:n) {
var <- some_function(par1, par2) # some_function is a placeholder for the function being called
c(var,par1)->var2
print(var2)
}
So print shows every instance of var2, but afterwards var2 only holds the value for the last n. Is there any way to get an array of the data or something?
Initialise an empty object and then assign the value by indexing:
a <- numeric(10) # preallocate one slot per iteration
for (i in 1:10) {
a[i] <- mean(rnorm(50))
}
print(a)
EDIT:
To include an example with two output variables, in the most basic case, create an empty matrix with the number of columns corresponding to your output parameters and the number of rows matching the number of iterations. Then save the output in the matrix, by indexing the row position in your for loop:
n <- 10
mat <- matrix(ncol=2, nrow=n)
for (i in 1:n) {
var1 <- function_one(i,par1)
var2 <- function_two(i,par2)
mat[i,] <- c(var1,var2)
}
print(mat)
The iteration number i corresponds to the row number in the mat object. So there is no need to explicitly keep track of it.
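The matrix example above calls function_one() and function_two() without defining them. Here is a runnable sketch with hypothetical stand-ins for both functions and made-up values for par1 and par2, purely to show the row-by-row filling:

```r
# Hypothetical stand-ins for function_one() and function_two()
function_one <- function(i, p) i * p
function_two <- function(i, p) i + p
par1 <- 2
par2 <- 10

n <- 5
mat <- matrix(NA_real_, nrow = n, ncol = 2,
              dimnames = list(NULL, c("var1", "var2")))
for (i in seq_len(n)) {
  # row i receives both outputs of iteration i
  mat[i, ] <- c(function_one(i, par1), function_two(i, par2))
}
mat
```

Filling with NA_real_ first makes any iteration that failed to write its row easy to spot afterwards.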
However, this is just to illustrate the basics. Once you understand the above, it is more efficient to use the elegant solution given by @eddi, especially if you are handling many output variables.
To get a list of results:
n = 3
lapply(1:n, function(par1) {
# your function and whatnot, e.g.
par1*par1
})
Or sapply if you want a vector instead.
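A short sketch of the difference between the three variants, using the same squaring example. vapply() is not mentioned above; it is added here as a stricter alternative to sapply() that checks the type and length of every result.

```r
n <- 3
squares_list <- lapply(1:n, function(par1) par1 * par1)  # a list of length 3
squares_vec  <- sapply(1:n, function(par1) par1 * par1)  # simplified to a vector
# vapply() fails loudly if a result is not a single integer
squares_safe <- vapply(1:n, function(par1) par1 * par1, FUN.VALUE = integer(1))
```

In all three cases the results are collected for you, so there is nothing to initialise and nothing gets overwritten.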
A bit more complicated example:
n = 3
some_fn = function(x, y) { x + y }
par2 = 4
lapply(1:n, function(par1) {
var = some_fn(par1, par2)
return(c(var, par1)) # don't have to type return, but I chose to make it explicit here
})
#[[1]]
#[1] 5 1
#
#[[2]]
#[1] 6 2
#
#[[3]]
#[1] 7 3
I followed the discussion over HERE and am curious why using <<- is frowned upon in R. What kind of confusion will it cause?
I also would like some tips on how I can avoid <<-. I use the following quite often. For example:
### Create dummy data frame of 10 x 10 integer matrix.
### Each cell contains a number that is between 1 to 6.
df <- do.call("rbind", lapply(1:10, function(i) sample(1:6, 10, replace = TRUE)))
What I want to achieve is to shift every number down by 1, i.e. all the 2s will become 1s, all the 3s will become 2s, etc. Therefore, every n would become n-1. I achieve this by the following:
df.rescaled <- df
sapply(2:6, function(i) df.rescaled[df.rescaled == i] <<- i-1)
In this instance, how can I avoid <<-? Ideally I would want to be able to pipe the sapply results into another variable along the lines of:
df.rescaled <- sapply(...)
First point
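<<- is NOT simply the operator for assigning to a global variable. It searches the enclosing (parent) environments for an existing variable of that name and assigns to the nearest one it finds; only if none exists does it create the variable in the global environment. So, say, this can cause confusion: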
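f <- function() {
  a <- 2
  g <- function() {
    a <<- 3 # rebinds the `a` that is local to f(), not the global one
  }
  g() # without this call the <<- assignment would never run
  a
}
then,
> a <- 1
> f() # the `a` inside f() was changed from 2 to 3
[1] 3
> a # the global `a` is not affected
[1] 1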
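The same lookup rule is what makes <<- useful in closures. A minimal sketch, with make_counter() as a made-up name for this illustration: each call creates its own count, and <<- updates that enclosing variable rather than anything global.

```r
make_counter <- function() {
  count <- 0                 # lives in the environment of this call
  function() {
    count <<- count + 1      # updates the enclosing count, not a global
    count
  }
}

counter_a <- make_counter()
counter_b <- make_counter()
counter_a()                  # first call on counter_a
counter_a()                  # second call on counter_a
counter_b()                  # counter_b keeps its own, separate count
```

Keeping the state inside the closure avoids the usual worry about <<-, namely that some distant code silently changes a variable you rely on.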
Second point
You can do that by using Reduce:
Reduce(function(a, b) {a[a==b] <- a[a==b]-1; a}, 2:6, df)
or apply
apply(df, c(1, 2), function(i) if(i >= 2) {i-1} else {i})
But simply, this is sufficient:
ifelse(df >= 2, df-1, df)
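A quick sketch to check that these approaches agree on data built the same way as in the question. The seed and the names via_reduce, via_ifelse and via_arith are assumptions, and the last line is an extra arithmetic variant that is not part of the answer above.

```r
set.seed(42)
df <- do.call("rbind", lapply(1:10, function(i) sample(1:6, 10, replace = TRUE)))

via_reduce <- Reduce(function(a, b) { a[a == b] <- a[a == b] - 1; a }, 2:6, df)
via_ifelse <- ifelse(df >= 2, df - 1, df)
via_arith  <- df - (df >= 2)       # TRUE counts as 1, FALSE as 0

all.equal(via_reduce, via_ifelse)
all.equal(via_reduce, via_arith)
```

None of these modifies df in place; each returns a new matrix that can be assigned to df.rescaled, which is the form asked for in the question.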
You can think of <<- as global assignment (approximately, because as kohske points out it assigns to the top environment unless the variable name exists in a more proximal environment). Examples of why this is bad are here:
Examples of the perils of globals in R and Stata