ggplot2 plots / results are different within and outside of loop [Bug?] - r

My simple case:
Plotting the graphs inside the loop gives different results than plotting them directly after the loop.
# Initialize
Input <- list(c(3,3,3,3),c(1,1,1,1))
y <- c()
x <- c()
plotlist <- c()
Answer <- c()
# create helper grid
x.grid = c(1:4)
y.grid = c(1:4)
helpergrid <- expand.grid(xgrid=x.grid, ygrid=y.grid )
#- Loop Lists -
for (m in c(1,2))
{
# # Loop within each list
# for(j in 1:4)
# {
# y[j] <- Input[[m]][j]
# x[j] <- j
# }
y[1] <- Input[[m]][1]
x[1] <- 1
y[2] <- Input[[m]][2]
x[2] <- 2
y[3] <- Input[[m]][3]
x[3] <- 3
y[4] <- Input[[m]][4]
x[4] <- 4
Points <- data.frame(x, y)
# Example Plot
plot = ggplot() + labs(title = paste("Loop m = ",m)) + labs(subtitle = paste("y-values = ",Points$y)) + geom_tile(data = helpergrid, aes(x=xgrid, y=ygrid, fill=1), colour="grey20") + geom_point(data = Points, aes(x=Points$x, y=Points$y), stroke=3, size=5, shape=1, color="white") + theme_minimal()
# Plot to plotlist
plotlist[[m]] <- plot
# --- Plot plotlist within loop ---
plot(plotlist[[m]])
}
# --- Plot plotlist outside of loop ---
plot(plotlist[[1]])
plot(plotlist[[2]])
Here is an image of the results: [image: Plot Results]
As aaumai points out, there was a nested loop that might cause ggplot to use static values. However, the resulting plot does show the correct y-values (y=3) explicitly in the subtitle, while the geom_point layer uses the wrong values (y=1)...
It makes absolutely (!) no sense to me. I am relatively new to R and have been trying to debug this for hours, so I hope someone can help me with this!
EDIT: I manually removed the nested loop and updated the example code, but the problem still persists :(

The problem arises due to your use of Points$x within aes. The "tl;dr" is that basically you should never use $ or [ or [[ within aes. See the answer here from baptiste.
library(ggplot2)
# Initialize
Input <- list(c(3,3,3,3),c(1,1,1,1))
y <- c()
x <- c()
plotlist <- c()
Answer <- c()
# create helper grid
x.grid = c(1:4)
y.grid = c(1:4)
helpergrid <- expand.grid(xgrid=x.grid, ygrid=y.grid )
#- Loop Lists -
for (m in c(1,2)) {
y[1] <- Input[[m]][1]
x[1] <- 1
y[2] <- Input[[m]][2]
x[2] <- 2
y[3] <- Input[[m]][3]
x[3] <- 3
y[4] <- Input[[m]][4]
x[4] <- 4
Points <- data.frame(x, y)
# Example Plot
plot = ggplot() + labs(title = paste("Loop m = ",m)) + labs(subtitle = paste("y-values = ",force(Points$y))) +
geom_tile(data = helpergrid, aes(x=xgrid, y=ygrid, fill=1), colour="grey20") +
geom_point(data = Points, aes(x=x, y=y), stroke=3, size=5, shape=1, color="white") + theme_minimal()
# Plot to plotlist
plotlist[[m]] <- plot
# --- Plot plotlist within loop ---
print(plotlist[[m]])
}
# --- Plot plotlist outside of loop ---
print(plotlist[[1]])
print(plotlist[[2]])
I believe this happens because of lazy evaluation. The data passed into geom_tile/geom_point is stored with the plot, but the expression Points$x inside aes is only evaluated when the plot is printed, and at that point it is looked up in the current environment. During the loop, that is the current state of the Points data frame, which is the desired state. After the loop has finished, only the second version of Points exists, so the expression from aes picks up the values of Points as they are after the second iteration. Hope this is clear, feel free to ask further if not.
To clarify, if you remove Points$ and just refer to x within aes, it takes these values from the data.frame as it was passed into the data argument of the geom calls.
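The difference between storing an expression and storing the data can be illustrated without ggplot2. The following is only a base-R analogy under the assumption that aes() keeps the unevaluated expression; it does not reproduce ggplot2's internals, and the names stored_expr, stored_data, late and frozen are made up for this sketch.

```r
# Base-R analogy only (not ggplot2 internals):
# an expression is looked up when it is evaluated, a stored copy is not.
Input <- list(c(3, 3, 3, 3), c(1, 1, 1, 1))
stored_expr <- list()   # mimics aes(y = Points$y): keeps the expression
stored_data <- list()   # mimics data = Points: keeps a copy of the values

for (m in 1:2) {
  Points <- data.frame(x = 1:4, y = Input[[m]])
  stored_expr[[m]] <- quote(Points$y)  # unevaluated reference to Points
  stored_data[[m]] <- Points           # snapshot taken in this iteration
}

# After the loop only the last version of Points exists.
late   <- eval(stored_expr[[1]])  # evaluated now, so it sees the last Points
frozen <- stored_data[[1]]$y      # values captured during iteration 1

print(late)    # values from the second iteration
print(frozen)  # values from the first iteration
```

The expression stored in the first iteration returns the values of the second iteration, which is the same mismatch seen in the plots.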

If I'm not mistaken, this is because you have a loop within the loop.
The plot within the loop returns plots for changing y values in the Points data (from 1 to 4), whereas the plot outside is only plotting the static values.

Related

Loop coordinates of intersected line and an outline - R

Based on this answer, How to get the coordinates of an intersected line with an outline - R, I tried to run a loop using the script below. Any idea why I cannot plot all the intersection points and lines? The shape is different from the one in that answer.
Code:
library(ggplot2)
library(sf)
t <- seq(0, 2*pi, by=0.1)
df <- data.frame(x = 13*sin(t)^3,
y = 4*cos(t)-2*cos(3*t)-5*cos(4*t)-cos(2*t))
df <- rbind(df, df[1,]) # close the polygon
meanX <- mean(df$x)
meanY <- mean(df$y)
# Transform your data.frame in a sf polygon (the first and last points
# must have the same coordinates)
#> Linking to GEOS 3.5.1, GDAL 2.1.3, proj.4 4.9.2
poly <- st_sf(st_sfc(st_polygon(list(as.matrix(df)))))
# Choose the angle (in degrees)
rotAngles <- 5
for(angle in seq(0,359,rotAngles)) {
# Find the minimum length for the line segment to be always
# outside the cloud whatever the choosen angle
maxX <- max(abs(abs(df[,"x"]) - abs(meanX)))
maxY <- max(abs(abs(df[,"y"]) - abs(meanY)))
line_length = sqrt(maxX^2 + maxY^2) + 1
# Find the coordinates of the 2 points to draw a line with
# the intended angle.
# This is the gray line on the graph below
line <- rbind(c(meanX,meanY),
c(meanX + line_length * cos((pi/180)*angle),
meanY + line_length * sin((pi/180)*angle)))
# Transform into a sf line object
line <- st_sf(st_sfc(st_linestring(line)))
# Intersect the polygon and line. The result is a two points line
# shown in black on the plot below
intersect_line <- st_intersection(poly, line)
# Extract only the second point of this line.
# This is the intersecting point
intersect_point <- st_coordinates(intersect_line)[2,c("X","Y")]
# Visualise this with ggplot and without geom_sf
# you need first transform back the lines into data.frame
line <- as.data.frame(st_coordinates(line))[,1:2]
intersect_line <- as.data.frame(st_coordinates(intersect_line))[,1:2]
ggplot() + geom_path(data=df, aes(x = x, y = y)) +
geom_line(data=line, aes(x = X, y = Y), color = "gray80", lwd = 3) +
geom_line(data=intersect_line, aes(x = X, y = Y), color = "gray20", lwd = 1) +
geom_point(aes(meanX, meanY), colour="orangered", size=2) +
geom_point(aes(intersect_point["X"], intersect_point["Y"]),
colour="orangered", size=2) +
theme_bw()
}
First we'll go back to @Gilles's polygon shape, as it is more consistent with his reasoning and presentation:
# Generate a heart shape
t <- seq(0, 2*pi, by=0.1)
df <- data.frame(x = 16*sin(t)^3,
y = 13*cos(t)-5*cos(2*t)-2*cos(3*t)-cos(4*t))
df <- rbind(df, df[1,]) # close the polygon
meanX <- mean(df$x)
meanY <- mean(df$y)
library(sf)
poly <- st_sf(st_sfc(st_polygon(list(as.matrix(df)))))
These elements don't change and don't need to be calculated multiple times inside a loop:
maxX <- max(abs(abs(df[,"x"]) - abs(meanX)))
maxY <- max(abs(abs(df[,"y"]) - abs(meanY)))
line_length = sqrt(maxX^2 + maxY^2) + 1
Then your rotAngle and angle:
rotAngle <- 5
angle <- seq(0, 359, rotAngle)
Focusing attention on the for loop: the first assignment to line has elements that do and don't change. Let's make an empty list outside the for loop to hold our results, which will be 2x2 matrices:
line_lst <- list()
for (j in 1:length(angle)) {
line_lst[[j]] <- matrix(nrow = 2, ncol=2)
line_lst[[j]][1,1] <- meanX
line_lst[[j]][1,2] <- meanY
line_lst[[j]][2,1] <- meanX + line_length * cos((pi/180)*angle[j])
line_lst[[j]][2,2] <- meanY + line_length * sin((pi/180)*angle[j])
}
line_lst[[1]]
[,1] [,2]
[1,] 1.225402e-06 0.09131118
[2,] 2.425684e+01 0.09131118
line_lst[[72]]
[,1] [,2]
[1,] 1.225402e-06 0.09131118
[2,] 2.416454e+01 -2.02281169
Those seem reasonable, and this was mainly what I wanted to show: spelling out LHS[j] <- RHS[j], where j is the iteration we're on in 1:length(angle). On to linestring, intersection, and points. Same approach: make an empty receiver and loop through:
# here we have mismatch of establishing an `i` counter then
# counting `j`, which look close enough to tired eyes
# this will result in NULL(s)
linestring_lst <- list()
for (i in 1:length(line_lst)) { # this causes future error
linestring_lst[[j]] <- st_sf(st_sfc(st_linestring(line_lst[[j]])))
}
# simply keeping our accounting right, using all `i` or all `j`,
# or staying away from things that look alike and using `k` here
for (k in 1:length(line_lst)) {
linestring_lst[[k]] <- st_sf(st_sfc(st_linestring(line_lst[[k]])))
}
intersection_lst <- list()
for (j in 1:length(linestring_lst)) {
intersection_lst[[j]] <- st_intersection(poly, linestring_lst[[j]])
}
intersect_points <- list()
for (j in 1:length(intersection_lst)) {
intersect_points[[j]] <- st_coordinates(intersection_lst[[j]])[2,c('X','Y')]
}
The things to remember about for loops: create your receiver objects outside the loop, and index both the LHS[j] and RHS[j] ([ for vector-like receivers, [[ for lists). Having done each of these steps independently, you can put them all in one for loop.
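As a compact illustration of the receiver pattern described above, here is a minimal base-R sketch. The values of centre and line_length are hypothetical stand-ins for c(meanX, meanY) and the computed line length, so the numbers differ from the real heart-shape example.

```r
angle <- seq(0, 359, by = 5)
centre <- c(0, 0)     # hypothetical stand-in for c(meanX, meanY)
line_length <- 10     # hypothetical stand-in for the computed length

# Receiver created outside the loop, with its final length known up front
line_lst <- vector("list", length(angle))

for (j in seq_along(angle)) {
  theta <- angle[j] * pi / 180
  # Same index j on the left-hand side and on the right-hand side
  line_lst[[j]] <- rbind(centre,
                         centre + line_length * c(cos(theta), sin(theta)))
}

# Collect the second row (the far end point) of every 2x2 matrix
endpoints <- do.call(rbind, lapply(line_lst, function(m) m[2, ]))
```

Using seq_along(angle) instead of 1:length(angle) also behaves sensibly when angle is empty, and using one index name throughout avoids the i/j mix-up shown earlier.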
As a final step, convert the lists to data.frame(s) for use in ggplot.
intersect_pts_df <- as.data.frame(do.call('rbind', intersect_points))
head(intersect_pts_df, n = 3)
X Y
1 14.96993 0.09131118
2 15.56797 1.45333163
3 15.87039 2.88968964

ggarrange and ggplot inside a for loop

This is a simple example where I generate 5 standard normal samples, each one with its own p value (just for the sake of the demonstration). I save them in a 4x50x5 array called X.
After that, I want to save 5 plots with 4 histograms each in a .pdf, and the following code does the job:
pvec <- 2^(2:5)
n <- pvec/2
j <- 5
size <- 50
X <- array(rep(NA, length(pvec)*size*j), dim=c(length(pvec), size, j))
for (k in 1:length(pvec)){
for (i in 1:j){
X[k,,i] <- rnorm(size)
}
}
pdf("grafic.pdf")
par(mfrow=c(2,2))
for (w in 1:j){
for (k in 1:length(pvec)){
hist(X[k,,w], freq = F, col = 'lightgreen',main = paste("p = ", pvec[k], ",n =", n[k]))
curve(dnorm(x,mean=0,sd=1), add=TRUE,col="blue")
}
}
dev.off()
Obtaining, for example
Let's say that I now want to do this with ggplot. I have to use ggarrange in place of par(mfrow). But ggarrange takes a plotlist as an argument, so inside the for loop I should have something like
graphlist <- NULL
for (w in 1:j){
for (k in 1:length(pvec)){
graphlist[k,,w] <- ggplot(data=data.frame(X), aes(x=X[k,,w])) +
geom_histogram()
}
}
ggarrange(plotlist = graphlist, ncol = 2, nrow = 2)
But of course this doesn't work. How can I do things like this, where I need to save the plots made by ggplot2 and then combine them with ggarrange? Thanks
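One possible approach, sketched here under the assumption that the ggplot2 and ggpubr packages are installed: keep the plots in an ordinary list (indexed with [[k]]) instead of an array, give each plot its own small data frame so that nothing inside aes depends on the loop variables, and call ggarrange once per page. The names pages, d and value, and the use of tempdir() for the output file, are choices made for this sketch only.

```r
library(ggplot2)
library(ggpubr)

set.seed(1)
pvec <- 2^(2:5)
j <- 5
size <- 50
X <- array(rnorm(length(pvec) * size * j), dim = c(length(pvec), size, j))

pages <- vector("list", j)             # one arranged page per w
for (w in seq_len(j)) {
  graphlist <- vector("list", length(pvec))
  for (k in seq_along(pvec)) {
    d <- data.frame(value = X[k, , w]) # data for this panel only
    graphlist[[k]] <- ggplot(d, aes(x = value)) +
      geom_histogram(bins = 15) +
      labs(title = paste("p =", pvec[k]))
  }
  pages[[w]] <- ggarrange(plotlist = graphlist, ncol = 2, nrow = 2)
}

# Write one page per w; print() is needed inside a loop
out_file <- file.path(tempdir(), "grafic.pdf")
pdf(out_file)
for (page in pages) print(page)
dev.off()
```

Passing the data through the data frame d, rather than writing X[k,,w] inside aes, means each stored plot carries its own values and does not depend on k and w at print time.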

Plot Sphere with custom gridlines in R

I would like to plot a sphere in R with gridlines on the surface corresponding to an equal-area gridding of the sphere, using the arccos transformation.
I have been experimenting with the R package rgl and got some help from:
Plot points on a sphere in R
which plots the gridlines with equal lat/long spacing.
I have the function below, which returns a data frame of the points where the grid lines I want cross, but I am not sure how to proceed.
plot_sphere <- function(theta_num,phi_num){
theta <- seq(0,2*pi,(2*pi)/(theta_num))
phi <- seq(0,pi,pi/(phi_num))
tmp <- seq(0,2*phi_num,2)/phi_num
phi <- acos(1-tmp)
tmp <- cbind(rep(seq(1,theta_num),each = phi_num),rep(seq(1,phi_num),times = theta_num))
results <- as.data.frame(cbind(theta[tmp[,1]],phi[tmp[,2]]))
names(results) <- c("theta","phi")
results$x <- cos(results$theta)*sin(results$phi)
results$y <- sin(results$theta)*sin(results$phi)
results$z <- cos(results$phi)
return(results)
}
sphere <- plot_sphere(10,10)
Can anyone help? In general I am finding the rgl functions tricky to work with.
If you use lines3d or plot3d(..., type="l"), you'll get a plot joining the points in your dataframe. To get breaks (you don't want one long line), add rows containing NA values.
The code in your plot_sphere function seems really messed up (you compute phi twice, you don't generate vectors of the requested length, etc.), but this function based on it works:
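library(rgl) # for lines3d()
plot_sphere <- function(theta_num,phi_num){
  theta0 <- seq(0,2*pi, len = theta_num)
  tmp <- seq(0, 2, len = phi_num)
  phi0 <- acos(1-tmp) # arccos transformation: equal steps in cos(phi)
  # meridians (constant theta): index phi_num + 1 is out of range and
  # yields NA, which breaks the line between meridians
  i <- seq(1, (phi_num + 1)*theta_num) - 1
  theta <- theta0[i %/% (phi_num + 1) + 1]
  phi <- phi0[i %% (phi_num + 1) + 1]
  # parallels (constant phi), again separated by NA
  i <- seq(1, phi_num*(theta_num + 1)) - 1
  theta <- c(theta, theta0[i %% (theta_num + 1) + 1])
  phi <- c(phi, phi0[i %/% (theta_num + 1) + 1])
  results <- data.frame( x = cos(theta)*sin(phi),
                         y = sin(theta)*sin(phi),
                         z = cos(phi))
  lines3d(results)
}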
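To see why the arccos transformation gives an equal-area grid, recall that on a unit sphere the band between colatitudes phi_a and phi_b has area 2*pi*(cos(phi_a) - cos(phi_b)). Equal steps in cos(phi) therefore give bands of equal area. The following base-R check is a sketch of that argument; phi_num, u and band_area are names chosen for this illustration.

```r
phi_num <- 10
u <- seq(0, 2, length.out = phi_num + 1)  # equal steps in 1 - cos(phi)
# Clamp to [-1, 1] to guard against rounding just outside acos()'s domain
phi <- acos(pmax(-1, pmin(1, 1 - u)))     # colatitudes from 0 to pi

# Area of each band between consecutive colatitudes on the unit sphere
band_area <- 2 * pi * (cos(head(phi, -1)) - cos(tail(phi, -1)))

print(band_area)       # all bands have the same area
print(sum(band_area))  # total equals the sphere's surface area, 4 * pi
```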

flexible log10 grid with ggplot2

I am trying to make a flexible log10 grid in ggplot2. The idea is that between 0.1 and 1 the gridline breaks are 0.1 apart, between 1 and 10 they are 1 apart, between 10 and 100 they are 10 apart, etc.
This way the grid lines repeat the same pattern as many times as required based on a variable vector (CAfails) with data i supply. This is what i came up with after a lot of tweaking:
CAfails<-data.frame(c(2.5,5.8,10.7,16.2,23,36.2,45.3,49.5,70.1,80.3,83.6,90))
LOG.as<-c(t((10^((floor(log10(min(CAfails)))-1):ceiling(log10(max(CAfails)))))%o%c(1:10)))
LOG.as<-LOG.as[-10*((floor(log10(min(CAfails))):ceiling(log10(max(CAfails))))+1)]
After which i pass it to ggplot2:
scale_x_log10(limits=c(1,10^(ceiling(log10(max(CAfails))))),breaks=LOG.as)
scale_y_log10(limits=c(0.1,10^(ceiling(log10(max(CAfails))))),breaks=LOG.as)
It works OK, but I was wondering whether there is a simpler way to do this.
Here is a complete example:
CAfails<-data.frame(x=c(2.5,5.8,10.7,16.2,23,36.2,45.3,49.5,70.1,80.3,83.6,90))
LOG.as<-c(t((10^((floor(log10(min(CAfails)))-1):ceiling(log10(max(CAfails)))))%o%c(1:10)))
LOG.as<-LOG.as[-10*((floor(log10(min(CAfails))):ceiling(log10(max(CAfails))))+1)]
# pdf must exist as a data frame before ggplot can use it
pdf <- data.frame(x = 1:nrow(CAfails), y = CAfails$x)
ggplot(data=pdf,aes(x,y)) + geom_point() +
scale_x_log10(limits=c(1,10^(ceiling(log10(max(CAfails))))),breaks=LOG.as) +
scale_y_log10(limits=c(0.1,10^(ceiling(log10(max(CAfails))))),breaks=LOG.as)
Which yields this:
I think this works a bit better:
library(ggplot2)
CAfails<-data.frame(x=c(2.5,5.8,10.7,16.2,23,36.2,45.3,49.5,70.1,80.3,83.6,90))
# pdf must exist as a data frame before ggplot can use it
pdf <- data.frame(x = 1:nrow(CAfails), y = CAfails$x)
genbreaks <- function(x){
# only works on positive vals
minx <- min(x)
maxx <- max(x)
flminx <- floor(log10(minx))
clmaxx <- ceiling(log10(maxx)) # R's function is ceiling(), not ceil()
rv <- c()
xlo <- 10^flminx
for (i in flminx:clmaxx) {
rv <- c(rv,seq(xlo,xlo*10,xlo))
xlo <- xlo*10
}
return(rv)
}
ggplot(data=pdf,aes(x,y)) + geom_point(color="blue") +
scale_x_log10(breaks=genbreaks(pdf$x)) +
scale_y_log10(breaks=genbreaks(pdf$y))
Yielding:
But there could be a standard way of doing it.
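A loop-free variant of the same idea can be written with outer(). This is only a sketch: log_breaks is a hypothetical helper name, and unlike genbreaks above it stops at the first power of ten at or above the maximum instead of adding one more decade.

```r
log_breaks <- function(x) {
  stopifnot(all(x > 0))                      # log10 needs positive values
  lo <- floor(log10(min(x)))
  hi <- max(ceiling(log10(max(x))), lo + 1)  # span at least one decade
  decades <- 10^(lo:(hi - 1))                # lower end of each decade
  # 1..9 times each decade, plus the closing power of ten
  c(as.vector(outer(1:9, decades)), 10^hi)
}

breaks <- log_breaks(c(2.5, 90))
print(breaks)
```

The result can be passed as the breaks argument of scale_x_log10() or scale_y_log10() in the same way as the output of genbreaks.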

Plot a Region in R

I generated a Matrix with 100 random x-y-Coordinates in the [-1,1]^2 Interval:
n <- 100
datam <- matrix(c(rep(1,n), 2*runif(n)-1, 2*runif(n)-1), n)
# leading 1 column needed for computation
# second column has x coordinates, third column has y coordinates
and classified them into 2 classes -1 and 1 by a given target function f (a vector).
I computed a hypothesis function g and now want to visualize how good it matches the
target function f.
f <- c(1.0, 0.5320523, 0.6918301) # the given target function
ylist <- sign(datam %*% f) # classify into -1 and 1
# perceptron algorithm to find g:
perceptron = function(datam, ylist) {
w <- c(1,0,0) # starting vector
made.mistake = TRUE
while (made.mistake) {
made.mistake=FALSE
for (i in 1:n) {
if (ylist[i] != sign(t(w) %*% datam[i,])) {
w <- w + ylist[i]*datam[i,]
made.mistake=TRUE
}
}
}
return(w=w)
}
g <- perceptron(datam, ylist)
I now want to compare f to g in a plot.
I can do this quite easily in Mathematica. Shown here is the data set with the target function f that separates the data into the +1 and -1 parts:
This Mathematica plot shows both f and g in comparison (different data set and f):
This is the corresponding Mathematica code:
ContourPlot[g.{1, x1, x2} == 0, {x1, -1, 1}, {x2, -1, 1}]
How can I do something similar in R (ggplot would be nice)?
Same thing using ggplot. This example follows your code exactly, then adds at the end:
# OP's code...
# ...
glist <- sign(datam %*% g)
library(reshape2) # for melt(...)
library(ggplot2)
df <- data.frame(datam,f=ylist,g=glist) # df has columns: X1, X2, X3, f, g
gg <- melt(df,id.vars=c("X1","X2","X3"),variable.name="model")
# one boundary line per model; current ggplot2 no longer accepts subset=,
# so the lines are matched to the facets through their own data frame
bounds <- data.frame(model=c("f","g"),
                     intercept=c(-f[1]/f[3], -g[1]/g[3]),
                     slope=c(-f[2]/f[3], -g[2]/g[3]))
ggp <- ggplot(gg, aes(x=X2, y=X3, color=factor(value)))
ggp <- ggp + geom_point()
ggp <- ggp + geom_abline(data=bounds, aes(intercept=intercept, slope=slope))
ggp <- ggp + facet_wrap(~model)
ggp <- ggp + scale_color_discrete(name="Mistake")
ggp <- ggp + labs(title=paste0("Comparison of Target (f) and Hypothesis (g) [n=",n,"]"))
ggp <- ggp + theme(plot.title=element_text(face="bold"))
ggp
Below are results for n=200, 500, and 1000. When n=100, g=c(1,0,0). You can see that f and g converge for n~500.
In case you are new to ggplot: first we create a data frame (df) which has the coordinates (X2 and X3) and two columns for the classifications based on f and g. Then we use melt(...) to convert this to a new dataframe, gg, in "long" format. gg has columns X1, X2, X3, model, and value. The column, gg$model identifies the model (f or g). The corresponding classifications are in gg$value. Then the ggplot calls do the following:
Establish the default dataset, gg, the x and y coords, and the coloring [ggplot(...)]
Add the points layer [geom_point(...)]
Add lines separating the classifications [geom_abline(...)]
Tell ggplot to plot the two models in different "facets" [facet_wrap(...)]
Set the legend name.
Set the plot title.
Make the plot title bold.
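The intercept and slope passed to geom_abline follow from the boundary equation w[1] + w[2]*x + w[3]*y = 0, solved for y (this assumes w[3] is not zero). A short base-R check of that step, using the f from the question; the variable names are chosen for this sketch:

```r
f <- c(1.0, 0.5320523, 0.6918301)  # weights from the question
intercept <- -f[1] / f[3]
slope     <- -f[2] / f[3]

x <- c(-1, 0, 1)
y <- intercept + slope * x          # points on the boundary line

# Plugging the points back into the classifier gives (numerically) zero
on_line <- cbind(1, x, y) %*% f
print(on_line)
```

Points above this line are classified as +1 and points below as -1, because f[3] is positive.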
Your example is still not reproducible. Look at my code and you will see that f and g are identical. Also, it seems that you are extrapolating the lines (second part of your question) to data points you don't have. Do you have any evidence that the discrimination should be linear?
#Data generation
n <- 10000
datam <- matrix(c(rep(1,n), 2*runif(n)-1, 2*runif(n)-1), n)
# leading 1 column needed for computation
# second column has x coordinates, third column has y coordinates
datam.df<-data.frame(datam)
datam.df$X1<-NULL
f <- c(1.0, 0.5320523, 0.6918301) # the given target function
f.col <- ifelse(sign(datam %*% f)==1,"darkred", "darkblue")
f.fun<-sign(datam %*% f)
# perceptron algorithm to find g:
perceptron = function(datam, ylist) {
w <- c(1,0,0) # starting vector
made.mistake = TRUE
while (made.mistake) {
made.mistake=FALSE
for (i in 1:n) {
if (ylist[i] != sign(t(w) %*% datam[i,])) {
w <- w + ylist[i]*datam[i,]
made.mistake=TRUE
}
}
}
return(w=w)
}
g <- perceptron(datam, f.fun)
g.fun<-sign(datam %*% g)
Plotting the overall data
plot(datam.df$X2, datam.df$X3, col=f.col, pch=".", cex=2)
I will produce separate plots for the g and f functions, since something is not working in your example and f and g are identical. Once you sort this out you can put everything in one plot. You can also choose whether you want shading or not. If you have no evidence that the classification is linear, it is probably wiser to use chull() to mark the data you have.
For the f function
plot(datam.df$X2, datam.df$X3, col=f.col, pch=".", xlim=c(-1,-0.5), ylim=c(-1,-.5), cex=3, main="f function")
datam.df.f<-datam.df[f.fun==1,]
ch.f<-chull(datam.df.f$X2, datam.df.f$X3 )
ch.f <- rbind(x = datam.df.f[ch.f, ], datam.df.f[ch.f[1], ])
polygon(ch.f, lwd=3, col=rgb(0,0,180,alpha=50, maxColorValue=255))
For the g function
g.col <- ifelse(sign(datam %*% g)==1,"darkred", "darkblue")
plot(datam.df$X2, datam.df$X3, col=g.col, pch=".", xlim=c(-1,-0.5), ylim=c(-1,-.5), cex=3, main="g function")
datam.df.g<-datam.df[g.fun==1,]
ch.g<-chull(datam.df.g$X2, datam.df.g$X3 )
ch.g <- rbind(x = datam.df.g[ch.g, ], datam.df.g[ch.g[1], ])
polygon(ch.g, col=rgb(0,0,180,alpha=50, maxColorValue=255), lty=3, lwd=3)
The ch.f and ch.g objects hold the coordinates of the "bag" around your points. You can extract the points to describe your line.
ch.f
lm.f<-lm(c(ch.f$X3[ ch.f$X2> -0.99 & ch.f$X2< -0.65 & ch.f$X3<0 ])~c(ch.f$X2[ ch.f$X2>-0.99 & ch.f$X2< -0.65 & ch.f$X3<0]))
curve(lm.f$coefficients[1]+x*lm.f$coefficients[2], from=-1., to=-0.59, lwd=5, add=T)
lm.g<-lm(c(ch.g$X3[ ch.g$X2> -0.99 & ch.g$X2< -0.65 & ch.g$X3<0 ])~c(ch.g$X2[ ch.g$X2>-0.99 & ch.g$X2< -0.65 & ch.g$X3<0]))
curve(lm.g$coefficients[1]+x*lm.g$coefficients[2], from=-1., to=-0.59, lwd=5, add=T, lty=3)
And you get
Unfortunately, because the f and g functions are the same in your example, you cannot see the different lines in the above picture.
You can use the col argument in plot() to indicate classification of the f() function. And you can use polygon() to shade the classification area of your g() function. If you give us a reproducible example we could answer with specific code. It would result in a figure similar to that of Mathematica you present.
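For readers unfamiliar with chull(), which the answer above uses to build ch.f and ch.g: it returns the row indices of the points on the convex hull, and the first vertex has to be repeated to close the outline. A minimal sketch on made-up points (the data frame pts is hypothetical):

```r
# Unit square plus its centre; only the four corners lie on the hull
pts <- data.frame(X2 = c(0, 1, 1, 0, 0.5),
                  X3 = c(0, 0, 1, 1, 0.5))

hull_idx <- chull(pts$X2, pts$X3)         # row indices of the hull vertices
hull <- pts[c(hull_idx, hull_idx[1]), ]   # repeat first vertex to close the ring

print(hull)
```

On an existing base plot, polygon(hull) would then shade the hull in the same way as in the answer.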
