I have x,y,z data with categorical variables that facilitate a facet. I want to include contour lines from all but the first facet and discard the rest of the data. One way to visualize the process is to facet the data and mentally move the contours from the other facets to the first.
MWE:
library(ggplot2)
library(dplyr)
data(volcano)
nx <- 87; ny <- 61
vdat <- tibble(w=0L, x=rep(seq_len(nx), ny), y=rep(seq_len(ny), each=nx), z=c(volcano))  # tibble() replaces the deprecated data_frame()
vdat <- bind_rows(vdat,
mutate(vdat, w=1L, x=x+4, y=y+4, z=z-20),
mutate(vdat, w=2L, x=x+8, y=y+8, z=z-40))
ggplot(vdat, aes(x, y, fill=z)) +
geom_tile() +
facet_wrap(~ w, nrow=1) +
geom_contour(aes(z=z), color='white', breaks=c(-Inf,110,Inf))
In each facet, I have:
facet 0: X,Y,Z for w==0L, contour for w==0L
facet 1: X,Y,Z for w==1L, contour for w==1L
facet 2: X,Y,Z for w==2L, contour for w==2L
What I'd like to have is a single pane, effectively:
X,Y,Z for w==0L, contour for all values of the w categorical
(Forgive my hasty GIMP skills. In the real data, the contours will likely not overlap, but I don't think that that would be a problem.)
The real data has different values (and gradients) of z for the same X,Y system, so the contour is otherwise compatible with the first facet. However, it's still "different", so I cannot mock-up the contours with the single w==0L data.
I imagine there might be a few ways to do this:
form the data "right" the first time, informing ggplot how to pull the contours but lay them on the single plot (e.g., using different data= for certain layers);
form the faceted plot, extract the contours from the other facets, apply them to the first, and discard the other facets (perhaps using grid and/or gtable); or perhaps
(mathematically calculate the contours myself and add them as independent lines; I was hoping to re-use ggplot2's efforts to avoid this ...).
It doesn't fit so neatly with the grammar of graphics, but you can just add a geom_contour call for each subset of data. A quick way is to add a list of such calls to the graph, which you can generate quickly by lapplying across the split data:
ggplot(vdat[vdat$w == 0, ], aes(x, y, z = z, fill = z)) +
geom_tile() +
lapply(split(vdat, vdat$w), function(dat){
geom_contour(data = dat, color = 'white', breaks = c(-Inf, 110, Inf))
})
You can even make a legend, if you need:
ggplot(vdat[vdat$w == 0, ], aes(x, y, z = z, fill = z, color = factor(w))) +
geom_raster() +
lapply(split(vdat, vdat$w), function(dat){
geom_contour(data = dat, breaks = c(-Inf, 110, Inf))
})
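As a possible alternative to the list of layers, here is a minimal sketch that uses a single geom_contour layer on the full data with a group aesthetic, while the tile layer sees only the w == 0 subset. It assumes contours are computed separately per group, and it uses a single break at 110 instead of c(-Inf, 110, Inf); the name v0 is introduced here for illustration only.

```r
library(ggplot2)

# Rebuild the example data with base R only (v0 is an illustrative name)
nx <- nrow(volcano); ny <- ncol(volcano)   # 87 x 61
v0 <- data.frame(w = 0L,
                 x = rep(seq_len(nx), ny),
                 y = rep(seq_len(ny), each = nx),
                 z = c(volcano))
vdat <- rbind(v0,
              transform(v0, w = 1L, x = x + 4, y = y + 4, z = z - 20),
              transform(v0, w = 2L, x = x + 8, y = y + 8, z = z - 40))

# Tiles come from the w == 0 subset only; contours come from every w,
# kept apart by the group aesthetic so each subset gets its own line.
p <- ggplot(mapping = aes(x, y)) +
  geom_tile(data = vdat[vdat$w == 0L, ], aes(fill = z)) +
  geom_contour(data = vdat, aes(z = z, group = w),
               colour = "white", breaks = 110)
p
```

Mapping colour = factor(w) inside the geom_contour aes should give the same legend as the second example above.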
Related
I'm wondering whether I can manipulate stat_density2d to show the density for the x values without considering the y values.
To illustrate:
df <- data.frame(x = c(1:40, rep(1:20, 3), 15:40))
ggplot(df, aes(x=x, y = x)) +
stat_density2d(aes(fill='red',alpha=after_stat(level)),geom='polygon', show.legend = FALSE) +
geom_point(alpha = 0.3)
Obviously it doesn't really make sense to plot the same values against each other; however, I'm interested in the density of the plots at a certain value. Therefore I would like to keep y constant (e.g. y = 1) but still show the same density, like so:
(In my publication I actually have multiple groups, making this a nice way to plot the group separation even though it is 1D)
I'm quite new to ggplot, but I like the systematic way you build your plots. Still, I'm struggling to achieve the desired results. I can replicate plots where you have categorical data. However, for my use I often need to fit a model to certain observations and then highlight them in a combined plot. With the usual plot function I would do:
library(splines)
set.seed(10)
x <- seq(-1,1,0.01)
y <- x^2
s <- interpSpline(x,y)
y <- y+rnorm(length(y),mean=0,sd=0.1)
plot(x,predict(s,x)$y,type="l",col="black",xlab="x",ylab="y")
points(x,y,col="red",pch=4)
points(0,0,col="blue",pch=1)
legend("top",legend=c("True Values","Model values","Special Value"),text.col=c("red","black","blue"),lty=c(NA,1,NA),pch=c(4,NA,1),col=c("red","black","blue"),cex = 0.7)
My biggest problem is how to build the data frame for ggplot so that it then draws the legend automatically. In this example, how would I translate this into ggplot to get a similar plot? Or is ggplot not made for this kind of plot?
Note this is just a toy example. Usually the model values are derived from a more complex model, just in case you want to use a stat in ggplot.
The key part here is that you can map colors in aes by giving a string, which will produce a legend. In this case, there is no need to include the special value in the data.frame.
df <- data.frame(x = x, y = y, fit = predict(s, x)$y)
ggplot(df, aes(x, y)) +
geom_line(aes(y = fit, col = 'Model values')) +
geom_point(aes(col = 'True values')) +
geom_point(aes(col = 'Special value'), x = 0, y = 0) +
scale_color_manual(values = c('True values' = "red",
'Special value' = "blue",
'Model values' = "black"))
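With the approach above, every legend key shows both a line and a point glyph. A minimal sketch of one way to bring the legend closer to the base-graphics version, assuming guide_legend(override.aes = ...) is acceptable here: the special value is given its own one-row data frame (an assumption, not part of the answer above), and the overrides are listed in the alphabetical order in which the keys appear.

```r
library(ggplot2)
library(splines)

# Same toy data as in the question
set.seed(10)
x <- seq(-1, 1, 0.01)
s <- interpSpline(x, x^2)
df <- data.frame(x = x,
                 y = x^2 + rnorm(length(x), mean = 0, sd = 0.1),
                 fit = predict(s, x)$y)

p <- ggplot(df, aes(x, y)) +
  geom_line(aes(y = fit, colour = "Model values")) +
  geom_point(aes(colour = "True values"), shape = 4) +
  # one-row data frame so the special point is drawn once
  geom_point(data = data.frame(x = 0, y = 0),
             aes(colour = "Special value"), shape = 1) +
  scale_colour_manual(name = NULL,
                      values = c("True values" = "red",
                                 "Special value" = "blue",
                                 "Model values" = "black")) +
  # Keys are ordered alphabetically: Model values, Special value, True values
  guides(colour = guide_legend(override.aes = list(
    linetype = c("solid", "blank", "blank"),
    shape = c(NA, 1, 4))))
p
```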
I'm trying to create a density curve in R using a set of 1000 random numbers, and shade the part that is less than or equal to a certain value. There are a lot of solutions out there involving geom_area or geom_ribbon, but they all require a yval, which I don't have (it's just a vector of 1000 numbers). Any ideas on how I could do this?
Two other related questions:
Is it possible to do the same thing for a cumulative density function (I'm currently using stat_ecdf to generate one), or shade it at all?
Is there any way to edit geom_vline so it will only go up to the height of the density curve, rather than the whole y axis?
Code: (the geom_area is a failed attempt to edit some code I found. If I set ymax manually, I just get a column taking up the whole plot, instead of just the area under the curve)
set.seed(100)
amount_spent <- rnorm(1000,500,150)
amount_spent1<- data.frame(amount_spent)
rand1 <- runif(1,0,1000)
amount_spent1$pdf <- dnorm(amount_spent1$amount_spent)
mean1 <- mean(amount_spent1$amount_spent)
#density/bell curve
ggplot(amount_spent1,aes(amount_spent)) +
geom_density( size=1.05, color="gray64", alpha=.5, fill="gray77") +
geom_vline(xintercept=mean1, alpha=.7, linetype="dashed", size=1.1, color="cadetblue4")+
geom_vline(xintercept=rand1, alpha=.7, linetype="dashed",size=1.1, color="red3")+
geom_area(mapping=aes(ifelse(amount_spent1$amount_spent > rand1,amount_spent1$amount_spent,0)), ymin=0, ymax=.03,fill="red",alpha=.3)+
ylab("")+
xlab("Amount spent on lobbying (in Millions USD)")+
scale_x_continuous(breaks=seq(0,1000,100))
There are a couple of questions that show this ... here and here, but they calculate the density prior to plotting.
This is another way, more complicated than required I'm sure, that lets ggplot do some of the calculations for you.
# Your data
set.seed(100)
amount_spent1 <- data.frame(amount_spent=rnorm(1000, 500, 150))
mean1 <- mean(amount_spent1$amount_spent)
rand1 <- runif(1,0,1000)
# Basic density plot
p <- ggplot(amount_spent1, aes(amount_spent)) +
geom_density(fill="grey") +
geom_vline(xintercept=mean1)
You can extract the x and y positions for the area to shade from the plot object using ggplot_build. Linear interpolation was used to get the y value at x=rand1
# subset region and plot
d <- ggplot_build(p)$data[[1]]
p <- p + geom_area(data = subset(d, x > rand1), aes(x=x, y=y), fill="red") +
geom_segment(x=rand1, xend=rand1,
y=0, yend=approx(x = d$x, y = d$y, xout = rand1)$y,
colour="blue", size=3)
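For comparison, here is a rough sketch of the other route mentioned above, computing the density before plotting with stats::density(). It shades the region at or below the cut-off, as the question asks, and stops the vertical line at the curve. The cut-off of 400 and the names dens, dd and y_cut are illustrative assumptions.

```r
library(ggplot2)

set.seed(100)
amount_spent <- rnorm(1000, 500, 150)
cutoff <- 400   # hypothetical threshold, stands in for rand1

# Kernel density estimate computed up front
dens <- density(amount_spent)
dd <- data.frame(x = dens$x, y = dens$y)

# Height of the curve at the cut-off, by linear interpolation
y_cut <- approx(dd$x, dd$y, xout = cutoff)$y

p <- ggplot(dd, aes(x, y)) +
  geom_line() +
  # shade only the part of the curve at or below the cut-off
  geom_area(data = dd[dd$x <= cutoff, ], fill = "red", alpha = 0.3) +
  # vertical segment that ends at the density curve
  annotate("segment", x = cutoff, xend = cutoff, y = 0, yend = y_cut,
           linetype = "dashed")
p
```

Note that density() and geom_density() may use slightly different defaults (for example the evaluation grid), so the two curves need not match exactly.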
I have some data I want to graph on a semi-log scale, however I get some artifacts when there is a large jump between points. On linear scale, a straight line is drawn between subsequent points, which is a fine approximation for visualization. However, the exact same thing is done when using the log scale (either by using scale_x_log10 or scale_x_continuous with a log transformation). A line between two points on the semi-log scale should show up curved. In other words, this:
df <- data.frame(x = c(0, 1), y = c(0, 1))
ggplot(data = df, aes(x, y)) + geom_line() + scale_x_log10(limits = c(10^-3, 10^0))
produces this:
when I would expect something more like this:
generated by this code:
df <- data.frame(x = seq(0, 1, 0.01), y = seq(0, 1, 0.01))
ggplot(data = df, aes(x, y)) + geom_line() + scale_x_log10(limits = c(10^-3, 10^0))
It's clear what's happening, but I'm not sure what the best way to fix the interpolation is. In the actual data I'm plotting there are a few jumps at various points, which makes the plots very misleading when trying to compare two lines. (They're ROC curves in this instance.)
One thought is I can search the data for jumps and fill in some interpolated points myself, but I'm hoping for a cleaner way that doesn't involve me adding in a bunch of fake data points.
What you describe is a transformation of the coordinate system, not a transformation of the scales. The distinction is that scale transformations take place before any statistical transformations, and coordinate transformations take place afterward. In this case, the "statistical transformation" is "draw a straight line between the points". With a transformed scale, the line is straight in the transformed (log) space; with a transformed coordinate, it is straight in the original (linear) space and therefore curved in log space.
# don't include 0 in the data because log 0 is -Inf
DF <- data.frame(x = c(0.1, 1), y = c(0.1, 1))
ggplot(data = DF, aes(x = x, y = y)) +
geom_line() +
coord_trans(x="log10")
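If a transformed coordinate system is not an option, the manual route mentioned in the question can be kept short with approx(). A minimal sketch, assuming linear interpolation between the observed points is what should be shown; the start value of 0.001 and the choice of 200 points are arbitrary.

```r
library(ggplot2)

# Two observed points; 0 is avoided because log10(0) is -Inf
df <- data.frame(x = c(0.001, 1), y = c(0.001, 1))

# Densify in the original (linear) space, then plot on a log scale
dense <- as.data.frame(approx(df$x, df$y, n = 200))

p <- ggplot(dense, aes(x, y)) +
  geom_line() +
  scale_x_log10()
p
```

The interpolated points are evenly spaced in linear space, so the left end of the log axis is only sparsely covered; a finer grid there may be needed for a smooth curve.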
I am trying to draw a curved line in ggplot2 which should look like this:
However, in ggplot2 I can only draw in the line in the following way:
Here is the code that I have used to create both pictures:
df1 <- data.frame(dollar = c(0,5,10,20,30), value = c(0,200,300, -100, -300))
# draw line graph with base plot
plot(y = df1$dollar, x = df1$value, type = "l")
# draw line graph with ggplot
ggplot() + geom_line(data = df1, aes(y = dollar, x = value), size =1)
ggplot2 seems to sort the data frame by x value and then connect the points in that order. However, I do not want my graph to be ordered.
Additionally, I do not want to flip the axis around, since dollar value must appear on the y-axis. Since I prefer to create these graphs in ggplot2, does anyone know how to accomplish this?
You just need to swap geom_line for geom_path. As noted in the documentation, geom_path connects "observations in original order", while geom_line connects "observations, ordered by x value".
So the last line would be
ggplot() + geom_path(data = df1, aes(y = dollar, x = value), size =1)
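The difference can be checked without looking at the picture. A small sketch, assuming layer_data() returns the data in the order each geom will draw it:

```r
library(ggplot2)

df1 <- data.frame(dollar = c(0, 5, 10, 20, 30),
                  value = c(0, 200, 300, -100, -300))

# x positions in drawing order for each geom
line_x <- layer_data(ggplot(df1, aes(x = value, y = dollar)) + geom_line())$x
path_x <- layer_data(ggplot(df1, aes(x = value, y = dollar)) + geom_path())$x

line_x  # expected to be sorted by x
path_x  # expected to keep the row order of df1
```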