I'm quite new to ggplot, but I like the systematic way you build your plots. Still, I'm struggling to achieve the results I want. I can replicate plots where you have categorical data. However, for my use I often need to fit a model to certain observations and then highlight them in a combined plot. With the usual plot function I would do:
library(splines)
set.seed(10)
x <- seq(-1,1,0.01)
y <- x^2
s <- interpSpline(x,y)
y <- y+rnorm(length(y),mean=0,sd=0.1)
plot(x,predict(s,x)$y,type="l",col="black",xlab="x",ylab="y")
points(x,y,col="red",pch=4)
points(0,0,col="blue",pch=1)
legend("top",legend=c("True Values","Model values","Special Value"),text.col=c("red","black","blue"),lty=c(NA,1,NA),pch=c(4,NA,1),col=c("red","black","blue"),cex = 0.7)
My biggest problem is how to build the data frame for ggplot which automatically then draws the legend? In this example, how would I translate this into ggplot to get a similar plot? Or is ggplot not made for this kind of plots?
Note this is just a toy example. Usually the model values are derived from a more complex model, just in case you wanted to use a stat in ggplot.
The key part here is that you can map colors in aes by giving a string, which will produce a legend. In this case, there is no need to include the special value in the data.frame.
library(ggplot2)
df <- data.frame(x = x, y = y, fit = predict(s, x)$y)
ggplot(df, aes(x, y)) +
geom_line(aes(y = fit, col = 'Model values')) +
geom_point(aes(col = 'True values')) +
geom_point(aes(col = 'Special value'), x = 0, y = 0) +
scale_color_manual(values = c('True values' = "red",
'Special value' = "blue",
'Model values' = "black"))
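If it helps, here is a slightly extended sketch of the same idea (one way to do it, not the only one). It assumes you are happy to put the special value in its own one-row data frame, here called special (my name, not from the question), so that a single point is drawn at (0, 0) instead of one per row of df. The override.aes part is an optional tweak so that each legend key shows either a line or a symbol.

```r
library(ggplot2)
library(splines)

# Same toy data as in the question
set.seed(10)
x <- seq(-1, 1, 0.01)
y <- x^2
s <- interpSpline(x, y)
y <- y + rnorm(length(y), mean = 0, sd = 0.1)

df <- data.frame(x = x, y = y, fit = predict(s, x)$y)
special <- data.frame(x = 0, y = 0)  # hypothetical helper: one row, one point

p <- ggplot(df, aes(x, y)) +
  geom_line(aes(y = fit, colour = "Model values")) +
  geom_point(aes(colour = "True values"), shape = 4) +
  geom_point(data = special, aes(colour = "Special value"), shape = 1) +
  scale_colour_manual(
    name = NULL,
    values = c("True values" = "red",
               "Special value" = "blue",
               "Model values" = "black")
  ) +
  # Legend entries are sorted alphabetically:
  # Model values, Special value, True values
  guides(colour = guide_legend(
    override.aes = list(linetype = c("solid", "blank", "blank"),
                        shape = c(NA, 1, 4))
  ))
p
```

Each constant string inside aes() becomes one legend entry, so you do not need to reshape the data just to get the legend.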
I'm generating violin plots in ggplot2 for a time series, year_1 to year_32. The years in my df are stored as numerical values. From the examples I've seen, it seems that I must convert these numerical year values to factors to plot one violin per year; and in fact, if I run the code without as.factor, I get one big fat violin. I would like to understand why geom_violin can't have numeric values on the x axis; or, if I'm wrong about that, how to use them.
So:
my_data$year <- as.factor(my_data$year)
p <- ggplot(data = my_data, aes(x = year, y = continuous_var)+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label")
p +my_theme()
works fine, but if I skip
my_data$year <- as.factor(my_data$year)
it doesn't work, I get one big fat violin for all years. Why?
TIA
You are missing a closing ) at the end of this line: p <- ggplot(data = my_data, aes(x = year, y = continuous_var)
I have constructed a reproducible example with the ToothGrowth dataset:
This should work now:
library(ggplot2)
my_data <- ToothGrowth
my_data$dose <- as.factor(my_data$dose)
p <- ggplot(data = my_data, aes(x = dose, y = len))+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label") +
theme_bw()
p
PS: this discussion would better fit Cross Validated, as it's more of a statistics question than a coding question.
I'm not 100% sure, but here's my explanation: the violin plot shows the density for a set of data, and you can divide your data into groups so that you get one violin for each part of your data. But if the variable you're using to divide the groups (the x axis) is continuous, there are infinitely many possible groupings (one group for the values at 0, one for 0.1, one for 0.01, etc.), so in the end ggplot has no way to divide your data, and it probably ignores the x variable for grouping and makes one violin for all your data.
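To illustrate the grouping point: if you want to keep year numeric on the x axis, you can tell ggplot how to split the data yourself with the group aesthetic. This is a minimal sketch on made-up data (the data frame and its values are my own stand-in, only the column names come from the question):

```r
library(ggplot2)

# Hypothetical stand-in for my_data: 5 numeric years, 50 observations each
set.seed(1)
my_data <- data.frame(
  year = rep(1:5, each = 50),
  continuous_var = rnorm(250, mean = 250, sd = 50)
)

# year stays numeric; group = year gives one violin per distinct year
p <- ggplot(my_data, aes(x = year, y = continuous_var, group = year)) +
  geom_violin(fill = "#FF0000", color = "#000000") +
  labs(x = "x_label", y = "y_label")
p
```

With group set, the x axis remains continuous, so the usual continuous scale options (breaks, limits) still apply to it.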
I'm struggling with facet_wrap in R. It should be simple however the facet variable is not being picked up? Here is what I'm running:
plot = ggplot(data = item.household.descr.count, mapping = aes(x=item.household.descr.count$freq, y = item.household.descr.count$descr, color = item.household.descr.count$age.cat)) + geom_point()
plot = plot + facet_wrap(~ age.cat, ncol = 2)
plot
I colored the faceting variable to try to help illustrate what is going on. The plot should have only one color in each facet instead of what you see here. Does anyone know what is going on?
This problem is caused by the fact that you are using $ and the data frame name to refer to your variables inside aes(). With ggplot() you should use only the variable names in aes(), as the data frame is already named in data=.
plot = ggplot(data = item.household.descr.count,
mapping = aes(x=freq, y = descr, color = age.cat)) + geom_point()
plot = plot + facet_wrap(~ age.cat, ncol = 2)
plot
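Here is an example using the diamonds dataset. The first call uses $ inside aes() and shows the problem; the second uses bare variable names and facets correctly.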
diamonds2<-diamonds[sample(nrow(diamonds),1000),]
ggplot(diamonds2,aes(diamonds2$carat,diamonds2$price,color=diamonds2$color))+geom_point()+
facet_wrap(~color)
ggplot(diamonds2,aes(carat,price,color=color))+geom_point()+
facet_wrap(~color)
I want to use geom_dotplot to distinguish two different variables by shape of the dots (rather than colours as the documentation suggests). For example:
library(ggplot2)
set.seed(1)
x = rnorm(20)
y = rnorm(20)
df = data.frame(x,y)
ggplot(data = df) +
geom_dotplot(aes(x = x), fill = "red") +
geom_dotplot(aes(x=y), fill = "blue")
i.e. to distinguish the x and y in the below example
I want to set all the x to be dots, and y to be triangles.
Is this possible?
Thanks!
You could probably hack together something similar to what you want using the information from geom_dotplot plus base R's stripchart function.
#Save the dot plot in an object.
dotplot <- ggplot(data = df) +
geom_dotplot(aes(x = x), fill = "red") +
geom_dotplot(aes(x=y), fill = "blue")
#Use ggplot_build to save information including the x values.
dotplot_ggbuild <- ggplot_build(dotplot)
main_info_from_ggbuild_x <- dotplot_ggbuild$data[[1]]
main_info_from_ggbuild_y <- dotplot_ggbuild$data[[2]]
#Include only the first occurrence of each x value.
main_info_from_ggbuild_x <-
main_info_from_ggbuild_x[which(duplicated(main_info_from_ggbuild_x$x) == FALSE),]
main_info_from_ggbuild_y <-
main_info_from_ggbuild_y[which(duplicated(main_info_from_ggbuild_y$x) == FALSE),]
#To demonstrate, let's first roughly reproduce the original plot.
stripchart(rep(main_info_from_ggbuild_x$x,
times=main_info_from_ggbuild_x$count),
pch=19,cex=2,method="stack",at=0,col="red")
stripchart(rep(main_info_from_ggbuild_y$x,
times=main_info_from_ggbuild_y$count),
pch=19,cex=2,method="stack",at=0,col="blue",add=TRUE)
#Now, redo using what we actually want.
#You didn't specify if you want the circles and triangles filled or not.
#If you want them filled in, just change the pch values.
stripchart(rep(main_info_from_ggbuild_x$x,
times=main_info_from_ggbuild_x$count),
pch=21,cex=2,method="stack",at=0)
stripchart(rep(main_info_from_ggbuild_y$x,
times=main_info_from_ggbuild_y$count),
pch=24,cex=2,method="stack",at=0,add=TRUE)
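If you would rather stay inside ggplot2, another option is to stack ordinary points yourself, since geom_point does support shape. This is only a rough sketch of that idea, not what geom_dotplot does internally: the bin width of 0.25 and the helper columns bin and stack are my own choices, and the stacking is a simple "count within bin" rule.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(x = rnorm(20), y = rnorm(20))

# Long format: one row per dot, with a column saying which variable it is
long <- data.frame(
  value = c(df$x, df$y),
  variable = rep(c("x", "y"), each = nrow(df))
)

binwidth <- 0.25  # assumed bin width, adjust to taste
long$bin <- round(long$value / binwidth) * binwidth

# Within each bin, number the dots 1, 2, 3, ... to get a stacking height
long <- long[order(long$bin, long$variable), ]
long$stack <- ave(long$bin, long$bin, FUN = seq_along)

p <- ggplot(long, aes(x = bin, y = stack, shape = variable)) +
  geom_point(size = 3) +
  scale_shape_manual(values = c(x = 16, y = 17)) +  # 16 = dot, 17 = triangle
  labs(x = "value", y = "count")
p
```

Because shape is mapped in aes(), you also get a legend for free.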
I have a filled ggplot contour plot depicting continuous R-squared values on a 100x100 grid. By default the legend shows the values as a continuous gradient, which follows from the continuous nature of the data.
I, however, would like to classify and visualize the data in classes, each covering a range of 0.05 R-squared values (i.e. class 1 = 0.00-0.05, class 2 = 0.05-0.10, etc.). I have tried several commands such as scale_fill_brewer and scale_fill_gradient2. The latter does in fact generate some sort of discrete classes, but the class labels depict the break values rather than the class range. Scale_fill_brewer returns the error that the continuous data is forced on to a discrete scale, which makes sense, although I can't see how to work around it.
To make matters more complex, I prefer to make use of a diversified color palette to allow identification of specific classes more easily. Besides, I have a multitude of different contour plots with different maximum R-squared values. So ideally the code is generic and can be easily used for the other plots as well.
Thus far, this is the code I have:
library(scales)
library(ggplot2)
p1 <- ggplot(res, aes(x=Var1, y=Var2, fill=R2)) +
geom_tile()
p1 +
theme(axis.text.x=element_text(angle=+90)) +
geom_vline(xintercept=c(seq(from = 1, to = 101, by = 5)),color="#8C8C8C") +
geom_hline(yintercept=c(seq(from = 1, to = 101, by = 5)),color="#8C8C8C") +
labs(list(title = "Contour plot of R^2 values for all possible correlations between Simple Ratio indices & Nitrogen Content", x = "Wavelength 1 (nm)", y = "Wavelength 2 (nm)")) +
scale_x_discrete(breaks = c("b450","b475","b500","b525","b550","b575","b600","b625","b650","b675","b700","b725","b750","b775","b800","b825","b850","b875","b900","b925","b950")) +
scale_y_discrete(breaks = c("b450","b475","b500","b525","b550","b575","b600","b625","b650","b675","b700","b725","b750","b775","b800","b825","b850","b875","b900","b925","b950")) +
scale_fill_continuous(breaks = c(seq(from = 0, to = 0.7, by = 0.05)), low = "black", high = "green")
The output at this point looks like this:
You might want to use scale_fill_gradientn(). This is untested, since you did not provide a reproducible example.
library(scales)
library(ggplot2)
ggplot(res, aes(x=Var1, y=Var2, fill=R2)) +
geom_tile() +
scale_fill_gradientn(
colours = terrain.colors(15),
breaks = seq(from = 0, to = 0.7, by = 0.05)
)
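If you want true classes with range labels rather than a gradient, one possible approach is to bin R2 with cut() before plotting and use a discrete fill scale. The sketch below is also untested against your data: res here is a made-up stand-in with the same column names, R2_class is a name I introduced, and rainbow() is just one choice of a varied palette.

```r
library(ggplot2)

# Hypothetical stand-in for `res`: a small grid with random R2 values
set.seed(42)
bands <- paste0("b", seq(450, 950, by = 25))
res <- expand.grid(Var1 = bands, Var2 = bands)
res$R2 <- runif(nrow(res), min = 0, max = 0.7)

# Class width of 0.05; the upper limit adapts to each plot's maximum R2
class_width <- 0.05
n_classes <- ceiling(max(res$R2) / class_width)
class_breaks <- (0:n_classes) * class_width

# cut() turns the continuous values into a factor labelled by class range
res$R2_class <- cut(res$R2, breaks = class_breaks, include.lowest = TRUE)

p <- ggplot(res, aes(x = Var1, y = Var2, fill = R2_class)) +
  geom_tile() +
  scale_fill_manual(
    name = "R2 class",
    values = rainbow(nlevels(res$R2_class)),
    drop = FALSE  # keep empty classes in the legend
  ) +
  theme(axis.text.x = element_text(angle = 90))
p
```

Since the breaks are computed from max(res$R2), the same code should carry over to plots with a different maximum, as long as the values are non-negative.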
I have a question that is probably similar to Fitting a density curve to a histogram in R. Using qplot I have created 7 histograms with this command:
qplot(V1, data=data, binwidth=10, facets=V2~.)
For each slice, I would like to add a fitted Gaussian curve. When I try to use the lines() method, I get this error:
Error in plot.xy(xy.coords(x, y), type = type, ...) :
plot.new has not been called yet
What is the command to do it correctly?
Have you tried stat_function?
+ stat_function(fun = dnorm)
You'll probably want to plot the histograms using aes(y = ..density..) in order to plot the density values rather than the counts.
A lot of useful information can be found in this question, including some advice on plotting different normal curves on different facets.
Here are some examples:
dat <- data.frame(x = c(rnorm(100),rnorm(100,2,0.5)),
a = rep(letters[1:2],each = 100))
Overlay a single normal density on each facet:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = ..density..)) +
stat_function(fun = dnorm, colour = "red")
From the question I linked to, create a separate data frame with the different normal curves:
library(plyr)  # provides ddply()
grid <- with(dat, seq(min(x), max(x), length = 100))
normaldens <- ddply(dat, "a", function(df) {
data.frame(
predicted = grid,
density = dnorm(grid, mean(df$x), sd(df$x))
)
})
And plot them separately using geom_line:
ggplot(data = dat,aes(x = x)) +
facet_wrap(~a) +
geom_histogram(aes(y = ..density..)) +
geom_line(data = normaldens, aes(x = predicted, y = density), colour = "red")
ggplot2 uses a different graphics paradigm than base graphics, which is why lines() fails here. Although you can use grid graphics with it, the best way is to add a new stat_function layer to the plot. The ggplot2 code is the following.
Note that I couldn't get this to work using qplot, but the transition to ggplot is reasonably straightforward; the most important difference is that your data must be in data.frame format.
Also note the explicit mapping of the y aesthetic, aes(y=..density..). This is slightly unusual, but it puts the histogram on the density scale, so that the bars and the stat_function curve share the same y axis:
library(ggplot2)
data <- data.frame(V1 = rnorm(700), V2 = sample(LETTERS[1:7], 700, replace = TRUE))
ggplot(data, aes(x=V1)) +
stat_bin(aes(y=..density..)) +
stat_function(fun=dnorm) +
facet_grid(V2~.)
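As a follow-up sketch: stat_function(fun = dnorm) draws the same standard normal in every facet. If you want a curve fitted per facet, one option is to compute the densities yourself with base R and add them with geom_line. The names grid and normaldens are just illustrative. I also use after_stat(density) here, which newer ggplot2 versions (3.3.0 and later) offer as the replacement for the ..density.. notation.

```r
library(ggplot2)

set.seed(1)
data <- data.frame(V1 = rnorm(700),
                   V2 = sample(LETTERS[1:7], 700, replace = TRUE))

# Common x grid on which each facet's fitted normal density is evaluated
grid <- seq(min(data$V1), max(data$V1), length.out = 100)

# One fitted normal curve per level of V2 (mean and sd estimated per group)
normaldens <- do.call(rbind, lapply(split(data, data$V2), function(d) {
  data.frame(V2 = d$V2[1],
             V1 = grid,
             density = dnorm(grid, mean = mean(d$V1), sd = sd(d$V1)))
}))

p <- ggplot(data, aes(x = V1)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30) +
  geom_line(data = normaldens, aes(y = density), colour = "red") +
  facet_grid(V2 ~ .)
p
```

Because normaldens carries the V2 column, facet_grid sends each curve to the matching panel.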