I'm struggling to plot a quadratic fit in ggplot: where x-values overlap, the line jumps back and forth between the upper and lower side of the curve.
Doing the same with base plot works, which makes me think I'm overlooking something (possibly really stupid) in ggplot. Could anybody point me towards how to get a proper line in ggplot?
Unfortunately I don't know how to reproduce the exact problem, so I'll just add code for a similarly shaped "curve":
library(ggplot2)
x1 <- log(c(1:100, 99:1))
y1 <- log(seq(0.22, 0.2, length.out = 199))
dat <- data.frame(x = x1, y = y1)
ggplot(data = dat, aes(x = x, y = y)) + geom_line()
plot(y1 ~ x1, type = "l")
Thanks a lot in advance!
Try geom_path() instead.
library(ggplot2)
x1 <- log(c(1:100, 99:1))
y1 <- log(seq(0.22, 0.2, length.out = 199))
dat <- data.frame(x = x1, y = y1)
ggplot(data = dat, aes(x = x, y = y)) + geom_path()
plot(y1 ~ x1, type = "l")
geom_path() connects the observations in the order in which they appear in the data. geom_line() connects them in order of the variable on the x axis.
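To see why the two geoms differ on this data, the row order each one draws can be compared in base R. This is a rough sketch, assuming that geom_line() behaves like a stable sort of the rows by x before connecting them; the helper count_direction_changes() is hypothetical and written only for this illustration:

```r
# Data from the question: x goes up and then back down, y decreases steadily
x1 <- log(c(1:100, 99:1))
y1 <- log(seq(0.22, 0.2, length.out = 199))
dat <- data.frame(x = x1, y = y1)

# geom_path(): rows are connected in the order they appear in the data
path_rows <- dat
# geom_line(): rows are sorted by x first, so the upper and lower branches interleave
line_rows <- dat[order(dat$x), ]

# Hypothetical helper: how often does the drawn line switch between going up and going down?
count_direction_changes <- function(d) {
  steps <- sign(diff(d$y))
  sum(steps[-1] != steps[-length(steps)])
}

count_direction_changes(path_rows)  # no switches: one smooth curve
count_direction_changes(line_rows)  # a switch at every segment: the zig-zag
```

The zig-zag comes purely from the ordering: each x value occurs once on the way up and once on the way down, and sorting by x puts those two rows next to each other.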
See the ggplot2 documentation for geom_path() and geom_line() for details.
Recently I ran into a confusing problem with ggplot2. I first make a plot with ggplot named "pic1" (the result looks fine) and then a second plot named "pic2". Of course, "pic2" is fine too. But when I then check "pic1" again, its regression line has turned into a vertical line. For example:
"pic1"
p <- ggplot()
p <- p + geom_line(data = MyData, aes(x = otherCrop, y = eta ))
p <- p+ geom_point(data = dat,aes(x =otherCrop,
y = dat$sumEnemies, colour = YEAR ),position = position_jitter(width = .01),size = 1)
p <- p+labs(colour = "年份\nYear") + theme_classic(base_size=18) +
theme(axis.title.x=element_text( vjust=0))
p=p + theme(text=element_text(family="Times", size=18))
pic1=p
"pic2"
p <- ggplot()
p <- p + geom_line(data = MyData, aes(x = SHDI, y = eta ))
p <- p+ geom_point(data = dat,aes(x = dat$SHDI,
y = eta,colour = YEAR ),position = position_jitter(width = .01),size = 1)
p <- p+labs(colour = "年份\nYear") + theme_classic(base_size=18) +
theme(axis.title.x=element_text( vjust=0))
p=p + theme(text=element_text(family="Times", size=18))
pic2=p
But when I went back to review "pic1", it looked like this:
It had become a strange, short vertical line. This is a problem because I can't put both plots in the same paper. Does anybody know what's wrong?
I think this is a great example of why using the dataframe$column syntax inside an aes call is discouraged: it makes your plot vulnerable to subsequent changes in your data. Here's a simple example. Start with a data frame with columns x and y:
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10)
Now make a ggplot, but instead of using aes(x = x, y = y), we make the mistake of doing aes(x = df$x, y = df$y):
vulnerable_plot <- ggplot()
vulnerable_plot <- vulnerable_plot + geom_line(data = df, aes(x = df$x, y = df$y))
pic1 <- vulnerable_plot
Now we review our plot. Sure, ggplot nags us to say we shouldn't use this syntax, but the plot looks fine, so who cares, right?
pic1
#> Warning: Use of `df$x` is discouraged. Use `x` instead.
#> Warning: Use of `df$y` is discouraged. Use `y` instead.
Now, let's make pic2 identical to pic1 except we use the correct syntax:
invulnerable_plot <- ggplot()
invulnerable_plot <- invulnerable_plot + geom_line(data = df, aes(x = x, y = y))
pic2 <- invulnerable_plot
Now we don't get any warning, but the plot looks the same.
pic2
So there's no difference between pic1 and pic2. Or is there? What happens when we change our data frame?
df$y <- 10:1
vulnerable_plot
Oh dear. Our first plot has changed because the plot object has a reference to an external variable that it relies on to build the plot. That's not what we wanted.
However, with the version where we used the correct syntax, a copy of the data was taken and is kept with the plot data, so it remains unaffected by subsequent changes to df:
invulnerable_plot
Created on 2020-08-23 by the reprex package (v0.3.0)
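The difference can also be checked numerically instead of by eye. A minimal sketch, assuming a ggplot2 version that provides layer_data() (a helper that builds the plot and returns one layer's computed data): the vulnerable plot picks up the modified df when it is built, while the other plot keeps using its own copy of the data.

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = 1:10)

# aes() stores the expressions `df$x` and `df$y`, not their current values
vulnerable_plot <- ggplot() + geom_line(data = df, aes(x = df$x, y = df$y))
# Here x and y are looked up as columns of the layer's own copy of df
invulnerable_plot <- ggplot() + geom_line(data = df, aes(x = x, y = y))

# Change the data frame after both plots were created
df$y <- 10:1

# Build each plot and pull out the y values of its first (only) layer;
# the "discouraged" warnings for df$x / df$y are suppressed here
vulnerable_y <- suppressWarnings(layer_data(vulnerable_plot)$y)
invulnerable_y <- layer_data(invulnerable_plot)$y

vulnerable_y    # follows the modified df
invulnerable_y  # still the original values
```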
I have a very simple question but so far couldn't find an easy solution. Let's say I have some data that I want to fit, and I want to show the x-axis value at which y takes a particular value; in this case, the x value where y = 0. The model is a very simple y ~ x fit, but I don't know how to estimate the x value from it.
Sample data:
library(ggplot2)
library(scales)
df = data.frame(x= sort(10^runif(8,-6,1),decreasing=TRUE), y = seq(-4,4,length.out = 8))
ggplot(df, aes(x = x, y = y)) +
geom_point() +
#geom_smooth(method = "lm", formula = y ~ x, size = 1,linetype="dashed", col="black",se=FALSE, fullrange = TRUE)+
geom_smooth(se=FALSE)+
labs(title = "Made-up data") +
scale_x_log10(breaks = c(1e-6,1e-4,1e-2,1),
labels = trans_format("log10", math_format(10^.x)),limits = c(1e-6,1))+
geom_hline(yintercept=0,linetype="dashed",colour="red",size=0.6)
I would like to convert an input such as 1e-10 into 10^-10 format and annotate it on the plot, as I indicated in my plot.
thanks in advance!
Because geom_smooth() relies on ordinary R fitting functions to calculate the smooth line (by default loess() for fewer than 1,000 points), you can obtain the predicted values outside the ggplot() environment. One option is then to use approx() to get a linear approximation of the x-value at which the predicted y-value is 0.
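# Fit the smoother geom_smooth() uses by default for < 1000 points (loess).
# Because of scale_x_log10(), the plotted smooth is fitted on log10(x), so mirror that here.
fit <- loess(y ~ log10(x), data = df)
# Approximate the log10(x) at which the fitted curve reaches y = 0
log_xval <- approx(x = fitted(fit), y = log10(df$x), xout = 0)$y
xval <- 10^log_xval
# Add to plot (`ggplot(...)` stands for the plot from the question), labelled as 10^k
ggplot(...) + annotate("text", x = xval, y = 0, label = paste0("10^", round(log_xval, 1)), parse = TRUE)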
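Both steps (finding where a fitted curve reaches y = 0, and writing that x value as 10^k) can be tried on made-up numbers where the answer is known. This is a sketch assuming the smooth is monotone near the crossing; the curve below is invented for illustration and is not the loess fit from the question:

```r
# Made-up smooth, evaluated on a log10(x) grid; it crosses zero at log10(x) = -3
log_x <- seq(-6, 0, by = 0.5)
y_fit <- -2 * (log_x + 3)

# Swap the roles of x and y: interpolate log10(x) as a function of the fitted y
log_x0 <- approx(x = y_fit, y = log_x, xout = 0)$y
x0 <- 10^log_x0

# Plotmath label such as "10^-3", for use with annotate("text", ..., parse = TRUE)
label <- paste0("10^", round(log_x0, 1))
label
```

Interpolating on the log10 scale rather than on raw x matches the straight segments you see on a scale_x_log10() axis.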
I'm analyzing a series that varies around zero. To see where parts of the series tend to be mostly positive or mostly negative, I'm plotting a geom_smooth. I was wondering whether the colour of the smooth line can depend on whether it is above or below 0. Below is some code that produces a graph much like the one I am trying to create.
set.seed(5)
r <- runif(22, max = 5, min = -5)
t <- rep(-5:5,2)
df <- data.frame(r+t,1:22)
colnames(df) <- c("x1","x2")
library(ggplot2)
ggplot(df, aes(x = x2, y = x1)) + geom_hline(yintercept = 0) + geom_line() + geom_smooth()
I considered calculating the smoothed values, adding them to the df and then using a scale_color_gradient, but I was wondering if there is a way to achieve this in ggplot directly.
You can use the n argument of geom_smooth to increase the "number of points to evaluate smoother at", which creates more y values close to zero. Then use ggplot_build to grab the smoothed values from the ggplot object. These values are used in a geom_line, which is added on top of the original plot. Finally, we overplot the y = 0 values with geom_hline.
# basic plot with a larger number of smoothed values
p <- ggplot(df, aes(x = x2, y = x1)) +
geom_line() +
geom_smooth(linetype = "blank", n = 10000)
# grab smoothed values
df2 <- ggplot_build(p)[[1]][[2]][ , c("x", "y")]
# add smoothed values with conditional color
p +
geom_line(data = df2, aes(x = x, y = y, color = y > 0)) +
geom_hline(yintercept = 0)
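One thing to be aware of with colour = y > 0: geom_line() joins all TRUE points into one line and all FALSE points into another, so separate stretches above zero are connected across the gap. The geom_hline() overplotting hides this because those joins run close to y = 0. A possible alternative (not part of the answer above) is to give each contiguous stretch its own group id. A minimal sketch, with made-up smoothed values standing in for df2:

```r
# Made-up smoothed values standing in for the x/y columns taken from the built plot
sm <- data.frame(x = 1:8, y = c(-2, -1, 1, 2, 1, -1, -2, 1))

# TRUE where the smooth lies above zero
sm$above <- sm$y > 0
# Start a new run whenever the sign flips, so each contiguous stretch gets its own id
sm$run <- cumsum(c(TRUE, diff(sm$above) != 0))
sm$run
```

These columns could then be used as geom_line(data = sm, aes(x = x, y = y, colour = above, group = run)), so only points within the same stretch are connected.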
Something like this:
# loess data
res <- loess.smooth(df$x2, df$x1)
res <- data.frame(do.call(cbind, res))
res$posY <- ifelse(res$y >= 0, res$y, NA)
res$negY <- ifelse(res$y < 0, res$y, NA)
# plot
ggplot(df, aes(x = x2, y = x1)) +
geom_hline(yintercept = 0) +
geom_line() +
geom_line(data=res, aes(x = x, y = posY, col = "green")) +
geom_line(data=res, aes(x = x, y = negY, col = "red")) +
scale_color_identity()
I created a ggplot with a linear geom_smooth, and now I would like the points from geom_point to have a different colour below and above the linear smooth line.
I know I can add the color to the point by doing geom_point(aes(x, y, colour = z)). My problem is how to determine if a point in the plot is below or above the linear line.
Can ggplot2 do this, or do I have to create a new column in the data frame first?
Below is the sample code with geom_smooth but without the different colours above and below the line.
Any help is appreciated.
library(ggplot2)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
ggplot(df, aes(x,y)) +
geom_point() +
geom_smooth(method = "lm")
I believe ggplot2 can't do this for you. As you say, you could create a new variable in df for the colouring, based on the residuals of the linear model.
For example:
library(ggplot2)
set.seed(2015)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
# Fit linear regression
l = lm(y ~ x, data = df)
# Make new group variable based on residuals
df$group = NA
df$group[which(l$residuals >= 0)] = "above"
df$group[which(l$residuals < 0)] = "below"
# Make the plot
ggplot(df, aes(x,y)) +
geom_point(aes(colour = group)) +
geom_smooth(method = "lm")
Note that the colour argument has to be passed to geom_point(), otherwise geom_smooth() will produce a fit to each group separately.
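The residual check is the same as comparing each y with the height of the fitted line at that x, which can also be written in one vectorised ifelse() step. A minimal sketch of that equivalent formulation, reusing the simulated data from the answer:

```r
set.seed(2015)
df <- data.frame(x = rnorm(100), y = rnorm(100))

# Fit the same straight line that geom_smooth(method = "lm") draws
fit <- lm(y ~ x, data = df)

# Height of the fitted line at each observation's x
line_y <- coef(fit)[["(Intercept)"]] + coef(fit)[["x"]] * df$x

# Points on or above the line get "above", the rest "below"
df$group <- ifelse(df$y >= line_y, "above", "below")
table(df$group)
```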
I am very new to R and I am trying to add a third variable to a plot using ggplot2. I have searched for an answer and could not find anything similar (or I didn't know the right words to search for).
I have three columns of data which will be my x, y and z variable.
I want a graph that can show the values for x and y axis (as in the first and second column variables). However, I want the "points" (as a scatter plot) in the graph to be the values shown in variable z. Is there a way of doing that?
Everything that I have tried just plots x against y.
Thanks for any help!
I believe this is what you are asking: map two variables to the x and y axes and display the "text" of a third variable.
Let's use this data frame; we'll try to "write" X1 and X3:
df <- data.frame(X1 = 1:5, X2 = 2*1:5, X3 = rnorm(5))
With base graphics, pch only draws a single character per point, so
plot(df$X1, df$X2, pch = paste(df$X1))
plot(df$X1, df$X2, pch = paste(df$X3))
doesn't work well for X3.
Using ggplot2:
library(ggplot2)
ggplot(df, aes(x = X1, y = X2)) + geom_text(aes(label = X1))
ggplot(df, aes(x = X1, y = X2)) + geom_text(aes(label = X3))
A fancier alternative is to also map colour in aes():
ggplot(df, aes(x = X1, y = X2, colour = X3)) + geom_text(aes(label = X3))
I want the "points" (as a scatter plot) in the graph to be the values shown in variable z. Is there a way of doing that?
Definitely. The bit that you need to think about is how to present the data in your z variable. By that I mean do you want the information in z to be shown by the points' colour, size or area? There are some great examples of how to do this at the R cookbook.
If you have a data frame called my.data, which has columns x, y, and z, you need to set up your plot like this:
my.plot <- ggplot(data = my.data,
aes(x = x,
y = y))
The example above says "plot the data in my.data, using my.data$x to set the x location and my.data$y to set the y location". If your x variable were grid.x and your y variable grid.y, you would have:
my.plot <- ggplot(data = my.data,
aes(x = grid.x,
y = grid.y))
Then you need to add your points. Here we'll assume that the information in z is going to be used to set the colour of the points, which is the colour aesthetic:
my.plot <- my.plot + geom_point(aes(colour = z))
print(my.plot)
And that should be that. You don't need to tell geom_point() what x and y are, because you already did that when you set up the plot.
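If z should be shown by point area rather than colour (one of the options mentioned above), the same pattern works with the size aesthetic. A minimal sketch with made-up data: scale_size_area() maps a z of 0 to size 0 so that point area is proportional to z, and layer_data() is used only to inspect what ggplot2 computed.

```r
library(ggplot2)

# Made-up data frame with columns x, y and z
my.data <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1), z = c(10, 20, 30, 40, 50))

# Map z to point size, scaled so that the area of each point is proportional to z
size_plot <- ggplot(data = my.data, aes(x = x, y = y)) +
  geom_point(aes(size = z)) +
  scale_size_area()

# Inspect the sizes ggplot2 computed for the point layer
point_sizes <- layer_data(size_plot)$size
point_sizes
```

print(size_plot) then draws the plot as before.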