I have a very simple question but so far couldn't find easy solution for that. Let's say I have a some data that I want to fit and show its x axis value where y is in particular value. In this case let's say when y=0 what is the x value. Model is very simple y~x for fitting but I don't know how to estimate x value from there. Anyway,
sample data
library(ggplot2)
library(scales)
df = data.frame(x= sort(10^runif(8,-6,1),decreasing=TRUE), y = seq(-4,4,length.out = 8))
ggplot(df, aes(x = x, y = y)) +
geom_point() +
#geom_smooth(method = "lm", formula = y ~ x, size = 1,linetype="dashed", col="black",se=FALSE, fullrange = TRUE)+
geom_smooth(se=FALSE)+
labs(title = "Made-up data") +
scale_x_log10(breaks = c(1e-6,1e-4,1e-2,1),
labels = trans_format("log10", math_format(10^.x)),limits = c(1e-6,1))+
geom_hline(yintercept=0,linetype="dashed",colour="red",size=0.6)
I would like to convert 1e-10 input to 10^-10 format and annotate it on the plot. As I indicated in the plot.
thanks in advance!
Because geom_smooth() uses R functions to calculate the smooth line, you can attain the predicted values outside the ggplot() environment. One option is then to use approx() to get a linear approximations of the x-value, given the predicted y-value 0.
# Define formula
formula <- loess(y~x, df)
# Approximate when y would be 0
xval <- approx(x = formula$fitted, y = formula$x, xout = 0)$y
# Add to plot
ggplot(...) + annotate("text", x = xval, y = 0 , label = yval)
Related
I've been trying to use the function trans_new with the scales package however I can't get it to display labels correctly
# percent to fold change
fun1 <- function(x) (x/100) + 1
# fold change to percent
inv_fun1 <- function(x) (x - 1) * 100
percent_to_fold_change_trans <- trans_new(name = "transform", transform = fun1, inverse = inv_fun1)
plot_data <- data.frame(x = 1:10,
y = inv_fun1(1:10))
# Plot raw data
p1 <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point()
# This doesn't really change the plot
p2 <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
coord_trans(y = percent_to_fold_change_trans)
p1 and p2 are identical whereas I'm expecting p2 to be a diagonal line since we are reversing the inverting function. If I replace the inverse parameter in trans_new with another function (like fun(x) x) I can see the correct transformation but the labels are completely off. Any ideas of how to define the inverse parameters to get the right label positions?
You wouldn't expect a linear function like fun1 to change the appearance of the y axis. Remember, you are not transforming the data, you are transforming the y axis. This means that you are effectively changing the positions of the horizontal gridlines, but not the values they represent.
Any function that produces a linear transformation will result in fixed spacing between the horizontal grid lines, which is what you have already. The plot therefore won't change.
Let's take a simple example:
plot_data <- data.frame(x = 1:10, y = 1:10)
p <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(breaks = 1:10)
p
Now let's create a straightforward non-linear transformation:
little_trans <- trans_new(name = "transform",
transform = function(x) x^2,
inverse = function(x) sqrt(x))
p + coord_trans(y = little_trans)
Note the values on the y axis are the same, but because we applied a non-linear transformation, the distances between the gridlines now varies.
In fact, if we plot a transformed version of our data, we would get the same shape:
ggplot(plot_data, aes(x = x, y = y^2)) +
geom_point() +
scale_y_continuous(breaks = (1:10)^2)
In a sense, this is all that the transform does, except it applies the inverse transform to the axis labels. We could do that manually here:
ggplot(plot_data, aes(x = x, y = y^2)) +
geom_point() +
scale_y_continuous(breaks = (1:10)^2, labels = sqrt((1:10)^2))
Now, suppose I instead do a more complicated but linear function of x:
little_trans <- trans_new(name = "transform",
transform = function(x) (0.1 * x + 20) / 3,
inverse = function(x) (x * 3 - 20) / 0.1)
ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
coord_trans(y = little_trans)
It's unchanged from before. We can see why if we again apply our transform directly:
ggplot(plot_data, aes(x = x, y = (0.1 * y + 20) / 3)) +
geom_point() +
scale_y_continuous(breaks = (0.1 * (1:10) + 20) / 3)
Obviously, if we do the inverse transform on the axis labels we will have 1:10, which means we will just have the original plot back.
The same holds true for any linear transform, and therefore the results you are getting are exactly what are to be expected.
I just joined the community and looking forward to get some help for the data analysis for my master thesis.
At the moment I have the following problem:
I plotted 42 varieties with ggplot by using facet_wrap:
`ggplot(sumfvvar,aes(x=TemperaturCmean,y=Fv.Fm,col=treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety)`
That works very well, but I would like to annotate the r squared values for the regression lines. I have two treatments and 42 varieties, therefore 84 regression lines.
Are there any possibilties to calculate all r squared values and integrate them into the ggplot? I found allready the function
ggplotRegression <- function (fit) {
require(ggplot2)
ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
geom_point() +
stat_smooth(method = "lm") +
labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
"Intercept =",signif(fit$coef[[1]],5 ),
" Slope =",signif(fit$coef[[2]], 5),
" P =",signif(summary(fit)$coef[2,4], 5)))
}
but that works just for one variety and one treatment. Could be a loop for the lm() function an option?
Here is an example with the ggpmisc package:
library(ggpmisc)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x,
y = y,
group = c("A", "B"))
formula <- y ~ poly(x, 1, raw = TRUE)
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(formula = formula, parse = TRUE,
mapping = aes(label = stat(rr.label)))
You can't apply different labels to different facet, unless you add another r^2 column to your data.. One way is to use geom_text, but you need to calculate the stats you need first. Below I show an example with iris, and for your case, just change Species for Variety, and so on
library(tidyverse)
# simulate data for 2 treatments
# d2 is just shifted up from d1
d1 <- data.frame(iris,Treatment="A")
d2 <- data.frame(iris,Treatment="B") %>%
mutate(Sepal.Length=Sepal.Length+rnorm(nrow(iris),1,0.5))
# combine datasets
DF <- rbind(d1,d2) %>% rename(Variety = Species)
# plot like you did
# note I use "free" scales, if scales very different between Species
# your facet plots will be squished
g <- ggplot(DF,aes(x=Sepal.Width,y=Sepal.Length,col=Treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety,scales="free")
# rsq function
RSQ = function(y,x){signif(summary(lm(y ~ x))$adj.r.squared, 3)}
#calculate rsq for variety + treatment
STATS <- DF %>%
group_by(Variety,Treatment) %>%
summarise(Rsq=RSQ(Sepal.Length,Sepal.Width)) %>%
# make a label
# one other option is to use stringr::str_wrap in geom_text
mutate(Label=paste("Treat",Treatment,", Rsq=",Rsq))
# set vertical position of rsq
VJUST = ifelse(STATS$Treatment=="A",1.5,3)
# finally the plot function
g + geom_text(data=STATS,aes(x=-Inf,y=+Inf,label=Label),
hjust = -0.1, vjust = VJUST,size=3)
For the last geom_text() call, I allowed the y coordinates of the text to be different by multiplying the Treatment.. You might need to adjust that depending on your plot..
I am having a problem where geom_smooth() is not working on my ggplot2.
But instead of a smooth curve, there is a fold.
My X-axis variable is the factor variable(I've tried to convert it to a numerical variable, but it didn't work), and Y-axis is numeric variable.
My data.frame is that
ggplot(tmp, aes(x = x, y = y))+
geom_point()+
geom_smooth(formula = y ~ x, method = "loess", stat = "identity", se = T, group = "")
I hope to get a pic like this.
A quick fix will be to wrap the group inside aes. Generated a data similar to the structure you have (a factor x variable and a numeric y var).
set.seed(777)
x <- rep(c(LETTERS[1:7]), 3)
y <- rnorm(21, mean = 0, sd = 1)
tmp <- data.frame(x,y)
# -------------------------------------------------------------------------
base <- ggplot(tmp, aes(x = x, y = y))+geom_point()
base + geom_smooth(formula = y ~ x, method = "loess",se = TRUE, aes(group = "" ), level = 0.95) + theme_bw()
If you want to use a different level of confidence interval, you can change the value of level (which is a 95% by default).
Output
I'm analyzing a series that varies around zero. And to see where there are parts of the series with a tendency to be mostly positive or mostly negative I'm plotting a geom_smooth. I was wondering if it is possible to have the color of the smooth line be dependent on whether or not it is above or below 0. Below is some code that produces a graph much like what I am trying to create.
set.seed(5)
r <- runif(22, max = 5, min = -5)
t <- rep(-5:5,2)
df <- data.frame(r+t,1:22)
colnames(df) <- c("x1","x2")
ggplot(df, aes(x = x2, y = x1)) + geom_hline() + geom_line() + geom_smooth()
I considered calculating the smoothed values, adding them to the df and then using a scale_color_gradient, but I was wondering if there is a way to achieve this in ggplot directly.
You may use the n argument in geom_smooth to increase "number of points to evaluate smoother at" in order to create some more y values close to zero. Then use ggplot_build to grab the smoothed values from the ggplot object. These values are used in a geom_line, which is added on top of the original plot. Last we overplot the y = 0 values with the geom_hline.
# basic plot with a larger number of smoothed values
p <- ggplot(df, aes(x = x2, y = x1)) +
geom_line() +
geom_smooth(linetype = "blank", n = 10000)
# grab smoothed values
df2 <- ggplot_build(p)[[1]][[2]][ , c("x", "y")]
# add smoothed values with conditional color
p +
geom_line(data = df2, aes(x = x, y = y, color = y > 0)) +
geom_hline(yintercept = 0)
Something like this:
# loess data
res <- loess.smooth(df$x2, df$x1)
res <- data.frame(do.call(cbind, res))
res$posY <- ifelse(res$y >= 0, res$y, NA)
res$negY <- ifelse(res$y < 0, res$y, NA)
# plot
ggplot(df, aes(x = x2, y = x1)) +
geom_hline() +
geom_line() +
geom_line(data=res, aes(x = x, y = posY, col = "green")) +
geom_line(data=res, aes(x = x, y = negY, col = "red")) +
scale_color_identity()
I created a ggplot with linear geom_smooth now i would like to have the points, from the geom_point to have a different colour below and above the linear smooth line.
I know I can add the color to the point by doing geom_point(aes(x, y, colour = z)). My problem is how to determine if a point in the plot is below or above the linear line.
Can ggplot2 do this or do have to create a new column in the data frame first?
Below is the sample code with geom_smooth but without the different colours above and below the line.
Any help is appreciated.
library(ggplot2)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
ggplot(df, aes(x,y)) +
geom_point() +
geom_smooth(method = "lm")
I believe ggplot2 can't do this for you. As you say, you could create a new variable in df to make the colouring. You can do so, based on the residuals of the linear model.
For example:
library(ggplot2)
set.seed(2015)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
# Fit linear regression
l = lm(y ~ x, data = df)
# Make new group variable based on residuals
df$group = NA
df$group[which(l$residuals >= 0)] = "above"
df$group[which(l$residuals < 0)] = "below"
# Make the plot
ggplot(df, aes(x,y)) +
geom_point(aes(colour = group)) +
geom_smooth(method = "lm")
Note that the colour argument has to be passed to geom_point(), otherwise geom_smooth() will produce a fit to each group separately.
Result: