I'm using ggplot2 and ggpmisc to generate a trend line.
library(ggplot2)
library(ggpmisc)
library(scales)
x <- c(5,2,6,8,9,1,3,6,8,2)
y <- c(4,7,2,5,7,9,5,2,1,3)
df <- data.frame(x,y)
g <- ggplot(df,aes(x,y))
g <- g + geom_point(colour = "black")
g <- g + stat_smooth(method = lm, formula = y ~ x, se = FALSE)
g <- g + stat_poly_eq(formula = y ~ x,
aes(label = paste(after_stat(eq.label), after_stat(rr.label), after_stat(adj.rr.label), after_stat(p.value.label), sep = "~~~")),
label.x = "right",label.y = "bottom",parse = TRUE)
gg <- g + coord_trans(y = "identity")
gg <- g + coord_trans(y = "log")
gg <- g + scale_y_log10(breaks=10^(0:3),
labels=trans_format("log10",math_format(10^.x)))
For g, the label is y = 6.26 - 0.351x
For gg, it is y = 0.782 - 0.0419x
When I switch to a log10 y axis with exponential-notation labels, the value of eq.label changes. I think eq.label should not change, because I am only changing the axis scale. Can you tell me why?
The main thing to understand, in my opinion, is this passage from the documentation linked below:
The difference between transforming the scales and transforming the
coordinate system is that scale transformation occurs BEFORE
statistics, and coordinate transformation afterwards. Coordinate
transformation also changes the shape of geoms:
Because scale transformation happens BEFORE the statistics, the least-squares fit (minimizing the sum of squared errors) is performed on the transformed data, i.e. on log10(y). That is fine if y is linearly related to x on the log scale, and it is why eq.label shows different coefficients.
This changes if you transform the coordinates instead, because coordinate transformation happens AFTER the statistics: the sum of squared errors is minimized on the untransformed data, so eq.label stays the same as for g.
See here: https://ggplot2.tidyverse.org/reference/coord_trans.html
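A quick way to confirm this is the following base-R sketch, which uses the question's data. With scale_y_log10(), the statistic sees log10(y), so stat_poly_eq() should report the same coefficients as lm(log10(y) ~ x). The untransformed plot g should match lm(y ~ x). The three-significant-digit rounding is an assumption about how stat_poly_eq() formats its label, but it agrees with the two labels quoted in the question.

```r
# Data from the question
x <- c(5, 2, 6, 8, 9, 1, 3, 6, 8, 2)
y <- c(4, 7, 2, 5, 7, 9, 5, 2, 1, 3)

# Fit on the original scale: what g (and a coord_trans() version) shows
fit_raw <- lm(y ~ x)
# Fit on the log10 scale: what scale_y_log10() feeds to the statistic
fit_log <- lm(log10(y) ~ x)

signif(coef(fit_raw), 3)  # intercept 6.26, slope -0.351
signif(coef(fit_log), 3)  # intercept 0.782, slope -0.0419
```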
I've been trying to use the trans_new() function from the scales package, but I can't get it to display the labels correctly.
library(ggplot2)
library(scales)
# percent to fold change
fun1 <- function(x) (x/100) + 1
# fold change to percent
inv_fun1 <- function(x) (x - 1) * 100
percent_to_fold_change_trans <- trans_new(name = "transform", transform = fun1, inverse = inv_fun1)
plot_data <- data.frame(x = 1:10,
y = inv_fun1(1:10))
# Plot raw data
p1 <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point()
# This doesn't really change the plot
p2 <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
coord_trans(y = percent_to_fold_change_trans)
p1 and p2 are identical, whereas I expected p2 to be a diagonal line, since we are reversing the inverse function. If I replace the inverse argument of trans_new() with another function (like function(x) x), I can see the transformation, but the labels are completely off. Any ideas on how to define the inverse argument to get the right label positions?
You wouldn't expect a linear function like fun1 to change the appearance of the y axis. Remember, you are not transforming the data, you are transforming the y axis. This means that you are effectively changing the positions of the horizontal gridlines, but not the values they represent.
Any function that produces a linear transformation will result in fixed spacing between the horizontal grid lines, which is what you have already. The plot therefore won't change.
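To make the spacing argument concrete, here is a small base-R sketch. It is an illustration, not ggplot2's internals. It maps a set of break values through a transform and rescales the results to [0, 1], roughly the way positions end up in a panel. The rescale_01() helper is hypothetical and written only for this example. For the question's linear fun1, the relative positions match the untransformed ones exactly, while a non-linear transform such as x^2 moves them.

```r
# Hypothetical helper: rescale values to the [0, 1] range of a panel
rescale_01 <- function(v) (v - min(v)) / (max(v) - min(v))

# Linear transform from the question (percent -> fold change)
fun1 <- function(x) (x / 100) + 1

breaks <- seq(0, 900, by = 100)  # the y values plotted in the question

pos_raw    <- rescale_01(breaks)        # no transformation
pos_linear <- rescale_01(fun1(breaks))  # linear transformation
pos_square <- rescale_01(breaks^2)      # non-linear transformation

all.equal(pos_raw, pos_linear)          # TRUE: gridlines land in the same places
isTRUE(all.equal(pos_raw, pos_square))  # FALSE: the spacing changes
```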
Let's take a simple example:
plot_data <- data.frame(x = 1:10, y = 1:10)
p <- ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
scale_y_continuous(breaks = 1:10)
p
Now let's create a straightforward non-linear transformation:
little_trans <- trans_new(name = "transform",
transform = function(x) x^2,
inverse = function(x) sqrt(x))
p + coord_trans(y = little_trans)
Note that the values on the y axis are the same, but because we applied a non-linear transformation, the distances between the gridlines now vary.
In fact, if we plot a transformed version of our data, we would get the same shape:
ggplot(plot_data, aes(x = x, y = y^2)) +
geom_point() +
scale_y_continuous(breaks = (1:10)^2)
In a sense, this is all that the transform does, except it applies the inverse transform to the axis labels. We could do that manually here:
ggplot(plot_data, aes(x = x, y = y^2)) +
geom_point() +
scale_y_continuous(breaks = (1:10)^2, labels = sqrt((1:10)^2))
Now, suppose I instead do a more complicated but linear function of x:
little_trans <- trans_new(name = "transform",
transform = function(x) (0.1 * x + 20) / 3,
inverse = function(x) (x * 3 - 20) / 0.1)
ggplot(plot_data, aes(x = x, y = y)) +
geom_point() +
coord_trans(y = little_trans)
It's unchanged from before. We can see why if we again apply our transform directly:
ggplot(plot_data, aes(x = x, y = (0.1 * y + 20) / 3)) +
geom_point() +
scale_y_continuous(breaks = (0.1 * (1:10) + 20) / 3)
Obviously, if we do the inverse transform on the axis labels we will have 1:10, which means we will just have the original plot back.
The same holds for any linear transform, so the results you are getting are exactly what is to be expected.
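On the question of how to define the inverse: it only has to undo the transform, and a quick round-trip check catches mistakes. Here is a minimal base-R sketch that uses the question's functions. The test values are arbitrary.

```r
# Functions from the question
fun1     <- function(x) (x / 100) + 1  # percent -> fold change
inv_fun1 <- function(x) (x - 1) * 100  # fold change -> percent

vals <- c(-50, 0, 25, 100, 900)

# inverse(transform(x)) should give back x
all.equal(inv_fun1(fun1(vals)), vals)  # TRUE
```

The question's pair passes this check, so it is a valid transform/inverse pair for trans_new(). Because fun1 is linear, though, it still leaves the gridline spacing unchanged. Only a non-linear transform alters the look of the axis.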
I just joined the community and am looking forward to getting some help with the data analysis for my master's thesis.
At the moment I have the following problem:
I plotted 42 varieties with ggplot by using facet_wrap:
ggplot(sumfvvar,aes(x=TemperaturCmean,y=Fv.Fm,col=treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety)
That works very well, but I would like to annotate the r squared values for the regression lines. I have two treatments and 42 varieties, therefore 84 regression lines.
Is there any way to calculate all the R² values and add them to the plot? I already found this function:
ggplotRegression <- function (fit) {
require(ggplot2)
ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1])) +
geom_point() +
stat_smooth(method = "lm") +
labs(title = paste("Adj R2 = ",signif(summary(fit)$adj.r.squared, 5),
"Intercept =",signif(fit$coef[[1]],5 ),
" Slope =",signif(fit$coef[[2]], 5),
" P =",signif(summary(fit)$coef[2,4], 5)))
}
but that only works for one variety and one treatment. Could a loop over lm() be an option?
Here is an example with the ggpmisc package:
library(ggpmisc)
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x,
y = y,
group = c("A", "B"))
formula <- y ~ poly(x, 1, raw = TRUE)
ggplot(my.data, aes(x, y)) +
facet_wrap(~ group) +
geom_point() +
geom_smooth(method = "lm", formula = formula) +
stat_poly_eq(formula = formula, parse = TRUE,
mapping = aes(label = after_stat(rr.label)))
With plain ggplot2 you can't apply a different label to each facet unless you supply the R² values as data. One way is to use geom_text(), but you need to calculate the statistics first. Below I show an example with iris; for your case, just replace Species with Variety, and so on.
library(tidyverse)
# simulate data for 2 treatments
# d2 is just shifted up from d1
d1 <- data.frame(iris,Treatment="A")
d2 <- data.frame(iris,Treatment="B") %>%
mutate(Sepal.Length=Sepal.Length+rnorm(nrow(iris),1,0.5))
# combine datasets
DF <- rbind(d1,d2) %>% rename(Variety = Species)
# plot like you did
# note: I use "free" scales; if the scales differ a lot between varieties,
# fixed scales would squish some of the facet plots
g <- ggplot(DF,aes(x=Sepal.Width,y=Sepal.Length,col=Treatment))+
geom_point(shape=1,size=1)+
geom_smooth(method=lm)+
scale_color_brewer(palette = "Set1")+
facet_wrap(.~Variety,scales="free")
# rsq function
RSQ = function(y,x){signif(summary(lm(y ~ x))$adj.r.squared, 3)}
#calculate rsq for variety + treatment
STATS <- DF %>%
group_by(Variety,Treatment) %>%
summarise(Rsq=RSQ(Sepal.Length,Sepal.Width)) %>%
# make a label
# one other option is to use stringr::str_wrap in geom_text
mutate(Label=paste("Treat",Treatment,", Rsq=",Rsq))
# set vertical position of rsq
VJUST = ifelse(STATS$Treatment=="A",1.5,3)
# finally the plot function
g + geom_text(data=STATS,aes(x=-Inf,y=+Inf,label=Label),
hjust = -0.1, vjust = VJUST,size=3)
In the final geom_text() call, I gave the two treatments different vjust values so that their labels don't overlap. You might need to adjust these values depending on your plot.
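On the "loop over lm()" part of the question: yes, that works too. The base-R sketch below fits one model per Variety/Treatment combination with split() and lapply(). It uses the same simulated iris data as above as a stand-in for sumfvvar. The resulting data frame (named STATS_base here purely for illustration) could be passed to geom_text() in the same way as STATS.

```r
# Stand-in for the real data: iris with a simulated second treatment
set.seed(1)
d1 <- data.frame(iris, Treatment = "A")
d2 <- data.frame(iris, Treatment = "B")
d2$Sepal.Length <- d2$Sepal.Length + rnorm(nrow(iris), 1, 0.5)
DF <- rbind(d1, d2)
names(DF)[names(DF) == "Species"] <- "Variety"

# One data frame per Variety x Treatment combination
groups <- split(DF, list(DF$Variety, DF$Treatment), drop = TRUE)

# Fit lm() in each group and keep the adjusted R-squared
rsq_list <- lapply(groups, function(d) {
  fit <- lm(Sepal.Length ~ Sepal.Width, data = d)
  data.frame(Variety   = d$Variety[1],
             Treatment = d$Treatment[1],
             Rsq       = signif(summary(fit)$adj.r.squared, 3))
})
STATS_base <- do.call(rbind, rsq_list)
rownames(STATS_base) <- NULL
STATS_base
```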
I am trying to do an exponential regression in ggplot2. First, my script:
g <- ggplot(data, aes(x=datax, y=datay), color="black") +
geom_point(shape=1) + stat_smooth(method = 'nls', formula = y~a*exp(b*x), aes(colour = 'Exponential'), se = FALSE)
g <- g + theme_classic()
g <- g + theme(panel.grid.major=element_blank())
g <- g + theme(panel.grid.minor=element_blank())
g <- g + theme(axis.line.x=element_line(color="black"),
axis.line.y=element_line(color="black"),
panel.border=element_blank(),
panel.background=element_blank())
g <- g + labs(x="\ndatax",y="datay\n")
g <- g + theme(axis.text.y=element_text(size=14))
g <- g + theme(axis.text.x=element_text(size=14))
g <- g + theme(axis.title.y=element_text(size=18,vjust=1))
g <- g + theme(axis.title.x=element_text(size=18,vjust=1))
g
This is the image that I got
As an R beginner, I put the script together from my own scripts and ones from the internet. I always get the following error:
"In (function (formula, data = parent.frame(), start, control = nls.control(), : No starting values specified for some parameters.
Initializing ‘a’, ‘b’ to '1.'.Consider specifying 'start' or using a selfStart model"
I haven't found a better way to plot the exponential fit yet.
In addition, I would like to change the colour of the fitted line to black, remove the legend, and show the R² and p-value on the graph (and maybe the confidence intervals as well).
It's not easy to answer without a reproducible example and with so many questions.
Are you sure the message you reported is an error and not a warning? On my laptop, with the 'iris' dataset, I got a warning.
However, as you can read on the ?nls help page, you should provide initial values for the estimates through the start argument, to help the algorithm converge. If you don't provide them, nls() falls back to default values (in your case, a and b are both initialized to 1).
You could try something like this:
g <- ggplot(data, aes(x=datax, y=datay), color="black") +
geom_point(shape=1) + stat_smooth(method = 'nls',
method.args = list(start = c(a=1, b=1)),
formula = y~a*exp(b*x), colour = 'black', se = FALSE)
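By writing aes(colour = 'Exponential'), you told ggplot2 to map colour to the constant "Exponential", which creates a legend entry and uses the default palette colour. Setting colour = 'black' outside aes() fixes both the colour and the legend (I tried it with the built-in dataset 'iris' and it worked).
Note that I passed start as an element of the list given to method.args: this is a new feature in ggplot2 v2.0.0.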
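If start = c(a = 1, b = 1) is far from the truth, nls() may still fail to converge. One common way to get sensible starting values is to fit a straight line to log(y), which is linear in x when y = a*exp(b*x), and use its coefficients as start. This requires all y values to be positive.

The sketch below uses simulated data because the question's data isn't available. The column names datax and datay, and the true values a = 2 and b = 0.3, are assumptions. The same start list could then be passed to stat_smooth() via method.args = list(start = start).

```r
# Simulated stand-in for the question's data (true a = 2, b = 0.3)
set.seed(42)
datax <- seq(0, 10, length.out = 50)
datay <- 2 * exp(0.3 * datax) * exp(rnorm(50, sd = 0.05))
data  <- data.frame(datax, datay)

# log(y) = log(a) + b * x, so a linear fit on the log scale gives start values
lin   <- lm(log(datay) ~ datax, data = data)
start <- list(a = exp(unname(coef(lin)[1])), b = unname(coef(lin)[2]))

# Non-linear least squares on the original scale, with explicit start values
fit <- nls(datay ~ a * exp(b * datax), data = data, start = start)
coef(fit)
```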
Hope this helps
Edit:
Just for completeness, here is the code I ran on my laptop with a built-in dataset (note that an exponential fit makes no sense for this data, but the code runs without warnings):
library(ggplot2)
data('iris')
g1 <- ggplot(data=iris, aes(x=Sepal.Length, y=Sepal.Width)) +
geom_point(color='green') +geom_smooth(method = 'nls',
method.args = list(start=c(a=1, b=1)), se = FALSE,
formula = y~a*exp(b*x), colour='black')
g1
I have a very simple question, but so far I couldn't find an easy solution. Let's say I have some data that I want to fit, and I want to show the x value at which the fitted y reaches a particular value, in this case y = 0. The model for the fit is very simple (y ~ x), but I don't know how to estimate the x value from it.
Sample data:
library(ggplot2)
library(scales)
df = data.frame(x= sort(10^runif(8,-6,1),decreasing=TRUE), y = seq(-4,4,length.out = 8))
ggplot(df, aes(x = x, y = y)) +
geom_point() +
#geom_smooth(method = "lm", formula = y ~ x, size = 1,linetype="dashed", col="black",se=FALSE, fullrange = TRUE)+
geom_smooth(se=FALSE)+
labs(title = "Made-up data") +
scale_x_log10(breaks = c(1e-6,1e-4,1e-2,1),
labels = trans_format("log10", math_format(10^.x)),limits = c(1e-6,1))+
geom_hline(yintercept=0,linetype="dashed",colour="red",size=0.6)
I would like to convert the resulting value (e.g. 1e-10) to 10^-10 notation and annotate it on the plot, as indicated in the figure.
thanks in advance!
Because geom_smooth() uses standard R functions (here loess()) to calculate the smooth line, you can reproduce the predicted values outside of ggplot(). One option is then to use approx() to get a linear approximation of the x value at which the predicted y is 0.
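# geom_smooth() smooths on the transformed axis, i.e. log10(x), and only uses
# points inside the scale limits (points outside become NA and are dropped)
df_in <- subset(df, x >= 1e-6 & x <= 1)
df_in$logx <- log10(df_in$x)
# Same default smoother geom_smooth() uses for small data sets: loess
fit <- loess(y ~ logx, data = df_in)
# Approximate log10(x) where the fitted curve crosses y = 0
log_xval <- approx(x = fitted(fit), y = df_in$logx, xout = 0)$y
xval <- 10^log_xval
# Add to plot, labelled in 10^x notation
ggplot(...) + annotate("text", x = xval, y = 0, label = paste0("10^", round(log_xval, 1)), parse = TRUE)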
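If the straight-line fit from the commented-out geom_smooth(method = "lm") line is enough, the crossing can be solved exactly instead of interpolated. On a scale_x_log10() axis, the model geom_smooth() fits is y ~ log10(x). Setting y = 0 then gives log10(x0) = -b0 / b1, where b0 and b1 are the intercept and slope.

The sketch below uses made-up, noise-free data, so the answer is known to be 10^-3. That data is an assumption; substitute your own df.

```r
# Hypothetical data lying exactly on y = 2 * log10(x) + 6
df <- data.frame(x = 10^seq(-6, 0, length.out = 8))
df$y <- 2 * log10(df$x) + 6

# Same model geom_smooth(method = "lm") fits on a log10 x axis
fit <- lm(y ~ log10(x), data = df)
b <- coef(fit)

# Solve b[1] + b[2] * log10(x0) = 0 for x0
log_x0 <- -b[[1]] / b[[2]]
x0 <- 10^log_x0

# Plotmath label in 10^x form
lab <- paste0("10^", round(log_x0, 2))
lab  # "10^-3"
```

It can then be added with something like annotate("text", x = x0, y = 0, label = lab, parse = TRUE). The x position is given in data units, and the log10 scale transforms it.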
I created a ggplot with a linear geom_smooth(). Now I would like the points from geom_point() to have a different colour below and above the fitted line.
I know I can add the color to the point by doing geom_point(aes(x, y, colour = z)). My problem is how to determine if a point in the plot is below or above the linear line.
Can ggplot2 do this, or do I have to create a new column in the data frame first?
Below is the sample code with geom_smooth but without the different colours above and below the line.
Any help is appreciated.
library(ggplot2)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
ggplot(df, aes(x,y)) +
geom_point() +
geom_smooth(method = "lm")
I don't think ggplot2 can do this for you. As you say, you can create a new variable in df for the colouring, based on the residuals of the linear model: points with positive residuals lie above the line, points with negative residuals below it.
For example:
library(ggplot2)
set.seed(2015)
df <- data.frame(x = rnorm(100),
y = rnorm(100))
# Fit linear regression
l = lm(y ~ x, data = df)
# Make new group variable based on residuals
df$group = NA
df$group[which(l$residuals >= 0)] = "above"
df$group[which(l$residuals < 0)] = "below"
# Make the plot
ggplot(df, aes(x,y)) +
geom_point(aes(colour = group)) +
geom_smooth(method = "lm")
Note that the colour argument has to be passed to geom_point(), otherwise geom_smooth() will produce a fit to each group separately.
Result:
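A slightly more compact way to build the same grouping variable is ifelse() on the residuals, which produces one label per row in a single step. The sketch below uses the same simulated data. Since geom_smooth(method = "lm") fits the same y ~ x model on the untransformed axes, the split matches the line that is drawn.

```r
set.seed(2015)
df <- data.frame(x = rnorm(100), y = rnorm(100))

l <- lm(y ~ x, data = df)

# Points on or above the fitted line get "above", the rest "below"
df$group <- factor(ifelse(residuals(l) >= 0, "above", "below"),
                   levels = c("above", "below"))
table(df$group)
```

Making group a factor with fixed levels keeps the colour assignment stable if you later add scale_colour_manual().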