After many google searches I decided to ask for your help, guys.
I am plotting some observations at different time points and I want to add a linear regression with stat_smooth. However, I want the linear model to have its intercept at 100 (because the data are percentages relative to time 0). To do that, I found that the easiest way is to use the offset parameter in lm. The problem is how to get the number of 'y' observations per group (col and facet groups) to pass to the offset parameter.
If I use data with the same number of observations per group (10 in my case), I can just write the number and it works great:
myplot <- ggplot(mydt2, aes(x = Time_point, y = GFP_rel, col = Gene, fill = Gene, group = Gene))
myplot <- myplot +
  stat_smooth(method = 'lm', formula = y ~ x + 0, method.args = list(offset = rep(100, 10))) +
  facet_wrap(~Cell_line)
However, this is not very elegant or flexible. My question is: how can I pass the number of observations to method.args? I tried offset(100,..count..), but I get the error: "(list) object cannot be coerced to type 'integer'".
Any suggestions?
Thanks
You can use the I(y - 100) coding in the formula as shown here instead of using an offset.
However, the predicted values for stat_smooth will then be predictions for y - 100, not y. This line will go through 0. You can move the lines back to the position to display predictions of the original y variable using position_nudge.
So the stat_smooth code would look something like
stat_smooth(method = "lm", formula = I(y - 100) ~ x + 0,
position = position_nudge(y = 100))
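To see why the shifted response is equivalent to the offset approach, here is a minimal base-R sketch on made-up data (the data frame `toy` and its columns are hypothetical, not taken from the question). Both fits force the line through 100 at x = 0, and the shifted-response version never needs the number of observations.

```r
# Hypothetical toy data: percentages relative to time 0
set.seed(1)
toy <- data.frame(x = rep(0:4, times = 3))
toy$y <- 100 - 8 * toy$x + rnorm(nrow(toy), sd = 2)

# Offset version: the offset vector must have one entry per row
fit_offset <- lm(y ~ x + 0, data = toy, offset = rep(100, nrow(toy)))

# Shifted-response version: no offset vector is needed
fit_shift <- lm(I(y - 100) ~ x + 0, data = toy)

slope_offset <- unname(coef(fit_offset)["x"])
slope_shift  <- unname(coef(fit_shift)["x"])

# Predictions on the original scale: add the 100 back, which is the
# role position_nudge(y = 100) plays for the drawn line
pred_original <- 100 + slope_shift * toy$x
```

The two slopes should agree, so the only thing left to do in the plot is to move the fitted line back up by 100.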
I want to represent three lines on a graph overlain with datapoints that I used in a discriminant function analysis. From my analysis, I have two points that fall on each line and I want to represent these three lines. The lines represent the probability contours of the classification scheme and exactly how I got the points on the line are not relevant to my question here. However, I want the lines to extend further than the points that define them.
df <-
  data.frame(Prob = rep(c("5", "50", "95"), each = 2),
             Wing = rep(c(107, 116), 3),
             Bill = c(36.92055, 36.12167, 31.66012, 30.86124, 26.39968, 25.6008))
ggplot() +
  geom_line(data = df, aes(x = Bill, y = Wing, group = Prob, color = Prob))
The above df is a dataframe for my points from which the three lines are constructed. I want the lines to extend from y=105 to y=125.
Thanks!
There are probably more idiomatic ways of doing it but this is one way to get it done.
In short, you calculate the equation of the straight line through each pair of points, i.e. y = mx + c.
library(dplyr)
library(ggplot2)
df_withFormula <- df |>
  group_by(Prob) |>
  # This mutate call creates the slope and intercept needed by geom_abline in the plotting stage.
  # lag() leaves NA in the first row of each group.
  mutate(increaseBill = Bill - lag(Bill),
         increaseWing = Wing - lag(Wing),
         slope = increaseWing / increaseBill,
         intercept = Wing - slope * Bill)
# increaseBill, increaseWing and slope could all be combined into one calculation, but I thought it was easier to understand this way.
ggplot(df_withFormula, aes(Bill, Wing, color = Prob)) +
  # Added so the plot has something to draw on top of. You could remove this and define the limits manually instead (expand_limits would work).
  geom_point() +
  # This plots the three lines. Rows with NA are dropped automatically. More explicit NA handling could be done in the data prep stage.
  geom_abline(aes(slope = slope, intercept = intercept, color = Prob)) +
  # This is the crucial part: it defines the range of the plot window. As ablines are infinite, you can set whatever limits you want.
  expand_limits(y = c(105, 125))
Hope this helps you get the graph you want.
This depends very much on the structure of your data, though it could be changed to fit different shapes.
Similar to the approach by @James in that I compute the slopes and the intercepts from the given data and use geom_abline to plot the lines, but this version uses
summarise instead of mutate to get rid of the NA values
and a geom_blank instead of a geom_point so that only the lines are displayed but not the points (Note: Having another geom is crucial to set the scale or the range of the data and for the lines to show up).
library(dplyr)
library(ggplot2)
df_line <- df |>
  group_by(Prob) |>
  summarise(slope = diff(Wing) / diff(Bill),
            intercept = first(Wing) - slope * first(Bill))
ggplot(df, aes(x = Bill, y = Wing)) +
  geom_blank() +
  geom_abline(data = df_line, aes(slope = slope, intercept = intercept, color = Prob)) +
  scale_y_continuous(limits = c(105, 125))
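If finite line segments are preferred over infinite ablines, the same slope and intercept can be used to solve for Bill at the two Wing limits. The following is only a sketch in base R; the helper data frame `endpoints` is a name introduced here for illustration.

```r
df <- data.frame(
  Prob = rep(c("5", "50", "95"), each = 2),
  Wing = rep(c(107, 116), 3),
  Bill = c(36.92055, 36.12167, 31.66012, 30.86124, 26.39968, 25.6008)
)

# For each probability contour, fit the line through its two points and
# solve Wing = intercept + slope * Bill for Bill at Wing = 105 and 125
endpoints <- do.call(rbind, lapply(split(df, df$Prob), function(d) {
  slope <- diff(d$Wing) / diff(d$Bill)
  intercept <- d$Wing[1] - slope * d$Bill[1]
  wing <- c(105, 125)
  data.frame(Prob = d$Prob[1], Wing = wing, Bill = (wing - intercept) / slope)
}))
```

The result has two rows per contour and could be passed to geom_line(data = endpoints, aes(x = Bill, y = Wing, color = Prob)) in place of geom_abline.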
First of all, I apologize for my poor English. Second, the goal of this post is to reproduce the plot of the ridge regression's MSE with ggplot2 instead of the base plot function included in R.
The object cv.out is defined by the following expression:
cv.out <- cv.glmnet(x_var[train,], y_var[train], alpha = 0)
When I print that object, this is the output:
     Lambda  Measure      SE Nonzero
min   439.8 32554969 1044541       5
1se  1343.1 33586547 1068662       5
This is the plot with plot(cv.out):
What I want is to make the same plot, but more elaborate, with ggplot2, and I don't know which aesthetics to use. These are the elements of cv.out that I can access with cv.out$:
lambda
cvm
cvsd
cvup
cvlo
nzero
call
name
lambda.min
lambda.1se
Finally, thanks for your help. I really appreciate it. :)
Using example dataset:
library(glmnet)
library(ggplot2)
X = as.matrix(mtcars[,-1])
y = as.matrix(mtcars[,1])
cv.out = cv.glmnet(X,y,alpha=0)
plot(cv.out)
You just need to pull out the values and put into a data.frame, and plot using geom_point() and geom_errorbar() :
df = with(cv.out,
          data.frame(lambda = lambda, MSE = cvm, MSEhi = cvup, MSElow = cvlo))
ggplot(df, aes(x = lambda, y = MSE)) +
  geom_point(col = "#f05454") +
  scale_x_log10("log(lambda)") +
  geom_errorbar(aes(ymin = MSElow, ymax = MSEhi), col = "#30475e") +
  geom_vline(xintercept = c(cv.out$lambda.1se, cv.out$lambda.min),
             linetype = "dashed") +
  theme_bw()
I am having difficulty plotting a log(10) formula on to existing data points. I derived a logarithmic function based on a list of data where "Tout_F_6am" is my independent variable and "clo" is my dependent variable.
When I go to plot it, I get an error saying that the lengths of x and y differ. Can someone please help me figure out what's going wrong?
logKT=lm(log10(clo)~ Tout_F_6am,data=passive)
summary(logKT) #r2=0.12
coef(logKT)
plot(passive$Tout_F_6am,passive$clo) #plot data points
x=seq(53,84, length=6381)#match length of x variable
y=logKT
lines(x,y,type="l",lwd=2,col="red")
length(passive$Tout_F_6am) #6381
length(passive$clo) #6381
Additionally, can the formula curve(-0.0219-0.005*log10(x), add=TRUE, col=2) be written as eq=(10^-0.022)*(10^-0.005*x)? Thanks!
The problem is that you are trying to plot the model object, not the predictions from the model. Try something like this:
Define the explanatory values you want to plot, in a data frame (or tibble). It doesn't have to be as many as there are data points.
library(dplyr)
explanatory_data <- tibble(
  Tout_F_6am = seq(53, 84, 0.1)
)
Add a column of predicted values using predict(). This takes a model and your explanatory data. predict() will return the transformed values, so you have to backtransform them.
prediction_data <- explanatory_data %>%
  mutate(
    log10_clo = predict(logKT, explanatory_data),
    clo = 10 ^ log10_clo
  )
Finally, draw your plot.
plot(clo ~ Tout_F_6am, data = prediction_data, log="y", type = "l")
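On the side question about writing the fit as a closed-form equation: since the model is log10(clo) = a + b * Tout_F_6am, back-transforming gives clo = 10^a * 10^(b * Tout_F_6am). The sketch below checks this against predict() on simulated data (the `toy` data frame is a hypothetical stand-in for `passive`, and its coefficients are made up). Note that in R the expression 10^-0.005*x is read as (10^-0.005)*x, so the exponent needs its own parentheses.

```r
# Hypothetical stand-in for the `passive` data, for illustration only
set.seed(42)
toy <- data.frame(Tout_F_6am = runif(200, 53, 84))
toy$clo <- 10^(-0.02 - 0.005 * toy$Tout_F_6am + rnorm(200, sd = 0.05))

fit <- lm(log10(clo) ~ Tout_F_6am, data = toy)
a <- unname(coef(fit)[1])  # intercept on the log10 scale
b <- unname(coef(fit)[2])  # slope on the log10 scale

x <- seq(53, 84, by = 0.1)

# Back-transformed predictions from the model object
clo_from_predict <- 10^predict(fit, data.frame(Tout_F_6am = x))

# The same curve written as an explicit equation
clo_from_formula <- 10^a * 10^(b * x)   # parentheses around b * x matter
```

With the data points already plotted, lines(x, clo_from_formula, col = "red", lwd = 2) would then overlay the fitted curve.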
The plotting is actually easier using ggplot2. This should give you more or less what you want.
library(ggplot2)
ggplot(passive, aes(Tout_F_6am, clo)) +
geom_point() +
geom_smooth(method = "lm") +
scale_y_log10()
How can I plot the relative proportions of two groups using a fill aesthetic in ggplot2?
I am asking this question here because several other answers on this topic seem incorrect (ex1, ex2, and ex3), but Cross Validated seems to have functionally banned R specific questions (CV meta). ..density.. is conceptually related to, but distinct from proportions (ex4 and ex5). So the correct answer does not seem to involve density.
Example:
set.seed(1200)
test <- data.frame(
test1 = factor(sample(letters[1:2], 100, replace = TRUE,prob=c(.25,.75)),ordered=TRUE,levels=letters[1:2]),
test2 = factor(sample(letters[3:8], 100, replace = TRUE),ordered=TRUE,levels=letters[3:8])
)
ggplot(test, aes(test2)) + geom_bar(aes(y = ..density.., group=test1, fill=test1) ,position="dodge")
#For example, the plotted data shows level a x c as being slightly in excess of .15, but a manual calculation shows a value of .138
counts <- with(test,table(test1,test2))
counts/matrix(rowSums(counts),nrow=2,ncol=6)
The answer that seems to yield an output that is correct resorts to a solution that doesn't use ggplot2 (calculating it outside of ggplot2) or requires that a panel be used rather than a fill aesthetic.
Edit: Digging into stat_bin shows that the function ultimately called is bin, but bin only gets passed the values in the x aes. Without rewriting stat_bin (or making another stat_), the hack that was applied in the above referenced answer can be generalized to the fill aes in the absence of the group aes with the following code for the y aes: y = ..count../sapply(fill, FUN=function(x) sum(count[fill == x])). This just replaces PANEL (the hidden column that is present at the end of StatBin) with fill. Presumably other hidden variables could get the same treatment.
This is an awful hack, but it seems to do what you want...
ggplot(test, aes(test2)) + geom_bar(aes(y = ..count../rep(c(sum(..count..[1:6]), sum(..count..[7:12])), each=6),
group=test1, fill=test1) ,position="dodge") +
scale_y_continuous(name="proportion")
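For comparison, here is a sketch of the pre-computation route that the question mentions, which avoids relying on the row order of ..count... It uses prop.table() from base R to get the proportions within each level of test1; the names `props` and `prop_df` are introduced here for illustration.

```r
set.seed(1200)
test <- data.frame(
  test1 = factor(sample(letters[1:2], 100, replace = TRUE, prob = c(.25, .75)),
                 ordered = TRUE, levels = letters[1:2]),
  test2 = factor(sample(letters[3:8], 100, replace = TRUE),
                 ordered = TRUE, levels = letters[3:8])
)

# Counts of test2 within each level of test1
counts <- with(test, table(test1, test2))

# Row-wise proportions: each level of test1 sums to 1
props <- prop.table(counts, margin = 1)

# Long format, one row per test1 x test2 combination, ready for plotting
prop_df <- as.data.frame(props, responseName = "proportion")
```

The long data frame could then be drawn with ggplot(prop_df, aes(test2, proportion, fill = test1)) + geom_col(position = "dodge").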
I'm working with a dataset where I have to transform some data for a curve fit. I'm plotting it using ggplot2, and can use stat_smooth on the transformed data to get the fit, but then want to overlay the result on the correct datapoints.
As a toy example, let's say I had
qplot(1:10, 1:10)+stat_smooth(formula=y+1~x, method="lm")
But I want to shift the stat_smooth line down by one (other than by taking the +1 out of the formula). Is this possible?
I don't think position_nudge() was available when this was asked 10.5 years ago, but it has provided a simpler way of doing this for some time (as of ggplot2 3.3.5, late 2021).
qplot(1:10, 1:10 + rnorm(10, sd = 0.3)) + stat_smooth(formula = y~x, method = "lm", position = position_nudge(y = 1))
It seems worth cautioning there's a good chance of displaying confusing or misleading confidence intervals when manipulating stat_smooth()'s formula. I've added a bit of variation to qplot()'s input in the line above to illustrate this.
Sometimes things can be very obvious:
qplot(1:10, 1:10)+stat_smooth(formula=(y+1)-1~x, method="lm")
If you can raise it by 1 by adding 1 to y, you can lower it by 1 by subtracting 1 from y. ;-)
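As a quick check of why a constant shift is safe to undo this way, the base-R sketch below (on made-up data) compares a linear fit on y with a fit on y + 1. Only the intercept moves, by exactly 1, while the slope and the coefficient standard errors stay the same. This holds for a constant shift in a linear model; it is not a claim about other transformations.

```r
# Made-up data, for illustration only
set.seed(7)
x <- 1:10
y <- x + rnorm(10, sd = 0.3)

fit_plain   <- lm(y ~ x)
fit_shifted <- lm(I(y + 1) ~ x)

# Difference in (intercept, slope) between the two fits
coef_diff <- unname(coef(fit_shifted) - coef(fit_plain))

# Standard errors of the coefficients in each fit
se_plain   <- unname(summary(fit_plain)$coefficients[, "Std. Error"])
se_shifted <- unname(summary(fit_shifted)$coefficients[, "Std. Error"])
```

Here coef_diff should come out as approximately c(1, 0), which is why moving the drawn line down by 1 recovers the fit on the original y.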