attenuation = data.frame(km =
c(0,0,0.4,0.4,0.8,0.8,1.2,1.2,1.6,1.6,2,2,2.4,2.4,2.8,2.8,3.2,3.2,3.6,3.6,4,
4,4.4,4.4,4.8,4.8,5.2,5.2,5.6,5.6,6,6,6.4,6.4,6.8,6.8,7.2,7.2,7.6,7.6,8,8,
11.7,11.7,13,13), edna = c(76000,20000,0,0,6000,0,0,6880,10700,0,6000,
0,0,0,0,0,0,6000,0,0,0,0,0,0,0,0,6310,0,6000,6000,0,0,0,0,0,
0,0,0,0,0,0,6000,0,0,0,0))
#This worked great for a linear regression
ggplot(attenuation, aes(x = km, y = edna)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) +
xlab("Distance from Cage (km)") +
ylab("eDNA concentration (gene sequence/Liter)")
But the linear regression doesn't seem to be a good fit (R-squared = 0.09), so I'd like to try something else. I tried some other regressions, also with poor fits, so now I'd like to try a nonlinear regression.
I have researched this question on Stack Overflow and tried a number of different options, but nothing is working. The option I provide below makes the most sense to me, but I wonder if I have the formula wrong, or if the start list needs to be modified?
For context I am trying to explore the relationship between river distance and concentration.
#This is not working for a nonlinear regression
ggplot(attenuation, aes(x = km, y = edna))+
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*x^b', method.args=list (start =
list(a = 1,b=1), se=FALSE))
I get the following error from R when I run the nls code above:
Computation failed in stat_smooth():
variable lengths differ (found for '(se)')
You have two problems. First, a misplaced ")": se=FALSE is an argument to stat_smooth(), not to method.args:
ggplot(attenuation, aes(x = km, y = edna))+
geom_point() +
stat_smooth(method='nls', formula='y~a*x^b', method.args=list(start =
list(a=1, b=1)), se=FALSE)
But this will not work either, because your model is impossible to fit to your data. Look at the equation: when x = 0, y will equal 0. For values of x greater than 0, y will increase unless b is negative, but then x = 0 gives Inf, so the algorithm fails whenever it tries negative values. Since you have a decreasing relationship, you need to specify a function that is defined at x = 0, along with plausible starting values. This one-parameter model fits your data better than a linear function (it could also be written as a*(x + 1)^-1, which is essentially your function with 1 added to x so that it is defined at x = 0):
ggplot(attenuation, aes(x = km, y = edna))+
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a/(x + 1)',
method.args=list(start=list(a=50000)), se=FALSE)
I picked 50000 by splitting the difference between 20,000 and 76,000. The final estimate is about 20,000. You can bend the curve more sharply by adding a second parameter, but you have so many 0 values it may be too much depending on what you are trying to communicate:
ggplot(attenuation, aes(x = km, y = edna))+
geom_point() +
stat_smooth(method='nls', formula='y~a*(1+x)^b', method.args=list(start =
list(a=50000, b=-1)), se=FALSE)
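If you want to see the coefficient estimates behind those smooths (stat_smooth() only draws the curve), you can fit the same models directly with nls(). A quick sketch, using the attenuation data frame from the question and the same starting values as above:
# One-parameter and two-parameter fits outside ggplot, so the estimates
# can be inspected with coef() or summary()
fit1 <- nls(edna ~ a / (km + 1), data = attenuation, start = list(a = 50000))
fit2 <- nls(edna ~ a * (1 + km)^b, data = attenuation, start = list(a = 50000, b = -1))
coef(fit1)
coef(fit2)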
I agree with @dcarlson's answer. You've got a pretty small data set here (a total of 11 non-zero data points, two of which fall on top of each other), so you probably shouldn't push any conclusions too hard. The first two points are definitely large, and there might be a mild declining trend after that, but beyond that you can't say too much.
If you want to do the power-law fit, you have to displace the zero-km data point from the origin. I've done it by adding 0.1 to the x values. This is an arbitrary choice on my part and should be thought about carefully on your end ... (note that there's a large difference in the results if you add 0.1 as I did or 1 as @dcarlson did). I also had to put in more reasonable starting values, which I did by fitting a log-log linear regression (lm(log(edna) ~ log(km+0.1), data=attenuation)) and extracting the coefficients (which were approximately 4 and -1.5).
ggplot(attenuation, aes(x = km, y = edna))+
geom_point() +
stat_smooth(method = 'nls', formula = 'y~a*(x+0.1)^b',
method.args=list (start = list(a = exp(4),b=-1.5)), se=FALSE)
You can also do this slightly more efficiently with a log-link Gaussian GLM as follows (you still need to displace the x-values from zero). I also added some code to disambiguate the repeated points.
ggplot(attenuation, aes(x = km, y = edna))+
stat_sum() +
geom_smooth(method="glm", formula=y~log(x+0.1),
method.args=list(family=gaussian(link="log"),
start=c(4,-1.5)))+
scale_size(breaks=c(1,2),range=c(1,3))
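The same GLM can also be fitted outside ggplot if you want to look at the coefficients and standard errors directly. A sketch, assuming the attenuation data frame and the starting values used above:
# log-link Gaussian GLM, equivalent to the model geom_smooth() is fitting above
fit_glm <- glm(edna ~ log(km + 0.1), data = attenuation,
               family = gaussian(link = "log"), start = c(4, -1.5))
summary(fit_glm)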
So I have 2 groups and an x and y variable. I am trying to run a linear regression to see if there is a significant relationship between the x and y variables within each group but I also want to look at the significance between groups. Then I would like to plot those results and provide a p-value, equation, and R^2 value on the graph. How would I go about accomplishing this?
I am able to plot the data on the same graph using this code:
ggplot(data_NeuroPsych, aes(x = Flanker_Ratio, y = Neuropsych_Delta, color = Group)) +
geom_point() +
geom_smooth(method = "lm", fill = NA)
Then using this open source code I was able to look at the results separately: https://github.com/kassambara/ggpubr/blob/master/R/stat_regline_equation.R#L7
The issue with the above is that the data are not on the same plot, and it does not look at the comparison between groups.
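A sketch of one possible approach (my suggestion, not something from the post): ggpubr (the package the linked stat_regline_equation.R file comes from) provides stat_regline_equation() and stat_cor() to annotate each group's equation, R-squared and p-value on a single plot, and a single lm() with an interaction term tests whether the slopes differ between groups. All variable names below come from the question:
library(ggplot2)
library(ggpubr)  # for stat_regline_equation() and stat_cor()

# Per-group regression lines with equation, R^2 and p-value on one plot;
# adjust label.y.npc if the two groups' labels overlap.
ggplot(data_NeuroPsych,
       aes(x = Flanker_Ratio, y = Neuropsych_Delta, color = Group)) +
  geom_point() +
  geom_smooth(method = "lm", fill = NA) +
  stat_regline_equation(label.y.npc = 0.95) +
  stat_cor(aes(label = paste(..rr.label.., ..p.label.., sep = "~`,`~")),
           label.y.npc = 0.85)

# Between-group comparison: the Flanker_Ratio:Group interaction term tests
# whether the slopes differ between the groups.
summary(lm(Neuropsych_Delta ~ Flanker_Ratio * Group, data = data_NeuroPsych))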
First of all, I have to apologize for my poor English. Second, the objective of this post is to reproduce the plot of the ridge regression's MSE with ggplot2 instead of the plot() function included in R.
The cv.out object is created by the following expression:
cv.out <- cv.glmnet(x_var[train,], y_var[train], alpha = 0)
When I print that object, I get the following:
     Lambda   Measure      SE Nonzero
min   439.8  32554969 1044541       5
1se  1343.1  33586547 1068662       5
This is the plot with plot(cv.out):
What I want is to make the same plot, but more elaborate, with ggplot, and I don't know which aesthetics to pass to the function. These are the elements of cv.out when I access the object with cv.out$:
lambda, cvm, cvsd, cvup, cvlo, nzero, call, name, lambda.min, lambda.1se
Finally, thanks for your help. I really appreciate it. :)
Using an example dataset:
library(glmnet)

X = as.matrix(mtcars[, -1])
y = as.matrix(mtcars[, 1])
cv.out = cv.glmnet(X, y, alpha = 0)
plot(cv.out)
You just need to pull the values out into a data.frame and plot them using geom_point() and geom_errorbar():
df = with(cv.out,
          data.frame(lambda = lambda, MSE = cvm, MSEhi = cvup, MSElow = cvlo))

ggplot(df, aes(x = lambda, y = MSE)) +
  geom_point(col = "#f05454") +
  scale_x_log10("log(lambda)") +
  geom_errorbar(aes(ymin = MSElow, ymax = MSEhi), col = "#30475e") +
  geom_vline(xintercept = c(cv.out$lambda.1se, cv.out$lambda.min),
             linetype = "dashed") +
  theme_bw()
I am relatively new to R and obviously not very experienced.
However, I used multilevel modeling to identify influences of voice on sleep parameters.
E.g. TST in this snippet is total sleep time; Intensity is voice intensity (in this case as a mean).
I managed to get a scatterplot coloured by participant, as I wanted. However, I now want to include regression lines for my model that show the intercept and slope of the model vs. the null model (which excludes my independent variable).
Yet, no matter what I try, I do not seem to be able to display the regression lines based on intercept and slope, even after entering their values manually!
Here's my code for the model and for the plot.
Model:
library(lme4)
TST_RE_Intensity.model = lmer(Intensity_mean ~ TST_re + Day + (1+ TST_re|Participant_ID) + ( 1|Filename), data=my.df, REML = FALSE)
TST_RE_Intensity.null = lmer(Intensity_mean ~ Day + (1+ TST_re|Participant_ID) + (1|Filename), data=my.df, REML = FALSE)
Plot:
library(ggplot2)
p <- ggplot(my.df, aes(x = TST_re, y = Intensity_mean, colour = Participant_ID)) +
  theme(legend.position = "none") +
  geom_point(shape = 20) +
  geom_abline(aes(intercept = 64, slope = -0.0167, size = 1.5)) +
  geom_abline(aes(intercept = 61, slope = -0.0162, size = 1.5)) +
  scale_size_manual(values = c(0.3, 0.3)) +
  scale_y_log10(name = "Log10(TST)", limits = c(40, 80)) +
  scale_x_log10(name = "Log10(Intensity)")
I do not get any error messages, but I also never get to see any lines. I tried to follow instructions from the ggplot2 manual, like this one, but came up short:
p + geom_abline(intercept = 37, slope = -5)
Is there a way to just plot a line "manually"?
Thanks in advance!!
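One thing worth checking (my guess, not something stated above): with scale_x_log10()/scale_y_log10(), geom_abline() draws its line in the log10-transformed coordinates, so an intercept of 64 lies far outside a y axis that only spans log10(40) to log10(80); also, fixed values such as intercept, slope and size normally go outside aes(). A minimal sketch of drawing the fixed-effect lines manually, assuming the two lmer fits above and dropping the log scales so the coefficients can be used as-is:
library(lme4)
library(ggplot2)

# Fixed-effect coefficients of the two models fitted above
fe_full <- fixef(TST_RE_Intensity.model)   # (Intercept), TST_re, Day
fe_null <- fixef(TST_RE_Intensity.null)    # (Intercept), Day

ggplot(my.df, aes(x = TST_re, y = Intensity_mean, colour = Participant_ID)) +
  geom_point(shape = 20) +
  theme(legend.position = "none") +
  # constants go outside aes(); both lines ignore the Day adjustment
  geom_abline(intercept = fe_full["(Intercept)"], slope = fe_full["TST_re"]) +
  geom_abline(intercept = fe_null["(Intercept)"], slope = 0, linetype = "dashed")
If you keep the log10 scales, you would instead have to supply the intercept and slope on the log10 scale, or compute predicted values with predict() and draw them with geom_line().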
After many google searches I decided to ask for your help, guys.
I am plotting just some observations at different time points and I want to add a linear regression with stat_smooth. However, I want the linear model to have its intercept at 100 (because the data are percentages relative to time 0). To do that, I found that the easiest way is to use the offset parameter in lm. The problem is how to get the number of y observations per group (colour and facet groups) to pass to the offset parameter.
If I use data with the same number of observations per group (10 in my case), I can just write the number and it works great:
myplot <- ggplot(mydt2, aes(x=Time_point, y=GFP_rel, col=Gene, fill=Gene,group=Gene))
myplot <- myplot + stat_smooth(method='lm', formula = y ~ x + 0, method.args=list(offset=rep(100,10))) +
facet_wrap(~Cell_line)
However, this is not very elegant and/or flexible. My question is: how can I pass the number of observations to method.args? I tried offset(100, ..count..), but I get the error: (list) object cannot be coerced to type 'integer'.
Any suggestions?
Thanks
You can use the I(y - 100) coding in the formula, as shown here, instead of using an offset.
However, the predicted values from stat_smooth will then be predictions of y - 100, not y, so the line will go through 0. You can shift the lines back so they display predictions of the original y variable using position_nudge().
So the stat_smooth code would look something like:
stat_smooth(method = "lm", formula = I(y - 100) ~ x + 0,
position = position_nudge(y = 100))
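For what it's worth, here is a sketch of how that could slot into the plot from the question (mydt2, GFP_rel, Time_point, Gene and Cell_line are the names used above; geom_point() is added just to show the observations):
library(ggplot2)

ggplot(mydt2, aes(x = Time_point, y = GFP_rel, col = Gene, fill = Gene, group = Gene)) +
  geom_point() +
  # fit y - 100 with no intercept, then shift the fitted line and ribbon back up by 100
  stat_smooth(method = "lm", formula = I(y - 100) ~ x + 0,
              position = position_nudge(y = 100)) +
  facet_wrap(~Cell_line)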
I'm trying to plot an exponential decay line (with error bars) onto a scatterplot in ggplot of price information over time. I currently have this:
f2 <- ggplot(data, aes(x=date, y=cost) ) +
geom_point(aes(y = cost), colour="red", size=2) +
geom_smooth(se=T, method="lm", formula=y~x) +
# geom_smooth(se=T) +
theme_bw() +
xlab("Time") +
scale_y_log10("Price over time") +
ggtitle("The Falling Price over time")
print(f2)
The key line is the formula=y~x in the geom_smooth command. Although this looks like a linear model, ggplot seems to detect my scale_y_log10 automatically and log the data.
Now, my issue here is that date is a Date data type. I think I need to convert it to seconds since t = 0 to be able to apply an exponential decay model of the form y = A*e^(-bx).
I believe this because when I tried things like y ~ exp(x), I got a message that I think(?) is telling me I can't take exponents of dates. It reads:
Error in lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, :
NA/NaN/Inf in foreign function call (arg 1)
However, log(y) ~ x works correctly. (y is a numeric data type, x is a date.)
Is there a convenient way to fit exponential growth/decay time series models within ggplot plots in the geom_smooth(formula=formula) function call?
This appears to work, although I don't know how finicky it will be with real/messy data:
set.seed(101)
dat <- data.frame(d=seq.Date(as.Date("2010-01-01"),
as.Date("2010-12-31"),by="1 day"),
y=rnorm(365,mean=exp(5-(1:365)/100),sd=5))
library(ggplot2)
g1 <- ggplot(dat,aes(x=d,y=y))+geom_point()+expand_limits(y=0)
# in current ggplot2, family and start have to be passed via method.args
g1 + geom_smooth(method = "glm", formula = y ~ x,
                 method.args = list(family = gaussian(link = "log"),
                                    start = c(5, 0)))