Fitting a Gaussian to a dataset using geom_density - r

I'm pretty new to R and am trying to analyse some data and fit a Gaussian to it using the ggplot2 package.
I am able to plot a smooth curve using geom_smooth and the results are as expected. However, using geom_density (see code below) the result is not as expected.
ggplot(All_Wavelengths_LabVIEW_selected_)+
geom_smooth(mapping = aes(Actual_Wavelength, B), se = FALSE)+
geom_density(kernel = "gaussian", Actual_Wavelength, B)
Instead of a Gaussian fit, I get:
'Error in fortify(data) : object 'B' not found'
I don't understand how this can occur given it uses B to plot the smooth curve without any issue.
In addition, I would like to do the following:
Extract FWHM value of the peak
Overlay multiple of these Gaussian fits for other sets of data (similar to B) with the same X axis
Is this possible?
Any help on this would be greatly appreciated.
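A likely reading of the error: the two arguments after kernel are unnamed, so they are matched positionally to geom_density's mapping and data parameters instead of going through aes(), and B is then looked up as a standalone object. Note also that geom_density draws a kernel density estimate of a single variable; it does not fit a Gaussian curve to y against x. A minimal base-R sketch of one alternative, a least-squares Gaussian fit with nls(), follows. The data are simulated, and the names spec, wavelength and signal are hypothetical stand-ins for Actual_Wavelength and B.

```r
# Simulated stand-in for the spectrum (assumption: one peak, no baseline offset)
set.seed(42)
wavelength <- seq(500, 600, by = 0.5)
signal <- 10 * exp(-(wavelength - 550)^2 / (2 * 5^2)) +
  rnorm(length(wavelength), sd = 0.2)
spec <- data.frame(wavelength, signal)

# Rough starting values taken from the data itself
A0 <- max(spec$signal)
mu0 <- spec$wavelength[which.max(spec$signal)]
above_half <- spec$wavelength[spec$signal > A0 / 2]
sigma0 <- diff(range(above_half)) / 2.355  # FWHM is roughly 2.355 * sigma

# Least-squares fit of A * exp(-(x - mu)^2 / (2 * sigma^2))
fit <- nls(signal ~ A * exp(-(wavelength - mu)^2 / (2 * sigma^2)),
           data = spec,
           start = list(A = A0, mu = mu0, sigma = sigma0))

# Full width at half maximum of the fitted Gaussian
sigma_hat <- abs(coef(fit)[["sigma"]])
fwhm <- 2 * sqrt(2 * log(2)) * sigma_hat

# Data with the fitted curve on top
plot(signal ~ wavelength, data = spec)
lines(spec$wavelength, predict(fit), col = "red")
```

To overlay several data sets on the same x axis, the same fit could be repeated for each one and each fitted curve added with another lines() call (or, in ggplot2, one geom_line() layer per set of predictions).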

Related

How to easily show the equation behind ggplot's geom_smooth

Is there any simple command to show the geom_smooth equation of a non-linear relationship? Something as simple as "show.equation". The equation has to be somewhere; I just want to retrieve the one used by default.
ggplot(dataset, aes(x=variablex, y=variabley)) +
geom_point()+
geom_smooth()+
theme_bw()
If you look at the documentation for geom_smooth and stat_smooth, you can see that it uses stats::loess for small data sets (fewer than 1,000 observations) and mgcv::gam otherwise:
For method = NULL the smoothing method is chosen based on the size of
the largest group (across all panels). stats::loess() is used for less
than 1,000 observations; otherwise mgcv::gam() is used with formula = y ~ s(x, bs = "cs") with method = "REML". Somewhat anecdotally, loess
gives a better appearance, but is O(N^2) in memory, so does not work
for larger datasets.
So if you want to use the model implied by the geom_smooth fit, you could just call the underlying method (e.g. stats::loess(variabley ~ variablex, data = dataset)) and then use the predict method to calculate values for new data.
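A minimal sketch of that approach, using simulated data in place of the question's dataset (the column names variablex and variabley are taken from the question; the values are made up). Note that loess is a local fit, so it has no single closed-form equation to print; predictions are the practical way to use the model.

```r
# Simulated stand-in for the question's data
set.seed(1)
dataset <- data.frame(variablex = seq(0, 10, length.out = 200))
dataset$variabley <- sin(dataset$variablex) + rnorm(200, sd = 0.2)

# Fit loess directly, with its default settings
fit <- stats::loess(variabley ~ variablex, data = dataset)

# Predict at new x values
new_x <- data.frame(variablex = c(2.5, 5, 7.5))
pred <- predict(fit, newdata = new_x)
pred

# summary() reports the span, degree and equivalent number of parameters
summary(fit)
```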

Set y-axis for glm probit regression in effect plot

I am currently fitting a binomial probit GLM in R.
For analysis of interaction effects, I use the effects package. I want to plot different interactions, where one of the interacting variables is held constant at a fixed level. I do this for several values of the variable to see how the effect evolves.
I use the following operation:
plot(effect("varL",hx1,given.values=c("varP"=0.7)))
plot(effect("varL",hx1,given.values=c("varP"=0.1)))
However, to graphically compare the different plots, the y-axis should be the same for all plots, which is not the case. When plotting for different varP values, the y axis changes its range.
When specifying ylim, the plot is also incorrect and shows a completely different segment than specified.
I tried what was recommended in this post (Scaling axis in a GLM plot with library "effects"), however, it resulted in an error message:
plot(effect("varL",hx1,given.values=c("varP"=0.7)), ylim = qlogis(c(0, 0.20)))
Error in valid.viewport(x, y, width, height, just, gp, clip, xscale, yscale, :
invalid 'yscale' in viewport
Now my question: how can I set the y-axis for plotting interaction effects with the effect package using a probit glm model? I am sure the problem is that ylim takes the values as specified without adjusting them into the logit and probit scale. qlogis likely works for logit, but not probit.
Below some code to replicate the issue. You see that the y axis "jumps around", which I want to avoid.
install.packages("effects")
require(effects)
varL <- rnorm(100, mean = 1000, sd = 10)
varP <- rnorm(100, mean = 5)
entry <- as.factor(sample(0:1, 100, replace = TRUE))
dat <- data.frame(varL, varP, entry)
hx1 <- glm(entry ~ varL*varP, data = dat, family = binomial(link = "probit"))
plot(effect("varL",hx1,given.values=c("varP"=min(dat$varP))))
plot(effect("varL",hx1,given.values=c("varP"=max(dat$varP))))
Here are the plots with the "jumping" y-axes:
I had a similar problem with logistic regression, actually, and I followed the advice in "Change the y axis on Effect plot in R".
Basically, all I needed to do was add the argument rescale.axis=FALSE in addition to ylim=c(0,1).
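As for the invalid 'yscale' error in the question, one probable cause is that qlogis(0) is -Inf, which cannot serve as an axis limit. The base-R sketch below shows this and computes finite limits on both link scales; the probabilities 0.01 and 0.20 are illustrative choices, not values from the question.

```r
# qlogis(0) is -Inf, so ylim = qlogis(c(0, 0.20)) has a non-finite lower limit
qlogis(c(0, 0.20))

# Finite limits: pick probabilities strictly between 0 and 1
p_lim <- c(0.01, 0.20)

# Logit link: the quantile function of the logistic distribution
logit_lim <- qlogis(p_lim)

# Probit link: the quantile function of the standard normal distribution
probit_lim <- qnorm(p_lim)

logit_lim
probit_lim
```

If the plot's y axis is on the link scale, probit_lim would be the candidate to pass as ylim for a probit model; whether that is needed depends on how the installed version of the effects package scales the axis.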

Plot Non-linear Mixed Model Over Original Fitted Data

I'm trying to plot the curve that results from fitting a non-linear mixed model. It should look something like a normal distribution curve, but skewed to the right. I followed previous links here and here, but with my own data I cannot get it to work, for several reasons (see below).
Here is the dataset
and code
s=read.csv("GRVMAX tadpoles.csv")
t=s[s$SPP== levels(s$SPP)[1],]
head(t)
vmax=t[t$PERFOR=="VMAX",]
colnames(vmax)[6]="vmax"
vmax$TEM=as.numeric(as.character(vmax$TEM));
require(lme4)
start =c(TEM=25)
is.numeric(start)
nm1 <- nlmer ( vmax ~ deriv(TEM)~TEM|INDIVIDUO,nlpars=start, nAGQ =0,data= vmax) # this gives an error suggesting nlpars is not numeric, even though start is numeric
After that, I want to plot the curve over the original data
with(vmax,plot(vmax ~ (TEM)))
x=vmax$TEM
lines(x, predict(nm1, newdata = data.frame(TEM = x, INDIVIDUO = "ACI5")))
Any hint?
Thanks in advance

add a logarithmic regression line to a scatterplot (comparison with Excel)

In Excel, it's pretty easy to fit a logarithmic trend line to a given data set. Just click "Add Trendline" and then select "Logarithmic". Having switched to R for more power, I am a bit lost as to which function to use to generate this.
To generate the graph, I used ggplot2 with the following code.
ggplot(data, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)+
stat_smooth(method='loess')
But this code performs local polynomial regression (loess), which fits many small local regressions rather than a single logarithmic curve. My question is whether R has a log trend line similar to the one in Excel.
Alternatively, I would like to get a log equation of the form y = c*ln(x) + b; is there a coef() function to get 'c' and 'b'?
Let my data be:
c(0.599885189,0.588404133,0.577784156,0.567164179,0.556257176,
0.545350172,0.535112897,0.52449292,0.51540375,0.507271336,0.499904325,
0.498851894,0.498851894,0.497321087,0.4964600,0.495885955,0.494068121,
0.492154612,0.490145427,0.486892461,0.482395714,0.477229238,0.471010333)
The above data are the y-values, while the x-values are simply the integers 1:length(y). In Excel, I can simply plot this and add a logarithmic trend line, and the result looks like this:
With black being the log. In R, how would one do this with the above dataset?
I prefer to use base graphics instead of ggplot2:
#some data with a linear model
x <- 1:20
set.seed(1)
y <- 3*log(x)+5+rnorm(20)
#plot data
plot(y~x)
#fit log model
fit <- lm(y~log(x))
#look at result and statistics
summary(fit)
#extract coefficients only
coef(fit)
#plot fit with confidence band
matlines(x=seq(from=1,to=20,length.out=1000),
y=predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000)),
interval="confidence"))
#some data with a non-linear model
set.seed(1)
y <- log(0.1*x)+rnorm(20,sd=0.1)
#plot data
plot(y~x)
#fit log model
fit <- nls(y~log(a*x),start=list(a=0.2))
#look at result and statistics
summary(fit)
#plot fit
lines(seq(from=1,to=20,length.out=1000),
predict(fit,newdata=list(x=seq(from=1,to=20,length.out=1000))))
You can easily specify alternative smoothing methods (such as lm(), linear least-squares fitting) and an alternative formula
library(ggplot2)
g0 <- ggplot(dat, aes(horizon, success)) + geom_line() + geom_area(alpha=0.3)
g0 + stat_smooth(method="lm",formula=y~log(x),fill="red")
The confidence bands are automatically included: I changed the color to make them visible since they're very narrow. You can use se=FALSE in stat_smooth to turn them off.
The other answer shows you how to get the coefficients:
coef(lm(success~log(horizon),data=dat))
I can imagine you might next want to add the equation to the graph: see Adding Regression Line Equation and R2 on graph
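Applied to the data posted in the question, the same idea can be sketched as follows. The variable names success and horizon follow the question's ggplot call, and c_hat and b are hypothetical names for the coefficients of y = c*ln(x) + b.

```r
# y-values from the question; x is simply 1, 2, ..., length(y)
success <- c(0.599885189, 0.588404133, 0.577784156, 0.567164179, 0.556257176,
             0.545350172, 0.535112897, 0.52449292, 0.51540375, 0.507271336,
             0.499904325, 0.498851894, 0.498851894, 0.497321087, 0.4964600,
             0.495885955, 0.494068121, 0.492154612, 0.490145427, 0.486892461,
             0.482395714, 0.477229238, 0.471010333)
horizon <- seq_along(success)

# Ordinary least squares on log(x): success = b + c_hat * log(horizon)
fit <- lm(success ~ log(horizon))
b <- coef(fit)[["(Intercept)"]]
c_hat <- coef(fit)[["log(horizon)"]]

# Data with the fitted logarithmic trend line
plot(horizon, success)
lines(horizon, b + c_hat * log(horizon), col = "black")
```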
I'm pretty sure a simple +scale_y_log10() would get you what you wanted. GGPlot stats are calculated after transformations, so the loess() would then be calculated on the log transformed data.
I've just written a blog post here that describes how to match Excel's logarithmic curve fitting exactly. The nub of the approach centers around the lm() function:
# Set x and data.to.fit to the independent and dependent variables
data.to.fit <- c(0.5998,0.5884,0.5777,0.5671,0.5562,0.5453,0.5351,0.524,0.515,0.5072,0.4999,0.4988,0.4988,0.4973,0.49,0.4958,0.4940,0.4921,0.4901,0.4868,0.4823,0.4772,0.4710)
x <- c(seq(1, length(data.to.fit)))
data.set <- data.frame(x, data.to.fit)
# Perform a logarithmic fit to the data set
log.fit <- lm(data.to.fit~log(x), data=data.set)
# Print out the intercept, log(x) parameters, R-squared values, etc.
summary(log.fit)
# Plot the original data set
plot(data.set)
# Add the log.fit line with confidence intervals
matlines(predict(log.fit, data.frame(x=x), interval="confidence"))
Hope that helps.

fitting a distribution graphically

I am running some tests to try to determine what distribution my data follows. From the look of the density of my data, I thought it resembled a logistic distribution. I then used the MASS package to estimate the parameters of the distribution. However, when I graph them together, the logistic fit, although better than the normal, is still not very good. Is there a way to find which distribution would fit better? Thank you for the help!
library(quantmod)
getSymbols("^NDX",src="yahoo", from='1997-6-01', to='2012-6-01')
daily<- allReturns(NDX) [,c('daily')]
dailySerieTemporel<-ts(data=daily)
x<-na.omit(dailySerieTemporel)
library(MASS)
(xFit<-fitdistr(x,"logistic"))
# location scale
# 0.0005210570 0.0106366354
# (0.0002941922) (0.0001444678)
xFitEst<-coef(xFit)
plot(density(x))
set.seed(125)
lines(density(rlogis(length(x), xFitEst['location'], xFitEst['scale'])), col=3)
lines(density(rnorm(length(x), mean(x), sd(x))), col=2)
This is elementary R: plot() creates a new plotting canvas by default, and you should use a command such as lines() to add to an existing plot.
This works for your example:
plot(density(x))
lines(density(rlogis(length(x), location = 0.0005210570,
scale = 0.0106366354)), col="blue")
as it adds the estimated logistic fit in blue to your existing plot.
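On the question of which distribution fits better, one option (a sketch, not the only approach) is to fit several candidates with fitdistr() and compare them by AIC, then overlay the theoretical densities instead of densities of simulated draws. The data below are simulated stand-ins for the returns, expressed in percent, so the numbers are illustrative only.

```r
library(MASS)

# Simulated stand-in for daily returns in percent (assumption: not the NDX data)
set.seed(125)
x <- rlogis(5000, location = 0.05, scale = 1)

# Maximum-likelihood fits of two candidate distributions
fit_norm <- fitdistr(x, "normal")
fit_logis <- fitdistr(x, "logistic")

# Lower AIC indicates the better trade-off between fit and complexity
aic <- c(normal = AIC(fit_norm), logistic = AIC(fit_logis))
aic

# Empirical density with the fitted theoretical densities on top
plot(density(x), main = "Empirical vs fitted densities")
curve(dnorm(x, coef(fit_norm)[["mean"]], coef(fit_norm)[["sd"]]),
      add = TRUE, col = "red")
curve(dlogis(x, coef(fit_logis)[["location"]], coef(fit_logis)[["scale"]]),
      add = TRUE, col = "blue")
```

Using dnorm() and dlogis() directly avoids the extra noise that comes from estimating a density on random draws from the fitted distribution.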
