I would like to plot a smooth titration curve from empirical values in R. Unfortunately, I have not been able to calculate the inflection point of the curve, where the equivalence point is located.
Do you have any ideas on how I can do this?
par(mfrow=c(1, 1))
vtit <- c(7.05, 9.00, 11.10, 13.00, 15.00, 17.00, 18.05, 18.95, 20.00, 21.00,
21.95, 23.05, 24.00, 25.05, 26.00, 28.10, 30.00, 33.05, 36.10, 39.05,
41.10, 42.10, 42.55, 43.15, 44.99)
vtit. <- vtit - 7.05
pH <- c(2.99, 3.48, 3.82, 4.02, 4.18, 4.30, 4.37, 4.42, 4.45, 4.51, 4.57, 4.64,
4.67, 4.74, 4.79, 4.86, 4.99, 5.18, 5.42, 5.77, 6.33, 9.01, 10.62,
11.06, 11.39)
plot(vtit., pH, type="o", lwd=2, main="Titration of acetic acid with 0.86M NaOH",
cex.main=0.8, xlab=expression(italic(V[NaOH])), ylab=expression(pH))
model <- lm(pH ~ poly(vtit.,17))
pHcurve <- predict(model)
lines(vtit., pHcurve, col='green', lwd=2)
abline(v=34.9, lty=2)
One option is to use the approx() function to interpolate the curve onto a finer grid. In the code below I use 200 points; you may want to increase or decrease this value to see how the results change.
For this example the method works reasonably well:
plot(vtit., pH, type="o", lwd=2, main="Titration of acetic acid with 0.86M NaOH",
cex.main=0.8, xlab=expression(italic(V[NaOH])), ylab=expression(pH))
# lag() below comes from dplyr, not base R
library(dplyr)
# interpolate onto a finer grid with approx()
app <- approx(vtit., pH, n=200)
# calculate the slope between successive interpolated points
slope <- (app$y - lag(app$y)) / (app$x - lag(app$x))
#find the titration point with max slope
equ_pt <- app$x[which.max(slope)]
#plot the initial estimate against the point found above
abline(v=34.9, lty=2)
abline(v=equ_pt, lty=2, col="red")
Here is the original chart with the initial estimate and the current estimate in red.
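Alternatively, you could fit a smoothing spline and locate the maximum of its first derivative. A rough sketch (the default smoothing parameter chosen by smooth.spline() may need tuning for such a sharp jump):
# fit a smoothing spline to the titration data
sp <- smooth.spline(vtit., pH)
# evaluate the first derivative on a fine grid
grid <- seq(min(vtit.), max(vtit.), length.out = 500)
d1 <- predict(sp, grid, deriv = 1)
# the equivalence point is where the slope is largest
grid[which.max(d1$y)]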
I am trying to fit Planck's equation to a data frame but keep getting an error.
# create data frame from provided data
data <- data.frame(x = c(2.27, 2.72, 3.18, 3.63, 4.08, 4.54, 4.99, 5.45, 5.90, 6.35, 6.81, 7.26, 7.71, 8.17, 8.62, 9.08, 9.53, 9.98, 10.44, 10.89, 11.34, 11.80, 12.25, 12.71, 13.16, 13.61, 14.07, 14.52, 14.97, 15.43, 15.88, 16.34, 16.79, 17.24, 17.70, 18.15, 18.61, 19.06, 19.51, 19.97, 20.42, 20.87, 21.33),
brightness = c(200.723, 249.508, 293.024, 327.770, 354.081, 372.079, 381.493, 383.478, 378.901, 368.833, 354.063, 336.278, 316.076, 293.924, 271.432, 248.239, 225.940, 204.327, 183.262, 163.830, 145.750, 128.835, 113.568, 99.451, 87.036, 75.876, 65.766, 57.008, 49.223, 42.267, 36.352, 31.062, 26.580, 22.644, 19.255, 16.391, 13.811, 11.716, 9.921, 8.364, 7.087, 5.801, 4.523))
# store the frequency as a column of the data frame; the original free-standing
# vector left data$freq NULL in the seq() and plot() calls below
data$freq <- data$x * 29.9792458
planck <- function(freq, t, h, c, k) {
(2 * h * freq^3) / (c^2 * (exp((h * freq) / (k * t)) - 1))
}
# fit the data using nls (the stats package is attached by default, so no library() call is needed)
h <- 6.62607e-34  # Planck constant (J s)
c <- 3e8          # speed of light (m/s)
k <- 1.38065e-23  # Boltzmann constant (J/K)
fit <- nls(brightness ~ planck(freq, t, h, c, k), start = list(t = 3), data = data)
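# aside: even with freq stored as a column, double-check the units here. With SI
# constants and freq in GHz, the Planck formula returns values many orders of
# magnitude smaller than these brightness readings, so the fitted t may not be
# physically meaningful unless freq is converted to Hz and the brightness units
# are accounted for.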
# create a sequence of frequencies to use for the fitted curve
freq_seq <- seq(min(data$freq), max(data$freq), length.out = 100)
# use the predict function to generate the fitted curve
brightness_fit <- predict(fit, list(freq = freq_seq))
# create a new data frame with the fitted curve
fit_data <- data.frame(freq = freq_seq, brightness = brightness_fit)
# plot the data and the fitted curve
plot(data$freq, data$brightness, xlab = "Frequency (Hz)", ylab = "Brightness")
lines(fit_data$freq, fit_data$brightness, col = "red")
I am expecting it to create a scatter plot of the original data, with frequency on the x-axis and brightness on the y-axis, and then add a line for the fitted curve in red.
I would like to plot a threshold model with smooth 95% confidence interval lines between line segments. You would think this would be on the simple side but I have not been able to find an answer!
My threshold/breakpoints are known; it would be great if there were a way to visualize this data. I have tried the segmented package, which produces the following plot:
The plot shows a threshold model with a breakpoint at 5.4. However, the confidence intervals are not smooth between regression lines.
If anyone knows of any way to produce smooth (i.e. without the jump between line segments) CI lines between segmented regression lines (ideally in ggplot) that would be amazing. Thank you so much.
I have included sample data and the code I have tried below:
x <- c(2.26, 1.95, 1.59, 1.81, 2.01, 1.63, 1.62, 1.19, 1.41, 1.35, 1.32, 1.52, 1.10, 1.12, 1.11, 1.14, 1.23, 1.05, 0.95, 1.30, 0.79,
0.81, 1.15, 1.10, 1.29, 0.97, 1.05, 1.05, 0.84, 0.64, 0.80, 0.81, 0.61, 0.71, 0.75, 0.30, 0.30, 0.49, 1.13, 0.55, 0.77, 0.51,
0.67, 0.43, 1.11, 0.29, 0.36, 0.57, 0.02, 0.22, 3.18, 3.79, 2.49, 2.44, 2.12, 2.45, 3.22, 3.44, 3.86, 3.53, 3.13)
y <- c(22.37, 18.93, 16.99, 15.65, 14.62, 13.79, 13.09, 12.49, 11.95, 11.48, 11.05, 10.66, 10.30, 9.96, 9.65, 9.35, 9.07, 8.81,
8.56, 8.32, 8.09, 7.87, 7.65, 7.45, 7.25, 7.05, 6.86, 6.68, 6.50, 6.32, 6.15, 5.97, 5.80, 5.63, 5.47, 5.30,
5.13, 4.96, 4.80, 4.63, 4.45, 4.28, 4.09, 3.90, 3.71, 3.50, 3.27, 3.01, 2.70, 2.28, 22.37, 16.99, 11.05, 8.81,
8.56, 8.32, 7.25, 7.05, 6.50, 6.15, 5.63)
library(segmented)
lin.mod <- lm(y ~ x)
segmented.mod <- segmented(lin.mod, seg.Z = ~x, psi=2)
plot(x, y)
plot(segmented.mod, add=TRUE, conf.level = 0.95)
which produces the following plot (and associated jumps in 95% confidence intervals):
[segmented plot]
Background: The non-smoothness in existing change-point packages is due to the fact that frequentist packages operate with a fixed change-point value. But as with all inferred parameters, this is wrong: there is real uncertainty about the location of the change.
Solution: AFAIK, only Bayesian methods can quantify that uncertainty, and the mcp package fills this space.
library(mcp)
model = list(
y ~ 1 + x, # Segment 1: Intercept and slope
~ 0 + x # Segment 2: Joined slope (no intercept change)
)
fit = mcp(model, data = data.frame(x, y))
Default plot (plot.mcpfit() returns a ggplot object):
plot(fit) + ggtitle("Default plot")
Each line represents a possible model that generated the data. The posterior for the change point is shown as a blue density. You can add a credible interval on top using plot(fit, q_fit = TRUE) or plot it alone:
plot(fit, lines = 0, q_fit = c(0.025, 0.975), cp_dens = FALSE) + ggtitle("Credible interval only")
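If you want the numbers rather than a plot, the posterior summary includes the change point:
summary(fit)  # posterior means and intervals for all parameters, including cp_1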
If your change point is indeed known and if you want to model different residual scales for each segment (i.e., quasi-emulate segmented), you can do:
model2 = list(
y ~ 1 + x,
~ 0 + x + sigma(1) # Add intercept change in residual scale
)
fit = mcp(model2, data = data.frame(x, y), prior = list(cp_1 = 1.9))  # Note: the prior is a fixed value, not a distribution.
plot(fit, q_fit = TRUE, cp_dens = FALSE)
Notice that the CI does not "jump" around the change point as in segmented. I believe that this is the correct behavior. Disclosure: I am the author of mcp.
I am a newbie working with streamflow duration curves and the fdc() function from the hydroTSM package.
I am working with more than 300 series, and I am interested in saving the low-flow threshold value (Qlow.thr) that appears in the generated plot.
Here is a reproducible example:
dat <- c(13.05, 90.29, 5.68, 49.13, 26.39, 15.06, 23.39, 17.98, 4.21, 2.51,
         38.29, 8.57, 2.48, 3.78, 18.09, 15.16, 13.46, 8.69, 6.85, 11.97,
         12.10, 9.87, 21.89, 2.60, 2.40, 27.40, 4.94, 83.17, 12.10, 5.08,
         12.42, 6.19, 3.60, 32.58, 53.69, 38.49, 3.61, 14.84, 34.48, 1.91,
         21.79, 31.53, 6.70, 9.52, 22.64, 1.80, 8.13, 10.60, 12.73, 4.17,
         6.70, 16.45)
library(hydroTSM)
fdc(dat, plot = TRUE, lQ.thr = 0.8, ylab = 'Hm3',
    main = 'Upstream monthly duration curve', thr.shw = TRUE)
The fdc function returns a vector of probabilities, but I am not sure how to convert these probabilities back to the original units and pick out the value at the 80% exceedance probability, expressed in Hm3, the way I would use qnorm when working with normal probabilities.
Thank you so much.
You can construct the FDC yourself:
dat <- c(13.05, 90.29, 5.68, 49.13, 26.39, 15.06, 23.39, 17.98,
         4.21, 2.51, 38.29, 8.57, 2.48, 3.78, 18.09, 15.16,
         13.46, 8.69, 6.85, 11.97, 12.10, 9.87, 21.89, 2.60,
         2.40, 27.40, 4.94, 83.17, 12.10, 5.08, 12.42, 6.19,
         3.60, 32.58, 53.69, 38.49, 3.61, 14.84, 34.48, 1.91,
         21.79, 31.53, 6.70, 9.52, 22.64, 1.80, 8.13, 10.60,
         12.73, 4.17, 6.70, 16.45)
dat <- sort(dat, decreasing = T)
df <- data.frame(x = 100/length(dat) * 1:length(dat), y = dat)
plot(x = df$x, y = df$y, type = "l", log = "y")
So the sorted flow data is simply plotted against the percentage-exceedance scale. This scale is created by dividing 100% by the number of data points, which gives the increment for each point.
Note that a flow exceeded 80% of the time is the 20th percentile of the flow distribution (p_quantile = 1 - p_exceedance). Therefore
quantile(dat, p = c(0.2, 0.8), type = 1)
gives you your desired results.
Notice that fdc computes the quantile differently. It seems to just use
p <- c(0.8, 0.2)
dat[round(p * length(dat))]  # dat is still sorted in decreasing order here
> [1] 4.21 27.40
to compute the values.
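If you need the flow at an exceedance probability that does not coincide with one of the plotted points, you can interpolate along the constructed curve; here is a small helper of my own (not part of hydroTSM):
# read the FDC at an arbitrary exceedance probability by linear
# interpolation between the sorted points
fdc_at <- function(flows, p_exceed) {
  s <- sort(flows, decreasing = TRUE)
  ex <- seq_along(s) / length(s)  # exceedance fraction of each point
  approx(ex, s, xout = p_exceed)$y
}
fdc_at(dat, 0.8)  # flow exceeded 80% of the time, in the original units (Hm3)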
I have seen some papers that deal (not completely) with residuals in a regression analysis in a smart way: they plot the residual distribution perpendicular to the fitted line.
Example image from Figure 2 or 5 (linear regression) in: https://www.nature.com/articles/nn.4538#results
My R example:
The example data is taken from: https://www.r-bloggers.com/simple-linear-regression-2/
Data example:
alligator = data.frame(
lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)
Linear regression model:
reg <- lm(alligator$lnWeight ~ alligator$lnLength)
Scatter plot:
plot(alligator,
xlab = "Snout vent length (inches) on log scale",
ylab = "Weight (pounds) on log scale",
main = "Alligators in Central Florida"
)
Fitted line:
abline(reg,col = "black", lwd = 1)
Residual distribution (histogram):
hist(reg$residuals, 10, xaxt='n', yaxt='n', ann=FALSE)
I would like to insert the histogram at the top right of the linear regression plot, as in the example image from Figure 2 or 5 (linear regression) in: https://www.nature.com/articles/nn.4538#results
Thanks for the help.
This will work to get the residual histogram overlaid on the main plot. You'll have to do a bit of work to get it to be angled perpendicular, as in the examples you cite.
library("ggplot2")
theme_set(theme_minimal())
alligator = data.frame(
lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)
reg <- lm(alligator$lnWeight ~ alligator$lnLength)
# make main plot, with best fit line (set se=TRUE to get error ribbon)
main_plot <- ggplot(alligator, aes(x=lnLength, y=lnWeight)) +
geom_point() + geom_smooth(method="lm", se=FALSE) +
scale_y_continuous(limits=c(0,7))
# create another plot, histogram of the residuals
added_plot <- ggplot(data.frame(resid=reg$residuals), aes(x=resid)) +
geom_histogram(bins=5) +
theme(panel.grid=element_blank(),
axis.text.y=element_blank(),
axis.text.x=element_text(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
axis.ticks.y=element_blank(),
axis.line.y=element_blank())
# turn the residual plot into a ggplotGrob() object
added_plot_grob <- ggplot2::ggplotGrob(added_plot)
# then add the residual plot to the main one as a custom annotation
main_plot + annotation_custom(grob=added_plot_grob,
xmin=4.0, xmax=4.35, ymin=1, ymax=5)
Then check out the docs for ggplot2:: and gridExtra:: to figure out the rotation. Hope this helps!
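If you want to attempt the rotation right away, one possibility (a sketch, not a polished solution) is to wrap the grob in a rotated grid viewport before handing it to annotation_custom(). Keep in mind that the on-screen angle of the regression line depends on the panel's aspect ratio, so matching it exactly may require coord_fixed() or some trial and error:
library(grid)
# angle of the fitted line in data units; only matches the drawn line
# exactly under a 1:1 aspect ratio
ang <- atan(coef(reg)[2]) * 180 / pi
rotated_grob <- grobTree(added_plot_grob, vp = viewport(angle = ang))
main_plot + annotation_custom(grob = rotated_grob,
                              xmin = 4.0, xmax = 4.35, ymin = 1, ymax = 5)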
I am trying to fit a negative exponential to some data in R, but the fitted line looks too high compared to the data, whereas the fit I get using Excel's built-in power fit looks more believable. Can someone tell me why? I've tried using the nls() function and also optim() and get similar parameters from both of those methods, but the fits for both look high.
x <- c(5.96, 12.86, 8.40, 2.03, 12.84, 21.44, 21.45, 19.97, 8.92, 25.00, 19.90, 20.00, 20.70, 16.68, 14.90, 26.00, 22.00, 22.00, 10.00, 5.70, 5.40, 3.20, 7.60, 0.59, 0.14, 0.85, 9.20, 0.79, 1.40, 2.68, 1.91)
y <- c(5.35, 2.38, 1.77, 1.87, 1.47, 3.27, 2.01, 0.52, 2.72, 0.85, 1.60, 1.37, 1.48, 0.39, 2.39, 1.83, 0.71, 1.24, 3.14, 2.16, 2.22, 11.50, 8.32, 38.98, 16.78, 32.66, 3.89, 1.89, 8.71, 9.74, 23.14)
xy.frame <- data.frame(x,y)
nl.fit <- nls(formula=(y ~ a * x^b), data=xy.frame, start = c(a=10, b=-0.7))
a.est <- coef(nl.fit)[1]
b.est <- coef(nl.fit)[2]
plot(x=xy.frame$x,y=xy.frame$y)
# curve looks too high
curve(a.est * x^b.est , add=T)
# these parameters from Excel seem to fit better
curve(10.495 * x^-0.655, add=T)
# alternatively use optim()
theta.init <- c(1000,-0.5, 50)
exp.nll <- function(theta, data){
a <- theta[1]
b <- theta[2]
sigma <- theta[3]
obs.y <- data$y
x <- data$x
pred.y <- a*x^b
nll <- -sum(dnorm(x=obs.y, mean=pred.y , sd=sigma, log=T))
nll
}
fit.optim <- optim(par=theta.init,fn=exp.nll,method="BFGS",data=xy.frame )
plot(x=xy.frame$x,y=xy.frame$y)
# still looks too high (optim converges to essentially the same parameters)
curve(fit.optim$par[1] * x^fit.optim$par[2], add=T)
The reason you're seeing the unexpected behavior is that the curves that look "too high" actually have much lower sums of squared errors than the curves from Excel:
# Fit from nls
sum((y - a.est*x^b.est)^2)
# [1] 1588.313
# Fit from Excel
sum((y - 10.495 * x^-0.655)^2)
# [1] 1981.561
The reason nls favors the higher curve is that it is working to avoid huge errors at small x values at the cost of slightly larger errors at large x values. One way to address this might be to apply a log-log transformation:
mod <- lm(log(y)~log(x))
(a.est2 <- exp(coef(mod)["(Intercept)"]))
# (Intercept)
# 10.45614
(b.est2 <- coef(mod)["log(x)"])
# log(x)
# -0.6529741
These are quite close to the coefficients from Excel, and they yield a more visually appealing fit (despite the worse performance on the sum-of-squared-errors metric):
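A minimal completion of that last step (the original snippet stopped at the colon):
plot(x = xy.frame$x, y = xy.frame$y)
curve(a.est2 * x^b.est2, add = TRUE)
If you would rather stay with nls, weighting is another option (a suggestion of mine, not from the original post): weights of 1/y^2 roughly correspond to minimizing relative rather than absolute error, which similarly keeps the small-x points from dominating the fit.
nl.fit.w <- nls(y ~ a * x^b, data = xy.frame, start = c(a = 10, b = -0.7),
                weights = 1/y^2)
coef(nl.fit.w)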