How to fix 'non-conformable arrays' error in SVM? - r

Currently practicing SVM with a sample dataset and I'm trying to add the abline to the plot but says that w[1,2] is not the correct dimensions, leading me back to the created variable 'w' in which 'coefs' and 'SV' are not wanting to work in the code
I've tried lowercasing SV, which works but results in w | numeric (empty) instead of num [1, 1:2] -0.035 -0.0188.
structure(list(Height = c(44, 52.1, 57.1, 33, 27.8, 27.2, 32,
45.1, 56.7, 56.9, 122.1, 123.9, 122.9, 101.1, 128.9, 137.1, 127,
103, 141.6, 102.4), Weight = c(126.3, 136.9, 109.2, 148.3, 110.4,
107.8, 128.4, 120.2, 140.2, 139.2, 154.1, 170.8, 183.1, 164,
193.6, 181.7, 164.8, 174.6, 185.8, 176.9)), class = "data.frame", row.names = c(NA,
-20L))
#create an is horse indicator variable
ishorse <- c(rep(-1,10),rep(+1,10))
library(e1071)
#create data frame for performing svm
data <- data.frame(Height= animal_data['Height'],
Weight= animal_data['Weight'],
animal=as.factor(ishorse))
#plot data
plot(data[,-3],col=(3)/2, pch=19); abline(h=0,v=0,lty=3)
#perform svm
svm.model <- svm(animal ~ .,
data=data,
type='C-classification',
kernel='linear',
scale=FALSE)
#show support vectors
points(data[svm.model$index, c(1,2)], col="orange", cex = 2)
#get parameters of hyperplane
w <- t(svm.model$coefs) %% svm.model$SV
b <- -svm.model$rho
#in this 2D case the hyperplane is the line w[1,1]*x1 + w[1,2]*2 +b = 0
abline(a=-b/w[2,2], b=-w[,]/w[,], col = "blue", lty = 3)

The correct syntax for matrix multiplication is %*%. Try this:
svm.model <- svm(animal ~ .,
data=data,
type='C-classification',
kernel='linear',
scale=FALSE)
#show support vectors
points(data[svm.model$index, c(1,2)], col="orange", cex = 2)
#get parameters of hyperplane
w <- t(svm.model$coefs) %*% svm.model$SV
b <- -1*svm.model$rho
#in this 2D case the hyperplane is the line w[1,1]*x1 + w[1,2]*2 +b = 0
abline(a=-b/w[1,2], b=-w[1]/w[2], col = "blue", lty = 3)

Related

How to plot fitted,observed and forecast values in the same graph using R

So, I have a nnetar model and I want to plot my fitted, observed (actual) and forecast in the same plot with different colors and legend. It is a time-series data and "y" is my ts() object.
fit<-nnetar(y,xreg = train_reg)
results<-forecast(fit,xreg = test_reg)
plot(results)
With this code, I only have the forecast values and visualization, I know that I can reach the fitted ones using results$fitted and for the forecasted ones results$mean.
Thank you!
This is very simple, you are looking for the command points(). When you run the plot()command, make sure to write type="n", then fill in data with the points()command as below
Please provide data on your next post.
Here, I have a pretend dataframe dfthat has two columns of data:
People that voted yes (percent)
Percent employment (meaning, how high employment is in the area)
The variables are arbitrary here, but it's just to illustrate what you want to do.
Here is the df
df <- structure(list(employment = c(23, 14.6, 9.9, 20.1, 34.4, 13.8,
20.6, 37.2, 21.8, 17.3, 13.1, 16.8, 24.6, 12.6, 13.6, 24.4, 19.3,
20, 22.6, 27.4, 23.1, 10.7, 32.1, 22, 25.6, 25), yes = c(55.2,
54.4, 63.5, 50.6, 39, 51.1, 48.5, 39.1, 59.4, 50.6, 44.1, 53.3,
39.3, 58.8, 59.1, 58.1, 63.1, 54.6, 55.9, 68.2, 57.8, 58.2, 38.9,
48.3, 49.9, 47.3)), class = "data.frame", row.names = c(NA, -26L
))
Run your model and add fitted values to your df
model<-lm(yes ~ employment, data = df)
df$fit <-predict(model) # add fitted, aka predicted, values to df
attach(df)
we can also add confidence intervals to the plot using the code below
& you can shade in the confidence intervals if you wish with the polygon command
lets make a custom transparent color using library(yarr) and the function function yarr::transparent()
library(yarrr)
par(mfrow = c(1, 1), cex=2, pch=10) # set plot window to desired conditions
plot(yes ~ employment, data = df, ylab = "Yes [%]", xlab="Employment [%]", type="n")
points(yes ~ employment)
points(fit~employment, col="red") # Note, these are the fitted values in red
newx <- seq(min(df$employment), max(df$employment), length.out=100)
preds <- predict(model, newdata = data.frame(employment=newx), interval = 'confidence')
polygon(c(rev(newx), newx), c(rev(preds[ ,3]), preds[ ,2]), col = yarrr::transparent("583", trans.val = .5), border = NA)
abline(lm(yes~foreigners),col="blue", lwd=2)
lines(newx, preds[ ,3], lty = 'dashed', col = 'blue', lwd=2)
lines(newx, preds[ ,2], lty = 'dashed', col = 'blue',lwd=2)

simple curve fitting in R

I am trying to find a fit for my data. But so far had no luck.
Tried the logarithmic, different ones from the drc package .. but I am sure there must be a better one I just don't know the type.
On a different note - I would be grateful for advice on how to go about curve hunting in general.
library(drc)
df<-structure(list(x = c(10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
52, 53, 54, 55), y = c(0.1066, -0.6204, -0.2028, 0.2621, 0.4083,
0.4497, 0.6343, 0.7762, 0.8809, 1.0029, 0.8089, 0.7845, 0.8009,
0.9319, 0.9414, 0.9505, 0.9323, 1.0321, 0.9381, 0.8975, 1.0929,
1.0236, 0.9589, 1.0644, 1.0411, 1.0763, 0.9679, 1.003, 1.142,
1.1049, 1.2868, 1.1569, 1.1952, 1.0802, 1.2125, 1.3765, 1.263,
1.2507, 1.2125, 1.2207, 1.2836, 1.3352, 1.1311, 1.2321, 1.4277,
1.1645), w = c(898, 20566, 3011, 1364, 1520, 2376, 1923, 1934,
1366, 1010, 380, 421, 283, 262, 227, 173, 118, 113, 95, 69, 123,
70, 80, 82, 68, 83, 76, 94, 101, 97, 115, 79, 98, 84, 92, 121,
97, 102, 93, 92, 101, 74, 124, 64, 52, 63)), row.names = c(NA,
-46L), class = c("tbl_df", "tbl", "data.frame"), na.action = structure(c(`47` = 47L), class = "omit"))
fit <- drm(data = df,y ~ x,fct=LL.4(), weights = w)
plot(fit)
1) If we ignore the weights then y = a + b * x + c/x^2 seems to fit and is linear in the coefficients so is easy to fit. This seems upward sloping so we started with a line but then we needed to dampen that so we added a reciprocal term. A reciprocal quadratic worked slightly better than a plain reciprocal based on the residual sum of squares so we switched to that.
fm <- lm(y ~ x + I(1 / x^2), df)
coef(summary(fm))
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.053856e+00 0.116960752 9.010341 1.849238e-11
## x 4.863077e-03 0.002718613 1.788808 8.069195e-02
## I(1/x^2) -1.460443e+02 16.518887452 -8.841049 3.160306e-11
The coefficient of the x term is not significant at the 5% level -- the p value is 8% in the table above -- so we can remove it and it will fit nearly as well giving a model with only two parameters. In the plot below the fm fit with 3 parameters is solid and the fm2 fit with 2 parameters is dashed.
fm2 <- lm(y ~ I(1 / x^2), df)
plot(y ~ x, df)
lines(fitted(fm) ~ x, df)
lines(fitted(fm2) ~ x, df, lty = 2)
2) Another approach is to use two straight lines. This is still continuous but has one non-differentiable point at the point of transition. The model has 4 parameters, the intercepts and slopes of each line. Below we do use the weights. It has the advantage of an obvious motivation based on the appearance of the data. The break point at the intersection of the two lines may have significance as the transition point between the higher sloping initial growth and the lower sloping subsequent growth.
# starting values use lines fitted to 1st ten and last 10 points
fm_1 <- lm(y ~ x, df, subset = 1:10)
fm_2 <- lm(y ~ x, df, subset = seq(to = nrow(df), length = 10))
st <- list(a = coef(fm_1)[[1]], b = coef(fm_1)[[2]],
c = coef(fm_2)[[1]], d = coef(fm_2)[[2]])
fm3 <- nls(y ~ pmin(a + b * x, c + d * x), df, start = st, weights = w)
# point of transition
X <- with(as.list(coef(fm3)), (a - c) / (d - b)); X
## [1] 16.38465
Y <- with(as.list(coef(fm3)), a + b * X); Y
## [1] 0.8262229
plot(y ~ x, df)
lines(fitted(fm3) ~ x, df)
The basic idea is to understand how the selected function performs. Take a function you know (e.g. the logistic) and modify it. Or (even better) go to the literature and see which functions people use in your specific domain. Then create a user-defined model, play with it to understand the parameters, define good start values and then fit it.
Her a quick & dirty example of a user-defined function (with package growthrates). It can surely be made similarly with drc.
library("growthrates")
grow_userdefined <- function (time, parms) {
with(as.list(parms), {
y <- (K * y0)/(y0 + (K - y0) * exp(-mumax * time)) + shift
return(as.matrix(data.frame(time = time, y = y)))
})
}
fit <- fit_growthmodel(FUN=grow_userdefined,
p = c(y0 = -1, K = 1, mumax = 0.1, shift = 1),
time = df$x, y = df$y)
plot(fit)
summary(fit)
It can of course be made better. As we have no exponential start at the onset, one can for example start with a simple saturation function instead of a logistic, e.g. something Monod-like. As said, the preferred way is to use a function related to the application domain.

How do I plot 3 variables along the x-axis in R plot?

How do I label the x-axis as actual, knn, and pca and plot its respective values along the y-axis?
dat.pca.knn <- rbind(actual, knn, pca)
plot(c(1, ncol(dat)),range(dat.pca.knn),type="n",main="Profile plot of probeset 206054_at\nwith actual and imputed values of GSM146784_Normal",xlab="Actual vs imputed values",ylab="Expression Intensity",axes=F,col="red")
axis(side=1,at=1:length(dat.pca.knn),labels=dimnames(dat.pca.knn)[[2]],cex.axis=0.4,col="1",las=2,tick=T)
axis(side=2)
for(i in 1:length(dat.pca.knn)) {
dat.y <- as.numeric(dat.pca.knn[i,])
lines(c(1:ncol(dat.pca.knn)),dat.y,lwd=2,type="p",col="1")
}
Data
dput(dat[1:2,])
structure(c(1942.1, 40.1, 2358.3, 58.2, 2465.2, 132.6, 2732.9,
64.3, 1952.2, 66.1, 2048.3, 69, 2109, 109.7, 3005.1, 59.4, 2568.1,
81.7, 2107.7, 100.8, 1940.2, 170.1, 2608.8, 186.7, 1837.2, 103.8,
1559.2, 86.8, 2111.6, 86, 2641, 152.7, 1972.7, 124.8, 1737.2,
115, 1636.1, 202.1, 2718.4, 257.3), .Dim = c(2L, 20L), .Dimnames = list(
c("1007_s_at", "1053_at"), c("GSM146778_Normal", "GSM146780_Normal",
"GSM146782_Normal", "GSM146784_Normal", "GSM146786_Normal",
"GSM146789_Normal", "GSM146790_Normal", "GSM146792_Normal",
"GSM146794_Normal", "GSM146796_Normal", "GSM146779_Tumor",
"GSM146781_Tumor", "GSM146783_Tumor", "GSM146785_Tumor",
"GSM146787_Tumor", "GSM146788_Tumor", "GSM146791_Tumor",
"GSM146793_Tumor", "GSM146795_Tumor", "GSM146797_Tumor")))
dat.pca.knn
> print(dat.pca.knn)
[,1]
actual 8385.300
knn 7559.533
pca 10418.002
you probably need a barplot if Understand you correctly.
# just to recreate your data
dat.pca.knn <- dplyr::tribble(
~actual, ~knn, ~pca,
8385.300, 7559.533, 10418.002
)
with(
dat.pca.knn,
barplot(height = c(actual, knn, pca),
names.arg = c("actual", "knn", "pca"))
)

Fitting a sigmoidal curve to points with ggplot

I have a simple dataframe for the response measurements from a drug treatment at various doses:
drug <- c("drug_1", "drug_1", "drug_1", "drug_1", "drug_1",
"drug_1", "drug_1", "drug_1", "drug_2", "drug_2", "drug_2",
"drug_2", "drug_2", "drug_2", "drug_2", "drug_2")
conc <- c(100.00, 33.33, 11.11, 3.70, 1.23, 0.41, 0.14,
0.05, 100.00, 33.33, 11.11, 3.70, 1.23, 0.41, 0.14, 0.05)
mean_response <- c(1156, 1833, 1744, 1256, 1244, 1088, 678, 489,
2322, 1867, 1333, 944, 567, 356, 200, 177)
std_dev <- c(117, 317, 440, 200, 134, 38, 183, 153, 719,
218, 185, 117, 166, 167, 88, 50)
df <- data.frame(drug, conc, mean_response, std_dev)
I can plot these point using the following code and get the basic foundation of the visualization that I would like:
p <- ggplot(data=df, aes(y=mean_response, x= conc, color = drug)) +
geom_pointrange(aes(ymax = (mean_response + std_dev), ymin = (mean_response - std_dev))) +
scale_x_log10()
p
The next thing I would like to do with these data is add a sigmoidal curve to the plot, that fits the plotted points for each drug. Following that, I would like to calculate the EC50 for this curve.
I realize I may not have the entire range of the sigmoidal curve in my data, but I am hoping to get the best estimate I can with what I have. Also, the final point for drug_1 does not follow the expected trend of a sigmoidal curve, but this is actually not unexpected as the solutions that the drug is in can inhibit responses at high concentrations (each drug is in a different solution). I would like to exclude this point from the data.
I am getting stuck at the step of fitting a sigmoidal curve to my data. I have looked over some other solutions to fitting sigmoidal curves to data but none seem to work.
One post that is very close to my problem is this:
(sigmoid) curve fitting glm in r
Based on it, I tried:
p + geom_smooth(method = "glm", family = binomial, se = FALSE)
This gives the following error, and seems to default to plotting straight lines:
`geom_smooth()` using formula 'y ~ x'
Warning message:
Ignoring unknown parameters: family
I have also tried the solution from this link:
Fitting a sigmoidal curve to this oxy-Hb data
In this case, I get the following error:
Computation failed in `stat_smooth()`:
Convergence failure: singular convergence (7)
and no lines are added to the plot.
I have tried looking up both of these errors but cannot seem to find a reason that makes sense with my data.
Any help would be much appreciated!
As I said in a comment, I would only use geom_smooth() for a very easy problem; as soon as I run into trouble I use nls instead.
My answer is very similar to #Duck's, with the following differences:
I show both unweighted and (inverse-variance) weighted fits.
In order to get the weighted fits to work, I had to use the nls2 package, which provides a slightly more robust algorithm
I use SSlogis() to get automatic (self-starting) initial parameter selection
I do all of the prediction outside of ggplot2, then feed it into geom_line()
p1 <- nls(mean_response~SSlogis(conc,Asym,xmid,scal),data=df,
subset=(drug=="drug_1" & conc<100)
## , weights=1/std_dev^2 ## error in qr.default: NA/NaN/Inf ...
)
library(nls2)
p1B <- nls2(mean_response~SSlogis(conc,Asym,xmid,scal),data=df,
subset=(drug=="drug_1" & conc<100),
weights=1/std_dev^2)
p2 <- update(p1,subset=(drug=="drug_2"))
p2B <- update(p1B,subset=(drug=="drug_2"))
pframe0 <- data.frame(conc=10^seq(log10(min(df$conc)),log10(max(df$conc)), length.out=100))
pp <- rbind(
data.frame(pframe0,mean_response=predict(p1,pframe0),
drug="drug_1",wts=FALSE),
data.frame(pframe0,mean_response=predict(p2,pframe0),
drug="drug_2",wts=FALSE),
data.frame(pframe0,mean_response=predict(p1B,pframe0),
drug="drug_1",wts=TRUE),
data.frame(pframe0,mean_response=predict(p2B,pframe0),
drug="drug_2",wts=TRUE)
)
library(ggplot2); theme_set(theme_bw())
(ggplot(df,aes(conc,mean_response,colour=drug)) +
geom_pointrange(aes(ymin=mean_response-std_dev,
ymax=mean_response+std_dev)) +
scale_x_log10() +
geom_line(data=pp,aes(linetype=wts),size=2)
)
I believe the EC50 is equivalent to the xmid parameter ... note the large differences between weighted and unweighted estimates ...
I would suggest next approach which is close to what you want. I also tried with a setting for your data using binomial family but there are some issues about values between 0 and 1. In that case you would need an additional variable in order to determine the respective proportions. The code in the following lines use a non linear approximation in order to sketch your output.
Initially, the data:
library(ggplot2)
#Data
df <- structure(list(drug = c("drug_1", "drug_1", "drug_1", "drug_1",
"drug_1", "drug_1", "drug_1", "drug_1", "drug_2", "drug_2", "drug_2",
"drug_2", "drug_2", "drug_2", "drug_2", "drug_2"), conc = c(100,
33.33, 11.11, 3.7, 1.23, 0.41, 0.14, 0.05, 100, 33.33, 11.11,
3.7, 1.23, 0.41, 0.14, 0.05), mean_response = c(1156, 1833, 1744,
1256, 1244, 1088, 678, 489, 2322, 1867, 1333, 944, 567, 356,
200, 177), std_dev = c(117, 317, 440, 200, 134, 38, 183, 153,
719, 218, 185, 117, 166, 167, 88, 50)), class = "data.frame", row.names = c(NA,
-16L))
In a non linear least squares, you need to define initial values for the search of ideal parameters. We use next code with base function nls() to obtain those initial values:
#Drug 1
fm1 <- nls(log(mean_response) ~ log(a/(1+exp(-b*(conc-c)))), df[df$drug=='drug_1',], start = c(a = 1, b = 1, c = 1))
#Drug 2
fm2 <- nls(log(mean_response) ~ log(a/(1+exp(-b*(conc-c)))), df[df$drug=='drug_2',], start = c(a = 1, b = 1, c = 1))
With this initial approach of parameters, we sketch the plot using geom_smooth(). We again use nls() to find the right parameters:
#Plot
ggplot(data=df, aes(y=mean_response, x= conc, color = drug)) +
geom_pointrange(aes(ymax = (mean_response + std_dev), ymin = (mean_response - std_dev))) +
geom_smooth(data = df[df$drug=='drug_1',],method = "nls", se = FALSE,
formula = y ~ a/(1+exp(-b*(x-c))),
method.args = list(start = coef(fm1),
algorithm='port'),
color = "tomato")+
geom_smooth(data = df[df$drug=='drug_2',],method = "nls", se = FALSE,
formula = y ~ a/(1+exp(-b*(x-c))),
method.args = list(start = coef(fm0),
algorithm='port'),
color = "cyan3")
The output:

Drawing a Tangent to the Plot and Finding the X-Intercept using R

For a data frame whose 2 columns can draw a plot using plot(data$x,data$y) as shown below, how can we draw a tangent at an arbitrary point say x=25, then find the x interception of the tangent with the axis y=0?
data:
df <- structure(list(x = c(40, 39.8, 39.5999999999999, 39.3999999999999,
39.1999999999998, 39, 38.8, 38.5999999999999, 38.3999999999999,
38.1999999999998, 38, 37.8, 37.5999999999999, 37.3999999999999,
37.1999999999998, 37, 36.8, 36.5999999999999, 36.3999999999999,
36.1999999999998, 36, 35.8, 35.5999999999999, 35.3999999999999,
35.1999999999998, 35, 34.8, 34.5999999999999, 34.3999999999999,
34.1999999999998, 34, 33.8, 33.5999999999999, 33.3999999999999,
33.1999999999998, 33, 32.8, 32.5999999999999, 32.3999999999999,
32.1999999999998, 32, 31.8, 31.5999999999999, 31.3999999999999,
31.1999999999998, 31, 30.8, 30.5999999999999, 30.3999999999999,
30.1999999999998, 30, 29.8, 29.5999999999999, 29.3999999999999,
29.1999999999998, 29, 28.8, 28.5999999999999, 28.3999999999999,
28.1999999999998, 28, 27.8, 27.5999999999999, 27.3999999999999,
27.1999999999998, 27, 26.8, 26.5999999999999, 26.3999999999999,
26.1999999999998, 26, 25.8, 25.5999999999999, 25.3999999999999,
25.1999999999998, 25, 24.8, 24.5999999999999, 24.3999999999999,
24.1999999999998, 24, 23.8, 23.5999999999999, 23.3999999999999,
23.1999999999998, 23, 22.8, 22.5999999999999, 22.3999999999999,
22.1999999999998, 22, 21.8, 21.5999999999999, 21.3999999999999,
21.1999999999998, 21, 20.8, 20.5999999999999, 20.3999999999999,
20.1999999999998, 20, 19.8, 19.5999999999999, 19.3999999999999,
19.1999999999998, 19, 18.8, 18.5999999999999, 18.3999999999999,
18.1999999999998, 18, 17.8, 17.5999999999999, 17.3999999999999,
17.1999999999998, 17, 16.8, 16.5999999999999, 16.3999999999999,
16.1999999999998, 16, 15.8, 15.5999999999999, 15.3999999999999,
15.1999999999998, 15, 14.8, 14.5999999999999, 14.3999999999999,
14.1999999999998, 14, 13.8, 13.5999999999999, 13.3999999999999,
13.1999999999998, 13, 12.8, 12.5999999999999, 12.3999999999999,
12.1999999999998, 12, 11.8, 11.5999999999999, 11.3999999999999,
11.1999999999998, 11, 10.8, 10.5999999999999, 10.3999999999999,
10.1999999999998, 10, 9.79999999999995, 9.59999999999991, 9.39999999999986,
9.19999999999982, 9, 8.79999999999995, 8.59999999999991, 8.39999999999986,
8.19999999999982, 8, 7.79999999999995, 7.59999999999991, 7.39999999999986,
7.19999999999982, 7, 6.79999999999995, 6.59999999999991, 6.39999999999986,
6.19999999999982, 6, 5.79999999999995, 5.59999999999991, 5.39999999999986,
5.19999999999982, 5, 4.79999999999995, 4.59999999999991, 4.39999999999986,
4.19999999999982, 4, 3.79999999999995, 3.59999999999991, 3.39999999999986,
3.19999999999982, 3, 2.79999999999995, 2.59999999999991, 2.39999999999986,
2.19999999999982, 2, 1.79999999999995, 1.59999999999991, 1.39999999999986,
1.19999999999982, 1, 0.799999999999955, 0.599999999999909, 0.399999999999864,
0.199999999999818, 0, -0.200000000000045, -0.400000000000091,
-0.600000000000136, -0.800000000000182, -1, -1.20000000000005,
-1.40000000000009, -1.60000000000014, -1.80000000000018, -2,
-2.20000000000005, -2.40000000000009, -2.60000000000014, -2.80000000000018,
-3, -3.20000000000005, -3.40000000000009, -3.60000000000014,
-3.80000000000018, -4, -4.20000000000005, -4.40000000000009,
-4.60000000000014, -4.80000000000018, -5), y = c(35785, 34955.9,
34448, 33632.6, 32905.1, 31976.5, 31332.4, 30851.3, 29547.2,
29092, 28786.6, 28053.9, 27609.1, 27117.9, 26628.8, 26236.2,
25997.6, 25248.5, 24876.7, 24697.4, 24745.8, 24403.1, 23935.2,
23994.9, 23489, 23596, 23630.8, 23537, 23489.3, 23336.4, 23515.6,
23373.7, 23191.3, 23455.6, 23510.1, 23653.3, 23574.3, 23504.9,
23239.2, 22993.1, 23246.4, 23057.4, 22718.7, 22532.4, 22656.7,
22362.9, 22184, 21929.7, 21511.1, 21579.2, 21692.3, 21839.3,
21906.5, 22342, 22830.3, 23506.3, 24714.6, 26358.3, 28813.4,
32087.4, 37048.2, 43795.1, 52583.3, 63510.6, 74687.6, 87307,
98589.2, 109683, 123260, 143686, 173741, 206937, 225777, 213844,
188426, 179882, 195311, 213540, 199136, 153434, 102167, 65320.5,
43524.6, 31564.6, 24683.9, 20481.1, 17533.8, 15430.9, 13942.4,
12742.8, 11795.2, 11032.4, 10758.8, 9999.94, 9620.13, 9242.19,
8769.68, 8336.05, 7848.7, 7439.94, 7236.61, 6840.17, 6474.1,
6314.83, 6119.46, 5984.71, 5838.92, 5482.56, 5592.16, 5479.21,
5473.05, 5263.42, 5131.52, 5160.8, 5037.07, 5111.43, 4925.95,
5044.38, 5073.06, 5163.09, 5395.14, 5685.84, 5781.48, 5927.53,
5991.07, 5918.79, 6208.65, 6355.15, 6336.74, 6866.93, 6765.42,
7010.48, 6975.84, 7173.03, 7208.73, 7167.87, 7150.69, 7066.63,
6850.88, 6615.22, 6514.08, 6244.01, 6000.48, 5574.86, 5179.76,
5093.81, 4797.62, 4561.38, 4378.95, 4480.99, 4454.68, 4528.07,
4697.9, 4895.49, 5127.23, 5522.48, 5618.13, 5909.07, 6134.77,
6444.93, 6347.01, 6890.34, 7092.59, 7232.97, 7125.42, 6986.75,
6699.94, 6458.58, 6257.7, 6080.23, 5982.45, 5692.35, 5829.27,
5843.41, 6057.06, 6181.98, 6516.04, 6597.08, 6776.47, 6912.55,
7053.48, 7008.47, 7194.52, 7362.31, 7320.14, 7362.08, 7428.24,
7096.13, 6704.73, 6217.18, 5835.08, 5424.88, 5640.08, 5687.38,
5571.61, 5074.32, 4456.51, 3369.23, 2196.51, 1276.21, 756.717,
484.31, 315.931, 265.893, 190.014, 161.302, 145.354, 126.608,
90.857, 101.612, 76.3276, 84.7987, 83.5705, 77.3315, 53.2172,
52.2799, 39.4456, 53.3765, 30.8025, 48.2821, 39.0606, 22.8557,
34.3351, 25.2945, 25.4592, 21.7338)), .Names = c("x", "y"), class = "data.frame", row.names = c(NA,
-226L))
Here's an option using derivatives of a fitted spline function (smooth.spline - you may need to adjust the parameter spar to your case):
x <- seq(0,40)
y <- dnorm(seq(0,40), mean=25, sd=5)
plot(x, y)
spl <- smooth.spline(y ~ x)
lines(spl, col=2)
newx <- 20
pred0 <- predict(spl, x=newx, deriv=0)
pred1 <- predict(spl, x=newx, deriv=1)
yint <- pred0$y - (pred1$y*newx)
xint <- -yint/pred1$y
xint
The results visually:
plot(x, y)
abline(h=0, col=8)
lines(spl, col=2) # spline
points(pred0, col=2, pch=19) # point to predict tangent
lines(x, yint + pred1$y*x, col=3) # tangent (1st deriv. of spline at newx)
points(xint, 0, col=3, pch=19) # x intercept
Update
Here is an example using your dataset (df):
# fit smooth spline
plot(y ~ x, df)
spl <- smooth.spline(df$x, df$y, spar=0.3)
newx <- seq(min(df$x), max(df$x), 0.1)
pred <- predict(spl, x=newx, deriv=0)
lines(pred, col=2)
# solve for tangent at a given x
newx <- 25
pred0 <- predict(spl, x=newx, deriv=0)
pred1 <- predict(spl, x=newx, deriv=1)
yint <- pred0$y - (pred1$y*newx)
xint <- -yint/pred1$y
xint
# plot results
plot(y ~ x, df)
abline(h=0, col=8)
lines(spl, col=2) # spline
points(pred0, col=2, pch=19) # point to predict tangent
lines(x, yint + pred1$y*x, col=3) # tangent (1st deriv. of spline at newx)
points(xint, 0, col=3, pch=19) # x intercept
I'm not sure how much noise is in your data, but by using a smaller spar setting in smooth.spline, you can increase the detail of the fitted function.

Resources