Finding x for f(x) loess function - r

I have a couple of loess models built from data similar to this:
set.seed(123)
y <- runif(100, -20, 20)
z <- seq(-12.75, 12, 0.25) * rnorm(100, 1, 3)
x <- seq(1, 100, 1)
df <- data.frame(y, x, z)
model <- loess(y ~ x, data = df)
model2 <- loess(z ~ x, data = df)
What I am trying to accomplish (without any luck) is to identify where the smoothed lines do 2 things:
1) I want to identify at what value(s) of x do the lines cross y=0
2) I want to identify at what value(s) of x the 2 loess lines cross each other.
I've been looking for similar problems and solutions to those problems for too long now with no success. Any help would be greatly appreciated.
ggplot(df, aes(x = x, y = y)) +
    geom_point() +
    geom_smooth(method = "loess", se = FALSE) +
    geom_smooth(aes(y = z), method = "loess", se = FALSE)

You can use predict to get the fitted y value for any x, and then optimize to find the specific x value that gives the y value you want.
For example, to find the zero crossing of model, we can minimize the square of its fitted value:
zero1 <- optimize(function(x, m) predict(m, x)^2, range(x), model)
zero1$minimum
# [1] 67.89191
Note that this will only find a single local minimum. If your model crosses zero several times, you will need to repeat the search within each range that contains a crossing (by changing the second argument of optimize, which specifies the interval to search within), as in the sketch below.
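For instance, a minimal sketch (reusing the model and df objects above): predict on a fine grid, find where the fitted values change sign, and run optimize within each bracketing interval.
grid <- seq(min(x), max(x), length.out = 500)
fx <- predict(model, grid)
crossings <- which(diff(sign(fx)) != 0)  # grid indices just before a sign change
zeros <- sapply(crossings, function(i)
    optimize(function(x, m) predict(m, x)^2,
             c(grid[i], grid[i + 1]), model)$minimum)
zeros  # one root per interval that contains a crossing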
Exactly the same approach can find where the models intersect. For that case you minimise the square of the difference between the two models:
intersection <- optimize(function(x, m1, m2) (predict(m1, x) - predict(m2, x))^2,
                         range(x), model, model2)
intersection$minimum
# [1] 45.65295
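As a quick sanity check, you can confirm that both fits return nearly the same value at the estimated intersection:
predict(model, intersection$minimum)   # fitted y from model
predict(model2, intersection$minimum)  # fitted z from model2; should be about equal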

Related

Looking for a way to find the differential equation of an orthogonal regression

I used orthogonal regression in R to find a relation between two variables, and I want to find the tangent at several points on the best-fit line.
D(expression(model), "x")
gives me a very unexpected result. I suspect it is because the poly function uses orthogonal polynomials. From the regression above I get
D(expression(44 + 67*x - 5.5*x^2), "x")
which returns
67 - 5.5 * (2 * x)
It looks obviously wrong, just like the coefficients do (though I know they are not actually wrong).
x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(10,15,23,33,46,50,57,63,68,75)
model <- lm(y ~ poly(x, 2))
Now I want to find the tangent at x=2 and x=7.
If I just look at the numbers, I suspect that at x=2 the tangent would be something like 6.5, i.e. (23-10) / (3-1).
Because it is a second-order polynomial regression, it makes no sense to plug the summary coefficients from the regression into the derivative, as that gives a meaningless result (see the sketch below).
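One way to get coefficients you can differentiate (an illustrative sketch, not from the thread) is to refit with raw polynomials; poly() uses orthogonal polynomials by default, which is why D() applied to the printed coefficients looks wrong.
x <- c(1,2,3,4,5,6,7,8,9,10)
y <- c(10,15,23,33,46,50,57,63,68,75)
model.raw <- lm(y ~ poly(x, 2, raw = TRUE))  # coefficients of 1, x and x^2
b <- coef(model.raw)
tangent <- function(x0) b[2] + 2 * b[3] * x0  # derivative of b1 + b2*x + b3*x^2
tangent(2)  # slope of the fitted curve at x = 2
tangent(7)  # slope at x = 7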

exponential regression with R (and negative values)

I am trying to fit a curve to a set of data points but did not succeed. So I ask you.
plot(time,val) # look at data
exponential.model <- lm(log(val)~ a) # compute model
fit <- exp(predict(exponential.model,list(Time=time))) # create the fitted curve
plot(time,val)#plot it again
lines(time, fit,lwd=2) # show the fitted line
My only problem is that my data contain negative values, so log(val) produces a lot of NAs and the model computation fails.
I know that my data do not necessarily look exponential, but I want to see the fit anyway. Another program shows me that val = 27.1331*exp(-time/2.88031) is a nice fit, but I do not know what I am doing wrong in R.
I want to compute it with R.
I had the idea to shift the data so that no negative values remain, but the result is poor and almost certainly wrong.
plot(time, val + 20)  # look at the data
exponential.model <- lm(log(val + 20) ~ time)  # compute the model
fit <- exp(predict(exponential.model, list(time = time)))  # create the fitted curve
plot(time, val)  # plot it again
lines(time, fit - 20, lwd = 2)  # show the (bad) fitted line
Thank you!
I figured some things out and have a satisfying solution.
exponential.model <- lm(log(val) ~ time)  # compute model
The log(val) term rescales the values so that a linear model can be applied. Since that is not possible with my values, you have to use a non-linear model (nls) instead:
exponential.model <- nls(val ~ a*exp(b*time), start = c(a = 30, b = -0.1))
This worked fine for me.
[plot: the satisfying fit]
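For a self-contained version (a sketch: the original data are not shown, so these values are simulated from the fit reported above, val = 27.1331*exp(-time/2.88031), plus noise):
set.seed(1)
time <- seq(0, 10, by = 0.25)
val <- 27.1331 * exp(-time / 2.88031) + rnorm(length(time), sd = 2)
exponential.model <- nls(val ~ a * exp(b * time), start = c(a = 30, b = -0.1))
plot(time, val)  # the data, including any negative values
lines(time, predict(exponential.model), lwd = 2)  # nls fit, no log() needed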

Finding non-linear correlations in R

I have about 90 variables stored in data[2:90]. I suspect about 4 of them will have a parabola-like correlation with data[1], and I want to identify which ones. Is there an easy and quick way to do this?
I have tried building a model like this (which I could do in a loop for each variable i = 2:90):
y <- data$AvgRating
x <- data$Hamming.distance
x2 <- x^2
quadratic.model <- lm(y ~ x + x2)
And then look at the R^2/coefficient to get an idea of the correlation. Is there a better way of doing this?
Maybe R could build a regression model with the 90 variables and itself choose the ones which are significant? Would that be possible in any way? I can do this in JMP for linear regression, but I'm not sure I could do non-linear regression with R for all the variables at once. Therefore I was manually trying to see in advance which ones are correlated (a sketch of that loop follows). It would be helpful if there were a function for that.
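A minimal sketch of that loop (the column positions are assumptions: data[, 1] as the response, data[, 2:90] as candidates): fit the quadratic model for each column and rank by R^2.
y <- data[[1]]
r2 <- sapply(2:90, function(i) {
    x <- data[[i]]
    summary(lm(y ~ x + I(x^2)))$r.squared  # quadratic fit, as above
})
names(r2) <- names(data)[2:90]
sort(r2, decreasing = TRUE)[1:10]  # most parabola-like candidates first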
You can use the nlcor package in R. This package finds the nonlinear correlation between two data vectors.
There are different approaches to estimating a nonlinear correlation, such as infotheo. However, nonlinear correlations between two variables can take any shape.
nlcor is robust to most nonlinear shapes and works well in a range of scenarios.
At a high level, nlcor works by adaptively segmenting the data into linearly correlated segments. The segment correlations are aggregated to yield the nonlinear correlation. The output is a number between 0 and 1, with values close to 1 meaning high correlation. Unlike Pearson correlation, negative values are not returned, because they have no meaning in nonlinear relationships.
More details about this package are in its documentation.
To install nlcor, follow these steps:
install.packages("devtools")
library(devtools)
install_github("ProcessMiner/nlcor")
library(nlcor)
After you install it,
# Implementation
x <- seq(0,3*pi,length.out=100)
y <- sin(x)
plot(x,y,type="l")
# linear correlation is small
cor(x,y)
# [1] 6.488616e-17
# nonlinear correlation is more representative
nlcor(x, y, plt = TRUE)
# $cor.estimate
# [1] 0.9774
# $adjusted.p.value
# [1] 1.586302e-09
# $cor.plot
As the example shows, the linear correlation is close to zero even though there is a clear relationship between the variables, which nlcor detects.
Note: the order of x and y inside the nlcor call matters; nlcor(x, y) is different from nlcor(y, x). Here x and y represent the 'independent' and 'dependent' variables, respectively.
Fitting a generalized additive model will help you identify curvature in the relationships between the explanatory variables. Read the example on page 22 here.
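For instance, a minimal sketch with the mgcv package (assumed here; not the linked example): an estimated degrees of freedom (edf) for the smooth term well above 1 indicates curvature.
library(mgcv)
set.seed(1)
x <- seq(-10, 10, by = 0.5)
y <- x^2 + rnorm(length(x), sd = 5)
fit <- gam(y ~ s(x))
summary(fit)  # edf of s(x) well above 1 => non-linear relationship
plot(fit)     # visualize the estimated smooth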
Another option would be to compute mutual information score between each pair of variables. For example, using the mutinformation function from the infotheo package, you could do:
set.seed(1)
library(infotheo)
# correlated vars (x & y correlated, z noise)
x <- seq(-10,10, by=0.5)
y <- x^2
z <- rnorm(length(x))
# list of vectors
raw_dat <- list(x, y, z)
# convert to a dataframe and discretize for mutual information
dat <- matrix(unlist(raw_dat), ncol=length(raw_dat))
dat <- discretize(dat)
mutinformation(dat)
Result:
| | V1| V2| V3|
|:--|---------:|---------:|---------:|
|V1 | 1.0980124| 0.4809822| 0.0553146|
|V2 | 0.4809822| 1.0943907| 0.0413265|
|V3 | 0.0553146| 0.0413265| 1.0980124|
By default, mutinformation() computes the discrete empirical mutual information score between two or more variables. The discretize() step is necessary if you are working with continuous data: it transforms the data into discrete values.
This might be helpful, at least as a first stab at looking for nonlinear relationships between variables, such as the one described above.

Coefficients of my polynomial model in R don't match graph

Using Greg's helpful answer here, I fit a second-order polynomial regression line to my dataset:
poly.fit<-lm(y~poly(x,2),df)
When I plot the line, I get the graph below:
The coefficients are:
# Coefficients:
# (Intercept) poly(x, 2)1 poly(x, 2)2
# 727.1 362.4 -269.0
I then wanted to find the x-value of the peak. I assume there is an easy way to do so in R but I did not know it,* so I went to Wolfram Alpha. I entered the equation:
y=727.1+362.4x-269x^2
Wolfram Alpha returned a plot of the equation, in which the function intersects the x-axis at approximately x=2.4. This is obviously different from my plot in R, whose x-axis ranges over 0≤x≤80. Why are these different? Does R interpret my x-values as a fraction of some backroom variable?
*I would also appreciate answers on how to find this peak. Obviously I could take the derivative, but how do I set it to zero?
Use predict:
plot(40:90, predict(poly.fit, list(x = 40:90)))
In the case of a quadratic polynomial, you can of course use a little calculus and algebra, once you have friendly coefficients. By default poly() fits orthogonal polynomials, so the printed coefficients are not those of 1, x and x^2; that is also why the equation you gave Wolfram Alpha describes a different curve from the one R plotted.
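A sketch of that calculus route (df assumed to hold x and y as in the question): refit with raw = TRUE so the coefficients really are those of 1, x and x^2, then use the fact that a + b*x + c*x^2 peaks where its derivative b + 2*c*x is zero, i.e. at x = -b/(2*c).
cf <- coef(lm(y ~ poly(x, 2, raw = TRUE), df))  # raw coefficients a, b, c
-cf[2] / (2 * cf[3])  # x at which the fitted parabola peaks (given c < 0)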
Somewhat more generally, you can get an estimate by evaluating your model over a range of candidate values and determining which one gives you the maximum response value.
Here is an (only moderately robust) function which will work here.
xmax <- function(fit, startx, endx, x='x', within=NA){
    ## find approximate value of variable x where model
    ## specified by fit takes maximum value, inside interval
    ## [startx, endx]; precision specified by within
    within <- ifelse(is.na(within), (endx - startx)/100, within)
    testx <- seq(startx, endx, by=within)
    testlist <- list(testx)
    names(testlist)[1] <- x
    testy <- predict(fit, testlist)
    testx[which.max(testy)]
}
Note that if your predictor variable is called something other than x, you have to pass its name as a string in the x parameter.
So to find the x value where your curve has its peak:
xmax(poly.fit, 50, 80, within=0.1)

inverse of 'predict' function

Using predict() one can obtain the predicted value of the dependent variable (y) for a certain value of the independent variable (x) for a given model. Is there any function that predicts x for a given y?
For example:
kalythos <- data.frame(x = c(20, 35, 45, 55, 70),
                       n = rep(50, 5), y = c(6, 17, 26, 37, 44))
kalythos$Ymat <- cbind(kalythos$y, kalythos$n - kalythos$y)
model <- glm(Ymat ~ x, family = binomial, data = kalythos)
If we want to know the predicted value of the model for x=50:
predict(model, data.frame(x=50), type = "response")
I want to know which x makes y=30, for example.
Saw that the previous answer was deleted. In your case, given n = 50 and a binomial model, you can calculate x for a given y using (qlogis is the base-R logit function):
f <- function(y, m) {
    (qlogis(y/50) - coef(m)[["(Intercept)"]]) / coef(m)[["x"]]
}
> f(30, model)
[1] 48.59833
But when doing so, you had better consult a statistician to show you how to calculate the inverse prediction interval. And please take VitoshKa's considerations into account.
Came across this old thread but thought I would add some other info. Package MASS has function dose.p for logit/probit models. SE is via delta method.
> dose.p(model, p = 0.6)
             Dose       SE
p = 0.6: 48.59833 1.944772
Fitting the inverse model (x ~ y) would not make sense here because, as @VitoshKa says, we assume x is fixed and y (the 0/1 response) is random. Besides, if the data weren't grouped, you would have only 2 values of the explanatory variable: 0 and 1. But even though we assume x is fixed, it still makes sense to calculate a confidence interval for the dose x for a given p, contrary to what @VitoshKa says. Just as we can reparameterize the model in terms of ED50, we can do so for ED60 or any other quantile. Parameters are fixed, but we still calculate CIs for them.
The chemCal package has an inverse.predict() function, which works for fits of the form y ~ x and y ~ x - 1.
You just have to rearrange the regression equation, but as the comments above state this may prove tricky and not necessarily have a meaningful interpretation.
However, for the case you presented you can use:
(1/coef(model)[2]) * (model$family$linkfun(30/50) - coef(model)[1])
Note that I did the division by the x coefficient first so that the name attribute of the result is correct.
For just a quick view (without intervals, and ignoring some additional issues) you could use the TkPredict function in the TeachingDemos package. It does not do this directly, but it lets you dynamically change the x value(s) and see the predicted y value, so it is fairly simple to move x until the desired y is found (for given values of any additional x's). This will also reveal potential problems where multiple x's produce the same y.
