How to plot a log curve in R?

I have the following set of data:
x = c(8,16,64,128,256)
y = c(7030.8, 3624.0, 1045.8, 646.2, 369.0)
Which, when plotted, looks like an exponential decay or negative ln function.
I'm trying to fit a smooth curve to this data, but I don't know how. I've tried the nls and lm functions, but I can't seem to get it right. The online examples have too many steps for my simple data, and I don't understand them well enough to modify them for what I need. Any help or advice would be appreciated. Thank you.
Edit: When I say I tried nls and lm functions, I mean that the lines produced were linear, no matter what parameters I tried.
And when I say too many steps, I mean the examples I found were for predicting with 2 independent variables, or for creating multiple fit lines.
What I'm asking is: what is the best way to fit a simple smooth line to data that, when graphed, looks like an exponential decay or negative ln? The equation of the line isn't important; it's meant to be a reference for the shape of the data.

A good way to fit a function to data is the built-in nls function, which performs non-linear least-squares fitting. For example, if you wanted to fit the model y = b * x^e, you could do:
n <- nls(y ~ b * x ^ e, data = data.frame(x, y), start = c(b = 1000, e = -1))
(?nls, or this walkthrough, can tell you more about these options). You could then plot the curve on top of your points:
plot(x, y)
curve(predict(n, newdata = data.frame(x = x)), add = TRUE)
You can try a few other models (specified by that formula in nls) that may fit your data.
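A sketch of that idea for this data, assuming a power law and a "negative ln" curve as the two candidate shapes. As an addition to the answer above, the nls starting values are taken from a straight-line fit on the log-log scale (a common trick that avoids guessing), and the two fits are compared by residual sum of squares via deviance().

```r
x <- c(8, 16, 64, 128, 256)
y <- c(7030.8, 3624.0, 1045.8, 646.2, 369.0)

# Starting values for y = b * x^e from a straight line on the log-log scale,
# since log(y) = log(b) + e * log(x)
ll <- coef(lm(log(y) ~ log(x)))
power.fit <- nls(y ~ b * x^e,
                 start = list(b = exp(unname(ll[1])), e = unname(ll[2])))

# The "negative ln" shape, y = a + k * log(x), is linear in a and k, so lm() is enough
log.fit <- lm(y ~ log(x))

# Residual sum of squares of each candidate on the original scale (smaller = closer fit)
c(power = deviance(power.fit), log = deviance(log.fit))

plot(x, y)
curve(predict(power.fit, newdata = data.frame(x = x)),
      from = min(x), to = max(x), add = TRUE)
curve(predict(log.fit, newdata = data.frame(x = x)),
      from = min(x), to = max(x), add = TRUE, lty = 2)
```

For this data the power law should come out well ahead, since the points lie close to a straight line on log-log axes.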

Maybe 'lowess' is what you're looking for? Try:
plot(y ~ x)
lines(lowess(y ~ x))
That function just connects the dots. It sounds like you would prefer something that smooths out the elbows. In principle, 'loess' is useful for that, but you don't have enough data points here for that to work.
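If the goal is only a smooth reference curve through five points, one option not mentioned above is a monotone interpolating spline. The sketch below swaps in base R's splinefun(..., method = "monoH.FC") (Fritsch-Carlson monotone cubic interpolation); with so few points, the shape between them comes from the interpolant, not from the data.

```r
x <- c(8, 16, 64, 128, 256)
y <- c(7030.8, 3624.0, 1045.8, 646.2, 369.0)

# Monotone cubic interpolant: passes through every point, bends smoothly
# instead of having elbows, and does not overshoot monotone (here decreasing) data
f <- splinefun(x, y, method = "monoH.FC")

plot(x, y)
curve(f(x), from = min(x), to = max(x), add = TRUE)
```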

Related

How I use numerical methods to calculate roots in R

A machine learning model predicted probability p using input x. It is unknown how the model calculates the probability.
In the example below, we have 100 x and p values.
Can someone please show an algorithm to find all values of x for which p is 0.5?
There are two challenges:
I don't know the function p = f(x). I don't wish to fit some smooth polynomial curves which will remove the noise. The noises are important.
x values are discrete. So, we need to interpolate to find the desired values of x.
library(tidyverse)
x <- c(0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1.0,1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,2.0, 2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3.0,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8,3.9,4.0,4.1, 4.2,4.3,4.4,4.5,4.6,4.7,4.8,4.9,5.0,5.1,5.2,5.3,5.4,5.5,5.6,5.7,5.8,5.9,6.0,6.1,6.2, 6.3,6.4,6.5,6.6,6.7,6.8,6.9,7.0,7.1,7.2,7.3,7.4,7.5,7.6,7.7,7.8,7.9,8.0,8.1,8.2,8.3, 8.4,8.5,8.6,8.7,8.8,8.9,9.0,9.1,9.2,9.3,9.4,9.5,9.6,9.7,9.8,9.9, 10.0)
p <- c(0.69385203,0.67153592,0.64868391,0.72205029,0.64917218,0.66818861,0.55532616,0.58631660,0.65013198,0.53695673,0.57401464,0.57812980,0.39889101,0.41922821,0.44022287,0.48610191,0.34235438,0.30877592,0.20408235,0.17221558,0.23667792,0.29237938,0.10278049,0.20981142,0.08563396,0.12080935,0.03266140,0.12362265,0.11210208,0.08364931,0.04746024,0.14754152,0.09865584,0.16588175,0.16581508,0.14036209,0.20431540,0.19971309,0.23336415,0.12444293,0.14120138,0.21566896,0.18490258,0.34261082,0.38338941,0.41828079,0.34217964,0.38137610,0.41641546,0.58767796,0.45473784,0.60015956,0.63484702,0.55080768,0.60981219,0.71217369,0.60736818,0.78073246,0.68643671,0.79230105,0.76443958,0.74410139,0.63418201,0.64126278,0.63164615,0.68326471,0.68154362,0.75890922,0.72917978,0.55839943,0.55452549,0.69419777,0.64160572,0.63205751,0.60118916,0.40162340,0.38523375,0.39309260,0.47021037,0.33391614,0.22400555,0.20929558,0.20003229,0.15848124,0.11589228,0.13326047,0.11848593,0.17024106,0.11184393,0.12506915,0.07740497,0.02548386,0.07381765,0.02610759,0.13271803,0.07034573,0.02549706,0.02503864,0.11621910,0.08636754)
tbl <- tibble(x, p)
# plot for visualization
ggplot(data = tbl,
       aes(x = x,
           y = p)) +
  geom_line() +
  geom_point() +
  geom_hline(yintercept = 0.5) +
  theme_bw() +
  theme(aspect.ratio = 0.4)
The figure below shows that there are five roots.
This question is clearer than your previous one: How I use numerical methods to calculate roots in R.
I don't know the function p = f(x)
So you don't have a predict function to calculate p for new x values. This is odd, though: many statistical models have methods for predict. As BenBolker mentioned, the "obvious" solution is to use uniroot, or more automated routines, to find one or all roots of the following template function:
function (x, model, p.target) predict(model, newdata = data.frame(x = x)) - p.target
But this does not work for you. You only have a set of (x, p) values that look noisy.
I don't wish to fit some smooth polynomial curves which will remove the noise. The noises are important.
So we need to interpolate those (x, p) values to obtain a function p = f(x).
So, we need to interpolate to find the desired values of x.
Exactly. The question is what interpolation method to use.
The figure below shows that there are five roots.
This line chart is actually a linear interpolation, consisting of piecewise line segments. To find where it crosses a horizontal line, you can use the function RootSpline1 defined in my 2018 Q & A: get x-value given y-value: general root finding for linear / non-linear interpolation function
RootSpline1(x, p, 0.5)
#[1] 1.243590 4.948805 5.065953 5.131125 7.550705
Thank you very much. Please add information on how to install the required package. That will help everyone.
This function is not in a package. But this is a good suggestion. I am now thinking of collecting all functions I wrote on Stack Overflow in a package.
The linked Q & A does mention an R package on GitHub: https://github.com/ZheyuanLi/SplinesUtils, but it focuses on splines of higher degree, like cubic interpolation spline, cubic smoothing spline and regression B-splines. Linear interpolation is not dealt with there. So for the moment, you need to grab function RootSpline1 from my Stack Overflow answer.
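For the linear case the computation is small enough to write directly. The function below, linear_crossings, is a hypothetical stand-in written for this note (it is not RootSpline1 and may differ from it in edge cases): for every segment whose end points lie on opposite sides of the target level, it solves that segment's line equation, and it also reports data points lying exactly on the level.

```r
# Hypothetical helper, not RootSpline1: x-values where the piecewise-linear
# interpolant through (x, y) crosses or touches the horizontal line y = y0.
linear_crossings <- function(x, y, y0) {
  o <- order(x)                  # segments must run left to right
  x <- x[o]
  d <- y[o] - y0                 # shift so the target level becomes 0
  n <- length(x)
  left  <- d[-n]                 # shifted value at the left end of each segment
  right <- d[-1]                 # shifted value at the right end of each segment
  # strict sign change inside a segment: solve the segment's line for d = 0
  i <- which(left * right < 0)
  cross <- x[i] - left[i] * (x[i + 1] - x[i]) / (right[i] - left[i])
  # data points lying exactly on the level
  exact <- x[d == 0]
  sort(c(cross, exact))
}
```

With the question's x and p, linear_crossings(x, p, 0.5) should reproduce the five values printed above; for example, the first crossing is 1.2 + (0.5 - 0.5781298) * 0.1 / (0.39889101 - 0.5781298) ≈ 1.24359.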

Is there a way to get the antilog in R?

I've been given a data set and have entered the values into R. For the assignment question you need to replicate the following equation: y = 0.08x^0.75.
To turn this into an equation of the form y = B0 + B1x, I took the log10 of both sides using the following code.
fit <- lm(log10(Predator_Biomass)~log10(Prey_Biomass))
summary(fit)
From this I was able to obtain: y = -1.1050 + 0.7450x
Now I've been instructed that I need to take the antilog of both sides so that the B0 value will match 0.08 or be close to it. Is there an antilog function in R that could help with this? Any information would be helpful.
EDIT: Apparently every answer offered only took the antilog of the coefficients and not of the entire equation. Is there a way to take the antilog of an equation in R?
This is really a math problem more than a computational problem. If you fit a log-log regression as follows:
fit <- lm(log10(Predator_Biomass)~log10(Prey_Biomass))
The underlying equation is
log10(y) = a+b*log10(x)
Raising 10 to both sides gives:
y = 10^(a+b*log10(x)) = 10^a * 10^(b*log10(x)) = 10^a * (10^log10(x))^b
= 10^a * x^b
The parameters a and b are the first and second coefficients of the linear model. If you want to recover the parameters of y = c*x^b you need to antilog the intercept (10^(coef(fit)[1])), but the exponent b should be fine without transformation (coef(fit)[2]).
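To make the back-transformation concrete, here is a sketch with made-up data: the assignment's data set is not shown, so Prey_Biomass and Predator_Biomass below are simulated from y = 0.08x^0.75 with multiplicative noise. It recovers c as 10^intercept, keeps the slope as the exponent, and checks that the antilog of the whole fitted equation, 10^predict(fit, ...), equals c * x^b.

```r
# Hypothetical stand-in data (not the assignment's data set):
# generated from y = 0.08 * x^0.75 with a little multiplicative noise.
set.seed(1)
Prey_Biomass     <- 10^runif(50, min = 0, max = 3)
Predator_Biomass <- 0.08 * Prey_Biomass^0.75 * 10^rnorm(50, sd = 0.05)

fit <- lm(log10(Predator_Biomass) ~ log10(Prey_Biomass))

c_hat <- 10^unname(coef(fit)[1])   # antilog of the intercept gives c
b_hat <- unname(coef(fit)[2])      # slope is already the exponent b

# Antilog of the whole fitted equation: y = 10^(a + b * log10(x)) = c * x^b
power_law <- function(x) c_hat * x^b_hat

# Back-transforming log-scale predictions gives the same values as power_law()
new_x <- c(10, 100, 500)
back  <- unname(10^predict(fit, newdata = data.frame(Prey_Biomass = new_x)))
cbind(new_x, back, power_law(new_x))
```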

exponential regression with R (and negative values)

I am trying to fit a curve to a set of data points but have not succeeded, so I am asking here.
plot(time,val) # look at data
exponential.model <- lm(log(val) ~ time) # compute model
fit <- exp(predict(exponential.model, list(time = time))) # create the fitted curve
plot(time,val)#plot it again
lines(time, fit,lwd=2) # show the fitted line
My only problem is that my data contains negative values, so log(val) produces a lot of NaN values, which makes the model computation crash.
I know that my data does not necessarily look exponential, but I want to see the fit anyway. Another program tells me that val = 27.1331*exp(-time/2.88031) is a good fit, but I do not know what I am doing wrong.
I want to compute it with R.
I had the idea of shifting the data so that no negative values remain, but the result is poor and almost certainly wrong.
plot(time,val+20) # look at data
exponential.model <- lm(log(val + 20) ~ time) # compute model
fit <- exp(predict(exponential.model, list(time = time))) # create the fitted curve
plot(time,val)#plot it again
lines(time, fit-20,lwd=2) # show the (BAD) fitted line
Thank you!
I figured some things out and have a satisfying solution.
exponential.model <- lm(log(val) ~ time) # compute model
The log(val) term rescales the values so that a linear model can be applied. Since that is not possible for my values (log is undefined for negative numbers), you have to use a non-linear model (nls) instead.
exponential.model <- nls(val ~ a * exp(b * time), start = c(a = 30, b = -0.1))
This worked fine for me.
[Figure: satisfying fit]
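A self-contained version of that fix, assuming simulated data because the question's time and val are not shown: the values below are generated from the curve reported by the other program, plus noise large enough to push some tail values below zero. As an addition to the answer, rough starting values come from a log-linear fit on only the clearly positive points, so log() never sees a negative number; the nls fit itself then uses every observation.

```r
# Hypothetical data: the decay reported in the question plus noise,
# so that some observations in the tail are negative.
set.seed(42)
time <- seq(0, 20, by = 0.5)
val  <- 27.1331 * exp(-time / 2.88031) + rnorm(length(time), sd = 1)

# Rough starting values from a log-linear fit on the clearly positive points only
pos   <- val > 2
guess <- coef(lm(log(val[pos]) ~ time[pos]))

# Non-linear least squares on the original scale keeps every point, negatives included
exponential.model <- nls(val ~ a * exp(b * time),
                         start = list(a = exp(unname(guess[1])), b = unname(guess[2])))

coef(exponential.model)
plot(time, val)
curve(predict(exponential.model, newdata = data.frame(time = x)), add = TRUE, lwd = 2)
```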

R - poly.calc not stable when using many points

I've been trying to solve a problem using Lagrange interpolation, which is implemented by the poly.calc function in the polynom package for R.
Basically, my problem is to predict the population of a certain country using Lagrange interpolation. I have the population for past years (1961 - 2014). The csv file is here
library(polynom) # provides poly.calc
w1 = read.csv(file = "country.csv", sep = ",", header = TRUE)
array_x = w1$x
array_y = w1$y
#calls Lagrange Method
p = poly.calc(array_x, array_y)
#create a function to evaluate the polynom
prf <- as.function(p)
#create some points to plot
myx = seq(1961, 2020, 0.5)
#y's to plot
myy = prf(myx)
#plot
plot(myx, myy,col='blue')
After that, the plotted curve declines and the y-values are hugely negative (on the order of 10^134).
It does not make sense.
However, if I use only about five points, the result is correct.
This is not really an SO question but rather a numerical analysis question.
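R is doing exactly what you asked; this is not a programming error. The trouble is that what you asked for is numerically notorious: Lagrange interpolating polynomials are very unstable, especially when a large number of points are fit. With one value per year from 1961 to 2014 you have 54 points, so the interpolating polynomial has degree 53, and evaluating it near x ≈ 2000 involves powers such as 2000^53 (roughly 10^175), so rounding errors swamp the result; extrapolating to 2020 only makes it worse.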
A much more stable alternative is to use splines, such as B-splines. They can be fit very easily into any regression model with the splines package that ships with R; for example, you could fit a least-squares model with
library(splines)
x <- sort(runif(500, -3,3) ) #sorting makes for easier plotting ahead
y <- sin(x)
splineFit <- lm(y ~ bs(x, df = 5) )
est_y <- predict(splineFit)
plot(x, y, type = 'l')
lines(x, est_y, col = 'blue')
You can see from the above model that the splines can do a good job of fitting non-linear relations.
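The instability claim can be checked without the population file. The sketch below uses the classic Runge function f(x) = 1/(1 + 25x^2) on 21 equally spaced points instead of the question's data, evaluates the degree-20 Lagrange interpolating polynomial directly (lagrange_eval is a small helper written here, not part of polynom), and compares it with a natural cubic interpolating spline from base R's splinefun().

```r
# Classic Runge example (not the population data)
f  <- function(x) 1 / (1 + 25 * x^2)
xi <- seq(-1, 1, length.out = 21)
yi <- f(xi)

# Direct Lagrange-form evaluation of the degree-20 interpolating polynomial
lagrange_eval <- function(xi, yi, x) {
  vapply(x, function(t) {
    sum(vapply(seq_along(xi), function(j) {
      yi[j] * prod((t - xi[-j]) / (xi[j] - xi[-j]))  # y_j * L_j(t)
    }, numeric(1)))
  }, numeric(1))
}

# Natural cubic interpolating spline through the same points
spl <- splinefun(xi, yi, method = "natural")

# Maximum error of each interpolant on a fine grid
grid <- seq(-1, 1, length.out = 2001)
err_lagrange <- max(abs(lagrange_eval(xi, yi, grid) - f(grid)))
err_spline   <- max(abs(spl(grid) - f(grid)))
c(lagrange = err_lagrange, spline = err_spline)
```

The polynomial oscillates wildly near the ends of the interval while the spline stays close to f. Predicting 2020 is still extrapolation, though: splinefun(method = "natural") continues linearly beyond the last data point, which is an assumption rather than a forecast.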

finding functions that match dot plots

library(ggplot2)
ggplot(test, aes(x = timepoints, y = mean, ymax = mean + sde, ymin = mean - sde)) +
  geom_errorbar(width = 2) +
  geom_point() +
  geom_line() +
  stat_smooth(method = 'loess') +
  xlab('Time (min)') +
  ylab('Fold Induction') +
  ggtitle('yo') # opts(title = ...) is defunct in current ggplot2
I can plot the blue 'loess'-ed line. But is there a way to find the mathematical function of the blue 'loess'-ed line?
You can get the predictions for a regular sequence:
fit <- loess(mean ~ timepoints, data = test)
grid <- seq(min(test$timepoints), max(test$timepoints), length.out = 100)
fit.points <- predict(fit, newdata = data.frame(timepoints = grid), se = FALSE)
fitdf <- data.frame(x = grid, y = fit.points)
You can then fit that set of points with splines of an appropriate degree. Cubic spline fits can be described more easily than loess fits. It would be easier to match an answer to your variable names if you had offered a data example to work with. The plot does not seem to have been created with that code.
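A sketch of that two-step idea, assuming a made-up stand-in for the question's test data frame (only its column names timepoints and mean come from the question): evaluate the loess curve on a grid, then pass the grid values to splinefun() to get an ordinary R function you can call at any time point inside the data range. This interpolates the loess curve; it is not a closed-form formula for it.

```r
# Hypothetical stand-in for the question's `test` data frame
set.seed(7)
test <- data.frame(timepoints = seq(0, 120, by = 10))
test$mean <- 1 + 4 * (1 - exp(-test$timepoints / 30)) + rnorm(nrow(test), sd = 0.2)

fit <- loess(mean ~ timepoints, data = test)

# Evaluate the loess curve on a regular grid (newdata must use the original variable name)
grid <- seq(min(test$timepoints), max(test$timepoints), length.out = 100)
fit.points <- as.numeric(predict(fit, newdata = data.frame(timepoints = grid)))

# Interpolating cubic spline through those points: a function you can call
loess_fun <- splinefun(grid, fit.points, method = "natural")
loess_fun(c(15, 45, 90))
```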
Rule Number One: not all distributions have a (closed-form) function that generates them. Yes, you can create a close fit by way of splines, or by calculating moments (mean, variance, skew, etc.) and building the series; your choice depends on whether you intend to interpolate, extrapolate, or just "view" the resulting function.
In the scientific world, it's more common to have a theory, or premise, about the behavior behind your data. You can then do standard (e.g. nls) fitting methods to see how well the proposed fit function can be made to match your data.
To understand how the loess line is computed, see the loess.demo function in the TeachingDemos package. This interactive graphical demonstration shows how the fitted y-value at each x-value is computed from the data and the bandwidth parameter (it also shows the difference between the raw loess fit and the spline that is often fit to the loess estimates).

Resources