Obtain derivative by spline interpolation in R

I have a series of x and y values (but not the function itself). I would like to get the derivative of the unknown function by spline interpolation of the x and y values.
My example
EDITED
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0.1, 0.3, 0.8, 0.9, 0.91, 0.93, 0.95, 0.98, 0.99, 0.999)
Is it possible in R to interpolate and get the functional form of the derivative?
My problem is that I only have x and y values of a CDF but need the probability density function, so I want to get the derivative by spline interpolation.
In other words, I need the PDF corresponding to that CDF, and I am trying to obtain it by spline-interpolating the x/y values of the CDF. Please note that this is a simple example and not a real CDF.

I haven't found the functional form of restricted cubic splines to be particularly difficult to grasp after reading the explanation by Frank Harrell in his book: "Regression Modeling Strategies".
require(rms)
df <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10),
                 y = c(12,2,-3,5,6,9,8,10,11,10.5))
ols( y ~ rcs(x, 3), df)
#--------------
Linear Regression Model

ols(formula = y ~ rcs(x, 3), data = df)

               Model Likelihood     Discrimination
                  Ratio Test           Indexes
Obs      10    LR chi2      3.61    R2       0.303
sigma 4.4318   d.f.            2    R2 adj   0.104
d.f.      7    Pr(> chi2) 0.1646    g        2.811

Residuals

    Min      1Q  Median      3Q     Max
-8.1333 -1.1625  0.5333  0.9833  6.9000

          Coef   S.E.   t    Pr(>|t|)
Intercept 5.0833 4.2431 1.20 0.2699
x         0.0167 1.1046 0.02 0.9884
x'        1.0000 1.3213 0.76 0.4738
#----------
The rms package has an odd system for storing summary information that needs to be set up before some of its special functions will work:
dd <- datadist(df)
options(datadist="dd")
mymod <- ols( y ~ rcs(x, 3), df)
# cannot imagine that more than 3 knots would make sense in such a small example
Function(mymod)
# --- reformatted to allow inspection of separate terms
function(x = 5.5) {5.0833333+0.016666667* x +
1*pmax(x-5, 0)^3 -
2*pmax(x-5.5, 0)^3 +
1*pmax(x-6, 0)^3 }
<environment: 0x1304ad940>
The zeros in the pmax terms basically suppress any contribution to the total from a term whenever the x value is less than its knot (5, 5.5 and 6 in this case).
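Since the original question is about a derivative, note that the output of Function() is just a sum of cubic hinge terms, so the derivative can be written down by hand using d/dx pmax(x - k, 0)^3 = 3 * pmax(x - k, 0)^2 (dfun below is only an illustrative name):
# derivative of the fitted function printed above
dfun <- function(x = 5.5) { 0.016666667 +
    3 * 1 * pmax(x - 5,   0)^2 -
    3 * 2 * pmax(x - 5.5, 0)^2 +
    3 * 1 * pmax(x - 6,   0)^2 }
dfun(seq(1, 10, by = 0.5))   # slope of the fitted spline on a grid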
Compare three versus four knots (and if you want smoother curves, predict on a finer-grained grid of x values rather than only at the observed points):
png()
plot(df$x,df$y )
mymod <- ols( y ~ rcs(x, 3), df)
lines(df$x, predict(mymod) ,col="blue")
mymod <- ols( y ~ rcs(x, 4), df)
lines(df$x, predict(mymod) ,col="red")
dev.off()
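If you do want the fitted curve evaluated more finely than the ten observed x values, one option (a sketch; the grid and the name fine are my own choices, and it assumes the datadist setup above is in effect) is rms::Predict on a dense grid:
# evaluate the most recently fitted model (four knots here) on a fine grid
fine <- Predict(mymod, x = seq(1, 10, by = 0.1))  # data frame with columns x, yhat, lower, upper
plot(df$x, df$y)
lines(fine$x, fine$yhat, col = "red")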

Take a look at monotone cubic splines, which are nondecreasing by construction. A web search for "monotone cubic spline R" turns up some hits. I haven't used any of the packages mentioned.
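One option that needs no extra packages is base R's splinefun(). A minimal sketch (cdf_fun, xs and pdf_vals are illustrative names; it assumes the x/y pairs trace a non-decreasing CDF, so the monotone Fritsch-Carlson method is appropriate), using the fact that the returned function takes a deriv argument:
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(0.1, 0.3, 0.8, 0.9, 0.91, 0.93, 0.95, 0.98, 0.99, 0.999)
cdf_fun <- splinefun(x, y, method = "monoH.FC")  # monotone cubic interpolation of the CDF
xs <- seq(1, 10, by = 0.1)
pdf_vals <- cdf_fun(xs, deriv = 1)               # first derivative ~ the density
plot(xs, pdf_vals, type = "l")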

Related

How to calculate the t-statistic for the difference between two groups using robust standard errors?

I am trying to estimate a t-statistic for the difference between two groups, and I need to use robust standard errors.
I have two groups and I have estimated each group's coefficient using an lm model. Then I subtracted the second model's coefficient from the first one's; this way I get the difference.
But now I need to calculate the t-statistic for the difference using robust standard errors, and this is where the problems start: I do not know how to calculate these robust standard errors when I have two groups that I would like to compare. I have tried the t.test function in R, but I think this is not the right way.
Can you help me where to start?
Thank you in advance!
With the lmtest and sandwich packages:
# simulate some data
set.seed(666) # just for replication
n1 <- 10; n2 <- 15 # sample sizes
y1 <- rnorm(n1)
y2 <- rnorm(n2)
group <- rep(c("A", "B"), times = c(n1, n2))
dat <- data.frame(group = group, y = c(y1, y2))
# linear regression
fit <- lm(y ~ group, data = dat)
# standard errors, p-values, confidence intervals, based on robust
# estimation of the variance-covariance matrix
library(parameters)
standard_error_robust(fit)
p_value_robust(fit)
ci_robust(fit)
# or
library(lmtest)
library(sandwich)
coeftest(fit, vcov = vcovHC)
# t test of coefficients:
#               Estimate Std. Error t value Pr(>|t|)
# (Intercept) -0.096076   0.494739 -0.1942   0.8477
# groupB       0.102826   0.575257  0.1787   0.8597
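Because group enters the model as a dummy variable, the groupB coefficient is exactly the difference between the two group means, so its row above already gives the robust standard error and t-statistic for the difference the question asks about. You can extract that row directly:
coeftest(fit, vcov = vcovHC)["groupB", ]  # estimate, robust SE, t value and p value for the difference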

Calculate α and β in Probit Model in R

I am facing the following issue: I want to calculate α and β for the following probit model in R, which is defined as:
Probability = F(α + β sprd )
where sprd denotes the explanatory variable, α and β are constants, F is the cumulative normal distribution function.
I can calculate probabilities for the entire dataset, the coefficients (see code below), etc., but I do not know how to get the constants α and β.
The purpose is to determine, in Excel, the spread that corresponds to a certain probability, e.g. which spread corresponds to 50%.
Thank you in advance!
Probit model coefficients
probit<- glm(Y ~ X, family=binomial (link="probit"))
summary(probit)
Call:
glm(formula = Y ~ X, family = binomial(link = "probit"))
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.4614 -0.6470 -0.3915 -0.2168  2.5730

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.3566755  0.0883634  -4.036 5.43e-05 ***
X           -0.0058377  0.0007064  -8.264  < 2e-16 ***
From the help("glm") page you can see that the object returns a value named coefficients.
An object of class "glm" is a list containing at least the following
components:
coefficients a named vector of coefficients
So after you call glm() that object will be a list, and you can access each element using $name_element.
Reproducible example (not a Probit model, but it's the same):
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
d.AD <- data.frame(treatment, outcome, counts)
# fit model
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
Now glm.D93$coefficients will print the vector with all the coefficients:
glm.D93$coefficients
# (Intercept) outcome2 outcome3 treatment2 treatment3
#3.044522e+00 -4.542553e-01 -2.929871e-01 1.337909e-15 1.421085e-15
You can assign that and access each individually:
coef <- glm.D93$coefficients
coef[1] # your alpha
#(Intercept)
# 3.044522
coef[2] # your beta
# outcome2
#-0.4542553
I've seen in your deleted post that you are not convinced by @RLave's answer. Here are some simulations to convince you:
# (large) sample size
n <- 10000
# covariate
x <- (1:n)/n
# parameters
alpha <- -1
beta <- 1
# simulated data
set.seed(666)
y <- rbinom(n, 1, prob = pnorm(alpha + beta*x))
# fit the probit model
probit <- glm(y ~ x, family = binomial(link="probit"))
# get estimated parameters - very close to the true parameters -1 and 1
coef(probit)
# (Intercept) x
# -1.004236 1.029523
The estimated parameters are given by coef(probit), or probit$coefficients.
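To address the stated purpose (finding the spread that corresponds to a given probability), you can invert the fitted model: since Probability = pnorm(alpha + beta * sprd), it follows that sprd = (qnorm(Probability) - alpha) / beta. A sketch using the fit above:
alpha.hat <- coef(probit)[1]
beta.hat <- coef(probit)[2]
p <- 0.5
(qnorm(p) - alpha.hat) / beta.hat  # spread corresponding to a 50% probability
The same formula can be used in Excel with NORM.S.INV(probability) in place of qnorm().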

Simple Linear Regression lm function R

I've read some tutorials about the lm() function in R and I am a little bit confused about how this function deals with continuous or discrete predictors. In https://www.r-bloggers.com/r-tutorial-series-simple-linear-regression/, for a continuous predictor, the coefficients represent the intercept and the slope of the linear regression.
This is clear, but if I now have a gender category, where the values are 0 or 1, how does the lm() function work? Does the function apply a logistic regression, or is it still possible to use the function in this way?
The answer you are looking for is somewhat unclear from your question. Yes, you can use the lm function with categorical variables; it still fits a linear model, not a logistic regression. The resulting equation is the sum of two linear fits.
It is best to illustrate with an example. Using made up data:
x <- 1:10
y1 <- x + rnorm(10, 0, 0.1)
y2 <- 14 - x + rnorm(10, 0, 0.1)
f <- rep(c("A", "B"), each = 10)
df <- data.frame(x = c(x, x), y = c(y1, y2), f)
#Model 1
print(lm(y1~x))
# lm(formula = y1 ~ x)
#
# Coefficients:
# (Intercept) x
# 0.1703 0.9754
#Model 2
model<-lm(y~x*f, data=df)
print(model)
# lm(formula = y ~ x * f, data = df)
#
# Coefficients:
#(Intercept) x fB x:fB
# 0.1703 0.9754 13.7622 -1.9709
#Model 3
print(lm(y2~x))
# lm(formula = y2 ~ x)
#
# Coefficients:
# (Intercept) x
# 13.9325 -0.9955
After running the code above and comparing Model 1 and Model 2, you can see that the intercept and the x slope are the same. This is because when the factor is A (i.e. 0, or absence), the fB and x:fB terms are 0 and drop out. When the factor is B, the fB and x:fB terms take their actual values and are added to the model.
If you add the intercept and fB together, and add the x slope to x:fB, the results will be the intercept and slope of Model 3, as the quick check below shows.
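A minimal numeric check using the model object fitted above:
coef(model)["(Intercept)"] + coef(model)["fB"]  # ~ 13.93, the Model 3 intercept
coef(model)["x"] + coef(model)["x:fB"]          # ~ -1.00, the Model 3 slope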
I hope this helps and did not cloud your understanding.

How to interpret lm() coefficient estimates when using bs() function for splines

I'm using a set of points which go from (-5,5) to (0,0) and (5,5) in a "symmetric V-shape". I'm fitting a model with lm() and the bs() function to fit a "V-shape" spline:
lm(formula = y ~ bs(x, degree = 1, knots = c(0)))
I get the "V-shape" when I predict outcomes by predict() and draw the prediction line. But when I look at the model estimates coef(), I see estimates that I don't expect.
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.93821 0.16117 30.639 1.40e-09 ***
bs(x, degree = 1, knots = c(0))1 -5.12079 0.24026 -21.313 2.47e-08 ***
bs(x, degree = 1, knots = c(0))2 -0.05545 0.21701 -0.256 0.805
I would expect a -1 coefficient for the first part and a +1 coefficient for the second part. Must I interpret the estimates in a different way?
If I instead code the knot manually in the lm() formula, I get these coefficients:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.18258 0.13558 -1.347 0.215
x -1.02416 0.04805 -21.313 2.47e-08 ***
z 2.03723 0.08575 23.759 1.05e-08 ***
That's more like it: combining the z term (for the point of the knot) with the x slope gives a slope of ~ +1 after the knot.
I want to understand how to interpret the bs() result. I've checked that the manual and bs() model predictions are exactly the same.
I would expect a -1 coefficient for the first part and a +1 coefficient for the second part.
I think your question is really about what a B-spline function is. If you want to understand the meaning of the coefficients, you need to know what the basis functions of your spline are. See the following:
library(splines)
x <- seq(-5, 5, length = 100)
b <- bs(x, degree = 1, knots = 0) ## returns a basis matrix
str(b) ## check structure
b1 <- b[, 1] ## basis 1
b2 <- b[, 2] ## basis 2
par(mfrow = c(1, 2))
plot(x, b1, type = "l", main = "basis 1: b1")
plot(x, b2, type = "l", main = "basis 2: b2")
Note:
B-splines of degree-1 are tent functions, as you can see from b1;
B-splines of degree-1 are scaled, so that their functional value is between (0, 1);
a knot of a degree-1 B-spline is where it bends;
B-splines of degree-1 are compact, and are only non-zero over (no more than) three adjacent knots.
You can get the (recursive) expression of B-splines from Definition of B-spline. The B-spline of degree 0 is the most basic class, while
B-spline of degree 1 is a linear combination of B-spline of degree 0
B-spline of degree 2 is a linear combination of B-spline of degree 1
B-spline of degree 3 is a linear combination of B-spline of degree 2
(Sorry, I was getting off-topic...)
Your linear regression using B-splines:
y ~ bs(x, degree = 1, knots = 0)
is just doing:
y ~ b1 + b2
Now you should be able to understand what the coefficients you get mean: the fitted spline function is
4.93821 - 5.12079 * b1 - 0.05545 * b2
In summary table:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.93821 0.16117 30.639 1.40e-09 ***
bs(x, degree = 1, knots = c(0))1 -5.12079 0.24026 -21.313 2.47e-08 ***
bs(x, degree = 1, knots = c(0))2 -0.05545 0.21701 -0.256 0.805
You might wonder why the coefficient of b2 is not significant. Well, compare your y and b1: your y is a symmetric V-shape, while b1 is an inverted symmetric V-shape. If you first multiply b1 by -1, then rescale it by multiplying by 5 (this explains the coefficient of about -5 for b1), what do you get? A good match, right? So there is no need for b2.
However, if your y is asymmetric, running through (-5,5) to (0,0) and then to (5,10), you will notice that the coefficients for b1 and b2 are both significant. I think the other answer already gives such an example.
Reparametrization of fitted B-spline to piecewise polynomial is demonstrated here: Reparametrize fitted regression spline as piece-wise polynomials and export polynomial coefficients.
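As a small numeric check of this reading of the coefficients (hypothetical noise-free data matching the question's symmetric V-shape), the fitted values are just the intercept plus the coefficients times the basis columns:
library(splines)
x <- seq(-5, 5, length = 100)
y <- abs(x)                                     # symmetric V through (0, 0)
fit <- lm(y ~ bs(x, degree = 1, knots = 0))
b <- bs(x, degree = 1, knots = 0)               # the two basis columns b1 and b2
manual <- coef(fit)[1] + coef(fit)[2] * b[, 1] + coef(fit)[3] * b[, 2]
all.equal(unname(manual), unname(fitted(fit)))  # TRUE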
A simple example of a first-degree spline with a single knot, and how to interpret the estimated coefficients to calculate the slopes of the fitted lines:
library(splines)
set.seed(313)
x <- seq(-5, +5, len = 1000)
y <- c(seq(5, 0, len = 500) + rnorm(500, 0, 0.25),
       seq(0, 10, len = 500) + rnorm(500, 0, 0.25))
plot(x,y, xlim = c(-6,+6), ylim = c(0,+8))
fit <- lm(formula = y ~ bs(x, degree = 1, knots = c(0)))
x.predict <- seq(-2.5,+2.5,len = 100)
lines(x.predict, predict(fit, data.frame(x = x.predict)), col =2, lwd = 2)
This produces a scatterplot of the data with the fitted V-shaped line drawn in red.
Since we are fitting a spline with degree = 1 (i.e. straight lines) and a knot at x = 0, we get two lines: one for x <= 0 and one for x > 0.
The coefficients are
> round(summary(fit)$coefficients,3)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.014 0.021 241.961 0
bs(x, degree = 1, knots = c(0))1 -5.041 0.030 -166.156 0
bs(x, degree = 1, knots = c(0))2 4.964 0.027 182.915 0
These can be translated into the slope of each straight line using the knot (which we specified at x = 0) and the boundary knots (the min/max of the explanatory variable):
# two boundary knots and one specified
knot.boundary.left <- min(x)
knot <- 0
knot.boundary.right <- max(x)
slope.1 <- summary(fit)$coefficients[2,1] /(knot - knot.boundary.left)
slope.2 <- (summary(fit)$coefficients[3,1] - summary(fit)$coefficients[2,1]) / (knot.boundary.right - knot)
slope.1
slope.2
> slope.1
[1] -1.008238
> slope.2
[1] 2.000988

Regression line and fitted curve for scatter plots in r

I have a set of data on the HEIGHT and DIAMETER of trees. I want to find a regression relationship between them and plot it. For example, I want to try a * DIAMETER + b * DIAMETER^2 + c and show its curve on a scatterplot.
With the code below I end up with several lines, but I want just a single trend line for the fitted model. What should I do?
setwd('D:\\PhD\\Data\\Field Measurments\\Data Analysis\\')
dat1 = read.table('Fagus.csv', header = TRUE, sep =',')
# fit a non-linear regression
Height = dat1$Height
Diameter = dat1$Diameter
plot(Diameter, Height, main="Height Curve", xlab="Diameter", ylab="Height", pch=19)
nls1 <- nls(Height ~ a*(Diameter)^2+b*Diameter+c, data = dat1, start = list(a =a, b=b,c=c), algorithm="port")
lines(fitted(nls1) ~ Diameter, lty = 1, col = "red") # solid red line
Is the above code wrong for my purpose?
As stated above, you should not put the coefficients into your formulas. Try:
nls1 <- nls(Height ~ I(Diameter^2) + Diameter, data = dat1, algorithm="port")
Regarding the I(Diameter ^2):
"To avoid this confusion, the function I() can be used to bracket those portions of a model formula where the operators are used in their arithmetic sense. For example, in the formula y ~ a + I(b+c), the term b+c is to be interpreted as the sum of b and c." ~ formula{stats} documentation
I did not run the rest (on mobile), but your code looks OK at first glance.
There seems to be a misunderstanding here about linear vs. non-linear models. A linear model is linear in the coefficients; a non-linear model is not (for example, Height = a * (1 - exp(-b * Diameter)) is non-linear in b). Whether the model is linear in the predictor variables (Diameter in your case) is irrelevant. So in your case a model of the form:
Height = a * Diameter + b * Diameter^2 + c
is a linear model. You don't need to use nls(...). You can specify the model formula in either of two ways, both of which lead to identical results:
Height~Diameter + I(Diameter^2)
or
Height~poly(Diameter,2,raw=TRUE)
The second form uses the poly(...) function to create a polynomial of order 2. raw=T tells poly(...) to generate raw polynomials, rather than orthogonal polynomials (the default). The first form is a bit simpler unless you want polynomials of order greater than 2. Here's an example using both forms.
set.seed(1) # for reproducible example
df <- data.frame(Diameter=sample(1:50,50))
df$Height <- with(df,2*Diameter + .5*Diameter^2 + 4 + rnorm(50,sd=30))
fit <- lm(Height~Diameter + I(Diameter^2),df)
summary(fit)
# ...
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -6.85088 12.26720 -0.558 0.57917
# Diameter 3.31030 1.10964 2.983 0.00451 **
# I(Diameter^2) 0.47717 0.02109 22.622 < 2e-16 ***
fit.poly<- lm(Height~poly(Diameter,2,raw=TRUE),df)
summary(fit.poly)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) -6.85088 12.26720 -0.558 0.57917
# poly(Diameter, 2, raw = TRUE)1 3.31030 1.10964 2.983 0.00451 **
# poly(Diameter, 2, raw = TRUE)2 0.47717 0.02109 22.622 < 2e-16 ***
To plot the data and the trend curve:
df$pred <- predict(fit)
with(df,plot(Height~Diameter))
with(df[order(df$Diameter),],lines(pred~Diameter,col="red",lty=2))
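If the Diameter values are sparse or unevenly spaced, the line connecting fitted points can look jagged; one option (a sketch reusing the fit object above; newd is an illustrative name) is to predict on a fine grid instead of only at the observed points:
newd <- data.frame(Diameter = seq(min(df$Diameter), max(df$Diameter), length.out = 200))
newd$Height <- predict(fit, newdata = newd)
with(df, plot(Height ~ Diameter))
lines(Height ~ Diameter, data = newd, col = "red", lty = 2)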
Your problem is your start= parameter. You need to supply actual values for the a, b, and c parameters. Here's a reproducible example
#sample data
dat<-data.frame(Diameter = runif(50, 2, 6))
dat<-transform(dat,Height=2*Diameter + .75 * Diameter^2 +4 + rnorm(50))
dat<-dat[order(dat$Diameter), ]
#now fit the model
mynls <- nls(Height ~ a*I(Diameter^2) + b*Diameter + c, dat,
             start = list(a = 1, b = 1, c = 1), algorithm = "port")
Notice how we set starting values of 1 for each of the coefficients; you can set whatever you think is most appropriate. Then we can plot the raw values together with the fitted results:
plot(Height~Diameter,dat, main="Height Curve",
xlab="Diameter", ylab="Height", pch=19)
lines(fitted(mynls)~ dat$Diameter, col="red")
This gives the scatterplot with the fitted quadratic curve overlaid in red.
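A further note, which is my own suggestion rather than part of the answers above: since this particular model is also linear in a, b and c, an ordinary lm() fit can supply sensible starting values for nls() instead of guessing 1s (start.vals is an illustrative name):
lm.fit <- lm(Height ~ I(Diameter^2) + Diameter, data = dat)
start.vals <- list(a = unname(coef(lm.fit)["I(Diameter^2)"]),
                   b = unname(coef(lm.fit)["Diameter"]),
                   c = unname(coef(lm.fit)["(Intercept)"]))
mynls <- nls(Height ~ a*I(Diameter^2) + b*Diameter + c, dat,
             start = start.vals, algorithm = "port")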

Resources