R - creating a linear model with fixed poly() coefficients

In R, one can build an lm() or glm() object with fixed coefficients by using an offset() term in the formula.
x <- seq(1, 100)
y <- x^2 + 3*x + 7
# Forcing the fit to the polynomial: 2x^2 + 4x + 8
fixed_model <- lm(y ~ 0 + offset(8 + 4*x + 2*I(x^2)))
Is it possible to do the same thing using poly()? I tried the code below but it doesn't seem to work.
fixed_model_w_poly <- lm(y ~ 0 + offset(poly(x, order=2, raw = TRUE, coefs= c(8, 4, 2))))
Error : number of offsets is 200, should equal 100 (number of observations)
I want to use poly() as a convenient interface to iterate over a large number of fixed coefficients or order values, rather than manually coding offset(8 + 4*x + 2*I(x^2)) for each order/coefficient combination.
P.S.: Further (but not essential) information: this is to go inside an MCMC routine. An example usage would be to generate (and then compare) model_current and model_next in the code below:
library(MASS)
coeffs_current <- c(8, 4, 2)
model_current <- lm(y ~ 0 + offset(poly(x, order = 2, raw = TRUE, coefs = coeffs_current)))
cov <- diag(rep(1, 3))
coeffs_next <- mvrnorm(1, mu = as.numeric(coeffs_current), Sigma = cov)
model_next <- lm(y ~ 0 + offset(poly(x, order = 2, raw = TRUE, coefs = coeffs_next)))

This demonstrates what I suggested (namely, not using poly):
library(MASS)
# coeffs_current <- c(8, 4, 2) -- name changed for compactness:
cc <- c(8, 4, 2)
form <- as.formula(bquote(y ~ x + offset(.(cc[1]) + x*.(cc[2]) + .(cc[3])*I(x^2))))
model_current <- lm(form, data = dat)
I really have no idea what you intend to do with the next piece of code. It looks like you want something based on the inputs to the prior function, but not on its results.
cov <- diag(rep(1, 3))
coeffs_next <- mvrnorm(1, mu = as.numeric(cc), Sigma = cov)
The code works (at least as I intended) with a simple test case. The bquote function substitutes values into expressions (well, actually calls), and the as.formula function evaluates its argument and then dresses the result up as a proper formula object.
dat <- data.frame(x = rnorm(20), y = rnorm(20))
cc <- c(8, 4, 2)
form <- as.formula(bquote(y ~ x + offset(.(cc[1]) + x*.(cc[2]) + .(cc[3])*I(x^2))))
model_current <- lm(form, data = dat)
#--------
> model_current
Call:
lm(formula = form, data = dat)
Coefficients:
(Intercept) x
-9.372 -5.326 # Bizarre results due to the offset.
#--------
form
#y ~ x + offset(8 + x * 4 + 2 * I(x^2))
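An aside, not part of the answer above: the error in the question arises because poly(x, 2, raw = TRUE) returns a 100 x 2 matrix, so offset() receives 200 values, and poly()'s coefs argument holds orthogonalization constants for prediction rather than polynomial coefficients to evaluate. As a hedged sketch of an alternative that keeps a poly()-style interface, the offset vector can be built by matrix multiplication, which scales to any order:
cc <- c(8, 4, 2)                                         # fixed coefficients, lowest power first
deg <- length(cc) - 1
off <- drop(cbind(1, poly(x, deg, raw = TRUE)) %*% cc)   # evaluates 8 + 4*x + 2*x^2, one value per observation
fixed_model_w_poly <- lm(y ~ 0 + offset(off))
Inside the MCMC loop, only off needs to be recomputed from coeffs_next before refitting.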

Related

R: how to make lm() reproduce the original curve formula

I used the formula y = x^3 + 3 to generate a data.frame df with variables x and y, but when I use lm() to describe the relation between x and y, I get y = 81450x - 5463207.2. This is really different from the original y = x^3 + 3.
How can I make lm() (or some other method) recover the original formula?
library(tidyverse)
mf <- function(x){
  y = x^3 + 3
}
df = data.frame()
for (i in 1:300){
  df[i, 1] = i
  df[i, 2] = mf(i)
}
names(df) <- c('x', 'y')
model <- lm(y ~ x, data = df)
model$coefficients
@DarrenTsai answered first in the comments; if he also writes an answer, consider accepting his first.
lm(y ~ x, data = df) searches for a solution of the form y = b0 + b1*x, which is not how the data were generated. You can tell lm to include x^n terms using I(), as in
lm(y ~ x + I(x^2) + I(x^3) + I(x^4), data = df)
A short form for x + I(x^2) + I(x^3) + ... + I(x^n) is poly(x, n, raw = TRUE), as used in the comment of user2554330.
Let me make some changes to your code for better coding style:
# library(tidyverse) -- you did not use any of this, so there is no need to load it
mf <- function(x){ # -- writing this in one line without curly braces is an option
  y = x^3 + 3      # -- this will be retrieved as Intercept 3 plus 1*x^3
}
# for (i in 1:300){ -- there is really no need for a loop here
#   df[i, 1] = i
#   df[i, 2] = mf(i)
# }
# names(df) <- c('x', 'y')
df <- data.frame(x = 1:300,  # -- this is shorter and faster than the loop
                 y = mf(1:300))
model <- lm(y ~ poly(x, 5, raw = TRUE), data = df)
round(coef(model), 4)
#> (Intercept) poly(x, 5, raw = TRUE)1
#> 3 0
#> poly(x, 5, raw = TRUE)2 poly(x, 5, raw = TRUE)3
#> 0 1
#> poly(x, 5, raw = TRUE)4 poly(x, 5, raw = TRUE)5
#> 0 0
Created on 2022-09-24 with reprex v2.0.2
The (Intercept) is 3 and the x^3 coefficient, coded here as poly(x, 5, raw = TRUE)3, is 1, just as coded in mf.
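If the goal is to literally reproduce the formula as text, a small sketch (my own addition, assuming the model object fitted above) can assemble it from the rounded coefficients:
cf <- round(coef(model), 4)
pow <- seq_along(cf) - 1                         # power of x for each coefficient
terms <- ifelse(pow == 0, cf, paste0(cf, "*x^", pow))
cat("y =", paste(terms[cf != 0], collapse = " + "), "\n")
# y = 3 + 1*x^3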

Simulating a mixed linear model and evaluating it with lmerTest in R

I am trying to understand how to use mixed linear models to analyse my data by simulating a model, but I can't reproduce the input parameters. What am I missing?
I want to start by simulating a model with a random intercept for each subject. Here is the formula of what I want to simulate and reproduce (written out from the data-generating code below):
y_ij = gamma00 + gamma01*w_j + u0_j + beta1*x_ij + e_ij
If beta1 is small (<11), I find gamma00 as the intercept in the fixed-effects section, but I am completely unable to retrieve the slope (beta1). Also, the linear effect is not significant. Where is my conceptual mistake?
library(lmerTest)
# Generating data set
# General values and variables
numObj <- 20
numSub <- 100
e <- rnorm(numObj * numSub, mean = 0, sd = 0.1)
x <- scale(runif(numObj * numSub, min = -100, max = 100))
y <- c()
index <- 1
# Coefficients
gamma00 <- 18
gamma01 <- 0.5
beta1 <- -100
w <- runif(numSub, min = -3, max = 3)
uo <- rnorm(numSub, mean = 0, sd = 0.1)
meanBeta0 <- mean(gamma00 + gamma01*w + uo) # I should be able to retrieve that parameter.
for(j in 1:numSub){
  for(i in 1:numObj){
    y[index] <- gamma00 + gamma01*w[j] + uo[j] + beta1*x[i] + e[index]
    index <- index + 1
  }
}
dataFrame2 <- data.frame(y = y, x = x,
                         subNo = factor(rep(1:numSub, each = numObj)),
                         objNum = factor(rep(1:numObj, numSub)))
model2 <- lmer(y ~ x + (1 | subNo), data = dataFrame2)
summary(model2)
anova(model2)
No conceptual mistake here, just a mixed-up index value: you should be using index rather than i to index x in your data-generation loop.
Because of the mix-up, you were using the first subject's x values to generate data for all the subjects, while the model used the individual x values.
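Concretely, the corrected line inside the double loop would be (the model-fitting code stays the same):
y[index] <- gamma00 + gamma01*w[j] + uo[j] + beta1*x[index] + e[index]  # x[index], not x[i]
With this change, summary(model2) should recover beta1 as the fixed-effect slope.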

What am I doing wrong when trying to convert an nlmrt object to an nls object?

I am trying to convert an "nlmrt" object to an "nls" object using nls2. However, I can only manage to do it if I explicitly write the names of the parameters in the call. Can't I define the parameter names programmatically? See the reproducible example:
library(nlmrt)
scale_vector <- function(vector, ranges_in, ranges_out){
  t <- (vector - ranges_in[1, ])/(ranges_in[2, ] - ranges_in[1, ])
  vector <- (1 - t) * ranges_out[1, ] + t * ranges_out[2, ]
}
shobbs.res <- function(x) {
  # UNSCALED Hobbs weeds problem -- coefficients are rescaled internally
  # using scale_vector
  ranges_in <- rbind(c(0, 0, 0), c(100, 10, 0.1))
  ranges_out <- rbind(c(0, 0, 0), c(1, 1, 1))
  x <- scale_vector(x, ranges_in, ranges_out)
  tt <- 1:12
  res <- 100*x[1]/(1 + 10*x[2]*exp(-0.1*x[3]*tt)) - y
}
y <- c(5.308, 7.24, 9.638, 12.866, 17.069, 23.192, 31.443,
       38.558, 50.156, 62.948, 75.995, 91.972)
st <- c(b1 = 100, b2 = 10, b3 = 0.1)
ans1n <- nlfb(st, shobbs.res)
print(coef(ans1n))
This works:
library(nls2)
ans_nls2 <- nls2(y ~ shobbs.res(c(b1, b2, b3)) + y, start = coef(ans1n), alg = "brute")
However, this forces me to hard-code the parameters names in the call to nls2. For reasons related to my actual code, I would like to be able to do something like
ans_nls2 <- nls2(y ~ shobbs.res(names(st)) + y, start = coef(ans1n), alg = "brute")
But this returns an error:
Error in vector - ranges_in[1, ] :
non-numeric argument to binary operator
Is it possible to fix this, without having to hard-code explicitly the names of parameters in the call to nls2?
nls2 will accept a string as a formula:
co <- coef(ans1n)
fo_str <- sprintf("y ~ shobbs.res(c(%s)) + y", toString(names(co)))
nls2(fo_str, start = co, alg = "brute")
giving:
Nonlinear regression model
model: y ~ shobbs.res(c(b1, b2, b3)) + y
data: NULL
b1 b2 b3
196.1863 49.0916 0.3136
residual sum-of-squares: 2.587
Number of iterations to convergence: 3
Achieved convergence tolerance: NA
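A small follow-up sketch, in case a formula object is needed rather than a string (nls2 happens to accept both): as.formula() converts the generated string, e.g.
fo <- as.formula(fo_str)
nls2(fo, start = co, alg = "brute")  # equivalent to passing the string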

How to pass a long list of parameters to `nls` function in R

The nls function works normally like the following:
x <- 1:10
y <- 2*x + 3 # perfect fit
yeps <- y + rnorm(length(y), sd = 0.01) # added noise
nls(yeps ~ a + b*x, start = list(a = 0.12345, b = 0.54321))
Because the model I use has a lot of parameters, or I don't know beforehand which ones will be included in the parameter list, I want something like the following:
tmp <- function(x,p) { p["a"]+p["b"]*x }
p0 <- c(a = 0.12345, b = 0.54321)
nls(yeps ~ tmp(x,p), start = list(p=p0))
Does anyone know how to modify the nls call so that it accepts a parameter vector argument in the formula instead of many separate parameters?
You can give a vector of initial coefficients like this:
tmp <- function(x, coef){
  a <- coef[1]
  b <- coef[2]
  a + b*x
}
x <- 1:10
y <- 2*x + 3                             # perfect fit, as in the question
yeps <- y + rnorm(length(y), sd = 0.01)  # added noise
nls(yeps ~ tmp(x, coef), start = list(coef = c(0.12345, 0.54321)))
Nonlinear regression model
model: yeps ~ tmp(x, coef)
data: parent.frame()
coef1 coef2
3 2
residual sum-of-squares: 0.0016
Number of iterations to convergence: 2
Achieved convergence tolerance: 3.47e-08
P.S.:
example(nls)
should be a good start to understand how to play with nls.

Maximum likelihood estimation

I am a new user of R and hope you will bear with me if my question is silly. I want to estimate the following model using the maximum likelihood estimator in R:
y = a + b*(ln(x) - α)
where a, b, and α are parameters to be estimated and x and y are my data. I tried the following code that I got from the web:
library(foreign)
maindata <- read.csv("C:/Users/NUNU/Desktop/maindata/output2.csv")
h <- subset(maindata, cropid == 10)
library(likelihood)
modelfun <- function(a, b, x) { b * (x - a) }
par <- list(a = 0, b = 0)
var <- list(x = "x")
par_lo <- list(a = 0, b = 0)
par_hi <- list(a = 50, b = 50)
var$y <- "y"
var$mean <- "predicted"
var$sd <- 0.815585
var$log <- TRUE
results <- anneal(model = modelfun, par = par, var = var,
                  source_data = h, par_lo = par_lo, par_hi = par_hi,
                  pdf = dnorm, dep_var = "y", max_iter = 20000)
The result I am getting is similar even when the data are different, i.e., even when I change the cropid. Similarly, the predicted value generated is for x rather than y.
I do not know what I missed or where I went wrong. Your help is highly appreciated.
I am not sure whether your model formula will lead to a unique solution, but in general you can find the MLE with the optim function.
Here is a simple example of linear regression with optim:
fn <- function(beta, x, y) {
  a <- beta[1]
  b <- beta[2]
  sum((y - (a + b * log(x)))^2)
}
# generate some data for testing
x <- 1:100
# a = 10, b = 3.5
y <- 10 + 3.5 * log(x)
optim(c(0, 0), fn, x = x, y = y, method = "BFGS")  # two starting values for the two parameters
You can change the function fn to reflect your model formula, e.g.
sum((y - (YOUR MODEL FORMULA))^2)
EDIT
I am just giving a simple example of using optim in case you have a custom model formula to optimize. I did not mean to use it for simple linear regression, since lm would be sufficient.
I was a bit surprised that iTech used optim for a problem that is linear in its parameters. With his data for x and y:
> lm(y ~ log(x) )
Call:
lm(formula = y ~ log(x))
Coefficients:
(Intercept) log(x)
10.0 3.5
For linear models with normal errors, the least squares solution is the ML solution.
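That claim is easy to check numerically. The sketch below is my own illustration, assuming normal errors (the condition that makes least squares the ML solution): minimizing the full Gaussian negative log-likelihood with optim recovers essentially the same a and b as lm(y ~ log(x)).
set.seed(1)
x <- 1:100
y <- 10 + 3.5 * log(x) + rnorm(100, sd = 0.5)  # noisy version of iTech's test data
negll <- function(theta, x, y) {
  # theta = (a, b, log(sigma)); the log-parameterization keeps sigma positive
  -sum(dnorm(y, mean = theta[1] + theta[2] * log(x), sd = exp(theta[3]), log = TRUE))
}
optim(c(0, 0, 0), negll, x = x, y = y, method = "BFGS")$par  # ~ (10, 3.5, log sigma-hat)
coef(lm(y ~ log(x)))                                         # same a and b up to optimizer tolerance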
