Is there a way to obtain the equations and coefficients of both the smoothing and regression (line) models when using R's car package?
scatterplot(prestige ~ income, data=Prestige)
scatterplot(prestige ~ income, data=Prestige, smoother=gamLine)
(the gamLine argument requiring the mgcv package)
Thanks
In the first case, you can see the coefficients with
summary(lm(prestige ~ income, data = Prestige))
In the second case, something similar will likely apply, but I don't know that package or smoothing method; it's surely not as simple as a linear fit. Take a look at ?mgcv-FAQ.
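For the smoother, one option is to refit the same kind of model yourself with mgcv and inspect its summary. This is a sketch, assuming gamLine's default of a spline smooth of income; the exact basis and settings car uses internally may differ:

```r
library(car)   # for the Prestige data
library(mgcv)

# Stand-alone GAM fit comparable to what the gamLine smoother draws
fit <- gam(prestige ~ s(income), data = Prestige)

summary(fit)   # effective df and approximate significance of the smooth
coef(fit)      # spline basis coefficients (not a single slope)
```

Note that a GAM smooth has no simple closed-form equation; the coefficients apply to spline basis functions, so summary() output is usually the most interpretable view.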
I used the nls function to fit a nonlinear model (power curve, y = ax^b).
cal<- nls(formula= agw~a*area^b, data=calibration_6, start=list(a=1, b=1))
summary(cal)
What I want now is to force the intercept (a) to zero to check something. In Excel, I can't set the intercept for a power curve. Is it possible to set the intercept in R?
If possible, could you tell me how to do it?
The y ~ x^b model is what I considered first,
nls(formula= agw ~ area^b, data=calibration_6, start=list(b=1))
but I also found another way; please check the link below.
How to calculate R-squared in nls package (non-linear model) in R?
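A minimal sketch with simulated data (the calibration_6 data set isn't available here, so the numbers are illustrative), comparing the full power model y = a*x^b with the constrained y = x^b:

```r
set.seed(1)
area <- runif(50, 1, 10)
agw  <- 2 * area^1.5 * exp(rnorm(50, sd = 0.05))  # true a = 2, b = 1.5
d    <- data.frame(area, agw)

# Full power model: estimates both a and b
full  <- nls(agw ~ a * area^b, data = d, start = list(a = 1, b = 1))

# Dropping a does not set it to zero -- it fixes a at 1
fixed <- nls(agw ~ area^b, data = d, start = list(b = 1))

coef(full)   # a near 2, b near 1.5
coef(fixed)  # b is distorted because it must absorb the missing a
```

One caveat on terminology: in y = a*x^b, a is a multiplicative scale, not an additive intercept. Setting a = 0 would force y to be zero everywhere; omitting it, as above, fixes it at 1.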
I'm currently trying to use a GAM to calculate a rough estimation of expected goals model based purely on the commentary data from ESPN. However, all the data is either a categorical variable or a logical vector, so I'm not sure if there's a way to smooth, or if I should just use the factor names.
Here are my variables:
shot_where (factor): shot location (e.g. right side of the box)
assist_class (factor): type of assist (cross, through ball, pass)
follow_corner (logical): whether the shot follows a corner
shot_with (factor): right foot, left foot, header
follow_set_piece (logical): whether the shot follows a set piece
I think I should just use the formula as just the variable names.
model <- bam(is_goal ~ shot_where + assist_class + follow_set_piece + shot_where + follow_corner + shot_where:shot_with, family = "binomial", method = "REML")
The shot_where:shot_with term would incorporate any interactions between these two variables.
However, I was told I could smooth factor variables as well using the below structure.
model <- bam(is_goal ~ s(shot_where, bs = 'fs') + s(assist_class, bs = 'fs') + as.logical(follow_set_piece) +
as.logical(follow_corner) + s(shot_with, bs = 'fs'), data = model_data, family = "binomial", method = "REML")
This worked for creating a model, but I want to make sure this is a correct method of building the model. I've yet to see any information on using only factor/logical variables in a GAM model, so I thought it was worth asking.
If you only have categorical covariates then you aren't fitting a GAM, whether you fit the model with gam(), bam(), or something else.
What you are doing when you pass factor variables to s() using the fs basis like this
s(f, bs = 'fs')
is creating a random intercept for each level of the factor f.
There's no smoothing going on here at all; the model is simply exploiting the equivalence of the Bayesian view of smoothing with random effects.
Given that none of your covariates could reasonably be considered random in the sense of a mixed effects model, the only justification for doing what you're doing might be as a computational trick.
Your first model is just a simple GLM (note the typo in the formula: shot_where appears twice).
It's not clear to me why you are using bam() to fit this model; you're losing the computational efficiency that bam() provides by using method = 'REML'; it should be 'fREML' for bam() models. But as there is no smoothness selection going on in the first model, you'd likely be better off using glm() to fit it. If the issue is large sample sizes, there are several packages that can fit GLMs to large data, for example biglm and its bigglm() function.
In the second model there is no smoothing going on but there is penalisation which is shrinking the estimates for the random intercepts toward zero. You're likely to get better performance on big data using the lme4 package or TMB and the glmmTMB package to fit what is a GLMM.
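The glm() version of the factor-only model can be sketched as follows. This uses simulated stand-ins for the poster's variables (is_goal, shot_where, etc. are fabricated here purely for illustration):

```r
set.seed(42)
n <- 1000
model_data <- data.frame(
  shot_where    = factor(sample(c("box_left", "box_right", "outside"), n, TRUE)),
  shot_with     = factor(sample(c("left", "right", "header"), n, TRUE)),
  follow_corner = sample(c(TRUE, FALSE), n, TRUE)
)
model_data$is_goal <- rbinom(n, 1, 0.1)

# Plain GLM with factor main effects and a factor-factor interaction;
# no penalisation or smoothness selection involved
glm_fit <- glm(is_goal ~ shot_where * shot_with + follow_corner,
               data = model_data, family = binomial)

summary(glm_fit)  # one coefficient per non-reference level / interaction cell
```

The `*` expands to both main effects plus the interaction, which is the standard way to write what the poster's shot_where:shot_with term was aiming at.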
This is more of a theoretical question than one about R, but let me provide a brief answer. Essentially, the most flexible model you could estimate is one where you use the variables as factors. It also produces a model that is reasonably easy to interpret: each coefficient gives you the expected difference in y between the reference level and the level represented by the dummy regressor.
Smoothing splines try to strike the appropriate bias-variance tradeoff. If you've got lots of data and relatively few categories in the categorical variables, there will be no real loss in efficiency for including all of the dummy regressors representing the categories and the bias will also be as small as possible. To the extent that the smoothing spline model is different from the one treating everything as factors, it is likely inducing bias without a corresponding increase in efficiency. If it were me, I would stick with a model that treats all of the categorical variables as factors.
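To make the "everything as factors" point concrete, here is a minimal sketch of how R expands a factor into dummy regressors in the design matrix (toy factor, illustrative only):

```r
f <- factor(c("a", "b", "c", "a", "c"))

# lm()/glm() build this design matrix internally: one dummy column
# per non-reference level, plus the intercept
model.matrix(~ f)
# columns: (Intercept), fb, fc -- each coefficient is the expected
# difference in y between that level and the reference level "a"
```

With only categorical covariates, this saturated-in-factors design is the most flexible specification available, which is why smoothing has nothing extra to offer.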
I am trying to fit a delayed entry parametric regression model for a Poisson process with Weibull baseline rate. It doesn't appear that R's survreg function supports left truncated data (I get the error: start-stop type Surv objects are not supported). Is there an alternate approach/R package that I could use to do this?
You may want to try something like:
flexsurv::flexsurvreg(formula = Surv(starttime, stoptime, status) ~ x1 + x2,
data=data, dist = "weibull")
Check the options the package offers which may fit your need.
The other way would be to define a truncated distribution and use interval values. See the survival package's example of using a truncated Gaussian distribution, a.k.a. "tobit": https://www.rdocumentation.org/packages/survival/versions/2.44-1.1/topics/tobin
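For reference, the tobit example shipped with the survival package looks roughly like this (a sketch based on the tobin data set bundled with survival; adapting the idea to a truncated Weibull baseline would follow the same pattern):

```r
library(survival)

# Left-censored Gaussian ("tobit") fit on the bundled tobin data:
# durable values of 0 are treated as left-censored observations
tfit <- survreg(Surv(durable, durable > 0, type = "left") ~ age + quant,
                data = tobin, dist = "gaussian")

summary(tfit)
```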
I'm using the avPlots function in R. The function places a fitted line on the graph, and I'm wondering if there is a way to view the equation of that line. I know I could computationally reproduce it using the lm function, but I'm curious if there is a way to view the "back-end" code being used to produce the graph.
Thanks!
Below is some code. The function takes a linear model followed by the variables you want to create added-variable plots for (one plot per regressor).
avPlots(mlm1,terms=~pctUrban+log(ppgdp))
I am not very familiar with added-variable plots, but I had an idea, though I'm not entirely sure what you are looking for. I hope this might be helpful.
Say you have an example using a linear model lm such as this (also from the car package):
res <- avPlots(lm(prestige~income+education+type, data=Duncan))
This includes data on the prestige and other characteristics of 45 U.S. occupations in 1950.
The returned object res will have the data points for each of the four plots generated (see below). The avPlot function uses lsfit (least squares fit) for the fitted line. This can also be done from the returned data for each term (e.g., for typeprof):
fit <- lsfit(res$typeprof[,1], res$typeprof[,2])
You could then get your slope from the coefficients (16.7):
fit$coefficients
Intercept X
4.178364e-16 1.665751e+01
As mentioned, this would give the same slopes from the lm model:
Call:
lm(formula = prestige ~ income + education + type, data = Duncan)
Coefficients:
(Intercept) income education typeprof typewc
-0.1850 0.5975 0.3453 16.6575 -14.6611
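Putting the pieces above together, a short sketch that checks the added-variable-plot slope against the lm coefficient directly:

```r
library(car)  # Duncan data, avPlots

m   <- lm(prestige ~ income + education + type, data = Duncan)
res <- avPlots(m)  # invisibly returns the plotted points per term

# Refit the avPlot line for typeprof and compare its slope with lm's coefficient
fit <- lsfit(res$typeprof[, 1], res$typeprof[, 2])
all.equal(unname(fit$coefficients["X"]),
          unname(coef(m)["typeprof"]))
```

This equality is the Frisch-Waugh-Lovell property that makes added-variable plots useful: the slope in each panel is exactly the corresponding multiple-regression coefficient.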
I am keen to implement a conditional (bivariate?) Poisson regression in R to assess the change in rates of a variable (stratified by treatment condition) pre-/post-intervention. Is anyone familiar with a package that runs this type of analysis?
Check out the gnm package in R. It has a gnm() function where you can specify your model formula, family = poisson(), offset, dataset, and the strata id in the "eliminate" argument. Please read its documentation.
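A minimal sketch of what such a conditional Poisson fit might look like with gnm, on simulated data (the variable names id, period, treat, and count are illustrative, not from the original question):

```r
library(gnm)

set.seed(7)
d <- data.frame(
  id     = factor(rep(1:50, each = 2)),                        # strata
  period = factor(rep(c("pre", "post"), 50),
                  levels = c("pre", "post")),
  treat  = rep(sample(c(0, 1), 50, replace = TRUE), each = 2)  # between-strata
)
d$count <- rpois(nrow(d), lambda = exp(0.5 + 0.3 * (d$period == "post") * d$treat))

# eliminate = id conditions out one intercept per stratum; the treat main
# effect is constant within strata, so it is absorbed (reported as NA)
cfit <- gnm(count ~ period * treat, family = poisson(),
            eliminate = id, data = d)
summary(cfit)
```

The period:treat coefficient is then the pre/post change in log rate specific to the treated condition, which matches the difference-in-rates question being asked.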