How to interpret the output of MuMIn model.avg on GAMs

Say I have a series of GAMs that I would like to average together using MuMIn. How do I go about interpreting the results of the averaged smoothers? Why are there numbers after each smoother term?
library(glmmTMB)
library(mgcv)
library(MuMIn)
data("Salamanders") # glmmTMB data
# mgcv gams
gam1 <- gam(count ~ spp + s(cover) + s(DOP), data = Salamanders, family = tw, method = "ML")
gam2 <- gam(count ~ mined + s(cover) + s(DOP), data = Salamanders, family = tw, method = "ML")
gam3 <- gam(count ~ s(Wtemp), data = Salamanders, family = tw, method = "ML")
gam4 <- gam(count ~ mined + s(DOY), data = Salamanders, family = tw, method = "ML")
# MuMIn model average
summary(model.avg(gam1, gam2, gam3, gam4))
And an excerpt from the results...
Model-averaged coefficients:
(full average)
Estimate Std. Error
(Intercept) -1.32278368618846586812765053764451295137405 0.16027398202204409805027296442858641967177
minedno 2.22006553885311141982583649223670363426208 0.19680444996609294805445244946895400062203
s(cover).1 0.00096638939252485735100645092288118576107 0.05129736767981037115493592182247084565461
s(cover).2 0.00360413985630353601863351542533564497717 0.18864911049300209233692271482141222804785
s(cover).3 0.00034381902619062468381624930735540601745 0.01890820689958183642431777116144075989723
s(cover).4 -0.00248365164684107844403349041328965540743 0.12950622739175629560826052966149291023612
s(cover).5 -0.00089826079366626997504963192398008686723 0.04660149540411069601919535898559843190014
s(cover).6 0.00242197856572917875894734862640689243563 0.12855093144749979439112053114513400942087
s(cover).7 -0.00032596616013735266745646179664674946252 0.02076865732570042782922925539423886220902
s(cover).8 0.00700001172809289889942263584998727310449 0.36609857217759655956257347497739829123020
s(cover).9 -0.17150069832114492318630993850092636421323 0.17672571419517621449379873865836998447776
s(DOP).1 0.00018839994220792031023870016781529557193 0.01119134546418791391342306695833030971698
s(DOP).2 -0.00081869157242861999301819508900734945200 0.04333670935815417402103832955617690458894
s(DOP).3 -0.00021538789478326670289408395486674407948 0.01164171952980479901595955993798270355910
s(DOP).4 0.00043433676942596419591827161532648915454 0.02463278659589070856972270462392771150917

This is a little easier to read if you don't print so many digits (see below):
Each smooth term is parameterized using multiple coefficients (9 by default: s() uses a basis dimension of k = 10 and one coefficient is absorbed by the identifiability constraint), which is why we have multiple s(whatever).xxx coefficients.
It's not clear to me what you want to do with the model-averaged results. It's usually best to make model-averaged predictions rather than trying to interpret model-averaged coefficients, which has some pitfalls ... There is a predict() method for objects of class "averaging" (which is what model.avg() returns).
For further questions about interpretation you might want to ask on CrossValidated ...
Model-averaged coefficients:
(full average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept) -1.323e+00 1.603e-01 1.606e-01 8.239 <2e-16 ***
minedno 2.220e+00 1.968e-01 1.971e-01 11.263 <2e-16 ***
s(cover).1 9.664e-04 5.130e-02 5.130e-02 0.019 0.985
s(cover).2 3.604e-03 1.886e-01 1.887e-01 0.019 0.985
s(cover).3 3.438e-04 1.891e-02 1.891e-02 0.018 0.985
s(cover).4 -2.484e-03 1.295e-01 1.295e-01 0.019 0.985
s(cover).5 -8.983e-04 4.660e-02 4.660e-02 0.019 0.985
s(cover).6 2.422e-03 1.286e-01 1.286e-01 0.019 0.985
s(cover).7 -3.260e-04 2.077e-02 2.078e-02 0.016 0.987
s(cover).8 7.000e-03 3.661e-01 3.661e-01 0.019 0.985
s(cover).9 -1.715e-01 1.767e-01 1.768e-01 0.970 0.332
s(DOP).1 1.884e-04 1.119e-02 1.120e-02 0.017 0.987
s(DOP).2 -8.187e-04 4.334e-02 4.334e-02 0.019 0.985
s(DOP).3 -2.154e-04 1.164e-02 1.164e-02 0.018 0.985
s(DOP).4 4.343e-04 2.463e-02 2.464e-02 0.018 0.986
s(DOP).5 -1.737e-04 1.019e-02 1.020e-02 0.017 0.986
s(DOP).6 -3.224e-04 1.790e-02 1.790e-02 0.018 0.986
s(DOP).7 2.991e-07 5.739e-04 5.750e-04 0.001 1.000
s(DOP).8 -1.756e-03 9.557e-02 9.559e-02 0.018 0.985
s(DOP).9 1.930e-02 5.630e-02 5.639e-02 0.342 0.732
s(DOY).1 5.189e-08 3.378e-04 3.384e-04 0.000 1.000
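
For concreteness, a minimal sketch of both points (fewer printed digits, and model-averaged predictions rather than coefficient interpretation). The newdata values below are purely illustrative assumptions; newdata must contain every variable used by any of the component GAMs:
avg <- model.avg(gam1, gam2, gam3, gam4)

# Fewer significant digits in printed output (base-R option)
options(digits = 4)
summary(avg)

# Model-averaged predictions via the predict() method for "averaging" objects
newdat <- data.frame(spp = Salamanders$spp[1], mined = Salamanders$mined[1],
                     cover = 0, DOP = 0, Wtemp = 0, DOY = 0)
predict(avg, newdata = newdat)  # on the link scale; see ?predict.averaging for options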

Related

R-INLA not computing fitted marginal values

I've run into an issue where R-INLA isn't computing the fitted marginal values. I first had it with my own dataset, and have been able to reproduce it following an example from this book. I suspect there must be some configuration I need to change, or maybe INLA isn't working well with something under the hood? Anyway, here is the code:
library("rgdal")
boston.tr <- readOGR(system.file("shapes/boston_tracts.shp",
package="spData")[1])
#create adjacency matrices
boston.adj <- poly2nb(boston.tr)
W.boston <- nb2mat(boston.adj, style = "B")
W.boston.rs <- nb2mat(boston.adj, style = "W")
boston.tr$CMEDV2 <- boston.tr$CMEDV
boston.tr$CMEDV2 [boston.tr$CMEDV2 == 50.0] <- NA
#define formula
boston.form <- log(CMEDV2) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2) +
AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT)
boston.tr$ID <- 1:length(boston.tr)
#run model
boston.iid <- inla(update(boston.form, . ~. + f(ID, model = "iid")),
data = as.data.frame(boston.tr),
control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
control.predictor = list(compute = TRUE)
)
When I look at the output of this model, it states that the fitted values were computed:
summary(boston.iid)
Call:
c("inla(formula = update(boston.form, . ~ . + f(ID, model = \"iid\")), ", " data = as.data.frame(boston.tr),
control.compute = list(dic = TRUE, ", " waic = TRUE, cpo = TRUE), control.predictor = list(compute = TRUE))"
)
Time used:
Pre = 0.981, Running = 0.481, Post = 0.0337, Total = 1.5
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 4.376 0.151 4.080 4.376 4.672 4.376 0
CRIM -0.011 0.001 -0.013 -0.011 -0.009 -0.011 0
ZN 0.000 0.000 -0.001 0.000 0.001 0.000 0
INDUS 0.001 0.002 -0.003 0.001 0.006 0.001 0
CHAS1 0.056 0.034 -0.010 0.056 0.123 0.056 0
I(NOX^2) -0.540 0.107 -0.751 -0.540 -0.329 -0.540 0
I(RM^2) 0.007 0.001 0.005 0.007 0.010 0.007 0
AGE 0.000 0.001 -0.001 0.000 0.001 0.000 0
log(DIS) -0.143 0.032 -0.206 -0.143 -0.080 -0.143 0
log(RAD) 0.082 0.018 0.047 0.082 0.118 0.082 0
TAX 0.000 0.000 -0.001 0.000 0.000 0.000 0
PTRATIO -0.031 0.005 -0.040 -0.031 -0.021 -0.031 0
B 0.000 0.000 0.000 0.000 0.001 0.000 0
log(LSTAT) -0.329 0.027 -0.382 -0.329 -0.277 -0.329 0
Random effects:
Name Model
ID IID model
Model hyperparameters:
mean sd 0.025quant 0.5quant 0.975quant mode
Precision for the Gaussian observations 169.24 46.04 99.07 160.46 299.72 141.30
Precision for ID 42.84 3.40 35.40 43.02 49.58 43.80
Deviance Information Criterion (DIC) ...............: -996.85
Deviance Information Criterion (DIC, saturated) ....: 1948.94
Effective number of parameters .....................: 202.49
Watanabe-Akaike information criterion (WAIC) ...: -759.57
Effective number of parameters .................: 337.73
Marginal log-Likelihood: 39.74
CPO and PIT are computed
Posterior marginals for the linear predictor and
the fitted values are computed
However, when I try to inspect those fitted marginal values, there is nothing there:
> boston.iid$marginals.fitted.values
NULL
Interestingly enough, I do get a summary of the posteriors, so they must be getting computed somehow?
> boston.iid$summary.fitted.values
mean sd 0.025quant 0.5quant 0.975quant mode
fitted.Predictor.001 2.834677 0.07604927 2.655321 2.844934 2.959994 2.858717
fitted.Predictor.002 3.020424 0.08220780 2.824525 3.034319 3.149766 3.052558
fitted.Predictor.003 3.053759 0.08883760 2.841738 3.071530 3.188051 3.094010
fitted.Predictor.004 3.032981 0.09846662 2.801099 3.056692 3.175215 3.084842
Any ideas on what I'm mis-specifying in the call? I have set compute = TRUE, which is what I had seen causing issues on the R-INLA forums.
The developers intentionally disabled computing these marginals by default to make the models run faster.
To enable them, you can add these arguments to the inla() call:
control.predictor=list(compute=TRUE)
control.compute=list(return.marginals.predictor=TRUE)
So it looks something like this:
boston.form <- log(CMEDV2) ~ CRIM + ZN + INDUS + CHAS + I(NOX^2) + I(RM^2) +
  AGE + log(DIS) + log(RAD) + TAX + PTRATIO + B + log(LSTAT)
boston.tr$ID <- 1:length(boston.tr)
# run model
boston.iid <- inla(update(boston.form, . ~ . + f(ID, model = "iid")),
                   data = as.data.frame(boston.tr),
                   control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE,
                                          return.marginals.predictor = TRUE),
                   control.predictor = list(compute = TRUE)
)
boston.iid$summary.fitted.values
boston.iid$marginals.fitted.values
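Once the marginals are returned, you can extract and plot an individual one. A small sketch, assuming (as I recall) each element of marginals.fitted.values is a two-column matrix of values and densities:
# Posterior marginal of the first fitted value
m1 <- boston.iid$marginals.fitted.values[[1]]
head(m1)  # columns: x (fitted value) and y (density)
plot(m1, type = "l", xlab = "fitted value", ylab = "density",
     main = "Posterior marginal of fitted.Predictor.001")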

How to use dwplot to plot model.average for glmm?

This is an example of what I want:
The problem is that when I check the summary of my averaged model below (I use the conditional-average part), the significance does not match the plot: terms such as water should not be significant, yet they appear significant in the plot below, and I don't know how to resolve it.
Also, should we use dwplot to plot them? I think dwplot is generally a good approach.
Model-averaged coefficients:
(full average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
cond((Int)) 1.920090 0.162094 0.162496 11.816 <2e-16 ***
cond(semi_habitats) 0.186583 0.097901 0.098116 1.902 0.0572 .
cond(SHDI) -0.129630 0.151271 0.151428 0.856 0.3920
cond(tree) -0.162590 0.124409 0.124601 1.305 0.1919
zi((Int)) 1.500601 0.128244 0.128614 11.668 <2e-16 ***
cond(water) -0.044410 0.081564 0.081655 0.544 0.5865
cond(YEAR2019) 0.085849 0.197617 0.197803 0.434 0.6643
cond(other_crop) 0.043594 0.099623 0.099674 0.437 0.6618
cond(maize) 0.052875 0.096160 0.096256 0.549 0.5828
cond(cotton) 0.014107 0.060640 0.060688 0.232 0.8162
cond(wheat) -0.031901 0.095976 0.096014 0.332 0.7397
cond(pesticide_June) -0.008873 0.051123 0.051208 0.173 0.8624
cond(buildings) -0.001023 0.015340 0.015378 0.066 0.9470
(conditional average)
Estimate Std. Error Adjusted SE z value Pr(>|z|)
cond((Int)) 1.92009 0.16209 0.16250 11.816 <2e-16 ***
cond(semi_habitats) 0.19525 0.09131 0.09156 2.133 0.0330 *
cond(SHDI) -0.24078 0.12546 0.12581 1.914 0.0556 .
cond(tree) -0.20019 0.10737 0.10765 1.860 0.0629 .
zi((Int)) 1.50060 0.12824 0.12861 11.668 <2e-16 ***
cond(water) -0.13278 0.09032 0.09056 1.466 0.1426
cond(YEAR2019) 0.37262 0.25029 0.25093 1.485 0.1376
cond(other_crop) 0.17128 0.13086 0.13101 1.307 0.1911
cond(maize) 0.15044 0.10785 0.10809 1.392 0.1640
cond(cotton) 0.12061 0.13636 0.13654 0.883 0.3771
cond(wheat) -0.27238 0.11467 0.11494 2.370 0.0178 *
cond(pesticide_June) -0.12006 0.14837 0.14877 0.807 0.4197
cond(buildings) -0.03181 0.07962 0.07985 0.398 0.6904
I fitted the models with glmmTMB, ran the model averaging with dredge() and model.avg() from the MuMIn package, and saved the result directly to an .rds file.
library(MuMIn)       # model.avg objects, confint method
library(data.table)  # setDT()
library(ggplot2)

LCWJuneLS1avg.rds <- readRDS("LCWJuneLS1avg.rds")
mA <- LCWJuneLS1avg.rds                # pulling out model averages
df1 <- as.data.frame(mA$coefmat.subset)             # selecting full-model coefficient averages
CI <- as.data.frame(confint(LCWJuneLS1avg.rds, full = TRUE)) # confidence intervals for full model
df1$CI.min <- CI$`2.5 %`  # pulling out CIs and putting into same df as coefficient estimates
df1$CI.max <- CI$`97.5 %` # order of coefficients same in both, so no mixups; but should check anyway
setDT(df1, keep.rownames = "coefficient") # put rownames into a column
names(df1) <- gsub(" ", "", names(df1))   # remove spaces from column headers
ggplot(data = df1[-1, ], aes(x = coefficient, y = Estimate)) + # excluding intercept because its estimate is much larger
  geom_hline(yintercept = 0, color = "red", linetype = "dashed", lwd = 1.5) + # dashed line at zero
  geom_errorbar(aes(ymin = Estimate - AdjustedSE, ymax = Estimate + AdjustedSE), colour = "blue", # adjusted SE
                width = 0, lwd = 1.5) +
  coord_flip() + # flip x and y axes
  geom_point(size = 8) + theme_classic(base_size = 20) + ylab("Coefficient")
Here is my dataset:
file.name : LCWJuneLS1avg.rds
https://drive.google.com/open?id=1C3vzpA17Ewfu5ZXWp-BNLE0_VEgkeq6b
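For the dwplot part of the question, a hedged sketch: as far as I know, dotwhisker::dwplot accepts a tidy data frame with term, estimate and std.error (optionally conf.low / conf.high) columns, so the df1 built above could be reshaped like this (column names such as AdjustedSE are assumptions about df1 after the gsub() step):
library(dplyr)
library(ggplot2)
library(dotwhisker)

# Reshape the averaged-coefficient table into the tidy format dwplot understands
tidy_avg <- df1 %>%
  slice(-1) %>%                        # drop the intercept row, as in df1[-1, ] above
  transmute(term      = coefficient,
            estimate  = Estimate,
            std.error = AdjustedSE,    # assumed column name after gsub()
            conf.low  = CI.min,
            conf.high = CI.max)

dwplot(tidy_avg) +
  geom_vline(xintercept = 0, linetype = "dashed", colour = "red") +
  theme_classic(base_size = 20)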

Recreate SPSS GEE regression table in R

I have the (sample) dataset below:
round<-c( 0.125150, 0.045800, -0.955299, -0.232007, 0.120880, -0.041525, 0.290473, -0.648752, 0.113264, -0.403685)
square<-c(-0.634753, 0.000492, -0.178591, -0.202462, -0.592054, -0.583173, -0.632375, -0.176673, -0.680557, -0.062127)
ideo<-c(0,1,0,1,0,1,0,0,1,1)
ex<-data.frame(round,square,ideo)
When I ran the GEE regression in SPSS, I got this table as a result.
I used the packages gee and geepack in R to run the same analysis and got these results:
# gee
library(gee)
summary(gee(ideo ~ square + round, data = ex, id = ideo,
            corstr = "independence"))
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) 1.0541 0.4099 2.572 0.1328 7.937
square 1.1811 0.8321 1.419 0.4095 2.884
round 0.7072 0.5670 1.247 0.1593 4.439
# geepack
library(geepack)
summary(geeglm(ideo ~ square + round, data = ex, id = ideo,
               corstr = "independence"))
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) 1.054 0.133 63.00 2.1e-15 ***
square 1.181 0.410 8.32 0.0039 **
round 0.707 0.159 19.70 9.0e-06 ***
---
I would like to recreate exactly the SPSS table (not the exact numbers, since I am using a subset of the original dataset), but I do not know how to obtain all of these columns.
A tiny bit of tidyverse magic can get the same results - more or less.
Get the information from coef(summary(geeglm())) and compute the necessary columns:
library("tidyverse")
library("geepack")
coef(summary(geeglm(ideo ~ square + round, data = ex, id = ideo,
                    corstr = "independence"))) %>%
  mutate(lowerWald = Estimate - 1.96 * Std.err,  # lower Wald CI
         upperWald = Estimate + 1.96 * Std.err,  # upper Wald CI
         df = 1,
         ExpBeta = exp(Estimate)) %>%            # transformed estimate
  mutate(lWald = exp(lowerWald),                 # lower limit, transformed
         uWald = exp(upperWald))                 # upper limit, transformed
This produces the following (with the data you provided). The order and the names of the columns can be modified to suit your needs:
Estimate Std.err Wald Pr(>|W|) lowerWald upperWald df ExpBeta lWald uWald
1 1.0541 0.1328 62.997 2.109e-15 0.7938 1.314 1 2.869 2.212 3.723
2 1.1811 0.4095 8.318 3.925e-03 0.3784 1.984 1 3.258 1.460 7.270
3 0.7072 0.1593 19.704 9.042e-06 0.3949 1.019 1 2.028 1.484 2.772

R - plm and lm - Fixed effects

I have a balanced panel data set, df, that essentially consists in three variables, A, B and Y, that vary over time for a bunch of uniquely identified regions. I would like to run a regression that includes both regional (region in the equation below) and time (year) fixed effects. If I'm not mistaken, I can achieve this in different ways:
lm(Y ~ A + B + factor(region) + factor(year), data = df)
or
library(plm)
plm(Y ~ A + B,
data = df, index = c('region', 'year'), model = 'within',
effect = 'twoways')
In the second equation I specify indices (region and year), the model type ('within', FE), and the nature of FE ('twoways', meaning that I'm including both region and time FE).
Although I seem to be doing things correctly, I get extremely different results. The problem disappears when I do not consider time fixed effects and instead use the argument effect = 'individual'.
What's the deal here? Am I missing something? Are there any other R packages that allow to run the same analysis?
Perhaps posting an example of your data would help answer the question. I am getting the same coefficients for some made up data. You can also use felm from the package lfe to do the same thing:
N <- 10000
df <- data.frame(a = rnorm(N), b = rnorm(N),
region = rep(1:100, each = 100), year = rep(1:100, 100))
df$y <- 2 * df$a - 1.5 * df$b + rnorm(N)
model.a <- lm(y ~ a + b + factor(year) + factor(region), data = df)
summary(model.a)
# (Intercept) -0.0522691 0.1422052 -0.368 0.7132
# a 1.9982165 0.0101501 196.866 <2e-16 ***
# b -1.4787359 0.0101666 -145.450 <2e-16 ***
library(plm)
pdf <- pdata.frame(df, index = c("region", "year"))
model.b <- plm(y ~ a + b, data = pdf, model = "within", effect = "twoways")
summary(model.b)
# Coefficients :
# Estimate Std. Error t-value Pr(>|t|)
# a 1.998217 0.010150 196.87 < 2.2e-16 ***
# b -1.478736 0.010167 -145.45 < 2.2e-16 ***
library(lfe)
model.c <- felm(y ~ a + b | factor(region) + factor(year), data = df)
summary(model.c)
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# a 1.99822 0.01015 196.9 <2e-16 ***
# b -1.47874 0.01017 -145.4 <2e-16 ***
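If you want yet another package that fits the same two-way fixed-effects specification (not part of the original answer; added here as a sketch), fixest should give matching coefficients on a and b:
library(fixest)
model.d <- feols(y ~ a + b | region + year, data = df)
summary(model.d)
# a and b estimates should agree with model.a, model.b and model.c above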
This does not seem to be a data issue.
I'm doing computer exercises in R from Wooldridge (2012) Introductory Econometrics. Specifically Chapter 14 CE.1 (data is the rental file at: https://www.cengage.com/cgi-wadsworth/course_products_wp.pl?fid=M20b&product_isbn_issn=9781111531041)
I computed the model in differences (in Python, using statsmodels):
import statsmodels.formula.api as smf
model_diff = smf.ols(formula='diff_lrent ~ diff_lpop + diff_lavginc + diff_pctstu', data=rental).fit()
OLS Regression Results
==============================================================================
Dep. Variable: diff_lrent R-squared: 0.322
Model: OLS Adj. R-squared: 0.288
Method: Least Squares F-statistic: 9.510
Date: Sun, 05 Nov 2017 Prob (F-statistic): 3.14e-05
Time: 00:46:55 Log-Likelihood: 65.272
No. Observations: 64 AIC: -122.5
Df Residuals: 60 BIC: -113.9
Df Model: 3
Covariance Type: nonrobust
================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------
Intercept 0.3855 0.037 10.469 0.000 0.312 0.459
diff_lpop 0.0722 0.088 0.818 0.417 -0.104 0.249
diff_lavginc 0.3100 0.066 4.663 0.000 0.177 0.443
diff_pctstu 0.0112 0.004 2.711 0.009 0.003 0.019
==============================================================================
Omnibus: 2.653 Durbin-Watson: 1.655
Prob(Omnibus): 0.265 Jarque-Bera (JB): 2.335
Skew: 0.467 Prob(JB): 0.311
Kurtosis: 2.934 Cond. No. 23.0
==============================================================================
Now, the plm package in R gives the same results for the first-difference model:
library(plm)
modelfd <- plm(lrent ~ lpop + lavginc + pctstu,
               data = data, model = "fd")
No problem so far. However, the fixed-effects model reports different estimates.
modelfx <- plm(lrent ~ lpop + lavginc + pctstu, data = data,
               model = "within", effect = "time")
summary(modelfx)
The FE results should not be any different. In fact, the Computer Exercise question is:
(iv) Estimate the model by fixed effects to verify that you get identical estimates and standard errors to those in part (iii).
My best guess is that I am misunderstanding something about the R package.

Logistic Regression: Strange Variables Arise

I am using R to perform logistic regression on my data set. My data set has more than 50 variables.
I am running the following code:
glm(X...ResponseFlag ~ NetWorth + LOR + IntGrandChld + OccupInput, family = binomial, data = data)
When I look at summary(), I get the following output:
> summary(ResponseModel)
Call:
glm(formula = X...ResponseFlag ~ NetWorth + LOR + IntGrandChld +
OccupInput, family = binomial, data = data)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2785 -0.9576 -0.8925 1.3736 1.9721
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.971166 0.164439 -5.906 3.51e-09 ***
NetWorth 0.082168 0.019849 4.140 3.48e-05 ***
LOR -0.019716 0.006494 -3.036 0.0024 **
IntGrandChld -0.021544 0.085274 -0.253 0.8005
OccupInput2 0.005796 0.138390 0.042 0.9666
OccupInput3 0.471020 0.289642 1.626 0.1039
OccupInput4 -0.031880 0.120636 -0.264 0.7916
OccupInput5 -0.148898 0.129922 -1.146 0.2518
OccupInput6 -0.481183 0.416277 -1.156 0.2477
OccupInput7 -0.057485 0.218309 -0.263 0.7923
OccupInput8 0.505676 0.123955 4.080 4.51e-05 ***
OccupInput9 -0.382375 0.821362 -0.466 0.6415
OccupInputA -12.903334 178.064831 -0.072 0.9422
OccupInputB 0.581272 1.003193 0.579 0.5623
OccupInputC -0.034188 0.294507 -0.116 0.9076
OccupInputD 0.224634 0.385959 0.582 0.5606
OccupInputE -1.292358 1.072864 -1.205 0.2284
OccupInputF 14.132144 308.212341 0.046 0.9634
OccupInputH 0.622677 1.006982 0.618 0.5363
OccupInputU 0.087526 0.095740 0.914 0.3606
OccupInputV -1.010939 0.637746 -1.585 0.1129
OccupInputW 0.262031 0.256238 1.023 0.3065
OccupInputX 0.332209 0.428806 0.775 0.4385
OccupInputY 0.059771 0.157135 0.380 0.7037
OccupInputZ 0.638520 0.711979 0.897 0.3698
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5885.1 on 4467 degrees of freedom
Residual deviance: 5809.6 on 4443 degrees of freedom
AIC: 5859.6
Number of Fisher Scoring iterations: 12
From the output, you can see that new variables like OccupInput2 ... have appeared. OccupInput originally had the values 1, 2, 3, ..., A, B, C, D, ..., but this did not happen for NetWorth or LOR.
I am new to R and do not have an explanation for why these new variables appear.
Can anybody give me an explanation? Thank you in advance.
I would assume that OccupInput in your model is a factor variable. R introduces so-called dummy variables when you include factor regressors in a linear model.
What you see as OccupInput2 and so forth in the table are the coefficients associated with the individual factor levels (the reference level, OccupInput1, is absorbed into the intercept term).
You can verify the type of OccupInput from the output of the sapply(data, class) call, which yields the data types of the columns in your input data frame.
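To make the dummy coding concrete, here is a small self-contained sketch on made-up data (not your data set), showing how model.matrix() expands a factor into indicator columns, with the first level absorbed into the intercept:
# Toy factor with levels "1", "2" and "A"
toy <- data.frame(OccupInput = factor(c("1", "2", "A", "2", "1")),
                  y = c(0, 1, 1, 0, 1))
model.matrix(y ~ OccupInput, data = toy)
# Each row gets a 1 in the column matching its level; level "1" (the reference)
# has no column of its own and is represented by the intercept, so the fitted
# coefficients OccupInput2, OccupInputA, ... are contrasts against level "1".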
