How to conduct unconditional quantile regression in R?

The only package I know of that does unconditional quantile regression in R is uqr. Unfortunately, it has been removed from CRAN. Even though I can still use it, its functionality is limited (e.g., it does not conduct significance tests or allow comparing effects across quantiles). I'm wondering if anyone knows how to conduct UQR in R, either with functions they wrote or by some other means.

There are many limitations in terms of tests and asymptotic theory for unconditional quantile regressions, especially if you are thinking of the version proposed in Firpo, Fortin, and Lemieux (2009), "Unconditional Quantile Regressions".
The application, however, is straightforward. You need only two elements:
(1) the unconditional quantile q(t) of the outcome at quantile level t (estimated with any of your favorite packages);
(2) the density f of the outcome, evaluated at the quantile you got in (1).
After that, you apply the RIF (recentered influence function):
$$RIF(y; q(t)) = q(t) + \frac{t - 1(y \le q(t))}{f(q(t))}$$
where 1(.) is the indicator function.
Once you have this, you just use it in place of your dependent variable when you call lm(). And that is it.
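The recipe above can be sketched in base R as follows. This is a minimal illustration on simulated data, not a replacement for a dedicated package: the variable names, the choice of t = 0.5, and the use of a default kernel density estimate from density() are all assumptions made for the example.

```r
set.seed(1)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

tau <- 0.5                                        # quantile level t (illustrative choice)
q_tau <- quantile(y, probs = tau, names = FALSE)  # step (1): unconditional quantile

# Step (2): kernel density of y, interpolated at the quantile
dens <- density(y)
f_q <- approx(dens$x, dens$y, xout = q_tau)$y

# RIF(y; q(t)) = q(t) + (t - 1(y <= q(t))) / f(q(t))
rif <- q_tau + (tau - as.numeric(y <= q_tau)) / f_q

# Regress the RIF on the covariates in place of y
fit <- lm(rif ~ x)
coef(fit)
```

The standard errors reported by summary(fit) treat the estimated quantile and density as known, so they should be read with caution; bootstrapping the whole procedure is one common way around that.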
HTH

Related

Obtaining glmer coefficient confidence intervals via bootstrapping

I am in my first experience using mixed models in R for my statistical analysis. Due to my data being comprised of binary outcome variables, I have managed to build a logistic model using the glmer function of the lme4 package that I think works as I wanted it to.
I am now aiming to investigate the statistical significance of my model coefficients. I have read that generally, the best approach for generalized mixed models is to bootstrap confidence intervals, but I haven't managed to find a good, clear, explanation of how to do this in R.
Would anyone have any suggestions? Are there any packages in R that expedite this process, or do people generally build their own functions for this? I haven't really done any bootstrapping before so I'd appreciate some more in-depth answers.
If you want to compute parametric bootstrap confidence intervals, the built-in functionality
confint(fitted_model, method = "boot")
should work (see ?confint.merMod)
Also see this answer (which illustrates both parametric and nonparametric bootstrapping for user-defined quantities).
If you have multiple cores, you can speed this up by adding parallel = "multicore", ncpus = parallel::detectCores()-1 (or some other appropriate number of cores to use): see ?lme4::bootMer for details.
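A minimal sketch of the confint() call described above, assuming the lme4 package is installed. The cbpp data set that ships with lme4 stands in for the asker's data, and the model formula, nsim = 200, and the seed are illustrative choices only.

```r
library(lme4)

# Binomial mixed model on the cbpp data bundled with lme4
gm <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp, family = binomial)

set.seed(101)  # the bootstrap is random, so fix the seed for reproducibility
# Parametric bootstrap percentile intervals for the fixed effects only
ci <- confint(gm, method = "boot", nsim = 200, parm = "beta_")
ci
```

With real data a larger nsim is usually preferable; 200 is used here only to keep the run time short.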

Initial parameters in nonlinear regression in R

I want to learn how to do nonlinear regression in R. I managed to learn the basics of the nls function, but as we know, it is crucial in nonlinear regression to use good initial parameters. I tried to figure out how the selfStart and getInitial functions work but failed. The documentation is sparse and not very useful. I wanted to learn these functions using simple simulated data, so I simulated data from a logistic model:
n <- 100    # number of observations
d <- 10000  # true parameters
b <- -2
e <- 50
set.seed(n)
X <- rnorm(n, -e/b, 2)  # centre X at -e/b, where the logistic function grows fastest
Y <- d/(1 + exp(b*X + e)) + rnorm(n, 0, 200)  # simulated response with Gaussian noise
Now I want to fit the function f(x)=d/(1+exp(b*x+e)), but I don't know how to use selfStart or getInitial. Could you help me? Please don't point me to SSlogis. I'm aware it is a function designed to find initial parameters for logistic regression, but it seems to work only with one explanatory variable, and I'd like to learn how to fit a logistic model with more than one explanatory variable, and even how to do general nonlinear regression with a function that I define myself.
I would be very grateful for your help.
I don't know why the computation of good initial parameters fails in R. The aim of my answer is to provide a method for finding good enough initial parameters.
Note that a non-iterative method exists that doesn't require initial parameters. The principle is explained in this paper, pp. 37-46: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
A simplified version is shown below.
If the results are not accurate enough, they can be used as initial parameters in the usual non-linear regression software, such as R.
A numerical example is shown below. Usually the number of points is much higher; here it is deliberately low to make checking easier when one edits the code.
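Separately from the integral-equation method in the linked paper, one simple way to obtain starting values for the model in the question is to linearize it: if y = d/(1+exp(b*x+e)), then log(d/y - 1) = e + b*x. The sketch below wraps that idea in a self-starting model using selfStart() from the stats package. The names logis_init and SSmylogis, the 1.05 factor for the asymptote guess, and the 10%-90% window are hypothetical choices made for this illustration.

```r
# Simulated data as in the question (true values renamed to avoid clashes)
n <- 100; d_true <- 10000; b_true <- -2; e_true <- 50
set.seed(n)
X <- rnorm(n, -e_true / b_true, 2)
Y <- d_true / (1 + exp(b_true * X + e_true)) + rnorm(n, 0, 200)
dat <- data.frame(X = X, Y = Y)

# Initializer: rough starting values from the linearized model
logis_init <- function(mCall, data, LHS, ...) {
  xy <- sortedXyData(mCall[["x"]], LHS, data)
  d0 <- 1.05 * max(xy$y)                           # crude guess for the upper asymptote
  mid <- xy[xy$y > 0.1 * d0 & xy$y < 0.9 * d0, ]   # keep the steep middle part only
  lin <- coef(lm(log(d0 / y - 1) ~ x, data = mid)) # intercept ~ e, slope ~ b
  setNames(c(d0, lin[[2]], lin[[1]]), mCall[c("d", "b", "e")])
}

# Self-starting version of f(x) = d / (1 + exp(b*x + e))
SSmylogis <- selfStart(~ d / (1 + exp(b * x + e)),
                       initial = logis_init,
                       parameters = c("d", "b", "e"))

start_vals <- getInitial(Y ~ SSmylogis(X, d, b, e), data = dat)  # inspect the guesses
fit <- nls(Y ~ SSmylogis(X, d, b, e), data = dat)                # no 'start' needed
coef(fit)
```

The same pattern should carry over to a user-defined model with several explanatory variables: the initializer only has to return a named vector of rough parameter guesses, however they are obtained.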

Quantile Regression with Time-Series Models (ARIMA-ARCH) in R

I am working on quantile forecasting with time-series data. The model I am using is ARIMA(1,1,2)-ARCH(2) and I am trying to get quantile regression estimates of my data.
So far, I have found "quantreg" package to perform quantile regression, but I have no idea how to put ARIMA-ARCH models as the model formula in function rq.
rq function seems to work for regressions with dependent and independent variables but not for time-series.
Is there some other package that I can put time-series models and do quantile regression in R? Any advice is welcome. Thanks.
I just posted an answer on the Data Science forum.
It basically says that most of the ready-made packages use so-called exact tests, which rely on assumptions about the distribution (independent, identically distributed normal errors, or something broader).
There is also a family of resampling methods, in which you simulate a sample with a distribution similar to that of your observed sample, fit your ARIMA(1,1,2)-ARCH(2) model, and repeat the process a great number of times. You then analyze this large set of forecasts and measure (as opposed to compute) your confidence intervals.
The resampling methods differ in how they generate the simulated samples. The most used are:
The Jackknife: you "forget" one point at a time, that is, you simulate n samples of size n-1 (where n is the size of the observed sample).
The Bootstrap: you simulate a sample by drawing n values from the original sample with replacement: some will be taken once, some twice or more, some never.
It is a (not easy) theorem that the expectations of confidence intervals, like those of most of the usual statistical estimators, are the same on the simulated samples as on the original sample. The difference is that you can measure them with a great number of simulations.
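To make the two resampling schemes concrete, here is a small base R sketch applied to the mean of an independent sample. The data, the statistic, and B = 2000 replications are illustrative assumptions. For a time series, plain resampling of individual observations ignores the dependence between them, so this only shows the mechanics.

```r
set.seed(42)
x <- rnorm(200, mean = 5, sd = 2)
n <- length(x)

# Jackknife: n leave-one-out samples, each of size n - 1
jack_means <- vapply(seq_len(n), function(i) mean(x[-i]), numeric(1))
jack_se <- sqrt((n - 1) / n * sum((jack_means - mean(jack_means))^2))

# Bootstrap: B samples of size n drawn with replacement
B <- 2000
boot_means <- replicate(B, mean(sample(x, size = n, replace = TRUE)))
boot_ci <- quantile(boot_means, probs = c(0.025, 0.975))  # percentile interval

jack_se
boot_ci
```

For the sample mean, the jackknife standard error coincides with the textbook sd(x)/sqrt(n), which makes it a convenient sanity check.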
Hello and welcome to StackOverflow. Please take some time to read the help page, especially the sections named "What topics can I ask about here?" and "What types of questions should I avoid asking?". And more importantly, please read the Stack Overflow question checklist. You might also want to learn about Minimal, Complete, and Verifiable Examples.
I can try to address your question, although this is hard since you don't provide any code or data. Also, I guess that by "put ARIMA-ARCH models" you actually mean that you want to make an integrated series stationary using an ARIMA(1,1,2) filter plus an ARCH(2) filter.
For an overview of R's time-series capabilities, you can refer to the CRAN task list.
You can easily apply these filters in R with an appropriate function.
For instance, you could use the Arima() function from the forecast package, then compute the residuals with residuals() from the stats package. Next, you can use this filtered series as input for the garch() function from the tseries package. Other approaches are of course possible. Finally, you can apply quantile regression to this filtered series. For instance, you can check out the dynrq() function from the quantreg package, which allows time-series objects in the data argument.
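The pipeline described above might look roughly like the sketch below, assuming the tseries and quantreg packages are installed. To keep dependencies down it swaps in arima() from stats for forecast::Arima() and uses rq() on an explicitly built lag rather than dynrq(). The simulated series, the lag-1 regressor, and the chosen quantile levels are illustrative assumptions.

```r
library(tseries)   # garch()
library(quantreg)  # rq()

set.seed(123)
n <- 600
# Simulate ARCH(2) innovations, then an integrated ARMA(1,2) series
innov <- numeric(n); z <- rnorm(n)
for (t in 3:n) {
  sigma2 <- 0.2 + 0.3 * innov[t - 1]^2 + 0.2 * innov[t - 2]^2
  innov[t] <- z[t] * sqrt(sigma2)
}
y <- cumsum(arima.sim(list(ar = 0.5, ma = c(0.3, 0.2)), n = n, innov = innov))

# Step 1: ARIMA(1,1,2) filter
arima_fit <- arima(y, order = c(1, 1, 2))
arima_res <- as.numeric(residuals(arima_fit))

# Step 2: ARCH(2) filter; in tseries, order = c(GARCH order, ARCH order)
arch_fit <- garch(arima_res, order = c(0, 2), trace = FALSE)
std_res <- as.numeric(residuals(arch_fit))
std_res <- std_res[!is.na(std_res)]  # the first residuals are NA

# Step 3: quantile regression of the filtered series on its first lag
lagged <- embed(std_res, 2)
dat <- data.frame(r = lagged[, 1], r_lag1 = lagged[, 2])
qr_fit <- rq(r ~ r_lag1, tau = c(0.05, 0.5, 0.95), data = dat)
coef(qr_fit)
```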

How to deal with heteroscedasticity in OLS with R

I am fitting a standard multiple regression with OLS method. I have 5 predictors (2 continuous and 3 categorical) plus 2 two-way interaction terms. I did regression diagnostics using residuals vs. fitted plot. Heteroscedasticity is quite evident, which is also confirmed by bptest().
I don't know what to do next. First, my dependent variable is reasonably symmetric (I don't think I need to try transformations of my DV). My continuous predictors are also not highly skewed. I want to use weights in lm(); however, how do I know what weights to use?
Is there a way to automatically generate weights for performing weighted least squares? Or are there other ways to go about it?
One obvious way to deal with heteroscedasticity is to estimate heteroscedasticity-consistent standard errors. Most often they are referred to as robust or White standard errors.
You can obtain robust standard errors in R in several ways. The following page describes one possible and simple way to obtain robust standard errors in R:
https://economictheoryblog.com/2016/08/08/robust-standard-errors-in-r
However, sometimes there are more subtle and often more precise ways to deal with heteroscedasticity. For instance, you might encounter grouped data and find yourself in a situation where standard errors are heterogeneous in your dataset, but homogenous within groups (clusters). In this case you might want to apply clustered standard errors. See the following link to calculate clustered standard errors in R:
https://economictheoryblog.com/2016/12/13/clustered-standard-errors-in-r
What is your sample size? I would suggest that you make your standard errors robust to heteroskedasticity, but that you do not worry about heteroskedasticity otherwise. The reason is that with or without heteroskedasticity, your parameter estimates are unbiased (i.e. they are fine as they are). The only thing that is affected (in linear models!) is the variance-covariance matrix, i.e. the standard errors of your parameter estimates will be affected. Unless you only care about prediction, adjusting the standard errors to be robust to heteroskedasticity should be enough.
See e.g. here how to do this in R.
Btw, for your solution with weights (which is not what I would recommend), you may want to look into ?gls from the nlme package.
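The robust standard errors recommended in the answers above can be obtained, for example, with the sandwich and lmtest packages (assumed to be installed). The sketch below uses simulated heteroscedastic data, and the HC3 estimator is one common choice among several, not a recommendation taken from the linked pages.

```r
library(sandwich)  # vcovHC()
library(lmtest)    # bptest(), coeftest()

set.seed(1)
n <- 300
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)  # error variance grows with x

fit <- lm(y ~ x)
bptest(fit)                                # Breusch-Pagan test for heteroscedasticity

# Same OLS coefficients, heteroscedasticity-consistent covariance matrix
robust_vcov <- vcovHC(fit, type = "HC3")
coeftest(fit, vcov. = robust_vcov)
```

The point estimates are identical to those from summary(fit); only the standard errors, t statistics, and p-values change.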

fixed effect, instrumental variable regression like xtivreg in stata (FE IV regression)

Does anyone know of an R package that supports fixed-effects instrumental variable regression, like xtivreg in Stata (FE IV regression)? Yes, I can just include dummy variables, but that becomes impossible when the number of groups increases.
Thanks!
I can just include dummy variables but that just gets impossible when the number of groups increases
By "impossible," do you mean "computationally impossible"? If so, check out the plm package, which was designed to handle cases that would otherwise be computationally infeasible, and which permits fixed-effects IV.
Start with the plm vignette. It will quickly make clear whether plm is what you're looking for.
Update 2018 December 03: the estimatr package will also do what you want. It's faster and easier to use than the plm package.
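As a rough illustration of fixed-effects IV with plm (assumed to be installed), the sketch below simulates a small panel in which x is endogenous and z is a valid instrument. All variable names and the data-generating process are hypothetical; the instrument goes after the | in the formula.

```r
library(plm)

set.seed(2024)
n_id <- 100; n_t <- 8
id   <- rep(seq_len(n_id), each = n_t)
time <- rep(seq_len(n_t), times = n_id)

alpha <- rnorm(n_id)[id]                  # group-level fixed effect
z <- rnorm(n_id * n_t)                    # instrument
v <- rnorm(n_id * n_t)                    # source of endogeneity
x <- 0.8 * z + alpha + v
y <- 1.5 * x + alpha + 0.7 * v + rnorm(n_id * n_t)
panel <- data.frame(id, time, y, x, z)

# Within (fixed-effects) estimator with x instrumented by z
fe_iv <- plm(y ~ x | z, data = panel, index = c("id", "time"), model = "within")
summary(fe_iv)
```

With estimatr, the analogous call would be along the lines of iv_robust(y ~ x | z, fixed_effects = ~ id, data = panel).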
As you may know, for many fixed effects and random effects models (I mean FE and RE from the econometrics and education standpoint, since the definitions in statistics are different), you can create an equivalent SEM (Structural Equation Modeling) model. There are two packages in R that can be used for that purpose: 1) sem 2) lavaan
Another solution is to use SAS. In SAS, you can use PROC GLM, whose "absorb" statement automatically takes care of the dummies as well as computing (x - xbar) for each observation.
Hope it helps.
Try the ivreg command from the AER package.
