Standard errors for loess in R

I am attempting to find a reference that explains how standard errors are computed for local polynomial regression. Specifically, in R one can use the loess function to get a model object and then use the predict function to retrieve standard errors. Is there a reference somewhere for what is actually happening? And what about the case where there may be serial correlation in the residuals? For a regular OLS fit with lm one would adjust for this using Newey-West type methods via the sandwich package; is there a way to do the same here?
I tried looking at the source but the standard error computation calls a C function.

The "Source" section of ?loess tells you that the underlying C-code comes from the cloess package of Cleveland et al., and points you to its web home:
Source:
The 1998 version of ‘cloess’ package of Cleveland, Grosse and
Shyu. A later version is available as ‘dloess’ at http://www.netlib.org/a/.
Going there, you will find a link to a 50-page document (warning: PostScript) that should tell you everything you need to know about this implementation of loess. In Cleveland's words:
This guide describes crucial steps in the proper analysis of data using
loess. Please read it.
Of particular interest will be the first couple of pages of "Section 4: Statistical and Computational Methods".
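To see what that machinery produces in practice, here is a minimal sketch using the built-in cars data (the dataset is chosen only for illustration); predict() with se = TRUE returns the pointwise standard errors that the C code computes:

fit <- loess(dist ~ speed, data = cars)
pred <- predict(fit, se = TRUE)
head(pred$fit)     # fitted values
head(pred$se.fit)  # pointwise standard errors
pred$df            # residual degrees of freedom, usable for t-based intervals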

Related

What is the output format of a Maxent model run in Biomod2 (raw, cumulative, logistic or cloglog) and can it be specified by the user?

I'm building a species distribution model/habitat suitability model using the package biomod2.
Maxent allows the user to choose one of four output formats (see title) when the Java application is used directly. However, when Maxent is called by functions in the package biomod2 (e.g. BIOMOD_Modeling), there doesn't seem to be an option to specify the output format, nor is there any indication of which one has been chosen. I think cloglog is the default, so it is likely that one, but I would like to be sure.
This tutorial has some images showing the differences between some of the outputs: https://biodiversityinformatics.amnh.org/open_source/maxent/Maxent_tutorial2017.pdf
Thanks for your help!
I emailed the developers, and Damien Georges told me that the output format for MAXENT in biomod2 is logistic and that (so far) there is no way to change it.

Does R support switching between optimizers like STATA does?

I need to implement the model shown here:
http://www.ssc.upenn.edu/~fdiebold/papers/paper55/DRAfinal.pdf
The model estimation step on p.315 notes that:
"We maximize the likelihood by iterating the Marquart and
Berndt–Hall–Hall–Hausman algorithms, using numerical derivatives, optimal
stepsize, and a convergence criterion of 10^-6 for the change in the norm of the
parameter vector from one iteration to the next."
Now, I know that Stata supports switching between optimizers:
http://www.stata.com/manuals13/rmaximize.pdf
(see the bottom of p. 2).
Is there an R package or MATLAB function(s) that can do the same thing?
Specifically I need to be able to switch between BHHH and Levenberg-Marquardt.
Kind Regards
Baz
For R, check out the CRAN Task View on Optimization. Searching that page, it looks like Marquardt and BHHH are available in separate packages (minpack.lm and maxLik, respectively). You could write your own code to handle switching between them.
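As a hedged sketch of what such switching might look like, assuming you have written a log-likelihood function logLik_fn and starting values start_vals (both hypothetical names), maxLik lets you run BHHH and then restart another algorithm from its estimates. Note that minpack.lm::nls.lm implements Levenberg-Marquardt for least-squares residual functions rather than arbitrary likelihoods, so a true Marquardt step would need the problem reformulated in residual form.

library(maxLik)
# Stage 1: BHHH. For this method, logLik_fn must return the vector of
# per-observation log-likelihood contributions.
fit_bhhh <- maxLik(logLik = logLik_fn, start = start_vals, method = "BHHH",
                   control = list(tol = 1e-6))
# Stage 2: restart a different optimizer (here Newton-Raphson) from the BHHH estimates
fit_nr <- maxLik(logLik = logLik_fn, start = coef(fit_bhhh), method = "NR",
                 control = list(tol = 1e-6))
summary(fit_nr)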

Frailty estimates in coxph object

If one uses obj <- coxph(... + frailty(id)), then the object also contains (log) frailty estimates for each individual, which can be extracted with obj$frail.
Does anybody know how these estimates are obtained? Are they empirical Bayes estimates?
Thanks!
Theodor
The default distribution for frailty can be seen in the ?frailty page to be "gamma". If you look at the frailty function (which is not hidden) you see that it simply pastes the name of the distribution onto "frailty." and uses get() to retrieve the proper function. So look at frailty.gamma (also not hidden) to find the answers to your question. Looking back at the help page again, you can see that I should have been able to figure all that out without looking at the code, since it's right up at the top of the page. But there are many routes to knowledge with R. (They are ML, not "empirical Bayes", estimates.)
The help page suggests to me that the author (Therneau) expects you to consult Therneau and Grambsch for further details not obvious from reading the code. If you are doing serious work with survival models in R that is a very useful book to have. It's very clear and helpful in understanding the underpinnings of the 'survival'-package.
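For example (a minimal sketch using the kidney data shipped with the survival package; the model is only illustrative), you can pull out the per-subject estimates and look at the gamma-frailty code directly:

library(survival)
fit <- coxph(Surv(time, status) ~ age + sex + frailty(id), data = kidney)
head(fit$frail)   # per-id (log) frailty estimates referred to in the question
frailty.gamma     # print the (not hidden) function that sets up the gamma frailty penalty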

How does R calculate the p-values in logistic regression?

What type of p-values does R calculate in a binomial logistic regression, and where is this documented?
When I read the documentation for ?glm, I find no reference to the calculation of the p-values.
The p-values are calculated by the function summary.glm. See ?summary.glm for a (very brief) bit about how those are calculated.
For more information, look at the source code by typing
summary.glm
at the R command prompt. There you will find the lines of code where an object pvalue is created. Follow the code back to see how the components of the p-value calculation are (conditionally) calculated.
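To make that concrete, here is a hedged illustration of the Wald-type calculation that summary.glm performs when the dispersion is fixed (as it is for binomial models); the data are only for illustration:

fit <- glm(am ~ mpg, family = binomial, data = mtcars)
se <- sqrt(diag(vcov(fit)))        # standard errors of the coefficients
z  <- coef(fit) / se               # Wald z statistics
p  <- 2 * pnorm(-abs(z))           # two-sided p-values from the standard normal
cbind(p, summary(fit)$coefficients[, "Pr(>|z|)"])   # agrees with summary.glm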
The authors of R wrote the help system with several principles in mind: compactness (don't write more than is needed; it's not a textbook), accuracy, and a curious and well-educated audience. It really was written for other statisticians. The "curious" part of that opening sentence was included to raise the question of why you did not also follow the various links on the ?glm page: to summary.glm, where you would have found one answer to your ambiguous question, or to anova.glm, where you would have found another possible answer. The help authors do expect that you will follow those links, read the whole page, and execute the examples. You will notice that even after you get to summary.glm there is no mention of "binary logistic regression", since they pretty much assume that you are well grounded in statistics and have a copy of McCullagh and Nelder handy, or, if not, that you will go read the references.
The other principle: sometimes it is the code itself (given the open-source nature of R) that serves as the documentation. Technically, glm doesn't print anything and print.glm doesn't print p-values; it would be print.summary.glm or print.anova.glm that does any printing. Part of learning R is learning that the results printed to the console will have gone through a read-eval-print loop and that the output can be tailored with object-class-specific functions.
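If typing a method's name at the prompt doesn't show its code, getS3method() will retrieve it; for example:

getS3method("summary", "glm")          # where the coefficient table and p-values are built
getS3method("print", "summary.glm")    # where that table is actually printed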
These assumptions are just part of what many people see as a "steep learning curve for R" (although I would have called it a shallow curve if plotted with time/effort on the x-axis).

Standard error of the ARIMA constant

I am trying to manually calculate the standard error of the constant in an ARIMA model, if it is included. I have referred to the Box and Jenkins (1994) text, specifically Section 7.2, but my understanding is that the methods described there calculate the variance-covariance matrix for the ARIMA parameters only, not the constant. I tried searching on the Internet but couldn't find any theory. Software like Minitab, R, etc. calculates this, so I was wondering how it is done. Can someone provide any pointer(s) on this topic?
Thanks.
arima() will fit a regression model with ARMA errors. The constant is treated as the coefficient of a regression variable consisting only of 1s, so you need the covariance matrix of the regression coefficients, which is usually calculated separately from the covariance matrix of the ARMA coefficients. Look at Section 8.3 of Hamilton's "Time Series Analysis".
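A minimal sketch of what this looks like in R's arima() output (simulated AR(1) data, purely illustrative): the reported "intercept" is the mean handled as a regression coefficient on a column of ones, and its standard error comes out of the joint covariance matrix in fit$var.coef.

set.seed(1)
y <- arima.sim(list(ar = 0.5), n = 200) + 10
fit <- arima(y, order = c(1, 0, 0))   # include.mean = TRUE by default
fit$coef                              # AR coefficient plus the "intercept" (series mean)
sqrt(diag(fit$var.coef))              # standard errors, including the constant's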
One of the nicest things about R is that you can access a lot of the source code to R itself from within the environment. If you simply type arima at the command prompt, you get the high-level source code for the arima() function. I got several pages of code when I tried it.
You do miss out on anything implemented internally within the R executable in native code, but often the high-level code tells you everything you want to know.
Perhaps a shift of perspective can solve this problem.
Rather than seeing the constant as something special, just consider the problem without a constant and with a variable that is a vector of ones.
