Clarification on squaring residuals - math

I understand we take squares of residuals because it penalizes large deviations from the actuals more than small ones, but how does that help? We minimize the sum of squared residuals anyway, so why does it matter that one prediction is penalized more than the others?
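A quick illustration in R of why it matters (a sketch only): for a single constant prediction, minimizing squared residuals gives the mean, while minimizing absolute residuals gives the median, so one large deviation pulls the squared-loss answer much harder.

y <- c(1, 2, 3, 100)
mean(y)     # 26.5: minimizes sum((y - m)^2), dragged toward the outlier
median(y)   # 2.5: minimizes sum(abs(y - m)), barely moved by it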


Converting scaled and centered estimates to unstandardized estimates in R

I am using the glmmTMB package in R to fit a logistic regression model with fixed and random effects (random intercepts and slopes). For background: I have 5 fixed covariates, one of which includes a quadratic term (so really 6 fixed effects), and I include random slopes for each of those 6 covariates. Before fitting the model, I scaled and centered each covariate (using the scale function) and checked for correlation between covariates (other than the quadratic, all correlations < 0.6).

I would like to convert the estimates from the model (which are standardized) to unstandardized estimates, because I need to create a predictive map in ArcGIS, which uses unstandardized rasters. I have tried fitting the model on the raw data (i.e. skipping the scale-and-center step), but I believe I am running into convergence issues: even though it runs without warnings, the estimates have large standard errors (10-100x larger than the estimates) and the sign of some estimates flips between the standardized and unstandardized runs.

I have found similar posts such as this, this, and this, but I don't think they address exactly my issue, or I am not understanding the math in the solutions. Advice would be very much appreciated.
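For what it's worth, one standard workaround is to back-transform the coefficients rather than refit on raw data. A minimal sketch of that conversion, assuming each covariate was standardized as (x - m)/s via scale(), with hypothetical names x_scaled, b_std and b0_std for the scaled covariate matrix and the fitted fixed-effect slopes and intercept (a quadratic built from a scaled covariate needs extra care, since back-transforming it also feeds into the linear term):

m <- attr(x_scaled, "scaled:center")   # means saved by scale()
s <- attr(x_scaled, "scaled:scale")    # sds saved by scale()
# On the logit scale: b_std * (x - m)/s = (b_std/s) * x - b_std * m/s,
# so the raw-scale slopes and intercept are:
b_unstd  <- b_std / s
b0_unstd <- b0_std - sum(b_std * m / s)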

How to extract R squared from an ARIMA model

Is it possible to calculate an R squared value from an ARIMA model in R?
This is the output given from summary(model)
Edit: I am worried about the biases associated with MAPE and other percentage errors. The quantities I'm predicting are relatively small, so I feel that R², correlation, or some other metric might be a better indicator.
Once you have ARMA errors, it is not a simple linear regression any more.
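If you still want a single goodness-of-fit number, one common workaround (a sketch only; for the reason above it is not an R² with the usual regression interpretation) is to compare the fitted values with the observations, e.g. on a built-in series:

library(forecast)
fit <- auto.arima(AirPassengers)
cor(fitted(fit), AirPassengers)^2   # squared correlation of fitted vs. actual
1 - sum(residuals(fit)^2) / sum((AirPassengers - mean(AirPassengers))^2)   # 1 - SSE/SST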

auto.arima produces non-gaussian residual

I'm using R's auto.arima function, but it seems that it does not always produce Gaussian errors. I cannot find any documentation saying whether it does some bootstrapping of the prediction error if the errors are not Gaussian, or what it does in that case.
Estimation does not require Gaussian errors, even when a Gaussian likelihood is being used. A Gaussian likelihood is almost the same as least squares and will give consistent estimates for any error distribution with finite variance.
The only time that the distribution of residuals really matters is when producing prediction intervals. If the residuals are not Gaussian, the default prediction intervals will not necessarily have the correct coverage. But then you can set bootstrap=TRUE and get bootstrapped prediction intervals which are based on the empirical distribution of the residuals.
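A minimal sketch of that workflow with the forecast package (the series is just a built-in example):

library(forecast)
fit <- auto.arima(lynx)
checkresiduals(fit)                       # time plot, ACF and histogram of the residuals
forecast(fit, h = 10, bootstrap = TRUE)   # intervals resampled from the empirical residuals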

How to set a weighted least-squares in r for heteroscedastic data?

I'm running a regression on census data where my dependent variable is life expectancy and I have eight independent variables. The data is aggregated by cities, so I have many thousands of observations.
My model is somewhat heteroscedastic, though. I want to run a weighted least squares regression where each observation is weighted by the city's population. In this case, it would mean that I want to weight the observations by the inverse of the square root of the population. It's unclear to me, however, what the correct syntax would be. Currently, I have:
Model = lm(..., weights = (1/population))
Is that correct? Or should it be:
Model = lm(..., weights = (1/sqrt(population)))
(I found this question here: Weighted Least Squares - R, but it does not clarify how R interprets the weights argument.)
From ?lm: "weights: an optional vector of weights to be used in the fitting process. Should be NULL or a numeric vector. If non-NULL, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used." R doesn't do any further interpretation of the weights argument.
So, if what you want to minimize is the sum of (the squared distance from each point to the fit line) × 1/sqrt(population), then you want ...weights = (1/sqrt(population)). If you want to minimize the sum of (the squared distance from each point to the fit line) × 1/population, then you want ...weights = (1/population).
As to which of those is most appropriate... that's a question for CrossValidated!
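One way to convince yourself of that reading of ?lm (a small self-contained check with made-up data): the coefficients from lm(weights = w) match what you get by solving the weighted normal equations for sum(w*e^2) directly.

set.seed(42)
x <- rnorm(20); y <- 1 + 2 * x + rnorm(20); w <- runif(20)
fit <- lm(y ~ x, weights = w)
X <- cbind(1, x)
solve(t(X) %*% (w * X), t(X) %*% (w * y))   # argmin of sum(w * e^2)
coef(fit)                                   # same coefficients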
To answer your question, Lucas, I think you want weights=(1/population). R parameterizes the weights as inversely proportional to the variances, so specifying the weights this way amounts to assuming that the variance of the error term is proportional to the population of the city, which is a common assumption in this setting.
But check the assumption! If the variance of the error term is indeed proportional to the population size, then if you divide each residual by the square root of its corresponding sample size, the residuals should have constant variance. Remember, dividing a random variable by a constant results in the variance being divided by the square of that constant.
Here's how you can check this: Obtain residuals from the regression by
residuals = lm(..., weights = 1/population)$residuals
Then divide the residuals by the square roots of the population sizes:
standardized_residuals = residuals/sqrt(population)
Then compare the sample variance among the residuals corresponding to the bottom half of population sizes:
variance1 = var(standardized_residuals[population < median(population)])
to the sample variance among the residuals corresponding to the upper half of population sizes:
variance2 = var(standardized_residuals[population > median(population)])
If these two numbers, variance1 and variance2, are similar, then you're doing something right. If they are drastically different, then your assumption may be violated.
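Putting those steps together, here is a self-contained sketch with simulated data (simulated precisely so that the error variance is proportional to population, i.e. the assumption holds):

set.seed(1)
n <- 5000
population <- rlnorm(n, meanlog = 10, sdlog = 1)
x <- rnorm(n)
y <- 70 + 2 * x + rnorm(n, sd = 0.01 * sqrt(population))   # Var(error) proportional to population
fit <- lm(y ~ x, weights = 1/population)
std_res <- residuals(fit) / sqrt(population)
var(std_res[population < median(population)])   # these two should be similar...
var(std_res[population > median(population)])   # ...when the assumption holds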

How can the length of the averaged normal be seen as a function of the deviation of the angle?

Recently I read NVidia's Mipmapping_Normal_Maps, which says we can use the un-renormalized averaged normal to compute the standard deviation of the angle between the averaged normal and the sample normals.
As a first step, it assumes a Gaussian distribution of the angular deviation and gives a figure (sorry, but I cannot post an image as a new user; please refer to Figure 2 in that paper).
My question is: how is the length of the averaged normal represented as a function of the standard deviation of the angle (the original Gaussian density, the red curve in the figure)?
I believe the answer to your question is equation (1) in the paper. It shows that the length of the averaged normal equals 1 / (1 + sigma^2), where sigma is the standard deviation (so sigma^2 is the variance).
At any rate, if you know the standard deviation, that's your value for sigma in the equations. Square it to get the variance, sigma^2.
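As a quick worked example of inverting that relation (the length value is illustrative):

Na_len <- 0.9                     # length of the un-renormalized averaged normal
sigma2 <- (1 - Na_len) / Na_len   # since |Na| = 1/(1 + sigma^2), sigma^2 = (1 - |Na|)/|Na|
sigma  <- sqrt(sigma2)            # standard deviation of the angle, about 0.33 here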
