Why does my linear regression fit line look wrong? - r

I have plotted a 2-D histogram in a way that I can add to the plot with lines, points etc.
Now I want to fit a linear regression to the region of dense points, but my regression line seems to be far from where it should be.
To demonstrate, here is my plot on the left with both a lowess regression fit and a linear fit.
lines(lowess(na.omit(a),na.omit(b),iter=10),col='gray',lwd=3)
abline(lm(b[cc]~a[cc]),lwd=3)
Here a and b are my values and cc are the points within the densest parts (i.e. where most points lie): red + yellow + blue.
Why doesn't my regression line look more like the one on the right (a hand-drawn fit)?
If I were drawing a line of best fit by eye, that is where I would put it.
I have numerous plots similar to this, but I always get the same result.
Are there any alternative linear fits that might work better for me?

A linear regression fits a linear function y = a + b*x to a set of points (observations) by minimizing the sum of squared errors in y, i.e. the vertical distances between the points and the line.
Now imagine your heatmap had a shape for which you would expect a vertical line to fit best. Just turn your heatmap about 10 degrees counterclockwise and you have it.
How would a linear function of x that is vertical be defined? Exactly: it is not possible, because a vertical line has no finite slope.
The result of this little thought experiment is that you are confusing the purpose of linear regression with something else. What you most likely want is, as Gavin Simpson already indicated, the first principal component vector, which follows the long axis of the point cloud.
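To illustrate the difference, here is a minimal sketch on simulated data (x and y are made-up stand-ins for a[cc] and b[cc], not the asker's values). It draws the least-squares line and the line through the mean along the first principal component from prcomp(); the latter minimizes perpendicular rather than vertical distances.

```r
# Simulated stand-in for the dense region (a[cc], b[cc] in the question)
set.seed(1)
x <- rnorm(500)
y <- 3 * x + rnorm(500, sd = 2)

# Ordinary least squares: minimizes vertical distances only
ols <- lm(y ~ x)

# First principal component: minimizes perpendicular distances
pc <- prcomp(cbind(x, y))
pc_slope <- pc$rotation["y", "PC1"] / pc$rotation["x", "PC1"]
pc_intercept <- mean(y) - pc_slope * mean(x)  # the PC1 line passes through the mean

plot(x, y, pch = 16, col = "grey70")
abline(ols, lwd = 3)                                        # solid: lm fit
abline(a = pc_intercept, b = pc_slope, lwd = 3, lty = 2)    # dashed: PC1 axis
```

The dashed line is at least as steep as the lm line and tracks the long axis of the cloud. Note that the principal component direction depends on the units of both variables, so if a and b are on very different scales it may be worth standardizing them first.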

Related

Residual vs Fitted values plot, weird linear line

Thanks for taking the time to read my post.
I am researching the relationship between songs featured in TV series and their music popularity. By using a DiD, I have estimated the average treatment effect. Furthermore, I would like to conduct a simple residual analysis by plotting the fitted values. The plot seems "okay" but there is a weird linear line on the left which I cannot explain (see picture). Does somebody have an idea what this could indicate?
residual vs fitted values plot
There seems to be a pattern that looks linear, although I am not investigating a linear relationship. What interpretations can I make about this plot?
Kind regards,
Max

Does this curve represent non-linearity in my residuals vs fitted plot? (simple linear regression)

Hi,
I am running a simple linear regression model in R at the moment and wanted to check my assumptions. As seen by the plot, my red line does not appear to be flat and instead curved in places.
I am having a little difficulty interpreting this - does this imply non-linearity? And if so, what does this say about my data?
Thank you.
The observation marked 19 on your graph (bottom right corner) seems to have considerable influence and is pulling your line down more than the other points are pulling it up. Overall the relationship looks linear. Either diluting the effect of that outlier with a larger sample or removing the outlier(s) should fix your problem without compromising the story your data is trying to tell you, and it should give you the flatter line you are looking for. If you do remove points, check first that they are genuine errors rather than valid observations.
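One way to check whether a single point really drives the fit is to look at Cook's distance and compare the coefficients with and without the flagged observations. The sketch below uses made-up data with an outlier planted at position 19 purely for illustration; the 4/n cutoff is a common rule of thumb, not a formal test.

```r
# Made-up data; observation 19 is deliberately shifted to mimic the outlier
set.seed(42)
d <- data.frame(x = 1:20)
d$y <- 2 + 0.5 * d$x + rnorm(20, sd = 0.5)
d$y[19] <- d$y[19] - 6

fit <- lm(y ~ x, data = d)

# Cook's distance measures how much each point changes the fitted values
cd <- cooks.distance(fit)
flagged <- which(cd > 4 / nrow(d))   # rule-of-thumb threshold (assumption)

# Refit without the flagged points and compare coefficients
keep <- setdiff(seq_len(nrow(d)), flagged)
fit_wo <- lm(y ~ x, data = d[keep, ])
rbind(all = coef(fit), without_flagged = coef(fit_wo))
```

If the coefficients barely move, the curvature in the residual plot is probably not caused by that point alone.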

How to produce this figure of thin-plate splines fit to observed data with contours plotted?

How does one reproduce this figure from Elements of Statistical Learning page 166?
I understand that a regression is fit to the data using age and obesity as features. But I am wondering how to represent the fitted surface using contours as is shown in the figure. I would prefer an R implementation because I believe that is what was used in this case.

LOESS smoothing - geom_smooth() vs loess()

I have some data which I would like to fit with a model. For this example we have been using LOESS smoothing (fewer than 1,000 observations). We applied LOESS smoothing using the geom_smooth() function from the ggplot2 package. So far, so good.
The next step was to obtain the first derivative of the smoothed curve, and as far as we know this cannot be extracted from geom_smooth(). Thus, we sought to build the model manually with loess() and compute the first derivative from that.
Strangely, however, we observed that the plotted geom_smooth() curve is different from the manually constructed loess() curve. This can be seen in the figure below: the geom_smooth() curve is in red and the loess() curve in orange.
If somebody would be interested, a minimal working reproducible example can be found here.
Would somebody be able to pinpoint why the curves are different? Is this because of the optimization settings of both curves? In order to acquire a meaningful derivative we need to ensure that these curves are identical.
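As a starting point for comparing the two, here is a hedged base-R sketch on simulated data (not the linked example). It assumes the documented geom_smooth() defaults as I understand them: for fewer than 1,000 observations it calls loess() with span = 0.75 and draws the fit on an evenly spaced grid of 80 x values. Evaluating a manual loess() fit on the same grid removes one possible source of visual difference, and a finite difference over that grid gives an approximate first derivative.

```r
# Simulated data standing in for the real observations
set.seed(7)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)

# Manual fit with the loess() defaults (span = 0.75, degree = 2)
fit <- loess(y ~ x)

# Evaluate on an evenly spaced grid of 80 points, as geom_smooth() is assumed to do
grid <- seq(min(x), max(x), length.out = 80)
smooth <- predict(fit, newdata = data.frame(x = grid))

# Finite differences between neighbouring grid points,
# attributed to the midpoint of each interval
deriv <- diff(smooth) / diff(grid)
mid <- (grid[-1] + grid[-length(grid)]) / 2
```

If the curves still differ after matching span, degree, family and the evaluation grid, it is worth checking whether the plot applies grouping, axis transformations or limits that change the data geom_smooth() actually sees.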

Scatterplot:car extracting fits

Is there a way to extract the functions that are used when you plot with scatterplot from car?
Example:
require(car)
scatterplot(x~y)
What it produces by default is a scatterplot with four lines, one for linear regression, 2 for residuals, and one (full red line) for a function that fits the data.
Now I want to know which function is used to produce the red line. Not this specific one, but generally is there a way to obtain this function?
