Scatterplot:car extracting fits - r

Is there a way to extract the functions that are used when you plot with scatterplot from car?
Example:
require(car)
scatterplot(x~y)
By default it produces a scatterplot with four lines: one for the linear regression, two for residuals, and one (the solid red line) for a function that fits the data.
Now I want to know which function is used to produce the red line. Not this specific one, but generally is there a way to obtain this function?
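A minimal sketch of how such a curve could be reproduced and reused, assuming the solid red line is a loess-type smooth (as far as I know that is car's default smoother, but check ?scatterplot in your installed version). The span, degree and family values below are assumptions, not values read out of car, and x and y are simulated placeholders:

```r
# Simulated data; replace x and y with your own vectors.
set.seed(1)
x <- seq(0, 10, length.out = 100)
y <- sin(x) + rnorm(100, sd = 0.3)

# Linear regression line: the coefficients define the function directly.
lin_fit <- lm(y ~ x)
coef(lin_fit)

# Loess smooth with ASSUMED settings (span, degree, family may differ in car).
smooth_fit <- loess(y ~ x, span = 2/3, degree = 1, family = "symmetric")

# Wrap the fit in a function that returns the smoothed value at any x
# inside the observed range.
smooth_fun <- function(new_x) {
  predict(smooth_fit, newdata = data.frame(x = new_x))
}

vals <- smooth_fun(c(2.5, 5, 7.5))
vals
```

A loess fit has no closed-form equation, so "the function" here is just the fitted object plus predict(); lines(x, smooth_fun(x)) would overlay it on an existing plot.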

Related

LOESS smoothing - geom_smooth() vs loess()

I have some data which I would like to fit with a model. For this example we have been using LOESS smoothing (fewer than 1,000 observations). We applied LOESS smoothing using the geom_smooth() function from the ggplot2 package. So far, so good.
The next step was to acquire a first derivative of the smoothed curve, and as far as we know this is not possible to extract from geom_smooth(). Thus, we sought to manually create our model using loess() and use this to extract our first derivative from this.
Strangely however, we observed that the plotted geom_smooth() curve is different from the manually constructed loess() curve. This can be observed in the figure which is shown underneath; in red the geom_smooth() and in orange the loess() function.
If somebody would be interested, a minimal working reproducible example can be found here.
Would somebody be able to pinpoint why the curves are different? Is this because of the optimization settings of both curves? In order to acquire a meaningful derivative we need to ensure that these curves are identical.
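For the derivative part, a rough sketch of one way to do it with loess() alone: evaluate the fitted curve on a fine grid and take finite differences. The data are simulated, and the span and degree are written out explicitly only so they can be matched against whatever the plotted smoother uses; nothing here identifies why the two curves in the question differ.

```r
# Simulated data standing in for the real measurements.
set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.1)

# Explicit smoothing settings, so they can be compared with the plot's.
fit <- loess(y ~ x, span = 0.75, degree = 2)

# Evaluate the smooth on an evenly spaced grid inside the data range.
grid <- seq(min(x), max(x), length.out = 80)
smoothed <- predict(fit, newdata = data.frame(x = grid))

# Finite-difference approximation of the first derivative.
# Each slope belongs to the midpoint between two neighbouring grid points.
slope <- diff(smoothed) / diff(grid)
mid_x <- (head(grid, -1) + tail(grid, -1)) / 2
```

plot(mid_x, slope, type = "l") then shows the approximate derivative; a denser grid gives a smoother-looking result but does not change the underlying fit.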

how to fit baseline/background in R

I am trying to fit the background shape in nmr spectra. For this I have been using the loess function so far.
First I try to identify all the peaks (which works more or less) and remove them from the spectrum. Then I try to fit the rest of the spectrum with the loess function.
My problem now is that if the removal of peaks doesn't work perfectly there are still some points left which are clearly not background.
Is there a way to tell the fit not to go over the data, i.e. having the fitted line always below the data points (which is clearly what you want from a baseline)? My hope is that, if I am able to constrain the fit to be below the data points I can find suitable parameters, so that the remaining points from the peaks are ignored.
Thanks
John
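One heuristic that might help, sketched under the assumption that the peaks are narrow compared with the background: fit loess repeatedly and, after each pass, clip every point that sits above the fit down to the fitted value. The curve then sinks toward the lower envelope of the data. This is not a strict constraint (the final curve can still cross a few points), and the spectrum, span and iteration count below are all made up for illustration.

```r
# Simulated spectrum: sloping background + one narrow peak + small noise.
set.seed(7)
x <- seq(0, 100, length.out = 500)
background <- 2 + 0.02 * x
peak <- 10 * exp(-(x - 50)^2 / (2 * 1.5^2))
y <- background + peak + rnorm(500, sd = 0.05)

# Iterative clipping: points above the current fit are pulled down to it,
# so leftover peak points lose their influence on the next fit.
work <- y
for (i in 1:30) {
  fit_vals <- loess(work ~ x, span = 0.3, degree = 1)$fitted
  work <- pmin(work, fit_vals)
}
baseline <- fit_vals

# Index of the highest point, to compare baseline and raw signal there.
peak_idx <- which.max(y)
c(raw = y[peak_idx], baseline = baseline[peak_idx])
```

The span controls how stiff the baseline is: too small and it starts following the peaks, too large and it cannot follow a curved background.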

r- Adding multiple unrelated nonlinear functions (not fitted) to a scatterplot

I have made a scatter plot of raw data. The equation for the quantile lines takes the form y=10^a*x^b. (The equation for the quantile was log transformed and meaningless to the audience when viewed.) How do I add a function of this form to the scatter plot of the raw data?
I think this may be a matter of my not knowing the terminology to search for the right method.
a=1
b=2
curve(10^a*x^b,add=TRUE,from=0,to=10)
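To draw several such lines on one scatter plot, something along these lines should work. The raw data and the (a, b) pairs are invented for the example, and power_curve is just a helper name introduced here:

```r
# Invented raw data, roughly following y = 10 * x^2 with multiplicative noise.
set.seed(3)
x_raw <- runif(50, 0.5, 10)
y_raw <- 10 * x_raw^2 * exp(rnorm(50, sd = 0.4))

# Helper for y = 10^a * x^b (hypothetical name).
power_curve <- function(x, a, b) 10^a * x^b

# Hypothetical coefficients, one row per quantile line.
params <- data.frame(a = c(0.5, 1, 1.5), b = c(2, 2, 2))

# Scatter plot first, then one curve per row of params.
plot(x_raw, y_raw, xlab = "x", ylab = "y")
for (i in seq_len(nrow(params))) {
  curve(power_curve(x, params$a[i], params$b[i]),
        from = 0, to = 10, add = TRUE, lty = i)
}
```

curve() needs an existing plot when add = TRUE, which is why plot() is called before the loop.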

Why does my linear regression fit line look wrong?

I have plotted a 2-D histogram in a way that I can add to the plot with lines, points etc.
Now I want to apply a linear regression fit to the region of dense points, but my regression line seems to be far from where it should be.
To demonstrate here is my plot on the left with both a lowess regression fit and linear fit.
lines(lowess(na.omit(a),na.omit(b),iter=10),col='gray',lwd=3)
abline(lm(b[cc]~a[cc]),lwd=3)
Here a and b are my values and cc are the points within the densest parts (i.e. where most points lie), red+yellow+blue.
Why doesn't my regression line look more like the one on the right (hand-drawn fit)?
If I were drawing a line of best fit by hand, that is where it would be.
I have numerous plots similar to this but still I get the same results....
Are there any alternative linear regression fits that could prove to be better for me?
A linear regression is a method to fit a linear function to a set of points (observations) by minimizing the sum of squared errors in the response, i.e. the vertical distances between the points and the line.
Now imagine your heatmap indicating a shape where you would assume a vertical line fits best. Just turn your heatmap 10 degrees counterclockwise and you have it.
Now how is a linear function that is vertical supposed to be defined? Exactly: it is not possible.
The upshot of this little thought experiment is that you are confusing the purpose of linear regression; what you most likely want, as Gavin Simpson already indicated, is the 1st principal component vector.
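A small sketch of the difference, using simulated data with noise in both variables (the names a and b mirror the question, but the values are made up). The first principal component is taken from prcomp() and turned into a slope and intercept so it can be compared with the lm() line:

```r
# Simulated cloud: both a and b are noisy measurements of the same latent value.
set.seed(11)
latent <- rnorm(300)
a <- latent + rnorm(300, sd = 0.7)
b <- 3 * latent + rnorm(300, sd = 0.7)

# Ordinary least squares: minimizes vertical distances only.
ols_slope <- unname(coef(lm(b ~ a))[2])

# First principal component: direction of greatest spread in the (a, b) plane.
pc <- prcomp(cbind(a, b))
pc_slope <- unname(pc$rotation["b", 1] / pc$rotation["a", 1])
pc_intercept <- mean(b) - pc_slope * mean(a)

c(ols = ols_slope, pc1 = pc_slope)
```

The principal-component line is at least as steep as the regression line, which is usually what a hand-drawn "best fit" through a dense cloud resembles; abline(pc_intercept, pc_slope) adds it to an existing plot.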

Splitting lme residual plot into separate boxplots

Using the basic plot function (plot.intervals.lmList) from an lme model (called meef1), I produced a massive graph of boxplots. My vector v2andv3commoditycombined has 98 levels.
plot(meef1, v2andv3commoditycombined~resid(.))
I would like to separate by the grouping values of my variable v2andv3commoditycombined to either graph them separately, order them, or exclude some. I'm not sure if there is code to do this or if I have to extract information from the lme output. If that is the case, I'm not sure what to extract to create the boxplots as extracting the residuals returns only one value for each level. If this is impossible, any advice on how to space out the commodity names would be equally helpful.
Thank you.
For each level of v2andv3commoditycombined, what exactly would you like your Y axis and your X axis to be? Since you're splitting the plots by v2andv3commoditycombined, you obviously can't also use that as one of your axes.
Let's pretend you just want to do the traditional residuals on the Y axis and fitted values on the X axis, in a separate plot for each of the 98 levels. You can change the code to plot whatever it is you actually want to plot.
As per ?plot.lme, you would do something like this:
plot(meef1, resid(., type='pearson', level=1) ~ fitted(., level=1) | v2andv3commoditycombined)
Make sure you stretch out your plot window beforehand so that it's nice and big, otherwise you might get an error saying something about margins. The following might produce a better-looking plot:
plot(meef1, resid(., type='pearson', level=1) ~ fitted(., level=1) | v2andv3commoditycombined, pch='.', cex=1.5, abline=0)
Since it wasn't clear from your question I went ahead and assumed you're interested in the individual level residuals (i.e. how much each datapoint differs from the predicted value given its random variables), and that you have one level of nesting in your random formula. If you want population residuals (i.e. how much each datapoint differs from the average predicted value), change both instances of level to say level=0. If you have K levels of nesting, change them to level=K and good luck.
I also assumed you wanted standardized residuals (because you can use the convenient rule of thumb that absolute values greater than 3 are possible outliers, regardless of what scale the original data are on). If not, see ?residuals.lme for other valid options for the type argument.
Oh, and the name of your variables suggests that you're looking at some sort of financial time series. If so, have a look at ACF(meef1) to see if there is a lot of autocorrelation. If there is, you could remedy it by instead fitting a model where the response (Y) variable is diff(...) the original variable. If you're seeing really skewed residuals, you might consider log-transforming your response variable before taking the diff.
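As a rough illustration of that last transformation, here is what differencing and log-differencing look like on a made-up series (price is a hypothetical variable, not something from your model):

```r
# Made-up series standing in for the original response variable.
price <- c(100, 102, 101, 105, 110)

# First difference: change from one observation to the next.
d_price <- diff(price)

# Log-difference: approximately the relative change, often less skewed.
d_log_price <- diff(log(price))

d_price
d_log_price
```

Note that differencing drops the first observation, so with grouped data it would have to be done within each group before refitting the model.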
