Loess Fitting Issue - r

Hi, I am extremely new to R and typically use MATLAB or C#, but I currently need to perform some smooth curve fitting and residual analysis, so I've turned to R. I know there are plenty of questions about loess and lowess fitting, but my issue isn't the typical "your data is out of order" problem I keep seeing.
Some sample data I am working with can be seen in the plot below
In the end the method I would like to use is loess, but I've also tried lowess and scatter.smooth. My issue is that I can't seem to get these methods working when I plot my data as x1,y1 but they seem to work alright when I plot y1,x1. I expect I'm just totally clueless here, but this seems odd to me.
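# Orientation 1: sort by x and fit y as a function of x
ord <- order(x)
x1 <- x[ord]
y1 <- y[ord]
plot(x1, y1)
fm <- loess(y1 ~ x1)
lines(x1, predict(fm))

# Orientation 2: sort by y and fit x as a function of y
ord <- order(y)
x1 <- x[ord]
y1 <- y[ord]
plot(y1, x1)
fm <- loess(x1 ~ y1)
lines(y1, predict(fm))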
The above plots show that for the x1,y1 plot the fit cuts across the data, clearly not oriented properly, but with a shape that would make sense for the data if it were flipped and rotated. For the y1,x1 plot though, using the same steps but just with the use of x1 and y1 switched in all lines, the fit works fine. I feel like this issue is actually quite a simple one, and just something I'm drawing a blank on. Any help/explanation here would be greatly appreciated, as I would like to be able to plot the data in the intended x1,y1 orientation.

In mathematics, one of the meanings of "function" is a relationship between two variables in which there is only one y-value for each x-value. loess tries to create a function in that sense. Your data would support a curve that starts high on the left, arcs over to the right, and sweeps back to the left. That would be a one-to-many relation, because many of the x-values would have two y-values, and it would lose many of the desirable features of a "true" function. Your inverse display and loess fit demonstrate that the relationship can be "functional" when x is treated as a function of y. It would be possible to take that second fit and "invert" it, that is, draw it with the axes swapped so the curve appears in the original x,y orientation.
You didn't provide data that would support a coding demonstration, but if you remedy that omission, such a demonstration could be offered.
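Since the original data is not available, here is a minimal sketch of that inversion on synthetic data. The sideways-parabola shape, the noise level, and the variable names are all assumptions made for illustration, not the asker's data.

```r
# Synthetic stand-in for the asker's data (an assumption): a sideways
# parabola, so many x-values have two y-values but each y has one x.
set.seed(1)
y <- seq(-2, 2, length.out = 200)
x <- 4 - y^2 + rnorm(length(y), sd = 0.2)

# Fit x as a smooth function of y, the direction that is single-valued.
ord <- order(y)
y1 <- y[ord]
x1 <- x[ord]
fm <- loess(x1 ~ y1)
x_fit <- predict(fm)  # fitted x-values, in the same order as y1

# Plot in the intended x,y orientation and draw the fit with the axes
# swapped: fitted x-values on the horizontal axis, y on the vertical axis.
plot(x1, y1)
lines(x_fit, y1, col = "red")
```

Note that residuals(fm) from this fit are horizontal distances (in x), which matters for any residual analysis done afterwards.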

Related

Does this curve represent non-linearity in my residuals vs fitted plot? (simple linear regression)

Hi,
I am running a simple linear regression model in R at the moment and wanted to check my assumptions. As seen by the plot, my red line does not appear to be flat and instead curved in places.
I am having a little difficulty interpreting this - does this imply non-linearity? And if so, what does this say about my data?
Thank you.
The observation marked 19 on your graph (bottom right corner) seems to have significant influence and is pulling your line down more than the other points are pulling it up. All in all, the relationship looks linear. Either diluting the effect of that outlier by increasing the sample size (law of large numbers) or removing the outlier(s) should fix your problem without compromising the story your data is trying to tell, and give you the nice graph you're looking for.
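One way to check whether a single observation really drives the curvature is to look at Cook's distance and refit without that point. The sketch below uses synthetic data with one deliberately displaced observation; the data and variable names are assumptions, not the asker's model.

```r
# Synthetic data (an assumption): a linear trend plus one extreme point.
set.seed(42)
d <- data.frame(x = 1:30)
d$y <- 2 + 0.5 * d$x + rnorm(30, sd = 1)
d$y[30] <- d$y[30] - 12  # pull the last observation far below the trend

fit <- lm(y ~ x, data = d)

# Residuals vs fitted, the same diagnostic plot as in the question.
plot(fit, which = 1)

# Cook's distance measures how much each observation shifts the fit.
cd <- cooks.distance(fit)
most_influential <- unname(which.max(cd))

# Refit without that observation and compare the coefficients.
fit_wo <- lm(y ~ x, data = d[-most_influential, ])
print(rbind(all = coef(fit), dropped = coef(fit_wo)))
```

If the coefficients and the residual plot change noticeably after dropping the point, that supports the reading that the curvature comes from one influential observation rather than from genuine non-linearity.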

How to generate mean curve of non-function?

I am currently working on curves generated in tensile tests of polymer specimens. Here, I try to generate a mean curve of five data sets generated at the same composition of the samples. Unfortunately, the resulting curve is not a function but has a vertical section which is why a simple smooth is not sufficient. Is there a way to fix the smoothed curve to a defined end point in R? Or an even better way that I did not see yet?
I already tried geom_smooth() from ggplot2 on all data points, but it did not work as I wished.
My current approach:
library(ggplot2)
data <- read.csv("data.csv", header = TRUE, sep = ";")
ggplot(data, aes(x = strain, y = stress)) + geom_point() + geom_smooth()
In the figure, you can see that the blue average curve does not fit the actual curves near their end points, probably due to the vertical sections. That's why I want to fix it to the mean end point. Additionally, I would like to fix it to the origin (0, 0), as the blue mean curve starts somewhere above it, which does not fit the actual behaviour.

R plotting strangeness with large dataset

I have a data frame with several million points in it - each having two values.
When I plot this like this:
plot(myData)
All the points are plotted, but the plot is quite busy, so I thought I'd plot it as a line:
plot(myData, type="l")
But while the x axis doesn't change (i.e. goes from 0 to 7e+07), the actual plotting stops at about 3e+07 and I don't actually get a proper line plot either.
Is there a limitation on line plotting?
Update
If I use
plot(myData, type="h")
I get correct and useable output, but I still wonder why the type="l" option fails so badly.
Further update
I am plotting a time series - here is one output using type="h":
That's perfectly usable, but having a line would allow me to compare several outputs.
Graphical representation of large datasets is a growing issue in data analysis. The problem, actually, is not creating the graph. The problem is making the graph capable of communicating information that we can transform into useful knowledge. Allow me to present an example to make this point, using data with a million observations, which is not even that big.
x <- rnorm(10^6, 0, 1)
y <- rnorm(10^6, 0, 1)
Let's plot it. R can easily handle such a problem. But can we? Probably not.
After all, what kind of information can we deduce from a solid ink stain? Probably no more than a tasseographer trying to divine the future from patterns of tea leaves, coffee grounds, or wine sediments.
plot(x, y)
A different approach is offered by the smoothScatter function, which creates a density plot of bivariate data. Here are two examples.
First, with defaults.
smoothScatter(x, y)
Second, the bandwidth was specified to be a little larger than the default, and five points are specified to be shown using a different symbol pch = 3.
smoothScatter(x, y, bandwidth=c(5,1)/(1/3), nrpoints=5, pch=3)
As you can see, the problem is not solved. Nevertheless, we get a better grasp of the distribution of our data. This kind of approach is still in development, and several aspects of it are still being discussed and refined. If it suits your big dataset better, I suggest visiting this blog, which discusses the issue thoroughly.
For what it's worth, all the evidence I have is that the computer - even though it was a lump of big iron - ran out of memory.

how to fit baseline/background in R

I am trying to fit the background shape in NMR spectra. For this I have been using the loess function so far.
First I try to identify all the peaks (which works more or less) and remove them from the spectrum. Then I try to fit the rest of the spectrum with the loess function.
My problem now is that if the removal of peaks doesn't work perfectly there are still some points left which are clearly not background.
Is there a way to tell the fit not to go over the data, i.e. having the fitted line always below the data points (which is clearly what you want from a baseline)? My hope is that, if I am able to constrain the fit to be below the data points I can find suitable parameters, so that the remaining points from the peaks are ignored.
Thanks
John

Make density cloud from point cloud

My question consists of two sub questions.
I have a graphical illustration presenting (some virtual) worst case scenarios sampled from history organized based on two parameters.
Image:
At this moment I have a point cloud. I would like to create a nicely splined density cloud from my results. I would like the 3D spline to take the density of points into account when approximating (approximating more loosely where fewer samples are available and more exactly in denser regions of space).
Then, having that density cloud, I would be able to scale the density along each vertical line specified by the two input parameters, which would make it a likelihood function of each outcome (the worst-case scenario).
The second part is that I would like to plot it, ideally as semi-transparent 3D regions that would form something like a fog around the densest region.
Uh, wow... that wasn't easy to explain. Sigh. :)
Thanks for reading that far.
So here is a way to generate 3D density plots using the ks package. Since you provided no data this example is taken directly from the documentation to plot(...) in the ks package
library(MASS)
library(ks)
x <- iris[,1:3]
H.pi <- Hpi(x, pilot="samse")
fhat <- kde(x, H=H.pi, compute.cont=TRUE)
plot(fhat, drawpoints=TRUE)
