How to generate a mean curve of a non-function in R?

I am currently working on curves generated in tensile tests of polymer specimens. I am trying to generate a mean curve from five data sets measured on samples of the same composition. Unfortunately, the resulting curve is not a function but has a vertical section, which is why a simple smooth is not sufficient. Is there a way to fix the smoothed curve to a defined end point in R? Or is there an even better way that I have not seen yet?
I already tried geom_smooth() from ggplot2 on all data points, but it did not work as intended.
My current approach:
library(ggplot2)
data <- read.csv("data.csv", header = TRUE, sep = ";")
ggplot(data, aes(x = strain, y = stress)) + geom_point() + geom_smooth()
In the figure, you can see that the blue mean curve does not fit the actual curves near their end points, probably because of the vertical sections. That's why I want to fix it to the mean end point. Additionally, I would like to fix it to (0, 0), as the blue mean curve starts somewhere above it, which does not match the actual behaviour.
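One alternative to smoothing stress as a function of strain is to treat each test as a parametric curve. Parameterise every curve by its normalised arc length, resample all curves at the same parameter values, and average strain and stress pointwise. Because the parameter runs from the first to the last recorded point, the mean curve starts at the mean start point and ends at the mean end point, and vertical sections cause no special problem. This is a minimal sketch, assuming each specimen is a separate data frame with strain and stress columns in recording order. The helpers resample_curve() and mean_curve() and the synthetic curves are made up for illustration, not the poster's data.

```r
# Hypothetical helper: resample one curve at n points evenly spaced in
# normalised arc length (assumes points are in recording order).
resample_curve <- function(strain, stress, n = 200) {
  # scale both axes to [0, 1] so that neither unit dominates the arc length
  dx <- diff(strain) / diff(range(strain))
  dy <- diff(stress) / diff(range(stress))
  s <- c(0, cumsum(sqrt(dx^2 + dy^2)))
  keep <- c(TRUE, diff(s) > 0)     # drop repeated points (zero step length)
  s <- s[keep] / max(s)            # normalised arc length in [0, 1]
  t <- seq(0, 1, length.out = n)
  data.frame(t = t,
             strain = approx(s, strain[keep], xout = t)$y,
             stress = approx(s, stress[keep], xout = t)$y)
}

# Hypothetical helper: average several curves pointwise along the common parameter t
mean_curve <- function(curves, n = 200) {
  res <- lapply(curves, function(d) resample_curve(d$strain, d$stress, n))
  data.frame(t = res[[1]]$t,
             strain = rowMeans(sapply(res, function(r) r$strain)),
             stress = rowMeans(sapply(res, function(r) r$stress)))
}

# Synthetic example (NOT real test data): five curves that start at (0, 0),
# rise, and end with a vertical drop at different failure strains
curves <- lapply(1:5, function(i) {
  eps <- seq(0, 0.1 + 0.01 * i, length.out = 100)
  sig <- 50 * (1 - exp(-eps / 0.02))
  data.frame(strain = c(eps, rep(max(eps), 10)),
             stress = c(sig, seq(max(sig), 0, length.out = 11)[-1]))
})
m <- mean_curve(curves)
```

If the CSV has a column identifying the specimen (called specimen here, a hypothetical name), split(data[c("strain", "stress")], data$specimen) would give the list of curves. To draw the result with ggplot2, geom_path() is the right geom, because it connects points in data order, whereas geom_line() sorts them by x and would scramble the vertical section. Averaging at equal arc-length fractions assumes that equal fractions correspond to comparable states of the specimens; that is a modelling choice, not a given.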

Related

Loess Fitting Issue

Hi, I am extremely new to R and typically use MATLAB or C#, but I currently need to perform some smooth curve fitting and some residual analysis, so I've turned to R. I know that plenty of questions have been asked about loess and lowess fitting, but my issue isn't the typical "your data is out of order" problem I keep seeing.
Some sample data I am working with can be seen in the plot below
In the end the method I would like to use is loess, but I've also tried lowess and scatter.smooth. My issue is that I can't seem to get these methods working when I plot my data as x1,y1 but they seem to work alright when I plot y1,x1. I expect I'm just totally clueless here, but this seems odd to me.
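# sort by x so that lines() draws the fitted curve from left to right
ord <- order(x)
x1 <- x[ord]
y1 <- y[ord]
plot(x1, y1)
fm <- loess(y1 ~ x1)    # y modelled as a function of x
lines(x1, predict(fm))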
# same steps with the roles of x and y swapped
ord <- order(y)
x1 <- x[ord]
y1 <- y[ord]
plot(y1, x1)
fm <- loess(x1 ~ y1)    # x modelled as a function of y
lines(y1, predict(fm))
The above plots show that for the x1,y1 plot the fit cuts across the data, clearly not oriented properly, but with a shape that would make sense for the data if it were flipped and rotated. For the y1,x1 plot though, using the same steps but just with the use of x1 and y1 switched in all lines, the fit works fine. I feel like this issue is actually quite a simple one, and just something I'm drawing a blank on. Any help/explanation here would be greatly appreciated, as I would like to be able to plot the data in the intended x1,y1 orientation.
In mathematics, one of the meanings of "function" is a relationship between two variables in which there is only one y-value for each x-value. loess tries to fit a function in that sense. Your data would support a curve that starts high on the left, arcs over to the right and sweeps back to the left. That would be a one-to-many relation, because many of the x-values would have two y-values, and mathematically it would lose many of the desirable properties of a "true" function. Your inverse display and loess fit demonstrate that the relationship can be "functional" when x is treated as a function of y. It would be possible to take that second fit and "invert" it, i.e. swap the axes, which reflects the curve across the line y = x.
You didn't provide data that would support a coding demonstration, but if you remedy that omission, such a demonstration could be offered.
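The "invert" idea can be sketched with synthetic data, since the question's x and y are not available. The data below are made up so that x is a function of y but y is not a function of x. The sketch fits x as a function of y with loess() and predicts on a grid of y values. It then draws the fitted points with the coordinates swapped, so the plot is in the intended x-y orientation.

```r
# Synthetic data (made up): a sideways parabola, x = 1 - y^2 plus noise
set.seed(42)
y <- runif(200, -2, 2)
x <- 1 - y^2 + rnorm(200, sd = 0.1)

# Fit x as a smooth function of y (the orientation in which it IS a function)
fm <- loess(x ~ y)
y_grid <- seq(min(y), max(y), length.out = 100)
x_hat <- predict(fm, newdata = data.frame(y = y_grid))

# Plot in the intended orientation: x horizontal, y vertical,
# passing the fitted values as the x coordinates of the line
plot(x, y)
lines(x_hat, y_grid, col = "red")
```

The key point is only the argument order in lines(): the smoother still works on y versus x, but the drawn curve has the predicted values on the horizontal axis.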

What does the span argument control in geom_smooth?

I am using geom_smooth from the ggplot2 package to create a smoothed line on a time series scatter plot (one point for each day of the year, so I have 365 points). One of the arguments is called span, and going into the help file (?geom_smooth) the following description is given:
span controls the amount of smoothing for the default loess smoother. Smaller numbers produce wigglier lines, larger numbers produce smoother lines.
However, this doesn't actually tell me what the span argument is controlling. Setting it to 1 is useless, and setting it to 0.1 provides something that looks good.
span = 0.5
span = 0.1
However, when describing the plot, since I'm not totally sure what span actually changes, I'm not sure how to describe the smoothed line. Any pointers?
The span (also called alpha) determines the width of the moving window used when smoothing your data.
"In a loess fit, the alpha parameter determines the width of the sliding window. More specifically, alpha gives the proportion of observations that is to be used in each local regression. Accordingly, this parameter is specified as a value between 0 and 1. The alpha value used for the loess curve in Fig. 2 is 0.65; so, each of the local regressions used to produce that curve incorporates 65% of the total data points. "
Taken from:
Jacoby (2000). Loess: a nonparametric, graphical tool for depicting relationships between variables. Electoral Studies, 19(4). (Paywalled paper)
For more details check the referenced paper.
LOESS smoothing is a non-parametric form of regression that fits weighted local regressions within a sliding window to produce a line of best fit. In each window, a weighted least-squares fit is computed (in R's loess(), a local quadratic by default), with points nearer the centre of the window getting more weight, and the window slides along the x-axis.
You control the size of this window with the span argument. span is the alpha parameter: for values below 1, it is the proportion of observations used in each local regression. The smaller the span, the smaller the window, and hence the noisier, more jagged the line.
Look for documentation under LOESS (?loess) rather than span.
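To make "proportion of observations" concrete for a 365-point series, here is a small sketch with synthetic daily data, made up for illustration. It fits stats::loess(), the smoother geom_smooth() uses by default for fewer than 1,000 observations, at three span values. For each fit it reports roughly how many nearest points each local regression uses (span times the number of points, rounded down). It also reports the equivalent number of parameters, the enp component of the fit, a rough measure of the curve's flexibility. With span = 0.1, each local fit uses about the 36 nearest days, so the line could be described as a local regression over roughly a month of data.

```r
# Synthetic daily series (made up): seasonal cycle + monthly wiggle + noise
set.seed(1)
day <- 1:365
value <- sin(2 * pi * day / 365) + 0.3 * sin(2 * pi * day / 30) +
  rnorm(365, sd = 0.2)

spans <- c(1, 0.5, 0.1)
# approximate number of nearest points in each local regression
points_per_fit <- floor(spans * length(day))
# equivalent number of parameters: grows as the span shrinks
enp <- sapply(spans, function(s) loess(value ~ day, span = s)$enp)
print(data.frame(span = spans, points_per_fit = points_per_fit,
                 enp = round(enp, 1)))
```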

how to fit baseline/background in R

I am trying to fit the background shape in NMR spectra. For this I have been using the loess function so far.
First I try to identify all the peaks (which works more or less) and remove them from the spectrum. Then I try to fit the rest of the spectrum with the loess function.
My problem now is that if the removal of peaks doesn't work perfectly there are still some points left which are clearly not background.
Is there a way to tell the fit not to go over the data, i.e. having the fitted line always below the data points (which is clearly what you want from a baseline)? My hope is that, if I am able to constrain the fit to be below the data points I can find suitable parameters, so that the remaining points from the peaks are ignored.
Thanks
John
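As far as I know, loess() itself has no option to keep the fit below the data. A common workaround is iterative clipping, the same idea as iterative polynomial baseline fitting but with loess as the smoother. Fit, replace every point that lies above the fit by the fitted value, refit, and repeat. Peaks are pulled down step by step, while points below the fit are left alone, so the fit drifts towards the lower envelope of the data. This is a minimal sketch on a made-up spectrum (a linear background with two Gaussian peaks). fit_baseline(), span = 0.3 and iterations = 50 are hypothetical choices, and the result only approximately lies below the data; it is not a hard constraint.

```r
# Hypothetical helper: iteratively clipped loess baseline (not a package API)
fit_baseline <- function(x, y, span = 0.3, iterations = 50) {
  y_work <- y
  for (i in seq_len(iterations)) {
    fit <- loess(y_work ~ x, span = span)
    base <- predict(fit)            # fitted values at the original x
    y_work <- pmin(y_work, base)    # pull points above the fit down to it
  }
  base
}

# Synthetic spectrum (made up): linear background + two peaks + small noise
set.seed(2)
x <- seq(0, 10, length.out = 500)
true_base <- 0.5 + 0.1 * x
y <- true_base + 5 * exp(-(x - 3)^2 / (2 * 0.2^2)) +
  3 * exp(-(x - 7)^2 / (2 * 0.2^2)) + rnorm(500, sd = 0.02)

base <- fit_baseline(x, y)
```

The span has to be wide compared with the peak width; otherwise the local fits can follow the leftover peak points instead of clipping them.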

R smoothing functions

I have an R plot that looks like this:
The red line is the attempted smoothing with lines(smooth.spline(x, y, spar=0.000001)). Notice the insanely low spar value, which STILL fails to include the spike near the end of the graph. I suspect this is because of the number of points plotted: 107,350. The 20 points near the end are unable to sway the fit, although they are clearly different from the rest.
What kind of R smoothing function could I use that encompasses these points?
Or if a smoother won't do it, how would I be able to "statistically" distinguish these points?
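On the first question: for larger data sets, smooth.spline() by default uses only a subset of the x values as knots (see its all.knots and nknots arguments). That may be why even a tiny spar cannot follow a short spike among 107,350 points; increasing nknots or setting all.knots = TRUE is worth trying, at a higher computational cost. For the second question, a swapped-in technique (a Hampel-style filter, not something from the question) is to compare each point with a running median. Points whose residual is many robust standard deviations (MAD) away are then flagged. A running median with a window much longer than the 20-point burst is barely moved by the spike, so the spike stands out in the residuals. This is a minimal sketch on synthetic data, made up for illustration; the window k = 201 and the threshold of 6 are arbitrary choices.

```r
# Synthetic data (made up): slow trend + noise + a 20-point spike near the end
set.seed(1)
n <- 10000
x <- seq_len(n)
y <- sin(x / 1000) + rnorm(n, sd = 0.1)
spike <- (n - 200):(n - 181)          # 20 consecutive points
y[spike] <- y[spike] + 3

# Running median: robust to bursts much shorter than half the window
trend <- runmed(y, k = 201)
res <- y - trend

# Robust z-scores of the residuals (mad() is scaled to match the SD for normal data)
z <- (res - median(res)) / mad(res)
flagged <- which(abs(z) > 6)
```

Plotting z against x is also a quick visual check of whether a threshold like 6 cleanly separates the burst from ordinary noise in real data.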

Make density cloud from point cloud

My question consists of two sub questions.
I have a graphical illustration presenting (some virtual) worst-case scenarios sampled from history, organized by two parameters.
Image:
At the moment I have a point cloud. I would like to create a nicely splined density cloud of my results. I would like the 3D spline to take the density of points into account when approximating, i.e. to smooth more broadly where fewer samples are available and more exactly in denser regions of the space.
Then, having that density cloud, I would be able to scale the density along each vertical line specified by the two input parameters, which would make it a likelihood function of each outcome (the worst-case scenario).
The second part is that I would like to plot it, ideally as semi-transparent 3D regions forming something like a fog around the densest region.
Uh, wow... that wasn't easy to explain. Sigh. :)
Thanks for reading that far.
So here is a way to generate 3D density plots using the ks package. Since you provided no data, this example is taken directly from the documentation for plot(...) in the ks package:
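library(MASS)
library(ks)
x <- iris[, 1:3]                               # three continuous variables as a 3D point cloud
H.pi <- Hpi(x, pilot = "samse")                # plug-in bandwidth matrix
fhat <- kde(x, H = H.pi, compute.cont = TRUE)  # 3D kernel density estimate, with contour levels
plot(fhat, drawpoints = TRUE)                  # 3D density contours plus the original points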
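For the question's second step, scaling the density along each vertical line to get a likelihood of the outcome, one option is to turn the joint density f(x, y, z) into conditional densities f(z | x, y). Divide every vertical line of the grid by its integral over z, so that each line integrates to 1. This is a minimal sketch in base R. conditional_density() is a hypothetical helper that assumes an equally spaced z grid and a 3-d array indexed [x, y, z]. Cells whose total density is zero would give NaN. The toy array stands in for a real density estimate. With ks, I believe a gridded 3D estimate is stored as a 3-d array in fhat$estimate, with the grid for the third variable in fhat$eval.points[[3]]; check str(fhat) before relying on that.

```r
# Hypothetical helper: normalise each (x, y) line of a gridded 3-d density
# so that it integrates to 1 over z, giving f(z | x, y).
conditional_density <- function(estimate, z_grid) {
  dz <- z_grid[2] - z_grid[1]                      # assumes equal spacing
  marginal <- apply(estimate, c(1, 2), sum) * dz   # Riemann sum over z per (x, y)
  sweep(estimate, c(1, 2), marginal, "/")
}

# Toy 3-d array (made up) standing in for a gridded density estimate
x_grid <- 1:5
y_grid <- 1:4
z_grid <- seq(-4, 4, length.out = 81)
est <- array(0, dim = c(5, 4, 81))
for (i in seq_along(x_grid)) {
  for (j in seq_along(y_grid)) {
    est[i, j, ] <- (i + j) * dnorm(z_grid, mean = 0.2 * i - 0.1 * j)
  }
}
cond <- conditional_density(est, z_grid)

# With ks (assumed layout, see above):
# cond <- conditional_density(fhat$estimate, fhat$eval.points[[3]])
```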
