How to smooth a curve in R? - r

location diffrence<-c(0,0.5,1,1.5,2)
Power<-c(0,0.2,0.4,0.6,0.8,1)
plot(location diffrence,Power)
The guy which has written the paper said he has smoothed the curve using a weighted moving average with weights vector w = (0.25,0.5,0.25) but he did not explained how he did this and with which function he achieved that.i am really confused

Up front, as #MartinWettstein cautions, be careful in when you smooth data and what you do with it (infer from it). Having said that, a simple exponential moving average might look like this.
# replacement data
x <- seq(0, 2, len=5)
y <- c(0, 0.02, 0.65, 1, 1)
# smoothed
ysm <-
zoo::rollapply(c(NA, y, NA), 3,
function(a) Hmisc::wtd.mean(a, c(0.25, 0.5, 0.25), na.rm = TRUE),
partial = FALSE)
# plot
plot(x, y, type = "b", pch = 16)
lines(x, ysm, col = "red")
Notes:
the zoo:: package provides a rolling window (3-wide here), calling the function once for indices 1-3, then again for indices 2-4, then 3-5, 4-6, etc.
with rolling-window operations, realize that they can be center-aligned (default of zoo::rollapply) or left/right aligned. There are some good explanations here: How to calculate 7-day moving average in R?)
I surround the y data with NAs so that I can mimic a partial window. Normally with rolling-window ops, if k=3, then the resulting vector is length(y) - (k-1) long. I'm inferring that you want to include data on the ends, so the first smoothed data point would be effectively (0.5*0 + 0.25*0.02)/0.75, the second smoothed data point (0.25*0 + 0.5*0.02 + 0.25*0.65)/1, and the last smoothed data point (0.25*1 + 0.5*1)/0.75. That is, omitting the 0.25 times a missing data point. That's a guess and can easily be adjusted based on your real needs.
I'm using Hmisc::wtd.mean, though it is trivial to write this weighted-mean function yourself.
This is suggestive only, and not meant to be authoritative. Just to help you begin exploring your smoothing processes.

Related

Creating a beta distribution Q-Q plot

My task is to create 100 random generated numbers from beta distribution and compare that random variable with beta distribution using quantile plot.
This is my attempt:
library(MASS)
library(qualityTools)
Random_Numbers_Beta <- rbeta(100, 1, 1)
qqPlot(Random_Numbers_Beta, "beta", list(shape = 1, rate = 1))
Unfortunately something is wrong. This is an error which occurs:
Error in (function (x, densfun, start, ...) :
'start' must be a named list
Can something be done with that issue?
First, you had to specify that list(shape = 1, rate = 1) is the start parameter; right now this list is being treated as a value for the confbounds parameter. Second, it's actually not shape and rate, but shape1 and shape2, as in, e.g., ?dbeta.
qqPlot(Random_Numbers_Beta, "beta", start = list(shape1 = 1, shape2 = 1))
Again inspecting ?qqPlot you may see that ... is for "further graphical parameters: (see par)." Hence, you may modify the plot the way you like; e.g., adding col = 'red'.
Also notice that Beta(1,1) is simply the uniform distribution on [0,1] and, hence, its quantile function is the identity function. That is, qbeta(x, 1, 1) == x for any x in [0,1]. So, you may also simply work directly with
x <- seq(0, 1, length = 500)
plot(quantile(Random_Numbers_Beta, x), x)
abline(a = 0, b = 1, col = 'red')
if you don't need the confidence bounds.
One can notice, however, that the two plots are a little different. Given your task, it would seem that you need the second one.
In the first one, it looks like qqPlot fits a beta distribution for your data and uses its quantiles, which apparently isn't exactly the identity function. That is, it doesn't use the exact knowledge about the parameters. The second plot uses this knowledge.

How can I fit a smooth hysteresis in R?

I have a measurment of which should fit an hysteresis. For visualisation purpose I would like to plot a line approximating the hysteresis to help explain this pattern.
I created an example in the following image using the code below.
I would like to have an output similar to the green curve - however I don't have this data directly available, and I don't care whether it is pointy.
However most smoothing functions such as smooth.spline which I plotted in blue - allow no loops. The closest I can find is from the bezier library - plotted in red. Not nicely visible here but it produces a loop, however it fits poorly (and gives some warnings and takes quite some time).
Can you suggest a method?
set.seed(12345)
up <- seq(0,1,length.out=100)^3
down <- sqrt(seq(1,0,length.out=100))
x <- c(seq(0,1,length.out=length(up)),
seq(1,0, length.out=length(down)))
data <- data.frame(x=x, y=c(up,down),
measuredx=x + rnorm(length(x))*0.01,
measuredy=c(up,down) + rnorm(length(up)+length(down))*0.03)
with(data,plot(measuredx,measuredy, type = "p"))
with(data,lines(x,y, col='green'))
sp <- with(data,smooth.spline(measuredx, measuredy))
with(sp, lines(x,y, col="blue"))
library(bezier)
bf <- bezierCurveFit(as.matrix(data[,c(1,3)]))
lines(bezier(t=seq(0, 1, length=500), p=bf$p), col="red", cex=0.25)
UPDATE
As it turns out my actual problem is slightly different I ask another question to reflect my actual issue in the question: How to fit a smooth hysteresis in a poorly distributed data set?
set.seed(12345)
up <- seq(0,1,length.out=100)^3
down <- sqrt(seq(1,0,length.out=100))
x <- c(seq(0,1,length.out=length(up)),
seq(1,0, length.out=length(down)))
data <- data.frame(x=x, y=c(up,down),
measuredx=x + rnorm(length(x))*0.01,
measuredy=c(up,down) + rnorm(length(up)+length(down))*0.03)
Instead of smoothing data$measuredy directly over data$measuredx, do two separate smoothing, by smoothing each against a time stamp variable. Then combine the fitted values from two smoothing. This is a general way for smoothing a closed curve or a loop. (See also Q & A: Smoothing Continuous 2D Points)
t <- seq_len(nrow(data) + 1)
xs <- smooth.spline(t, c(data$measuredx, data$measuredx[1]))$y
ys <- smooth.spline(t, c(data$measuredy, data$measuredy[1]))$y
with(data, plot(measuredx, measuredy))
lines(xs, ys)
c(data$measuredx, data$measuredx[1]) for example is just to ensure that the last value in the vector agrees with the first, so that it completes a cycle.
The curve is not really closed at the bottom left corner, because smooth.spline is doing smoothing not interpolation, so even if we have ensure that data vector completes a cycle, the fitted one may not be a closed one. A practical workaround is to use weighted regression, imposing heavy weight on this spot to make it closed.
t <- seq_len(nrow(data) + 1)
w <- rep(1, length(t)) ## initially identical weight everywhere
w[c(1, length(w))] <- 100000 ## give heavy weight
xs <- smooth.spline(t, c(data$measuredx, data$measuredx[1]), w)$y
ys <- smooth.spline(t, c(data$measuredy, data$measuredy[1]), w)$y
with(data, plot(measuredx, measuredy), col = 8)
lines(xs, ys, lwd = 2)

R - Histogram Doesn't show density due to magnitude of the Data

I have a vector called data with length 444000 approximately, and most of the numeric values are between 1 and 100 (almost all of them). I want to draw the histogram and draw the the appropriate density on it. However, when I draw the histogram I get this:
hist(data,freq=FALSE)
What can I do to actually see a more detailed histogram? I tried to use the breaks code, it helped, but it's really hard do see the histogram, because it's so small. For example I used breaks = 2000 and got this:
Is there something that I can do? Thanks!
Since you don't show data, I'll generate some random data:
d <- c(rexp(1e4, 100), runif(100, max=5e4))
hist(d)
Dealing with outliers like this, you can display the histogram of the logs, but that may difficult to interpret:
If you are okay with showing a subset of the data, then you can filter the outliers out either dynamically (perhaps using quantile) or manually. The important thing when showing this visualization in your analysis is that if you must remove data for the plot, then be up-front when the removal. (This is terse ... it would also be informative to include the range and/or other properties of the omitted data, but that's subjective and will differ based on the actual data.)
quantile(d, seq(0, 1, len=11))
d2 <- d[ d < quantile(d, 0.90) ]
hist(d2)
txt <- sprintf("(%d points shown, %d excluded)", length(d2), length(d) - length(d2))
mtext(txt, side = 1, line = 3, adj = 1)
d3 <- d[ d < 10 ]
hist(d3)
txt <- sprintf("(%d points shown, %d excluded)", length(d3), length(d) - length(d3))
mtext(txt, side = 1, line = 3, adj = 1)

Graphing a polynomial output of calc.poly

I apologize first for bringing what I imagine to be a ridiculously simple problem here, but I have been unable to glean from the help file for package 'polynom' how to solve this problem. For one out of several years, I have two vectors of x (d for day of year) and y (e for an index of egg production) data:
d=c(169,176,183,190,197,204,211,218,225,232,239,246)
e=c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,0.016599262,0.002810977,0.00560387 8,0,0.002810977,0.002810977)
I want to, for each year, use the poly.calc function to create a polynomial function that I can use to interpolate the timing of maximum egg production. I want then to superimpose the function on a plot of the data. To begin, I have no problem with the poly.calc function:
egg1996<-poly.calc(d,e)
egg1996
3216904000 - 173356400*x + 4239900*x^2 - 62124.17*x^3 + 605.9178*x^4 - 4.13053*x^5 +
0.02008226*x^6 - 6.963636e-05*x^7 + 1.687736e-07*x^8
I can then simply
plot(d,e)
But when I try to use the lines function to superimpose the function on the plot, I get confused. The help file states that the output of poly.calc is an object of class polynomial, and so I assume that "egg1996" will be the "x" in:
lines(x, len = 100, xlim = NULL, ylim = NULL, ...)
But I cannot seem to, based on the example listed:
lines (poly.calc( 2:4), lty = 2)
Or based on the arguments:
x an object of class "polynomial".
len size of vector at which evaluations are to be made.
xlim, ylim the range of x and y values with sensible defaults
Come up with a command that successfully graphs the polynomial "egg1996" onto the raw data.
I understand that this question is beneath you folks, but I would be very grateful for a little help. Many thanks.
I don't work with the polynom package, but the resultant data set is on a completely different scale (both X & Y axes) than the first plot() call. If you don't mind having it in two separate panels, this provides both plots for comparison:
library(polynom)
d <- c(169,176,183,190,197,204,211,218,225,232,239,246)
e <- c(0,0,0.006839425,0.027323127,0.024666883,0.005603878,
0.016599262,0.002810977,0.005603878,0,0.002810977,0.002810977)
egg1996 <- poly.calc(d,e)
par(mfrow=c(1,2))
plot(d, e)
plot(egg1996)

Tufte tables: convert quartile plots into standard error plots hacking qTable function from NMOF package

If you remember there is a nice version of table conceived by Tufte that include small quartile plots running next to the corresponding data rows:
There is an implementation of such solution in R using package NMOF and function qTable, which basically creates the table shown above and outputs it as a LaTeX code:
require(NMOF)
x <- rnorm(100, mean = 0, sd = 2)
y <- rnorm(100, mean = 1, sd = 2)
z <- rnorm(100, mean = 1, sd = 0.5)
X <- cbind(x, y, z)
qTable(X,filename="res.tex")#this will save qTable in tex format in current dir
This method of visualization seem to be especially useful if you have a small amount of information to present, and you don't want to waste a space for the full graph. But I would like to hack qTable a bit. Instead of displaying quantile plots, I would prefer to display standard errors of the mean. I am not great in hacking such functions, and I used brute force to do it. I replaced the line from the qTable function which computes quantiles:
A <- apply(X, 2L, quantile, c(0.25, 0.5, 0.75))
with something very brutal, that computes standard errors:
require(psych)#got 'SE' function to compute columns standard deviation
M = colMeans(X)#column means
SE = SD(X)/sqrt(nrow(X))#standard error
SELo = M-SE#lower bound
SEHi = M+SE#upper bound
A = t(data.frame(SELo,M,SEHi))#combines it together
I know, I know it's possibly unsustainable approach, but it actually works to some extend - it plots standard errors but keeps this gap in the plot:
and I would like this gap to disappear.
Here is a qTable function with modification discussed above.
To remove the gaps, you can insert these two lines:
B[2, ] <- B[3, ]
B[4, ] <- B[3, ]
right before the for loop that starts with
for (cc in 1L:dim(X)[2L]) {...
Why does it work? Reading the graph from left to right, the five rows of B correspond to where
1) the left segments start
2) the left segments ends
3) the dots are
4) the right segments start
5) the right segments end
so by forcing B[2, ] and B[4, ] to B[3, ] you are effectively getting rid of the gaps.

Resources