For a series
X=(x_1,x_2,...x_t-2, x_t-1, x_t)
I would like to compute a moving average for each point with respect to the previous k time steps. For example, if k = 2, I want to return:
X =(NA, (x_1+x_2)/2 ... (x_t-2 + x_t-3)/2, (x_t-2 + x_t-1)/2, (x_t + x_t-1)/2)
If I use the moving average function ma, e.g.
ma(X, order = 2, centre = TRUE)
I get the average of each point and its neighbor in the positive and negative direction, while setting centre=FALSE calculates the moving average with respect to the positive direction. Is there a simple way to have point t as the running average of (t-k+1...t)?
Assuming the test input X shown below, this takes the mean of the current and prior value. Note the r on the end of rollmeanr, which tells it to use the right-aligned version rather than the center-aligned one.
library(zoo)
X <- 1:10 # test input
rollmeanr(X, 2, fill = NA)
## [1] NA 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
So does this (no packages):
n <- length(X)
c(NA, (X[-1] + X[-n])/2)
## [1] NA 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
and this somewhat more general base R approach (also no packages):
k <- 2
c(rep(NA, k-1), rowMeans(embed(X, k)))
## [1] NA 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
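Another package-free option, sketched here on the assumption that X is a plain numeric vector, is stats::filter with sides = 1, which averages the current value and the k - 1 values before it. The helper name trailing_mean is made up for this example.

```r
# Trailing (right-aligned) moving average using only base R.
# trailing_mean is a hypothetical helper name, not part of any package.
trailing_mean <- function(x, k) {
  # sides = 1 applies the k equal weights to the current and previous values only
  as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
}

X <- 1:10                    # same test input as above
res2 <- trailing_mean(X, 2)  # first value is NA
res3 <- trailing_mean(X, 3)  # first two values are NA
```

The first k - 1 entries come back as NA because there are not enough earlier values to fill the window.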
I have a data set that needs to be cleaned from mistakes. For that, I have a sub-data set that contains only observations that I know are correct ("Match"). I would like to draw a 95% confidence ellipse around those correct observations on a plot and exclude all observations out of the ellipse from my main data set.
I figured out how to draw it but now I would like to be able to take out data based on that.
I'm a beginner with R so all of that is pretty new to me so I might not understand complicated coding. :)
Thanks !
To add more details, my data are measurements of collembolas (a type of insect). It has this basic structure:
  replicate node day MajorAxisLength MinorAxisLength Data.type
1         1    1  50             2.1             0.4     Match
2         2    1  50             2.3             0.2   Unknown
Therefore, I want to validate measurements by excluding unrealistic aspect ratios (length/width). Using the subset that I know is correct (match observations), I want to determine a reasonable range of aspect ratios for collembola, and use it to remove any unrealistic observation. I was advised to use a 95% confidence ellipse for good observations and take out observations that don't fit in the ellipse.
The SIBER package has some functions to help you here.
library(SIBER)
Let's use the iris dataset, plotting sepal width vs length.
dat <- iris[,1:2]
plot(dat)
mu <- colMeans(dat)
Sigma <- cov(dat)
addEllipse(mu, Sigma, p.interval = 0.95, col = "blue", lty = 3)
Z <- pointsToEllipsoid(dat, Sigma, mu) # converts the data to ellipsoid coordinates
out <- !ellipseInOut(Z, p = 0.95) # logical vector
(outliers <- dat[out,]) # finds the points outside the ellipse
# Sepal.Length Sepal.Width
#16 5.7 4.4
#34 5.5 4.2
#42 4.5 2.3
#61 5.0 2.0
#118 7.7 3.8
#132 7.9 3.8
points(outliers, col="red", pch=19)
You can then use the out vector to remove unwanted rows.
dat.in <- dat[!out,]
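If installing SIBER is not an option, the same kind of test can be sketched in base R. This assumes the 95% ellipse is the bivariate-normal region defined by mu and Sigma, in which case a point lies outside it when its squared Mahalanobis distance exceeds the 0.95 chi-squared quantile with 2 degrees of freedom. It should flag the same kind of points as above, but treat it as a sketch rather than a drop-in replacement.

```r
# Base-R sketch: flag points outside a 95% ellipse via Mahalanobis distance.
# Assumes the ellipse is the bivariate-normal region defined by mu and Sigma.
dat <- iris[, 1:2]
mu <- colMeans(dat)   # could instead be computed from the trusted rows only
Sigma <- cov(dat)

d2 <- mahalanobis(as.matrix(dat), center = mu, cov = Sigma)  # squared distances
out <- d2 > qchisq(0.95, df = ncol(dat))                     # TRUE = outside the ellipse

dat.in <- dat[!out, ]   # rows kept
dat.out <- dat[out, ]   # rows dropped
```

For the situation in the question, mu and Sigma would be computed from the "Match" rows only, and the distances then evaluated for every row of the main data set.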
I have a data frame that contains grid data, with columns corresponding to XY coordinates and a factor "value".
I would like to detect areas with the same 'value' and plot their edges.
An example of my data:
library(reshape) # assumed source of melt(); it names the index columns X1 and X2
dat = melt(volcano[26:40, 26:40])
dat$value=factor(round(dat$value/10))
dat[dat$X1==12 & dat$X2==6,"value"]=NA
dat[dat$X1==13 & dat$X2==6,"value"]=NA
dat=dat[7:nrow(dat),]
head(dat)
My plot:
library(ggplot2)
p=ggplot(dat) +
geom_tile(aes(x=X1, y=X2, fill=value))+geom_text(aes(x=X1, y=X2, label=value))
p
My attempt would be to use p+geom_polygon(data = polys, aes(x = x , y = y , group = id),size=1, color = "black"), but I'm struggling with the step of building 'polys', the coordinates of the edge points. This data frame should look like this:
id x y
1 0.5 1.5
1 1.5 1.5
1 2.5 1.5
1 2.5 2.5
1 1.5 2.5
1 0.5 2.5
1 0.5 1.5
2 2.5 1.5
2 3.5 1.5
2 4.5 1.5
2 5.5 1.5
...
where x and y are the coordinates of the corners of the tiles I'm looking for, and 'id' is the grouping factor for the different polygons. For example, id=1 corresponds to the little purple rectangle in the bottom-left corner.
Any idea how to automatically detect these edge points for each area, based on the "value" column?
Thanks
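One possible starting point, which is not the polygon data frame described above but a simpler substitute: work on the grid as a matrix and collect a line segment wherever two neighbouring tiles have different values. The function name boundary_segments is made up for this sketch, tiles are assumed to be centred on integer coordinates (as X1 and X2 are here), and missing tiles are treated as NA. The outer border of the grid is not included.

```r
# Sketch: boundary segments between neighbouring tiles with different values.
# boundary_segments is a hypothetical helper; rows of m are x, columns are y.
boundary_segments <- function(m) {
  nx <- nrow(m)
  ny <- ncol(m)
  # TRUE where the two cells differ; an NA next to a value counts as different
  differs <- function(a, b) {
    is.na(a) != is.na(b) | (!is.na(a) & !is.na(b) & a != b)
  }
  # v: cell (i, j) differs from (i + 1, j), giving a vertical edge at x = i + 0.5
  v <- which(differs(m[-nx, , drop = FALSE], m[-1, , drop = FALSE]), arr.ind = TRUE)
  # h: cell (i, j) differs from (i, j + 1), giving a horizontal edge at y = j + 0.5
  h <- which(differs(m[, -ny, drop = FALSE], m[, -1, drop = FALSE]), arr.ind = TRUE)
  rbind(
    data.frame(x = v[, 1] + 0.5, xend = v[, 1] + 0.5,
               y = v[, 2] - 0.5, yend = v[, 2] + 0.5, row.names = NULL),
    data.frame(x = h[, 1] - 0.5, xend = h[, 1] + 0.5,
               y = h[, 2] + 0.5, yend = h[, 2] + 0.5, row.names = NULL)
  )
}

m <- round(volcano[26:40, 26:40] / 10)  # same grid as the example data
m[12, 6] <- NA
m[13, 6] <- NA
segs <- boundary_segments(m)
```

The result could then be drawn on top of the tiles with p + geom_segment(data = segs, aes(x = x, y = y, xend = xend, yend = yend)). Turning the segments into closed polygons per area would need an extra step that this sketch does not attempt.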
Problem: Use the following data to find the velocity and acceleration
at t = 10 seconds:
Time, t, s 0 2 4 6 8 10 12 14 16
Position, x, m 0 0.7 1.8 3.4 5.1 6.3 7.3 8.0 8.4
I have worked out the centered finite-difference formulas.
How can I apply them in Scilab?
This calculates velocity from the given arrays x and t, using central differences:
v = (x(3:$) - x(1:$-2)) ./ (t(3:$) - t(1:$-2))
To see what this does, focus on the first index in each range:
(x(3) - x(1)) ./ (t(3) - t(1))
Clearly, this is the velocity at the 2nd moment of time. The formula performs this calculation for every time at which it is possible; the centered difference formula does not apply at the first and last moments. One may want to introduce a truncated time range to reflect this:
tr = t(2:$-1)
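Similarly for acceleration. The second-difference formula divides by the square of the time step, which is half of t(i+1) - t(i-1) when the times are evenly spaced:
a = (x(3:$) - 2*x(2:$-1) + x(1:$-2)) ./ ((t(3:$) - t(1:$-2))/2).^2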
This can now be plotted with plot(tr,v) or plot(tr,a). And to look up their values when time is 10, use
v(tr==10)
and
a(tr==10)
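As a cross-check of the arithmetic, the same central differences can be evaluated on the table from the problem. This sketch is in R rather than Scilab, the names tm, tr and h are chosen for the example, and it assumes the evenly spaced times given in the problem.

```r
# Central differences on the data from the problem statement.
tm <- seq(0, 16, by = 2)                               # time, s
x  <- c(0, 0.7, 1.8, 3.4, 5.1, 6.3, 7.3, 8.0, 8.4)     # position, m

i  <- 2:(length(x) - 1)     # interior points only
h  <- tm[i + 1] - tm[i]     # time step (assumed uniform)
tr <- tm[i]                 # truncated time range

v <- (x[i + 1] - x[i - 1]) / (tm[i + 1] - tm[i - 1])   # first derivative
a <- (x[i + 1] - 2 * x[i] + x[i - 1]) / h^2            # second derivative

v10 <- v[tr == 10]   # (7.3 - 5.1) / 4, about 0.55 m/s
a10 <- a[tr == 10]   # (7.3 - 12.6 + 5.1) / 4, about -0.05 m/s^2
```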
How do I get the cumulative frequency for each unique X value plotted on a normal probability (non-linear) y axis? I have used probplot(x), but it doesn't accumulate identical values. My data is a data frame with blood results. probplot plots each individual blood result rather than a cumulative frequency for each unique result.
Small data example:
V1
7.1
7.2
7.2
7.6
6.8
6.9
6.8
7.4
7.0
I can calculate the cumulative frequency but not plot it with the correct normal probability axis:
tabvals <- table(data$V1)
tabvals <- cbind.data.frame(tabvals)
tabvals$frequency <- tabvals$Freq/sum(tabvals$Freq)
# running total of the relative frequencies (same result as filling the column in a loop)
tabvals$kummulated <- cumsum(tabvals$frequency)
plot(tabvals$Var1, tabvals$kummulated , type="l")
The only way to get the right Y axis is this:
library(e1071)
probplot(data$V1)
But this plots 7.2 and 7.2 as two different points rather than accumulate them.
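One way this could be approached in base R, offered as a sketch rather than a replacement for probplot: compute the cumulative frequency per unique value, transform it with qnorm, and label the y axis with the untransformed probabilities. Dividing by n + 1 instead of n is an assumption made here so that the last point does not end up at probability 1, where qnorm is infinite; other plotting positions are possible.

```r
# Cumulative frequency per unique value on a normal probability y axis.
V1 <- c(7.1, 7.2, 7.2, 7.6, 6.8, 6.9, 6.8, 7.4, 7.0)   # sample data from above

tab   <- table(V1)                       # counts per unique value, sorted
vals  <- as.numeric(names(tab))          # the unique values
n     <- length(V1)
cum_p <- cumsum(as.vector(tab)) / (n + 1)  # plotting position (assumption, see text)
z     <- qnorm(cum_p)                    # position on the normal-probability scale

probs <- c(0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.99)  # tick labels
plot(vals, z, type = "b", yaxt = "n", ylim = qnorm(range(probs)),
     xlab = "V1", ylab = "Cumulative probability")
axis(2, at = qnorm(probs), labels = probs, las = 1)
```

With this, the two 7.2 results contribute to a single point whose height reflects both of them.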
I have the following data frame in R:
> AcceptData
  Mean.Rank Sentence.Type
1       2.5       An+Sp+a
2       2.6      An+Nsp+a
3       2.1       An+Sp-a
4       3.1      An+Nsp-a
5       2.4       In+Sp+a
6       1.7      In+Nsp+a
7       3.1       In+Sp-a
8       3.0      In+Nsp-a
I want to plot this with the Sentence.Type column on the x axis, using the actual name in each cell as a point on the x axis. I want the y axis to go from 1 to 4 in steps of 0.5.
So far I haven't been able to plot this, neither with plot() nor with hist(). I keep getting different types of errors, mainly because of the character column in the data.frame.
I know this should be easy for most, but I'm still sort of a noob with R and after hours I can't get the plot right. Any help is much appreciated.
Edit:
Some of the errors I've gotten:
> hist(AcceptData$Sentence.Type,AcceptData$Mean.Rank)
Error in hist.default(AcceptData$Sentence.Type, AcceptData$Mean.Rank) :
'x' must be numeric
Or: (this doesn't give an error, but definitely not the graph I want. It has all the x values cramped to the left of the x axis)
plot(AcceptData$Sentence.Type,AcceptData$Mean.Rank,lty=5,lwd=2,xlim=c(1,16),ylim=c(1,4),xlab="Sentence Type",ylab="Mean Ranking",main="Mean Acceptability Ranking per Sentence")
The default plot function has a method that lets you plot factors on the x-axis, but to use it you have to convert your text data to a factor.
Here is an example:
x <- letters[1:5]
y <- runif(5, 0, 5)
plot(factor(x), y)
And with your sample data:
AcceptData <- read.table(text="
Mean.Rank Sentence.Type
1 2.5 An+Sp+a
2 2.6 An+Nsp+a
3 2.1 An+Sp-a
4 3.1 An+Nsp-a
5 2.4 In+Sp+a
6 1.7 In+Nsp+a
7 3.1 In+Sp-a
8 3.0 In+Nsp-a", stringsAsFactors=FALSE)
plot(Mean.Rank~factor(Sentence.Type), AcceptData, las=2,
xlab="", main="Mean Acceptability Ranking per Sentence")
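The y-axis request from the question (1 to 4 in steps of 0.5) is not covered by the plot above. One way it might be handled, as a sketch that plots the points at numeric positions and draws both axes by hand instead of using the factor method; the names x_pos and y_ticks are chosen for this example:

```r
# Points at positions 1..8, with sentence types as x labels and custom y ticks.
AcceptData <- data.frame(
  Mean.Rank = c(2.5, 2.6, 2.1, 3.1, 2.4, 1.7, 3.1, 3.0),
  Sentence.Type = c("An+Sp+a", "An+Nsp+a", "An+Sp-a", "An+Nsp-a",
                    "In+Sp+a", "In+Nsp+a", "In+Sp-a", "In+Nsp-a"),
  stringsAsFactors = FALSE
)

x_pos   <- seq_len(nrow(AcceptData))   # one x position per row, in data order
y_ticks <- seq(1, 4, by = 0.5)         # 1, 1.5, ..., 4

plot(x_pos, AcceptData$Mean.Rank, pch = 19,
     xaxt = "n", yaxt = "n", ylim = c(1, 4),
     xlab = "", ylab = "Mean Ranking",
     main = "Mean Acceptability Ranking per Sentence")
axis(1, at = x_pos, labels = AcceptData$Sentence.Type, las = 2)  # names on x axis
axis(2, at = y_ticks)                                            # steps of 0.5
```

Unlike the factor version, this keeps the sentence types in the order they appear in the data frame rather than sorting them alphabetically. If the rotated labels are cut off, a larger bottom margin via par(mar = ...) before plotting may help.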