Is there any subtle difference between 'mean' and 'average' in R?

Is there any difference between what these two lines of code do:
mv_avg[i-2] <- (sum(file1$rtn[i-2]:file1$rtn[i+2])/5)
and
mv_avg[i-2] <- mean(file1$rtn[i-2]:file1$rtn[i+2])
I'm trying to calculate the moving average of the first 5 elements in my dataset. I was running a for loop, and the two lines give different outputs. Sorry for not providing the data and the rest of the code for you to run (I can't share them).
I just want to know if they both do the same thing or if there's a subtle difference between them both.

It's not an issue with mean or sum. The example below illustrates what's happening with your code:
x = seq(0.5,5,0.5)
i = 8
# Your code
x[i-2]:x[i+2]
[1] 3 4 5
# Index this way to get the five values for the moving average
x[(i-2):(i+2)]
[1] 3.0 3.5 4.0 4.5 5.0
Here x[i-2] is 3 and x[i+2] is 5, so x[i-2]:x[i+2] is equivalent to 3:5: a sequence built from those two values, not a slice of x. You're seeing different results with mean and sum because your code is not returning 5 values, so dividing the sum by 5 does not give you the average. In my example, sum(c(3,4,5))/5 is 2.4 while mean(c(3,4,5)) is 4.
# G.Grothendieck mentioned rollmean. Here's an example:
library(zoo)
rollmean(x, k=5, align="center")
[1] 1.5 2.0 2.5 3.0 3.5 4.0
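For reference, the loop from the question can be written with the corrected indexing. This is only a minimal sketch in base R: `rtn` is a hypothetical stand-in for file1$rtn (the real data was not shared) and reuses the values of x from the example above. Once the window really holds five values, sum(...)/5 and mean(...) agree.

```r
# Hypothetical stand-in for file1$rtn (the real data was not provided)
rtn <- seq(0.5, 5, 0.5)
n <- length(rtn)

mv_avg  <- numeric(n - 4)   # one result per complete 5-value window
mv_avg2 <- numeric(n - 4)
for (i in 3:(n - 2)) {
  window <- rtn[(i - 2):(i + 2)]   # build the index range first, then subset
  mv_avg[i - 2]  <- sum(window) / 5
  mv_avg2[i - 2] <- mean(window)
}

all.equal(mv_avg, mv_avg2)  # TRUE: both forms give the same moving average
mv_avg
# [1] 1.5 2.0 2.5 3.0 3.5 4.0
```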

Related

Using rolling function for DataFrame in R [duplicate]

In the zoo package there is a function called rollmean, which enables you to make moving averages. The rollmean(x,3) will take the previous, current and next value (ie 4, 6 and 2) in the table below. This is shown in the second column.
x    rollmean  ma3
4    NA        NA
6    4.0       NA
2    4.3       NA
5    3.0       4.0
2    6.3       4.3
12   6.0       3.0
4    6.0       6.3
2    NA        6.0
I would like to get the same job done, but by averaging the previous 3 values for each row, so that the first result appears in the fourth row. This is displayed in the third column (ma3). Can anybody tell me the name of the function that will help to accomplish this?
You can use rollmean, but set align='right'. Or you could use rollmeanr, which has align='right' as the default.
ma3 <- rollmeanr(x[,1],3,fill=NA)
...but you would still need to lag the result. Another solution is to use rollapply with a list for the width argument:
ma3 <- rollapplyr(x[,1],list(-(3:1)),mean,fill=NA)
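As a cross-check that does not depend on zoo, the ma3 column from the question can be reproduced in base R. This is a rough sketch rather than the zoo implementation; `prev3` is a hypothetical name, and x holds the values from the table in the question.

```r
x <- c(4, 6, 2, 5, 2, 12, 4, 2)

# For each position i, average the three values before it (i-3, i-2, i-1);
# positions without three predecessors get NA.
prev3 <- sapply(seq_along(x), function(i) {
  if (i > 3) mean(x[(i - 3):(i - 1)]) else NA_real_
})

round(prev3, 1)
# [1]  NA  NA  NA 4.0 4.3 3.0 6.3 6.0
```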
I struggled searching for a simple function for moving averages that had some flexibility to do what I needed. I finally wrote a couple functions extending the one based on the filter function which rinni gives above in the comment (but which itself won't work because it will include the current observation in the 3 period average).
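Moving average function that includes the current observation:
# stats::filter is spelled out so that a loaded dplyr::filter cannot mask it
mav <- function(x,n){stats::filter(x,rep(1/n,n), sides=1)}
Moving average function that does not include the current observation:
mavback <- function(x,n){
  a<-mav(x,1)                # the current observation itself
  b<-mav(x,(n+1))            # mean of the current and the n previous observations
  c<-(1/n)*((n+1)*b - a)     # remove the current observation, leaving the mean of the n previous
  return(c)
}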
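Backward-looking moving average function, not including the current observation, based on h2 readings starting h1 periods back:
mavback1<-function(x,h1,h2){
  a<-mavback(x,h1)                # mean of the previous h1 observations
  b<-mavback(x,h1-h2)             # mean of the previous h1-h2 observations
  c<-(1/h2)*(h1*a -(h1-h2)*b)     # mean of the h2 observations in between
  return(c)
}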
A simpler implementation of w_i_l_l's mavback function, based on his mav function:
mavback <- function(x,n){ stats::filter(x, c(0, rep(1/n,n)), sides=1) }
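The two mavback variants above should agree, since giving the current observation a zero weight is the same as subtracting it out afterwards. The sketch below checks this on the data from the question; the names `mavback_algebra` and `mavback_zero` are hypothetical, introduced only to keep both versions side by side.

```r
mav <- function(x, n) stats::filter(x, rep(1 / n, n), sides = 1)

# Variant 1: average n+1 values, then subtract the current observation
mavback_algebra <- function(x, n) (1 / n) * ((n + 1) * mav(x, n + 1) - mav(x, 1))

# Variant 2: give the current observation a weight of zero
mavback_zero <- function(x, n) stats::filter(x, c(0, rep(1 / n, n)), sides = 1)

x <- c(4, 6, 2, 5, 2, 12, 4, 2)
a <- as.numeric(mavback_algebra(x, 3))
b <- as.numeric(mavback_zero(x, 3))

all.equal(a, b)  # TRUE
round(b, 1)
# [1]  NA  NA  NA 4.0 4.3 3.0 6.3 6.0
```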

Calculate a value in a column for each row

Here is the table that I am trying to manipulate:
colnames sampA sampB
#1 conA conB
#2 1.1 4.4
#3 2.2 5.5
#4 3.3 6.6
I want to calculate log2(x(1-x)) for each number in $sampB. Here is my code so far:
DF[-1,3] <- apply(DF[-1,]$sampB,1,function(x) log2(x(1-x)))
then I got the error message:
dim(X) must have a positive length
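You shouldn't need apply(), as log2() is vectorized. The error arises because apply() expects an object with dimensions (a matrix or data frame), while DF[-1,]$sampB is a plain vector. Note also that x(1-x) tries to call x as a function; the product has to be written x * (1 - x). Try this: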
x <- as.numeric(as.character(DF$sampB[-1]))
log2(x * (1 - x))
I took off the first element because I'm not really sure what that conB part is about (and now you have confirmed it in the comments). I also suspect that the column might be a factor (because of conB), so I wrapped the column in as.numeric(as.character(...)). That may not be necessary, but better safe than sorry.
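To see why the as.character() step matters, here is a small sketch with a hypothetical data frame laid out like the one in the question (a label in the first row, numbers below, and sampB stored as a factor). The numbers are made up and chosen between 0 and 1; with the values shown in the question (all greater than 1), x * (1 - x) is negative and log2() returns NaN with a warning.

```r
# Hypothetical data: the first row is a label, so the column is not numeric
DF <- data.frame(
  sampA = c("conA", "0.1", "0.2", "0.3"),
  sampB = factor(c("conB", "0.4", "0.5", "0.6"))
)

# Pitfall: as.numeric() on a factor returns the internal level codes
codes <- as.numeric(DF$sampB[-1])
codes
# [1] 1 2 3

# Going through as.character() recovers the actual numbers
x <- as.numeric(as.character(DF$sampB[-1]))
res <- log2(x * (1 - x))   # vectorized, no apply() needed
res[2]
# [1] -2
```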

Generate values between non-linear points

I am aiming at smoothing out a curve with set values. To do this, I currently generate a vector between points in my curve like so:
> y.values <- c(values[1], mean(values[1:2]), values[2], ...)
This is not the fastest approach to say the least (and this snippet is just between two of the numbers!). I need a better way to generate a vector with known non-linear values and insert a value between each one, like so:
> values
[1] 1 2 4 6 9
> y.values <- magic(values)
> y.values
[1] 1 1.5 2 3 4 5 6 7.5 9
This question feels basic but I researched it and cannot seem to find a proper method for my non-linear vector, and any help is appreciated. Thank you for reading.
Maybe not the most elegant way to do this but it works:
values <- c(1,2,4,6,9)
# lapply builds each (value, midpoint) pair, unlist merges the pairs,
# and the last value is appended at the end
a <- c(unlist(lapply(1:(length(values)-1), function(x) c(values[x], (values[x]+values[x+1])/2))),
       values[length(values)])
Output:
> a
[1] 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.5 9.0
Or as a function:
magic <- function(x) {
  c(unlist(lapply(1:(length(x)-1), function(z) c(x[z], (x[z]+x[z+1])/2))),
    x[length(x)])
}
> magic(values)
[1] 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.5 9.0
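The same result can also be obtained without lapply. The sketch below shows two alternatives in base R, both offered as possibilities rather than as the answer's method: `magic2` (a hypothetical name) interleaves the values with their midpoints using rbind, and approx() performs linear interpolation at half-index positions.

```r
values <- c(1, 2, 4, 6, 9)

# Interleave each value with the midpoint to its right neighbour.
# rbind() stacks the two rows; c() flattens the matrix column by column.
magic2 <- function(x) {
  left <- head(x, -1)
  mids <- (left + tail(x, -1)) / 2
  c(rbind(left, mids), tail(x, 1))
}
y1 <- magic2(values)

# Linear interpolation over the index, with 2n - 1 evenly spaced output points
y2 <- approx(seq_along(values), values, n = 2 * length(values) - 1)$y

y1
# [1] 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.5 9.0
all.equal(y1, y2)  # TRUE
```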

Double For Loop and calculate averages in R

I have a minor problem, and I'm unsure how to fix the error.
Basically, I have two columns and I want to use a double for loop to calculate the average of each number in one column with each number in the other, so that the result is a vector of averages. To clarify, apply and mean aren't the best fit here because I need only half of the total possible combinations. For example:
Col1<-c(1,2,3,4,5)
Col2<-c(1,2,3,4,5)
Q1<-data.frame(cbind(Col1, Col2))
Q1$mean<-0
for (i in 1:length(Q1$Col1)) {
  for (j in i+1:length(Q1$Col2)) {
    Q1$mean[i]<-(Q1$Col1[i]+Q1$Col2[j])/2
  }
}
Basically, I want to average each number in Q1$Col1 with each number in Q1$Col2. The reason I want to use a double for loop is to eliminate duplicates. Here is the full matrix of averages for visualization:
1.0 1.5 2.0 2.5 3.0
1.5 2.0 2.5 3.0 3.5
2.0 2.5 3.0 3.5 4.0
2.5 3.0 3.5 4.0 4.5
3.0 3.5 4.0 4.5 5.0
Here, each row represents a number from Q1$Col1 and each column represents a number from Q1$Col2. However, notice that there is redundancy on both sides of the matrix diagonal. So using the Double For Loop, I eliminate the redundancy to obtain the averages of the unique combination of cases. Using the matrix above, it should look like this:
1.0 1.5 2.0 2.5 3.0
2.0 2.5 3.0 3.5
3.0 3.5 4.0
4.0 4.5
5.0
What I think you're asking is this: given two vectors of numbers, how can I find the mean of the first items in each vector, the mean of the second items in each vector, and so on. If that's the case, then here is a way to do that.
First, you want to use cbind(), not rbind(), in order to get columns rather than rows.
Col1<-c(1,2,3,4,5)
Col2<-c(2,3,4,5,6)
Q1<-cbind(Col1, Col2)
Then you can use the function rowMeans() to figure out (you guessed it) the means of each row. (See also rowSums(), colMeans() and colSums().)
rowMeans(Q1)
#> [1] 1.5 2.5 3.5 4.5 5.5
The more general way to do this is the apply() function, which will let us apply a function to each column or row. Here we use the argument 1 to apply it to rows (because the first row takes the first item from Col1 and Col2, etc.).
apply(Q1, 1, mean)
The results are these:
#> [1] 1.5 2.5 3.5 4.5 5.5
If you really want them in your existing matrix, you could do something like this:
means <- rowMeans(Q1)
cbind(Q1, means)
You do not need the loops to get the averages, you can use vectorised operations:
Col1 <- c(1,2,3,4,5)
Col2 <- c(2,3,4,5,6)
Mean <- (Col1+Col2)/2
Q1 <- rbind(Col1, Col2, Mean)
Note that rbind treats your vectors as rows; use cbind if you want them as columns.
You could just use the outer function to first calculate the averages, then use lower.tri to fill the area underneath the diagonal of the matrix with NA values. (The result is named avg here so that it does not mask the built-in matrix function.)
avg <- outer(Q1$Col1, Q1$Col2, "+")/2
avg[lower.tri(avg)] <- NA
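If the goal is a plain vector of the unique averages, as the question describes, one possible approach is sketched below alongside a working double loop. Two assumptions are worth flagging: the desired output in the question keeps the diagonal, so the inner loop starts at i rather than i+1; and the names `loop_avgs` and `upper_avgs` are hypothetical. The sketch also shows why i+1:length(...) in the original loop does not do what it appears to: `:` binds more tightly than `+`.

```r
Col1 <- c(1, 2, 3, 4, 5)
Col2 <- c(1, 2, 3, 4, 5)

# Operator precedence: i + 1:3 means i + (1:3), not (i + 1):3
i <- 2
i + 1:3     # 3 4 5
(i + 1):3   # 3

# Double loop: j starts at i so each pair is visited once, diagonal included
loop_avgs <- numeric(0)
for (i in seq_along(Col1)) {
  for (j in i:length(Col2)) {
    loop_avgs <- c(loop_avgs, (Col1[i] + Col2[j]) / 2)
  }
}

# Vectorised: full matrix of averages, then read the upper triangle row by row
# (transposing turns "row by row" into R's column-by-column extraction order)
avg <- outer(Col1, Col2, "+") / 2
upper_avgs <- t(avg)[lower.tri(avg, diag = TRUE)]

all.equal(loop_avgs, upper_avgs)  # TRUE
upper_avgs
# [1] 1.0 1.5 2.0 2.5 3.0 2.0 2.5 3.0 3.5 3.0 3.5 4.0 4.0 4.5 5.0
```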

