Lag and summarize time series data - R

I have spent a significant amount of time searching for an answer with little luck. I have some time series data and need to collapse it by creating a rolling mean of every nth row. It looks like this is possible in zoo, and maybe Hmisc, and I am sure other packages. I need to average rows 1,2,3 then 3,4,5 then 5,6,7 and so on. My data looks like this and has thousands of observations:
id time x.1 x.2 y.1 y.2
10 1 22 19 0 -.5
10 2 27 44 -1 0
10 3 19 13 0 -1.5
10 4 7 22 .5 1
10 5 -15 5 .33 2
10 6 3 17 1 .33
10 7 6 -2 0 0
10 8 44 25 0 0
10 9 27 12 1 -.5
10 10 2 11 2 1
I would like it to look like this when complete:
id time x.1 x.2 y.1 y.2
10 1 22.66 25.33 -.33 -.66
10 2 3.66 13.33 .27 .50
Time 1 would actually be times 1, 2, 3 averaged, and time 2 would be 3, 4, 5 averaged, but at this point the time variable is not important to keep. I would need to group by id, as it does change eventually. The only way I could figure out how to do this successfully was to use Lag() to make one new column leading by 1 and another by 2, then take the average across columns; after that you have to delete every other row:
1 NA NA
2 1 NA
3 2 1
4 3 2
5 4 3
You use the 1,2,3 and 3,4,5 rows and remove the 2,3,4 rows... Doing this for each variable would be outrageous, especially as I gather new data.
Any ideas? Help would be much appreciated.

Something like this, maybe?
library(zoo)  # for rollmean()
# sample data
id <- c(10, 10, 10, 10, 10, 10)
time <- c(1, 2, 3, 4, 5, 6)
x1 <- c(22, 27, 19, 7, -15, 3)
x2 <- c(19, 44, 13, 22, 5, 17)
df <- data.frame(id, time, x1, x2)
# rolling mean of width 3 over every column except time
means <- data.frame(rollmean(df[, c(1, 3:NCOL(df))], 3))
# keep every other window so the averages cover rows 1-3, 3-5, 5-7, ...
means <- means[c(TRUE, FALSE), ]
means$time <- seq_len(NROW(means))
row.names(means) <- seq_len(NROW(means))
> means
id x1 x2 time
1 10 22.666667 25.33333 1
2 10 3.666667 13.33333 2
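If there are many ids, here is a hedged sketch of the same idea applied per group with base split()/lapply(); the data frame name dat and the helper collapse_by_id are hypothetical, and zoo is assumed to be loaded:
# assumes `dat` has the same columns as df above, with enough rows per id
collapse_by_id <- function(d) {
  m <- data.frame(rollmean(d[, c(1, 3:NCOL(d))], 3))
  m <- m[c(TRUE, FALSE), ]        # windows 1-3, 3-5, 5-7, ...
  m$time <- seq_len(NROW(m))
  m
}
result <- do.call(rbind, lapply(split(dat, dat$id), collapse_by_id))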

Related

Arrange a data set in a repeating manner from reshaped data

I have reshaped the data to long format. It has been sorted in ascending order on one column (x2 in the reproducible example below), and instead I want the rows to cycle through that column's values in a repeating manner rather than being blocked by it. Here is a sample:
set.seed(234)
data <- data.frame(x1 = 1:12, x2 = rep(1:3, each = 4), x3 = runif(12, min = 0, max = 12))
And I want the format something like this:
x1 x2 x3
1 1 1 6.115445
2 2 2 5.157014
3 3 3 4.793458
4 4 1 9.998710
5 5 2 2.620250
6 6 3 1.825839
7 7 1 5.842854
8 8 2 5.616670
9 9 3 6.511315
10 10 1 9.164444
11 11 2 8.401418
Can you please help me with either what to include in the melt function while converting the data to long format, or any other function I could use to rearrange the data?
note:
The above result is to show the desired format, not the exact solution for my data.
EDIT:
Here is head() of my real data:
Date stn Elev Amount
1 2010-01-01 11 0 268.945
2 2010-01-01 11 0 268.396
3 2010-01-01 11 0 267.512
4 2010-01-01 11 0 266.488
5 2010-01-01 11 0 265.558
6 2010-01-01 11 0 265.178
In the actual data, the column Elev contains values like c("0","100","250","500", ...). So assume that 0 corresponds to 1 in x2 of the sample above, and so on for 100, 250, ....
One method is to use ave as follows:
data[order(ave(data$x3, data$x2, FUN = seq_along), data$x2), ]
x1 x2 x3
1 1 1 8.9474400
5 5 2 0.8029211
9 9 3 11.1328381
2 2 1 9.3805491
6 6 2 7.7375415
10 10 3 3.4107614
3 3 1 0.2404454
7 7 2 11.1526315
11 11 3 6.6686992
4 4 1 9.3130246
8 8 2 8.6117063
12 12 3 6.5724198
In this instance, ave calculates a running count by data$x2, which is then used to sort the data with the order function.
You can also renumber x1 if desired: data$x1 <- 1:nrow(data), which would return your desired result.
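For reference, a hedged dplyr sketch of the same reordering, assuming the sample data above: number the rows within each x2 group, then sort by that counter and x2.
library(dplyr)
data %>%
  group_by(x2) %>%
  mutate(.n = row_number()) %>%   # running count within each x2 group
  ungroup() %>%
  arrange(.n, x2) %>%
  select(-.n)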

r - aggregate / substract two variables, rows

I'm using the aggregate function to calculate the difference between two variables for every observation, somehow like this (and then I want to save the result as a new variable):
data1
Group Points_Attempt1 Points_Attempt2
1 1 10 5
2 1 34 23
3 1 50 5
4 1 10 12
5 2 11 21
6 2 23 23
7 2 32 10
8 2 12 10
I'm able to do something like this:
aggregate(data1[c("Points_Attempt1", "Points_Attempt2")], list(data1$Group), diff)
But I want it for every single observation, and I just do not know how to select the observations, i.e. the row numbers (here 1-8).
So I'm searching for the following fourth column (Difference), which I would then like to save as a new variable:
Group Points_Attempt1 Points_Attempt2 Difference
1 1 10 5 5
2 1 34 23 11
3 1 50 5 45
4 1 10 12 -2
5 2 11 21 -10
6 2 23 23 0
7 2 32 10 22
8 2 12 10 2
I would be highly thankful, if someone could help me with this.
We can use mutate_each:
library(dplyr)
data1 %>%
  group_by(Group) %>%
  mutate_each(funs(c(NA, diff(.))), 2:3)
Or if we need the difference between the two variables:
data1 %>%
  mutate(Difference = Points_Attempt1 - Points_Attempt2)
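For completeness, a minimal base R sketch of the same row-wise difference, assuming data1 as shown above (mutate_each is deprecated in current dplyr, so the plain subtraction may age better):
# base R: add the row-wise difference as a new column
data1$Difference <- data1$Points_Attempt1 - data1$Points_Attempt2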

Building a contingency table

I have a data like this:
A B
1 10
1 20
1 30
2 10
2 30
2 40
3 20
3 10
3 30
4 20
4 10
5 10
5 10
and I want to build a contingency table like this:
10 20 30 40
10 1 3 2 0
20 3 0 2 0
30 2 2 0 0
40 0 0 0 0
Meaning: according to column A, for each pair of values of column B, mark +1 in the corresponding cell of the contingency table.
Can you help me do this?
Here is a very ugly answer, using the data from the image, because I already spent too much time on your problem. In general, it's not practical to have your result depend on the order of variables.
A <- rep(1:4, c(3, 2, 3, 3))
B <- c(10, 10, 30, 10, 20, 30, 20, 10, 10, 20, 30)
data <- data.frame(A, B)
# split by A and form consecutive pairs of B within each group
library(plyr)
data2 <- ddply(data, .(A), function(x) {
  combined_pairs <- cbind(x$B[-nrow(x)], x$B[-1])
  # return pairs where the first element is always the lowest
  smallest <- apply(combined_pairs, MARGIN = 1, FUN = min)
  largest <- apply(combined_pairs, MARGIN = 1, FUN = max)
  return(data.frame(small = smallest, large = largest))
})
library(reshape2)
result <- dcast(small ~ large, data = data2, fun.aggregate = length)
> result
small 10 20 30
1 10 1 3 1
2 20 0 0 2
I think you can add the empty rows yourself if you still need them.
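If the empty rows and columns are needed, a hedged base R sketch using table() on the pair data data2 built above, spelling out the full set of B values as factor levels:
# force empty rows/columns by making the pair columns factors
# whose levels cover every B value of interest
levs <- c(10, 20, 30, 40)
table(factor(data2$small, levels = levs),
      factor(data2$large, levels = levs))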

Compute difference between rows in R and setting in zero first difference

Hi everybody, I am trying to solve a little problem in R. I want to compute the difference between rows in a data frame. My data frame looks like this:
df <- data.frame(ID=1:8, x2=8:1, x3=11:18, x4=c(2,4,10,0,1,1,9,12))
I want to create a new column named diff.var that stores the row-to-row differences of a variable. One possible solution is the diff() function. When I use it I get this:
diff(df$x4)
[1] 2 6 -10 1 0 8 3
That works fine, but when I try to add it to my data frame with df$diff.var = diff(df$x4) I get this:
Error in `$<-.data.frame`(`*tmp*`, "diff.var", value = c(2, 6, -10, 1, :
replacement has 7 rows, data has 8
Since the first row doesn't have a previous row to compute the difference from, I want to set its value to zero. I would like to get something like this:
ID x2 x3 x4 diff.var
1 8 11 2 0
2 7 12 4 2
3 6 13 10 6
4 5 14 0 -10
5 4 15 1 1
6 3 16 1 0
7 2 17 9 8
8 1 18 12 3
Here the first element of diff.var is zero because it has no previous element. I would like to build a function that sets the first element of diff.var to zero and computes the differences for the following rows. I want to create a new data frame with all variables plus diff.var, because ID is used in later analysis together with diff.var; diff() alone doesn't let me create this new variable. Thanks for your help.
This question has already been asked in this forum and can be found elsewhere. Anyway, do what Frank suggests:
df <- data.frame(ID=1:8, x2=8:1, x3=11:18, x4=c(2,4,10,0,1,1,9,12))
df$vardiff <- c(0, diff(df$x4))
df
ID x2 x3 x4 vardiff
1 1 8 11 2 0
2 2 7 12 4 2
3 3 6 13 10 6
4 4 5 14 0 -10
5 5 4 15 1 1
6 6 3 16 1 0
7 7 2 17 9 8
8 8 1 18 12 3
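For comparison, a hedged dplyr sketch of the same idea, assuming the df above; lag() with default = x4[1] makes the first difference zero.
library(dplyr)
# row-to-row difference, with the first element forced to zero
df %>% mutate(diff.var = x4 - lag(x4, default = x4[1]))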

In R, how do I set a value for a variable based on the change from the prior (or following) row?

Given a data frame as follows:
id<-c(1,1,1,1,1,1,2,2,2,2,2,2)
t<-c(6,8,9,11,12,14,55,57,58,60,62,63)
p<-c("a","a","a","b","b","b","a","a","b","b","b","b")
df<-data.frame(id,t,p)
row id t p
1 1 6 a
2 1 8 a
3 1 9 a
4 1 11 b
5 1 12 b
6 1 14 b
7 2 55 a
8 2 57 a
9 2 58 b
10 2 60 b
11 2 62 b
12 2 63 b
I want to create a new variable 'ta' such that the value of ta is:
Zero for the row in which 'p' changes from a to b for a given ID (rows 4 and 9) (this I can do)
Within each unique id, when p is 'a', the value of ta should count down from zero by the change in t between the row in question and the row above it. For example, for row 3, the value of ta should be 0 - (11-9) = -2.
Within each unique id, when p is 'b', the value of ta should count up from zero by the change in t between the row in question and the row below it. For example, for row 5, the value of ta should be 0 + (12-11) = 1.
Thus, when complete, the data frame should look as follows:
row id t p ta
1 1 6 a -5
2 1 8 a -3
3 1 9 a -2
4 1 11 b 0
5 1 12 b 1
6 1 14 b 3
7 2 55 a -3
8 2 57 a -1
9 2 58 b 0
10 2 60 b 2
11 2 62 b 4
12 2 63 b 5
I've been playing around with loops and cumsum() and head() and tail() and can't quite make this kind of within id/within condition summing work. There are a number of other questions about working with values from previous or following rows, but I can't quite reshape any of those techniques to work here. Your thoughts are greatly appreciated.
Here you go. This is a split-apply-combine strategy of breaking everything up by id, establishing the transition point between p=='a' and p=='b' and then subtracting values above and below that. It only works if your data are actually ordered in the way you show them here.
do.call('rbind',
  lapply(split(df, df$id), function(x) {
    # save values of `0` at transition points in `p`
    x <- cbind.data.frame(x, ta = ifelse(c(0, diff(as.numeric(as.factor(x$p)))) == 1, 0, NA))
    # identify the index of that transition point
    w <- which(x$ta == 0)
    # handle `ta` values for `p == 'b'`
    x$ta[(w + 1):nrow(x)] <- x$ta[w] + (x$t[(w + 1):nrow(x)] - x$t[w])
    # handle `ta` values for `p == 'a'`
    x$ta[1:(w - 1)] <- x$ta[w] - (x$t[w] - x$t[1:(w - 1)])
    return(x)
  })
)
Result:
id t p ta
1.1 1 6 a -5
1.2 1 8 a -3
1.3 1 9 a -2
1.4 1 11 b 0
1.5 1 12 b 1
1.6 1 14 b 3
2.7 2 55 a -3
2.8 2 57 a -1
2.9 2 58 b 0
2.10 2 60 b 2
2.11 2 62 b 4
2.12 2 63 b 5
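For comparison, a hedged dplyr sketch of the same idea, assuming df as defined in the question: anchor each id at its first p == "b" row and measure t relative to that point.
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(ta = t - t[match("b", p)]) %>%   # the first "b" row becomes the zero point
  ungroup()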
