lag in apply statement doesn't work in R

I'm trying to "apply" a function that does "lag"s on zoo objects in R.
The function works correctly if I pass a single zoo vector - it applies the lag and everything works.
However, if I call apply(data, 1, function) then the lag doesn't work. There is no error, just the equivalent of a zero lag.
This is also the case with a simple apply(data, 1, lag).
Can anyone explain why this is the case? Is there anything I can do to make the lag occur?

Here's some data:
> x <- zoo(matrix(1:12, 4, 3), as.Date("2003-01-01") + 0:3)
> x
2003-01-01 1 5 9
2003-01-02 2 6 10
2003-01-03 3 7 11
2003-01-04 4 8 12
If you want to lag this multivariate time series, just call lag (i.e. no need for apply):
> lag(x)
2003-01-01 2 6 10
2003-01-02 3 7 11
2003-01-03 4 8 12
If you want to apply a function across the rows, it needs to be sensible. For instance, to get mean of the row values:
> apply(x, 1, mean)
2003-01-01 2003-01-02 2003-01-03 2003-01-04
5 6 7 8
You can't apply over a zoo object and get a zoo object back: apply() strips the zoo class and passes plain numeric vectors to the function, which is why lag appears to do nothing. The output of apply is "a vector or array or list of values". In the example above:
> class(apply(x, 1, mean))
[1] "numeric"
You need to recreate it as a zoo object and then lag it:
> lag(zoo(apply(coredata(x), 1, mean), index(x)))
2003-01-01 2003-01-02 2003-01-03
6 7 8
You need to be slightly careful about the orientation of the output, but you can transpose it if necessary with the t() function. For instance:
> zoo(t(apply(coredata(x), 1, quantile)), index(x))
0% 25% 50% 75% 100%
2003-01-01 1 3 5 7 9
2003-01-02 2 4 6 8 10
2003-01-03 3 5 7 9 11
2003-01-04 4 6 8 10 12
You could also wrap this in a function. Alternatively, you can use one of the apply functions in the xts time series package (this will retain the time series object throughout):
> x <- as.xts(x)
> apply.daily(x, mean)
[,1]
2003-01-01 5
2003-01-02 6
2003-01-03 7
2003-01-04 8
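The "wrap this in a function" suggestion might look like the sketch below. The helper name zapply is my own invention, not part of zoo: it applies a function across the rows of a zoo object, transposes the result if needed, and rebuilds the zoo index so that lag works again:

```r
library(zoo)

# Hypothetical helper: apply FUN across rows of a zoo object and
# return the result as a zoo object carrying the original index.
zapply <- function(z, FUN, ...) {
  res <- apply(coredata(z), 1, FUN, ...)
  if (is.matrix(res)) res <- t(res)  # apply returns one column per input row
  zoo(res, index(z))
}

x <- zoo(matrix(1:12, 4, 3), as.Date("2003-01-01") + 0:3)
lag(zapply(x, mean))      # lagged series of row means
zapply(x, quantile)       # multi-column result, index preserved
```

Because zapply returns a proper zoo object, lag and any other zoo method can be chained onto it directly.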

Why don't you try the quantmod::Lag function for generating a matrix consisting of various lagged series of a series, at different lag values? For example:
> quantmod::Lag(1:10, k = c(0, 5, 2))
will return
Lag.0 Lag.5 Lag.2
[1,] 1 NA NA
[2,] 2 NA NA
[3,] 3 NA 1
[4,] 4 NA 2
[5,] 5 NA 3
[6,] 6 1 4
[7,] 7 2 5
[8,] 8 3 6
[9,] 9 4 7
[10,] 10 5 8

@Marek - lag(data) does do what I want, but I wanted to be able to use this as part of an "apply" construct to make the vector->matrix abstraction a little easier.

Related

What is causing this cryptic error message in bind_rows?

I have the data frame mentioned below, which I am trying to bind with another data frame:
X_df
1 2
1 18 NA
2 3 NA
3 6 NA
4 8 8
y_df
1 2
35 8
y_df is actually the sum of each column from x_df. I have been trying to bind these two data frames using bind_rows, but it shows the following error. Can I get some advice on how to rectify it? I am relatively new to R.
*Error in (function (cond) :
error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': Can't combine `..1$1` <table> and `..2$1` <double>*
Thanks in advance
I tried running the same bind_rows function you mentioned, and it works for me:
x_df
X1 X2
1 18 NA
2 3 NA
3 6 NA
4 8 8
y_df
X1 X2
1 35 8
x_df %>% bind_rows(y_df)
X1 X2
1 18 NA
2 3 NA
3 6 NA
4 8 8
5 35 8
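That it works with plain data frames suggests the difference lies in the column classes: the `<table>` in the error message hints that X_df's columns are contingency tables (e.g. produced by table() or tapply()), which bind_rows cannot combine with plain doubles. A sketch of the likely fix - the table-making step is my reconstruction of the problem, not taken from the question:

```r
library(dplyr)

# Hypothetical reconstruction: a column that kept class "table"
X_df <- data.frame(X1 = c(18, 3, 6, 8), X2 = c(NA, NA, NA, 8))
X_df$X1 <- table(rep(1:4, times = c(18, 3, 6, 8)))  # a "table"-classed column

y_df <- data.frame(X1 = 35, X2 = 8)

# Fix: coerce every column back to a plain vector before binding
X_df[] <- lapply(X_df, as.numeric)
bind_rows(X_df, y_df)
```

After the coercion, both data frames have ordinary double columns and bind_rows succeeds.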
Another approach:
x_df %>% bind_rows(x_df %>% summarise(across(everything(), ~ sum(., na.rm = TRUE))))
X1 X2
1 18 NA
2 3 NA
3 6 NA
4 8 8
5 35 8
I suggest not using y_df for the totals; instead, use something like janitor::adorn_totals() to actually compute the column totals.
library(janitor)
library(tidyverse)
X_df %>%
tibble::rowid_to_column() %>%
janitor::adorn_totals(where = "row")
# rowid X1 X2
# 1 18 NA
# 2 3 NA
# 3 6 NA
# 4 8 8
# Total 35 8
Try using rbind:
result <- rbind(X_df, y_df)
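As a quick runnable sketch of the rbind approach (column values taken from the question, with names normalized to X1/X2 as in the answers above) - rbind simply stacks data frames whose column names match:

```r
# rbind stacks data frames with matching column names
X_df <- data.frame(X1 = c(18, 3, 6, 8), X2 = c(NA, NA, NA, 8))
y_df <- data.frame(X1 = 35, X2 = 8)
rbind(X_df, y_df)
#   X1 X2
# 1 18 NA
# 2  3 NA
# 3  6 NA
# 4  8  8
# 5 35  8
```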

r - Extract subsequences with specific time increments

I have a data frame df. It has several columns, two of them are dates and serial_day, corresponding to the date an observation was taken and MATLAB's serial day. I would like to restrict my time series such that the increment (in days) between two consecutive observations is 3 or 4 and separate such blocks by a NA row.
It is known that consecutive daily observations never occur, and the case of a 2-day separation followed by another 2-day separation is rare, so it can be ignored.
In the example, increment is shown for convenience, but it is easily generated using the diff function. So, if the data frame is
serial_day increment
1 4 NA
2 7 3
3 10 3
4 12 2
5 17 5
6 19 2
7 22 3
8 25 3
9 29 4
10 34 5
I would hope to get a new data frame as:
serial_day increment
1 4 NA
2 7 3
3 10 3
4 NA NA ## entire row of NAs
5 19 NA
6 22 3
7 25 3
8 29 4
9 NA NA ## entire row of NAs
I can't figure out a way to do this without looping, which is a bad idea in R.
First, check which rows have an increment other than 3 or 4, then replace those rows with NAs:
inds <- which( df$increment > 4 | df$increment < 3 )
df[inds, ] <- rep(NA, ncol(df))
# serial_day increment
# 1 4 NA
# 2 7 3
# 3 10 3
# 4 NA NA
# 5 NA NA
# 6 NA NA
# 7 22 3
# 8 25 3
# 9 29 4
# 10 NA NA
This may result in multiple consecutive rows of NAs. In order to reduce these consecutive NA-rows to a single NA-row, you'd check where the NA-rows are located with which() and then see whether these locations are consecutive with diff() and remove these rows from df:
NArows <- which(rowSums(is.na(df)) == ncol(df)) # c(4, 5, 6, 10)
inds2 <- NArows[c(FALSE, diff(NArows) == 1)] # c(5, 6)
df <- df[-inds2, ]
# serial_day increment
# 1 4 NA
# 2 7 3
# 3 10 3
# 4 NA NA
# 7 22 3
# 8 25 3
# 9 29 4
# 10 NA NA
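Putting both steps together as one runnable script (data taken from the question; I use df[inds, ] <- NA, which has the same effect as the rep() form above):

```r
df <- data.frame(serial_day = c(4, 7, 10, 12, 17, 19, 22, 25, 29, 34))
df$increment <- c(NA, diff(df$serial_day))

# Step 1: blank out rows whose increment is not 3 or 4
inds <- which(df$increment > 4 | df$increment < 3)
df[inds, ] <- NA

# Step 2: collapse runs of consecutive all-NA rows to a single NA row
NArows <- which(rowSums(is.na(df)) == ncol(df))
df <- df[-NArows[c(FALSE, diff(NArows) == 1)], ]
df
```

Note that, like the original answer, this assumes at least one pair of consecutive NA rows exists; with none, the negative indexing in step 2 would need a guard.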

R Help - na.approx similar to SuperTrend in Excel

Raw Data na.approx desired result
1 1 1
NA 3 4
5 5 5
6 6 6
7 7 7
NA 8 4
NA 9 7
10 10 10
13 11 13
14 12 14
By default, I believe na.approx in R interpolates an NA between the two known values surrounding it - one before and one after the NA (the result is shown as column "na.approx" above). Is there a way I can change this function to interpolate based on the next two known values? For example, the first NA would be interpolated using 5 and 6, not 1 and 5.
I am not sure if there is an exact equivalent to what you want to do, but you can achieve similar results the following way:
> data <- c(1, NA, 5,6,7,NA,NA,10,13,14)
> ind <- which(is.na(data))
> sapply(rev(ind), function(i) data[i] <<- data[i + 1] - 1)
> data
[1] 1 4 5 6 7 8 9 10 13 14
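If the goal is literally "extrapolate each gap backwards from the next two known values" - so the double NA becomes 4 and 7, matching the desired column - a small helper can do it. This is my own sketch, not a built-in zoo option:

```r
# Fill each NA by extrapolating backwards from the next two known values.
# Processing right-to-left lets earlier NAs reuse freshly filled neighbours.
backfill_extrap <- function(x) {
  for (i in rev(which(is.na(x)))) {
    known <- which(!is.na(x) & seq_along(x) > i)  # known points after i
    if (length(known) >= 2) {
      slope <- (x[known[2]] - x[known[1]]) / (known[2] - known[1])
      x[i] <- x[known[1]] - slope * (known[1] - i)
    }
  }
  x
}

backfill_extrap(c(1, NA, 5, 6, 7, NA, NA, 10, 13, 14))
# 1 4 5 6 7 4 7 10 13 14 -- matches the desired column
```

Trailing NAs with fewer than two known values after them are left untouched; extend the helper if that case matters.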

R Create Column as Running Average of Another Column

I want to create a column in R that is simply the average of all previous values of another column. For example:
D
X
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
I would like D$Y to be the prior average of D$X; that is, D$Y is the average of all previous observations of D$X. I know how to do this using a for loop over every row, but is there a more efficient way?
I have a large dataset and hardware not up to that task!
You can generate cumulative means of a vector like this:
set.seed(123)
x<-sample(20)
x
## [1] 6 15 8 16 17 1 18 12 7 20 10 5 11 9 19 13 14 4 3 2
xmeans<-cumsum(x)/1:length(x)
xmeans
## [1] 6.000000 10.500000 9.666667 11.250000 12.400000 10.500000 11.571429
## [8] 11.625000 11.111111 12.000000 11.818182 11.250000 11.230769 11.071429
## [15] 11.600000 11.687500 11.823529 11.388889 10.947368 10.500000
So D$Y <- cumsum(D$X)/1:nrow(D) should work.
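One caveat: cumsum(D$X)/1:nrow(D) includes the current row in the average. If Y should be the mean of strictly prior rows (as "all previous observations" suggests), shift the cumulative mean down by one - a small base-R sketch:

```r
D <- data.frame(X = 1:10)

# Cumulative mean including the current row
D$Ymean <- cumsum(D$X) / seq_len(nrow(D))

# Mean of strictly prior rows: shift down one, NA for the first row
D$Yprior <- c(NA, head(D$Ymean, -1))
D$Yprior
# NA 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
```

Both versions stay vectorized, so they scale to large data without the for loop the question wanted to avoid.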

head and tail don't take a negative number as argument for a data.table?

Why do head and tail work differently for data.table? Is it by design?
> head(data.frame(x=1:10), -2)
x
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
> head(data.table(x=1:10), -2)
Error in seq_len(min(n, nrow(x))) :
argument must be coercible to non-negative integer
> tail(data.table(x=1:10), -2)
x
1: NA
2: NA
3: NA
4: 10
> tail(data.frame(x=1:10), -2)
x
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
Yes, this was reported before, #2375. This is now fixed in v1.8.11. From NEWS:
head() and tail() handle negative 'n' values correctly now, #2375. Thanks to Garrett See for reporting. Also it results in an error when length(n) != 1. Tests added.
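For reference, a negative n means "all but the last |n| rows" for head and "all but the first |n| rows" for tail; on an affected data.table version you could reproduce the intended result by row-indexing directly. A base-R illustration of the equivalence:

```r
df <- data.frame(x = 1:10)

# head(df, -2): everything except the last 2 rows
identical(head(df, -2), df[seq_len(nrow(df) - 2), , drop = FALSE])  # TRUE

# tail(df, -2): everything except the first 2 rows
nrow(tail(df, -2))  # 8
```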
