How to use rownames_to_column with dates - r

I am trying to convert my Yahoo price downloads to a "tidy" format, but in the reprex below, the dates lose their format and are converted to row numbers. Stated differently, how do I convert from xts to tibble and preserve the dates?
prices <- getSymbols("QQQ", adjustOHLC = TRUE, auto.assign = FALSE) %>%
as_tibble() %>%
rownames_to_column(var = "Date")
head(prices)

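To keep it all in a single tidyverse pipe, simply convert to a data frame first. as_tibble() drops the row names, which is where the dates of the xts index would end up, so rownames_to_column() has nothing left to pick up; as.data.frame() keeps them. Note that the resulting Date column is character (<chr> below), so wrap it in as.Date() if you need real dates: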
library(quantmod)
library(tibble)
getSymbols("QQQ", adjustOHLC = TRUE, auto.assign = FALSE) %>%
as.data.frame() %>%
rownames_to_column(var = "Date") %>%
as_tibble()
#> # A tibble: 3,419 x 7
#> Date QQQ.Open QQQ.High QQQ.Low QQQ.Close QQQ.Volume QQQ.Adjusted
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 2007-01-03 43.5 44.1 42.5 43.2 167689500 38.3
#> 2 2007-01-04 43.3 44.2 43.2 44.1 136853500 39.1
#> 3 2007-01-05 44.0 44.0 43.5 43.8 138958800 38.9
#> 4 2007-01-08 43.9 44.1 43.6 43.9 106401600 38.9
#> 5 2007-01-09 44.0 44.3 43.6 44.1 121577500 39.1
#> 6 2007-01-10 44.0 44.7 43.8 44.6 121070100 39.6
#> 7 2007-01-11 44.7 45.2 44.7 45.1 174029800 40.0
#> 8 2007-01-12 45.0 45.3 45.0 45.3 104217300 40.2
#> 9 2007-01-16 45.3 45.4 45.1 45.3 95690500 40.1
#> 10 2007-01-17 45.1 45.3 44.8 44.9 127142600 39.8
#> # ... with 3,409 more rows
Created on 2020-08-02 by the reprex package (v0.3.0)

I think you should use index() on the xts object rather than rownames_to_column() on the tibble:
library(quantmod)
library(dplyr)
price.xts <- getSymbols("QQQ", adjustOHLC = TRUE, auto.assign = FALSE)
price <- as_tibble(price.xts)
price$Date <- index(price.xts)
head(price)
tail(price)
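For readers without network access, the same idea can be sketched on a small hand-made xts object. The prices and column names below are made up for illustration; the point is that index() hands back the dates with their Date class intact, whereas the row-name route gives character strings.

```r
library(xts)     # also attaches zoo, which provides index() and coredata()
library(tibble)

# Hypothetical three-day price series standing in for the getSymbols() result
dates <- as.Date(c("2007-01-03", "2007-01-04", "2007-01-05"))
px <- xts(
  cbind(Open = c(43.5, 43.3, 44.0), Close = c(43.2, 44.1, 43.8)),
  order.by = dates
)

# Take the values with coredata() and the dates with index(),
# then put the Date column first
prices <- add_column(as_tibble(coredata(px)), Date = index(px), .before = 1)

class(prices$Date)
```

Putting the dates in the first column is only a matter of taste; assigning price$Date afterwards, as in the answer above, works just as well.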

Related

Forecasting irregular stock data with ARIMA and tsibble

I want to forecast a certain stock using ARIMA, similar to how R. Hyndman does it in FPP3.
The first issue that I've run into is that stock data is obviously irregular, since the stock exchange is closed during weekends and some holidays. This creates some issues if I want to use functions from the tidyverts packages:
> stock
# A tsibble: 750 x 6 [1D]
Date Open High Low Close Volume
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-05-21 36.3 36.4 36.3 36.4 232
2 2019-05-22 36.4 37.0 36.4 36.8 1007
3 2019-05-23 36.7 36.8 36.1 36.1 4298
4 2019-05-24 36.4 36.5 36.4 36.4 452
5 2019-05-27 36.5 36.5 36.3 36.4 2032
6 2019-05-28 36.5 36.8 36.4 36.5 3049
7 2019-05-29 36.2 36.5 36.1 36.5 2962
8 2019-05-30 36.8 37.1 36.8 37.1 432
9 2019-05-31 36.8 37.4 36.8 37.4 8424
10 2019-06-03 37.3 37.5 37.2 37.3 1550
# ... with 740 more rows
> stock %>%
+ feasts::ACF(difference(Close)) %>%
+ autoplot()
Error in `check_gaps()`:
! .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using `tsibble::fill_gaps()` if required.
The same error regarding gaps in time applies to other functions like fable::ARIMA() or feasts::gg_tsdisplay().
I have tried filling the gaps with values from previous rows:
stock %>%
group_by_key() %>%
fill_gaps() %>%
tidyr::fill(Close, .direction = "down")
# A tsibble: 1,096 x 6 [1D]
Date Open High Low Close Volume
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2019-05-21 36.3 36.4 36.3 36.4 232
2 2019-05-22 36.4 37.0 36.4 36.8 1007
3 2019-05-23 36.7 36.8 36.1 36.1 4298
4 2019-05-24 36.4 36.5 36.4 36.4 452
5 2019-05-25 NA NA NA 36.4 NA
6 2019-05-26 NA NA NA 36.4 NA
7 2019-05-27 36.5 36.5 36.3 36.4 2032
8 2019-05-28 36.5 36.8 36.4 36.5 3049
9 2019-05-29 36.2 36.5 36.1 36.5 2962
10 2019-05-30 36.8 37.1 36.8 37.1 432
# ... with 1,086 more rows
and everything works as it should from there. My question is:
Is there a way to use the "tidyverts approach" without running into the issue regarding gaps in time?
If not, is filling the gaps with values from previous rows a correct way to overcome this or will it bias the model?
First, you're clearly using an old version of the feasts package, because the current version gives a warning rather than an error when computing the ACF from data with implicit gaps.
Second, the answer depends on what analysis you want to do. You have three choices:
1. use day as the time index and fill the gaps with NAs;
2. use day as the time index and fill the gaps with the previous closing stock prices;
3. use trading day as the time index, in which case there are no gaps.
Here are the results for each of them, using an example of Apple stock over the period 2014-2018.
library(fpp3)
#> ── Attaching packages ─────────────────────────────────────── fpp3 0.4.0.9000 ──
#> ✔ tibble 3.1.7 ✔ tsibble 1.1.1
#> ✔ dplyr 1.0.9 ✔ tsibbledata 0.4.0
#> ✔ tidyr 1.2.0 ✔ feasts 0.2.2
#> ✔ lubridate 1.8.0 ✔ fable 0.3.1
#> ✔ ggplot2 3.3.6 ✔ fabletools 0.3.2
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date() masks base::date()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval() masks lubridate::interval()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ tsibble::setdiff() masks base::setdiff()
#> ✖ tsibble::union() masks base::union()
1. Fill non-trading days with missing values
stock <- gafa_stock %>%
filter(Symbol == "AAPL") %>%
tsibble(index = Date, regular = TRUE) %>%
fill_gaps()
stock
#> # A tsibble: 1,825 x 8 [1D]
#> Symbol Date Open High Low Close Adj_Close Volume
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
#> 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
#> 3 <NA> 2014-01-04 NA NA NA NA NA NA
#> 4 <NA> 2014-01-05 NA NA NA NA NA NA
#> 5 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
#> 6 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
#> 7 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
#> 8 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
#> 9 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
#> 10 <NA> 2014-01-11 NA NA NA NA NA NA
#> # … with 1,815 more rows
stock %>%
model(ARIMA(Close ~ pdq(d=1)))
#> A mable: 1 x 1
#> `ARIMA(Close ~ pdq(d = 1))`
#> <model>
#> 1 <ARIMA(0,1,0)>
In this case, calculations of the ACF will find the longest contiguous part which is too small to be meaningful, so there isn't any point showing the results of ACF() or gg_tsdisplay(). Also, the automated choice of differencing in the ARIMA model fails due to the missing values, so I have manually set it to one. The other parts of the ARIMA model work fine in the presence of missing values.
2. Fill non-trading days with the last observed values
stock <- stock %>%
tidyr::fill(Close, .direction = "down")
stock
#> # A tsibble: 1,825 x 8 [1D]
#> Symbol Date Open High Low Close Adj_Close Volume
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
#> 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
#> 3 <NA> 2014-01-04 NA NA NA 77.3 NA NA
#> 4 <NA> 2014-01-05 NA NA NA 77.3 NA NA
#> 5 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
#> 6 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
#> 7 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
#> 8 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
#> 9 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
#> 10 <NA> 2014-01-11 NA NA NA 76.1 NA NA
#> # … with 1,815 more rows
stock %>%
ACF(difference(Close)) %>%
autoplot()
stock %>%
model(ARIMA(Close))
#> # A mable: 1 x 1
#> `ARIMA(Close)`
#> <model>
#> 1 <ARIMA(0,1,0)>
stock %>%
gg_tsdisplay(Close)
3. Re-index by trading day
stock <- gafa_stock %>%
filter(Symbol == "AAPL") %>%
tsibble(index = Date, regular = TRUE) %>%
mutate(trading_day = row_number()) %>%
tsibble(index = trading_day)
stock
#> # A tsibble: 1,258 x 9 [1]
#> Symbol Date Open High Low Close Adj_Close Volume trading_day
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200 1
#> 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900 2
#> 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700 3
#> 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300 4
#> 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400 5
#> 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200 6
#> 7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000 7
#> 8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200 8
#> 9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400 9
#> 10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700 10
#> # … with 1,248 more rows
stock %>%
ACF(difference(Close)) %>%
autoplot()
stock %>%
model(ARIMA(Close))
#> # A mable: 1 x 1
#> `ARIMA(Close)`
#> <model>
#> 1 <ARIMA(2,1,3)>
stock %>%
gg_tsdisplay(Close)
Created on 2022-05-22 by the reprex package (v2.0.1)
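The difference between calendar-day and trading-day indexing can also be checked on a tiny made-up series, without gafa_stock. This is only a sketch: the dates and prices are invented, and has_gaps() is used to show which index the tidyverts functions would complain about.

```r
library(tsibble)
library(dplyr)

# Hypothetical two weeks of closing prices, weekends removed
all_days <- as.Date("2019-05-20") + 0:13          # a Monday through a Sunday
is_weekday <- as.POSIXlt(all_days)$wday %in% 1:5  # 0 = Sunday, 6 = Saturday
stock <- tibble(
  Date  = all_days[is_weekday],
  Close = c(36.4, 36.4, 36.8, 36.1, 36.4, 36.4, 36.5, 36.5, 37.1, 37.4)
) %>%
  as_tsibble(index = Date)

has_gaps(stock)$.gaps              # TRUE: the weekend is an implicit gap
nrow(fill_gaps(stock))             # 12: the weekend inside the range becomes NA rows

# Re-index by trading day: consecutive integers, so no gaps remain
by_trading_day <- stock %>%
  as_tibble() %>%
  mutate(trading_day = row_number()) %>%
  as_tsibble(index = trading_day)

has_gaps(by_trading_day)$.gaps     # FALSE
```

With the trading-day index the original Date column is kept as an ordinary variable, so it can still be used for labelling plots.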

Add new variable with arithmetic conditions

The randomly generated data frame contains ID, Date, and Earning. I reshaped the data frame so that each column represents a date and its values correspond to the earnings.
I want to create a new variable named "Date_over100" that gives the date on which each ID's cumulative earnings first exceed 100. Below is reproducible code that generates the data frame. I assume conditional statements or loops would be needed to achieve this. I would appreciate any help. Thanks in advance!
library(dplyr)
library(tidyr)
ID <- 1:10
# sample() draws without replacement here, so each ID gets a distinct date
Date <- sample(seq(as.Date('2021/01/01'), as.Date('2021/01/11'), by = "day"), 10)
Earning <- round(runif(10, 30, 50), digits = 2)
df <- data.frame(ID, Date, Earning, check.names = FALSE)
df1 <- df %>%
  arrange(Date) %>%
  pivot_wider(names_from = Date, values_from = Earning)
df1 <- as.data.frame(df1)
# fill the cells created by the reshape with random earnings
df1[is.na(df1)] <- round(runif(sum(is.na(df1)), min = 30, max = 50), digits = 2)
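I go back to long format for the calculation, then join to the wide data. The cumulative sum follows the column order of df1, and which.max() returns 1 when the total never exceeds 100, so check for that case if it can occur in your data:
library(dplyr)
library(tidyr)
df1 %>%
  pivot_longer(cols = -ID, names_to = "date") %>%
  mutate(date = as.Date(date)) %>%  # column names are strings; convert back to Date
  group_by(ID) %>%
  summarize(Date_over_100 = date[which.max(cumsum(value) > 100)]) %>%
  right_join(df1, by = "ID")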
# # A tibble: 10 × 12
# ID Date_over_100 `2021-01-04` `2021-01-01` `2021-01-08` `2021-01-11` `2021-01-02` `2021-01-09`
# <int> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 2021-01-08 45.0 46.2 40.1 47.4 47.5 48.8
# 2 2 2021-01-08 36.7 30.3 36.2 47.5 41.4 41.7
# 3 3 2021-01-08 49.5 46.0 45.0 43.9 45.4 37.1
# 4 4 2021-01-08 31.0 48.7 47.3 40.4 40.8 35.5
# 5 5 2021-01-08 48.2 35.2 32.1 44.2 35.4 49.7
# 6 6 2021-01-08 40.8 37.6 31.8 40.3 38.3 42.5
# 7 7 2021-01-08 37.9 42.9 36.8 46.0 39.8 33.6
# 8 8 2021-01-08 47.7 47.8 39.7 46.4 43.8 46.5
# 9 9 2021-01-08 32.9 42.0 41.8 32.8 33.9 35.5
# 10 10 2021-01-08 34.5 40.1 42.7 35.9 44.8 31.8
# # … with 4 more variables: 2021-01-10 <dbl>, 2021-01-03 <dbl>, 2021-01-07 <dbl>, 2021-01-05 <dbl>
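A self-contained variant of the long-format approach, on made-up numbers, is sketched below. It makes two assumptions that go beyond the answer above: the earnings should be accumulated in calendar order (hence the arrange()), and an ID that never passes 100 should get NA, which is why match(TRUE, ...) is used here instead of which.max(). The data frame wide and the name first_over are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per ID, one column per date
wide <- data.frame(
  ID = 1:3,
  `2021-01-01` = c(60, 30, 10),
  `2021-01-02` = c(50, 30, 10),
  `2021-01-03` = c(30, 45, 10),
  check.names = FALSE
)

first_over <- wide %>%
  pivot_longer(-ID, names_to = "date", values_to = "earning") %>%
  mutate(date = as.Date(date)) %>%
  arrange(ID, date) %>%                      # accumulate in calendar order
  group_by(ID) %>%
  summarize(Date_over100 = date[match(TRUE, cumsum(earning) > 100)])

result <- left_join(wide, first_over, by = "ID")
result$Date_over100
#> [1] "2021-01-02" "2021-01-03" NA
```

ID 3 only ever accumulates 30, so it gets NA rather than a misleading first date.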

Pivot Longer with Modification of Columns

I have data that is in the following format:
(data <- tribble(
~Date, ~ENRSxOPEN, ~ENRSxCLOSE, ~INFTxOPEN, ~INFTxCLOSE,
"1989-09-11",82.97,82.10,72.88,72.56,
"1989-09-12",83.84,83.96,73.52,72.51,
"1989-09-13",83.16,83.88,72.91,72.12))
# A tibble: 3 x 5
Date ENRSxOPEN ENRSxCLOSE INFTxOPEN INFTxCLOSE
<chr> <dbl> <dbl> <dbl> <dbl>
1 1989-09-11 83.0 82.1 72.9 72.6
2 1989-09-12 83.8 84.0 73.5 72.5
3 1989-09-13 83.2 83.9 72.9 72.1
For analysis, I want to pivot this tibble longer to the following format:
tribble(
~Ticker, ~Date, ~OPEN, ~CLOSE,
"ENRS","1989-09-11",82.97,82.10,
"ENRS","1989-09-12",83.84,83.96,
"ENRS","1989-09-13",83.16,83.88,
"INFT","1989-09-11",72.88,72.56,
"INFT","1989-09-12",73.52,72.51,
"INFT","1989-09-13",72.91,72.12)
I.e., I want to separate the Open/Close prices from the ticker, and put the latter as an entirely new column in the beginning.
I've tried to use the function pivot_longer:
pivot_longer(data, cols = ENRSxOPEN:INFTxCLOSE)
While this goes in the direction of what I want to achieve, it does not separate the prices and keep them in one row for each Ticker.
Is there a way to add additional arguments to pivot_longer() to achieve that?
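Use the special ".value" name in names_to together with names_sep. The part of each column name before the "x" goes into Ticker, and the part after it becomes the name of an output column:
pivot_longer(data, -Date, names_to = c('Ticker', '.value'), names_sep = 'x')
# A tibble: 6 x 4
Date Ticker OPEN CLOSE
<chr> <chr> <dbl> <dbl>
1 1989-09-11 ENRS 83.0 82.1
2 1989-09-11 INFT 72.9 72.6
3 1989-09-12 ENRS 83.8 84.0
4 1989-09-12 INFT 73.5 72.5
5 1989-09-13 ENRS 83.2 83.9
6 1989-09-13 INFT 72.9 72.1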
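To get exactly the layout asked for in the question (Ticker first, rows grouped by ticker), the pivot can be followed by a sort and a column reorder. The sketch below reuses the question's data; converting Date with as.Date() is an extra step assumed here so that sorting is chronological rather than alphabetical.

```r
library(tidyr)
library(tibble)

data <- tribble(
  ~Date,        ~ENRSxOPEN, ~ENRSxCLOSE, ~INFTxOPEN, ~INFTxCLOSE,
  "1989-09-11", 82.97,      82.10,       72.88,      72.56,
  "1989-09-12", 83.84,      83.96,       73.52,      72.51,
  "1989-09-13", 83.16,      83.88,       72.91,      72.12
)

# ".value" tells pivot_longer() that the second name piece (OPEN / CLOSE)
# names the output columns, while the first piece goes into Ticker
long <- pivot_longer(
  data,
  cols = -Date,
  names_to = c("Ticker", ".value"),
  names_sep = "x"
)

long$Date <- as.Date(long$Date)
long <- long[order(long$Ticker, long$Date), c("Ticker", "Date", "OPEN", "CLOSE")]
long
```

One caveat: names_sep is treated as a regular expression and is matched everywhere in the name, so a ticker that itself contains a lowercase "x" would be split in the wrong place; names_pattern would be the safer choice in that situation.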

How to get returns from a DF

So I'm using the quantmod library to calculate historical returns. I can get the past prices, but how can I calculate the returns and add them to the data frame?
My code looks like this
tickers <- c('KO', 'AAPL')
getSymbols(tickers, from = '2020-07-01', to = '2021-07-01')
history <- cbind(KO$KO.Close,AAPL$AAPL.Close)
First, here is a cleaner way to import and structure the data:
Import
library(quantmod)
library(tidyverse)
tickers <- c('KO', 'AAPL')
df <-
map_df(
.x = tickers,
.f = function(x){
getSymbols(x, from = '2020-07-01', to = '2021-07-01',auto.assign = FALSE) %>%
as_tibble() %>%
set_names(c("open","high","low","close","volume","adjusted")) %>%
mutate(symbol = x)
}
)
# A tibble: 504 x 7
open high low close volume adjusted symbol
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 44.9 45.5 44.7 44.8 14316500 43.1 KO
2 45.3 45.4 44.8 44.9 15111900 43.2 KO
3 45.1 45.3 44.6 45.2 15146000 43.5 KO
4 45 45.5 44.8 45.2 13043600 43.5 KO
5 45.1 45.2 44.5 45.1 13851200 43.3 KO
6 45.0 45.0 43.8 43.9 16087100 42.2 KO
7 43.9 45.2 43.9 45.2 15627800 43.4 KO
8 45.5 45.7 45.0 45.2 16705300 43.5 KO
9 44.9 45.9 44.7 45.9 17080100 44.1 KO
10 46.3 47.2 46.2 46.4 23738000 44.6 KO
Return
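This computes the percentage change in the opening price from one row to the next. I do not know if this is the right formula for the return you need, but you can change it later inside mutate():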
df %>%
group_by(symbol) %>%
mutate(return = 100*((open/lag(open))-1))
# A tibble: 504 x 8
# Groups: symbol [2]
open high low close volume adjusted symbol return
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 44.9 45.5 44.7 44.8 14316500 43.1 KO NA
2 45.3 45.4 44.8 44.9 15111900 43.2 KO 0.801
3 45.1 45.3 44.6 45.2 15146000 43.5 KO -0.331
4 45 45.5 44.8 45.2 13043600 43.5 KO -0.310
5 45.1 45.2 44.5 45.1 13851200 43.3 KO 0.311
6 45.0 45.0 43.8 43.9 16087100 42.2 KO -0.199
7 43.9 45.2 43.9 45.2 15627800 43.4 KO -2.60
8 45.5 45.7 45.0 45.2 16705300 43.5 KO 3.76
9 44.9 45.9 44.7 45.9 17080100 44.1 KO -1.36
10 46.3 47.2 46.2 46.4 23738000 44.6 KO 3.10
# ... with 494 more rows
Assuming the return you're looking for is today's value divided by yesterday's value, minus one, and using the tidyverse:
library(tidyverse)
library(timetk)
tickers <- c('KO', 'AAPL')
quantmod::getSymbols(tickers, from = '2020-07-01', to = '2021-07-01')
# Convert to a tibble to keep the dates
equity1 <- tk_tbl(KO) %>%
select(date = index, 5)
equity2 <- tk_tbl(AAPL) %>%
select(date = index, 5)
# Combine the series using a join, in case dates don't line up exactly.
history <- full_join(equity1, equity2, by = "date")
# Make data long, group by equity, do the calculation, turn back into wide data:
return <- history %>%
pivot_longer(-date) %>%
group_by(name) %>%
mutate(return = value/lag(value)-1) %>%
ungroup() %>%
pivot_wider(names_from = name, values_from = c(value, return))
# A tibble: 252 x 5
date value_KO.Close value_AAPL.Close return_KO.Close return_AAPL.Close
<date> <dbl> <dbl> <dbl> <dbl>
1 2020-07-01 44.8 91.0 NA NA
2 2020-07-02 44.9 91.0 0.00134 0
3 2020-07-06 45.2 93.5 0.00780 0.0268
4 2020-07-07 45.2 93.2 -0.000442 -0.00310
5 2020-07-08 45.1 95.3 -0.00310 0.0233
6 2020-07-09 43.9 95.8 -0.0257 0.00430
7 2020-07-10 45.2 95.9 0.0282 0.00175
8 2020-07-13 45.2 95.5 0.00221 -0.00461
9 2020-07-14 45.9 97.1 0.0137 0.0165
10 2020-07-15 46.4 97.7 0.0116 0.00688
# ... with 242 more rows
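The grouped lag calculation at the heart of both answers can be tried offline on a few made-up prices. The sketch below assumes a long table with symbol, date and close columns (hypothetical names and numbers) and calls dplyr::lag() explicitly, since the name lag is shared with stats::lag() and the version that gets picked up depends on which packages are attached.

```r
library(dplyr)

# Hypothetical closing prices for two symbols over three days
prices <- tibble(
  symbol = rep(c("KO", "AAPL"), each = 3),
  date   = rep(as.Date("2020-07-01") + 0:2, times = 2),
  close  = c(100, 110, 99, 50, 50, 55)
)

returns <- prices %>%
  group_by(symbol) %>%
  arrange(date, .by_group = TRUE) %>%                 # make sure rows are in date order
  mutate(return = close / dplyr::lag(close) - 1) %>%  # simple one-period return
  ungroup()

returns
```

The first row of each symbol has no previous price, so its return is NA; grouping by symbol is what stops the last KO price from leaking into the first AAPL return.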

How to create several new group-based variables most efficiently?

Let's use the following example:
set.seed(2409)
N=5
T=10
id<- rep(LETTERS[1:N],each=T)
time<-rep(1:T, times=N)
var1<-runif(N*T,0,100)
var2<-runif(N*T,0,100)
var3<-runif(N*T,0,100)
var4<-runif(N*T,0,100)
var5<-runif(N*T,0,100)
df<-data.frame(id,time,var1,var2,var3,var4,var5); rm(N,T,id,time,var1,var2,var3,var4,var5)
I now try to execute a function for several of these variables (not the whole series of variables!) and create new variables accordingly.
I already have a suitable code for creating log variables. For this I would use the following code:
cols <- c("var1",
"var3",
"var5")
log <- log(df[cols])
colnames(log) <- paste(colnames(log), "log", sep = "_")
df <- cbind(df,log); rm(log, cols)
This would give me my additional log variables. But now I also want to create lagged and z-transformed variables. These functions refer to the individual IDs. So I wrote the following code that of course works, but is extremely long and inefficient in my real dataset where I apply the function to 38 variables each:
library(Hmisc)
library(dplyr)
df<-df %>%
group_by(id) %>%
mutate(var1_1=Lag(var1, shift=1),
var3_1=Lag(var3, shift=1),
var5_1=Lag(var5, shift=1),
var1_2=Lag(var1, shift=2),
var3_2=Lag(var3, shift=2),
var5_2=Lag(var5, shift=2),
var1_z=scale(var1),
var3_z=scale(var3),
var5_z=scale(var5)
)
I am very sure that there is also a way to make this more efficient. It would be desirable if I could define the original variable once and execute different functions and create new variables as a result.
Thank you very much!
You can use mutate_at with funs. This will apply the three functions in funs to each of the three variables in vars, creating 9 new columns.
library(dplyr)
df %>%
group_by(id) %>%
mutate_at(vars(var1, var3, var5),
funs(lag1 = lag(.), lag2 = lag(., 2), scale))
# # A tibble: 50 x 16
# # Groups: id [5]
# id time var1 var2 var3 var4 var5 var1_lag1 var3_lag1 var5_lag1
# <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 A 1 38.8 25.7 29.2 91.1 35.3 NA NA NA
# 2 A 2 87.1 22.3 8.27 31.5 93.7 38.8 29.2 35.3
# 3 A 3 61.7 38.8 0.887 63.0 50.4 87.1 8.27 93.7
# 4 A 4 0.692 60.1 71.5 74.0 41.6 61.7 0.887 50.4
# 5 A 5 60.1 13.3 90.4 80.6 47.5 0.692 71.5 41.6
# 6 A 6 46.4 3.67 36.7 86.9 67.5 60.1 90.4 47.5
# 7 A 7 80.4 72.1 82.2 25.5 70.3 46.4 36.7 67.5
# 8 A 8 48.8 25.7 93.4 19.8 81.2 80.4 82.2 70.3
# 9 A 9 48.2 31.5 82.1 47.2 49.2 48.8 93.4 81.2
# 10 A 10 21.8 32.6 76.5 19.7 41.1 48.2 82.1 49.2
# # ... with 40 more rows, and 6 more variables: var1_lag2 <dbl>, var3_lag2 <dbl>,
# # var5_lag2 <dbl>, var1_scale <dbl>, var3_scale <dbl>, var5_scale <dbl>
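Since funs() was deprecated in dplyr 0.8.0 and mutate_at() has been superseded by across() as of dplyr 1.0.0, a possible equivalent for current dplyr versions is sketched below. It uses a smaller made-up data frame than the question; the suffixes lag1, lag2 and scale mirror the ones above, and as.vector() is an addition here so that the scaled columns are plain numeric vectors rather than one-column matrices.

```r
library(dplyr)

set.seed(2409)
df <- data.frame(
  id   = rep(LETTERS[1:2], each = 4),
  time = rep(1:4, times = 2),
  var1 = runif(8, 0, 100),
  var2 = runif(8, 0, 100),
  var3 = runif(8, 0, 100)
)

out <- df %>%
  group_by(id) %>%
  mutate(across(
    c(var1, var3),
    list(
      lag1  = ~ dplyr::lag(.x, 1),
      lag2  = ~ dplyr::lag(.x, 2),
      scale = ~ as.vector(scale(.x))   # scale() returns a matrix; keep a plain vector
    ),
    .names = "{.col}_{.fn}"
  )) %>%
  ungroup()

names(out)
```

Unlike mutate_at(), across() emits the new columns variable by variable (var1_lag1, var1_lag2, var1_scale, then var3_...), so the column order differs from the output shown above.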
Here is an option with data.table
library(data.table)
nm1 <- c('var1', 'var3', 'var5')
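# shift() returns its results column by column (var1 lag 1, var1 lag 2, var3 lag 1, ...),
# so the new names must follow the same order
nm2 <- paste0(rep(nm1, each = 2), c('_lag1', '_lag2'))
nm3 <- paste0(nm1, '_scale')
setDT(df)[, c(nm2, nm3) := c(shift(.SD, n = 1:2),
    lapply(.SD, function(x) as.vector(scale(x)))), by = id, .SDcols = nm1]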
