Using lapply to standardise the ts() function - r

I am new to R and writing functions. I've spent hours trying to figure this out and searching Google, but can't seem to find anything. Hopefully you can help? I want to use lapply() to analyze the data below using the ts() function.
My code looks like this:
library(dplyr)
#group out different sites
mylist <- data %>%
group_by(Site)
mylist
#Write ts() function
alpha_function = function(x) {
ts_alpha = ts(x$Temperature, frequency=12, start=c(0017, 7, 20))
return(data.frame(ts_alpha))
}
#Run list through lapply()
results = lapply(mylist, alpha_function())
But I get this error: argument "x" is missing with no default.
I have a data set that looks like:
Site(factor) Date(POSIXct) Temperature(num)
1 0017-03-04 2.73
2 0017-03-04 3.73
3 0017-03-04 2.71
4 0017-03-04 2.22
5 0017-03-04 2.89
etc.
I have over 3,000 temperature readings at different dates for 5 different sites.
Thanks in advance!

I'm not exactly an R guy, but I would wager this line:
results = lapply(mylist, alpha_function())
should be
results = lapply(mylist, alpha_function)
What you have calls alpha_function while you are trying to supply it to lapply; what you most likely want is to pass the function itself without calling it. (The error says that alpha_function needs an x argument, which the bare call alpha_function() does not provide.)
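As a rough sketch of the corrected call: the toy data frame, the split() step, and start = c(2017, 3) below are assumptions, not the asker's real data. lapply() receives the function object itself, and split() produces one data frame per site, because a grouped tibble is still a single data frame and lapply() would otherwise iterate over its columns.

```r
# Toy stand-in for the asker's data (hypothetical values)
toy <- data.frame(
  Site = factor(rep(c("1", "2"), each = 24)),
  Temperature = c(seq(2, 4, length.out = 24), seq(3, 5, length.out = 24))
)

# Takes one site's rows and returns a monthly ts object.
# start = c(2017, 3) (March 2017) is an assumed start period.
alpha_function <- function(x) {
  ts(x$Temperature, frequency = 12, start = c(2017, 3))
}

# One data frame per site, then pass the function itself (no parentheses)
by_site <- split(toy, toy$Site)
results <- lapply(by_site, alpha_function)
```

results is then a named list with one ts object per site, for example results[["1"]].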

A recommended approach when working with dplyr and the tidyverse is to keep things in data frames:
library(tidyverse)
library(zoo)
dat %>%
nest(-Site) %>%
mutate(data = map(data, ~ zoo(.x$Temperature, .x$Date)))
# # A tibble: 5 x 2
# Site data
# <fct> <list>
# 1 a <S3: zoo>
# 2 b <S3: zoo>
# 3 c <S3: zoo>
# 4 d <S3: zoo>
# 5 e <S3: zoo>
Or if we must have ts rather than zoo objects, we can use as.ts(zoo(...)).
In case we still prefer regular lists, we can use base split() and lapply():
dat %>%
split(.$Site) %>%
lapply(function(.x) zoo(.x$Temperature, .x$Date))
# List of 5
# $ a:‘zoo’ series from 2017-03-04 12:00:00 to 2017-05-06 00:30:00
# Data: num [1:3000] 5.37 5.49 5.32 5.44 5.43 ...
# Index: POSIXct[1:3000], format: "2017-03-04 12:00:00" ...
# $ b:‘zoo’ series from 2017-03-04 12:00:00 to 2017-05-06 00:30:00
# Data: num [1:3000] 5.36 5.22 5.15 5.41 5.41 ...
# Index: POSIXct[1:3000], format: "2017-03-04 12:00:00" ...
# $ c:‘zoo’ series from 2017-03-04 12:00:00 to 2017-05-06 00:30:00
# Data: num [1:3000] 6.08 6.11 6.22 6.13 6.03 ...
# Index: POSIXct[1:3000], format: "2017-03-04 12:00:00" ...
# $ d:‘zoo’ series from 2017-03-04 12:00:00 to 2017-05-06 00:30:00
# Data: num [1:3000] 5.06 4.96 5.23 5.16 5.29 ...
# Index: POSIXct[1:3000], format: "2017-03-04 12:00:00" ...
# $ e:‘zoo’ series from 2017-03-04 12:00:00 to 2017-05-06 00:30:00
# Data: num [1:3000] 5.1 5.08 5.14 5.13 5.22 ...
# Index: POSIXct[1:3000], format: "2017-03-04 12:00:00" ...
where dat is generated as follows:
n_sites <- 5
n_dates <- 3000
set.seed(123) ; dat <- tibble(
Site = factor(rep(letters[1:n_sites], each = n_dates)),
Date = rep(seq.POSIXt(as.POSIXct("2017-03-04 12:00:00"), by = "30 min", length.out = n_dates), times = n_sites),
Temperature = as.vector(replicate(n_sites, runif(1, 5, 6) + cumsum(rnorm(n_dates, 0, 0.1))))
)

Related

R: Converting date/time format 0000-00-00T00:00:00Z to POSIXct

I have a dataset I exported as .csv from a dataframe in R. The dataset contains a date+time column, for which the format was set as POSIXct before exporting it as a .csv file. When I import the saved .csv file back into R, the date+time column has changed into a character column and looks as follows: 0000-00-00T00:00:00Z (see the head of part of my data below).
ID Timestamp X_mean X_min X_max
1 2021-06-30T08:00:00Z -0.4313333 -5.914 4.129
2 2021-06-30T08:00:01Z -0.3817667 -5.641 4.082
3 2021-06-30T08:00:02Z -0.2770667 -0.484 -0.109
4 2021-06-30T08:00:03Z -0.3686667 -0.863 -0.148
5 2021-06-30T08:00:04Z -0.2696667 -0.41 -0.102
6 2021-06-30T08:00:05Z -0.4150000 -0.734 -0.145
I would really like to get the date+time back to a POSIXct variable if possible, so it looks like this again: YYYY-MM-DD HH:MM:SS, without the 'T' and the 'Z'.
I have tried to apply the solutions given here and here, but they do not seem to work or I am doing something wrong in changing the code to have it work for my dataframe. I only end up with NA's.
Is there a way I can change the format back to POSIXct?
We can use ymd_hms from lubridate
df1$Timestamp <- lubridate::ymd_hms(df1$Timestamp)
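If adding a package is not desirable, a base R sketch along the same lines is possible. The stamps vector below is a made-up sample in the same format as the question, and the format string assumes every value ends in a literal Z (UTC).

```r
# Sample timestamps in the same ISO 8601 form as the question
stamps <- c("2021-06-30T08:00:00Z", "2021-06-30T08:00:01Z")

# The literal T and Z are matched by the format string; the trailing Z means UTC
parsed <- as.POSIXct(stamps, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Display without the T and the Z
format(parsed, "%Y-%m-%d %H:%M:%S")
```

Values that do not match the format string come back as NA, which is one common cause of the all-NA result described in the question.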
The anytime function of the anytime package was made for this: reliable and easy parsing of any (reasonable) time format whatever the input data type, without requiring a format. We can demo it:
Code
## re-create your data
dat <- read.table(text="ID Timestamp X_mean X_min X_max
1 2021-06-30T08:00:00Z -0.4313333 -5.914 4.129
2 2021-06-30T08:00:01Z -0.3817667 -5.641 4.082
3 2021-06-30T08:00:02Z -0.2770667 -0.484 -0.109
4 2021-06-30T08:00:03Z -0.3686667 -0.863 -0.148
5 2021-06-30T08:00:04Z -0.2696667 -0.41 -0.102
6 2021-06-30T08:00:05Z -0.4150000 -0.734 -0.145", header=TRUE)
## add a Datetime column
dat$datetime <- anytime::anytime(dat$Timestamp)
## look at dat
dat
Output
> dat <- read.table(text="ID Timestamp X_mean X_min X_max
+ 1 2021-06-30T08:00:00Z -0.4313333 -5.914 4.129
+ 2 2021-06-30T08:00:01Z -0.3817667 -5.641 4.082
+ 3 2021-06-30T08:00:02Z -0.2770667 -0.484 -0.109
+ 4 2021-06-30T08:00:03Z -0.3686667 -0.863 -0.148
+ 5 2021-06-30T08:00:04Z -0.2696667 -0.41 -0.102
+ 6 2021-06-30T08:00:05Z -0.4150000 -0.734 -0.145", header=TRUE)
> dat$datetime <- anytime::anytime(dat$Timestamp)
> dat
ID Timestamp X_mean X_min X_max datetime
1 1 2021-06-30T08:00:00Z -0.431333 -5.914 4.129 2021-06-30 08:00:00
2 2 2021-06-30T08:00:01Z -0.381767 -5.641 4.082 2021-06-30 08:00:01
3 3 2021-06-30T08:00:02Z -0.277067 -0.484 -0.109 2021-06-30 08:00:02
4 4 2021-06-30T08:00:03Z -0.368667 -0.863 -0.148 2021-06-30 08:00:03
5 5 2021-06-30T08:00:04Z -0.269667 -0.410 -0.102 2021-06-30 08:00:04
6 6 2021-06-30T08:00:05Z -0.415000 -0.734 -0.145 2021-06-30 08:00:05
>

How to append columns horizontally in R in a loop?

I want to create a matrix of stock data for n companies from a ticker list, but I'm struggling to append them horizontally; it only works to append them vertically.
I have also tried other functions such as merge and rbind, but they cannot work with variables passed as strings. The hard part is that I want to append n variables retrieved from the ticker list, which has n stocks. Other suggestions for getting the same result are welcome.
Stocklist data:
> dput(stockslist)
structure(list(V1 = c("AMD", "MSFT", "SBUX", "IBM", "AAPL", "GSPC",
"AMZN")), .Names = "V1", class = "data.frame", row.names = c(NA,
-7L))
code:
library(quantmod)
library(tseries)
library(plyr)
library(PortfolioAnalytics)
library(PerformanceAnalytics)
library(zoo)
library(plotly)
tickerlist <- "sp500.csv" #CSV containing tickers on rows
stockslist <- read.csv("sp500.csv", header = FALSE, stringsAsFactors = F)
nrstocks = length(stockslist[,1]) #The number of stocks to download
maxretryattempts <- 5 #If there is an error downloading a price how many times to retry
startDate = as.Date("2010-01-13")
for (i in 1:nrstocks) {
stockdata <- getSymbols(c(stockslist[i,1]), src = "yahoo", from =
startDate)
# pick 6th column of the ith stock
write.table((eval(parse(text=paste(stockslist[i,1]))))[,6], file =
"test.csv", append = TRUE, row.names=F)
}
This is a great opportunity to talk about lists of dataframes. Having said that ...
Side bar: I really don't like side-effects. getSymbols defaults to a side-effect, saving the data into the parent frame/environment, and though this may be fine for most uses, I prefer functional methods. Luckily, using auto.assign=FALSE returns its behavior to within my bounds of comfort.
library(quantmod)
stocklist <- c("AMD", "MSFT")
startDate <- as.Date("2010-01-13")
dat <- sapply(stocklist, getSymbols, src = "google", from = startDate, auto.assign = FALSE,
simplify = FALSE)
str(dat)
# List of 2
# $ AMD :An 'xts' object on 2010-01-13/2017-05-16 containing:
# Data: num [1:1846, 1:5] 8.71 9.18 9.13 8.84 8.98 9.01 8.55 8.01 8.03 8.03 ...
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:5] "AMD.Open" "AMD.High" "AMD.Low" "AMD.Close" ...
# Indexed by objects of class: [Date] TZ: UTC
# xts Attributes:
# List of 2
# ..$ src : chr "google"
# ..$ updated: POSIXct[1:1], format: "2017-05-16 21:01:37"
# $ MSFT:An 'xts' object on 2010-01-13/2017-05-16 containing:
# Data: num [1:1847, 1:5] 30.3 30.3 31.1 30.8 30.8 ...
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:5] "MSFT.Open" "MSFT.High" "MSFT.Low" "MSFT.Close" ...
# Indexed by objects of class: [Date] TZ: UTC
# xts Attributes:
# List of 2
# ..$ src : chr "google"
# ..$ updated: POSIXct[1:1], format: "2017-05-16 21:01:37"
Though I only did two symbols, it should work for many more without problem. Also, I shifted to using Google since Yahoo was asking for authentication.
You used write.table(..., row.names = F); realize that you will lose the timestamp for each datum, since the file will look something like:
"AMD.Open","AMD.High","AMD.Low","AMD.Close","AMD.Volume"
8.71,9.2,8.55,9.15,32741845
9.18,9.26,8.92,9,22658744
9.13,9.19,8.8,8.84,34344763
8.84,9.21,8.84,9.01,24875646
Using "AMD" as an example, consider:
write.csv(as.data.frame(dat$AMD), file = "AMD.csv", row.names = TRUE)
head(read.csv("AMD.csv", row.names = 1))
# AMD.Open AMD.High AMD.Low AMD.Close AMD.Volume
# 2010-01-13 8.71 9.20 8.55 9.15 32741845
# 2010-01-14 9.18 9.26 8.92 9.00 22658744
# 2010-01-15 9.13 9.19 8.80 8.84 34344763
# 2010-01-19 8.84 9.21 8.84 9.01 24875646
# 2010-01-20 8.98 9.00 8.76 8.87 22813520
# 2010-01-21 9.01 9.10 8.77 8.99 37888647
To save all of them at once:
ign <- mapply(function(x, fn) write.csv(as.data.frame(x), file = fn, row.names = TRUE),
dat, paste0(names(dat), ".csv"))
There are other ways to store your data such as Rdata files (save()).
It is not clear to me if you are intending to append them as additional columns (i.e., cbind behavior) or as rows (rbind). Between the two, I tend towards "rows", but I'll start with "columns" first.
"Appending" by column
This may be appropriate if you want day-by-day ticker comparisons (though there are arguably better ways to prepare for this). You'll run into problems, since they have (and most likely will have) different numbers of rows:
sapply(dat, nrow)
# AMD MSFT
# 1846 1847
In this case, you might want to join based on the dates (row names). To do this well, you should probably convert the row names (dates) to a column and merge on that column:
dat2 <- lapply(dat, function(x) {
x <- as.data.frame(x)
x$date <- rownames(x)
rownames(x) <- NULL
x
})
datwide <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), dat2)
As a simple demonstration, remembering that there is one more row in "MSFT" than in "AMD", we can find that row and prove that things are still looking alright with:
which(! complete.cases(datwide))
# [1] 1251
datwide[1251 + -2:2,]
# date AMD.Open AMD.High AMD.Low AMD.Close AMD.Volume MSFT.Open MSFT.High MSFT.Low MSFT.Close MSFT.Volume
# 1249 2014-12-30 2.64 2.70 2.63 2.63 7783709 47.44 47.62 46.84 47.02 16384692
# 1250 2014-12-31 2.64 2.70 2.64 2.67 11177917 46.73 47.44 46.45 46.45 21552450
# 1251 2015-01-02 NA NA NA NA NA 46.66 47.42 46.54 46.76 27913852
# 1252 2015-01-05 2.67 2.70 2.64 2.66 8878176 46.37 46.73 46.25 46.32 39673865
# 1253 2015-01-06 2.65 2.66 2.55 2.63 13916645 46.38 46.75 45.54 45.65 36447854
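The date-keyed merge above can be tried without downloading anything. The sketch below uses three made-up tickers (AAA, BBB, CCC are hypothetical, as are the prices) to show that Reduce() with merge(..., all = TRUE) scales past two data frames and fills gaps with NA.

```r
# Three hypothetical tickers with partly different trading dates
dat2 <- list(
  AAA = data.frame(date = c("2015-01-02", "2015-01-05"), AAA.Close = c(1.0, 1.1)),
  BBB = data.frame(date = c("2015-01-02", "2015-01-06"), BBB.Close = c(9.0, 9.2)),
  CCC = data.frame(date = c("2015-01-05", "2015-01-06"), CCC.Close = c(5.0, 5.5))
)

# Full outer join on date, folded over the whole list
datwide <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), dat2)
datwide
```

Each ticker contributes its own columns, and any date missing for a ticker shows up as NA in that ticker's columns.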
"Appending" by row
getSymbols prefixes each column name with the ticker, a slight frustration when stacking rows. Additionally, since we'll be discarding those prefixes, we should preserve the symbol name in the data.
dat3 <- lapply(dat, function(x) {
ticker <- gsub("\\..*", "", colnames(x)[1])
colnames(x) <- gsub(".*\\.", "", colnames(x))
x <- as.data.frame(x)
x$date <- rownames(x)
x$symbol <- ticker
rownames(x) <- NULL
x
}) # can also be accomplished with mapply(..., dat, names(dat))
datlong <- Reduce(function(a, b) rbind(a, b, make.row.names = FALSE), dat3)
head(datlong)
# Open High Low Close Volume date symbol
# 1 8.71 9.20 8.55 9.15 32741845 2010-01-13 AMD
# 2 9.18 9.26 8.92 9.00 22658744 2010-01-14 AMD
# 3 9.13 9.19 8.80 8.84 34344763 2010-01-15 AMD
# 4 8.84 9.21 8.84 9.01 24875646 2010-01-19 AMD
# 5 8.98 9.00 8.76 8.87 22813520 2010-01-20 AMD
# 6 9.01 9.10 8.77 8.99 37888647 2010-01-21 AMD
nrow(datlong)
# [1] 3693
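As a side note, the row-wise stacking can also be written with do.call(), which hands the whole list to rbind() in a single call instead of folding pairwise. This is a sketch on made-up data (the tickers AAA and BBB and their prices are hypothetical), with each data frame already in the shape produced for dat3.

```r
# Hypothetical per-ticker data frames with identical column names
dat3 <- list(
  AAA = data.frame(Close = c(1.0, 1.1),
                   date = c("2015-01-02", "2015-01-05"),
                   symbol = "AAA"),
  BBB = data.frame(Close = c(9.0, 9.2, 9.4),
                   date = c("2015-01-02", "2015-01-05", "2015-01-06"),
                   symbol = "BBB")
)

# Stack all data frames at once; unname() avoids "AAA.1"-style row names
datlong <- do.call(rbind, unname(dat3))
rownames(datlong) <- NULL
datlong
```

The symbol column is what keeps the rows attributable to a ticker once the list names are gone.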

How to get min and max values with the date of occurrence using quantmod

I would like to compute the range, and the dates of the maximum and minimum price, of each variable stored in xts format, and store the results in a data frame. The result I am looking for is a data frame containing the variable name, max value date, max value, min value date, and min value.
I've imported data for two stocks using the quantmod package and wrote a function to compute the ranges (so far without the dates of the maximum and minimum price), but with no success.
d<-getSymbols(c("ZTS","ZX") , src = 'yahoo', from = '2015-01-01', auto.assign = T)
d<-cbind(ZTS,ZX)
head(d)
ZTS.Open ZTS.High ZTS.Low ZTS.Close ZTS.Volume ZTS.Adjusted ZX.Open ZX.High ZX.Low ZX.Close
2015-01-02 43.46 43.70 43.07 43.31 1784200 43.07725 1.40 1.40 1.21 1.24
2015-01-05 43.25 43.63 42.97 43.05 3112100 42.81864 1.35 1.38 1.24 1.32
2015-01-06 43.15 43.36 42.30 42.63 3977200 42.40090 1.28 1.29 1.22 1.22
2015-01-07 43.00 43.56 42.98 43.51 2481800 43.27617 1.24 1.34 1.24 1.29
2015-01-08 44.75 44.87 44.00 44.18 3121300 43.94257 1.26 1.28 1.17 1.18
2015-01-09 44.06 44.44 43.68 44.25 2993200 44.01220 1.30 1.39 1.22 1.27
ZX.Volume ZX.Adjusted
2015-01-02 20400 1.24
2015-01-05 43200 1.32
2015-01-06 16700 1.22
2015-01-07 6200 1.29
2015-01-08 17200 1.18
2015-01-09 60200 1.27
s<- for (i in names(d[,-1])) function(x) {max(x);min(x) }
> s
NULL
str(d)
An ‘xts’ object on 2015-01-02/2015-08-13 containing:
Data: num [1:155, 1:12] 43.5 43.2 43.2 43 44.8 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:12] "ZTS.Open" "ZTS.High" "ZTS.Low" "ZTS.Close" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
List of 2
$ src : chr "yahoo"
$ updated: POSIXct[1:1], format: "2015-08-14 18:35:14"
I can't tell if your date column is the rownames or an actual column, but the code below should produce your result if the date column is called date in the dataframe.
do.call(rbind, apply(d[,-1], 2, function(col) {
max_ind <- which.max(col)
min_ind <- which.min(col)
list(max=col[max_ind], max_date=d$date[max_ind], min=col[min_ind],
min_date=d$date[min_ind])
}))
and if the date column is the rownames
do.call(rbind, apply(d, 2, function(col) {
max_ind <- which.max(col)
min_ind <- which.min(col)
list(max=col[max_ind], max_date=as.character(index(d))[max_ind], min=col[min_ind],
min_date=as.character(index(d))[min_ind])
}))
The code does an apply over columns finding the index of the maximum and minimum, then returns the values and dates corresponding to these.
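The same idea can be sketched so that it returns the data frame layout asked for in the question (variable name, max date, max, min date, min). The prices below are a small made-up sample, and date is assumed to be an ordinary Date column rather than an xts index.

```r
# Hypothetical prices; `date` is an ordinary column here
d <- data.frame(
  date = as.Date(c("2015-01-02", "2015-01-05", "2015-01-06", "2015-01-07")),
  ZTS.Close = c(43.31, 43.05, 42.63, 43.51),
  ZX.Close = c(1.24, 1.32, 1.22, 1.29)
)

# Every column except the date is treated as a variable to summarise
value_cols <- setdiff(names(d), "date")

# One row per variable: the extreme values and the dates they occur on
ranges <- do.call(rbind, lapply(value_cols, function(nm) {
  col <- d[[nm]]
  data.frame(
    variable = nm,
    max_date = d$date[which.max(col)],
    max = max(col),
    min_date = d$date[which.min(col)],
    min = min(col)
  )
}))
ranges
```

Building one small data frame per column keeps the dates as Date values instead of coercing everything to a common type. If a column contains NA, max() and min() would need na.rm = TRUE; which.max() and which.min() already skip NA.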

How do I merge two rows in time series? (in R) [duplicate]

I have two multivariate time series x and y, both covering approximately the same range in time (one starts two years before the other, but they end on the same date). Both series have missing observations in the form of empty columns next to the date column, and also in the sense that one of the series has several dates that are not found in the other, and vice versa.
I would like to create a data frame (or similar) with a column that lists all the dates found in x OR y, without duplicate dates. For each date (row), I would like to horizontally stack the observations from x next to the observations from y, with NA's filling the missing cells. Example:
>x
"1987-01-01" 7.1 NA 3
"1987-01-02" 5.2 5 2
"1987-01-06" 2.3 NA 9
>y
"1987-01-01" 55.3 66 45
"1987-01-03" 77.3 87 34
# result I would like
"1987-01-01" 7.1 NA 3 55.3 66 45
"1987-01-02" 5.2 5 2 NA NA NA
"1987-01-03" NA NA NA 77.3 87 34
"1987-01-06" 2.3 NA 9 NA NA NA
What I have tried: with the zoo package, I've tried the merge.zoo method, but this seems to just stack the two series next to each other, with the dates (as numbers, e.g. "1987-01-02" shown as 6210) from each series appearing in two separate columns.
I've sat for hours getting almost nowhere, so all help is appreciated.
EDIT: some code included below as per suggestion from Soumendra
atcoa <- read.csv(file = "ATCOA_full_adj.csv", header = TRUE)
atcob <- read.csv(file = "ATCOB_full_adj.csv", header = TRUE)
atcoa$date <- as.Date(atcoa$date)
atcob$date <- as.Date(atcob$date)
# only number of observations and the observations themselves differ
>str(atcoa)
'data.frame': 6151 obs. of 8 variables:
$ date :Class 'Date' num [1:6151] 6210 6213 6215 6216 6217 ...
$ max : num 4.31 4.33 4.38 4.18 4.13 4.05 4.08 4.05 4.08 4.1 ...
$ min : num 4.28 4.31 4.28 4.13 4.05 3.95 3.97 3.95 4 4.02 ...
$ close : num 4.31 4.33 4.31 4.15 4.1 3.97 4 3.97 4.08 4.02 ...
$ avg : num NA NA NA NA NA NA NA NA NA NA ...
$ tot.vol : int 877733 89724 889437 1927113 3050611 846525 1782774 1497998 2504466 5636999 ...
$ turnover : num 3762300 388900 3835900 8015900 12468100 ...
$ transactions: int 12 9 24 17 31 26 34 35 37 33 ...
>atcoa[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
1 1987-01-02 4.31 4.28 4.31 NA 877733 3762300 12
# using timeSeries package
ts.atcoa <- timeSeries::as.timeSeries(atcoa, format = "%Y-%m-%d")
ts.atcob <- timeSeries::as.timeSeries(atcob, format = "%Y-%m-%d")
>str(ts.atcoa)
Time Series:
Name: object
Data Matrix:
Dimension: 6151 7
Column Names: a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
Row Names: 1970-01-01 01:43:30 ... 1970-01-01 04:12:35
Positions:
Start: 1970-01-01 01:43:30
End: 1970-01-01 04:12:35
With:
Format: %Y-%m-%d %H:%M:%S
FinCenter: GMT
Units: a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
Title: Time Series Object
Documentation: Wed Aug 17 13:00:50 2011
>ts.atcoa[1:1, ]
GMT
a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions
1970-01-01 01:43:30 4.31 4.28 4.31 NA 877733 3762300 12
# The following will create an object of class "data frame" and mode "list", which contains observations for the days mutual for the two series
>ts.atco <- timeSeries::merge(atcoa, atcob) # produces same result as base::merge, apparently
>ts.atco[1:1, ]
date a.max a.min a.close a.avg a.tot.vol a.turnover a.transactions b.max b.min b.close b.avg b.tot.vol b.turnover b.transactions
1 1989-08-25 7.92 7.77 7.79 NA 269172 2119400 19 7.69 7.56 7.64 NA 81176693 593858000 12
EDIT: problem solved by (using zoo package)
atcoa <- read.zoo(read.csv(file = "ATCOA_full_adj.csv", header = TRUE))
atcob <- read.zoo(read.csv(file = "ATCOB_full_adj.csv", header = TRUE))
names(atcoa) <- c("a.max", "a.min", "a.close",
"a.avg", "a.tot.vol", "a.turnover", "a.transactions")
names(atcob) <- c("b.max", "b.min", "b.close",
"b.avg", "b.tot.vol", "b.turnover", "b.transactions")
atco <- merge.zoo(atcoa, atcob)
Thank you all for your help.
Try this:
Lines.x <- '"1987-01-01" 7.1 NA 3
"1987-01-02" 5.2 5 2
"1987-01-06" 2.3 NA 9'
Lines.y <- '"1987-01-01" 55.3 66 45
"1987-01-03" 77.3 87 34'
library(zoo)
# in reality x might be in a file and might be read via: x <- read.zoo("x.dat")
# ditto for y. See ?read.zoo and the zoo-read vignette if you need other args too
x <- read.zoo(text = Lines.x)
y <- read.zoo(text = Lines.y)
merge(x, y)
giving:
V2.x V3.x V4.x V2.y V3.y V4.y
1987-01-01 7.1 NA 3 55.3 66 45
1987-01-02 5.2 5 2 NA NA NA
1987-01-03 NA NA NA 77.3 87 34
1987-01-06 2.3 NA 9 NA NA NA
You can create a timeSeries (timeSeries library) object from your dates, merge them (timeSeries default merge behaviour is different from zoo and xts and does exactly what you are asking for) and then make zoo/xts objects out of the result in case you don't want to stay with timeSeries.
One quick way to test is the following, assuming you have two zoo objects zz1 and zz2 -
library(timeSeries)
as.zoo(merge(as.timeSeries(zz1), as.timeSeries(zz2)))
Compare the output of the above command with
merge(zz1, zz2)
You can also cbind -
cbind(zz1, zz2)
provided there are no shared columns with the same names. Even if there are such columns, you can choose the columns by which you cbind, and you will get a zoo object.
cbind(zz1[, 1:2], zz2[, 2:3]) #Assuming other columns are common
Here, I found a more generic approach from stat.ethz.ch:
a <- ts(1:10, start=c(2014,6), frequency=12)
b <- ts(1:12, start=c(2015,1), frequency=12)
library(zoo)
m <- merge(a = as.zoo(a), b = as.zoo(b))
to get a ts object back:
as.ts(m)
How about this:
## Generate unique sorted time values.
i <- sort(unique(c(index(x), index(y))))
## Empty data matrix.
v <- matrix(nrow=length(i), ncol=6, NA)
## Pull in data items.
v[match(index(x), i), 1:3] <- coredata(x)
v[match(index(y), i), 4:6] <- coredata(y)
## Build new zoo object.
d <- zoo(v, order.by=i)
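For comparison, the union-of-dates result can also be sketched with plain data frames and base merge(), without any time series class. The column names x1..x3 and y1..y3 are made up for illustration; the values are the ones from the example in the question.

```r
# The two example series from the question as ordinary data frames
x <- data.frame(date = as.Date(c("1987-01-01", "1987-01-02", "1987-01-06")),
                x1 = c(7.1, 5.2, 2.3), x2 = c(NA, 5, NA), x3 = c(3, 2, 9))
y <- data.frame(date = as.Date(c("1987-01-01", "1987-01-03")),
                y1 = c(55.3, 77.3), y2 = c(66, 87), y3 = c(45, 34))

# all = TRUE keeps every date found in x OR y and fills the gaps with NA
both <- merge(x, y, by = "date", all = TRUE)
both
```

Without all = TRUE, merge() keeps only the dates common to both inputs, which matches the behaviour the asker saw with base::merge.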

