I have a netcdf file with a timeseries and the time variable has the following typical metadata:
double time(time) ;
time:standard_name = "time" ;
time:bounds = "time_bnds" ;
time:units = "days since 1979-1-1 00:00:00" ;
time:calendar = "standard" ;
time:axis = "T" ;
Inside R I want to convert the time into an R date object. At the moment I achieve this in a hardwired way, by reading the units attribute, splitting the string, and using the third entry as my origin (thus assuming the spacing is "days", the time is 00:00, etc.):
require("ncdf4")
f1<-nc_open("file.nc")
time<-ncvar_get(f1,"time")
tunits<-ncatt_get(f1,"time",attname="units")
tustr<-strsplit(tunits$value, " ")
dates<-as.Date(time,origin=unlist(tustr)[3])
This hardwired solution works for my specific example, but I was hoping there might be an R package that nicely handles the UNIDATA netCDF date conventions for time units and converts them safely to an R date object?
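For what it's worth, the hardwired parsing can be made a little more defensive in base R by splitting on " since " instead of assuming fixed word positions. This is only a sketch (the helper name is made up) and it still assumes a "days since ..." style attribute:

```r
# Split a CF-style units attribute such as "days since 1979-1-1 00:00:00"
# into the unit name and the origin date
parse_cf_units <- function(units_string) {
  parts <- strsplit(units_string, " since ", fixed = TRUE)[[1]]
  if (length(parts) != 2) stop("units attribute is not in 'unit since origin' form")
  list(unit   = parts[1],
       origin = as.Date(strsplit(parts[2], " ", fixed = TRUE)[[1]][1]))
}

u <- parse_cf_units("days since 1979-1-1 00:00:00")
dates <- u$origin + c(0, 31, 365)  # offsets in days, as in the question
```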
I have just discovered (two years after posting the question!) that there is a package called ncdf.tools which has the function:
convertDateNcdf2R
which
converts a time vector from a netCDF file or a vector of Julian days
(or seconds, minutes, hours) since a specified origin into a POSIXct R
vector.
Usage:
convertDateNcdf2R(time.source, units = "days", origin = as.POSIXct("1800-01-01",
tz = "UTC"), time.format = c("%Y-%m-%d", "%Y-%m-%d %H:%M:%S",
"%Y-%m-%d %H:%M", "%Y-%m-%d %Z %H:%M", "%Y-%m-%d %Z %H:%M:%S"))
Arguments:
time.source
numeric vector or netCDF connection: either a number of time units since the origin or a netCDF file connection. In the latter case, the time vector is extracted from the netCDF file. This file, and especially the time variable, has to follow the CF netCDF conventions.
units
character string: units of the time source. If the source is a netCDF file, this value is ignored and is read from that file.
origin
POSIXct object: Origin or day/hour zero of the time source. If the source is a netCDF file, this value is ignored and is read from that file.
Thus it is enough to simply pass the netCDF connection as the first argument and the function handles the rest. Caveat: this will only work if the netCDF file follows CF conventions (e.g. it will fail if the units are "years since" rather than "seconds since" or "days since").
More details on the function are available here:
https://rdrr.io/cran/ncdf.tools/man/convertDateNcdf2R.html
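If you do hit the "years since" case, one workaround (a base-R sketch, valid only for non-negative integer year offsets; the helper name is invented) is to let seq.Date walk the calendar instead of multiplying by a fixed number of seconds:

```r
# Map integer "years since origin" offsets to dates via the calendar,
# since a year has no fixed length in seconds
years_since <- function(offsets, origin) {
  all_years <- seq(as.Date(origin), by = "year", length.out = max(offsets) + 1)
  all_years[offsets + 1]
}

years_since(c(0, 1, 10), "1800-01-01")
```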
Not that I know of. I have this handy function using lubridate, which is basically identical to yours.
getNcTime <- function(nc) {
require(lubridate)
ncdims <- names(nc$dim) #get netcdf dimensions
timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime", "date", "Date"))[1]] #find time variable
if (is.na(timevar)) stop("ERROR! Could not identify the correct time variable") #indexing with [1] yields NA, not a zero-length vector, when nothing matches
times <- ncvar_get(nc, timevar)
timeatt <- ncatt_get(nc, timevar) #get attributes
timedef <- strsplit(timeatt$units, " ")[[1]]
timeunit <- timedef[1]
tz <- timedef[5]
timestart <- strsplit(timedef[4], ":")[[1]]
tsnum <- suppressWarnings(as.numeric(timestart)) #compare numerically; the split pieces are character strings
if (length(timestart) != 3 || any(is.na(tsnum)) || any(tsnum < 0) || tsnum[1] > 24 || tsnum[2] > 60 || tsnum[3] > 60) {
warning(paste("Warning:", timedef[4], "is not a valid start time. Assuming 00:00:00"))
timedef[4] <- "00:00:00"
}
if (! tz %in% OlsonNames()) {
warning(paste("Warning:", tz, "is not a valid timezone. Assuming UTC"))
tz <- "UTC"
}
timestart <- ymd_hms(paste(timedef[3], timedef[4]), tz=tz)
f <- switch(tolower(timeunit), #Find the correct lubridate time function based on the unit
seconds=seconds, second=seconds, sec=seconds,
minutes=minutes, minute=minutes, min=minutes,
hours=hours, hour=hours, h=hours,
days=days, day=days, d=days,
months=months, month=months, m=months,
years=years, year=years, yr=years,
NA
)
if (!is.function(f)) stop("Could not understand the time unit format") #avoids the is.na() warning on closures
timestart + f(times)
}
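The switch() trick above, which maps a unit name to a function, also works without lubridate. A base-R sketch for the fixed-length units (months and years are deliberately left out, since they have no fixed length in seconds; the function name is made up):

```r
offset_fun <- function(unit) {
  switch(tolower(unit),           # empty arms fall through to the next value
         seconds = , second = , sec = function(n) n,
         minutes = , minute = , min = function(n) n * 60,
         hours   = , hour   = , h   = function(n) n * 3600,
         days    = , day    = , d   = function(n) n * 86400,
         stop("Could not understand the time unit format"))
}

origin <- as.POSIXct("1979-01-01", tz = "UTC")
origin + offset_fun("days")(c(0, 1, 2))  # adds the converted seconds to the origin
```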
EDIT: One might also want to take a look at ncdf4.helpers::nc.get.time.series
EDIT2: note that the newly proposed and currently in development (but awesome) stars package will handle dates automatically; see the first blog post for an example.
EDIT3: another way is to use the units package directly, which is what stars uses. One could do something like this: (still not handling the calendar correctly, I'm not sure units can)
getNcTime <- function(nc) { ##NEW VERSION, with the units package
require(units)
require(ncdf4)
options(warn=1) #show warnings by default
if (is.character(nc)) nc <- nc_open(nc)
ncdims <- names(nc$dim) #get netcdf dimensions
timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime", "date", "Date"))] #find (first) time variable
if (length(timevar) > 1) {
warning(paste("Found more than one time var. Using the first:", timevar[1]))
timevar <- timevar[1]
}
if (length(timevar)!=1) stop("ERROR! Could not identify the correct time variable")
times <- ncvar_get(nc, timevar) #get time data
timeatt <- ncatt_get(nc, timevar) #get attributes
timeunit <- timeatt$units
units(times) <- make_unit(timeunit) #in newer versions of the units package, use as_units(timeunit)
as.POSIXct(times) #was as.POSIXct(time): "time" is not defined in this function
}
I couldn't get @AF7's function to work with my files so I wrote my own. The function below creates a POSIXct vector of dates, for which the start date, time interval, unit and length are read from the nc file. It works with nc files of many (but probably not every...) shapes and forms.
ncdate <- function(nc) {
ncdims <- names(nc$dim) #Extract dimension names
timevar <- ncdims[which(ncdims %in% c("time", "Time", "datetime", "Datetime",
"date", "Date"))[1]] # Pick the time dimension
ntstep <-nc$dim[[timevar]]$len
tm <- ncvar_get(nc, timevar) # Extract the timestep count
tunits <- ncatt_get(nc, timevar, "units") # Extract the long name of units
tspace <- tm[2] - tm[1] # Calculate time period between two timesteps, for the "by" argument
tstr <- strsplit(tunits$value, " ") # Extract string components of the time unit
a <- unlist(tstr[1]) # Isolate the unit, i.e. seconds, hours, days etc.
uname <- a[which(a %in% c("seconds","hours","days"))[1]] # Check unit
startd <- as.POSIXct(gsub(paste(uname,'since '),'',tunits$value),format="%Y-%m-%d %H:%M:%S") ## Extract the start / origin date
tmulti <- 3600 # Declare hourly multiplier for date
if (uname == "days") tmulti =86400 # Declare daily multiplier for date
## Rename "seconds" to "secs" for "by" argument and change the multiplier.
if (uname == "seconds") {
uname <- "secs"
tmulti <- 1 }
byt <- paste(tspace,uname) # Define the "by" argument
if (byt == "0.0416666679084301 days") { ## If the unit is "days" but the "by" interval is in hours
byt= "1 hour" ## R won't understand "by < 1" so change by and unit to hour.
uname = "hours"}
datev <- seq(from=as.POSIXct(startd+tm[1]*tmulti), by=byt, length.out=ntstep) # "units" is not an argument of seq.POSIXt and was dropped
datev
}
Edit
To address the flaw, highlighted by @AF7's comment, that the above code only works for regularly spaced files, datev could instead be calculated as
datev <- as.POSIXct(tm*tmulti,origin=startd)
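As a self-contained illustration of that fix (the origin, multiplier and offsets below are invented for the example), irregular spacing is no problem because every offset is converted independently:

```r
startd <- as.POSIXct("1979-01-01 00:00:00", tz = "UTC")
tmulti <- 86400                    # "days since" -> seconds multiplier
tm     <- c(0, 1, 5, 6.5)          # irregularly spaced offsets, in days
datev  <- as.POSIXct(tm * tmulti, origin = startd, tz = "UTC")
```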
I downloaded stock market data from Yahoo (code below). For context, at first I tried getSymbols("^DJI") but got error messages, possibly related to Yahoo; that is a different issue.
The point is that once downloaded, and imported into R, I massaged it into a format close enough to a time series to be able to run chartSeries(DJI):
require(RCurl)
require(foreign)
require(quantmod) # chartSeries() comes from quantmod
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")
DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date # Assigning Date to row names
DJI$Date <- NULL # Removing the Date column
chartSeries(DJI, type="auto", theme=chartTheme('white'))
even if the dataset is not really a time series:
> is.ts(DJI)
[1] FALSE
The problem comes about when I try to find out the date of, for instance, the minimum closing value of the Dow. I can do something like
> DJI[DJI$Close == min(DJI$Close),]
Open High Low Close Adj.Close Volume
1985-05-01 1257.18 1262.81 1239.07 1242.05 1242.05 10050000
yielding the entire row, including the row name (1985-05-01), which is the only part I want. However, if I insist on just getting the actual date, I have to juggle a second dataset containing the dates in one of the columns:
require(RCurl)
require(foreign)
x <- getURL("https://raw.githubusercontent.com/RInterested/datasets/gh-pages/%5EDJI.csv")
DJI <- read.csv(text = x, sep =",")
DJI$Date <- as.Date(DJI$Date, format = "%m/%d/%Y") # Formatting Date as.Date
rownames(DJI) <- DJI$Date # Assigning Date to row names
DJI.raw <- DJI # Second dataset for future subsetting
DJI$Date <- NULL # Removing the Date column
which does allow me to run
> DJI.raw$Date[DJI.raw$Close == min(DJI.raw$Close)]
[1] "1985-05-01"
Further, I don't think that turning the dataset into an .xts file would help.
I'm not clear on what you want, but it sounds like you just want the date? You mention xts is not an option (although it would work):
time(as.xts(DJI))[which.min(DJI$Close)] # POSIXct format
# [1] "1985-05-01 EDT"
Otherwise a simple rownames + which.min would get the date for you?
as.Date(rownames(DJI)[which.min(DJI$Close)]) # Date format
# [1] "1985-05-01"
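The same rownames-plus-which.min pattern on a toy data frame (the numbers are made up, just to show the indexing):

```r
df <- data.frame(Close = c(1500.20, 1242.05, 1310.70),
                 row.names = c("1985-04-30", "1985-05-01", "1985-05-02"))
as.Date(rownames(df)[which.min(df$Close)])  # the date of the minimum close
```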
Below is my dataset example saved as a csv file. Is it possible to extract them and save as several csv files based on specified timeframe.
For example:
The specified timeframes are:
daytime: 07:30 (same date) to 20:30 (same date)
nighttime: 21:30 (same date) to 06:30 (next date).
After the extraction, the datasets are saved as csv files using this filename format:
daytime: "date"-day
nighttime: "date"-night
where "date" is the date from the timestamp.
Thanks for your help.
timestamp c3.1 c3.2 c3.3 c3.4 c3.5 c3.6 c3.7 c3.8 c3.9 c3.10 c3.11 c3.12
8/13/15 15:43 1979.84 1939.6 2005.21 1970 1955.55 1959.82 1989 2001.12 2004.38 1955.75 1958.75 1986.53
8/13/15 15:44 1979.57 1939.64 2005.14 1970.4 1956.43 1958.56 1989.7 2000.78 2004.53 1954.9 1959.76 1986.18
8/13/15 15:45 1979.32 1938.92 2004.52 1970.21 1955.75 1960.12 1989.07 2001.47 2003.7 1955.32 1958.94 1985.79
8/13/15 15:46 1979.33 1939.7 2004.66 1971.25 1955.89 1958.27 1989.24 2000.86 2003.92 1955.29 1959.25 1985.49
Assuming that dat is your data:
## The date-time format in the data set
format <- "%m/%d/%y %H:%M"
## Convert date-time to POSIXct
timestamp <- as.POSIXct(dat$timestamp, format = format)
## First and last dates in the data
first <- as.Date(min(timestamp))
last <- as.Date(max(timestamp))
## The start of day and night timeframes
start.day <- paste(first, "07:30")
start.night <- paste(first - 1, "20:30") ## first night timeframe starts the day before
end <- paste(last + 1, "20:30")
## The breakpoints, assuming that day is 7:30-20:30 and night is 20:31-7:29 (i.e. no missing records)
breaks <- sort.POSIXlt(c(seq.POSIXt(as.POSIXct(start.day), as.POSIXct(end), by= "day"),
seq.POSIXt(as.POSIXct(start.night), as.POSIXct(end), by= "day")))
## The corresponding labels
labels <- head(paste0(as.Date(breaks), c("-night", "-day")), - 1)
## Add column with timeframe
dat$timeframe <- cut.POSIXt(timestamp, breaks = breaks, labels = labels)
## Save csv files
for(x in levels(dat$timeframe)) {
subset <- dat[dat$timeframe == x, ]
subset$timeframe <- NULL ## Remove the timeframe column
if(nrow(subset) > 0) write.csv(subset, file = paste0(x, ".csv"))
}
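The heart of this answer is cut() on POSIXct values with explicit breakpoints and labels; on a toy vector (times and labels invented for the example) it looks like this:

```r
ts <- as.POSIXct(c("2015-08-13 15:43", "2015-08-13 22:10", "2015-08-14 03:00"),
                 tz = "UTC")
breaks <- as.POSIXct(c("2015-08-13 07:30", "2015-08-13 20:30",
                       "2015-08-14 07:30"), tz = "UTC")
# cut.POSIXt uses left-closed intervals [break_i, break_i+1) by default
cut(ts, breaks = breaks, labels = c("2015-08-13-day", "2015-08-13-night"))
```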
I'm retrieving one-minute quotes from Google. After processing the data I try to create an xts object with one-minute intervals, but I get the same datetime repeated several times and don't understand why. Note that if I use the same data to build a vector of timestamps called my.dat2, it does work.
library(xts)
url <- 'https://www.google.com/finance/getprices?q=IBM&i=60&p=15d&f=d,o,h,l,c,v'
x <- read.table(url,stringsAsFactors = F)
mynam <- unlist(strsplit(unlist(strsplit(x[5,], split='=', fixed=TRUE))[2] , split=','))
interv <- as.numeric(unlist(strsplit(x[4,], split='=', fixed=TRUE))[2])
x2 <- do.call(rbind,strsplit(x[-(1:7),1],split=','))
rownames(x2) <- NULL
colnames(x2) <- mynam
ind <- which(nchar(x2[,1])>5)
x2[ind,1] <- unlist(strsplit(x2[ind,1], split='a', fixed=TRUE))[2]
#To convert from data.frame to numeric
class(x2) <- 'numeric'
my.dat <- rep(0,nrow(x2))
#Convert all to same format
for (i in 1:nrow(x2)) {
if (nchar(x2[i,1])>5) {
ini.dat <- x2[i,1]
my.dat[i] <- ini.dat
} else {
my.dat[i] <- ini.dat+interv*x2[i,1]
}
}
df <- xts(x2[,-1],as.POSIXlt(my.dat, origin = '1970-01-01'))
head(df,20)
my.dat2 <- as.POSIXlt(my.dat, origin = '1970-01-01')
head(my.dat2,20)
I tried a simpler example simulating the data and creating a sequence of dates by minute to create the xts object and it worked so it must be something that I'm missing when passing the dates to the xts function.
Your my.dat object has duplicated values and xts and zoo objects must be ordered, so all the duplicate values are being grouped together.
The issue is this line, where you only take the second element, rather than every non-blank element.
x2[ind,1] <- unlist(strsplit(x2[ind,1], split='a', fixed=TRUE))[2]
# this should be
x2[ind,1] <- sapply(strsplit(x2[ind,1], split='a', fixed=TRUE), "[[", 2)
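The difference is easy to see on a small example. Since each timestamp string starts with "a", strsplit() produces an empty first piece, so unlist()[2] only ever returns the second piece of the flattened result, no matter how many rows there are:

```r
x <- c("a1437000000", "a1437003600")
unlist(strsplit(x, split = "a", fixed = TRUE))[2]        # a single value, recycled by the caller
sapply(strsplit(x, split = "a", fixed = TRUE), "[[", 2)  # second piece of every element
```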
I would like to download daily data from yahoo for the S&P 500, the DJIA, and 30-year T-Bonds, map the data to the proper time zone, and merge them with my own data. I have several questions.
My first problem is getting the tickers right. From yahoo's website, it looks like the tickers are: ^GSPC, ^DJI, and ^TYX. However, ^DJI fails. Any idea why?
My second problem is that I would like to constrain the time zone to GMT (I would like to ensure that all my data is on the same clock, and GMT seems like a neutral choice), but I couldn't get it to work.
My third problem is that I would like to merge the yahoo data with my own data, obtained by other means and available in a different format. It is also daily data.
Here is my attempt at constraining the data to the GMT time zone. Executed at the top of my R script.
Sys.setenv(TZ = "GMT")
# > Sys.getenv("TZ")
# [1] "GMT"
# the TZ variable is properly set
# but does not affect the time zone in zoo objects, why?
Here is my code to get the yahoo data:
library("tseries")
library("xts")
date.start <- "1999-12-31"
date.end <- "2013-01-01"
# tickers <- c("GSPC","TYX","DJI")
# DJI Fails, why?
# http://finance.yahoo.com/q?s=%5EDJI
tickers <- c("GSPC","TYX") # proceed without DJI
z <- zoo()
index(z) <- as.Date(format(time(z)),tz="")
for ( i in 1:length(tickers) )
{
cat("Downloading ", i, " out of ", length(tickers) , "\n")
x <- try(get.hist.quote(
instrument = paste0("^",tickers[i])
, start = date.start
, end = date.end
, quote = "AdjClose"
, provider = "yahoo"
, origin = "1970-01-01"
, compression = "d"
, retclass = "zoo"
, quiet = FALSE )
, silent = FALSE )
print(x[1:4]) # check that it's not empty
colnames(x) <- tickers[i]
z <- try( merge(z,x), silent = TRUE )
}
Here is the dput(head(df)) of my dataset:
df <- structure(list(A = c(-0.011489000171423, -0.00020300000323914,
0.0430639982223511, 0.0201549995690584, 0.0372899994254112, -0.0183669999241829
), B = c(0.00110999995376915, -0.000153000000864267, 0.0497750006616116,
0.0337960012257099, 0.014121999964118, 0.0127800004556775), date = c(9861,
9862, 9863, 9866, 9867, 9868)), .Names = c("A", "B", "date"
), row.names = c("0001-01-01", "0002-01-01", "0003-01-01", "0004-01-01",
"0005-01-01", "0006-01-01"), class = "data.frame")
I'd like to merge the data in df with the data in z. I can't seem to get it to work.
I am new to R and very much open to your advice about efficiency, best practice, etc. Thanks.
EDIT: SOLUTIONS
On the first problem: following GSee's suggestions, the Dow Jones Industrial Average data may be downloaded with the quantmod package: thus, instead of the "^DJI" ticker, which is no longer available from yahoo, use the "DJIA" ticker. Note that there is no caret in the "DJIA" ticker.
On the second problem, Joshua Ulrich points out in the comments that "Dates don't have timezones because days don't have a time component."
On the third problem: The data frame appears to have corrupted dates, as pointed out by agstudy in the comments.
My solutions rely on the quantmod package and the attached zoo/xts packages:
library(quantmod)
Here is the code I have used to get proper dates from my csv file:
toDate <- function(x){ as.Date(as.character(x), format = "%Y%m%d") }
dtz <- read.zoo("myData.csv"
, header = TRUE
, sep = ","
, FUN = toDate
)
dtx <- as.xts(dtz)
The dates in the csv file were stored in a single column in the format "19861231". The key to getting correct dates was to wrap the date in "as.character()". Part of this code was inspired from R - Stock market data from csv to xts. I also found the zoo/xts manuals helpful.
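A minimal check of why the as.character() wrapping matters: the csv column is read as numeric, and as.Date() only applies a format string to character input:

```r
toDate <- function(x) as.Date(as.character(x), format = "%Y%m%d")
toDate(19861231)  # numeric input, as it comes out of the csv
# without as.character(), as.Date() would treat 19861231 as a count of days
# from an origin (and fail without one), not as a formatted date string
```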
I then extract the date range from this dataset:
date.start <- start(dtx)
date.end <- end(dtx)
I will use those dates with quantmod's getSymbols function so that the other data I download will cover the same period.
Here is the code I have used to get all three tickers.
tickers <- c("^GSPC","^TYX","DJIA")
data <- new.env() # the data environment will store the data
do.call(cbind, lapply( tickers
, getSymbols
, from = date.start
, to = date.end
, env = data # data saved inside an environment
)
)
ls(data) # see what's inside the data environment
data$GSPC # access a particular ticker
Also note, as GSee pointed out in the comments, that the option auto.assign=FALSE cannot be used in conjunction with the option env=data (otherwise the download fails).
A big thank you for your help.
Yahoo doesn't provide historical data for ^DJI. Currently, it looks like you can get the same data by using the ticker "DJIA", but your mileage may vary.
Regarding the time zone: it does work in this case, but only because you're dealing with Dates, which have no time-of-day component.
Regarding the merge: the df object you provided is yearly data beginning in the year 0001, so that's probably not what you wanted.
Here's how I would fetch and merge those series (or use an environment and make only one call to getSymbols):
library(quantmod)
do.call(cbind, lapply(c("^GSPC", "^TYX"), getSymbols, auto.assign=FALSE))