Code:
library(xts)
data = read.table("DistrBdaily1yrs.txt", header = TRUE, sep = "", dec = ",")
data$DATE = as.Date(as.character(data$DATE),format="%Y%m%d")
dataXts = xts(data$QUANTITY,data$DATE, frequency = 6)
tseries = ts(dataXts, start = start(dataXts), end = end(dataXts), frequency = 6)
What I'm trying to do is convert the xts object dataXts to a ts object with the correct start and end dates so that I can use the decompose() function. Here start = start(dataXts) and end = end(dataXts) give me the right start and end dates, but tseries doesn't recognize the data column in dataXts and instead treats everything as data.
How can I fix this?
I am not sure I managed to "force" xts to ts, but I got the decompose part to work:
library("data.table")
library("xts")  # needed for xts() and reclass()
# I was unable to read the file with read.table() for some reason, so I used fread(), which is also much faster
data <- fread("DistrBdaily1yrs.txt", header = TRUE, sep = "\t")
# Set the column names to the ones I saw on Dropbox, as I was unable to read in the header for some reason
colnames(data) <- c("DATE", "QUANTITY")
# Keep as-is
data$DATE = as.Date(as.character(data$DATE), format = "%Y%m%d")
dataXts = xts(data$QUANTITY, data$DATE, frequency = 6)
# Not sure what the "QUANTITY" column means, but it must be converted to numeric
# (it uses a comma as the decimal separator). See this post if the following is unsatisfactory:
# http://stackoverflow.com/questions/3605807/how-to-convert-numbers-with-comma-inside-from-character-to-numeric-in-r
a <- as.numeric(gsub(",", ".", dataXts))
dataXts <- reclass(a, match.to = dataXts); colnames(dataXts) <- "QUANTITY"
# Now convert it to a ts object
timeseries <- ts(dataXts, frequency = 6)
# decompose
decompose(timeseries)
Also, I had assumed that converting xts to ts would use the first and last dates to construct the ts, which is why I left out start = start(dataXts), end = end(dataXts) in the ts() call. In fact ts() does not look at the xts dates at all, so the resulting series simply starts at time 1. Also see ?ts: you cannot pass Dates as the start or end arguments; rather, each is:
Either a single number or a vector of two integers, which specify a natural time unit and a (1-based) number of samples into the time unit.
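A minimal sketch of what start and end look like for a ts, using made-up numbers. It assumes, purely for illustration, that the "natural time unit" is a 6-observation week and that the first observation falls in the third slot of week 1:

```r
set.seed(1)
# Made-up data: 30 observations with a repeating 6-step pattern plus noise
vals <- 100 + rep(c(5, 3, 0, -2, -4, -2), length.out = 30) + rnorm(30)

# start = c(unit, slot): week 1, third of the six observations in that week
x <- ts(vals, start = c(1, 3), frequency = 6)

start(x)  # 1 3
end(x)    # 30 observations later: 6 2

# decompose() needs at least two full periods (here 2 * 6 = 12 observations)
dec <- decompose(x)
dec$figure  # estimated seasonal effect for each of the 6 slots
```

With real data you would work out the unit and slot of the first date yourself; ts() never sees the Date values.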
You can always convert back to xts using reclass:
# for example: Say you only want the trend
reclass(decompose(timeseries)$trend,match.to=dataXts)
I'm using the convert() function in the highfrequency package in R. The dataset I'm using is TAQ data downloaded from WRDS. The data looks like this.
The function convert() is supposed to convert the .csv files into .RData files containing xts objects.
I followed the package's instructions and used the following code:
library(highfrequency)
from <- "2017-01-05"
to <- "2017-01-05"
format <- "%Y%m%d %H:%M:%S"
datasource <- "C:/Users/feimo/OneDrive/SFU/Thesis-Project/R/IBM"
datadestination <- "C:/Users/feimo/OneDrive/SFU/Thesis-Project/R/IBM"
convert(from = from, to = to, datasource = datasource,
        datadestination = datadestination, trades = TRUE, quotes = FALSE,
        ticker = "IBM", dir = TRUE, extension = "csv",
        header = FALSE, tradecolnames = NULL,
        format = format, onefile = TRUE)
But I got the following error message:
> Error in `$<-.data.frame`(`*tmp*`, "COND", value = numeric(0)) :
> replacement has 0 rows, data has 23855
I believe the default column names in the function are c("SYMBOL", "DATE", "EX", "TIME", "PRICE", "SIZE", "COND", "CORR", "G127"), which differ from those in my dataset, so I manually renamed the columns in my .csv to match. Then I got another error:
> Error in xts(tdata, order.by = tdobject) : 'order.by' cannot contain 'NA', 'NaN', or 'Inf'
I tried looking at the source code but couldn't find a solution.
Any suggestion would be really helpful. Thanks!
When I run your code on the data to which you provide a link, I get the second error you mention:
Error in xts(tdata, order.by = tdobject) :
'order.by' cannot contain 'NA', 'NaN', or 'Inf'
This error can be traced to these lines in the function highfrequency:::makeXtsTrades(), which is called by highfrequency::convert():
tdobject = as.POSIXct(paste(as.vector(tdata$DATE), as.vector(tdata$TIME)),
format = format, tz = "GMT")
tdata = xts(tdata, order.by = tdobject)
The error results from two problems:
1) The variable "DATE" in your data file is read into R as numeric, whereas it appears that the code creating tdobject expects tdata$DATE to be a character vector. You could fix this by manually converting that variable to a character vector:
tdata <- read.csv("IBM_trades.csv")
tdata$DATE <- as.character(tdata$DATE)
write.csv(tdata, file = "IBM_trades_DATE_fixed.csv", row.names = FALSE)
2) The variable "TIME_M" in your data file is not a time of the format "%H:%M:%S". It looks like it is only the minutes and seconds component of a more complete time variable, because values contain only one colon and the values before and after the colon vary from 0 to 59.9. Fixing this problem would require finding the hour component of the time variable.
These two problems result in tdobject being filled with NA values rather than valid date-times, which causes an error when xts::xts() tries to order the data by tdobject.
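The failure mode can be reproduced with base R alone. This is a minimal sketch with made-up DATE and TIME values (not rows from the actual file): a time string holding only minutes and seconds does not match "%H:%M:%S", so as.POSIXct() returns NA, which is what xts::xts() then rejects in order.by.

```r
fmt <- "%Y%m%d %H:%M:%S"

# Made-up examples: a full hh:mm:ss time versus a minutes:seconds-only value
good <- as.POSIXct(paste("20170105", "14:23:41"), format = fmt, tz = "GMT")
bad  <- as.POSIXct(paste("20170105", "23:41.5"),  format = fmt, tz = "GMT")

good         # a valid date-time
is.na(bad)   # TRUE: "23:41.5" cannot be parsed as %H:%M:%S

# A quick pre-check before calling convert(): count rows that fail to parse
tdobject <- as.POSIXct(paste(c("20170105", "20170105"), c("14:23:41", "23:41.5")),
                       format = fmt, tz = "GMT")
sum(is.na(tdobject))  # 1
```

Running the same check on the real DATE and TIME columns shows how many rows would break the xts() call.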
The more general issue seems to be that the function highfrequency::convert() expects your data to follow something like the format described here on the WRDS website, but your data has slightly different column names and possibly different value formats. I would recommend taking a close look at that WRDS page and the documentation for your data file and determining which variables in your data correspond to those described on that page (for instance, it's not clear to me that your data contains any variable that is equivalent to "G127").
I am retrieving data from an ArcGIS REST service using a web request in an R script. One of the attribute columns contains a date value. These dates are returned as epoch values (character strings). I have tried to convert these epoch strings to human-readable dates, but so far without success.
See my reproducible script below. The column containing the date is called WVK_BEGDAT. Its contents should be converted to human-readable dates.
I have tried several suggestions found via Google...
library(httr)
library(sf)
library(lubridate)
url <- parse_url("https://services.arcgis.com/nSZVuSZjHpEZZbRo/arcgis/rest/services")
url$path <- paste(url$path, "NWB_Wegvakken/FeatureServer/0/query", sep = "/")
url$query <- list(where = "WEGBEHSRT = 'R' AND WEGNUMMER = '015'",
outFields = "*",
returnGeometry = "true",
f = "json")
request <- build_url(url)
request
NWB <- st_read(request, stringsAsFactors = FALSE)
plot(st_geometry(NWB))
NWB$WVK_BEGDAT2 <- as_date(NWB$WVK_BEGDAT, format="%d-%m-%Y", tz = "CET")
The closest I get is with the as_date() function from the lubridate package. It at least does not return an error, but it fills the column with NAs.
Any suggestions? Thanks in advance.
as.POSIXct should do the trick:
as.POSIXct(as.numeric(NWB$WVK_BEGDAT)/1000, origin = "1970-01-01")
You have to divide by 1000, because your epoch times seem to be in milliseconds.
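A small sketch with made-up millisecond values (not actual WVK_BEGDAT values), assuming the strings are plain integers counting milliseconds since 1970-01-01 UTC. Passing tz = "UTC" explicitly keeps the result independent of the machine's local time zone, and format() then gives the "%d-%m-%Y" layout the question was after.

```r
# Made-up epoch strings in milliseconds, as they might come back in the JSON
epoch_ms <- c("1388534400000", "1577836800000")

# Milliseconds -> seconds -> POSIXct (UTC)
dt <- as.POSIXct(as.numeric(epoch_ms) / 1000, origin = "1970-01-01", tz = "UTC")

# Calendar dates and a day-month-year string
as.Date(dt, tz = "UTC")   # "2014-01-01" "2020-01-01"
format(dt, "%d-%m-%Y")    # "01-01-2014" "01-01-2020"
```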
I've got two csv files.
One file lists when and why an employee leaves.
EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
119549,Sales,Retirement,09/30/2013
2629053,Sales,Termination,09/30/2013
120395,Sales,Retirement,11/01/2013
122450,Sales,Transfer,11/30/2013
123962,Sales,Transfer,11/30/2013
1041054,Sales,Resignation,12/01/2013
990962,Sales,Retirement,12/14/2013
135396,Sales,Retirement,01/11/2014
Another file is a lookup table shows the start and end dates of every fiscal quarter:
FYFQ,Start,End
FY2014FQ1,10/1/2013,12/31/2013
FY2014FQ2,1/1/2014,3/31/2014
FY2014FQ3,4/1/2014,6/30/2014
FY2014FQ4,7/1/2014,9/30/2014
FY2015FQ1,10/1/2014,12/31/2014
FY2015FQ2,1/1/2015,3/31/2015
I'd like R to find which FYFQ each Separation_Date falls in and write it into the FYFQ column of the data.
Input:
Separations.csv:
>EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
>990962,Sales,Retirement,12/14/2013
>135396,Sales,Retirement,01/11/2014
FiscalQuarterDates.csv:
>FYFQ,Start,End
>FY2013FQ4,7/1/2013,9/30/2013
>FY2014FQ1,10/1/2013,12/31/2013
>FY2014FQ2,1/1/2014,3/31/2014
Desired Output:
Output.csv:
>EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
>990962,Sales,Retirement,12/14/2013,FY2014FQ1
>135396,Sales,Retirement,01/11/2014,FY2014FQ2
I'm assuming there's some function that could iterate through FiscalQuarterDates.csv and check which FYFQ each separation date falls in, but I'm not sure.
Any thoughts on the best way to do this?
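The lookup-table idea can work without an explicit loop. A minimal sketch using base R's findInterval() on the quarter start dates (a swapped-in technique, not something from the question), with the sample rows above typed inline instead of read from the two files:

```r
# Lookup table (subset of FiscalQuarterDates.csv), dates parsed as Date
quarters <- read.csv(text = "FYFQ,Start,End
FY2013FQ4,7/1/2013,9/30/2013
FY2014FQ1,10/1/2013,12/31/2013
FY2014FQ2,1/1/2014,3/31/2014", stringsAsFactors = FALSE)
quarters$Start <- as.Date(quarters$Start, format = "%m/%d/%Y")
quarters$End   <- as.Date(quarters$End,   format = "%m/%d/%Y")
quarters <- quarters[order(quarters$Start), ]  # findInterval() needs sorted breaks

separations <- read.csv(text = "EmployeeID,Department,Separation_Type,Separation_Date
990962,Sales,Retirement,12/14/2013
135396,Sales,Retirement,01/11/2014", stringsAsFactors = FALSE)
sep_dates <- as.Date(separations$Separation_Date, format = "%m/%d/%Y")

# Row of the last quarter whose Start is on or before each date (0 = none)
idx <- findInterval(as.numeric(sep_dates), as.numeric(quarters$Start))
idx[idx == 0] <- NA
# Dates after the End of that quarter are not covered by the table
idx[!is.na(idx) & sep_dates > quarters$End[idx]] <- NA

separations$FYFQ <- quarters$FYFQ[idx]
separations
```

Dates outside every listed quarter end up as NA rather than being silently assigned to the nearest quarter.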
This is what worked.
#read in the csv; keep Separation_Date as character so it can be parsed with an explicit format below
separations <- read.csv(file="Separations_DummyData.csv", header=TRUE, sep=",", stringsAsFactors=FALSE)
#Use the zoo package (I installed it) to convert Separation_Date to quarter type and then shift the quarter forward by 1/4. Then construct the variable as FYyFQq.
library(zoo)
separations$FYFQ <- format(as.yearqtr(separations$Separation_Date, "%m/%d/%Y") + 1/4, "FY%YFQ%q")
#Write out this to CSV in working directory.
write.csv(separations, file = "sepscomplete.csv", row.names = FALSE)
You don't really need the second data frame; a little arithmetic on the month and year will do it (firstdf is the separations data frame):
# assumes Separation_Date is an mm/dd/yyyy string with a zero-padded month
yr  <- with(firstdf, as.numeric(substr(Separation_Date, 7, 10)))
mth <- with(firstdf, as.numeric(substr(Separation_Date, 1, 2)))
firstdf$FYFQ <- ifelse(mth <= 3, paste0("FY", yr, "FQ2"),
                ifelse(mth <= 6, paste0("FY", yr, "FQ3"),
                ifelse(mth <= 9, paste0("FY", yr, "FQ4"),
                                 paste0("FY", yr + 1, "FQ1"))))
Convert each date to "yearqtr" class (from the zoo package) and add 1/4 to shift it to the next calendar quarter. Then write it out using write.csv:
library(zoo)
DF$FYFQ <- format(as.yearqtr(DF$Separation_Date, "%m/%d/%Y") + 1/4, "FY%YFQ%q")
giving:
> write.csv(DF, file = stdout(), row.names = FALSE)
"EmployeeID","Department","Separation_Type","Separation_Date","FYFQ"
990962,"Sales","Retirement","12/14/2013","FY2014FQ1"
135396,"Sales","Retirement","01/11/2014","FY2014FQ2"
Note:
1) If FYFQ need not be exactly in the format shown then it could be simplified to just:
DF$FYFQ <- as.yearqtr(DF$Separation_Date, "%m/%d/%Y") + 1/4
2) The second input file listed in the question is not used.
3) We used this for the input data:
Lines <- "EmployeeID,Department,Separation_Type,Separation_Date,FYFQ
990962,Sales,Retirement,12/14/2013
135396,Sales,Retirement,01/11/2014"
DF <- read.csv(text = Lines)
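The + 1/4 step works because zoo stores a yearqtr value as the year plus 0, 0.25, 0.5 or 0.75 for Q1 to Q4, so adding 1/4 moves to the next quarter and rolls Q4 into Q1 of the following year. A base-R sketch of the same arithmetic (no zoo needed), assuming mm/dd/yyyy strings as in the question:

```r
d <- as.Date(c("12/14/2013", "01/11/2014", "07/01/2014"), format = "%m/%d/%Y")

yr  <- as.integer(format(d, "%Y"))
qtr <- (as.integer(format(d, "%m")) - 1L) %/% 3L + 1L  # calendar quarter 1..4

yq      <- yr + (qtr - 1) / 4   # same numeric encoding as zoo's yearqtr
shifted <- yq + 1/4             # fiscal year starts one quarter before the calendar year

fy   <- floor(shifted)
fq   <- round((shifted - fy) * 4) + 1
fyfq <- sprintf("FY%dFQ%d", as.integer(fy), as.integer(fq))
fyfq  # "FY2014FQ1" "FY2014FQ2" "FY2014FQ4"
```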
I'm learning R this semester and this is my first assignment. I want to retrieve monthly adjusted stock quotes within a set date range using a for loop, and once I can do that, merge all the data into a data frame.
My code so far retrieves daily stock quotes for 5 ticker symbols within a set date range, assigns each object to the specified environment, and puts only the .Adjusted column into a list.
Could someone point me in a better direction for obtaining the monthly quotes? Am I on the right track with my code?
Thanks.
#Packages
library(quantmod)
#Data structure that contains stock quote objects
ETF_Data <- new.env()
#Assign dates to set range for stock quotes
sDate <- as.Date("2007-08-31")
eDate <- as.Date("2014-09-04")
#Assign a vector of ticker symbols.
ticker_symbol <- c("IVW","JKE","QQQ","SPYG","VUG")
#Assign number of ticker symbols.
total_ticker_symbols <- length(ticker_symbol)
#Empty list to hold the Adjusted column of each object in my environment.
Temp_ETF_Data <- list()
#Assign integer value to counter.
counter <- 1L
#Loop and retrieve each ticker symbol's quotes from Yahoo's API
for(i in ticker_symbol)
{
getSymbols(
i,
env = ETF_Data,
reload.Symbols = FALSE,
from = sDate,
to = eDate,
verbose = FALSE,
warnings = TRUE,
src = "yahoo",
symbol.lookup = TRUE)
#Add only Adjusted Closing Prices for each stock or object into list.
Temp_ETF_Data[[i]] <- Ad(ETF_Data[[i]])
if (counter == length(ticker_symbol))
{
#Merge all the objects of the list into one object.
ETF_Adj_Daily_Quotes <- do.call(merge, Temp_ETF_Data)
ETF_Adj_Monthly_Quotes <- ETF_Adj_Daily_Quotes[endpoints(ETF_Adj_Daily_Quotes,'months')]
}
else
{
counter <- counter + 1
}
}
There's no need for the for loop. You can loop over all the objects in the environment with eapply:
getSymbols(ticker_symbol, env=ETF_Data, from=sDate, to=eDate)
# Extract the Adjusted column from all objects,
# then merge all columns into one object
ETF_Adj_Data <- do.call(merge, eapply(ETF_Data, Ad))
# then extract the monthly endpoints
Monthly_ETF_Adj_Data <- ETF_Adj_Data[endpoints(ETF_Adj_Data,'months')]
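xts::endpoints(x, 'months') returns the positions of the last observation in each month (preceded by a 0), and subsetting by those positions keeps the month-end prices. The same selection can be sketched in base R with made-up prices (an illustration of the idea, not how xts implements it), assuming the dates are sorted:

```r
# Made-up daily series: sorted dates spanning parts of three months
dates  <- as.Date(c("2014-07-30", "2014-07-31", "2014-08-01",
                    "2014-08-29", "2014-09-02", "2014-09-04"))
prices <- c(10, 11, 12, 13, 14, 15)

# For each year-month, keep the position of its last (latest) observation
month_key <- format(dates, "%Y-%m")
last_idx  <- tapply(seq_along(dates), month_key, max)

monthly <- data.frame(date = dates[last_idx], price = prices[last_idx])
monthly
```

Note that "month-end" here means the last trading day present in the data, not the calendar month end.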
I know this is an old question, but this answer might help future readers.
quantmod has since added a periodicity argument to getSymbols() (passed through to the Yahoo source), which can take the values "daily", "weekly", or "monthly".
I tested out the following and it seems to work as desired:
library(quantmod)
starting <- as.Date("2007-08-31")  # example start date; use your own
getSymbols("EURGBP=X", from = starting, src = "yahoo", periodicity = "monthly")
Just use to.monthly() on your xts object:
to.monthly(your_ticker)
I'm trying to convert the following date/time strings into a zoo object:
2004:071:15:23:41.87250
2004:103:15:24:15.35931
year:doy:hour:minute:second
The date/time string is stored in a dataframe without headers. What's the best way to go about this in R?
Cheers!
Edit based on answer by Gavin:
# read in time series from CSV file; each entry as described above
timeSeriesDates <- read.csv("timeseriesdates.csv", header = FALSE, sep = ",")
# convert to format that can be used as a zoo object
timeSeriesDatesZ <- as.POSIXct(timeSeriesDates$V1, format = "%Y:%j:%H:%M:%S")
Read the data in to R in the usual way. You will have something like the following:
dats <- data.frame(times = c("2004:071:15:23:41.87250", "2004:103:15:24:15.35931"))
dats
These can be converted to one of the POSIXt classes via:
dats <- transform(dats, times = as.POSIXct(times, format = "%Y:%j:%H:%M:%OS"))
or
dats$times <- as.POSIXct(dats$times, format = "%Y:%j:%H:%M:%OS")
which can then be used in a zoo object. See ?strftime for details on the placeholders used in the format argument; essentially, %j is the day-of-year placeholder, and %OS reads seconds including the fractional part (%S reads whole seconds only).
To build the zoo object, using some dummy data for the actual time series values:
vals <- rnorm(2) ## dummy data
require(zoo) ## load zoo
tsZoo <- zoo(vals, dats$times)
Printing tsZoo then gives something like this (the values are random):
> tsZoo
2004-03-11 15:23:41 2004-04-12 15:24:15 
          0.3503648          -0.2336064 
Two things to note about fractional seconds: i) the exact fraction you have may not be representable in floating-point arithmetic, and ii) R may not display the full fractional seconds, depending on the value of the digits.secs option. See ?options for more on this option and how to change it.
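A base-R check of the day-of-year parsing on the two strings from the question, assuming UTC so the output does not depend on the local time zone. It uses %OS rather than %S so that the fractional seconds are kept:

```r
s <- c("2004:071:15:23:41.87250", "2004:103:15:24:15.35931")

# %j = day of the year (001-366); %OS = seconds including the fractional part
x <- as.POSIXct(s, format = "%Y:%j:%H:%M:%OS", tz = "UTC")

format(x, "%Y-%m-%d")              # "2004-03-11" "2004-04-12"
format(x, "%Y-%m-%d %H:%M:%OS3")   # 3 decimal places, truncated rather than rounded
as.numeric(x) %% 1                 # the stored fractions, approximately
```

2004 is a leap year, so day 71 is March 11 and day 103 is April 12.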
Here's a commented example for the first string:
R> s <- "2004:103:15:24:15.35931"
R> # split on the ":" and convert the result to a numeric vector
R> n <- as.numeric(strsplit(s, ":")[[1]])
R> # Use the year, hour, minute, second to create a POSIXct object
R> # for January 1 of that year; then add (day of year - 1) days, as seconds
R> ISOdatetime(n[1], 1, 1, n[3], n[4], n[5], tz = "UTC") + (n[2] - 1)*60*60*24
[1] "2004-04-12 15:24:15 UTC"