I am importing a CSV file into R, creating a 3x3 data frame, and attempting to convert the data frame to an xts object, but I get the error message "do not match the length of object".
#DATSB <- fread("C:/Temp/GoogleDrive/R/temp.csv", select = c("DateTime","Last","Volume"))
#that results in following dput() output:
DATSB <- structure(list(DateTime = c("3/28/2016 20:37", "3/28/2016 20:36","3/28/2016 20:35"), Last = c(1221.7, 1221.8, 1221.9), Volume = c(14L,2L, 22L)), .Names = c("DateTime", "Last", "Volume"), row.names = c(NA,3L), class = "data.frame")
setDF(DATSB)
DATSB$DateTime <- strptime(DATSB$DateTime, format = "%m/%d/%Y %H:%M")
DATSBxts <- as.xts(DATSB[, -2], order.by = as.Date(DATSB$DateTime, "%Y/%m/%d %H:%M"))
DateTime Last Volume
1 3/28/2016 20:37 1221.7 14
2 3/28/2016 20:36 1221.8 2
3 3/28/2016 20:35 1221.9 22
Exact error message is "Error in as.matrix.data.frame(x) :
dims [product 12] do not match the length of object [14]"
Somehow the root of the problem is the column Volume. Without that column, it works. Unfortunately can't figure it out. Thanks for your help!
There was a typo in DATSB[, -2]: it drops the Last column and leaves the DateTime column in the data. Correcting it to DATSB[, -1] works fine. The general pattern for xts is:
xts(data[,-date_column], order.by = data[,date_column])
Also coredata(DATSBxts) and index(DATSBxts) are helpful functions
DATSBxts <- xts(DATSB[, -1], order.by = as.POSIXct(DATSB[, 1]))
DATSBxts
# Last Volume
#2016-03-28 20:35:00 1221.9 22
#2016-03-28 20:36:00 1221.8 2
#2016-03-28 20:37:00 1221.7 14
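The general pattern above can be sketched end to end on the question's data. This is a minimal example that assumes the xts package is installed; the names `prices`, `stamp`, and `prices_xts` are illustrative, and the timestamps are parsed as UTC only to keep the result reproducible.

```r
library(xts)

# Same three rows as the question's dput() output
prices <- data.frame(
  DateTime = c("3/28/2016 20:37", "3/28/2016 20:36", "3/28/2016 20:35"),
  Last     = c(1221.7, 1221.8, 1221.9),
  Volume   = c(14L, 2L, 22L),
  stringsAsFactors = FALSE
)

# Parse the timestamps into POSIXct (UTC is an assumption made for reproducibility)
stamp <- as.POSIXct(prices$DateTime, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Data columns only (drop column 1, the timestamps); the index goes in order.by
prices_xts <- xts(prices[, -1], order.by = stamp)

coredata(prices_xts)  # numeric matrix of Last and Volume, sorted by time
index(prices_xts)     # POSIXct index in ascending order
```

Keeping the timestamp column out of the data keeps the underlying matrix numeric, which is what the original call with `-2` failed to do.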
I'm trying to visualise data of the following form:
date volaEUROSTOXX volaSA volaKENYA25 volaNAM volaNIGERIA
1 10feb2012 0.29844454 0.1675901 0.007862087 0.12084170 0.10247617
2 17feb2012 0.31811157 0.2260064 0.157017220 0.33648935 0.22584127
3 24feb2012 0.30013672 0.1039974 0.083863921 0.11694768 0.16388161
To do so, I first converted the date (stored as a character in the original data frame) to Date format, which works just fine:
vola$date <- as.Date(vola$date)
str(vola$date)
Date[1:543], format: "2012-02-10" "2012-02-17" "2012-02-24" "2012-03-02" "2012-03-09"
However, if I now try to graph my data by using the chart.TimeSeries command, I get the following:
chart.TimeSeries(volatility_annul_stringdate,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
Error in if (class(x) == "numeric") { : the condition has length > 1
I tried:
Converting my date variable (in the date format) further into a time series object:
vola$date <- ts(vola$date, frequency=52, start=c(2012,9)) #returned same error from above
Converting the whole data set using the xts command:
vol.xts <- xts(vola, order.by= vola$date, unique = TRUE ) # which then returned:
order.by requires an appropriate time-based object
#even though date is a time-series
What am I doing wrong? I am rather new to RStudio.. I really want to use the chart.TimeSeries command. Can someone help me?
Thanks in advance!
My MRE:
library(PerformanceAnalytics)
vola <- structure(list(date_2 = c("2012-02-10", "2012-02-17", "2012-02-24",
"2012-03-02"), volaEUROSTOXX = c(0.298444539308548, 0.318111568689346,
0.300136715173721, 0.299697518348694), volaKENYA25 = c(0.00786208733916283,
0.157017216086388, 0.0838639214634895, 0.152377054095268), volaNAM = c(0.120841704308987,
0.336489349603653, 0.116947680711746, 0.157027021050453), volaNIGERIA = c(0.102476172149181,
0.225841268897057, 0.163881614804268, 0.317349642515182), volaSA = c(0.167590111494064,
0.226006388664246, 0.103997424244881, 0.193037077784538), date = structure(c(1328832000,
1329436800, 1330041600, 1330646400), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
vola <- subset(vola, select = -c(date))
vola$date_2 <- as.Date(vola$date_2)
chart.TimeSeries(vola,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
#This returns the above mentioned error message.
#Thus, I tried the following:
vola$date_2 <- ts(vola$date_2, frequency=52, start=c(2012,9))
chart.TimeSeries(vola,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
#Which returned a different error (as described above)
#And I tried:
vol.xts <- xts(vola, order.by= vola$date_2, unique = TRUE )
#This also returned an error message.
#My intention was to then run:
#chart.TimeSeries(vol.xts,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
#main="Log Volatility",lty=1,
#legend.loc="topright")
The documentation of PerformanceAnalytics::chart.TimeSeries is a bit vague. The issue is that when passing a dataframe you have to set the dates as row.names. To this end I first converted your data (which is a tibble) to a data.frame. Afterwards I add the dates as rownames and drop the date column:
library(PerformanceAnalytics)
vola <- as.data.frame(vola)
vola <- subset(vola, select = -c(date))
row.names(vola) <- as.Date(vola$date_2)
vola$date_2 <- NULL
chart.TimeSeries(vola,
lwd = 2, auto.grid = F, ylab = "Annualized Log Volatility", xlab = "Time",
main = "Log Volatility", lty = 1,
legend.loc = "topright"
)
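If you would rather pass an xts object, as attempted in the question, one possible route is to keep only the numeric columns as data and supply the dates through order.by. This is a sketch assuming the xts package is installed; `vola_df`, `dates`, and `vola_xts` are illustrative names, and only two of the series are included for brevity.

```r
library(xts)

vola_df <- data.frame(
  date_2        = c("2012-02-10", "2012-02-17", "2012-02-24", "2012-03-02"),
  volaEUROSTOXX = c(0.298444539308548, 0.318111568689346,
                    0.300136715173721, 0.299697518348694),
  volaSA        = c(0.167590111494064, 0.226006388664246,
                    0.103997424244881, 0.193037077784538),
  stringsAsFactors = FALSE
)

# order.by needs a time-based object such as Date; a ts object is not accepted
dates <- as.Date(vola_df$date_2)

# Leave the date column out of the data, otherwise the matrix becomes character
vola_xts <- xts(vola_df[, setdiff(names(vola_df), "date_2")], order.by = dates)

# vola_xts can then be passed to PerformanceAnalytics::chart.TimeSeries()
```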
In R, I am trying to read a file that has a timestamp, and update the timestamp based on the condition of another field. The below code works with no problem:
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong#email1.com")),
last_update = rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2))
Sys.sleep(5)
t$last_update <- as.POSIXlt(ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update), origin = "1970-01-01")
print(t)
The problem is when I read an existing file and try to dynamically change an as.POSIXlt value. The following code is producing the error that accompanies it in the code block afterwards:
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong2#email1.com")),
last_update = rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2))
write.csv(t, "so_question.csv", row.names = FALSE)
t <- read.csv("so_question.csv")
t$last_update <- as.POSIXlt(t$last_update)
Sys.sleep(5)
t$last_update <- as.POSIXlt(ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update), origin = "1970-01-01")
Error in as.POSIXlt.default(ifelse(t$user == "bshelton#email1.com", Sys.time(), :
do not know how to convert 'ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update)' to class “POSIXlt”
In addition: Warning message:
In ans[!test & ok] <- rep(no, length.out = length(ans))[!test & :
number of items to replace is not a multiple of replacement length
The first case is curiously working only because you don't have what you think—those datetimes are in fact POSIXct, not POSIXlt:
last_update <- rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2)
str(last_update)
#> POSIXlt[1:2], format: "2019-07-28 20:52:10" "2019-07-28 20:52:10"
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong#email1.com")),
last_update = last_update)
str(t)
#> 'data.frame': 2 obs. of 2 variables:
#> $ user : Factor w/ 2 levels "bshelton#email1.com",..: 1 2
#> $ last_update: POSIXct, format: "2019-07-28 20:52:10" "2019-07-28 20:52:10"
If you dig into ?data.frame, it says
data.frame converts each of its arguments to a data frame by calling as.data.frame(optional = TRUE). As that is a generic function, methods can be written to change the behaviour of arguments according to their classes: R comes with many such methods. Character variables passed to data.frame are converted to factor columns unless protected by I or argument stringsAsFactors is false. If a list or data frame or matrix is passed to data.frame it is as if each component or column had been passed as a separate argument (except for matrices protected by I).
This is what's happening: as.data.frame.POSIXlt in fact converts to POSIXct:
now <- Sys.time()
str(now)
#> POSIXct[1:1], format: "2019-07-28 22:50:12"
str(data.frame(time = now))
#> 'data.frame': 1 obs. of 1 variable:
#> $ time: POSIXct, format: "2019-07-28 22:50:12"
as.data.frame.POSIXlt
#> function (x, row.names = NULL, optional = FALSE, ...)
#> {
#> value <- as.data.frame.POSIXct(as.POSIXct(x), row.names,
#> optional, ...)
#> if (!optional)
#> names(value) <- deparse(substitute(x))[[1L]]
#> value
#> }
#> <bytecode: 0x7fc938a11060>
#> <environment: namespace:base>
More immediately, since Sys.time() returns a POSIXct object, ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update) in the second case is getting a POSIXct object for one observation and POSIXlt for the other. The POSIXlt object's class attribute is dropped by ifelse revealing the list underneath, which ifelse then doesn't know how to turn into a vector together with the unclassed POSIXct object (which is just a number).
The solution here, then, is to follow the hint data.frame is giving you and use POSIXct instead of POSIXlt.
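A minimal sketch of the POSIXct route, using fixed timestamps instead of Sys.time() so the result is reproducible (the names `users`, `is_target`, and `new_time` are illustrative, and UTC is an assumption). Assigning through a logical index avoids ifelse altogether, so the class is never stripped:

```r
users <- data.frame(
  user = c("bshelton#email1.com", "lwong#email1.com"),
  last_update = rep(as.POSIXct("2019-07-28 20:52:10", tz = "UTC"), 2),
  stringsAsFactors = FALSE
)

# Rows whose timestamp should be refreshed
is_target <- users$user == "bshelton#email1.com"

# Stand-in for Sys.time(); fixed so the example is deterministic
new_time <- as.POSIXct("2019-07-28 20:52:15", tz = "UTC")

# Subset assignment keeps the POSIXct class, unlike ifelse()
users$last_update[is_target] <- new_time

str(users$last_update)
```

The same assignment works after read.csv, provided the column is converted with as.POSIXct rather than as.POSIXlt.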
If you really want to make it work with POSIXlt, you can iterate over the conditions and POSIXlt vector with Map with if/else (which maintain attributes including class, but only handle scalar conditions) and coerce the resulting list back to a vector with do.call(c, ...):
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong#email1.com")),
last_update = rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2))
t$last_update <- as.POSIXlt(t$last_update)
t$last_update <- do.call(c, Map(
function(condition, last_update){
if (condition) {
as.POSIXlt(Sys.time() + 5)
} else {
last_update
}
},
condition = t$user == "bshelton#email1.com",
last_update = t$last_update
))
t
#> user last_update
#> 1 bshelton#email1.com 2019-07-28 23:11:04
#> 2 lwong#email1.com 2019-07-28 23:10:59
...but frankly that's a little silly. Just use POSIXct instead, and your life will be better.
I have a sample data frame q below that contains three dates in ddmmyy format in q$test
test
1 210376
2 141292
3 280280
I want to create a new covariate q$new that calculates the date difference from q$test to today.
I tried
q$new <- as.numeric(difftime(as.Date(q$test,format='%d/%m/%y'), as.Date(Sys.Date()), unit="weeks"))
But I receive an error message
Error in q$new <- as.numeric(difftime(as.Date(q$test, format =
"%d/%m/%y"), : object of type 'closure' is not subsettable
Do you have any idea what's wrong? Or do you have another solution?
q <- structure(list(test = c(210376L, 141292L, 280280L)), class = "data.frame", row.names = c(NA,
-3L))
You could do
as.numeric(difftime(Sys.Date(), as.Date(as.character(q$test), "%d%m%y"), units = "weeks"))
#[1] 2257.286 1384.143 2051.714
Few pointers -
1) Sys.Date is already of class "Date" so no need for as.Date there
2) as.Date was expecting a character string as input hence wrapped q$test in as.character
3) format in as.Date is used to represent the format we have as input and not the output we want. So in your case you used the format "%d/%m/%y" whereas the format you had was %d%m%y.
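The pointers above can be put together in a short sketch. Two things here are additions rather than part of the answer: the dates are zero-padded with sprintf, because an integer column would lose a leading zero (010376 would be stored as 10376), and a fixed reference date `ref` replaces Sys.Date() so the numbers are reproducible. Note also that `%y` maps two-digit years 69 to 99 to 19xx and 00 to 68 to 20xx.

```r
q <- data.frame(test = c(210376L, 141292L, 280280L))

# Integers lose leading zeros, so pad back to six digits before parsing
padded <- sprintf("%06d", q$test)

# The format describes the input: day, month, two-digit year, no separators
dates <- as.Date(padded, format = "%d%m%y")

# Fixed reference date (an assumption for reproducibility); use Sys.Date() in practice
ref <- as.Date("2019-07-01")
q$new <- as.numeric(difftime(ref, dates, units = "weeks"))

dates
```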
I am trying my best at a simple event study in R, with some data retrieved from the Wharton Research Data Service (WRDS). I am not completely new to R, but I would describe my expertise level as intermediate. So, here is the problem. I am using the eventstudies package and one of the steps is converting the physical dates to event time frame dates with the phys2eventtime(..) function. This function takes multiple arguments:
z : time series data for which event frame is to be generated. In the form of an xts object.
Events : it is a data frame with two columns: unit and when. unit has column name of which response is to measured on the event date, while when has the event date.
Width : width corresponds to the number of days on each side of the event date. For a given width, if there is any NA in the event window then the last observation is carried forward.
The authors of the package have provided an example for the xts object (StockPriceReturns) and for Events (SplitDates). This looks like the following:
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
unit when
5 BHEL 2011-10-03
6 Bharti.Airtel 2009-07-24
8 Cipla 2004-05-11
9 Coal.India 2010-02-16
10 Dr.Reddy 2001-10-10
11 HDFC.Bank 2011-07-14
> head(StockPriceReturns)
Mahindra.&.Mahindra
2000-04-03 -8.3381609
2000-04-04 0.5923550
2000-04-05 6.8097616
2000-04-06 -0.9448889
2000-04-07 7.6843828
2000-04-10 4.1220462
2000-04-11 -1.9078480
2000-04-12 -8.3286900
2000-04-13 -3.8876847
2000-04-17 -8.2886060
So I have constructed my data in the same way, an xts object (DS_xts) and a data.frame (cDS) with the columns "unit" and "when". This is how it looks:
> head(DS_xts)
61241
2011-01-03 0.024247
2011-01-04 0.039307
2011-01-05 0.010589
2011-01-06 -0.022172
2011-01-07 0.018057
2011-01-10 0.041488
> head(cDS)
unit when
1 11754 2012-01-05
2 10104 2012-01-24
3 61241 2012-01-31
4 13928 2012-02-07
5 14656 2012-02-08
6 60097 2012-02-14
These are similar in my opinion, but how it looks does not tell the whole story. I am quite certain that my problem is in how I have constructed these two objects. Below is my R code:
#install.packages("eventstudies")
library("eventstudies")
DS = read.csv("ReturnData.csv")
cDS = read.csv("EventData.csv")
#Calculate Abnormal Returns
DS$AR = DS$RET - DS$VWRETD
#Clean up and let only necessary columns remain
DS = DS[, c("PERMNO", "DATE", "AR")]
cDS = cDS[, c("PERMNO", "DATE")]
#Generate correct date format according to R's as.Date
for (i in 1:nrow(DS)) {
DS$DATE[i] = format(as.Date(toString(DS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
for (i in 1:nrow(cDS)) {
cDS$DATE[i] = format(as.Date(toString(cDS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
#Rename cDS columns according to phys2eventtime format
colnames(cDS)[1] = "unit"
colnames(cDS)[2] = "when"
#Create list of unique PERMNO's
PERMNO <- unique(DS$PERMNO)
for (i in 1:length(PERMNO)) {
#Subset based on PERMNO
DStmp <- DS[DS$PERMNO == PERMNO[i], ]
#Remove PERMNO column and rename AR to PERMNO
DStmp <- DStmp[, c("DATE", "AR")]
colnames(DStmp)[2] = as.character(PERMNO[i])
dates <- as.Date(DStmp$DATE)
DStmp <- DStmp[, -c(1)]
#Create a temporary XTS object
DStmp_xts <- xts(DStmp, order.by = dates)
#If first iteration, just create new variable, otherwise merge
if (i == 1) {
DS_xts <- DStmp_xts
} else {
DS_xts <- merge(DS_xts, DStmp_xts, all = TRUE)
}
}
#Renaming columns for matching
colnames(DS_xts) <- c(PERMNO)
#Making sure classes are the same
cDS$unit <- as.character(cDS$unit)
eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
So, if I run phys2eventtime(..) it returns:
> eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
Error in if ((location <= 1) | (location >= length(x))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In findInterval(when, index(x)) : NAs introduced by coercion
I have looked at the original function (it is available at their GitHub, can't use more than two links yet) to figure out this error, but I ran out of ideas how to debug it. I hope someone can help me sort it out. As a final note, I have also looked at another (magnificent) answer related to this R package (question: "format a zoo object with “dimnames”=List of 2"), but it wasn't enough to help me solve it (or I couldn't yet comprehend it).
Here is the link for the two CSV files if you would like to reproduce my error (or solve it!).
I'm trying to understand my difficulties in the past with inputting zoo objects. The following two uses of read.zoo give different results despite the default argument for tz supposedly being "" and that is the only difference between the two read.zoo calls:
Lines <- "2013-11-25 12:41:21 2
2013-11-25 12:41:22.25 2
2013-11-25 12:41:22.75 75
2013-11-25 12:41:24.22 3
2013-11-25 12:41:25.22 1
2013-11-25 12:41:26.22 1"
library(zoo)
z <- read.zoo(text = Lines, index = 1:2)
> dput(z)
structure(c(2L, 2L, 75L, 3L, 1L, 1L), index = structure(c(16034,
16034, 16034, 16034, 16034, 16034), class = "Date"), class = "zoo")
z <- read.zoo(text = Lines, index = 1:2, tz="")
> dput(z)
structure(c(2L, 2L, 75L, 3L, 1L, 1L), index = structure(c(1385412081,
1385412082.25, 1385412082.75, 1385412084.22, 1385412085.22, 1385412086.22
), class = c("POSIXct", "POSIXt"), tzone = ""), class = "zoo")
>
The answer (of course) is in the sources for read.zoo(), wherein there is:
....
ix <- if (missing(format) || is.null(format)) {
if (missing(tz) || is.null(tz))
processFUN(ix)
else processFUN(ix, tz = tz)
}
else {
if (missing(tz) || is.null(tz))
processFUN(ix, format = format)
else processFUN(ix, format = format, tz = tz)
}
....
Even though the default for tz is "", in your first case tz is considered missing (by missing()) and hence processFUN(ix) is used. When you set tz = "", it is no longer missing and hence you get processFUN(ix, tz = tz).
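The effect can be reproduced in isolation. The function below is hypothetical (`describe_tz` is not part of zoo); it only mirrors the branch structure quoted from read.zoo() to show that missing() looks at whether the caller supplied the argument, not at its value:

```r
# Hypothetical function mirroring the branch structure quoted from read.zoo()
describe_tz <- function(tz = "") {
  if (missing(tz) || is.null(tz)) {
    "tz treated as not supplied"
  } else {
    paste0("tz supplied: '", tz, "'")
  }
}

describe_tz()           # argument omitted, so missing(tz) is TRUE
describe_tz(tz = "")    # same value as the default, but missing(tz) is FALSE
describe_tz(tz = NULL)  # NULL is also routed to the "not supplied" branch
```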
Without looking at the details of read.zoo() this could possibly be handled better by having tz = NULL or tz (no default) in the arguments and then in the code, if tz needs to be set to "" for some reason, do:
if (missing(tz) || is.null(tz)) {
tz <- ""
}
or perhaps this is not even needed if all that is required is to avoid the confusion about the two different calls?
Effectively, the default index class is "Date" unless tz is used in which case the default is "POSIXct". Thus the first example in the question gives "Date" class since that is the default and the second "POSIXct" since tz was specified.
If you want to specify the class explicitly rather than rely on these defaults, use the FUN argument:
read.zoo(...whatever..., FUN = as.Date)
read.zoo(...whatever..., FUN = as.POSIXct) # might need FUN=paste,FUN2=as.POSIXct
read.zoo(...whatever..., FUN = as.yearmon)
# etc.
The FUN argument can also take a custom function as shown in the examples in the package.
Note that it always assumes standard formats (e.g. "%Y-%m-%d" in the case of "Date" class) if no format is specified and never tries to automatically determine the format.
The way it works is explained in detail in ?read.zoo, which has many examples (there are 78 lines of code in the examples section), as well as in an entire vignette (one of six vignettes) dedicated just to read.zoo: "Reading Data in zoo".
Added: I have expanded the above. Also, in the development version of zoo the heuristic has been improved, and with that improvement the first example in the question does recognize the date/times and chooses POSIXct. Some clarification of the simple heuristic has also been added to the read.zoo help file so that the many examples provided do not have to be relied upon as much.
Here are some examples. Note that the heuristic referred to is a heuristic to determine the class of the time index only. It can only identify "numeric", "Date" and "POSIXct" classes. The heuristic cannot identify other classes (although you can specify them yourself using FUN=). Also the heuristic does not identify formats. If the format is not provided using format= or implicitly through FUN= then standard format is assumed, e.g. "%Y-%m-%d" in the case of "Date".
Lines <- "2013-11-25 12:41:21 2
2013-12-25 12:41:22.25 3
2013-12-26 12:41:22.75 8"
# explicit. Uses POSIXct.
z <- read.zoo(text = Lines, index = 1:2, FUN = paste, FUN2 = as.POSIXct)
# tz implies POSIXct
z <- read.zoo(text = Lines, index = 1:2, tz = "")
# heuristic: Date now; devel ver uses POSIXct
z <- read.zoo(text = Lines, index = 1:2)
Lines <- "2013-11-25 2
2013-12-25 3
2013-12-26 8"
z <- read.zoo(text = Lines, FUN = as.Date) # explicit. Uses Date.
z <- read.zoo(text = Lines, format = "%Y-%m-%d") # format & no tz implies Date
z <- read.zoo(text = Lines) # heuristic: Date
Note:
(1) In general, it's safer to be explicit by using FUN or by using tz and/or format as opposed to relying on the heuristic. If you are explicit by using FUN or semi-explicit by using tz and/or format then there is no change between the current and the development versions of read.zoo.
(2) It's safer to rely on the documentation rather than the internals, as the internals can change without warning and in fact have changed in the development version. If you really want to look at the code despite this, the key statement that selects the class of the index when FUN is not explicitly defined is the if (is.null(FUN)) ... statement in the read.zoo source.
(3) I recommend using read.zoo as being easier, direct and compact rather than workarounds such as read.table followed by zoo. I have been using read.zoo for years as have many others and it seems pretty solid to me but if anyone finds specific problems with read.zoo or with the documentation (always possible since there is quite a bit of it) they can always be reported. Even though the package has been around for years improvements are still being made.
I suspect your use of read.zoo tripped you up. Here is what I did:
library(zoo)
tt <- read.table(text=Lines)
z <- zoo(as.integer(tt[,3]), order.by=as.POSIXct(paste(tt[,1], tt[,2])))
Now z is a proper zoo object:
R> z
2013-11-25 12:41:21.00 2013-11-25 12:41:22.25 2013-11-25 12:41:22.75
2 2 75
2013-11-25 12:41:24.22 2013-11-25 12:41:25.22 2013-11-25 12:41:26.22
3 1 1
R> class(z)
[1] "zoo"
R> class(index(z))
[1] "POSIXct" "POSIXt"
R>
And by making sure I used a POSIXct object for the index, I am in fact getting a POSIXct object back.