In R, I am trying to read a file that has a timestamp and update the timestamp based on the condition of another field. The code below works without any problem:
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong#email1.com")),
last_update = rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2))
Sys.sleep(5)
t$last_update <- as.POSIXlt(ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update), origin = "1970-01-01")
print(t)
The problem arises when I read an existing file and try to conditionally change a POSIXlt value. The following code produces the error shown after it:
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong2#email1.com")),
last_update = rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2))
write.csv(t, "so_question.csv", row.names = FALSE)
t <- read.csv("so_question.csv")
t$last_update <- as.POSIXlt(t$last_update)
Sys.sleep(5)
t$last_update <- as.POSIXlt(ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update), origin = "1970-01-01")
Error in as.POSIXlt.default(ifelse(t$user == "bshelton#email1.com", Sys.time(), :
do not know how to convert 'ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update)' to class “POSIXlt”
In addition: Warning message:
In ans[!test & ok] <- rep(no, length.out = length(ans))[!test & :
number of items to replace is not a multiple of replacement length
Curiously, the first case works only because the column does not have the class you think: those datetimes are in fact POSIXct, not POSIXlt:
last_update <- rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2)
str(last_update)
#> POSIXlt[1:2], format: "2019-07-28 20:52:10" "2019-07-28 20:52:10"
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong#email1.com")),
last_update = last_update)
str(t)
#> 'data.frame': 2 obs. of 2 variables:
#> $ user : Factor w/ 2 levels "bshelton#email1.com",..: 1 2
#> $ last_update: POSIXct, format: "2019-07-28 20:52:10" "2019-07-28 20:52:10"
If you dig into ?data.frame, it says:
data.frame converts each of its arguments to a data frame by calling as.data.frame(optional = TRUE). As that is a generic function, methods can be written to change the behaviour of arguments according to their classes: R comes with many such methods. Character variables passed to data.frame are converted to factor columns unless protected by I or argument stringsAsFactors is false. If a list or data frame or matrix is passed to data.frame it is as if each component or column had been passed as a separate argument (except for matrices protected by I).
This is what's happening: as.data.frame.POSIXlt in fact converts to POSIXct:
now <- Sys.time()
str(now)
#> POSIXct[1:1], format: "2019-07-28 22:50:12"
str(data.frame(time = now))
#> 'data.frame': 1 obs. of 1 variable:
#> $ time: POSIXct, format: "2019-07-28 22:50:12"
as.data.frame.POSIXlt
#> function (x, row.names = NULL, optional = FALSE, ...)
#> {
#> value <- as.data.frame.POSIXct(as.POSIXct(x), row.names,
#> optional, ...)
#> if (!optional)
#> names(value) <- deparse(substitute(x))[[1L]]
#> value
#> }
#> <bytecode: 0x7fc938a11060>
#> <environment: namespace:base>
More immediately, since Sys.time() returns a POSIXct object, ifelse(t$user == "bshelton#email1.com", Sys.time(), t$last_update) in the second case receives a POSIXct object as its yes argument and a POSIXlt object as its no argument. ifelse drops the class attribute of the POSIXlt object, revealing the list underneath, which ifelse then doesn't know how to combine into a vector with the unclassed POSIXct object (which is just a number).
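A minimal base R sketch of the class-dropping behaviour described above; the values are arbitrary and only the classes matter:

```r
now <- Sys.time()

# ifelse() builds its result from the logical test, so the POSIXct class is lost
res <- ifelse(c(TRUE, FALSE), now, now - 60)
class(res)  # "numeric"

# Underneath its class, a POSIXlt object is a list of date-time components
is.list(unclass(as.POSIXlt(now)))  # TRUE

# A numeric result can be re-tagged as POSIXct by hand with .POSIXct()
restored <- .POSIXct(res)
class(restored)  # "POSIXct" "POSIXt"
```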
The solution here, then, is to follow the hint data.frame is giving you and use POSIXct instead of POSIXlt.
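A minimal sketch of the POSIXct route, assuming a temporary file in place of so_question.csv and the session's local time zone for both writing and reading. Logical indexing replaces ifelse(), so the class is never stripped:

```r
# Build the example with POSIXct timestamps
t <- data.frame(user = c("bshelton#email1.com", "lwong2#email1.com"),
                last_update = rep(Sys.time(), 2),
                stringsAsFactors = FALSE)

# Round-trip through a temporary CSV file instead of so_question.csv
f <- tempfile(fileext = ".csv")
write.csv(t, f, row.names = FALSE)
t <- read.csv(f, stringsAsFactors = FALSE)

# Parse the text column back into POSIXct (local time zone, as it was written)
t$last_update <- as.POSIXct(t$last_update)

# Update only the matching rows; `[<-` keeps the POSIXct class, unlike ifelse()
hit <- t$user == "bshelton#email1.com"
t$last_update[hit] <- Sys.time()
str(t)
```

If you need a particular zone, pass the same tz to both the creation step and as.POSIXct() so the text written to the file is read back consistently.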
If you really want to make it work with POSIXlt, you can iterate over the conditions and the POSIXlt vector with Map, using if/else (which preserves attributes including class, but only handles scalar conditions), and coerce the resulting list back to a vector with do.call(c, ...):
t <- data.frame(user = as.character(c("bshelton#email1.com", "lwong#email1.com")),
last_update = rep(as.POSIXlt(Sys.time(), tz = "America/Los_Angeles"), 2))
t$last_update <- as.POSIXlt(t$last_update)
t$last_update <- do.call(c, Map(
function(condition, last_update){
if (condition) {
as.POSIXlt(Sys.time() + 5)
} else {
last_update
}
},
condition = t$user == "bshelton#email1.com",
last_update = t$last_update
))
t
#> user last_update
#> 1 bshelton#email1.com 2019-07-28 23:11:04
#> 2 lwong#email1.com 2019-07-28 23:10:59
...but frankly that's a little silly. Just use POSIXct instead, and your life will be better.
Consider the following data frame as a reproducible example:
df <- data.frame(
"Date" =
c(
"2009-11-02", "2009-11-03", "2009-11-04", "2009-11-05", "2009-11-06",
"2009-11-09", "2009-11-10", "2009-11-12", "2009-11-13", "2009-11-16",
"2009-11-17", "2009-11-18", "2009-11-19", "2009-11-20"
),
"Open" = c(
64.97971, 64.64817, 63.88567, 64.34973, 67.16770, 67.63186,
69.48868, 68.95794, 70.08527, 72.47256,
72.53886, 72.73724, 71.07980, 69.75345
),
"High" = c(
65.47689, 65.14544, 65.44378, 66.96887, 68.75883, 69.62065, 70.81439, 73.26807,
71.07980, 73.13536,
73.26807, 72.93625, 71.87532, 72.27345
),
"Low" = c(
63.98508, 62.75843, 63.71976, 64.34973, 65.47689, 66.96887, 68.36125, 68.95794,
69.28966, 72.00803,
72.00803, 71.14620, 69.68705, 69.75345
),
"Close" = c(
64.64817, 62.85784, 65.21174, 66.96887, 65.70910, 69.62065, 70.81439, 71.94172, 70.61537, 72.53886,
72.80355, 71.60999, 69.68705, 70.94709
)
)
This is a data frame in OHLC format (Open, High, Low, Close).
Now let's convert this data frame into an xts object:
library(xts)
df <- as.xts(df, order.by = as.Date(df[, 1]))
Now I want to apply the to.period function, e.g.:
to.period(df, period = "days", k = 3)
I get this error:
'to.period(df, period = "days", k = 3)':unsupported type
I read about this error, and its source lies in the definition of an xts object. Because an xts object is a matrix, every column must have the same type. That is the problem here: for example, the "Open" column contains numeric values, while the first column contains dates. This is why as.xts() converts everything to character, the most general common type. However, even though I understand why it doesn't work, I have no idea how to make to.period work. Do you have any idea how it can be solved?
The issue is that the first column was not removed when constructing the xts object:
df <- as.xts(df[-1], order.by = as.Date(df[, 1]))
to.period(df, period = "days", k = 3)
# df.Open df.High df.Low df.Close
#2009-11-03 64.97971 65.47689 62.75843 62.85784
#2009-11-06 63.88567 68.75883 63.71976 65.70910
#2009-11-09 67.63186 69.62065 66.96887 69.62065
#2009-11-12 69.48868 73.26807 68.36125 71.94172
#2009-11-13 70.08527 71.07980 69.28966 70.61537
#2009-11-18 72.47256 73.26807 71.14620 71.60999
#2009-11-20 71.07980 72.27345 69.68705 70.94709
Without removing the first column, i.e. the character column 'Date', xts converts all of the data to character, because an xts object is also a matrix and a matrix can hold only a single type:
str(as.xts(df, order.by = as.Date(df[, 1])))
An ‘xts’ object on 2009-11-02/2009-11-20 containing:
Data: chr [1:14, 1:5] "2009-11-02" "2009-11-03" "2009-11-04" "2009-11-05" "2009-11-06" "2009-11-09" "2009-11-10" "2009-11-12" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "Date" "Open" "High" "Low" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
NULL
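The single-type rule comes from base R's matrix coercion and can be seen without xts at all. A minimal sketch using the first two rows of the question's data:

```r
# Two rows of the question's data: one character column, one numeric column
df <- data.frame(Date = c("2009-11-02", "2009-11-03"),
                 Open = c(64.97971, 64.64817),
                 stringsAsFactors = FALSE)

# A matrix holds a single type, so the numbers are coerced to character
m_all <- as.matrix(df)
typeof(m_all)  # "character"

# Dropping the Date column first keeps the matrix numeric
m_num <- as.matrix(df[-1])
typeof(m_num)  # "double"
```

The df[-1] in the fix above applies the same idea: only the numeric columns go into the data part, and the dates go into order.by.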
I am trying to convert an element of a matrix from what were Excel serial dates to a vector of Date objects before using plot().
I can create a vector and get the expected result:
library(chron)
# set date origin as defined in Excel
options(chron.origin = c(month=1, day=1, year=1900))
test_dates <- c(40917:40920)
test_dates
## [1] 40917 40918 40919 40920
chron(test_dates, out.format = "m/d/y")
## [1] 01/11/12 01/12/12 01/13/12 01/14/12
But when I try to use this on my actual vector, it does not work
# first 10 values in vector
pivot_pred$date
## [1] 40917 40918 40919 40920 40921 40922 40923 40924 40925 40926...
chron(pivot_pred$date, out.format = "m/d/y")
## Error in chron(dates. = floor(dts), times. = tms, format = format, out.format = out.format, :
##   misspecified chron format(s) length
I'm sure this is simple but I have tried many variations and none worked. Any suggestions on what I'm doing wrong?
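One possibility worth checking (not confirmed by the question) is that the imported column is a factor or character vector rather than numeric; str(pivot_pred$date) will show which. The sketch below assumes that, and swaps chron for base R's as.Date(). It also assumes the numbers come from Excel's default 1900 date system, whose serial numbers line up with an R origin of 1899-12-30; with that origin, 40917 maps to 2012-01-09, two days earlier than the chron output above. The factor is a hypothetical stand-in for pivot_pred$date:

```r
# Hypothetical stand-in for pivot_pred$date: serial numbers imported as a factor
serial <- factor(c(40917, 40918, 40919, 40920))
str(serial)

# as.numeric() on a factor returns level codes, so go through character first
serial_num <- as.numeric(as.character(serial))

# Excel's 1900 date system lines up with an R origin of 1899-12-30
dates <- as.Date(serial_num, origin = "1899-12-30")
format(dates, "%m/%d/%y")  # "01/09/12" "01/10/12" "01/11/12" "01/12/12"
```

The resulting Date vector can be passed straight to plot() as the x axis.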
I am trying my best at a simple event study in R, with some data retrieved from the Wharton Research Data Service (WRDS). I am not completely new to R, but I would describe my expertise level as intermediate. So, here is the problem. I am using the eventstudies package and one of the steps is converting the physical dates to event time frame dates with the phys2eventtime(..) function. This function takes multiple arguments:
z : time series data for which event frame is to be generated. In the form of an xts object.
Events : it is a data frame with two columns: unit and when. unit holds the name of the column whose response is to be measured on the event date, while when holds the event date.
Width : width corresponds to the number of days on each side of the event date. For a given width, if there is any NA in the event window then the last observation is carried forward.
The authors of the package have provided an example for the xts object (StockPriceReturns) and for Events (SplitDates). This looks like the following:
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
unit when
5 BHEL 2011-10-03
6 Bharti.Airtel 2009-07-24
8 Cipla 2004-05-11
9 Coal.India 2010-02-16
10 Dr.Reddy 2001-10-10
11 HDFC.Bank 2011-07-14
> head(StockPriceReturns)
Mahindra.&.Mahindra
2000-04-03 -8.3381609
2000-04-04 0.5923550
2000-04-05 6.8097616
2000-04-06 -0.9448889
2000-04-07 7.6843828
2000-04-10 4.1220462
2000-04-11 -1.9078480
2000-04-12 -8.3286900
2000-04-13 -3.8876847
2000-04-17 -8.2886060
So I have constructed my data in the same way, an xts object (DS_xts) and a data.frame (cDS) with the columns "unit" and "when". This is how it looks:
> head(DS_xts)
61241
2011-01-03 0.024247
2011-01-04 0.039307
2011-01-05 0.010589
2011-01-06 -0.022172
2011-01-07 0.018057
2011-01-10 0.041488
> head(cDS)
unit when
1 11754 2012-01-05
2 10104 2012-01-24
3 61241 2012-01-31
4 13928 2012-02-07
5 14656 2012-02-08
6 60097 2012-02-14
These are similar in my opinion, but how it looks does not tell the whole story. I am quite certain that my problem is in how I have constructed these two objects. Below is my R code:
#install.packages("eventstudies")
library("eventstudies")
DS = read.csv("ReturnData.csv")
cDS = read.csv("EventData.csv")
#Calculate Abnormal Returns
DS$AR = DS$RET - DS$VWRETD
#Clean up and let only necessary columns remain
DS = DS[, c("PERMNO", "DATE", "AR")]
cDS = cDS[, c("PERMNO", "DATE")]
#Generate correct date format according to R's as.Date
for (i in 1:nrow(DS)) {
DS$DATE[i] = format(as.Date(toString(DS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
for (i in 1:nrow(cDS)) {
cDS$DATE[i] = format(as.Date(toString(cDS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
#Rename cDS columns according to phys2eventtime format
colnames(cDS)[1] = "unit"
colnames(cDS)[2] = "when"
#Create list of unique PERMNO's
PERMNO <- unique(DS$PERMNO)
for (i in 1:length(PERMNO)) {
#Subset based on PERMNO
DStmp <- DS[DS$PERMNO == PERMNO[i], ]
#Remove PERMNO column and rename AR to PERMNO
DStmp <- DStmp[, c("DATE", "AR")]
colnames(DStmp)[2] = as.character(PERMNO[i])
dates <- as.Date(DStmp$DATE)
DStmp <- DStmp[, -c(1)]
#Create a temporary XTS object
DStmp_xts <- xts(DStmp, order.by = dates)
#If first iteration, just create new variable, otherwise merge
if (i == 1) {
DS_xts <- DStmp_xts
} else {
DS_xts <- merge(DS_xts, DStmp_xts, all = TRUE)
}
}
#Renaming columns for matching
colnames(DS_xts) <- c(PERMNO)
#Making sure classes are the same
cDS$unit <- as.character(cDS$unit)
eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
So, if I run phys2eventtime(..) it returns:
> eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
Error in if ((location <= 1) | (location >= length(x))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In findInterval(when, index(x)) : NAs introduced by coercion
I have looked at the original function (it is available on their GitHub) to figure out this error, but I have run out of ideas on how to debug it. I hope someone can help me sort it out. As a final note, I have also looked at another (magnificent) answer related to this R package (question: "format a zoo object with “dimnames”=List of 2"), but it wasn't enough to help me solve it (or I couldn't yet comprehend it).
Here is the link for the two CSV files if you would like to reproduce my error (or solve it!).
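The warning from findInterval(when, index(x)) suggests, though this is not confirmed here, that the when column of cDS is still character: the loop above stores the output of format(), which is a string, and findInterval() coerces its input with as.double(), which turns date strings into NA. An NA location would then make the if() condition fail with "missing value where TRUE/FALSE needed". A base R sketch of the effect, with a hypothetical daily Date vector standing in for index(DS_xts):

```r
# Event dates kept as character strings, as the formatting loop leaves them
when_chr <- c("2012-01-05", "2012-01-24")

# Hypothetical sorted Date vector standing in for index(DS_xts)
idx <- seq(as.Date("2011-01-03"), as.Date("2012-12-31"), by = "day")

# findInterval() coerces with as.double(): strings become NA (with a warning)
loc_chr <- suppressWarnings(findInterval(when_chr, idx))
loc_chr  # NA NA

# Converting to Date first yields usable positions in the index
loc_date <- findInterval(as.Date(when_chr), idx)
idx[loc_date]
```

If this is the cause, running cDS$when <- as.Date(cDS$when) before phys2eventtime() should be worth a try.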
I'm working with NASDAQ minute data, whose index looks like "2015-07-13 12:05:00 EST". I set the system time zone with Sys.setenv(TZ = 'EST').
I want to program a simple buy/hold/sell strategy, so I first create a vector of flat positions as a foundation.
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
Then I want to apply a constraint so that, within a certain time window, positions are forced to be flat, which in my case means equal to 1.
pos_flat["T13:41/T14:00"] <- 1
And this returns the error:
"Error in as.POSIXlt.POSIXct(.POSIXct(.index(x)), tz = indexTZ(x)) :invalid 'tz' value".
I also get this error doing other calculations, I just used this example because it is easy and shows the problem.
As extra information:
> Sys.timezone
function (location = TRUE)
{
tz <- Sys.getenv("TZ", names = FALSE)
if (nzchar(tz))
return(tz)
if (location)
return(.Internal(tzone_name()))
z <- as.POSIXlt(Sys.time())
zz <- attr(z, "tzone")
if (length(zz) == 3L)
zz[2L + z$isdst]
else zz[1L]
}
<bytecode: 0x03648ff4>
<environment: namespace:base>
I don't understand the problem with the tz value... Any ideas?
Your "invalid 'tz' value" error occurs because R doesn't accept a vector such as tz = df$var: the tz argument must be a single character string. If you set tz = 'America/New_York' or some other single character value, then it will work.
A better answer than the force_tz approach below for converting UTC times to various time zones based on location: it is simpler than looping through rows or nesting ifelse calls. I subset and change tz based on a timezone column (my data already has one; if yours does not, you can create it). Just make sure you account for all time zones in your data:
(unique(df$timezone))
df$datetime2[df$timezone == 'America/New_York'] <- format(df$datetime, tz="America/New_York")[df$timezone == 'America/New_York']
df$datetime2[df$timezone == 'America/Chicago'] <- format(df$datetime, tz="America/Chicago")[df$timezone == 'America/Chicago']
df$datetime2[df$timezone == 'America/Denver'] <- format(df$datetime, tz="America/Denver")[df$timezone == 'America/Denver']
df$datetime2[df$timezone == 'America/Los_Angeles'] <- format(df$datetime, tz="America/Los_Angeles")[df$timezone == 'America/Los_Angeles']
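The same subsetting can be driven by unique(df$timezone) instead of one hardcoded line per zone. A minimal sketch, assuming a UTC POSIXct column datetime and a character column timezone as in the answer (the two example rows are made up); each format() call receives a single tz string:

```r
# Hypothetical example rows: UTC instants plus a per-row target zone
df <- data.frame(
  datetime = as.POSIXct(c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), tz = "UTC"),
  timezone = c("America/Los_Angeles", "America/New_York"),
  stringsAsFactors = FALSE
)

# One format() call per zone, each with a single tz string
df$datetime2 <- NA_character_
for (zone in unique(df$timezone)) {
  rows <- df$timezone == zone
  df$datetime2[rows] <- format(df$datetime[rows], tz = zone)
}
df$datetime2  # "2015-12-12 05:34:56" "2015-12-14 11:23:32"
```

Like the hardcoded version, the result is character, not POSIXct, because a single POSIXct vector carries only one tzone attribute.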
Previous solution: Converting to Local Time in R - Vector of Timezones
library(lubridate)
library(dplyr)
df = data.frame(timestring = c("2015-12-12 13:34:56", "2015-12-14 16:23:32"), localzone = c("America/Los_Angeles", "America/New_York"), stringsAsFactors = FALSE)
df$moment = as.POSIXct(df$timestring, format="%Y-%m-%d %H:%M:%S", tz="UTC")
df = df %>% rowwise() %>% mutate(localtime = force_tz(moment, localzone))
df
You are getting errors because "EST" is not a valid timezone specification. It's an abbreviation that's often used when printing and displaying timezones.
The index is printed as "2015-07-13 12:05:00 EST" because "EST" probably represents Eastern Standard Time in the United States. If you want to set the TZ environment variable to that time zone, you should use Sys.setenv() with Area/Location (e.g. Continent/City) notation:
Sys.setenv(TZ = "America/New_York")
You can also set the timezone in the xts constructor:
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ), tzone = "America/New_York")
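A quick way to check a candidate name before passing it to Sys.setenv() or tzone is base R's OlsonNames(), which lists the zone names known to R's time zone database. A minimal sketch; "Eastern Standard Time" is a Windows-style display name chosen here only as an example of a string that is not a database name:

```r
# An Area/Location name and a Windows-style display name (example only)
candidates <- c("America/New_York", "Eastern Standard Time")
valid <- candidates %in% OlsonNames()
valid

# Parse an index-like timestamp in the first zone that passed the check
x <- as.POSIXct("2015-07-13 12:05:00", tz = candidates[valid][1])
format(x, "%Y-%m-%d %H:%M:%S %Z")
```

In July, America/New_York observes daylight saving time, so the abbreviation printed by %Z should be EDT rather than EST.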
Your error occurs because of a misinterpretation of the time object. You need to have UNIX timestamps in order to use something like
pos_flat["T13:41/T14:00"] <- 1
Try converting your index with something like this:
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
Since you want to use EST, you have to change your environment variables (if you do not live in the EST time zone). Altogether, this should work:
Sys.setenv(TZ = 'EST')
#load stuff
#...
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
pos_flat["T13:41/T14:00"] <- 1
For further information, have a look at the POSIXct and POSIXlt structures in R.
I am importing a CSV file into R, creating a 3x3 data frame, and attempting to convert the data frame to an xts object. But I get the error message "do not match the length of object".
library(data.table)  # provides fread() and setDF()
library(xts)
#DATSB <- fread("C:/Temp/GoogleDrive/R/temp.csv", select = c("DateTime","Last","Volume"))
# that results in the following dput() output:
DATSB <- structure(list(DateTime = c("3/28/2016 20:37", "3/28/2016 20:36","3/28/2016 20:35"), Last = c(1221.7, 1221.8, 1221.9), Volume = c(14L,2L, 22L)), .Names = c("DateTime", "Last", "Volume"), row.names = c(NA,3L), class = "data.frame")
setDF(DATSB)
DATSB$DateTime <- strptime(DATSB$DateTime, format = "%m/%d/%Y %H:%M")
DATSBxts <- as.xts(DATSB[, -2], order.by = as.Date(DATSB$DateTime, "%Y/%m/%d %H:%M"))
DateTime Last Volume
1 3/28/2016 20:37 1221.7 14
2 3/28/2016 20:36 1221.8 2
3 3/28/2016 20:35 1221.9 22
The exact error message is "Error in as.matrix.data.frame(x) :
dims [product 12] do not match the length of object [14]"
Somehow the root of the problem is the Volume column. Without that column, it works. Unfortunately, I can't figure out why. Thanks for your help!
There is a typo in DATSB[, -2]: it should be DATSB[, -1], and once that is corrected it works fine. The general pattern for xts is:
xts(data[,-date_column], order.by = data[,date_column])
Also, coredata(DATSBxts) and index(DATSBxts) are helpful functions for extracting the data and the time index of an xts object.
DATSBxts = xts(DATSB[, -1], order.by = DATSB[,1] ,dateFormat = "%Y/%m/%d %H:%M:%S");rev(DATSBxts)
DATSBxts
# Last Volume
#2016-03-28 20:35:00 1221.9 22
#2016-03-28 20:36:00 1221.8 2
#2016-03-28 20:37:00 1221.7 14
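A base R sketch of why the -2 index fails, rebuilding the question's data from its dput() output. It stores the timestamps as POSIXct with tz = "UTC" (an assumption; the question does not name a zone) instead of the POSIXlt returned by strptime():

```r
# The question's data, rebuilt from its dput() output
DATSB <- data.frame(
  DateTime = c("3/28/2016 20:37", "3/28/2016 20:36", "3/28/2016 20:35"),
  Last = c(1221.7, 1221.8, 1221.9),
  Volume = c(14L, 2L, 22L),
  stringsAsFactors = FALSE
)
# Parse to POSIXct (tz = "UTC" is an assumption) instead of POSIXlt
DATSB$DateTime <- as.POSIXct(DATSB$DateTime, format = "%m/%d/%Y %H:%M", tz = "UTC")

# -2 drops Last, so the date-time column is still in the data part
names(DATSB[, -2])  # "DateTime" "Volume"

# -1 drops the date-time column, leaving a purely numeric matrix
m <- as.matrix(DATSB[, -1])
typeof(m)  # "double"
```

With the numeric part isolated, xts(DATSB[, -1], order.by = DATSB$DateTime) follows the general pattern given above.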