I have a data frame that can have values like this:
p<-c("2012-08-14 9:00", "2012-08-14 7:00:00")
I am trying to convert it to datetime like this:
p<-as.POSIXct(p)
This used to convert everything to the form 2012-08-14 09:00:00, but
for some reason it is not working anymore. As you may have noticed, my data sometimes has seconds and sometimes does not.
How do you force this into a datetime format?
I get errors like this:
Error in as.POSIXlt.character(p) :
character string is not in a standard unambiguous format
Your vector isn't in a consistent format, so convert it to POSIXlt first because as.POSIXlt.character checks multiple formats.
p <- c("2012-08-14 9:00", "2012-08-14 7:00:00")
plt <- as.POSIXlt(p)
pct <- as.POSIXct(plt)
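If you would rather not rely on automatic format detection, the two layouts can be parsed explicitly. The following is only a sketch in base R: the helper name parse_mixed and the "UTC" default are assumptions made for illustration, not part of the original answer.

```r
# Sketch: parse a vector that mixes "H:MM" and "H:MM:SS" time parts.
# parse_mixed and its tz default are hypothetical, chosen for this example.
parse_mixed <- function(x, tz = "UTC") {
  # First pass: strings that include seconds; the others come back as NA
  out <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = tz)
  # Second pass: fill the NAs using the layout without seconds
  missing_secs <- is.na(out)
  out[missing_secs] <- as.POSIXct(x[missing_secs],
                                  format = "%Y-%m-%d %H:%M", tz = tz)
  out
}

p <- c("2012-08-14 9:00", "2012-08-14 7:00:00")
pct <- parse_mixed(p)
format(pct, "%Y-%m-%d %H:%M:%S")
```

Any string that matches neither layout stays NA, which makes bad input easy to spot with is.na(pct).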
The lubridate package may help.
Here is an example, perhaps not the most elegant one:
p <- c("2012-08-14 9:00", "2012-08-14 7:00:00")
library(lubridate)
NewDate <- c()
for (i in seq_along(p))
{
  # The length of the time part tells the two layouts apart: "9:00" vs "7:00:00"
  time_part <- unlist(strsplit(p[i], ' '))[2]
  if (nchar(time_part) == 4) {NewDate <- c(NewDate, as.character(ymd_hm(p[i])))}
  if (nchar(time_part) == 7) {NewDate <- c(NewDate, as.character(ymd_hms(p[i])))}
}
NewDate
Thanks for your help.
One of variable in my dataset looks like this:
> df$TM
> [1] "000054" "000020" "000056" "000051" "000025" "000116" "000219" "000207" "000233" "000206" "000142" "000126" "000237" "000215" "000236" "000246" "000219"
[18] "000227" "000803" "000920"...
The characters in each value encode hours, minutes and seconds (HHMMSS).
When I apply the hms function from lubridate as follows
> df$TM <- hms(df$TM)
Warning message is coming: "In .parse_hms(..., order = "HMS", quiet = quiet) :
Some strings failed to parse, or all strings are NAs"
After that, all the values in the column changes to NA.
I also tried
> df$TM <- as.POSIXct(df$TM, format = "%H:%M:%S")
and
> df$TM <- chronicle(times = df$TM)
and
> df$TM <- strptime(df$TM, format = "%H:%M:%S")
but these three attempts give the same result.
(All the data changes to NA, so to me the warning message is effectively an error message.)
I really appreciate your help.
You can make use of this answer to insert a colon after every second character. After that you can convert the resulting character string to a date-time (with day, month and year) or leave it as is.
For completeness, the solution for your problem then is
as.POSIXct(sub(":+$", "", gsub('(.{2})', '\\1:', df$TM)), format = "%H:%M:%S")
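The earlier attempts return NA because the format "%H:%M:%S" expects colons that the strings do not contain. As an alternative sketch in base R, the fields can be cut out by position, or the string can be parsed with a format that has no separators. The vector tm below is a small sample copied from the question; the other variable names are made up for illustration.

```r
# Sample values in HHMMSS layout, taken from the question's output
tm <- c("000054", "000116", "000803")

# Option 1: cut the fixed-width fields out by position
h <- as.integer(substr(tm, 1, 2))
m <- as.integer(substr(tm, 3, 4))
s <- as.integer(substr(tm, 5, 6))
seconds_since_midnight <- h * 3600L + m * 60L + s
clock_text <- sprintf("%02d:%02d:%02d", h, m, s)

# Option 2: parse directly with a separator-free format
# (strptime fills in the current date, so only the time part is meaningful)
parsed <- strptime(tm, format = "%H%M%S", tz = "UTC")
format(parsed, "%H:%M:%S")
```

Option 1 avoids attaching an arbitrary date to the values, which may be preferable when only the time of day matters.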
My Question is divided into 2 parts:
1st part:
I have a function, get_data(), which I use to pull information for a date range.
get_data <- function (fac_num, start_date, end_date) {
if (!(is.null(fac_num) | is.null(start_date) | is.null(end_date))) {
if(end_date - start_date > 7) {
start_date <- end_date - 7
#start_date <- as.Date('2017-07-05')
#end_date <- as.Date('2017-07-06')
#fac_num <- "005"
}
new_start_date <- paste0(start_date,' 05:00:00')
new_end_date <- paste0(end_date + 1,' 05:00:00')
qry <- paste0("SELECT FAC_NUM, USER_ID, APPL_ID, FUNC_ID, ST_ID, NXT_ST_ID, RESP_PRMT_DATA,
ST_DT_TM, END_DT_TM, RESP_PRMT_TY_CDE,
REQ_INP_DATA FROM OPSDBA.STG_RFS_INTERACTION WHERE TRANS_ST_DT_TM >= DATE'",
start_date,"' AND TRANS_ST_DT_TM BETWEEN TO_TIMESTAMP('",new_start_date,"', 'YYYY-MM-DD HH:MI:SS') AND TO_TIMESTAMP('",new_end_date,"', 'YYYY-MM-DD HH:MI:SS')
AND APPL_ID='CTS' AND FAC_NUM='",fac_num,"'")
and then I perform calculations on it.
Further on in my program, I use this get_data() function to pull data for a new set of analyses.
rf_log_perform <- get_data(display_facility_decode(input$facNum2),
input$dateRange2, input$dateRange2 + 1)
Since I am using a single date here instead of a range, I add one day to it so that get_data() works.
I then wanted to modify the date range so that it does not show anything past 11:59 PM on the selected date.
rf_log_perform$date <- ifelse(strftime(rf_log_perform$st_dt_tm, format="%H:%M:%S")<'05:00:00',
format(as.POSIXct(strptime(rf_log_perform$st_dt_tm - 1*86400 , '%Y-%m-%d %H:%M:%S')),format = '%Y-%m-%d'),
format(as.POSIXct(strptime(rf_log_perform$st_dt_tm , '%Y-%m-%d %H:%M:%S')),format = '%Y-%m-%d'))
Using get_data(), I can pull data for the range 08/29/2017 05:00:00 to 08/30/2017 05:00:00, which is considered to be one day in my example.
But for my calculations, I want to discard everything after 08/29/2017 11:59:59 PM to get more accurate results.
For this purpose, I added the ifelse statement above. It isn't behaving as I expect, and I am confused about why not.
Unfortunately I still cannot comment on the main question.
I encourage you to make two adjustments to your question to improve the chances of getting an answer:
1) Please make your example reproducible, e.g. provide date ranges, wrap your code in a well-defined function, etc.
2) Explain what you are trying to achieve: what is your intention and what is the expected result?
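In the meantime, if the goal is simply to keep only the rows whose timestamp falls on the selected calendar date, one possible approach is to compare calendar dates directly. This is a sketch only: log_df is a hypothetical stand-in for rf_log_perform, and the "UTC" time zone is an assumption that would need to match the real data.

```r
# Hypothetical stand-in for rf_log_perform with a POSIXct column st_dt_tm
log_df <- data.frame(
  st_dt_tm = as.POSIXct(c("2017-08-29 05:30:00", "2017-08-29 23:59:59",
                          "2017-08-30 00:00:00", "2017-08-30 04:59:59"),
                        tz = "UTC"),
  value = 1:4
)

selected_date <- as.Date("2017-08-29")

# Keep rows whose calendar date (in the stated time zone) is the selected one
same_day <- as.Date(log_df$st_dt_tm, tz = "UTC") == selected_date
kept <- log_df[same_day, ]
kept
```

Passing tz explicitly to as.Date matters here, because the calendar date of a POSIXct value depends on the time zone it is viewed in.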
I am trying my best at a simple event study in R, with some data retrieved from the Wharton Research Data Service (WRDS). I am not completely new to R, but I would describe my expertise level as intermediate. So, here is the problem. I am using the eventstudies package and one of the steps is converting the physical dates to event time frame dates with the phys2eventtime(..) function. This function takes multiple arguments:
z: time series data for which the event frame is to be generated, in the form of an xts object.
events: a data frame with two columns, unit and when. unit holds the name of the column whose response is to be measured on the event date, while when holds the event date.
width: the number of days on each side of the event date. For a given width, if there is any NA in the event window, the last observation is carried forward.
The authors of the package have provided an example for the xts object (StockPriceReturns) and for Events (SplitDates). This looks like the following:
> data(StockPriceReturns)
> data(SplitDates)
> head(SplitDates)
            unit       when
5           BHEL 2011-10-03
6  Bharti.Airtel 2009-07-24
8          Cipla 2004-05-11
9     Coal.India 2010-02-16
10      Dr.Reddy 2001-10-10
11     HDFC.Bank 2011-07-14
> head(StockPriceReturns)
           Mahindra.&.Mahindra
2000-04-03          -8.3381609
2000-04-04           0.5923550
2000-04-05           6.8097616
2000-04-06          -0.9448889
2000-04-07           7.6843828
2000-04-10           4.1220462
2000-04-11          -1.9078480
2000-04-12          -8.3286900
2000-04-13          -3.8876847
2000-04-17          -8.2886060
So I have constructed my data in the same way, an xts object (DS_xts) and a data.frame (cDS) with the columns "unit" and "when". This is how it looks:
> head(DS_xts)
               61241
2011-01-03  0.024247
2011-01-04  0.039307
2011-01-05  0.010589
2011-01-06 -0.022172
2011-01-07  0.018057
2011-01-10  0.041488
> head(cDS)
   unit       when
1 11754 2012-01-05
2 10104 2012-01-24
3 61241 2012-01-31
4 13928 2012-02-07
5 14656 2012-02-08
6 60097 2012-02-14
These are similar in my opinion, but how it looks does not tell the whole story. I am quite certain that my problem is in how I have constructed these two objects. Below is my R code:
#install.packages("eventstudies")
library("eventstudies")
DS = read.csv("ReturnData.csv")
cDS = read.csv("EventData.csv")
#Calculate Abnormal Returns
DS$AR = DS$RET - DS$VWRETD
#Clean up and let only necessary columns remain
DS = DS[, c("PERMNO", "DATE", "AR")]
cDS = cDS[, c("PERMNO", "DATE")]
#Generate correct date format according to R's as.Date
for (i in 1:nrow(DS)) {
DS$DATE[i] = format(as.Date(toString(DS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
for (i in 1:nrow(cDS)) {
cDS$DATE[i] = format(as.Date(toString(cDS$DATE[i]), format = "%Y %m %d"), format = "%Y-%m-%d")
}
#Rename cDS columns according to phys2eventtime format
colnames(cDS)[1] = "unit"
colnames(cDS)[2] = "when"
#Create list of unique PERMNO's
PERMNO <- unique(DS$PERMNO)
for (i in 1:length(PERMNO)) {
#Subset based on PERMNO
DStmp <- DS[DS$PERMNO == PERMNO[i], ]
#Remove PERMNO column and rename AR to PERMNO
DStmp <- DStmp[, c("DATE", "AR")]
colnames(DStmp)[2] = as.character(PERMNO[i])
dates <- as.Date(DStmp$DATE)
DStmp <- DStmp[, -c(1)]
#Create a temporary XTS object
DStmp_xts <- xts(DStmp, order.by = dates)
#If first iteration, just create new variable, otherwise merge
if (i == 1) {
DS_xts <- DStmp_xts
} else {
DS_xts <- merge(DS_xts, DStmp_xts, all = TRUE)
}
}
#Renaming columns for matching
colnames(DS_xts) <- c(PERMNO)
#Making sure classes are the same
cDS$unit <- as.character(cDS$unit)
eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
So, if I run phys2eventtime(..) it returns:
> eventList <- phys2eventtime(z = DS_xts, events = cDS, width = 10)
Error in if ((location <= 1) | (location >= length(x))) { :
missing value where TRUE/FALSE needed
In addition: Warning message:
In findInterval(when, index(x)) : NAs introduced by coercion
I have looked at the original function (it is available at their GitHub, can't use more than two links yet) to figure out this error, but I ran out of ideas how to debug it. I hope someone can help me sort it out. As a final note, I have also looked at another (magnificent) answer related to this R package (question: "format a zoo object with “dimnames”=List of 2"), but it wasn't enough to help me solve it (or I couldn't yet comprehend it).
Here is the link for the two CSV files if you would like to reproduce my error (or solve it!).
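One thing that may be worth checking (this is a guess based on the warning, not a confirmed diagnosis): "NAs introduced by coercion" in findInterval(when, index(x)) is what you would see if the when column is character rather than Date, since a string like "2012-01-05" cannot be converted to a number. The sketch below uses small made-up objects (events, series_dates, series_units) in place of cDS and DS_xts and relies on base R only.

```r
# Hypothetical miniature of the events data frame, with 'when' read as text
events <- data.frame(unit = c("11754", "61241"),
                     when = c("2012-01-05", "2012-01-31"),
                     stringsAsFactors = FALSE)
class(events$when)  # character, so numeric coercion would give NA

# Convert to Date so it is comparable with a Date index
events$when <- as.Date(events$when, format = "%Y-%m-%d")

# Made-up stand-ins for index(DS_xts) and colnames(DS_xts)
series_dates <- as.Date(c("2012-01-03", "2012-01-04",
                          "2012-01-05", "2012-01-31"))
series_units <- c("61241", "10104")

# With Date input, findInterval locates each event date in the index
pos <- findInterval(events$when, series_dates)

# Also check that every event unit actually has a column in the series
has_series <- events$unit %in% series_units
```

Events whose unit has no matching column, or whose date lies outside the series, may need to be dropped or handled separately before calling phys2eventtime.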
I'm working with minute data for NASDAQ, which has an index like "2015-07-13 12:05:00 EST". I set the system time zone with Sys.setenv(TZ = 'EST').
I want to program a simple buy/hold/sell strategy, therefore I create a vector of flat positions as a foundation.
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
Then I want to apply a constraint, that in a certain time window, positions are bound to be flat, which in my case means equal to 1.
pos_flat["T13:41/T14:00"] <- 1
And this returns the error:
"Error in as.POSIXlt.POSIXct(.POSIXct(.index(x)), tz = indexTZ(x)) :invalid 'tz' value".
I also get this error doing other calculations, I just used this example because it is easy and shows the problem.
As extra information:
> Sys.timezone
function (location = TRUE)
{
tz <- Sys.getenv("TZ", names = FALSE)
if (nzchar(tz))
return(tz)
if (location)
return(.Internal(tzone_name()))
z <- as.POSIXlt(Sys.time())
zz <- attr(z, "tzone")
if (length(zz) == 3L)
zz[2L + z$isdst]
else zz[1L]
}
<bytecode: 0x03648ff4>
<environment: namespace:base>
I don't understand the problem with the tz value... Any ideas?
The source of your "invalid 'tz' value" error is that tz must be a single character string, so R doesn't accept a vector such as tz = df$var. If you set tz = 'America/New_York' or some other single character value, then it will work.
A better answer (instead of using force_tz below) for converting UTC times to various time zones based on location. It is also simpler and better than looping through or using a nested ifelse. I subset and change tz based on a timezone column (which my data already has; if yours does not, you can create it). Just make sure you account for all time zones in your data
(check unique(df$timezone)):
df$datetime2[df$timezone == 'America/New_York'] <- format(df$datetime, tz="America/New_York")[df$timezone == 'America/New_York']
df$datetime2[df$timezone == 'America/Chicago'] <- format(df$datetime, tz="America/Chicago")[df$timezone == 'America/Chicago']
df$datetime2[df$timezone == 'America/Denver'] <- format(df$datetime, tz="America/Denver")[df$timezone == 'America/Denver']
df$datetime2[df$timezone == 'America/Los_Angeles'] <- format(df$datetime, tz="America/Los_Angeles")[df$timezone == 'America/Los_Angeles']
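When the data contains many distinct zones, the same subsetting idea can be written once over unique(df$timezone) instead of one line per zone. This is a sketch with a made-up two-row data frame; the column names follow the answer above, and the explicit format string is an assumption added so the output layout is fixed.

```r
# Made-up example data: UTC instants plus the local zone of each row
df <- data.frame(
  datetime = as.POSIXct(c("2015-12-12 13:34:56", "2015-12-14 16:23:32"),
                        tz = "UTC"),
  timezone = c("America/Los_Angeles", "America/New_York"),
  stringsAsFactors = FALSE
)

# Local time is stored as text, because a POSIXct vector has a single tz
df$datetime2 <- NA_character_
for (tz in unique(df$timezone)) {
  rows <- df$timezone == tz
  df$datetime2[rows] <- format(df$datetime[rows],
                               format = "%Y-%m-%d %H:%M:%S", tz = tz)
}
df
```

Each pass formats only the rows of one zone, so tz is always a single string, which is what format() requires.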
Previous solution: Converting to Local Time in R - Vector of Timezones
library(lubridate)
library(dplyr)
df = data.frame(timestring = c("2015-12-12 13:34:56", "2015-12-14 16:23:32"),
                localzone = c("America/Los_Angeles", "America/New_York"),
                stringsAsFactors = FALSE)
df$moment = as.POSIXct(df$timestring, format="%Y-%m-%d %H:%M:%S", tz="UTC")
df = df %>% rowwise() %>% mutate(localtime = force_tz(moment, localzone))
df
You are getting errors because "EST" is not a valid timezone specification. It's an abbreviation that's often used when printing and displaying timezones.
The index is printed as "2015-07-13 12:05:00 EST" because "EST" probably represents Eastern Standard Time in the United States. If you want to set the TZ environment variable to that timezone, you should use Sys.setenv() with Country/City notation:
Sys.setenv(TZ = "America/New_York")
You can also set the timezone in the xts constructor:
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ), tzone = "America/New_York")
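Before setting TZ, it can help to confirm that the name is one R knows about. A minimal base R sketch, assuming the time zone database is available on your system (tz_name and the sample timestamp are illustrative):

```r
# Check a Country/City style name against the names R knows about
tz_name <- "America/New_York"
is_known <- tz_name %in% OlsonNames()

# Build a timestamp in that zone and confirm the zone is attached
x <- as.POSIXct("2015-07-13 12:05:00", tz = tz_name)
attr(x, "tzone")
format(x, "%Y-%m-%d %H:%M:%S")
```

OlsonNames() lists the zone names from the installed database, so the same check works for any other region.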
Your error occurs because the time index is being misinterpreted. You need a proper POSIXct index (that is, UNIX timestamps) in order to use time-of-day subsetting like
pos_flat["T13:41/T14:00"] <- 1
Try a conversion of your indices by doing something like this:
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
As you want to use EST, you have to change your environment variables (if you are not in the EST time zone). So all in all, this should work:
Sys.setenv(TZ = 'EST')
#load stuff
#...
index(NASDAQ) <- as.POSIXct(strptime(index(NASDAQ), "%Y-%m-%d %H:%M:%S"))
pos_flat <- xts(rep(0, nrow(NASDAQ)), index(NASDAQ))
pos_flat["T13:41/T14:00"] <- 1
For further information, have a look at the POSIXct and POSIXlt structures in R.
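As a small illustration of those two structures (a base R sketch with made-up timestamps, not xts code): POSIXct stores seconds since 1970-01-01 UTC, while POSIXlt exposes fields such as hour and min, which makes a time-of-day window easy to express.

```r
# Made-up minute timestamps around the window from the question
idx <- as.POSIXct(c("2015-07-13 13:40:00", "2015-07-13 13:41:00",
                    "2015-07-13 14:00:00", "2015-07-13 14:01:00"),
                  tz = "UTC")

# POSIXct is a number of seconds since the epoch
secs <- as.numeric(idx)

# POSIXlt is a list-like structure with calendar fields
lt <- as.POSIXlt(idx)
minute_of_day <- lt$hour * 60 + lt$min

# Base R analogue of a 13:41 to 14:00 window (not the xts "T13:41/T14:00" syntax)
in_window <- minute_of_day >= 13 * 60 + 41 & minute_of_day <= 14 * 60
pos_demo <- ifelse(in_window, 1, 0)
```

The round trip as.POSIXct(secs, origin = "1970-01-01", tz = "UTC") gives back the original timestamps, which shows that the number alone carries the instant and the tz only affects how it is displayed.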
Best regards
While splitting a dataset I end up with the following error, and I am looking for advice on how to overcome it:
dt=read.csv("C:/xx/fData.csv")
testdata = dt[year(dt$date) < 2010,]
valid = dt[year(dt$date) > 2010,]
> training = dt[year(dt$date) < 2010,]
Error in as.POSIXlt.character(as.character(x), ...) :
character string is not in a standard unambiguous format
> testing = dt[year(dt$date) > 2010,]
Error in as.POSIXlt.character(as.character(x), ...) :
PS: fData looks like this:
  | date     | number
----------------------
1 | 1/1/2011 | 0
2 | 1/2/2011 | 0
Given that the first part of your string is the day of the month and the second part is the month, you should convert your string to a Date before calling the year function:
dt$date <- as.Date(dt$date, format = '%d/%m/%Y')
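Put together, the split could look like the sketch below. It uses base R only, so year() is replaced by format(..., "%Y"). The data frame is a made-up sample (the first date is changed to 2009 so that both subsets are non-empty), and it keeps the answer's day/month/year assumption.

```r
# Made-up sample in the same layout as fData
dt <- data.frame(date = c("1/1/2009", "1/2/2011"),
                 number = c(0, 0),
                 stringsAsFactors = FALSE)

# Convert the text to Date first, assuming day/month/year order
dt$date <- as.Date(dt$date, format = "%d/%m/%Y")

# Extract the year without extra packages, then split
yr <- as.integer(format(dt$date, "%Y"))
training <- dt[yr < 2010, ]
testing  <- dt[yr > 2010, ]
```

Note that with these two conditions, rows from 2010 itself end up in neither subset; use <= or >= on one side if they should be included.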