I'm playing around with functions in R and want to create a function that takes a character variable and converts it to a POSIXct.
The time variable currently looks like this:
"2020-01-01T05:00:00.283236Z"
I've successfully converted the time variable in my janviews dataset with the following code:
janviews$time <- gsub('T',' ',janviews$time)
janviews$time <- as.POSIXct(janviews$time, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
Since I have to do this on multiple datasets, I want to wrap it in a function. I created the following function, but it doesn't seem to work and I'm not sure why:
set.time <- function(dat, variable.name){
dat$variable.name <- gsub('T', ' ', dat$variable.name)
dat$variable.name <- as.POSIXct(dat$variable.name, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
}
Here are the first four rows of the janviews dataset:
structure(list(customer_id = c("S4PpjV8AgTBx", "p5bpA9itlILN",
"nujcp24ULuxD", "cFV46KwexXoE"), product_id = c("kq4dNGB9NzwbwmiE",
"FQjLaJ4B76h0l1dM", "pCl1B4XF0iRBUuGt", "e5DN2VOdpiH1Cqg3"),
time = c("2020-01-01T05:00:00.283236Z", "2020-01-01T05:00:00.895876Z",
"2020-01-01T05:00:01.362329Z", "2020-01-01T05:00:01.873054Z"
)), row.names = c(NA, -4L), class = c("data.table", "data.frame"
), .internal.selfref = <pointer: 0x1488180e0>)
Also, if there is a better way to convert my time variable, I am open to changing my method!
I would use the lubridate package and the as_datetime() function.
lubridate::as_datetime("2020-01-01T05:00:00.283236Z")
Returns
"2020-01-01 05:00:00 UTC"
More information is in the lubridate documentation: https://lubridate.tidyverse.org/
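If you want to wrap this in a function that works on any dataset and column name, here is a minimal sketch built on as_datetime(); set.time, dat, and variable.name are just the names from the question, and it assumes the column always holds ISO 8601 strings like the one above. The original function has two separate issues: dat$variable.name looks for a column literally named "variable.name" (use [[ ]] with the string instead), and the modified data is never returned.
library(lubridate)
set.time <- function(dat, variable.name) {
  # [[ ]] looks the column up by the string stored in variable.name
  dat[[variable.name]] <- as_datetime(dat[[variable.name]])
  # return the modified data so the caller can reassign it
  dat
}
# usage, e.g.: janviews <- set.time(janviews, "time")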
I am working with data in R and would like to change the time zone of some POSIXct data, but only for certain rows within the columns (Survey_Start and Survey_End). Some of the data is already in the proper time zone, so converting the entire column is a problem. My code to change the time zone is:
herps2021 <- herps2021 %>%
  mutate(Survey_Start = as.POSIXct(Survey_Start, format = "%H:%M:%S", tz = "UTC"),
         Survey_End = as.POSIXct(Survey_End, format = "%H:%M:%S", tz = "UTC"),
         # Change to proper time zone
         Survey_Start = with_tz(Survey_Start, tzone = "America/Los_Angeles"),
         Survey_End = with_tz(Survey_End, tzone = "America/Los_Angeles"))
Is there a way to specify which rows for the columns Survey_Start and Survey_End I want to convert, so that the data already in the correct time zone is unaffected?
Thanks!
You could try using parse_date_time(), which allows you to parse multiple date and time formats in a column.
It looks something like this:
library(lubridate)
parse_date_time(c("2016", "2016-04"), orders = c("Y", "Ym"))
#> [1] "2016-01-01 UTC" "2016-04-01 UTC"
Here is the link to the documentation: https://lubridate.tidyverse.org/reference/parse_date_time.html
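For the row-level part of the question, here is one possible sketch (not from the answer above), assuming a hypothetical logical column recorded_in_utc marking the rows that still need converting. with_tz() shifts the clock time but keeps the underlying instant; force_tz() keeps the clock time and only relabels the zone, which is what the already-correct rows need, since a single POSIXct column can only carry one time zone attribute:
library(dplyr)
library(lubridate)
herps2021 <- herps2021 %>%
  mutate(Survey_Start = as.POSIXct(Survey_Start, format = "%H:%M:%S", tz = "UTC"),
         # rows recorded in UTC are shifted; rows already in local time are only relabelled
         Survey_Start = if_else(recorded_in_utc,
                                with_tz(Survey_Start, tzone = "America/Los_Angeles"),
                                force_tz(Survey_Start, tzone = "America/Los_Angeles")))
Survey_End would be handled the same way.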
I have a data set like the one below and I am trying to impute values as shown below.
ID   In                    Out
4    2019-09-20 21:57:22   NA
4    NA                    2019-09-21 5:07:03
When there are NAs in the lead and lag for each ID, I am trying to impute the times so that the previous day is cut off and a new time starts for the next day. I was doing it like this, but I am getting an error:
df1%>%
group_by(ID) %>%
mutate(In= ifelse(is.na(In) & is.na(lag(Out)),
as.POSIXct(as.character(paste(as.Date(In),"05:00:01"))),
In)) %>%
mutate(Out= ifelse(is.na(Out) & lead(In) == "05:00:01",
as.POSIXct(as.character(paste(as.Date(Out),"05:00:00"))),
Out))
The desired output would be:
ID   In                    Out
4    2019-09-20 21:57:22   2019-09-21 05:00:00
4    2019-09-21 5:00:01    2019-09-21 5:07:03
Dput for the data
structure(list(concat = c("176 - 2019-09-20", "176 - 2019-09-20",
"176 - 2019-09-20", "176 - 2019-09-20", "176 - 2019-09-21"),
ENTRY = structure(c(1568989081, 1569008386, 1569016635, 1569016646,
NA), class = c("POSIXct", "POSIXt"), tzone = "UTC"), EXIT = structure(c(1569005439,
1569014914, 1569016645, NA, 1569042433), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -5L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000000007e21ef0>)
Finally, I got the desired output by separating the date and time and pasting them back together. This is definitely not an efficient way to achieve it. Maybe someone can suggest a more efficient way to do this, which would at least provide some learning.
df %>%
  # split each timestamp into its date and time components
  mutate(ENTRY_date = as.Date(ENTRY)) %>%
  mutate(EXIT_date = as.Date(EXIT)) %>%
  mutate(ENTRY_time = format(ENTRY, "%H:%M:%S")) %>%
  mutate(EXIT_time = format(EXIT, "%H:%M:%S")) %>%
  # fill the missing dates from the neighbouring row
  mutate(Entry_date1 = if_else(is.na(ENTRY_date) & is.na(lag(EXIT_date)), EXIT_date, ENTRY_date)) %>%
  mutate(Exit_date1 = if_else(is.na(EXIT_date) & is.na(lead(ENTRY_date)), ENTRY_date, EXIT_date)) %>%
  # fill the missing times with the cut-off times
  mutate(Entry_time1 = if_else(is.na(ENTRY_time) & is.na(lag(EXIT_time)), "05:00:01", ENTRY_time)) %>%
  mutate(Exit_time1 = if_else(is.na(EXIT_time) & is.na(lead(ENTRY_time)), "04:59:59", EXIT_time)) %>%
  # paste the parts back together as POSIXct
  mutate(ENTRY1 = as.POSIXct(paste(Entry_date1, Entry_time1), format = "%Y-%m-%d %H:%M:%S")) %>%
  mutate(EXIT1 = as.POSIXct(paste(Exit_date1, Exit_time1), format = "%Y-%m-%d %H:%M:%S"))
First, using your dput() data did not work for me. Anyway, if I understand your question correctly you can do it like this:
# load package
library(lubridate)
# replace missing In values with the corresponding Out values,
# setting 5:00:01 as time.
df$In[is.na(df$In)] <- ymd_hms(paste0(as.Date(df$Out[is.na(df$In)]), " 5:00:01"))
# same idea but first we save it as a vector...
Out <- ymd_hms(paste0(as.Date(df$In[is.na(df$Out)]), " 5:00:00"))
# ... then we add one day
day(Out) <- day(Out) + 1; df$Out[is.na(df$Out)] <- Out
This works for the data that you provided, but if an Out time is 2019-09-21 04:07:03, for example, then the corresponding In time would be later, namely 2019-09-21 05:00:01. I do not know if this is intended. If not, please clarify your question.
I used this data
structure(list(In = structure(c(1569016642, NA), tzone = "UTC", class = c("POSIXct",
"POSIXt")), Out = structure(c(NA, 1569042423), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), .Names = c("In", "Out"), row.names = c(NA, -2L), class = "data.frame")
I've never found an efficient way to solve a problem I encounter every time I try to combine time series data from different sources. By different sources, I mean combining, say, a data source from the internet (Yahoo stock prices) with a local CSV time series.
yahoo.xts # variable containing security prices from yahoo
local.xts # local time series data
cbind(yahoo.xts,local.xts) # combine them
The result is a combined xts object with different times for the same date. What I want is to ignore the time of day and align the series by date. The way I've been solving this is to extract the index of each source, convert it with as.Date(), and re-wrap the data as xts objects. My question is whether there is a better, more efficient way that I have missed.
Note: I am unsure how to provide a good example of a local data source so that you can replicate the problem, but the following snippet shows how to get the online data.
require(quantmod)
data.etf = new.env()
getSymbols.av(c('XOM','AAPL'), src = "av", api.key = "your-own-key", from = '1970-01-01',
              adjusted = TRUE, output.size = "full", env = data.etf,
              set.symbolnames = T, auto.assign = T)
yahoo.xts = Cl(data.etf$XOM)
Here's some data:
Yahoo:
structure(c(112.68, 109.2, 107.86, 104.35, 104.68, 110.66), class = c("xts",
"zoo"), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct",
"POSIXt"), .indexTZ = "America/Chicago", tzone = "America/Chicago", index = structure(c(1508457600,
1508716800, 1508803200, 1508889600, 1508976000, 1509062400), tzone = "America/Chicago", tclass = c("POSIXct",
"POSIXt")), .Dim = c(6L, 1L), .Dimnames = list(NULL, "XIV"))
Local structure:
structure(c(0.176601541324807, -0.914132074513824, -0.0608652702022332,
-0.196679777210441, -0.190397155984135, 0.915313388202916, -0.0530280808936784,
0.263895885521142, 0.10844973759151, 0.0547864992300319, 0.0435149080877898,
-0.202388932508539, 0.0382888645282672, -0.00800908217028123,
-0.0798424223984417, 0.00268898461896916, 0.00493307845560457,
0.132697099147406, 0.074267173330532, -0.336299384720176, -0.0859815663679892,
-0.0597168456705514, -0.0867777000321366, 0.283394650847026,
-0.0100414455118704, 0.106355723615723, -0.0640682814821423,
0.0481841070155836, -0.00321273561708742, -0.13182105331959), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = structure("America/Chicago", .Names = "TZ"), tzone = structure("America/Chicago", .Names = "TZ"), class = c("xts",
"zoo"), na.action = structure(1L, class = "omit", index = 1080540000), index = structure(c(1508475600,
1508734800, 1508821200, 1508907600, 1508994000, 1509080400), tzone = structure("America/Chicago", .Names = "TZ"), tclass = c("POSIXct",
"POSIXt")), .Dim = c(6L, 5L), .Dimnames = list(NULL, c("D.30",
"D.60", "D.90", "D.120", "D.150")))
If you understand the source of your problem, perhaps you can avoid it in the first place.
Your problem is that the 19:00:00 stamps in your printed results correspond to UTC dates (as at 12 AM UTC) converted to "America/Chicago" POSIXct timestamps when the merge happens.
As you've pointed out, one solution is to make new xts time indexes that are all of Date format, but that does get annoying. It's best to avoid the situation in the first place if you can; otherwise you have to resort to changing the Date time series to a POSIXct time series with appropriate time zones.
The key thing to understand when you have misaligned xts objects holding date data (or, more precisely, what you think is date data) is that the time zones in the objects do not align. If the time zones of the time indexes align, the merge happens without the undesirable behaviour. Of course, Date objects don't have time zones, and by default they are given the time zone "UTC" when merged with xts objects whose indexes are POSIXct.
# reproduce your data (your code isn't fully reproducible for me):
require(quantmod)
data.etf = new.env()
getSymbols(c('XOM','AAPL'), src="yahoo", api.key="your-own-key",from = '1970-01-01',adjusted=TRUE,output.size="full",env = data.etf, set.symbolnames = T, auto.assign = T)
yahoo.xts = Cl(data.etf$XOM)
z <- structure(c(0.176601541324807, -0.914132074513824, -0.0608652702022332,
-0.196679777210441, -0.190397155984135, 0.915313388202916, -0.0530280808936784,
0.263895885521142, 0.10844973759151, 0.0547864992300319, 0.0435149080877898,
-0.202388932508539, 0.0382888645282672, -0.00800908217028123,
-0.0798424223984417, 0.00268898461896916, 0.00493307845560457,
0.132697099147406, 0.074267173330532, -0.336299384720176, -0.0859815663679892,
-0.0597168456705514, -0.0867777000321366, 0.283394650847026,
-0.0100414455118704, 0.106355723615723, -0.0640682814821423,
0.0481841070155836, -0.00321273561708742, -0.13182105331959), .indexCLASS = c("POSIXct",
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = structure("America/Chicago", .Names = "TZ"), tzone = structure("America/Chicago", .Names = "TZ"), class = c("xts",
"zoo"), na.action = structure(1L, class = "omit", index = 1080540000), index = structure(c(1508475600,
1508734800, 1508821200, 1508907600, 1508994000, 1509080400), tzone = structure("America/Chicago", .Names = "TZ"), tclass = c("POSIXct",
"POSIXt")), .Dim = c(6L, 5L), .Dimnames = list(NULL, c("D.30",
"D.60", "D.90", "D.120", "D.150")))
# inspect the index time zones and classes:
class(index(z))
# [1] "POSIXct" "POSIXt"
class(index(yahoo.xts))
# [1] "Date"
indexTZ(z)
# TZ
# "America/Chicago"
indexTZ(yahoo.xts)
# [1] "UTC"
You can see that yahoo.xts uses a Date index. When it is merged with a POSIXct-indexed object (i.e. with z), it is converted to "UTC" timestamps.
# Let's see what happens if the timezone of the yahoo.xts2 object is the same as z:
yahoo.xts2 <- xts(coredata(yahoo.xts), order.by = as.POSIXct(as.character(index(yahoo.xts)), tz = "America/Chicago"))
str(yahoo.xts2)
An ‘xts’ object on 1970-01-02/2017-10-27 containing:
Data: num [1:12067, 1] 1.94 1.97 1.96 1.95 1.96 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr "XOM.Close"
Indexed by objects of class: [POSIXct,POSIXt] TZ: America/Chicago
xts Attributes:
NULL
u2 <- merge(z,yahoo.xts2)
tail(u2)
class(index(u2))
# [1] "POSIXct" "POSIXt"
tail(u2, 3)
# D.30 D.60 D.90 D.120 D.150 XOM.Close
# 2017-10-25 -0.1966798 0.05478650 0.002688985 -0.05971685 0.048184107 83.17
# 2017-10-26 -0.1903972 0.04351491 0.004933078 -0.08677770 -0.003212736 83.47
# 2017-10-27 0.9153134 -0.20238893 0.132697099 0.28339465 -0.131821053 83.71
Everything is as expected now.
A shortcut that you might find useful is this:
z3 <- as.xts(as.data.frame(z), dateFormat="Date")
tail(merge(z3, yahoo.xts))
# D.30 D.60 D.90 D.120 D.150 XOM.Close
# 2017-10-20 0.17660154 -0.05302808 0.038288865 0.07426717 -0.010041446 83.11
# 2017-10-23 -0.91413207 0.26389589 -0.008009082 -0.33629938 0.106355724 83.24
# 2017-10-24 -0.06086527 0.10844974 -0.079842422 -0.08598157 -0.064068281 83.47
# 2017-10-25 -0.19667978 0.05478650 0.002688985 -0.05971685 0.048184107 83.17
# 2017-10-26 -0.19039716 0.04351491 0.004933078 -0.08677770 -0.003212736 83.47
# 2017-10-27 0.91531339 -0.20238893 0.132697099 0.28339465 -0.131821053 83.71
Convert to a data.frame, then convert back to an xts with the appropriate parameter setting: dateFormat = "Date". Now you are working with an xts object whose time index is of type Date, with no time zone issues:
class(index(merge(z3, yahoo.xts)))
#[1] "Date"
I have a data frame in R named data.
data <- as.xts(read.zoo("data1.csv",sep=",",tz="" ,header=T))
The index of data is in the format 2004-01-04 09:44:00 IST.
I applied this operation to change the index to dates:
index(data) <- as.Date(index(data))
The output should be 2004-01-04, but the system produces 2004-01-03.
This works correctly on Windows but does not work on Linux.
Arun is correct that the problem is with locale. Your data has an Indian Standard Time stamp, and you have a US locale, which is at least 10.5 hours behind. Hence the time 09:44 is actually late the previous evening in your time zone.
Dates and times are horribly complicated, and R uses the underlying OS capabilities to make its calculations, which is why you see different results on different machines. Linux is POSIX compliant and understands time zones like "IST", which allows it to make the change to the previous night. Windows does not, which is why it gives the date as 2004-01-04. To get the correct time zone handling on Windows, you need to specify the full name of the time zone, "Asia/Kolkata". Wikipedia has a list of time zone names.
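If what you want is the calendar date in India regardless of the machine's locale, one option (a sketch, assuming the index is POSIXct as in your code) is to pass that zone explicitly when dropping the time component, since as.Date() for POSIXct accepts a tz argument:
# take the date in Indian time rather than in the machine's default zone
index(data) <- as.Date(index(data), tz = "Asia/Kolkata")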
EDIT: Actually, R ships with a file containing all the "Continent/City" (Olson-style) names that it accepts. It is stored in
file.path(R.home("share"), "zoneinfo", "zone.tab")
and the example on the help page ?Sys.timezone tells you how to programmatically read it.
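A sketch along the lines of that help-page example (the column names here are just descriptive guesses):
tzfile <- file.path(R.home("share"), "zoneinfo", "zone.tab")
# tab-separated file; lines starting with # are comments
tzones <- read.delim(tzfile, header = FALSE, comment.char = "#",
                     col.names = c("country", "coords", "name", "comments"))
head(tzones$name)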
I find the lubridate package makes it a little easier to see what is happening.
library(lubridate)
x <- ymd_hms("2004-01-04 09:44:00 IST", tz = "Asia/Kolkata")
x
# [1] "2004-01-04 09:44:00 IST"
with_tz(x, "America/New_York")
# [1] "2004-01-03 23:14:00 EST"
Date <- c("2010-01-04 09:04:00", "2010-01-04 09:05:00")
Open <- c(5222.9, 5220.2)
Low <- c(5224.6, 5222.95)
High <- c(5220.1, 5218.6)
Close <- c(5220.35, 5222.95)
x <- data.frame(Date = Date, Open = Open, Low = Low, High = High, Close = Close)
as.Date(x$Date)
Output:
[1] "2010-01-04" "2010-01-04"
It seems alright to me.
Edit:
require(zoo)
data <- as.xts(read.zoo("data1.csv",sep=",",tz="" ,header=T))
> dput(data)
structure(c(5222.9, 5220.2, 5224.6, 5222.95, 5220.1, 5218.6,
5220.35, 5222.95), .Dim = c(2L, 4L), .Dimnames = list(NULL, c("Open",
"Low", "High", "Close")), index = structure(c(1262592240, 1262592300
), tzone = "", tclass = c("POSIXct", "POSIXt")), class = c("xts",
"zoo"), .indexCLASS = c("POSIXct", "POSIXt"), tclass = c("POSIXct",
"POSIXt"), .indexTZ = "", tzone = "")
> as.Date(index(data))
[1] "2010-01-04" "2010-01-04"
On my Mac it works correctly. I suspect your system locale is set wrong; you may also want to check it within R.
What does the command Sys.getlocale() give you on Windows and on Linux within R?