I am trying to turn my water quality sensor data into time-series data for analysis in R. There is one column with the date (format m/dd/yyyy) and another with the time (hh:mm:ss).
I have managed to paste them together into a character vector and then attempted to use the anytime function to convert the DateTime to POSIXct format.
data$DateTime <- as.character(paste(data$Date, data$Time))
data$DateTime2 <- anytime(as.character(data$DateTime))
The above code works for some of my data but not all of the long time series. It creates NAs for some DateTimes, and converts other periods to all 00:00:00 but on the correct date.
I have also tried the strptime and as.POSIXct functions, but neither recognizes the input format, and both turn every DateTime into NA.
as.POSIXct(strptime(paste("12/30/2019","05:45:00"),format="%m/%d/%Y %T"))
[1] "2019-12-30 05:45:00 CET"
class(as.POSIXct(strptime(paste("12/30/2019","05:45:00"),format="%m/%d/%Y %T")))
[1] "POSIXct" "POSIXt"
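For a whole column rather than a single pasted string, a minimal base-R sketch along the same lines might look like this. The sample data frame, the tz = "UTC" choice, and the bad_rows name are assumptions for illustration and are not taken from the question; substitute the zone the sensor actually logged in. Checking for NA right after parsing shows which rows deviate from the expected layout.

```r
# Hypothetical sample standing in for the sensor export
data <- data.frame(
  Date = c("12/30/2019", "1/05/2020", "1/05/2020"),
  Time = c("05:45:00", "00:00:00", "23:15:30"),
  stringsAsFactors = FALSE
)

# Parse with an explicit format and an explicit time zone (UTC is an assumption)
data$DateTime <- as.POSIXct(
  paste(data$Date, data$Time),
  format = "%m/%d/%Y %H:%M:%S",
  tz = "UTC"
)

# Rows that did not match the format come back as NA; list them for inspection
bad_rows <- which(is.na(data$DateTime))
print(data)
print(bad_rows)
```

If bad_rows is not empty, looking at the raw Date and Time values on those rows usually reveals a second layout (for example missing seconds) that needs its own format string.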
I'm using R and trying to convert a datetime field into just the date. R gives me the desired format but keeps rounding up some of the day values. Specifically everything after 12 noon! I could not find any threads that address this exact problem. I actually figured out a solution but wanted to post the question because I spent a whole week troubleshooting.
#Convert the datetime field from character to a datetime
main_df$datetime <- strptime(main_df$ï..Date, format = "%m/%d/%Y %H:%M")
main_df$datetime <- as.POSIXct(main_df$datetime, tz = Sys.timezone())
head(main_df$datetime)
class(main_df$datetime)
#Remove the poorly computer-titled character field that contained datetime info
main_df <- subset(main_df, select = -c(ï..Date))
#Use the NEW datetime field to create a date field
#main_df$Date <- trunc(main_df$datetime,"days")
main_df$Date <- as.Date(main_df$datetime, format = "%m/%d/%Y")
?as.Date()
class(main_df$Date)
head(main_df$Date)
That returned:
head(main_df$datetime)
[1] "2020-05-16 00:31:00 CDT" "2020-05-16 00:30:00 CDT" "2020-05-15 23:33:00 CDT" "2020-05-15 15:33:00 CDT"
[5] "2020-05-15 22:31:00 CDT" "2020-05-15 22:12:00 CDT"
and
> class(main_df$Date)
[1] "Date"
>
> head(main_df$Date)
[1] "2020-05-16" "2020-05-16" "2020-05-16" "2020-05-15" "2020-05-16" "2020-05-16"
Notice that the 3rd, 5th, and 6th values for 'Date' should be 2020-05-15 (their datetimes fall on the evening of the 15th), but they are converted to 2020-05-16. So what are some other ways to fix this? I'm going to post one way that worked, but I doubt it's the cleanest.
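If we don't need the time, use a regex to match a space followed by any other characters (" .*"), replace the match with an empty string using sub, and then convert the result to Date class. The issue with converting a POSIXct datetime is that as.Date() evaluates it in UTC unless told otherwise, so a late local time such as "23:33:00" lands on the next day. The format below matches the original m/d/Y character strings, so apply it to those rather than to an already converted column.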
main_df$Date <- as.Date(sub(" .*", "", main_df$datetime), format = "%m/%d/%Y")
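As a small sketch of that regex approach on raw strings: the raw vector below is made up to resemble the question's input, and date_only is a hypothetical name.

```r
# Hypothetical raw strings in the question's m/d/Y H:M layout
raw <- c("5/16/2020 0:31", "5/15/2020 23:33", "5/15/2020 15:33")

# Drop everything from the first space onward, then parse the date part only
date_only <- as.Date(sub(" .*", "", raw), format = "%m/%d/%Y")
print(date_only)
```

Because the time is discarded before parsing, no time zone is involved and the calendar day cannot shift.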
Do your input dates include a timezone specification? If not, they are ambiguous and the rounding may be right or it may be wrong. If they do include a timezone specification, the lubridate package should handle them correctly.
I would advise against using tz = Sys.timezone(): if your inputs don't include a timezone specification, the result then depends on where the code is run, so what works for you might not work for a user in a different location.
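To illustrate why the zone matters, here is a hedged base-R sketch. The zone "America/Chicago" is an assumption based on the CDT label in the question's output, and the variable names are illustrative only.

```r
# One of the question's problem values, with the zone stated explicitly (assumed)
x <- as.POSIXct("2020-05-15 23:33:00", tz = "America/Chicago")

# Calendar day as seen from UTC: 23:33 CDT is already the next day there
day_utc <- as.Date(x, tz = "UTC")

# Calendar day in the zone the data was recorded in
day_local <- as.Date(x, tz = "America/Chicago")

# Same result by formatting first; format() uses the zone stored on x
day_text <- as.Date(format(x, "%Y-%m-%d"))

print(day_utc)
print(day_local)
print(day_text)
```

Naming the zone in the code keeps the result the same regardless of the machine the script runs on.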
You could just extract the date from the datetime as a substring and convert it to Date type, using the substr(x, begin, end) function, where x is your column and begin and end are the first and last character positions to extract.
main_df$Date <- as.Date(substr(main_df$datetime,1,10))
I had the same issue and that function helped me to convert Datetime to Date without rounding.
I have an imported CSV in R which contains a column of dates and times - this is imported into R as character. The format is "30/03/2020 08:59". I want to convert these strings into a format that allows me to work on them. For simplicity I have made a dataframe which has a single column of these dates (854) in this format.
I'm trying to use the parse_date_time function from lubridate.
It works fine when I reference a single value, e.g.
b=parse_date_time(consults_dates[3,1],orders="dmy HM")
gives b=2020-03-30 09:08:00
However, when I try to perform this on the entire data frame (consults_dates), it fails with a warning, e.g.
c= parse_date_time(consults_dates,orders="dmy HM") gives:
Warning message:
All formats failed to parse. No formats found.
Apologies - if this is blatantly a simple question, day 1 of R after years of Matlab.
You need to pass the column to parse_date_time function and not the entire dataframe.
library(lubridate)
consults_dates$colum_name <- parse_date_time(consults_dates$colum_name, "dmy HM")
However, if you have only one format in the column you can use dmy_hm
consults_dates$colum_name <- dmy_hm(consults_dates$colum_name)
In base R, we can use :
consults_dates$colum_name <- as.POSIXct(consults_dates$colum_name,
format = "%d/%m/%Y %H:%M", tz = "UTC")
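A self-contained sketch of the base R route, assuming a hypothetical column name consult_time and two made-up rows in the question's format. The point is that the parser receives a character vector (one column), not the data frame itself.

```r
# Hypothetical one-column data frame in the question's "dd/mm/yyyy HH:MM" layout
consults_dates <- data.frame(
  consult_time = c("30/03/2020 08:59", "30/03/2020 09:08"),
  stringsAsFactors = FALSE
)

# Parse the column (a character vector), not the whole data frame
parsed <- as.POSIXct(consults_dates$consult_time,
                     format = "%d/%m/%Y %H:%M", tz = "UTC")
consults_dates$consult_time <- parsed

# Once parsed, date-time arithmetic works, e.g. the gap between the two rows
gap_minutes <- as.numeric(difftime(parsed[2], parsed[1], units = "mins"))
print(class(consults_dates$consult_time))
print(gap_minutes)
```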
I have a data frame called RequisitionHistory2 with a variable called RequisitionDateTime. It is a factor whose levels look like 4/30/2019 14:16. I would like to split this into RequisitionDate and RequisitionTime in a datetime format.
I tried this code, but this still does not solve my issue with needing to split these into their own columns. The code also did not work as I got the error below.
mutate(When = as.POSIXct(RequisitionHistory2, format="%m/%d/%. %H:%M %p"))
Error in as.POSIXct.default(RequisitionHistory2, format = "%m/%d/%. %H:%M %p") : do not know how to convert 'RequisitionHistory2' to class “POSIXct”
I would like to have the variable RequisitionDateTime split into RequisitionDate and another variable RequisitionTime in the dataframe RequisitionHistory2. Any help is greatly appreciated!
Do not convert factors to datetime directly. You will need to convert it to a character first and then use a datetime function.
as.Date(as.character("10/25/2018"), format = "%m/%d/%Y")
would work for your date example.
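If your datetime is in 4/30/2019 14:16 format, use mdy_hm from lubridate on the column (not on the data frame itself):
library(dplyr)
library(lubridate)
# convert the factor column to character, then parse it as month/day/year hour:minute
RequisitionHistory2 <- mutate(RequisitionHistory2, When = mdy_hm(as.character(RequisitionDateTime)))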
Note that as.POSIXct() without a format argument only recognizes ISO 8601-style input such as 2019-04-30 14:16; for any other layout you have to supply format. I wrote a blog post about this that I think would be helpful for you to check out:
https://jackylam.io/tutorial/uber-data/
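Since the question asks for separate date and time columns, here is a minimal base-R sketch of the split. The two sample rows and the tz = "UTC" choice are assumptions; the column names follow the question.

```r
# Hypothetical stand-in for the question's data frame, with a factor column
RequisitionHistory2 <- data.frame(
  RequisitionDateTime = factor(c("4/30/2019 14:16", "5/1/2019 9:05"))
)

# Factor -> character -> POSIXct, with an explicit format and zone (UTC assumed)
dt <- as.POSIXct(as.character(RequisitionHistory2$RequisitionDateTime),
                 format = "%m/%d/%Y %H:%M", tz = "UTC")

# Date part as a Date object, evaluated in the same zone used for parsing
RequisitionHistory2$RequisitionDate <- as.Date(dt, tz = "UTC")

# Time part as a character string; base R has no time-of-day-only class
RequisitionHistory2$RequisitionTime <- format(dt, "%H:%M")

print(RequisitionHistory2)
```

Keeping the full POSIXct value in its own column as well is often convenient, since the time-only column is just text.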
The anytime package on CRAN converts directly from many input types, including factor and ordered, to date and datetime objects. It also heuristically tries a number of viable formats, so you do not need a format string. See the README at GitHub for an introduction; there is also a vignette.
Your example works:
R> library(anytime)
R> anytime(as.factor("4/30/2019 14:16"))
[1] "2019-04-30 14:16:00 CDT"
R> anytime(as.factor("4/3/2019 14:16:17"), useR=TRUE)
[1] "2019-04-03 14:16:17 CDT"
R>
However, the underlying (Boost C++) parser does not like single-digit days or months, so you may need to fall back to R's parser via useR=TRUE, as I did in the second example.
Is there a way to extract the timezone or format the timezone part of a datetime object, in the form of, say "+530" instead of "IST" or "Asia/Kolkata"?
(Need this as it is the ISO 8601 format used in javascript)
Example:
as.POSIXct(1499773898,tz="Asia/Kolkata",origin="1970-01-01")
[1] "2017-07-11 17:21:38 IST"
Instead, I'd like to maybe specify the format argument in as.POSIXct, so that the output looks something like this:
[1] "2017-07-11 17:21:38 +530"
Or a function which can pull out the timezone offset in this manner:
timezone_offset("2017-07-11 17:21:38 IST")
[1] "+530"
Does lubridate or any other package have the capability to do this?
You can do both with format, but note that the result is a character string, no longer a POSIXct object.
x <- as.POSIXct(1499773898,tz="Asia/Kolkata",origin="1970-01-01")
"2017-07-11 17:21:38 IST"
e.g., Show timestamp in ISO 8601:
format(x, "%Y-%m-%dT%H:%M:%S%z")
"2017-07-11T17:21:38+0530"
e.g., Show just offset from UTC:
format(x, "%z")
"+0530"
Note that for operations in R this has little consequence, because every POSIXct object is stored internally as a number of seconds since 1970-01-01 00:00:00 UTC; the time zone only affects how the value is displayed.
To write POSIXct timestamps in ISO 8601 to a file, you can use format as described above or the fwrite function in data.table, which does so by default (see its dateTimeAs argument).
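The helper asked for in the question could be sketched as a thin wrapper around format. The name utc_offset and the colon argument are hypothetical; the colon variant is added on the assumption that the consumer wants an offset written as "+05:30".

```r
# Hypothetical helper: UTC offset of a POSIXct value as text
utc_offset <- function(x, colon = FALSE) {
  off <- format(x, "%z")  # e.g. "+0530"
  if (colon) {
    # Insert a colon between hours and minutes: "+0530" -> "+05:30"
    off <- sub("^([+-][0-9]{2})([0-9]{2})$", "\\1:\\2", off)
  }
  off
}

x <- as.POSIXct(1499773898, tz = "Asia/Kolkata", origin = "1970-01-01")
print(utc_offset(x))
print(utc_offset(x, colon = TRUE))

# The stored value is still the number of seconds since the epoch
print(as.numeric(x))
```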
I have a monthly data file where dates are stored in Stata's %tm format, like 2000m1. How can I convert them to dates?
I could do something like manipulate the strings into 2000-01-01 but I would like to avoid this if possible.
as.Date('2000m1') (unsurprisingly) returns NA.
1) yearmon Using the zoo package, this converts it to a "yearmon" class object which may make more sense than converting it to a "Date" given that you have no day of the month. Such objects are internally represented as a year + 0 for Jan, year + 1/12 for Feb, etc. so they sort properly.
library(zoo)
as.yearmon('2000m1', '%Ym%m')
## [1] "Jan 2000"
If you really want "Date" class then the following give the start and end of month respectively:
as.Date(as.yearmon('2000m1', '%Ym%m'))
## [1] "2000-01-01"
as.Date(as.yearmon('2000m1', '%Ym%m'), frac = 1)
## [1] "2000-01-31"
2) paste This does not use any packages and while it does use paste it's a fairly minimal use of string manipulation:
as.Date(paste("2000m1", 1), "%Ym%m %d")
## [1] "2000-01-01"
Note: Avoid any solution that returns a POSIXct object rather than a "yearmon" or "Date" object, since that introduces the possibility of time zone errors, which can be avoided entirely by using an appropriate class. See the R Help Desk article in R News 4/1.
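Applied to a whole vector, the paste approach from 2) might look like the sketch below. The sample values and the names stata_months and month_start are made up for illustration.

```r
# Hypothetical vector of Stata %tm strings
stata_months <- c("2000m1", "2010m3", "2015m12")

# Append a day-of-month so the string is a complete date, then parse it
month_start <- as.Date(paste(stata_months, 1), format = "%Ym%m %d")
print(month_start)

# The result is a plain Date vector, so sorting and formatting work as usual
print(format(sort(month_start, decreasing = TRUE), "%Y-%m"))
```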
This can be done very easily with the amazing lubridate package:
data <- c("2001m1","2010m3","2015m12","2009m8")
library(lubridate)
parse_date_time(data,orders="%Y%m")
[1] "2001-01-01 UTC" "2010-03-01 UTC" "2015-12-01 UTC" "2009-08-01 UTC"