I want to format the date in ISO 8601 format using lubridate. At the moment the code I have parses the date almost the way I want. The only thing I want to change is to have a colon in the time zone offset. My code at the moment:
dateTime <- str_match(fileName, dateTimeRegex)[2] %>% ymd_hms() %>% strftime(format = "%y-%m-%dT%H:%M:%S%z", tz = "UTC")
Sample output:
"19-09-26T10:45:00+0000"
Expected output:
"19-09-26T10:45:00+00:00"
Is there a simple way to do it, without parsing this manually? %z creates 0000, but I need a colon there
From Wikipedia (emphasis mine):
The UTC offset is the difference in hours and minutes from Coordinated Universal Time (UTC, or GMT) for a particular place and date. It is generally shown in the format ±[hh]:[mm], ±[hh][mm], or ±[hh]. So if the time being described is one hour ahead of UTC (such as the time in Berlin during the winter), the UTC offset would be "+01:00", "+0100", or simply "+01".
±hh:mm is just one way to format a UTC offset, the others being ±hhmm and ±hh, so the offset in your output already conforms to ISO 8601.
We can use a regex to achieve your desired output, using sub:
x <- "19-09-26T10:45:00+0000"
sub("(.*\\+)(\\d{2})(\\d{2})", "\\1\\2:\\3", x)
#[1] "19-09-26T10:45:00+00:00"
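The pattern above only matches a "+" sign, so a negative offset such as "-0530" would pass through unchanged. Here is a minimal sketch that anchors on the end of the string and accepts either sign. The helper name add_offset_colon is made up for illustration, and the example uses %Y (four-digit year) rather than the %y from the question.

```r
# Hypothetical helper (not part of lubridate or base R):
# insert a colon into a trailing +hhmm / -hhmm offset.
add_offset_colon <- function(x) {
  sub("([+-][0-9]{2})([0-9]{2})$", "\\1:\\2", x)
}

# Format a UTC timestamp with base R, then patch the offset.
stamp <- as.POSIXct("2019-09-26 10:45:00", tz = "UTC")
iso <- add_offset_colon(format(stamp, "%Y-%m-%dT%H:%M:%S%z"))
iso

# Negative offsets are handled too.
neg <- add_offset_colon("19-09-26T10:45:00-0530")
neg
```

Strings that already contain the colon, or that have no offset at all, do not match the pattern and are returned untouched.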
I've got a dataset of tweets and the information Twitter provides about them. I need to transform my dates from the given format into one I can understand properly, using R (preferably with a function that lets me choose the format, since I might need to select tweets by day of the week, time of day, or anything like that). I'm just starting to learn the language.
the format I've got the dates in is:
1420121295000
1420121298000
I researched a bit before asking and tried functions like as.POSIXct, as.POSIXlt and others; they all gave me this error:
Error in as.POSIXct.default(date, format = "%a %b %d %H:%M:%S %z %Y", :
do not know how to convert 'date' to class "POSIXct"
The values above are epoch timestamps. Assuming they are in milliseconds since the epoch (you would have to double-check with the Twitter API), you can convert from epoch to UTC time using the anytime function from the anytime package as shown below, which returns "2015-01-01 14:08:15 UTC".
library(anytime)
anytime(1420121295000*0.001) # multiply by 0.001 to convert milliseconds to seconds
format(anytime(1420121295000*0.001), tz = "America/New_York", usetz=TRUE) # convert from UTC to the US Eastern time zone
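If you would rather not depend on a package, the same conversion can be sketched in base R. This still assumes the values are milliseconds since the epoch. The variable names are made up for illustration.

```r
# Epoch timestamps in milliseconds, taken from the question.
ms <- c(1420121295000, 1420121298000)

# Divide by 1000 to get seconds, then convert with an explicit time zone.
tweets_utc <- as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")

# Pull out whichever pieces you need for filtering.
stamp   <- format(tweets_utc, "%Y-%m-%d %H:%M:%S")
weekday <- format(tweets_utc, "%u")  # ISO weekday number, 1 = Monday
hour    <- format(tweets_utc, "%H")

# The same instants shown in another time zone.
format(tweets_utc, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")
```

Since format() accepts any strptime-style pattern, selecting tweets by day of the week or hour of the day becomes a plain comparison on the resulting character vectors.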
I'm struggling with converting character class dates in many different formats (e.g., yyyy/mm/dd; mm/dd/yyyy; yyyy-mm-dd; mm-dd-yyyy; yy-mm-dd; mm-dd-yy; etc.) to POSIXlt class. Ideally, I would like to convert all birth_dates to POSIXlt class with yyyy/mm/dd format (see sample data below). Is there any simple way to do this in R?
id birth_date start_date age
102 08/09/1993 2013/09/01 20
103 1995-02-21 2013/09/01 18
104 01-15-94 2013/09/01 19
105 88-12-30 2013/09/01 24
Here is what I have been doing thus far. Unfortunately, this doesn't seem to work (I wind up with more NAs than there should be) given all of the different ways in which the original date is formatted:
library(lubridate)
data$birth_date1<-as.Date(data$birth_date,format="%Y-%m-%d") #Convert character class to date class
data$birth_date2<-ymd(swc3$birth_date1) #Convert date class to POSIXlt class using lubridate pkg
That's horrible. Could be worse though. At least there are delimiters in there, like "-" and "/".
Short Answer
Yes, there's an easy way to parse that in R. Apply parse_date_time() separately to each birth date, giving it a decent orders list to choose from, and carefully set the order of the guesses. You'll need to convert the "integer-time" to a useful time when you're done.
See the Long Answer for details.
Long Answer
This is why the lubridate package has parse_date_time(). But there are problems. Let's see:
require(lubridate)
# WRONG! doesn't work as intended.
as.Date(
parse_date_time(data$birth_date,
orders=c("ymd", "mdy", "mdY", "Ymd")
)
)
[1] "1993-08-09" "1995-02-21" "1994-01-15" "0088-12-30"
That looks great, except for the last one. What's going on?
parse_date_time() is selecting a "best fit" set of orders and formats to use when parsing the dates, and the last element is the odd one out.
To make this work as intended, you'll need to apply parse_date_time() one-by-one to each date, because each date format was apparently selected more-or-less at random. This will be slower, but it will give more useful answers.
# RIGHT. Some conversion of results required.
parsed <- sapply(data[,"birth_date"],
parse_date_time,
orders=c("ymd", "mdy", "mdY", "Ymd") )
parsed
08/09/1993 1995-02-21 01-15-94 88-12-30
744854400 793324800 758592000 599443200
Ok, those look like Unix-time integers, which are the unclass()'d version of what parse_date_time() produces. And none are negative, so they must all have happened after 1970. This is encouraging. Convert:
# Conversion of results
parsed <- as.POSIXct(parsed, origin="1970-01-01", tz = "GMT")
as.Date(parsed)
08/09/1993 1995-02-21 01-15-94 88-12-30
"1993-08-09" "1995-02-21" "1994-01-15" "1988-12-30"
lubridate and parse_date_time() are very good at what they do.
Since you asked for POSIXlt, not Date types:
as.POSIXlt(parsed)
08/09/1993 1995-02-21
"1993-08-09 10:00:00 AEST" "1995-02-21 11:00:00 AEDT"
01-15-94 88-12-30
"1994-01-15 11:00:00 AEDT" "1988-12-30 11:00:00 AEDT"
Though I personally prefer only having dates when the actual time isn't important; these are assumed to be all happening at midnight UTC, and are converted to my time zone (Eastern Australia).
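For comparison, here is a rough base R sketch of the same "decide per element" idea. It is a different technique from parse_date_time(): each string is first classified by its shape with a regex, and only then parsed with a matching format. The helper name parse_mixed_dates is made up, and the sketch assumes month always comes before day and that a two-digit-year string is mm-dd-yy unless that fails to parse.

```r
# Hypothetical helper: parse dates whose format varies element by element.
parse_mixed_dates <- function(x) {
  x <- gsub("/", "-", x, fixed = TRUE)   # normalise the delimiter
  out <- as.Date(rep(NA, length(x)))     # all-NA Date vector to fill in

  four_first <- grepl("^[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$", x)  # yyyy-mm-dd
  four_last  <- grepl("^[0-9]{1,2}-[0-9]{1,2}-[0-9]{4}$", x)  # mm-dd-yyyy
  two_digit  <- grepl("^[0-9]{2}-[0-9]{2}-[0-9]{2}$", x)      # mm-dd-yy or yy-mm-dd

  out[four_first] <- as.Date(x[four_first], format = "%Y-%m-%d")
  out[four_last]  <- as.Date(x[four_last],  format = "%m-%d-%Y")

  # Two-digit years are ambiguous: try mm-dd-yy first, fall back to yy-mm-dd.
  mdy <- as.Date(x[two_digit], format = "%m-%d-%y")
  ymd <- as.Date(x[two_digit], format = "%y-%m-%d")
  mdy[is.na(mdy)] <- ymd[is.na(mdy)]
  out[two_digit] <- mdy

  out
}

birth <- parse_mixed_dates(c("08/09/1993", "1995-02-21", "01-15-94", "88-12-30"))
birth
```

The shape check matters because as.Date() with %Y will happily read "88" as the year 88, which is the same trap that produced "0088-12-30" earlier. A string like "01-02-03" remains genuinely ambiguous, and this sketch silently reads it as mm-dd-yy.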
Unix time is 1435617000.
as.Date(1435617000,origin="01-01-1970")
[1] "3930586-11-23"
Which is wrong. I'm trying to get the correct date, which per an epoch converter is Mon, 29 Jun 2015 22:30:00 GMT.
How do I get R to tell me the month, day, year, hour, minute & second? Thank you.
I think this happens because as.Date creates Date objects and treats a numeric input as a count of days, not seconds. In this case you need a POSIXct object rather than a Date, because your input x also carries time-of-day information that as.Date cannot handle. Another problem, which can appear even with the right function, is a shifted result when you do not specify the right time zone with the tz argument (unless your time zone happens to be the same as the original one).
The following code does the job.
x <- 1435617000
as.POSIXct(x, origin = "1970-01-01", tz ="GMT")
[1] "2015-06-29 22:30:00 GMT"
Use as.Date
If you only want the date but have a complete Unix time like x, just divide by 86400 (the number of seconds in a day) to get the right date.
as.Date(x/86400L, origin = "1970-01-01")
[1] "2015-06-29"
Another important detail
The origin argument has to be supplied as YYYY-MM-DD, not as DD-MM-YYYY like you did. I am not sure, but I think the former is the only accepted and correct way.
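To get the month, day, year, hour, minute and second as separate numbers, one option is to convert the POSIXct value to POSIXlt and read its fields. This is a minimal base R sketch. Note that POSIXlt stores years as an offset from 1900 and months starting at 0.

```r
x <- 1435617000
t_utc <- as.POSIXct(x, origin = "1970-01-01", tz = "UTC")

# POSIXlt is a list-like structure with one field per component.
parts <- as.POSIXlt(t_utc)

year   <- parts$year + 1900  # stored as years since 1900
month  <- parts$mon + 1      # stored as 0-11
day    <- parts$mday
hour   <- parts$hour
minute <- parts$min
second <- parts$sec

c(year, month, day, hour, minute, second)
```

If you only need them as text, format(t_utc, "%Y"), format(t_utc, "%m") and so on give the same information without the offsets.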
I'm having trouble changing date format in R. I have a vector "StartDate" with dates and time for instance in the format:
01Feb1991 00:00
I did:
as.POSIXct(as.character(bio$StartDate), format = "%d/%m/%Y %H:%M")
...but I got NAs as a result. Would there be a different way to change the vector into date format?
The format you provide has to match your string. In your case, that's '%d%b%Y %H:%M' (you don't have slashes between day, month and year, and your month is the abbreviated name, not the number).
as.POSIXct('01Feb1991 00:00', format='%d%b%Y %H:%M')
See ?strptime (mentioned in ?as.POSIXct) for various tokens you can use for dates.
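As a small sketch of the round trip, the following parses a vector of such strings and writes them back out in a different layout. The second value and the variable names are made up for illustration, and tz = "UTC" is an assumption about the data. Keep in mind that %b matches abbreviated month names in the current locale, so "Feb" is only recognised in a locale that uses English month names.

```r
# Sample values in the question's layout (the second one is invented).
start_date <- c("01Feb1991 00:00", "15Mar1991 13:30")

# The format string must mirror the input exactly: no slashes, %b for "Feb".
parsed <- as.POSIXct(start_date, format = "%d%b%Y %H:%M", tz = "UTC")

# Once parsed, any output layout is just another format string.
reformatted <- format(parsed, "%Y-%m-%d %H:%M")
reformatted
```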
In most cases, we convert numeric time to POSIXct format using R. However, if we want to compare two time points, then we would prefer the numeric time format. For example, I have a date format like "2001-03-13 10:31:00",
begin <- "2001-03-13 10:31:00"
Using R, I want to convert this into a numeric value (e.g., the Julian time), perhaps something like the number of seconds elapsed between 1970-01-01 00:00:00 and 2001-03-13 10:31:00.
Do you have any suggestions?
The Julian calendar began in 45 BC (709 AUC) as a reform of the Roman calendar by Julius Caesar. It was chosen after consultation with the astronomer Sosigenes of Alexandria and was probably designed to approximate the tropical year (known at least since Hipparchus). see http://en.wikipedia.org/wiki/Julian_calendar
If you just want to remove ":" , " ", and "-" from a character vector then this will suffice:
end <- gsub("[: -]", "" , begin, perl=TRUE)
#> end
#[1] "20010313103100"
You should read the section about 1/4 of the way down in ?regex about character classes. Since the "-" is special in that context as a range operator, it needs to be placed first or last.
After your edit, the answer is clearly what @joran wrote, except that you would first need to convert to a date-time class:
as.numeric(as.POSIXct(begin))
#[1] 984497460
The other point to make is that comparison operators do work for Date and DateTime classed variables, so the conversion may not be necessary at all. This compares 'begin' to a time one second later and correctly reports that begin is earlier:
as.POSIXct(begin) < as.POSIXct(begin) +1
#[1] TRUE
Based on the revised question this should do what you want:
begin <- "2001-03-13 10:31:00"
as.numeric(as.POSIXct(begin))
The result is a unix timestamp, the number of seconds since epoch, assuming the timestamp is in the local time zone.
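Because the result depends on the local time zone, the number will differ from machine to machine. Here is a minimal sketch that pins the zone explicitly. Treating the timestamp as UTC is an assumption, and the second timestamp is made up to show a comparison.

```r
begin <- "2001-03-13 10:31:00"

# Interpret the string as UTC so the result is the same everywhere.
begin_utc <- as.POSIXct(begin, tz = "UTC")
secs <- as.numeric(begin_utc)
secs

# Differences can be taken directly, without converting to numeric first.
later <- as.POSIXct("2001-03-13 10:32:30", tz = "UTC")
gap <- as.numeric(difftime(later, begin_utc, units = "secs"))
gap
```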
Maybe this could also work:
library(lubridate)
...
df <- '24:00:00'
as.numeric(hms(df))
hms() parses the string into a lubridate period, which as.numeric() then converts into seconds. See the full documentation for details.
I tried this because I had trouble with data that was in this format but ran past 24 hours.
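The same idea can be sketched without lubridate by splitting the string and weighting each field. The helper name hms_to_seconds is made up, and the sketch assumes every value has exactly three colon-separated numeric fields.

```r
# Hypothetical helper: convert "HH:MM:SS" strings (hours may exceed 24) to seconds.
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts,
         function(p) sum(as.numeric(p) * c(3600, 60, 1)),
         numeric(1))
}

durations <- hms_to_seconds(c("24:00:00", "36:15:30"))
durations
```

Unlike a clock-time parser, this places no upper limit on the hour field, which is what makes it usable for durations longer than a day.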
The example from the ?as.POSIXct help page gives
as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S"))
so for you it would be
as.numeric(as.POSIXct(strptime(begin, "%Y-%m-%d %H:%M:%S")))