How can I use parse_date_time with letters and time zones in R?

I have a pretty standard-looking date-time stored as a character string. I'm having trouble parsing it into a datetime because of the letters and the redundant parts.
How can I parse this as a datetime?
lubridate::parse_date_time("Fri Feb 05 09:58:38 PST 2021", )

All the supported formats are listed in the documentation of parse_date_time() (see ?parse_date_time). The tricky part here is that the time zone has to be specified via the tz argument, not in the orders argument.
Below is the code.
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
newdate = parse_date_time("Fri Feb 05 09:58:38 PST 2021",
"%a %b %d %H:%M:%S %Y",
tz = "America/Los_Angeles" )
newdate
#> [1] "2021-02-05 09:58:38 PST"
Created on 2021-02-16 by the reprex package (v0.3.0)
Does this work for you?
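For comparison, here is a minimal base-R sketch of the same parse. It assumes the abbreviation in the string is always PST (so it can be matched as literal text in the format) and that the session uses an English or C locale, because %a and %b match locale-specific day and month names.

```r
# Input string from the question
x <- "Fri Feb 05 09:58:38 PST 2021"

# "PST" appears as literal text in the format: strptime() cannot parse
# time zone abbreviations (%Z is output-only), so the actual zone is
# supplied through `tz` instead.
parsed <- as.POSIXct(x,
                     format = "%a %b %d %H:%M:%S PST %Y",
                     tz = "America/Los_Angeles")

print(parsed)
#> [1] "2021-02-05 09:58:38 PST"

# The same instant shown in UTC (PST is UTC-8)
format(parsed, "%Y-%m-%d %H:%M:%S", tz = "UTC")
#> [1] "2021-02-05 17:58:38"
```

If the abbreviation varies between rows (e.g. PST and PDT), the literal in the format would no longer match, which is one reason the parse_date_time() route above is more convenient.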

Related

Change of Date format

I have a table with date and temperature, and I need to convert the date from this format 1/1/2021 12:00:00 AM to just 1/1/21. Please help. I tried to add a new column with the new date using this code
SFtemps$Date <- as.Date(strptime(SFtemps$Date, "%m/%d/%y %H:%M"))
but it's not working.
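Use %I (hour as a decimal number, 01–12) together with %p (AM/PM) instead of %H, and %Y for the four-digit year in the input; %y (year without century, 00–99) belongs only in the output format. Your string also contains seconds, so the input format needs %S.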
x <- "1/1/2021 12:00:00 AM"
format(strptime(x, "%m/%d/%Y %I:%M:%S %p"), "%m/%d/%y")
[1] "01/01/21"
Note that after you re-format the time object, it'll be a plain character string and will lose all the attributes of a time object.
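A small base-R sketch of that point, using the string from the question: after format() the result is plain character, so it has to be parsed again (here with %y, since the year now has two digits) before any date arithmetic.

```r
x <- "1/1/2021 12:00:00 AM"

# Parse: %I is the 12-hour clock, %p the AM/PM marker, %Y the 4-digit year
parsed <- strptime(x, "%m/%d/%Y %I:%M:%S %p")
class(parsed)
#> [1] "POSIXlt" "POSIXt"

# Re-format for display: the result is just a character string
short <- format(parsed, "%m/%d/%y")
class(short)
#> [1] "character"

# To compute with it again, parse it back (two-digit year, so %y)
as.Date(short, format = "%m/%d/%y") + 1
#> [1] "2021-01-02"
```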
Your string is actually a timestamp. Using lubridate, you can parse it into a date-time object and then use the stamp() function to format it in whatever way suits you.
org_str <- "1/1/2021 12:00:00 AM"
library("lubridate")
x <- mdy_hms(x = org_str, tz = "UTC")
sf <- stamp("29/01/1997")
#> Multiple formats matched: "%d/%Om/%Y"(1), "%d/%m/%Y"(1)
#> Using: "%d/%Om/%Y"
sf(x)
#> [1] "01/01/2021"
Created on 2022-04-22 by the reprex package (v2.0.1)
Explanation
Contrary to common perception, a date doesn't have a format as such; only a string representation of a date has a specific format. Consider the following example. Running the code:
x <- Sys.Date()
unclass(x)
will return a number (the count of days since 1970-01-01). This is because (see footnote 1):
Thus, objects of the class Date are internally represented as numbers. More specifically, dates in R are actually integers.
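To make that concrete, a short base-R sketch (the date is arbitrary): the stored value is a whole number of days since 1970-01-01 (its storage type is double), and different format strings only change how that same number is printed.

```r
d <- as.Date("2021-01-01")

# The underlying value: days since 1970-01-01
unclass(d)
#> [1] 18628
typeof(d)
#> [1] "double"

# Two string representations of the same stored value
format(d, "%d/%m/%Y")
#> [1] "01/01/2021"
format(d, "%Y%m%d")
#> [1] "20210101"

# Going back from the number requires the origin
as.Date(18628, origin = "1970-01-01") == d
#> [1] TRUE
```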
The same applies to POSIXct objects:
library(lubridate)
xn <- now()
class(xn)
#> [1] "POSIXct" "POSIXt"
unclass(xn)
#> [1] 1650644616
#> attr(,"tzone")
#> [1] ""
Created on 2022-04-22 by the reprex package (v2.0.1)
You can always run:
format(lubridate::now(), "%d-%m")
# [1] "22-04"
However, by doing this you are not formatting the date; you are creating a string representation of the date / timestamp object. That string would have to be converted back to a timestamp / date object before you can use it in common date-time operations.
Suggestions
Whenever I work with date-time or date objects, I would argue it's advisable to keep them in the correct class, preserving the relevant details (such as the time zone of timestamps), and to apply formatting functions only when presenting the data in an analysis or visualisation, as shown in the sample.
1 Essentials of dates and times
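A minimal sketch of that suggestion for the question's table, assuming a hypothetical two-row SFtemps that only has a Date column in the question's format: store a proper Date, and produce the short text only in a separate display column. Note that %m and %d keep their leading zeros, so the label reads 01/01/21 rather than 1/1/21.

```r
# Hypothetical sample in the question's format
SFtemps <- data.frame(Date = c("1/1/2021 12:00:00 AM", "1/2/2021 12:00:00 AM"),
                      stringsAsFactors = FALSE)

# Keep the column as a real Date so date operations keep working
SFtemps$Date <- as.Date(strptime(SFtemps$Date, "%m/%d/%Y %I:%M:%S %p"))
class(SFtemps$Date)
#> [1] "Date"
diff(SFtemps$Date)
#> Time difference of 1 days

# Create the formatted text only for display
SFtemps$Label <- format(SFtemps$Date, "%m/%d/%y")
SFtemps$Label
#> [1] "01/01/21" "01/02/21"
```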
Your format is a bit off. These are two options that work (lubridate is my favorite):
SFtemps <- list()
SFtemps$Date <- '1/1/2021 12:00:00 AM'
as.Date(strptime(SFtemps$Date, "%m/%d/%Y %r"))
#> [1] "2021-01-01"
as.Date(lubridate::mdy_hms(SFtemps$Date))
#> [1] "2021-01-01"
This gives you a Date. To display it as a string such as 01/01/2021, use the format() function with "%m/%d/%Y".
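You can also try tryFormats, which as.Date() consults only when no format is supplied. The year in your string has four digits, so it needs %Y, not %y:
x <- "1/1/2021 12:00:00 AM"
as.Date(x, tryFormats = c("%m/%d/%Y %I:%M:%S %p", "%m/%d/%Y"),
optional = FALSE)
#> [1] "2021-01-01"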
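A sketch of how tryFormats behaves, using the string from the question. The first candidate format that parses the first non-NA element is used for the whole vector. It also shows why %y is the wrong choice for a four-digit year: strptime() reads only two digits for %y and ignores trailing characters, so the date silently comes out in 2020.

```r
x <- "1/1/2021 12:00:00 AM"

# "%Y-%m-%d" fails on this string, so as.Date() moves on to "%m/%d/%Y"
as.Date(x, tryFormats = c("%Y-%m-%d", "%m/%d/%Y"))
#> [1] "2021-01-01"

# Pitfall: %y consumes only "20" of "2021"; the rest of the string is ignored
as.Date(x, format = "%m/%d/%y")
#> [1] "2020-01-01"
```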

Convert non-standard time data in R

I have a dataset with non-standard time data: the Excel file has numbers in a variety of different formats, as shown below.
I'm trying to convert it to something usable in R, probably HH:MM AM/PM, so I use mutate(H1B.format = format(strptime(H1B, "%I:%M %p"), "%H:%M")).
How would I do this? I tried separate(H1B, into = c('time', 'ampm'), sep = -2, convert = TRUE) to put AM/PM into a separate column, but I still need to figure out how to add colons and zeros as needed.
I'm also fairly new to R, so any help would be great!
You can use lubridate::parse_date_time to parse times in various formats.
library("lubridate")
times_to_parse <- c(
"10:30PM", "9:30pm", "12am", "10pm", "6am", "5:30pm",
"1015PM", "1030pm"
)
time_formats <- c(
"%I:%M %p", "%I %p"
)
lubridate::parse_date_time(
times_to_parse, time_formats
)
#> [1] "0000-01-01 22:30:00 UTC" "0000-01-01 21:30:00 UTC"
#> [3] "0000-01-01 00:00:00 UTC" "0000-01-01 22:00:00 UTC"
#> [5] "0000-01-01 06:00:00 UTC" "0000-01-01 17:30:00 UTC"
#> [7] "0000-01-01 22:15:00 UTC" "0000-01-01 22:30:00 UTC"
Created on 2022-03-15 by the reprex package (v2.0.1)
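The "add colons and zeros" route from the question also works in base R. A minimal sketch, assuming every value is one to four digits (with or without a colon) followed by am/pm, and an English or C locale for %p. The date part that strptime() fills in is today's date and carries no meaning here.

```r
times_to_parse <- c(
  "10:30PM", "9:30pm", "12am", "10pm", "6am", "5:30pm",
  "1015PM", "1030pm"
)

# Drop colons and upper-case the AM/PM marker: "9:30pm" -> "930PM"
clean  <- toupper(gsub(":", "", times_to_parse))
digits <- sub("(AM|PM)$", "", clean)   # e.g. "930"
suffix <- sub("^[0-9]+", "", clean)    # e.g. "PM"

# Hour-only values ("12", "6") get "00" minutes; then pad to 4 digits
digits <- ifelse(nchar(digits) <= 2, paste0(digits, "00"), digits)
digits <- sprintf("%04d", as.integer(digits))

# Every value now looks like "0930PM"; tz = "UTC" avoids DST surprises
parsed <- strptime(paste0(digits, suffix), "%I%M%p", tz = "UTC")
format(parsed, "%H:%M")
#> [1] "22:30" "21:30" "00:00" "22:00" "06:00" "17:30" "22:15" "22:30"
```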

How to extract date from a multiline string or File in R

Suppose there is a file with a multi-line string like:
/Analysis made on 28 september 2011
people who exercise a lot
are healthy/
How can I extract the date 28 September 2011 from the entire file or string, regardless of the month in the date, or whether it's capitalized?
I assume you have more than one date you want to extract here, and that you want the results to be Date objects. (If not, just pass them to format() with the strptime() specification you want, e.g. %e %B %Y; converting them to dates first will still standardize them, because, for example, one of your month names is lowercase.)
What I'm doing here is using R's built-in month.name vector of full month names to build a single regex that matches any month name surrounded by a day number and a year number. We end up with a list of character vectors, one vector for each document string, holding all the date strings extracted from it in order; then I map as_date() over them with the appropriate parse pattern so that they're actually R dates.
library(lubridate)
library(tidyverse)
string <-
"Analysis made on 28 september 2011 people who exercise a lot are healthy
Another analysis on 6 May 1998 found otherwise"
pattern <-
paste("[:digit:]{1,2}", month.name, "[:digit:]{4}",
collapse = "|") %>%
regex(ignore_case = TRUE)
pattern
#> [1] "[:digit:]{1,2} January [:digit:]{4}|[:digit:]{1,2} February [:digit:]{4}|[:digit:]{1,2} March [:digit:]{4}|[:digit:]{1,2} April [:digit:]{4}|[:digit:]{1,2} May [:digit:]{4}|[:digit:]{1,2} June [:digit:]{4}|[:digit:]{1,2} July [:digit:]{4}|[:digit:]{1,2} August [:digit:]{4}|[:digit:]{1,2} September [:digit:]{4}|[:digit:]{1,2} October [:digit:]{4}|[:digit:]{1,2} November [:digit:]{4}|[:digit:]{1,2} December [:digit:]{4}"
#> attr(,"options")
#> attr(,"options")$case_insensitive
#> [1] TRUE
#>
#> attr(,"options")$comments
#> [1] FALSE
#>
#> attr(,"options")$dotall
#> [1] FALSE
#>
#> attr(,"options")$multiline
#> [1] FALSE
#>
#> attr(,"class")
#> [1] "regex" "pattern" "character"
str_extract_all(string, pattern) %>%
map(as_date, tz = "", format = "%e %B %Y")
#> [[1]]
#> [1] "2011-09-28" "1998-05-06"
Created on 2019-09-29 by the reprex package (v0.3.0)
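If you prefer to avoid the tidyverse, the same idea can be sketched in base R with regmatches() and gregexpr(). This assumes an English or C locale, since %B matches the locale's month names while month.name is always English.

```r
string <-
  "Analysis made on 28 september 2011 people who exercise a lot are healthy
Another analysis on 6 May 1998 found otherwise"

# "1 or 2 digits, a month name, 4 digits", built from the English month.name
months  <- paste(month.name, collapse = "|")
pattern <- paste0("\\b[0-9]{1,2} (", months, ") [0-9]{4}\\b")

found <- regmatches(string,
                    gregexpr(pattern, string, ignore.case = TRUE, perl = TRUE))[[1]]
found
#> [1] "28 september 2011" "6 May 1998"

# %B matches month names case-insensitively on input
dates <- as.Date(found, format = "%d %B %Y")
dates
#> [1] "2011-09-28" "1998-05-06"
format(dates, "%d %B %Y")
#> [1] "28 September 2011" "06 May 1998"
```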

R lubridate time zone issue

I'd like to change the time zone of a POSIXct object in R, using the with_tz() function in the lubridate package.
This example I pulled from the web works for me:
meeting <- ymd_hms("2011-07-01 09:00:00", tz = "Pacific/Auckland")
with_tz(meeting, "America/Chicago")
But this one does not, using a snippet of some data:
atime <- as.POSIXct("2016-11-04 18:04:30",
format="%Y-%m-%d %H:%M:%S",
tz="PST")
atime_utc <- with_tz(atime, "UTC")
str() and tz() show that the new object has a time zone of "UTC", and is a POSIXct object, but the times are identical. There should be 8 hours between them after the time zone conversion.
Another solution using a different function would be fine, too.
The comments above should be well taken, but you can also try force_tz depending on your needs:
library(lubridate)
atime <- as.POSIXct("2016-11-04 18:04:30",
format="%Y-%m-%d %H:%M:%S",
tz="PST")
#> Warning in strptime(x, format, tz = tz): unknown timezone 'PST'
#> Warning in as.POSIXct.POSIXlt(as.POSIXlt(x, tz, ...), tz, ...): unknown
#> timezone 'PST'
tz(atime)
#> [1] "PST"
atime_utc <- with_tz(atime, "UTC")
force_tz(atime, "UTC")
#> [1] "2016-11-04 10:04:30 UTC"
Created on 2019-03-03 by the reprex package (v0.2.1)
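The "unknown timezone 'PST'" warnings in the reprex point to the root cause: PST is not a time zone name R recognises, so the value is not stored as Pacific time. A base-R sketch using the Olson name America/Los_Angeles instead; changing only the tzone attribute changes how the same instant is displayed, which is the effect with_tz() is meant to have.

```r
# Use a full Olson time zone name rather than an abbreviation like "PST"
atime <- as.POSIXct("2016-11-04 18:04:30",
                    format = "%Y-%m-%d %H:%M:%S",
                    tz = "America/Los_Angeles")
atime
#> [1] "2016-11-04 18:04:30 PDT"

# Same instant, displayed in UTC: only the "tzone" attribute changes
atime_utc <- atime
attr(atime_utc, "tzone") <- "UTC"
atime_utc
#> [1] "2016-11-05 01:04:30 UTC"

# The underlying number of seconds is unchanged
as.numeric(atime_utc) == as.numeric(atime)
#> [1] TRUE
```

The offset here is 7 hours rather than 8: on 2016-11-04 Los Angeles was still on daylight saving time (PDT), which ended on 2016-11-06.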

Formatting a date in R without leading zeros

Is there a way to use the format function on a date object, specifically an object of class POSIXlt, POSIXct, or Date, with the format %Y, %m, %d such that leading zeros are stripped from each of those 3 fields?
For example, I would like format(as.Date("1998-09-02"), "%Y, %m, %d") to return 1998, 9, 2 and not 1998, 09, 02.
Just remove the leading zeros at the end:
gsub(" 0", " ", format(as.Date("1998-09-02"), "%Y, %m, %d"))
## [1] "1998, 9, 2"
Use %e to obtain a leading space instead of a leading zero.
You can do this with a simple change to your strftime format string. However, it depends on your platform (Unix or Windows).
Unix
Insert a minus sign (-) before each term you'd like to remove leading zeros from:
format(as.Date("2020-06-02"), "%Y, %-m, %-d")
[1] "2020, 6, 2"
Windows
Insert a pound sign (#) before each desired term:
format(as.Date("2020-06-02"), "%Y, %#m, %#d")
[1] "2020, 6, 2"
I have found a workaround using the year(), month() and day() functions of the {lubridate} package. With the help of glue::glue(), it is easy to do as follows:
library(lubridate)
library(glue)
dt <- "1998-09-02"
glue("{year(dt)}, {month(dt)}, {day(dt)}")
#> 1998, 9, 2
Created on 2021-04-19 by the reprex package (v2.0.0)
If {tidyverse} is used (as suggested by @banbh), then str_glue() can be used:
library(tidyverse)
library(lubridate)
dt <- "1998-09-02"
str_glue("{year(dt)}, {month(dt)}, {day(dt)}")
#> 1998, 9, 2
Created on 2021-04-19 by the reprex package (v2.0.0)
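The same component-based idea works without lubridate or glue. A minimal base-R sketch (format_no_zeros is a hypothetical helper name) reads the numeric fields of as.POSIXlt(), so it does not depend on platform-specific flags such as %-m or %#m.

```r
# Hypothetical helper: build "Y, m, d" from numeric date components
format_no_zeros <- function(x) {
  lt <- as.POSIXlt(as.Date(x))
  # POSIXlt stores years since 1900 and 0-based months
  paste(lt$year + 1900, lt$mon + 1, lt$mday, sep = ", ")
}

format_no_zeros("1998-09-02")
#> [1] "1998, 9, 2"

# Vectorised over several dates
format_no_zeros(c("2020-06-02", "2021-12-31"))
```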
A more general solution uses gsub() to remove leading zeros from the month or day digits produced by %m or %d. It deletes any zero that is not preceded by a digit:
gsub("(\\D)0", "\\1", format(as.Date("1998-09-02"), "%Y, %m, %d"))
## [1] "1998, 9, 2"
