I have a dataframe with date and time values as characters, as extracted from a database. Currently the date/time looks like this: 2017-03-17 11-56-20
I want it to look like this: 2017-03-17 11:56:20
It doesn't seem to be as simple as replacing all the dashes using gsub as in R: How to replace . in a string?
I'm thinking it has something to do with the positioning, like telling R to only look after the space. Ideas?
Since you're dealing with a date-time object, you can use strptime:
x <- "2017-03-17 11-56-20"
as.character(strptime(x, "%Y-%m-%d %H-%M-%S", tz = ""))
# [1] "2017-03-17 11:56:20"
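If the values sit in a data frame column, the same format string can be applied to the whole column at once. A minimal sketch, assuming a hypothetical data frame df with a character column called stamp (neither name comes from the question) and UTC as the time zone:

```r
# Hypothetical data; the names `df` and `stamp` are assumptions for illustration.
df <- data.frame(stamp = c("2017-03-17 11-56-20", "2017-03-18 00-05-09"),
                 stringsAsFactors = FALSE)

# Parse the dash-separated time with an explicit format and time zone.
df$parsed <- as.POSIXct(df$stamp, format = "%Y-%m-%d %H-%M-%S", tz = "UTC")

# Format back to character with colons; an explicit format keeps "00:05:09"
# style times intact instead of relying on default printing.
df$clean <- format(df$parsed, "%Y-%m-%d %H:%M:%S")
df$clean
# [1] "2017-03-17 11:56:20" "2017-03-18 00:05:09"
```

Keeping the parsed POSIXct column around is handy if you later need to sort or do arithmetic on the times.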
Try matching the following pattern:
(\\d\\d)-(\\d\\d)-(\\d\\d)$
and then replace that with:
\\1:\\2:\\3
This will match your timestamp exclusively, because of the terminal anchor $ at the end of the pattern. Then, we rebuild the timestamp the way you want using colons and the three capture groups.
gsub("(\\d\\d)-(\\d\\d)-(\\d\\d)$", "\\1:\\2:\\3", x)
[1] "2017-03-17 11:56:20"
You can also use library(anytime) to take care of the formatting for you (it also coerces the value to POSIXct):
library(anytime)
anytime(x)
# [1] "2017-03-17 11:56:20 AEDT"
as.character(anytime(x))
# [1] "2017-03-17 11:56:20"
As the question states, I want to turn "130041" into "13:00:41" i.e. HMS data
lubridate::ymd("20220413") works no problems but lubridate::hms("130041") does not.
I assume there should be a reasonably simple solution?!
Thank you.
If you need the output as a lubridate Period object, rather than a character vector, as you need to perform operations on it, you can use the approach suggested by Tim Biegeleisen of adding colon separators to the character vector and then using lubridate:
x <- "130041"
gsub("(\\d{2})(?!$)", "\\1:", x, perl = TRUE) |>
lubridate::hms()
# [1] "13H 0M 41S"
The output is similar but it is a Period object. I used a slightly different regex as well (add a colon when there are two digits not followed by the end of string) but it is fundamentally the same approach.
You could use sub here:
x <- "130041"
output <- sub("(\\d{2})(\\d{2})(\\d{2})", "\\1:\\2:\\3", x)
output
[1] "13:00:41"
Another regex, which also works when the hour has only one digit:
gsub("(?=(..){1,2}$)", ":", c("130041", "30041"), perl=TRUE)
#[1] "13:00:41" "3:00:41"
The result can then be used in, e.g., lubridate::hms or hms::as_hms.
In base::as.difftime a format could be given: as.difftime("130041", "%H%M%S")
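If you would rather have a zero-padded hour for the five-digit case, one option is to pad first and then split into fixed-width groups. This is only a sketch, assuming the inputs are digit-only strings of at most six characters:

```r
x <- c("130041", "30041")

# Left-pad with zeros to six digits ("30041" becomes "030041").
padded <- sprintf("%06d", as.integer(x))

# Insert colons between the three two-digit groups.
sub("(\\d{2})(\\d{2})(\\d{2})", "\\1:\\2:\\3", padded)
# [1] "13:00:41" "03:00:41"
```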
I have a problem extracting dates from file names. In my example I have the file.name object:
file.name<- c("AZAMBUJAI002A20190518T133231_20190518T133919_T22JCM_2021_05_19_01_18_22.tif","RINCAODOSSOARES051B20210107T133231_20190518T133919_T22JSM_2021_05_19_01_18_22",
"VILAPALMA33K20181018T133231_20190518T133919_T23JCM_2020_05_19_01_18_22.tif")
I need to extract the specific dates 20190518, 20210107 and 20181018 from the file names into a new object. I can't use substr for this because the area names have different lengths (AZAMBUJAI002A, RINCAODOSSOARES051B and VILAPALMA33K), and I can't simply remove the letters either (because of the numeric area IDs 002, 051 and 33). The dates at the end, before ".tif" and separated by "_", are not useful information.
My desirable output is:
mydates
[1] 2019-05-18
[2] 2021-01-07
[3] 2018-10-18
Is there any solution to the problem described? Thanks!!
Solution using base R functions. Works as long as the format is always "yyyymmdd" and the relevant string appears before the first underscore:
file.name<- c("AZAMBUJAI002A20190518T133231_20190518T133919_T22JCM_2021_05_19_01_18_22.tif",
"RINCAODOSSOARES051B20210107T133231_20190518T133919_T22JSM_2021_05_19_01_18_22",
"VILAPALMA33K20181018T133231_20190518T133919_T23JCM_2020_05_19_01_18_22.tif")
Use gsub twice: first (the inner call) to drop everything from the first underscore onward, and then to extract the sequence of eight digits ([0-9]{8}):
dates <- gsub(".*([0-9]{8}).*", "\\1", gsub("^([^_]*)_.*", "\\1", file.name))
Finally, use as.Date to convert the strings to an R Date object (it can be converted back to a string using format):
dates_as_actual_date <- as.Date(dates, format = "%Y%m%d")
dates_as_actual_date is an R Date object and looks like this:
[1] "2019-05-18" "2021-01-07" "2018-10-18"
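A variation on the same idea, in case it reads more easily: pull out the match directly with regexpr and regmatches instead of deleting everything around it. This sketch assumes the date you want is always the first run of exactly eight digits that is immediately followed by "T":

```r
file.name <- c("AZAMBUJAI002A20190518T133231_20190518T133919_T22JCM_2021_05_19_01_18_22.tif",
               "RINCAODOSSOARES051B20210107T133231_20190518T133919_T22JSM_2021_05_19_01_18_22",
               "VILAPALMA33K20181018T133231_20190518T133919_T23JCM_2020_05_19_01_18_22.tif")

# regexpr() finds the first match per string; the lookahead (?=T) needs perl = TRUE.
# Note that regmatches() silently drops elements with no match.
m <- regmatches(file.name, regexpr("[0-9]{8}(?=T)", file.name, perl = TRUE))

mydates <- as.Date(m, format = "%Y%m%d")
mydates
# [1] "2019-05-18" "2021-01-07" "2018-10-18"
```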
Here is a way to extract the dates using a regex, assuming every year starts with 20:
library(stringr)
library(lubridate)
date_string <- str_extract(file.name,
"20\\d{2}[01]\\d[0-3]\\d")
date_string
#> [1] "20190518" "20210107" "20181018"
ymd(date_string)
#> [1] "2019-05-18" "2021-01-07" "2018-10-18"
Created on 2021-05-19 by the reprex package (v2.0.0)
library(lubridate)
ymd(gsub("(^.*_)(20[0-9]{2}_)([0-9]{2}_)([0-9]{2}_)(.*$)",
"\\2\\3\\4",
file.name))
ymd is a lubridate function that identifies YYYY-MM-DD dates, almost irrespective of the separator used.
gsub converts a string. The regex inside:
(^.*_) is the first capture group. Takes anything from the beginning to an underscore.
(20[0-9]{2}_) is the second capture group. It takes a string that starts with 20 and is followed by any two digits and an underscore.
([0-9]{2}_) is used for both the third and the fourth capture groups. Each takes two digits followed by an underscore.
(.*$) is the last (5th) capture group. Takes anything to the end of the string.
"\\2\\3\\4" returns the second, third and fourth capture groups.
EDIT:
The explanation of the code still holds, but to retrieve the dates that come right after the area names, the code needed is this:
ymd(gsub("(^.*[A-Z])(20[0-9]{2})([0-9]{2})([0-9]{2})(.*$)",
"\\2\\3\\4",
file.name))
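If you want to see what each capture group actually grabs, base R's regexec plus regmatches returns the full match followed by every group. A small sketch using the pattern from the edit on one of the file names (the variable names s and parts are just for illustration):

```r
s <- "VILAPALMA33K20181018T133231_20190518T133919_T23JCM_2020_05_19_01_18_22.tif"

# regexec() records the positions of the whole match and of each capture group.
pattern <- "(^.*[A-Z])(20[0-9]{2})([0-9]{2})([0-9]{2})(.*$)"
parts <- regmatches(s, regexec(pattern, s))[[1]]

parts[2]    # group 1: the area name up to the last letter before the date
# [1] "VILAPALMA33K"
parts[3:5]  # groups 2 to 4: year, month, day
# [1] "2018" "10"   "18"

as.Date(paste(parts[3:5], collapse = "-"))
# [1] "2018-10-18"
```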
In R I'd like to split file names in the format "a_b_c_d.jpg"
For example:
20190104_080314_2048_1700.jpg
The Date: 2019.01.04 and time 08:03:14 is important to me. The other numbers (2048= pixel, 1700= filter) are not.
So I need the a and b value.
If I use strsplit I get: [1]"a" "b" "c" "d.jpg", but I want [1] a [2] b only.
And in the end I want to take the [1] date and [2] time and put them together into one value: 2019-01-04T08:03:14
Has anyone an idea how to do this?
Thanks for helping me with programming for my astrological research about the sun activity :)
You can use a regular expression to get the pieces of the string you need.
library(stringr)
x <- '20190104_080314_2048_1700.jpg'
str_replace(x, '(^.{4})(.{2})(.{2})_(.{2})(.{2})(.{2}).*', '\\1-\\2-\\3T\\4:\\5:\\6')
#[1] "2019-01-04T08:03:14"
The expression is anchored to the start of the string, then gets the first four characters, then the next 2 characters etc. The first bracket is capture group 1 (i.e. \1)
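The same replacement should also work without stringr, using base R's sub. In this sketch I swapped the "." wildcards for \\d so that only digits are accepted, which is a stricter assumption than the original pattern makes:

```r
x <- "20190104_080314_2048_1700.jpg"

# Six capture groups: year, month, day, hour, minute, second.
# The trailing .*$ consumes the rest of the name so it is dropped from the result.
sub("^(\\d{4})(\\d{2})(\\d{2})_(\\d{2})(\\d{2})(\\d{2}).*$",
    "\\1-\\2-\\3T\\4:\\5:\\6", x)
# [1] "2019-01-04T08:03:14"
```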
There are two steps here. First is to split the string, as you suggest, and second to convert those outputs to a datetime object.
Step 1:
strsplit produces a list object. To access individual parts of that list, you need to unlist() it and then call the specific elements you're after.
t <- "20190104_080314_2048_1700.jpg"
t.split <- unlist(strsplit(t, "_"))[c(1,2)]
# [1] "20190104" "080314"
Step 2:
Now you can convert these two strings to a datetime object of your choice. Using lubridate makes it pretty easy:
library(lubridate)
ymd_hms(paste(t.split[1], t.split[2]))
# [1] "2019-01-04 08:03:14 UTC"
or you can use the base R function strptime:
strptime(paste(t.split[1], t.split[2]), format="%Y%m%d %H%M%S")
# [1] "2019-01-04 08:03:14 PST"
Note the difference in the default timezones, and be sure to specify the right one (both functions take a tz= argument).
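To get from there to the exact string asked for in the question, you can fix the time zone when parsing and then format with a literal "T". A sketch in base R, assuming the timestamps in the file names are in UTC (the question does not say):

```r
fname <- "20190104_080314_2048_1700.jpg"

# Keep only the first two underscore-separated pieces (date and time).
parts <- strsplit(fname, "_", fixed = TRUE)[[1]][1:2]

# Parse with an explicit time zone so the result does not depend on the machine.
stamp <- as.POSIXct(paste(parts, collapse = " "),
                    format = "%Y%m%d %H%M%S", tz = "UTC")

format(stamp, "%Y-%m-%dT%H:%M:%S")
# [1] "2019-01-04T08:03:14"
```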
I try to do the date format conversion in R and I encounter the following problem: my original date is
"Dec-2011"
I want it to become
2011-12
Then I tried
as.Date("Dec-2011",format = "%b-%Y")
It produces NA
I did some search, and I found that if you type
as.Date(gsub("^", "01-", "Dec-2011"), format="%d-%b-%Y")
It will give you
2011-12-01
I understand what gsub does here: it replaces every "^" in "Dec-2011" with "01-". However, as you can see, "Dec-2011" contains no "^", and I am also wondering whether it should be "01-" or "-01". I am a little puzzled. What does gsub really do here? And how should I perform date format conversion in R?
The ^ is a metacharacter that represents the start of a string. In the gsub example, the pattern gives the position for the replacement ("01-"), so it is inserted only at the start. If we remove the ^, the empty pattern matches at every position, so "01-" is inserted throughout the string
gsub("", "01-", "Dec-2011")
#[1] "01-D01-e01-c01--01-201-001-101-101-"
gsub("^", "01-", "Dec-2011")
#[1] "01-Dec-2011"
gsub performs a global substitution (every match is replaced), but sub, which replaces only the first match, gives the same result here
sub("", "01-", "Dec-2011")
#[1] "01-Dec-2011"
For this case, it may be easier to use paste
paste0("01-", "Dec-2011")
#[1] "01-Dec-2011"
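To get all the way to the "2011-12" asked for in the question, the Date can be formatted with "%Y-%m". Note that %b depends on the locale, so the first form below assumes English month abbreviations; the second form is a locale-independent sketch that looks the month up in the built-in month.abb vector instead:

```r
x <- "Dec-2011"

# Locale-dependent route: %b must match the month abbreviations of the current locale.
d <- as.Date(paste0("01-", x), format = "%d-%b-%Y")
format(d, "%Y-%m")

# Locale-independent route: split on "-" and look up the month number in month.abb.
parts <- strsplit(x, "-", fixed = TRUE)[[1]]
ym <- sprintf("%s-%02d", parts[2], match(parts[1], month.abb))
ym
# [1] "2011-12"
```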
Is it possible to convert from R's default date format to a user-defined format ("m/d/yyyy" here) and avoid getting leading zeros in the resulting date?
In the example below I want to have date_2 look just like date_1. Is there a way to do this with format or another function (ideally in one line of code), or will I need to resort to gsub to find and remove the leading zeros in front of the month ("09") and day ("05") in date_2?
I looked in documentation on DateTimeClasses, strptime, POSIXct, and format, but didn't come across an answer.
date_1<-"9/5/2008"
date_num<-as.Date(date_1,"%m/%d/%Y")
> date_num
[1] "2008-09-05"
date_2<-format(date_num,"%m/%d/%Y")
date_2
[1] "09/05/2008"
We can use gsub
gsub('(?<=\\/)0|^0', '', date_2, perl=TRUE)
#[1] "9/5/2008"
Or another version is
gsub('0(?=[1-9]\\/)', '', date_2, perl=TRUE)
#[1] "9/5/2008"
We could also convert to the POSIXlt class, extract the components, and paste them together.
v1 <- as.POSIXlt(date_num)
paste(v1$mon+1L, v1$mday, v1$year+1900, sep='/')
#[1] "9/5/2008"
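Both approaches are vectorised, so it is worth checking them on a few more dates, including ones where a zero must be kept (October, or the 20th). A quick sketch with made-up dates:

```r
# Made-up example dates, chosen to include "10" and "20", which must keep their zero.
dates <- as.Date(c("2008-09-05", "2008-10-20", "2010-12-01"))

# Regex route: strip a zero only at the start of the string or right after a slash.
via_regex <- gsub("(?<=/)0|^0", "", format(dates, "%m/%d/%Y"), perl = TRUE)

# POSIXlt route: the numeric components never carry leading zeros.
lt <- as.POSIXlt(dates)
via_lt <- paste(lt$mon + 1L, lt$mday, lt$year + 1900L, sep = "/")

via_regex
# [1] "9/5/2008"   "10/20/2008" "12/1/2010"
identical(via_regex, via_lt)
# [1] TRUE
```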
Here is a way to go straight from date_num to your desired result, using the chron package.
paste(chron::month.day.year(date_num), collapse = "/")
# [1] "9/5/2008"
This also works nicely for multiple dates. The code is slightly different as we need do.call() here.
do.call(paste, c(chron::month.day.year(Sys.Date()-0:3), sep = "/"))
# [1] "10/9/2015" "10/8/2015" "10/7/2015" "10/6/2015"