How could I modify raw date values. For example.
> DF2
Date
1 11012018
2 7312014
3 6102015
4 10202017
Into modified date values the one with "/"
> DF2
Date
1 11/01/2018
2 7/31/2014
3 6/10/2015
4 10/20/2017
Use lubridate for all date and time related tasks
> lubridate::mdy(c("11012018", "7/31/2014"))
[1] "2018-11-01" "2014-07-31"
You can also format it if needed:
format(lubridate::mdy(c("11012018", "7/31/2014")), "%m/%d/%Y")
[1] "11/01/2018" "07/31/2014"
Assuming: your date is in month-date-year format. Else you can use other lubridate functions
We could also use(It is assumed that you just need to add a new separator. In any case, you could convert back to date-time type):
new<-gsub("([0-9]{,2})([0-9]{2})([0-9]{4})","\\1 \\2 \\3",df$Date)
gsub(" ","/",new)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
Edit:
More generally as suggested by #jay.sf ,
test4<-gsub("(^[0-1]?\\d)([0-3]?\\d)(\\d{4}$)","\\1 \\2 \\3",df$Date)
gsub(" ","/",test4)
#[1] "11/01/2018" "7/31/2014" "6/10/2015" "10/20/2017"
This is to account for such date formats as:
test3<-c("11012018", "1112015", "7312014", "7312014", "10202017", "772007", "772007",
"7072007")
One possible solution could be:
df <- transform(df, V1 = as.Date(as.character(V1), "%d%m%Y"))
And another which may convert in the required mm/dd/yyyy format is as below:
df <- data.frame(lapply(df, function(x) as.Date(as.character(x), "%m%d%Y")))
Both the solutions are through the base R package.
Related
I would like to change all the mixed date format into one format for example d-m-y
here is the data frame
x <- data.frame("Name" = c("A","B","C","D","E"), "Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
I hv tried using this code down here, but it gives NAs
newdateformat <- as.Date(x$Birthdate,
format = "%m%d%y", origin = "2020-6-25")
newdateformat
Then I tried using parse, but it also gives NAs which means it failed to parse
require(lubridate)
parse_date_time(my_data$Birthdate, orders = c("ymd", "mdy"))
[1] NA NA "2001-09-12 UTC" NA
[5] "2005-02-18 UTC"
and I also could find what is the format for the first date in the data frame which is "36085.0"
i did found this code but still couldn't understand what the number means and what is the "origin" means
dates <- c(30829, 38540)
betterDates <- as.Date(dates,
origin = "1899-12-30")
p/s : I'm quite new to R, so i appreciate if you can use an easier explanation thank youuuuu
You should parse each format separately. For each format, select the relevant rows with a regular expression and transform only those rows, then move on the the next format. I'll give the answer with data.table instead of data.frame because I've forgotten how to use data.frame.
library(lubridate)
library(data.table)
x = data.table("Name" = c("A","B","C","D","E"),
"Birthdate" = c("36085.0","2001-sep-12","Feb-18-2005","05/27/84", "2020-6-25"))
# or use setDT(x) to convert an existing data.frame to a data.table
# handle dates like "2001-sep-12" and "2020-6-25"
# this regex matches strings beginning with four numbers and then a dash
x[grepl('^[0-9]{4}-',Birthdate),Birthdate1:=ymd(Birthdate)]
# handle dates like "36085.0": days since 1904 (or 1900)
# see https://learn.microsoft.com/en-us/office/troubleshoot/excel/1900-and-1904-date-system
# this regex matches strings that only have numeric characters and .
x[grepl('^[0-9\\.]+$',Birthdate),Birthdate1:=as.Date(as.numeric(Birthdate),origin='1904-01-01')]
# assume the rest are like "Feb-18-2005" and "05/27/84" and handle those
x[is.na(Birthdate1),Birthdate1:=mdy(Birthdate)]
# result
> x
Name Birthdate Birthdate1
1: A 36085.0 2002-10-18
2: B 2001-sep-12 2001-09-12
3: C Feb-18-2005 2005-02-18
4: D 05/27/84 1984-05-27
5: E 2020-6-25 2020-06-25
I have dates in the format Apr42016, Aug12017, Apr112018. I am trying to convert in Y/m/d using R. I have tried the codes below but when I have a single digit for the day it returned NA. Anyone could help me, please?
strptime(data$date, "%b%e%Y")
as.Date (data$date, format="%b%d%Y")
as.POSIXct(data$date, format="%b%e%Y")
Thank you
You can modify the strings with sub (and add a 0 if necessary) before using as.Date:
myvec <- c("Apr42016", "Aug12017", "Apr112018") # the data
myvec2 <- sub("(?<=[^0])(?=[0-9]{5})", "0", myvec, perl = TRUE)
# [1] "Apr042016" "Aug012017" "Apr112018"
as.Date(myvec2, format = "%b%d%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
If you can break up the numbers before as.Date, it will make things much easier. (Borrowing Sven's look-behind.)
sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2",
c("Apr42016", "Aug12017", "Apr112018"), perl=TRUE)
# [1] "Apr-4-2016" "Aug-1-2017" "Apr-11-2018"
From here, the format should be rather straight-forward:
as.Date(sub("(?<=\\D)(\\d+)(\\d{4})$", "-\\1-\\2", c("Apr42016", "Aug12017", "Apr112018"), perl = TRUE),
format="%b-%d-%Y")
# [1] "2016-04-04" "2017-08-01" "2018-04-11"
I'm trying to coerce dates from two formats into a single one that I can easily feed into as.Date. Here's a sample:
library(dplyr)
df <- data_frame(date = c("Mar 29 2017 9:30AM", "5/4/2016"))
I've tried this:
df %>%
mutate(date = gsub("([A-z]{3}) (\\d{2}) (\\d{4}).*",
paste0(which(month.abb == "\\1"),"/\\2","/\\3"), date))
But it gave me this:
date
1 /29/2017
2 5/4/2016
but I want this!
date
1 3/29/2017
2 5/4/2016
It looks like when I use month.abb == "\\1", it doesn't use the capturing group output ("Mar"), it just uses the caller text ("\\1"). I want to do this in regex if possible. I know you can do it another way but want to be slick.
Any ideas?
Here is one way with gsubfn
library(gsubfn)
df$date <- gsubfn("^([A-Za-z]{3})\\s+(\\d{2})\\s+(\\d{4}).*", function(x, y, z)
paste(match(x, month.abb),y, z, sep="/"), df$date)
df$date
#[1] "3/29/2017" "5/4/2016"
Or sub in combination with gsubfn
sub("(\\S+)\\s+(\\S+)\\s+(\\S+).*", "\\1/\\2/\\3",
gsubfn("^([A-z]{3})", setNames(as.list(1:12), month.abb), df$date))
#[1] "3/29/2017" "5/4/2016"
I am experiment with R and came across an issue I don't fully understand.
dates = c("03-19-76", "04/19/76", as.character("04\19\76"), "05.19.76", "060766")
dates
[1] "03-19-76" "04/19/76" "04\0019>" "05.19.76" "060766"
Why should the third date be interpreted and what sort of interpretation is taking place. I also got this output when I left out the as.character function.
Thanks
Echoing the comments, make sure to escape backslashes in strings.
dates = c("03-19-76", "04/19/76", "04\\19\\76", "05.19.76", "060766")
> dates
[1] "03-19-76" "04/19/76" "04\\19\\76" "05.19.76" "060766"
Now that you've got the dates stored, there's actually a lot of built in functions you can use with dates. Dates even have their own object types! To do so use as.Date. Since you're using nonstandard date formats, you have to tell R how you've formatted them.
> as.Date(dates[1], "%m-%d-%y")
[1] "1976-03-19"
> as.Date(dates[2], "%m/%d/%y")
[1] "1976-04-19"
> as.Date("20\\10\\1999", "%d\\%m\\%Y")
[1] "1999-10-20"
a <- as.Date(dates[1], "%m-%d-%y")
b <- as.Date(dates[2], "%m/%d/%y")
> b - a
Time difference of 31 days
d <- as.numeric(b-a)
> d
[1] 31
> a + d^2
[1] "1978-11-05"
Note that since you're using 2-digit years, you use %y. If you used 4-digit years, you'd use %Y. If you forget, you'll get oddities like this:
> as.Date("03/14/2001", "%m/%d/%y")
[1] "2020-03-14"
> as.Date("03/14/10", "%m/%d/%Y")
[1] "0010-03-14"
I have a column of time values, except that they are in character format and do not have the colons to separate H, M, S. The column looks similar to the following:
Time
024201
054722
213024
205022
205024
125440
I want to convert all the values in the column to look like actual time values in the format H:M:S. The values are already in HMS format, so it is simply a matter of inserting colons, but that is proving more difficult than I thought. I found a package that adds commas every three digits from the right to make Strings look like currency values, but nothing for time (without also adding a date value, which I do not want to do). Any help would be appreciated.
Since the data is time related, you should consider storing it in a POSIX format:
> df <- data.frame(Time=c("024201", "054722", "213024", "205022", "205024", "125440")
> df$Time <- as.POSIXct(df$Time, format="%H%M%S")
> df
Time
1 2014-01-05 02:42:01
2 2014-01-05 05:47:22
3 2014-01-05 21:30:24
4 2014-01-05 20:50:22
5 2014-01-05 20:50:24
6 2014-01-05 12:54:40
To output just the times:
> format(df, "%H:%M:%S")
Time
1 02:42:01
2 05:47:22
3 21:30:24
4 20:50:22
5 20:50:24
6 12:54:40
A regular expression with lookaround works for this:
gsub('(..)(?=.)', '\\1:', x$Time, perl=TRUE)
The (?=.) means a character (matched by .) must follow, but is not considered part of the match (and is not captured).
Here is a regex solution:
x <- readLines(n=6)
024201
054722
213024
205022
205024
125440
gsub("(\\d\\d)(\\d\\d)(\\d\\d)", "\\1:\\2:\\3", x)
## [1] "02:42:01" "05:47:22" "21:30:24"
## [4] "20:50:22" "20:50:24" "12:54:40 "
Here the (\\d\\d) says we're looking for 2 digits. The parenthesis breaks the string into 3 parts. Then the \\1: says take chunk 1 and place a colon after it.
Or via date/times classes:
time <- c("024201", "054722", "213024", "205022", "205024", "125440")
time <- as.POSIXct(paste0("1970-01-01", time), format="%Y-%d-%m %H%M%S")
(time <- format(time, "%H:%M:%S"))
# [1] "02:42:01" "05:47:22" "21:30:24" "20:50:22" "20:50:24" "12:54:40"
This gives a chron "times" class vector:
> library(chron)
> times(gsub("(..)(..)(..)", "\\1:\\2:\\3", DF$Time))
[1] 02:42:01 05:47:22 21:30:24 20:50:22 20:50:24 12:54:40
The "times" class can display times without having to display the date and supports various methods on the times.
On the other hand, if only a character string is wanted then only the gsub part is needed.