R difference between dates with different formats - r

What I'm trying to do is to calculate the total days between two dates. If the dates are in "yyyy/mm/dd" format I did it this way:
EndDate <- "2012/02/01"
StartDate <- "1900/01/01"
DiffBewtweenDates <- data.frame(date=c(EndDate),start=c(StartDate))
DiffBewtweenDates$date_diff <- as.Date(as.character(DiffBewtweenDates$date), format="%Y/%m/%d")-
as.Date(as.character(DiffBewtweenDates$start), format="%Y/%m/%d")
DiffBewtweenDates
And it worked. But I'm requested to get at least the EndDate in this format "FullDayName, DayNumber of FullMonthName of FullYearNumber". Something like this "Sunday, 1 of February of 2012".
As I understand by the R Manual, it would be ...format="%A, %d of %B of %Y"
But it doesn't work and I can't figure out why.
Thanks in advance for any idea.

Perhaps you need to change your locale to English:
Sys.setlocale("LC_TIME", "english")
date <- "Sunday, 1 of February of 2012"
lubridate::guess_formats(date, orders = "dmy")
# dmy
# "%A, %d of %B of %Y"
as.Date(date, lubridate::guess_formats(date, orders = "dmy"))
# [1] "2012-02-01"
Anyway, you can use lubridate's guess_formats function to guess the formats for many date strings.

To simply calculate the difference in days and get the desired output, you can do:
Sys.setlocale("LC_TIME", "C")
EndDate <- as.Date("2012/02/01", format = "%Y/%m/%d")
StartDate <- as.Date("1900/01/01", format = "%Y/%m/%d")
EndDate - StartDate
# Time difference of 40938 days
format(EndDate, "%A, %d of %B of %Y")
# [1] "Wednesday, 01 of February of 2012"
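If the end date already arrives in the long format, the same subtraction works once the string is parsed with the matching format. The following is a minimal sketch, assuming the "C" time locale is acceptable for English day and month names; the variable names long_end, parsed_end and diff_days are illustrative and not from the question:

```r
# Remember the current time locale, then switch to "C" so that
# %A and %B match English day and month names.
orig_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

long_end   <- "Wednesday, 1 of February of 2012"
parsed_end <- as.Date(long_end, format = "%A, %d of %B of %Y")
start_date <- as.Date("1900/01/01", format = "%Y/%m/%d")

# Difference in days as a plain number
diff_days <- as.numeric(parsed_end - start_date, units = "days")

# Restore the original locale
invisible(Sys.setlocale("LC_TIME", orig_locale))
```

If parsed_end comes back as NA, the day or month name most likely does not match the active LC_TIME locale.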

Related

Timestamp conversion in R and calculating Time Difference between 2 Columns of different DFs

I need to calculate the time difference (in minutes, hours, days, etc.) between two date-time columns of two data frames. Details are below:
df1 <- data.frame (Name = c("Aks","Bob","Caty","David"),
timestamp = c("Mon Apr 1 14:23:09 1980", "Sun Jun 12 12:10:21 1975", "Fri Jan 5 18:45:10 1985", "Thu Feb 19 02:26:19 1990"))
df2 <- data.frame (Name = c("Aks","Bob","Caty","David"),
timestamp = c("Apr-01-1980 14:28:00","Jun-12-1975 12:45:10","Jan-05-1985 17:50:30","Feb-19-1990 02:28:00"))
I am having trouble converting df1$timestamp and df2$timestamp: as.POSIXct and as.Date are not working, and I get the error "non-numeric argument to binary operator".
I need to calculate the time difference in minutes, hours, or days.
One approach is to use strptime and specify the appropriate directives in the datetime format:
df1$timestamp2 <- strptime(df1$timestamp, "%a %b %d %H:%M:%S %Y")
df2$timestamp2 <- strptime(df2$timestamp, "%b-%d-%Y %H:%M:%S")
In this case, you have:
%a abbreviated weekday name
%b abbreviated month name
%d day of the month
%H hour, 24-hour clock
%M minute
%S second
%Y year including century
Then you can use difftime to get the difference, and specify the units (in this case, difference expressed in hours):
difftime(df1$timestamp2, df2$timestamp2, units = "hours")
Output
Time differences in hours
[1] -0.08083333 -0.58027778 0.91111111 -0.02805556
If your locale settings prevent the day and month names from being parsed correctly, try:
# Store current locale
orig_locale <- Sys.getlocale("LC_TIME")
Sys.setlocale("LC_TIME", "C")
# Convert to posix-timestamp
df1$timestamp <- as.POSIXct( df1$timestamp, format = "%a %b %d %H:%M:%S %Y")
df2$timestamp <- as.POSIXct( df2$timestamp, format = "%b-%d-%Y %H:%M:%S")
# Restore locale
Sys.setlocale("LC_TIME", orig_locale)
# Calculate difference
df2$timestamp - df1$timestamp
# Time differences in mins
# [1] 4.850000 34.816667 -54.666667 1.683333
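Both answers above subtract the columns by row position, which is only safe when the two data frames list the names in the same order. Here is a hedged sketch that matches rows by Name first. Matching on Name and parsing in UTC are assumptions of this sketch, and the column diff_mins is an illustrative name:

```r
# Use the "C" time locale so %a and %b match English abbreviations
orig_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

df1 <- data.frame(Name = c("Aks", "Bob"),
                  timestamp = c("Mon Apr 1 14:23:09 1980",
                                "Sun Jun 12 12:10:21 1975"),
                  stringsAsFactors = FALSE)
# Same people, deliberately in a different row order
df2 <- data.frame(Name = c("Bob", "Aks"),
                  timestamp = c("Jun-12-1975 12:45:10",
                                "Apr-01-1980 14:28:00"),
                  stringsAsFactors = FALSE)

# Align the rows by Name; the suffixes keep the two timestamp columns apart
merged <- merge(df1, df2, by = "Name", suffixes = c("_1", "_2"))

t1 <- as.POSIXct(merged$timestamp_1, format = "%a %b %d %H:%M:%S %Y", tz = "UTC")
t2 <- as.POSIXct(merged$timestamp_2, format = "%b-%d-%Y %H:%M:%S", tz = "UTC")

# Store the difference as a plain number of minutes
merged$diff_mins <- as.numeric(difftime(t2, t1, units = "mins"))

invisible(Sys.setlocale("LC_TIME", orig_locale))
```

Changing units to "hours" or "days" in difftime gives the other scales asked for in the question.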

in R, modify date from aug 07, 2020 to 08,07,2020, also how to remove time zone

[Fri Aug 07, 2020 05:12 UTC]
I have this date format in a column. How can I modify it to be 08,07,2020 05:12?
Also, how can I remove UTC from all columns?
Check ?strptime for various format options. First convert the data to POSIXct, you can then use format to get it any format that you want.
x <- 'Fri Aug 07, 2020 05:12 UTC'
x1 <- as.POSIXct(x, format = '%a %b %d, %Y %H:%M UTC', tz = 'UTC')
x1
#[1] "2020-08-07 05:12:00 UTC"
format(x1, '%m,%d,%Y %H:%M')
#[1] "08,07,2020 05:12"
If we want to apply this to multiple columns, we can use lapply. For example, for the first 4000 columns of a data frame called df, we can do:
cols <- 1:4000
df[cols] <- lapply(df[cols], function(x) format(as.POSIXct(x,
format = '%a %b %d, %Y %H:%M UTC', tz = 'UTC'), '%m,%d,%Y %H:%M'))
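When the date columns are not a contiguous block, one option is to pick them by type instead of by position. A small sketch on made-up data, assuming every character column holds timestamps in this format; the helper reformat_utc and the column names are hypothetical:

```r
# Use the "C" time locale so %a and %b match English abbreviations
orig_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

df <- data.frame(id = 1:2,
                 created = c("Fri Aug 07, 2020 05:12 UTC",
                             "Sat Aug 08, 2020 17:30 UTC"),
                 updated = c("Sun Aug 09, 2020 00:05 UTC",
                             "Mon Aug 10, 2020 23:59 UTC"),
                 stringsAsFactors = FALSE)

# Parse one column and return it as text without the time zone label
reformat_utc <- function(x) {
  parsed <- as.POSIXct(x, format = "%a %b %d, %Y %H:%M UTC", tz = "UTC")
  format(parsed, "%m,%d,%Y %H:%M")
}

# Apply the helper to the character columns only; id is left untouched
is_text <- vapply(df, is.character, logical(1))
df[is_text] <- lapply(df[is_text], reformat_utc)

invisible(Sys.setlocale("LC_TIME", orig_locale))
```

Note that the result is character, not a date-time class, so keep the POSIXct values around if further date arithmetic is needed.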

How to locate and convert the all date format for a file.txt?

Suppose I have a diary.txt file, which I import as a string. All the dates in this string are in YYYY.MM.DD format, and I want to locate them and convert them into DDMMYYYY. What should I do?
For example, here is a diary.txt,
2018.01.01
It's a nice day.
2018.01.02
Today is a rainy day.
It should be converted into
Jan 01 2018
It's a nice day.
Jan 02 2018
Today is a rainy day.
You first need to coerce your dates to proper date objects (as.Date) and then replace them with a newly formatted date. See ?strptime for the syntax on how to specify the new format.
# import data
diary <- tempfile(fileext = ".txt")
cat("2018.01.01
It's a nice day.
2018.01.02
Today is a rainy day.", file = diary)
xy <- readLines(con = diary)
# coerce to proper date format
dates <- as.Date(xy, format = "%Y.%m.%d")
# replace valid dates with new dates formatted using format()
# date should be all non-NAs
xy[!is.na(dates)] <- format(dates[!is.na(dates)], format = "%b %d %Y") # %b will depend on your locale, see ?strptime
# write to file
writeLines(xy, con = "result.txt")
# contents of result.txt
jan. 01 2018
It's a nice day.
jan. 02 2018
Today is a rainy day.
Notice that it doesn't say Jan, but jan. This is due to my locale, which may not match what you are used to.
> Sys.getlocale()
[1] "LC_COLLATE=Slovenian_Slovenia.1250;LC_CTYPE=Slovenian_Slovenia.1250;LC_MONETARY=Slovenian_Slovenia.1250;LC_NUMERIC=C;LC_TIME=Slovenian_Slovenia.1250"
If I set the time locale to something else (this may only work on Windows)
> Sys.setlocale(category = "LC_TIME", locale = "English_United States.1252")
the result is
> xy
[1] "Jan 01 2018" "It's a nice day." "" "Jan 02 2018"
[5] "Today is a rainy day."
Try this out:
# Loading data
data <- readLines("diary.txt")
# Identifying lines with dates
date_lines <- grep("^[[:digit:]]", data)
# Creating dates
data[date_lines] <- format(as.POSIXct(data[date_lines], format = "%Y.%m.%d"), "%b %d %Y")
# Writing to new file
fileConn<-file("diary_fixed.txt")
writeLines(data, fileConn)
close(fileConn)
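If the literal DDMMYYYY layout from the question is wanted, no month names are involved, so a plain regex reorder is enough and does not depend on the locale. A minimal sketch under that assumption; it only rearranges digits and does not check that the dates are valid, and date_pattern and converted are illustrative names:

```r
diary <- c("2018.01.01", "It's a nice day.",
           "2018.01.02", "Today is a rainy day.")

# Capture year, month and day, then emit them in day-month-year order
date_pattern <- "([0-9]{4})\\.([0-9]{2})\\.([0-9]{2})"
converted <- gsub(date_pattern, "\\3\\2\\1", diary)
```

Because gsub replaces every match, this also converts dates that appear in the middle of a line, which the line-based approaches above do not.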

Dealing with date-time string that has day of the week

I have a date-time string that has day of the week and some meta-data in the string.
d <- "Fri, 14 Jul 2000 06:59:00 -0700 (PDT)"
I need to convert it into a date-time object (e.g. I have a column of these in a data.table) for further analysis. I have dealt with this using regexes to strip off meta-data from the string. Is there a better approach?
What I have is:
m <- regexpr("^\\w+,\\s+", d, perl=TRUE)
regmatches(d, m)
m <- regexpr("\\s-?\\d+\\s\\(\\w+\\)$", d, perl=TRUE)
regmatches(d, m)
ds <- sub("^\\w+,\\s+", "", d)
ds <- sub("\\s-?\\d+\\s\\(\\w+\\)$", "", ds)
Now I can convert this to date-time objects of class Date, POSIXlt or POSIXct for use in analysis.
dd <- strptime(ds, format="%d %b %Y %H:%M:%S")
dd <- as.Date(ds, format="%d %b %Y %H:%M:%S")
dd <- as.POSIXct(ds, format="%d %b %Y %H:%M:%S")
I wrote the anytime package to help with (among other things) these silly format strings -- so it heuristically just tries a number of them (and focuses on sane ones).
The input you have here qualifies (and is in fact a pretty common form):
R> anytime("Fri, 14 Jul 2000 06:59:00 -0700 (PDT)")
[1] "2000-07-14 06:59:00 CDT"
R>
We do not currently try to capture the timezone offset information at the end, so you have to deal with that after the fact. The display is in CDT which is my local timezone.
There is some more information about anytime on its webpage.
Assuming the format of the string is constant across your data:
time = trimws(unlist(strsplit(d, "[,-]"))[2])
#[1] "14 Jul 2000 06:59:00"
tz = unlist(strsplit(d, "[,-]"))[3]
tz = gsub("[^A-Z]", "", tz)
#[1] "PDT"
> as.Date(time, format = "%d %b %Y")
[1] "2000-07-14"
> as.POSIXct(time, format = "%d %b %Y %H:%M:%S") # specify the time zone with tz
[1] "2000-07-14 06:59:00 IST"
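Another option that avoids the string surgery is to let the format string consume the weekday and the numeric offset directly. A hedged sketch, assuming the "C" time locale for the English abbreviations and relying on ?strptime's statements that %z is supported on input and that trailing characters such as "(PDT)" are ignored; parsed is an illustrative name:

```r
# Use the "C" time locale so %a and %b match English abbreviations
orig_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

d <- "Fri, 14 Jul 2000 06:59:00 -0700 (PDT)"

# %a consumes the weekday, %z consumes the -0700 offset;
# the result is stored as an absolute time and displayed in UTC
parsed <- as.POSIXct(d, format = "%a, %d %b %Y %H:%M:%S %z", tz = "UTC")

invisible(Sys.setlocale("LC_TIME", orig_locale))
```

Since the offset is applied while parsing, the value is an absolute instant. It can be displayed in another zone with format(parsed, tz = "America/Los_Angeles"), assuming that zone name is available on your system.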

Change Format of Date Column

I need to turn one date format into another with RStudio, since lubridate and other date-related functions need a standard, unambiguous format for further work. I've included a few examples and some information below:
Example-Dataset:
Function,HiredDate,FiredDate
Waitress,16-06-01 12:40:02,16-06-13 11:43:12
Chef,16-04-17 15:00:59,16-04-18 15:00:59
Current Date Format (POSIXlt) of HiredDate and FiredDate:
"%y-%m-%d %H:%M:%S"
What I want the date format of HiredDate and FiredDate to be:
"%Y-%m-%d %H:%M:%S" / 2016-06-01 12:40:02
or
"%Y/%m/%d %H:%M:%S" / 2016/06/01 12:40:02
In principle, you can convert date and time for example using the strftime function:
d <- "2016-06-01 12:40:02"
strftime(d, format="%Y/%m/%d %H:%M:%S")
[1] "2016/06/01 12:40:02"
In your case, the year is causing trouble:
d <- "16-06-01 12:40:02"
strftime(d, format="%Y/%m/%d %H:%M:%S")
[1] "0016/06/01 12:40:02"
As Dave2e suggested, the two digit year can be read by %y:
strftime(d, format="%y/%m/%d %H:%M:%S")
[1] "16/06/01 12:40:02"
Assuming that your data comes from the 20th and 21st centuries, you can paste a 19 or 20 in front of HiredDate and FiredDate:
currentYear <- 16
# compare the two-digit year numerically against the current two-digit year
prefixHire <- ifelse(as.integer(substr(data$HiredDate, 1, 2)) <= currentYear, 20, 19)
prefixFire <- ifelse(as.integer(substr(data$FiredDate, 1, 2)) <= currentYear, 20, 19)
data$HiredDate <- paste(prefixHire, data$HiredDate, sep="")
data$FiredDate <- paste(prefixFire, data$FiredDate, sep="")
The code generates a prefix by assuming that any date from a year greater than the current one ('16) is actually from the 20th century. The prefix is then pasted onto HiredDate and FiredDate.
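As an alternative to pasting a prefix, the two-digit year can be parsed with %y and then written back out with %Y. This is a sketch rather than a drop-in fix: per ?strptime, %y maps 00 to 68 onto 20xx and 69 to 99 onto 19xx, which may not match the cut-off used above, and tz = "UTC" is an assumption made so the result does not depend on the local time zone:

```r
d <- c("16-06-01 12:40:02", "16-04-17 15:00:59")

# Read the two-digit year with %y
parsed <- as.POSIXct(d, format = "%y-%m-%d %H:%M:%S", tz = "UTC")

# Write it back with a four-digit year, in either requested layout
out_dash  <- format(parsed, "%Y-%m-%d %H:%M:%S")
out_slash <- format(parsed, "%Y/%m/%d %H:%M:%S")
```

Keeping parsed as POSIXct, rather than the formatted strings, is usually the more convenient input for lubridate and similar functions.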
