How to split "Monday, July 1, 2019 12:00:00:000 AM" in R

I have read, studied, and tested, but I'm just not getting it. Here is my data frame:
MyDate TEMP1 TEMP2
Monday, July 1, 2019 12:00:00:000 AM 90.0 1586
Monday, July 1, 2019 12:01:00:000 AM 88.6 1581
Monday, July 1, 2019 12:02:00:000 AM 89.4 1591
Monday, July 1, 2019 12:03:00:000 AM 90.5 1586
I need to compare it to a second data frame:
Date Time A.B.Flow A.B.Batch.Volume
7/1/2019 14:47:46 1.0 2.0
7/9/2019 14:47:48 3.0 5.0
7/11/2019 14:47:52 0.0 2.0
7/17/2019 14:48:52 3.8 4.0
7/24/2019 14:49:52 0.0 3.1
I just have to combine the two data frames when the dates, hours, and minutes match. The seconds do not have to match.
So far I have gleaned that I need to convert the first column, MyDate, into separate dates and times. I've been unable to come up with a strsplit command that actually does this.
I tried this, but it just gives each element in quotes:
newdate <- strsplit(testdate$MyDate, "\\s+ ")[[3]]
This is better, but "2019" is gone:
newdate <- strsplit(testdate$MyDate, "2019")
It looks like this:
[[1]]
[1] "Monday, July 1, " "12:00:00:000 AM"
[[2]]
[1] "Monday, July 1, " "12:01:00:000 AM"
[[3]]
[1] "Monday, July 1, " "12:02:00:000 AM"
[[4]]
[1] "Monday, July 1, " "12:03:00:000 AM"
Please tell me what I am doing wrong. I would love some input as to whether I am barking up the wrong tree.
I've tried a few other things using anytime and lubridate, but I keep coming back to this combined date and time with the day written out as my nemesis.

You could get rid of the day (Monday, ...) in your MyDate field by splitting on ',', removing the first element, then combining the rest and converting to POSIXct.
Assuming your first dataframe is called df:
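dt <- strsplit(as.character(df$MyDate), ',')
df$MyDate2 <- sapply(dt, function(x) trimws(paste0(x[-1], collapse = ',')))
# drop the ":000" milliseconds; the clock is 12-hour, so parse with %I and %p rather than %H
df$MyDate2 <- sub(':[0-9]{3} ', ' ', df$MyDate2)
df$MyDate2 <- as.POSIXct(df$MyDate2, format = '%B %d, %Y %I:%M:%S %p')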
And since you are not interested in the seconds part of the timestamps, you can do:
df$MyDate2 <- format(df$MyDate2, '%Y-%m-%d %H:%M')
You should similarly convert the Date/Time fields of your second dataframe df2, creating a MyDate2 field there with the seconds part removed as above.
Now you can merge the two dataframes on the MyDate2 column.
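Putting those steps together, here is a minimal end-to-end sketch. The two small data frames are hypothetical stand-ins built from the question's sample rows, with an invented 2:47 PM row so that the merge finds a match. It strips the weekday with sub() instead of strsplit() (same effect), assumes an English or C locale so that %B and %p recognise "July" and "AM"/"PM", and uses tz = "UTC" only so that both sides are parsed in the same time zone.

```r
# Hypothetical sample data in the question's layout; the 2:47 PM row is
# invented so that the merge has a matching minute.
df <- data.frame(
  MyDate = c("Monday, July 1, 2019 12:00:00:000 AM",
             "Monday, July 1, 2019 2:47:30:000 PM"),
  TEMP1 = c(90.0, 88.6),
  TEMP2 = c(1586, 1581),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Date = c("7/1/2019", "7/9/2019"),
  Time = c("14:47:46", "14:47:48"),
  A.B.Flow = c(1.0, 3.0),
  A.B.Batch.Volume = c(2.0, 5.0),
  stringsAsFactors = FALSE
)

# First data frame: drop the weekday and the ":000" milliseconds, then read
# the 12-hour clock with %I and %p (assumes an English or C locale).
clean <- sub("^[A-Za-z]+, ", "", df$MyDate)
clean <- sub(":[0-9]{3} ", " ", clean)
t1 <- as.POSIXct(clean, format = "%B %d, %Y %I:%M:%S %p", tz = "UTC")
df$MyDate2 <- format(t1, "%Y-%m-%d %H:%M")  # keep date, hour and minute only

# Second data frame: month/day/year plus a 24-hour time.
t2 <- as.POSIXct(paste(df2$Date, df2$Time), format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
df2$MyDate2 <- format(t2, "%Y-%m-%d %H:%M")

# Rows are joined when date, hour and minute agree; seconds are ignored.
merged <- merge(df, df2, by = "MyDate2")
merged
```

By default merge() keeps only the minutes present in both data frames; adding all.x = TRUE would keep every row of df and fill the unmatched columns with NA.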

This might give you a hint.
Since you have times, you shouldn't use as.Date but rather as.POSIXct, imho.
library(stringr)
x <- "Monday, July 1, 2019 12:00:00:000 AM 90.0 1586"
GetDate <- function(x) {
x <- str_remove_all(x, ",")                # get rid of the commas
mo <- match(word(x, 2), month.name)        # month name -> month number
day <- word(x, 3)
year <- word(x, 4)
time <- sub(":[0-9]{3}$", "", word(x, 5))  # drop the ":000" milliseconds
ampm <- word(x, 6)                         # "AM" or "PM"
as.POSIXct(paste(paste(year, mo, day, sep = "-"), time, ampm),
format = "%Y-%m-%d %I:%M:%S %p")
}
GetDate(x)

Related

How to combine 12-hour time sheet and AM/PM column from spreadsheet in r

I have a spreadsheet that has the date and 12-hour time in one column and then another column that specifies AM/PM. How do I combine these columns so I can use them as a POSIXct/POSIXlt/POSIXt object?
The spreadsheet has the time column as
DAY/MONTH/YEAR HOUR:MINUTE
where the hour is in 12-hour format (it is a roster of check-in times). The other column just says AM or PM. I am trying to combine these columns, convert them to 24-hour time, and use the result as a POSIXt object.
Example of what I see:
Timesheet         AM-PM
8/10/2022 9:00    AM
8/10/2022 9:01    AM
And this continues until 5:00 PM (same day).
What I have tried so far:
Timesheet %>%
unite("timestamp_24", c("timestamp_12","am_pm"),na.rm=FALSE)%>%
mutate(timestamp=(as.POSIXct(timestamp, format = "%d-%m-%Y %H:%M")))
This does not work as when they are combined it gives:
Timestamp_24
DAY/MONTH/YEAR HOUR:MINUTE_AM
and I think this is the crux of the issue because then as.POSIXct can't read it.
Here's my solution. The approach is simply to extract the hour, take it modulo 12 and add 12 if it is PM, then parse it with as.POSIXct (you need to use / rather than - in the format argument if your data frame is as it appears in your example).
I've done that with stringr::str_replace(), which allows you to supply a function as the replacement argument.
Timesheet %>%
mutate(
time_24hr = stringr::str_replace(
time,
"\\d+(?=:..$)",
function(x) {
hr <- as.numeric(x) %% 12
ifelse(am_pm == "PM", hr + 12, hr)
}
),
time_24hr = as.POSIXct(time_24hr, format = "%d/%m/%Y %H:%M")
)
This is the result:
time am_pm time_24hr
1 8/10/2022 9:00 AM 2022-10-08 09:00:00
2 8/10/2022 9:01 PM 2022-10-08 21:01:00
3 8/10/2022 12:01 PM 2022-10-08 12:01:00
4 8/10/2022 12:01 AM 2022-10-08 00:01:00
EDIT: I realized that this didn't work for two-digit hours such as 11 and 12, as the regex was only extracting the first character before the ":". It also wasn't working for 12:xx times. Fixed both, and added test cases to show that these work now.
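A regex-free alternative, offered as a sketch: paste the two columns together and let strptime do the 12-hour conversion with %I (hour 01 to 12) and %p (AM/PM), which also handles 12 AM and 12 PM. The column names time and am_pm follow the answer above, the rows are the same test cases, and %p assumes an English or C locale.

```r
Timesheet <- data.frame(
  time  = c("8/10/2022 9:00", "8/10/2022 9:01", "8/10/2022 12:01", "8/10/2022 12:01"),
  am_pm = c("AM", "PM", "PM", "AM"),
  stringsAsFactors = FALSE
)

# "%d/%m/%Y" matches the day/month/year layout; %I together with %p turns the
# 12-hour clock into a 24-hour one (12 AM -> 00, 12 PM -> 12).
Timesheet$time_24hr <- as.POSIXct(
  paste(Timesheet$time, Timesheet$am_pm),
  format = "%d/%m/%Y %I:%M %p",
  tz = "UTC"
)
Timesheet
```

With dplyr the same as.POSIXct() call can go inside mutate().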

Dates out by 2 days when I convert to Date Format in R

When I convert Excel serial numbers to "dates" in R, the results seem to be off by 2 days compared with Excel. Why?
My example
mydata <- c(38808,40422,40493,40606)
as.Date(mydata, origin="1900-01-01")
# [1] "2006-04-03" "2010-09-03" "2010-11-13" "2011-03-06"
yet in excel the dates are as follows
Serial   Excel date   R date       Delta (days)
38808    2006-04-01   2006-04-03   2
40422    2010-09-01   2010-09-03   2
40493    2010-11-11   2010-11-13   2
40606    2011-03-04   2011-03-06   2
I get around it by changing origin date to 1899-12-30 but I am sure I am doing something wrong.
Thanks
It is a known problem that Excel thinks 1900 was a leap year, but it was not. So Excel counts an extra day (for nonexistent Feb 29, 1900). In addition, Excel considers "1900-01-01" as day 1, not day 0.
Maybe the link will help:
http://www.cpearson.com/excel/datetime.htm
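The two one-day corrections can be checked directly: counting 1900-01-01 as day 1 moves the zero point back one day to 1899-12-31, and skipping Excel's nonexistent 1900-02-29 moves it back one more, to 1899-12-30. The comparison below uses the question's own serials; note that the 1899-12-30 origin only matches Excel for serials of 61 (1900-03-01) and above.

```r
mydata <- c(38808, 40422, 40493, 40606)

wrong <- as.Date(mydata, origin = "1900-01-01")  # treats serial 0 as 1900-01-01
right <- as.Date(mydata, origin = "1899-12-30")  # absorbs both Excel quirks

as.numeric(wrong - right)  # 2 2 2 2: every date is shifted by the same two days
format(right)              # "2006-04-01" "2010-09-01" "2010-11-11" "2011-03-04"
```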
For Excel dates you need this origin:
mydata <- c(38808,40422,40493,40606)
as.Date(mydata, origin = "1899-12-30")
[1] "2006-04-01" "2010-09-01" "2010-11-11" "2011-03-04"

3 letters month and 2 digits year format in R

I am trying to convert the following format to date:
as.Date('Mar.17', format = '%b.%y')
but it returns NA.
What am I missing?
Update: I am expecting to get March 2017, not 2018.
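If 17 is the day of the month, it should be:
as.Date('Mar.17', format = '%b.%d')
(this gives March 17 of the current year, since no year is supplied)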
Assuming the 17 part is the year, you could use sub to add in a day number to make it an actual date.
as.Date(sub("\\.", "01", "Mar.17"), "%b%d%y")
# [1] "2017-03-01"
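The NA in the question comes from strptime itself: its documentation notes that when a month is specified, the day of the month has to be specified too, so '%b.%y' alone cannot produce a Date. The same idea of supplying a day works on a whole vector. A sketch, with made-up extra values and English month abbreviations assumed:

```r
x <- c("Mar.17", "Dec.16", "Jan.20")

# Prefix a day of month so strptime has day, month and year to work with.
d <- as.Date(paste0("01.", x), format = "%d.%b.%y")
d
```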
as.yearmon from the zoo package will do the trick and give Mar 2017, as the OP expected.
library(zoo)
as.yearmon("Mar.17", "%b.%y")
#[1] "Mar 2017"
Another option is to convert it to 1 March 2017:
as.Date(as.yearmon("Mar.17", "%b.%y"), frac = 0)
#[1] "2017-03-01"
You need the dot for the %b month format, at least on my computer (month abbreviations are locale-dependent):
as.Date(paste0( "01",'mar.2017'), format = '%d%b%Y')
"2017-03-01"

converting multiple date formats into one in r

I am working with messy excel file with multiple date formats
2016-10-17T12:38:41Z
Mon Oct 17 08:03:08 GMT 2016
10-Sep-15
13-Oct-09
18-Oct-2016 05:42:26 UTC
I want to convert all of the above to yyyy-mm-dd format. I am using the following code for the conversion, but a lot of values come out as NA.
as.Date(parse_date_time(df$date,c('mdy', 'ymd_hms','a b d HMS y','d b y HMS')))
How can I convert all of them together? I have read other threads on similar cases, but nothing seems to work for my case.
Please help.
If I add 'dmy' to the list, then at least all of the cases in your example are successfully parsed:
z <- c("2016-10-17T12:38:41Z", "Mon Oct 17 08:03:08 GMT 2016",
"10-Sep-15", "13-Oct-09", "18-Oct-2016 05:42:26 UTC")
library(lubridate)
parse_date_time(z,c('mdy', 'dmy', 'ymd_HMS','a b d HMS y','d b y HMS'))
## [1] "2016-10-17 12:38:41 UTC" "2016-10-17 08:03:08 UTC"
## [3] "2015-09-10 00:00:00 UTC" "2009-10-13 00:00:00 UTC"
## [5] "2016-10-18 05:42:26 UTC"
Your big problem will be the third and fourth elements: are these actually meant to be 'ymd' and 'dmy' respectively? I'm not sure how any logic will let you auto-detect these differences ... out of context, "15 Sep 2010" and "10 September 2015" both seem perfectly reasonable possibilities ...
For what it's worth I also tried the new anytime package - it only handled the first and last element.
Removing the times first makes it possible to specify only three alternatives in orders to parse the sample data in the question. This interprets 10-Sep-15 and 13-Oct-09 as dmy but if you want them interpreted as ymd then uncomment the commented out line:
library(lubridate)
orders <- c("dmy", "mdy", "ymd")
# orders <- c("ymd", "dmy", "mdy")
as.Date(parse_date_time(gsub("..:..:..", " ", x), orders = orders))
giving:
[1] "2016-10-17" "2016-10-17" "2015-09-10" "2009-10-13" "2016-10-18"
or if the commented out line is uncommented then:
[1] "2016-10-17" "2016-10-17" "2010-09-15" "2013-10-09" "2016-10-18"
Note: The input is:
x <- c("2016-10-17T12:38:41Z ", "Mon Oct 17 08:03:08 GMT 2016", "10-Sep-15",
"13-Oct-09", "18-Oct-2016 05:42:26 UTC")

Obtaining last Friday's date

I can get today's date:
Sys.Date( )
But how do I get last Friday's date?
I tried:
library(xts)
date1 <- Sys.Date( )
to.weekly(date1 )
But this gives an error.
I think this should work:
library(lubridate)
Sys.Date() - wday(Sys.Date() + 1)
Try this:
library(zoo)
lastfri <- function(x) 7 * floor(as.numeric(x - 5 + 4) / 7) + as.Date(5 - 4)
lastfri(Sys.Date())
where lastfri is the one-line function nextfri from the zoo quickref vignette, except that ceiling is replaced with floor. Note that lastfri is vectorized, i.e. it can take a vector of input dates and produces a vector of output dates. For example,
library(zoo)
Sys.Date()
## 2015-03-10
lastfri(Sys.Date() + 0:6)
## [1] "2015-03-06" "2015-03-06" "2015-03-06" "2015-03-13" "2015-03-13"
## [6] "2015-03-13" "2015-03-13"
Thus last Friday was March 6th, and we keep getting March 6th until the day advances to the next Friday, at which point the last Friday is March 13th.
Aside: Next Friday is Friday the 13th.
Here is a function that finds the last date for any day of the week:
library(lubridate)
getlastdate <- function(day) {
dates <- seq(Sys.Date() - 7, Sys.Date() - 1, by = "days")
dates[wday(dates, label = TRUE) == day]
}
getlastdate("Mon")
# "2015-03-09"
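Enter the day of the week in abbreviated form, i.e. one of:
Sun Mon Tue Wed Thu Fri Sat
(recent lubridate versions take these labels from the locale; older versions used Tues and Thurs)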
Last Friday was 4 days ago, thus:
Sys.Date()-4
> Sys.Date()-4
[1] "2015-03-06"
OR, for any day of the week (including Saturday), using base R:
Sys.Date() - ((as.POSIXlt(Sys.Date())$wday + 1) %% 7 + 1)
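The same modular arithmetic generalizes to any target weekday and to vectors of dates. A base-R sketch; last_weekday is a hypothetical helper name, not a function from any package. It uses the POSIXlt convention 0 = Sunday, ..., 6 = Saturday and, like the lubridate answer, returns the most recent matching date strictly before each input, so a Friday maps to the previous Friday (unlike lastfri above, which returns the Friday itself).

```r
# Most recent date strictly before x that falls on weekday `target`
# (0 = Sunday, ..., 5 = Friday, 6 = Saturday, as in POSIXlt$wday).
last_weekday <- function(x, target = 5) {
  x <- as.Date(x)
  w <- as.POSIXlt(x)$wday
  # Days to step back: 1 if yesterday was the target, ..., 7 if today is.
  x - ((w - target - 1) %% 7 + 1)
}

last_weekday(as.Date("2015-03-10") + 0:6)     # last Friday for Tue 10 to Mon 16 March
last_weekday(as.Date("2015-03-10"), target = 1)  # last Monday before Tue 10 March
```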
