max(DF1$EFFDT) <= DF2$EFFDT [duplicate] - r

I have two data frames: DF1 contains monthly data snapshots, whereas DF2 holds a particular date. I want to retrieve from DF1 only the row with the latest date that is on or before (<=) the date in DF2.
DF1
Account   Date
A1000001  1-JAN-2021
A1000002  1-FEB-2021
A1000003  1-MAR-2021
A1000004  1-APR-2021
DF2
Date
15-MAR-2021
Output Expected:
Account   Date
A1000003  1-MAR-2021

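Convert the dates to the Date class, then use sapply to find, for each date in df2, the closest date in df1. Note that which.min(abs(...)) picks the nearest date in either direction, so it can return a date later than the one in df2.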
df1$Date <- as.Date(df1$Date, '%d-%b-%Y')
df2$Date <- as.Date(df2$Date, '%d-%b-%Y')
result <- df1[sapply(df2$Date, function(x) which.min(abs(df1$Date - x))), ]
result
# Account Date
#3 A1000003 2021-03-01
data
It is easier to help if you provide data in a reproducible format
df1 <- structure(list(Account = c("A1000001", "A1000002", "A1000003",
"A1000004"), Date = c("1-JAN-2021", "1-FEB-2021", "1-MAR-2021",
"1-APR-2021")), row.names = c(NA, -4L), class = "data.frame")
df2 <- structure(list(Date = "15-MAR-2021"), row.names = c(NA, -1L),
class = "data.frame")
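The question asks for the latest date on or before the reference date, which is stricter than "nearest". A minimal sketch of that variant follows; the helper name latest_on_or_before and the reference date 2021-03-20 are assumptions chosen for illustration (for that date the nearest snapshot is 1-APR-2021, but the latest one on or before it is 1-MAR-2021). Dates are written in ISO format here to avoid locale-dependent month names.

```r
# Snapshot data, with dates already stored as Date
df1 <- data.frame(
  Account = c("A1000001", "A1000002", "A1000003", "A1000004"),
  Date = as.Date(c("2021-01-01", "2021-02-01", "2021-03-01", "2021-04-01"))
)
# Hypothetical reference date, picked so that "nearest" and "<=" disagree
df2 <- data.frame(Date = as.Date("2021-03-20"))

# Index of the latest element of `dates` that is <= x (NA if there is none)
latest_on_or_before <- function(x, dates) {
  ok <- which(dates <= x)
  if (length(ok) == 0L) return(NA_integer_)
  ok[which.max(as.numeric(dates[ok]))]
}

# Loop over positions so each element keeps its Date class
idx <- vapply(
  seq_along(df2$Date),
  function(i) latest_on_or_before(df2$Date[i], df1$Date),
  integer(1)
)
result <- df1[idx, ]
result
```

If no snapshot precedes a reference date, the index is NA and the corresponding row of result is filled with NA.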

Related

How to insert a character at a certain position in all rows using R [duplicate]

Hey I have a column of birth dates in this format:
dob
19881011
19590223
19860407
19710921
19640213
I need to edit the column to this format:
dob
1988-10-11
1959-02-23
1986-04-07
1971-09-21
1964-02-13
I came across similar problems being solved with gsub and regex but was not able to apply them to my problem. Can someone recommend a solution?
The ymd function from lubridate parses the column to the Date class whether it is numeric or character
library(dplyr)
library(lubridate)
df1 <- df1 %>%
  mutate(dob = ymd(dob))
-output
df1
# dob
#1 1988-10-11
#2 1959-02-23
#3 1986-04-07
#4 1971-09-21
#5 1964-02-13
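It is not really a regex problem, as converting to the Date class is straightforward with as.Date and its format argument. If a regex option is still wanted, sub can insert the hyphens (the result is a character vector, not a Date):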
sub("(....)(..)(..)", "\\1-\\2-\\3", df1$dob)
data
df1 <- structure(list(dob = c(19881011L, 19590223L, 19860407L, 19710921L,
19640213L)), class = "data.frame", row.names = c(NA, -5L))
A base R option using as.Date
transform(
df,
dob = as.Date(as.character(dob), "%Y%m%d")
)
gives
dob
1 1988-10-11
2 1959-02-23
3 1986-04-07
4 1971-09-21
5 1964-02-13
Data
> dput(df)
structure(list(dob = c(19881011L, 19590223L, 19860407L, 19710921L,
19640213L)), class = "data.frame", row.names = c(NA, -5L))

identify observations based on 2 elements in 2 dataframes that do not match [duplicate]

I want to identify observations in 1 df that do not match that of another df using 2 indicators (id and date). Below is sample df1 and df2.
df1
id date n
12-40 12/22/2018 3
11-08 10/02/2016 11
df2
id date interval
12-40 12/22/2018 3
11-08 10/02/2016 32
22-22 11/10/2015 11
I want a df that outputs rows that are in df2, but not in df1, like so. Note that row 3 (based on id and date) of df2 is not in df1.
df3
id date interval
22-22 11/10/2015 11
I tried doing this in tidyverse and was not able to get the code to work. Does anyone have suggestions on how to do this?
We can use anti_join from dplyr (the OP mentioned working with the tidyverse), joining on both 'id' and 'date' as described in the question.
library(dplyr)
anti_join(df2, df1, by = c('id', 'date'))
# id date interval
#1 22-22 11/10/2015 11
Or a similar option with data.table and it should be very efficient
library(data.table)
setDT(df2)[!df1, on = .(id, date)]
# id date interval
#1: 22-22 11/10/2015 11
data
df1 <- structure(list(id = c("12-40", "11-08"), date = c("12/22/2018",
"10/02/2016"), n = c(3L, 11L)), class = "data.frame", row.names = c(NA,
-2L))
df2 <- structure(list(id = c("12-40", "11-08", "22-22"), date = c("12/22/2018",
"10/02/2016", "11/10/2015"), interval = c(3L, 32L, 11L)), class = "data.frame",
row.names = c(NA,
-3L))
Try this (both options are base R and do not require any package). Each one pastes id and date into a single key and keeps the rows of df2 whose key does not occur in df1:
#Code 1
df3 <- df2[!paste(df2$id, df2$date) %in% paste(df1$id, df1$date), ]
Output:
id date interval
3 22-22 11/10/2015 11
Alternatively, with subset:
#Code 2
df3 <- subset(df2,!paste(id,date) %in% paste(df1$id,df1$date))
Output:
id date interval
3 22-22 11/10/2015 11
Some data used:
#Data1
df1 <- structure(list(id = c("12-40", "11-08"), date = c("12/22/2018",
"10/02/2016"), n = c(3L, 11L)), class = "data.frame", row.names = c(NA,
-2L))
#Data2
df2 <- structure(list(id = c("12-40", "11-08", "22-22"), date = c("12/22/2018",
"10/02/2016", "11/10/2015"), interval = c(3L, 32L, 11L)), class = "data.frame", row.names = c(NA,
-3L))
Another base R option using merge + subset + complete.cases
df3 <- subset(
u <- merge(df1, df2, by = c("id", "date"), all.y = TRUE),
!complete.cases(u)
)[names(df2)]
which gives
> df3
id date interval
3 22-22 11/10/2015 11

Convert the date into an individual date and time in R

I would like to convert a date that I have in R into an individual date and time. At the moment the format of the date is POSIXct
An example is given here:
"2019-03-29 20:42:07"
I want the date to be in one column and the time of that date in a corresponding column. I have found something similar here, but it doesn't answer my question.
Many thanks
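If the column is of class POSIXct, create two new columns: the date by coercing with as.Date and the time part with format. By default as.Date converts a POSIXct value in UTC, which can shift an evening timestamp to the next calendar day, so tz = "" is passed here to use the local time zone
df1 <- transform(df1, date = as.Date(datetime, tz = ""), time = format(datetime, "%T"))
df1
# datetime date time
#1 2019-03-29 20:42:07 2019-03-29 20:42:07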
data
df1 <- structure(list(datetime = structure(1553910127, class = c("POSIXct",
"POSIXt"), tzone = "")), class = "data.frame", row.names = c(NA,
-1L))
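Because the calendar date of a POSIXct value depends on the time zone used for the conversion, it can help to name the zone explicitly. The sketch below is only an illustration: the zone "America/Chicago" is an assumption (it happens to reproduce the epoch value 1553910127 from the data above), and the variable names are hypothetical.

```r
# Build the timestamp in an explicitly named zone (assumed for illustration)
dt <- as.POSIXct("2019-03-29 20:42:07", tz = "America/Chicago")
df1 <- data.frame(datetime = dt)

# Calendar date in the same zone as the timestamp
df1$date <- as.Date(df1$datetime, tz = "America/Chicago")
# Time of day as a character string, formatted in the timestamp's own zone
df1$time <- format(df1$datetime, "%H:%M:%S")

# For comparison: the same instant falls on the next day in UTC
date_utc <- as.Date(df1$datetime, tz = "UTC")

df1
date_utc
```

The time column is character, since base R has no dedicated time-of-day class.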

Subsetting data frame with multiple date conditions for ranges in between

I need subsets between multiple dates.
Example data frame:
testdf <- data.frame(short_date = seq(as.Date("2007-03-01"),
as.Date("2008-09-01"), by = 'day'))
An example of data frame with values for date ranges:
dates_cut <- structure(list(emergence = structure(c(13627, 13997), class = "Date"), disease_onset = structure(c(13694, 14062), class = "Date")), .Names = c("emergence", "disease_onset"), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
Obviously this is just a sample; there are a number of years for which I need the subsets of data in between ($emergence date and $disease_onset).
This works for one date range:
testdf %>% filter(short_date >= dates_cut$emergence[1], short_date <= dates_cut$disease_onset[1])
The problem is when there are multiple date ranges.
Thanks.
One option would be to lapply over the rows of dates_cut and store each subset in a list. After that you can rbind them all together with do.call:
# One subset per date range; $ indexing returns a plain Date even for a tibble
subsets <- lapply(seq_len(nrow(dates_cut)), function(i) {
  in_range <- testdf$short_date >= dates_cut$emergence[i] &
    testdf$short_date <= dates_cut$disease_onset[i]
  testdf[which(in_range), , drop = FALSE]
})
# Stack the subsets into a single data frame
res <- do.call(rbind, subsets)
head(res)
# short_date
#55 2007-04-24
#56 2007-04-25
#57 2007-04-26
#58 2007-04-27
#59 2007-04-28
#60 2007-04-29
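As an alternative sketch of the same idea, the ranges can be combined into one logical mask before subsetting, which avoids building and binding intermediate data frames. The names masks and keep are hypothetical, and dates_cut is rebuilt here as a plain data frame from the day counts given in the question.

```r
testdf <- data.frame(
  short_date = seq(as.Date("2007-03-01"), as.Date("2008-09-01"), by = "day")
)
# Same day counts as in the question's dput output
dates_cut <- data.frame(
  emergence = as.Date(c(13627, 13997), origin = "1970-01-01"),
  disease_onset = as.Date(c(13694, 14062), origin = "1970-01-01")
)

# One logical vector per range: TRUE where the date lies inside that range
masks <- lapply(seq_len(nrow(dates_cut)), function(i) {
  testdf$short_date >= dates_cut$emergence[i] &
    testdf$short_date <= dates_cut$disease_onset[i]
})

# A row is kept if it falls inside any of the ranges
keep <- Reduce(`|`, masks)
res <- testdf[keep, , drop = FALSE]
head(res)
```

Unlike the rbind approach, a row that falls inside two overlapping ranges appears only once in the result.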

How to convert character into date format in R

I have a csv file which has date in following format.
8/13/2016
8/13/2016
8/13/2016
2016-08-13T08:26:04Z
2016-08-13T14:30:23Z
8/13/2016
8/13/2016
When I import this into R, the column is read as character. I want to convert it to Date, but when I try, every value becomes NA:
as.Date(df$create_date,format="%m%d%y")
The date field in the CSV is recorded in different formats. How can I convert it to Date in R?
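To see why the attempt returns NA, it may help to test the format strings on one value of each kind. The sketch below uses two sample strings taken from the question; the variable names are hypothetical. The format "%m%d%y" has no separators and expects a two-digit year, so it matches neither layout.

```r
x <- c("8/13/2016", "2016-08-13T08:26:04Z")

# The format from the question: no "/" separators and a 2-digit year
bad <- as.Date(x, format = "%m%d%y")

# Formats that match each layout; %Y is the 4-digit year
good_mdy <- as.Date(x[1], format = "%m/%d/%Y")
# Trailing characters after the matched date part are ignored
good_iso <- as.Date(x[2], format = "%Y-%m-%d")

bad
good_mdy
good_iso
```

A single format string can only describe one layout, which is why mixed columns need to be split by pattern or parsed with a function that accepts several candidate formats.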
A base R option, assuming there are only two formats in the 'create_date' column: build a logical index ('i1') with grepl for the elements that start with a four-digit year, convert the two groups to Date separately, and assign them into a Date vector with one element per row.
# TRUE for ISO-style values that start with the year
i1 <- grepl("^[0-9]{4}", df$create_date)
# ISO-style values: the default format reads the leading yyyy-mm-dd part
v1 <- as.Date(df$create_date[i1])
# month/day/year values
v2 <- as.Date(df$create_date[!i1], "%m/%d/%Y")
# Placeholder Date vector, filled in from both groups
v3 <- as.Date(rep(NA, nrow(df)))
v3[i1] <- v1
v3[!i1] <- v2
df$create_date <- v3
Alternatively, as I noted in a comment on the question, parse_date_time from lubridate accepts several candidate formats and can be used directly
library(lubridate)
as.Date(parse_date_time(df$create_date, c('mdy', 'ymd_hms')))
#[1] "2016-08-13" "2016-08-13" "2016-08-13" "2016-08-13"
#[5] "2016-08-13" "2016-08-13" "2016-08-13"
data
df <- structure(list(create_date = c("8/13/2016", "8/13/2016",
"8/13/2016",
"2016-08-13T08:26:04Z", "2016-08-13T14:30:23Z", "8/13/2016",
"8/13/2016")), .Names = "create_date", class = "data.frame",
row.names = c(NA, -7L))
