Check if two values within consecutive dates are identical - r

Let's say I have a tibble like
df <- tribble(
~date, ~place, ~wthr,
#------------/-----/--------
"2017-05-06","NY","sun",
"2017-05-06","CA","cloud",
"2017-05-07","NY","sun",
"2017-05-07","CA","rain",
"2017-05-08","NY","cloud",
"2017-05-08","CA","rain",
"2017-05-09","NY","cloud",
"2017-05-09","CA",NA,
"2017-05-10","NY","cloud",
"2017-05-10","CA","rain"
)
I want to check whether the weather in a given place on a given day was the same as on the previous day, and attach the result to df as a logical column, giving:
tribble(
~date, ~place, ~wthr, ~same,
#------------/-----/------/------
"2017-05-06","NY","sun", NA,
"2017-05-06","CA","cloud", NA,
"2017-05-07","NY","sun", TRUE,
"2017-05-07","CA","rain", FALSE,
"2017-05-08","NY","cloud", FALSE,
"2017-05-08","CA","rain", TRUE,
"2017-05-09","NY","cloud", TRUE,
"2017-05-09","CA", NA, NA,
"2017-05-10","NY","cloud", TRUE,
"2017-05-10","CA","rain", NA
)
Is there a good way to do this?

To get a logical column, group by place and check whether each wthr value equals the one in the previous row using lag(). I added arrange(date) to make sure the rows are in chronological order.
library(dplyr)
df %>%
arrange(date) %>%
group_by(place) %>%
mutate(same = wthr == lag(wthr, default = NA))
Edit: If you want to make sure the dates are consecutive (1 day apart), wrap the comparison in an ifelse() that checks whether the difference between date and lag(date) is 1. If the rows are not 1 day apart, the result is coded as NA.
Note: for this to work, date must be a Date:
df$date <- as.Date(df$date)
df %>%
arrange(date) %>%
group_by(place) %>%
mutate(same = ifelse(
date - lag(date) == 1,
wthr == lag(wthr, default = NA),
NA))
Output
date place wthr same
<date> <chr> <chr> <lgl>
1 2017-05-06 NY sun NA
2 2017-05-06 CA cloud NA
3 2017-05-07 NY sun TRUE
4 2017-05-07 CA rain FALSE
5 2017-05-08 NY cloud FALSE
6 2017-05-08 CA rain TRUE
7 2017-05-09 NY cloud TRUE
8 2017-05-09 CA NA NA
9 2017-05-10 NY cloud TRUE
10 2017-05-10 CA rain NA
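If dplyr is not available, a similar check can be sketched in base R. This swaps lag() for a lookup and is not the answer's method: instead of comparing with the previous row, it looks for the row with the same place on date - 1, so a missing day automatically gives NA. The names key, prev_key and prev_wthr are made up for this sketch.

```r
# Example data from the question, with date stored as a Date
df <- data.frame(
  date  = as.Date(rep(c("2017-05-06", "2017-05-07", "2017-05-08",
                        "2017-05-09", "2017-05-10"), each = 2)),
  place = rep(c("NY", "CA"), times = 5),
  wthr  = c("sun", "cloud", "sun", "rain", "cloud", "rain",
            "cloud", NA, "cloud", "rain"),
  stringsAsFactors = FALSE
)

# Key for each row, and key for "same place, previous calendar day"
key      <- paste(df$place, df$date)
prev_key <- paste(df$place, df$date - 1)

# match() returns NA when there is no row for the previous day,
# so non-consecutive dates give NA instead of a misleading comparison
prev_wthr <- df$wthr[match(prev_key, key)]
df$same   <- df$wthr == prev_wthr
df
```

Because it matches on the date itself, this version does not depend on the row order, whereas lag() needs the rows sorted by date within each place.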

Related

Aggregate week and date in R by some specific rules

I'm not used to using R. I already asked a question on Stack Overflow and got a great answer.
I'm sorry to post a similar question, but I tried many times and kept getting output I didn't expect.
This time, I want to do something slightly different from my previous question:
Merge two data with respect to date and week using R
I have two data frames. One has a year_month_week column and the other has a date column.
df1<-data.frame(id=c(1,1,1,2,2,2,2),
year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
points=c(65,58,47,21,25,27,43))
df2<-data.frame(id=c(1,1,1,2,2,2),
date=c(20220503,20220506,20220512,20220401,20220408,20220409),
temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))
In df1, 2022051 means the 1st week of May 2022; likewise, 2022052 means the 2nd week of May 2022. In df2, 20220503 means May 3rd, 2022. I want to merge df1 and df2 by year_month_week. In this case, 20220503 and 20220506 both fall in the 1st week of May 2022. If more than one date falls in the same year_month_week, I only want to keep the first one. Here's the part that is different: if no date falls inside a year_month_week, just leave it NA. So my expected output has the same number of rows as df1, including the year_month_week column. My expected output is as follows:
df<-data.frame(id=c(1,1,1,2,2,2,2),
year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
points=c(65,58,47,21,25,27,43),
temperature=c(36.1,36.6,NA,34.3,34.9,NA,NA))
First, convert the dates in df2 into a matching year-month-week key, then join the two tables:
library(dplyr);library(lubridate)
df2$dt = ymd(df2$date)
df2$wk = (day(df2$dt) - 1) %/% 7 + 1  # days 1-7 -> week 1, 8-14 -> week 2, ...
df2$year_month_week = as.numeric(paste0(format(df2$dt, "%Y%m"), df2$wk))
df1 %>%
left_join(df2 %>% group_by(year_month_week) %>% slice(1) %>%
select(year_month_week, temperature))
Result
Joining, by = "year_month_week"
id year_month_week points temperature
1 1 2022051 65 36.1
2 1 2022052 58 36.6
3 1 2022053 47 NA
4 2 2022041 21 34.3
5 2 2022042 25 34.9
6 2 2022043 27 NA
7 2 2022044 43 NA
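For comparison, here is a base-R sketch of the same idea that needs neither dplyr nor lubridate. It assumes week n of a month covers days 7n-6 to 7n (the ceiling(day / 7) rule used in the next answer), and, as an extra assumption beyond the question, it matches on id as well as year_month_week so that two ids sharing a week cannot borrow each other's temperature. match() returns the first matching row, which implements "keep only the first date in a week".

```r
df1 <- data.frame(id = c(1, 1, 1, 2, 2, 2, 2),
                  year_month_week = c(2022051, 2022052, 2022053, 2022041,
                                      2022042, 2022043, 2022044),
                  points = c(65, 58, 47, 21, 25, 27, 43))
df2 <- data.frame(id = c(1, 1, 1, 2, 2, 2),
                  date = c(20220503, 20220506, 20220512, 20220401,
                           20220408, 20220409),
                  temperature = c(36.1, 36.3, 36.6, 34.3, 34.9, 35.3))

# Parse yyyymmdd numbers into Dates
d <- as.Date(as.character(df2$date), format = "%Y%m%d")

# Week of the month: days 1-7 -> 1, 8-14 -> 2, ... (same as ceiling(day / 7))
week <- (as.integer(format(d, "%d")) - 1) %/% 7 + 1
df2$year_month_week <- as.numeric(paste0(format(d, "%Y%m"), week))

# Left join on (id, year_month_week); match() returns the first hit,
# so only the first date in a week is used, and weeks with no date stay NA
key1 <- paste(df1$id, df1$year_month_week)
key2 <- paste(df2$id, df2$year_month_week)
df1$temperature <- df2$temperature[match(key1, key2)]
df1
```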
You can build on a previous answer by reusing its function that counts the week of the month, then generating a join key in df2.
df1 <- data.frame(
id=c(1,1,1,2,2,2,2),
year_month_week=c(2022051,2022052,2022053,2022041,2022042,2022043,2022044),
points=c(65,58,47,21,25,27,43))
df2 <- data.frame(
id=c(1,1,1,2,2,2),
date=c(20220503,20220506,20220512,20220401,20220408,20220409),
temperature=c(36.1,36.3,36.6,34.3,34.9,35.3))
# Take the function from the previous StackOverflow question
monthweeks.Date <- function(x) {
ceiling(as.numeric(format(x, "%d")) / 7)
}
# Create a year_month_week variable to join on
df2 <-
df2 %>%
mutate(
date = lubridate::parse_date_time(
x = date,
orders = "%Y%m%d"),
year_month_week = paste0(
lubridate::year(date),
sprintf("%02d", lubridate::month(date)),  # zero-pad so October-December also give 7 digits
monthweeks.Date(date)),
year_month_week = as.double(year_month_week))
# Remove duplicate year_month_weeks
df2 <-
df2 %>%
arrange(year_month_week) %>%
distinct(year_month_week, .keep_all = TRUE)
# Join dataframes
df1 <-
left_join(
df1,
df2,
by = "year_month_week")
Produces this result
id.x year_month_week points id.y date temperature
1 1 2022051 65 1 2022-05-03 36.1
2 1 2022052 58 1 2022-05-12 36.6
3 1 2022053 47 NA <NA> NA
4 2 2022041 21 2 2022-04-01 34.3
5 2 2022042 25 2 2022-04-08 34.9
6 2 2022043 27 NA <NA> NA
7 2 2022044 43 NA <NA> NA
Edit: I forgot to mention that you need the tidyverse loaded:
library(tidyverse)
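Two formulas for the week of the month look interchangeable, day %/% 7 + 1 and ceiling(day / 7), but they disagree on days 7, 14, 21 and 28; (day - 1) %/% 7 + 1 is the integer-division form that matches ceiling(day / 7). A quick check on a handful of days:

```r
days <- c(1, 6, 7, 8, 14, 15, 21, 28, 29, 31)

week_floor   <- days %/% 7 + 1        # puts day 7 in week 2
week_ceiling <- ceiling(days / 7)     # days 1-7 are week 1
week_shifted <- (days - 1) %/% 7 + 1  # integer-division form of ceiling(days / 7)

data.frame(days, week_floor, week_ceiling, week_shifted)
```

The example dates in the question (days 1, 3, 6, 8, 9 and 12) happen to give the same week under both rules, which is why either formula reproduces the expected output there.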

parse dates from multiple columns with NAs and dates hidden in text

I have a data.frame with dates distributed across columns in a messy format: the year column contains years and NAs; the date_old column contains values in the format Month DD or DD (or a date range) or NAs; and the hidden_date column contains text with dates in either the format .... YYYY .... or the format .... DD Month YYYY .... (where .... represents general text of variable length).
An example data.frame looks like this:
df <- data.frame(year = c("1992", "1993", "1995", NA),
date_old = c("February 15", "October 02-24", "15", NA),
hidden_date = c(NA, NA, "The hidden date is 15 July 1995", "The hidden date is 2005"))
I want to get the dates in the format YYYY-MM-DD (taking the first day of a date range) and fill unknown values with zeroes.
Using parse_date_time hasn't helped so far. The expected output would be:
year date_old hidden_date date
1 1992 February 15 <NA> 1992-02-15
2 1993 October 02-24 <NA> 1993-10-02
3 1995 15 The hidden date is 15 July 1995 1995-07-15
4 <NA> <NA> The hidden date is 2005 2005-00-00
How do I best go about this?
It's a little complicated because you have a jumble of date information spread across columns that you need to extract and combine. I don't quite understand whether you only have three columns or could have more, so I've tried to solve the general case of an arbitrary number of columns. If you only have three columns, each of which always has the same format, things could be a little simpler, but not much.
I would start by creating a regex pattern for month names:
# We'll use dplyr, stringr, tidyr, readr, and purrr
library(tidyverse)
# We'll use month names and abbreviations just in case.
ms <- paste(c(month.name, month.abb), collapse = "|")
# [1] "January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
We can then iterate over each column, extracting the year, month, and day from each row as a data frame, which we then combine into a single data frame. The digit suffixes correspond to the original columns:
df_split_ymd <- map_dfc(df,
~ map_dfr(
.,
~ tibble(
year = str_extract(., "\\b\\d{4}\\b"),
month = str_extract(., str_glue("\\b({ms})\\b")),
day = str_extract(., "\\b\\d{2}\\b")
)
)
)
#### OUTPUT ####
# A tibble: 4 x 9
year month day year1 month1 day1 year2 month2 day2
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1992 NA NA NA February 15 NA NA NA
2 1993 NA NA NA October 02 NA NA NA
3 1995 NA NA NA NA 15 1995 July 15
4 NA NA NA NA NA NA 2005 NA NA
Finally, the year*, month*, and day* columns are coalesced and then united to make parsing easier. Note that I've replaced NA values in day with "01" and those in month with "January", because a Date can't contain "00"; that is why the last row becomes 2005-01-01 instead of 2005-00-00:
df_ymd <- df_split_ymd %>%
mutate(year = coalesce(!!!as.list(select(., starts_with("year")))),
month = coalesce(!!!as.list(select(., starts_with("month")))) %>%
replace_na("January"),
day = coalesce(!!!as.list(select(., starts_with("day")))) %>%
replace_na("01")
) %>%
unite(ymd, year, month, day, sep = " ") %>%
select(ymd) %>%
mutate(ymd = parse_date(ymd, "%Y %B %d"))
#### OUTPUT ####
# A tibble: 4 x 1
ymd
<date>
1 1992-02-15
2 1993-10-02
3 1995-07-15
4 2005-01-01
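The question actually asks for 2005-00-00 when the month and day are unknown. A Date cannot hold that, but a character column can. Below is a base-R sketch (not the tidyverse code above) that takes the first year, month name and day found in any column, scanning left to right, and writes missing parts as 00. The helpers extract_first() and first_in_any_column() are made up for this sketch; it only recognizes full month names (month.name), and it assumes that a one- or two-digit number standing on its own is the day.

```r
df <- data.frame(
  year = c("1992", "1993", "1995", NA),
  date_old = c("February 15", "October 02-24", "15", NA),
  hidden_date = c(NA, NA, "The hidden date is 15 July 1995",
                  "The hidden date is 2005"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first regex match in each element, NA if none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m > 0
  out[hit] <- substring(x[hit], m[hit],
                        m[hit] + attr(m, "match.length")[hit] - 1)
  out
}

# Hypothetical helper: apply a pattern to every column and keep the first
# non-NA value per row, left to right (a base-R stand-in for coalesce())
first_in_any_column <- function(data, pattern) {
  hits <- lapply(data, extract_first, pattern = pattern)
  Reduce(function(a, b) ifelse(is.na(a), b, a), hits)
}

yr  <- first_in_any_column(df, "\\b\\d{4}\\b")
mon <- first_in_any_column(df, paste0("\\b(", paste(month.name, collapse = "|"), ")\\b"))
dy  <- first_in_any_column(df, "\\b\\d{1,2}\\b")

mm <- match(mon, month.name)  # month number, NA if unknown
dd <- as.integer(dy)          # day number, NA if unknown

# Keep a character column so unknown parts can be written as 00
df$date <- sprintf("%s-%02d-%02d", yr,
                   ifelse(is.na(mm), 0L, mm),
                   ifelse(is.na(dd), 0L, dd))
df$date
```

If you later need real Dates, the "00" rows have to be mapped to some actual day (as the answer does with January 1st) or left as NA.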

How to diagonally subtract different columns in R

I have a dataset of a hypothetical exam.
id <- c(1,1,3,4,5,6,7,7,8,9,9)
test_date <- c("2012-06-27","2012-07-10","2013-07-04","2012-03-24","2012-07-22", "2013-09-16","2012-06-21","2013-10-18", "2013-04-21", "2012-02-16", "2012-03-15")
result_date <- c("2012-07-29","2012-09-02","2013-08-01","2012-04-25","2012-09-01","2013-10-20","2012-07-01","2013-10-31", "2013-05-17", "2012-03-17", "2012-04-20")
library(dplyr)  # provides tibble() and %>%
data1 <- tibble(id, test_date, result_date)
"id" indicates the ID of each student who took a particular exam. "test_date" is the date the student took the test and "result_date" is the date the student's result was posted. I'm interested in finding out which students retook the exam BEFORE the result of that exam session was released, e.g. students who knew that they had underperformed and retook the exam without bothering to find out their scores. For example, the student with id 1 took the exam for the second time on "2012-07-10", which was before the result date for his first exam, "2012-07-29".
I tried to:
data1 %>%
group_by(id) %>%
arrange(id, test_date) %>%
filter(n() >= 2)  # only students who have taken the exam more than once
My plan was to then merge this back into the original data set using a join function.
So essentially, I want to create a new column called "re_test" where it would equal 1 if a student retook the exam BEFORE receiving the result of a previous exam and 0 otherwise (those who retook after seeing their marks or those who did not retake).
I have tried to mutate in order to find cases where dates are either positive or negative by subtracting the 2nd test_date from the 1st result_date:
mutate(data1, re_test = result_date - lead(test_date, default = first(test_date)))
However, this mixes up students with different ids. I tried to split the data, but mutate won't work on a list of data frames, so now I'm stuck:
split(data1, data1$id)
Just to add on, this is a part of the desired result:
data2 <- tibble(id = c(1, 1, 3, 4))
data2$test_date_result <- c("2012-06-27","2012-07-10", "2013-07-04","2012-03-24")
data2$result_date_result <- c("2012-07-29","2012-09-02","2013-08-01","2012-04-25")
data2$re_test <- c(1, 0, 0, 0)
Apologies for the verbosity and hope I was clear enough.
Thanks a lot in advance!
library(reshape2)
library(dplyr)
# first melt so that we can sequence by date
data1m <- data1 %>%
melt(id.vars = "id", measure.vars = c("test_date", "result_date"), value.name = "event_date")
# any two tests in a row is a flag - use dplyr::lag to compare with the previous event
data1mc <- data1m %>%
arrange(id, event_date) %>%
group_by(id) %>%
mutate (multi_test = (variable == "test_date" & lag(variable == "test_date"))) %>%
filter(multi_test)
# id variable event_date multi_test
# 1 1 test_date 2012-07-10 TRUE
# 2 9 test_date 2012-03-15 TRUE
## join back to the original
data1 %>%
left_join (data1mc %>% select(id, event_date, multi_test),
by=c("id" = "id", "test_date" = "event_date"))
I have a piecewise answer that may work for you. I first create a data.frame called student that contains the re-test information, and then join it with the data1 object. If a student re-took the test multiple times, this compares the last test with the first result, which is a flaw, but I'm unsure whether students can re-test multiple times.
student <- data1 %>%
arrange(id, test_date) %>%
group_by(id) %>%
summarise(retest = if (n() > 1) test_date[n()] < result_date[1] else NA)  # NA when only one test
Some re-test values were NA. These were individuals that only took the test once. I set these to FALSE here, but you can retain the NA, as they do contain information.
student$retest[is.na(student$retest)] <- FALSE
Join the two data.frames to a single object called data2.
data2 <- left_join(data1, student, by='id')
I am sure there are more elegant ways to approach this. I took advantage of the structure of your data (sorted by id) and dplyr's lag function, which refers to the previous record while processing the current one.
### Ensure data are sorted by ID, then by test date ###
data1 <- arrange(data1, id, test_date)
### Create Flag for those that repeated ###
data1$repeater <- ifelse(lag(data1$id) == data1$id,1,0)
### I chose to do this on all data, you could filter on repeater flag first ###
data1$timegap <- as.Date(data1$result_date) - as.Date(data1$test_date)
data1$lagdate <- as.Date(data1$test_date) - lag(as.Date(data1$result_date))
### Display results where your repeater flag is 1 and there is negative time lag ###
data1[data1$repeater==1 & !is.na(data1$repeater) & as.numeric(data1$lagdate) < 0,]
# A tibble: 2 × 6
id test_date result_date repeater timegap lagdate
<dbl> <chr> <chr> <dbl> <time> <time>
1 1 2012-07-10 2012-09-02 1 54 days -19 days
2 9 2012-03-15 2012-04-20 1 36 days -2 days
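I went with a simple shift comparison in base R: within each id, each test_date is compared with the previous test's result_date. This relies on data1 being sorted by id, because split() returns the groups in sorted id order and unlist() stitches them back together in that order.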
data1 <- data.frame(id = c(1,1,3,4,5,6,7,7,8,9,9), test_date = c("2012-06-27","2012-07-10","2013-07-04","2012-03-24","2012-07-22", "2013-09-16","2012-06-21","2013-10-18", "2013-04-21", "2012-02-16", "2012-03-15"), result_date = c("2012-07-29","2012-09-02","2013-08-01","2012-04-25","2012-09-01","2013-10-20","2012-07-01","2013-10-31", "2013-05-17", "2012-03-17", "2012-04-20"))
data1$re_test <- unlist(lapply(split(data1,data1$id), function(x)
ifelse(as.Date(x$test_date) > c(NA, as.Date(x$result_date[-nrow(x)])), 0, 1)))
data1
id test_date result_date re_test
1 1 2012-06-27 2012-07-29 NA
2 1 2012-07-10 2012-09-02 1
3 3 2013-07-04 2013-08-01 NA
4 4 2012-03-24 2012-04-25 NA
5 5 2012-07-22 2012-09-01 NA
6 6 2013-09-16 2013-10-20 NA
7 7 2012-06-21 2012-07-01 NA
8 7 2013-10-18 2013-10-31 0
9 8 2013-04-21 2013-05-17 NA
10 9 2012-02-16 2012-03-17 NA
11 9 2012-03-15 2012-04-20 1
I think there is benefit in leaving NAs but if you really want all others as zero, simply:
data1$re_test <- ifelse(is.na(data1$re_test), 0, data1$re_test)
data1
id test_date result_date re_test
1 1 2012-06-27 2012-07-29 0
2 1 2012-07-10 2012-09-02 1
3 3 2013-07-04 2013-08-01 0
4 4 2012-03-24 2012-04-25 0
5 5 2012-07-22 2012-09-01 0
6 6 2013-09-16 2013-10-20 0
7 7 2012-06-21 2012-07-01 0
8 7 2013-10-18 2013-10-31 0
9 8 2013-04-21 2013-05-17 0
10 9 2012-02-16 2012-03-17 0
11 9 2012-03-15 2012-04-20 1
Let me know if you have any questions, cheers.
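A base-R sketch of the same comparison that does not rely on the rows already being sorted: it orders by id and test_date, then uses ave() to shift result_date down one row within each id. As in the answers above, a test is flagged only if it falls before the result of that student's immediately preceding test; single-test students get 0 here, matching the 0/1 coding asked for in the question.

```r
data1 <- data.frame(
  id = c(1, 1, 3, 4, 5, 6, 7, 7, 8, 9, 9),
  test_date = as.Date(c("2012-06-27", "2012-07-10", "2013-07-04", "2012-03-24",
                        "2012-07-22", "2013-09-16", "2012-06-21", "2013-10-18",
                        "2013-04-21", "2012-02-16", "2012-03-15")),
  result_date = as.Date(c("2012-07-29", "2012-09-02", "2013-08-01", "2012-04-25",
                          "2012-09-01", "2013-10-20", "2012-07-01", "2013-10-31",
                          "2013-05-17", "2012-03-17", "2012-04-20"))
)

# Sort so that "previous" means the previous test of the same student
data1 <- data1[order(data1$id, data1$test_date), ]

# Result date of the previous test within each id (NA for a student's first test).
# ave() gets plain day counts so the leading NA does not clash with the Date class.
prev_result <- ave(as.numeric(data1$result_date), data1$id,
                   FUN = function(r) c(NA, r[-length(r)]))

# 1 = retook the exam before the previous result was released, 0 otherwise
data1$re_test <- as.integer(!is.na(prev_result) &
                            as.numeric(data1$test_date) < prev_result)
data1
```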

Ifelse with different lengths of data frame

I have dataset which is panel data that contains the following variables:
1. Country
2. Company
3. Monthly date
4. Revenue
A <- data.frame(Country=as.factor(rep('A', 138)),
Company = as.factor(c(rep('AAA', 13), rep('BBB', 7), rep('CCC', 72), rep('DDD', 46))),
Date = c(seq(as.Date('2010-01-01'), as.Date('2011-01-01'), by = 'month'),
seq(as.Date('2010-01-01'), as.Date('2010-07-01'), by = 'month'),
seq(as.Date('2010-01-01'), as.Date('2015-12-01'), by = 'month'),
seq(as.Date('2012-03-01'), as.Date('2015-12-01'), by = 'month')),
Revenue= sample(10000:25000, 138)
)
B<- data.frame(Country=as.factor(rep('B', 108)),
Company = as.factor(c(rep('EEE', 36), rep('FFF', 36), rep('GGG', 36))),
Date = c(seq(as.Date('2013-01-01'), as.Date('2015-12-01'), by = 'month'),
seq(as.Date('2013-01-01'), as.Date('2015-12-01'), by = 'month'),
seq(as.Date('2013-01-01'), as.Date('2015-12-01'), by = 'month')),
Revenue = sample(10000:25000, 108)
)
I want to add another variable to the dataset, Competitor's revenue: the total revenue of all other companies in the same country for the corresponding month.
I wrote the following code:
new_B<-data.frame()
for(i in 1:nlevels(B$Company)){
temp_i<-B[which(B$Company==levels(B$Company)[i]),]
temp_j<-B[which(B$Company!=levels(B$Company)[i]),]
agg_temp<-aggregate(temp_j$Revenue, by = list(temp_j$Date), sum)
temp_i$competitor_value<-ifelse(agg_temp$Group.1 %in% temp_i$Date, agg_temp$x, 0)
new_B<-rbind(new_B, temp_i)
}
I created two temporary data sets inside the for loop: one containing only company i and the other containing all other companies. I summed the revenues of the other companies by month, then used ifelse on matching dates to add the new variable to temp_i. This works for companies that operated during the same period, but in country A some companies operated during different periods, and when I run my code I get an error that the lengths differ:
new_A<-data.frame()
for(i in 1:nlevels(A$Company)){
temp_i<-A[which(A$Company==levels(A$Company)[i]),]
temp_j<-A[which(A$Company!=levels(A$Company)[i]),]
agg_temp<-aggregate(temp_j$Revenue, by = list(temp_j$Date), sum)
temp_i$competitor_value<-ifelse(agg_temp$Group.1 %in% temp_i$Date, agg_temp$x, 0)
new_A<-rbind(new_A, temp_i)
}
I found a similar question, "ifelse statements with dataframes of different lengths", but I still do not know how to solve my problem.
I would really appreciate any help.
I suggest a different approach using the dplyr package:
library(dplyr)
A %>%
bind_rows(B) %>%
group_by(Country, month = format(Date, "%Y-%m")) %>%  # competitors share the country and month
mutate(revComp = sum(Revenue)) %>%
group_by(Company, .add = TRUE) %>%
mutate(revComp = revComp - Revenue)
# Source: local data frame [246 x 6]
# Groups: Country, month, Company [246]
#
# Country Company Date Revenue month revComp
# (chr) (chr) (date) (int) (chr) (int)
# 1 A AAA 2010-01-01 10657 2010-01 30356
# 2 A AAA 2010-02-01 11620 2010-02 22765
# 3 A AAA 2010-03-01 17285 2010-03 33329
# 4 A AAA 2010-04-01 22886 2010-04 33469
# 5 A AAA 2010-05-01 20129 2010-05 39974
# 6 A AAA 2010-06-01 22865 2010-06 26896
# 7 A AAA 2010-07-01 13087 2010-07 29542
# 8 A AAA 2010-08-01 19451 2010-08 14842
# 9 A AAA 2010-09-01 12364 2010-09 15309
# 10 A AAA 2010-10-01 19375 2010-10 14090
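Without dplyr, ave() can compute the same quantity: the country-month total minus the company's own revenue. Because each total is built only from the rows that exist for that country and month, companies active in different periods need no special handling, and a company alone in its country-month gets 0. The small panel below is made-up illustration data, not the question's A and B (whose revenues are random), and it assumes one row per company per month.

```r
# Made-up illustration data: BBB only reports in January, CCC only in
# February, and country B is kept separate from country A
panel <- data.frame(
  Country = c("A", "A", "A", "A", "B", "B"),
  Company = c("AAA", "BBB", "AAA", "CCC", "EEE", "FFF"),
  Date    = as.Date(c("2010-01-01", "2010-01-01", "2010-02-01",
                      "2010-02-01", "2010-01-01", "2010-01-01")),
  Revenue = c(100, 200, 150, 50, 300, 400)
)

month <- format(panel$Date, "%Y-%m")

# Total revenue per (country, month), repeated on every row of that group,
# minus the company's own revenue = revenue of its competitors
panel$competitor_revenue <-
  ave(panel$Revenue, panel$Country, month, FUN = sum) - panel$Revenue
panel
```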

Using ifelse statement when concatenating elements of a date variable

I am attempting to use two ifelse statements to create a new date variable that makes a series of assumptions to fill in the gaps of an existing date variable. Here is an example of what I mean:
id EffectiveDate EffectiveYear ED_NA EY_NA NewEffectiveDate
1 a 1972-10-05 1972 FALSE FALSE 1972-10-05
2 a <NA> 1985 TRUE FALSE 1985-01-01
3 a 1988-11-12 1988 FALSE FALSE 1988-11-12
4 b 2011-09-05 2011 FALSE FALSE 2011-09-05
5 b <NA> NA TRUE TRUE 2011-09-05
6 b <NA> 2012 TRUE FALSE 2012-01-01
7 c 2012-11-11 2012 FALSE FALSE 2012-11-11
8 c 2013-05-15 2013 FALSE FALSE 2013-05-15
Quick code to reproduce columns id through EY_NA:
id <- c("a","a","a","b","b","b","c","c")
EffectiveDate <- c("1972-10-05",NA,"1988-11-12","2011-09-05",NA,NA,"2012-11-11","2013-05-15")
EffectiveYear <- c(1972,1985,1988,2011,NA,2012,2012,2013)
tdat <- data.frame(id, EffectiveDate, EffectiveYear)
tdat$ED_NA <- is.na(tdat$EffectiveDate)
tdat$EY_NA <- is.na(tdat$EffectiveYear)
What I'm trying to create in this example is the "NewEffectiveDate" variable. In plain English, what I want is, where EffectiveDate data are missing BUT EffectiveYear data are not missing, assume NewEffectiveDate is equal to January 1 of the EffectiveYear. If EffectiveDate AND EffectiveYear data are missing, assume the prior observation's EffectiveDate. Last, of course, if EffectiveDate data are not missing, select EffectiveDate.
Here is the latest code I used to attempt to solve the problem:
tdat %>% mutate(NewEffectiveDate = ifelse(ED_NA == 1 & EY_NA == 0,
as.Date(paste(EffectiveYear, 1, 1, sep="-")),
ifelse(ED_NA == 1 & EY_NA == 1),
as.Date(lag(EffectiveDate)),
EffectiveDate
))
When I try this particular code, I get an error message that reads: Error: unused arguments (as.Date(c(NA, 1, NA, 2, 3, NA, NA, 4)), c(1, NA, 2, 3, NA, NA, 4, 5))
I searched for similar questions with queries like "ifelse concatenate date" and some variations thereof, but haven't found anything that seems to apply to this particular problem.
I am very new to R (and CLIs, for that matter), so I apologize in advance if I'm overlooking a perfectly obvious solution. The transition from Excel to R has been interesting, but often painful when it comes to doing what seem like relatively straightforward tasks (though the dplyr package has been tremendously helpful).
id <- c("a","a","a","b","b","b","c","c")
EffectiveDate <- c("1972-10-05",NA,"1988-11-12","2011-09-05",NA,NA,"2012-11-11","2013-05-15")
EffectiveYear <- c(1972,1985,1988,2011,NA,2012,2012,2013)
tdat <- data.frame(id, EffectiveDate, EffectiveYear,
stringsAsFactors=FALSE)
library(dplyr)
library(zoo)
tdat %>%
mutate(NewEffectiveDate = ifelse(!is.na(EffectiveDate),
EffectiveDate,
ifelse(is.na(EffectiveDate) & !is.na(EffectiveYear),
paste0(EffectiveYear, "-01-01"),
NA)),
NewEffectiveDate = na.locf(NewEffectiveDate))
This should give you what you need. I recommend using na.locf (last observation carried forward) from the zoo package rather than handling the previous-date case by hand.
You can do
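library(dplyr)  # for %>%, mutate() and lag()
tdat$EffectiveDate <- as.Date(tdat$EffectiveDate)
tdat %>% mutate(NewEffectiveDate = as.Date(
ifelse(!is.na(EffectiveDate), EffectiveDate,
ifelse(!is.na(EffectiveYear), as.Date(paste(EffectiveYear, 1, 1, sep="-")),
lag(EffectiveDate))),
origin = "1970-01-01"  # ifelse() drops the Date class; rebuild it from the day counts
)) -> res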
res
# id EffectiveDate EffectiveYear NewEffectiveDate
# 1 a 1972-10-05 1972 1972-10-05
# 2 a <NA> 1985 1985-01-01
# 3 a 1988-11-12 1988 1988-11-12
# 4 b 2011-09-05 2011 2011-09-05
# 5 b <NA> NA 2011-09-05
# 6 b <NA> 2012 2012-01-01
# 7 c 2012-11-11 2012 2012-11-11
# 8 c 2013-05-15 2013 2013-05-15
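There appears to be a problem with your ifelse block: you closed the bracket of the second ifelse early, so it never receives its yes and no arguments, and those arguments end up as extra (unused) arguments to the first ifelse. Note also that ifelse() drops the Date class, so convert EffectiveDate to a Date first and turn the result back into a Date.
This should work:
tdat$EffectiveDate <- as.Date(tdat$EffectiveDate)
tdat %>% mutate(NewEffectiveDate = as.Date(
ifelse(ED_NA == 1 & EY_NA == 0,
as.Date(paste(EffectiveYear, 1, 1, sep="-")),
ifelse(ED_NA == 1 & EY_NA == 1,
lag(EffectiveDate),
EffectiveDate)),
origin = "1970-01-01"))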
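A base-R sketch that keeps the Date class throughout by assigning into a Date vector instead of going through ifelse(), which drops the class. It applies the question's rules in order; for the "both missing" case it carries forward the previous row's filled value (like na.locf() in the first answer) and, as an assumption, does not reset at id boundaries. The names new_date and use_year are made up for this sketch.

```r
id <- c("a", "a", "a", "b", "b", "b", "c", "c")
EffectiveDate <- as.Date(c("1972-10-05", NA, "1988-11-12", "2011-09-05",
                           NA, NA, "2012-11-11", "2013-05-15"))
EffectiveYear <- c(1972, 1985, 1988, 2011, NA, 2012, 2012, 2013)
tdat <- data.frame(id, EffectiveDate, EffectiveYear)

# Rule 1: start from the existing dates
new_date <- tdat$EffectiveDate

# Rule 2: date missing but year known -> January 1 of that year
use_year <- is.na(new_date) & !is.na(tdat$EffectiveYear)
new_date[use_year] <- as.Date(paste0(tdat$EffectiveYear[use_year], "-01-01"))

# Rule 3: still missing -> carry forward the previous row's value
for (i in which(is.na(new_date))) {
  if (i > 1) new_date[i] <- new_date[i - 1]
}

tdat$NewEffectiveDate <- new_date
tdat
```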
