Fixing dates that were coerced into the wrong format - r

I have a large df with dates that were accidentally coerced into the wrong format.
Data:
id <- c(1:12)
date <- c("2014-01-03","2001-08-14","2001-08-14","2014-06-02","2006-06-14", "2006-06-14",
"2014-08-08","2014-08-08","2008-04-14","2009-12-13","2010-09-14","2012-09-14")
df <- data.frame(id,date)
Structure:
id date
1 1 2014-01-03
2 2 2001-08-14
3 3 2001-08-14
4 4 2014-06-02
5 5 2006-06-14
6 6 2006-06-14
7 7 2014-08-08
8 8 2014-08-08
9 9 2008-04-14
10 10 2009-12-13
11 11 2010-09-14
12 12 2012-09-14
The data set only includes, or rather should only include the years 2014 and 2013. The dates 2001-08-14 and 2006-06-14 are most likely 2014-08-01 and 2014-06-06, respectively.
Output:
id date
1 1 2014-01-03
2 2 2014-08-01
3 3 2014-08-01
4 4 2014-06-02
5 5 2014-06-06
6 6 2014-06-06
7 7 2014-08-08
8 8 2014-08-08
9 9 2014-04-08
10 10 2013-12-09
11 11 2014-09-10
12 12 2014-09-12
How can I reconcile this mess?

Package lubridate has the convenient function year that will be useful here.
library(lubridate)
# Convert date to proper date class variable
df$date <- as.Date(df$date)
# Isolate problematic indices; when year is not in 2013 or 2014,
# we'll go to and from character representation. We'll trim
# the "20" in front of the "false year" and then specify the
# proper format to read the character back into a Date class.
tmp.indices <- which(!year(df$date) %in% 2013:2014)
df$date[tmp.indices] <- as.Date(substring(as.character(df$date[tmp.indices]),
first = 3), format = "%d-%m-%y")
Result:
id date
1 1 2014-01-03
2 2 2014-08-01
3 3 2014-08-01
4 4 2014-06-02
5 5 2014-06-06
6 6 2014-06-06
7 7 2014-08-08
8 8 2014-08-08
9 9 2014-04-08
10 10 2013-12-09
11 11 2014-09-10
12 12 2014-09-12

We could convert the 'date' column to 'Date' class and extract the year to create a logical index ('indx') flagging the rows whose year is not 2013 or 2014.
df$date <- as.Date(df$date)
indx <- !format(df$date, '%Y') %in% 2013:2014
Then, using lubridate, remove the first two characters of the flagged dates and convert the remainder back to 'Date' class with dmy.
library(lubridate)
df$date[indx] <- dmy(sub('^..', '', df$date[indx]))
df
# id date
#1 1 2014-01-03
#2 2 2014-08-01
#3 3 2014-08-01
#4 4 2014-06-02
#5 5 2014-06-06
#6 6 2014-06-06
#7 7 2014-08-08
#8 8 2014-08-08
#9 9 2014-04-08
#10 10 2013-12-09
#11 11 2014-09-10
#12 12 2014-09-12
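Both answers follow the same recipe: flag the rows whose year falls outside the expected range, drop the leading "20", and re-read what is left as day-month-year. A minimal base R sketch of that recipe, wrapped in a function, is below. The name fix_swapped and the valid_years argument are made up for illustration and do not come from either answer.

```r
# Hypothetical helper: repair dates whose day and two-digit year were swapped.
fix_swapped <- function(x, valid_years = 2013:2014) {
  d <- as.Date(x)
  # Rows whose year is outside the expected range are treated as mis-parsed
  bad <- !(as.integer(format(d, "%Y")) %in% valid_years)
  # "2001-08-14" -> "01-08-14", then read it back as day-month-year
  d[bad] <- as.Date(substring(format(d[bad]), 3), format = "%d-%m-%y")
  d
}

fixed <- fix_swapped(c("2014-01-03", "2001-08-14", "2009-12-13"))
```

After the repair it is probably worth checking that every year in the result really is 2013 or 2014, since a row that was wrong in some other way would pass through silently.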

Related

How to print a date when the input is number of days since 01-01-60?

I received a set of dates, but it turns out that time is reported in days since 01-01-1960 in this specific data set.
D_INDDTO
1 20758
2 20856
3 21062
4 19740
5 21222
6 21203
The specific date of interest for Patient 1 is 20758 days since 01-01-60.
I want to create a new covariate u$date containing the specific date of interest in %d%m%y format. I tried
library(tidyverse)
u %>% mutate(date=as.date(D_INDDTO,origin="1960-01-01")
But that did not solve it.
u <- structure(list(D_INDDTO = c(20758, 20856, 21062, 19740, 21222,
21203, 20976, 20895, 18656, 18746)), row.names = c(NA, 10L), class = "data.frame")
Try this:
#Code 1
u %>% mutate(date=as.Date("1960-01-01")+D_INDDTO)
Output:
D_INDDTO date
1 20758 2016-10-31
2 20856 2017-02-06
3 21062 2017-08-31
4 19740 2014-01-17
5 21222 2018-02-07
6 21203 2018-01-19
7 20976 2017-06-06
8 20895 2017-03-17
9 18656 2011-01-29
10 18746 2011-04-29
Or this:
#Code 2
u %>% mutate(date=as.Date(D_INDDTO,origin="1960-01-01"))
Output:
D_INDDTO date
1 20758 2016-10-31
2 20856 2017-02-06
3 21062 2017-08-31
4 19740 2014-01-17
5 21222 2018-02-07
6 21203 2018-01-19
7 20976 2017-06-06
8 20895 2017-03-17
9 18656 2011-01-29
10 18746 2011-04-29
Or this:
#Code 3
u %>% mutate(date=format(as.Date(D_INDDTO,origin="1960-01-01"),'%d%m%y'))
Output:
D_INDDTO date
1 20758 311016
2 20856 060217
3 21062 310817
4 19740 170114
5 21222 070218
6 21203 190118
7 20976 060617
8 20895 170317
9 18656 290111
10 18746 290411
If more customization is required:
#Code 4
u %>% mutate(date=format(as.Date(D_INDDTO,origin="1960-01-01"),'%d-%m-%Y'))
Output:
D_INDDTO date
1 20758 31-10-2016
2 20856 06-02-2017
3 21062 31-08-2017
4 19740 17-01-2014
5 21222 07-02-2018
6 21203 19-01-2018
7 20976 06-06-2017
8 20895 17-03-2017
9 18656 29-01-2011
10 18746 29-04-2011
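The same conversion also works without the tidyverse, and subtracting the origin again gives back the original day counts, which is a cheap way to check that the right origin was used. A small sketch using two of the values from the question (the variable names are illustrative):

```r
# Day counts relative to 1960-01-01, taken from the question's data
days <- c(20758, 19740)

# Convert to Date by stating the origin explicitly
d <- as.Date(days, origin = "1960-01-01")

# Round trip: subtracting the origin recovers the day counts
back <- as.numeric(d - as.Date("1960-01-01"))
```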

How to get the index/position where difference of values exceeds a threshold?

I have this data frame:
> new
group date median
1 1 2012-07-01 1.839646
2 1 2012-08-01 1.839646
3 2 2012-09-01 1.839646
4 2 2012-10-01 1.839646
5 3 2012-11-01 1.839646
6 3 2012-12-01 1.839646
7 4 2013-01-01 5.554302
8 4 2013-02-01 5.554302
9 5 2013-03-01 5.554302
10 5 2013-04-01 5.554302
11 6 2013-05-01 5.554302
12 6 2013-06-01 5.554302
13 7 2013-07-01 2.226150
14 7 2013-08-01 2.226150
15 8 2013-09-01 2.226150
16 8 2013-10-01 2.226150
17 9 2013-11-01 2.226150
18 9 2013-12-01 2.226150
I want to compare consecutive unique median values and, whenever the difference between two of them (for example the first and the second) exceeds a certain limit, get the location where this happens.
Step-by-step:
In this example, I have three unique median values (1.839646,5.554302,2.226150)
1) Compare the first and second unique values. If the difference is bigger than (for example) 50% of the first value, then give me the position of the last occurrence of the first value:
So:
a) abs(1.839646 - 5.554302) = 3.714656
b) 50% of 1.839646 is 0.919823
c) 3.714656 is bigger than 0.919823
d) get the index where this happens: index 6 (which is at date 2012-12-01)
The same for the second and third (unique) value.
Call your vector of medians x:
# sample data
x = rep(c(1.839646,5.554302,2.226150), each = 6)
which(c(0, abs(diff(x))) > 0.5 * x) - 1
# [1] 6 12
Demo on your data:
new = read.table(text = " group date median
1 1 2012-07-01 1.839646
2 1 2012-08-01 1.839646
3 2 2012-09-01 1.839646
4 2 2012-10-01 1.839646
5 3 2012-11-01 1.839646
6 3 2012-12-01 1.839646
7 4 2013-01-01 5.554302
8 4 2013-02-01 5.554302
9 5 2013-03-01 5.554302
10 5 2013-04-01 5.554302
11 6 2013-05-01 5.554302
12 6 2013-06-01 5.554302
13 7 2013-07-01 2.226150
14 7 2013-08-01 2.226150
15 8 2013-09-01 2.226150
16 8 2013-10-01 2.226150
17 9 2013-11-01 2.226150
18 9 2013-12-01 2.226150", header = TRUE)
results = which(c(0, abs(diff(new$median))) > 0.5 * new$median) - 1
results
# [1] 6 12
new$date[results]
# [1] 2012-12-01 2013-06-01
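One detail worth noting: the one-liner above compares each jump against half of the value after the jump (0.5 * x at the new position), while the question asks for 50% of the value before it. On this data both give the same answer, but they can differ in general. A sketch of the variant that uses the preceding value, with the same sample vector:

```r
# Sample medians from the question
x <- rep(c(1.839646, 5.554302, 2.226150), each = 6)

# Absolute change between each element and the next one
jump <- abs(diff(x))

# Compare each change with 50% of the value *before* the change;
# head(x, -1) drops the last element so both vectors line up
idx <- which(jump > 0.5 * head(x, -1))
```

Here idx holds the position of the last element before each qualifying jump, so no "- 1" correction is needed.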

Proximity to long weekends & holidays

Data:
I have a vector of dates in a tibble.
# A tibble: 10 x 1
Date
<dttm>
1 2017-04-04
2 2017-04-05
3 2017-04-07
4 2017-04-10
5 2017-04-11
6 2017-04-12
7 2017-04-13
8 2017-04-14
9 2017-04-17
10 2017-04-18
Reproducible using:
structure(list(Date = structure(c(1491264000, 1491350400, 1491523200,
1491782400, 1491868800, 1491955200, 1492041600, 1492128000, 1492387200,
1492473600), class = c("POSIXct", "POSIXt"), tzone = "UTC")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -10L), .Names = "Date")
Need:
Two feature variables:
'Proximity to next holiday'
'Proximity to past holiday'
The intent is to determine whether my response variable depends on Date being close to a holiday or long weekend. For example, if 04-11 were a holiday, I would want:
Date ProxNxtHol ProxPastHol
<dttm>
1 2017-04-04 4 32
2 2017-04-05 3 33
3 2017-04-07 2 34
4 2017-04-10 1 35
5 2017-04-11 0 36
6 2017-04-12 58 1
7 2017-04-13 57 2
8 2017-04-14 56 3
9 2017-04-17 55 4
10 2017-04-18 54 5
While I can manually define all the holidays in a vector myself and calculate the difference between the two dates, this is cumbersome because the holidays vary by location globally. (I have a variable which can indicate location.)
Is there a predefined function which can indicate if a given date is a holiday or not, for a specified region?
I have come up with this for loop that computes both proximities as shown in your desired output. Please see the steps below.
Convert your structure to a data frame and its Date column to class Date:
> qdates <- data.frame(qdates)
> qdates$Date <- as.Date(qdates$Date)
> qdates
Date
1 2017-04-04
2 2017-04-05
3 2017-04-07
4 2017-04-10
5 2017-04-11
6 2017-04-12
7 2017-04-13
8 2017-04-14
9 2017-04-17
10 2017-04-18
Use library(timeDate) to build a data frame of US holidays. You can add or modify dates here, or use other built-in functions that might contain federal holidays.
> library(timeDate)
> hdates <- data.frame(Dates = c(USNewYearsDay(2017), USInaugurationDay(2017), USMLKingsBirthday(2017),
USLincolnsBirthday(2017), USWashingtonsBirthday(2017), USCPulaskisBirthday(2017),
USGoodFriday(2017), USMemorialDay(2017), USIndependenceDay(2017), USLaborDay(2017),
USColumbusDay(2017), USElectionDay(2017), USVeteransDay(2017), USThanksgivingDay(2017),
USChristmasDay(2017)))
> colnames(hdates) <- "HolidayDate"
> hdates$HolidayDate <- as.Date(hdates$HolidayDate)
> hdates
HolidayDate
1 2017-01-01
2 2017-01-20
3 2017-01-16
4 2017-02-12
5 2017-02-22
6 2017-03-06
7 2017-04-14
8 2017-05-29
9 2017-07-04
10 2017-09-04
11 2017-10-09
12 2017-11-07
13 2017-11-11
14 2017-11-23
15 2017-12-25
Use a for loop to compute the date differences and populate the output:
for(i in 1:nrow(qdates)) {
minDate <- max(hdates[which(hdates$HolidayDate <= qdates$Date[i]),])
maxDate <- min(hdates[which(hdates$HolidayDate >= qdates$Date[i]),])
qdates$ProxPastHol[i] <- abs(difftime(minDate, qdates$Date[i], units = "days"))
qdates$ProxNxtHol[i] <- abs(difftime(maxDate, qdates$Date[i], units = "days"))
}
> qdates
Date ProxPastHol ProxNxtHol
1 2017-04-04 29 10
2 2017-04-05 30 9
3 2017-04-07 32 7
4 2017-04-10 35 4
5 2017-04-11 36 3
6 2017-04-12 37 2
7 2017-04-13 38 1
8 2017-04-14 0 0
9 2017-04-17 3 42
10 2017-04-18 4 41
Hope this helps !!!
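The loop above breaks if a date has no holiday before it (or after it), because max() or min() is then called on an empty set. A sketch of the same calculation as a small function that returns NA in that case is below. The function name days_to_nearest is hypothetical, and the holidays vector is hand-picked from the table above purely for illustration.

```r
# A few query dates and a hand-picked holiday vector (illustrative values only)
dates <- as.Date(c("2017-04-04", "2017-04-13", "2017-04-14", "2017-04-17"))
holidays <- as.Date(c("2017-03-06", "2017-04-14", "2017-05-29"))

# Hypothetical helper: days to the nearest holiday in one direction
days_to_nearest <- function(d, holidays, direction = c("past", "next")) {
  direction <- match.arg(direction)
  sapply(seq_along(d), function(i) {
    # Signed gap in days: negative = holiday is earlier, positive = later
    gaps <- as.numeric(holidays - d[i])
    gaps <- if (direction == "past") -gaps[gaps <= 0] else gaps[gaps >= 0]
    # No holiday on that side of the date: report NA instead of failing
    if (length(gaps) == 0) NA_real_ else min(gaps)
  })
}

prox_past <- days_to_nearest(dates, holidays, "past")
prox_next <- days_to_nearest(dates, holidays, "next")
```

A date that is itself a holiday gets 0 in both columns, matching row 8 of the output above.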

How can I extract the month using sqldf package

I tried to build a view grouped by month using the sqldf package and a month function, but I got an error: Error in sqliteSendQuery(con, statement, bind.data) : error in statement: no such function: month
Here is my query: s<-sqldf("select month(dateTime),sum(wolfs) group by dateTime")
Attached is a toy data frame:
df <- read.table(text = "dateTime birds wolfs snakes
2014-05-21 9 7 a
2014-04-28 8 4 b
2014-04-13 2 8 c
2014-03-12 2 3 a
2014-02-04 8 3 a
2014-02-29 1 2 a
2014-01-17 7 1 b
2014-01-16 1 5 c
2014-09-20 9 7 c
2014-08-21 8 7 c ",header = TRUE)
How can I extract the month using sqldf package?
I suspect you are used to SQL Server, but the sqldf backend being used in your case is SQLite, where there is no MONTH function. Try this instead:
R> sqldf("SELECT strftime('%m', dateTime) AS Month
,SUM(wolfs) AS Wolves
FROM df
GROUP BY strftime('%m', dateTime)")
# Month Wolves
# 1 01 6
# 2 02 5
# 3 03 3
# 4 04 12
# 5 05 7
# 6 08 7
# 7 09 7
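If SQL is not a requirement, the same grouping can be done in base R by extracting the month with format() and summing with aggregate(). The sketch below uses a handful of rows from the toy data. The value 2014-02-29 is left out because it is not a valid calendar date.

```r
# A few rows from the toy data (valid dates only)
df <- data.frame(
  dateTime = c("2014-05-21", "2014-04-28", "2014-04-13", "2014-01-17", "2014-01-16"),
  wolfs = c(7, 4, 8, 1, 5),
  stringsAsFactors = FALSE
)

# Two-digit month as a character column, like strftime('%m', ...) in SQLite
df$month <- format(as.Date(df$dateTime), "%m")

# Sum wolfs within each month
monthly <- aggregate(wolfs ~ month, data = df, FUN = sum)
```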

Calculating mean date by row

I wish to obtain the mean date by row, where each row contains two dates. Eventually I found a way, posted below. However, the approach I used seems rather cumbersome. Is there a better way?
my.data = read.table(text = "
OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE
1 3 6 2012 3 10 2012 1
2 3 10 2012 3 20 2012 1
3 3 16 2012 3 30 2012 1
4 3 20 2012 4 8 2012 1
5 3 20 2012 4 9 2012 1
6 3 20 2012 4 10 2012 1
7 3 20 2012 4 11 2012 1
8 4 4 2012 4 5 2012 1
9 4 6 2012 4 6 2012 1
10 4 6 2012 4 7 2012 1
", header = TRUE, stringsAsFactors = FALSE)
my.data
my.data$MY.DATE1 <- do.call(paste, list(my.data$MONTH1, my.data$DAY1, my.data$YEAR1))
my.data$MY.DATE2 <- do.call(paste, list(my.data$MONTH2, my.data$DAY2, my.data$YEAR2))
my.data$MY.DATE1 <- as.Date(my.data$MY.DATE1, format=c("%m %d %Y"))
my.data$MY.DATE2 <- as.Date(my.data$MY.DATE2, format=c("%m %d %Y"))
my.data
desired.result = read.table(text = "
OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE MY.DATE1 MY.DATE2 mean.date
1 3 6 2012 3 10 2012 1 2012-03-06 2012-03-10 2012-03-08
2 3 10 2012 3 20 2012 1 2012-03-10 2012-03-20 2012-03-15
3 3 16 2012 3 30 2012 1 2012-03-16 2012-03-30 2012-03-23
4 3 20 2012 4 8 2012 1 2012-03-20 2012-04-08 2012-03-29
5 3 20 2012 4 9 2012 1 2012-03-20 2012-04-09 2012-03-30
6 3 20 2012 4 10 2012 1 2012-03-20 2012-04-10 2012-03-30
7 3 20 2012 4 11 2012 1 2012-03-20 2012-04-11 2012-03-31
8 4 4 2012 4 5 2012 1 2012-04-04 2012-04-05 2012-04-04
9 4 6 2012 4 6 2012 1 2012-04-06 2012-04-06 2012-04-06
10 4 6 2012 4 7 2012 1 2012-04-06 2012-04-07 2012-04-06
", header = TRUE, stringsAsFactors = FALSE)
Here is the approach that worked for me:
my.data$mean.date <- (my.data$MY.DATE1 + ((my.data$MY.DATE2 - my.data$MY.DATE1) / 2))
my.data
These approaches did not work:
my.data$mean.date <- mean(my.data$MY.DATE1, my.data$MY.DATE2)
my.data$mean.date <- mean(my.data$MY.DATE1, my.data$MY.DATE2, trim = 0)
my.data$mean.date <- mean(my.data$MY.DATE1, my.data$MY.DATE2, trim = 1)
my.data$mean.date <- mean(my.data$MY.DATE1, my.data$MY.DATE2, trim = 0.5)
my.data$mean.data <- apply(my.data, 1, function(x) {(x[9] + x[10]) / 2})
I think I am supposed to use the Ops.Date command, but have not found an example.
Thank you for any suggestions.
Keep things simple and use mean.Date in base R.
mean.Date(as.Date(c("01-01-2014", "01-07-2014"), format=c("%m-%d-%Y")))
[1] "2014-01-04"
Using the good advice of @jaysunice3401, I came up with this. If you want to keep the original data, you can add remove = FALSE in the two lines with unite.
library(dplyr)
library(tidyr)
my.data %>%
unite(whatever1, matches("1"), sep = "-") %>%
unite(whatever2, matches("2"), sep = "-") %>%
# mutate_each() and funs() are deprecated in current dplyr; across() is the replacement
mutate(across(contains("whatever"), ~ as.Date(.x, "%m-%d-%Y"))) %>%
rowwise() %>%
mutate(mean.date = mean.Date(c(whatever1, whatever2)))
# OBS whatever1 whatever2 STATE mean.date
#1 1 2012-03-06 2012-03-10 1 2012-03-08
#2 2 2012-03-10 2012-03-20 1 2012-03-15
#3 3 2012-03-16 2012-03-30 1 2012-03-23
#4 4 2012-03-20 2012-04-08 1 2012-03-29
#5 5 2012-03-20 2012-04-09 1 2012-03-30
#6 6 2012-03-20 2012-04-10 1 2012-03-30
#7 7 2012-03-20 2012-04-11 1 2012-03-31
#8 8 2012-04-04 2012-04-05 1 2012-04-04
#9 9 2012-04-06 2012-04-06 1 2012-04-06
#10 10 2012-04-06 2012-04-07 1 2012-04-06
Maybe something like this?
library(data.table)
setDT(my.data)[, `:=`(MY.DATE1 = as.Date(paste(DAY1 ,MONTH1, YEAR1), format = "%d %m %Y"),
MY.DATE2 = as.Date(paste(DAY2 ,MONTH2, YEAR2), format = "%d %m %Y"))][,
mean.date := MY.DATE2 - ceiling((MY.DATE2 - MY.DATE1)/2)]
my.data
# OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE MY.DATE1 MY.DATE2 mean.date
# 1: 1 3 6 2012 3 10 2012 1 2012-03-06 2012-03-10 2012-03-08
# 2: 2 3 10 2012 3 20 2012 1 2012-03-10 2012-03-20 2012-03-15
# 3: 3 3 16 2012 3 30 2012 1 2012-03-16 2012-03-30 2012-03-23
# 4: 4 3 20 2012 4 8 2012 1 2012-03-20 2012-04-08 2012-03-29
# 5: 5 3 20 2012 4 9 2012 1 2012-03-20 2012-04-09 2012-03-30
# 6: 6 3 20 2012 4 10 2012 1 2012-03-20 2012-04-10 2012-03-30
# 7: 7 3 20 2012 4 11 2012 1 2012-03-20 2012-04-11 2012-03-31
# 8: 8 4 4 2012 4 5 2012 1 2012-04-04 2012-04-05 2012-04-04
# 9: 9 4 6 2012 4 6 2012 1 2012-04-06 2012-04-06 2012-04-06
# 10: 10 4 6 2012 4 7 2012 1 2012-04-06 2012-04-07 2012-04-06
Or if you insist on using mean.Date, here's an alternative solution:
library(data.table)
setDT(my.data)[, `:=`(MY.DATE1 = as.Date(paste(DAY1 ,MONTH1, YEAR1), format = "%d %m %Y"),
MY.DATE2 = as.Date(paste(DAY2 ,MONTH2, YEAR2), format = "%d %m %Y"))][,
mean.date := mean.Date(c(MY.DATE1, MY.DATE2)), by = OBS]
One-liner (split for readability), uses lubridate and dplyr and (of course) pipes:
> require(lubridate)
> require(dplyr)
> my.data = my.data %>%
mutate(
MY.DATE1=as.Date(mdy(paste(MONTH1,DAY1,YEAR1))),
MY.DATE2=as.Date(mdy(paste(MONTH2,DAY2,YEAR2)))) %>%
rowwise %>%
mutate(mean.date=mean.Date(c(MY.DATE1,MY.DATE2))) %>% data.frame()
> head(my.data)
OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE MY.DATE1 MY.DATE2
1 1 3 6 2012 3 10 2012 1 2012-03-06 2012-03-10
2 2 3 10 2012 3 20 2012 1 2012-03-10 2012-03-20
3 3 3 16 2012 3 30 2012 1 2012-03-16 2012-03-30
4 4 3 20 2012 4 8 2012 1 2012-03-20 2012-04-08
5 5 3 20 2012 4 9 2012 1 2012-03-20 2012-04-09
6 6 3 20 2012 4 10 2012 1 2012-03-20 2012-04-10
mean.date
1 2012-03-08
2 2012-03-15
3 2012-03-23
4 2012-03-29
5 2012-03-30
6 2012-03-30
As an afterthought, if you like pipes, you can put a pipe in your pipe so you can pipe while you pipe - rewriting the first mutate step thus:
my.data %>% mutate(
MY.DATE1 = paste(MONTH1,DAY1,YEAR1) %>% mdy %>% as.Date,
MY.DATE2 = paste(MONTH2,DAY2,YEAR2) %>% mdy %>% as.Date)
1) Create Date class columns and then it's easy. No external packages are used:
asDate <- function(x) as.Date(x, "1970-01-01")
my.data2 <- transform(my.data,
date1 = as.Date(ISOdate(YEAR1, MONTH1, DAY1)),
date2 = as.Date(ISOdate(YEAR2, MONTH2, DAY2))
)
transform(my.data2, mean.date = asDate(rowMeans(cbind(date1, date2))))
If we did add a library(zoo) call then we could omit the asDate definition using as.Date in the last line instead of asDate since zoo adds a default origin to as.Date.
1a) A dplyr version would look like this (using asDate from above):
library(dplyr)
my.data %>%
mutate(
date1 = ISOdate(YEAR1, MONTH1, DAY1) %>% as.Date,
date2 = ISOdate(YEAR2, MONTH2, DAY2) %>% as.Date,
mean.date = cbind(date1, date2) %>% rowMeans %>% asDate)
2) Another way uses julian in the chron package. julian converts a month/day/year to the number of days since the Epoch. We can average the two julians and convert back to Date class:
library(zoo)
library(chron)
transform(my.data,
mean.date = as.Date( ( julian(MONTH1,DAY1,YEAR1) + julian(MONTH2,DAY2,YEAR2) )/2 )
)
We could omit library(zoo) if we used asDate from (1) in place of as.Date.
Update: Discussed use of zoo to shorten the solutions and made further reductions in solution (1).
What about:
apply(my.data[,c("MY.DATE1","MY.DATE2")],1,function(date){substr(strptime(mean(c(strptime(date[1],"%Y-%m-%d"),strptime(date[2],"%Y-%m-%d"))),format="%Y-%m-%d"),1,10)})
(I had to use substr because the CET and CEST time zones otherwise turned my output into a list.)
This version applies the answer posted by jaysunice3401 to every row. It seems fairly straightforward, except that I had to use trial and error to identify the correct origin. I do not know how general origin = "1970-01-01" is or whether a different origin would have to be specified with each data set.
According to this website: http://www.ats.ucla.edu/stat/r/faq/dates.htm
When R looks at dates as integers, its origin is January 1, 1970.
Which seems to suggest that origin = "1970-01-01" is fairly general. Although, if I had dates prior to "1970-01-01" in my data set I would definitely test the code before using it.
my.data = read.table(text = "
OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE
1 3 6 2012 3 10 2012 1
2 3 10 2012 3 20 2012 1
3 3 16 2012 3 30 2012 1
4 3 20 2012 4 8 2012 1
5 3 20 2012 4 9 2012 1
6 3 20 2012 4 10 2012 1
7 3 20 2012 4 11 2012 1
8 4 4 2012 4 5 2012 1
9 4 6 2012 4 6 2012 1
10 4 6 2012 4 7 2012 1
", header = TRUE, stringsAsFactors = FALSE)
desired.result = read.table(text = "
OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE MY.DATE1 MY.DATE2 mean.date
1 3 6 2012 3 10 2012 1 2012-03-06 2012-03-10 2012-03-08
2 3 10 2012 3 20 2012 1 2012-03-10 2012-03-20 2012-03-15
3 3 16 2012 3 30 2012 1 2012-03-16 2012-03-30 2012-03-23
4 3 20 2012 4 8 2012 1 2012-03-20 2012-04-08 2012-03-29
5 3 20 2012 4 9 2012 1 2012-03-20 2012-04-09 2012-03-30
6 3 20 2012 4 10 2012 1 2012-03-20 2012-04-10 2012-03-30
7 3 20 2012 4 11 2012 1 2012-03-20 2012-04-11 2012-03-31
8 4 4 2012 4 5 2012 1 2012-04-04 2012-04-05 2012-04-04
9 4 6 2012 4 6 2012 1 2012-04-06 2012-04-06 2012-04-06
10 4 6 2012 4 7 2012 1 2012-04-06 2012-04-07 2012-04-06
", header = TRUE, stringsAsFactors = FALSE)
my.data$MY.DATE1 <- do.call(paste, list(my.data$MONTH1,my.data$DAY1,my.data$YEAR1))
my.data$MY.DATE2 <- do.call(paste, list(my.data$MONTH2,my.data$DAY2,my.data$YEAR2))
my.data$MY.DATE1 <- as.Date(my.data$MY.DATE1, format=c("%m %d %Y"))
my.data$MY.DATE2 <- as.Date(my.data$MY.DATE2, format=c("%m %d %Y"))
my.data$mean.date2 <- as.Date( apply(my.data, 1, function(x) {
mean.Date(c(as.Date(x['MY.DATE1']), as.Date(x['MY.DATE2'])))
}) , origin = "1970-01-01")
my.data
desired.result
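On the question of how general origin = "1970-01-01" is: R stores a Date as a count of days since 1970-01-01, with negative counts for earlier dates, so this origin belongs to the Date class itself rather than to any particular data set. The sketch below checks that directly and then averages two date vectors through their day counts. The vectors d1 and d2 are illustrative stand-ins for MY.DATE1 and MY.DATE2.

```r
# Two rows from the example, as stand-ins for MY.DATE1 and MY.DATE2
d1 <- as.Date(c("2012-03-06", "2012-03-20"))
d2 <- as.Date(c("2012-03-10", "2012-04-08"))

# The underlying number is days since 1970-01-01 (negative before it)
origin_days  <- as.numeric(as.Date("1970-01-01"))
before_epoch <- as.numeric(as.Date("1969-12-31"))

# Average the day counts element by element, then convert back
mean_date <- as.Date((as.numeric(d1) + as.numeric(d2)) / 2, origin = "1970-01-01")
```

The second mean falls on a half day (19 days apart), and it prints as the earlier calendar date, which matches the desired result in the question.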

Resources