How can I extract the month using sqldf package


I tried to build a view grouped by date using the sqldf package and a month function, but I got an error: Error in sqliteSendQuery(con, statement, bind.data) : error in statement: no such function: month
Here is my query: s<-sqldf("select month(dateTime),sum(wolfs) group by dateTime")
Attached is a toy data frame:
df <- read.table(text = "dateTime birds wolfs snakes
2014-05-21 9 7 a
2014-04-28 8 4 b
2014-04-13 2 8 c
2014-03-12 2 3 a
2014-02-04 8 3 a
2014-02-29 1 2 a
2014-01-17 7 1 b
2014-01-16 1 5 c
2014-09-20 9 7 c
2014-08-21 8 7 c ",header = TRUE)
How can I extract the month using sqldf package?

I suspect you are used to SQL Server, but the sqldf backend being used in your case is SQLite, where there is no MONTH function. Try this instead:
R> sqldf("SELECT strftime('%m', dateTime) AS Month
,SUM(wolfs) AS Wolves
FROM df
GROUP BY strftime('%m', dateTime)")
# Month Wolves
# 1 01 6
# 2 02 5
# 3 03 3
# 4 04 12
# 5 05 7
# 6 08 7
# 7 09 7
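For comparison, the same month-wise sum can be sketched without SQL. The snippet below is a minimal base R sketch, not part of the original answer; the data frame name toy and its five rows are made up for illustration, and it assumes every dateTime string is a valid ISO date.

```r
# Hypothetical toy data; only valid calendar dates are used here.
toy <- data.frame(
  dateTime = c("2014-05-21", "2014-04-28", "2014-04-13",
               "2014-01-17", "2014-01-16"),
  wolfs    = c(7, 4, 8, 1, 5),
  stringsAsFactors = FALSE
)

# Extract the two-digit month as text, similar to strftime('%m', ...) in SQLite.
toy$Month <- format(as.Date(toy$dateTime), "%m")

# Sum wolfs within each month.
by_month <- aggregate(wolfs ~ Month, data = toy, FUN = sum)
by_month
```

Grouping on "%Y-%m" instead of "%m" would keep the same month from different years apart.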

Related

Averaging a monthly time series with incomplete observations

I have the following dataset:
id observation_date Observation_value
1 2015-02-23 5
1 2015-02-24 6
1 2015-03-01 24
1 2015-07-16 2
1 2015-09-28 9
1 2015-12-05 12
I would like to create monthly averages of observation_value. In those cases that there are no values for a certain month, I would like to fill in the data with the average between the months where I have data.

Using the data in the Note at the end (we have added a second id), convert to zoo with read.zoo, splitting by column 1 and using column 2 as the index with yearmon class. In the same statement, aggregate with mean over year/month, giving the zoo object z. Converting z to ts fills in the missing months with NA; converting back to zoo and applying na.approx then fills in the NAs by linear interpolation (or use na.spline or na.locf, depending on what you want). fortify.zoo(zz) and fortify.zoo(zz, melt = TRUE) can be used to convert zoo objects to data frames.
library(zoo)
z <- read.zoo(dat, FUN = as.yearmon, index = 2, split = 1, aggregate = mean)
zz <- na.approx(as.zoo(as.ts(z)))
giving
> zz
1 2
Feb 2015 5.5 5.5
Mar 2015 24.0 24.0
Apr 2015 18.5 18.5
May 2015 13.0 13.0
Jun 2015 7.5 7.5
Jul 2015 2.0 2.0
Aug 2015 5.5 5.5
Sep 2015 9.0 9.0
Oct 2015 10.0 10.0
Nov 2015 11.0 11.0
Dec 2015 12.0 12.0
Note
Lines <- "id observation_date Observation_value
1 2015-02-23 5
1 2015-02-24 6
1 2015-03-01 24
1 2015-07-16 2
1 2015-09-28 9
1 2015-12-05 12
2 2015-02-23 5
2 2015-02-24 6
2 2015-03-01 24
2 2015-07-16 2
2 2015-09-28 9
2 2015-12-05 12"
dat <- read.table(text = Lines, header = TRUE)
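To see what the na.approx step computes, the interpolation can be reproduced with base R's approx(). This is only a sketch for a single id within one calendar year: month numbers serve as the x axis, which would not work across a year boundary. The names obs, monthly and filled are illustrative and not part of the answer above.

```r
# Observations for one id, copied from the question.
obs <- data.frame(
  observation_date  = as.Date(c("2015-02-23", "2015-02-24", "2015-03-01",
                                "2015-07-16", "2015-09-28", "2015-12-05")),
  Observation_value = c(5, 6, 24, 2, 9, 12)
)

# Monthly means for the months that have data.
obs$month <- as.integer(format(obs$observation_date, "%m"))
monthly   <- aggregate(Observation_value ~ month, data = obs, FUN = mean)

# Linear interpolation over every month between the first and last observed one.
all_months <- seq(min(monthly$month), max(monthly$month))
filled <- approx(x = monthly$month, y = monthly$Observation_value,
                 xout = all_months)$y
data.frame(month = all_months, value = filled)
```

The filled values agree with the zz column shown above, for example 18.5, 13.0 and 7.5 for April to June.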

Match dates from list of data frames in R

I have a list of 100+ time series dataframes my.list with daily observations for each product in its own data frame. Some values are NA without any record of the date. I would like to update each data frame in this list to show the date and NA if it does not have a record on this date.
Dates:
start = as.Date('2016/04/08')
full <- seq(start, by='1 days', length=10)
Sample Time Series Data:
d1 <- data.frame(Date = seq(start, by ='2 days',length=5), Sales = c(5,10,15,20,25))
d2 <- data.frame(Date = seq(start, by= '1 day', length=10),Sales = c(1, 2, 3,4,5,6,7,8,9,10))
my.list <- list(d1, d2)
I want to merge all full date values into each data frame, and if no match exists then sales is NA:
my.list
[[d1]]
Date Sales
2016-04-08 5
2016-04-09 NA
2016-04-10 10
2016-04-11 NA
2016-04-12 15
2016-04-13 NA
2016-04-14 20
2016-04-15 NA
2016-04-16 25
2016-04-17 NA
[[d2]]
Date Sales
2016-04-08 1
2016-04-09 2
2016-04-10 3
2016-04-11 4
2016-04-12 5
2016-04-13 6
2016-04-14 7
2016-04-15 8
2016-04-16 9
2016-04-17 10

If I understand correctly, the OP wants to update each of the dataframes in my.list to contain one row for each date given in the vector of dates full.
Base R
In base R, merge() can be used as already mentioned by Hack-R. However, the answer below expands this to work on all dataframes in the list:
# create dataframe from vector of full dates
full.df <- data.frame(Date = full)
# apply merge on each dataframe in the list
lapply(my.list, merge, y = full.df, all.y = TRUE)
[[1]]
Date Sales
1 2016-04-08 5
2 2016-04-09 NA
3 2016-04-10 10
4 2016-04-11 NA
5 2016-04-12 15
6 2016-04-13 NA
7 2016-04-14 20
8 2016-04-15 NA
9 2016-04-16 25
10 2016-04-17 NA
[[2]]
Date Sales
1 2016-04-08 1
2 2016-04-09 2
3 2016-04-10 3
4 2016-04-11 4
5 2016-04-12 5
6 2016-04-13 6
7 2016-04-14 7
8 2016-04-15 8
9 2016-04-16 9
10 2016-04-17 10
Caveat
The answer assumes that full covers the overall range of Date of all dataframes in the list.
In order to avoid any mishaps, the overall range of Date can be retrieved from the available data in my.list:
overall_date_range <- Reduce(range, lapply(my.list, function(x) range(x$Date)))
full <- seq(overall_date_range[1], overall_date_range[2], by = "1 days")
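The assumption in the caveat can also be checked explicitly before merging. The following is a minimal sketch using the sample data from the question; the helper names covered, padded and rows are made up for illustration.

```r
# Sample data as in the question.
start <- as.Date("2016-04-08")
full  <- seq(start, by = "1 day", length.out = 10)
d1 <- data.frame(Date = seq(start, by = "2 days", length.out = 5),
                 Sales = c(5, 10, 15, 20, 25))
d2 <- data.frame(Date = seq(start, by = "1 day", length.out = 10),
                 Sales = 1:10)
my.list <- list(d1, d2)

# TRUE for each data frame whose dates all fall inside `full`.
covered <- vapply(my.list, function(x) all(x$Date %in% full), logical(1))
stopifnot(all(covered))

# After padding, every data frame should have one row per date in `full`.
padded <- lapply(my.list, merge, y = data.frame(Date = full), all.y = TRUE)
rows   <- vapply(padded, nrow, integer(1))
rows
```

If covered contained a FALSE, merging with all.y = TRUE would silently drop the rows whose dates lie outside full, which is the mishap the caveat warns about.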
Using rbindlist()
Alternatively, the list of dataframes, which are identical in structure, can be stored in one large dataframe. An additional attribute indicates which product each row belongs to. The homogeneous structure simplifies subsequent operations.
The code below uses the rbindlist() function from the data.table package to create a large data.table. CJ() (cross join) creates all combinations of dates and product id which is then merged / joined to fill in the missing dates:
library(data.table)
all_products <- rbindlist(my.list, idcol = "product.id")[
CJ(product.id = unique(product.id), Date = seq(min(Date), max(Date), by = "1 day")),
on = .(Date, product.id)]
all_products
product.id Date Sales
1: 1 2016-04-08 5
2: 1 2016-04-09 NA
3: 1 2016-04-10 10
4: 1 2016-04-11 NA
5: 1 2016-04-12 15
6: 1 2016-04-13 NA
7: 1 2016-04-14 20
8: 1 2016-04-15 NA
9: 1 2016-04-16 25
10: 1 2016-04-17 NA
11: 2 2016-04-08 1
12: 2 2016-04-09 2
13: 2 2016-04-10 3
14: 2 2016-04-11 4
15: 2 2016-04-12 5
16: 2 2016-04-13 6
17: 2 2016-04-14 7
18: 2 2016-04-15 8
19: 2 2016-04-16 9
20: 2 2016-04-17 10
Subsequent operations can be grouped by product.id, e.g., to determine the number of valid sales data for each product:
all_products[!is.na(Sales), .(valid.sales.data = .N), by = product.id]
product.id valid.sales.data
1: 1 5
2: 2 10
Or, the total sales per product:
all_products[, .(total.sales = sum(Sales, na.rm = TRUE)), by = product.id]
product.id total.sales
1: 1 75
2: 2 55
If required for some reason the result can be converted back to a list by
split(all_products, by = "product.id")

Looping over unique values [closed]

I have a data frame in long format, with one observation row per measurement. I want to loop through each unique ID and find the "minimum" date for each unique individual. For example, patient 1 may be measured at three different times, but I want the earliest time. I thought about sorting the dataset by the date (in increasing order) and removing all duplicates, but I'm not sure if this is the best way to go. Any help or suggestions would be greatly appreciated. Thank you!

We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df1)), order by 'Date' (assuming that it is of class Date; otherwise convert it with as.Date and the correct format), and, grouped by 'ID', get the first observation with head.
library(data.table)
setDT(df1)[order(Date), head(.SD, 1), by = ID]
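The sort-and-deduplicate idea mentioned in the question also works and needs no extra packages. Here is a minimal base R sketch of it; the data frame visits and its values are invented for illustration.

```r
# Hypothetical long-format data: several measurements per ID.
visits <- data.frame(
  ID     = c(1, 1, 2, 2, 2),
  Date   = as.Date(c("2017-01-08", "2017-01-03", "2017-01-24",
                     "2017-01-06", "2017-01-07")),
  weight = c(87, 90, 92, 83, 89)
)

# Sort by ID and then by date, so the earliest row of each ID comes first.
sorted <- visits[order(visits$ID, visits$Date), ]

# duplicated() marks every row after the first one per ID; keep the unmarked rows.
earliest <- sorted[!duplicated(sorted$ID), ]
earliest
```

Unlike a filter on the minimum date, this keeps exactly one row per ID even when the earliest date occurs more than once.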

Here is another way using base R:
earliestDates = aggregate(list(date = df$date), list(ID = df$ID), min)
result = merge(earliestDates,df)
earliestDates is a two column data frame that has the minimum date by ID. The merge will join the values in the other columns.
Example:
set.seed(1)
ID = floor(runif(20,1,5))
day = as.Date(floor(runif(20,1,25)),origin = "2017-1-1")
weight = floor(runif(20,80,95))
df = data.frame(ID = ID, date = day, weight = weight)
> df
ID date weight
1 2 2017-01-24 92
2 2 2017-01-07 89
3 3 2017-01-17 91
4 4 2017-01-05 88
5 1 2017-01-08 87
6 4 2017-01-11 91
7 4 2017-01-02 80
8 3 2017-01-11 87
9 3 2017-01-22 90
10 1 2017-01-10 90
11 1 2017-01-13 87
12 1 2017-01-16 92
13 3 2017-01-13 86
14 2 2017-01-06 83
15 4 2017-01-21 81
16 2 2017-01-18 81
17 3 2017-01-21 84
18 4 2017-01-04 87
19 2 2017-01-19 89
20 4 2017-01-11 86
After the aggregate and merge, the result is:
> result
ID date weight
1 1 2017-01-08 87
2 2 2017-01-06 83
3 3 2017-01-11 87
4 4 2017-01-02 80

Try the following dplyr code:
library(dplyr)
set.seed(12345)
###Create test dataset
tb <- tibble(id = rep(1:10, each = 3),
date = rep(seq(as.Date("2017-07-01"), by=10, len=10), 3),
obs = rnorm(30))
# # A tibble: 30 × 3
# id date obs
# <int> <date> <dbl>
# 1 2017-07-01 0.5855288
# 1 2017-07-11 0.7094660
# 1 2017-07-21 -0.1093033
# 2 2017-07-31 -0.4534972
# 2 2017-08-10 0.6058875
# 2 2017-08-20 -1.8179560
# 3 2017-08-30 0.6300986
# 3 2017-09-09 -0.2761841
# 3 2017-09-19 -0.2841597
# 4 2017-09-29 -0.9193220
# # ... with 20 more rows
###Pipe the dataset through dplyr's 'group_by' and 'filter' commands
tb %>% group_by(id) %>%
filter(date == min(date)) %>%
ungroup() %>%
distinct()
# # A tibble: 10 × 3
# id date obs
# <int> <date> <dbl>
# 1 2017-07-01 0.5855288
# 2 2017-07-31 -0.4534972
# 3 2017-08-30 0.6300986
# 4 2017-07-01 -0.1162478
# 5 2017-07-21 0.3706279
# 6 2017-08-20 0.8168998
# 7 2017-07-01 0.7796219
# 8 2017-07-11 1.4557851
# 9 2017-08-10 -1.5977095
# 10 2017-09-09 0.6203798

Fixing dates that were coerced into the wrong format

I have a large df with dates that were accidentally coerced into the wrong format.
Data:
id <- c(1:12)
date <- c("2014-01-03","2001-08-14","2001-08-14","2014-06-02","2006-06-14", "2006-06-14",
"2014-08-08","2014-08-08","2008-04-14","2009-12-13","2010-09-14","2012-09-14")
df <- data.frame(id,date)
Structure:
id date
1 1 2014-01-03
2 2 2001-08-14
3 3 2001-08-14
4 4 2014-06-02
5 5 2006-06-14
6 6 2006-06-14
7 7 2014-08-08
8 8 2014-08-08
9 9 2008-04-14
10 10 2009-12-13
11 11 2010-09-14
12 12 2012-09-14
The data set only includes, or rather should only include the years 2014 and 2013. The dates 2001-08-14 and 2006-06-14 are most likely 2014-08-01 and 2014-06-06, respectively.
Output:
id date
1 1 2014-01-03
2 2 2014-08-01
3 3 2014-08-01
4 4 2014-06-02
5 5 2014-06-06
6 6 2014-06-06
7 7 2014-08-08
8 8 2014-08-08
9 9 2014-04-08
10 10 2013-12-09
11 11 2014-09-10
12 12 2014-09-12
How can I reconcile this mess?

Package lubridate has the convenient function year that will be useful here.
library(lubridate)
# Convert date to proper date class variable
df$date <- as.Date(df$date)
# Isolate problematic indices; when year is not in 2013 or 2014,
# we'll go to and from character representation. We'll trim
# the "20" in front of the "false year" and then specify the
# proper format to read the character back into a Date class.
tmp.indices <- which(!year(df$date) %in% c("2013", "2014"))
df$date[tmp.indices] <- as.Date(substring(as.character(df$date[tmp.indices]),
first = 3), format = "%d-%m-%y")
Result:
id date
1 1 2014-01-03
2 2 2014-08-01
3 3 2014-08-01
4 4 2014-06-02
5 5 2014-06-06
6 6 2014-06-06
7 7 2014-08-08
8 8 2014-08-08
9 9 2014-04-08
10 10 2013-12-09
11 11 2014-09-10
12 12 2014-09-12

We could convert the 'date' column to 'Date' class and extract the year to create a logical index ('indx') that flags the rows whose year is not 2013 or 2014.
df$date <- as.Date(df$date)
indx <- !format(df$date, '%Y') %in% 2013:2014
Then, for the flagged rows, remove the first two characters and parse the remainder as day-month-year with lubridate's dmy.
library(lubridate)
df$date[indx] <- dmy(sub('^..', '', df$date[indx]))
df
# id date
#1 1 2014-01-03
#2 2 2014-08-01
#3 3 2014-08-01
#4 4 2014-06-02
#5 5 2014-06-06
#6 6 2014-06-06
#7 7 2014-08-08
#8 8 2014-08-08
#9 9 2014-04-08
#10 10 2013-12-09
#11 11 2014-09-10
#12 12 2014-09-12

Calculating mean date by row

I wish to obtain the mean date by row, where each row contains two dates. Eventually I found a way, posted below. However, the approach I used seems rather cumbersome. Is there a better way?
my.data = read.table(text = "
OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE
1 3 6 2012 3 10 2012 1
2 3 10 2012 3 20 2012 1
3 3 16 2012 3 30 2012 1
4 3 20 2012 4 8 2012 1
5 3 20 2012 4 9 2012 1
6 3 20 2012 4 10 2012 1
7 3 20 2012 4 11 2012 1
8 4 4 2012 4 5 2012 1
9 4 6 2012 4 6 2012 1
10 4 6 2012 4 7 2012 1
", header = TRUE, stringsAsFactors = FALSE)
my.data
my.data$MY.DATE1 <- do.call(paste, list(my.data$MONTH1, my.data$DAY1, my.data$YEAR1))
my.data$MY.DATE2 <- do.call(paste, list(my.data$MONTH2, my.data$DAY2, my.data$YEAR2))
my.data$MY.DATE1 <- as.Date(my.data$MY.DATE1, format=c("%m %d %Y"))
my.data$MY.DATE2 <- as.Date(my.data$MY.DATE2, format=c("%m %d %Y"))
my.data
desired.result = read.table(text = "
OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE MY.DATE1 MY.DATE2 mean.date
1 3 6 2012 3 10 2012 1 2012-03-06 2012-03-10 2012-03-08
2 3 10 2012 3 20 2012 1 2012-03-10 2012-03-20 2012-03-15
3 3 16 2012 3 30 2012 1 2012-03-16 2012-03-30 2012-03-23
4 3 20 2012 4 8 2012 1 2012-03-20 2012-04-08 2012-03-29
5 3 20 2012 4 9 2012 1 2012-03-20 2012-04-09 2012-03-30
6 3 20 2012 4 10 2012 1 2012-03-20 2012-04-10 2012-03-30
7 3 20 2012 4 11 2012 1 2012-03-20 2012-04-11 2012-03-31
8 4 4 2012 4 5 2012 1 2012-04-04 2012-04-05 2012-04-04
9 4 6 2012 4 6 2012 1 2012-04-06 2012-04-06 2012-04-06
10 4 6 2012 4 7 2012 1 2012-04-06 2012-04-07 2012-04-06
", header = TRUE, stringsAsFactors = FALSE)
Here is the approach that worked for me:
my.data$mean.date <- (my.data$MY.DATE1 + ((my.data$MY.DATE2 - my.data$MY.DATE1) / 2))
my.data
These approaches did not work:
my.data$mean.date <- mean(my.data$MY.DATE1, my.data$MY.DATE2)
my.data$mean.date <- mean(my.data$MY.DATE1, my.data$MY.DATE2, trim = 0)
my.data$mean.date <- mean(my.data$MY.DATE1, my.data$MY.DATE2, trim = 1)
my.data$mean.date <- mean(my.data$MY.DATE1, my.data$MY.DATE2, trim = 0.5)
my.data$mean.data <- apply(my.data, 1, function(x) {(x[9] + x[10]) / 2})
I think I am supposed to use the Ops.Date command, but have not found an example.
Thank you for any suggestions.

Keep things simple and use mean.Date in base R.
mean.Date(as.Date(c("01-01-2014", "01-07-2014"), format=c("%m-%d-%Y")))
[1] "2014-01-04"
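Note that mean() averages the elements of its first argument only; in mean(x, y) the second positional argument is matched to trim, which is why the mean(MY.DATE1, MY.DATE2) attempts in the question do not give a row-wise mean. For whole columns, a vectorized alternative is to average the underlying day counts. The sketch below uses three date pairs taken from the question; d1, d2 and mean_date are illustrative names.

```r
# Three date pairs from the question's data.
d1 <- as.Date(c("2012-03-06", "2012-03-20", "2012-04-04"))
d2 <- as.Date(c("2012-03-10", "2012-04-08", "2012-04-05"))

# Dates are stored as days since 1970-01-01, so they can be averaged as numbers.
mean_num  <- (as.numeric(d1) + as.numeric(d2)) / 2
mean_date <- as.Date(mean_num, origin = "1970-01-01")

# Half-day results are displayed as the earlier calendar day.
format(mean_date)
```

The formatted results match the corresponding rows of desired.result in the question.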

Using the good advice of @jaysunice3401, I came up with this. If you want to keep the original data, you can add remove = FALSE in the two lines with unite.
library(dplyr)
library(tidyr)
my.data %>%
unite(whatever1, matches("1"), sep = "-") %>%
unite(whatever2, matches("2"), sep = "-") %>%
mutate(across(contains("whatever"), ~ as.Date(.x, format = "%m-%d-%Y"))) %>%
rowwise %>%
mutate(mean.date = mean.Date(c(whatever1, whatever2)))
# OBS whatever1 whatever2 STATE mean.date
#1 1 2012-03-06 2012-03-10 1 2012-03-08
#2 2 2012-03-10 2012-03-20 1 2012-03-15
#3 3 2012-03-16 2012-03-30 1 2012-03-23
#4 4 2012-03-20 2012-04-08 1 2012-03-29
#5 5 2012-03-20 2012-04-09 1 2012-03-30
#6 6 2012-03-20 2012-04-10 1 2012-03-30
#7 7 2012-03-20 2012-04-11 1 2012-03-31
#8 8 2012-04-04 2012-04-05 1 2012-04-04
#9 9 2012-04-06 2012-04-06 1 2012-04-06
#10 10 2012-04-06 2012-04-07 1 2012-04-06

Maybe something like this?
library(data.table)
setDT(my.data)[, `:=`(MY.DATE1 = as.Date(paste(DAY1 ,MONTH1, YEAR1), format = "%d %m %Y"),
MY.DATE2 = as.Date(paste(DAY2 ,MONTH2, YEAR2), format = "%d %m %Y"))][,
mean.date := MY.DATE2 - ceiling((MY.DATE2 - MY.DATE1)/2)]
my.data
# OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE MY.DATE1 MY.DATE2 mean.date
# 1: 1 3 6 2012 3 10 2012 1 2012-03-06 2012-03-10 2012-03-08
# 2: 2 3 10 2012 3 20 2012 1 2012-03-10 2012-03-20 2012-03-15
# 3: 3 3 16 2012 3 30 2012 1 2012-03-16 2012-03-30 2012-03-23
# 4: 4 3 20 2012 4 8 2012 1 2012-03-20 2012-04-08 2012-03-29
# 5: 5 3 20 2012 4 9 2012 1 2012-03-20 2012-04-09 2012-03-30
# 6: 6 3 20 2012 4 10 2012 1 2012-03-20 2012-04-10 2012-03-30
# 7: 7 3 20 2012 4 11 2012 1 2012-03-20 2012-04-11 2012-03-31
# 8: 8 4 4 2012 4 5 2012 1 2012-04-04 2012-04-05 2012-04-04
# 9: 9 4 6 2012 4 6 2012 1 2012-04-06 2012-04-06 2012-04-06
# 10: 10 4 6 2012 4 7 2012 1 2012-04-06 2012-04-07 2012-04-06
Or, if you insist on using mean.Date, here's an alternative solution:
library(data.table)
setDT(my.data)[, `:=`(MY.DATE1 = as.Date(paste(DAY1 ,MONTH1, YEAR1), format = "%d %m %Y"),
MY.DATE2 = as.Date(paste(DAY2 ,MONTH2, YEAR2), format = "%d %m %Y"))][,
mean.date := mean.Date(c(MY.DATE1, MY.DATE2)), by = OBS]

One-liner (split for readability), uses lubridate and dplyr and (of course) pipes:
> require(lubridate)
> require(dplyr)
> my.data = my.data %>%
mutate(
MY.DATE1=as.Date(mdy(paste(MONTH1,DAY1,YEAR1))),
MY.DATE2=as.Date(mdy(paste(MONTH2,DAY2,YEAR2)))) %>%
rowwise %>%
mutate(mean.data=mean.Date(c(MY.DATE1,MY.DATE2))) %>% data.frame()
> head(my.data)
OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE MY.DATE1 MY.DATE2
1 1 3 6 2012 3 10 2012 1 2012-03-06 2012-03-10
2 2 3 10 2012 3 20 2012 1 2012-03-10 2012-03-20
3 3 3 16 2012 3 30 2012 1 2012-03-16 2012-03-30
4 4 3 20 2012 4 8 2012 1 2012-03-20 2012-04-08
5 5 3 20 2012 4 9 2012 1 2012-03-20 2012-04-09
6 6 3 20 2012 4 10 2012 1 2012-03-20 2012-04-10
mean.data
1 2012-03-08
2 2012-03-15
3 2012-03-23
4 2012-03-29
5 2012-03-30
6 2012-03-30
As an afterthought, if you like pipes, you can put a pipe in your pipe so you can pipe while you pipe - rewriting the first mutate step thus:
my.data %>% mutate(
MY.DATE1 = paste(MONTH1,DAY1,YEAR1) %>% mdy %>% as.Date,
MY.DATE2 = paste(MONTH2,DAY2,YEAR2) %>% mdy %>% as.Date)

1) Create Date class columns and then it's easy. No external packages are used:
asDate <- function(x) as.Date(x, "1970-01-01")
my.data2 <- transform(my.data,
date1 = as.Date(ISOdate(YEAR1, MONTH1, DAY1)),
date2 = as.Date(ISOdate(YEAR2, MONTH2, DAY2))
)
transform(my.data2, mean.date = asDate(rowMeans(cbind(date1, date2))))
If we added a library(zoo) call, we could omit the asDate definition and use as.Date in the last line instead of asDate, since zoo adds a default origin to as.Date.
1a) A dplyr version would look like this (using asDate from above):
library(dplyr)
my.data %>%
mutate(
date1 = ISOdate(YEAR1, MONTH1, DAY1) %>% as.Date,
date2 = ISOdate(YEAR2, MONTH2, DAY2) %>% as.Date,
mean.date = cbind(date1, date2) %>% rowMeans %>% asDate)
2) Another way uses julian in the chron package. julian converts a month/day/year to the number of days since the Epoch. We can average the two julians and convert back to Date class:
library(zoo)
library(chron)
transform(my.data,
mean.date = as.Date( ( julian(MONTH1,DAY1,YEAR1) + julian(MONTH2,DAY2,YEAR2) )/2 )
)
We could omit library(zoo) if we used asDate from (1) in place of as.Date.
Update: Discussed the use of zoo to shorten the solutions and made further reductions in solution (1).

What about:
apply(my.data[,c("MY.DATE1","MY.DATE2")],1,function(date){substr(strptime(mean(c(strptime(date[1],"%y%y-%m-%d"),strptime(date[2],"%y%y-%m-%d"))),format="%y%y-%m-%d"),1,10)})
?
(I just had to use substr because of CET and CEST that put my output as a list...)

This is a vectorized version of the answer posted by jaysunice3401. It seems fairly straightforward, except that I had to use trial and error to identify the correct origin. I do not know how general origin = "1970-01-01" is or whether a different origin would have to be specified with each data set.
According to this website: http://www.ats.ucla.edu/stat/r/faq/dates.htm
When R looks at dates as integers, its origin is January 1, 1970.
This seems to suggest that origin = "1970-01-01" is fairly general, although if I had dates prior to "1970-01-01" in my data set I would definitely test the code before using it.
my.data = read.table(text = "
OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE
1 3 6 2012 3 10 2012 1
2 3 10 2012 3 20 2012 1
3 3 16 2012 3 30 2012 1
4 3 20 2012 4 8 2012 1
5 3 20 2012 4 9 2012 1
6 3 20 2012 4 10 2012 1
7 3 20 2012 4 11 2012 1
8 4 4 2012 4 5 2012 1
9 4 6 2012 4 6 2012 1
10 4 6 2012 4 7 2012 1
", header = TRUE, stringsAsFactors = FALSE)
desired.result = read.table(text = "
OBS MONTH1 DAY1 YEAR1 MONTH2 DAY2 YEAR2 STATE MY.DATE1 MY.DATE2 mean.date
1 3 6 2012 3 10 2012 1 2012-03-06 2012-03-10 2012-03-08
2 3 10 2012 3 20 2012 1 2012-03-10 2012-03-20 2012-03-15
3 3 16 2012 3 30 2012 1 2012-03-16 2012-03-30 2012-03-23
4 3 20 2012 4 8 2012 1 2012-03-20 2012-04-08 2012-03-29
5 3 20 2012 4 9 2012 1 2012-03-20 2012-04-09 2012-03-30
6 3 20 2012 4 10 2012 1 2012-03-20 2012-04-10 2012-03-30
7 3 20 2012 4 11 2012 1 2012-03-20 2012-04-11 2012-03-31
8 4 4 2012 4 5 2012 1 2012-04-04 2012-04-05 2012-04-04
9 4 6 2012 4 6 2012 1 2012-04-06 2012-04-06 2012-04-06
10 4 6 2012 4 7 2012 1 2012-04-06 2012-04-07 2012-04-06
", header = TRUE, stringsAsFactors = FALSE)
my.data$MY.DATE1 <- do.call(paste, list(my.data$MONTH1,my.data$DAY1,my.data$YEAR1))
my.data$MY.DATE2 <- do.call(paste, list(my.data$MONTH2,my.data$DAY2,my.data$YEAR2))
my.data$MY.DATE1 <- as.Date(my.data$MY.DATE1, format=c("%m %d %Y"))
my.data$MY.DATE2 <- as.Date(my.data$MY.DATE2, format=c("%m %d %Y"))
my.data$mean.date2 <- as.Date( apply(my.data, 1, function(x) {
mean.Date(c(as.Date(x['MY.DATE1']), as.Date(x['MY.DATE2'])))
}) , origin = "1970-01-01")
my.data
desired.result
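On the question of how general origin = "1970-01-01" is: R stores a Date as the number of days since 1970-01-01, independent of the data set, and earlier dates are simply negative numbers. A short sketch to check this (the variable names are illustrative):

```r
# The epoch itself is day 0.
epoch <- as.Date("1970-01-01")
epoch_num <- as.numeric(epoch)

# A date before the epoch is stored as a negative day count.
old <- as.Date("1969-12-25")
old_num <- as.numeric(old)

# Converting the number back with the same origin recovers the date.
round_trip <- as.Date(old_num, origin = "1970-01-01")
c(epoch_num, old_num)
round_trip
```

So the same origin applies to any Date column, including dates before 1970.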
