I have data in the following format with 3 date columns
X <- c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016")
Y <- c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016")
Z <- c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016")
The output I want is like below:
X Output
24/02/2016 12/02/2016
25/02/2016 NA
26/02/2016 NA
29/02/2016 NA
01/03/2016 NA
02/03/2016 NA
03/03/2016 NA
04/03/2016 NA
07/03/2016 NA
08/03/2016 NA
09/03/2016 29/02/2016
10/03/2016 NA
11/03/2016 NA
14/03/2016 NA
15/03/2016 NA
Basically, the problem is: wherever there is a match between X and Y, I need the Z value from the same position as the matching Y in a new column next to X.
I am not really good with R, so I have not been able to figure out a solution. Any ideas?
You could do this in base R using match, but I find it cleaner to use the dplyr package and left_join.
library(dplyr)
# make a data frame with X as a column
X.df <- data.frame(X = c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016"), stringsAsFactors = F)
# make a data frame with Y and Z as columns
YZ.df <- data.frame(Y = c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016"), Z = c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016"), stringsAsFactors = F)
# do a left join, specifying variables X and Y
left_join(X.df, YZ.df, by = c("X" = "Y"))
Note that the above will create duplicate rows for X if there is more than one corresponding Z value for a Y value that matches an X value.
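To make the caveat concrete, here is a minimal sketch with made-up keys where one Y value maps to two Z values:

```r
library(dplyr)

# Toy data: the Y key "2005" appears twice, with two different Z values
X.df  <- data.frame(X = c("2005", "2006"), stringsAsFactors = FALSE)
YZ.df <- data.frame(Y = c("2005", "2005"),
                    Z = c("a", "b"),
                    stringsAsFactors = FALSE)

# The X == "2005" row is duplicated, once per matching Z;
# "2006" has no match and gets NA
left_join(X.df, YZ.df, by = c("X" = "Y"))
#      X    Z
# 1 2005    a
# 2 2005    b
# 3 2006 <NA>
```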
For the sake of completeness, here is the data.table version complementing gatsky's answer:
library(data.table)
data.table(Y, Z)[data.table(X), on = .(Y == X), .(X, Z)]
X Z
1: 24/02/2016 12/02/2016
2: 25/02/2016 NA
3: 26/02/2016 NA
4: 29/02/2016 NA
5: 01/03/2016 NA
6: 02/03/2016 NA
7: 03/03/2016 NA
8: 04/03/2016 NA
9: 07/03/2016 NA
10: 08/03/2016 NA
11: 09/03/2016 29/02/2016
12: 10/03/2016 NA
13: 11/03/2016 NA
14: 14/03/2016 NA
15: 15/03/2016 NA
Data
Z <- c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016")
Y <- c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016")
X <- c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016")
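As a side note for readers new to data.table joins, the A[B] direction is easy to get backwards. A tiny sketch (toy values) of the semantics used above: in A[B, on = ...], the rows of B are looked up in A, so the result has one row per row of B, with NA where A has no match. That is why data.table(Y, Z)[data.table(X), ...] returns one row per element of X.

```r
library(data.table)

A <- data.table(key = c("a", "b"), val = 1:2)
B <- data.table(key = c("b", "c"))

# One row per row of B; "c" has no match in A, so val is NA
A[B, on = .(key)]
#    key val
# 1:   b   2
# 2:   c  NA
```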
Using match
# Construct data
Z = c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016")
Y = c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016")
df <- data.frame(X = c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016"), stringsAsFactors = F)
# Match df$X to Y and return that index of Z
df$Output<-Z[match(df$X,Y)]
Output
> df
X Output
1 24/02/2016 12/02/2016
2 25/02/2016 <NA>
3 26/02/2016 <NA>
4 29/02/2016 <NA>
5 01/03/2016 <NA>
6 02/03/2016 <NA>
7 03/03/2016 <NA>
8 04/03/2016 <NA>
9 07/03/2016 <NA>
10 08/03/2016 <NA>
11 09/03/2016 29/02/2016
12 10/03/2016 <NA>
13 11/03/2016 <NA>
14 14/03/2016 <NA>
15 15/03/2016 <NA>
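If match() itself is unfamiliar: it returns, for each element of its first argument, the position of that element in the second argument (or NA when absent), and subscripting with an NA position yields NA, which is exactly what fills the NA rows of Output:

```r
# Positions of "b" and "d" within c("a", "b", "c"); "d" is absent
match(c("b", "d"), c("a", "b", "c"))
# [1]  2 NA

# Subscripting with an NA index propagates NA
c("x", "y", "z")[match(c("b", "d"), c("a", "b", "c"))]
# [1] "y" NA
```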
Related
I have a file (RData format): https://stepik.org/media/attachments/course/724/all_data.Rdata. This file contains 7 lists with the id and temperature of patients.
I need to make one data.frame from these lists and then remove all rows with NA. The combined data look like this:
id temp i.temp i.temp.1 i.temp.2 i.temp.3 i.temp.4 i.temp.5
1: 1 36.70378 36.73161 36.22944 36.05907 35.66014 37.32798 35.88121
2: 2 36.43545 35.96814 36.86782 37.20890 36.45172 36.82727 36.83450
3: 3 36.87599 36.38842 36.70508 37.44710 36.73362 37.09359 35.92993
4: 4 36.17120 35.95853 36.33405 36.45134 37.17186 36.87482 35.45489
5: 5 37.20341 37.04881 36.53252 36.22922 36.78106 36.89219 37.13207
6: 6 36.12201 36.53433 37.29784 35.96451 36.70838 36.58684 36.60122
7: 7 36.92314 36.16220 36.48154 37.05324 36.57829 36.24955 37.23835
8: 8 35.71390 37.26879 37.01673 36.65364 36.89143 36.46331 37.15398
9: 9 36.63558 37.03452 36.40129 37.53705 36.03568 36.78083 36.71873
10: 10 36.77329 36.07161 36.42992 36.20715 36.78880 36.79875 36.15004
11: 11 36.66199 36.74958 36.28661 36.72539 36.17700 37.47495 35.60980
12: 12 NA 36.97689 36.00473 36.64292 35.96789 36.73904 36.93957
13: 13 NA NA NA NA NA 36.63760 36.83916
14: 14 37.40307 35.89668 36.30619 36.64382 37.21882 35.87420 35.45550
15: 15 NA NA NA 37.03758 36.72512 36.45281 37.54388
16: 16 NA 36.44912 36.57126 36.20703 36.83076 36.48287 35.99391
17: 17 NA NA NA 36.39900 36.54043 36.75989 36.47079
18: 18 36.51696 37.09903 37.31166 36.51000 36.42414 36.87976 36.45736
19: 19 37.05117 37.42526 36.15820 36.11824 37.07024 36.60699 36.80168
20: 20 NA NA NA NA NA NA 36.74118
I wrote:
load(url("https://stepik.org/media/attachments/course/724/all_data.Rdata"))
library(data.table)
day1<-as.data.table(all_data[1])
day2<-as.data.table(all_data[2])
day3<-as.data.table(all_data[3])
day4<-as.data.table(all_data[4])
day5<-as.data.table(all_data[5])
day6<-as.data.table(all_data[6])
day7<-as.data.table(all_data[7])
setkey(day1, id)
setkey(day2, id)
setkey(day3, id)
setkey(day4, id)
setkey(day5, id)
setkey(day6, id)
setkey(day7, id)
all_day<-day1[day2,][day3, ][day4,][day5,][day6,][day7,]
all_day<-na.omit(all_day)
But it takes too long. How can I make it faster?
Here is a data.table solution:
library( data.table )
#set names for all_data
names( all_data ) <- paste0( "day", 1:length(all_data))
#bind lists to data.table
DT <- data.table::rbindlist( all_data, use.names = TRUE, fill = TRUE, idcol = "day" )
#cast to wide
ans <- dcast( DT, id ~ day, value.var = "temp" )
#only keep complete rows and present output (using [] at the end)
ans[ complete.cases( ans ), ][]
# id day1 day2 day3 day4 day5 day6 day7
# 1: 1 36.70378 36.73161 36.22944 36.05907 35.66014 37.32798 35.88121
# 2: 2 36.43545 35.96814 36.86782 37.20890 36.45172 36.82727 36.83450
# 3: 3 36.87599 36.38842 36.70508 37.44710 36.73362 37.09359 35.92993
# 4: 4 36.17120 35.95853 36.33405 36.45134 37.17186 36.87482 35.45489
# 5: 5 37.20341 37.04881 36.53252 36.22922 36.78106 36.89219 37.13207
# 6: 6 36.12201 36.53433 37.29784 35.96451 36.70838 36.58684 36.60122
# 7: 7 36.92314 36.16220 36.48154 37.05324 36.57829 36.24955 37.23835
# 8: 8 35.71390 37.26879 37.01673 36.65364 36.89143 36.46331 37.15398
# 9: 9 36.63558 37.03452 36.40129 37.53705 36.03568 36.78083 36.71873
# 10: 10 36.77329 36.07161 36.42992 36.20715 36.78880 36.79875 36.15004
# 11: 11 36.66199 36.74958 36.28661 36.72539 36.17700 37.47495 35.60980
# 12: 14 37.40307 35.89668 36.30619 36.64382 37.21882 35.87420 35.45550
# 13: 18 36.51696 37.09903 37.31166 36.51000 36.42414 36.87976 36.45736
# 14: 19 37.05117 37.42526 36.15820 36.11824 37.07024 36.60699 36.80168
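If the dcast() step is unfamiliar, here is a minimal long-to-wide sketch with made-up temperatures:

```r
library(data.table)

# Long format: one row per (id, day) pair
DT <- data.table(day  = c("day1", "day1", "day2", "day2"),
                 id   = c(1L, 2L, 1L, 2L),
                 temp = c(36.5, 36.7, 36.6, 36.8))

# Wide format: one row per id, one column per day
dcast(DT, id ~ day, value.var = "temp")
#    id day1 day2
# 1:  1 36.5 36.6
# 2:  2 36.7 36.8
```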
Apologies in advance...I couldn't articulate a better title.
Here is the problem:
I am working with a data.table and have grouped rows using 'by'. This results in the same number of rows as the unique values of the column of interest. For each unique 'by' value (in this example, 'lat_lon'), I want to take the unique values in another column (ID) and add them to the same row as the unique by column.
Here is an example:
lat_lon ID
1: 42.04166667_-80.4375 26D25
2: 42.04166667_-80.4375 26D26
3: 42.04166667_-80.3125 26D34
4: 42.04166667_-80.3125 26D35
5: 42.04166667_-80.3125 26D36
6: 42.125_-80.1875 26D41
7: 42.125_-80.1875 27C46
8: 42.125_-80.1875 27D42
9: 42.04166667_-80.1875 26D43
10: 42.04166667_-80.1875 26D45
11: 42.04166667_-80.1875 27D44
12: 42.04166667_-80.1875 27D46
13: 42.29166667_-79.8125 27B76
14: 42.20833333_-80.0625 27C53
15: 42.20833333_-80.0625 27C54
16: 42.125_-80.0625 27C55
17: 42.125_-80.0625 27C56
18: 42.125_-80.0625 27D51
19: 42.125_-80.0625 27D52
What I really want is this:
lat_lon ID.1 ID.2 ID.3 ID.4 ID.5 ID.6 ID.7 ID.8 ID.9 ID.10
42.04166667_-80.4375 26D25 26D26 NA NA NA NA NA NA NA NA
42.04166667_-80.3125 26D34 26D35 26D36 NA NA NA NA NA NA NA
...
42.125_-80.0625 27C55 27C56 27D51 27D52 NA NA NA NA NA NA
Thank you for your patience and helpful comments.
For a data.table solution, adding an index column (rn) first and then pivoting with dcast.data.table helps:
dcast.data.table(dat[, rn := paste0("ID.", seq_len(.N)), by=.(lat_lon)],
lat_lon ~ rn, value.var="ID")
# lat_lon ID.1 ID.2 ID.3 ID.4
# 1: 42.04166667_-80.1875 26D43 26D45 27D44 27D46
# 2: 42.04166667_-80.3125 26D34 26D35 26D36 NA
# 3: 42.04166667_-80.4375 26D25 26D26 NA NA
# 4: 42.125_-80.0625 27C55 27C56 27D51 27D52
# 5: 42.125_-80.1875 26D41 27C46 27D42 NA
# 6: 42.20833333_-80.0625 27C53 27C54 NA NA
# 7: 42.29166667_-79.8125 27B76 NA NA NA
data:
dat <- fread("lat_lon ID
42.04166667_-80.4375 26D25
42.04166667_-80.4375 26D26
42.04166667_-80.3125 26D34
42.04166667_-80.3125 26D35
42.04166667_-80.3125 26D36
42.125_-80.1875 26D41
42.125_-80.1875 27C46
42.125_-80.1875 27D42
42.04166667_-80.1875 26D43
42.04166667_-80.1875 26D45
42.04166667_-80.1875 27D44
42.04166667_-80.1875 27D46
42.29166667_-79.8125 27B76
42.20833333_-80.0625 27C53
42.20833333_-80.0625 27C54
42.125_-80.0625 27C55
42.125_-80.0625 27C56
42.125_-80.0625 27D51
42.125_-80.0625 27D52")
This is a departure from data.table (not that it can't be done there, I'm sure, but I'm less familiar with it) into the tidyverse:
require(tidyr)
require(dplyr)
wide_data <- dat %>%
  group_by(lat_lon) %>%
  mutate(IDno = paste0("ID.", row_number())) %>%
  spread(IDno, ID)
This assumes that there are no duplicated lines with an ID repeated for a lat_lon. You could add distinct() to the chain before the grouping if this isn't the case.
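For example, a sketch of that de-duplicating variant, using a toy table with one repeated lat_lon/ID pair (the data here are made up; only the pipeline shape matters):

```r
library(dplyr)
library(tidyr)

# Toy data with one duplicated lat_lon/ID pair
dat <- data.frame(lat_lon = rep("42.125_-80.0625", 3),
                  ID      = c("27C55", "27C55", "27C56"),
                  stringsAsFactors = FALSE)

wide_data <- dat %>%
  distinct(lat_lon, ID) %>%                      # drop the repeated pair first
  group_by(lat_lon) %>%
  mutate(IDno = paste0("ID.", row_number())) %>%
  spread(IDno, ID)

# wide_data has one row: lat_lon, ID.1 = "27C55", ID.2 = "27C56"
```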
I would like to calculate a rolling mean on data in a single data frame by multiple ids. See my example dataset below.
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04",
"2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",
"2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02",
"2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06",
"2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
"b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)
df <- data.frame(date, index, x, y, z)
I would like to calculate the rolling mean for x, y and z, by a and then by b.
I tried the following, but I am getting an error.
test <- tapply(df, df$index, FUN = rollmean(df, 5, fill=NA))
The error:
Error in xu[k:n] - xu[c(1, seq_len(n - k))] :
non-numeric argument to binary operator
It seems like there is an issue with the fact that index is a character, but I need it in order to calculate the means...
1) ave Try ave rather than tapply and make sure it is applied only over the columns of interest, i.e. columns 3, 4, 5.
roll <- function(x) rollmean(x, 5, fill = NA)
cbind(df[1:2], lapply(df[3:5], function(x) ave(x, df$index, FUN = roll)))
giving:
date index x y z
1 2015-02-01 a NA NA NA
2 2015-02-02 a NA NA NA
3 2015-02-03 a 66.50522 127.45650 129.8472
4 2015-02-04 a 61.71320 123.83633 129.7673
5 2015-02-05 a 56.56125 120.86158 126.1371
6 2015-02-06 a 66.13340 119.93428 127.1819
7 2015-02-07 a 59.56807 105.83208 125.1244
8 2015-02-08 a 49.98779 95.66024 139.2321
9 2015-02-09 a NA NA NA
10 2015-02-10 a NA NA NA
11 2015-02-01 b NA NA NA
12 2015-02-02 b NA NA NA
13 2015-02-03 b 55.71327 117.52219 139.3961
14 2015-02-04 b 54.58450 107.81763 142.6101
15 2015-02-05 b 50.48102 104.94084 136.3167
16 2015-02-06 b 37.89790 95.45489 135.4044
17 2015-02-07 b 33.05259 85.90916 150.8673
18 2015-02-08 b 49.91385 90.04940 147.1376
19 2015-02-09 b NA NA NA
20 2015-02-10 b NA NA NA
2) by Another way is to use by. roll2 handles one group, by applies it to each group producing a by list and do.call("rbind", ...) puts it back together.
roll2 <- function(x) cbind(x[1:2], rollmean(x[3:5], 5, fill = NA))
do.call("rbind", by(df, df$index, roll2))
giving:
date index x y z
a.1 2015-02-01 a NA NA NA
a.2 2015-02-02 a NA NA NA
a.3 2015-02-03 a 66.50522 127.45650 129.8472
a.4 2015-02-04 a 61.71320 123.83633 129.7673
a.5 2015-02-05 a 56.56125 120.86158 126.1371
a.6 2015-02-06 a 66.13340 119.93428 127.1819
a.7 2015-02-07 a 59.56807 105.83208 125.1244
a.8 2015-02-08 a 49.98779 95.66024 139.2321
a.9 2015-02-09 a NA NA NA
a.10 2015-02-10 a NA NA NA
b.11 2015-02-01 b NA NA NA
b.12 2015-02-02 b NA NA NA
b.13 2015-02-03 b 55.71327 117.52219 139.3961
b.14 2015-02-04 b 54.58450 107.81763 142.6101
b.15 2015-02-05 b 50.48102 104.94084 136.3167
b.16 2015-02-06 b 37.89790 95.45489 135.4044
b.17 2015-02-07 b 33.05259 85.90916 150.8673
b.18 2015-02-08 b 49.91385 90.04940 147.1376
b.19 2015-02-09 b NA NA NA
b.20 2015-02-10 b NA NA NA
3) wide form Another approach is to convert df from long form to wide form in which case a plain rollmean will do it.
rollmean(read.zoo(df, split = 2), 5, fill = NA)
giving:
x.a y.a z.a x.b y.b z.b
2015-02-01 NA NA NA NA NA NA
2015-02-02 NA NA NA NA NA NA
2015-02-03 66.50522 127.45650 129.8472 55.71327 117.52219 139.3961
2015-02-04 61.71320 123.83633 129.7673 54.58450 107.81763 142.6101
2015-02-05 56.56125 120.86158 126.1371 50.48102 104.94084 136.3167
2015-02-06 66.13340 119.93428 127.1819 37.89790 95.45489 135.4044
2015-02-07 59.56807 105.83208 125.1244 33.05259 85.90916 150.8673
2015-02-08 49.98779 95.66024 139.2321 49.91385 90.04940 147.1376
2015-02-09 NA NA NA NA NA NA
2015-02-10 NA NA NA NA NA NA
This works because the dates are the same for both groups. If the dates were different then it could introduce NAs and rollmean cannot handle those. In that case use
rollapply(read.zoo(df, split = 2), 5, mean, fill = NA)
Note: since the input uses random numbers in its definition, we must call set.seed first to make it reproducible. We used this:
set.seed(123)
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04",
"2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",
"2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02",
"2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06",
"2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
"b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)
This ought to do the trick, using the dplyr and zoo libraries:
library(dplyr)
library(zoo)
df %>%
group_by(index) %>%
mutate(x_mean = rollmean(x, 5, fill = NA),
y_mean = rollmean(y, 5, fill = NA),
z_mean = rollmean(z, 5, fill = NA))
You could probably tidy this up further using mutate_each or some other form of mutate. You can also change the arguments within rollmean to fit your needs, such as align = "right" or na.pad = TRUE.
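For instance, the effect of align on where the NA padding lands (toy numbers):

```r
library(zoo)

x <- 1:7

# Default centred alignment: NA padding on both ends
rollmean(x, 3, fill = NA)
# [1] NA  2  3  4  5  6 NA

# Right alignment: each window ends at the current observation,
# so all the padding lands at the start
rollmean(x, 3, fill = NA, align = "right")
# [1] NA NA  2  3  4  5  6
```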
I have two data tables that I'm trying to merge. One is data on company market values through time and the other is company dividend history through time. I'm trying to find out how much each company has paid each quarter and put that value next to the market value data through time.
library(magrittr)
library(data.table)
library(zoo)
library(lubridate)
set.seed(1337)
# data table of company market values
companies <-
data.table(companyID = 1:10,
Sedol = rep(c("91772E", "7A662B"), each = 5),
Date = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1),
MktCap = c(100 + cumsum(rnorm(5,5)),
50 + cumsum(rnorm(5,1,5)))) %>%
setkey(Sedol, Date)
# data table of dividends
dividends <-
data.table(DivID = 1:7,
Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)),
Date = as.Date(c('2004-11-19', '2005-01-13', '2005-01-29',
'2005-10-01', '2005-06-29', '2005-06-30',
'2006-04-17')),
DivAmnt = rnorm(7, .8, .3)) %>%
setkey(Sedol, Date)
I believe this is a situation where you could use a data.table rolling join, something like:
dividends[companies, roll = "nearest"]
to try and get a dataset that looks like
DivID Sedol Date DivAmnt companyID MktCap
1: NA 7A662B <NA> NA 6 61.21061
2: 5 7A662B 2005-06-29 0.7772631 7 66.92951
3: 6 7A662B 2005-06-30 1.1815343 7 66.92951
4: NA 7A662B <NA> NA 8 78.33914
5: NA 7A662B <NA> NA 9 88.92473
6: NA 7A662B <NA> NA 10 87.85067
7: 2 91772E 2005-01-13 0.2964291 1 105.19249
8: 3 91772E 2005-01-29 0.8472649 1 105.19249
9: NA 91772E <NA> NA 2 108.74579
10: 4 91772E 2005-10-01 1.2467408 3 113.42261
11: NA 91772E <NA> NA 4 120.04491
12: NA 91772E <NA> NA 5 124.35588
(note that I've matched the dividends to the company market values by the exact quarter)
But I'm not exactly sure how to execute it. The CRAN PDF is rather vague about what the number is or should be when roll is a value (Can you pass dates? Does the number quantify the days to carry forward? The number of observations?), and changing rollends around doesn't seem to get me what I want.
In the end, I ended up mapping the dividend dates to their quarter end and then joining on that. A good solution, but not useful if I end up needing to know how to perform rolling joins. In your answer, could you describe a situation where rolling joins are the only solution as well as help me understand how to perform them?
Instead of a rolling join, you may want to use an overlap join with the foverlaps function of data.table:
# create an interval in the 'companies' datatable
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
# create a second date in the 'dividends' datatable
dividends[, Date2 := divDate]
# set the keys for the two datatable
setkey(companies, Sedol, start, end)
setkey(dividends, Sedol, divDate, Date2)
# create a vector of columnnames which can be removed afterwards
deletecols <- c("Date2","start","end")
# perform the overlap join and remove the helper columns
res <- foverlaps(companies, dividends)[, (deletecols) := NULL]
the result:
> res
Sedol DivID divDate DivAmnt companyID compDate MktCap
1: 7A662B NA <NA> NA 6 2005-03-31 61.21061
2: 7A662B 5 2005-06-29 0.7772631 7 2005-06-30 66.92951
3: 7A662B 6 2005-06-30 1.1815343 7 2005-06-30 66.92951
4: 7A662B NA <NA> NA 8 2005-09-30 78.33914
5: 7A662B NA <NA> NA 9 2005-12-31 88.92473
6: 7A662B NA <NA> NA 10 2006-03-31 87.85067
7: 91772E 2 2005-01-13 0.2964291 1 2005-03-31 105.19249
8: 91772E 3 2005-01-29 0.8472649 1 2005-03-31 105.19249
9: 91772E NA <NA> NA 2 2005-06-30 108.74579
10: 91772E 4 2005-10-01 1.2467408 3 2005-09-30 113.42261
11: 91772E NA <NA> NA 4 2005-12-31 120.04491
12: 91772E NA <NA> NA 5 2006-03-31 124.35588
In the meantime the data.table authors have introduced non-equi joins (v1.9.8). You can also use that to solve this problem. Using a non-equi join you just need:
companies[, `:=` (start = compDate - days(90), end = compDate + days(15))]
dividends[companies, on = .(Sedol, divDate >= start, divDate <= end)]
to get the intended result.
Used data (the same as in the question, but without the creation of the keys):
set.seed(1337)
companies <- data.table(companyID = 1:10, Sedol = rep(c("91772E", "7A662B"), each = 5),
compDate = (as.Date("2005-04-01") + months(seq(0, 12, 3))) - days(1),
MktCap = c(100 + cumsum(rnorm(5,5)), 50 + cumsum(rnorm(5,1,5))))
dividends <- data.table(DivID = 1:7, Sedol = c(rep('91772E', each = 4), rep('7A662B', each = 3)),
divDate = as.Date(c('2004-11-19','2005-01-13','2005-01-29','2005-10-01','2005-06-29','2005-06-30','2006-04-17')),
DivAmnt = rnorm(7, .8, .3))
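Regarding the original question about what a numeric roll means: as I understand the data.table docs, for a Date key roll = n carries the previous row's values forward by at most n days (a limited staleness), while roll = TRUE carries them forward indefinitely. A small sketch with toy dates:

```r
library(data.table)

lookup <- data.table(day = as.Date(c("2016-01-01", "2016-01-10")),
                     val = c("a", "b"), key = "day")
query  <- data.table(day = as.Date(c("2016-01-02", "2016-01-08")), key = "day")

# roll = TRUE: always carry the most recent value forward
lookup[query, roll = TRUE]$val
# [1] "a" "a"

# roll = 2: carry forward at most 2 days; 2016-01-08 is 7 days past
# 2016-01-01, so it gets NA
lookup[query, roll = 2]$val
# [1] "a" NA
```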
I am pretty sure this is quite simple, but I seem to have got stuck. I have two xts vectors that have been merged together, which contain numeric values and NAs.
I would like to get the rowSums for each index period, but keeping the NA values.
Below is a reproducible example
set.seed(120)
dd <- xts(rnorm(100),Sys.Date()-c(100:1))
dd1 <- ifelse(dd<(-0.5),dd*-1,NA)
dd2 <- ifelse((dd^2)>0.5,dd,NA)
mm <- merge(dd1,dd2)
mm$m <- rowSums(mm,na.rm=TRUE)
tail(mm,10)
dd1 dd2 m
2013-08-02 NA NA 0.000000
2013-08-03 NA NA 0.000000
2013-08-04 NA NA 0.000000
2013-08-05 1.2542692 -1.2542692 0.000000
2013-08-06 NA 1.3325804 1.332580
2013-08-07 NA 0.7726740 0.772674
2013-08-08 0.8158402 -0.8158402 0.000000
2013-08-09 NA 1.2292919 1.229292
2013-08-10 NA NA 0.000000
2013-08-11 NA 0.9334900 0.933490
In the above example on the 10th Aug 2013 I was hoping it would say NA instead of 0, the same goes for the 2nd-4th Aug 2013.
Any suggestions for an elegant way of getting NAs in the relevant places?
If you have a variable number of columns you could try this approach:
mm <- merge(dd1,dd2)
mm$m <- rowSums(mm, na.rm=TRUE) * ifelse(rowSums(is.na(mm)) == ncol(mm), NA, 1)
# or, as #JoshuaUlrich commented:
#mm$m <- ifelse(apply(is.na(mm),1,all),NA,rowSums(mm,na.rm=TRUE))
tail(mm, 10)
# dd1 dd2 m
#2013-08-02 NA NA NA
#2013-08-03 NA NA NA
#2013-08-04 NA NA NA
#2013-08-05 1.2542692 -1.2542692 0.000000
#2013-08-06 NA 1.3325804 1.332580
#2013-08-07 NA 0.7726740 0.772674
#2013-08-08 0.8158402 -0.8158402 0.000000
#2013-08-09 NA 1.2292919 1.229292
#2013-08-10 NA NA NA
#2013-08-11 NA 0.9334900 0.933490
Use logical indexing with [ and is.na() to locate the entries where both columns are NA, then replace them with NA.
Try this:
> mm[is.na(mm$dd1) & is.na(mm$dd2), "m"] <- NA
> mm
dd1 dd2 m
2013-08-02 NA NA NA
2013-08-03 NA NA NA
2013-08-04 NA NA NA
2013-08-05 1.2542692 -1.2542692 0.000000
2013-08-06 NA 1.3325804 1.332580
2013-08-07 NA 0.7726740 0.772674
2013-08-08 0.8158402 -0.8158402 0.000000
2013-08-09 NA 1.2292919 1.229292
2013-08-10 NA NA NA
2013-08-11 NA 0.9334900 0.933490
A one-liner using the replacement form of is.na(): "is.na<-"(x, i) returns x with x[i] set to NA, and !rowSums(!is.na(mm)) is TRUE exactly for the rows in which every column is NA.
mm$m <- "is.na<-"(rowSums(mm, na.rm = TRUE), !rowSums(!is.na(mm)))
> tail(mm)
# dd1 dd2 m
# 2013-08-06 NA 1.3325804 1.332580
# 2013-08-07 NA 0.7726740 0.772674
# 2013-08-08 0.8158402 -0.8158402 0.000000
# 2013-08-09 NA 1.2292919 1.229292
# 2013-08-10 NA NA NA
# 2013-08-11 NA 0.9334900 0.933490
My solution would be to flag the all-NA rows with a 0/0 division (0/0 is NaN, while any positive count divided by itself is 1) and then convert the flagged rows to NA:
library(magrittr)
mm <- mm %>%
  transform(ccardNA = rowSums(!is.na(.)) / rowSums(!is.na(.)),
            m = rowSums(., na.rm = TRUE)) %>%
  transform(m = ifelse(is.nan(ccardNA), NA, m), ccardNA = NULL) %>%
  as.xts()