apply a rolling mean to a data frame by an index - r

I would like to calculate a rolling mean on data in a single data frame by multiple ids. See my example dataset below.
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04",
"2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",
"2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02",
"2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06",
"2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
"b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)
df <- data.frame(date, index, x, y, z)
I would like to calculate the rolling mean for x, y and z, by a and then by b.
I tried the following, but I am getting an error.
test <- tapply(df, df$index, FUN = rollmean(df, 5, fill=NA))
The error:
Error in xu[k:n] - xu[c(1, seq_len(n - k))] :
non-numeric argument to binary operator
It seems like there is an issue with the fact that index is a character, but I need it in order to calculate the means...
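For reference, the immediate cause of the error is that `FUN = rollmean(df, 5, fill = NA)` evaluates the call right away on the whole data frame instead of passing a function; a minimal sketch of a valid per-group call on one numeric column, assuming the zoo package is loaded:

```r
library(zoo)

# FUN must be a function object; tapply then applies it to each group's vector
tapply(df$x, df$index, function(v) rollmean(v, 5, fill = NA))
```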

1) ave Try ave rather than tapply and make sure it is applied only over the columns of interest, i.e. columns 3, 4, 5.
roll <- function(x) rollmean(x, 5, fill = NA)
cbind(df[1:2], lapply(df[3:5], function(x) ave(x, df$index, FUN = roll)))
giving:
date index x y z
1 2015-02-01 a NA NA NA
2 2015-02-02 a NA NA NA
3 2015-02-03 a 66.50522 127.45650 129.8472
4 2015-02-04 a 61.71320 123.83633 129.7673
5 2015-02-05 a 56.56125 120.86158 126.1371
6 2015-02-06 a 66.13340 119.93428 127.1819
7 2015-02-07 a 59.56807 105.83208 125.1244
8 2015-02-08 a 49.98779 95.66024 139.2321
9 2015-02-09 a NA NA NA
10 2015-02-10 a NA NA NA
11 2015-02-01 b NA NA NA
12 2015-02-02 b NA NA NA
13 2015-02-03 b 55.71327 117.52219 139.3961
14 2015-02-04 b 54.58450 107.81763 142.6101
15 2015-02-05 b 50.48102 104.94084 136.3167
16 2015-02-06 b 37.89790 95.45489 135.4044
17 2015-02-07 b 33.05259 85.90916 150.8673
18 2015-02-08 b 49.91385 90.04940 147.1376
19 2015-02-09 b NA NA NA
20 2015-02-10 b NA NA NA
2) by Another way is to use by. roll2 handles one group; by applies it to each group, producing a by list, and do.call("rbind", ...) puts it back together.
roll2 <- function(x) cbind(x[1:2], rollmean(x[3:5], 5, fill = NA))
do.call("rbind", by(df, df$index, roll2))
giving:
date index x y z
a.1 2015-02-01 a NA NA NA
a.2 2015-02-02 a NA NA NA
a.3 2015-02-03 a 66.50522 127.45650 129.8472
a.4 2015-02-04 a 61.71320 123.83633 129.7673
a.5 2015-02-05 a 56.56125 120.86158 126.1371
a.6 2015-02-06 a 66.13340 119.93428 127.1819
a.7 2015-02-07 a 59.56807 105.83208 125.1244
a.8 2015-02-08 a 49.98779 95.66024 139.2321
a.9 2015-02-09 a NA NA NA
a.10 2015-02-10 a NA NA NA
b.11 2015-02-01 b NA NA NA
b.12 2015-02-02 b NA NA NA
b.13 2015-02-03 b 55.71327 117.52219 139.3961
b.14 2015-02-04 b 54.58450 107.81763 142.6101
b.15 2015-02-05 b 50.48102 104.94084 136.3167
b.16 2015-02-06 b 37.89790 95.45489 135.4044
b.17 2015-02-07 b 33.05259 85.90916 150.8673
b.18 2015-02-08 b 49.91385 90.04940 147.1376
b.19 2015-02-09 b NA NA NA
b.20 2015-02-10 b NA NA NA
3) wide form Another approach is to convert df from long form to wide form in which case a plain rollmean will do it.
rollmean(read.zoo(df, split = 2), 5, fill = NA)
giving:
x.a y.a z.a x.b y.b z.b
2015-02-01 NA NA NA NA NA NA
2015-02-02 NA NA NA NA NA NA
2015-02-03 66.50522 127.45650 129.8472 55.71327 117.52219 139.3961
2015-02-04 61.71320 123.83633 129.7673 54.58450 107.81763 142.6101
2015-02-05 56.56125 120.86158 126.1371 50.48102 104.94084 136.3167
2015-02-06 66.13340 119.93428 127.1819 37.89790 95.45489 135.4044
2015-02-07 59.56807 105.83208 125.1244 33.05259 85.90916 150.8673
2015-02-08 49.98779 95.66024 139.2321 49.91385 90.04940 147.1376
2015-02-09 NA NA NA NA NA NA
2015-02-10 NA NA NA NA NA NA
This works because the dates are the same for both groups. If the dates were different then it could introduce NAs and rollmean cannot handle those. In that case use
rollapply(read.zoo(df, split = 2), 5, mean, fill = NA)
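If the NAs introduced by mismatched dates should also be ignored within each window, mean's na.rm argument can be forwarded through rollapply's dots (a sketch building on the same wide-form object):

```r
library(zoo)

# na.rm = TRUE is passed on to mean(), so each 5-wide window
# averages whatever non-NA values it contains
rollapply(read.zoo(df, split = 2), 5, mean, na.rm = TRUE, fill = NA)
```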
Note: Since the input uses random numbers in its definition, we must call set.seed first to make it reproducible. We used this:
set.seed(123)
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04",
"2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",
"2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02",
"2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06",
"2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
"b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)

This ought to do the trick using the dplyr and zoo libraries:
library(dplyr)
library(zoo)
df %>%
group_by(index) %>%
mutate(x_mean = rollmean(x, 5, fill = NA),
y_mean = rollmean(y, 5, fill = NA),
z_mean = rollmean(z, 5, fill = NA))
You could probably tidy this up more using mutate_each or some other form of mutate.
You can also change the arguments within rollmean to fit your needs, such as align = "right" or na.pad = TRUE.
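For what it's worth, mutate_each has since been superseded in dplyr; a sketch of the same pipeline with across() (assuming dplyr 1.0 or later), which rolls all three columns in one step:

```r
library(dplyr)
library(zoo)

df %>%
  group_by(index) %>%
  mutate(across(c(x, y, z),
                ~ rollmean(.x, 5, fill = NA),
                .names = "{.col}_mean")) %>%
  ungroup()
```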

Related

Reference the previous non-zero row, find the difference and divide by nrows

I must be asking the question terribly because I can't find what I'm looking for!
I have a large Excel file that looks like this for every day of the month:
Date     Well1
1/1/16   10
1/2/16   NA
1/3/16   NA
1/4/16   NA
1/5/16   20
1/6/16   NA
1/7/16   25
1/8/16   NA
1/9/16   NA
1/10/16  35
etc      NA
I want to make a new column that has the difference between the non-zero rows, divided by the number of rows between each non-zero row. Aiming for something like this:
Date     Well1  Adjusted
1/1/16   10     =(20-10)/4 = 2.5
1/2/16   NA     1.25
1/3/16   NA     1.25
1/4/16   NA     1.25
1/5/16   20     =(25-20)/2 = 2.5
1/6/16   NA     2.5
1/7/16   25     =(35-25)/3 = 3.3
1/8/16   NA     3.3
1/9/16   NA     3.3
1/10/16  35     etc
etc      NA     etc
I'm thinking I should use lead or lag, but the thing is that the steps are different between each non-zero row (so I'm not sure how to use n in the lead/lag function). I've used group_by so that each month stands alone, and have attempted case_when and ifelse. Mostly I need ideas on translating the Excel format into a workable R format.
With some diff-ing and repeating of values, you should be able to get there.
dat$Date <- as.Date(dat$Date, format="%m/%d/%y")
nas <- is.na(dat$Well1)
dat$adj <- with(dat[!nas,],
diff(Well1) / as.numeric(diff(Date), units="days")
)[cumsum(!nas)]
# Date Well1 adj
#1 2016-01-01 10 2.5
#2 2016-01-02 NA 2.5
#3 2016-01-03 NA 2.5
#4 2016-01-04 NA 2.5
#5 2016-01-05 20 2.5
#6 2016-01-06 NA 2.5
#7 2016-01-07 25 5.0
#8 2016-01-08 NA 5.0
#9 2016-01-09 NA 5.0
#10 2016-01-10 40 NA
dat being used is:
dat <- read.table(text="Date Well1
1/1/16 10
1/2/16 NA
1/3/16 NA
1/4/16 NA
1/5/16 20
1/6/16 NA
1/7/16 25
1/8/16 NA
1/9/16 NA
1/10/16 40", header=TRUE, stringsAsFactors=FALSE)
Base R in the same vein as @thelatemail but with transformations all in one expression:
nas <- is.na(dat$Well1)
res <- within(dat, {
Date <- as.Date(Date, "%m/%d/%y")
Adjusted <- (diff(Well1[!nas]) /
as.numeric(diff(Date[!nas]), units = "days"))[cumsum(!nas)]
}
)
Data:
dat <- read.table(text="Date Well1
1/1/16 10
1/2/16 NA
1/3/16 NA
1/4/16 NA
1/5/16 20
1/6/16 NA
1/7/16 25
1/8/16 NA
1/9/16 NA
1/10/16 40", header=TRUE, stringsAsFactors=FALSE)
Maybe this should work
library(dplyr)
df1 %>%
#// remove the rows with NA
na.omit %>%
# // create a new column with the lead values of Well1
transmute(Date, Well2 = lead(Well1)) %>%
# // join with original data
right_join(df1 %>%
mutate(rn = row_number())) %>%
# // order by the original order
arrange(rn) %>%
# // create a grouping column based on the NA values
group_by(grp = cumsum(!is.na(Well1))) %>%
# // subtract the first element of Well2 with Well1 and divide
# // by number of rows - n() in the group
mutate(Adjusted = (first(Well2) - first(Well1))/n()) %>%
ungroup %>%
select(-grp, - Well2)

Rolling cumulative product with NAs R

I have the following data frame (X) that contains monthly stock returns over time. I show the first 12 rows. The stock returns can contain random NAs.
Obs. Asset Date Ret
1 DJ 1997-10-06 NA
2 DJ 1997-10-07 NA
3 DJ 1997-10-08 -1.13
4 DJ 1997-10-09 -0.136
5 DJ 1997-10-10 NA
6 DJ 1997-10-14 NA
7 DJ 1997-10-15 NA
8 DJ 1997-10-16 -0.225
9 DJ 1997-10-17 -0.555
10 DJ 1997-10-20 NA
11 DJ 1997-10-21 0.102
12 DJ 1997-10-22 NA
I want to calculate the cumulative return over a 5 day window. So I get a cumulative return from observation 5 on, ignoring NAs. The cumulative return should only be NA when all the returns within the window are NA.
I tried:
Y <- Y %>%
mutate(product = (as.numeric(rollapply(1 + ret/100, 5, prod,
partial = TRUE, na.rm = TRUE, align = "right"))-1)*100)
Which gives an undesired result:
1  1997-10-06 DJ            NA  0.000000000
2  1997-10-07 DJ            NA  0.000000000
3  1997-10-08 DJ -1.1277917526 -1.127791753
4  1997-10-09 DJ -0.1364864885 -1.262738958
5  1997-10-10 DJ            NA -1.262738958
6  1997-10-14 DJ            NA -1.262738958
7  1997-10-15 DJ            NA -1.262738958
8  1997-10-16 DJ -0.2250333841 -0.361212732
9  1997-10-17 DJ -0.5545946845 -0.778380045
10 1997-10-20 DJ            NA -0.778380045
11 1997-10-21 DJ  0.1022404757 -0.676935389
12 1997-10-22 DJ            NA -0.676935389
I want to get NAs before the 5th observation, so row 1-4 are NA. Row 5 computes the cumulative return over row 1-5, Row 6 computes the cumulative return over 2-6 etc.
Reprex:
X <- data.frame(Date=c("1997-10-06" ,"1997-10-07", "1997-10-08" ,"1997-10-09", "1997-10-10",
"1997-10-14", "1997-10-15" ,"1997-10-16", "1997-10-17","1997-10-20", "1997-10-21" ,"1997-10-22"),
ret=c(NA,NA,-1.1277918,-0.1364865, NA , NA , NA ,-0.2250334 ,-0.5545947, NA, 0.1022405, NA))
You could replace the NA with 0 and then use zoo::rollsum default behavior:
library(zoo)
library(tidyr)
library(dplyr)
df %>%
  dplyr::mutate(ret = zoo::rollsum(tidyr::replace_na(ret, 0), k = 5,
                                   na.pad = TRUE, align = "right"))
Date ret
1 1997-10-06 NA
2 1997-10-07 NA
3 1997-10-08 NA
4 1997-10-09 NA
5 1997-10-10 -1.2642783
6 1997-10-14 -1.2642783
7 1997-10-15 -1.2642783
8 1997-10-16 -0.3615199
9 1997-10-17 -0.7796281
10 1997-10-20 -0.7796281
11 1997-10-21 -0.6773876
12 1997-10-22 -0.6773876
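Note that rollsum only approximates the compounded return for small values. For the exact cumulative product the question asks about, the same replace-the-NAs-with-zero idea can be combined with rollapply (a sketch, using the X reprex above):

```r
library(zoo)
library(tidyr)

# A 0% return leaves the product unchanged, so replacing NA with 0
# makes the window "skip" missing days
X$product <- (rollapply(1 + replace_na(X$ret, 0) / 100, 5, prod,
                        fill = NA, align = "right") - 1) * 100
```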
Here's some code that takes your data, changes all the NA in ret to 0 and then calculates a rolling 5-period sum using the RcppRoll and tidyverse packages.
# Load libraries
library('RcppRoll')
library('tidyverse')
# Load data
df <- data.frame(Date=c("1997-10-06" ,"1997-10-07", "1997-10-08" ,"1997-10-09", "1997-10-10",
"1997-10-14", "1997-10-15" ,"1997-10-16", "1997-10-17","1997-10-20", "1997-10-21" ,"1997-10-22"),
ret=c(NA,NA,-1.1277918,-0.1364865, NA , NA , NA ,-0.2250334 ,-0.5545947, NA, 0.1022405, NA), stringsAsFactors = F)
# Change NA's to Zeros
df[is.na(df[,2]),2] <- 0
# Calculate Rolling Sum
df_new <- df %>% mutate(rollsum = roll_sum(ret, n=5, align = 'right', fill=NA))

Efficient way for insertion of multiple rows at given indices & with repetitions

I have a data frame (DATA) with > 2 million rows (observations at different time points) and another data frame (INSERTION) which gives info about missing observations. The latter object contains 2 columns: 1st column with row indices after which empty (NA) rows should be inserted into DATA, and 2nd column with the number of empty rows that should be inserted at that position.
Below is a minimum working example:
DATA <- data.frame(datetime=strptime(as.character(c(201301011700, 201301011701, 201301011703, 201301011704, 201301011705, 201301011708, 201301011710, 201301011711, 201301011715, 201301011716, 201301011718, 201301011719, 201301011721, 201301011722, 201301011723, 201301011724, 201301011725, 201301011726, 201301011727, 201301011729, 201301011730, 201301011731, 201301011732, 201301011733, 201301011734, 201301011735, 201301011736, 201301011737, 201301011738, 201301011739)), format="%Y%m%d%H%M"), var1=rnorm(30), var2=rnorm(30), var3=rnorm(30))
INSERTION <- data.frame(index=c(2, 5, 6, 8, 10, 12, 19), repetition=c(1, 2, 1, 3, 1, 1, 1))
Now I'm looking for an efficient (and thus fast) way to insert the n empty rows at the given row indices of the original file. How can I additionally fill in the correct datetimes for these empty rows (adding 1 minute for every new row; however, on weekends and bank holidays there are some regular gaps which are not contained in INSERTION!)?
Any help is appreciated!
Looking at the pattern in INSERTION and matching it with DATA, you are most probably trying to fill the missing minutes in datetime of DATA. You can create a data frame with every minute from the min to the max value of datetime from DATA and then merge:
merge(data.frame(datetime = seq(min(DATA$datetime), max(DATA$datetime),
by = "1 min")),DATA, all.x = TRUE)
# datetime var1 var2 var3
#1 2013-01-01 17:00:00 -1.063326 0.11925 -0.788622
#2 2013-01-01 17:01:00 1.263185 0.24369 -0.502199
#3 2013-01-01 17:02:00 NA NA NA
#4 2013-01-01 17:03:00 -0.349650 1.23248 1.496061
#5 2013-01-01 17:04:00 -0.865513 -0.51606 -1.137304
#6 2013-01-01 17:05:00 -0.236280 -0.99251 -0.179052
#7 2013-01-01 17:06:00 NA NA NA
#8 2013-01-01 17:07:00 NA NA NA
#9 2013-01-01 17:08:00 -0.197176 1.67570 1.902362
#10 2013-01-01 17:09:00 NA NA NA
#...
#...
Or using similar logic with tidyr::complete
tidyr::complete(DATA, datetime = seq(min(datetime), max(datetime), by = "1 min"))
If performance is a factor on a large data frame, this approach avoids joins:
# Generate new data.frame containing missing datetimes
tmp <- data.frame(datetime = DATA$datetime[with(INSERTION, rep(index, repetition))] + sequence(INSERTION$repetition)*60)
# Create variables filled with NA to match main data.frame
tmp[setdiff(names(DATA), names(tmp))] <- NA
# Bind and sort
new_df <- rbind(DATA, tmp)
new_df <- new_df[order(new_df$datetime),]
head(new_df, 15)
datetime var1 var2 var3
1 2013-01-01 17:00:00 0.98789253 0.68364933 0.70526985
2 2013-01-01 17:01:00 -0.68307496 0.02947599 0.90731512
31 2013-01-01 17:02:00 NA NA NA
3 2013-01-01 17:03:00 -0.60189915 -1.00153188 0.06165694
4 2013-01-01 17:04:00 -0.87329313 -1.81532302 -2.04930719
5 2013-01-01 17:05:00 -0.58713154 -0.42313098 0.37402224
32 2013-01-01 17:06:00 NA NA NA
33 2013-01-01 17:07:00 NA NA NA
6 2013-01-01 17:08:00 2.41350911 -0.13691754 1.57618578
34 2013-01-01 17:09:00 NA NA NA
7 2013-01-01 17:10:00 -0.38961552 0.83838954 1.18283382
8 2013-01-01 17:11:00 0.02290672 -2.10825367 0.87441448
35 2013-01-01 17:12:00 NA NA NA
36 2013-01-01 17:13:00 NA NA NA
37 2013-01-01 17:14:00 NA NA NA

Matching date columns with unequal length in R

I have data in the following format with 3 date columns
X <- c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016")
Y <- c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016")
Z <- c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016")
The output I want is like below:
X Output
24/02/2016 12/02/2016
25/02/2016 NA
26/02/2016 NA
29/02/2016 NA
01/03/2016 NA
02/03/2016 NA
03/03/2016 NA
04/03/2016 NA
07/03/2016 NA
08/03/2016 NA
09/03/2016 29/02/2016
10/03/2016 NA
11/03/2016 NA
14/03/2016 NA
15/03/2016 NA
Basically the problem is wherever there is a match between X and Y, i need Z corresponding to X in a new column.
I am not really good with R, so I'm not able to figure out a solution. Any ideas?
You could do this in base R using match, but I find it cleaner to use the dplyr package and left_join.
library(dplyr)
# make a data frame with X as a column
X.df <- data.frame(X = c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016"), stringsAsFactors = F)
# make a data frame with Y and Z as columns
YZ.df <- data.frame(Y = c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016"), Z = c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016"), stringsAsFactors = F)
# do a left join, specifying variables X and Y
left_join(X.df, YZ.df, by = c("X" = "Y"))
Note that the above will create duplicate rows for X if there is more than one corresponding Z value for a Y value that matches an X value.
For the sake of completeness, here is the data.table version complementing gatsky's answer:
library(data.table)
data.table(Y, Z)[data.table(X), on = .(Y == X), .(X, Z)]
X Z
1: 24/02/2016 12/02/2016
2: 25/02/2016 NA
3: 26/02/2016 NA
4: 29/02/2016 NA
5: 01/03/2016 NA
6: 02/03/2016 NA
7: 03/03/2016 NA
8: 04/03/2016 NA
9: 07/03/2016 NA
10: 08/03/2016 NA
11: 09/03/2016 29/02/2016
12: 10/03/2016 NA
13: 11/03/2016 NA
14: 14/03/2016 NA
15: 15/03/2016 NA
Data
Z <- c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016")
Y <- c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016")
X <- c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016")
Using match
# Construct data
Z = c("15/08/2014", "29/08/2014", "15/09/2014", "30/09/2014", "12/02/2016", "29/02/2016", "15/03/2016", "31/03/2016", "15/04/2016")
Y = c("26/08/2014", "10/09/2014", "24/09/2014", "09/10/2014", "24/02/2016", "09/03/2016", "24/03/2016", "11/04/2016", "26/04/2016")
df <- data.frame(X = c("24/02/2016", "25/02/2016", "26/02/2016", "29/02/2016", "01/03/2016", "02/03/2016", "03/03/2016", "04/03/2016", "07/03/2016", "08/03/2016", "09/03/2016", "10/03/2016", "11/03/2016", "14/03/2016", "15/03/2016"), stringsAsFactors = F)
# Match df$X to Y and return that index of Z
df$Output<-Z[match(df$X,Y)]
Output
> df
X Output
1 24/02/2016 12/02/2016
2 25/02/2016 <NA>
3 26/02/2016 <NA>
4 29/02/2016 <NA>
5 01/03/2016 <NA>
6 02/03/2016 <NA>
7 03/03/2016 <NA>
8 04/03/2016 <NA>
9 07/03/2016 <NA>
10 08/03/2016 <NA>
11 09/03/2016 29/02/2016
12 10/03/2016 <NA>
13 11/03/2016 <NA>
14 14/03/2016 <NA>
15 15/03/2016 <NA>

rowSums but keeping NA values

I am pretty sure this is quite simple, but I seem to have got stuck... I have two xts vectors that have been merged together, which contain numeric values and NAs.
I would like to get the rowSums for each index period, but keeping the NA values.
Below is a reproducible example
set.seed(120)
dd <- xts(rnorm(100),Sys.Date()-c(100:1))
dd1 <- ifelse(dd<(-0.5),dd*-1,NA)
dd2 <- ifelse((dd^2)>0.5,dd,NA)
mm <- merge(dd1,dd2)
mm$m <- rowSums(mm,na.rm=TRUE)
tail(mm,10)
dd1 dd2 m
2013-08-02 NA NA 0.000000
2013-08-03 NA NA 0.000000
2013-08-04 NA NA 0.000000
2013-08-05 1.2542692 -1.2542692 0.000000
2013-08-06 NA 1.3325804 1.332580
2013-08-07 NA 0.7726740 0.772674
2013-08-08 0.8158402 -0.8158402 0.000000
2013-08-09 NA 1.2292919 1.229292
2013-08-10 NA NA 0.000000
2013-08-11 NA 0.9334900 0.933490
In the above example, on 10th Aug 2013 I was hoping it would say NA instead of 0; the same goes for 2nd-4th Aug 2013.
Any suggestions for an elegant way of getting NAs in the relevant places?
If you have a variable number of columns you could try this approach:
mm <- merge(dd1,dd2)
mm$m <- rowSums(mm, na.rm=TRUE) * ifelse(rowSums(is.na(mm)) == ncol(mm), NA, 1)
# or, as #JoshuaUlrich commented:
#mm$m <- ifelse(apply(is.na(mm),1,all),NA,rowSums(mm,na.rm=TRUE))
tail(mm, 10)
# dd1 dd2 m
#2013-08-02 NA NA NA
#2013-08-03 NA NA NA
#2013-08-04 NA NA NA
#2013-08-05 1.2542692 -1.2542692 0.000000
#2013-08-06 NA 1.3325804 1.332580
#2013-08-07 NA 0.7726740 0.772674
#2013-08-08 0.8158402 -0.8158402 0.000000
#2013-08-09 NA 1.2292919 1.229292
#2013-08-10 NA NA NA
#2013-08-11 NA 0.9334900 0.933490
Use logical indexing with [ and is.na() to locate the entries where both are NA and then replace them with NA.
Try this:
> mm[is.na(mm$dd1) & is.na(mm$dd2), "m"] <- NA
> mm
dd1 dd2 m
2013-08-02 NA NA NA
2013-08-03 NA NA NA
2013-08-04 NA NA NA
2013-08-05 1.2542692 -1.2542692 0.000000
2013-08-06 NA 1.3325804 1.332580
2013-08-07 NA 0.7726740 0.772674
2013-08-08 0.8158402 -0.8158402 0.000000
2013-08-09 NA 1.2292919 1.229292
2013-08-10 NA NA NA
2013-08-11 NA 0.9334900 0.933490
# Assign NA to m on rows where no column has a non-NA value
mm$m <- "is.na<-"(rowSums(mm, na.rm = TRUE), !rowSums(!is.na(mm)))
> tail(mm)
# dd1 dd2 m
# 2013-08-06 NA 1.3325804 1.332580
# 2013-08-07 NA 0.7726740 0.772674
# 2013-08-08 0.8158402 -0.8158402 0.000000
# 2013-08-09 NA 1.2292919 1.229292
# 2013-08-10 NA NA NA
# 2013-08-11 NA 0.9334900 0.933490
My solution would be
library(magrittr)
mm <- mm %>%
  # ccardNA is 0/0 = NaN on rows where every value is NA, and 1 otherwise
  transform(ccardNA = rowSums(!is.na(.)) / rowSums(!is.na(.)),
            m = rowSums(., na.rm = TRUE)) %>%
  # use that marker to blank out m, then drop the helper column
  transform(m = ifelse(is.nan(ccardNA), NA, m), ccardNA = NULL) %>%
  as.xts()
