rowSums but keeping NA values

I am pretty sure this is quite simple, but I seem to have got stuck. I have two xts vectors that have been merged together; they contain numeric values and NAs.
I would like to get the rowSums for each index period, but keep NA where every value in the row is NA.
Below is a reproducible example:
library(xts)
set.seed(120)
dd <- xts(rnorm(100),Sys.Date()-c(100:1))
dd1 <- ifelse(dd<(-0.5),dd*-1,NA)
dd2 <- ifelse((dd^2)>0.5,dd,NA)
mm <- merge(dd1,dd2)
mm$m <- rowSums(mm,na.rm=TRUE)
tail(mm,10)
dd1 dd2 m
2013-08-02 NA NA 0.000000
2013-08-03 NA NA 0.000000
2013-08-04 NA NA 0.000000
2013-08-05 1.2542692 -1.2542692 0.000000
2013-08-06 NA 1.3325804 1.332580
2013-08-07 NA 0.7726740 0.772674
2013-08-08 0.8158402 -0.8158402 0.000000
2013-08-09 NA 1.2292919 1.229292
2013-08-10 NA NA 0.000000
2013-08-11 NA 0.9334900 0.933490
In the above example, I was hoping the row for 10 Aug 2013 would show NA instead of 0; the same goes for 2-4 Aug 2013.
Any suggestions for an elegant way of getting NAs in the relevant places?

If you have a variable number of columns you could try this approach:
mm <- merge(dd1,dd2)
mm$m <- rowSums(mm, na.rm=TRUE) * ifelse(rowSums(is.na(mm)) == ncol(mm), NA, 1)
# or, as @JoshuaUlrich commented:
#mm$m <- ifelse(apply(is.na(mm),1,all),NA,rowSums(mm,na.rm=TRUE))
tail(mm, 10)
# dd1 dd2 m
#2013-08-02 NA NA NA
#2013-08-03 NA NA NA
#2013-08-04 NA NA NA
#2013-08-05 1.2542692 -1.2542692 0.000000
#2013-08-06 NA 1.3325804 1.332580
#2013-08-07 NA 0.7726740 0.772674
#2013-08-08 0.8158402 -0.8158402 0.000000
#2013-08-09 NA 1.2292919 1.229292
#2013-08-10 NA NA NA
#2013-08-11 NA 0.9334900 0.933490
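
The original attempt shows 0 because the sum of an empty set is 0: once na.rm = TRUE drops every value in an all-NA row, nothing is left to add. Below is a minimal sketch of the same idea wrapped in a function, using a plain matrix rather than an xts object. The name row_sums_keep_na is made up for illustration and does not come from any package.

```r
# The root cause: summing nothing gives 0, not NA
sum(c(NA, NA), na.rm = TRUE)

# Hypothetical helper (illustrative name): row sums that stay NA
# only when every value in the row is NA
row_sums_keep_na <- function(m) {
  s <- rowSums(m, na.rm = TRUE)
  s[rowSums(!is.na(m)) == 0] <- NA   # rows with no observed value
  s
}

m <- cbind(a = c(1, NA, NA), b = c(-1, 2, NA))
res <- row_sums_keep_na(m)
res
# [1]  0  2 NA
```

Counting non-missing values per row works for any number of columns, which is the point of the approach above.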

Use logical indexing with [ and is.na() to locate the rows where both columns are NA, then replace the corresponding entries of m with NA.
Try this:
> mm[is.na(mm$dd1) & is.na(mm$dd2), "m"] <- NA
> mm
dd1 dd2 m
2013-08-02 NA NA NA
2013-08-03 NA NA NA
2013-08-04 NA NA NA
2013-08-05 1.2542692 -1.2542692 0.000000
2013-08-06 NA 1.3325804 1.332580
2013-08-07 NA 0.7726740 0.772674
2013-08-08 0.8158402 -0.8158402 0.000000
2013-08-09 NA 1.2292919 1.229292
2013-08-10 NA NA NA
2013-08-11 NA 0.9334900 0.933490

mm$m <- "is.na<-"(rowSums(mm, na.rm = TRUE), !rowSums(!is.na(mm)))
> tail(mm)
# dd1 dd2 m
# 2013-08-06 NA 1.3325804 1.332580
# 2013-08-07 NA 0.7726740 0.772674
# 2013-08-08 0.8158402 -0.8158402 0.000000
# 2013-08-09 NA 1.2292919 1.229292
# 2013-08-10 NA NA NA
# 2013-08-11 NA 0.9334900 0.933490
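
For readers unfamiliar with the syntax: "is.na<-" is the replacement function behind is.na(x) <- value, called here in functional form so the result can be assigned in a single expression. !rowSums(!is.na(mm)) is TRUE exactly for the rows that have no non-missing value. A small sketch on a plain vector follows; x and all_missing are illustrative names.

```r
x <- c(0, 1.5, 0)
all_missing <- c(TRUE, FALSE, FALSE)

# Functional form: returns a modified copy, x itself is unchanged
y <- `is.na<-`(x, all_missing)

# Equivalent replacement form, applied to a copy of x
z <- x
is.na(z) <- all_missing

identical(y, z)
# [1] TRUE
```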

My solution would be
library(magrittr)
mm <- mm %>%
transform(ccardNA = rowSums(!is.na(.))/rowSums(!is.na(.)), m = rowSums(., na.rm = TRUE)) %>%
transform(m = ifelse(is.nan(ccardNA), NA, m), ccardNA = NULL) %>%
as.xts()

Related

Convert column of time (as character decimals from Excel) to time in R

I'm trying to convert a column of time (which I imported from Excel) that R has converted into a decimal/character string back into hh:mm:ss time. I have seen many good answers (using library chron, for example), but I keep getting these errors:
My data:
> head(env$Time, 10)
[1] "0.41736111111111113" "0.6020833333333333" "0.45" "0.47222222222222227" "0.5131944444444444"
[6] "0.51250000000000007" "0.47361111111111115" "0.44791666666666669" "0.35138888888888892" "0.45277777777777778"
times(env$Time)
Error in convert.times(times., fmt) : format h:m:s may be incorrect
In addition: Warning message:
In unpaste(times, sep = fmt$sep, fnames = fmt$periods, nfields = 3) :
8173 entries set to NA due to wrong number of fields
chron(times(env$Time))
Error in convert.times(times., fmt) : format h:m:s may be incorrect
In addition: Warning message:
In unpaste(times, sep = fmt$sep, fnames = fmt$periods, nfields = 3) :
8173 entries set to NA due to wrong number of fields
strptime(env$Time, format = "%H:%M:%S")
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
[38] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

I found the answer here: How to express a decimal time in HH:MM
library(chron)
x <- as.numeric(env$Time) # store the time variable as a numeric vector
# Save as a new column in the data frame; no need to divide by 24 when the decimal is already a fraction of a day, as in my data
env$Time2 <- sub(":00$", "", round(times(x), "min"))
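
The same conversion can be sketched in base R without chron, which makes the arithmetic explicit: a day fraction times 86400 is the number of seconds since midnight. The sample values and the names frac, secs and hms are illustrative, not taken from the data above.

```r
# Illustrative values; each is a fraction of a 24-hour day, as Excel stores times
frac <- as.numeric(c("0.45", "0.5", "0.75"))

secs <- as.integer(round(frac * 86400))   # 86400 seconds per day
hms <- sprintf("%02d:%02d:%02d",
               secs %/% 3600L,            # whole hours
               (secs %% 3600L) %/% 60L,   # remaining whole minutes
               secs %% 60L)               # remaining seconds
hms
# [1] "10:48:00" "12:00:00" "18:00:00"
```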

Converting date in R

I want to convert those dates as this format "1995-01", "1995-02", etc.
Here is some of my data
Date Change
1 January-1995 0.01417476
2 February-1995 0.01427050
3 March-1995 0.01556348
4 April-1995 0.01644737
5 May-1995 0.01603727
6 June-1995 0.01627500
7 July-1995 0.01557800
8 August-1995 0.01429773
9 September-1995 0.01344300
10 October-1995 0.01334667
11 November-1995 0.01328429
12 December-1995 0.01345368
13 January-1996 0.01293091
14 February-1996 0.01301762
15 March-1996 0.01289048
16 April-1996 0.01268476
17 May-1996 0.01287364
18 June-1996 0.01253400
19 July-1996 0.01254591
20 August-1996 0.01271238
21 September-1996 0.01245700
22 October-1996 0.01201636
23 November-1996 0.01191300
24 December-1996 0.01195600
I tried this :
date <- as.Date(Data$Date,format="%B/%Y")
and this
date <- as.Date(paste0("01/", Data$Date),format = "%m/%Y")
But it just returns
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
I'm stuck!

The last option can be modified to include %d for the day, since a day is being pasted on. Also specify the correct delimiter in format, and use %B (full month name) rather than %m (month as a decimal number).
as.Date(paste0("01/", Data$Date),format = "%d/%B-%Y")
Or use lubridate
library(lubridate)
my(Data$Date)
my("January-1995")
[1] "1995-01-01"
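
Both calls return full Date objects such as 1995-01-01. To get the "1995-01" strings the question asks for, the result can be passed through format. A sketch, assuming English month names; the example forces the C locale for LC_TIME because %B is locale-dependent.

```r
# Month names are locale-dependent; force the C locale for this example
invisible(Sys.setlocale("LC_TIME", "C"))

x <- c("January-1995", "February-1996")   # sample input in the question's format
d <- as.Date(paste0("01/", x), format = "%d/%B-%Y")
ym <- format(d, "%Y-%m")                   # drop the day, keep year and month
ym
# [1] "1995-01" "1996-02"
```

Note that ym is a character vector, not a Date, since a Date always carries a day.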

Here is another base R option, but with output of type string (not Date)
x <- c("January-1995", "February-1996")
paste0(
gsub("\\D", "", x),
"-",
sprintf("%02d", match(gsub("-.*", "", x), month.name))
)
which gives
[1] "1995-01" "1996-02"

format input to ts object in R

I have some data with a time column expressed in week.year and a corresponding unit that was measured in that week.
Week-Year Units
01.2020 39.12727273
02.2020 33.34545455
03.2020 118.7181818
04.2020 83.71818182
05.2020 58.56985
. .
52.2020 89.54651534
I have to create a ts object which takes these Week-Year values as input.
The reason for requiring this step is that values are sometimes missing for certain weeks, so an auto-generated time scale (start=, end=, frequency=) would misalign the readings.
Is there any way of achieving this, or any way to accommodate such a situation?
R novice here, would really appreciate some guidance. :)

Assuming the input is the data frame DF shown reproducibly in the Note at the end, convert it to a zoo object and then use as.ts to create a ts series with frequency 52.
library(zoo)
week <- as.integer(DF[[1]])
year <- as.numeric(sub("...", "", DF[[1]]))
z <- zoo(DF[[2]], year + (week - 1) / 52)
tt <- as.ts(z)
tt
## Time Series:
## Start = c(2020, 1)
## End = c(2020, 52)
## Frequency = 52
## [1] 39.12727 33.34545 118.71818 83.71818 58.56985 NA NA
## [8] NA NA NA NA NA NA NA
## [15] NA NA NA NA NA NA NA
## [22] NA NA NA NA NA NA NA
## [29] NA NA NA NA NA NA NA
## [36] NA NA NA NA NA NA NA
## [43] NA NA NA NA NA NA NA
## [50] NA NA 89.54652
frequency(tt)
## [1] 52
class(tt)
## [1] "ts"
Note
Lines <- " Week-Year Units
01.2020 39.12727273
02.2020 33.34545455
03.2020 118.7181818
04.2020 83.71818182
05.2020 58.56985
52.2020 89.54651534"
DF <- read.table(text = Lines, header = TRUE, colClasses = c("character", NA))
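
If every row belongs to the same year, a base R sketch without zoo is to place each reading at its week position in an NA-filled vector and build the ts from that. This assumes a single year and week numbers from 1 to 52; wk_df, vals and tt2 are illustrative names, and the sample rows are a subset of the data above.

```r
wk_df <- data.frame(WeekYear = c("01.2020", "02.2020", "05.2020", "52.2020"),
                    Units = c(39.12727273, 33.34545455, 58.56985, 89.54651534),
                    stringsAsFactors = FALSE)

week <- as.integer(substr(wk_df$WeekYear, 1, 2))   # "05.2020" -> 5

vals <- rep(NA_real_, 52)      # one slot per week, missing by default
vals[week] <- wk_df$Units      # fill only the observed weeks

tt2 <- ts(vals, start = c(2020, 1), frequency = 52)
tt2
```

Weeks without a reading stay NA, so the observed values keep their true positions on the time scale.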

apply a rolling mean to a database by an index

I would like to calculate a rolling mean on data in a single data frame by multiple ids. See my example dataset below.
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04",
"2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",
"2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02",
"2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06",
"2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
"b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)
df <- data.frame(date, index, x, y, z)
I would like to calculate the rolling mean for x, y and z, by a and then by b.
I tried the following, but I am getting an error.
test <- tapply(df, df$index, FUN = rollmean(df, 5, fill=NA))
The error:
Error in xu[k:n] - xu[c(1, seq_len(n - k))] :
non-numeric argument to binary operator
It seems like there is an issue with the fact that index is a character, but I need it in order to calculate the means...

1) ave Try ave rather than tapply and make sure it is applied only over the columns of interest, i.e. columns 3, 4, 5. (rollmean comes from the zoo package.)
library(zoo)
roll <- function(x) rollmean(x, 5, fill = NA)
cbind(df[1:2], lapply(df[3:5], function(x) ave(x, df$index, FUN = roll)))
giving:
date index x y z
1 2015-02-01 a NA NA NA
2 2015-02-02 a NA NA NA
3 2015-02-03 a 66.50522 127.45650 129.8472
4 2015-02-04 a 61.71320 123.83633 129.7673
5 2015-02-05 a 56.56125 120.86158 126.1371
6 2015-02-06 a 66.13340 119.93428 127.1819
7 2015-02-07 a 59.56807 105.83208 125.1244
8 2015-02-08 a 49.98779 95.66024 139.2321
9 2015-02-09 a NA NA NA
10 2015-02-10 a NA NA NA
11 2015-02-01 b NA NA NA
12 2015-02-02 b NA NA NA
13 2015-02-03 b 55.71327 117.52219 139.3961
14 2015-02-04 b 54.58450 107.81763 142.6101
15 2015-02-05 b 50.48102 104.94084 136.3167
16 2015-02-06 b 37.89790 95.45489 135.4044
17 2015-02-07 b 33.05259 85.90916 150.8673
18 2015-02-08 b 49.91385 90.04940 147.1376
19 2015-02-09 b NA NA NA
20 2015-02-10 b NA NA NA
2) by Another way is to use by. roll2 handles one group, by applies it to each group producing a by list and do.call("rbind", ...) puts it back together.
roll2 <- function(x) cbind(x[1:2], rollmean(x[3:5], 5, fill = NA))
do.call("rbind", by(df, df$index, roll2))
giving:
date index x y z
a.1 2015-02-01 a NA NA NA
a.2 2015-02-02 a NA NA NA
a.3 2015-02-03 a 66.50522 127.45650 129.8472
a.4 2015-02-04 a 61.71320 123.83633 129.7673
a.5 2015-02-05 a 56.56125 120.86158 126.1371
a.6 2015-02-06 a 66.13340 119.93428 127.1819
a.7 2015-02-07 a 59.56807 105.83208 125.1244
a.8 2015-02-08 a 49.98779 95.66024 139.2321
a.9 2015-02-09 a NA NA NA
a.10 2015-02-10 a NA NA NA
b.11 2015-02-01 b NA NA NA
b.12 2015-02-02 b NA NA NA
b.13 2015-02-03 b 55.71327 117.52219 139.3961
b.14 2015-02-04 b 54.58450 107.81763 142.6101
b.15 2015-02-05 b 50.48102 104.94084 136.3167
b.16 2015-02-06 b 37.89790 95.45489 135.4044
b.17 2015-02-07 b 33.05259 85.90916 150.8673
b.18 2015-02-08 b 49.91385 90.04940 147.1376
b.19 2015-02-09 b NA NA NA
b.20 2015-02-10 b NA NA NA
3) wide form Another approach is to convert df from long form to wide form in which case a plain rollmean will do it.
rollmean(read.zoo(df, split = 2), 5, fill = NA)
giving:
x.a y.a z.a x.b y.b z.b
2015-02-01 NA NA NA NA NA NA
2015-02-02 NA NA NA NA NA NA
2015-02-03 66.50522 127.45650 129.8472 55.71327 117.52219 139.3961
2015-02-04 61.71320 123.83633 129.7673 54.58450 107.81763 142.6101
2015-02-05 56.56125 120.86158 126.1371 50.48102 104.94084 136.3167
2015-02-06 66.13340 119.93428 127.1819 37.89790 95.45489 135.4044
2015-02-07 59.56807 105.83208 125.1244 33.05259 85.90916 150.8673
2015-02-08 49.98779 95.66024 139.2321 49.91385 90.04940 147.1376
2015-02-09 NA NA NA NA NA NA
2015-02-10 NA NA NA NA NA NA
This works because the dates are the same for both groups. If the dates were different then it could introduce NAs and rollmean cannot handle those. In that case use
rollapply(read.zoo(df, split = 2), 5, mean, fill = NA)
Note: Since the input uses random numbers in its definition, we must call set.seed first to make it reproducible. We used this:
set.seed(123)
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04",
"2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",
"2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02",
"2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06",
"2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
"b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)
df <- data.frame(date, index, x, y, z)
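
For comparison, here is a sketch of the grouped, centred 5-point rolling mean using only base R, with stats::filter standing in for rollmean. It assumes each group has at least 5 rows; roll5, g and v are illustrative names, and the data are made up so the result is easy to check by eye.

```r
# Centred moving average of width 5; the two values at each end become NA
roll5 <- function(v) as.numeric(stats::filter(v, rep(1 / 5, 5), sides = 2))

g <- rep(c("a", "b"), each = 10)       # group labels
v <- c(1:10, seq(10, 100, by = 10))    # simple values per group

res <- ave(v, g, FUN = roll5)          # apply roll5 within each group
res
```

Because ave applies the function within each group, the window never straddles the boundary between a and b.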

This ought to do the trick using the libraries dplyr and zoo:
library(dplyr)
library(zoo)
df %>%
group_by(index) %>%
mutate(x_mean = rollmean(x, 5, fill = NA),
y_mean = rollmean(y, 5, fill = NA),
z_mean = rollmean(z, 5, fill = NA))
You could probably tidy this up further with across() inside mutate (mutate_each, its predecessor, is deprecated in current dplyr).
You can also change the arguments to rollmean to fit your needs, such as align = "right". Note that na.pad = TRUE is deprecated in zoo in favour of fill = NA.

Formatting dates with only month and year in R

I have quite a simple problem that I've not found anywhere on here.
I have date format:
times = c("Dec_2011" , "July_2011", "Dec_2010" ,"July_2010" , "Dec_2009" , "July_2009", "Dec_2008" ,
"July_2008" ,"Dec_2007" , "July_2007", "Dec_2006" , "July_2006" ,"Dec_2005" , "July_2005",
"Dec_2004" , "July_2004" ,"Dec_2003" , "July_2003", "Dec_2002" , "July_2002", "Dec_2001" ,
"July_2001", "Dec_2000" , "July_2000")
How can I get these into date format:
31-07-2000, 31-07-2001, etc...
31-12-2000, 31-12-2001, etc...
I've tried:
times <- format(as.Date(times, "%B_%Y"))
times
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
times <- format(as.Date(times, "%B_%Y"), "31-%m-%Y")
times
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
times <- as.Date(paste("31", times, sep="-"), "%d-%m-%Y")
times
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
times <- format(as.Date(times, "%b_%Y"), "31-%m-%Y")
# NA
I'm not quite sure how to proceed.

If we need 31 as the day for all the elements, use paste to join 31, convert to Date class and get the desired format with format.
format(as.Date(paste(times, "31", sep="_"), "%b_%Y_%d"), "%d-%m-%Y")
#[1] "31-12-2011" "31-07-2011" "31-12-2010" "31-07-2010" "31-12-2009" "31-07-2009" "31-12-2008" "31-07-2008" "31-12-2007" "31-07-2007" "31-12-2006"
#[12] "31-07-2006" "31-12-2005" "31-07-2005" "31-12-2004" "31-07-2004" "31-12-2003" "31-07-2003" "31-12-2002" "31-07-2002" "31-12-2001" "31-07-2001"
#[23] "31-12-2000" "31-07-2000"
Instead of manually pasting 31, we can automate this with as.yearmon from zoo. The advantage is that for months with fewer than 31 days, frac = 1 gives the actual last day of the month.
library(zoo)
format(as.Date(as.yearmon(times, "%b_%Y"), frac=1), "%d-%m-%Y")
#[1] "31-12-2011" "31-07-2011" "31-12-2010" "31-07-2010" "31-12-2009" "31-07-2009" "31-12-2008" "31-07-2008" "31-12-2007" "31-07-2007" "31-12-2006"
#[12] "31-07-2006" "31-12-2005" "31-07-2005" "31-12-2004" "31-07-2004" "31-12-2003" "31-07-2003" "31-12-2002" "31-07-2002" "31-12-2001" "31-07-2001"
#[23] "31-12-2000" "31-07-2000"
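
The last-day-of-month logic can also be sketched in base R: parse the first day of each month, step one month forward with seq, and subtract a day. This assumes English month names (the example forces the C locale for LC_TIME); tm, first and last_day are illustrative names, and "Feb_2012" is added to show a month where pasting 31 would give NA.

```r
# Month names are locale-dependent; force the C locale for this example
invisible(Sys.setlocale("LC_TIME", "C"))

tm <- c("Dec_2011", "July_2011", "Feb_2012")
first <- as.Date(paste(tm, "01", sep = "_"), "%b_%Y_%d")   # first of each month

# First day of the following month minus one day = last day of this month
last_day <- function(d) seq(d, by = "month", length.out = 2)[2] - 1
last <- do.call(c, lapply(first, last_day))

out <- format(last, "%d-%m-%Y")
out
# [1] "31-12-2011" "31-07-2011" "29-02-2012"
```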