Is it possible to skip NA values in "+" operator? - r

I want to calculate an equation in R. I don't want to use the function sum because it's returning 1 value. I want the full vector of values.
x = 1:10
y = c(21:29,NA)
x+y
[1] 22 24 26 28 30 32 34 36 38 NA
x = 1:10
y = c(21:30)
x+y
[1] 22 24 26 28 30 32 34 36 38 40
I don't want:
sum(x,y, na.rm = TRUE)
[1] 280
Which does not return a vector.
This is a toy example, but my real equation is more complex and uses multiple vectors of 84647 elements each.
Here is another example of what I mean:
x = 1:10
y = c(21:29,NA)
z = 11:20
a = c(NA,NA,NA,30:36)
5 +2*(x+y-50)/(x+y+z+a)
[1] NA NA NA 4.388889 4.473684 4.550000 4.619048 4.681818 4.739130 NA

1) %+% Define a custom + operator:
`%+%` <- function(x, y) mapply(sum, x, y, MoreArgs = list(na.rm = TRUE))
5 + 2 * (x %+% y - 50) / (x %+% y %+% z %+% a)
giving:
[1] 3.303030 3.555556 3.769231 4.388889 4.473684 4.550000 4.619048 4.681818
[9] 4.739130 3.787879
Here are some simple examples:
1 %+% 2
## [1] 3
NA %+% 2
## [1] 2
2 %+% NA
## [1] 2
NA %+% NA
## [1] 0
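Since the real vectors in the question have tens of thousands of elements, a vectorised variant may be worth considering, because mapply calls sum once per element. The following is a minimal sketch, not part of the original answer; the operator name %plus0% is made up for illustration, and it assumes both operands are numeric (or NA).

```r
# Hypothetical vectorised alternative to %+% (name chosen for illustration).
# NA is treated as 0 on both sides before the ordinary + is applied.
`%plus0%` <- function(x, y) {
  x[is.na(x)] <- 0
  y[is.na(y)] <- 0
  x + y
}

x <- 1:10
y <- c(21:29, NA)
x %plus0% y
# [1] 22 24 26 28 30 32 34 36 38 10
```

Like the mapply version, NA %plus0% NA gives 0 under this convention.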
2) na2zero Another possibility is to define a function which maps NA to 0 like this:
na2zero <- function(x) ifelse(is.na(x), 0, x)
X <- na2zero(x)
Y <- na2zero(y)
Z <- na2zero(z)
A <- na2zero(a)
5 + 2 * (X + Y - 50) / (X + Y + Z + A)
giving:
[1] 3.303030 3.555556 3.769231 4.388889 4.473684 4.550000 4.619048 4.681818
[9] 4.739130 3.787879
3) combine above A variation combining (1) with the idea in (2) is:
X <- x %+% 0
Y <- y %+% 0
Z <- z %+% 0
A <- a %+% 0
5 + 2 * (X + Y - 50) / (X + Y + Z + A)
4) numeric0 class We can define a custom class "numeric0" with its own + operator:
as.numeric0 <- function(x) structure(x, class = "numeric0")
`+.numeric0` <- `%+%`
X <- as.numeric0(x)
Y <- as.numeric0(y)
Z <- as.numeric0(z)
A <- as.numeric0(a)
5 + 2 * (X + Y - 50) / (X + Y + Z + A)
Note: The inputs used were those in the question, namely:
x = 1:10
y = c(21:29,NA)
z = 11:20
a = c(NA,NA,NA,30:36)

Using rowSums:
To elaborate on my comment, you can bind the vectors into a matrix with cbind and then apply your calculation to the rows of the resulting matrix. This is the solution for the example at the end of your question:
5 + 2 * (rowSums(cbind(x, y), na.rm = TRUE) - 50) / rowSums(cbind(x, y, z, a), na.rm = TRUE)
# [1] 3.303030 3.555556 3.769231 4.388889 4.473684 4.550000 4.619048 4.681818
# [9] 4.739130 3.787879
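One detail to keep in mind: with na.rm = TRUE, rowSums returns 0 for a row in which every value is NA. If such positions should stay NA, a small wrapper can restore them. This is only a sketch; the helper name na_sum is made up for illustration.

```r
# Hypothetical helper: element-wise sum that skips NA,
# but returns NA where every input is NA at that position.
na_sum <- function(...) {
  m <- cbind(...)
  s <- rowSums(m, na.rm = TRUE)
  s[rowSums(!is.na(m)) == 0] <- NA  # rows with no observed value at all
  s
}

na_sum(c(1, NA, NA), c(2, 3, NA))
# [1]  3  3 NA
```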
Replacing NA:
Other answers here replace the NA values in the vectors themselves; I think this is helpful too:
y[is.na(y)] <- 0 # index the NA values and replace them with zero

You can use ifelse():
x = 1:10
y = c(21:29,NA)
x+y
[1] 22 24 26 28 30 32 34 36 38 NA
x + ifelse(is.na(y), 0, y)
[1] 22 24 26 28 30 32 34 36 38 10

DATA
x = 1:10
y = c(21:29,NA)
x+y
# [1] 22 24 26 28 30 32 34 36 38 NA
1
foo1 = function(...){
return(rowSums(cbind(...), na.rm = TRUE))
}
foo1(x, y)
# [1] 22 24 26 28 30 32 34 36 38 10
2
foo2 = function(...){
Reduce('+', lapply(list(...), function(x) replace(x, is.na(x), 0)))
}
foo2(x, y)
# [1] 22 24 26 28 30 32 34 36 38 10

Just for laffs:
x=1:10
y=c(21:29, NA)
"[<-"(x, is.na(x), 0) + "[<-"(y, is.na(y), 0)
# [1] 22 24 26 28 30 32 34 36 38 10
which again illustrates the fact that everything in R is a function (and also shows that the R interpreter is smart enough to turn a string into a function when required).
Syntactically sweetened:
na.zero <- function(x)
{
"[<-"(x, is.na(x), 0)
}
na.zero(x) + na.zero(y)
# [1] 22 24 26 28 30 32 34 36 38 10
More broadly applicable version:
na.replace <- function(x, value)
{
"[<-"(x, is.na(x), value)
}
na.replace(x, 1) * na.replace(x, 1)
# [1] 1 4 9 16 25 36 49 64 81 100

Related

R lapply on a function with three variables

I have a function with three variables
fcalc <- function(n1,n2,n3){
calc1 <- n1*5+n2*10-n3*2
}
I want to pass the values:
1:2 to n1
3:5 to n2
and
6:9 to n3
However when I try
list1 <-lapply(1:2,3:5,6:9,fcalc)
list1
I get an error.
E.g., when n1=1, n2=3 and n3=6
the function would give
calc1 <- 1*5+3*10-6*2
23
I would be grateful for your help.
An option with pmap
library(purrr)
library(tidyr)
fcalc <- function(n1, n2, n3) n1 * 5 + n2 * 10 - n3 * 2
pmap_dbl(crossing(n1 = 1:2, n2 = 3:5, n3 = 6:9), fcalc)
If what you actually want to do is to get calc1 for each combination of the values then this could be your solution:
fcalc <- function(x) {
x[1]*5+x[2]*10-x[3]*2
}
(df <- expand.grid(x1=1:2,x2=3:5,x3=6:9))
(df$calc1 <- apply(df,1,fcalc))
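Because the arithmetic in fcalc is already vectorised, a base R variant can skip apply and call the function once on the columns of the grid. A sketch, assuming the three-argument form of fcalc from the question:

```r
# Three-argument fcalc, as in the question (the arithmetic is vectorised)
fcalc <- function(n1, n2, n3) n1 * 5 + n2 * 10 - n3 * 2

# All 2 * 3 * 4 = 24 combinations; n1 varies fastest
grid <- expand.grid(n1 = 1:2, n2 = 3:5, n3 = 6:9)
grid$calc1 <- with(grid, fcalc(n1, n2, n3))
head(grid, 3)
#   n1 n2 n3 calc1
# 1  1  3  6    23
# 2  2  3  6    28
# 3  1  4  6    33
```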
You can also use outer():
fcalc <- function(x, y, z) {
c(outer(outer(x * 5, y * 10, FUN = "+"), z * 2, FUN = "-"))
}
fcalc(1:2, 3:5, 6:9)
[1] 23 28 33 38 43 48 21 26 31 36 41 46 19 24 29 34 39 44 17 22 27 32 37 42

Selective elimination from a list in R

I was wondering how I could eliminate the x elements from the second variable on (in this case, x[[2]] i.e., 0:90) in list x whose corresponding y is 0?
x = list(0:5, 0:90) # from the second variable on, in this list, eliminate elements whose
# corresponding `y` is `0` ?
y = lapply(list(dbinom(x[[1]], 5, .9), dpois(x[[2]], 50)), round, digits = 4)
P.S. My goal is to possibly do this using lapply for any larger list.
In this case, you could do
x[[2]][y[[2]] != 0]
to get your expected output.
However, as mentioned you have a larger list and want to do it for each one of them. In that case, we could use mapply
mapply(function(p, q) p[q != 0], x[2:length(x)], y[2:length(y)], SIMPLIFY = FALSE)
Or, if we want to use lapply, we could do
lapply(2:length(x), function(i) x[[i]][y[[i]] != 0])
If we want to keep the 1st element as it is we could do
c(list(x[[1]]), lapply(2:length(x), function(i) x[[i]][y[[i]] != 0]))
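A variant of the same idea, sketched with Map and negative indexing. Using x[-1] instead of x[2:length(x)] avoids the 2:1 pitfall when the list has only one element. The helper name drop_zero is hypothetical.

```r
x <- list(0:5, 0:90)
y <- lapply(list(dbinom(x[[1]], 5, .9), dpois(x[[2]], 50)), round, digits = 4)

# Hypothetical helper: keep the first element untouched, filter the rest
drop_zero <- function(x, y) {
  c(x[1], Map(function(p, q) p[q != 0], x[-1], y[-1]))
}

res <- drop_zero(x, y)
lengths(res)  # the second element is now shorter than 91
```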
EDIT
To maintain the order, we can rearrange both x and y based on smallest_max:
get_new_list <- function(x, y) {
smallest_max <- which.min(sapply(x, max))
new_x <- c(x[smallest_max], x[-smallest_max])
new_y <- c(y[smallest_max], y[-smallest_max])
c(new_x[1], lapply(2:length(new_x), function(i) new_x[[i]][new_y[[i]] != 0]))
}
x = list(0:5, 0:40)
y = lapply(list(dbinom(x[[1]], 5, .9), dpois(x[[2]], 50)), round, digits = 4)
get_new_list(x, y)
#[[1]]
#[1] 0 1 2 3 4 5
#[[2]]
#[1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
x = list(0:40, 0:5)
y = lapply(list(dpois(x[[1]], 50), dbinom(x[[2]], 5, .9)), round, digits = 4)
get_new_list(x, y)
#[[1]]
#[1] 0 1 2 3 4 5
#[[2]]
#[1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Rolling sum in R

df <- data.frame(x = seq(1:10))
I want this:
df$y <- c(1, 2, 3, 4, 5, 15, 20 , 25, 30, 35)
i.e. each y is the sum of the previous five x values. This implies the first five y values will be the same as x.
What I get is this:
df$y1 <- c(df$x[1:4], RcppRoll::roll_sum(df$x, 5))
x y y1
1 1 1
2 2 2
3 3 3
4 4 4
5 5 15
6 15 20
7 20 25
8 25 30
9 30 35
10 35 40
In summary, I need y but I am only able to achieve y1
1) enhanced sum function Define a function Sum which sums its first 5 values if it receives 6 values and returns the last value otherwise. Then use it with partial=TRUE in rollapplyr:
library(zoo)
x <- 1:10 # same values as df$x in the question
Sum <- function(x) if (length(x) < 6) tail(x, 1) else sum(head(x, -1))
rollapplyr(x, 6, Sum, partial = TRUE)
## [1] 1 2 3 4 5 15 20 25 30 35
2) sum 6 and subtract off original Another possibility is to take the running sum of 6 elements filling in the first 5 elements with NA and subtracting off the original vector. Finally fill in the first 5.
replace(rollsumr(x, 6, fill = NA) - x, 1:5, head(x, 5))
## [1] 1 2 3 4 5 15 20 25 30 35
3) specify offsets A third possibility is to use the offset form of width to specify the prior 5 elements:
c(head(x, 5), rollapplyr(x, list(-(1:5)), sum))
## [1] 1 2 3 4 5 15 20 25 30 35
4) alternative specification of offsets In this alternative we specify an offset of 0 for each of the first 5 elements and offsets of -(1:5) for the rest.
width <- replace(rep(list(-(1:5)), length(x)), 1:5, list(0))
rollapply(x, width, sum)
## [1] 1 2 3 4 5 15 20 25 30 35
Note
The scheme for filling in the first 5 elements seems quite unusual. You might consider using partial sums for the first 5 instead, with NA or 0 for the first one since there are no prior elements for that one:
rollapplyr(x, list(-(1:5)), sum, partial = TRUE, fill = NA)
## [1] NA 1 3 6 10 15 20 25 30 35
rollapplyr(x, list(-(1:5)), sum, partial = TRUE, fill = 0)
## [1] 0 1 3 6 10 15 20 25 30 35
rollapplyr(x, 6, sum, partial = TRUE) - x
## [1] 0 1 3 6 10 15 20 25 30 35
A simple approach would be:
df <- data.frame(x = seq(1:10))
mysum <- function(x, k = 5) {
res <- rep(NA, length(x))
for (i in seq_along(x)) {
if (i <= k) { # edited ;-)
res[i] <- x[i]
} else {
res[i] <- sum(x[(i-k):(i-1)])
}
}
res
}
mysum(df$x)
# [1] 1 2 3 4 5 15 20 25 30 35
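The loop can also be avoided with cumulative sums: the sum of the k values before position i equals the difference of two running totals. This is a base R sketch under the same convention (the first k values are copied unchanged); roll_prev_sum is a made-up name, and an NA in x would propagate through cumsum to every later position.

```r
roll_prev_sum <- function(x, k = 5) {
  n <- length(x)
  if (n <= k) return(x)              # no position has k predecessors
  cs <- c(0, cumsum(x))              # cs[i] is the sum of x[1:(i - 1)]
  out <- x
  idx <- (k + 1):n
  out[idx] <- cs[idx] - cs[idx - k]  # sum of x[(i - k):(i - 1)]
  out
}

roll_prev_sum(1:10)
# [1]  1  2  3  4  5 15 20 25 30 35
```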
A variant using sapply:
mysum <- function(x, k = 5) {
res <- x[1:k]
rest <- sapply(seq_len(length(x) - k), function(i) sum(x[i:(i+k-1)]))
return(c(res, rest))
}
mysum(df$x)
# [1] 1 2 3 4 5 15 20 25 30 35

Add number to vector repeatedly and duplicate vector

I have two values,
3 and 12,
and I make a vector:
num1 <- 3
num2 <- 12
a <- c(num1, num2)
I want to add a number (12) to vector "a" repeatedly,
and build a new vector by appending each result,
like this:
3,12, 15,24, 27,36, 39,48 ....
The number of repeats "n" is 6.
I don't have any idea how to do this.
Here are two methods in base R.
with outer, you could do
c(outer(c(3, 12), (12 * 0:4), "+"))
[1] 3 12 15 24 27 36 39 48 51 60
or with sapply, you can explicitly loop through and calculate the pairs of sums.
c(sapply(0:4, function(i) c(3, 12) + (12 * i)))
[1] 3 12 15 24 27 36 39 48 51 60
outer returns a matrix where every pair of elements of the two vectors have been added together. c is used to return a vector. sapply loops through 0:4 and then calculates the element-wise sum. It also returns a matrix in this instance, so c is used to return a vector.
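To make the column-wise flattening concrete, here is a small sketch that shows the intermediate matrix. The wrapper name add_steps and its arguments are assumptions for illustration, not part of the answer above.

```r
# Hypothetical wrapper: repeat `vec`, shifted by 0, step, 2 * step, ..., n * step
add_steps <- function(vec, step, n) {
  m <- outer(vec, step * 0:n, "+")  # one column per shift
  c(m)                              # c() reads the matrix column by column
}

outer(c(3, 12), 12 * 0:3, "+")
#      [,1] [,2] [,3] [,4]
# [1,]    3   15   27   39
# [2,]   12   24   36   48

add_steps(c(3, 12), 12, 3)
# [1]  3 12 15 24 27 36 39 48
```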
Here is a somewhat generic function that takes as input your original vector a, the number to add (12), and n:
f1 <- function(vec, x, n){
len1 <- length(vec)
v1 <- sapply(seq(n/len1), function(i) x*i)
v2 <- rep(v1, each = n/length(v1))
v3 <- rep(vec, n/len1)
return(c(vec, v3 + v2))
}
f1(a, 12, 6)
#[1] 3 12 15 24 27 36 39 48
f1(a, 11, 12)
#[1] 3 12 14 23 25 34 36 45 47 56 58 67 69 78
f1(a, 3, 2)
#[1] 3 12 6 15
EDIT
If by n=6 you mean 6 times the whole vector then,
f1 <- function(vec, x, n){
len1 <- length(vec)
v1 <- sapply(seq(n), function(i) x*i)
v2 <- rep(v1, each = len1)
v3 <- rep(vec, n)
return(c(vec, v3 + v2))
}
f1(a, 12, 6)
#[1] 3 12 15 24 27 36 39 48 51 60 63 72 75 84
Using rep for repeating and cumsum for the addition:
n = 6
rep(a, n) + cumsum(rep(c(12, 0), n))
# [1] 15 24 27 36 39 48 51 60 63 72 75 84

Get a seq() in R with alternating steps

The seq function in R would give me a sequence from x to y with a constant step m:
seq(x, y, m)
E.g. seq(1,9,2) = c(1,3,5,7,9).
What would be the most elegant way to get a sequence from x to y with alternating steps m1 and m2, such that something like "seq(x, y, c(m1, m2))" would give me c(x, x + m1, (x + m1) + m2, (x + m1 + m2) + m1, ..., y), each time adding one of the steps (not necessarily reaching up to y, of course, as in seq)?
Example: x = 1; y = 19; m1 = 2; m2 = 4 and I get c(1,3,7,9,13,15,19).
I arrived at the solution by:
1. Use cumsum with a vector c(from,rep(by,times),...), with by repeated times = ceiling((to-from)/sum(by)) times.
2. Truncate the sequence by !(seq > to).
seq_alt <- function(from, to, by) {
seq <- cumsum(c(from,rep(by,ceiling((to-from)/sum(by)))))
return(seq[! seq > to])
}
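Because by is just a vector, the same function should also handle more than two alternating steps. A quick check, repeating the definition so the snippet runs on its own:

```r
# seq_alt as defined in the answer above
seq_alt <- function(from, to, by) {
  seq <- cumsum(c(from, rep(by, ceiling((to - from) / sum(by)))))
  return(seq[! seq > to])
}

seq_alt(1, 19, c(2, 4))
# [1]  1  3  7  9 13 15 19
seq_alt(1, 20, c(1, 2, 3))
# [1]  1  2  4  7  8 10 13 14 16 19 20
```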
You can generate the first n terms of this sequence with
x = 1; m1 = 2; m2 = 4
n <- 0:10 # first 11 terms
x + ceiling(n/2)*m1 + ceiling((n-1)/2)*m2
# [1] 1 3 7 9 13 15 19 21 25 27 31
Here is another idea,
fun1 <- function(x, y, j, z){
if(j >= y) {return(x)}else{
s1 <- seq(x, y, j+z)
s2 <- seq(x+j, y, j+z)
return(sort(c(s1, s2)))
}
}
fun1(1, 19, 2, 4)
#[1] 1 3 7 9 13 15 19
fun1(1, 40, 4, 3)
#[1] 1 5 8 12 15 19 22 26 29 33 36 40
fun1(3, 56, 7, 10)
#[1] 3 10 20 27 37 44 54
fun1(1, 2, 2, 4)
#[1] 1
Here is an alternative that uses diffinv. This method over-allocates the values, so as a stopping rule, I keep only the elements that are less than or equal to the stopping value.
seqAlt <- function(start, stop, by1, by2) {
out <- diffinv(rep(c(by1, by2), ceiling(stop / (by1 + by2))), xi=start)
return(out[out <= stop])
}
seqAlt(1, 19, 2, 4)
[1] 1 3 7 9 13 15 19
You could use Reduce with accumulate = TRUE to iteratively add either 2 or 4:
Reduce(`+`, rep(c(2,4), 10), init = 1, accumulate = TRUE)
# [1] 1 3 7 9 13 15 19 21 25 27 31 33 37 39 43 45 49 51 55 57 61
The number of times you repeat c(2,4) will determine sequence length; since it is 10 above, the sequence has length 21 (the initial value plus 20 steps).
The purrr package has an accumulate wrapper, if you prefer the syntax:
purrr::accumulate(rep(c(2,4), 10), `+`, .init = 1)
## [1] 1 3 7 9 13 15 19 21 25 27 31 33 37 39 43 45 49 51 55 57 61
A perfect example of vector recycling in R:
# 1.
x = 1; y = 19; m1 = 2; m2 = 4
(x:y)[c(TRUE, rep(FALSE, m1-1), TRUE, rep(FALSE,m2-1))]
# [1] 1 3 7 9 13 15 19
# 2.
x = 3; y = 56; m1 = 7; m2 = 10
(x:y)[c(TRUE, rep(FALSE, m1-1), TRUE, rep(FALSE,m2-1))]
# [1] 3 10 20 27 37 44 54
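The indexing above relies on R recycling the logical mask along x:y. Here is a sketch that makes the recycling explicit with rep_len, which also avoids NA entries when the range is shorter than one full pattern. The function name seq_mask is hypothetical, and the approach assumes positive integer steps.

```r
seq_mask <- function(x, y, m1, m2) {
  s <- x:y
  # One period of the pattern: keep, skip m1 - 1, keep, skip m2 - 1
  pattern <- c(TRUE, rep(FALSE, m1 - 1), TRUE, rep(FALSE, m2 - 1))
  s[rep_len(pattern, length(s))]  # explicit recycling to the length of s
}

seq_mask(1, 19, 2, 4)
# [1]  1  3  7  9 13 15 19
seq_mask(1, 2, 2, 4)
# [1] 1
```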
