Selective elimination from a list in R

How can I eliminate, from the second element of list x onward (in this case x[[2]], i.e., 0:90), the elements whose corresponding y is 0?
x = list(0:5, 0:90) # from the second variable on, in this list, eliminate elements whose
# corresponding `y` is `0` ?
y = lapply(list(dbinom(x[[1]], 5, .9), dpois(x[[2]], 50)), round, digits = 4)
P.S. My goal is to possibly do this using lapply for any larger list.

In this case, you could do
x[[2]][y[[2]] != 0]
to get your expected output.
However, as mentioned you have a larger list and want to do it for each one of them. In that case, we could use mapply
mapply(function(p, q) p[q != 0], x[2:length(x)], y[2:length(y)], SIMPLIFY = FALSE)
OR if we want to use lapply we could do
lapply(2:length(x), function(i) x[[i]][y[[i]] != 0])
If we want to keep the 1st element as it is we could do
c(list(x[[1]]), lapply(2:length(x), function(i) x[[i]][y[[i]] != 0]))
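For a list of any length, the same idea can also be written with Map. This is only a minimal sketch, not part of the original answer: the helper name keep_nonzero and the third list element are made up for illustration. One possible advantage is that seq_along(x)[-1] is empty for a list with a single element, whereas 2:length(x) would count backwards (2, 1) in that case.

```r
# Hypothetical helper: drop elements of x[[i]] (i >= 2) whose y[[i]] is 0
keep_nonzero <- function(x, y) {
  idx <- seq_along(x)[-1]                 # positions 2, 3, ...; empty for a length-1 list
  x[idx] <- Map(function(p, q) p[q != 0], x[idx], y[idx])
  x                                       # the first element is returned untouched
}

x <- list(0:5, 0:90, 0:20)                # third element added for illustration only
y <- lapply(list(dbinom(x[[1]], 5, .9), dpois(x[[2]], 50), dpois(x[[3]], 3)),
            round, digits = 4)
str(keep_nonzero(x, y))
```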
EDIT
To maintain the order, we can rearrange both x and y based on smallest_max, the position of the element with the smallest maximum:
get_new_list <- function(x, y) {
smallest_max <- which.min(sapply(x, max))
new_x <- c(x[smallest_max], x[-smallest_max])
new_y <- c(y[smallest_max], y[-smallest_max])
c(new_x[1], lapply(2:length(new_x), function(i) new_x[[i]][new_y[[i]] != 0]))
}
x = list(0:5, 0:40)
y = lapply(list(dbinom(x[[1]], 5, .9), dpois(x[[2]], 50)), round, digits = 4)
get_new_list(x, y)
#[[1]]
#[1] 0 1 2 3 4 5
#[[2]]
#[1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
x = list(0:40, 0:5)
y = lapply(list(dpois(x[[1]], 50), dbinom(x[[2]], 5, .9)), round, digits = 4)
get_new_list(x, y)
#[[1]]
#[1] 0 1 2 3 4 5
#[[2]]
#[1] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40

Related

R how to find a series of common values in a vector (identifying growing season)

I'm looking for a way to identify a growing season which consists of a number of days greater than say 60 between the last frost day of spring and the first frost day in the fall. A general version of this problem is this. If I have a vector of numbers like testVec, I want the item numbers of the beginning and end range of values where the number of items is 5 or greater and all of them are greater than 0.
testVec <- c(1,3,4,0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
In this example, the relevant range is 1,3,4,6,7,5,9 which is testVec[9] to testVec[15]
One option could be:
testVec[with(rle(testVec > 0), rep(lengths * values >= 5, lengths))]
[1] 1 3 4 6 7 5 9
Here, the idea is to, first, create runs of values that are smaller or equal to zero and bigger than zero. Second, it checks whether the runs of values bigger than zero are of length 5 or more. Finally, it subsets the original vector for the runs of values bigger than zero with length 5 or more.
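Since the question asks for the positions where the range begins and ends rather than for the values, the same rle idea can be turned into start and end indices. A minimal sketch, assuming the input has no NA values; the helper name run_bounds is hypothetical.

```r
# Hypothetical helper: start/end positions of every run of values > threshold
# that is at least n long (assumes x contains no NA)
run_bounds <- function(x, threshold = 0, n = 5) {
  r <- rle(x > threshold)
  ends <- cumsum(r$lengths)            # last position of each run
  starts <- ends - r$lengths + 1       # first position of each run
  keep <- r$values & r$lengths >= n    # TRUE runs that are long enough
  data.frame(start = starts[keep], end = ends[keep])
}

testVec <- c(1, 3, 4, 0, 1, -5, 6, 0, 1, 3, 4, 6, 7, 5, 9, 0)
run_bounds(testVec)
#   start end
# 1     9  15
```

Because each qualifying run gets its own row, this also copes with several runs or with none at all (a data frame with zero rows).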
1) rleid This also handles any number of sequences including zero. rleid(ok) is a vector the same length as ok such that the first run of identical elements is replaced with 1, the second run with 2 and so on. The result is a list of vectors where each vector has its positions in the original input as its names.
library(data.table)
getSeq <- function(x) {
names(x) <- seq_along(x)
ok <- x > 0
s <- split(x[ok], rleid(ok)[ok])
unname(s)[lengths(s) >= 5]
}
getSeq(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
getSeq(numeric(16))
## list()
getSeq(c(testVec, 10 * testVec))
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
##
## [[2]]
## 25 26 27 28 29 30 31
## 10 30 40 60 70 50 90
If a data frame were desired then following gives the values and which sequence the row came from. The row names indicate the positions in the original input.
gs <- getSeq(c(testVec, 10 * testVec))
names(gs) <- seq_along(gs)
if (length(gs)) stack(gs) else gs
## values ind
## 9 1 1
## 10 3 1
## 11 4 1
## 12 6 1
## 13 7 1
## 14 5 1
## 15 9 1
## 25 10 2
## 26 30 2
## 27 40 2
## 28 60 2
## 29 70 2
## 30 50 2
## 31 90 2
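If data.table is not available, the run ids that rleid produces in approach 1 can probably be reproduced in base R with rle and rep. A minimal sketch for a single atomic vector; the name rleid_base is made up here and is not a data.table function.

```r
# Hypothetical base R stand-in for data.table::rleid on one vector:
# the first run of identical elements gets id 1, the second run id 2, and so on
rleid_base <- function(v) {
  r <- rle(v)
  rep(seq_along(r$lengths), r$lengths)
}

ok <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
rleid_base(ok)
# [1] 1 1 2 3 3 3
```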
2) gregexpr Replace each element that is > 0 with 1 and each other element with 0 pasting the 0's and 1's into a single character string. Then use gregexpr to look for sequences of 1's at least 5 long and for the ith such nonoverlapping sequence return the first positions, g, and lengths, attr(g, "match.length"). Define a function vals which extracts the values at the required positions from testVec of the ith such nonoverlapping sequence returning a list such that the ith component of the list is the ith such sequence. The names in the output vector are its positions in the input.
getSeq2 <- function(x) {
g <- gregexpr("1{5,}", paste(+(x > 0), collapse = ""))[[1]]
vals <- function(i) {
ix <- seq(g[i], length = attr(g, "match.length")[i])
setNames(x[ix], ix)
}
if (length(g) == 1 && g == -1) list() else lapply(seq_along(g), vals)
}
getSeq2(testVec)
## [[1]]
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
The above handles any number of sequences including 0 but if we knew there were exactly one sequence (which is the case for the example in the question) then it could be simplified to the following where the return value is just that vector:
g <- gregexpr("1{5,}", paste(+(testVec > 0), collapse = ""))[[1]]
ix <- seq(g, length = attr(g, "match.length"))
setNames(testVec[ix], ix)
## 9 10 11 12 13 14 15
## 1 3 4 6 7 5 9
You could "fix" @tmfmnk's solution like this:
f1 <- function(x, threshold, n) {
range(which(with(rle(x > threshold), rep(lengths * values >= n, lengths))))
}
x <- c(1, 3, 4, 0, 1, -5, 6, 0, 1,3,4,6,7,5,9, 0)
f1(x, 0, 5)
#[1] 9 15
But that does not work well when there are multiple runs, because range() then spans from the start of the first run to the end of the last one:
xx <- c(x, x)
f1(xx, 0, 5)
#[1] 9 31
Here is another, not so concise approach that returns the start and end of the longest run (the first one if there are ties).
f2 <- function(x, threshold, n) {
y <- x > threshold
y[is.na(y)] <- FALSE
a <- ave(y, cumsum(!y), FUN=cumsum)
m <- max(a)
if (m < n) return (c(NA, NA))
i <- which(a == m)[1]
c(i-m+1, i)
}
f2(x, 0, 5)
#[1] 9 15
f2(xx, 0, 5)
#[1] 9 15
or with rle
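f3 <- function(x, threshold, n) {
  y <- x > threshold
  y[is.na(y)] <- FALSE
  r <- rle(y)
  len <- r$lengths * r$values # run length for TRUE runs, 0 for FALSE runs
  m <- max(len)
  if (m < n) return (c(NA, NA))
  i <- sum(r$lengths[1:which.max(len)]) # end position of the longest TRUE run
  c(i-m+1, i)
}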
f3(x, 0, 5)
#[1] 9 15
f3(xx, 0, 5)
#[1] 9 15
If you wanted the first run that is at least n long, ignoring any later run even if it is longer, you could do
f4 <- function(x, threshold, n) {
  y <- with(rle(x > threshold), rep(lengths * values >= n, lengths))
  i <- which(y)[1]
  if (is.na(i)) return(c(NA, NA)) # no run is long enough
  j <- i + which(!c(y[-seq_len(i)], FALSE))[1] - 1 # the run may reach the end of x
  c(i, j)
}

Rolling sum in R

df <- data.frame(x = seq(1:10))
I want this:
df$y <- c(1, 2, 3, 4, 5, 15, 20 , 25, 30, 35)
i.e. each y is the sum of previous five x values. This implies the first
five y will be same as x
What I get is this:
df$y1 <- c(df$x[1:4], RcppRoll::roll_sum(df$x, 5))
x y y1
1 1 1
2 2 2
3 3 3
4 4 4
5 5 15
6 15 20
7 20 25
8 25 30
9 30 35
10 35 40
In summary, I need y but I am only able to achieve y1
1) enhanced sum function Define a function Sum which sums its first 5 values if it receives 6 values and returns the last value otherwise. Then use it with partial=TRUE in rollapplyr:
library(zoo)
x <- df$x
Sum <- function(x) if (length(x) < 6) tail(x, 1) else sum(head(x, -1))
rollapplyr(x, 6, Sum, partial = TRUE)
## [1] 1 2 3 4 5 15 20 25 30 35
2) sum 6 and subtract off original Another possibility is to take the running sum of 6 elements filling in the first 5 elements with NA and subtracting off the original vector. Finally fill in the first 5.
replace(rollsumr(x, 6, fill = NA) - x, 1:5, head(x, 5))
## [1] 1 2 3 4 5 15 20 25 30 35
3) specify offsets A third possibility is to use the offset form of width to specify the prior 5 elements:
c(head(x, 5), rollapplyr(x, list(-(1:5)), sum))
## [1] 1 2 3 4 5 15 20 25 30 35
4) alternative specification of offsets In this alternative we specify an offset of 0 for each of the first 5 elements and offsets of -(1:5) for the rest.
width <- replace(rep(list(-(1:5)), length(x)), 1:5, list(0))
rollapply(x, width, sum)
## [1] 1 2 3 4 5 15 20 25 30 35
Note
The scheme for filling in the first 5 elements seems quite unusual, and you might consider using partial sums for the first 5, with NA or 0 for the first one since there are no prior elements for that one:
rollapplyr(x, list(-(1:5)), sum, partial = TRUE, fill = NA)
## [1] NA 1 3 6 10 15 20 25 30 35
rollapplyr(x, list(-(1:5)), sum, partial = TRUE, fill = 0)
## [1] 0 1 3 6 10 15 20 25 30 35
rollapplyr(x, 6, sum, partial = TRUE) - x
## [1] 0 1 3 6 10 15 20 25 30 35
A simple approach would be:
df <- data.frame(x = seq(1:10))
mysum <- function(x, k = 5) {
res <- rep(NA, length(x))
for (i in seq_along(x)) {
if (i <= k) { # the first k values are copied unchanged
res[i] <- x[i]
} else {
res[i] <- sum(x[(i-k):(i-1)])
}
}
res
}
mysum(df$x)
# [1] 1 2 3 4 5 15 20 25 30 35
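mysum <- function(x, k = 5) {
  res <- x[1:k] # the first k values are copied unchanged
  # element k + i is the sum of the k values before it, i.e. x[i:(i+k-1)]
  rest <- vapply(seq_len(length(x) - k), function(i) sum(x[i:(i+k-1)]), numeric(1))
  return(c(res, rest))
}
mysum(df$x)
# [1] 1 2 3 4 5 15 20 25 30 35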
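The window sums can also be obtained without a loop, because the sum of the k values before position i is a difference of two cumulative sums. A minimal base R sketch of that idea; the function name prev_k_sum is hypothetical.

```r
# Hypothetical helper: y[i] = x[i] for i <= k, otherwise sum(x[(i-k):(i-1)])
prev_k_sum <- function(x, k = 5) {
  n <- length(x)
  if (n <= k) return(x)            # nothing beyond the first k values
  cs0 <- c(0, cumsum(x))           # cs0[j + 1] holds the sum of the first j values
  i <- (k + 1):n
  c(x[seq_len(k)], cs0[i] - cs0[i - k])
}

prev_k_sum(1:10)
# [1]  1  2  3  4  5 15 20 25 30 35
```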

Is it possible to skip NA values in "+" operator?

I want to calculate an equation in R. I don't want to use the function sum because it's returning 1 value. I want the full vector of values.
x = 1:10
y = c(21:29,NA)
x+y
[1] 22 24 26 28 30 32 34 36 38 NA
x = 1:10
y = c(21:30)
x+y
[1] 22 24 26 28 30 32 34 36 38 40
I don't want:
sum(x,y, na.rm = TRUE)
[1] 280
Which does not return a vector.
This is a toy example but I have a more complex equation using multiple vector of length 84647 elements.
Here is another example of what I mean:
x = 1:10
y = c(21:29,NA)
z = 11:20
a = c(NA,NA,NA,30:36)
5 +2*(x+y-50)/(x+y+z+a)
[1] NA NA NA 4.388889 4.473684 4.550000 4.619048 4.681818 4.739130 NA
1) %+% Define a custom + operator:
`%+%` <- function(x, y) mapply(sum, x, y, MoreArgs = list(na.rm = TRUE))
5 + 2 * (x %+% y - 50) / (x %+% y %+% z %+% a)
giving:
[1] 3.303030 3.555556 3.769231 4.388889 4.473684 4.550000 4.619048 4.681818
[9] 4.739130 3.787879
Here are some simple examples:
1 %+% 2
## [1] 3
NA %+% 2
## [1] 2
2 %+% NA
## [1] 2
NA %+% NA
## [1] 0
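The mapply version calls sum once per element, which may become noticeable on vectors of length 84647. A vectorised variant with the same NA-as-zero behaviour could look like the sketch below; the operator name %+0% is made up for illustration.

```r
# Hypothetical vectorised operator: treat NA as 0 on both sides, then add
`%+0%` <- function(x, y) {
  x[is.na(x)] <- 0
  y[is.na(y)] <- 0
  x + y
}

x <- 1:10
y <- c(21:29, NA)
z <- 11:20
a <- c(NA, NA, NA, 30:36)
res <- 5 + 2 * (x %+0% y - 50) / (x %+0% y %+0% z %+0% a)
round(res, 6)
```

Like every %op% operator, it binds more tightly than * and /, so the formula can be written without extra parentheses.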
2) na2zero Another possibility is to define a function which maps NA to 0 like this:
na2zero <- function(x) ifelse(is.na(x), 0, x)
X <- na2zero(x)
Y <- na2zero(y)
Z <- na2zero(z)
A <- na2zero(a)
5 + 2 * (X + Y - 50) / (X + Y + Z + A)
giving:
[1] 3.303030 3.555556 3.769231 4.388889 4.473684 4.550000 4.619048 4.681818
[9] 4.739130 3.787879
3) combine above A variation combining (1) with the idea in (2) is:
X <- x %+% 0
Y <- y %+% 0
Z <- z %+% 0
A <- a %+% 0
5 + 2 * (X + Y - 50) / (X + Y + Z + A)
4) numeric0 class We can define a custom class "numeric0" with its own + operator:
as.numeric0 <- function(x) structure(x, class = "numeric0")
`+.numeric0` <- `%+%`
X <- as.numeric0(x)
Y <- as.numeric0(y)
Z <- as.numeric0(z)
A <- as.numeric0(a)
5 + 2 * (X + Y - 50) / (X + Y + Z + A)
Note: The inputs used were those in the question, namely:
x = 1:10
y = c(21:29,NA)
z = 11:20
a = c(NA,NA,NA,30:36)
Using rowSums:
To elaborate on my comment, you can bind the vectors into a matrix with cbind and then apply your calculation to its row sums. This is the solution for the example at the end of your question:
5 + 2 * (rowSums(cbind(x,y), na.rm = TRUE)-50)/(rowSums(cbind(x,y,z,a), na.rm = TRUE))
# [1] 3.303030 3.555556 3.769231 4.388889 4.473684 4.550000 4.619048 4.681818
# [9] 4.739130 3.787879
Replacing NA:
I have seen solutions here with the idea of replacing NA in the vectors; I think this would be helpful too:
y[is.na(y)] <- 0 #indexing NA values and replacing with zero
You can also use ifelse():
x = 1:10
y = c(21:29,NA)
x+y
[1] 22 24 26 28 30 32 34 36 38 NA
x + ifelse(is.na(y), 0, y)
[1] 22 24 26 28 30 32 34 36 38 10
DATA
x = 1:10
y = c(21:29,NA)
x+y
# [1] 22 24 26 28 30 32 34 36 38 NA
1
foo1 = function(...){
return(rowSums(cbind(...), na.rm = TRUE))
}
foo1(x, y)
# [1] 22 24 26 28 30 32 34 36 38 10
2
foo2 = function(...){
Reduce('+', lapply(list(...), function(x) replace(x, is.na(x), 0)))
}
foo2(x, y)
# [1] 22 24 26 28 30 32 34 36 38 10
Just for laffs:
x=1:10
y=c(21:29, NA)
"[<-"(x, is.na(x), 0) + "[<-"(y, is.na(y), 0)
# [1] 22 24 26 28 30 32 34 36 38 10
which again illustrates the fact that everything in R is a function (and also shows that the R interpreter is smart enough to turn a string into a function when required).
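To make the point about functions concrete, here is a small sketch showing several spellings of the same replacement call next to the usual assignment syntax. The variable names are made up for illustration.

```r
x <- c(1, NA, 3)

# Three spellings of the same replacement call
a <- `[<-`(x, is.na(x), 0)                 # backtick-quoted function name
b <- "[<-"(x, is.na(x), 0)                 # string in call position
d <- do.call("[<-", list(x, is.na(x), 0))  # function looked up by name

identical(a, b) && identical(b, d)

# The usual assignment form, which R evaluates as a call to `[<-`
x[is.na(x)] <- 0
identical(x, a)

# Arithmetic operators are ordinary functions as well
sapply(1:3, "+", 10)
```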
Syntactically sweetened:
na.zero <- function(x)
{
"[<-"(x, is.na(x), 0)
}
na.zero(x) + na.zero(y)
# [1] 22 24 26 28 30 32 34 36 38 10
More broadly applicable version:
na.replace <- function(x, value)
{
"[<-"(x, is.na(x), value)
}
na.replace(x, 1) * na.replace(x, 1)
# [1] 1 4 9 16 25 36 49 64 81 100

Add number to vector repeatedly and duplicate vector

I have two values, 3 and 12, and I make a vector from them:
num1 <- 3
num2 <- 12
a <- c(num1, num2)
I want to add the number 12 to vector "a" repeatedly and append each result, so that I get a new vector like this:
3,12, 15,24, 27,36, 39,48 ....
The number of repeats, "n", is 6.
I don't have any idea how to do this.
Here are two methods in base R.
with outer, you could do
c(outer(c(3, 12), (12 * 0:4), "+"))
[1] 3 12 15 24 27 36 39 48 51 60
or with sapply, you can explicitly loop through and calculate the pairs of sums.
c(sapply(0:4, function(i) c(3, 12) + (12 * i)))
[1] 3 12 15 24 27 36 39 48 51 60
outer returns a matrix where every pair of elements of the two vectors have been added together. c is used to return a vector. sapply loops through 0:4 and then calculates the element-wise sum. It also returns a matrix in this instance, so c is used to return a vector.
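The outer approach can be wrapped in a small function so that the step and the number of repeats become arguments. A minimal sketch; the name repeat_add and its argument names are hypothetical.

```r
# Hypothetical wrapper: vec, vec + step, vec + 2 * step, ... (times blocks in total)
repeat_add <- function(vec, step, times) {
  offsets <- step * (seq_len(times) - 1)   # 0, step, 2 * step, ...
  m <- outer(vec, offsets, "+")            # one column per repeat
  c(m)                                     # flatten column by column
}

repeat_add(c(3, 12), 12, 4)
# [1]  3 12 15 24 27 36 39 48
```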
Here is a somewhat generic function that takes as input your original vector a, the number to add (12), and n:
f1 <- function(vec, x, n){
len1 <- length(vec)
v1 <- sapply(seq(n/len1), function(i) x*i)
v2 <- rep(v1, each = n/length(v1))
v3 <- rep(vec, n/len1)
return(c(vec, v3 + v2))
}
f1(a, 12, 6)
#[1] 3 12 15 24 27 36 39 48
f1(a, 11, 12)
#[1] 3 12 14 23 25 34 36 45 47 56 58 67 69 78
f1(a, 3, 2)
#[1] 3 12 6 15
EDIT
If by n=6 you mean 6 times the whole vector then,
f1 <- function(vec, x, n){
len1 <- length(vec)
v1 <- sapply(seq(n), function(i) x*i)
v2 <- rep(v1, each = len1)
v3 <- rep(vec, n)
return(c(vec, v3 + v2))
}
f1(a, 12, 6)
#[1] 3 12 15 24 27 36 39 48 51 60 63 72 75 84
Using rep for repeating and cumsum for the addition:
n = 6
rep(a, n) + cumsum(rep(c(12, 0), n))
# [1] 15 24 27 36 39 48 51 60 63 72 75 84

Get a seq() in R with alternating steps

The seq function in R would give me a sequence from x to y with a constant step m:
seq(x, y, m)
E.g. seq(1,9,2) = c(1,3,5,7,9).
What would be the most elegant way to get a sequence from x to y with alternating steps m1 and m2, such that something like "seq(x, y, c(m1, m2))" would give me c(x, x + m1, (x + m1) + m2, (x + m1 + m2) + m1, ..., y), each time adding one of the steps (not necessarily reaching up to y, of course, as in seq)?
Example: x = 1; y = 19; m1 = 2; m2 = 4 and I get c(1,3,7,9,13,15,19).
I arrived at the solution by:
1. Using cumsum on the vector c(from, rep(by, times)), where times = ceiling((to-from)/sum(by)) is the number of times by is repeated.
2. Truncating the sequence with !(seq > to).
seq_alt <- function(from, to, by) {
seq <- cumsum(c(from,rep(by,ceiling((to-from)/sum(by)))))
return(seq[! seq > to])
}
You can generate the first n terms of this sequence with
x = 1; m1 = 2; m2 = 4
n <- 0:10 # first 11 terms
x + ceiling(n/2)*m1 + ceiling((n-1)/2)*m2
# [1] 1 3 7 9 13 15 19 21 25 27 31
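The closed form can be combined with a stopping value y, as the question asks. This is a sketch under the assumption that both steps are positive; the function name seq_alt_closed is made up. It uses floor(n/2), which equals ceiling((n-1)/2) for integer n.

```r
# Hypothetical helper: alternating steps m1, m2 from x, truncated at y
# (assumes m1 > 0 and m2 > 0)
seq_alt_closed <- function(x, y, m1, m2) {
  if (y < x) return(numeric(0))
  # after 2 * k steps the value is x + k * (m1 + m2), so this index is enough
  n_max <- 2 * ceiling((y - x) / (m1 + m2)) + 1
  n <- 0:n_max
  s <- x + ceiling(n / 2) * m1 + floor(n / 2) * m2
  s[s <= y]
}

seq_alt_closed(1, 19, 2, 4)
# [1]  1  3  7  9 13 15 19
```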
Here is another idea,
fun1 <- function(x, y, j, z){
if(j >= y) {return(x)}else{
s1 <- seq(x, y, j+z)
s2 <- seq(x+j, y, j+z)
return(sort(c(s1, s2)))
}
}
fun1(1, 19, 2, 4)
#[1] 1 3 7 9 13 15 19
fun1(1, 40, 4, 3)
#[1] 1 5 8 12 15 19 22 26 29 33 36 40
fun1(3, 56, 7, 10)
#[1] 3 10 20 27 37 44 54
fun1(1, 2, 2, 4)
#[1] 1
Here is an alternative that uses diffinv. This method over-allocates the values, so as a stopping rule, I keep only the elements that are less than or equal to the stopping value.
seqAlt <- function(start, stop, by1, by2) {
out <- diffinv(rep(c(by1, by2), ceiling(stop / (by1 + by2))), xi=start)
return(out[out <= stop])
}
seqAlt(1, 19, 2, 4)
[1] 1 3 7 9 13 15 19
You could use Reduce with accumulate = TRUE to iteratively add either 2 or 4:
Reduce(`+`, rep(c(2,4), 10), init = 1, accumulate = TRUE)
# [1] 1 3 7 9 13 15 19 21 25 27 31 33 37 39 43 45 49 51 55 57 61
The number of times you repeat c(2,4) will determine sequence length; since it is 10 above, 20 steps are added to the initial value and the sequence has length 21.
The purrr package has an accumulate wrapper, if you prefer the syntax:
purrr::accumulate(rep(c(2,4), 10), `+`, .init = 1)
## [1] 1 3 7 9 13 15 19 21 25 27 31 33 37 39 43 45 49 51 55 57 61
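To stop at a given y instead of choosing the repeat count by hand, the number of repeats can be derived from the range and the result truncated. A minimal sketch assuming positive steps and y >= x; the variable names are illustrative only.

```r
x <- 1; y <- 19; m <- c(2, 4)
reps <- ceiling((y - x) / sum(m))            # enough full cycles to reach y
s <- Reduce(`+`, rep(m, reps), accumulate = TRUE, init = x)
length(s) == length(m) * reps + 1            # the initial value adds one element
s[s <= y]
# [1]  1  3  7  9 13 15 19
```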
A perfect example of vector recycling in R: a logical index of length m1 + m2 is recycled along x:y.
# 1.
x = 1; y = 19; m1 = 2; m2 = 4
(x:y)[c(TRUE, rep(FALSE, m1-1), TRUE, rep(FALSE,m2-1))]
# [1] 1 3 7 9 13 15 19
# 2.
x = 3; y = 56; m1 = 7; m2 = 10
(x:y)[c(TRUE, rep(FALSE, m1-1), TRUE, rep(FALSE,m2-1))]
# [1] 3 10 20 27 37 44 54
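To see what the recycling does, the logical index can be expanded by hand and compared with the short form. A small sketch, assuming whole-number steps of at least 1; the names mask and full are illustrative.

```r
x <- 1; y <- 19; m1 <- 2; m2 <- 4

# One cycle of the pattern: keep, skip m1 - 1, keep, skip m2 - 1
mask <- c(TRUE, rep(FALSE, m1 - 1), TRUE, rep(FALSE, m2 - 1))
mask
# [1]  TRUE FALSE  TRUE FALSE FALSE FALSE

# Subsetting recycles the short mask; rep() makes that explicit
full <- rep(mask, length.out = length(x:y))
identical((x:y)[mask], (x:y)[full])
# [1] TRUE
```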
