List comprehension in R

Is there a way to implement list comprehension in R?
Like in Python:
sum([x for x in range(1000) if x % 3 == 0 or x % 5 == 0])
or the same in Haskell:
sum [x | x <- [1..1000-1], x `mod` 3 == 0 || x `mod` 5 == 0]
What's the practical way to apply this in R?

Something like this?
l <- 1:1000
sum(l[l %% 3 == 0 | l %% 5 == 0])

Yes, list comprehension is possible in R:
sum((1:1000)[(1:1000 %% 3) == 0 | (1:1000 %% 5) == 0])

And (kind of) the for-comprehension of Scala:
for (i in {x <- 1:100; x[x %% 2 == 0]}) print(i)

This is many years later but there are three list comprehension packages now on CRAN. Each has slightly different syntax. In alphabetical order:
library(comprehenr)
sum(to_vec(for(x in 1:1000) if (x %% 3 == 0 | x %% 5 == 0) x))
## [1] 234168
library(eList)
Sum(for(x in 1:1000) if (x %% 3 == 0 | x %% 5 == 0) x else 0)
## [1] 234168
library(listcompr)
sum(gen.vector(x, x = 1:1000, x %% 3 == 0 | x %% 5 == 0))
## [1] 234168
In addition, the following is on GitHub only.
# devtools::install_github("mailund/lc")
library(lc)
sum(unlist(lc(x, x = seq(1000), x %% 3 == 0 | x %% 5 == 0)))
## [1] 234168

The foreach package by Revolution Analytics gives us a handy interface to list comprehensions in R. https://www.r-bloggers.com/list-comprehensions-in-r/
Example
Return the pairs of numbers from the two lists that are not equal, as tuples:
Python
list_a = [1, 2, 3]
list_b = [2, 7]
different_num = [(a, b) for a in list_a for b in list_b if a != b]
print(different_num)
# Output:
[(1, 2), (1, 7), (2, 7), (3, 2), (3, 7)]
R
require(foreach)
list_a = c(1, 2, 3)
list_b = c(2, 7)
different_num <- foreach(a = list_a, .combine = c) %:% foreach(b = list_b) %:% when(a != b) %do% c(a, b)
print(different_num)
# Output:
[[1]]
[1] 1 2
[[2]]
[1] 1 7
[[3]]
[1] 2 7
[[4]]
[1] 3 2
[[5]]
[1] 3 7
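If a matrix of pairs (closer to Python's list of tuples) is preferred, the list returned above can be bound row-wise afterwards; this is a small follow-up sketch, not part of the foreach call itself:
do.call(rbind, different_num)
     [,1] [,2]
[1,]    1    2
[2,]    1    7
[3,]    2    7
[4,]    3    2
[5,]    3    7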
EDIT:
The foreach package is very slow for certain tasks.
A faster list comprehension implementation is given at "List comprehensions for R":
. <<- structure(NA, class="comprehension")
comprehend <- function(expr, vars, seqs, guard, comprehension=list()){
  if(length(vars)==0){ # base case of recursion
    if(eval(guard)) comprehension[[length(comprehension)+1]] <- eval(expr)
  } else {
    for(elt in eval(seqs[[1]])){
      assign(vars[1], elt, inherits=TRUE)
      comprehension <- comprehend(expr, vars[-1], seqs[-1], guard,
                                  comprehension)
    }
  }
  comprehension
}
## List comprehensions specified by close approximation to set-builder notation:
##
## { x+y | 0<x<9, 0<y<x, x*y<30 } ---> .[ x+y ~ {x<-0:9; y<-0:x} | x*y<30 ]
##
"[.comprehension" <- function(x, f,rectangularizing=T){
f <- substitute(f)
## First, we pluck out the optional guard, if it is present:
if(is.call(f) && is.call(f[[3]]) && f[[3]][[1]]=='|'){
guard <- f[[3]][[3]]
f[[3]] <- f[[3]][[2]]
} else {
guard <- TRUE
}
## To allow omission of braces around a lone comprehension generator,
## as in 'expr ~ var <- seq' we make allowances for two shapes of f:
##
## (1) (`<-` (`~` expr
## var)
## seq)
## and
##
## (2) (`~` expr
## (`{` (`<-` var1 seq1)
## (`<-` var2 seq2)
## ...
## (`<-` varN <- seqN)))
##
## In the former case, we set gens <- list(var <- seq), unifying the
## treatment of both shapes under the latter, more general one.
syntax.error <- "Comprehension expects 'expr ~ {x1 <- seq1; ... ; xN <- seqN}'."
if(!is.call(f) || (f[[1]]!='<-' && f[[1]]!='~'))
stop(syntax.error)
if(is(f,'<-')){ # (1)
lhs <- f[[2]]
if(!is.call(lhs) || lhs[[1]] != '~')
stop(syntax.error)
expr <- lhs[[2]]
var <- as.character(lhs[[3]])
seq <- f[[3]]
gens <- list(call('<-', var, seq))
} else { # (2)
expr <- f[[2]]
gens <- as.list(f[[3]])[-1]
if(any(lapply(gens, class) != '<-'))
stop(syntax.error)
}
## Fill list comprehension .LC
vars <- as.character(lapply(gens, function(g) g[[2]]))
seqs <- lapply(gens, function(g) g[[3]])
.LC <- comprehend(expr, vars, seqs, guard)
## Provided the result is rectangular, convert it to a vector or array
if(!rectangularizing) return(.LC)
tryCatch({
if(!length(.LC))
return(.LC)
dim1 <- dim(.LC[[1]])
if(is.null(dim1)){
lengths <- sapply(.LC, length)
if(all(lengths == lengths[1])){ # rectangular
.LC <- unlist(.LC)
if(lengths[1] > 1) # matrix
dim(.LC) <- c(lengths[1], length(lengths))
} else { # ragged
# leave .LC as a list
}
} else { # elements of .LC have dimension
dim <- c(dim1, length(.LC))
.LC <- unlist(.LC)
dim(.LC) <- dim
}
return(.LC)
}, error = function(err) {
return(.LC)
})
}
This implementation is faster than foreach; it allows nested comprehensions, multiple parameters, and parameter scoping.
N <- list(10,20)
.[.[c(x,y,z)~{x <- 2:n;y <- x:n;z <- y:n} | {x^2+y^2==z^2 & z<15}]~{n <- N}]
[[1]]
[[1]][[1]]
[1] 3 4 5
[[1]][[2]]
[1] 6 8 10
[[2]]
[[2]][[1]]
[1] 3 4 5
[[2]][[2]]
[1] 5 12 13
[[2]][[3]]
[1] 6 8 10
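With the definitions above sourced, the question's original sum can presumably be written in this syntax as well; note that the part after | must be a single guard expression, so the two divisibility tests are wrapped in parentheses:
sum(.[ x ~ {x <- 1:1000} | (x %% 3 == 0 | x %% 5 == 0) ])
[1] 234168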

Another way (assigning l inside the subscript expression so the index can reuse it):
sum((l <- 1:1000)[l %% 3 == 0 | l %% 5 == 0])

I hope it's okay to self-promote my package listcompr which implements a list comprehension syntax for R.
The example from the question can be solved in the following way:
library(listcompr)
sum(gen.vector(x, x = 1:1000, x %% 3 == 0 || x %% 5 == 0))
## Returns: 234168
As listcompr does a row-wise (and not a vector-wise) evaluation of the conditions, it makes no difference whether || or | is used as the logical operator. It accepts arbitrarily many arguments: first, a base expression which is transformed into the list or vector entries; next, arbitrarily many arguments which specify the variable ranges and the conditions.
More examples can be found in the README of the listcompr GitHub repository: https://github.com/patrickroocks/listcompr

For a strict mapping from Python to R, this might be the most direct equivalence:
Python:
sum([x for x in range(1000) if x % 3 == 0 or x % 5 == 0])
R:
sum((x <- 0:999)[x %% 3 == 0 | x %% 5 == 0])
One important difference: the R version works like Python 2 where the x variable is globally scoped outside of the expression. (I call it an "expression" here since R does not have the notion of "list comprehension".) In Python 3, the iterator is restricted to the local scope of the list comprehension. In other words:
In R (as in Python 2), the x variable persists after the expression. If an x existed before the expression, it is overwritten by the assignment inside the expression (here x ends up as 0:999).
In Python 3, the x variable exists only within the list comprehension. If there was an x variable created before the list comprehension, the list comprehension does not change it at all.
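A quick illustration of that difference:
x <- "hello"
sum((x <- 0:999)[x %% 3 == 0 | x %% 5 == 0])
## [1] 233168
x[1:3]
## [1] 0 1 2
After the expression, x is no longer "hello" but the full vector 0:999; in Python 3 the outer x would be untouched.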

This list comprehension of the form:
[item for item in list if test]
is pretty straightforward with boolean indexing in R. But for more complex expressions, like implementing vector rescaling (I know this can be done with the scales package too), in Python it's easy:
x = [1, 3, 5, 7, 9, 11] # -> [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
[(xi - min(x))/(max(x) - min(x)) for xi in x]
But in R this is the best I could come up with. Would love to know if there's something better:
sapply(x, function(xi, mn, mx) {(xi-mn)/(mx-mn)}, mn = min(x), mx = max(x))
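Since arithmetic in R is vectorized over whole vectors, one simpler possibility (a sketch of the usual idiom, not a claim that it is the best approach) is to drop the explicit iteration entirely:
x <- c(1, 3, 5, 7, 9, 11)
(x - min(x)) / (max(x) - min(x))
## [1] 0.0 0.2 0.4 0.6 0.8 1.0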

You could convert a sequence of random numbers to a binary sequence as follows:
x <- runif(1000)
y <- NULL
for (i in x) { if (i > .5) { y <- c(y, 1) } else { y <- c(y, -1) } }
This can be generalized to map any list to another list following the pattern
x = [item for item in x if test]
where the else branch simply does not append to y.
For the problem at hand:
x <- 0:999
y <- NULL
for (i in x){ if (i %% 3 == 0 | i %% 5 == 0){ y <- c(y, i) }}
sum( y )
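Base R's Filter() expresses the same [item for item in x if test] idea without growing y one element at a time; a minimal sketch for the same problem:
x <- 0:999
sum(Filter(function(i) i %% 3 == 0 | i %% 5 == 0, x))
## [1] 233168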

Related

Create a function to find the length of a vector WITHOUT using length()

I already tried max(seq_along(x)), but I need it to also return 0 if, say, the input is numeric(0). It works for anything other than numeric(0). This is what I have so far:
my_length <- function(x){
  max(seq_along(x))
}
You can just include a 0 to the max() call in your attempt:
my_length <- function(x) max(0, seq_along(x))
my_length(10:1)
[1] 10
my_length(NULL)
[1] 0
my_length(numeric())
[1] 0
Using forloop:
my_length <- function(x){
  l = 0
  for(i in x) l <- l + 1
  return(l)
}
x <- numeric(0)
my_length(x)
# [1] 0
x <- 1:10
my_length(x)
# [1] 10
Another option:
my_length <- function(x) nrow(matrix(x))
You can use NROW():
len <- \(x) NROW(x)
Examples:
len(numeric(0))
#> [1] 0
len(letters)
#> [1] 26
len(c(3, 0, 9, 1))
#> [1] 4
From the documentation:
nrow and ncol return the number of rows or columns present in x. NCOL and NROW do the same treating a vector as 1-column matrix, even a 0-length vector ...
Here are a few more functional programming approaches:
Using mapping and summation:
length = function (x) {
  sum(vapply(x, \(.) 1L, integer(1L)))
}
Using reduction:
length = function (x) {
  Reduce(\(x, .) x + 1L, x, 0L)
}
Using recursion:
length = function (x, len = 0L) {
  if (is_empty(x)) len else Recall(x[-1L], len + 1L)
}
Alas, the last one needs a helper function, and that is unfortunately not trivial to define without using length():
is_empty = function (x) {
  is.null(x) || identical(x, vector(typeof(x), 0L))
}
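A quick sanity check (hypothetical test values) that should hold for whichever of the three definitions above is active:
length(integer(0))
#> [1] 0
length(letters)
#> [1] 26
length(list(1, "a", TRUE))
#> [1] 3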

Faster ways to generate Yellowstone sequence (A098550) in R?

I just saw a YouTube video from Numberphile on the Yellowstone sequence (A098550). It's based on a sequence starting with 1 and 2, with subsequent terms generated by the rules:
no repeated terms
always pick the lowest integer
gcd(a_n, a_(n-1)) = 1
gcd(a_n, a_(n-2)) > 1
The first 15 terms would be: 1 2 3 4 9 8 15 14 5 6 25 12 35 16 7
A Q&D approach in R could be something like this, but understandably, this becomes very slow at attempts to make longer sequences. It also makes some assumptions about the highest number that is possible within the sequence (as info: the sequence of 10,000 items never goes higher than 5000).
What can we do to make this faster?
library(DescTools)
a <- c(1, 2, 3)
p <- length(a)
# all natural numbers
all_ints <- 1:5000
for (n in p:1000) {
  # rule 1 - remove all numbers that are in the sequence already
  next_a_set <- all_ints[which(!all_ints %in% a)]
  # rule 3 - search the remaining set for numbers that have gcd == 1
  next_a_option <- next_a_set[which(
    sapply(
      next_a_set,
      function(x) GCD(a[n], x)
    ) == 1
  )]
  # rule 4 - search the remaining numbers for gcd > 1
  next_a <- next_a_option[which(
    sapply(
      next_a_option,
      function(x) GCD(a[n - 1], x)
    ) > 1
  )]
  # select the lowest
  a <- c(a, min(next_a))
  n <- n + 1
}
Here's a version that's about 20 times faster than yours, with comments about the changes:
# Set a to the final length from the start.
a <- c(1, 2, 3, rep(NA, 997))
p <- 3
# Define a vectorized gcd() function. We'll be testing
# lots of gcds at once. This uses the Euclidean algorithm.
gcd <- function(x, y) { # vectorized gcd
  while (any(y != 0)) {
    x1 <- ifelse(y == 0, x, y)
    y <- ifelse(y == 0, 0, x %% y)
    x <- x1
  }
  x
}
# Guess at a reasonably large vector to work from,
# but we'll grow it later if not big enough.
allnum <- 1:1000
# Keep a logical record of what has been used
used <- c(rep(TRUE, 3), rep(FALSE, length(allnum) - 3))
for (n in p:1000) {
  # rule 1 - remove all numbers that are in the sequence already
  # nothing to do -- used already records that.
  repeat {
    # rule 3 - search the remaining set for numbers that have gcd == 1
    keep <- !used & gcd(a[n], allnum) == 1
    # rule 4 - search the remaining numbers for gcd > 1
    keep <- keep & gcd(a[n-1], allnum) > 1
    # If we found anything, break out of this loop
    if (any(keep))
      break
    # Otherwise, make the set of possible values twice as big,
    # and try again
    allnum <- seq_len(2*length(allnum))
    used <- c(used, rep(FALSE, length(used)))
  }
  # select the lowest
  newval <- which.max(keep)
  # Assign into the appropriate place
  a[n+1] <- newval
  # Record that it has been used
  used[newval] <- TRUE
}
If you profile it, you'll see it spends most of its time in the gcd() function. You could probably make that a lot faster by redoing it in C or C++.
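For instance, a minimal Rcpp sketch of that gcd (assuming the Rcpp package is installed; gcd_cpp is a made-up name, and it takes a scalar first argument to match how gcd(a[n], allnum) is called above):
library(Rcpp)
cppFunction('
IntegerVector gcd_cpp(int a, IntegerVector y) {
  // Euclidean gcd of a single value against each element of y
  int n = y.size();
  IntegerVector out(n);
  for (int i = 0; i < n; i++) {
    int x = a, b = y[i];
    while (b != 0) { int t = x % b; x = b; b = t; }
    out[i] = x;
  }
  return out;
}')
The two gcd(...) calls inside the repeat loop could then call gcd_cpp(...) instead.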
The biggest change here is pre-allocation and restricting the search to numbers that have not yet been used.
library(numbers)
N <- 5e3
a <- integer(N)
a[1:3] <- 1:3
b <- logical(N) # which numbers have been used already?
b[1:3] <- TRUE
NN <- 1:N
system.time({
  for (n in 4:N) {
    a1 <- a[n - 1L]
    a2 <- a[n - 2L]
    for (k in NN[!b]) {
      if (GCD(k, a1) == 1L & GCD(k, a2) > 1L) {
        a[n] <- k
        b[k] <- TRUE
        break
      }
    }
    if (!a[n]) {
      a <- a[1:(n - 1L)]
      break
    }
  }
})
#>    user  system elapsed
#>    1.28    0.00    1.28
length(a)
#> [1] 1137
For a fast C++ algorithm, see here.

Find components of a vector which increase continually by k-times

I want to create a function which finds components of a vector which increase continually by k-times.
That is, if the contrived function is f(x,k) and x=c(2,3,4,3,5,6,5,7), then
the value of f(x,1) is 2,3,3,5,5, since only these components of x are immediately followed by a larger one.
In addition, if k=2, then the value of f(x,2) is 2,3, since only these components start a run of two consecutive increases (2→3→4 and 3→5→6).
I guess that I ought to use a looping construct like for for this purpose.
1) Use rollapply from the zoo package:
library(zoo)
f <- function(x, k)
  x[rollapply(x, k+1, function(x) all(diff(x) > 0), align = "left", fill = FALSE)]
Now test out f:
x <- c(2,3,4,3,5,6,5,7)
f(x, 1)
## [1] 2 3 3 5 5
f(x, 2)
## [1] 2 3
f(x, 3)
## numeric(0)
1a) This variation is slightly shorter and also works:
f2 <- function(x, k) head(x, -k)[ rollapply(diff(x) > 0, k, all) ]
2) Here is a version of 1a that uses no packages:
f3 <- function(x, k) head(x, -k)[ apply(embed(diff(x) > 0, k), 1, all) ]
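With the same x as above, this should give identical results:
f3(x, 1)
## [1] 2 3 3 5 5
f3(x, 2)
## [1] 2 3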
A fully vectorized solution:
f <- function(x, k = 1) {
  rlecumsum = function(x) { # cumsum with resetting
    # http://stackoverflow.com/a/32524260/1412059
    cs = cumsum(x)
    cs - cummax((x == 0) * cs)
  }
  x[rev(rlecumsum(rev(c(diff(x) > 0, FALSE)))) >= k]
}
f(x, 1)
#[1] 2 3 3 5 5
f(x, 2)
#[1] 2 3
f(x, 3)
#numeric(0)
I don't quite understand the second part of your question (the one with k=2), but for the first part you can use something like this:
test <- c(2,3,4,3,5,6,5,7)      # Your vector
diff(test)                      # Successive differences of the vector
diff(test) > 0                  # Turns that into a logical vector with criterion > 0
head(test, -1)[diff(test) > 0]  # Keeps only the elements of test that are immediately followed by a larger one
# (head(test, -1) drops the last element so the logical index lines up with it)

Find vector overlap from the start

I am looking for an efficient way to get the first k elements that are the same between two vectors in R.
For example:
orderedIntersect(c(1,2,3,4), c(1,2,5,4))
# [1] 1 2
orderedIntersect(c(1,2,3), c(1,2,3,4))
# [1] 1 2 3
This is the same as the intersect behavior, but any values after the first mismatch should be dropped.
I also want this to work for strings.
So far, the solution that I have is this:
orderedIntersect <- function(a, b) {
  a <- as.vector(a)
  NAs <- is.na(match(a, as.vector(b)))
  last <- ifelse(any(NAs), min(which(NAs)) - 1, length(a))
  a[1:last]
}
I am troubled by the fact that I have to iterate over n input elements 6 times: match, is.na, any, which, min, and the subset [].
Clearly, it would be faster to write an external C function (with a for loop and a break), but I am wondering if there is any clever R trick I can use here.
You can compare the values of your vectors and drop elements when the first FALSE is reached:
orderedIntersect <- function(a, b) {
  # check the lengths are equal and if not, "cut" the vectors so they are (to avoid warnings)
  l_a <- length(a) ; l_b <- length(b)
  if(l_a != l_b) {m_l <- min(l_a, l_b) ; a <- a[1:m_l] ; b <- b[1:m_l]}
  # compare the elements : they are equal if both are not NA and have the same value or if both are NA
  comp <- (!is.na(a) & !is.na(b) & a==b) | (is.na(a) & is.na(b))
  # return the right vector : nothing if the first elements do not match,
  # everything if all elements match, or just the part that matches
  if(!comp[1]) return(c()) else if (all(comp)) return(a) else return(a[1:(which(!comp)[1]-1)])
}
orderedIntersect(c(1,2,3,4), c(1,2,5,4))
#[1] 1 2
orderedIntersect(c(1,2,3), c(1,2,3,4))
#[1] 1 2 3
orderedIntersect(c(1,2,3), c(2,3,4))
#NULL
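The same function should also handle the string case mentioned in the question, for example:
orderedIntersect(c("abc", "def", "x"), c("abc", "def", "y"))
#[1] "abc" "def"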
The simple C solution (for integers) isn't really any longer than the R version, but it would be a little more work to extend to all the other classes.
library(inline)
orderedIntersect <- cfunction(
  signature(x='integer', y='integer'),
  body='
    int i, l = length(x) > length(y) ? length(y) : length(x),
        *xx = INTEGER(x), *yy = INTEGER(y);
    SEXP res;
    for (i = 0; i < l; i++) if (xx[i] != yy[i]) break;
    PROTECT(res = allocVector(INTSXP, i));
    for (l = 0; l < i; l++) INTEGER(res)[l] = xx[l];
    UNPROTECT(1);
    return res;'
)
## Tests
a <- c(1L,2L,3L,4L)
b <- c(1L,2L,5L,4L)
c <- c(1L,2L,8L,9L,9L,9L,9L,3L)
d <- c(9L,0L,0L,8L)
orderedIntersect(a,b)
# [1] 1 2
orderedIntersect(a,c)
# [1] 1 2
orderedIntersect(a,d)
# integer(0)
orderedIntersect(a, integer())
# integer(0)
This might work:
#test data
a <- c(1,2,3,4)
b <- c(1,2,5,4)
c <- c(1,2,8,9,9,9,9,3)
d <- c(9,0,0,8)
empty <- c()
string1 <- c("abc", "def", "ad","k")
string2 <- c("abc", "def", "c", "lds")
#function
orderedIntersect <- function(a, b) {
  l <- min(length(a), length(b))
  if (l == 0) return(numeric(0))
  a1 <- a[1:l]
  comp <- a1 != b[1:l]
  if (all(!comp)) return(a1)
  a1[ 0:(min(which(comp)) - 1) ]
}
#testing
orderedIntersect(a,b)
# [1] 1 2
orderedIntersect(a,c)
# [1] 1 2
orderedIntersect(a,d)
# numeric(0)
orderedIntersect(a, empty)
# numeric(0)
orderedIntersect(string1,string2)
# [1] "abc" "def"

Euler Project #1 in R

Problem
Find the sum of all numbers below 1000 that are divisible by 3 or 5.
One solution I created:
x <- c(1:999)
values <- x[x %% 3 == 0 | x %% 5 == 0]
sum(values)
Second solution I can't get to work and need help with. I've pasted it below.
I'm trying to use a loop (here, I use while() and after this I'll try for()). I am still struggling with keeping references to indexes (locations in a vector) separate from values/observations within vectors. Loops seem to make it more challenging for me to distinguish the two.
Why does this not produce the answer to Euler #1?
x <- 0
i <- 1
while (i < 100) {
  if (i %% 3 == 0 | i %% 5 == 0) {
    x[i] <- c(x, i)
  }
  i <- i + 1
}
sum(x)
And in words, line by line this is what I understand is happening:
x gets value 0
i gets value 1
while object i's value (not the index #) is < 1000
if i is divisible by 3 or 5
add that number i to the vector x
add 1 to i (in order to keep the loop going to the defined limit of 1e3)
sum all items in vector x
I am guessing x[i] <- c(x, i) is not the right way to add an element to vector x. How do I fix this and what else is not accurate?
First, your loop runs until i < 100, not i < 1000.
Second, replace x[i] <- c(x, i) with x <- c(x, i) to add an element to the vector.
Here is a shortcut that performs this sum, which is probably more in the spirit of the problem:
3*(333*334/2) + 5*(199*200/2) - 15*(66*67/2)
## [1] 233168
Here's why this works:
In the set of integers [1,999] there are:
333 values that are divisible by 3. Their sum is 3*sum(1:333) or 3*(333*334/2).
199 values that are divisible by 5. Their sum is 5*sum(1:199) or 5*(199*200/2).
Adding these up gives a number that is too high by their intersection, which is the set of values divisible by 15. There are 66 such values, and their sum is 15*sum(1:66) or 15*(66*67/2).
As a function of N, this can be written:
f <- function(N) {
  threes <- floor(N/3)
  fives <- floor(N/5)
  fifteens <- floor(N/15)
  3*(threes*(threes+1)/2) + 5*(fives*(fives+1)/2) - 15*(fifteens*(fifteens+1)/2)
}
Giving:
f(999)
## [1] 233168
f(99)
## [1] 2318
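A quick cross-check of f() against the brute-force filter from the first solution:
x <- 1:999
f(999) == sum(x[x %% 3 == 0 | x %% 5 == 0])
## [1] TRUE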
And another way, using which() (this works because for x <- 1:999 the indices equal the values):
x <- 1:999
sum(which(x %% 5 == 0 | x %% 3 == 0))
# [1] 233168
A very efficient approach is the following:
div_sum <- function(x, n) {
  # calculates the double of the sum of all integers from 1 to n
  # that are divisible by x
  max_num <- n %/% x
  (x * (max_num + 1) * max_num)
}
n <- 999
a <- 3
b <- 5
(div_sum(a, n) + div_sum(b, n) - div_sum(a * b, n)) / 2
In contrast, a very short piece of code is the following (it relies on !0 being TRUE, so !x %% 3 is TRUE exactly for multiples of 3):
x <- 1:999
sum(x[!x %% 3 | !x %% 5])
Here is an alternative that I think gives the same answer (using 99 instead of 999 as the upper bound):
iters <- 100
x <- rep(0, iters-1)
i <- 1
while (i < iters) {
  if (i %% 3 == 0 | i %% 5 == 0) {
    x[i] <- i
  }
  i <- i + 1
}
sum(x)
# [1] 2318
Here is the for-loop mentioned in the original post:
iters <- 99
x <- rep(0, iters)
i <- 1
for (i in 1:iters) {
  if (i %% 3 == 0 | i %% 5 == 0) {
    x[i] <- i
  }
  i <- i + 1
}
sum(x)
# [1] 2318
