Is it possible to sum all vector variables with a common prefix?
Example:
x1 <- c(1,2,3)
x2 <- c(4,5,6)
.
.
.
xn <- c(n,n,n)
y = x1 + x2 + ... + xn
The number of variables xn (i.e. those with prefix x) is only known at runtime.
Assuming your y has the same dimension as each x, you could capture all the variables in a list and apply a summation operation.
> x2 <- c(4,5,6)
> x1 <- c(1,2,3)
> ls(pattern = "^x\\d+$") # this is regex for finding "x" and "digits",
# ^ is start of string, $ is end of string
[1] "x1" "x2"
> sapply(ls(pattern = "^x\\d+$"), get, simplify = FALSE)
$x1
[1] 1 2 3
$x2
[1] 4 5 6
> out <- sapply(ls(pattern = "^x\\d+$"), get, simplify = FALSE)
> Reduce("+", out)
[1] 5 7 9
You can also use mget, as suggested by @LyzandeR, especially if you fancy one-liners.
Reduce("+", mget(ls(pattern = "^x\\d+$")))
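If the vectors live somewhere other than the global workspace (inside a function, say), the same idea can be wrapped in a helper that takes the environment explicitly. This is only a sketch: sum_by_prefix and env are made-up names, and it assumes the prefix contains no regex metacharacters and that all matching vectors have the same length.

```r
# Hypothetical helper: sum every vector in `envir` whose name is
# `prefix` followed only by digits (x1, x2, ..., but not xyz or z).
sum_by_prefix <- function(prefix, envir = parent.frame()) {
  pattern <- paste0("^", prefix, "[0-9]+$")
  vars <- ls(envir = envir, pattern = pattern)
  if (length(vars) == 0L) stop("no variables match ", pattern)
  Reduce(`+`, mget(vars, envir = envir))
}

# Demo in a separate environment so the workspace is not touched
env <- new.env()
assign("x1", c(1, 2, 3), envir = env)
assign("x2", c(4, 5, 6), envir = env)
assign("z",  c(7, 8, 9), envir = env)   # ignored: does not match the pattern
result <- sum_by_prefix("x", env)
result
```

Passing the environment in keeps the lookup from silently picking up unrelated variables that happen to share the prefix elsewhere.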
You can check an example:
xx <- 1
xx2 <- 2
xx3 <- 3
#get the names of the variables containing xx
vars <- ls(pattern = 'xx')
#mget will get the variables from the names, unlist will add them in an atomic vector
sum(unlist(mget(vars)))
#[1] 6
A very naive solution could be:
# first 2 vectors are of interest
x1 <- c(1,2,3)
x2 <- c(4,5,6)
# answer doesn't need to have z sum in it
z <- c(7,8,9)
# create a dummy answer vector, initialized with all 0s; its length is the length of a single vector that we are adding
answer<-rep(0,length(x1))
# loop through each variable in current environment
for (var in ls()){
# see if variable name begins with x
if (startsWith(var,'x')){
# add it to our answer
answer = answer + get(var)
}
}
# print the answer
print(answer)
Suppose I have a matrix,
mat <- matrix((1:9)^2, 3, 3)
I can slice the matrix like so
> mat[2:3, 2]
[1] 25 36
How does one store the subscript as a variable? That is, what should my_sub be, such that
> mat[my_sub]
[1] 25 36
A list gets "invalid subscript type" error. A vector will lose the multidimensionality. Seems like such a basic operation to not have a primitive type that fits this usage.
I know I can access the matrix via vector addressing, which means converting from [2:3, 2] to c(5, 6), but that mapping presumes knowledge of matrix shape. What if I simply want [2:3, 2] for any matrix shape (assuming it is at least those dimensions)?
Here are some alternatives. They both generalize to higher dimensional arrays.
1) matrix subscripting If the indexes are all scalar except possibly one, as in the question, then:
mi <- cbind(2:3, 2)
mat[mi]
# test
identical(mat[mi], mat[2:3, 2])
## [1] TRUE
In higher dimensions:
a <- array(1:24, 2:4)
mi <- cbind(2, 2:3, 3)
a[mi]
# test
identical(a[mi], a[2, 2:3, 3])
## [1] TRUE
It would be possible to extend this to eliminate the scalar restriction using:
L <- list(2:3, 2:3)
array(mat[as.matrix(do.call(expand.grid, L))], lengths(L))
however, in light of (2) which also uses do.call but avoids the need for expand.grid it seems unnecessarily complex.
2) do.call This approach does not have the scalar limitation. mat and a are from above:
L2 <- list(2:3, 1:2)
do.call("[", c(list(mat), L2))
# test
identical(do.call("[", c(list(mat), L2)), mat[2:3, 1:2])
## [1] TRUE
L3 <- list(2, 2:3, 3:4)
do.call("[", c(list(a), L3))
# test
identical(do.call("[", c(list(a), L3)), a[2, 2:3, 3:4])
## [1] TRUE
This could be made prettier by defining:
`%[%` <- function(x, indexList) do.call("[", c(list(x), indexList))
mat %[% list(2:3, 1:2)
a %[% list(2, 2:3, 3:4)
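A stored index list can also stand for "all of this dimension" or carry extra arguments such as drop. A small sketch, relying on the fact that a logical TRUE index selects everything along a dimension (rows_2_3 and keep_dims are illustrative names):

```r
mat <- matrix((1:9)^2, 3, 3)

# TRUE as an index selects the whole dimension, so this is mat[2:3, ]
rows_2_3 <- list(2:3, TRUE)
all_cols <- do.call("[", c(list(mat), rows_2_3))

# Named entries are passed on to `[` as arguments,
# so this is mat[2:3, 2, drop = FALSE]
keep_dims <- list(2:3, 2, drop = FALSE)
one_col <- do.call("[", c(list(mat), keep_dims))

all_cols
one_col
```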
Use which with the argument arr.ind = TRUE.
x <- c(25, 36)
inx <- which(mat == x, arr.ind = TRUE)
Warning message:
In mat == x :
longer object length is not a multiple of shorter object length
mat[inx]
#[1] 25 36
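The warning appears because == recycles x along the matrix, so the lookup only works when the values happen to line up with the recycled positions. A sketch that tests membership instead, assuming the goal is to locate every value of x in the matrix (the array() call only restores the matrix shape that %in% drops):

```r
mat <- matrix((1:9)^2, 3, 3)
x <- c(25, 36)

# %in% returns a plain logical vector, so give it the matrix dimensions back
hits <- array(mat %in% x, dim(mat))
inx <- which(hits, arr.ind = TRUE)   # one (row, col) pair per match

inx
mat[inx]
```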
This is an interesting question. The subset function can actually help. You cannot subset your matrix directly with a single vector or list, but you can store the indexes in a list and use subset to do the trick.
mat <- matrix(1:12, nrow=4)
mat[2:3, 1:2]
# example using subset
subset(mat, subset = 1:nrow(mat) %in% 2:3, select = 1:2)
# double check
identical(mat[2:3, 1:2],
subset(mat, subset = 1:nrow(mat) %in% 2:3, select = 1:2))
# TRUE
Actually, we can write a custom function if we want to store the row- and column- indexes in the same list.
cust.subset <- function(mat, dim.list){
subset(mat, subset = 1:nrow(mat) %in% dim.list[[1]], select = dim.list[[2]])
}
# initialize a list that includes your sub-setting indexes
sbdim <- list(2:3, 1:2)
sbdim
# [[1]]
# [1] 2 3
# [[2]]
# [1] 1 2
# subset using your custom f(x) and your list
cust.subset(mat, sbdim)
# [,1] [,2]
# [1,] 2 6
# [2,] 3 7
I have a list of elements say:
l <- c("x","ya1","xb3","yb3","ab","xc3","y","xa1","yd4")
Out of this list I would like to make a list of the matching x,y pairs, i.e.
(("xa1" "ya1") ("xb3" "yb3") ("x" "y"))
In essence, I need to capture the X elements, the Y elements and then pair them up:
I know how to do the X,Y extraction part:
xelems <- grep("^x", l, perl=TRUE, value=TRUE)
yelems <- grep("^y", l, perl=TRUE, value=TRUE)
An X element pairs up with a Y element when
1. xElem == yElem # if xElem and yElem are one char long, i.e. 'x' and 'y'
2. substr(xElem,2,nchar(xElem)) == substr(yElem,2,nchar(yElem)) # otherwise, everything after the first character must match
There is no order, i.e. matching xElem and yElem can be positioned anywhere.
I am however not very sure about the next part. I am more familiar with the SKILL programming language (SKILL is a LISP derivative) and this is how I write it:
procedure( get_xy_pairs(inputList "l")
let(( yElem (xyPairs nil) xList yList)
xList=setof(i inputList rexMatchp("^x" i))
yList=setof(i inputList rexMatchp("^y" i))
when(xList && yList
unless(length(xList)==length(yList)
warn("xList and yList mismatch : %d vs %d\n" length(xList) length(yList))
)
foreach(xElem xList
if(xElem=="x"
then yElem="y"
else yElem=strcat("y" substring(xElem 2 strlen(xElem)))
)
if(member(yElem yList)
then xyPairs=cons(list(xElem yElem) xyPairs)
else warn("x element %s has no matching y element \n" xElem)
)
)
)
xyPairs
)
)
When run on l, this would return
get_xy_pairs(l)
*WARNING* x element xc3 has no matching y element
(("xa1" "ya1") ("xb3" "yb3") ("x" "y"))
As I am still new to R, I would appreciate it if you folks could help. Also, I understand that R folks tend to avoid for loops and prefer lapply?
Maybe something like this would work. (Only tested on your sample data.)
## Remove any item not starting with x or y
l2 <- l[grepl("^x|^y", l)]
## Split into a list of items starting with x
## and items starting with y
L <- split(l2, grepl("^x", l2))
## Give "names" to the "starting with y" group
names(L[[1]]) <- gsub("^y", "x", L[[1]])
## Use match to match the names in the y group with
## the values from the x group. This results in a
## nice named vector with the pairs you want
Matches <- L[[1]][match(L[[2]], names(L[[1]]), nomatch=0)]
Matches
# x xb3 xa1
# "y" "yb3" "ya1"
As a data.frame:
MatchesDF <- data.frame(x = names(Matches), y = unname(Matches))
MatchesDF
# x y
# 1 x y
# 2 xb3 yb3
# 3 xa1 ya1
I would store tuples in a list, i.e:
xypairs
[[1]]
[1] "x" "y"
[[2]]
[1] "xb3" "yb3"
Your procedure can be simplified with match and substring.
xends <- substring(xelems, 2)
yends <- substring(yelems, 2)
ypaired <- match(xends, yends) # Indices of yelems that match xelems
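# Now we need to handle the no-matches:
ysorted <- yelems[ypaired]                        # NA where an x has no matching y
yunpaired <- yelems[!(yelems %in% ysorted)]       # y elements that no x claimed
xsorted <- c(xelems, rep(NA, length(yunpaired)))  # pad x with one NA per unpaired y
ysorted <- c(ysorted, yunpaired)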
# Now we create the list of tuples:
xypairs <- lapply(1:length(ysorted), function(i) {
c(xsorted[i], ysorted[i])
})
Result:
xypairs
[[1]]
[1] "x" "y"
[[2]]
[1] "xb3" "yb3"
[[3]]
[1] "xc3" NA
[[4]]
[1] "xa1" "ya1"
[[5]]
[1] NA "yd4"
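For comparison, a fairly direct R rendering of the SKILL procedure from the question might look like the sketch below. It keeps only the complete pairs and warns about x elements without a partner; get_xy_pairs is the name from the question, everything else is illustrative.

```r
get_xy_pairs <- function(elems) {
  xs <- grep("^x", elems, value = TRUE)
  ys <- grep("^y", elems, value = TRUE)

  # the y name each x element expects: same text with the leading x swapped for y
  wanted <- sub("^x", "y", xs)
  found <- wanted %in% ys

  for (x in xs[!found]) {
    warning("x element ", x, " has no matching y element")
  }

  # one character vector of length 2 per matched pair
  mapply(c, xs[found], wanted[found], SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

l <- c("x", "ya1", "xb3", "yb3", "ab", "xc3", "y", "xa1", "yd4")
pairs <- suppressWarnings(get_xy_pairs(l))  # drop suppressWarnings() to see the xc3 warning
pairs
```

The pairs come back in the order the x elements appear in the input, which differs from the reversed order that cons produces in the SKILL version.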
Is there a way to implement list comprehension in R?
Like python:
sum([x for x in range(1000) if x % 3== 0 or x % 5== 0])
same in Haskell:
sum [x| x<-[1..1000-1], x`mod` 3 ==0 || x `mod` 5 ==0 ]
What's the practical way to apply this in R?
Nick
Something like this?
l <- 1:999  # range(1000) in the question stops at 999
sum(l[l %% 3 == 0 | l %% 5 == 0])
Yes, list comprehension is possible in R:
sum((1:1000)[(1:1000 %% 3) == 0 | (1:1000 %% 5) == 0])
And (kind of) the for-comprehension of Scala:
for(i in {x <- 1:100;x[x%%2 == 0]})print(i)
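Base R's functional helpers map fairly directly onto the pieces of a comprehension: Filter plays the role of the if clause and vapply (or sapply) the role of the output expression. A sketch for the question's example, using 0:999 to mirror Python's range(1000); is_multiple, total and squares are illustrative names:

```r
# [x for x in range(1000) if x % 3 == 0 or x % 5 == 0], then summed
is_multiple <- function(x) x %% 3 == 0 || x %% 5 == 0
total <- sum(Filter(is_multiple, 0:999))
total

# [x * x for x in range(10) if x % 2 == 0]
evens <- Filter(function(x) x %% 2 == 0, 0:9)
squares <- vapply(evens, function(x) x * x, numeric(1))
squares
```

For simple numeric filters the logical-index form shown in the other answers is usually shorter and faster; the functional form is mainly useful when the test or the output expression is not vectorized.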
This is many years later but there are three list comprehension packages now on CRAN. Each has slightly different syntax. In alphabetical order:
library(comprehenr)
sum(to_vec(for(x in 1:1000) if (x %% 3 == 0 | x %% 5 == 0) x))
## [1] 234168
library(eList)
Sum(for(x in 1:1000) if (x %% 3 == 0 | x %% 5 == 0) x else 0)
## [1] 234168
library(listcompr)
sum(gen.vector(x, x = 1:1000, x %% 3 == 0 | x %% 5 == 0))
## [1] 234168
In addition the following is on github only.
# devtools::install_github("mailund/lc")
library(lc)
sum(unlist(lc(x, x = seq(1000), x %% 3 == 0 | x %% 5 == 0)))
## [1] 234168
The foreach package by Revolution Analytics gives us a handy interface to list comprehensions in R. https://www.r-bloggers.com/list-comprehensions-in-r/
Example
Return every pair of numbers from the two lists whose members differ, as tuples:
Python
list_a = [1, 2, 3]
list_b = [2, 7]
different_num = [(a, b) for a in list_a for b in list_b if a != b]
print(different_num)
# Output:
[(1, 2), (1, 7), (2, 7), (3, 2), (3, 7)]
R
require(foreach)
list_a = c(1, 2, 3)
list_b = c(2, 7)
different_num <- foreach(a=list_a ,.combine = c ) %:% foreach(b=list_b) %:% when(a!=b) %do% c(a,b)
print(different_num)
# Output:
[[1]]
[1] 1 2
[[2]]
[1] 1 7
[[3]]
[1] 2 7
[[4]]
[1] 3 2
[[5]]
[1] 3 7
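The same nested comprehension can also be sketched in base R without extra packages, assuming it is acceptable to build the full grid of combinations first and filter afterwards (grid is an illustrative name):

```r
list_a <- c(1, 2, 3)
list_b <- c(2, 7)

# expand.grid varies its first column fastest, so putting b first
# reproduces the order of "for a in list_a for b in list_b"
grid <- expand.grid(b = list_b, a = list_a)
grid <- grid[grid$a != grid$b, ]          # the "if a != b" guard

different_num <- Map(c, grid$a, grid$b)   # one c(a, b) vector per remaining row
different_num
```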
EDIT:
The foreach package is very slow for certain tasks.
A faster list comprehension implementation is given at List comprehensions for R
. <<- structure(NA, class="comprehension")
comprehend <- function(expr, vars, seqs, guard, comprehension=list()){
if(length(vars)==0){ # base case of recursion
if(eval(guard)) comprehension[[length(comprehension)+1]] <- eval(expr)
} else {
for(elt in eval(seqs[[1]])){
assign(vars[1], elt, inherits=TRUE)
comprehension <- comprehend(expr, vars[-1], seqs[-1], guard,
comprehension)
}
}
comprehension
}
## List comprehensions specified by close approximation to set-builder notation:
##
## { x+y | 0<x<9, 0<y<x, x*y<30 } ---> .[ x+y ~ {x<-0:9; y<-0:x} | x*y<30 ]
##
"[.comprehension" <- function(x, f,rectangularizing=T){
f <- substitute(f)
## First, we pluck out the optional guard, if it is present:
if(is.call(f) && is.call(f[[3]]) && f[[3]][[1]]=='|'){
guard <- f[[3]][[3]]
f[[3]] <- f[[3]][[2]]
} else {
guard <- TRUE
}
## To allow omission of braces around a lone comprehension generator,
## as in 'expr ~ var <- seq' we make allowances for two shapes of f:
##
## (1) (`<-` (`~` expr
## var)
## seq)
## and
##
## (2) (`~` expr
## (`{` (`<-` var1 seq1)
## (`<-` var2 seq2)
## ...
## (`<-` varN seqN)))
##
## In the former case, we set gens <- list(var <- seq), unifying the
## treatment of both shapes under the latter, more general one.
syntax.error <- "Comprehension expects 'expr ~ {x1 <- seq1; ... ; xN <- seqN}'."
if(!is.call(f) || (f[[1]]!='<-' && f[[1]]!='~'))
stop(syntax.error)
if(is(f,'<-')){ # (1)
lhs <- f[[2]]
if(!is.call(lhs) || lhs[[1]] != '~')
stop(syntax.error)
expr <- lhs[[2]]
var <- as.character(lhs[[3]])
seq <- f[[3]]
gens <- list(call('<-', var, seq))
} else { # (2)
expr <- f[[2]]
gens <- as.list(f[[3]])[-1]
if(any(lapply(gens, class) != '<-'))
stop(syntax.error)
}
## Fill list comprehension .LC
vars <- as.character(lapply(gens, function(g) g[[2]]))
seqs <- lapply(gens, function(g) g[[3]])
.LC <- comprehend(expr, vars, seqs, guard)
## Provided the result is rectangular, convert it to a vector or array
if(!rectangularizing) return(.LC)
tryCatch({
if(!length(.LC))
return(.LC)
dim1 <- dim(.LC[[1]])
if(is.null(dim1)){
lengths <- sapply(.LC, length)
if(all(lengths == lengths[1])){ # rectangular
.LC <- unlist(.LC)
if(lengths[1] > 1) # matrix
dim(.LC) <- c(lengths[1], length(lengths))
} else { # ragged
# leave .LC as a list
}
} else { # elements of .LC have dimension
dim <- c(dim1, length(.LC))
.LC <- unlist(.LC)
dim(.LC) <- dim
}
return(.LC)
}, error = function(err) {
return(.LC)
})
}
This implementation is faster than foreach; it allows nested comprehensions, multiple parameters, and parameter scoping.
N <- list(10,20)
.[.[c(x,y,z)~{x <- 2:n;y <- x:n;z <- y:n} | {x^2+y^2==z^2 & z<15}]~{n <- N}]
[[1]]
[[1]][[1]]
[1] 3 4 5
[[1]][[2]]
[1] 6 8 10
[[2]]
[[2]][[1]]
[1] 3 4 5
[[2]][[2]]
[1] 5 12 13
[[2]][[3]]
[1] 6 8 10
Another way
sum((l <- 1:1000)[l %% 3 == 0 | l %% 5 == 0])
I hope it's okay to self-promote my package listcompr which implements a list comprehension syntax for R.
The example from the question can be solved in the following way:
library(listcompr)
sum(gen.vector(x, x = 1:1000, x %% 3 == 0 || x %% 5 == 0))
## Returns: 234168
As listcompr does a row-wise (and not a vector-wise) evaluation of the conditions, it makes no difference whether || or | is used as the logical operator. It accepts arbitrarily many arguments: first, a base expression which is transformed into the list or vector entries; next, arbitrarily many arguments which specify the variable ranges and the conditions.
More examples can be found on the readme page on the github repository of listcompr: https://github.com/patrickroocks/listcompr
For a strict mapping from Python to R, this might be the most direct equivalence:
Python:
sum([x for x in range(1000) if x % 3== 0 or x % 5== 0])
R:
sum((x <- 0:999)[x %% 3 == 0 | x %% 5 == 0])
One important difference: the R version works like Python 2 where the x variable is globally scoped outside of the expression. (I call it an "expression" here since R does not have the notion of "list comprehension".) In Python 3, the iterator is restricted to the local scope of the list comprehension. In other words:
In R (as in Python 2), the x variable persists after the expression. If it existed before the expression, then its value is changed to the final value of the expression.
In Python 3, the x variable exists only within the list comprehension. If there was an x variable created before the list comprehension, the list comprehension does not change it at all.
This list comprehension of the form:
[item for item in list if test]
is pretty straightforward with boolean indexing in R. But for more complex expressions, like implementing vector rescaling (I know this can be done with scales package too), in Python it's easy:
x = [1, 3, 5, 7, 9, 11] # -> [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
[(xi - min(x))/(max(x) - min(x)) for xi in x]
But in R this is the best I could come up with. Would love to know if there's something better:
sapply(x, function(xi, mn, mx) {(xi-mn)/(mx-mn)}, mn = min(x), mx = max(x))
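Because arithmetic in R is already element-wise, the rescaling can probably be written without sapply at all. A minimal sketch using the same x as the Python example (rescaled is an illustrative name):

```r
x <- c(1, 3, 5, 7, 9, 11)

# min(x) and max(x) are scalars that get recycled across every element of x
rescaled <- (x - min(x)) / (max(x) - min(x))
rescaled
```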
You could convert a sequence of random numbers to a binary sequence as follows:
x=runif(1000)
y=NULL
for (i in x){if (i>.5){y<-c(y,1)}else{y=c(y,-1)}}
This could be generalized to map any list to another list, following the pattern:
x = [item for item in x if test == True]
where the else branch can be left out so that items failing the test are simply not appended to y.
For the problem at hand:
x <- 0:999
y <- NULL
for (i in x){ if (i %% 3 == 0 | i %% 5 == 0){ y <- c(y, i) }}
sum( y )
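Growing y with c() inside a loop copies the vector on every iteration. As a sketch of the vectorized equivalents of the two loops above (set.seed is only there to make the example repeatable, and z is an illustrative name):

```r
set.seed(1)
x <- runif(1000)

# map each element to 1 or -1 in one call instead of appending in a loop
y <- ifelse(x > 0.5, 1, -1)

# filter with a logical index instead of conditionally appending
z <- 0:999
total <- sum(z[z %% 3 == 0 | z %% 5 == 0])
total
```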
Say I have a list:
> fs
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
[1] 61.90298 58.29699 54.90104 51.70293 48.69110
I want to "reverse fill" the rest of the list by using its values. Example:
The [[3]] should have the function value of [[4]] pairs:
c( myFunction(fs[[4]][1], fs[[4]][2]), myFunction(fs[[4]][2], fs[[4]][3]), .... )
The [[2]] should have myFunction values of [[3]] etc...
I hope that's clear. What's the right way to do it? For loops? *applys? My last attempt, which leaves 1-3 empty:
n = length(fs)
for (i in rev(1:(n-1)))
child_fs = fs[[i+1]]
res = c()
for (j in 1:(i+1))
up = v(child_fs[j])
do = v(child_fs[j+1])
this_f = myFunction(up, do)
res[j] = this_f
fs[[i]] = res
Make fs easily reproducible
fs <- list(NULL, NULL, NULL, c(61.90298, 58.29699, 54.90104, 51.70293, 48.69110))
To be able to show an example, make a trivial myFunction
myFunction <- function(a, b) {a + b}
You can loop over all but the last position in fs (in reverse order) and compute each one. Just call myFunction on two slices of the vector at the next higher position: one without its last element and one without its first.
for (i in rev(seq_along(fs))[-1]) {
fs[[i]] <- myFunction(head(fs[[i+1]], -1), tail(fs[[i+1]], -1))
}
That assumes myFunction is vectorized (given vectors for inputs, will give a vector for output). If it isn't, you can easily make a version which is.
myFunction <- function(a, b) {a[[1]] + b[[1]]}
for (i in rev(seq_along(fs))[-1]) {
fs[[i]] <- Vectorize(myFunction)(head(fs[[i+1]], -1), tail(fs[[i+1]], -1))
}
In either case, you get
> fs
[[1]]
[1] 453.2 426.8
[[2]]
[1] 233.398 219.802 206.998
[[3]]
[1] 120.200 113.198 106.604 100.394
[[4]]
[1] 61.90298 58.29699 54.90104 51.70293 48.69110
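Since addition is symmetric, the output above does not reveal which neighbour ends up as which argument. The sketch below wraps the loop in a function and uses a deliberately non-symmetric function to make the order visible; reverse_fill and the small input are made up for illustration, and mapply is used so that the function does not need to be vectorized.

```r
# Fill every NULL slot from the one after it: element j of slot i is
# f(next[j], next[j + 1]).
reverse_fill <- function(fs, f) {
  for (i in rev(seq_along(fs))[-1]) {
    nxt <- fs[[i + 1]]
    fs[[i]] <- mapply(f, head(nxt, -1), tail(nxt, -1))
  }
  fs
}

# up - do is not symmetric, so swapped arguments would flip the signs
out <- reverse_fill(list(NULL, NULL, c(10, 7, 3)),
                    function(up, do) up - do)
out
```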
Really, what you have is a starting point
start <- c(61.90298, 58.29699, 54.90104, 51.70293, 48.69110)
a function you want to apply (I made this one up which adds 1 everywhere and deletes the last element)
myFunction <- function(x) head(x + 1, -1L)
and the number of times you want to apply the function (recursively):
n <- 3L
So I would write a function to apply the function n times recursively, then reverse the output list:
apply.n.times <- function(fun, n, x)
if (n == 0L) list(x) else c(list(x), Recall(fun, n - 1L, fun(x)))
rev(apply.n.times(myFunction, n, start))
# [[1]]
# [1] 64.90298 61.29699
#
# [[2]]
# [1] 63.90298 60.29699 56.90104
#
# [[3]]
# [1] 62.90298 59.29699 55.90104 52.70293
#
# [[4]]
# [1] 61.90298 58.29699 54.90104 51.70293 48.69110
Here is a one-line solution (if myFunction can be replaced with something like sum, or in this case rowSums):
Reduce( function(x,y) rowSums( embed(y,2) ), fs, right=TRUE, accumulate=TRUE )
If myFunction needs to accept 2 values and do something with them then this can be expanded a bit to:
# each row of embed(y,2) is (y[j+1], y[j]), so swap the columns to call myFunction(y[j], y[j+1])
Reduce( function(x,y) apply( embed(y,2), 1, function(z) myFunction(z[2],z[1]) ),
fs, right=TRUE, accumulate=TRUE )