I have a two-argument function. Its first argument is a triple of pairs of numbers written as a character string of the form "(a, b)(c, d)(e, f)"; its second argument is a single pair, also a character string, of the form "(a, b)". It returns a logical stating whether the pair (the second argument) is one of the three pairs in the triple (the first argument). I actually wrote two versions:
version1 <- function(x, y){#x is a triple of pairs, y is a pair
pairsfromthistriple <- paste(c("", "(", "("), strsplit(x, split = ")(", fixed = T)[[1]], c(")", ")", ""), sep = "")
y %in% pairsfromthistriple
}
version2 <- function(x, y){#x is triple of pairs, y is pair
y == substr(x, 1, 6) | y == substr(x, 7, 12) | y == substr(x, 13, 18)
}
I want to set this function loose on every triple of pairs from a vector of triples and every pair from some vector of pairs, using outer. Here I'll use the following very short vectors:
triples <- c("(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)")
names(triples) <- triples
pairs <- c("(5, 6)", "(3, 5)")
names(pairs) <- pairs
So here we go:
test1 <- outer(X = triples, Y = pairs, FUN = version1)
test2 <- outer(X = triples, Y = pairs, FUN = version2)
test2 evaluates to exactly what you would expect, but test1 gives nonsensical output:
> test1
(5, 6) (3, 5)
(1, 2)(3, 4)(5, 6) TRUE FALSE
(1, 2)(3, 5)(4, 6) TRUE FALSE
> test2
(5, 6) (3, 5)
(1, 2)(3, 4)(5, 6) TRUE FALSE
(1, 2)(3, 5)(4, 6) FALSE TRUE
The natural conclusion is that there is an error in version1, but it is not as simple as that. 'Manually' computing the terms in the matrix using version1 gives:
> version1(triples[1], pairs[1])
[1] TRUE
> version1(triples[1], pairs[2])
[1] FALSE
> version1(triples[2], pairs[1])
[1] FALSE
> version1(triples[2], pairs[2])
[1] TRUE
exactly as it should! So at least part of the fault is with the function outer. In fact what happens (in this small example it is not so clear, but this is very visible in larger examples) is that outer correctly computes the first row of its output matrix, but then copies this first row over and over to make up the subsequent rows. Obviously this is not what I want. If I only wanted to compute version1(x, y) for all y in some vector but just one single x, I would have used sapply rather than outer.
What is going on here?
Note this detail from the documentation for ?outer:
X and Y must be suitable arguments for FUN. Each will be extended by rep to length the products of the lengths of X and Y before FUN is called.
FUN is called with these two extended vectors as arguments (plus any arguments in ...). It must be a vectorized function (or the name of one) expecting at least two arguments and returning a value with the same length as the first (and the second).
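Your version1 function is not vectorized the way version2 is. You can see this by calling each function directly on the original triples and pairs vectors. A vectorized function compares them element by element, so both positions should come back TRUE, as they do for version2; version1 only ever splits the first triple.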
version1(triples, pairs)
#> [1] TRUE FALSE
version2(triples, pairs)
#> (5, 6) (3, 5)
#> TRUE TRUE
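The mechanism described in the documentation can be mimicked outside R. What follows is only a rough Python sketch of the expand-then-call-once behaviour, not R's actual implementation; outer_like, split_pairs, first_only and elementwise are hypothetical names made up for this illustration.

```python
def outer_like(xs, ys, fun):
    """Sketch of outer(): expand both inputs, call fun ONCE, reshape."""
    big_x = [x for _ in ys for x in xs]   # like rep(X, times = length(Y))
    big_y = [y for y in ys for _ in xs]   # like rep(Y, each = length(X))
    flat = fun(big_x, big_y)              # a single call on the long vectors
    assert len(flat) == len(big_x), "fun must return one value per element"
    # Fill the matrix column by column, as R does.
    return [[flat[j * len(xs) + i] for j in range(len(ys))]
            for i in range(len(xs))]

def split_pairs(triple):
    # "(1, 2)(3, 4)(5, 6)" -> ["(1, 2)", "(3, 4)", "(5, 6)"]
    return [part + ")" for part in triple.split(")") if part]

def first_only(xs, ys):
    # Mirrors version1: only the FIRST triple is ever split.
    pairs_of_first = split_pairs(xs[0])
    return [y in pairs_of_first for y in ys]

def elementwise(xs, ys):
    # Mirrors version2: the i-th pair is checked against the i-th triple.
    return [y in split_pairs(x) for x, y in zip(xs, ys)]

triples = ["(1, 2)(3, 4)(5, 6)", "(1, 2)(3, 5)(4, 6)"]
pairs = ["(5, 6)", "(3, 5)"]

bad = outer_like(triples, pairs, first_only)    # [[True, False], [True, False]]
good = outer_like(triples, pairs, elementwise)  # [[True, False], [False, True]]
```

Under these assumptions, first_only reproduces the repeated first row seen in test1, while elementwise gives the matrix seen in test2.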
Your version1 function seems designed for use with apply(), because you retrieve a list from strsplit() but then just take the first element. If you want to maintain the approach of splitting the vector, then you would have to use the apply family of functions. Without using them, you are going to expand the triples or x vector into something much longer than y and you can't do element wise comparison.
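However, I would just use something very simple. stringr::str_detect is already vectorized over both string and pattern, so you can use it directly. (Note that str_detect treats the pattern as a regular expression, so the parentheses in a pair act as a grouping construct rather than as literal characters. That still gives the right answer for this data; wrapping the pattern in stringr::fixed() forces a literal match.)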
library(stringr)
outer(X = triples, Y = pairs, FUN = str_detect)
#> (5, 6) (3, 5)
#> (1, 2)(3, 4)(5, 6) TRUE FALSE
#> (1, 2)(3, 5)(4, 6) FALSE TRUE
I have an array x and I would like to repeat each entry of x a number of times specified by the corresponding entry of another array y, which has the same length as x.
x = [1, 2, 3, 4, 5] # Array to be repeated
y = [3, 2, 1, 2, 3] # Repetitions for each element of x
# result should be [1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5]
Is there a way to do this in Julia?
Your x and y vectors constitute what is called a run-length encoding of the vector [1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5]. So if you take the inverse of the run-length encoding, you will get the vector you are looking for. The StatsBase.jl package contains the rle and inverse_rle functions. We can use inverse_rle like this:
julia> using StatsBase
julia> x = [1, 2, 3, 4, 5];
julia> y = [3, 2, 1, 2, 3];
julia> inverse_rle(x, y)
11-element Vector{Int64}:
1
1
1
2
2
3
4
4
5
5
5
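For readers who want to see what the decoding amounts to, here is a small Python sketch of run-length encoding and its inverse. The function names only mirror the StatsBase.jl ones for readability; this is an illustration, not how StatsBase implements them.

```python
from itertools import chain, groupby, repeat

def rle(seq):
    """Run-length encode: return (values, counts) for consecutive runs."""
    values, counts = [], []
    for value, run in groupby(seq):
        values.append(value)
        counts.append(sum(1 for _ in run))
    return values, counts

def inverse_rle(values, counts):
    """Repeat values[i] exactly counts[i] times and concatenate the runs."""
    if len(values) != len(counts):
        raise ValueError("values and counts must have the same length")
    return list(chain.from_iterable(repeat(v, n) for v, n in zip(values, counts)))

x = [1, 2, 3, 4, 5]
y = [3, 2, 1, 2, 3]
decoded = inverse_rle(x, y)  # [1, 1, 1, 2, 2, 3, 4, 4, 5, 5, 5]
```

Encoding the decoded list again with rle gives back x and y, which is what "inverse" means here.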
You've given what I would have suggested as the answer already in your comment:
vcat(fill.(x, y)...)
How does this work? Start with fill:
help?> fill
fill(x, dims::Tuple)
fill(x, dims...)
Create an array filled with the value x. For example, fill(1.0, (5,5)) returns a 5×5 array of floats, with each element initialized to 1.0.
This is a bit more complicated than it needs to be for our case (where we only have one dimension to fill into), so let's look at a simple example:
julia> fill(1, 3)
3-element Vector{Int64}:
1
1
1
so fill(1, 3) just means "take the number one, and put this number into a one-dimensional array 3 times."
This of course is exactly what we want to do here: for every element in x, we want an array that holds this element multiple times, with the multiple given by the corresponding element in y. We could therefore loop over x and y and do something like:
julia> for (xᵢ, yᵢ) ∈ zip(x, y)
fill(xᵢ, yᵢ)
end
Now this loop doesn't return anything, so we'd have to preallocate some storage and assign to that within the loop. A more concise way of writing this while automatically returning an object would be a comprehension:
julia> [fill(xᵢ, yᵢ) for (xᵢ, yᵢ) ∈ zip(x, y)]
5-element Vector{Vector{Int64}}:
[1, 1, 1]
[2, 2]
[3]
[4, 4]
[5, 5, 5]
and even more concisely, we can just use broadcasting:
julia> fill.(x, y)
5-element Vector{Vector{Int64}}:
[1, 1, 1]
[2, 2]
[3]
[4, 4]
[5, 5, 5]
so from the comprehension or the broadcast we are getting a vector of vectors, each vector being an element of x repeated y times. Now all that remains is to put these together into a single vector by concatenating them vertically:
julia> vcat(fill.(x, y)...)
11-element Vector{Int64}:
1
1
1
2
2
3
4
4
5
5
5
Here we are using splatting to essentially do:
z = fill.(x, y)
vcat(z[1], z[2], z[3], z[4], z[5])
Note that splatting can have suboptimal performance for arrays of variable length, so a better way is to use reduce which is special cased for this and will give the same result:
reduce(vcat, fill.(x, y))
If performance is a priority, you can also do it the long, manual way:
function runlengthdecode(vals::Vector{T}, reps::Vector{<:Integer}) where T
    length(vals) == length(reps) || throw(ArgumentError("Same number of values and counts expected"))
    result = Vector{T}(undef, sum(reps))
    resind = 1
    for (valind, numrep) in enumerate(reps)
        for i in 1:numrep
            @inbounds result[resind] = vals[valind]
            resind += 1
        end
    end
    result
end
This runs about 12 times faster than the vcat/fill based method for the given data, likely because of avoiding creating all the intermediate filled vectors.
You can also use fill! on views of the preallocated result (created with @view), by replacing the loop in the code above with:
for (val, numrep) in zip(vals, reps)
    fill!(@view(result[resind:resind + numrep - 1]), val)
    resind += numrep
end
which has comparable performance.
Also, for completeness, a comprehension can be quite handy for this. And it's faster than fill and vcat.
julia> [x[i] for i=1:length(x) for j=1:y[i]]
11-element Vector{Int64}:
1
1
1
2
2
3
4
4
5
5
5
Can I pass a custom compare function to order that, given two items, indicates which one is ranked higher?
In my specific case I have the following list.
scores <- list(
'a' = c(1, 1, 2, 3, 4, 4),
'b' = c(1, 2, 2, 2, 3, 4),
'c' = c(1, 1, 2, 2, 3, 4),
'd' = c(1, 2, 3, 3, 3, 4)
)
If we take two vectors a and b, the index of the first element i at which a[i] > b[i] or a[i] < b[i] should determine what vector comes first. In this example, scores[['d']] > scores[['a']] because scores[['d']][2] > scores[['a']][2] (note that it doesn't matter that scores[['d']][5] < scores[['a']][5]).
Comparing two of those vectors could look something like this.
compare <- function(a, b) {
# get first element index at which vectors differ
i <- which.max(a != b)
if(a[i] > b[i])
1
else if(a[i] < b[i])
-1
else
0
}
The sorted keys of scores by using this comparison function should then be d, b, a, c.
The other solutions I've found either transform the data before ordering or introduce S3 classes with comparison methods. With the former, I fail to see how to transform my data (maybe turn it into strings? But then what about numbers above 9?); with the latter, I feel uncomfortable introducing a new class into my R package only for comparing vectors. And order doesn't seem to have the kind of comparator parameter I'd want to pass.
Here's an attempt. I've explained every step in the comments.
compare <- function(a, b) {
  # subtract vector b from vector a
  comparison <- a - b
  # get the first non-zero difference (NA if the vectors are equal)
  result <- comparison[comparison != 0][1]
  # return 1 if a wins (positive difference), 2 if b wins (negative), 0 if equal
  if(is.na(result)) {return(0)} else if(result > 0) {return(1)} else {return(2)}
}
compare_list <- function(list_) {
# get combinations of all possible comparison
comparisons <- combn(length(list_), 2)
# compare all possibilities
results <- apply(comparisons, 2, function(x) {
# get the "winner"
x[compare(list_[[x[1]]], list_[[x[2]]])]
})
# get frequency table (how often a vector "won" -> this is the result you want)
fr_tab <- table(results)
# vector that is last in comparison
last_vector <- which(!(1:length(list_) %in% as.numeric(names(fr_tab))))
# return the sorted results and add the last vectors name
c(as.numeric(names(sort(fr_tab, decreasing = T))), last_vector)
}
If you run the function on your example, the result is
> compare_list(scores)
[1] 4 2 1 3
I haven't dealt with the case where two vectors are identical, since you haven't explained how it should be handled.
The native R way to do this is to introduce an S3 class.
There are two things you can do with the class. You can define a method for xtfrm that converts your list entries to numbers. That could be vectorized, and conceivably could be really fast.
But you were asking for a user defined compare function. This is going to be slow because R function calls are slow, and it's a little clumsy because nobody does it. But following the instructions in the xtfrm help page, here's how to do it:
scores <- list(
'a' = c(1, 1, 2, 3, 4, 4),
'b' = c(1, 2, 2, 2, 3, 4),
'c' = c(1, 1, 2, 2, 3, 4),
'd' = c(1, 2, 3, 3, 3, 4)
)
# Add a class to the list
scores <- structure(scores, class = "lexico")
# Need to keep the class when subsetting
`[.lexico` <- function(x, i, ...) structure(unclass(x)[i], class = "lexico")
# Careful here: identical() might be too strict
`==.lexico` <- function(a, b) {identical(a, b)}
`>.lexico` <- function(a, b) {
a <- a[[1]]
b <- b[[1]]
i <- which(a != b)
length(i) > 0 && a[i[1]] > b[i[1]]
}
is.na.lexico <- function(a) FALSE
sort(scores)
#> $c
#> [1] 1 1 2 2 3 4
#>
#> $a
#> [1] 1 1 2 3 4 4
#>
#> $b
#> [1] 1 2 2 2 3 4
#>
#> $d
#> [1] 1 2 3 3 3 4
#>
#> attr(,"class")
#> [1] "lexico"
Created on 2021-11-27 by the reprex package (v2.0.1)
This is the opposite of the order you asked for, because by default sort() sorts in increasing order. If you really want d, b, a, c, use sort(scores, decreasing = TRUE).
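For comparison, this is what a user-defined comparator looks like in a language whose sort accepts one. The sketch below is Python, not R; compare follows the asker's rule (the first differing position decides), and the data are the scores from the question.

```python
from functools import cmp_to_key

scores = {
    "a": [1, 1, 2, 3, 4, 4],
    "b": [1, 2, 2, 2, 3, 4],
    "c": [1, 1, 2, 2, 3, 4],
    "d": [1, 2, 3, 3, 3, 4],
}

def compare(a, b):
    """Return 1, -1 or 0 based on the first position where a and b differ."""
    for ai, bi in zip(a, b):
        if ai != bi:
            return 1 if ai > bi else -1
    return 0  # no differing position found

# Sort the keys by comparing the vectors they refer to, highest first.
by_comparator = sorted(
    scores,
    key=cmp_to_key(lambda k1, k2: compare(scores[k1], scores[k2])),
    reverse=True,
)  # ['d', 'b', 'a', 'c']

# Python lists already compare lexicographically, so no comparator is needed.
by_builtin = sorted(scores, key=scores.get, reverse=True)
```

Both calls give d, b, a, c. The second works because this particular rule happens to be plain lexicographic order.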
Here's another, very simple solution:
sort(sapply(scores, function(x) as.numeric(paste(x, collapse = ""))), decreasing = T)
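What it does is take each vector, "compress" it into a single number by pasting its digits together, and then sort those numbers in decreasing order. Note that this only works when every entry is a single digit and all vectors have the same length, so it does not cover the numbers above 9 mentioned in the question.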
I have a list of vectors in a vector space over Q of dimension 5. I want to put them in a list and use Combinations(list, 4) to get all sublists with 4 elements. I then want to check how many of those sublists are linearly independent in the vector space, using V.linear_dependence(vs) == [].
I'm running into an error when running my code:
V = VectorSpace(QQ,5)
V.list = ([2, 2, 2,-3,-3],[2, 2,-3,2,-3],[2,2,-3,-3,2],[2,-3,2,2,-3],[2,-3,2,-3,2],[2,-3,-3,2,2],[-3,2,2,2,-3],[-3,2,2,-3,2],[-3,2,-3,2,2],[-3,-3,2,2,2])
C = Combinations(list, 4)
V.linear_dependence(C) == []
"ValueError: vector [[2, 2, 2, -3, -3], [2, 2, -3, 2, -3], [2, 2, -3, -3, 2], [2, -3, 2, 2, -3]] is not an element of Vector space of dimension 5 over Rational Field"
Anyone got any clues as to what I'm missing?
You are asking it to just take a list (or actually, tuple) and put it in the vector space, but I think Sage doesn't do that automatically. Try this.
V = VectorSpace(QQ,5)
list = ([2, 2, 2,-3,-3],[2, 2,-3,2,-3],[2,2,-3,-3,2],[2,-3,2,2,-3],[2,-3,2,-3,2],[2,-3,-3,2,2],[-3,2,2,2,-3],[-3,2,2,-3,2],[-3,2,-3,2,2],[-3,-3,2,2,2])
C = Combinations(list, 4)
for c in C:
    if V.linear_dependence([V(x) for x in c]) == []: print(c)
The list comprehension inside the loop is needed because neither a combination nor the lists inside it are elements of the vector space; each inner list has to be converted with V(x) first.
A slight modification to this, replacing the print call with z+=1 (having predefined z=0), says that 185 of your 210 combinations appear to be linearly independent.
By the way, comparing to the empty list might not be as efficient as other options.
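One alternative to comparing against the empty list is to check whether the four vectors have rank 4. The following plain-Python sketch does that with exact rational arithmetic and no Sage at all; rank is a hypothetical helper written for this illustration, not a Sage function.

```python
from fractions import Fraction
from itertools import combinations

def rank(rows):
    """Rank of a list of vectors via Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivot rows found so far
    ncols = len(m[0]) if m else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

vectors = [
    [2, 2, 2, -3, -3], [2, 2, -3, 2, -3], [2, 2, -3, -3, 2], [2, -3, 2, 2, -3],
    [2, -3, 2, -3, 2], [2, -3, -3, 2, 2], [-3, 2, 2, 2, -3], [-3, 2, 2, -3, 2],
    [-3, 2, -3, 2, 2], [-3, -3, 2, 2, 2],
]

subsets = list(combinations(vectors, 4))
independent = sum(1 for c in subsets if rank(c) == 4)
print(independent, "of", len(subsets))  # 185 of 210
```

This agrees with the count reported above. Inside Sage, the same idea would use the rank of a matrix built from each combination.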
My question is about getting rid of a for loop while retaining the functionality of the code.
I have a matrix of pairwise orderings of elements A_1, A_2, ... A_N. Each ordering is represented as a row of a matrix. The code below shows an example.
# Matrix representing the relations
# A1 < A2, A1 < A5, A2 < A4
(mat <- matrix(c(1, 2, 1, 5, 2, 4), ncol = 2, byrow = TRUE))
#> [,1] [,2]
#> [1,] 1 2
#> [2,] 1 5
#> [3,] 2 4
I want this whole matrix as a set of ordered pairs. The reason is that I later need to generate the transitive closure of these relations. I have been using the sets package and created the function below.
create_sets <- function(mat){
# Empty set
my_set <- sets::set()
# For loop for adding pair elements to the set, one at a time
for(i in seq(from = 1, to = nrow(mat), by = 1)){
my_set <- sets::set_union(my_set,
sets::pair(mat[[i, 1]], mat[[i, 2]]))
}
return(my_set)
}
create_sets(mat)
#> {(1, 2), (1, 5), (2, 4)}
This function works well, but I believe the for loop is unnecessary, and I have not been able to replace it. For the particular example matrix above with exactly three rows, I could instead have used the following code:
my_set2 <- sets::set(
sets::pair(mat[[1, 1]], mat[[1, 2]]),
sets::pair(mat[[2, 1]], mat[[2, 2]]),
sets::pair(mat[[3, 1]], mat[[3, 2]])
)
my_set2
#> {(1, 2), (1, 5), (2, 4)}
The reason this works is that sets::set takes any number of pairs.
args(sets::set)
#> function (...)
#> NULL
However, the matrix mat will have an arbitrary number of rows, and I want the function to be able to handle all possible cases. This is why I have not been able to get rid of the for loop.
My question is hence: Given a matrix mat in which each row represents an ordered pair, is there some generic way of passing the pairs in each row as separate arguments to sets::set, without looping?
The OP has asked
[...] is there some generic way of passing the pairs in each row as separate arguments to sets::set, without looping?
Yes, the do.call() function is probably what you are looking for. From help(do.call):
do.call constructs and executes a function call from a name or a function and a list of arguments to be passed to it.
So, OP's create_sets() function can be replaced by
do.call(sets::set, apply(mat, 1, function(x) sets::pair(x[1], x[2])))
{(1, 2), (1, 5), (2, 4)}
The second argument to do.call() requires a list. This is created by
apply(mat, 1, function(x) sets::pair(x[1], x[2]))
which returns the list
[[1]]
(1, 2)
[[2]]
(1, 5)
[[3]]
(2, 4)
apply(mat, 1, FUN) is a kind of implied for loop which loops over the rows of a matrix mat and takes the vector of row values as argument when calling function FUN.
Edit: as.tuple() instead of pair()
The pair() function requires exactly two arguments. This is why we were forced to define an anonymous function function(x) sets::pair(x[1], x[2]).
The as.tuple() function coerces an object into a tuple, with the elements of the object becoming the elements of the tuple. So, the code can be simplified even further:
do.call(sets::set, apply(mat, 1, sets::as.tuple))
{(1, 2), (1, 5), (2, 4)}
Here, as.tuple() takes the whole vector of row values and coerces it to a tuple.
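The idea behind do.call, turning a list into separate arguments of a single call, exists in other languages as argument unpacking. The following is a Python sketch of that idea only; make_set is a hypothetical stand-in for a variadic constructor like sets::set, and Python tuples stand in for pairs.

```python
def make_set(*pairs):
    """Hypothetical variadic constructor: accepts any number of pairs."""
    return set(pairs)

mat = [[1, 2], [1, 5], [2, 4]]        # one ordered pair per row
pairs = [tuple(row) for row in mat]   # plays the role of apply(mat, 1, as.tuple)

# The star spreads the list into separate arguments, like do.call does.
result = make_set(*pairs)             # {(1, 2), (1, 5), (2, 4)}
```

The call works for any number of rows, because the length of the list decides how many arguments are passed.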
Option 1: do nothing
for loops aren't always the end of the world; this one doesn't look too bad if your matrices aren't enormous.
Option 2: the split, apply, combine way (by way of a new function)
Write a function that turns one row into a pair (there is a shorter way to do this, but this makes your task explicit):
f <- function(x) {
sets::pair(x[1], x[2])
}
Reduce(sets::set_union, lapply(split(mat, 1:nrow(mat)), f))
## {(1, 2), (1, 5), (2, 4)}
Reduce does the same thing as the for loop (it repeatedly applies set_union), and lapply turns the matrix into a list of pairs (also as a for loop would).
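That a fold and an explicit loop do the same work can be checked in a few lines. This is a Python sketch of the equivalence, with Python sets and tuples standing in for the sets package objects.

```python
from functools import reduce

mat = [[1, 2], [1, 5], [2, 4]]
pairs = [tuple(row) for row in mat]   # the "split, apply" step

# Explicit loop: grow the set one pair at a time.
looped = set()
for p in pairs:
    looped = looped | {p}

# Fold: the same repeated union, written as a single reduce call.
folded = reduce(lambda acc, p: acc | {p}, pairs, set())
```

Both variables end up holding the same set, so choosing between them is a matter of style, not of result.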