Apply a 1-channel mask to a 3-channel Tensor in TensorFlow

I'm trying to apply a mask (binary, only one channel) to an RGB image (3 channels, normalized to [0, 1]). My current solution is to split the RGB image into its channels, multiply each channel by the mask, and concatenate the channels again:
with tf.variable_scope('apply_mask') as scope:
    # Output mask is in range [-1, 1], bring to range [0, 1] first
    zero_one_mask = (output_mask + 1) / 2
    # Apply mask to all channels.
    channels = tf.split(3, 3, output_img)
    channels = [tf.mul(c, zero_one_mask) for c in channels]
    output_img = tf.concat(3, channels)
However, this seems pretty inefficient, especially since, to my understanding, none of these computations are done in-place. Is there a more efficient way to do this?

The tf.mul() operator supports NumPy-style broadcasting, which lets you simplify and slightly optimize the code.
Let's say that zero_one_mask is an m x n tensor, and output_img is a b x m x n x 3 tensor (where b is the batch size; I'm inferring this from the fact that you split output_img on dimension 3)*. You can use tf.expand_dims() to make zero_one_mask broadcastable across the channels, by reshaping it to be an m x n x 1 tensor:
with tf.variable_scope('apply_mask') as scope:
    # Output mask is in range [-1, 1], bring to range [0, 1] first
    # NOTE: Assumes `output_mask` is a 2-D `m x n` tensor.
    zero_one_mask = tf.expand_dims((output_mask + 1) / 2, 2)
    # Apply mask to all channels.
    # NOTE: Assumes `output_img` is a 4-D `b x m x n x c` tensor.
    output_img = tf.mul(output_img, zero_one_mask)
(* This would work equally if output_img were a 4-D b x m x n x c (for any number of channels c) or 3-D m x n x c tensor, due to the way broadcasting works.)

Related

Creating an array with values that obey a criterion of smallest distance to the mean

I am trying to build credible bands in Julia; however, there is a technical step that I do not know how to do. The code is the following:
# Significance level 95%
alpha_sign = 0.05
# Generate random values
N_1 = 100
x = 0.0:0.01:1.0
Fs_1 = Array{Float64}(undef, length(x), N_1)
for k in 1:N_1
    f = rand(postΠ)          # `postΠ` (the posterior) is defined earlier and not shown here
    Fs_1[:, k] = f.(x)
end
# sup|theta_i(t) - average|
dif_b = Array{Float64}(undef, length(x), N_1)
for k in 1:N_1
    dif_b[:, k] = Fs_1[:, k] - average_across   # `average_across` is defined earlier and not shown here
end
# Define a function that computes the n smallest values
using Base.Sort
function smallestn(a, n)
    sort(a; alg = Sort.PartialQuickSort(n))[1:n]
end
# Compute the maximum of the difference across time
sup_b = Array{Float64}(undef, N_1)
for k in 1:N_1
    sup_b[k] = maximum(abs.(dif_b[:, k]))
end
# Keep the (1 - alpha) * N_1 smallest distances
N_min = Int((1 - alpha_sign) * N_1)
min_sup_b = smallestn(sup_b, N_min)
To simplify the problem I am creating this example:
Imagine I have the matrix below and I want to create a matrix with the values that are closest to the mean.
X=[1,2,7,4,5]
av_X=mean(X,dims=1)
Question:
I am able to compute the distances and store them into a vector as displayed in the code above and later get the smallest values but I need to get back to the original matrix to extract those values.
How do I do that?
Thanks in advance!
using Statistics
arr = rand(1:20, (4, 4))
colmeans = [mean(col) for col in eachcol(arr)]
deltas = map(cart -> abs(arr[cart] - colmeans[last(Tuple(cart))]) => cart, CartesianIndices(arr))
sorteddeltas = sort(deltas, by = first, dims = 1)
sarr = zeros(Int, (4, 4))
for (i, d) in enumerate(sorteddeltas)
    sarr[i] = arr[last(d)]
end
println(arr)      # [7 1 2 15; 18 7 14 10; 3 11 10 13; 7 14 20 8]
println(colmeans) # [8.75, 8.25, 11.5, 11.5]
println(sarr)     # [7 7 10 10; 7 11 14 13; 3 14 20 15; 18 1 2 8]
println(sarr')    # [7 7 3 18; 7 11 14 1; 10 14 20 2; 10 13 15 8]
This gives sorteddeltas, a matrix in which each column holds distance => coordinate pairs sorted by distance from that column's mean, with the second part of each pair the Cartesian index into the original matrix.
sarr is the original matrix with each column sorted by closeness to that column's mean.
I think the function you are looking for is findmin(). It gives both the minimum value and its index.
julia> x = randn(5)
5-element Vector{Float64}:
-0.025159738348978562
-0.24720173332739662
-0.32508319212563325
0.9470582053428686
1.1467087893336048
julia> findmin(x)
(-0.32508319212563325, 3)
If you want to do this for every column in a matrix, you can do something like:
julia> X = randn(3, 5)
3×5 Matrix{Float64}:
1.06405 1.03267 -0.826687 -1.68299 0.00319586
-0.129021 0.0615327 0.0756477 1.05258 0.525504
0.569748 -0.0877886 -1.48372 0.823895 0.319364
julia> min_inds = [findmin(X[:, i]) for i = 1:5]
5-element Vector{Tuple{Float64, Int64}}:
(-0.12902069012799203, 2)
(-0.08778864856976668, 3)
(-1.4837211369655696, 3)
(-1.6829919363620507, 1)
(0.003195860366775878, 1)

Generate a random subset of the powerset directly

It is easy to generate a random subset of the powerset if we are able to compute all elements of the powerset first and then randomly draw a sample out of it:
set.seed(12)
x = 1:4
n.samples = 3
library(HapEstXXR)
power.set = HapEstXXR::powerset(x)
sample(power.set, size = n.samples, replace = FALSE)
# [[1]]
# [1] 2
#
# [[2]]
# [1] 3 4
#
# [[3]]
# [1] 1 3 4
However, if the length of x is large, the powerset has far too many elements to enumerate. I am therefore looking for a way to draw a random subset directly.
One possibility is to first draw a "random length" and then draw a random subset of x of that length:
len = sample(1:length(x), size = n.samples, replace = TRUE)
len
# [1] 2 1 1
lapply(len, function(l) sort(sample(x, size = l)))
# [[1]]
# [1] 1 2
#
# [[2]]
# [1] 1
#
# [[3]]
# [1] 1
This, however, generates duplicates. Of course, I could now remove the duplicates and repeat the previous sampling using a while loop until I end up with n.samples non-duplicate random subsets of the powerset:
drawSubsetOfPowerset = function(x, n) {
  ret = list()
  while (length(ret) < n) {
    # draw a "random length" with some meaningful prob to reduce number of loops
    len = sample(0:n, size = n, replace = TRUE, prob = choose(n, 0:n) / 2^n)
    # draw random subsets of x using the "random length"; sort to better identify duplicates
    random.subset = lapply(len, function(l) sort(sample(x, size = l)))
    # remove duplicates
    ret = unique(c(ret, random.subset))
  }
  return(ret)
}
drawSubsetOfPowerset(x, n.samples)
Of course, I could now try to optimize several components of my drawSubsetOfPowerset function, e.g. (1) trying to avoid the copying of the object ret in each iteration of the loop, (2) using a faster sort, (3) using a faster way to remove duplicates of the list, ...
My question is: Is there maybe a different way (which is more efficient) of doing this?
How about using binary representation? This way we can draw a random sample of integers between 1 and the total number of subsets, which is 2^length(v). From there we can make use of intToBits along with indexing to guarantee that we generate unique random subsets of the power set in an ordered fashion.
randomSubsetOfPowSet <- function(v, n, mySeed) {
  set.seed(mySeed)
  lapply(sample(2^length(v), n) - 1, function(x) v[intToBits(x) > 0])
}
Taking x = 1:4, n.samples = 5, and a random seed of 42, we have:
randomSubsetOfPowSet(1:4, 5, 42)
[[1]]
[1] 2 3 4
[[2]]
[1] 1 2 3 4
[[3]]
[1] 3
[[4]]
[1] 2 4
[[5]]
[1] 1 2 3
Explanation
What does binary representation have to do with power sets?
It turns out that given a set, we can find all subsets by turning to bits (yes, 0s and 1s). By viewing the elements in a subset as on elements in the original set and the elements not in that subset as off, we now have a very tangible way of thinking about how to generate each subset. Observe:
Original set:         {a,   b,   c,   d}
                       |    |    |    |
Existence in subset:  1/0  1/0  1/0  1/0

Example subset: {b, d} (b and d are "on") gets mapped to {0, 1, 0, 1}

Thus, {b, d} is mapped to the integer 0*2^0 + 1*2^1 + 0*2^2 + 1*2^3 = 10
This is now a problem of combinations of bits of length n. If you map this out for every subset of A = {a, b, c, d}, you will obtain 0:15. Therefore, to obtain a random subset of the power set of A, we simply generate a random subset of 0:15 and map each integer to a subset of A. How might we do this?
sample comes to mind.
Now, it is very easy to go the other way as well (i.e. from an integer to a subset of our original set).
Observe:
Given the integer 10 and the set A above (i.e. {a, b, c, d}) we have:
10 in bits is --> {0, 1, 0, 1}
Which indices are greater than 0?
Answer: 2 and 4
Taking the 2nd and 4th elements of our set gives: {b, d} et voilà!
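In base R, this reverse mapping is exactly the indexing step inside randomSubsetOfPowSet above; as a quick stand-alone illustration with the example set:
v <- c("a", "b", "c", "d")
intToBits(10)[1:4]   # 00 01 00 01 : the 2nd and 4th bits are on
v[intToBits(10) > 0] # [1] "b" "d"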

R - How to get row & column subscripts of matched elements from a distance matrix

I have an integer vector vec1 and I am generating a distance matrix using the dist function. I want to get the coordinates (row and column) of elements with a certain value in the distance matrix. Essentially I would like to get the pairs of elements that are d units apart. For example:
vec1 <- c(2,3,6,12,17)
distMatrix <- dist(vec1)
#    1  2  3  4
# 2  1
# 3  4  3
# 4 10  9  6
# 5 15 14 11  5
Say I am interested in pairs of elements in the vector that are 5 units apart. I want to get coord1, the rows, and coord2, the columns, of the matching entries in the distance matrix. In this toy example, I would expect
coord1
# [1] 5
coord2
# [1] 4
I am wondering if there is an efficient way to get these values that doesn't involve converting the dist object to a matrix or looping through the matrix?
A distance matrix is a lower triangular matrix in packed format, where the lower triangular is stored as a 1D vector by column. You can check this via
str(distMatrix)
# Class 'dist' atomic [1:10] 1 4 10 15 3 9 14 6 11 5
# ...
Even if we call dist(vec1, diag = TRUE, upper = TRUE), the vector is still the same; only the printing style changes. That is, no matter how you call dist, you always get a vector.
This answer focuses on how to transform between 1D and 2D indices, so that you can work with a "dist" object without first making it a complete matrix using as.matrix. If you do want to make it a matrix, use the dist2mat function defined in as.matrix on a distance object is extremely slow; how to make it faster?.
R functions
It is easy to write vectorized R functions for these index transforms. We only need some care dealing with "out-of-bound" indices, for which NA should be returned.
## 2D index to 1D index
f <- function (i, j, dist_obj) {
  if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
  n <- attr(dist_obj, "Size")
  valid <- (i >= 1) & (j >= 1) & (i > j) & (i <= n) & (j <= n)
  k <- (2 * n - j) * (j - 1) / 2 + (i - j)
  k[!valid] <- NA_real_
  k
}
## 1D index to 2D index
finv <- function (k, dist_obj) {
  if (!inherits(dist_obj, "dist")) stop("please provide a 'dist' object")
  n <- attr(dist_obj, "Size")
  valid <- (k >= 1) & (k <= n * (n - 1) / 2)
  k_valid <- k[valid]
  j <- rep.int(NA_real_, length(k))
  j[valid] <- floor(((2 * n + 1) - sqrt((2 * n - 1) ^ 2 - 8 * (k_valid - 1))) / 2)
  i <- j + k - (2 * n - j) * (j - 1) / 2
  cbind(i, j)
}
These functions are extremely cheap in memory usage, as they work with indices instead of matrices.
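As a quick sanity check of f, using vec1 and distMatrix from the question (the expected packed vector can be read off the str() output above):
vec1 <- c(2, 3, 6, 12, 17)
distMatrix <- dist(vec1)
f(5, 4, distMatrix) # 10: entry (5, 4) is the 10th element of the packed vector
distMatrix[10]      # 5: i.e., |17 - 12|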
Applying finv to your question
You can use
vec1 <- c(2,3,6,12,17)
distMatrix <- dist(vec1)
finv(which(distMatrix == 5), distMatrix)
# i j
#[1,] 5 4
Generally speaking, a distance matrix contains floating point numbers. It is risky to use == to judge whether two floating point numbers are equal. Read Why are these numbers not equal? for more background and possible strategies.
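For example, instead of an exact comparison you can match within a tolerance (a sketch; the cutoff 1e-8 is an arbitrary choice to adapt to your data):
tol <- 1e-8
finv(which(abs(distMatrix - 5) < tol), distMatrix)
# i j
#[1,] 5 4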
Alternative with dist2mat
Using the dist2mat function given in as.matrix on a distance object is extremely slow; how to make it faster?, we may use which(..., arr.ind = TRUE).
library(Rcpp)
sourceCpp("dist2mat.cpp")
mat <- dist2mat(distMatrix, 128)
which(mat == 5, arr.ind = TRUE)
# row col
#5 5 4
#4 4 5
Appendix: Markdown (needs MathJax support) for the picture
## 2D index to 1D index
The lower triangular looks like this: $$\begin{pmatrix} 0 & 0 & \cdots & 0\\ \times & 0 & \cdots & 0\\ \times & \times & \cdots & 0\\ \vdots & \vdots & \ddots & 0\\ \times & \times & \cdots & 0\end{pmatrix}$$ If the matrix is $n \times n$, then there are $(n - 1)$ elements ("$\times$") in the 1st column, and $(n - j)$ elements in the $j$<sup>th</sup> column. Thus, for element $(i,\ j)$ (with $i > j$, $j < n$) in the lower triangular, there are $$(n - 1) + \cdots + (n - (j - 1)) = \frac{(2n - j)(j - 1)}{2}$$ "$\times$" in the previous $(j - 1)$ columns, and it is the $(i - j)$<sup>th</sup> "$\times$" in the $j$<sup>th</sup> column. So it is the $$\left\{\frac{(2n - j)(j - 1)}{2} + (i - j)\right\}^{\textit{th}}$$ "$\times$" in the lower triangular.
----
## 1D index to 2D index
Now for the $k$<sup>th</sup> "$\times$" in the lower triangular, how can we find its matrix index $(i,\ j)$? We take two steps: 1) find $j$; 2) obtain $i$ from $k$ and $j$.
The first "$\times$" of the $j$<sup>th</sup> column, i.e., $(j + 1,\ j)$, is the $\left\{\frac{(2n - j)(j - 1)}{2} + 1\right\}^{\textit{th}}$ "$\times$" of the lower triangular, thus $j$ is the maximum value such that $\frac{(2n - j)(j - 1)}{2} + 1 \leq k$. This is equivalent to finding the max $j$ so that $$j^2 - (2n + 1)j + 2(k + n - 1) \geq 0.$$ The LHS is a quadratic polynomial, and it is easy to see that the solution is the integer no larger than its first root (i.e., the root on the left side): $$j = \left\lfloor\frac{(2n + 1) - \sqrt{(2n-1)^2 - 8(k-1)}}{2}\right\rfloor.$$ Then $i$ can be obtained from $$i = j + k - \left\{\frac{(2n - j)(j - 1)}{2}\right\}.$$
If the vector is not too large, the best way is probably to wrap the output of dist into as.matrix and to use which with the option arr.ind=TRUE. The only disadvantage of this standard method to retrieve the index numbers within a dist matrix is an increase of memory usage, which may become important in the case of very large vectors passed to dist. This is because the conversion of the lower triangular matrix returned by dist into a regular, dense matrix effectively doubles the amount of stored data.
An alternative consists in converting the dist object into a list, such that each column in the lower triangular matrix of dist represents one member of the list. The index number of the list members and the position of the elements within the list members can then be mapped to the column and row number of the dense N x N matrix, without generating the matrix.
Here is one possible implementation of this list-based approach:
distToList <- function(x) {
  d <- dist(x)
  # start index of each column within the packed vector
  idx <- sum(seq(length(x) - 1)) - rev(cumsum(seq(length(x) - 1))) + 1
  # split the packed vector into one list member per column
  # http://stackoverflow.com/a/16358095/4770166
  unname(split(d, cumsum(seq_along(d) %in% idx)))
}
findDistPairs <- function(vec, theDist) {
  listDist <- distToList(vec)
  inList <- lapply(listDist, is.element, theDist)
  matchedCols <- which(sapply(inList, sum) > 0)
  found <- length(matchedCols) > 0
  if (found) {
    matchedRows <- sapply(matchedCols, function(x) which(inList[[x]]) + x)
  } else {
    matchedRows <- integer(length = 0)
  }
  matches <- cbind(col = rep(matchedCols, sapply(matchedRows, length)),
                   row = unlist(matchedRows))
  return(matches)
}
vec1 <- c(2, 3, 6, 12, 17)
findDistPairs(vec1, 5)
# col row
#[1,] 4 5
The parts of the code that might be somewhat unclear concern the mapping of the position of an entry within the list to the column / row number of the N x N matrix. These transformations follow directly from the column-wise layout of the dist vector.
In a comment within the code I have pointed out an answer on StackOverflow which has been used here to split a vector into a list. The loops (sapply, lapply) should be unproblematic in terms of performance since their range is of order O(N). The memory usage of this code is largely determined by the storage of the list. This amount of memory should be similar to that of the dist object since both objects contain the same data.
The dist object is calculated and transformed into a list in the function distToList(). Because of the dist calculation, which is required in any case, this function could be time-consuming in the case of large vectors. If the goal is to find several pairs with different distance values, then it may be better to calculate listDist only once for a given vector and to store the resulting list, e.g., in the global environment.
Long story short
The usual way to treat such problems is simple and fast:
distMatrix <- as.matrix(dist(vec1)) * lower.tri(diag(vec1))
which(distMatrix == 5, arr.ind = TRUE)
# row col
#5 5 4
I suggest using this method by default. More complicated solutions may become necessary in situations where memory limits are reached, i.e., in the case of very large vectors vec1. The list-based approach described above could then provide a remedy.

Is there a general algorithm to identify a numeric series?

I am looking for a general purpose algorithm to identify short numeric series from lists with a max length of a few hundred numbers. This will be used to identify series of masses from mass spectrometry (ms1) data.
For instance, given the following list, I would like to identify that 3 of these numbers fit the series N, N + 1, N + 2, etc.
426.24 <= N
427.24 <= N + 1/x
371.10
428.24 <= N + 2/x
851.47
451.16
The series are all of the format: N, N+1/x, N+2/x, N+3/x, N+4/x, etc, where x is an integer (in the example x=1). I think this constraint makes the problem very tractable. Any suggestions for a quick/efficient way to tackle this in R?
This routine will generate series using x from 1 to 10 (you could increase it) and check how many of the generated values are contained in the original list of numbers.
N = c(426.24, 427.24, 371.10, 428.24, 851.47, 451.16)
N0 = N[1]
x = list(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
L = 20
Series = lapply(x, function(x) seq(from = N0, by = 1/x, length.out = L))
countCoincidences = lapply(Series, function(s) sum(s %in% N))
Result:
unlist(countCoincidences)
[1] 3 3 3 3 3 3 3 3 3 2
As you can see, using x = 1 gives 3 coincidences, and the same goes for every x up to x = 9. (With L = 20 terms, the x = 10 series only reaches N + 1.9, so 428.24 is missed and only 2 values coincide.) Here you have to decide which x is the one you want.
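If you also want the matched values rather than just the counts, the same %in% test used for countCoincidences can be reused, e.g. for x = 1 (note this relies on exact floating-point matches, which happen to hold here given the count of 3 above):
Series[[1]][Series[[1]] %in% N]
# [1] 426.24 427.24 428.24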
Since you're looking for an arithmetic sequence, the difference k is constant. Thus, you can loop over the vector and subtract each value y from it. If y is a middle term of a sequence, the differences x - y contain -k, 0, and k, so you can find the sequence by looking for matches between x - y and its opposite, y - x:
x <- c(426.24, 427.24, 371.1, 428.24, 851.47, 451.16)
unique(lapply(x, function(y) {
  s <- (x - y) %in% (y - x)
  if (sum(s) > 1) x[s]
}))
# [[1]]
# NULL
#
# [[2]]
# [1] 426.24 427.24 428.24

Recursive map dependent on two vectors

Basically:
a <- c(1, 2, 1, 2)
b <- c(1, 2, 3, 4)
I am looking for a function that returns a vector c with c[n] = b[n] + b[n-1] if a[n] is even, or c[n] = b[n] + 2*b[n-1] otherwise.
Is there anything easier than a brute force for-loop? Some sort of advanced "Reduce" or equivalent.
x <- c(0, b[-length(b)])  # b shifted right by one; 0 stands in for the missing b[0]
c <- ifelse(a %% 2 == 0, b + x, b + 2 * x)  # note: the name `c` masks the base function c()
Be careful: a and b must have equal length.
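For the vectors from the question, this gives:
a <- c(1, 2, 1, 2)
b <- c(1, 2, 3, 4)
x <- c(0, b[-length(b)])  # 0 1 2 3
ifelse(a %% 2 == 0, b + x, b + 2 * x)
# [1] 1 3 7 7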
