Efficient way to copy a matrix except for one column - julia

Consider a matrix where you don't need the third column:
X = zeros(Int64, (4, 3));
X[:, 1] = [0, 0, 1, 1];
X[:, 2] = [1, 2, 1, 2];
julia> X
4×3 Matrix{Int64}:
0 1 0
0 2 0
1 1 0
1 2 0
So you want to select (copy) everything except column 3:
4×2 Matrix{Int64}:
0 1
0 2
1 1
1 2
Is there a shorthand way to express this?
These work, but feel impractical when you have a large number of columns:
X[:, [1, 2]]
X[:, sort(collect(setdiff(Set([1, 2, 3]), Set([3]))))]

There are plenty of ways to do this. Below is a solution in which you express which ranges of column numbers to include:
X = zeros(Int64, (8, 3));
X[:, 1] = [0, 0, 0, 0, 1, 1, 1, 1];
X[:, 2] = [1, 1, 2, 2, 1, 1, 2, 2];
X[:, 1:2]  # columns 1 through 2 are included directly
Alternatively, you can state which columns to exclude, which is perhaps the more generally useful form:
X[:, 1:end .!= 3]  # column 3 is excluded directly
Both return:
8×2 Matrix{Int64}:
0 1
0 1
0 2
0 2
1 1
1 1
1 2
1 2
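As a minimal sketch of the exclusion idea above (the names mask, drop, Y1, Y2 and V are illustrative, not from the original answer), the same mask can be built from axes(X, 2), several columns can be dropped with setdiff, and a view can be used when a copy is not actually needed:

```julia
X = [0 1 0;
     0 2 0;
     1 1 0;
     1 2 0]

# Boolean mask over the column indices: true for every column to keep.
mask = axes(X, 2) .!= 3
Y1 = X[:, mask]            # copies columns 1 and 2

# Dropping a list of columns: setdiff keeps the remaining indices in order.
drop = [3]                 # hypothetical list of unwanted columns
Y2 = X[:, setdiff(axes(X, 2), drop)]

# If no copy is needed, a view shares memory with X instead of allocating.
V = @view X[:, mask]
```

Indexing with X[:, mask] always allocates a new matrix, so the view is only worth it when the result is read rather than modified independently of X.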

If the column is somewhere in the middle, you can perhaps get the most elegant code by using InvertedIndices (it is also loaded by other packages such as DataFrames):
julia> using InvertedIndices
julia> A = collect(reshape(1:16,4,4))
4×4 Matrix{Int64}:
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
julia> A[:, Not(3)]
4×3 Matrix{Int64}:
1 5 13
2 6 14
3 7 15
4 8 16

Related

Generate all combinations of items with two values in Julia?

I have m items. Each item is a pair of two values. For example, for m=4, I have the matrix:
julia> valid_pairs = [0 1;
1 2;
1 2;
2 3];
I would like to generate all combinations of the four items where each item i can take only the values in valid_pairs[i, :]. Based on the previous example, I would like to have:
julia> all_combs
4x16 Array{Int,2}
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2
1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2
2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3
I feel like this can be done easily using Combinatorics.jl.
Though I used Combinatorics.jl, what I did was the following:
using Combinatorics
m = 4
combs = combinations(1:m) |> collect
L = length(combs)
all_combs = zeros(Int, m, L+1)
for j in 1:L
    for i in 1:m
        if !in(i, combs[j])
            all_combs[i, j] = valid_pairs[i, 1]
        else
            all_combs[i, j] = valid_pairs[i, 2]
        end
    end
end
all_combs[:, end] = valid_pairs[:, 1]
Not the same order, but
julia> [collect(x) for x in Iterators.product(eachrow(valid_pairs)...)]
2×2×2×2 Array{Array{Int64,1},4}:
[:, :, 1, 1] =
[0, 1, 1, 2] [0, 2, 1, 2]
[1, 1, 1, 2] [1, 2, 1, 2]
[:, :, 2, 1] =
[0, 1, 2, 2] [0, 2, 2, 2]
[1, 1, 2, 2] [1, 2, 2, 2]
[:, :, 1, 2] =
[0, 1, 1, 3] [0, 2, 1, 3]
[1, 1, 1, 3] [1, 2, 1, 3]
[:, :, 2, 2] =
[0, 1, 2, 3] [0, 2, 2, 3]
[1, 1, 2, 3] [1, 2, 2, 3]
should do. If you really want a matrix (2D array), then you can hcat the previous answer, or directly do
julia> reduce(hcat, collect(x) for x in Iterators.product(eachrow(valid_pairs)...))
4×16 Array{Int64,2}:
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2
1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2
2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3
EDIT: side note, I would define the pairs as tuples to clarify what's happening, so something like
valid_pairs = [(0,1), (1,2), (1,2), (2,3)]
and I would not create the 2D (or 4D, or m-D) array, but, instead, do
comb_pairs = Iterators.product(valid_pairs...)
which then gives you a lazy version of all the pair combinations, so that you can iterate on it without actually creating it first, which should be more efficient (and looks cleaner) I think.
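A minimal sketch of the lazy approach, assuming the tuple form of valid_pairs suggested above. The second half is an assumption about how to recover the column order shown in the question (last item varying fastest): take the product over the reversed pairs and reverse each tuple back. The names n, first_comb and all_combs are illustrative.

```julia
valid_pairs = [(0, 1), (1, 2), (1, 2), (2, 3)]

# Lazy product: nothing is materialised until you iterate.
comb_pairs = Iterators.product(valid_pairs...)
n = length(comb_pairs)           # number of combinations
first_comb = first(comb_pairs)   # first combination, as a tuple

# Iterators.product varies its first argument fastest. To make the last
# item vary fastest instead, iterate over the reversed pairs and reverse
# each resulting tuple back into the original item order.
all_combs = reduce(hcat,
    vec([collect(reverse(t)) for t in Iterators.product(reverse(valid_pairs)...)]))
```

The matrix is only built here to compare against the layout requested in the question; for plain iteration, comb_pairs alone is enough.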

Assignment in matrices using double indexes

I can't figure out how to obtain this behavior:
From this matrix:
julia> a = [1 1 1; 1 1 1; 1 1 2]
3×3 Array{Int64,2}:
1 1 1
1 1 1
1 1 2
I want to change all the 1s to 5s but only in the last row.
What I did is a[3, :][a[3, :] .== 1] .= 5, but the value of a isn't changed.
I've noticed that this works:
foo = a[3, :]
foo[foo .== 1] .= 5
a[3, :] = foo
but I'm trying to reduce allocations, so the temporary copy should be avoided.
Thanks in advance
You can use @view and replace!:
julia> a = [1 1 1
1 1 1
1 1 2]
3×3 Array{Int64,2}:
1 1 1
1 1 1
1 1 2
julia> replace!(@view(a[end, :]), 1 => 5)
3-element view(::Array{Int64,2}, 3, :) with eltype Int64:
5
5
2
julia> a
3×3 Array{Int64,2}:
1 1 1
1 1 1
5 5 2
The problem is that
a[3, :][a[3, :] .== 1] .= 5
is the same as getindex(a, 3, :)[a[3, :] .== 1] .= 5.
getindex returns a copy of that part of a, so you are mutating the copy, not the original a.
You want to use a view instead:
view(a, 3, :)[a[3, :] .== 1] .= 5
You can also do this with the @view or @views macro.
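To make the copy-versus-view difference concrete, here is a small sketch (the names b, c, d and row are illustrative). The last variant is an assumption about what "reducing allocations" might look like in practice: a plain loop that does not even build the temporary Boolean mask.

```julia
a = [1 1 1; 1 1 1; 1 1 2]

# Slice indexing copies: the assignment lands in a temporary, b is untouched.
b = copy(a)
b[3, :][b[3, :] .== 1] .= 5

# A view aliases the row of c, so the assignment goes through to c.
c = copy(a)
row = @view c[3, :]
row[row .== 1] .= 5      # `row .== 1` still allocates a small mask

# A plain loop over the columns avoids the mask as well.
d = copy(a)
for j in axes(d, 2)
    if d[3, j] == 1
        d[3, j] = 5
    end
end
```

Loops are fast in Julia, so the last form is a reasonable choice when this runs in a hot path.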

How to insert elements in a vector at regular intervals in R

Is there an R equivalent for the problem explained here: How to insert elements in a vector at regular intervals in Matlab
Namely, from a vector x <- c(1,2,3,4,5,6,7,8,9,10,11,12), I want to obtain a vector y given by
y <- c(0, 1, 2, 3,
0, 4, 5, 6,
0, 7, 8, 9,
0,10,11,12)
... I found the following page; this may be a duplicate:
R: insert elements into vector (a variation)
Edit: I slightly modified the answer of @jay.sf. I think his interval.length is not the intuitive interval length.
x <- 1:16
interval.length <- 2
co_interval.length <- length(x)/interval.length
as.vector(t(cbind(0, matrix(x, co_interval.length, byrow=T))))
[1] 0 1 2 0 3 4 0 5 6 0 7 8 0 9 10 0 11 12 0 13 14 0 15 16
You could make a matrix and coerce it into a vector.
interval.length <- 4
as.vector(t(cbind(0, matrix(x, interval.length, byrow=T))))
# [1] 0 1 2 3 0 4 5 6 0 7 8 9 0 10 11 12
Another way is to make use of arithmetical indexing:
y <- numeric(16)
y[x + 1 + (x - 1) %/% 3] <- x
y
#> [1] 0 1 2 3 0 4 5 6 0 7 8 9 0 10 11 12

Generate new column in dataframe, based on group-event in nested groups

I have a dataframe with three "main"-groups (x: 1, 2, 3), three groups within the main-groups (v: 2, 3 or 1) and some events within the main-groups (0 and 1 in y):
x <- c(1, 1, 1, 2, 2, 3, 3, 3, 3)
v <- c(2, 3, 3, 2, 2, 1, 1, 2, 2)
y <- c(0, 0, 1, 0, 0, 0, 0, 0, 1)
df <- data.frame(x, v, y)
> df
x v y
1 1 2 0
2 1 3 0
3 1 3 1
4 2 2 0
5 2 2 0
6 3 1 0
7 3 1 0
8 3 2 0
9 3 2 1
For example: In group 1 (x = 1) there are two more groups (v = 2 and v = 3), event y = 1 happens in group x = 1 and v = 3.
Now I want to generate a new column z, based on the events in y: if there is any y = 1 in a group, all cases in that group v within x should get a 1 for z; otherwise NA. How can z be generated this way? df should look like:
> df
x v y z
1 1 2 0 NA
2 1 3 0 1
3 1 3 1 1
4 2 2 0 NA
5 2 2 0 NA
6 3 1 0 NA
7 3 1 0 NA
8 3 2 0 1
9 3 2 1 1
I am grateful for any help.
df %>% group_by(x, v) %>% mutate(z = if(any(y == 1)) 1 else NA)
After grouping by x and v, the new column z is filled with 1s if the group has any 1 in y, and with NAs otherwise.
Try this:
library(dplyr)
df %>%
  group_by(x, v) %>%
  mutate(
    z = ifelse(any(y == 1), 1, NA)
  )

Removing rows/columns with only one element from a binary matrix

I'm trying to remove "singletons" from a binary matrix. Here, singletons refers to elements that are the only "1" value in the row AND the column in which they appear. For example, given the following matrix:
> matrix(c(0,1,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0,0,1,1), nrow=6)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0 1 0 0 0 0 0
[2,] 1 0 1 0 0 0 0
[3,] 0 0 0 1 0 0 0
[4,] 1 1 0 0 0 0 0
[5,] 0 0 0 0 1 1 1
[6,] 0 0 0 0 1 0 1
...I would like to remove all of row 3 (and, if possible, all of column 4), because the 1 in [3,4] is the only 1 in that row/column combination. [1,2] is fine, since there are other 1's in column [,2]; similarly, [2,3] is fine, since there are other 1's in row [2,]. Any help would be appreciated - thanks!
You first want to find which rows and columns are singletons and then check if there are pairs of singletons rows and columns that share an index. Here is a short bit of code to accomplish this task:
foo <- matrix(c(0,1,0,...))
singRows <- which(rowSums(foo) == 1)
singCols <- which(colSums(foo) == 1)
singCombinations <- expand.grid(singRows, singCols)
singPairs <- singCombinations[apply(singCombinations, 1,
function(x) which(foo[x[1],] == 1) == x[2]),]
noSingFoo <- foo[-unique(singPairs[,1]), -unique(singPairs[,2])]
With many singleton rows or columns you might need to make this a bit more efficient, but it does the job.
UPDATE: Here is the more efficient version I knew could be done. This way you loop only over the rows (or columns if desired) and not all combinations. Thus it is much more efficient for matrices with many singleton rows/columns.
## starting with foo and singRows as before
singPairRows <- singRows[sapply(singRows, function(singRow)
sum(foo[,foo[singRow,] == 1]) == 1)]
singPairs <- sapply(singPairRows, function(singRow)
c(singRow, which(foo[singRow,] == 1)))
noSingFoo <- foo[-singPairs[1,], -singPairs[2,]]
UPDATE 2: I have compared the two methods (mine=nonsparse and @Chris's=sparse) using the rbenchmark package. I have used a range of matrix sizes (from 10 to 1000 rows/columns; square matrices only) and levels of sparsity (from 0.1 to 5 non-zero entries per row/column). The relative level of performance is shown in the heat map below. Equal performance (log2 ratio of run times) is designated by white, faster with sparse method is red and faster with non-sparse method is blue. Note that I am not including the conversion to a sparse matrix in the performance calculation, so that will add some time to the sparse method. Just thought it was worth a little effort to see where this boundary was.
cr1msonB1ade's way is a great answer. For more computationally intensive matrices (millions x millions), you can use this method:
Encode your matrix in sparse notation:
DT <- structure(list(i = c(1, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6), j = c(2,
1, 3, 4, 1, 2, 5, 6, 7, 5, 7), val = c(1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1)), .Names = c("i", "j", "val"), row.names = c(NA, -11L
), class = "data.frame")
Gives (0s are implicit)
> DT
i j val
1 1 2 1
2 2 1 1
3 2 3 1
4 3 4 1
5 4 1 1
6 4 2 1
7 5 5 1
8 5 6 1
9 5 7 1
10 6 5 1
11 6 7 1
Then we can filter using:
library(data.table)
DT <- data.table(DT)
DT[, rowcount := .N, by = i]
DT[, colcount := .N, by = j]
Giving:
> DT[!(rowcount*colcount == 1)]
i j val rowcount colcount
1: 1 2 1 1 2
2: 2 1 1 2 2
3: 2 3 1 2 1
4: 4 1 1 2 2
5: 4 2 1 2 2
6: 5 5 1 3 2
7: 5 6 1 3 1
8: 5 7 1 3 2
9: 6 5 1 2 2
10: 6 7 1 2 2
(Note the (3,4) row is now missing)

Resources