Split a tensor in Torch

Given an input tensor of size n x 2A x B x C, how can I split it into two tensors, each of size n x A x B x C? Here n is the batch size.

You can use torch.split:
torch.split(input_tensor, split_size_or_sections=A, dim=1)
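To see what that call gives back, here is a minimal sketch in PyTorch. The sizes n, A, B, C and the tensor name x are made up for illustration. torch.split returns a tuple of chunks, so with a chunk size of A along dim=1 you get exactly two tensors:

```python
import torch

# Hypothetical sizes, chosen only for illustration
n, A, B, C = 4, 3, 5, 6

# Dummy input of shape n x 2A x B x C
x = torch.arange(n * 2 * A * B * C, dtype=torch.float32).reshape(n, 2 * A, B, C)

# Chunks of size A along dimension 1; the result is a tuple of two tensors
first, second = torch.split(x, A, dim=1)

print(first.shape)   # shape of the first half
print(second.shape)  # shape of the second half
```

The two halves correspond to x[:, :A] and x[:, A:], so plain slicing is an equivalent alternative if you prefer it.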

I think you could do something like this in Lua Torch:
tensor_a = torch.Tensor(n, 2*A, B, C)
-- Initialize tensor_a with the data
-- Index ranges {1, A} and {A+1, 2*A} select the two halves along dimension 2
tensor_b = tensor_a[{{}, {1, A}, {}, {}}]
tensor_c = tensor_a[{{}, {A+1, 2*A}, {}, {}}]

Related

How to do row normalization for each matrix in a 3D Tensor (Variable) in PyTorch?

I have a 3D Tensor (Variable) of size [a, b, c].
Consider it as a stack of a matrices, each of size b x c. I want every one of these matrices to be row-normalized.
You can use the normalize function.
import torch.nn.functional as f
f.normalize(input, p=2, dim=2)
The dim=2 argument specifies the dimension along which to normalize (each row vector is divided by its p-norm).
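The following should work if you want each b x c matrix, taken as a whole, to sum to 1 (it flattens each matrix into one vector and applies L1 normalization):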
import torch
import torch.nn.functional as f
a, b, c = 10, 20, 30
t = torch.rand(a, b, c)
g = f.normalize(t.view(t.size(0), t.size(1) * t.size(2)), p=1, dim=1)
print(g.sum(1)) # every entry is (approximately) 1, which confirms the normalization
g = g.view(*t.size())
print(g) # normalized output tensor of shape a x b x c
To normalize a matrix in such a way that the sum of each row is 1, simply divide by the sum of each row:
import torch
a, b, c = 10, 20, 30
t = torch.rand(a, b, c)
t = t / (torch.sum(t, 2).unsqueeze(-1))
print(t.sum(2))
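As a small check on how the answers above relate, here is a sketch assuming a tensor with non-negative entries (as torch.rand produces). In that case dividing each row by its sum is the same as L1-normalizing along the last dimension. The sizes and the names by_sum, by_l1 and row_sums are only illustrative:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
a, b, c = 2, 3, 4          # hypothetical small sizes
t = torch.rand(a, b, c)    # entries in [0, 1), so row sums equal L1 norms

# Divide each row by its sum; keepdim=True keeps a size-1 last dim for broadcasting
by_sum = t / t.sum(dim=2, keepdim=True)

# L1-normalize each row (dim=2 indexes the entries within a row)
by_l1 = F.normalize(t, p=1, dim=2)

row_sums = by_sum.sum(dim=2)            # one sum per row, shape a x b
print(torch.allclose(by_sum, by_l1))    # whether the two approaches agree
```

If the tensor can contain negative values the two are no longer the same, because the L1 norm uses absolute values.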

Check if each number in a vector is between some numbers in another vector in R

Say I have two vectors, A and B. A has 15 elements and B has 28 elements.
A = c(13,14,29,31,32,39,42,51,59,61,68,91,102,109,120)
B = c(26,26,28,29,30,30,33,38,41,42,45,46,47,47,49,49,80,81,86,86,90,90,92,100,101,105,105,107)
I want a 14 by 27 matrix, Z, where the (i, j) entry is 1 if (B_j, B_{j+1}] overlaps with (A_i, A_{i+1}].
For instance, the (3,4) entry of Z would be 1 since (29,31] and (29,30] overlap, with 30 as a common number. Is there a fast way to compute this?
I have the following code:
Z = matrix(0, length(A)-1, length(B)-1)
for (i in 1:(length(A)-1)){
  nn = which(B > A[i] & B <= A[(i+1)])
  if (length(nn)>0){
    Z[i,(nn-1)] = 1
  }
}
It works, but my A and B vectors often contain 30,000+ elements and the loop is incredibly slow. Even creating the matrix Z takes an unnecessarily long time. Can anyone help with this?
Ideally, there would be a vectorized solution or a well-written function from a package that can do this, like cutting a cake.
Here's an option using outer(). As commented, the matrices can get big, and you'll have to see if the speed improvement is worth it:
res1 <- outer(A, B, FUN = function(A, B){B > A})
res2 <- outer(A, B, FUN = function(A, B){B <= A})
dim(res1); dim(res2)
res3 <- (res1[-nrow(res1),] + res2[-1,]) == 2
image(res3)
dim(res3)
op <- par(mfcol=c(1,2))
image(Z, main="Z")
image(res3, main="res3")
par(op)
If closed intervals [B_j, B_{j+1}] and [A_i, A_{i+1}] are acceptable as well, you could use foverlaps() from data.table:
library(data.table)
A <- as.integer(c(13,14,29,31,32,39,42,51,59,61,68,91,102,109,120))
B <- as.integer(c(26,26,28,29,30,30,33,38,41,42,45,46,47,47,49,49,80,81,86,86,90,90,92,100,101,105,105,107))
DT_A <- data.table(A0 = A, A1 = shift(A, type = "lead"), key=c("A0", "A1"))[-length(A)]
DT_B <- data.table(B0 = B, B1 = shift(B, type = "lead"), key=c("B0", "B1"))[-length(B)]
ind_true <- foverlaps(DT_A, DT_B, type="any", mult="all", which=TRUE)[!is.na(yid)]
mat <- matrix(0, length(A)-1, length(B)-1)
# Index with a two-column matrix so that only the matched (row, column) pairs are set
mat[cbind(ind_true$xid, ind_true$yid)] = 1
This answer uses matrix indexing and relies on expand.grid, though there are much faster implementations of it. You lag your vectors to create two-column interval matrices for A and B; then, with a function that does a simple boolean check, you can index into those matrices with an expanded grid. The function returns a matrix.
overlap = function(id,x1,x2){
idA = id[,1]
idB = id[,2]
o = (x1[idA,1] >= x2[idB,1] & x1[idA,1] <= x2[idB,2]) | (x1[idA,2] >= x2[idB,1] & x1[idA,2] <= x2[idB,2]) |
(x1[idA,1] <= x2[idB,1] & x1[idA,2] >= x2[idB,1]) | (x1[idA,1] <= x2[idB,2] & x1[idA,2] >= x2[idB,2])
matrix(o,nrow=nrow(x1))
}
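A = c(13,14,29,31,32,39,42,51,59,61,68,91,102,109,120)
# Pair each element with its successor (base R; stats::lag() does not shift the values of a plain vector)
nA = cbind(head(A, -1), tail(A, -1))
B = c(26,26,28,29,30,30,33,38,41,42,45,46,47,47,49,49,80,81,86,86,90,90,92,100,101,105,105,107)
nB = cbind(head(B, -1), tail(B, -1))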
expand.grid.jc <- function(seq1,seq2) {
cbind(Var1 = rep.int(seq1, length(seq2)),
Var2 = rep.int(seq2, rep.int(length(seq1),length(seq2))))
}
ids = expand.grid.jc(1:nrow(nA),1:nrow(nB))
overlap(ids,nA,nB)
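If B is sorted in ascending order (it is in the example, but that is an assumption about the real data), the B values that fall into (A_i, A_{i+1}] form one contiguous run, so each row can be found with two binary searches instead of a scan over all of B. Below is a rough sketch of that idea in Python, following the indexing of the loop in the question. The function names overlap_ranges and to_dense are made up for illustration:

```python
from bisect import bisect_right

def overlap_ranges(A, B):
    """For each interval (A[i], A[i+1]], return the half-open range (start, stop)
    of 0-based column indices that the loop in the question sets to 1.
    Assumes B is sorted in ascending order."""
    ranges = []
    for left, right in zip(A, A[1:]):
        lo = bisect_right(B, left)    # first position with B[pos] > left
        hi = bisect_right(B, right)   # first position with B[pos] > right
        # The R loop writes to column nn-1 (1-based) and ignores column 0,
        # which corresponds to 0-based columns max(lo, 1) - 1 up to hi - 2.
        ranges.append((max(lo, 1) - 1, max(hi - 1, 0)))
    return ranges

def to_dense(ranges, n_cols):
    """Expand the ranges into a 0/1 matrix (only sensible for small inputs)."""
    Z = [[0] * n_cols for _ in ranges]
    for i, (start, stop) in enumerate(ranges):
        for j in range(start, stop):
            Z[i][j] = 1
    return Z

A = [13, 14, 29, 31, 32, 39, 42, 51, 59, 61, 68, 91, 102, 109, 120]
B = [26, 26, 28, 29, 30, 30, 33, 38, 41, 42, 45, 46, 47, 47, 49, 49,
     80, 81, 86, 86, 90, 90, 92, 100, 101, 105, 105, 107]

ranges = overlap_ranges(A, B)
Z = to_dense(ranges, len(B) - 1)
print(len(Z), len(Z[0]))  # number of rows and columns
```

With 30,000+ elements it is probably worth keeping only the ranges (or a sparse matrix) rather than building the dense matrix, since the dense one has on the order of 10^9 entries.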

Extract rows / columns of a matrix into separate variables

The following question came up in my course yesterday:
Suppose I have a matrix M = rand(3, 10) that comes out of a calculation, e.g. an ODE solver.
In Python, you can do
x, y, z = M
to extract the rows of M into the three variables, e.g. for plotting with matplotlib.
In Julia we could do
M = M' # transpose
x = M[:, 1]
y = M[:, 2]
z = M[:, 3]
Is there a nicer way to do this extraction?
It would be nice to be able to write at least (approaching Python)
x, y, z = columns(M)
or
x, y, z = rows(M)
One way would be
columns(M) = [ M[:,i] for i in 1:size(M, 2) ]
but this will make an expensive copy of all the data.
To avoid this would we need a new iterator type, ColumnIterator, that returns slices? Would this be useful for anything other than using this nice syntax?
You can build the list from views instead of copies:
columns(M) = [ slice(M,:,i) for i in 1:size(M, 2) ]
and
columns(M) = [ sub(M,:,i) for i in 1:size(M, 2) ]
Both return a view, but slice drops all dimensions indexed with scalars.
A nice alternative that I have just found if M is a Vector of Vectors (instead of a matrix) is using zip:
julia> M = Vector{Int}[[1,2,3],[4,5,6]]
2-element Array{Array{Int64,1},1}:
[1,2,3]
[4,5,6]
julia> a, b, c = zip(M...)
Base.Zip2{Array{Int64,1},Array{Int64,1}}([1,2,3],[4,5,6])
julia> a, b, c
((1,4),(2,5),(3,6))
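For comparison, here is the Python behaviour the question refers to, as a small sketch with a plain list of lists standing in for the matrix (the values are made up). Unpacking iterates over the rows, and zip(*M) is the usual way to get at the columns, much like the zip(M...) trick above:

```python
# A 3 x 4 "matrix" as a list of rows (hypothetical data)
M = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# Unpacking iterates over the first axis, so each name gets one row
x, y, z = M

# zip(*M) pairs up the i-th entries of every row, i.e. yields the columns as tuples
c1, c2, c3, c4 = zip(*M)

print(x)   # first row
print(c1)  # first column
```

Note that unpacking binds x, y and z to the existing row objects without copying them, while zip(*M) builds new tuples.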

do.call(), multiple parameters

I have a function with many arguments:
fun(A,B,C,D,E)
Now I want to assign the fixed values a, b, c, d to A, B, C, D and give E the values 1:7.
I want to use do.call() as below, but it doesn't work.
a <- do.call(function(x) fun(A = a, B = b, C = c, D = d, E = x), list(1:7))
I turned to lapply, and it works:
a <- lapply(c(1:7), function(x) fun(A = a, B = b, C = c, D = d, E = x))
Following Joshua Ulrich's answer, when I try
a <- do.call(fun, list(A = a, B = b, C = c, D = d, E = list(1:7)))
it says
(list) object cannot be coerced to type 'double'
So I guess fun needs a double value for E, but do.call() doesn't pass the values one by one; it passes the whole list.
I don't want to use lapply because it returns a list of lists. To access a particular element I have to use [[ ]], which accepts only a single value, so I cannot index with a vector, e.g. [[a]] with a <- c(1:7).
How to make do.call() work?
That should be:
a <- do.call(fun, list(A = a, B = b, C = c, D = d, E = list(1:7)))
And I have a feeling you want E to be a vector, not a list, in which case it should be:
a <- do.call(fun, list(A = a, B = b, C = c, D = d, E = 1:7))

Priority/Decision Based Choice of Row

I have a data.frame that has a number of duplicate rows, akin to something like this:
con <- textConnection(Lines <- "
First, Last, Address, Address 2, Email, Custom1, Custom2, Custom3
A, B, C, D, F#G.com,1,2,3
A, B, C, D, F#G.com,1,2,2
A, B, C, D, F#G.com,1,2,1
")
x <- read.csv(con)
close(con)
Now, I de-duplicate in the following manner:
x <- x[!duplicated(x[,c("Email")]),]
Could you recommend a method for prioritizing those rows that contain Custom3=1? Or is there a better mechanism for de-duplication?
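Try sorting before finding duplicates. duplicated() keeps the first occurrence, so after sorting by Custom3 in ascending order the row with Custom3 = 1 is the one that survives:
x <- x[order(x[,c("Custom3")]),]
x <- x[!duplicated(x[,c("Email")]),]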
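The same sort-then-keep-first idea, sketched in plain Python in case it helps to see the logic spelled out. The function name dedupe_with_priority and the dict-based rows are made up for illustration. The priority function here puts rows with Custom3 == 1 first regardless of the other values:

```python
def dedupe_with_priority(rows, key, priority):
    """Keep one row per key value, preferring the row with the smallest priority."""
    kept = {}
    # sorted() is stable, so rows with equal priority keep their original order
    for row in sorted(rows, key=priority):
        kept.setdefault(key(row), row)  # the first row seen for a key wins
    return list(kept.values())

rows = [
    {"Email": "F#G.com", "Custom3": 3},
    {"Email": "F#G.com", "Custom3": 2},
    {"Email": "F#G.com", "Custom3": 1},
]

result = dedupe_with_priority(
    rows,
    key=lambda r: r["Email"],
    priority=lambda r: r["Custom3"] != 1,  # False sorts before True
)
print(result)
```

Using a boolean priority like this only separates "Custom3 is 1" from "everything else", which is closer to what the question asks than sorting on the raw value.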
