How to mask specific row elements of a matrix in Julia?

I have a matrix A whose first column, A[:,1], holds the Bus_id values: 1, 3, 4, and 6. For processing, I replaced the Bus_ids with consecutive row indices; see the A_new matrix.
julia> A=[1 1 3;3 1 1; 4 1 7;6 1 1]
4×3 Array{Int64,2}:
1 1 3
3 1 1
4 1 7
6 1 1
julia> A_new
1 1 3
2 1 1
3 1 7
4 1 1
Now I have another matrix B, which contains some rows of matrix A. I want to convert B's Bus_ids in the same way to get B_new. I am not sure how best to explain this problem.
julia> B = [3 1 1; 6 1 1]
2×3 Array{Int64,2}:
3 1 1
6 1 1
julia> B_new
2 1 1
4 1 1
I have tried masking, but it works only for one element.
Please help me find a way.

It is possible that you are using Bus_id as an index. If you want to renumber the business IDs without losing track of transactions indexed by the original IDs, what you want to do fits naturally into a Dict that translates each Bus_id from the old value to the new one.
One problem that immediately arises: what should happen if some of the entries in B have no translation from A, but are already set to a number that is one of A's new keys? Potential cross-linked database chaos! Instead, the new IDs need to be unique if at all possible. I suggest making them negative.
If you use matrix A as your key to translation (assuming that all entries in A[:,1] are unique; if not, the logic might need to drop duplicates first), the Dict usage then looks like this:
A = [1 1 3; 3 1 1; 4 1 7; 6 1 1]
B = [3 1 1; 6 1 1]

# Build a lookup from each original ID (first column) to its negated row number.
function consecutive_row_indexing(mat)
    dict = Dict{Int, Int}()
    for (i, n) in enumerate(mat[:, 1])
        dict[n] = -i
    end
    dict
end

# Replace, in place, every first-column ID that has an entry in dict.
function renumberbus_ids!(mat, dict)
    for i in 1:size(mat, 1)
        if haskey(dict, mat[i, 1])
            mat[i, 1] = dict[mat[i, 1]]
        end
    end
    mat
end

d = consecutive_row_indexing(A)
println(renumberbus_ids!(A, d))
println(renumberbus_ids!(B, d))
Output:
[-1 1 3; -2 1 1; -3 1 7; -4 1 1]
[-2 1 1; -4 1 1]
If you still really want your B matrix with positive integers for its index column, just replace dict[n] = -i with dict[n] = i inside consecutive_row_indexing in the code above.
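As a more compact illustration of the same Dict idea, here is a minimal sketch that leaves B untouched and uses positive row numbers. The names id_to_row and B_new are only illustrative, and leaving unmapped IDs unchanged is an assumption of this sketch, not something the question specifies.

```julia
A = [1 1 3; 3 1 1; 4 1 7; 6 1 1]
B = [3 1 1; 6 1 1]

# Map each original ID in A's first column to its row position.
id_to_row = Dict(id => i for (i, id) in enumerate(A[:, 1]))

# Non-mutating variant: copy B, then translate only its first column.
# IDs that are missing from the mapping are kept as they are (assumption).
B_new = copy(B)
B_new[:, 1] = [get(id_to_row, id, id) for id in B[:, 1]]

println(B_new)
```

Note that keeping unmapped IDs as-is with positive new IDs is exactly the situation the answer warns about, so this variant is only safe when every ID in B is known to occur in A.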

Related

Get 2d index from flattened array

Given the array:
arr = [1 2; 3 4; 5 6]
3×2 Array{Int64,2}:
1 2
3 4
5 6
which is flattened with flat_arr = collect(Iterators.flatten(arr)):
6-element Array{Int64,1}:
1
3
5
2
4
6
I sometimes need to go between both index formats. For example, if I got the sorted indices of flat_arr, I may want to iterate over arr using these sorted indices. In Python, this is typically done with np.unravel_index. How is this done in Julia? Do I just need to write my own function?
vec() returns a 1-d array that shares memory with the original array (no copy is made). Hence you can keep both references and use whichever one you need at any moment (they point to the same data):
julia> arr = [1 2; 3 4; 5 6]
3×2 Array{Int64,2}:
1 2
3 4
5 6
julia> arr1d = vec(arr)
6-element Array{Int64,1}:
1
3
5
2
4
6
julia> arr1d[4] = 99
99
julia> arr
3×2 Array{Int64,2}:
1 99
3 4
5 6
Note that Julia arrays are stored in column-major order, so the fourth value is the first value in the second column.
This can be accomplished using CartesianIndices.
c_i = CartesianIndices(arr)
flat_arr[2] == arr[c_i[2]] == 3  # true
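To make the round trip between the two index formats concrete, here is a small sketch using CartesianIndices and LinearIndices from Base. The variable names (cart, lin, order, sorted_vals) are only illustrative.

```julia
arr = [1 2; 3 4; 5 6]

cart = CartesianIndices(arr)   # linear index -> (row, column)
lin  = LinearIndices(arr)      # (row, column) -> linear index

# Linear index 4 is the first element of the second column (column-major order).
row, col = Tuple(cart[4])
k = lin[1, 2]

# Sort the flattened data, then visit the 2D array in that order.
order = sortperm(vec(arr))
sorted_vals = [arr[cart[p]] for p in order]

println((row, col), " ", k, " ", sorted_vals)
```

Here cart[4] plays roughly the role of np.unravel_index for a column-major array, and lin[1, 2] goes the other way.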

I want to create 2D array with 5 rows by 1 column

If I want to create a 2D array with 1 row by 5 columns,
I can do this:
julia> a = [1 2 3 4 5]
1×5 Array{Int64,2}:
1 2 3 4 5
But to create a 2D array with 5 rows by 1 column, I have tried
julia> b = [1; 2; 3; 4; 5]
5-element Array{Int64,1}:
1
2
3
4
5
But I got back a 1D array, which is NOT what I wanted.
The only way I could get it to work is
julia> b=reshape([1 2 3 4 5],5,1)
5×1 Array{Int64,2}:
1
2
3
4
5
Perhaps I am missing some crucial information here.
You could also do a = [1 2 3 4 5]'.
On a side note, for Julia versions > 0.6 the type of a wouldn't be Array{Int64, 2} but a LinearAlgebra.Adjoint{Int64,Array{Int64,2}} as conjugate transpose is lazy in this case. One can get <= 0.6 behavior by a = copy([1 2 3 4 5]').
AFAIK there is no syntactic sugar for it.
I usually write:
hcat([1, 2, 3, 4, 5])
which is short and I find it easy to remember.
If you use reshape you can replace one dimension with : which means you do not have to count (it is useful e.g. when you get an input vector as a variable):
reshape([1 2 3 4 5], :, 1)
Finally you could use:
permutedims([1 2 3 4 5])
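The options above can be compared side by side. This is a minimal sketch with illustrative variable names; it checks that each approach yields a 5×1 matrix and shows one practical difference: reshape shares memory with the input vector, while hcat makes a copy.

```julia
v = [1, 2, 3, 4, 5]

a = reshape(v, :, 1)            # 5×1, shares memory with v
b = hcat(v)                     # 5×1, independent copy
c = permutedims([1 2 3 4 5])    # 5×1, eager (non-lazy) transpose
d = copy([1 2 3 4 5]')          # 5×1, materialized from the lazy adjoint

for m in (a, b, c, d)
    println(size(m))
end

# Mutating v is visible through the reshaped array but not through the hcat copy.
v[1] = 10
println(a[1, 1], " ", b[1, 1])
```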

display interaction with Julia list comprehension

julia> display([i*j for i=1:3, j=1:3])
3×3 Array{Int64,2}:
1 2 3
2 4 6
3 6 9
julia> display([i*j for i=1:3, j=1:3 i>=j])
6-element Array{Int64,1}:
1
2
3
4
6
9
not a surprise. what i'd like is:
3×3 Array{Int64,2}:
1
2 4
3 6 9
i suppose a for loop is needed. what i don't want is to generate the entire array and then filter out or replace the unwanted entries.
while the example is symmetric, that is not really relevant to the question. any f(i,j) could be substituted for i*j (symmetric or not).
I suppose you wanted to write [i*j for i=1:3, j=1:3 if i>=j]. The if condition will always make your result a vector.
What you can do to avoid generating an entire array is e.g.:
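using LinearAlgebra  # provides LowerTriangular

x = Matrix{Int}(undef, 3, 3)  # uninitialized; only the lower triangle is filled
for i in 1:3, j in 1:i
    x[i,j] = i*j
end
y = LowerTriangular(x)  # entries above the diagonal are treated as zero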
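Since the question mentions that any f(i,j) could stand in for i*j, the loop can be wrapped in a small helper. This is only a sketch: lower_triangular_from is a hypothetical name, and it assumes f returns an Int and that a zero-filled buffer is acceptable.

```julia
using LinearAlgebra

# Hypothetical helper: evaluate f(i, j) only for i >= j and wrap the result.
function lower_triangular_from(f, n)
    x = zeros(Int, n, n)          # assumes f returns Int values
    for i in 1:n, j in 1:i
        x[i, j] = f(i, j)
    end
    return LowerTriangular(x)
end

y = lower_triangular_from(*, 3)
z = lower_triangular_from((i, j) -> i - j, 3)   # a non-symmetric f
display(y)
```

Only the n(n+1)/2 entries on or below the diagonal are ever computed, so f is never called for the entries that would be discarded.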

Threshold values row-wise in a dataframe

Consider an example data frame:
A B C v
5 4 2 3
7 1 3 5
1 2 1 1
I want to set each element of a row to 1 if it is greater than or equal to that row's v, and to 0 otherwise. The example data frame would result in the following:
A B C v
1 1 0 3
1 0 0 5
1 1 1 1
How can I do this efficiently? The number of columns will be much higher, and I would like a solution that does not require me to specify the names of the columns individually, and will apply it to all of them (except v) instead.
My solution with a for loop is way too slow.
We can create a logical matrix and coerce to binary
df1[-4] <- +(df1[-4] >= df1$v)

What's the difference between [1], [1,], [,1], [[1]] for a dataframe in R? [duplicate]

Possible Duplicate:
In R, what is the difference between the [] and [[]] notations for accessing the elements of a list?
I'm confused about the difference between [1], [1,], [,1], and [[1]] for the dataframe type.
As far as I know, [1,] will fetch the first row of a matrix, [,1] will fetch the first column, and [[1]] will fetch the first element of a list.
But I checked the documentation of data.frame, which says
A data frame is a list of variables of the same number of rows with
unique row names
Then I typed in some code to test the usage.
>L3 <- LETTERS[1:3]
>(d <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE)))
x y fac
1 1 1 C
2 1 2 B
3 1 3 C
4 1 4 C
5 1 5 A
6 1 6 B
7 1 7 C
8 1 8 A
9 1 9 A
10 1 10 A
> d[1]
x
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 1
10 1
>d[1,]
x y fac
1 1 1 C
>d[,1]
[1] 1 1 1 1 1 1 1 1 1 1
>d[[1]]
[1] 1 1 1 1 1 1 1 1 1 1
What confuses me is this: [1,] and [,1] are only used with matrices, [[1]] is only used with lists, and [1] is used with vectors, so why are all of them available for a dataframe?
Could anybody explain the difference between these usages?
In R, operators are not used for one data type only. Operators can be overloaded for whatever data type you like (e.g. also S3/S4 classes).
In fact, that's the case for data.frames.
As data.frames are lists, [i] and [[i]] (and $) show list-like behaviour.
Row and column indices do have an intuitive meaning for tables, and data.frames look like tables. That is probably the reason why methods for data.frame [i, j] were defined.
You can even look at the definitions, they are coded in the S3 system (so methodname.class):
> `[.data.frame`
and
> `[[.data.frame`
(the backticks quote the function name, otherwise R would try to use the operator and end up with a syntax error)

Resources