How can I have a cell array in Julia? - julia

Does a cell array exist in Julia? I want an array which its elements are vector or matrix.
for example A={1,[2 3],[5 6;7 8];"salam", [1 2 3 4],magic(5)}.
if you don't mind please help me.

An Array{Any} is equivalent to a MATLAB cell array. You can put anything in there. ["hi",:bye,10]. a = Array{Any}(undef,5) builds an uninitialized one, you can a[1] = ... to modify values, push!(a,...) to increase its length, etc.

A cell array is a data type with indexed data containers called cells, where each cell can contain any type of data
In Julia, arrays can contain values of homogeneous ([1, 2, 3]) or heterogeneous types ([1, 2.5, "3"]). Julia will try to promote the values to a common concrete type by default. If Julia can not promote the types contained, the resulting array would be of the abstract type Any.
Example ported from Access Data in Cell Array, using Julia 1.0.3:
julia> C = ["one" "two" "three"; # Matrix literal
1 2 3 ]
2×3 Array{Any,2}:
"one" "two" "three"
1 2 3
julia> upperLeft = C[1:2,1:2] # slicing
2×2 Array{Any,2}:
"one" "two"
1 2
julia> C[1,1:3] = ["first","second","third"] # slice assignment
3-element Array{String,1}:
"first"
"second"
"third"
julia> C
2×3 Array{Any,2}:
"first" "second" "third"
1 2 3
julia> numericCells = C[2,1:3]
3-element Array{Any,1}:
1
2
3
julia> last = C[2,3] # indexing
3
julia> C[2,3] = 300 # indexing assignment
300
julia> C
2×3 Array{Any,2}:
"first" "second" "third"
1 2 300
julia> r1c1, r2c1, r1c2, r2c2 = C[1:2,1:2] # destructuring
2×2 Array{Any,2}:
"first" "second"
1 2
julia> r1c1
"first"
julia> r2c1
1
julia> r1c2
"second"
julia> r2c2
2
julia> nums = C[2,:]
3-element Array{Any,1}:
1
2
300
Example ported from Combining Cell Arrays with Non-Cell Arrays:
Notice the use of the splice operator (...) to incorporate the values of the inner array into the outer one, and the usage of the Any[] syntax to prevent Julia from promoting the UInt8 to an Int.
julia> A = [100, Any[UInt8(200), 300]..., "Julia"]
4-element Array{Any,1}:
100
0xc8
300
"Julia"
The .( broadcast syntax, applies the function typeof element wise.
julia> typeof.(A)
4-element Array{DataType,1}:
Int64
UInt8
Int64
String
So in summary Julia doesn't need cell arrays, it uses parametric n-dimensional arrays instead. Also Julia only uses brackets for both slicing and indexing (A[n], A[i, j], A[a:b, x:y]), parenthesis after a variable symbol is reserved for function calls (foo(), foo(args...), foo(bar = "baz")).

Related

How do you access multi-dimension array by N array of index element-wise?

Suppose we have
A = [1 2; 3 4]
In numpy, the following syntax will produce
A[[1,2],[1,2]] = [1,4]
But, in julia, the following produce a permutation which output
A[[1,2],[1,2]] = [1 2; 3 4]
Is there a concise way to achieve the same thing as numpy without using for loops?
To get what you want I would use CartesianIndex like this:
julia> A[CartesianIndex.([(1,1), (2,2)])]
2-element Vector{Int64}:
1
4
or
julia> A[[CartesianIndex(1,1), CartesianIndex(2,2)]]
2-element Vector{Int64}:
1
4
Like Bogumil said, you probably want to use CartesianIndex. But if you want to get your result from supplying the vectors of indices for each dimensions, as in your Python [1,2],[1,2] example, you need to zip these indices first:
julia> A[CartesianIndex.(zip([1,2], [1,2]))]
2-element Vector{Int64}:
1
4
How does this work? zip traverses both vectors of indices at the same time (like a zipper) and returns an iterator over the tuples of indices:
julia> zip([1,2],[1,2]) # is a lazy iterator
zip([1, 2], [1, 2])
julia> collect(zip([1,2],[1,2])) # collect to show all the tuples
2-element Vector{Tuple{Int64, Int64}}:
(1, 1)
(2, 2)
and then CartesianIndex turns them into cartesian indices, which can then be used to get the corresponding values in A:
julia> CartesianIndex.(zip([1,2],[1,2]))
2-element Vector{CartesianIndex{2}}:
CartesianIndex(1, 1)
CartesianIndex(2, 2)

When will changing an object also change the copy of the object?

I am confused by the copy() function. As I understood, = is pointer style assignment and deepcopy() is creating a new independent copy. However, I found copy() is not very "stable". Please see the following two examples:
b = [[1,2,3], [4,5,6]];
a = copy(b);
b[1][1] = 10;
a
b
In the example above, a also changed after the assignment of b[1][1]
While in the second example:
b = [[1,2,3], [4,5,6]];
a = copy(b);
b[1] = [10,2,3];
a
b
The assignment of b[1] does not really change a. This is really confusing. Can anyone explain briefly what is happening? Thank you!
copy craetes a shallow copy and hence in your case references to the object are copied rather than real data.
This happens because your b is a Vector of Vectors so this is storied as:
b = [<reference to the first vector>, <reference to the second vector>]
When you create a shallow-copy only those references are being copied but no the underlying data. Hence the copied references still point to the same memory address.
In your second example you are replacing the actual reference. Since the object a holds a copy of the reference, replacing the entire reference in b is not seen in a.
This behavior will be seen everywhere where you have "objects inside objects" data structure. On the other hand if you have arrays of primitives (on references) you will get an actual copy such as:
julia> a = [1 3; 3 4]
2×2 Matrix{Int64}:
1 3
3 4
julia> b = copy(a); b[1,1] = 100
100
julia> a
2×2 Matrix{Int64}:
1 3
3 4
This is a more detailed explanation of the differences between the equal sign, copy and deepcopy functions, extracted from the chapter 2 of my "Julia Quick Syntax Reference: A Pocket Guide for Data Science Programming" (Apress 2019) book:
Memory and copy issues
In order to avoid copying large amount of data, Julia by default copies only the memory address of objects, unless the programmer explicitly request a so-called "deep" copy or the compiler "judges" an actual copy more efficient.
Use copy() or deepcopy() when you don't want that subsequent modifications to the copied object would apply to the original object.
In details:
Equal sign (a=b)
performs a name binding, i.e. binds (assigns) the entity (object) referenced by b also to the a identifier (the variable name)
it results that:
if b then rebinds to some other object, a remains referenced to the original object
if the object referenced by b mutates (i.e. it internally changes), so does (being the same object) those referenced by a
if b is immutable and small in memory, under some circumstances, the compiler would instead create a new object and bind it to a, but being immutable for the user this difference would not be noticeable
as for many high level languages, we don't need to explicitly worry about memory leaks. A Garbage Collector exists such that objects that are no longer accessible are automatically destroyed.
a = copy(b)
creates a new, "independent" copy of the object and bind it to a. This new object may however reference in turn other objects trough their memory address. In this case it is their memory address that is copied and not the referenced objects themselves.
it results that:
if these referenced objects (e.g. the individual elements of a
vector) are rebound to some other objects, the new object referenced
by a maintains the reference to the original objects
if these referenced objects mutate, so do (being the same objects) those referenced by the new object referenced by a
a = deepcopy(b)
everything is deep copied recursively
The following code snippet highlights the differences between these three methods of "copying" an object:
julia> a = [[[1,2],3],4]
2-element Array{Any,1}:
Any[[1, 2], 3]
4
julia> b = a
2-element Array{Any,1}:
Any[[1, 2], 3]
4
julia> c = copy(a)
2-element Array{Any,1}:
Any[[1, 2], 3]
4
julia> d = deepcopy(a)
2-element Array{Any,1}:
Any[[1, 2], 3]
4
# rebinds a[2] to an other objects.
# At the same time mutates object a:
julia> a[2] = 40
40
julia> b
2-element Array{Any,1}:
Any[[1, 2], 3]
40
julia> c
2-element Array{Any,1}:
Any[[1, 2], 3]
4
julia> d
2-element Array{Any,1}:
Any[[1, 2], 3]
4
# rebinds a[1][2] and at the same
# time mutates both a and a[1]:
julia> a[1][2] = 30
30
julia> b
2-element Array{Any,1}:
Any[[1, 2], 30]
40
julia> c
2-element Array{Any,1}:
Any[[1, 2], 30]
4
julia> d
2-element Array{Any,1}:
Any[[1, 2], 3]
4
# rebinds a[1][1][2] and at the same
# time mutates a, a[1] and a[1][1]:
julia> a[1][1][2] = 20
20
julia> b
2-element Array{Any,1}:
Any[[1, 20], 30]
40
julia> c
2-element Array{Any,1}:
Any[[1, 20], 30]
4
julia> d
2-element Array{Any,1}:
Any[[1, 2], 3]
4
# rebinds a:
julia> a = 5
5
julia> b
2-element Array{Any,1}:
Any[[1, 20], 30]
40
julia> c
2-element Array{Any,1}:
Any[[1, 20], 30]
4
julia> d
2-element Array{Any,1}:
Any[[1, 2], 3]
4
We can check if two objects have the same values with == and if two objects are actually the same with === (in the sense that immutable objects are checked at the bit level and mutable objects are checked for their memory address):
given a = [1, 2]; b = [1, 2]; a == b and a === a are true, but a === b is false;
given a = (1, 2); b = (1, 2); all a == b, a === a and a === b are true.

Create a Vector of Integers and missing Values

What a hazzle...
I'm trying to create a vector of integers and missing values. This works fine:
b = [4, missing, missing, 3]
But I would actually like the vector to be longer with more missing values and therefore use repeat(), but this doesn't work
append!([1,2,3], repeat([missing], 1000))
and this also doesn't work
[1,2,3, repeat([missing], 1000)]
Please, help me out, here.
It is also worth to note that if you do not need to do an in-place operation with append! actually in such cases it is much easier to do vertical concatenation:
julia> [[1, 2, 3]; repeat([missing], 2); 4; 5] # note ; that denotes vcat
7-element Array{Union{Missing, Int64},1}:
1
2
3
missing
missing
4
5
julia> vcat([1,2,3], repeat([missing], 2), 4, 5) # this is the same but using a different syntax
7-element Array{Union{Missing, Int64},1}:
1
2
3
missing
missing
4
5
The benefit of vcat is that it automatically does the type promotion (as opposed to append! in which case you have to correctly specify the eltype of the target container before the operation).
Note that because vcat does automatic type promotion in corner cases you might get a different eltype of the result of the operation:
julia> x = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3
julia> append!(x, [1.0, 2.0]) # conversion from Float64 to Int happens here
5-element Array{Int64,1}:
1
2
3
1
2
julia> [[1, 2, 3]; [1.0, 2.0]] # promotion of Int to Float64 happens in this case
5-element Array{Float64,1}:
1.0
2.0
3.0
1.0
2.0
See also https://docs.julialang.org/en/v1/manual/arrays/#man-array-literals.
This will work:
append!(Union{Int,Missing}[1,2,3], repeat([missing], 1000))
[1,2,3] creates just a Vector{Int} and since Julia is strongly typed the Vector{Int} cannot accept values of non-Int type. Hence, when defining a structure, that you plan to hold more data types within, you need to explicitly state it - here we have defined Vector{Union{Int,Missing}}.

Vectorized splatting

I'd like to be able to splat an array of tuples into a function in a vectorized fashion. For example, if I have the following function,
function foo(x, y)
x + y
end
and the following array of tuples,
args_array = [(1, 2), (3, 4), (5, 6)]
then I could use a list comprehension to obtain the desired result:
julia> [foo(args...) for args in args_array]
3-element Array{Int64,1}:
3
7
11
However, I would like to be able to use the dot vectorization notation for this operation:
julia> foo.(args_array...)
ERROR: MethodError: no method matching foo(::Int64, ::Int64, ::Int64)
But as you can see, that particular syntax doesn't work. Is there a vectorized way to do this?
foo.(args_array...) doesn't work because it's doing:
foo.((1, 2), (3, 4), (5, 6))
# which is roughly equivalent to
[foo(1,3,5), foo(2,4,6)]
In other words, it's taking each element of args_array as a separate argument and then broadcasting foo over those arguments. You want to broadcast foo over the elements directly. The trouble is that running:
foo.(args_array)
# is roughly equivalent to:
[foo((1,2)), foo((3,4)), foo((5,6))]
In other words, the broadcast syntax is just passing each tuple as a single argument to foo. We can fix that with a simple intermediate function:
julia> bar(args) = foo(args...);
julia> bar.(args_array)
3-element Array{Int64,1}:
3
7
11
Now that's doing what you want! You don't even need to construct the second argument if you don't want to. This is exactly equivalent:
julia> (args->foo(args...)).(args_array)
3-element Array{Int64,1}:
3
7
11
And in fact you can generalize this quite easily:
julia> splat(f) = args -> f(args...);
julia> (splat(foo)).(args_array)
3-element Array{Int64,1}:
3
7
11
You could zip the args_array, which effectively transposes the array of tuples:
julia> collect(zip(args_array...))
2-element Array{Tuple{Int64,Int64,Int64},1}:
(1, 3, 5)
(2, 4, 6)
Then you can broadcast foo over the transposed array (actually an iterator) of tuples:
julia> foo.(zip(args_array...)...)
(3, 7, 11)
However, this returns a tuple instead of an array. If you need the return value to be an array, you could use any of the following somewhat cryptic solutions:
julia> foo.(collect.(zip(args_array...))...)
3-element Array{Int64,1}:
3
7
11
julia> collect(foo.(zip(args_array...)...))
3-element Array{Int64,1}:
3
7
11
julia> [foo.(zip(args_array...)...)...]
3-element Array{Int64,1}:
3
7
11
How about
[foo(x,y) for (x,y) in args_array]

How do you select a subset of an array based on a condition in Julia

How do you do simply select a subset of an array based on a condition? I know Julia doesn't use vectorization, but there must be a simple way of doing the following without an ugly looking multi-line for loop
julia> map([1,2,3,4]) do x
return (x%2==0)?x:nothing
end
4-element Array{Any,1}:
nothing
2
nothing
4
Desired output:
[2, 4]
Observed output:
[nothing, 2, nothing, 4]
You are looking for filter
http://docs.julialang.org/en/release-0.4/stdlib/collections/#Base.filter
Here is example an
filter(x->x%2==0,[1,2,3,5]) #anwers with [2]
There are element-wise operators (beginning with a "."):
julia> [1,2,3,4] % 2 .== 0
4-element BitArray{1}:
false
true
false
true
julia> x = [1,2,3,4]
4-element Array{Int64,1}:
1
2
3
4
julia> x % 2 .== 0
4-element BitArray{1}:
false
true
false
true
julia> x[x % 2 .== 0]
2-element Array{Int64,1}:
2
4
julia> x .% 2
4-element Array{Int64,1}:
1
0
1
0
You can use the find() function (or the .== syntax) to accomplish this. E.g.:
julia> x = collect(1:4)
4-element Array{Int64,1}:
1
2
3
4
julia> y = x[find(x%2.==0)]
2-element Array{Int64,1}:
2
4
julia> y = x[x%2.==0] ## more concise and slightly quicker
2-element Array{Int64,1}:
2
4
Note the .== syntax for the element-wise operation. Also, note that find() returns the indices that match the criteria. In this case, the indices matching the criteria are the same as the array elements that match the criteria. For the more general case though, we want to put the find() function in brackets to denote that we are using it to select indices from the original array x.
Update: Good point #Lutfullah Tomak about the filter() function. I believe though that find() can be quicker and more memory efficient. (though I understand that anonymous functions are supposed to get better in version 0.5 so perhaps this might change?) At least in my trial, I got:
x = collect(1:100000000);
#time y1 = filter(x->x%2==0,x);
# 9.526485 seconds (100.00 M allocations: 1.554 GB, 2.76% gc time)
#time y2 = x[find(x%2.==0)];
# 3.187476 seconds (48.85 k allocations: 1.504 GB, 4.89% gc time)
#time y3 = x[x%2.==0];
# 2.570451 seconds (57.98 k allocations: 1.131 GB, 4.17% gc time)
Update2: Good points in comments to this post that x[x%2.==0] is faster than x[find(x%2.==0)].
Another updated version:
v[v .% 2 .== 0]
Probably, for the newer versions of Julia, one needs to add broadcasting dot before both % and ==

Resources