Sample from Vector/Array with Probabilities - julia

I have a Bool vector, simply [true, false]. I can draw 10 samples from that vector with
rand([true,false], 10)
but how can I make true be drawn with 80% probability and false with 20% probability?

Use the sample function from StatsBase.jl with a Weights argument:
julia> using StatsBase
julia> sample([true, false], Weights([0.8, 0.2]), 10)
10-element Array{Bool,1}:
1
0
1
1
1
1
1
1
1
1
To check that you get the intended proportions, you can write:
julia> countmap(sample([true, false], Weights([0.8, 0.2]), 10^8))
Dict{Bool,Int64} with 2 entries:
false => 20003766
true => 79996234
(of course your exact numbers will differ)
Also, if you specifically need binary sampling, you can use the Bernoulli distribution from Distributions.jl:
julia> using Distributions
julia> rand(Bernoulli(0.8), 10)
10-element Array{Bool,1}:
0
1
1
0
1
1
1
1
1
1
julia> countmap(rand(Bernoulli(0.8), 10^8))
Dict{Bool,Int64} with 2 entries:
false => 20005900
true => 79994100
(you can expect this method to be faster)
Finally, if you do not want to use any packages and need a binary result, you can simply write rand(10) .< 0.8. Checking it again (countmap still comes from StatsBase) shows that you get what you wanted:
julia> countmap(rand(10^8) .< 0.8)
Dict{Bool,Int64} with 2 entries:
false => 20003950
true => 79996050
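If you need weights other than a simple true/false split but still want to avoid packages, the idea behind rand(n) .< 0.8 generalizes: draw a uniform number and see where it falls among the cumulative weights (inverse-CDF sampling). The sketch below assumes Julia 1.3 or later and uses only Base and the Random standard library; weighted_sample is a hypothetical helper name, not a StatsBase function, and StatsBase's sample remains the more complete option.

```julia
using Random

# Hypothetical helper (not part of StatsBase): draw n items from `items`
# with probabilities proportional to `weights`, using only Base + Random.
function weighted_sample(rng::AbstractRNG, items::AbstractVector,
                         weights::AbstractVector{<:Real}, n::Integer)
    length(items) == length(weights) ||
        throw(ArgumentError("items and weights must have the same length"))
    all(>=(0), weights) || throw(ArgumentError("weights must be non-negative"))
    total = sum(weights)
    total > 0 || throw(ArgumentError("weights must not all be zero"))
    cdf = cumsum(weights) ./ total    # normalized cumulative weights
    m = length(items)
    # For u uniform in [0, 1), take the first index whose cumulative weight
    # exceeds u, so item i is chosen with probability weights[i] / total.
    # min(..., m) guards against cdf[end] rounding to slightly below 1.
    return [items[min(searchsortedlast(cdf, rand(rng)) + 1, m)] for _ in 1:n]
end

# Convenience method using the default global RNG (Julia 1.3+)
weighted_sample(items, weights, n) =
    weighted_sample(Random.default_rng(), items, weights, n)

# Usage: roughly 80% true and 20% false
s = weighted_sample([true, false], [0.8, 0.2], 10)
```

Passing a seeded generator such as MersenneTwister(42) as the first argument makes the draws reproducible.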

Related

Convert a Set to an Array in Julia

How can I convert a Set to an Array in Julia?
E.g. I want to transform the following Set to an Array.
x = Set([1,2,3])
x
Set{Int64} with 3 elements:
2
3
1
The collect() function can be used for this. E.g.
collect(x)
3-element Vector{Int64}:
2
3
1
Notice, however, that the order of the elements has changed. This is because sets are unordered.
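Because a Set has no defined order, code that needs a reproducible order can sort the collected vector. A minimal sketch, assuming Julia 1.x and elements with a natural ordering (such as integers):

```julia
x = Set([3, 1, 2])

v = collect(x)          # Vector{Int}; element order follows the Set's internal hashing
s = sort(collect(x))    # sorted copy with a reproducible order
t = sort!(collect(x))   # same result, sorting the freshly collected vector in place
```

sort!(collect(x)) avoids the extra copy that sort(collect(x)) makes.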
You can also use the splat operator on sets:
julia> [x...]
3-element Vector{Int64}:
2
3
1
However, this is slower than collect.
You can also use [ ] with a comprehension to create an array:
x = Set([1,2,3])
y = [a for a in x]
y
2
3
1
typeof(y)
Vector{Int64} (alias for Array{Int64, 1})
You can use a comprehension:
x = Set(1:5)
@time y = [i for i in x]
> 0.000006 seconds (2 allocations: 112 bytes)
typeof(y)
> Vector{Int64} (alias for Array{Int64, 1})

SymPy.jl strange report of eigenvalues compared with using SymPy in python

SymPy in Python:
>>> from sympy import Matrix, sqrt
>>> M = Matrix([[-4, sqrt(2)], [sqrt(2), -5]])
>>> M
Matrix([
[ -4, sqrt(2)],
[sqrt(2), -5]])
>>> dict_eig = M.eigenvals()
>>> dict_eig
{-6: 1, -3: 1}
SymPy.jl (Julia):
julia> M = sympy.Matrix([[-4, sqrt(2)], [sqrt(2), -5]])
2×2 Array{Sym,2}:
-4.00000000000000 1.41421356237310
1.41421356237310 -5.00000000000000
julia> dict_eig = M.eigenvals()
Dict{Any,Any} with 2 entries:
-9/2 - sqrt(225000000000001400410360361)/10000000000000 => 1
-9/2 + sqrt(225000000000001400410360361)/10000000000000 => 1
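The result is actually correct, but pretty weird. Why is that, and how can I get the form reported by Python?
You're implicitly using SymPy's sqrt in the Python version, so sqrt(2) stays exact there. In Julia, sqrt(2) is Base's square root, which returns a Float64 before SymPy ever sees it. The matrix therefore holds floating-point entries (as the printed output shows), and the exact eigenvalue computation ends up working with a rational approximation of 1.41421356237310, which is where the huge radicand comes from. If you call SymPy's sqrt directly, you'll get equivalent results: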
julia> M = [[-4 sympy.sqrt(2)]; [sympy.sqrt(2) -5]]
2×2 Array{Sym,2}:
-4 sqrt(2)
sqrt(2) -5
julia> M.eigenvals()
Dict{Any,Any} with 2 entries:
-3 => 1
-6 => 1
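As a quick numerical cross-check (using only the LinearAlgebra standard library, not SymPy), the floating-point matrix from the question does have eigenvalues -6 and -3, so the odd radicals in the first Julia output are just a roundabout form of those values. The trace is -9 and the determinant is 20 - 2 = 18, so the characteristic polynomial is λ² + 9λ + 18 = (λ + 6)(λ + 3).

```julia
using LinearAlgebra

M = [-4.0 sqrt(2); sqrt(2) -5.0]  # Float64 entries, as in the first Julia attempt
λ = eigvals(Symmetric(M))          # real eigenvalues, returned in ascending order

# Trace and determinant give the characteristic polynomial λ^2 + 9λ + 18
t = tr(M)   # ≈ -9
d = det(M)  # ≈ 18
```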

How do you select a subset of an array based on a condition in Julia

How do you simply select a subset of an array based on a condition? I know Julia doesn't use vectorization, but there must be a simple way of doing the following without an ugly-looking multi-line for loop:
julia> map([1,2,3,4]) do x
           return (x % 2 == 0) ? x : nothing
       end
4-element Array{Any,1}:
nothing
2
nothing
4
Desired output:
[2, 4]
Observed output:
[nothing, 2, nothing, 4]
You are looking for filter:
http://docs.julialang.org/en/release-0.4/stdlib/collections/#Base.filter
Here is an example:
filter(x->x%2==0,[1,2,3,5]) # returns [2]
There are element-wise operators (beginning with a "."):
julia> [1,2,3,4] .% 2 .== 0
4-element BitArray{1}:
false
true
false
true
julia> x = [1,2,3,4]
4-element Array{Int64,1}:
1
2
3
4
julia> x .% 2 .== 0
4-element BitArray{1}:
false
true
false
true
julia> x[x .% 2 .== 0]
2-element Array{Int64,1}:
2
4
julia> x .% 2
4-element Array{Int64,1}:
1
0
1
0
You can use the find() function (replaced by findall() since Julia 0.7) or the .== syntax to accomplish this. E.g.:
julia> x = collect(1:4)
4-element Array{Int64,1}:
1
2
3
4
julia> y = x[find(x%2.==0)]
2-element Array{Int64,1}:
2
4
julia> y = x[x%2.==0] ## more concise and slightly quicker
2-element Array{Int64,1}:
2
4
Note the .== syntax for the element-wise operation. Also note that find() returns the indices that match the criterion. In this case, the matching indices happen to be the same as the matching elements. In the general case, though, we want to put the find() call inside x[...] so that its indices are used to select elements from the original array x.
Update: Good point, @Lutfullah Tomak, about the filter() function. I believe, however, that find() can be quicker and more memory efficient, though I understand that anonymous functions are supposed to get better in version 0.5, so this might change. At least in my trial, I got:
x = collect(1:100000000);
@time y1 = filter(x->x%2==0,x);
# 9.526485 seconds (100.00 M allocations: 1.554 GB, 2.76% gc time)
@time y2 = x[find(x%2.==0)];
# 3.187476 seconds (48.85 k allocations: 1.504 GB, 4.89% gc time)
@time y3 = x[x%2.==0];
# 2.570451 seconds (57.98 k allocations: 1.131 GB, 4.17% gc time)
Update 2: Commenters made the good point that x[x%2.==0] is faster than x[find(x%2.==0)].
Another updated version:
v[v .% 2 .== 0]
In Julia 1.0 and later, the broadcasting dot is needed before both % and ==: without it, v % 2 throws a MethodError, and == compares the whole array to 0 instead of each element.
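For current Julia (1.x), here is a sketch of the approaches from this thread side by side: filter with a predicate, logical indexing with a broadcast Bool mask, and findall (the successor of find). iseven is a Base function that, for integers, is equivalent to x -> x % 2 == 0.

```julia
x = collect(1:10)

a = filter(iseven, x)       # new vector of the elements satisfying the predicate
b = x[iseven.(x)]           # logical indexing with a broadcast Bool mask
c = x[findall(iseven, x)]   # findall returns the indices where the predicate holds
d = x[x .% 2 .== 0]         # same mask written with explicit dotted operators

filter!(iseven, x)          # in-place variant: removes the odd elements from x itself
```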

Location of minimum in Julia

Does Julia have a built-in function to find the index of the minimum of a vector? R, for example, has a which.min command (and a which.max, of course).
Obviously, I could write the following myself, but it would be nice not to have to.
function whichmin(x::Vector)
    i = 1
    min_x = minimum(x)
    while x[i] > min_x
        i += 1
    end
    return i
end
Apologies if this has been asked before, but I couldn't find it. Thanks!
Since 0.7-alpha, indmin and indmax are deprecated.
Use argmin and argmax instead.
For a vector, it just returns the linear index:
julia> x = rand(1:9, 4)
4-element Array{Int64,1}:
9
5
8
5
julia> argmin(x)
2
julia> argmax(x)
1
If looking for both the index and the value, use findmin and findmax.
For multidimensional arrays, all these functions return a CartesianIndex.
I believe indmax(itr) does what you want. From the Julia documentation:
indmax(itr) → Integer
Returns the index of the maximum element in a collection.
And here's an example of it in use:
julia> x = [8, -4, 3.5]
julia> indmax(x)
1
There's also findmax, that returns both the maximum value and its position.
For a multidimensional array, you'll have to convert between linear indices and multidimensional indices:
x = rand(1:9, 2,3)
# 2×3 Array{Int64,2}:
# 5 1 9
# 3 3 8
indmin(x)
# 3
# => third element in the column-major ordered array (value=1)
ind2sub(size(x),indmin(x))
# (1, 2)
# => (row,col) indexes: what you are looking for.
-- Maurice
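In Julia 1.x, where indmin, indmax and ind2sub no longer exist, the same steps can be sketched as below; the matrix is hard-coded to the one printed above so the result is reproducible. argmin and findmin return a CartesianIndex for a matrix, and LinearIndices/CartesianIndices convert between the two kinds of index.

```julia
x = [5 1 9;
     3 3 8]

i = argmin(x)                    # CartesianIndex(1, 2): row 1, column 2
val, j = findmin(x)              # minimum value together with its index
lin = LinearIndices(x)[i]        # column-major linear index (3), like the old indmin(x)
ci = CartesianIndices(x)[lin]    # back to (row, col), like the old ind2sub

v = [9, 5, 8, 5]
k = argmin(v)                    # first position of the minimum, like whichmin above
```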

Bad results in sparse matrix assignment with logical indexing

In Matlab/Octave, I can use logical indexing to assign a value to matrix B in every location that meets a certain requirement in matrix A.
octave:1> A = [.1;.2;.3;.4;.11;.13;.14;.01;.04;.09];
octave:2> C = A < .12
C =
1
0
0
0
1
0
0
1
1
1
octave:3> B = spalloc(10,1);
octave:4> B(C) = 1
B =
Compressed Column Sparse (rows = 10, cols = 1, nnz = 5 [50%])
(1, 1) -> 1
(5, 1) -> 1
(8, 1) -> 1
(9, 1) -> 1
(10, 1) -> 1
However, if I attempt essentially the same code in Julia, the results are incorrect:
julia> A = [.1;.2;.3;.4;.11;.13;.14;.01;.04;.09];
julia> B = spzeros(10,1)
10x1 sparse matrix with 0 Float64 entries:
julia> C = A .< .12
10-element BitArray{1}:
true
false
false
false
true
false
false
true
true
true
julia> B[C] = 1
1
julia> B
10x1 sparse matrix with 5 Float64 entries:
[0 , 1] = 1.0
[0 , 1] = 1.0
[1 , 1] = 1.0
[1 , 1] = 1.0
[1 , 1] = 1.0
Have I made a mistake in the syntax somewhere, am I misunderstanding something, or is this a bug? Note: I get the correct results if I use full matrices in Julia, but since the matrix in my application is really sparse (essential boundary conditions in a finite element simulation), I would much prefer to use sparse matrices.
It looks as if sparse has some problems with BitArrays.
julia> VERSION
v"0.3.5"
julia> A = [.1;.2;.3;.4;.11;.13;.14;.01;.04;.09]
julia> B = spzeros(10,1)
julia> C = A .< .12
julia> B[C] = 1
julia> B
10x1 sparse matrix with 5 Float64 entries:
[0 , 1] = 1.0
[0 , 1] = 1.0
[1 , 1] = 1.0
[1 , 1] = 1.0
[1 , 1] = 1.0
So I get the same thing as the questioner. However, when I do things "my way":
julia> B = sparse(C)
ERROR: `sparse` has no method matching sparse(::BitArray{1})
julia> B = sparse(float(C))
10x1 sparse matrix with 5 Float64 entries:
[1 , 1] = 1.0
[5 , 1] = 1.0
[8 , 1] = 1.0
[9 , 1] = 1.0
[10, 1] = 1.0
So this works if you convert the BitArray to Float. I imagine that this workaround will get you going, but it does seem that sparse should work with BitArray.
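In Julia 1.x the sparse types live in the SparseArrays standard library. A minimal sketch of another way to get the intended result, assuming Julia 1.x: instead of assigning through the logical mask, build B directly from the indices of the true entries with the sparse(I, J, V, m, n) constructor.

```julia
using SparseArrays

A = [0.1, 0.2, 0.3, 0.4, 0.11, 0.13, 0.14, 0.01, 0.04, 0.09]
C = A .< 0.12                  # BitVector mask: true where the condition holds

rows = findall(C)              # row indices of the true entries
k = length(rows)
# sparse(I, J, V, m, n): entry (I[t], J[t]) gets value V[t] in an m×n matrix
B = sparse(rows, ones(Int, k), ones(k), length(A), 1)
```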
Some Additional Thoughts (Edit)
As I thought further about this, it occurs to me that one reason why there is no BitArray method for sparse() is that it is not terribly useful to implement sparse storage for an already highly compact type. Considering B and C from above:
julia> sizeof(C)
8
julia> sizeof(B)
40
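So for these data, the sparse version is much larger than the original. It's actually worse than this simple (perhaps simplistic) check shows at first glance. sizeof(::BitArray{1}) reports the bytes occupied by the packed bits (one 64-bit chunk here), but sizeof(::SparseMatrixCSC) reports only the fixed 40-byte header (the m and n dimensions plus references to the colptr, rowval and nzval arrays), not the stored entries. Each stored entry really costs a Float64 value plus an Int row index (16 bytes on a 64-bit machine), so for the 5 entries above the real disparity is something like 8 versus well over 100 bytes. Multiplying sizeof(B) by the number of stored entries, as below, is therefore only a rough estimate (40 bytes per entry).
Of course, if the data is sparse enough (somewhat less than 1% true), sparse storage begins to win out, despite its high overhead.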
julia> C = rand(10^6) .< 0.01
julia> B = sparse(float(C))
julia> sizeof(C)
125000
julia> sum(C)*sizeof(B)
394520
julia> C = rand(10^6) .< 0.001
julia> B = sparse(float(C))
julia> sizeof(C)
125000
julia> sum(C)*sizeof(B)
40280
So perhaps it is not an oversight that sparse() has no BitArray method. Cases where it would represent a significant space saving may be less common than one might think at first glance.
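sizeof only measures the top-level object, so a fairer comparison counts everything each object references. A minimal sketch, assuming Julia 1.x, using Base.summarysize, which follows references (including the colptr, rowval and nzval arrays inside a SparseMatrixCSC); mask_to_sparse and footprints are hypothetical helper names, and exact byte counts vary with Julia version and platform. At these densities the BitVector should still be smaller at 1% true and larger at 0.1% true.

```julia
using SparseArrays, Random

# Hypothetical helper: store a Bool mask as a 1-column sparse Float64 matrix
function mask_to_sparse(C::AbstractVector{Bool})
    rows = findall(C)
    return sparse(rows, ones(Int, length(rows)), ones(length(rows)), length(C), 1)
end

# Hypothetical helper: total bytes of a random mask and of its sparse version
function footprints(p; n = 10^6, rng = MersenneTwister(1))
    C = rand(rng, n) .< p
    return Base.summarysize(C), Base.summarysize(mask_to_sparse(C))
end

for p in (0.01, 0.001)
    bits, sp = footprints(p)
    println("density $p: BitVector $bits bytes, sparse $sp bytes")
end
```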
