&-ing two BitArrays in Julia?

In both Julia 1.5.3 and Julia 1.6.0, & does not seem to be supported for BitArrays.
I have two BitArrays like for instance
x = BitArray([1,0,1])
and
y=BitArray([0,0,1])
and wish to intersect them to find:
x&y=BitArray([0,0,1])
but the operator & does not seem to support BitArrays and using .* seems to be very time consuming.
Does anyone know a good method for finding the intersection of two bit arrays in Julia?

& works on scalar values, but you are applying it to arrays. To apply a scalar operator (or function) element-wise to an array, use 'broadcasting', which you get by adding a dot to the operator:
jl> x .& y
3-element BitVector:
0
0
1
BTW, I cannot see any timing difference between .* and .&. In fact it seems that * just calls &.
What sort of performance are you seeing?
jl> using BenchmarkTools
jl> @btime $x .* $y;
48.479 ns (2 allocations: 128 bytes)
jl> @btime $x .& $y;
48.426 ns (2 allocations: 128 bytes)
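As a minimal sketch of the broadcasting approach above: the intersection can be written into a new array or into an existing one. The in-place form and the names z and w are additions for illustration, not part of the original answer.

```julia
x = BitArray([1, 0, 1])
y = BitArray([0, 0, 1])

# Broadcasted AND allocates a new BitVector holding the element-wise result.
z = x .& y

# The in-place form reuses the storage of an existing array instead of allocating.
w = copy(x)
w .&= y
```

The in-place variant is only worth considering when one of the inputs may be overwritten.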

Related

Problem with power operator for specific values

I am trying to do a simple function to check the differences between factorial and Stirling's approximation:
using DataFrames
n = 24
df_res = DataFrame(i = BigInt[],
                   f = BigInt[],
                   st = BigInt[])
for i in 1:n
    fac = factorial(big(i))
    sterling = i^i*exp(-i)*sqrt(i)*sqrt(2*pi)
    res = DataFrame(i = [i],
                    f = [fac],
                    st = [sterling])
    df_res = [df_res;res]
end
first(df_res, 24)
The result for sterling when i = 16 and i = 24 is 0. So I checked the power for both values, and the result is 0:
julia> 16^16
0
julia> 24^24
0
I wrote the same code in R, and there are no issues. What am I doing wrong, or what should I know about Julia that I don't?
It appears that Julia integers are either 32-bit or 64-bit, depending on your system, according to the Julia documentation for Integers and Floating-Point Numbers. Your exponentiation is overflowing your values, even if they're 64 bits.
Julia looks like it supports Arbitrary Precision Arithmetic, which you'll need to store the large resultant values.
According to the Overflow Section, writing big(n) makes n arbitrary precision.
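A minimal sketch of the overflow and the big(n) fix described above. The name stirling_big is hypothetical, and the wrapped result assumes 64-bit integers.

```julia
# Int64 arithmetic wraps around silently on overflow: 16^16 equals 2^64,
# which does not fit in 64 bits.
wrapped = Int64(16)^16

# Promoting the base to BigInt first keeps the exact value.
exact = big(16)^16

# The same fix applied to the first factor of the expression from the question
# (stirling_big is a hypothetical name used only in this sketch).
stirling_big(i) = big(i)^i * exp(-i) * sqrt(i) * sqrt(2 * pi)
```

Only the integer power needs the arbitrary-precision base; the remaining factors are ordinary floating-point values.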
While the question has been answered in another post, one more thing is worth saying.
Julia is one of very few languages that allow you to define your own primitive types, so you can stay with fast fixed-precision numbers and still handle huge values. There is a package, BitIntegers, for that:
using BitIntegers
BitIntegers.@define_integers 512
Now you can do:
julia> Int512(2)^500
3273390607896141870013189696827599152216642046043064789483291368096133796404674554883270092325904157150886684127560071009217256545885393053328527589376
Usually you will get better performance from fixed-precision numbers, even large ones. For example:
julia> @btime Int512(2)^500;
174.687 ns (0 allocations: 0 bytes)
julia> @btime big(2)^500;
259.643 ns (9 allocations: 248 bytes)
There is a simple solution to your problem that does not involve using BigInt or any specialized number types, and which is much faster. Simply tweak your mathematical expression slightly.
foo(i) = i^i*exp(-i)*sqrt(i)*sqrt(2*pi) # this is your function
bar(i) = (i / exp(1))^i * sqrt(i) * sqrt(2*pi) # here's a better way
OK, let's test it:
1.7.2> foo(16)
0.0 # oops. not what we wanted
1.7.2> foo(big(16)) # works
2.081411441522312838373895982304611417026205959453251524254923609974529540404514e+13
1.7.2> bar(16) # also works
2.0814114415223137e13
Let's try timing it:
1.7.2> using BenchmarkTools
1.7.2> @btime foo(n) setup=(n=16)
18.136 ns (0 allocations: 0 bytes)
0.0
1.7.2> @btime foo(n) setup=(n=big(16))
4.457 μs (25 allocations: 1.00 KiB) # horribly slow
2.081411441522312838373895982304611417026205959453251524254923609974529540404514e+13
1.7.2> @btime bar(n) setup=(n=16)
99.682 ns (0 allocations: 0 bytes) # pretty fast
2.0814114415223137e13
Edit: It seems like
baz(i) = float(i)^i * exp(-i) * sqrt(i) * sqrt(2*pi)
might be an even better solution, since the numerical values are closer to the original.
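One way to check how close the two Float64 variants come to the original expression is to compare them against the BigInt-based evaluation. This is a rough sketch: rel_err is a hypothetical helper, and foo(big(16)) is taken as the reference value.

```julia
foo(i) = i^i * exp(-i) * sqrt(i) * sqrt(2 * pi)         # original expression
bar(i) = (i / exp(1))^i * sqrt(i) * sqrt(2 * pi)        # rearranged form
baz(i) = float(i)^i * exp(-i) * sqrt(i) * sqrt(2 * pi)  # float base

# Reference value computed with an arbitrary-precision base.
reference = foo(big(16))

# Relative deviation of a Float64 variant from the reference (hypothetical helper).
rel_err(f) = Float64(abs(f(16) - reference) / reference)

println("bar: ", rel_err(bar))
println("baz: ", rel_err(baz))
```

Both deviations should sit near floating-point rounding level; which variant is closer can depend on the input value.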

Julia: Array of ones: add one to zero or repeat?

I have a concrete Array and want to efficiently construct a similar array of the same dimensions filled with ones. What would be the recommended approach?
Here's a random array to work with:
julia> A = rand(0:1, 10, 5)
10×5 Matrix{Int64}:
or A = rand(0:1., 10, 5) (with a dot on 0. and/or 1.) for a random matrix of floats.
Two approaches are very natural. I could do this:
julia> zero(A) .+ 1
10×5 Matrix{Int64}:
Or I could do it this way:
julia> repeat(ones(size(A)[2])', outer = size(A)[1])
10×5 Matrix{Float64}:
The first approach is more elegant. The second feels clunkier and more error-prone (it is easy to accidentally exchange [1] and [2]), but it doesn't involve the addition operation and so possibly involves fewer allocations (or maybe not, because the compiler is super smart). Edit: the quick benchmark below suggests the compiler is indeed super smart.
And of course there may be another, better approach.
using BenchmarkTools
A = rand(0:1, 1000, 1000)
@btime zero(A) .+ 1
## 1.609 ms (6 allocations: 15.26 MiB)
@btime repeat(ones(size(A)[2])', outer = size(A)[1])
## 3.032 ms (10 allocations: 7.64 MiB)
Edit 2: Follow-up on Bogumił's answer
The following method for a unit-array J, defined for convenience, is efficient:
function J(A::AbstractArray{T,N}) where {T,N}
    ones(T, size(A))
end
J(A)
@btime J(A)
## 789.929 μs (2 allocations: 7.63 MiB)
What about:
ones(Int, size(A))
or
fill(1, size(A))
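A short sketch of these suggestions that also keeps the element type of A rather than hard-coding Int. The names B1, B2 and B3 are illustrative, and the fill! variant is an addition not mentioned above.

```julia
A = rand(0:1, 10, 5)

# Same element type and size as A, filled with ones.
B1 = ones(eltype(A), size(A))

# fill takes the value first; one(eltype(A)) keeps the element type generic.
B2 = fill(one(eltype(A)), size(A))

# similar allocates an uninitialized array shaped like A; fill! then overwrites it.
B3 = fill!(similar(A), 1)
```

All three build the result directly, without first creating an array of zeros.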

How to recompute the eltype of a vector in Julia

Let's say I have a vector
v = Any[1,2,3,4]
And I would like to recompute its eltype in such a way that
typeof(v) == Vector{Int}
Is it possible to accomplish this without having to manually concatenate each of the elements in v?
You can't "retype" the existing v; you can only create a copy of it with the more concrete type [1].
Conversion
Assuming you already (statically) know the result type, you have multiple options. Most readable (and IMO, idiomatic) would be
Vector{Int}(v)
which is almost equivalent to
convert(Vector{Int}, v)
except that the latter does not copy if the input type is already the target type. Alternatively:
convert.(Int, v)
which surely copies as well.
What to convert to
If you don't know what the "common type" would be, there are multiple options how to get one that matches. In general, typejoin can be used to find a least upper bound:
mapreduce(typeof, typejoin, v; init=Union{})
The result will most likely be abstract, e.g. Real for an array of Ints and Float64s. So, for numeric types, you might be better off with promote_type:
mapreduce(typeof, promote_type, v; init=Union{}) # or init=Number
This at least gives you Float64 for mixed Ints and Float64s.
But all of this is not really recommended, since it might be fragile, surprising, and is certainly not type stable.
[1] For certain combinations of types with a compatible binary form, reinterpret will work and return a view with a different type, but this is only possible for bits types, which Any is not. For converting Any[1,2,3] to Int[1,2,3], copying is fundamentally necessary because the two arrays have different layouts in memory: the former is an array of pointers to individually allocated integer objects, whereas the latter stores the Int values inline in contiguous memory.
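To illustrate the difference between typejoin and promote_type described in this answer, here is a small sketch on a mixed vector. The names T_join, T_prom and w are illustrative.

```julia
v = Any[1, 2.0, 3]

# typejoin finds the closest common supertype, which is often abstract.
T_join = mapreduce(typeof, typejoin, v; init=Union{})

# promote_type finds a type that all elements can be promoted to.
T_prom = mapreduce(typeof, promote_type, v; init=Union{})

# Copy into a vector with the promoted element type.
w = convert(Vector{T_prom}, v)
```

As the answer notes, computing the element type at run time like this is not type stable, so it is best kept out of performance-critical code.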
If you don't know the output type, then consider using a comprehension
foo(v) = [x for x in v]
This is considerably faster on my computer than identity.(v):
julia> v = Any[1,2,3,4];
julia> @btime foo($v);
153.018 ns (2 allocations: 128 bytes)
julia> @btime identity.($v);
293.908 ns (5 allocations: 208 bytes)
julia> @btime foo(v) setup=(v=Any[rand(0:9) for _ in 1:1000]);
1.331 μs (2 allocations: 7.95 KiB)
julia> @btime identity.(v) setup=(v=Any[rand(0:9) for _ in 1:1000]);
25.498 μs (494 allocations: 15.67 KiB)
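A quick way to confirm that the comprehension above narrows the element type. This is a sketch for the homogeneous case only, and narrowed is an illustrative name.

```julia
v = Any[1, 2, 3, 4]

# The comprehension picks its element type from the values it actually sees.
narrowed = [x for x in v]

println(eltype(v), " -> ", eltype(narrowed))
```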
This is a quick and dirty trick that usually solves the problem
julia> v = Any[1,2,3,4]
4-element Array{Any,1}:
1
2
3
4
julia> identity.(v)
4-element Array{Int64,1}:
1
2
3
4

What is the difference between `UnitRange` and `Array`?

I have two versions of code that seem to do the same thing:
sum = 0
for x in 1:100
    sum += x
end
sum = 0
for x in collect(1:100)
    sum += x
end
Is there a practical difference between the two approaches?
In Julia, 1:100 returns a particular struct called UnitRange that looks like this:
julia> dump(1:100)
UnitRange{Int64}
start: Int64 1
stop: Int64 100
This is a very compact struct to represent ranges with step 1 and arbitrary (finite) size. UnitRange is a subtype of AbstractRange, a type that represents ranges with arbitrary step and is itself a subtype of AbstractVector.
The instances of UnitRange compute their elements dynamically whenever you use getindex (or the syntactic sugar vector[index]). For example, with @less (1:100)[3] you can see this method:
function getindex(v::UnitRange{T}, i::Integer) where {T<:OverflowSafe}
    @_inline_meta
    val = v.start + (i - 1)
    @boundscheck _in_unit_range(v, val, i) || throw_boundserror(v, i)
    val % T
end
This returns the i-th element of the vector by adding i - 1 to the first element (start) of the range. Some functions have optimised methods for UnitRange, or more generally for AbstractRange. For instance, with @less sum(1:100) you can see the following:
function sum(r::AbstractRange{<:Real})
    l = length(r)
    # note that a little care is required to avoid overflow in l*(l-1)/2
    return l * first(r) + (iseven(l) ? (step(r) * (l-1)) * (l>>1)
                                     : (step(r) * l) * ((l-1)>>1))
end
This method uses the formula for the sum of an arithmetic progression, which is extremely efficient as it's evaluated in a time independent of the size of the vector.
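The formula mentioned above can be sketched in its textbook form. closed_form_sum is a hypothetical helper, not the Base implementation, and unlike the Base method it takes no care to avoid overflow in the intermediate product.

```julia
# Closed-form sum of an arithmetic progression: n * (first + last) / 2.
# closed_form_sum is a hypothetical helper for illustration only.
function closed_form_sum(r::AbstractRange{<:Integer})
    n = length(r)
    # n * (first + last) is always even for an integer progression,
    # so integer division is exact here.
    return n * (first(r) + last(r)) ÷ 2
end

println(closed_form_sum(1:100))
```

The cost does not depend on the length of the range, which is the point made in the paragraph above.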
On the other hand, collect(1:100) returns a plain Vector with one hundred elements 1, 2, 3, ..., 100. The main difference with UnitRange (or other types of AbstractRange) is that getindex(vector::Vector, i) (or vector[i], with vector::Vector) doesn't do any computation but simply accesses the i-th element of the vector. The downside of a Vector over a UnitRange is that generally speaking there aren't efficient methods when working with them as the elements of this container are completely arbitrary, while UnitRange represents a set of numbers with peculiar properties (sorted, constant step, etc...).
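The storage difference between the two representations can be checked directly. This is a small sketch with illustrative variable names; the byte counts depend on the platform's Int size.

```julia
r = 1:100
v = collect(r)

# A UnitRange stores only its two endpoints, regardless of its length.
range_bytes = sizeof(r)

# A Vector stores every element.
vector_bytes = sizeof(v)

println("range: ", range_bytes, " bytes, vector: ", vector_bytes, " bytes")
```

Both objects still behave as vectors with the same elements, so r == v holds.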
If you compare the performance of methods for which UnitRange has super-efficient implementations, this type will win hands down (note the use of interpolation of variables with $(...) when using macros from BenchmarkTools):
julia> using BenchmarkTools
julia> @btime sum($(1:1000_000))
0.012 ns (0 allocations: 0 bytes)
500000500000
julia> @btime sum($(collect(1:1000_000)))
229.979 μs (0 allocations: 0 bytes)
500000500000
Remember that UnitRange comes with the cost of dynamically computing the elements every time you access them with getindex. Consider for example this function:
function test(vec)
    sum = zero(eltype(vec))
    for idx in eachindex(vec)
        sum += vec[idx]
    end
    return sum
end
Let's benchmark it with a UnitRange and a plain Vector:
julia> @btime test($(1:1000_000))
812.673 μs (0 allocations: 0 bytes)
500000500000
julia> @btime test($(collect(1:1000_000)))
522.828 μs (0 allocations: 0 bytes)
500000500000
In this case the function calling the plain array is faster than the one with a UnitRange because it doesn't have to dynamically compute 1 million elements.
Of course, in these toy examples it'd be more sensible to iterate over all elements of vec rather than its indices, but in real-world cases a situation like this may be more sensible. This last example, however, shows that a UnitRange is not necessarily more efficient than a plain array, especially if you need to dynamically compute all of its elements. UnitRanges are more efficient when you can take advantage of specialised methods (like sum) for which the operation can be performed in constant time.
As a final remark, note that if you originally have a UnitRange, it's not necessarily a good idea to convert it to a plain Vector to get good performance, especially if you're going to use it only once or very few times, as the conversion to Vector itself involves the dynamic computation of all elements of the range and the allocation of the necessary memory:
julia> @btime collect($(1:1000_000));
422.435 μs (2 allocations: 7.63 MiB)
julia> @btime test(collect($(1:1000_000)))
882.866 μs (2 allocations: 7.63 MiB)
500000500000

Julia: How to convert numeric string with power symbol "^" into floating point number

I'm having trouble converting an array of numeric strings into an array of corresponding floating point numbers. A (hypothetical) string array is:
arr = ["8264.", "7.1050^-7", "9970.", "2.1090^-6", "5.2378^-7"]
I would like to convert it into:
arr = [8264., 1.0940859076672388e-6, 9970., 0.011364243260505457, 9.246079446497013e-6]
As a novice of Julia, I have no clue on how to make power operator "^" in the string format to do the correct job in the conversion. I highly appreciate your suggestions!
This function would parse both forms, with and without the exponent.
function foo(s)
    a=parse.(Float64,split(s,'^'))
    length(a)>1 && return a[1]^a[2]
    a[1]
end
Somewhat ugly, but gets the job done:
eval.(Meta.parse.(arr))
UPDATE:
Let me elaborate a bit what this does and why it's maybe not good style.
Meta.parse converts a String into a Julia Expression. The dot indicates that we want to broadcast Meta.parse to every string in arr, that is, apply it to every element. Afterwards, we use eval, again broadcasted, to evaluate the expressions.
This produces the correct result as it literally takes every string as a Julia "command" and hence knows that ^ indicates a power. However, besides being slow, this is potentially insecure as one could inject arbitrary Julia code.
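To make the two steps concrete, here is a minimal sketch on a single string from the question. The variable names are illustrative.

```julia
s = "7.1050^-7"

# Meta.parse turns the string into an expression object without running it.
ex = Meta.parse(s)

# eval then executes whatever that expression contains, which is why
# untrusted input is risky here.
val = eval(ex)

println(ex.head, " ", ex.args[1], " => ", val)
```

Because eval runs any expression it is handed, this pattern is best reserved for input you fully control.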
UPDATE:
A safer and faster way to obtain the desired result is to define a short function that does the conversion:
julia> function mystr2float(s)
           !occursin('^', s) && return parse(Float64, s)
           x = parse.(Float64, split(s, '^'))
           return x[1]^x[2]
       end
mystr2float (generic function with 1 method)
julia> mystr2float.(arr)
5-element Array{Float64,1}:
8264.0
1.0940859076672388e-6
9970.0
0.011364243260505457
9.246079446497013e-6
julia> using BenchmarkTools
julia> @btime eval.(Meta.parse.($arr));
651.000 μs (173 allocations: 9.27 KiB)
julia> @btime mystr2float.($arr);
5.567 μs (18 allocations: 1.02 KiB)
UPDATE:
Performance comparison with @dberge's suggestion below:
julia> @btime mystr2float.($arr);
5.516 μs (18 allocations: 1.02 KiB)
julia> @btime foo.($arr);
5.767 μs (24 allocations: 1.47 KiB)
