StaticArrays and StatsBase - Julia

I want to use StaticArrays with StatsBase. Consider the following functions:
using StatsBase
function update_weights_1(N, M)
    M_vector = 1:N
    weights_vector_to_update = ones(N) / N
    # the Weights object is constructed once, outside the loop
    wvector = Weights(weights_vector_to_update, 1)
    res = [0.0]
    for m in 1:M
        sample!(M_vector, wvector, res)
    end
end
function update_weights_2(N, M)
    M_vector = 1:N
    weights_vector_to_update = ones(N) / N
    res = [0.0]
    for m in 1:M
        # a new Weights object is constructed on every iteration
        sample!(M_vector, Weights(weights_vector_to_update, 1), res)
    end
end
update_weights_1 allocates substantially less memory than update_weights_2, because constructing Weights(weights_vector_to_update, 1) allocates, and update_weights_2 does so on every iteration. However, suppose I have a list of small vectors, say z,
z = [ones(3) / 3 for i in 1:10000]
and this function:
using Random  # for rand!
function update_weights_3(z, M)
    N = length(z[1])
    M_vector = 1:N
    for i in eachindex(z)
        rand!(z[i])
        res = [0.0]
        for m in 1:M
            sample!(M_vector, Weights(z[i]), res)
        end
    end
end
update_weights_3(z, 1000) allocates a lot of memory. I know that using StaticArrays for z can significantly speed up the code and reduce memory allocation. However, following the procedure in this post, whenever I wrap Weights around a StaticArray, it allocates memory.
Would you know how to apply StaticArray in this case? Essentially I have a collection of small arrays that I would like to transform into Weights.

Weights is a mutable type, which can cause unnecessary heap allocations (sometimes they are stack-allocated... I don't fully understand when this optimization happens). You can define your own immutable weights type, though:
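using StatsBase, StaticArrays
# immutable weights type backed by a static vector; the sum is stored alongside the values
struct StaticWeights{S<:Real, T<:Real, N, V<:StaticVector{N, T}} <: AbstractWeights{S, T, V}
    values::V
    sum::S
end
StaticWeights(values) = StaticWeights(values, sum(values))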
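Used in your example (this requires z to hold mutable static vectors, e.g. z = [MVector{3}(ones(3) / 3) for i in 1:10000], because StaticWeights only accepts a StaticVector and rand! mutates each element):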
function update_weights_3(z, M)
    N = size(z[1], 1)
    M_vector = 1:N
    for i in 1:size(z, 1)
        rand!(z[i])
        res = [0.0]
        for m in 1:M
            sample!(M_vector, StaticWeights(z[i]), res)
        end
    end
end
With this change I don't see any allocations in the inner loop.
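For readers without StaticArrays or StatsBase at hand, the same idea (an immutable weights container whose length lives in its type) can be sketched with a plain NTuple. This is a minimal sketch under stated assumptions: TupleWeights, sample_index and fill_samples! are hypothetical names, not StatsBase API, and sample_index uses simple inverse-CDF sampling rather than whatever algorithm StatsBase's sample! uses internally.

```julia
using Random

# Hypothetical immutable weights container (not part of StatsBase).
# An NTuple carries its length in its type, so this struct is an isbits value.
struct TupleWeights{N}
    values::NTuple{N,Float64}
    sum::Float64
end

TupleWeights(values::NTuple{N,Float64}) where {N} = TupleWeights{N}(values, sum(values))

# Hypothetical helper: draw one index in 1:N with probability values[i] / sum,
# by walking the cumulative sum (inverse-CDF sampling).
function sample_index(rng::AbstractRNG, w::TupleWeights{N}) where {N}
    u = rand(rng) * w.sum
    acc = 0.0
    for i in 1:N
        acc += w.values[i]
        u < acc && return i
    end
    return N  # guard against floating-point round-off at the upper end
end

# Fill `res` with draws, building a fresh TupleWeights on every iteration,
# mirroring what update_weights_3 does with Weights/StaticWeights.
function fill_samples!(rng::AbstractRNG, res::Vector{Int}, values::NTuple{N,Float64}) where {N}
    for j in eachindex(res)
        res[j] = sample_index(rng, TupleWeights(values))
    end
    return res
end
```

Because TupleWeights is an isbits struct, constructing it inside the loop should not need a heap allocation; @allocated on your own machine is the way to confirm this.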

Related

Type-stability in Julia's product iterator

I am trying to make A in the following code type-stable.
using Primes: factor
function f(n::T, p::T, k::T) where {T<:Integer}
    return rand(T, n * p^k)
end
function g(m::T, n::T) where {T<:Integer}
    i = 0
    for A in Iterators.product((f(n, p, T(k)) for (p, k) in factor(m))...)
        i = sum(A)
    end
    return i
end
Note that f is type-stable. The variable A is not type-stable because the product iterator returns tuples of different lengths depending on the number of distinct prime factors of m. If there were an iterator like the product iterator that returned a Vector instead of a Tuple, I believe the type instability would go away.
Does anyone have any suggestions to make A type-stable in the above code?
Edit: I should add that f returns a Vector{T} whose length varies.
One way I have solved the type-stability problem is this:
function g(m::T, n::T) where {T<:Integer}
    B = Vector{T}[T[]]
    for (p, k) in factor(m)
        C = Vector{T}[]
        for (b, r) in Iterators.product(B, f(n, p, T(k)))
            c = copy(b)
            push!(c, r)
            push!(C, c)
        end
        B = C
    end
    i = zero(T)  # define i before the loop so it is still in scope at `return i`
    for A in B
        i = sum(A)
    end
    return i
end
This (and in particular, A) is now type-stable, but at the cost of a lot of memory. I'm not sure of a better way to do this.
It's not easy to make this completely type-stable, but you can isolate the type instability with a function barrier: convert the factorization to a tuple in an outer function and pass it to an inner function, which is type-stable. This gives just one dynamic dispatch instead of many:
# inner function, type-stable
function _g(n, tup)
    i = 0
    for A in Iterators.product((f(n, p, k) for (p, k) in tup)...)
        i += sum(A) # or i = sum(A), whatever
    end
    return i
end
# outer function
g(m::T, n::T) where {T<:Integer} = _g(n, Tuple(factor(m)))
Some benchmarks:
julia> @btime g(7, 210); # OP version
  149.600 μs (7356 allocations: 172.62 KiB)
julia> @btime g(7, 210); # my version
  1.140 μs (6 allocations: 11.91 KiB)
You should expect to hit compilation occasionally: whenever m has a number of distinct prime factors that has not been seen before, the inner function is compiled for the new tuple type.
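The function-barrier pattern can be shown in isolation, without Primes. In this sketch, product_sum and _product_sum are hypothetical names, and the input is simply a vector of integer vectors whose count is only known at run time (standing in for the factorization of m).

```julia
# Inner kernel: Julia compiles a specialization for each concrete tuple type,
# so inside this function the element type of Iterators.product is known.
function _product_sum(vecs::Tuple)
    total = 0
    for combo in Iterators.product(vecs...)
        total += sum(combo)
    end
    return total
end

# Outer function: the number of vectors is only known at run time, so the
# concrete type of Tuple(vecs) cannot be inferred here. The single dynamic
# dispatch happens at the call to _product_sum (the function barrier).
product_sum(vecs::Vector{Vector{Int}}) = _product_sum(Tuple(vecs))
```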

Updating a list of StaticArrays

Suppose I have this function, implemented without StaticArrays:
function example_svector_bad(G)
    vector_list = [randn(G) for q in 1:1000]
    for i in size(vector_list)
        for g in 1:G
            vector_list[i][g] = vector_list[i][g] * g
        end
    end
    return vector_list
end
I'm hoping to implement it using StaticArrays for speed gains. However, I don't know how to do it without losing the flexibility of specifying G. For example, I could do
function example_svector()
    vector_list = [@SVector randn(3) for q in 1:1000]
    for i in size(vector_list)
        vector_list[i] = SVector(vector_list[i][1] * 1, vector_list[i][2] * 2,
                                 vector_list[i][3] * 3)
    end
    return vector_list
end
but only because I know that G = 3 and write out SVector(vector_list[i][1] * 1, vector_list[i][2] * 2, vector_list[i][3] * 3) by hand.
Is there a way to implement this for any arbitrary number of G?
The size of a static vector or array must be known at compile time, and at compile time only types are known (not values).
Hence your function could look like this:
using StaticArrays
function myRandVec(::Val{G}) where G
    SVector{G}(rand(G))
end
Note that G is passed as a type parameter (via Val) rather than as a value, and hence can be used to create a static vector.
This function could be used as:
julia> myRandVec(Val{2}())
2-element SVector{2, Float64} with indices SOneTo(2):
0.7618992223709563
0.5979657793050613
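The same pattern also answers the original question for arbitrary G. As a Base-only sketch (no StaticArrays, so rand_tuple, scale_by_index and example_tuple are hypothetical names), NTuple plays the role of SVector: its length is part of its type, and Val(G) lets the caller supply G once.

```julia
# Val(G) moves G into the type domain; ntuple(f, Val(G)) builds an NTuple{G}
# whose length the compiler knows, just like the size of an SVector{G}.
rand_tuple(::Val{G}) where {G} = ntuple(_ -> rand(), Val(G))

# Multiply element g by g for any length G, without writing the terms out.
scale_by_index(v::NTuple{G,Float64}) where {G} = ntuple(g -> v[g] * g, Val(G))

function example_tuple(::Val{G}) where {G}
    vector_list = [rand_tuple(Val(G)) for _ in 1:1000]
    for i in eachindex(vector_list)
        vector_list[i] = scale_by_index(vector_list[i])
    end
    return vector_list
end
```

With StaticArrays loaded, SVector(scale_by_index(Tuple(v))) should convert the result back to a static vector, since SVector can be constructed from a tuple.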
Firstly, there is a mistake in how you are indexing vector_list, where you do
for i in size(vector_list)
Let's see what that does:
julia> x = 1:10;
julia> size(x)
(10,)
The size of x is its length in each dimension; for a vector that is just (10,), since it has only one dimension. Let's try iterating:
julia> for i in size(x)
           println(i)
       end
10
It just prints out the number 10.
You probably meant
for i in 1:length(vector_list)
but it's better to write
for i in eachindex(vector_list)
since it is more general and safer.
As for your actual question, you can use StaticArrays.SOneTo, which provides a statically sized range, i.e. a static version of 1:N (here [1, 2, 3]):
function example_svector()
    vector_list = [@SVector randn(3) for q in 1:1000]
    N = length(eltype(vector_list))  # static length of the element type, here 3
    c = SOneTo(N)                    # static range 1:N
    for i in eachindex(vector_list)
        vector_list[i] = vector_list[i] .* c
    end
    return vector_list
end

Julia: type-stability with DataFrames

How can I access the columns of a DataFrame in a type-stable way?
Let's assume I have the following data:
using DataFrames
df = DataFrame(x = fill(1.0, 1000000), y = fill(1, 1000000), z = fill("1", 1000000))
Now I want to do some recursive computation (so I cannot use transform):
function foo!(df::DataFrame)
    for i in 1:nrow(df)
        if (i > 1) df.x[i] += df.x[i-1] end
    end
end
This has terrible performance:
julia> @time foo!(df)
  0.144921 seconds (6.00 M allocations: 91.529 MiB)
A quick fix in this simplified example would be the following:
function bar!(df::DataFrame)
    x::Vector{Float64} = df.x
    for i in 1:length(x)
        if (i > 1) x[i] += x[i-1] end
    end
end
julia> @time bar!(df)
  0.000004 seconds
However, I'm looking for a solution that is generalisable, e.g. when the recursive computation is just specified as a function:
function foo2!(df::DataFrame, fn::Function)
    for i in 1:nrow(df)
        if (i > 1) fn(df, i) end
    end
end
function my_fn(df::DataFrame, i::Int64)
    x::Vector{Float64} = df.x
    x[i] += x[i-1]
end
While this (almost) doesn't allocate, it is still very slow.
julia> @time foo2!(df, my_fn)
  0.050465 seconds (1 allocation: 16 bytes)
Is there an approach that is performant and allows this kind of flexibility / generalisability?
EDIT: I should also mention that in practice it is not known a priori which columns the function fn depends on. I.e. I'm looking for an approach that allows performant access to, and updating of, arbitrary columns inside fn. The needed columns could be specified together with fn, for example as a Vector{Symbol}, if necessary.
EDIT 2: I tried using barrier functions as follows, but it's not performant:
function foo3!(df::DataFrame, fn::Function, colnames::Vector{Symbol})
    cols = map(cname -> df[!, cname], colnames)
    for i in 1:nrow(df)
        if (i > 1) fn(cols..., i) end
    end
end
function my_fn1(x::Vector{Float64}, i::Int64)
    x[i] += x[i-1]
end
function my_fn2(x::Vector{Float64}, y::Vector{Int64}, i::Int64)
    x[i] += x[i-1] * y[i-1]
end
@time foo3!(df, my_fn1, [:x])
@time foo3!(df, my_fn2, [:x, :y])
This behavior is intended (it avoids excessive compilation for wide data frames), and ways to handle it are explained in https://github.com/bkamins/Julia-DataFrames-Tutorial/blob/master/11_performance.ipynb.
In general you should reduce the number of times you index into a data frame. So in this case do:
julia> function foo3!(x::AbstractVector, fn::Function)
           for i in 2:length(x)
               fn(x, i)
           end
       end
foo3! (generic function with 1 method)
julia> function my_fn(x::AbstractVector, i::Int64)
           x[i] += x[i-1]
       end
my_fn (generic function with 1 method)
julia> @time foo3!(df.x, my_fn)
  0.010746 seconds (16.60 k allocations: 926.036 KiB)
julia> @time foo3!(df.x, my_fn)
  0.002301 seconds
(I am using the version where you want to have a custom function passed)
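Building on the advice to index into the data frame as few times as possible, one way to keep the flexibility asked for in the EDITs is to extract the needed columns once into a NamedTuple and pass that through a function barrier. This is a sketch, not DataFrames API: recurse!, cumsum_x! and weighted_x! are hypothetical names. With a real data frame the NamedTuple could be built as (x = df.x, y = df.y), since df.x returns the stored column without copying.

```julia
# Hypothetical kernel: cols is a NamedTuple of column vectors, so every
# column's element type is part of the argument type and the loop compiles
# to concrete code. The `fn::F ... where {F}` signature forces Julia to
# specialize on the passed function.
function recurse!(fn::F, cols::NamedTuple) where {F}
    n = length(first(cols))
    for i in 2:n
        fn(cols, i)
    end
    return cols
end

# Example row updates written against column names; each function can use
# any subset of the columns the caller put into the NamedTuple.
cumsum_x!(cols, i) = (cols.x[i] += cols.x[i-1])
weighted_x!(cols, i) = (cols.x[i] += cols.x[i-1] * cols.y[i-1])
```

The trade-off is the one DataFrames deliberately avoids: each new combination of column names and types produces a new NamedTuple type, so it is compiled once per combination.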
My current approach involves wrapping the DataFrame in a struct and overloading getindex / setindex!. Some additional trickery using generated functions is needed to get the ability to access columns by name. While this is performant, it is also quite hacky, and I was hoping there was a more elegant solution using only DataFrames.
For simplicity this assumes all (relevant) columns are of Float64 type.
struct DataFrameWrapper{colnames}
    cols::Vector{Vector{Float64}}
end
function df_to_vectors(df::AbstractDataFrame, colnames::Vector{Symbol})::Vector{Vector{Float64}}
    res = Vector{Vector{Float64}}(undef, length(colnames))
    for i in 1:length(colnames)
        res[i] = df[!, colnames[i]]
    end
    res
end
function DataFrameWrapper{colnames}(df::AbstractDataFrame) where colnames
    DataFrameWrapper{colnames}(df_to_vectors(df, collect(colnames)))
end
get_colnames(::Type{DataFrameWrapper{colnames}}) where colnames = colnames
@generated function get_col_index(x::DataFrameWrapper, ::Val{col})::Int64 where col
    id = findfirst(y -> y == col, get_colnames(x))
    :($id)
end
Base.@propagate_inbounds Base.getindex(x::DataFrameWrapper, col::Val)::Vector{Float64} = x.cols[get_col_index(x, col)]
Base.@propagate_inbounds Base.getindex(x::DataFrameWrapper, col::Symbol)::Vector{Float64} = getindex(x, Val(col))
Base.@propagate_inbounds Base.setindex!(x::DataFrameWrapper, value::Float64, row::Int64, col::Val) = setindex!(x.cols[get_col_index(x, col)], value, row)
Base.@propagate_inbounds Base.setindex!(x::DataFrameWrapper, value::Float64, row::Int64, col::Symbol) = setindex!(x, value, row, Val(col))

Use of Memory-mapped in Julia

I have Julia code (version 1.2) that performs a lot of operations on a 10000 x 10000 Array. Because I get an OutOfMemory() error when I run it, I'm exploring other options, such as memory mapping. I'm a bit confused about how to use an Array that I map to disk with Mmap.mmap, since there is little explanation at https://docs.julialang.org/en/v1/stdlib/Mmap/index.html. Here is the beginning of my code:
using Distances
using LinearAlgebra
using Distributions
using Mmap
data=Float32.(rand(10000,15))
Eucldist=pairwise(Euclidean(),data,dims=1)
D=maximum(Eucldist.^2)
sigma2hat=mean(((Eucldist.^2)./D)[tril!(trues(size((Eucldist.^2)./D)),-1)])
L=exp.(-(Eucldist.^2/D)/(2*sigma2hat))
L is the 10000 x 10000 Array I want to work with, so I wrote it to a file on disk with:
s = open("mmap.bin", "w+")
write(s, size(L,1))
write(s, size(L,2))
write(s, L)
close(s)
What am I supposed to do after that? The next step is to perform K=eigen(L) and apply other commands to K. How should I do that: with K=eigen(L) or K=eigen(s)? What's the role of the object s, and when does it get involved? Moreover, I don't understand why and when I have to use Mmap.sync!. After each line following eigen(L)? At the end of the code? How can I be sure that I'm using disk space instead of RAM? I would appreciate some highlights about memory mapping. Thank you!
If memory usage is a concern, it is often best to rebind the variables holding your very large arrays to 0, or to a small matrix of the same type, once you are done with those intermediate matrices, so that their memory can be garbage collected. After that, call Mmap.mmap() on the stored data file, passing the array type and the dimensions of the data as the second and third arguments, and assign the return value to your variable (in this case L), so that L is bound to the file contents:
using Distances
using LinearAlgebra
using Distributions
using Mmap
function testmmap()
    data = Float32.(rand(10000, 15))
    Eucldist = pairwise(Euclidean(), data, dims=1)
    D = maximum(Eucldist.^2)
    sigma2hat = mean(((Eucldist.^2) ./ D)[tril!(trues(size((Eucldist.^2) ./ D)), -1)])
    L = exp.(-(Eucldist.^2 / D) / (2 * sigma2hat))
    mkpath("./tmp")  # make sure the target directory exists
    s = open("./tmp/mmap.bin", "w+")
    write(s, size(L,1))
    write(s, size(L,2))
    write(s, L)
    close(s)
    # drop references to the large matrices and run the garbage collector
    Eucldist = data = L = zeros(Float32, 2, 2)
    GC.gc()
    s = open("./tmp/mmap.bin", "r+") # allow read and write
    m = read(s, Int)
    n = read(s, Int)
    L = Mmap.mmap(s, Matrix{Float32}, (m, n)) # now L references the file contents
    K = eigen(L)
    K
end
testmmap()
@time testmmap() # 109.657995 seconds (17.48 k allocations: 4.673 GiB, 0.73% gc time)
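To make the roles of the IO stream and Mmap.sync! concrete, here is a small, self-contained sketch using only the Mmap standard library. The stream (s in the question, io here) is only used to read the header and to create the mapping; afterwards you work with the mapped array itself, and Mmap.sync! matters only when you modify that array and want the changes flushed to the file at a known point. The names write_matrix, scale_mapped! and read_matrix are hypothetical. Note that mapping only avoids keeping L itself in RAM; results such as eigen(L) are ordinary in-memory arrays.

```julia
using Mmap

# Write a Float32 matrix as two Int header values (rows, columns)
# followed by the raw column-major data, the same layout as in the answer.
function write_matrix(path::AbstractString, A::Matrix{Float32})
    open(path, "w") do io
        write(io, size(A, 1))
        write(io, size(A, 2))
        write(io, A)
    end
    return path
end

# Map the file read-write, modify the mapped array in place, and flush the
# changes to the file with Mmap.sync! before the stream is closed.
function scale_mapped!(path::AbstractString, factor::Float32)
    open(path, "r+") do io
        m = read(io, Int)
        n = read(io, Int)
        # the offset defaults to position(io), i.e. just past the header
        A = Mmap.mmap(io, Matrix{Float32}, (m, n))
        A .*= factor   # modifies the mapped file pages, not a separate copy
        Mmap.sync!(A)  # force the modified pages back to the file now
    end
    return path
end

# Read the file back with ordinary I/O to check what was persisted.
function read_matrix(path::AbstractString)
    open(path, "r") do io
        m = read(io, Int)
        n = read(io, Int)
        read!(io, Matrix{Float32}(undef, m, n))
    end
end
```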

Tail recursion in R

I seem to misunderstand tail recursion; according to this Stack Overflow question, R does not support tail recursion. However, let's consider the following functions to compute the nth Fibonacci number:
Iterative version:
Fibo <- function(n){
  a <- 0
  b <- 1
  for (i in seq_len(n)){  # seq_len(0) is empty, unlike 1:0
    temp <- b
    b <- a
    a <- a + temp
  }
  return(a)
}
"Naive" recursive version:
FiboRecur <- function(n){
  if (n == 0 || n == 1){
    return(n)
  } else {
    return(FiboRecur(n-1) + FiboRecur(n-2))
  }
}
And finally, an example I found that should be tail-recursive:
FiboRecurTail <- function(n){
  fib_help <- function(a, b, n){
    if(n > 0){
      return(fib_help(b, a+b, n-1))
    } else {
      return(a)
    }
  }
  return(fib_help(0, 1, n))
}
Now, if we look at the traces when these functions are called, here is what we get:
Fibo(25)
trace: Fibo(25)
[1] 75025
trace(FiboRecur)
FiboRecur(25)
(thousands of calls to FiboRecur are traced, and it takes a long time to run)
FiboRecurTail(25)
trace: FiboRecurTail(25)
[1] 75025
In the cases of Fibo(25) and FiboRecurTail(25), the answer is displayed instantaneously and only one call is traced. For FiboRecur(25), thousands of calls are made and it runs for several seconds before showing the result.
We can also take a look at the run times using the benchmark function from the package rbenchmark:
benchmark(Fibo(30), FiboRecur(30), FiboRecurTail(30), replications = 5)
               test replications elapsed relative user.self sys.self user.child sys.child
1          Fibo(30)            5    0.00       NA     0.000        0          0         0
2     FiboRecur(30)            5   13.79       NA    13.792        0          0         0
3 FiboRecurTail(30)            5    0.00       NA     0.000        0          0         0
So if R does not support tail recursion, what is happening in FiboRecurTail(25) that makes it run as fast as the iterative version while the "naive" recursive function runs like molasses? Is it rather that R supports tail recursion, but does not optimize a "naive" recursive version of a function to be tail-call recursive like other programming languages (Haskell for instance) do? This is what I understand from this post in R's mailing list.
I would greatly appreciate if someone would shed some light into this. Thanks!
The difference is that for each recursion, FiboRecur calls itself twice. Within FiboRecurTail, fib_help calls itself only once.
Thus you have a whole lot more function calls with the former. In the case of FiboRecurTail(25) you have a recursion depth of ~25 calls. FiboRecur(25) results in 242,785 function calls (including the first).
I didn't time any of the routines, but note that you show 0.00 for both of the faster routines. You should see some difference with a higher input value, but note that Fibo iterates exactly as many times as FiboRecurTail recurses.
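The call count quoted above can be checked with a short recurrence. Writing C(n) for the number of calls made by FiboRecur(n), counting the first call, and F(n) for the nth Fibonacci number (so F(25) = 75025, as in the trace output):

```latex
\begin{aligned}
C(0) &= C(1) = 1, \qquad C(n) = C(n-1) + C(n-2) + 1 \quad (n \ge 2),\\
C(n) &= 2F(n+1) - 1 \quad \text{(induction: } (2F(n) - 1) + (2F(n-1) - 1) + 1 = 2F(n+1) - 1\text{)},\\
C(25) &= 2F(26) - 1 = 2 \cdot 121393 - 1 = 242785.
\end{aligned}
```

By contrast, FiboRecurTail(n) makes n + 1 calls to fib_help, one for each value of its counter from n down to 0.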
In the naive recursive approach, you repeatedly calculate the same values. For example, when you calculate FiboRecur(30) you calculate FiboRecur(29) and FiboRecur(28), and these two calls are independent. Inside FiboRecur(29) you calculate FiboRecur(28) again, plus FiboRecur(27), even though FiboRecur(28) has already been calculated elsewhere, and this happens at every stage of the recursion. Put simply, each increase of n multiplies the calculation effort by about 1.6 (the golden ratio), whereas computing the next number should really be as simple as adding the last two calculated numbers together.
A little summary of FiboRecur(4): FiboRecur(0) is calculated twice, FiboRecur(1) three times, FiboRecur(2) twice and FiboRecur(3) once. The first three should really be calculated once and stored somewhere so that you can retrieve the values whenever they are needed. That's why you see so many function calls even though n is not large.
In the tail-recursive version, however, the previously calculated values are passed on to the next stage through the arguments (b and a + b), which avoids the repeated calculations of the naive version and is thus more efficient.
The following algorithm uses the accumulator-parameter technique to make the function tail-recursive, then wraps it in a memoization function.
Tail recursion does not necessarily reduce the number of function calls; tail-call optimization is mostly about managing stack memory, not speed. In the naive version, every call to fib(n) generates calls to fib(n - 1) and fib(n - 2); in a tail-recursive function, a language that optimizes tail calls reuses the current stack frame rather than allocating a new one for each call.
Memoization is what gives a speed-boost. Results are cached for future use.
library(hash)
library(purrr)  # for map() and the %>% pipe
# Generate Fibonacci numbers
# Tail Recursive Algorithm using Accumulator Parameter Technique
fibTR <- function(n) {
  fibLoop <- function(acc, m, k) {
    if (k == 0)
      acc
    else
      fibLoop(acc = m, m = acc + m, k = k - 1)
  }
  fibLoop(acc = 0, m = 1, k = n)
}
# A generic memoization function for a function fn taking integer input;
# the cache lives in the returned closure, so it persists across calls
memoize <- function(fn) {
  cache <- hash::hash()
  function(inp) {
    key <- as.character(inp)
    if (hash::has.key(key = key, hash = cache))
      cache[[key]]
    else {
      cache[[key]] <- inp %>% fn
      cache[[key]]
    }
  }
}
# Memoized and Tail Recursive Fibonacci Number Generator
fib <- memoize(fibTR)
# Get the first 10 Fibonacci numbers
map(.x = 0:9, .f = fib) %>% unlist
Running fibTR(10000) yields
Error: C stack usage 15927040 is too close to the limit
So I doubt that R performs efficient tail-call optimization.
Another issue is the construction of the cache, or lookaside table. In functional languages such as Haskell, ML, ..., such intermediary data structures get built when you first partially apply the function. Assuming the same effect in R, a further issue is that memory allocation in R is very expensive, and so is growing vectors, matrices, etc. Here we are growing a dictionary; if we pre-allocated the dictionary at the appropriate size, we would have to supply the n argument, and the cache would be constructed every time we call the function, which defeats the purpose.
// Here is F# code to do the same:
// Generate Fibonacci numbers: Tail Recursive Algorithm
let fibTR n =
    let rec fibLoop acc m k =
        match k with
        | 0 -> acc
        | n -> fibLoop m (acc + m) (n - 1)
    fibLoop 0 1 n
// A generic memoization function
let memoize (fn: 'T -> 'U) =
    let cache = new System.Collections.Generic.Dictionary<_, _>()
    fun inp ->
        match cache.TryGetValue inp with
        | true, res -> res
        | false, _ ->
            let res = inp |> fn
            cache.Add(inp, res)
            res
// A tail-recursive and memoized Fibonacci number generator
let fib = fibTR |> memoize
// Get the first 10 Fibonacci numbers
[ 0..9 ] |> List.map fib
