Check size in bytes of variable using Julia - julia

Question: How do I check the size in bytes of a variable using Julia?
What I've tried: In Matlab, the whos() function provided this information, but in Julia that just provides the variable names and module. Browsing the standard library in the Julia manual, sizeof() looked promising, but it only appears to provide the size of the canonical binary representation, rather than the current variable.

sizeof works on variables too
sizeof(a::Array{T,N})
returns the size of the array times the element size.
julia> x = [1 2 3 4]
1x4 Array{Int64,2}:
1 2 3 4
julia> sizeof(x)
32
julia> x = Int8[1 2 3 4]
1x4 Array{Int8,2}:
1 2 3 4
julia> sizeof(x)
4
sizeof(B::BitArray{N})
returns the size of the underlying chunks; each chunk is 8 bytes, so it can represent up to 64 bits
julia> x = BitArray(36);
julia> sizeof(x)
8
julia> x = BitArray(65);
julia> sizeof(x)
16
sizeof(s::ASCIIString) and sizeof(s::UTF8String)
return the number of bytes in the string (1 byte per character for ASCII; a UTF8String may use more than one byte per character).
julia> sizeof("hello world")
11
sizeof(s::UTF16String) and sizeof(s::UTF32String)
Same as above but with 2 and 4 bytes/character respectively.
julia> x = utf32("abcd");
julia> sizeof(x)
16
Other string types behave accordingly:
sizeof(s::SubString{ASCIIString}) at string.jl:590
sizeof(s::SubString{UTF8String}) at string.jl:591
sizeof(s::RepString) at string.jl:690
sizeof(s::RevString{T<:AbstractString}) at string.jl:737
sizeof(s::RopeString) at string.jl:802
sizeof(s::AbstractString) at string.jl:71
core values
returns the number of bytes each variable uses
julia> x = Int64(0);
julia> sizeof(x)
8
julia> x = Int8(0);
julia> sizeof(x)
1
julia> x = Float16(0);
julia> sizeof(x)
2
julia> x = sizeof(Float64)
8
These are what one would expect, but note that Julia characters are wide characters:
julia> sizeof('a')
4
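As a small sketch of the byte-versus-character distinction, assuming a current Julia 1.x release (where String is always UTF-8 encoded, unlike the older ASCIIString/UTF8String split shown above):

```julia
# Assumes Julia 1.x: String is UTF-8, so sizeof counts bytes, not characters.
s = "h\u00e9llo"            # "héllo"; the accented letter needs two bytes in UTF-8

n_chars = length(s)         # number of characters
n_bytes = sizeof(s)         # number of bytes
n_units = ncodeunits(s)     # number of UTF-8 code units, same as sizeof for String

char_bytes = sizeof('a')            # a Char is always 4 bytes wide
arr_bytes  = sizeof(Int8[1 2 3 4])  # 4 elements times 1 byte each
```

Here n_chars is 5 while n_bytes is 6, which is why sizeof should not be read as a character count.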
getBytes
For cases where the layout is more complex and/or not contiguous, here's a function that will iterate over the fields of a variable (if any) and return the sum of all of the sizeof results, which should be the total number of bytes allocated.
getBytes(x::DataType) = sizeof(x);
function getBytes(x)
total = 0;
fieldNames = fieldnames(typeof(x));
if isempty(fieldNames)
return sizeof(x);
else
for fieldName in fieldNames
total += getBytes(getfield(x,fieldName));
end
return total;
end
end
using it
create an instance of a random-ish type...
julia> type X a::Vector{Int64}; b::Date end
julia> x = X([i for i = 1:50],now())
X([1,2,3,4,5,6,7,8,9,10 … 41,42,43,44,45,46,47,48,49,50],2015-02-09)
julia> getBytes(x)
408
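The same idea can be sketched for Julia 1.x syntax. This is only an illustration under some assumptions: the names fieldbytes and Sample are hypothetical, arrays are treated as leaves (their internal layout differs between Julia versions), and shared or undefined fields are not handled.

```julia
# Hypothetical helper: sum sizeof over the fields of a value, recursively.
fieldbytes(x::DataType) = sizeof(x)
fieldbytes(x::Array) = sizeof(x)   # treat arrays as leaves: length times element size

function fieldbytes(x)
    names = fieldnames(typeof(x))
    # Values without fields (Int64, Float64, String, ...) report their own size.
    isempty(names) && return sizeof(x)
    return sum(fieldbytes(getfield(x, n)) for n in names)
end

# Hypothetical example type, written with the Julia 1.x `struct` keyword.
struct Sample
    a::Vector{Int64}
    b::Int8
end

sample_bytes = fieldbytes(Sample(collect(1:50), Int8(1)))  # 50 * 8 bytes + 1 byte
```

Note that this counts only payload bytes; object headers and padding are not included.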

The function Base.summarysize provides exactly that
It also includes the overhead from the struct as seen in the examples.
julia> struct Foo a; b end
julia> Base.summarysize(ones(10000))
80040
julia> Base.summarysize(Foo(ones(10000), 1))
80064
julia> Base.summarysize(Foo(ones(10000), Foo(ones(10, 10), 1)))
80920
However, care should be taken, as the function is not exported and might not be future-proof.
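To see why sizeof alone is not enough for nested data, here is a minimal comparison. It assumes Julia 1.x; the struct name Holder is made up for illustration, and exact summarysize values are deliberately not shown because they can vary between versions.

```julia
# Hypothetical container type with untyped fields, similar to Foo above.
struct Holder
    a
    b
end

v = ones(10000)
h = Holder(v, 1)

payload = sizeof(v)              # 10000 Float64 values times 8 bytes
shallow = sizeof(h)              # only the struct's own fields (references)
deep    = Base.summarysize(h)    # follows references, so it includes the array
```

sizeof(h) stays small because it does not follow the reference to the array, while Base.summarysize(h) is at least as large as the array payload.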

In julia 1.6, varinfo() shows sizes:
julia> a = 1;
julia> v = ones(10000);
julia> varinfo()
name size summary
–––––––––––––––– ––––––––––– –––––––––––––––––––––––––––––
Base Module
Core Module
InteractiveUtils 250.022 KiB Module
Main Module
ans 78.164 KiB 10000-element Vector{Float64}
v 78.164 KiB 10000-element Vector{Float64}
a 8 bytes Int64
For specific variables, either use pattern matching (r"..." is a regular expression):
julia> varinfo(r"^v$")
name size summary
–––– –––––––––– –––––––––––––––––––––––––––––
v 78.164 KiB 10000-element Vector{Float64}
or combine Base.summarysize from the answer above with Base.format_bytes:
julia> pretty_summarysize(x) = Base.format_bytes(Base.summarysize(x))
pretty_summarysize (generic function with 1 method)
julia> pretty_summarysize(v)
"78.164 KiB"
Edit: beware that summarysize had a bug, at least in 1.5.3 and 1.6.1. varinfo was affected as well. It is fixed (tested with 1.7.3).

Related

Bind function arguments in Julia

Does Julia provide something similar to std::bind in C++? I wish to do something along the lines of:
function add(x, y)
return x + y
end
add45 = bind(add, 4, 5)
add2 = bind(add, _1, 2)
add3 = bind(add, 3, _2)
And if this is possible does it incur any performance overhead?
As answered here you can obtain this behavior using higher order functions in Julia.
Regarding performance: there should be no overhead. The compiler should inline everything in such a situation and can even perform constant propagation (so the code could actually be faster). The use of const in the other answer here is needed only because we are working in global scope. If all of this were used within a function, const would not be required (as the function that takes this argument will be properly compiled), so in the example below I do not use const.
Let me give an example with Base.Fix1 and your add function:
julia> using BenchmarkTools
julia> function add(x, y)
return x + y
end
add (generic function with 1 method)
julia> add2 = Base.Fix1(add, 10)
(::Base.Fix1{typeof(add), Int64}) (generic function with 1 method)
julia> y = 1:10^6;
julia> @btime add.(10, $y);
1.187 ms (2 allocations: 7.63 MiB)
julia> @btime $add2.($y);
1.189 ms (2 allocations: 7.63 MiB)
Note that I did not define add2 as const and since we are in global scope I need to prefix it with $ to interpolate its value into the benchmarking suite.
If I did not do it you would get:
julia> @btime add2.($y);
1.187 ms (6 allocations: 7.63 MiB)
Which is essentially the same timing and memory use, but does 6 not 2 allocations since in this case add2 is a type-unstable global variable.
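For reference, the three bind calls from the question could be sketched as follows. This is one possible mapping rather than a built-in bind: anonymous functions cover the general case, and Base.Fix1 / Base.Fix2 cover fixing the first or second argument of a two-argument function.

```julia
add(x, y) = x + y

# Both arguments bound, like bind(add, 4, 5).
add45 = () -> add(4, 5)

# Second argument bound, like bind(add, _1, 2); two equivalent spellings.
add2  = x -> add(x, 2)
add2b = Base.Fix2(add, 2)

# First argument bound, like bind(add, 3, _2).
add3 = Base.Fix1(add, 3)

r45 = add45()     # 4 + 5
r2  = add2(10)    # 10 + 2
r2b = add2b(10)   # 10 + 2
r3  = add3(10)    # 3 + 10
```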
I work on DataFrames.jl, and there using the patterns which we discuss here is very useful. Let me give just one example:
julia> using DataFrames
julia> df = DataFrame(x = 1:5)
5×1 DataFrame
Row │ x
│ Int64
─────┼───────
1 │ 1
2 │ 2
3 │ 3
4 │ 4
5 │ 5
julia> filter(:x => <(2.5), df)
2×1 DataFrame
Row │ x
│ Int64
─────┼───────
1 │ 1
2 │ 2
The operation picks the rows where the values in column :x are less than 2.5. The key thing to understand here is what <(2.5) does. It is:
julia> <(2.5)
(::Base.Fix2{typeof(<), Float64}) (generic function with 1 method)
so as you can see it is similar to what we would have obtained if we defined the x -> x < 2.5 function (essentially fixing the second argument of < function, as in Julia < is just a two argument function). Such shortcuts like <(2.5) above are defined in Julia by default for several common comparison operators.
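The same partially applied operators work outside DataFrames.jl as well. A short sketch using only Base functions:

```julia
lt = <(2.5)                           # a Base.Fix2 wrapping `<` with 2.5 as second argument
is_fix2 = lt isa Base.Fix2

small  = filter(<(2.5), 1:5)          # keep the values below 2.5
threes = count(==(3), [3, 1, 3])      # count the elements equal to 3
same   = lt(2) == (2 < 2.5)           # calling it is the same as the plain comparison
```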

Julia - combining vectors into the matrix

Let's assume I have two vectors x = [1, 2] and y = [3, 4]. How to best combine them to get a matrix m = [1 2; 3 4] in Julia Programming language? Thanks in advance for your support.
Note that in vcat(x', y') the operation x' is the adjoint, so it should not be used if you are working with complex numbers or with vector elements that do not have an adjoint defined (e.g. strings). In those cases permutedims should be used, but it will be slower as it allocates. A third way to do it is (admittedly it is more cumbersome to type):
julia> [reshape(x, 1, :); reshape(y, 1, :)]
2×2 Array{Int64,2}:
1 2
3 4
It is non allocating like [x'; y'] but does not do a recursive adjoint.
EDIT:
Note for Cameron:
julia> x = repeat(string.('a':'z'), 10^6);
julia> @btime $x';
1.199 ns (0 allocations: 0 bytes)
julia> @btime reshape($x, 1, :);
36.455 ns (2 allocations: 96 bytes)
so reshape allocates but only minimally (it needs to create an array object, while x' creates an immutable struct which does not require allocation).
Also, I think it was a design decision to allocate. For isbitsunion element types, reshape actually returns a struct, so it does not allocate (similarly to ranges):
julia> @btime reshape($x, 1, :)
12.211 ns (0 allocations: 0 bytes)
1×2 reshape(::Array{Union{Missing, Int64},1}, 1, 2) with eltype Union{Missing, Int64}:
1 missing
Two ways I know of:
julia> x = [1,2];
julia> y = [3,4];
julia> vcat(x', y')
2×2 Array{Int64,2}:
1 2
3 4
julia> permutedims(hcat(x, y))
2×2 Array{Int64,2}:
1 2
3 4
One more option - this one works both with numbers and with other objects such as Strings:
julia> rotl90([y x])
2×2 Array{Int64,2}:
1 2
3 4
What about
vcat(transpose(x), transpose(y))
or
[transpose(x); transpose(y)]
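The difference between the adjoint and the non-recursive alternatives shows up with complex or string elements. A small sketch, assuming Julia 1.x (the variable names are illustrative only):

```julia
# Strings have no adjoint, so use permutedims to turn each vector into a row.
xs = ["a", "b"]
ys = ["c", "d"]
m_str = [permutedims(xs); permutedims(ys)]

# For complex numbers, x' conjugates each element while transpose does not.
z = [1 + 2im, 3 - 1im]
m_transposed = [transpose(z); transpose(z)]
m_adjoint    = vcat(z', z')
```

Here m_transposed keeps 1 + 2im in its top-left corner, while m_adjoint holds the conjugate 1 - 2im.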

I want to find the number which act to 0 of Julia - I mean the nearest number of 0

Why does this happen in Julia?
My input is
A = []
for i = 17:21
t = 1/(10^(i))
push!(A, t)
end
return(A)
And the output was:
5-element Array{Any,1}:
1.0e-17
1.0e-18
-1.1838881245526248e-19
1.2876178137472069e-19
2.5800991659088344e-19
I observed that
A[3]>0
false
I want to find the number closest to 0 in Julia, but I found this and don't understand it.
The reason for this problem is when you have i = 19, note that then:
julia> 10^19
-8446744073709551616
and it is unrelated to floating point numbers, but is caused by Int64 overflow.
Here is the code that will work as you expect. Either use 10.0 instead of 10 as 10.0 is a Float64 value:
julia> A=[]
Any[]
julia> for i=17:21
t=1/(10.0^(i))
push!(A,t)
end
julia> A
5-element Array{Any,1}:
1.0e-17
1.0e-18
1.0e-19
1.0e-20
1.0e-21
or use the arbitrary-precision BigInt type, which is created using big(10) (the division then yields a BigFloat):
julia> A=[]
Any[]
julia> for i=17:21
t=1/(big(10)^(i))
push!(A,t)
end
julia> A
5-element Array{Any,1}:
9.999999999999999999999999999999999999999999999999999999999999999999999999999967e-18
9.999999999999999999999999999999999999999999999999999999999999999999999999999997e-19
9.999999999999999999999999999999999999999999999999999999999999999999999999999997e-20
1.000000000000000000000000000000000000000000000000000000000000000000000000000004e-20
9.999999999999999999999999999999999999999999999999999999999999999999999999999927e-22
You can find more discussion of this here https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Overflow-behavior.
For example notice that (which you might find surprising not knowing about the overflow):
julia> x = typemin(Int64)
-9223372036854775808
julia> x^2
0
julia> y = typemax(Int64)
9223372036854775807
julia> y^2
1
Finally to find smallest positive Float64 number use:
julia> nextfloat(0.0)
5.0e-324
or
julia> eps(0.0)
5.0e-324
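The points above can be collected in one short sketch. Int64 is spelled out explicitly so the behavior does not depend on the platform's default integer width; Base.Checked.checked_mul is shown as one way to get an error instead of silent wraparound.

```julia
wrapped  = Int64(10)^19     # Int64 arithmetic wraps around silently
as_float = 10.0^19          # Float64 arithmetic: no integer overflow
as_big   = big(10)^19       # BigInt grows as needed

# Checked arithmetic throws an OverflowError instead of wrapping.
overflowed = try
    Base.Checked.checked_mul(typemax(Int64), Int64(2))
    false
catch err
    err isa OverflowError
end

# Smallest positive Float64 (subnormal) versus smallest positive normal Float64.
tiny        = nextfloat(0.0)
tiny_normal = floatmin(Float64)
```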

Utilizing ndgrid/meshgrid functionality in Julia

I'm trying to find functionality in Julia similar to MATLAB's meshgrid or ndgrid. I know Julia has defined ndgrid in the examples but when I try to use it I get the following error.
UndefVarError: ndgrid not defined
Anyone know either how to get the builtin ndgrid function to work or possibly another function I haven't found or library that provides these methods (the builtin function would be preferred)? I'd rather not write my own in this case.
Thanks!
We prefer to avoid these functions, since they allocate arrays that usually aren't necessary. The values in these arrays have such a regular structure that they don't need to be stored; they can just be computed during iteration. For example, one alternative approach is to write an array comprehension:
julia> [ 10i + j for i=1:5, j=1:5 ]
5×5 Array{Int64,2}:
11 12 13 14 15
21 22 23 24 25
31 32 33 34 35
41 42 43 44 45
51 52 53 54 55
Or, you can write for loops, or iterate over a product iterator:
julia> collect(Iterators.product(1:2, 3:4))
2×2 Array{Tuple{Int64,Int64},2}:
(1, 3) (1, 4)
(2, 3) (2, 4)
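As a sketch of the loop-based alternative, the product iterator can be consumed directly without collecting it, and broadcasting a column against a row gives the same matrix as the comprehension. The function name grid_sum is made up for this example.

```julia
# Hypothetical helper: visit every (i, j) pair lazily, without building grid arrays.
function grid_sum(xs, ys)
    total = 0
    for (i, j) in Iterators.product(xs, ys)
        total += 10i + j
    end
    return total
end

s = grid_sum(1:5, 1:5)

# Broadcasting a column range against a row (adjoint) range builds the full matrix.
by_comprehension = [10i + j for i = 1:5, j = 1:5]
by_broadcast     = 10 .* (1:5) .+ (1:5)'
```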
I do find sometimes it's convenient to use some function like meshgrid in numpy. It's easy to do it with list comprehension:
function meshgrid(x, y)
X = [i for i in x, j in 1:length(y)]
Y = [j for i in 1:length(x), j in y]
return X, Y
end
e.g.
x = 1:4
y = 1:3
X, Y = meshgrid(x, y)
now
julia> X
4×3 Array{Int64,2}:
1 1 1
2 2 2
3 3 3
4 4 4
julia> Y
4×3 Array{Int64,2}:
1 2 3
1 2 3
1 2 3
1 2 3
However, I did not find this makes the code run faster than using iteration. Here's what I mean:
After defining
x = 1:1000
y = x
X, Y = meshgrid(x, y)
I benchmarked the following two functions
using Statistics
function fun1()
return mean(sqrt.(X.*X + Y.*Y))
end
function fun2()
sum = 0.0
for i in 1:1000
for j in 1:1000
sum += sqrt(i*i + j*j)
end
end
return sum / (1000*1000)
end
Here are the benchmark results:
julia> @btime fun1()
8.310 ms (19 allocations: 30.52 MiB)
julia> @btime fun2()
1.671 ms (0 allocations: 0 bytes)
The meshgrid method is both significantly slower and uses more memory. Does any Julia expert know why? I understand Julia is a compiled language, unlike Python, so iteration won't be slower than vectorization, but I don't understand why the vectorized (array) calculation is many times slower than iteration. (For bigger N this difference is even larger.)
Edit: After reading this post, I have the following updated version of the 'meshgrid' method. The idea is to not create a meshgrid beforehand, but to do it in the calculation via Julia's powerful elementwise array operation:
x = collect(1:1000)
y = x'
function fun1v2()
mean(sqrt.(x .* x .+ y .* y))
end
The trick here is the .+ between a size-M column array and a size-N row array, which returns an M-by-N array. It does the 'meshgrid' for you. This function is nearly 3 times faster than fun1, albeit not as fast as fun2.
julia> @btime fun1v2()
3.189 ms (24 allocations: 7.63 MiB)
765.8435104896155
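Two things that may contribute to the remaining gap (an assumption, not something measured here): the functions above read non-constant global variables, and an undotted + between broadcast expressions allocates intermediate arrays. A sketch that passes the ranges as arguments and either fuses every operation or avoids arrays altogether, with hypothetical function names:

```julia
using Statistics

# Fully fused broadcast: only the final M-by-N array is allocated before taking the mean.
grid_mean_broadcast(x, y) = mean(sqrt.(x .* x .+ y' .* y'))

# Generator version: no grid array is built at all.
grid_mean_lazy(x, y) =
    sum(sqrt(i * i + j * j) for i in x, j in y) / (length(x) * length(y))

a = grid_mean_broadcast(1:100, 1:100)
b = grid_mean_lazy(1:100, 1:100)
```

Both functions compute the same mean up to floating-point rounding; timing them with @btime on your own machine is the only way to tell which is faster there.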
Above, @ChrisRackauckas suggests that the "proper way" to do this is with a lazy operator but he hadn't gotten around to it.
There is now a registered package with a lazy ndgrid in it:
https://github.com/JuliaArrays/LazyGrids.jl
It is more general than the version in
VectorizedRoutines.jl
because it can handle vectors with different types, e.g.,
ndgrid(1:3, Float16[0:2], ["x", "y", "z"]).
There are Literate.jl examples in the docs that show the lazy performance is pretty good.
Of course lazy meshgrid is just one step away:
meshgrid(y,x) = (ndgrid_lazy(x,y)[[2,1]]...,)

Concatenate ArrayViews (or sliceviews or SubArrays) in Julia?

Is there a way to concatenate ArrayViews in Julia, that doesn't copy the underlying data? (I'd also be glad to use a SubArray, if that solves the problem.)
In the code below, for example, I want a single ArrayView that references the data in both y1 and y2.
julia> x = [1:50];
julia> using ArrayViews;
julia> y1 = view(x, 2:5);
julia> y2 = view(x, 44:48);
julia> concat(y1, y2) # I wish there were a function like this
ERROR: concat not defined
julia> [y1, y2] # This copies the data in y1 and y2, unfortunately
9-element Array{Int64,1}:
2
3
4
5
44
45
46
47
48
Not directly. But you could roll your own type with something like:
julia> type CView{A<:AbstractArray} <: AbstractArray
a::A
b::A
end
julia> import Base: size, getindex, setindex!
julia> size(c::CView) = tuple([sa+sb for (sa, sb) in zip(size(c.a), size(c.b))]...)
size (generic function with 57 methods)
julia> getindex(c::CView, i::Int) = i <= length(c.a) ? getindex(c.a, i) : getindex(c.b, i - length(c.a))
getindex (generic function with 180 methods)
julia> c = CView(y1, y2);
julia> size(c)
(9,)
julia> c[1]
2
julia> c[4]
5
julia> c[5]
44
These methods may not be optimal but they can certainly get you started. To be useful, more methods would probably be needed. Note that the key is simply in deciding which member array to index into. For multidimensional indexing sub2ind can be used.
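For current Julia versions the same idea might look like the sketch below. It assumes Julia 1.x, uses the built-in view (SubArray) instead of ArrayViews, and handles only one-dimensional data; the type name ConcatView is hypothetical.

```julia
# Hypothetical type: presents two vectors as one, without copying their data.
struct ConcatView{T,A<:AbstractVector{T},B<:AbstractVector{T}} <: AbstractVector{T}
    a::A
    b::B
    # Inner constructor infers T from the element type of the arguments.
    ConcatView(a::AbstractVector{T}, b::AbstractVector{T}) where {T} =
        new{T,typeof(a),typeof(b)}(a, b)
end

Base.size(c::ConcatView) = (length(c.a) + length(c.b),)
Base.IndexStyle(::Type{<:ConcatView}) = IndexLinear()

# Decide which member to index into; indices past `a` are shifted into `b`.
function Base.getindex(c::ConcatView, i::Int)
    n = length(c.a)
    return i <= n ? c.a[i] : c.b[i - n]
end

function Base.setindex!(c::ConcatView, v, i::Int)
    n = length(c.a)
    i <= n ? (c.a[i] = v) : (c.b[i - n] = v)
    return c
end

x = collect(1:50)
c = ConcatView(view(x, 2:5), view(x, 44:48))
fifth = c[5]     # first element of the second view
c[1] = 0         # writes through to x[2]
```

Because ConcatView subtypes AbstractVector, generic functions such as length, sum, and collect work on it without further definitions.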

Resources