I’m trying to count things in a Julia list with the goal of plotting a histogram. These things may be other arrays or simpler objects like Strings or Integers. My function currently uses counter from the DataStructures package, which works great for non-complex objects like strings or integers.
using DataStructures  # provides counter
using Plots           # provides bar
function viz(data::Vector)
    counts = counter(data)
    k = [x for x in keys(counts)]
    v = [x for x in values(counts)]
    bar(k, v ./ sum(v))
end
In Python, I’d just do [str(x) for x in the_list] to convert the inner elements to strings, but I’m having trouble figuring out how to do this in Julia.
Or is there a better way to count complex objects in Julia? (I’m a beginner at Julia)
[string(x) for x in the_list]
# or
[String(x) for x in the_list]
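string(x) is most likely the one you want: it works for any object and returns its printed representation (e.g. string([1, 2]) gives "[1, 2]"). String(x) only converts values that are already string-like, such as a SubString, and throws a MethodError for integers or arrays of integers.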
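Take a look at StatsBase.countmap(x) to see if it does what you need: it returns a Dict mapping each distinct element of x to the number of times it occurs, and it works for any hashable elements, including arrays.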
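A minimal Base-only sketch of the count-then-label approach, in case you would rather not add a dependency. count_items is a hypothetical helper that does roughly what countmap does; the plotting call is left as a comment because it assumes Plots.jl is loaded.

```julia
# Count occurrences of each distinct element (a Base-only stand-in for
# StatsBase.countmap). Arrays are hashed by content, so two equal arrays
# such as [1, 2] and [1, 2] are counted under the same key.
function count_items(data::AbstractVector)
    counts = Dict{eltype(data),Int}()
    for x in data
        counts[x] = get(counts, x, 0) + 1
    end
    return counts
end

data = Any[[1, 2], "a", 3, [1, 2], "a", "a"]
counts = count_items(data)

# Turn the keys into strings so they can serve as categorical bar labels,
# and normalize the counts into relative frequencies.
labels = [string(k) for k in keys(counts)]
freqs = [c for c in values(counts)] ./ length(data)

# With Plots.jl loaded, bar(labels, freqs) should draw the histogram.
```

Iterating keys and values of the same Dict yields them in matching order, so labels[i] corresponds to freqs[i].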
I know that in Julia, array indices begin at 1. For example:
b = Array{Float64, 1}(undef, 10)
This array b is a 1-D array with 10 elements, and its indices begin at 1.
But I want an array whose indices start at 0, or at any other integer. How can I do that in Julia?
Say I want the indices to range from 0 to 9. I tried things like
b = Array{Float64, 1}(undef, 0:9)
But obviously it does not work in Julia.
Can Julia easily define an array with an arbitrary index range, like Fortran?
I googled a little and it seems this is not easy to do in Julia. Am I missing something?
Is there a generic way in Julia to define an arbitrarily indexed array, or do I have to install a package like OffsetArrays?
It just doesn't seem great that Julia cannot define arbitrarily indexed arrays generically.
Thanks!
In Julia, this is provided by the OffsetArrays package. Try, for example
using OffsetArrays
A = rand(10)
OA = OffsetArray(A, 0:9)
OA[0]
which gives, for example (A is random, so your value will differ):
julia> OA[0]
0.26079620656304203
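Whether or not you use OffsetArrays, code written with firstindex, lastindex or eachindex instead of a hard-coded 1:length(A) works for any index range. A minimal sketch; sum_every_other is a hypothetical helper, exercised here on an ordinary array so it runs without extra packages:

```julia
# Sum every other element starting from the first index, without
# assuming that indexing starts at 1.
function sum_every_other(A::AbstractVector)
    s = zero(eltype(A))
    for i in firstindex(A):2:lastindex(A)
        s += A[i]
    end
    return s
end

A = collect(1.0:10.0)
sum_every_other(A)  # 1 + 3 + 5 + 7 + 9 = 25.0

# With OffsetArrays loaded, sum_every_other(OffsetArray(A, 0:9)) visits
# indices 0, 2, ..., 8, i.e. the same elements, and returns 25.0 as well.
```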
Julia's "higher-order" function "map" looks very useful. But while it is easy to understand how it can be used on functions that have one input, it is not obvious how map can be used when the function has multiple inputs, and when each of these may be an array. I would like to discover how map is used in that situation.
Suppose I have the following function:
function randomSample(items, weights)
    sample(items, Weights(weights))
end
Example:
using Pkg
Pkg.add("StatsBase")  # only needed once
using StatsBase
randomSample([1, 0], [0.5, 0.5])
How can map be used here? I have tried something like:
items = [1 0;1 0;1 0]
weights = [1 0;0.5 0.5;0.75 0.25]
map(randomSample(items,weights))
In the example above, I would expect Julia to output a 3 by 1 array of integers (from the items), each row being either 0 or 1 depending on the corresponding weights.
In your case, where items and weights are matrices, you can use the eachrow function like this:
map(randomSample, eachrow(items), eachrow(weights))
If you are on a Julia version earlier than 1.1 (where eachrow is not available), you can write:
map(i -> randomSample(items[i, :], weights[i, :]), axes(items, 1))
or
map(i -> randomSample(view(items,i, :), view(weights, i, :)), axes(items, 1))
(the latter avoids allocating a copy of each row)
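To check that these forms agree, here is a sketch that swaps randomSample for a hypothetical deterministic function, weighted_total, so the output can be compared; it only illustrates how map walks several collections in lockstep.

```julia
# Hypothetical deterministic stand-in for randomSample:
# the weighted sum of one row of items.
weighted_total(items, weights) = sum(items .* weights)

items = [1 0; 1 0; 1 0]
weights = [1 0; 0.5 0.5; 0.75 0.25]

# map with several collections passes one element of each to the
# function per call, here one row of each matrix (Julia 1.1+).
r1 = map(weighted_total, eachrow(items), eachrow(weights))

# Index-based forms; the view version avoids copying each row.
r2 = map(i -> weighted_total(items[i, :], weights[i, :]), axes(items, 1))
r3 = map(i -> weighted_total(view(items, i, :), view(weights, i, :)), axes(items, 1))
```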
However, in practice I would probably define items and weights as vectors of vectors:
items = [[1, 0],[1, 0],[1, 0]]
weights = [[1, 0], [0.5, 0.5], [0.75, 0.25]]
and then you can simply write:
map(randomSample, items, weights)
or
randomSample.(items, weights)
The reasons for my preference are the following:
the structure of your data is conceptually clearer
vector of vectors is easier to mutate (e.g. you can push! a new entry at the end)
vector of vectors can be ragged if needed
in some cases it might be a bit faster (iterating by rows is not optimal in Julia because arrays are stored in column-major order; of course, in your Matrix approach you could fix this by storing your data columnwise instead of rowwise as you currently do)
(this is not a very strong preference and you can probably choose whatever is more convenient to you)
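A short sketch of the vector-of-vectors version, again with a hypothetical deterministic weighted_total in place of randomSample; it shows that map and the dot-broadcast form agree, and that the data can grow and become ragged with push!.

```julia
# Hypothetical deterministic stand-in for randomSample.
weighted_total(items, weights) = sum(items .* weights)

items = [[1, 0], [1, 0], [1, 0]]
weights = [[1, 0], [0.5, 0.5], [0.75, 0.25]]

# map zips the two outer vectors and calls weighted_total(items[i], weights[i]);
# broadcasting treats each inner vector as a single element, so it does the same.
r1 = map(weighted_total, items, weights)
r2 = weighted_total.(items, weights)

# A vector of vectors can grow, and its entries need not share a length.
push!(items, [1, 2, 3])
push!(weights, [0.2, 0.3, 0.5])
r3 = map(weighted_total, items, weights)
```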
I have an Array of arrays, called y:
using StatsBase  # provides sample, used in the loop below
y = Vector{Vector{Int64}}(undef, 10)
which is basically a list of 10 one-dimensional arrays, each of length 5. Below is an example of how they are initialized:
for i in 1:10
    y[i] = sample(1:20, 5)
end
Each 1-dimensional array includes 5 randomly sampled integers between 1 and 20.
Right now I am using map to compute, for each of those 1-dimensional arrays in y, which numbers from 1 to 20 it excludes:
map(x->setdiff(1:20, x), y)
However, when the function is applied to y[i], I want to make sure that i itself is excluded from the result if setdiff(1:20, y[i]) would include it. In other words, I want a function that works like
setdiff(deleteat!(Vector(1:20),i) ,y[i])
but with map.
Mainly, my question is whether you can access the index inside the function passed to map.
P.S. I know how to do it with a comprehension; I want to know whether it is possible with map.
The comprehension way:
[setdiff(deleteat!(Vector(1:20), index), value) for (index,value) in enumerate(y)]
Like this?
map(x -> setdiff(deleteat!(Vector(1:20), x[1]),x[2]), enumerate(y))
For your example, this gives:
[2,3,4,5,7,8,9,10,11,12,13,15,17,19,20]
[1,3,5,6,7,8,9,10,11,13,16,17,18,20]
....
[1,2,4,7,8,10,11,12,13,14,15,16,17,18]
[1,2,3,5,6,8,11,12,13,14,15,16,17,19,20]
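The same idea can be written with destructuring instead of x[1] and x[2]. A sketch using fixed, hypothetical data so the result is reproducible; the second form swaps deleteat! for the multi-argument setdiff, which removes the elements of every collection passed after the first.

```julia
# Fixed (hypothetical) data so the result is reproducible.
y = [[3, 5, 7, 11, 13], [2, 4, 6, 8, 10], [1, 9, 15, 17, 19]]

# A do-block can destructure the (index, value) tuples from enumerate.
r1 = map(enumerate(y)) do (i, v)
    setdiff(deleteat!(collect(1:20), i), v)
end

# Alternative without deleteat!: setdiff accepts several collections
# to remove, so the index can be passed as a one-element vector.
r2 = map(((i, v),) -> setdiff(1:20, v, [i]), enumerate(y))
```

Note that enumerate counts from 1, so i matches the array index only for ordinary 1-based vectors like y here.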
I would like to write a function fun1 that takes a DataArrays.DataArray y as its only argument. The elements of y can be either integers or floats, and y can be a vector or a matrix.
I have tried to follow the suggestions I found on Stack Overflow (Functions that take DataArrays and Arrays as arguments in Julia) and in the official documentation (http://docs.julialang.org/en/release-0.5/manual/methods/). However, I couldn't write code flexible enough to handle the different possible types of y.
I would like to have something like this, but capable of handling numeric DataArrays.DataArray arguments:
function fun1(y::Number)
    println(y)
end
Any suggestion?
One option is to define the following (DataArrays must be loaded before DataArray can appear in a method signature):
using DataArrays
fun1(yvec::DataArray{T}) where {T<:Number} = foreach(println, yvec)
Then,
v = DataArray(rand(10))
w = DataArray(rand(1:10, 10))
fun1(v)
#
# elements of v printed as Float64s
#
fun1(w)
#
# elements of w printed as Ints
#
A subtle but recurring point is that Julia's parametric types are invariant (DataArray{Float64} is not a subtype of DataArray{Number}, even though Float64 <: Number), which is why a parametric method is needed. A look at the documentation regarding types should clarify this concept (http://docs.julialang.org/en/release-0.4/manual/types/#types).
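Invariance can be seen without DataArrays at all. A sketch with plain Base arrays standing in for DataArray; printall and printall_strict are hypothetical names for illustration:

```julia
# Parametric types are invariant: Vector{Float64} is not a Vector{Number},
# even though Float64 <: Number.
println(Vector{Float64} <: Vector{Number})    # false
println(Vector{Float64} <: Vector{<:Number})  # true

# A method with a type parameter accepts arrays of any numeric element
# type and any dimensionality (vectors and matrices alike).
printall(y::AbstractArray{T}) where {T<:Number} = foreach(println, y)

# This method only matches arrays whose element type is exactly Number,
# such as Number[1, 2.5]; a Vector{Float64} does not qualify.
printall_strict(y::AbstractArray{Number}) = foreach(println, y)

printall([1, 2, 3])            # Vector{Int64}
printall([1.5 2.5; 3.5 4.5])   # Matrix{Float64}
println(applicable(printall_strict, [1.5, 2.5]))  # false
```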
Let x::Vector{Vector{T}}. What is the best way to iterate over all the elements of each inner vector (that is, all elements of type T)? The best I can come up with is a double iteration using the single-line notation, i.e.:
for n in eachindex(x), m in eachindex(x[n])
    x[n][m]
end
but I'm wondering if there is a single iterator, perhaps in the Iterators package, designed specifically for this purpose, e.g. for i in some_iterator(x) ; x[i] ; end.
More generally, what about iterating over the inner-most elements of any array of arrays (that is, arrays of any dimension)?
Your way
for n in eachindex(x), m in eachindex(x[n])
    x[n][m]
end
is pretty fast. If you want the best speed, use
for n in eachindex(x)
    y = x[n]
    for m in eachindex(y)
        y[m]
    end
end
which avoids dereferencing twice (the first dereference is hard to optimize out because arrays are mutable, and so getindex isn't pure). Alternatively, if you don't need m and n, you could just use
for y in x, z in y
    z
end
which is also fast.
Note that column-major storage is irrelevant, since all arrays here are one-dimensional.
To answer your general question:
If the number of dimensions is a compile-time constant, see Base.Cartesian
If the number of dimensions is not a compile-time constant, use recursion
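A minimal sketch of the recursive approach for arbitrarily nested arrays; foreach_leaf is a hypothetical helper that treats anything that is not an AbstractArray as an innermost element:

```julia
# Recursively visit the innermost (non-array) elements of an arbitrarily
# nested array of arrays, calling f on each. Inner arrays may have any
# dimensionality; a matrix is traversed in column-major order.
foreach_leaf(f, x::AbstractArray) = foreach(el -> foreach_leaf(f, el), x)
foreach_leaf(f, x) = f(x)

nested = Any[[1, 2], Any[[3, 4], [5]], [6 7; 8 9]]
leaves = Int[]
foreach_leaf(el -> push!(leaves, el), nested)
# leaves now holds 1, 2, 3, 4, 5 and then 6, 8, 7, 9 (the matrix column by column)
```

Because dispatch picks the AbstractArray method whenever the argument is an array, the recursion stops naturally at non-array values, including strings.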
And finally, as Dan Getz mentioned in a comment:
using Iterators
for z in chain(x...)
    z
end
also works, although it has a bit of a performance penalty.
I'm wondering if there is a single iterator, perhaps in the Iterators package, designed specifically for this purpose, e.g. for i in some_iterator(x) ; x[i] ; end
Today (in Julia 1.x versions), Iterators.flatten is exactly this.
help?> Iterators.flatten
flatten(iter)
Given an iterator that yields iterators, return an iterator that
yields the elements of those iterators. Put differently, the
elements of the argument iterator are concatenated.
julia> x = [1:5, [π, ℯ, 42], 'a':'e']
3-element Vector{AbstractVector}:
1:5
[3.141592653589793, 2.718281828459045, 42.0]
'a':1:'e'
julia> for el in Iterators.flatten(x)
print(el, " ")
end
1 2 3 4 5 3.141592653589793 2.718281828459045 42.0 a b c d e
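Applied to the Vector{Vector{T}} from the question, a short sketch: collect materializes the lazy iterator, a comprehension with two for clauses does the same single-level flattening, and flatten removes only one level of nesting.

```julia
x = [[1, 2, 3], [4, 5], Int[], [6]]

# Iterators.flatten is lazy; collect materializes it into a Vector.
flat = collect(Iterators.flatten(x))

# A comprehension with two for clauses expresses the same flattening.
flat2 = [z for y in x for z in y]

# flatten removes only one level of nesting: these elements are still vectors.
deeper = collect(Iterators.flatten([[[1, 2]], [[3]]]))
```

For deeper or mixed nesting, a recursive function like the one described in the previous answer is needed.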