Frequencies in a vector or list using counts() - julia

How can I use counts() to show both the items and their frequencies? For example:
a=[1,2,2,3]
counts(a) gives 1,2,1
How can I get:
1:1, 2:2, 3:1?
Thanks

It looks like you are already using StatsBase, because that is where the counts function you mention is defined. The function you are looking for is called countmap:
using StatsBase
a = [1,2,2,3];
countmap(a)
# Dict{Int64, Int64} with 3 entries:
# 2 => 2
# 3 => 1
# 1 => 1
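If the exact 1:1, 2:2, 3:1 layout from the question is needed, the counting and formatting can be sketched in plain Julia. This is only a minimal illustration that does not use StatsBase; the helper name count_items is hypothetical, and with StatsBase loaded countmap(a) could be used in its place.

```julia
# Count occurrences of each item using only Base (hypothetical helper).
function count_items(items)
    freq = Dict{eltype(items), Int}()
    for item in items
        freq[item] = get(freq, item, 0) + 1
    end
    return freq
end

a = [1, 2, 2, 3]
freq = count_items(a)

# Sort the pairs by key so the order is deterministic, then format as "item:count".
formatted = join(("$k:$v" for (k, v) in sort(collect(freq))), ", ")
println(formatted)  # 1:1, 2:2, 3:1
```

Sorting is needed because a Dict does not guarantee any iteration order, as the countmap output above also shows.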

If you prefer tabular output you can also do:
julia> using FreqTables
julia> a = [1,2,2,3];
julia> freqtable(a)
3-element Named Vector{Int64}
Dim1 │
──────┼──
1 │ 1
2 │ 2
3 │ 1


Iterate over dataframe names in Julia

I am trying to generate n dataframes using a loop, where each dataframe has one column with i rows populated with random numbers where i=1:n. So far, none of the following iterating methods work for iterating over dataframe names in order to generate them:
n = 5;
for i = 1:n
"df_$i" = DataFrame(rand($i), :auto)
end
or
n = 5;
for i = 1:n
"df_$(i)" = DataFrame(rand($i), :auto)
end
Thanks!
Is this what you want?
julia> [DataFrame("col$i" => rand(i)) for i in 1:3]
3-element Vector{DataFrame}:
1×1 DataFrame
Row │ col1
│ Float64
─────┼──────────
1 │ 0.368821
2×1 DataFrame
Row │ col2
│ Float64
─────┼──────────
1 │ 0.757023
2 │ 0.201711
3×1 DataFrame
Row │ col3
│ Float64
─────┼──────────
1 │ 0.702651
2 │ 0.256179
3 │ 0.560374
(I additionally showed you how to dynamically generate the name of the column in each data frame)
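If the data frames also need to be looked up by name, one common alternative to generating variable names is to keep them in a dictionary. A minimal sketch, assuming DataFrames.jl is installed; the key pattern df_$i simply mirrors the names from the question, and the variable name dfs is made up for illustration.

```julia
using DataFrames

n = 5

# Map each name "df_1", ..., "df_n" to a one-column data frame with i random rows.
dfs = Dict("df_$i" => DataFrame("col$i" => rand(i)) for i in 1:n)

# Look up a data frame by its generated name.
df3 = dfs["df_3"]
println(size(df3))  # (3, 1)
```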

Is it possible to access row index inside DataFramesMeta macros?

Is there a way to access current_row_index in the following snippet?
@with df begin
    fn.(:col, current_row_index)
end
In this context, since you are broadcasting, just pass the first axis of df:
julia> using DataFramesMeta
julia> fn(x, y) = (x, y)
fn (generic function with 1 method)
julia> df = DataFrame(col=["a", "b", "c"])
3×1 DataFrame
Row │ col
│ String
─────┼────────
1 │ a
2 │ b
3 │ c
julia> @with df begin
           fn.(:col, axes(df, 1))
       end
3-element Vector{Tuple{String, Int64}}:
("a", 1)
("b", 2)
("c", 3)
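The same broadcasting idea can be seen without a data frame. In this sketch a plain vector stands in for the column (an assumption made purely for illustration); axes(col, 1) yields the index range that is broadcast alongside the values.

```julia
fn(x, y) = (x, y)

col = ["a", "b", "c"]

# axes(col, 1) is the range of valid indices along the first dimension.
result = fn.(col, axes(col, 1))
println(result)
```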

Julia subsetting dataframe with multiple conditions

In DataFramesMeta, why should I wrap every condition within a pair of parentheses? Below is an example dataframe where I want a subset that contains values greater than 1 or is missing.
d = DataFrame(a = [1, 2, missing], b = ["x", "y", missing]);
Using DataFramesMeta to subset:
@chain d begin
    @subset @byrow begin
        (:a > 1) | (:a===missing)
    end
end
If I don't use parentheses, errors pop up.
@chain d begin
    @subset @byrow begin
        :a > 1 | :a===missing
    end
end
# ERROR: LoadError: TypeError: non-boolean (Missing) used in boolean context
The reason is operator precedence (and is unrelated to DataFramesMeta.jl).
See:
julia> dump(:(2 > 1 | 3 > 4))
Expr
head: Symbol comparison
args: Array{Any}((5,))
1: Int64 2
2: Symbol >
3: Expr
head: Symbol call
args: Array{Any}((3,))
1: Symbol |
2: Int64 1
3: Int64 3
4: Symbol >
5: Int64 4
As you can see, 2 > 1 | 3 > 4 gets parsed as 2 > (1 | 3) > 4, which is not what you want.
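The grouping can also be checked directly with literal values. The following is a small sketch using only Base; the variable names are made up for illustration.

```julia
# Compare the expression trees of the unparenthesized form and the explicit grouping.
same_tree = Meta.parse("2 > 1 | 3 > 4") == Meta.parse("2 > (1 | 3) > 4")

# 1 | 3 is a bitwise OR and equals 3, so this is the chained comparison 2 > 3 > 4.
unparenthesized = 2 > 1 | 3 > 4

# With parentheses each comparison is evaluated first: true | false.
parenthesized = (2 > 1) | (3 > 4)

println((same_tree, unparenthesized, parenthesized))
```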
However, I would recommend the following syntax for your case:
julia> @chain d begin
           @subset @byrow begin
               coalesce(:a > 1, true)
           end
       end
2×2 DataFrame
Row │ a b
│ Int64? String?
─────┼──────────────────
1 │ 2 y
2 │ missing missing
or
julia> @chain d begin
           @subset @byrow begin
               ismissing(:a) || :a > 1
           end
       end
2×2 DataFrame
Row │ a b
│ Int64? String?
─────┼──────────────────
1 │ 2 y
2 │ missing missing
I personally prefer coalesce but it is a matter of taste.
Note that ||, as opposed to |, does not require parentheses, but ismissing(:a) has to come first to take advantage of the short-circuiting behavior of ||. With the conditions in the original order you would get an error:
julia> @chain d begin
           @subset @byrow begin
               :a > 1 || ismissing(:a)
           end
       end
ERROR: TypeError: non-boolean (Missing) used in boolean context
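The behavior behind these three variants can be reproduced on a plain vector. This is only a sketch using Base, with a vector standing in for column :a and variable names made up for illustration.

```julia
a = [1, 2, missing]

# A comparison involving missing returns missing rather than true or false.
println(missing > 1)

# coalesce replaces a missing comparison result with true.
keep_coalesce = [coalesce(x > 1, true) for x in a]

# ismissing first: || short-circuits before x > 1 is evaluated for missing.
keep_shortcircuit = [ismissing(x) || x > 1 for x in a]

# Reversed order: missing becomes the left operand of ||, which throws.
threw_type_error = try
    [x > 1 || ismissing(x) for x in a]
    false
catch err
    err isa TypeError
end

println((keep_coalesce, keep_shortcircuit, threw_type_error))
```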
Finally, with @rsubset this can be just:
julia> @chain d begin
           @rsubset coalesce(:a > 1, true)
       end
2×2 DataFrame
Row │ a b
│ Int64? String?
─────┼──────────────────
1 │ 2 y
2 │ missing missing
(I assume you want @chain because this is one of several steps in your analysis, so I kept it.)

Bind function arguments in Julia

Does Julia provide something similar to std::bind in C++? I wish to do something along the lines of:
function add(x, y)
return x + y
end
add45 = bind(add, 4, 5)
add2 = bind(add, _1, 2)
add3 = bind(add, 3, _2)
And if this is possible does it incur any performance overhead?
As answered here you can obtain this behavior using higher order functions in Julia.
Regarding performance: there should be no overhead. In fact, the compiler should inline everything in such a situation and even perform constant propagation (so the code could actually be faster). The use of const in the other answer here is needed only because we are working in global scope. If all of this were used within a function, const would not be required (as the function that takes this argument will be properly compiled), so in the example below I do not use const.
Let me give an example with Base.Fix1 and your add function:
julia> using BenchmarkTools
julia> function add(x, y)
return x + y
end
add (generic function with 1 method)
julia> add2 = Base.Fix1(add, 10)
(::Base.Fix1{typeof(add), Int64}) (generic function with 1 method)
julia> y = 1:10^6;
julia> @btime add.(10, $y);
1.187 ms (2 allocations: 7.63 MiB)
julia> @btime $add2.($y);
1.189 ms (2 allocations: 7.63 MiB)
Note that I did not define add2 as const, and since we are in global scope I need to prefix it with $ to interpolate its value into the benchmark.
If I did not do that, you would get:
julia> @btime add2.($y);
1.187 ms (6 allocations: 7.63 MiB)
This is essentially the same timing and memory use, but with 6 allocations instead of 2, since in this case add2 is a type-unstable global variable.
I work on DataFrames.jl, and there using the patterns which we discuss here is very useful. Let me give just one example:
julia> using DataFrames
julia> df = DataFrame(x = 1:5)
5×1 DataFrame
Row │ x
│ Int64
─────┼───────
1 │ 1
2 │ 2
3 │ 3
4 │ 4
5 │ 5
julia> filter(:x => <(2.5), df)
2×1 DataFrame
Row │ x
│ Int64
─────┼───────
1 │ 1
2 │ 2
This operation picks the rows where the values in column :x are less than 2.5. The key thing to understand here is what <(2.5) does. It is:
julia> <(2.5)
(::Base.Fix2{typeof(<), Float64}) (generic function with 1 method)
So, as you can see, it is similar to what we would have obtained had we defined the function x -> x < 2.5 (essentially fixing the second argument of <, since in Julia < is just a two-argument function). Shortcuts like <(2.5) above are defined in Julia by default for several common comparison operators.
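To connect this back to the bind calls in the question, here is a rough sketch of how each of them might be expressed. The mapping to std::bind is approximate, and the sub helper together with the names from10 and minus10 are hypothetical, added only because subtraction makes the difference between fixing the first and the second argument visible.

```julia
add(x, y) = x + y
sub(x, y) = x - y   # hypothetical helper: argument order matters here

add45 = () -> add(4, 5)       # both arguments bound, roughly bind(add, 4, 5)
add2  = Base.Fix2(add, 2)     # second argument fixed, roughly bind(add, _1, 2)
add3  = Base.Fix1(add, 3)     # first argument fixed

from10  = Base.Fix1(sub, 10)  # behaves like y -> 10 - y
minus10 = Base.Fix2(sub, 10)  # behaves like x -> x - 10

println((add45(), add2(1), add3(1), from10(3), minus10(3)))  # (9, 3, 4, 7, -7)
```

An anonymous function such as x -> add(x, 2) would work equally well; Base.Fix1 and Base.Fix2 are just the ready-made types for the single-fixed-argument case.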

Julia: How to obtain all but one point from an array? [duplicate]

Say x = [1:5...] and I wish to return an array with elements 1:2 and 4:5, i.e. all but one element, namely 3. How do I do that?
I tried x[1:2; 4:end] and x[1:2 4:end]. Neither worked.
I would really like to use the end keyword if possible.
InvertedIndices.jl has a nice interface for this:
julia> using InvertedIndices
julia> v = map(i -> i => rand(), 1:5)
5-element Array{Pair{Int64,Float64},1}:
1 => 0.8165266824627073
2 => 0.38840874144349025
3 => 0.061178225310028145
4 => 0.6615139442678073
5 => 0.10733363621427094
julia> v[Not(3)]
4-element Array{Pair{Int64,Float64},1}:
1 => 0.8165266824627073
2 => 0.38840874144349025
4 => 0.6615139442678073
5 => 0.10733363621427094
You could do a union of the indices:
julia> x = [1:5...]
5-element Array{Int64,1}:
1
2
3
4
5
julia> x[(1:2) ∪ (4:end) ]
4-element Array{Int64,1}:
1
2
4
5
I typed the "union" symbol (∪) by writing \cup and pressing TAB.
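For completeness, a few other variants that need no extra package and still allow the end keyword are sketched below. These are suggestions beyond the answers above, and the variable names are made up for illustration.

```julia
x = collect(1:5)

# Concatenate two index ranges; end still refers to the last index of x.
by_concat = x[[1:2; 4:end]]

# Remove index 3 from the full index range.
by_setdiff = x[setdiff(1:end, 3)]

# Copy first so that x itself is left unchanged, then delete position 3.
by_delete = deleteat!(copy(x), 3)

println((by_concat, by_setdiff, by_delete))
```

The first form is close to the x[1:2; 4:end] attempt from the question; the extra pair of brackets builds a single index vector out of the two ranges.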
