Equivalent of pandas 'clip' in Julia - julia

In pandas, there is the clip function (see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.clip.html), which constrains values within the lower and upper bound provided by the user. What is the Julia equivalent? I.e., I would like to have:
> clip.([2 3 5 10],3,5)
> [3 3 5 5]
Obviously, I can write it myself, or use a combination of min and max, but I was surprised to find out there is none. StatsBase provides the trim and winsor functions, but these do not allow fixed values as input, but rather counts or percentiles (https://juliastats.github.io/StatsBase.jl/stable/robust.html).

You are probably looking for clamp:
help?> clamp
clamp(x, lo, hi)
Return x if lo <= x <= hi. If x > hi, return hi. If x < lo, return lo. Arguments are promoted to a common type.
This is a function for scalar x, but we can broadcast it over the vector using dot-notation:
julia> clamp.([2, 3, 5, 10], 3, 5)
4-element Array{Int64,1}:
3
3
5
5
If you don't care about the original array you can also use the in-place version clamp!, which modifies the input:
julia> A = [2, 3, 5, 10];
julia> clamp!(A, 3, 5);
julia> A
4-element Array{Int64,1}:
3
3
5
5

Related

Combining vectors of unequal length

x = [1, 2, 3, 4]
y = [1, 2]
If I want to be able to operate on the two vectors with a default value filling in, what are the strategies?
E.g. would like to do the following and implicitly fill in with 0 or missing
x + y # would like [2, 4, 3, 4]
Ideally would like to do this in a generic way so that I could do arbitrary operations with the two.
Disregarding whether Julia has something built-in to do this, remember that Julia is fast. This means that you can write code to support this kind of need.
extend!(x, y::Vector, default=0) = extend!(x, length(y), default)
extend!(x, n::Int, default=0) = begin
while length(x) < n
push!(x, default)
end
x
end
Then when you have code such as you describe, you can symmetrically extend x and y:
x = [1, 2, 3, 4]
y = [1, 2]
extend!(x, y)
extend!(y, x)
x + y
==> [2, 4, 3, 4]
Note that this mutates y. In many cases, the desired length would come from outside the code and would be applied to both x and y. I can also imagine that 0 is a bad default in general (even though it is completely appropriate in your context of addition.
A comment below makes the worthy point that you should consider using append! instead of looping over push!. In fact, it is best to measure differences like that if you care about very small differences. I went ahead and tested:
julia> using BenchmarkTools
julia> extend1(x, n) = begin
while length(x) < n
push!(x, 0)
end
x
end
julia> #btime begin
x = rand(10)
sum(x)
end
59.815 ns (1 allocation: 160 bytes)
5.037723569560573
julia> #btime begin
x = rand(10)
extend1(x, 1000)
sum(x)
end
7.281 μs (8 allocations: 20.33 KiB)
6.079832879992913
julia> x = rand(10)
julia> #btime begin
x = rand(10)
append!(x, zeros(990))
sum(x)
end
1.290 μs (3 allocations: 15.91 KiB)
3.688526541987817
julia>
Pushing primitives in a loop is damned fast, allocating a vector of zeros so we can use append! is very slightly faster.
But the real lesson here is seen in the fact that the loop version takes microseconds to append nearly 1000 values (reallocating the array several times). Appending 10 values one by one takes just over 150ns (and append! is slightly faster). This is blindingly fast. Literally doing nothing in R or Python can take longer than this.
This difference would matter in some situations and would be undetectable in many others. If it matters, measure. If it doesn't, do the simplest thing that comes to mind because Julia has your back (performance-wise).
FURTHER UPDATE
Taking a hint from another of Colin's comments, here are results where we use append! but we don't allocate a list. Instead, we use a generator ... that is, a data structure that invents data when asked for it with an interface much like a list. The results are much better than what I showed above.
julia> #btime begin
x = rand(10)
append!(x, (0 for i in 1:990))
sum(x)
end
565.814 ns (2 allocations: 8.03 KiB)
Note the round brackets around the 0 for i in 1:990.
In the end, Colin was right. Using append! is much faster if we can avoid related overheads. Surprisingly, the base function Iterators.repeated(0, 990) is much slower.
But, no matter what, all of these options are pretty blazingly fast and all of them would probably be so fast that none of these subtle differences would matter.
Julia is fun!
Note that if you want to fill with missing or some other type different from the element type in your original vector, then you will need to change the type of your vectors to allow those new elements. The function below will handle any case.
function fillvectors(x, y, fillvalue=missing)
xl = length(x)
yl = length(y)
if xl < yl
x::Vector{Union{eltype(x), typeof(fillvalue)}} = x
for i in xl+1:yl
push!(x, fillvalue)
end
end
if yl < xl
y::Vector{Union{eltype(y), typeof(fillvalue)}} = y
for i in yl+1:xl
push!(y, fillvalue)
end
end
return x, y
end
x = [1, 2, 3, 4]
y = [1, 2]
julia> (x, y) = fillvectors(x, y)
([1, 2, 3, 4], Union{Missing, Int64}[1, 2, missing, missing])
julia> y
4-element Vector{Union{Missing, Int64}}:
1
2
missing
missing
julia> (x, y) = fillvectors(x, y, 0)
([1, 2, 3, 4], [1, 2, 0, 0])
julia> y
4-element Vector{Int64}:
1
2
0
0
julia> (x, y) = fillvectors(x, y, 1.001)
([1, 2, 3, 4], Union{Float64, Int64}[1, 2, 1.001, 1.001])
julia> y
4-element Vector{Union{Float64, Int64}}:
1
2
1.001
1.001

How can I slice the high-order multidimeonal array (or tensor) on the specific axis in Julia?

I am using Julia1.6
Here, X is a D-order multidimeonal array.
How can I slice from i to j on the d-th axis of X ?
Here is an exapmle in case of D=6 and d=4.
X = rand(3,5,6,6,5,6)
Y = X[:,:,:,i:j,:,:]
i and j are given values which are smaller than 6 in the above example.
You can use the built-in function selectdim
help?> selectdim
search: selectdim
selectdim(A, d::Integer, i)
Return a view of all the data of A where the index for dimension d equals i.
Equivalent to view(A,:,:,...,i,:,:,...) where i is in position d.
Examples
≡≡≡≡≡≡≡≡≡≡
julia> A = [1 2 3 4; 5 6 7 8]
2×4 Matrix{Int64}:
1 2 3 4
5 6 7 8
julia> selectdim(A, 2, 3)
2-element view(::Matrix{Int64}, :, 3) with eltype Int64:
3
7
Which would be used something like:
julia> a = rand(10,10,10,10);
julia> selectedaxis = 5
5
julia> indices = 1:2
1:2
julia> selectdim(a,selectedaxis,indices)
Notice that in the documentation example, i is an integer, but you can use ranges of the form i:j as well.
If you need to just slice on a single axis, use the built in selectdim(A, dim, index), e.g., selectdim(X, 4, i:j).
If you need to slice more than one axis at a time, you can build the array that indexes the array by first creating an array of all Colons and then filling in the specified dimensions with the specified indices.
function selectdims(A, dims, indices)
indexer = repeat(Any[:], ndims(A))
for (dim, index) in zip(dims, indices)
indexer[dim] = index
end
return A[indexer...]
end
idx = ntuple( l -> l==d ? (i:j) : (:), D)
Y = X[idx...]

Create a Vector of Integers and missing Values

What a hazzle...
I'm trying to create a vector of integers and missing values. This works fine:
b = [4, missing, missing, 3]
But I would actually like the vector to be longer with more missing values and therefore use repeat(), but this doesn't work
append!([1,2,3], repeat([missing], 1000))
and this also doesn't work
[1,2,3, repeat([missing], 1000)]
Please, help me out, here.
It is also worth to note that if you do not need to do an in-place operation with append! actually in such cases it is much easier to do vertical concatenation:
julia> [[1, 2, 3]; repeat([missing], 2); 4; 5] # note ; that denotes vcat
7-element Array{Union{Missing, Int64},1}:
1
2
3
missing
missing
4
5
julia> vcat([1,2,3], repeat([missing], 2), 4, 5) # this is the same but using a different syntax
7-element Array{Union{Missing, Int64},1}:
1
2
3
missing
missing
4
5
The benefit of vcat is that it automatically does the type promotion (as opposed to append! in which case you have to correctly specify the eltype of the target container before the operation).
Note that because vcat does automatic type promotion in corner cases you might get a different eltype of the result of the operation:
julia> x = [1, 2, 3]
3-element Array{Int64,1}:
1
2
3
julia> append!(x, [1.0, 2.0]) # conversion from Float64 to Int happens here
5-element Array{Int64,1}:
1
2
3
1
2
julia> [[1, 2, 3]; [1.0, 2.0]] # promotion of Int to Float64 happens in this case
5-element Array{Float64,1}:
1.0
2.0
3.0
1.0
2.0
See also https://docs.julialang.org/en/v1/manual/arrays/#man-array-literals.
This will work:
append!(Union{Int,Missing}[1,2,3], repeat([missing], 1000))
[1,2,3] creates just a Vector{Int} and since Julia is strongly typed the Vector{Int} cannot accept values of non-Int type. Hence, when defining a structure, that you plan to hold more data types within, you need to explicitly state it - here we have defined Vector{Union{Int,Missing}}.

Julia: find maximum along columns in array

Suppose we have an array defined like this:
a=[1 2; 3 4; 5 5; 7 9; 1 2];
In Matlab, we could find the maximum values by writing:
[x y] = max(a)
x =
7 9
In Julia, we could use:
a=[1 2; 3 4; 5 5; 7 9; 1 2]
findmax(a,1)
returning:
([7 9],
[4 9])
However, I am interested not only in finding [7 9] for the two columns, but also their relative position within each column, like [4, 4]. Of course, I can write a bit more of coding lines, but can I do it directly with findmax?
The second matrix returned by findmax is the linear index of the locations of the maxima over the entire array. You want the position within each column; to get that, you can convert the linear indices into subscripts with ind2sub. Then the first element of the subscript tuple is your row index.
julia> vals, inds = findmax(a, 1)
(
[7 9],
[4 9])
julia> map(x->ind2sub(a, x), inds)
1×2 Array{Tuple{Int64,Int64},2}:
(4,1) (4,2)
julia> map(x->ind2sub(a, x)[1], inds)
1×2 Array{Int64,2}:
4 4
This is mentioned in the comments but I figured I'd do a response that's easy to see. I have version 1.0.3, so I don't know what's the earliest version that allows this. But now you can just do
julia> findmax(a) #Returns 2D index of overall maximum value
(9, CartesianIndex(4, 2))
julia> findmax(a[:,1]) #Returns 1D index of max value in column 1
(7, 4)
julia> findmax(a[:,2]) #Returns 1D index of max value in column 2
(9, 4)
Hope this makes things easier.
I've adopted the following function:
indmaxC(x) = cat(1, [indmax(x[:,c]) for c in 1:size(x,2)]...)
The Good: it's convenient and small
The Bad: it's only valid for 2-D arrays
A safer version would be:
function indmaxC(x::AbstractArray)
assert(ndims(x)==2)
cat(1, [indmax(x[:,c]) for c in 1:size(x,2)]...)
end

Is there outer map function in Julia?

I am trying to construct all possible combinations of four vectors (parameters in a model) that would give me a big nx4 matrix and I could then run simulation on each set (row) of parameters. In R I would achieve this by using expand.grid in Mathematica style, I could use something like outer product with vcat and reduce the output using hcat.
Is there some function analog of expand.grid from R or outer map function?
Toy example:
A = [1 2]
B = [3 4]
some magic
output = [1 3, 1 4, 2 3, 2 4]
Using the Iterators package, it might look like this:
using Iterators
for p in product([1,2], [3,4])
println(p)
end
where you would replace println with your algorithm. You can also use collect if it's important to get the set of all combinations.
Not the exact notation you show, but a comprehension might be useful.
julia> a=[1, 2];
julia> b=[3, 4];
julia> [[i, j] for j in b, i in a]
2x2 Array{Any,2}:
[1,3] [2,3]
[1,4] [2,4]
julia> [[i, j] for j in b, i in a][:]
4-element Array{Any,1}:
[1,3]
[1,4]
[2,3]
[2,4]

Resources