Julia Language: findInterval

The Question: I was hoping to find a function like findInterval in R, which takes as inputs a scalar and a vector of interval starting points and returns the index of the interval the scalar falls in. For instance, in R:
findInterval(x = 2.6, vec = c(1.1,2.1,3.1,4.1))
#[1] 2
In this exchange someone gave a function that provides this functionality in Julia (see next section). Apparently the base indexin function does this task, though. I was wondering how to get indexin (or another base function) to do this. I know Julia loops are fast and I could write a function, but I would rather not if there is a built-in function, since this should be a common problem.
When I try the indexin function with the same numbers I used in R above I get:
indexin([2.6], [1.1 2.1 3.1 4.1])
# 1-element Array{Int64,1}:
# 0
This just indicates that 2.6 is not in the vector, since indexin (as I understand it) looks for matching values rather than placing a scalar in an interval.
Function from the above referenced link (with my changes to input/output datatypes):
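function findInterval(x::Float64, vec::Array{Float64})
    out = zeros(Int, length(x))       # one result per element of x
    vec = unique(vec)                 # drop duplicate breakpoints
    sort!(vec)                        # searchsortedfirst needs sorted input
    for j in 1:length(x)
        if x[j] < vec[1]
            out[j] = 0                # below the first interval
        elseif x[j] > vec[end]
            out[j] = 0                # beyond the last breakpoint
        else
            # note: if x[j] equals a breakpoint this returns the interval to its
            # left, whereas R's findInterval (like searchsortedlast) returns the
            # interval that starts at that breakpoint
            out[j] = searchsortedfirst(vec, x[j]) - 1
        end
    end
    return out
end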
Which works as intended:
findInterval(2.6, [1.1 2.1 3.1 4.1])
# 1-element Array{Int64,1}:
# 2
Related Questions from SO: Other questions on SO look for finding the index of exact matches between an input value and a vector:
Julia version of R's Match?
Find first index of an item in an array in Julia
Vectorized "in" function in julia?

If your input vector is always sorted, then searchsortedlast will do what you want, e.g.
vec = [1.1, 2.1, 3.1, 4.1]
x = 2.6
searchsortedlast(vec, x)   # returns 2
However, note that searchsortedlast will return 0 if x < vec[1], and will return length(vec) if x >= vec[end]. So you might want to write your own custom behaviour that checks for these outcomes. For example, if you want to always return 0 when x is not in any of the intervals in vec, you could write:
function find_interval(vec, x)
    i = searchsortedlast(vec, x)   # 0 if x < vec[1], length(vec) if x >= vec[end]
    i == length(vec) && (i = 0)    # treat "at or past the last breakpoint" as outside
    return i
end
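If you want R's exact semantics rather than the "0 outside" variant above, a minimal sketch is to call searchsortedlast directly. Like R's default findInterval, it returns i with vec[i] <= x < vec[i+1], 0 below the first breakpoint and length(vec) at or beyond the last one, so a value equal to a breakpoint is handled the same way R handles it. The name findinterval is hypothetical, and the breakpoints are assumed to be sorted already. Broadcasting with Ref covers R's vector-of-x case.

```julia
# Hypothetical R-style findInterval: index i such that breaks[i] <= x < breaks[i+1],
# 0 if x < breaks[1], and length(breaks) if x >= breaks[end] (R's defaults).
# Assumes breaks is sorted in increasing order (not checked, to keep lookups O(log n)).
findinterval(x::Real, breaks::AbstractVector{<:Real}) = searchsortedlast(breaks, x)

breaks = [1.1, 2.1, 3.1, 4.1]
findinterval(2.6, breaks)                          # 2

# Several query points at once: broadcast over x and wrap breaks in Ref
# so it is treated as a single argument instead of being iterated.
findinterval.([0.5, 2.1, 4.1, 5.0], Ref(breaks))   # [0, 2, 4, 4]
```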
Also, if you work a lot with sorted vectors, you might be interested in a package I've written for sorted vectors in Julia, but have never gotten around to adding to METADATA. The source of the SortedVectors package is here.

Related

Cumulative Integration Options With Julia

I have two 1-D arrays and would like to calculate the approximate cumulative integral of one array with respect to the spacing specified by the second array. MATLAB has a function called cumtrapz that handles this scenario. Is there something similar I can use in Julia to accomplish the same thing?
The expected result is another 1-D array with the integral calculated for each element.
There is a numerical integration package for Julia (see the link) that defines cumul_integrate(X, Y) and uses the trapezoidal rule by default.
If this package didn't exist, though, you could easily write the function yourself and have a very efficient implementation out of the box because the loop does not come with a performance penalty.
Edit: Added an @assert to check matching vector dimensions and fixed a typo.
function cumtrapz(X::T, Y::T) where {T <: AbstractVector}
    # Check matching vector length
    @assert length(X) == length(Y)
    # Initialize output (similar(X) reuses X's element type, so X should be floating point)
    out = similar(X)
    out[1] = 0
    # Iterate over arrays, adding one trapezoid per step
    for i in 2:length(X)
        out[i] = out[i-1] + 0.5*(X[i] - X[i-1])*(Y[i] + Y[i-1])
    end
    # Return output
    out
end
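Because similar(X) reuses the element type of X, the version above throws an InexactError for integer inputs, and the single type parameter T forces X and Y to have the same array type (so a range for X plus a Vector for Y is rejected). A sketch that relaxes both, assuming a floating-point result is wanted and ordinary 1-based vectors; this is not the NumericalIntegration package's code:

```julia
# Cumulative trapezoidal integral of Y with respect to X.
# X and Y may be different AbstractVector types (e.g. a range and a Vector).
# Assumes standard 1-based indexing.
function cumtrapz(X::AbstractVector, Y::AbstractVector)
    @assert length(X) == length(Y) "X and Y must have the same length"
    @assert !isempty(X) "inputs must be non-empty"
    # Promote to a floating-point type so 0.5 * (...) is not truncated for Int inputs
    T = promote_type(eltype(X), eltype(Y), Float64)
    out = zeros(T, length(X))    # out[1] stays 0
    for i in 2:length(X)
        out[i] = out[i-1] + 0.5 * (X[i] - X[i-1]) * (Y[i] + Y[i-1])
    end
    return out
end

cumtrapz(0:3, [0, 1, 2, 3])   # [0.0, 0.5, 2.0, 4.5], i.e. x^2/2 at x = 0, 1, 2, 3
```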

Store values generated by a for-loop. JuMP/Julia

It's amazing that the internet is totally devoid of this simple question (or anything similar). Or I'm just very bad at searching. Anyway, I simply want to store values generated by a for-loop in an array and print the array. Simple as that.
In every other language (MATLAB, R, Python, Java, etc.) this is very simple, but in Julia I seem to be missing something.
using JuMP
# t = int64[] has also been tested
t = 0
for i in 1:5
vector[i]
println[vector]
end
I get the error
ERROR: LoadError: BoundsError
What am I missing?
You didn't initialize vector, and you should call println with parentheses, not square brackets. In Julia 1.0:
vector = Array{Int,1}(undef, 5)
for i in 1:5
vector[i] = i
println(vector[i])
end
Or, more concisely, with an array comprehension:
vector = [i for i in 1:5]
for i in 1:5
println(vector[i])
end
Another possibility is the push! method:
vector = Int[]   # typed empty vector; [] would create a Vector{Any}
for i in 1:5
    push!(vector, i)
    println(vector[i])
end
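Since the question also asks to print the array itself, here is a small sketch that stores a value computed in each iteration (the squares are just an example) and prints the whole array once after the loop. JuMP is not needed for any of this.

```julia
squares = Int[]            # typed, empty vector
for i in 1:5
    push!(squares, i^2)    # store the value generated in this iteration
end
println(squares)           # prints [1, 4, 9, 16, 25]
```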

what is the purpose of 'NULL' in processing loops?

sqr = seq(1, 100, by=2)
sqr.squared = NULL
for (n in 1:50)
{
sqr.squared[n] = sqr[n]^2
}
I came across the loop above; for a beginner it was simple enough. To better understand R, what is the precise purpose of the second line? From my research I gather it has something to do with resetting the vector. If someone could elaborate, it'd be much appreciated.
sqr.squared <- NULL
is one of many ways to initialize the empty vector sqr.squared before filling it in a loop. In general, when the length of the resulting vector is known, it is much better practice to preallocate the vector at that length. So here,
sqr.squared <- vector("integer", 50)
would be much better practice, and faster too. This way you are not growing the new vector inside the loop. But since ^ is vectorized, you could also simply do
sqr[1:50] ^ 2
and ditch the loop altogether.
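Since the rest of this page is about Julia, here is a rough Julia counterpart of the three R approaches above, offered only as a comparison: growing an empty vector (the NULL-style approach), preallocating the known length, and a vectorized expression. In Julia, push! onto a typed empty vector is the usual way to grow a vector, and .^ is the broadcast form of ^.

```julia
sqr = collect(1:2:99)             # like R's seq(1, 100, by = 2): 1, 3, ..., 99 (50 values)

# 1. Grow an empty vector, one element per iteration (the NULL-style approach)
grown = Int[]
for n in 1:50
    push!(grown, sqr[n]^2)
end

# 2. Preallocate the known length first (like vector("integer", 50) in R)
prealloc = Vector{Int}(undef, 50)
for n in 1:50
    prealloc[n] = sqr[n]^2
end

# 3. Vectorized: broadcast ^ over the whole array, no explicit loop
vectorized = sqr .^ 2
```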
Another way to think about it is to remember that everything in R is a function call, and functions (usually) need input.
Say you calculated y and want to store that value somewhere. You can do x <- y without initializing an x object (R does this for you, unlike some other languages such as C), but say you want to store it in a specific place in x.
So note that <- (or = in your example) is a function
y <- 1
x[2] <- y
# Error in x[2] <- y : object 'x' not found
This is a different function than <-. Since you want to put y at x[2], you need the function [<-
`[<-`(x, 2, y)
# Error: object 'x' not found
But this still doesn't work because we need the object x to use this function, so initialize x to something.
(x <- numeric(5))
# [1] 0 0 0 0 0
# and now use the function
`[<-`(x, 2, y)
# [1] 0 1 0 0 0
This prefix notation is easier for computers to parse (e.g., + 1 1) but harder for humans (me, at least), so we prefer infix notation (e.g., 1 + 1). R makes such functions easier to use by letting you write x[2] <- y rather than what I did above.
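For comparison, Julia works the same way: indexed assignment x[2] = y is lowered to a call to setindex!(x, y, 2) (and reading x[2] to getindex(x, 2)), and it likewise fails with an UndefVarError if x has not been created first. A small sketch mirroring the R example above:

```julia
y = 1.0
x = zeros(5)                       # initialize x first, as with numeric(5) in R
setindex!(x, y, 2)                 # the call that x[2] = y lowers to; modifies x in place
println(x)                         # [0.0, 1.0, 0.0, 0.0, 0.0]
println(getindex(x, 2) == x[2])    # true
```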
The first answer is correct: when you assign NULL to a variable, the purpose is to initialize a vector. In many cases, when you are checking numbers or working with different types of variables, you will need to set arrays, matrices, etc. to NULL first.
For example, if you want to create some type of element, in some cases you will need to put something inside it; that is the purpose of using NULL. In addition, sometimes you will need NA instead of NULL.

Julia: confusion with error on datatype / DataFrame

New to Julia. I'm following this blog post to build a neural network:
http://blog.yhathq.com/posts/julia-neural-networks.html
I am confused about data types and error messages in Julia. This is my code (again, following the blog post on Neural Network):
# read in df to train
train_df = readtable("data/winequality-red.csv", separator=';')
# create train and test data splits
y = train_df[:quality]
x = train_df[:, 1:11] # matrix of all except quality
# vector() and matrix() from blog post
n = length(y)
is_train = shuffle([1:n] .> floor(n * .25))
x_train,x_test = x[is_train,:],x[!is_train,:]
y_train,y_test = y[is_train],y[!is_train]
type StandardScalar
mean::Vector{Float64}
std::Vector{Float64}
end
# initialize empty scalar
function StandardScalar()
StandardScalar(Array(Float64, 0), Array(Float64, 0))
end
# compute mean and std of each col
function fit_std_scalar!(std_scalar::StandardScalar, x::Matrix{Float64})
n_rows, n_cols = size(x)   # use the argument x, not the global x_test
std_scalar.std = zeros(n_cols)
std_scalar.mean = zeros(n_cols)
for i = 1:n_cols
std_scalar.mean[i] = mean(x[:,i])
std_scalar.std[i] = std(x[:,i])
end
end
# further vectorize the transformation
function transform(std_scalar::StandardScalar, x::Matrix{Float64})
# element wise subtraction of mean and division of std
(x .- std_scalar.mean') ./ std_scalar.std'
end
# fit and transform
function fit_transform!(std_scalar::StandardScalar, x::Matrix{Float64})
fit_std_scalar!(std_scalar, x)
transform(std_scalar, x)
end
# fit scalar on training data and then transform the test
std_scalar = StandardScalar()
n_rows, n_cols = size(x_test)
# cols before scaling
println("Col means before scaling: ")
for i = 1:n_cols
# C printf function
@printf("%0.3f ", (mean(x_test[:, i])))
end
I am getting the error:
'.-' has no method matching .-(::DataFrame, ::Array{Float64,2}) in fit_transform! ...
For this code:
x_train = fit_transform!(std_scalar, x_train)
x_test = transform(std_scalar, x_test)
# after transforming
println("\n Col means after scaling:")
for i = 1:n_cols
@printf("%0.3f ", (mean(x_test[:,i])))
end
I am new to Julia and am just not understanding what the issue is. The vector() and matrix() functions from the blog post do not work; I assume they came from an older version of DataFrames.
What I think my issue is: these functions take a ::Matrix{Float64} and I am passing in a DataFrame. I assume the deprecated (?) matrix() would have fixed this? Not sure. How do I analyze this error and pass these functions the correct types (if that is the problem here)?
Thank you!
The error message says that you're attempting an element-wise subtraction, .-, between a DataFrame and an Array but that operation has no definition for those types. A silly example of this sort of situation:
julia> "a" .- [1, 2, 3]
ERROR: `.-` has no method matching .-(::ASCIIString, ::Array{Int64,1})
My guess is that if you add
println(typeof(x_train))
in front of
x_train = fit_transform!(std_scalar, x_train)
that you'll be told that it's a DataFrame rather than an array that you're trying to work with. I'm not experienced with the DataFrame library but may be able to dig up the conversion tomorrow sometime. This is all I have time for just now.
Added comments after obtaining data file
I retrieved winequality-red.csv and worked with its DataFrame
julia> VERSION
v"0.3.5"
julia> using DataFrames
julia> train_df = readtable("data/winequality-red.csv", separator=';')
julia> y = train_df[:quality]
1599-element DataArray{Int64,1}:
julia> x = train_df[:, 1:11]
1599x11 DataFrame
julia> typeof(x)
DataFrame (constructor with 22 methods)
x and y are at this point array-like objects. The blog post apparently uses vector and matrix to convert these to true arrays, but these functions are unfamiliar to me. As IainDunning points out in his answer (I'd like to cite this properly but haven't puzzled that out yet), this conversion is now done via array. Perhaps this is what you need to do:
julia> y = array(train_df[:quality])
1599-element Array{Int64,1}:
julia> x = array(train_df[:, 1:11])
1599x11 Array{Float64,2}:
I've not followed through with an analysis of all of the other code, so this is a hint at the answer rather than a fully fleshed-out and tested solution to your problem. Please let me know how it works out if you give it a try.
I'm accustomed to seeing and using Array{Float64,1} and Array{Float64,2} rather than Vector{Float64} and Matrix{Float64}, although the latter are simply aliases for the former. What seems to have been deprecated is the vector and matrix conversion functions, not these type names.
I believe vector(...) and matrix(...) were both replaced with just array(...), but I can't find an issue number to correspond with that change.
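For readers on current Julia: type has since become struct/mutable struct, Array(Float64, 0) is written Float64[], mean and std now live in the Statistics standard library, and recent DataFrames versions convert a DataFrame to a plain matrix with Matrix(df) rather than array(df). The sketch below is a hedged Julia 1.x rewrite of the blog's scaler that works on an ordinary matrix (the DataFrame conversion itself is left out so the block stays self-contained). It computes the column statistics with mean/std and dims=1 instead of the explicit loop, and the small data matrix is made up for illustration.

```julia
using Statistics

# Julia 1.x rewrite of the blog's StandardScalar (mutable so fit can replace fields)
mutable struct StandardScalar
    mean::Vector{Float64}
    std::Vector{Float64}
end

# Empty scaler, filled in by fit_std_scalar!
StandardScalar() = StandardScalar(Float64[], Float64[])

# Compute the mean and standard deviation of each column of x
function fit_std_scalar!(s::StandardScalar, x::AbstractMatrix{<:Real})
    s.mean = vec(mean(x; dims=1))
    s.std = vec(std(x; dims=1))
    return s
end

# Element-wise: subtract column means, divide by column standard deviations
transform(s::StandardScalar, x::AbstractMatrix{<:Real}) = (x .- s.mean') ./ s.std'

function fit_transform!(s::StandardScalar, x::AbstractMatrix{<:Real})
    fit_std_scalar!(s, x)
    return transform(s, x)
end

x_train = [1.0 10.0; 2.0 20.0; 3.0 30.0]   # made-up example data
s = StandardScalar()
z = fit_transform!(s, x_train)             # each column becomes [-1, 0, 1]
```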

Define Piecewise Functions in Julia

I have an application in which I need to define a piecewise function, i.e., f(x) = g(x) for x in some range, f(x) = h(x) for x in some other range, and so on.
Is there a nice way to do this in Julia? I'd rather not use if-else, because it seems that I'd have to check every range for large values of x. The way I was thinking of was to construct an array of functions and an array of bounds/ranges; then, when f(x) is called, do a binary search on the ranges to find the appropriate index and use the corresponding function (i.e., h(x), g(x), etc.).
It seems as though such a mathematically friendly language might have some functionality for this, but the documentation doesn't mention piecewise in this manner. Hopefully someone else has given this some thought, thanks!
With a Heaviside function you can build an interval function:
function heaviside(t)
0.5 * (sign(t) + 1)
end
and
function interval(t, a, b)
heaviside(t-a) - heaviside(t-b)
end
function piecewise(t)
sinc(t) .* interval(t,-3,3) + cos(t) .* interval(t, 4,7)
end
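One detail of the sign-based version: sign(0) is 0, so heaviside(0) is 0.5 and interval(t, a, b) equals 0.5 exactly at t == a and t == b (for instance interval(3, -3, 3) is 0.5). If a strict half-open indicator is preferred, a comparison can replace the Heaviside difference. This is a sketch of that alternative, with the hypothetical names indicator and piecewise_ind; it relies on Julia's Bool acting as 0/1 in multiplication.

```julia
# Hypothetical half-open indicator of [a, b): true inside, false outside.
# Unlike the sign-based heaviside difference, it never yields 0.5 at an endpoint.
indicator(t, a, b) = a <= t < b

# Same piecewise function, built from Bool indicators
# (Float64 * Bool gives the float when true and a signed zero when false).
piecewise_ind(t) = sinc(t) * indicator(t, -3, 3) + cos(t) * indicator(t, 4, 7)
```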
and I think one could also implement an Interval subtype, which would be much more elegant.
I tried to implement a piecewise function for Julia, and this is the result:
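function piecewise(x::Symbol, c::Expr, f::Expr)
    n = length(f.args)
    @assert n == length(c.args)
    @assert c.head == :vect
    @assert f.head == :vect
    vf = Vector{Function}(undef, n)
    for i in 1:n
        # build one anonymous function x -> f_i(x) per branch
        vf[i] = @eval $x -> $(f.args[i])
    end
    # evaluate the conditions, pick the first true one, and call its branch
    return @eval ($x) -> ($(vf)[findfirst($c)])($x)
end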
pf=piecewise(:x,:([x>0, x==0, x<0]),:([2*x,-1,-x]))
pf(1) # => 2
pf(-2) # => 2
pf(0) # => -1
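The same behaviour can be had without metaprogramming by passing the conditions and branches as ordinary functions and returning a closure; this avoids @eval (which evaluates in the module's global scope) at the cost of a slightly longer call. A sketch with the hypothetical name piecewise_fn, reproducing the example above:

```julia
# Hypothetical closure-based alternative: conds and fs are vectors of functions;
# the returned function applies the branch of the first condition that holds.
function piecewise_fn(conds::AbstractVector, fs::AbstractVector)
    @assert length(conds) == length(fs)
    return function (x)
        i = findfirst(c -> c(x), conds)
        i === nothing && throw(DomainError(x, "no condition matched"))
        return fs[i](x)
    end
end

pf = piecewise_fn([x -> x > 0, x -> x == 0, x -> x < 0], [x -> 2x, x -> -1, x -> -x])
pf(1)    # 2
pf(-2)   # 2
pf(0)    # -1
```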
Why not something like this?
function piecewise(x::Float64, breakpts::Vector{Float64}, f::Vector{Function})
    @assert issorted(breakpts)
    # one function per region: below, between each pair of, and above the breakpoints
    @assert length(f) == length(breakpts) + 1
    b = searchsortedfirst(breakpts, x)
    return f[b](x)
end
piecewise(X::Vector{Float64}, bpts, f) = [ piecewise(x,bpts,f) for x in X ]
Here you have a list of (sorted) breakpoints, and you can use the optimized searchsortedfirst to find the index b of the first breakpoint greater than or equal to x. The edge case when no breakpoint is that large is also handled, since searchsortedfirst then returns length(breakpts)+1, which is the last valid index into the vector of functions f.
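A usage sketch of the breakpoint approach, with the argument types loosened to Real/AbstractVector so integer inputs and single-element function vectors also work (a one-element literal like [sin] has type Vector{typeof(sin)}, not Vector{Function}, so it would not match a Vector{Function} signature). Here breakpts holds n sorted interior breakpoints and fs holds n + 1 functions; the name piecewise_at and the example function are hypothetical.

```julia
# Hypothetical variant of the breakpoint approach with looser argument types.
# fs[1] applies for x <= breakpts[1], fs[k] for breakpts[k-1] < x <= breakpts[k],
# and fs[end] for x > breakpts[end].
function piecewise_at(x::Real, breakpts::AbstractVector{<:Real}, fs::AbstractVector)
    @assert issorted(breakpts)
    @assert length(fs) == length(breakpts) + 1
    b = searchsortedfirst(breakpts, x)   # length(breakpts) + 1 if x > every breakpoint
    return fs[b](x)
end

# Example: f(x) = -x for x <= 0, x^2 for 0 < x <= 1, 1 for x > 1
breakpts = [0.0, 1.0]
fs = [x -> -x, x -> x^2, x -> one(x)]

ys = piecewise_at.([-2.0, 0.5, 3.0], Ref(breakpts), Ref(fs))   # [2.0, 0.25, 1.0]
```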
