What is Julia's equivalent of R's which?

In R, given a vector x, one can find the indices where its elements are TRUE using the which function. E.g. with y = 1:100, which(y %% 2 == 0) should return 2, 4, ..., 100.
There are also which.max and which.min, which return the indices of the maximum and minimum values respectively.
What are their equivalents in Julia?

The find function does that (in Julia 1.0 and later it is named findall).
In R:
y = c(1,2,3,4)
which(y > 2)
In Julia:
y = [1, 2, 3, 4]
find(y .> 2)

There is no exact equivalent, but there is findall.
There is a comparison list of vocabularies for Julia vs. R, and which is on that list:
http://www.johnmyleswhite.com/notebook/2012/04/09/comparing-julia-and-rs-vocabularies/
However, according to the list Julia's find is equivalent to R's which as answered by others.

The equivalent of R's which is Julia's findall:
y = [1, 2, 3, 4]
findall(y .> 2)
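For readers who want the semantics spelled out, here is a minimal Python sketch of what which, which.max and which.min compute. The helper names which, which_max and which_min are hypothetical and written only for this illustration; they are not part of Julia, R or any Python library. Indices are 1-based to match R and Julia.

```python
def which(mask):
    """Return the 1-based positions at which mask is true (like R's which)."""
    return [i for i, flag in enumerate(mask, start=1) if flag]


def which_max(values):
    """Return the 1-based position of the first maximum (like R's which.max)."""
    return max(range(len(values)), key=lambda i: values[i]) + 1


def which_min(values):
    """Return the 1-based position of the first minimum (like R's which.min)."""
    return min(range(len(values)), key=lambda i: values[i]) + 1


y = [1, 2, 3, 4]
print(which([v > 2 for v in y]))   # [3, 4]
print(which_max(y), which_min(y))  # 4 1
```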

Related

How to combine functions in a plot in R

This is my code so far:
Basically, I want x^2 on [0, 10] and the constant 6 on [11, infinity).
random <- function(x){
if (any(x <= 10 )) {
return (x**2)}
else if (any(x > 10 )){
return(6) }
}
Unfortunately, R uses only the first part of the function when I try to plot or integrate it.
Thanks for your help!
Your error is caused by the use of the any function. any(x <= 10) is true as long as a single value in x is less than or equal to ten, e.g. it'll be true for [1, 2, 10, 15, 30]. Because of this, the function never reaches the second branch for such an input.
What you actually want to do is map this function. First, remove the any calls in your function. Then pass your function (here called random) to a map function. A map function is a higher-order function: it takes a function and a list of objects (in this case numbers) as its arguments, and applies the function to each element of the list.
E.g.
Mapping [1, 2, 3, 4] with x**2 returns [1, 4, 9, 16].
Mapping [1, 5, 15, 20] with random returns [1, 25, 6, 6]
There are several different mapping functions in R, so look here to pick which one is best for you. Some even include if statements which may save you time.
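The mapping idea can be sketched in Python, which is used here only to illustrate the concept rather than as R code. The function name piecewise is hypothetical; it is the asker's function with the any calls removed.

```python
def piecewise(x):
    """Return x squared for x <= 10 and the constant 6 otherwise."""
    if x <= 10:
        return x ** 2
    return 6


# map applies the scalar function to every element separately.
print(list(map(piecewise, [1, 5, 15, 20])))  # [1, 25, 6, 6]

# any() instead collapses the whole input into a single True/False,
# which is why the original function always took the first branch.
print(any(v <= 10 for v in [1, 2, 10, 15, 30]))  # True
```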

How to get minimal value of Array in Julia?

How do I get the minimum value of an Array, Vector, or Matrix in Julia?
The code min([1, 2, 3]) doesn't work.
The Julia manual:
https://docs.julialang.org/en/v1/base/math/#Base.min
https://docs.julialang.org/en/v1/base/collections/#Base.minimum
min(x, y, ...)
Return the minimum of the arguments. Operates elementwise over arrays.
julia> min([1, 2, 3]...)
1
julia> min(2,3)
2
minimum(A, dims)
Compute the minimum value of an array over the given dimensions.
minimum!(r, A)
Compute the minimum value of A over the singleton dimensions of r, and write results to r.
julia> minimum([1, 2, 3])
1

Is there a way to mimic R's higher order (binary) function shorthand syntax within spark or pyspark?

In R, I can write the following:
## Explicit
Reduce(function(x,y) x*y, c(1, 2, 3))
# returns 6
However, I can also do this less explicitly with the following:
## Less explicit
Reduce(`*`, c(1, 2, 3))
# also returns 6
In pyspark, I could do the following:
rdd = sc.parallelize([1, 2, 3])
rdd.reduce(lambda a, b: a * b)
Question: Can you mimic the "shorthand" (less explicit) syntax of R's Reduce('*', ...) with pyspark or some sort of anonymous function?
In R, you're supplying a binary function. The multiply operator (as with all operators) is actually a binary function. Type
`*`(2, 3)
to see what I mean.
In Python, the equivalent for multiplication is operator.mul.
So:
import operator
rdd = sc.parallelize([1, 2, 3])
rdd.reduce(operator.mul)
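The same comparison can be tried without a Spark context. This is a small sketch that uses functools.reduce from the standard library in place of rdd.reduce, on the assumption that only the shape of the call matters here.

```python
import operator
from functools import reduce

values = [1, 2, 3]

# Explicit: an anonymous binary function.
explicit = reduce(lambda a, b: a * b, values)

# Shorthand: the multiplication operator as a ready-made binary function.
shorthand = reduce(operator.mul, values)

print(explicit, shorthand)  # 6 6
```

The operator module has similar functions for the other operators, for example operator.add and operator.sub.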

Slicing and broadcasting multidimensional arrays in Julia : meshgrid example

I recently started learning Julia by coding a simple implementation of Self Organizing Maps. I want the size and dimensions of the map to be specified by the user, which means I can't really use for loops to work on the map arrays because I don't know in advance how many layers of loops I will need. So I absolutely need broadcasting and slicing functions that work on arrays of arbitrary dimensions.
Right now, I need to construct an array of indices of the map. Say my map is defined by an array of size mapsize = (5, 10, 15), I need to construct an array indices of size (3, 5, 10, 15) where indices[:, a, b, c] should return [a, b, c].
I come from a Python/NumPy background, in which the solution is already given by a specific "function", mgrid :
indices = numpy.mgrid[:5, :10, :15]
print(indices.shape)  # gives (3, 5, 10, 15)
print(indices[:, 1, 2, 3])  # gives [1, 2, 3]
I didn't expect Julia to have such a function on the get-go, so I turned to broadcasting. In NumPy, broadcasting is based on a set of rules that I find quite clear and logical. You can use arrays of different dimensions in the same expression as long as the sizes in each dimension match or one of them is 1 :
(5, 10, 15) and (10, 1) broadcast to (5, 10, 15)
(5, 1, 15) and (1, 10, 1) also broadcast to (5, 10, 15)
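The shape rule described above can be written down in a few lines of plain Python. This is only a sketch of the rule as stated in the question, not NumPy's implementation, and the function name broadcast_shape is made up for this example.

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Align the shapes on their trailing dimensions; a missing leading
    # dimension is treated as size 1.
    for sizes in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        wanted = {n for n in sizes if n != 1}
        if len(wanted) > 1:
            raise ValueError(f"incompatible sizes {sizes}")
        result.append(wanted.pop() if wanted else 1)
    return tuple(reversed(result))


print(broadcast_shape((5, 10, 15), (10, 1)))     # (5, 10, 15)
print(broadcast_shape((5, 1, 15), (1, 10, 1)))   # (5, 10, 15)
```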
To help with this, you can also use numpy.newaxis or None to easily add new dimensions to your array :
array = numpy.zeros((5, 15))
array[:, None, :].shape  # (5, 1, 15)
This helps broadcast arrays easily :
A = numpy.arange(5)
B = numpy.arange(10)
C = numpy.arange(15)
bA, bB, bC = numpy.broadcast_arrays(A[:,None,None], B[None,:,None], C[None,None,:])
bA.shape == bB.shape == bC.shape  # all equal to (5, 10, 15)
Using this, creating the indices array is rather straightforward :
indices = numpy.array(numpy.broadcast_arrays(A[:,None,None], B[None,:,None], C[None,None,:]))
(indices == numpy.mgrid[:5, :10, :15]).all()  # returns True
The general case is of course a bit more complicated, but can be worked around using list comprehension and slices :
arrays = [ numpy.arange(i)[tuple([None if m!=n else slice(None) for m in range(len(mapsize))])] for n, i in enumerate(mapsize) ]
indices = numpy.array(numpy.broadcast_arrays(*arrays))
So back to Julia. I tried to apply the same kind of rationale and ended up achieving the equivalent of the arrays list of the code above. This ended up being rather simpler than the NumPy counterpart thanks to the compound expression syntax :
arrays = [ (idx = ones(Int, length(mapsize)); idx[n] = i;reshape([1:i], tuple(idx...))) for (n,i)=enumerate(mapsize) ]
Now I'm stuck here, as I don't really know how to apply the broadcasting to my list of generating arrays here... The broadcast[!] functions ask for a function f to apply, and I don't have any. I tried using a for loop to try forcing the broadcasting:
indices = Array(Int, tuple(unshift!([i for i=mapsize], length(mapsize))...))
for i=1:length(mapsize)
A[i] = arrays[i]
end
But this gives me an error : ERROR: convert has no method matching convert(::Type{Int64}, ::Array{Int64,3})
Am I doing this the right way? Did I overlook something important? Any help is appreciated.
If you're running julia 0.4, you can do this:
julia> function mgrid(mapsize)
           T = typeof(CartesianIndex(mapsize))
           indices = Array(T, mapsize)
           for I in eachindex(indices)
               indices[I] = I
           end
           indices
       end
It would be even nicer if one could just say
indices = [I for I in CartesianRange(CartesianIndex(mapsize))]
I'll look into that :-).
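The idea of storing in every cell its own index, for a number of dimensions that is not known in advance, can also be sketched without any array library. The Python version below uses recursion over the dimensions instead of nested for loops. The name index_grid is hypothetical, and the indices are 0-based as is usual in Python, unlike the 1-based Julia code above.

```python
def index_grid(mapsize, prefix=()):
    """Build nested lists in which grid[a][b]...[c] == (a, b, ..., c)."""
    if not mapsize:
        # No dimensions left: the accumulated prefix is the cell's own index.
        return prefix
    first, rest = mapsize[0], mapsize[1:]
    return [index_grid(rest, prefix + (i,)) for i in range(first)]


grid = index_grid((5, 10, 15))
print(grid[1][2][3])  # (1, 2, 3)
```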
Broadcasting in Julia has been modelled pretty much on broadcasting in NumPy, so you should hopefully find that it obeys more or less the same simple rules (not sure if the way to pad dimensions when not all inputs have the same number of dimensions is the same though, since Julia arrays are column-major).
A number of useful things like newaxis indexing and broadcast_arrays have not been implemented (yet) however. (I hope they will.) Also note that indexing works a bit differently in Julia compared to NumPy: when you leave off indices for trailing dimensions in NumPy, the remaining indices default to colons. In Julia they could be said to default to ones instead.
I'm not sure you actually need a meshgrid function; most things that you would want to use it for could be done by using the original entries of your arrays array with broadcasting operations. The major reason that meshgrid is useful in MATLAB is that MATLAB is terrible at broadcasting.
But it is quite straightforward to accomplish what you want to do using the broadcast! function:
# assume mapsize is a vector with the desired shape, e.g. mapsize = [2,3,4]
N = length(mapsize)
# Your line to create arrays below, with an extra initial dimension on each array
arrays = [ (idx = ones(Int, N+1); idx[n+1] = i;reshape([1:i], tuple(idx...))) for (n,i) in enumerate(mapsize) ]
# Create indices and fill it one coordinate at a time
indices = zeros(Int, tuple(N, mapsize...))
for (i,arr) in enumerate(arrays)
    dest = sub(indices, i, [Colon() for j=1:N]...)
    broadcast!(identity, dest, arr)
end
I had to add an initial singleton dimension on the entries of arrays to line up with the axes of indices (newaxis would have been useful here...).
Then I go through each coordinate, create a subarray (a view) on the relevant part of indices, and fill it. (Indexing will default to returning subarrays in Julia 0.4, but for now we have to use sub explicitly).
The call to broadcast! just evaluates the identity function identity(x)=x on the input arr=arrays[i], broadcasts to the shape of the output. There's no efficiency lost in using the identity function for this; broadcast! generates a specialized function based on the given function, number of arguments, and number of dimensions of the result.
I guess this is the same as the MATLAB meshgrid functionality. I've never really thought about the generalization to more than two dimensions, so it's a bit harder to get my head around.
First, here is my completely general version, which is kinda crazy but I can't think of a better way to do it without generating code for common dimensions (e.g. 2, 3):
function numpy_mgridN(dims...)
    X = Any[zeros(Int,dims...) for d in 1:length(dims)]
    for d in 1:length(dims)
        base_idx = Any[1:nd for nd in dims]
        for i in 1:dims[d]
            cur_idx = copy(base_idx)
            cur_idx[d] = i
            X[d][cur_idx...] = i
        end
    end
    @show X
end
X = numpy_mgridN(3,4,5)
@show X[1][1,2,3] # 1
@show X[2][1,2,3] # 2
@show X[3][1,2,3] # 3
Now, what I mean by code generation is that, for the 2D case, you can simply do
function numpy_mgrid(dim1,dim2)
    X = [i for i in 1:dim1, j in 1:dim2]
    Y = [j for i in 1:dim1, j in 1:dim2]
    return X,Y
end
and for the 3D case:
function numpy_mgrid(dim1,dim2,dim3)
    X = [i for i in 1:dim1, j in 1:dim2, k in 1:dim3]
    Y = [j for i in 1:dim1, j in 1:dim2, k in 1:dim3]
    Z = [k for i in 1:dim1, j in 1:dim2, k in 1:dim3]
    return X,Y,Z
end
Test with, e.g.
X,Y,Z=numpy_mgrid(3,4,5)
@show X
@show Y
@show Z
I guess mgrid shoves them all into one tensor, so you could do that like this
all = cat(4,X,Y,Z)
which is still slightly different:
julia> all[1,2,3,:]
1x1x1x3 Array{Int64,4}:
[:, :, 1, 1] =
1
[:, :, 1, 2] =
2
[:, :, 1, 3] =
3
julia> vec(all[1,2,3,:])
3-element Array{Int64,1}:
1
2
3

How do functions that simultaneously operate over vectors and their elements work in R?

Take the following example:
boltzmann <- function(x, t=0.1) { exp(x/t) / sum(exp(x/t)) }
t=0.1
z=rnorm(10,mean=1,sd=0.5)
exp(z[1]/t)/sum(exp(z/t))
[1] 0.0006599707
boltzmann(z)[1]
[1] 0.0006599707
It appears that exp in the boltzmann function operates over elements and vectors and knows when to do the right thing. Is the sum "unrolling" the input vector and applying the expression on the values? Can someone explain how this works in R?
Edit: Thank you for all of the comments, clarification, and patience with an R n00b. In summary, the reason this works was not immediately obvious to me coming from other languages. Take python for example. You would first compute the sum and then compute the value for each element in the vector.
denom = sum([exp(v / t) for v in x])
vals = [exp(v / t) / denom for v in x]
Whereas in R the sum(exp(x/t)) can be computed inline.
This is explained in An Introduction to R, Section 2.2: Vector arithmetic.
Vectors can be used in arithmetic expressions, in which case the
operations are performed element by element. Vectors occurring in the
same expression need not all be of the same length. If they are not,
the value of the expression is a vector with the same length as the
longest vector which occurs in the expression. Shorter vectors in the
expression are recycled as often as need be (perhaps fractionally)
until they match the length of the longest vector. In particular a
constant is simply repeated. So with the above assignments the command
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
y <- c(x, 0, x)
v <- 2*x + y + 1
generates a new vector v of length 11 constructed by adding together,
element by element, 2*x repeated 2.2 times, y repeated just once, and
1 repeated 11 times.
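The recycling rule in the quoted passage can be imitated in Python to make the lengths explicit. This is a sketch of the rule only, not of how R implements it, and the helper name recycle is made up for this example.

```python
from itertools import cycle, islice


def recycle(values, length):
    """Repeat values cyclically until there are `length` of them."""
    return list(islice(cycle(values), length))


x = [10.4, 5.6, 3.1, 6.4, 21.7]
y = x + [0] + x                      # length 11, like c(x, 0, x)
n = max(len(x), len(y))

# 2*x is recycled 2.2 times, y is used once, and the constant 1 is added
# to every element.
v = [2 * a + b + 1 for a, b in zip(recycle(x, n), y)]
print(len(v))  # 11
```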
This might be clearer if you evaluated the numerator and the denominator separately:
x = rnorm(10,mean=1,sd=0.5)
t = .1
exp(x/t)
# [1] 1.845179e+05 6.679273e+03 4.379369e+06 1.852623e+06 9.960374e+02
# [6] 1.359676e+09 6.154045e+03 1.777027e+01 1.070003e+04 6.217397e+04
sum(exp(x/t))
# [1] 2984044296
Since the numerator is a vector of length 10, and the denominator is a vector of length 1, the division returns a vector of length 10.
Since you're interested in comparing this to Python, imagine the two following rules were added to Python (incidentally, these are similar to the usage of arrays in numpy):
If you divide a list by a number, it will divide all items in the list by the number:
[2, 4, 6, 8] / 2
# [1, 2, 3, 4]
The function exp in Python is "vectorized", which means that when it is applied to a list it will apply to each item in the list. However, sum still works the way you expect it to.
exp([1, 2, 3]) => [exp(1), exp(2), exp(3)]
In that case, imagine how this code would be evaluated in Python:
t = .1
x = [1, 2, 3, 4]
exp(x/t) / sum(exp(x/t))
It would follow the following simplifications, using those two simple rules:
exp([v / t for v in x]) / sum(exp([v / t for v in x]))
[exp(v / t) for v in x] / sum([exp(v / t) for v in x])
Now do you see how it knows the difference?
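Since real Python lists do not follow those two imagined rules, the same computation has to be spelled out step by step. The sketch below is one way to write the boltzmann function from the question in plain Python; the input values in z are arbitrary and chosen only for the demonstration.

```python
import math


def boltzmann(x, t=0.1):
    weights = [math.exp(v / t) for v in x]  # element-wise, like exp(x/t)
    total = sum(weights)                    # a single number, like sum(exp(x/t))
    return [w / total for w in weights]     # each element divided by that number


z = [0.5, 1.0, 1.5]
probs = boltzmann(z)
print(len(probs))  # 3
```

The numerator keeps one value per element while the denominator is reduced to a single number, which is the distinction R makes implicitly.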
Vectorisation has several slightly different meanings in R.
It can mean accepting a vector input, transforming each element, and returning a vector (like exp does).
It can also mean accepting a vector input and calculating some summary statistic, then returning a scalar value (like mean does).
sum conforms to the second behaviour, but also has a third vectorisation behaviour, where it will create a summary statistic across inputs. Try sum(1, 2:3, 4:6), for example.
