How to combine functions in a plot in R - r

Basically, I want x^2 on [0, 10] and 6 on [11, infinity). This is my code so far:
random <- function(x) {
  if (any(x <= 10)) {
    return(x**2)
  } else if (any(x > 10)) {
    return(6)
  }
}
Unfortunately, R uses only the first part of the function when I try to plot or integrate it.
Thanks for your help!

The problem is the use of the any function. any(x <= 10) is TRUE as soon as a single value in x is at most 10; for example, it is TRUE for [1, 2, 10, 15, 30]. When you plot or integrate, R passes a whole vector of x values to your function, so unless every value is above 10 the function never reaches the second branch.
What you actually want to do is map this function. First, remove the any calls from your function. Then pass your function (here called random) to a map function. A map function is a higher-order function: it takes a function and a list of objects (in this case numbers) as its arguments, and applies the function to each element of that list.
E.g.
Mapping [1, 2, 3, 4] with x**2 returns [1, 4, 9, 16].
Mapping [1, 5, 15, 20] with random returns [1, 25, 6, 6]
There are several different mapping functions in R (sapply and Map, for example), so pick whichever suits you best. Some, such as ifelse, even build in the condition, which may save you time.
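The difference between testing the whole vector with any and applying the function one element at a time can be sketched in Python, whose built-in any and map behave like the functions discussed above. This is only an illustration of the idea, not R code, and the names random_any and random_one are made up for the example.

```python
def random_any(xs):
    # Mirrors the question's function: any() looks at the whole list at once,
    # so a single value <= 10 sends every element down the first branch.
    if any(x <= 10 for x in xs):
        return [x ** 2 for x in xs]
    elif any(x > 10 for x in xs):
        return 6


def random_one(x):
    # Decides for one number at a time, with no any() call.
    return x ** 2 if x <= 10 else 6


values = [1, 5, 15, 20]
print(random_any(values))             # [1, 25, 225, 400]
print(list(map(random_one, values)))  # [1, 25, 6, 6]
```

The first call squares 15 and 20 as well, which is the behaviour described in the question; mapping the element-wise version gives the intended piecewise result.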

Related

How are apply family functions scoped?

Consider:
x <- 5
replicate(10, x <- x + 1)
This has output c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6). However:
x <- 5
replicate(10, x <<- x + 1)
has output c(6, 7, 8, 9, 10, 11, 12, 13, 14, 15).
What does this imply about the environment that x <- x + 1 is evaluated in? Am I to believe that x is treated as if it is an internal variable for replicate? That appears to be what I'm seeing, but when I consulted the relevant section of the language definition, I saw the following:
It is also worth noting that the effect of foo(x <- y) if the argument is evaluated is to change the value of x in the calling environment and not in the evaluation environment of foo.
But if x really was changed in the calling environment, then why does:
x <- 5
replicate(10, x <- x + 1)
x
Return 5 and not 15? What part have I misunderstood?
The sentence you quoted from the language definition is about standard evaluation, but replicate uses non-standard evaluation. Here's its source:
replicate <- function (n, expr, simplify = "array")
    sapply(integer(n), eval.parent(substitute(function(...) expr)),
           simplify = simplify)
The substitute(function(...) expr) call takes your expression x <- x + 1 without evaluating it, and creates a new function
function(...) x <- x + 1
That's the function that gets passed to sapply(), which applies it to a vector of length n. So all the assignments take place in the frame of that anonymous function.
When you use x <<- x + 1, the evaluation still takes place in the constructed function, but its environment is the calling environment to replicate() (because of the eval.parent call), and that's where the assignment happens. That's why you get the increasing values in the output.
So I think you understood the manual correctly, but it didn't make clear it was talking there about the case of standard evaluation. The following paragraph hints at what's happening here:
It is possible to access the actual (not default) expressions used as arguments inside the function. The mechanism is implemented via promises. When a function is being evaluated the actual expression used as an argument is stored in the promise together with a pointer to the environment the function was called from. When (if) the argument is evaluated the stored expression is evaluated in the environment that the function was called from. Since only a pointer to the environment is used any changes made to that environment will be in effect during this evaluation. The resulting value is then also stored in a separate spot in the promise. Subsequent evaluations retrieve this stored value (a second evaluation is not carried out). Access to the unevaluated expression is also available using substitute.
but the help page for replicate() doesn't make clear this is what it's doing.
BTW, your title asks about apply family functions, but most of them other than replicate ask explicitly for a function, so this issue doesn't arise there. For example, it's obvious that this doesn't affect the global x:
sapply(integer(10), function(i) x <- x + 1)

What is Julia's equivalent of R's which?

In R, given a vector x, one can find the indices where its elements are TRUE using the which function. E.g. y = 1:100 and which(is.even(y)) should return 2,4,...,100
There are also which.max and which.min, which return the indices of the maximum and minimum values respectively.
What are their equivalents in Julia?
The find function does that.
In R:
y = c(1,2,3,4)
which(y > 2)
In Julia:
y = [1, 2, 3, 4]
find(y .> 2)
There is no exact equivalent, but findall is the closest.
There is a comparison list of vocabularies for Julia vs R; which is on the list
http://www.johnmyleswhite.com/notebook/2012/04/09/comparing-julia-and-rs-vocabularies/
However, according to the list Julia's find is equivalent to R's which as answered by others.
The equivalent of R's which is Julia's findall:
y = [1, 2, 3, 4]
findall(y .> 2)

Is there a way to mimic R's higher order (binary) function shorthand syntax within spark or pyspark?

In R, I can write the following:
## Explicit
Reduce(function(x,y) x*y, c(1, 2, 3))
# returns 6
However, I can also do this less explicitly with the following:
## Less explicit
Reduce(`*`, c(1, 2, 3))
# also returns 6
In pyspark, I could do the following:
rdd = sc.parallelize([1, 2, 3])
rdd.reduce(lambda a, b: a * b)
Question: Can you mimic the "shorthand" (less explicit) syntax of R's Reduce('*', ...) with pyspark or some sort of anonymous function?
In R, you're supplying a binary function. The multiply operator (as with all operators) is actually a binary function. Type
`*`(2, 3)
to see what I mean.
In Python, the equivalent for multiplication is operator.mul.
So:
import operator
rdd = sc.parallelize([1, 2, 3])
rdd.reduce(operator.mul)
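The same shorthand can be tried without a Spark cluster, since functools.reduce from the standard library accepts the same kind of binary function as rdd.reduce. A minimal sketch, assuming a plain Python list in place of the RDD:

```python
from functools import reduce
import operator

numbers = [1, 2, 3]

# Explicit: an anonymous binary function, like function(x, y) x * y in R.
explicit = reduce(lambda a, b: a * b, numbers)

# Shorthand: operator.mul is the function behind the * operator, like `*` in R.
shorthand = reduce(operator.mul, numbers)

print(explicit, shorthand)  # 6 6
```

The operator module has similar functions for the other operators, for example operator.add for +.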

Slicing and broadcasting multidimensional arrays in Julia : meshgrid example

I recently started learning Julia by coding a simple implementation of Self Organizing Maps. I want the size and dimensions of the map to be specified by the user, which means I can't really use for loops to work on the map arrays because I don't know in advance how many layers of loops I will need. So I absolutely need broadcasting and slicing functions that work on arrays of arbitrary dimensions.
Right now, I need to construct an array of indices of the map. Say my map is defined by an array of size mapsize = (5, 10, 15), I need to construct an array indices of size (3, 5, 10, 15) where indices[:, a, b, c] should return [a, b, c].
I come from a Python/NumPy background, in which the solution is already given by a specific "function", mgrid :
import numpy
indices = numpy.mgrid[:5, :10, :15]
print(indices.shape)  # gives (3, 5, 10, 15)
print(indices[:, 1, 2, 3])  # gives [1 2 3]
I didn't expect Julia to have such a function from the get-go, so I turned to broadcasting. In NumPy, broadcasting is based on a set of rules that I find quite clear and logical. You can use arrays of different dimensions in the same expression as long as the sizes in each dimension match or one of them is 1:
(5, 10, 15) with (10, 1) broadcasts to (5, 10, 15)
(5, 1, 15) with (1, 10, 1) also broadcasts to (5, 10, 15)
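The shape rule can be written down as a small pure-Python function. This is only a sketch of the rule as stated here, not NumPy's actual implementation, and broadcast_shape is a hypothetical helper name:

```python
from itertools import zip_longest


def broadcast_shape(*shapes):
    # Align the shapes on their trailing dimensions; missing leading
    # dimensions are treated as size 1.
    result = []
    for sizes in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        non_unit = {n for n in sizes if n != 1}
        if len(non_unit) > 1:
            raise ValueError("shapes %r cannot be broadcast together" % (shapes,))
        result.append(non_unit.pop() if non_unit else 1)
    return tuple(reversed(result))


print(broadcast_shape((5, 10, 15), (10, 1)))    # (5, 10, 15)
print(broadcast_shape((5, 1, 15), (1, 10, 1)))  # (5, 10, 15)
```

Shapes that disagree in some dimension without either size being 1, such as (5, 10) and (4, 10), raise a ValueError in this sketch.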
To help with this, you can also use numpy.newaxis or None to easily add new dimensions to your array :
array = numpy.zeros((5, 15))
array[:, None, :].shape  # (5, 1, 15)
This helps broadcast arrays easily :
A = numpy.arange(5)
B = numpy.arange(10)
C = numpy.arange(15)
bA, bB, bC = numpy.broadcast_arrays(A[:,None,None], B[None,:,None], C[None,None,:])
bA.shape == bB.shape == bC.shape == (5, 10, 15)  # True
Using this, creating the indices array is rather straightforward :
indices = numpy.array(numpy.broadcast_arrays(A[:,None,None], B[None,:,None], C[None,None,:]))
(indices == numpy.mgrid[:5, :10, :15]).all()  # returns True
The general case is of course a bit more complicated, but can be worked around using list comprehension and slices :
arrays = [ numpy.arange(i)[tuple([None if m!=n else slice(None) for m in range(len(mapsize))])] for n, i in enumerate(mapsize) ]
indices = numpy.array(numpy.broadcast_arrays(*arrays))
So back to Julia. I tried to apply the same kind of rationale and ended up achieving the equivalent of the arrays list of the code above. This ended up being rather simpler than the NumPy counterpart thanks to the compound expression syntax :
arrays = [ (idx = ones(Int, length(mapsize)); idx[n] = i;reshape([1:i], tuple(idx...))) for (n,i)=enumerate(mapsize) ]
Now I'm stuck here, as I don't really know how to apply the broadcasting to my list of generating arrays here... The broadcast[!] functions ask for a function f to apply, and I don't have any. I tried using a for loop to try forcing the broadcasting:
indices = Array(Int, tuple(unshift!([i for i=mapsize], length(mapsize))...))
for i=1:length(mapsize)
    A[i] = arrays[i]
end
But this gives me an error : ERROR: convert has no method matching convert(::Type{Int64}, ::Array{Int64,3})
Am I doing this the right way? Did I overlook something important? Any help is appreciated.
If you're running julia 0.4, you can do this:
julia> function mgrid(mapsize)
           T = typeof(CartesianIndex(mapsize))
           indices = Array(T, mapsize)
           for I in eachindex(indices)
               indices[I] = I
           end
           indices
       end
It would be even nicer if one could just say
indices = [I for I in CartesianRange(CartesianIndex(mapsize))]
I'll look into that :-).
Broadcasting in Julia has been modelled pretty much on broadcasting in NumPy, so you should hopefully find that it obeys more or less the same simple rules (not sure if the way to pad dimensions when not all inputs have the same number of dimensions is the same though, since Julia arrays are column-major).
A number of useful things like newaxis indexing and broadcast_arrays have not been implemented (yet) however. (I hope they will.) Also note that indexing works a bit differently in Julia compared to NumPy: when you leave off indices for trailing dimensions in NumPy, the remaining indices default to colons. In Julia they could be said to default to ones instead.
I'm not sure you actually need a meshgrid function: most things that you would want to use it for could be done by using the original entries of your arrays array with broadcasting operations. The major reason that meshgrid is useful in MATLAB is that MATLAB is terrible at broadcasting.
But it is quite straightforward to accomplish what you want to do using the broadcast! function:
# assume mapsize is a vector with the desired shape, e.g. mapsize = [2,3,4]
N = length(mapsize)
# Your line to create arrays below, with an extra initial dimension on each array
arrays = [ (idx = ones(Int, N+1); idx[n+1] = i;reshape([1:i], tuple(idx...))) for (n,i) in enumerate(mapsize) ]
# Create indices and fill it one coordinate at a time
indices = zeros(Int, tuple(N, mapsize...))
for (i,arr) in enumerate(arrays)
    dest = sub(indices, i, [Colon() for j=1:N]...)
    broadcast!(identity, dest, arr)
end
I had to add an initial singleton dimension on the entries of arrays to line up with the axes of indices (newaxis would have been useful here...).
Then I go through each coordinate, create a subarray (a view) on the relevant part of indices, and fill it. (Indexing will default to returning subarrays in Julia 0.4, but for now we have to use sub explicitly).
The call to broadcast! just evaluates the identity function identity(x)=x on the input arr=arrays[i], broadcasts to the shape of the output. There's no efficiency lost in using the identity function for this; broadcast! generates a specialized function based on the given function, number of arguments, and number of dimensions of the result.
I guess this is the same as the MATLAB meshgrid functionality. I've never really thought about the generalization to more than two dimensions, so it's a bit harder to get my head around.
First, here is my completely general version, which is kinda crazy but I can't think of a better way to do it without generating code for common dimensions (e.g. 2, 3)
function numpy_mgridN(dims...)
    X = Any[zeros(Int,dims...) for d in 1:length(dims)]
    for d in 1:length(dims)
        base_idx = Any[1:nd for nd in dims]
        for i in 1:dims[d]
            cur_idx = copy(base_idx)
            cur_idx[d] = i
            X[d][cur_idx...] = i
        end
    end
    @show X
end
X = numpy_mgridN(3,4,5)
@show X[1][1,2,3] # 1
@show X[2][1,2,3] # 2
@show X[3][1,2,3] # 3
Now, what I mean by code generation is that, for the 2D case, you can simply do
function numpy_mgrid(dim1,dim2)
    X = [i for i in 1:dim1, j in 1:dim2]
    Y = [j for i in 1:dim1, j in 1:dim2]
    return X,Y
end
and for the 3D case:
function numpy_mgrid(dim1,dim2,dim3)
    X = [i for i in 1:dim1, j in 1:dim2, k in 1:dim3]
    Y = [j for i in 1:dim1, j in 1:dim2, k in 1:dim3]
    Z = [k for i in 1:dim1, j in 1:dim2, k in 1:dim3]
    return X,Y,Z
end
Test with, e.g.
X,Y,Z=numpy_mgrid(3,4,5)
@show X
@show Y
@show Z
I guess mgrid shoves them all into one tensor, so you could do that like this
all = cat(4,X,Y,Z)
which is still slightly different:
julia> all[1,2,3,:]
1x1x1x3 Array{Int64,4}:
[:, :, 1, 1] =
1
[:, :, 1, 2] =
2
[:, :, 1, 3] =
3
julia> vec(all[1,2,3,:])
3-element Array{Int64,1}:
1
2
3

Assignment in R language

I am wondering how assignment works in the R language.
Consider the following R shell session:
> x <- c(5, 6, 7)
> x[1] <- 10
> x
[1] 10 6 7
>
which I totally understand. The vector (5, 6, 7) is created and bound to
the symbol 'x'. Later, 'x' is rebound to the new vector (10, 6, 7) because vectors
are immutable data structures.
But what happens here:
> c(4, 5, 6)[1] <- 10
Error in c(4, 5, 6)[1] <- 10 :
target of assignment expands to non-language object
>
or here:
> f <- function() c(4, 5, 6)
> f()[1] <- 10
Error in f()[1] <- 10 : invalid (NULL) left side of assignment
>
It seems to me that one can only assign values to named data structures (like 'x').
The reason why I am asking is because I try to implement the R language core and I am unsure
how to deal with such assignments.
Thanks in advance
It seems to me that one can only assign values to named data structures (like 'x').
That's precisely what the documentation for ?"<-" says:
Description:
Assign a value to a name.
x[1] <- 10 doesn't use the same function as x <- c(5, 6, 7). The former calls [<- while the latter calls <-.
As per @Owen's answer to this question, x[1] <- 10 is really doing two things. It is calling the [<- function, and it is assigning the result of that call to x.
So what you want, to achieve the result of c(4, 5, 6)[1] <- 10, is:
> `[<-`(c(4, 5, 6),1, 10)
[1] 10 5 6
You can make modifications to anonymous functions, but there is no assignment to anonymous vectors. Even R creates temporary copies with names and you will sometimes see error messages that reflect that fact. You can read this in the R language definition on page 21 where it deals with the evaluation of expressions for "subset assignment" and for other forms of assignment:
x[3:5] <- 13:15
# The result of this commands is as if the following had been executed
`*tmp*` <- x
x <- "[<-"(`*tmp*`, 3:5, value=13:15)
rm(`*tmp*`)
And there is a warning not to use *tmp* as an object name, because it would be overwritten during the next call to [<-.
