What's Julia's equivalent of R's replicate and rep?

In R, replicate(n, expr) repeatedly evaluates the expression expr n times, in contrast to rep(value, n), which repeats the value n times.
What's Julia's equivalent of R's replicate and rep?
E.g. in R,
rep(1:3, 3) yields c(1:3, 1:3, 1:3)
and replicate(3, runif(1)) generates 3 random numbers from the uniform distribution (i.e. it runs runif(1) 3 times).
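To make the contrast concrete, here's a small R sketch: rep repeats a value that was computed once, while replicate re-evaluates its expression on every repetition.

```r
set.seed(1)                  # for reproducibility
a <- rep(runif(1), 3)        # runif(1) evaluated once; the same draw repeated
set.seed(1)
b <- replicate(3, runif(1))  # runif(1) evaluated three times; distinct draws

length(unique(a))  # 1
length(unique(b))  # 3
```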

I may be late to the party, but here is some R and Julia code that combines suggestions from the comments:
# R version 3.5.1 of rep function
> rep(1:3, 3)
[1] 1 2 3 1 2 3 1 2 3
# Julia version 1.0.0 of repeat function
julia> repeat(1:3, 3)
9-element Array{Int64,1}:
1
2
3
1
2
3
1
2
3
# R version 3.5.1 for replicate function
> replicate(3, runif(1))
[1] 0.3849424 0.3277343 0.6021007
# Julia version 1.0.0 of an array comprehension
julia> [rand() for i in 1:3]
3-element Array{Float64,1}:
0.8076220876500786
0.9700908450487538
0.14006111319509862

Related

Behaviour of the rpois function

I'm trying to understand a proposed solution for a University test.
Let me assume that we have created a random variable with
set.seed(123)
R <- 5
X <- rexp(R, 2)
So the content of X is
0.42172863 0.28830514 0.66452743 0.01578868 0.02810549
In the solutions of the problem I find
Y <- rpois(R, exp(X / 4))
where the content of exp(X / 4) is
1.111191 1.074737 1.180729 1.003955 1.007051
where, contrary to my expectations, the second argument is a vector instead of a scalar.
If I calculate
print(rpois(R, 1.111191))
print(rpois(R, 1.074737))
print(rpois(R, 1.180729))
print(rpois(R, 1.003955))
print(rpois(R, 1.007051))
I get
2 1 1 3 1
1 1 0 2 0
0 1 3 3 2
1 4 1 1 1
1 0 0 3 2
while for rpois(R, exp(X / 4)) I get
1 2 0 1 2
How are the two results related?
It's a behaviour I can't find explained anywhere.
R makes its functions vectorized wherever it's reasonable to do so.
In particular, in the function call rpois(R, lambda), R specifies the number of samples to take, and lambda is the vector of means, which is recycled to match R. In other words, if lambda is a single value then the same mean will be used for each Poisson draw; if it is a vector of length R, then each element of the vector will be used for the corresponding Poisson draw.
So the equivalent of Y <- rpois(R, exp(X / 4)) would be
Y <- c(
  rpois(1, exp(X[1]/4)),
  rpois(1, exp(X[2]/4)),
  rpois(1, exp(X[3]/4)),
  ...
)
We could also do this with a for loop:
Y <- numeric(R) ## allocate a length-R numeric vector
for (i in seq(R)) {
  Y[i] <- rpois(1, exp(X[i]/4))
}
Using the vectorized version whenever it's available is best practice; it's faster and requires less code (therefore easier to understand).
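As a sanity check, resetting the seed shows that the vectorized call and the loop produce identical draws. This is a sketch: it relies on rpois consuming the random-number stream one draw at a time in order, which holds in current R but is an implementation detail.

```r
set.seed(123)
X <- rexp(5, 2)
lambda <- exp(X / 4)

set.seed(42)
vec <- rpois(5, lambda)      # vectorized: one element of lambda per draw

set.seed(42)
loop <- integer(5)
for (i in 1:5) loop[i] <- rpois(1, lambda[i])

identical(vec, loop)         # TRUE: same draws, same RNG stream
```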

R keep randomly generating numbers until all numbers within specified range are present

My aim is to randomly generate a vector of integers using R, populated by numbers between 1 and 8. However, I want to keep growing the vector until all the numbers 1 to 8 are represented at least once, e.g. 1,4,6,2,2,3,5,1,4,7,6,8.
I am able to generate single numbers or a sequence of numbers using sample
x=sample(1:8,1, replace=T)
>x
[1] 6
I have played around with the repeat function to see how it might work with sample and I can at least get the generation to stop when one specific number occurs, e.g.
repeat {
  print(x)
  x = sample(1:8, 1, replace = TRUE)
  if (x == 3) {
    break
  }
}
Which gives:
[1] 3
[1] 6
[1] 6
[1] 6
[1] 6
[1] 6
[1] 2
I am struggling now to work out how to stop number generation once all the numbers in 1:8 are present. Additionally, I know that the above code only prints the sequence as it is generated and does not store it as a vector. Any advice pointing me in the right direction would be really appreciated!
This is fine for 1:8 but might not always be a good idea.
foo = integer(0)
set.seed(42)
while (TRUE) {
  foo = c(foo, sample(1:8, 1))
  if (all(1:8 %in% foo)) break
}
foo
# [1] 8 8 3 7 6 5 6 2 6 6 4 6 8 3 4 8 8 1
If you have a larger set than 1:8, it may be better to compute the expected number of tries (N) required to see all the numbers at least once and then sample N numbers such that every number is sampled at least once.
set.seed(42)
vec = 1:8
N = ceiling(sum(length(vec)/(1:length(vec))))
foo = sample(c(vec, sample(vec, N - length(vec), TRUE)))
foo
# [1] 3 6 8 3 8 8 6 4 5 6 1 6 4 6 6 3 5 7 2 2 7 8
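For reference, the N computed above is the coupon-collector expected value n * (1/1 + 1/2 + ... + 1/n): the expected number of draws needed to see all n values at least once.

```r
vec <- 1:8
# Expected draws to collect all n values: n * H_n, where H_n is the n-th harmonic number
N <- ceiling(sum(length(vec) / seq_along(vec)))
N  # 22, matching the length of the sample above
```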
Taking a cue from d.b's answer, here's a slightly more verbose method that is more memory-efficient (and a little faster too, though I doubt speed is your issue):
Differences:
pre-allocate memory in chunks (size 100 here), which mitigates the cost of growing a vector one element at a time; extending by 100 (or even 1000) elements at once is much cheaper
compare only the newest number instead of all numbers each time (the first n-1 numbers have already been tabulated, no need to do that again)
Code:
library(microbenchmark)
n <- 8  # size of the target set (not defined in the original snippet)
microbenchmark(
  r2evans = {
    emptyvec100 <- integer(100)
    counter <- 0
    out <- integer(0)
    unseen <- seq_len(n)
    set.seed(42)
    repeat {
      if (counter %% 100 == 0) out <- c(out, emptyvec100)
      counter <- counter + 1
      num <- sample(n, size = 1)
      unseen <- unseen[unseen != num]
      out[counter] <- num
      if (!length(unseen)) break
    }
    out <- out[1:counter]
  },
  d.b = {
    foo = integer(0)
    set.seed(42)
    while (TRUE) {
      foo = c(foo, sample(1:n, 1))
      if (all(1:n %in% foo)) break
    }
  }
, times = 1000, unit = 'us')
# Unit: microseconds
# expr min lq mean median uq max neval
# r2evans 1090.007 1184.639 1411.531 1228.947 1320.845 11344.24 1000
# d.b 1242.440 1372.264 1835.974 1441.916 1597.267 14592.74 1000
(This is intended neither as code-golf nor speed-optimization. My primary goal is to argue against extend-by-one vector work, and suggest a more efficient comparison technique.)
As d.b further suggested, this works fine for 1:8 but may run into trouble with larger sets; the gap widens as n grows.
(Edit: with d.b's code changes, the execution times are much closer and no longer look exponential. Apparently removing unique had significant benefits for that code.)

Translate mathematical function into R

I have this mathematical function: f(x) = sum over i of |x - v[i]|, the sum of absolute differences between x and the elements of v.
I have written R code:
result <- 0
for (i in seq_along(v)) {
  result <- result + abs(x - v[i])
}
result
to compute the function.
However, this does not seem efficient to me. How can I implement this sum with R's sum() function?
I appreciate your answer!
sum(abs(x - v)) should be enough; there is no need for the for loop, since arithmetic operations in R are vectorized.
# Example
> x <- 5
> v <- 1:10
> abs(x-v)
[1] 4 3 2 1 0 1 2 3 4 5
> sum(abs(x-v))
[1] 25

Apply a function on a list of similar size tables, cell by cell

We are given a list of n data.frames or matrices of the same size (r by c). We need to apply a function over each cell across all tables, obtaining as a result a data.frame or matrix of the same size (r by c again).
For example:
a <- matrix(0:5, 2, 3)
b <- matrix(5:0, 2, 3)
c <- matrix(1, 2, 3)
l <- list(a, b, c)
foo(l, mean) # should return
2 2 2
2 2 2
# For instance the top-left cell of 3 given matrices are 0, 5, and 1, and the mean is 2
# For all other cells, the mean of the values in 3 matrices will be 2
There are many ways to do the job, but I am looking for a very fast and short solution.
Here's a base R solution using the simplify2array function:
apply(simplify2array(l), c(1, 2), mean)
[,1] [,2] [,3]
[1,] 2 2 2
[2,] 2 2 2
Note that simplify2array(l) does exactly the same as abind(l, along = 3).
Use the abind package:
library(abind)
apply(abind(l,along = 3),c(1,2),mean)
and of course a speedier version:
rowMeans(abind(l,along = 3),dims = 2)
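An alternative sketch, specific to mean (it does not generalize to arbitrary functions the way apply does): sum the matrices elementwise with Reduce, then divide by the count.

```r
a <- matrix(0:5, 2, 3)
b <- matrix(5:0, 2, 3)
c <- matrix(1, 2, 3)
l <- list(a, b, c)

# Elementwise sum of all matrices, then divide by how many there are
m <- Reduce(`+`, l) / length(l)
m  # a 2-by-3 matrix with 2 in every cell
```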

R: first N of all permutations

I'm looking for a function that
can list all n! permutations of a given input vector (typically just the sequence 1:n)
can also list just the first N of all n! permutations
The first requirement is met, e.g., by permn() from package combinat, permutations() from package e1071, or permutations() from package gtools. However, I'm positive that there is yet another function from some package that also provides the second feature. I used it once, but have since forgotten its name.
Edit:
The definition of "first N" is arbitrary: the function just needs an internal enumeration scheme which is always followed, and should break after N permutations are computed.
As Spacedman correctly pointed out, it's crucial that the function does not compute more permutations than actually needed (to save time).
Edit - solution: I remembered what I was using, it was numperm() from package sna. numperm(4, 7) gives the 7th permutation of elements 1:4, for the first N, one has to loop.
It seems like the best way to approach this would be to construct an iterator that could produce the list of permutations lazily, rather than using a function like permn, which generates the entire list up front (an expensive operation).
An excellent place to look for guidance on constructing such objects is the itertools module in the Python standard library. Itertools has been partially re-implemented for R as a package of the same name.
The following is an example that uses R's itertools to implement a port of the Python generator that creates iterators for permutations:
require(itertools)
permutations <- function(iterable) {
  # Returns permutations of iterable. Based on code given in the documentation
  # of the `permutations` function in the Python itertools module:
  # http://docs.python.org/library/itertools.html#itertools.permutations
  n <- length(iterable)
  indices <- seq(n)
  cycles <- rev(indices)
  stop_iteration <- FALSE
  nextEl <- function() {
    if (stop_iteration) { stop('StopIteration', call. = FALSE) }
    if (cycles[1] == 1) { stop_iteration <<- TRUE }  # Triggered on last iteration
    for (i in rev(seq(n))) {
      cycles[i] <<- cycles[i] - 1
      if (cycles[i] == 0) {
        if (i < n) {
          indices[i:n] <<- c(indices[(i+1):n], indices[i])
        }
        cycles[i] <<- n - i + 1
      } else {
        j <- cycles[i]
        indices[c(i, n-j+1)] <<- c(indices[n-j+1], indices[i])
        return( iterable[indices] )
      }
    }
  }
  # chain is used to return a copy of the original sequence
  # before returning permutations.
  return( chain(list(iterable), new_iterator(nextElem = nextEl)) )
}
To misquote Knuth: "Beware of bugs in the above code; I have only tried it, not proved it correct."
For the first 3 permutations of the sequence 1:10, permn pays a heavy price for computing unnecessary permutations:
> system.time( first_three <- permn(1:10)[1:3] )
user system elapsed
134.809 0.439 135.251
> first_three
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] 1 2 3 4 5 6 7 8 10 9
[[3]]
[1] 1 2 3 4 5 6 7 10 8 9
However, the iterator returned by permutations can be queried for only the first three elements which spares a lot of computations:
> system.time( first_three <- as.list(ilimit(permutations(1:10), 3)) )
user system elapsed
0.002 0.000 0.002
> first_three
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10
[[2]]
[1] 1 2 3 4 5 6 7 8 10 9
[[3]]
[1] 1 2 3 4 5 6 7 9 8 10
The Python algorithm does generate permutations in a different order than permn.
Computing all the permutations is still possible:
> system.time( all_perms <- as.list(permutations(1:10)) )
user system elapsed
498.601 0.672 499.284
It is much more expensive than permn, though, as this algorithm makes heavy use of interpreted loops. Python's standard library implements it in C, which compensates for the inefficiency of interpreted loops.
The code is available in a gist on GitHub. If anyone has a better idea, fork away!
In my version of R/combinat, the function permn() is just over thirty lines long. One way would be to make a copy of permn and change it to stop early.
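If you'd rather avoid the itertools dependency entirely, here is a base-R sketch using the classic lexicographic next-permutation step (Knuth's Algorithm L). It yields the first N permutations without computing all n! of them; note that its ordering happens to match the iterator above, not permn's.

```r
# Lexicographic next-permutation (Knuth's Algorithm L), base R only.
# Returns the successor of p in lexicographic order, or NULL if p is last.
next_perm <- function(p) {
  n <- length(p)
  i <- n - 1
  while (i >= 1 && p[i] >= p[i + 1]) i <- i - 1  # find rightmost ascent
  if (i < 1) return(NULL)                        # p is the last permutation
  j <- n
  while (p[j] <= p[i]) j <- j - 1                # rightmost element > p[i]
  tmp <- p[i]; p[i] <- p[j]; p[j] <- tmp         # swap pivot with successor
  p[(i + 1):n] <- rev(p[(i + 1):n])              # reverse the descending tail
  p
}

# First N permutations of 1:n (assumes N <= factorial(n))
first_perms <- function(n, N) {
  out <- vector("list", N)
  p <- seq_len(n)
  for (k in seq_len(N)) {
    out[[k]] <- p
    p <- next_perm(p)
  }
  out
}

first_perms(10, 3)  # only 3 permutations computed, not 10!
```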
