I'd like to generate identical random numbers in R and Julia. Both languages appear to use the Mersenne Twister generator by default; however, in Julia 1.0.0:
julia> using Random
julia> Random.seed!(3)
julia> rand()
0.8116984049958615
produces 0.811..., while in R:
set.seed(3)
runif(1)
produces 0.168.
Any ideas?
Related SO questions here and here.
My use case for those who are interested: Testing new Julia code that requires random number generation (e.g. statistical bootstrapping) by comparing output to that from equivalent libraries in R.
This is an old problem.
Paul Gilbert addressed the same issue in the late 1990s (!!) when trying to assert that simulations in R (then the newcomer) gave the same results as those in S-Plus (then the incumbent).
His solution, and still the gold-standard approach AFAICT: re-implement the generator in fresh code in both languages, as this is the only way to ensure identical seeding, state, ... and whatever else affects the stream.
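The point generalizes across languages: CPython's random module also uses Mersenne Twister, yet its seeding scheme differs from R's and Julia's, so the same integer seed yields a different stream. A minimal sketch (Python used here purely as a third illustration):

```python
import random

# Seed CPython's Mersenne Twister with 3, as in the R/Julia examples.
random.seed(3)
first = [random.random() for _ in range(3)]

# Re-seeding replays the identical stream within one implementation...
random.seed(3)
again = [random.random() for _ in range(3)]
assert first == again

# ...but these values match neither R's 0.168... nor Julia's 0.811...,
# because each language scrambles the integer seed differently before
# filling the 624-word generator state.
assert all(0.0 <= u < 1.0 for u in first)
```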
Pursuing the RCall suggestion made by @Khashaa, it's clear that you can set the seed and get the random numbers from R.
julia> using RCall
julia> RCall.reval("set.seed(3)")
RCall.NilSxp(16777344,Ptr{Void} @0x0a4b6330)
julia> a = zeros(Float64,20);
julia> unsafe_copy!(pointer(a), RCall.reval("runif(20)").pv, 20)
Ptr{Float64} @0x972f4860
julia> map(x -> @printf("%20.15f\n", x), a);
0.168041526339948
0.807516399072483
0.384942351374775
0.327734317164868
0.602100674761459
0.604394054040313
0.124633444240317
0.294600924244151
0.577609919011593
0.630979274399579
0.512015897547826
0.505023914156482
0.534035353455693
0.557249435689300
0.867919487645850
0.829708693316206
0.111449153395370
0.703688358888030
0.897488264366984
0.279732553754002
and from R:
> options(digits=15)
> set.seed(3)
> runif(20)
[1] 0.168041526339948 0.807516399072483 0.384942351374775 0.327734317164868
[5] 0.602100674761459 0.604394054040313 0.124633444240317 0.294600924244151
[9] 0.577609919011593 0.630979274399579 0.512015897547826 0.505023914156482
[13] 0.534035353455693 0.557249435689300 0.867919487645850 0.829708693316206
[17] 0.111449153395370 0.703688358888030 0.897488264366984 0.279732553754002
** EDIT **
Per the suggestion by @ColinTBowers, here's a simpler/cleaner way to access R random numbers from Julia.
julia> using RCall
julia> reval("set.seed(3)");
julia> a = rcopy("runif(20)");
julia> map(x -> @printf("%20.15f\n", x), a);
0.168041526339948
0.807516399072483
0.384942351374775
0.327734317164868
0.602100674761459
0.604394054040313
0.124633444240317
0.294600924244151
0.577609919011593
0.630979274399579
0.512015897547826
0.505023914156482
0.534035353455693
0.557249435689300
0.867919487645850
0.829708693316206
0.111449153395370
0.703688358888030
0.897488264366984
0.279732553754002
See:
?set.seed
"Mersenne-Twister":
From Matsumoto and Nishimura (1998). A twisted GFSR with period 2^19937 - 1 and equidistribution in 623 consecutive dimensions (over the whole period). The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.
You might also see whether you can link to the same underlying C code from both languages. If you want to inspect the state list/vector, type:
.Random.seed
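The same "624 words plus a position" state is exposed directly in CPython's random module, which can serve as an analogy for .Random.seed (this is Python, not R, used only to illustrate saving and restoring Mersenne Twister state):

```python
import random

random.seed(3)
state = random.getstate()      # analogue of reading .Random.seed in R
# CPython's internal tuple holds the 624 32-bit words plus the position.
assert len(state[1]) == 625

a = random.random()
random.setstate(state)         # restore the saved state...
assert random.random() == a    # ...and the stream replays exactly
```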
Related
In many Machine Learning use cases, you need to create an array filled with ones, with specific dimensions. In Python, I would use np.ones((2, 1)). What is the analog version of this in Julia?
Julia has a built in ones function which can be used as follows:
julia> ones(1,2)
1×2 Matrix{Float64}:
1.0 1.0
You can read more about the ones function in the Julia docs.
The answer by Logan is excellent: you can just use the ones function.
But you often don't need it.
For instance, a common use of a vector of ones is to multiply it by another vector to get a matrix in which each row repeats the corresponding element of that vector. Adding that matrix to something lets you add the values of a vector to the corresponding rows of a matrix. You get code like this:
>>> A = np.random.rand(4,3)
>>> x = np.random.rand(4)
>>> x
array([0.01250529, 0.9620139 , 0.70991563, 0.99795451])
>>> A + np.reshape(np.ones(3), (1,3)) * np.reshape(x, (4,1))
array([[0.09141967, 0.83982525, 0.16960596],
[1.39104681, 1.10755182, 1.60876696],
[1.14249757, 1.68167344, 1.64738165],
[1.10653393, 1.45162139, 1.23878815]])
This is actually a lot of extra work for the computer, because Python can't optimize away the intermediate ones matrix. You could instead use so-called broadcasting to do this extension more simply, but you still have to get x into the right shape:
>>> A + x
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,3) (4,)
>>> A + np.reshape(x, (4,1))
array([[0.09141967, 0.83982525, 0.16960596],
[1.39104681, 1.10755182, 1.60876696],
[1.14249757, 1.68167344, 1.64738165],
[1.10653393, 1.45162139, 1.23878815]])
In Julia, extending the vector to the same shape as the matrix to which you want to add it can be done more simply using the broadcast operator. Thus, the code above simplifies to
julia> A = rand(4,3)
4×3 Matrix{Float64}:
0.885593 0.494999 0.534039
0.915725 0.479218 0.229797
0.739122 0.670486 0.247376
0.419879 0.857314 0.652547
julia> x = rand(4)
4-element Vector{Float64}:
0.9574839624590326
0.9736140903654276
0.6051487944513263
0.3581090323172089
julia> A .+ x
4×3 Matrix{Float64}:
1.84308 1.45248 1.49152
1.88934 1.45283 1.20341
1.34427 1.27563 0.852524
0.777988 1.21542 1.01066
One reason this works better is that there is less noise in the syntax, because arrays are built into Julia.
Much more importantly, though, the compiler sees the use of the broadcast operator and can generate very efficient code (and can even vectorize it). In fact, x doesn't even have to be an actual vector, as long as it has a few of the same methods defined for it.
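NumPy offers a weaker version of the same trick: broadcasting a scalar or a reshaped range avoids ever materializing a matrix of ones. A small sketch:

```python
import numpy as np

A = np.zeros((4, 3))

# Broadcast the column vector across A; no (4, 3) ones matrix is built.
B = A + np.arange(4)[:, None]
assert B.shape == (4, 3)
assert (B[:, 0] == np.arange(4)).all()

# Scalars broadcast too, matching Julia's `A .+ 1`.
assert ((A + 1) == np.ones((4, 3))).all()
```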
In fact, if you really do need a vector or matrix of all ones (or some other constant), you can use broadcasting with scalars as well:
julia> A .+ 1
4×3 Matrix{Float64}:
1.88559 1.495 1.53404
1.91572 1.47922 1.2298
1.73912 1.67049 1.24738
1.41988 1.85731 1.65255
I would like to write a function fun1 whose unique argument is a DataArrays.DataArray y. y can hold either integers or floats (in vector or matrix form).
I have tried to follow the suggestions I found on Stack Overflow (Functions that take DataArrays and Arrays as arguments in Julia) and in the official documentation (http://docs.julialang.org/en/release-0.5/manual/methods/). However, I couldn't write code flexible enough to deal with the uncertainty around y.
I would like something like the following (but capable of handling a numerical DataArrays.DataArray):
function fun1(y::Number)
println(y);
end
Any suggestion?
One option is to define:
fun1{T<:Number}(yvec::DataArray{T}) = foreach(println,yvec)
Then,
using DataArrays
v = DataArray(rand(10))
w = DataArray(rand(1:10,10))
fun1(v)
#
# elements of v printed as Float64s
#
fun1(w)
#
# elements of w printed as Ints
#
A delicate but recurring point to note is the invariance of Julia's parametric types, which necessitates defining a parametric function. A look at the documentation on types should clarify this concept (http://docs.julialang.org/en/release-0.4/manual/types/#types).
This is a follow-up to my previous question. The difference is that instead of a one-dimensional array I want to get a two-dimensional array.
I have the following Fortran subroutine:
subroutine test(d, i, nMCd, DF, X)
integer, intent(in) :: d, i, nMCd
double precision, intent(in), dimension(i,nMCd) :: DF
double precision, intent(out), dimension(i,nMCd) :: X
X = DF + DF
end subroutine test
In R the code is simple:
input <- data.frame(A=c(11,12), B=c(21, 22))
input + input
and I get a 2-by-2 data frame.
I am able to compile the subroutine for R, load it, and run it:
system("R CMD SHLIB ./Fortran/mytest.f90")
dyn.load("./Fortran/mytest.so")
X <- .Fortran("test", d = as.integer(1), i = nrow(input), nMCd = ncol(input), DF = unlist(input), X = numeric(nrow(input)*ncol(input)))$X
But I get a vector of length 4 instead of a 2x2 matrix or data frame. I tried X = numeric(nrow(input), ncol(input)), but that does not work, since numeric takes a single length argument.
The only solution I can think of is to reshape after running the Fortran function:
matrix(X,nrow = nrow(input),ncol = ncol(input))
Thanks!
I've reviewed the documentation for .Fortran, .Call and "Writing R extensions" and have not found any instances where a Fortran subroutine returns a matrix. Furthermore, there is no matrix type in the ?.Fortran help page, so I think my comment(s) might be the best solution:
in R you can also coerce a vector to matrix by adding a dimension: dim(X) <- c(2,2).
Or even with more generality: dim(X) <- dim(input)
Obviously I cannot claim this to be from high authority, since I've not done any Fortran programming before. If you are interested in writing code that interacts with R objects in place, you might want to study the data.table package's code. Most such efforts use C or C++. You might also consider the Rcpp or inline package interfaces. The Fortran interface is mostly used to take advantage of the many numerical routines that have been bullet-proofed over the years.
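The column-major detail matters when reshaping: Fortran fills X down each column, which is also what R's dim<- assumes. A NumPy sketch of the same reshape for the example input (columns 11, 12 and 21, 22, so DF + DF flattens to 22, 24, 42, 44):

```python
import numpy as np

# The Fortran routine returns DF + DF flattened in column-major order.
flat = np.array([22.0, 24.0, 42.0, 44.0])

# order="F" reproduces R's dim(X) <- c(2, 2), which is also column-major.
X = flat.reshape((2, 2), order="F")
assert X[0, 0] == 22.0 and X[1, 0] == 24.0   # first column
assert X[0, 1] == 42.0 and X[1, 1] == 44.0   # second column
```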
I couldn't find a ready-made function in Julia to compute Pearson's r, so I resorted to making it myself; however, I ran into trouble.
code:
r(x,y) = (sum(x*y) - (sum(x)*sum(y))/length(x))/sqrt((sum(x^2)-(sum(x)^2)/length(x))*(sum(y^2)-(sum(y)^2)/length(x)))
if I attempt to run this on two arrays:
b = [4,8,12,16,20,24,28]
q = [5,10,15,20,25,30,35]
I get the following error:
ERROR: `*` has no method matching *(::Array{Int64,1}, ::Array{Int64,1})
in r at none:1
Pearson's r is available in Julia as cor:
julia> cor(b,q)
1.0
When you're looking for functions in Julia, the apropos function can be very helpful:
julia> apropos("pearson")
Base.cov(v1[, v2][, vardim=1, corrected=true, mean=nothing])
Base.cor(v1[, v2][, vardim=1, mean=nothing])
The issue you're running into with your definition is the difference between elementwise multiplication/exponentiation and matrix multiplication/exponentiation. In order to get the elementwise behavior you intend, you need to use .* and .^:
r(x,y) = (sum(x.*y) - (sum(x)*sum(y))/length(x))/sqrt((sum(x.^2)-(sum(x)^2)/length(x))*(sum(y.^2)-(sum(y)^2)/length(x)))
With only those three changes, your r definition seems to match Julia's cor to within a few ULPs:
julia> cor(b,q)
1.0
julia> x,y = randn(10),randn(10)
([-0.2384626335813905,0.0793838075714518,2.395918475924737,-1.6271954454542266,-0.7001484742860653,-0.33511064476423336,-1.5419149314518956,-0.8284664940238087,-0.6136547926069563,-0.1723749334766532],[0.08581770755520171,2.208288163473674,-0.5603452667737798,-3.0599443201343854,0.585509815026569,0.3876891298047877,-0.8368409374755644,1.672421071281691,0.19652240951291933,0.9838306761261647])
julia> r(x,y)
0.23514468093214283
julia> cor(x,y)
0.23514468093214275
Julia's cor is defined iteratively (this is the zero-mean implementation: calling cor first subtracts the mean and then calls corzm), which means fewer allocations and better performance. I can't speak to the numerical accuracy.
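The corrected formula translates directly to NumPy, where * and ** are elementwise by default, so the pitfall above does not arise. A sketch using the question's data:

```python
import numpy as np

def r(x, y):
    # Textbook computational formula for Pearson's correlation.
    n = len(x)
    num = (x * y).sum() - x.sum() * y.sum() / n
    den = np.sqrt(((x ** 2).sum() - x.sum() ** 2 / n)
                  * ((y ** 2).sum() - y.sum() ** 2 / n))
    return num / den

b = np.array([4, 8, 12, 16, 20, 24, 28], dtype=float)
q = np.array([5, 10, 15, 20, 25, 30, 35], dtype=float)

# q is an exact linear function of b, so the correlation is 1.
assert abs(r(b, q) - 1.0) < 1e-12
assert abs(r(b, q) - np.corrcoef(b, q)[0, 1]) < 1e-12
```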
Your function is trying to multiply two column vectors. You will need to transpose one of them. Consider:
> [1,2]*[3,4]
ERROR: `*` has no method matching *(::Array{Int64,1}, ::Array{Int64,1})
but:
> [1,2]'*[3,4]
1-element Array{Int64,1}:
11
and:
> [1,2]*[3,4]'
2x2 Array{Int64,2}:
3 4
6 8
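The same distinction exists in NumPy, where @ is the inner product and np.outer builds the 2x2 result; a quick sketch:

```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 4])

# Inner product, like [1,2]' * [3,4] in Julia: 1*3 + 2*4 = 11.
assert u @ v == 11

# Outer product, like [1,2] * [3,4]' in Julia.
assert np.outer(u, v).tolist() == [[3, 4], [6, 8]]
```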
Now that I find myself spending so much time programming in R, I really want to get back to automated testing (which I learned to do by habit in Perl). Besides being user-friendly, I would also be particularly interested in being able to generate random inputs for tests like Perl's Test::LectroTest or Haskell's QuickCheck. Is there anything similar for R?
See the R package quickcheck on GitHub.
Like Test::LectroTest, the R package quickcheck is a port of QuickCheck, which Koen Claessen and John Hughes wrote for Haskell.
In addition to the QuickCheck features, quickcheck also gives a nod to Hadley Wickham's popular testthat R package by intentionally incorporating his "expectation" functions (which quickcheck calls "assertions"). Beyond numerical and string tests, there are tests for failures, warnings, and so on.
Here is a simple example using it:
library(quickcheck)
my_square <- function(x){x^2} # the function to test
test( function(x = rinteger()) min(my_square(x)) >= 0 )
# Pass function (x = rinteger())
# min(my_square(x)) >= 0
# [1] TRUE
test( function(x = rdouble())
all.equal(
my_square(x),
x^2
)
)
# Pass function (x = rdouble())
# all.equal(my_square(x), x^2)
# [1] TRUE
The first test ensures that anything produced by my_square is non-negative. The second test actually replicates the functionality of my_square and checks every output to make sure it is correct.
Note that rinteger() produces a vector of any length consisting of integer values. Other randomly generated input data can be produced using functions like rcharacter, rdouble, and rmatrix.
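For readers outside R, the pattern is easy to sketch by hand in Python (a hand-rolled check in the spirit of QuickCheck; check and the generator lambdas here are hypothetical names, not part of any library):

```python
import random

def my_square(x):
    return x ** 2

def check(prop, gen, trials=100):
    # Run the property against many randomly generated inputs,
    # mirroring quickcheck's test(function(x = rinteger()) ...).
    for _ in range(trials):
        x = gen()
        assert prop(x), f"property failed for {x!r}"
    return True

random.seed(0)
# Property: squares are never negative, as in the first test above.
assert check(lambda x: my_square(x) >= 0,
             lambda: random.randint(-10**6, 10**6))
```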