Same random seed in Matlab and R

I am generating data in R and Matlab for 2 separate analyses and I want to determine if the results in the two systems are equivalent. Between the 2 sets of code there is inherent variability due to the random number generator. If possible, I would like to remove this source of variability. Does anyone know of a way to set the same starting seed in both Matlab and R? I provide some demo code below.
%Matlab code
seed=rng %save seed
matlabtime1=randn(1,5) %generate 5 random numbers from standard normal
rng(seed) %get saved seed
matlabtime2=randn(1,5) %generates same output as matlabtime1
#R code
set.seed(3) #set the seed
r.time1=rnorm(5) #generate 5 random numbers from standard normal
set.seed(3) #set the same seed again
r.time2=rnorm(5) #generates same output as r.time1
Essentially, I want the results from matlabtime2 and r.time2 to match exactly. (The code I am actually using is more complex than this illustrative demo, so rewriting everything in one language is not really a feasible option.)

I'm finding it difficult to get the same random numbers in R and MATLAB, even when using the same seed for the same algorithm (Mersenne Twister).
I suspect it comes down to how the generators are implemented: even with the same seed, they start from different initial states (you can print and inspect the state in both R and MATLAB).
In the past when I've needed this, I generated random input, saved it as a file on disk, and fed it to both MATLAB and R.
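For instance, a minimal sketch of that file-on-disk approach (the file name and the number of draws here are just placeholders):
set.seed(3)                              # any fixed seed
x <- rnorm(5)                            # the shared random input
write.csv(data.frame(x = x), "random_input.csv", row.names = FALSE)  # placeholder file name
MATLAB can then read the same file (e.g. with readmatrix), so both analyses run on an identical random input rather than on two independently seeded streams.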
Another option is to write C wrappers for a random number generator (there are many of these in C/C++) both for R and MATLAB and invoke those instead of the built-in ones.

Related

Different random numbers between SPSS and R when using the same seed

I'm generating random values from a Bernoulli distribution in both SPSS and R.
In R:
set.seed(9191972)
Y1C <- rbinom(nrow(data3), 1, 0.70)
In SPSS:
SET SEED=9191972.
IF (MISSING(Y1) = 0) Campione=RV.BERNOULLI(0.70).
EXECUTE.
I don't get why the generated distributions differ. I've also tried setting the parameter "kind" in R's set.seed(), but I still got values different from those produced by SPSS.
Also, I run R on Windows while SPSS runs on a Mac. I'm wondering whether the difference could be due to the different operating systems.
R and SPSS don't use the same random number generator. Even if they used very similar generators, it is unlikely that any specific implementation would be the same.
You need to think of another way to solve your problem.

Why should we use set.seed() before applying knn() in R?

While reading An Introduction to Statistical Learning, I was puzzled by the following passage:
We set a random seed before we apply knn() because if several observations are tied as nearest neighbors, then R will randomly break the tie. Therefore, a seed must be set in order to ensure reproducibility of results.
Could anyone please tell me why the result of KNN is random?
knn() uses the random number generator to break ties: when several training observations are equally close to a test point, R picks among them at random. Calling set.seed() before knn() fixes the generator's state, so the ties are broken the same way on every run and the results do not change.
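As a small illustration (using the built-in iris data purely as a stand-in), fixing the seed before each call makes the tie-breaking, and hence the predictions, identical across runs:
library(class)                      # provides knn()
train <- iris[1:100, 1:4]
test  <- iris[101:150, 1:4]
cl    <- iris$Species[1:100]
set.seed(1)
pred1 <- knn(train, test, cl, k = 3)
set.seed(1)
pred2 <- knn(train, test, cl, k = 3)
identical(pred1, pred2)             # TRUE: ties are broken the same way both times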

How to store random data created in R for further use?

I am using RStudio and I created random data like this:
n<-500
u<-runif(n)
This data is now stored, but obviously once I run the code again it will change. How could I store it to use it again? If the number of points were small, I would just define a vector and write the numbers manually, like
DATA<-c(1,2,3,4)
But obviously doing this for 500 points is not very practical. Thank you.
In such cases, i.e. when using pseudo random number generators, a common approach is to set the seed:
set.seed(12345)
You have to store the seed that you used for the simulation, so that in the future you can set the same seed and get the same sequence of numbers. The existence of a seed reflects that the numbers are not truly random; they are pseudo-random, and the same seed will always generate the same numbers. There are services such as RANDOM which attempt to generate true random numbers.
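For instance, a minimal sketch of that approach:
set.seed(12345)            # record this seed alongside your analysis
n <- 500
u <- runif(n)
# later, in a fresh session, the same seed reproduces the same data
set.seed(12345)
u.again <- runif(n)
identical(u, u.again)      # TRUE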

Reproduce S-plus result in R

I have an old S-plus script, and I would like to reproduce the results in R. The only issue I'm having is the random seed. I know they use different algorithms for the pseudo-random number generation. In the S-plus file the seed was set using:
set.seed(337)
The relevant information I could find is the S-plus set.seed documentation and the R set.seed documentation.
From this documentation it looks like S-plus used the "Super-duper" algorithm for pseudo-random number generation. R has this option, but its version takes two integers, while S-plus only requires one integer between 0 and 1000. Furthermore, the R documentation says:
The two seeds are the Tausworthe and congruence long integers, respectively. A one-to-one mapping to S's .Random.seed[1:12] is possible but we will not publish one, not least as this generator is not exactly the same as that in recent versions of S-PLUS.
I'm not quite sure what this means. Does anyone know whether it is possible to replicate the results?
An old post on the R mailing list tries to get the same results in S, R, and S-PLUS.
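For what it's worth, R does let you switch to its own Super-Duper generator, although, as the documentation quoted above warns, this is not guaranteed to reproduce the S-PLUS stream exactly:
RNGkind(kind = "Super-Duper")   # switch R to its Super-Duper generator
set.seed(337)
runif(5)                        # may still differ from the S-plus output
RNGkind(kind = "default")       # restore the default Mersenne Twister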

Cluster analysis in R: How can I get deterministic results from pvclust?

pvclust is great for cluster analysis in R. However, when running it as part of a batch operation, it is annoying to get different results for the same data. Obviously, there are many "correct" clusterings of the same data, and it seems that pvclust uses some randomness to determine the clusters of a specific run. But is there any way to get deterministic results?
I want to be able to present a minimal, repeatable analysis package: the data plus an R script, and a separate written document that contains my interpretations of the clustering. It is then possible for others to add to the analysis, e.g. by changing the aesthetic appearance of plots. Now, the interpretations will always be out of sync with what someone else gets when they run the script containing pvclust.
Not just for cluster analysis: whenever randomness is involved, you can fix the random number generator's seed so that you always get the same results.
Try:
set.seed(seed=123)
# your code here
The seed can be any integer, or anything that can be coerced to an integer. And that's all.
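As a concrete sketch, assuming the pvclust package is installed, and using the built-in USArrests data, the distance measure, and the nboot value purely as placeholders: calling set.seed() immediately before pvclust() makes the bootstrap resampling, and therefore the reported p-values, deterministic.
library(pvclust)
set.seed(123)                    # fixes the bootstrap resampling
result <- pvclust(USArrests, method.hclust = "average",
                  method.dist = "euclidean", nboot = 1000)
plot(result)                     # same dendrogram and p-values on every run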
I've only used k-means. There I had to set the number of 'runs' or iterations to a higher value than the default to get the same clusters across consecutive runs.
