I have an old S-plus script, and I would like to reproduce the results in R. The only issue I'm having is the random seed. I know they use different algorithms for the pseudo-random number generation. In the S-plus file the seed was set using:
set.seed(337)
The relevant information I could find is
S-plus set seed
R set seed
From this documentation it looks like S-plus used the "Super-duper" algorithm for pseudo-random number generation. R has this option, but it says it takes in 2 integers, while S-plus only requires 1 integer between 0 and 1000. Furthermore the R doc says
The two seeds are the Tausworthe and congruence long integers, respectively.
A one-to-one mapping to S's .Random.seed[1:12] is possible but we will not
publish one, not least as this generator is not exactly the same as that in
recent versions of S-PLUS.
I'm not quite sure what this means. So does anyone know if it would be possible to replicate results?
An old post on the R mailing list tries to get the same results in S, R, and S-PLUS.
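For reference, here is a minimal sketch of selecting R's "Super-Duper" generator with a single integer seed. It only shows that results are reproducible within R. It does not show that the draws match S-PLUS output, and the documentation quoted above says the generator is not exactly the same. The variable names are illustrative.

```r
# Remember the current generator settings so they can be restored afterwards.
old_kind <- RNGkind()

# Select R's "Super-Duper" generator; set.seed() still takes a single integer.
RNGkind("Super-Duper")
set.seed(337)
first_draw <- runif(3)

# Re-seeding with the same integer reproduces the same draws within R.
set.seed(337)
second_draw <- runif(3)
same_in_r <- identical(first_draw, second_draw)

# The internal state holds a code for the generator kind plus the two seed integers.
state_length <- length(.Random.seed)

# Restore the previous uniform and normal generators.
RNGkind(old_kind[1], old_kind[2])
```

R expands the single integer passed to set.seed() into the two internal seeds, which is why the one-argument call works even though the generator itself uses two integers.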
Related
I am writing R code in which I use the set.seed() function throughout the program to generate data, pass that data to a function, plot the function, and then use optim to find the minimum. The issue is that the graph of the function changes if I change the seed value, and sometimes the graph is not even concave but looks exponential.
I don't understand why this is happening or how I can fix it. If anyone can point me to a reference on this subject or suggest what can be done, that would be great.
Thanks in advance
set.seed() configures the random number generator to start from that seed. This may be a bit more complicated, depending on the precise implementation, but the effects are always the same: The sequence of numbers will be identical.
This is useful in a number of applications where you want some randomness, but you want to get the same result if you re-run the code. Say for example you need to randomly sample your data, but since you are debugging, it's useful if you get the same sample so that the bugs don't disappear on you.
Also if you want other people to replicate the results, you simply pick some random number as the seed and tell them that you used that seed. Anything in the algorithm based on random numbers will behave the same because you are both using the same sequence of numbers.
For your graph problem you need to share some code so that people understand what you are doing. It's very hard to guess what went wrong. At the outset it seems that your algorithm is very strongly influenced by the random numbers (usually not a good sign).
Put simply, if you set a seed and draw a random number, that number will always be the same. If you do not set a seed, you will get a different number each time you draw. The seed lets you replicate your experiment.
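The behaviour described above can be checked with a short sketch. The seed value 2024 and the variable names are arbitrary choices for illustration.

```r
# Draw five values after setting a seed.
set.seed(2024)
first_run <- rnorm(5)

# Setting the same seed again restarts the sequence from the same point.
set.seed(2024)
second_run <- rnorm(5)

# Without re-seeding, the generator simply continues its sequence.
third_run <- rnorm(5)

# TRUE when both seeded runs produced the same five numbers.
runs_match <- identical(first_run, second_run)
```

If results change a lot between seeds, as in the graph question, comparing several seeds this way is a quick check of how sensitive the computation is to the particular random draws.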
I am running a function with a random component in R, and I realized that I forgot to call set.seed() before running it.
Is there a way I can retrieve the seed value R is using (I suppose it will be an arbitrary number, but doesn't matter) so I can reproduce the execution?
You can get the current random state using .Random.seed. However, you'd need the previous state for reproducing your results and it's not possible to get that.
(Well, the Mersenne Twister is not cryptographically secure, so maybe it could be possible, but certainly not practical.)
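For future runs, one option is to save the state before calling the function. This is a sketch that assumes you can capture .Random.seed in advance; it does not recover a state that has already been consumed. The names saved_state, x and y are illustrative.

```r
# .Random.seed only exists once the generator has been used or seeded,
# so draw one number first if it is missing.
if (!exists(".Random.seed", envir = globalenv())) {
  invisible(runif(1))
}

# Save the state BEFORE the random computation.
saved_state <- get(".Random.seed", envir = globalenv())
x <- rnorm(5)

# Put the saved state back and repeat the computation.
assign(".Random.seed", saved_state, envir = globalenv())
y <- rnorm(5)

# TRUE when the replay produced the same numbers.
replayed <- identical(x, y)
```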
Today I came across the set.seed function in R for the first time.
It's useful at times, and I understand how to use it. But I have a small problem: how do I choose a good number for the first argument of this function?
That raises another question: how does the first argument of set.seed() influence random number generation in R? Maybe if I understand that, I will have the answer to the first question.
Thanks a lot.
In a nutshell:
Calling set.seed() specifies the starting point for the pseudo-random number generators that create the random numbers in R. See ?set.seed
Because computers are deterministic, a program cannot produce a truly "random number" by itself.
Computers always have to use an algorithm to generate so-called "pseudo-random numbers".
These generators usually work iteratively, so each number is determined by its predecessor. set.seed() defines the initial predecessor and thereby makes the pseudo-random numbers reproducible. Which number you choose is irrelevant in most cases.
(see here: http://en.wikipedia.org/wiki/Pseudorandom_number_generator)
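To illustrate the iterative idea, here is a toy multiplicative congruential generator. It is not the algorithm R uses by default (that is the Mersenne Twister); the function name toy_generator and its constants are chosen only for illustration.

```r
# Toy generator: each state is computed from the previous one, so the
# seed (the first state) fixes the whole sequence.
toy_generator <- function(seed, n) {
  modulus <- 2147483647    # 2^31 - 1
  multiplier <- 16807
  state <- seed            # the seed plays the role of the initial predecessor
  draws <- numeric(n)
  for (i in seq_len(n)) {
    state <- (multiplier * state) %% modulus  # next state from the previous one
    draws[i] <- state / modulus               # scale into the interval (0, 1)
  }
  draws
}

run_a <- toy_generator(seed = 337, n = 5)
run_b <- toy_generator(seed = 337, n = 5)  # same seed, same sequence
run_c <- toy_generator(seed = 338, n = 5)  # different seed, different sequence
```

No seed is "better" than another here: every seed just picks a different starting point in the same deterministic cycle.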
pvclust is great for cluster analysis in R. However, when running it as part of a batch operation, it is annoying to get different results for the same data. Obviously, there are many "correct" clusterings of the same data, and it seems that pvclust uses some randomness to determine the clusters of a specific run. But is there any way to get deterministic results?
I want to be able to present a minimal, repeatable analysis package: the data plus an R script, and a separate written document that contains my interpretations of the clustering. It is then possible for others to add to the analysis, e.g. by changing the aesthetic appearance of plots. Now, the interpretations will always be out of sync with what someone else gets when they run the script containing pvclust.
This applies not only to cluster analysis: whenever randomness is involved, you can fix the random number generator so you always get the same results.
Try:
set.seed(seed=123)
# your code here
The seed can be any integer, or something that can be converted to integer. And that's all.
I've only used k-means. There I had to set the number of 'runs' or iterations to a higher value than the default to get the same clusters on consecutive runs.
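As a sketch of both suggestions, the example below uses kmeans from base R as a stand-in for pvclust (an assumption made so that the example runs without extra packages). The toy data and the choice of nstart = 10 are illustrative only.

```r
# Build a small toy data set with two well-separated groups.
set.seed(123)
toy_data <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2)
)

# Fix the seed immediately before each clustering call.
set.seed(123)
fit_a <- kmeans(toy_data, centers = 2, nstart = 10)

set.seed(123)
fit_b <- kmeans(toy_data, centers = 2, nstart = 10)

# TRUE when both runs assigned every point to the same cluster label.
same_clusters <- identical(fit_a$cluster, fit_b$cluster)
```

Raising nstart makes the result less dependent on a single random start, while set.seed() makes it exactly repeatable.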
I am generating data in R and Matlab for 2 separate analyses and I want to determine if the results in the two systems are equivalent. Between the 2 sets of code there is inherent variability due to the random number generator. If possible, I would like to remove this source of variability. Does anyone know of a way to set the same starting seed in both Matlab and R? I provide some demo code below.
%Matlab code
seed=rng %save seed
matlabtime1=randn(1,5) %generate 5 random numbers from standard normal
rng(seed) %get saved seed
matlabtime2=randn(1,5) %generates same output as matlabtime1
#R code
set.seed(3) #set the seed
r.time1=rnorm(5) #generate 5 random numbers from standard normal
set.seed(3) #reset to the same seed
r.time2=rnorm(5) #generates same output as r.time1
Essentially, I want the results from matlabtime2 and r.time2 to match exactly. (The code I am using is more complex than this illustrative demo so rewriting in one language only is not really a feasible option.)
I'm finding it difficult to get the same random numbers in R and
MATLAB - even using the same seed for the same algorithm (Mersenne
Twister).
I guess it's about how they are implemented - even with the same seed, they have different initial states (you can print and inspect the states both in R and MATLAB).
In the past when I've needed this, I generated random input, saved it as a file on disk, and fed it to both MATLAB and R.
Another option is to write C wrappers for a random number generator (there are many of these in C/C++) both for R and MATLAB and invoke those instead of the built-in ones.
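The file-based approach mentioned above could look roughly like this on the R side. The file location and variable names are hypothetical; MATLAB would then read the same file, for example with readmatrix.

```r
# Generate the random input once, in R.
set.seed(3)
shared_input <- rnorm(5)

# Write one value per line with 17 significant digits so that very little
# precision is lost in the text representation.
input_file <- tempfile(fileext = ".csv")  # hypothetical location
writeLines(sprintf("%.17g", shared_input), input_file)

# Both analyses then read this file instead of calling their own generators.
read_back <- as.numeric(readLines(input_file))

# TRUE when the values read back agree with the ones generated.
round_trip_ok <- isTRUE(all.equal(shared_input, read_back))
```

This sidesteps the seed question entirely: both programs consume identical numbers, so any remaining differences come from the analyses themselves.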