As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I'm new to the field of random number generators. I would like to use the Mersenne-Twister algorithm because it has a longer period than the other algorithms.
Which R function implements this algorithm? I looked at
"?sample", but it gives no information about which algorithm is used.
Another question: which seed is best to set for random number generation?
Finally: is R the best tool to generate random numbers?
The default algorithm used by R is Mersenne-Twister. Calling RNGkind() reports the generator currently in use, and ?RNGkind documents the alternatives.
There is no best seed. It depends on your application. Do you want it to be the same set of numbers every time you run your code? Use the same seed(s). If not, perhaps using the current time will suit your needs.
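The point about seeds is not specific to R. As a minimal illustration in Python (whose standard random module also uses Mersenne Twister), the sketch below shows that the same seed reproduces the same sequence and a different seed gives a different one. The helper name draw and the seed values 42 and 43 are arbitrary choices made for this example.

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, does not touch global state
    return [rng.random() for _ in range(n)]

same_a = draw(42)
same_b = draw(42)     # same seed: identical sequence
different = draw(43)  # different seed: different sequence

print(same_a == same_b)
print(same_a == different)
```

In R the equivalent step is a call to set.seed() before drawing numbers.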
The best tool to generate random numbers is something that does not use a deterministic PRNG (such as Mersenne-Twister). Instead, look into something such as random.org. I think it will really benefit you to read up on true randomness vs. pseudo-randomness.
Closed 9 years ago.
I am a programmer, but when I am faced with complex mathematical formulas I often get stuck.
Please suggest a good video lecture resource that teaches how to read math symbols, quantifiers, etc.
This formula means nothing without context. It seems to be the derivation of the partial derivative of Ep with respect to yhp, which turns out to be the negative of the sum of the products of δop and wpo, with o ranging from 1 to No:
def partial_of_E_wrt_y(p):
    # No, delta and w are assumed to be defined elsewhere
    acc = 0
    for o in range(1, No + 1):  # o runs from 1 to No inclusive
        acc = acc + delta[p][o] * w[p][o]
    return -acc
E, y and δ may be tensors because of the use of superscript indexes. This would also mean that δopwpo could be a tensor product. Or it could be that the author simply likes using superscript indexes without any association with tensors, a convention I have seen in some texts on machine learning. If δ has not been given any other interpretation, it's possible that it stands for the Kronecker delta, which would mean δop = 1 if o=p, and 0 otherwise.
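To see what the Kronecker delta reading would imply, here is a small sketch. It assumes δ really is the Kronecker delta (which the formula alone does not confirm) and uses a made-up weight table w keyed by (p, o); the names kronecker_delta and neg_sum are hypothetical. Under that assumption every term with o ≠ p vanishes, so the sum collapses to the single weight with o = p.

```python
def kronecker_delta(i, j):
    """1 if the two indices are equal, 0 otherwise."""
    return 1 if i == j else 0

def neg_sum(p, w, n_o):
    """Negative of the sum over o = 1..n_o of delta(o, p) * w[(p, o)]."""
    return -sum(kronecker_delta(o, p) * w[(p, o)] for o in range(1, n_o + 1))

# Made-up weights for illustration only: w[(p, o)] = 10 * p + o
w = {(p, o): 10 * p + o for p in range(1, 4) for o in range(1, 4)}

result = neg_sum(2, w, 3)
print(result, -w[(2, 2)])  # both values are equal: only the o == p term survives
```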
Closed 10 years ago.
When I try to visualize my integer data with histogram(mydata,breaks=c(0,n)), R usually ignores how many breaks I ask for (usually one bar for each sample) and plots n-1 bars (the first two bars are merged into one).
In most cases I use barplot(table(mydata))
There is one more way to do it, described in
"How to separate the two leftmost bins of a histogram in R",
but I don't think it is a "clear" way.
So how do you visualize the frequency of your integer data?
Which one is right?
Thanks a lot.
hist(dataset, breaks=seq(min(dataset)-0.5, max(dataset)+0.5, by=1) )
Another option (for those situations where you know these are integers) would be:
require(lattice)
barchart(table(dataset), horizontal=FALSE)
Or:
barplot(table(dataset))
Closed 10 years ago.
I'm looking for a basic implementation of EM clustering in R. So far, what I can find seem to be specialized or 'some-assembly-required' versions of it. For example, the implementation from mclust defines a range of parameters that I'm not familiar with and doesn't take a parameter for k. What I am looking for is something closer to the kmeans implementation that comes with R, or ELKI's implementation of EM.
How about reading the documentation for mclust?
http://cran.r-project.org/web/packages/mclust/mclust.pdf
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Expectation_Maximization_%28EM%29
Make sure to choose the desired model (probably VVV?), and if you want a fixed k, then set G to a single value instead of the default 1:9.
Try this:
library(mclust)
m <- Mclust(data, G=4, modelNames="VVV", control=emControl(tol=1e-4))
I must say I don't use or like R much. It has tons of stuff, but it doesn't fit together. It's just random stuff written independently by random people and then uploaded to a central repository. But there is no QA at all, and nobody that makes libraries compatible.
Closed 10 years ago.
I have a data frame composed of 25 columns and ~1M rows, split into 12 files. I need to import them and then use a reshape package to do some data management. Each file is so large that I have to look for a "non-RAM" solution for importing and processing the data. Currently I don't need to run any regressions; I only need some descriptive statistics about the data frame.
I searched a bit and found two packages: ff and filehash. I read the filehash manual first and it seems simple: I just add some code to import the data frame into a file, and the rest seems to be similar to the usual R operations.
I haven't tried ff yet, as it comes with lots of different classes, and I wonder if it is worth investing time in understanding ff itself before my real work begins. But the filehash package seems to have been static for some time and there's little discussion about it, so I wonder if filehash has become less popular, or even obsolete.
Can anyone help me to choose which package to use? Or can anyone tell me what is the difference/ pros-and-cons between them? Thanks.
update 01
I am currently using filehash to import the data frame, and I have realized that a data frame imported using filehash should be considered read-only: further modifications to that data frame are not stored back to the file unless you save it again. That is not very convenient in my view, as I need to remind myself to do the saving. Any comments on this?
Closed 10 years ago.
What are some good libraries for handling mathematical functions and these types of things (preferably open source)?
In particular:
Derivative of a function.
Solving a function for a particular variable, not always for a real value, but in terms of other variables.
Ex. Solving x^2 + y^2 = y for y in terms of x.
Graphing functions.
Ability to handle piece-wise functions.
scipy or gsl
What I was looking for is symbolic mathematics.
SymPy is a very good Python library for this with very little special syntax to learn.
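As a rough sketch of how the items from the question might look in SymPy (it assumes SymPy is installed; the sample expressions other than x^2 + y^2 = y are made up for illustration):

```python
import sympy as sp

x, y = sp.symbols("x y")

# Derivative of a function
derivative = sp.diff(sp.sin(x) * x**2, x)

# Solve x^2 + y^2 = y for y in terms of x
solutions = sp.solve(sp.Eq(x**2 + y**2, y), y)

# A piece-wise function: x^2 for x < 0, x otherwise
f = sp.Piecewise((x**2, x < 0), (x, True))
f_at_minus_2 = f.subs(x, -2)
f_at_3 = f.subs(x, 3)

print(derivative)
print(solutions)
print(f_at_minus_2, f_at_3)
```

For graphing, SymPy also provides sympy.plot, which relies on matplotlib being available.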