As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I was reading an interesting post on R-bloggers, "Object Oriented Programming in R using S4 Classes". The book "Statistics and Computing" by Venables and Ripley has some chapters introducing S3 and S4 classes in S and R, and it has been useful to me for understanding the concept of object-oriented programming in R.
Do you know of any useful book(s) that introduce object-oriented programming in R in more detail, with examples like the ones in the R-bloggers post?
As the commenters said, "Software for Data Analysis" by John Chambers is excellent. I would also recommend the R manual "Writing R Extensions", although it can get quite technical. For more introductory sources, I would look into these documents: How Methods Work by John Chambers, this S4 tutorial by Christophe Genolini, and this useful PowerPoint, which is a nice high-level overview.
Closed 10 years ago.
I'm new to the field of random number generators. I would like to use the Mersenne-Twister algorithm, since it has the longest period compared with other algorithms.
Which R function implements this algorithm? I looked at "?sample", but it gives no information about which algorithm is used.
Another question: what is the best seed to set for random number generation?
Finally: is R the best tool to generate random numbers?
The default algorithm used by R is Mersenne-Twister. You can check or change the generator with RNGkind() and set the seed with set.seed().
There is no best seed. It depends on your application. Do you want it to be the same set of numbers every time you run your code? Use the same seed(s). If not, perhaps using the current time will suit your needs.
The best tool to generate random numbers is something that does not use a deterministic PRNG (such as Mersenne-Twister). Instead, look into something such as random.org. I think it will really benefit you to read up on true randomness vs. pseudo-randomness.
Closed 9 years ago.
When providing example code, "foo" and "bar" are commonly used to represent "arbitrary values", to the point they are almost standard notation.
Are there more "standard" terms for when you want to show more than two arbitrary values?
i.e., is there a standard list of terms whose first two are "foo" and "bar"?
Those are known as "metasyntactic variables". I would not consider any of them standard, but Wikipedia offers the following as common in the U.S.:
foo
bar
baz
qux
quux
corge
grault
garply
waldo
fred
plugh
xyzzy
thud
Source: http://en.wikipedia.org/wiki/Metasyntactic_variable
Personally, I have only seen foo, bar, baz and xyzzy used from the list. The list was cited from RFC 3092.
Closed 10 years ago.
I'm looking for a basic implementation of EM clustering in R. So far, everything I can find is a specialized or 'some-assembly-required' version of it. For example, the implementation from mclust defines a range of parameters that I'm not familiar with and doesn't take a parameter for k. What I am looking for is something closer to the kmeans implementation that comes with R, or ELKI's implementation of EM.
How about reading the documentation for mclust?
http://cran.r-project.org/web/packages/mclust/mclust.pdf
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Clustering/Expectation_Maximization_%28EM%29
Make sure to choose the desired model (probably VVV?), and if you want a fixed k, then set G to a single value instead of the default 1:9.
Try this:
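library(mclust)
# Fit a 4-component mixture (G = 4) with the unconstrained "VVV" covariance model;
# the tolerance is 1e-4 (the original "e1-4" was a typo and would not parse as a number)
m <- Mclust(data, G = 4, modelNames = "VVV", control = emControl(tol = 1e-4))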
I must say I don't use or like R much. It has tons of stuff, but it doesn't fit together. It's just random stuff written independently by random people and then uploaded to a central repository. But there is no QA at all, and nobody who makes the libraries compatible.
Closed 10 years ago.
I have a dataframe composed of 25 columns and ~1M rows, split into 12 files. I need to import them and then use some reshaping package to do some data management. Each file is so large that I have to look for a "non-RAM" solution for importing and data processing. Currently I don't need to do any regression; I will only compute some descriptive statistics about the dataframe.
I searched a bit and found two packages: ff and filehash. I read the filehash manual first and found that it seems simple: I just added some code to import the dataframe into a file, and the rest seems to be similar to the usual R operations.
I haven't tried ff yet, as it comes with lots of different classes, and I wonder if it is worth investing time in understanding ff itself before my real work begins. But the filehash package seems to have been static for some time and there is little discussion about it, so I wonder if filehash has become less popular, or even obsolete.
Can anyone help me choose which package to use? Or can anyone tell me the differences and pros and cons between them? Thanks.
update 01
I am currently using filehash to import the dataframe, and I realize that a dataframe imported using filehash should be considered read-only: further modifications to that dataframe are not stored back to the file unless you save it again. That is not very convenient in my view, as I need to remind myself to do the saving. Any comments on this?
Closed 10 years ago.
What are some good libraries (preferably open source) for handling mathematical functions and these types of things?
In particular:
Derivative of a function.
Solving a function for a particular variable, not always for a real value, but in terms of other variables.
Ex. Solving x^2 + y^2 = y for y in terms of x.
Graphing functions.
Ability to handle piece-wise functions.
scipy or gsl
What I was looking for is symbolic mathematics.
SymPy is a very good Python library for this, with very little special syntax to learn.
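As a rough sketch of how SymPy covers the items in the question: the equation x^2 + y^2 = y is the one from the question, while the functions f and g below are made up purely for illustration.

```python
import sympy as sp

x, y = sp.symbols("x y")

# Derivative of a function (f is an arbitrary example, not from the question)
f = x**3 + sp.sin(x)
df = sp.diff(f, x)  # 3*x**2 + cos(x)

# Solve x^2 + y^2 = y for y in terms of x; the result is a list of
# symbolic expressions in x, one per branch of the quadratic
solutions = sp.solve(sp.Eq(x**2 + y**2, y), y)

# A piece-wise function: x^2 for x < 0, otherwise x (again just an example)
g = sp.Piecewise((x**2, x < 0), (x, True))
dg = sp.diff(g, x)  # differentiation is applied piece by piece

print(df)
print(solutions)
print(g.subs(x, -2), g.subs(x, 3))  # evaluates the matching branch: 4 3
print(dg)
```

For graphing, sympy.plot(f, (x, -2, 2)) should draw the function, assuming matplotlib is installed.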