Distributed computing in R [closed] - r

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I worked out an estimator, and I would like to check its performance by doing simulation studies with R. I want to repeat the experiment 500 times. Unfortunately, the computation involved in the estimator is sophisticated: each replication takes 15 minutes on my desktop. I am looking for some distributed computation approaches with R. How should I start? I googled this topic, and there are so many posts about it that I don't know where to begin.

I'd suggest starting with the foreach package. If you're using Mac or Linux, the following is the simplest way to do parallel computing:
# First we register a parallel backend. This will work on Mac and Linux.
# Windows is more complicated; try the `snow` package.
library(doMC)  # also attaches foreach
registerDoMC(cores = 4)  # substitute the number of cores you want to run on
# Now we can run things in parallel using foreach.
# foreach returns its results (a list by default), so assign them.
results <- foreach(i = 1:4) %dopar% {
  # What's in here will run on a separate core for each iteration.
}
You should read the vignette for foreach, as it behaves quite differently from a for loop (especially with nested loops). It is also quite powerful for combining results at the end and returning them.
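As a rough sketch of how the simulation study from the question might look with foreach, the example below uses the doParallel backend (which also runs on Windows) instead of doMC. The function one_replication() is a hypothetical stand-in for the real estimator, and the worker count (2) and number of replications (8) are arbitrary values chosen to keep the example quick.

```r
library(parallel)
library(foreach)
library(doParallel)

# Hypothetical stand-in for the real 15-minute estimator:
# draw a sample and return its mean.
one_replication <- function(i) {
  mean(rnorm(100, mean = 5))
}

cl <- makeCluster(2)      # start 2 worker processes
registerDoParallel(cl)    # tell foreach to use them

# .combine = c collects the per-iteration results into one numeric vector
# instead of the default list.
estimates <- foreach(i = 1:8, .combine = c) %dopar% {
  one_replication(i)
}

stopCluster(cl)           # release the workers when done

summary(estimates)
```

For the real study, 1:8 would become 1:500 and the worker count would be matched to the available cores.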

First step with any R problem as broad as this should be checking the CRAN Task Views. Oh look:
http://cran.r-project.org/web/views/HighPerformanceComputing.html
Note that StackOverflow isn't really the place for asking broad questions that are best answered with 'read that documentation over there' or 'why don't you try using tool X?'


How do I determine which parallel processing package for R to use? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I am exploring parallel programming in R and I have a good understanding of how the foreach function works, but I don't understand the differences between parallel, doParallel, doMC, doSNOW, snow, multicore, etc.
After doing a bunch of reading it seems that these packages work differently depending on the operating system, and I see some packages use the word multicore, and others use cluster (I am not sure if those are different), but beyond that it isn't clear what advantages or disadvantages each have.
I am working on Windows, and I want to calculate standard errors using replicate weights in parallel so I don't have to calculate each replicate one at a time (if I have n cores I should be able to do n replicates at once). I was able to implement it using doSNOW, but it looks like plyr and the R community in general use doMC, so I am wondering if using doSNOW is a mistake.
Regards,
Carl
My understanding is that parallel is a conglomeration of snow and multicore, and is meant to incorporate the best parts of both.
For parallel computing on a single machine, I find parallel to have been very effective.
For parallel computing using a cluster of multiple machines, I've never succeeded in completing the cluster set up using parallel, but have succeeded using snow.
I've never used any of the do* packages, so I'm afraid I'm unable to comment.
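For the single-machine case described in this answer, a minimal sketch with the parallel package could look like the following. It uses a socket cluster, which is the variant that works on Windows. The function replicate_estimate() and the data vector x are hypothetical placeholders for the real replicate-weight computation.

```r
library(parallel)

# Hypothetical per-replicate computation: the mean of one bootstrap resample.
replicate_estimate <- function(r, x) {
  mean(sample(x, replace = TRUE))
}

x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3)

cl <- makeCluster(2)                    # socket cluster with 2 workers
clusterSetRNGStream(cl, iseed = 123)    # reproducible random streams on the workers
reps <- parSapply(cl, 1:20, replicate_estimate, x = x)
stopCluster(cl)

sd(reps)   # spread of the replicate estimates
```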

Can you recommend a package in R that can be used to count precision, recall and F1-score for multi class classification tasks [closed]

Closed 8 years ago.
Is there any package that you would recommend for calculating precision, recall and F1-score for a multi-class classification task in R? I tried to use ROCR, but it states that:
ROCR currently supports only evaluation of binary classification tasks
I know that you were looking for a solution in R. That said, this is a link to a nice solution library in Python, using scikit-learn version 0.14. Python is very similar to R in a lot of respects (if you haven't used it before), and this could be a good place to start.
Another place you might want to look, if you are focused on R, is the PerfMeas package. To quote its description, this "Package implements different performance measures for classification and ranking tasks. AUC, precision at a given recall, F-score for single and multiple classes are available."
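If no package fits, per-class precision, recall and F1 can also be computed directly from a confusion matrix in base R. The sketch below uses made-up labels for a three-class problem; the variable names are illustrative only and do not come from ROCR or PerfMeas.

```r
# Hypothetical true and predicted labels for a three-class problem.
actual    <- factor(c("a", "a", "b", "b", "c", "c", "c", "a"))
predicted <- factor(c("a", "b", "b", "b", "c", "a", "c", "a"),
                    levels = levels(actual))

cm <- table(predicted, actual)    # rows: predicted class, columns: actual class

tp        <- diag(cm)             # correct predictions per class
precision <- tp / rowSums(cm)     # TP / everything predicted as that class
recall    <- tp / colSums(cm)     # TP / everything truly in that class
f1        <- 2 * precision * recall / (precision + recall)

round(cbind(precision, recall, f1), 3)
mean(f1)                          # macro-averaged F1
```

Note that a class that is never predicted yields a division by zero (NaN) for precision, so real data may need an explicit check for that case.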

Free software for mathematical modeling. Is R a good one? [closed]

Closed 9 years ago.
I am looking for free software for mathematical modeling.
Here is a list of things I might want to do with this software: integrating functions, solving differential equations, graph theory, analyzing infinite series, local stability analysis, Taylor series, computing eigenvectors, computing the long-term behaviour of a system of equations, etc.
Here is a related SE post. I am surprised that nobody is suggesting R. I am currently an R user and already use R for graph theory, so I would like to use R for other kinds of mathematical modeling as well. Is R less efficient than Sage, SimPy, Mathematica and others for mathematical modeling? Why? Do you know of a manual explaining how to do mathematical modeling with R?
Thank you
Sounds like R is your first way to go. It does not make much sense to compare R with any other tool in as broad a way as you are asking. R packages differ greatly in efficiency: some are in fact C tools, while others are written in the R language. As a starting point R can hardly be a wrong choice, and it is free.
Matlab might be a stable alternative; Julia is rising but still pre-alpha.
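To give a flavour of a few of the tasks listed in the question, here is a small sketch using only base R. The matrix A is an arbitrary example; solving differential equations would need an add-on package such as deSolve, which is not shown here.

```r
# Numerical integration: area under the standard normal density.
integrate(dnorm, lower = -Inf, upper = Inf)$value

# Symbolic derivative of a simple expression.
D(expression(x^2 * exp(-x)), "x")

# Eigenvalues and eigenvectors, e.g. for local stability analysis
# of a linear system x' = A x (A is an arbitrary example matrix).
A <- matrix(c(-1,  2,
               0, -3), nrow = 2, byrow = TRUE)
ev <- eigen(A)
ev$values
ev$vectors
all(Re(ev$values) < 0)   # TRUE when every eigenvalue has a negative real part
```

When every eigenvalue has a negative real part, the origin of the linear system is asymptotically stable.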

Simple MCMC Bayesian Inference in R [closed]

Closed 9 years ago.
I'm looking for a simple MCMC Bayesian network Inference function/package in R. Essentially, I just want a function that accepts the matrix containing my samples x my variables ( + optional parameters like burn-in and iteration counts) and returns the adjacency matrix of the inferred network.
I had been using the Matlab toolkit "BayesNet", whose simple 'learn_struct_mcmc' function offers most of what I'm looking for. I'm looking for an equivalent in R.
I've been looking through the packages in http://cran.r-project.org/web/views/Bayesian.html, but haven't seen anything that quite does what I'm looking for. I wasn't trained as a statistician, and many of the packages I've looked at on that list either lack documentation or have more complicated statistics than I'm comfortable wiring together myself. I just need a simple function with "reasonable" defaults to get started.
Bonus points for something that leverages Rmpi or snow.
This gave me 132 possibly relevant functions.
library(sos)
findFn("bayesian network")
How about this package?
http://cran.r-project.org/web/packages/MCMCpack/index.html
The closest thing to what I had in mind that I've found is the hc() function in the bnlearn package. It has a variety of other Bayesian network inference functions as well, some of which can use snow.
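A minimal sketch of that workflow, assuming the bnlearn package is installed, might look like this. It uses learning.test, an example data set shipped with bnlearn, in place of the asker's own samples-by-variables data. Note that hc() is a score-based hill-climbing search rather than MCMC.

```r
library(bnlearn)

data(learning.test)        # example discrete data set bundled with bnlearn

fit <- hc(learning.test)   # hill-climbing structure search with default settings
adj <- amat(fit)           # adjacency matrix: adj[i, j] == 1 means an arc i -> j
adj
```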

Tutorial for R vectorised programming [closed]

Closed 8 years ago.
Can someone point me to a good tutorial for using vectorized programming methods in R?
At the moment it feels very magical to me and I don't really understand what R is doing,
especially with regard to if statements and addressing values in neighboring rows.
I am not aware of a specific tutorial on vectorized programming for R.
I have a few versions of my Intro to High-Performance Computing with R tutorial here. The benefit of vectorized code is mentioned in the context of profiling, but it doesn't explain 'how to vectorize code'. I think that is hard to teach -- my best bet would be to read other people's code. Pick a few packages from CRAN and poke around.
Other than that, decent general-purpose documents about R and programming in R include Pat Burns' S Poetry and the more recent R Inferno.
The best way to learn this is to experiment with it since it's an interactive environment, and it's easy to create dummy data.
With regard to comparing neighboring rows, the easiest approach is negative indexing (a negative index means "exclude this position"): a[-1] drops the first element and a[-length(a)] drops the last, so the two vectors line up each value with its predecessor, as in this example:
a <- 1:10
a[5] <- 0
a[-1] > a[-length(a)] # compare each row with the preceding value
If you want to do an if statement, you have two options:
1) The if command only evaluates one value, so you need to ensure that it evaluates to TRUE/FALSE (e.g. use the all or any functions):
if (all(a[-1] > a[-length(a)])) {
  print("each row is incrementing")
} else {
  print(paste("the", which(c(FALSE, a[-1] <= a[-length(a)])), "th row isn't incrementing"))
}
2) You can do a vectorized if statement with the ifelse function. See help("ifelse") for more details. Here's an example:
ifelse(a[-1] > a[-length(a)], 1, 0)
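The same neighbor comparison can also be written with diff(), head() and tail(), which some readers find easier to follow than negative indices. This is only an alternative sketch of the technique above; the variable names are illustrative.

```r
a <- c(1, 2, 3, 4, 0, 6, 7, 8, 9, 10)

# diff() returns a[i + 1] - a[i] for every neighbouring pair,
# so a positive value means the later element is larger.
increasing <- diff(a) > 0

# Same comparison written with head() and tail() instead of negative indices.
increasing2 <- tail(a, -1) > head(a, -1)

# Vectorized replacement for an if statement: label each step.
labels <- ifelse(increasing, "up", "down")

which(!increasing) + 1   # positions whose value is not above the previous one
```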
