So there is now a 64-bit build of R for Windows. I'd like to know whether anyone has found incremental benefits in using 64-bit R over the 32-bit version on Windows.
I'm looking for more specific information:
What was the system specification (e.g. 6 GB RAM) and the largest dataset that was crunched?
Which algorithms performed faster?
Any other experiential information that motivated you to adopt the 64-bit version on Windows.
If I had a $ for all the times non-R users cribbed about R's data limitations...!
I want to showcase R at my workplace and want some testimonials to prove that, with a decently powerful machine, 64-bit R on Windows can crunch gigabyte-class datasets.
I don't have anything quantitative, but it's been well worth the upgrade. 64-bit Windows (7) is far more reliable, and you can simply run more large jobs at once. The main advantage of 64-bit R is the ability to create large objects that don't hit the 32-bit limit. I have 12 GB of RAM and have worked with objects around 8 GB in size for simple tests. Usually I wouldn't have any single R process using more than 1-2 GB, but the overall performance is great.
I'd be happy to run examples if you want specifics.
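For illustration, here is a minimal sketch (the sizes are hypothetical) of the kind of allocation that only works in 64-bit R; memory.limit() applies to Windows builds of R from this era:

# A numeric vector of 5e8 doubles is roughly 4 GB, well beyond the
# ~2-3 GB ceiling of 32-bit R on Windows.
x <- numeric(5e8)
print(object.size(x), units = "Gb")
memory.limit()   # Windows only: reports the current memory ceiling in MB

On 32-bit R this allocation fails with "cannot allocate vector of size ...", while 64-bit R is limited only by the available RAM.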
Is there a way (perhaps a website) where one can run R online in a Linux environment?
Motivation: I develop a few packages in R and oftentimes I need to run tests on Linux. However, I use Windows and don't want to go through the hassle of learning Linux and installing it locally.
A few suggestions:
Install Docker to get a 'virtual' Linux on your Windows computer. That gives you essentially unlimited use on your own machine, letting you learn and test.
You can also go to rstudio.cloud to run a few hours of R within RStudio (Cloud) per month for free; if you need more hours, you can purchase them. This is possibly the easiest immediate approach, but it comes with a usage cap.
Similarly, Google Colab can run R in its notebooks, but the feature is still somewhat hidden. One source with tips is this SO answer.
If you want to (or can) test in batch mode, then R-hub is good. There is also a CRAN package, rhub, to interact with it. You need to create a token; this is documented, and a minimal sketch appears after this list.
Last but not least, CI providers let you run on their systems. GitHub Actions is popular and supports many operating systems and variants; GitLab had something similar much earlier too. My r-ci setup aims to facilitate this without tying you to a CI provider "forever". If you just want GitHub Actions, follow one of the many tutorials for it.
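To illustrate the batch-mode option, here is a hypothetical sketch assuming the rhub v1 API; the platform name is just an example, and the available Linux builders can be listed with rhub::platforms():

# install.packages("rhub")
library(rhub)
validate_email("you@example.com")        # one-time: request and store a token
check(path = ".",                        # path to your package source
      platform = "debian-gcc-release")   # an example Linux builder

The check runs on R-hub's Linux machines and the results are reported back by email.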
Both RStudio Cloud and rdrr.io/snippets run on Linux (according to Sys.info()).
Background: I want to build a simple distributed environment in R that can run some data-heavy jobs on Windows, for example "big" matrix multiplication. There seem to be various solutions, and I worked on them for a while, but I couldn't get any of them to work.
I have already tried Rserve & RSclient, and packages such as snow and snowfall.
I tried several approaches, but I can't find a proper way to transfer data between clients, and it could be a disaster if all the data transfer has to go through the master.
Question: Are there any functions to send a matrix between any two computers in a cluster?
My idea is that a socket connection might work, but how can I start it gracefully? Do I have to start the R script on each computer manually, since there seems to be no SSH on Windows? I have to work on this because of my professor.
I wanted to know whether this is good practice. Thanks in advance.
One option would be to use SparkR.
You will have to use the Spark APIs to distribute your data, and there's a chance certain packages won't behave as expected, but it would do the job.
A Spark standalone cluster is made up of a master, accessible via HTTP, and multiple workers. It's not the ideal solution for resource sharing, but it's lighter than a Hadoop + Spark-on-YARN setup.
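As a rough illustration, here is a hedged sketch of what this could look like from R; it assumes SparkR is installed alongside Spark and that a standalone master is reachable at the (hypothetical) address spark://master-host:7077:

library(SparkR)
sparkR.session(master = "spark://master-host:7077", appName = "matrix-demo")

# Distribute independent matrix jobs across the workers; each task builds
# its inputs locally, so the full matrices never pass through the master.
res <- spark.lapply(1:4, function(i) {
  a <- matrix(rnorm(1e6), 1000, 1000)
  b <- matrix(rnorm(1e6), 1000, 1000)
  sum(a %*% b)      # return only a small summary, not the full product
})

sparkR.session.stop()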
Finally, you could try Dataiku, which provides this kind of ability via notebooks, Spark integration and dataset management. The community edition is not collaborative, but they provide free licenses to schools.
I currently have an R query that does parallel processing within a loop using foreach, but it runs on a single server with 32 cores. Because of my data size, I am trying to find R packages that can distribute the computation across different Windows servers and still work with foreach for parallelisation.
I really appreciate your help!
For several releases now, R has been shipping with a base library parallel. You could do much worse than starting to read its rather excellent (and still short) pdf vignette.
In a nutshell, you can just do something like
mclapply(1:nCores, someFunction)
and the function someFunction will be run in parallel across nCores cores. Half your physical cores may be a good default to start with.
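Note that mclapply() relies on forking, which is not available on Windows, so for the multi-server Windows case a socket (PSOCK) cluster from the same parallel package is the usual substitute. A hedged sketch, where the remote host names are hypothetical and require either a remote shell or manually started workers:

library(parallel)

nCores <- max(1L, detectCores() %/% 2)     # half the cores, as suggested above
cl <- makePSOCKcluster(nCores)             # local workers; pass host names such as
                                           # c("server2", "server3") for remote
                                           # machines (ssh or manual = TRUE needed)
res <- parLapply(cl, 1:100, someFunction)  # someFunction as defined by you
stopCluster(cl)

The same cluster object can be registered for foreach via registerDoParallel(cl) from the doParallel package, which addresses the foreach requirement in the question.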
The Task View on High-Performance Computing has many more pointers.
SparkR is the answer. From "Announcing SparkR: R on Apache Spark":
SparkR, an R package initially developed at the AMPLab, provides an R frontend to Apache Spark and using Spark’s distributed computation engine allows us to run large scale data analysis from the R shell.
Also see SparkR (R on Spark).
To get started you need to set up a Spark cluster; this web page should help. The Spark documentation for running without Mesos or YARN as your cluster manager is here. Once you have Spark set up, see Wendy Yu's tutorial on SparkR. She also shows how to integrate H2O with Spark, which is referred to as 'Sparkling Water'.
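For a first test before wiring up a full cluster, here is a hedged sketch of a local SparkR session; it assumes SPARK_HOME points at an installed Spark distribution that ships the SparkR package (the grouping example follows the official SparkR documentation):

library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))
sparkR.session(master = "local[*]", appName = "sparkr-intro")

df <- as.DataFrame(faithful)    # distribute a local data frame as a SparkDataFrame
head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))

sparkR.session.stop()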
I am currently developing a solution in R and I need to know the system requirements for R on a Windows machine for documentation purposes.
It's a question beyond "would R run on my machine", since I need the exact specifics. I know for a fact that it already runs without any problem, but I need to document these requirements for the administrator on the IT team.
Thank you so much for your collaboration!
From An Introduction to R (https://cran.r-project.org/doc/manuals/r-release/R-intro.html)
--max-mem-size=N
(Windows only) Specify a limit for the amount of memory to be used both for R objects and working areas. This is set by default to the smaller of the amount of physical RAM in the machine and, for 32-bit R, 1.5Gb, and must be between 32Mb and the maximum allowed on that version of Windows.
Note, this is specific to Windows machines. I haven't seen anything regarding other operating systems. I've never seen anything about processors or other hardware either.
As far as I can tell, if you have a computer with a processor and at least 32 MB of RAM, it will run R (no guarantees on how well).
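If you need concrete figures for the documentation, one low-effort approach is to record what the machine that already runs the solution reports; a minimal sketch (memory.limit() is Windows-only in the R versions discussed here):

sessionInfo()                                     # R version, platform, attached packages
Sys.info()[c("sysname", "release", "machine")]    # OS name, version, architecture
memory.limit()                                    # Windows only: memory ceiling in MB
.Machine$sizeof.pointer * 8                       # 32- vs 64-bit build of R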
I learnt from the web that Revolution R allows multi-threading and optimizes the running of my R scripts.
My question is: after installing Revolution R, if I run my R script under the Revolution R environment, will it automatically optimize its execution? Or do I need to modify my R script so that Revolution R can optimize it?
Thanks a lot.
I think your terminology may need some refinement. You may need to distinguish multi-processing from multi-threading. Revolution R does link to a multithreaded BLAS library for Windows that might otherwise not be available unless you compiled your version. Whether or not that will improve your performance is apparently somewhat dependent on what functions you use.
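A crude, hedged way to see whether the multithreaded BLAS helps your particular workload is to time a large matrix multiplication under both setups and watch CPU usage; matrix products are exactly the kind of operation a threaded BLAS accelerates:

n <- 2000
a <- matrix(rnorm(n * n), n, n)
b <- matrix(rnorm(n * n), n, n)
system.time(a %*% b)   # the elapsed time drops sharply with a multithreaded BLAS
sessionInfo()          # recent R versions also report which BLAS/LAPACK is linked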
To use multi-processing in R, you will need to set up your machine resources appropriately and then use code that distributes the parallelizable tasks. Those seem to be the applications you are thinking about when you ask about modifying your scripts. Revo-R used to have advantages here over regular R, but for the last couple of versions the 'parallel' package has been available to all useRs.
Revo R has a multithreaded BLAS; this does not require any change to your scripts.
And GNU R, or Standard R, can of course also use multithreaded BLAS as detailed in Appendix A.3.1 of the R Installation and Administration manual.