I am working with large datasets that are available in *.mdb (i.e access database) format. I am using RODBC R package to extract data from access database. I figured out that I have 32 bit office installed on my machine. Since, I have 32 bit office installed, it seems I can use only 32 bit R in order to connect to the access database using RODBC. After I read the data using 32 bit R, then doing some exploratory analysis (plotting data, summary / regression), I got the memory issues which I didn't get while using 64-bit R.
Currently, I am using Rstudio to run all my code and I could change the version of R that I use from Options >> Global Options >> R version:
However, I don't want to switch to 32-bit while reading access database using RODBC and then go back to R-studio to revert back to 64-bit for analysis. Is there an automatic solution which allows me to specify 32-bit or 64-bit ? Can we do that using batch file ? If anyone could shed some light that would be great.
Write your code that extracts the data as one R script. Have that script save the output data that you need for your analysis to an .RData file.
Write the code that you run your analyses in, to be run in 64-bit R. Using the answer found here, run your code using the 32-bit R. Then, the next line can be reading the data in from the .RData file. If needed to allow things to load, use Sys.sleep to have your first program wait a few seconds for the load to complete.
Related
I've created a program that runs in R that I plan on distributing among a lot of other people. Currently the R script is ran completely automatically and behind the scenes with one .sh script which is exactly how it is intended to be. I'm trying to make it so theres no need for client intervention. The R script itself loads the packages and installs them if they aren't present which takes away the task of them installing the packages themselves.
Is there a way I can provide a folder within my Application's folder that they already download that contains R-script and its dependencies so the code can use that location of Rscript to compile and run the R-program I have created. The goal is to be able to download it and run without the need of internet connection to download R and maybe even the programs required packages if possible.
Any help or ideas is appreciated.
I assume that process you want called "creating binary package". Binary is programs (like EXE files) which can run directly on target CPU without any interpreter software (like Python interpreter for python scripts, or Java VM for java applications). I'm not so familiar with packaging of R programs but I found some materials regarding this issue:
1 - Building binary package R
2 - https://seandavi.github.io/post/build-linux-r-binary-packages/
3 - https://support.rstudio.com/hc/en-us/articles/200486508-Building-Testing-and-Distributing-Packages
Second link assumes Linux as target system. Opposite to interpreted languages, binary files often OS dependent (Linux, Windows, or Mac). I, personally, don't know how compatible are packages between Linux systems with different library sets.
Please comment if you find some information misleading, I'll correct the answer.
I want to follow the advice I've read and heard to have both a main library in R_HOME/library and a user library. I'm using W10 on a desktop machine (not important, except that it gives me a name by which to refer to it), and I can't make R use the user library.
I have succeeded in doing that on a W10 laptop: C:/R/R-4.0.2/library contains some 30 recommended packages, and C:/Users/[username]/Documents/R/win-library/4.0 con contains a much larger number of packages in my user library.
As I recall, and as I wrote down when I did an upgrade on a server, all you have to do to create a site-library is to create a directory called C:/R/R-4.0.2/site-library, and R will use that the next time it starts.
To create a user library, create the directory C:/Users/[username]/Documents/R/win-library/4.0.
That seemed to work on my laptop, for I have seemingly a working R library and a user library there.
That seemed to work on the server, too: I have a library and a site-library.
In both cases, .libPaths() shows the same libraries that I see with Dired on the disk.
I tried to do the same thing on the desktop machine, and i can't make it work.
I created a directory C:/Users/[username]/Documents/R/win-library/4.0, restarted R, and ran .libPaths(); the only directory that was listed was C:/R/R-4.0.2/library.
Because I thought the Documents in that path seemed odd, I tried it again using C:/Users/[username]/R/win-library/4.0, still with no success.
https://cran.r-project.org/doc/manuals/r-release/R-admin.html#Managing-libraries seems pertinent, but I'm not sure how to interpret the output of Sys.getenv("R_LIBL_USER). I get "\\[toplevel]\[nextlevel]\Home$\[username]/R/win-library/4.0", which I presume is a long-winded way to get to /Home$/[username]/R/win-library/4.0 (aka C:/Users/[username]/R/win-library/4.0.
Suggestions? I've tried a number of other suggestions from SO, all to no avail.
I have two different versions of R installed, one which is up to date and which I use for all my regular R coding (needs to be up to date so that I can use various updated and new packages) and one which I use to access OLAP cubes (needs to be the R Client from Microsoft, because this is the only one which supports the olapR package, and which currently uses R version 3.4.3).
Since, in theory, I only have to access the OLAP cube once a month, I "outsourced" this task to a different RStudio project, in which I download and save the required data for all other projects. Hence, all other projects never require the olapR package to be installed and can and will be run in the up to date R version.
Now, ideally I would like to link my R version to my projects, so that I do not have to change my global R version and restart RStudio every time I access the OLAP cube or work on this data retrieval project (and then switch it back). However, I could not find any options in RStudio to achieve this result.
There are a few threads out there describing the same problem, but with no satisfactory answer in my opinion:
https://support.rstudio.com/hc/en-us/community/posts/200657296-Link-Project-and-R-Version
Rstudio project using different version of R
I also tried looking for a different package than olapR but with similar functionality, but could not find anything except X4R, which seems outdated and does not work for me (https://github.com/overcoil/X4R). Sadly, I am also unable to directly access the databases which the OLAP cube uses for its results, so I cannot go "around" it.
I am happy for any help or suggestions you can offer, whether it is a general workaround to link a project to a specific R version or the (less helpful for the community) solution of accessing the OLAP cube in a different way.
Thanks in advance!
Using the answer from MrGumble I created a .bat file that will execute my .R file using the desired R installation. Even though it is not the answer I thought I would get, I think it is an even better solution to the problem.
For all facing a similar issue, here is the .bat file (never created one before, so also had to google how to do it and I guess some might be in the same position):
#echo off
title Getting data for further processing in R
echo Retrieving OLAP data
echo.
"C:\Program Files\Microsoft\R Client\R_SERVER\bin\Rscript.exe" "C:\Users\me\Documents\Projects\!Data\script.R"
echo.
echo Saved data
echo.
pause
Thanks again to MrGumble for his help.
Skip RStudio.
RStudio is really just an editor (albeit powerful and useful) editor, which starts an R console for you (and the surrounding PATH variables, library locations, etc.).
If your monthly task only requires you to run the R-script (or a bit of interactive work), you can simply execute your preferred version of R from the command line and have it run your R script. E.g.
C:\Users\me>"C:\Program Files (x64)\Microsoft R\bin\Rscript" myscript.R
You might have to define some PATH variables so that the older R doesn't look for packages in the newer R's libraries, but that depends entirely on your current setup.
I have an R script which I program on my laptop. After I am done, I FTP the R script up to my university cluster and run my code there (in parallel if needed). Most of my functions return data frames that I'd like to plot using ggplot. This is perfectly fine however I'd like to use tikzDevice to create tikz (latex code) for my plot to have the same font and style as my thesis.
The problem:
I can't run tikzdevice on the university cluster because of the lack of LaTeX packages. I also can't install them due to no sudo access. Essentially, this route is a dead end for me.
Solution:
I can run tikzDevice on my own laptop. Since I am working on my latex document(thesis) on my laptop, its a seamless \include.
The problem is that the data (as dataframes) exist on the university cluster. I COULD save dataframes as textfiles, download them onto my laptop, and read.table them but this is gonna kill my productivity.
Are there any pacakges, tools, software, anything that will let me "extract" my data from the university server?
A possible solution is https://gist.github.com/SachaEpskamp/5796467
but I have no idea how to use this.
Note: I also don't know which part of the SE network this could go on.
I've found a workaround solution to this.
To those who are looking to transfer data back and forth from server/client, you can send and receive objects by serializing it.
On the server, you use the saveRDS command, and on the client you have the readRDS command. To provide a URL to readRDS, you must use gzcon, so like the following:
con = gzcon(url("http://path.com/to/your/object/serialized"))
a = readRDS(file = con)
Obviously this depends on some protocol installed on the server (like http)
RStudio server uses a headless R session and seems to pass all of the I/O operations encoded to save bandwidth. This works for everything except for packages like Rattle or Latticist, which work through their own GUI. Is there a way to use these packages through RStudio server or otherwise access the RStudio server R session to run these packages remotely?
Bonus if there's an efficient way to run these packages remotely without forwarding an X session over SSH.
I'm not sure this is possible over the RStudio interface because of the way these graphical programs work. It's easy enough for RStudio to capture textual input and output for R. Capturing normal graphical output is pretty impressive, but that's done "natively" in R. Even packages like ggplot2 and lattice use the builtin R plotting capabilities -- they do some rendering and data processing on their own, pass that onto grid and then grid renders the plots via R builtins when plot() or print is called (including implicitly in the REPL for interactive sessions). RCommander, RGL and the like use external libraries (Tcl/Tk, OpenGL), which render their interfaces directly over operating system services and not via R. R doesn't even see the output from these programs -- it only knows that the R wrapper function for these services hasn't returned yet. For local RStudio, this isn't a problem because the services are forwarded directly to the local display, but for RStudio server, there is no display!
Another consideration: assuming R could capture and forward X, that would imply having an X Server (in X, Server is the display/keyboard/etc, Client is the program that needs I/O) running in your browser. Modern JavaScript is pretty amazing at times, but X is a very complicated codebase and very sensitive to latency. Running X over the Internet is much slower than over the local network -- the protocol just wasn't designed for such things and most operations involve far too many roundtrips.
On a more practical side, you can still do most of your work via RStudio and only do the graphical commands via X forwarding:
Do everything that doesn't involve an external graphics interface.
Save your R Session (in the Environment tab or via the command line) as .RData in your project directory. (You can actually do this elsewhere, but it's generally more convenient if your workspace is saved in the working directory.)
Login in via SSH and X Forwarding and cd to the project directory.
Start R -- R will automatically load any existing workspaces saved as .RData. (You can disable this behavior with --vanilla. Depending on the size of your workspace, R may take a few seconds to a few minutes to load.
Have fun with Rattle, Latticist, RCommander, RGL, etc! Be ready for massive lag if you're doing this over the Internet and not the local network (see above).