I was wondering how people generally connect to Oracle databases in R. Currently I am using the odbc package, and I was wondering if there is a faster alternative. I looked at ROracle, but it seems to involve downloading and using an older version of R (I am currently on R 4.0). Are odbc and ROracle the only options?
I believe odbc and ROracle are the two best packages for connecting to an Oracle database. Both are based on DBI and require the Oracle Instant Client to be installed on the system.
odbc is available as a binary on CRAN. Because ROracle requires the Oracle Instant Client to build, its binary must be downloaded from Oracle or the package installed from source, which can be tricky. With both packages I have experienced difficulties in the initial setup.
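For reference, a connection through odbc typically looks something like this; the driver name, host, service name and credentials below are placeholders you would adapt to your own Instant Client setup:

    library(DBI)

    # the driver name must match an Oracle ODBC driver registered on your system
    con <- dbConnect(
      odbc::odbc(),
      driver = "Oracle in instantclient_19_8",  # assumption: your driver's registered name
      dbq    = "//dbhost:1521/ORCLPDB1",        # assumption: host:port/service_name
      uid    = "scott",
      pwd    = "tiger"
    )
    dbGetQuery(con, "SELECT sysdate FROM dual")
    dbDisconnect(con)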
As far as the user interface is concerned, ROracle and odbc are very similar, but there are subtle differences. For example, ROracle does not have a dbBind() method; instead, you pass a data.frame of bind data to dbSendQuery(). There may also be minor differences when using dbplyr.
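To illustrate that difference, here is a sketch (the table and bind values are made up):

    # odbc / DBI style: placeholders bound via dbBind()
    res <- dbSendQuery(con, "SELECT * FROM emp WHERE deptno = ?")
    dbBind(res, list(10))
    dbFetch(res)
    dbClearResult(res)

    # ROracle style: bind data passed directly to dbSendQuery()
    res <- dbSendQuery(con, "SELECT * FROM emp WHERE deptno = :1",
                       data = data.frame(deptno = 10))
    dbFetch(res)
    dbClearResult(res)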
In the olden days, people used the RODBC and RJDBC packages. These are still maintained, but in my experience they are considerably slower than ROracle or odbc. I would consider them legacy packages that should not be chosen for new projects.
I find that there's very little documentation on how to extract SAP tables into R.
I'm not talking about SAP HANA.
Currently it's very troublesome: I need to manually extract SAP tables using a GUI, export them into a tabular format, and only then can I import them with my R script.
The current solution I'm exploring is to have my SAP colleagues export those SAP tables into a SQL database, which I can then query from R.
Ideally I want to cut this seemingly unnecessary step of having the SAP tables exported into a database.
For SAP R/3 systems (or what you call ECC), your best bet would be executing remote function calls (RFCs).
Normally these are supported by open-source interfaces for at least the more recent versions (e.g. 4.6 or above).
However, such interfaces are fairly scarce, and I know of only one implementation in R: the RSAP package. You'd also need to download the NW RFC SDK, and there may be further requirements depending on your OS (e.g. which Visual C++ runtime you'd need on Windows).
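A minimal sketch of what using RSAP looks like; the connection parameters are placeholders, and I'm assuming the RSAPConnect/RSAPReadTable interface shown in the package's README:

    library(RSAP)

    # connection details for your SAP application server (placeholders)
    con <- RSAPConnect(ashost = "sapserver", sysnr = "00",
                       client = "100", user = "me",
                       passwd = "secret", lang = "EN")
    # pull a table down into a data.frame over RFC
    flights <- RSAPReadTable(con, "SFLIGHT")
    RSAPClose(con)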
There's also a slightly more widely recognised equivalent in Python, the PyRFC.
On the other hand, you may try Robotic Process Automation (RPA) to interact with the GUI in an automated way. One of the options is UiPath, but there are others. This way you could configure the automation of the table extraction; at the same time, you can also call R scripts directly from the RPA tool.
Overall, to be honest, the solution of extracting the tables into a separate database does seem to be the best alternative (compared to what I've described above).
Note: The above presumes that - for any reason, usually security - you cannot access the database underlying ECC directly through ODBC calls - otherwise the instructions for connecting and calling SQL from R are the same as for HANA or similar.
Consider using RODBC. This package lets you register different ODBC sources and use them from RStudio.
Follow this article and don't be put off by the word "HANA"; this approach works with any database, not only HANA.
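For example, once a DSN is configured on the system (the DSN name, credentials and query below are placeholders):

    library(RODBC)
    ch <- odbcConnect("my_dsn", uid = "user", pwd = "pass")
    df <- sqlQuery(ch, "SELECT * FROM some_table")
    odbcClose(ch)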
There are some options to access R libraries in Spark:
directly using SparkR (see the sketch after this list)
using language bindings like rpy2 or rscala
using a standalone service like OpenCPU
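A minimal SparkR sketch, assuming a local Spark installation with SparkR on the library path:

    library(SparkR)
    sparkR.session(master = "local[*]")

    # distribute a local data.frame as a SparkDataFrame
    df <- as.DataFrame(faithful)

    # run an arbitrary R function over each partition and collect the result
    res <- dapplyCollect(df, function(part) {
      part$ratio <- part$eruptions / part$waiting
      part
    })
    head(res)
    sparkR.session.stop()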
It looks like SparkR is quite limited, OpenCPU requires keeping an additional service running, and bindings can have stability issues. Is there something specific to Spark's architecture that makes it hard to use any of these solutions?
Do you have any experience with integrating R and Spark you can share?
The main language for the project seems like an important factor.
If pyspark is a good way to use Spark for you (meaning that you are accessing Spark from Python), accessing R through rpy2 should not be much different from using any other Python library with a C extension.
There are reports of users doing so (although with occasional questions, such as "How can I partition pyspark RDDs holding R functions" or "Can I connect an external (R) process to each pyspark worker during setup").
If R is your main language, helping the SparkR authors with feedback or contributions where you feel there are limitations would be the way to go.
If your main language is Scala, rscala should be your first try.
While the combo pyspark + rpy2 would seem the most "established" (as in "uses the oldest and probably most-tried codebase"), this does not necessarily mean that it is the best solution (and young packages can evolve quickly). I'd first assess what the preferred language for the project is, and try options from there.
I have a client with salesforce enterprise edition. I need to connect to and extract the salesforce data using Base SAS (SAS/Access for ODBC is licensed).
How can this be achieved? Is it possible to map a libname using an ODBC engine, or is it necessary to use the web APIs?
I don't know about Salesforce support for ODBC, but it is certainly possible to map a libname using ODBC.
http://support.sas.com/documentation/onlinedoc/91pdf/sasdoc_91/access_odbc_7365.pdf has examples; the basic form is

    libname <name> odbc <connection-options>;
Salesforce-specific SAS product:
http://support.sas.com/documentation/cdl/en/bidsag/61236/HTML/default/viewer.htm#a003279102.htm
That requires more than Base SAS, of course, but if SAS BI is an option it seems pretty easy to configure.
Another option that is not free (well, it seems to be free for 15 days):
http://blogs.datadirect.com/2012/02/sas-access-to-salesforce-crm-for-superior-odbc-integration-with-sas.html
It doesn't seem like there is a free ODBC driver for Salesforce, though, unless something has changed; for example, http://success.salesforce.com/ideaView?id=08730000000Bqqu suggests it's something desired but not available.
So for a free solution you may want to use the Web API...
Does anyone know of any way to connect to an OLEDB data source directly in R?
I've tried Google, CRAN and rseek with no luck whatsoever.
A good alternative to both ODBC and OLEDB for saving data to SQL Server is to use BCP via the rsqlserver package on GitHub: https://github.com/agstudy/rsqlserver
You can pull down data via ODBC if you'd like, which is pretty fast, but sending data to SQL Server via ODBC takes a long time (in my tests), so BCP is a great option.
It's a little difficult to install (it requires .NET and Rtools), but once you get it going it's blazing fast.
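A rough sketch of the bulk-copy workflow; I'm assuming the dbConnect/dbBulkCopy interface shown in the package's README, and the connection string is a placeholder:

    library(rsqlserver)

    # .NET-style connection string for your server (placeholder)
    url  <- "Server=localhost;Database=TEST;Trusted_Connection=True;"
    conn <- dbConnect('SqlServer', url = url)

    # push a data.frame to SQL Server via BCP
    dbBulkCopy(conn, name = "my_table", value = mtcars)
    dbDisconnect(conn)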
Depending on versions and drivers, this may work:
http://cran.r-project.org/web/packages/RODBC/index.html
I seem to be unable to compile RPostgreSQL for Windows x64, and after extensive searching I've not been able to find a precompiled binary. To get on with my work, I've installed a 32-bit version of Postgres and have been using 32-bit R for all database ops.
I need to do much of my work in 64-bit R, so switching back and forth has become a bit painful, especially since it requires a save() and load() operation each time I need to run a query.
I'm wondering whether it is possible to call one R installation directly from another. For example, could I simply pass queries to my 32-bit R installation and retrieve the results? I think there are other times when the ability to call another R installation would be useful as well.
All I've come up with is using a system() call, either directly to psql or to 32-bit R, but this doesn't allow for very efficient transfer of data.
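For concreteness, here is roughly what that workaround looks like, using saveRDS()/readRDS() to shuttle the result between installations; the Rscript path, database name and query are placeholders:

    # write a query script, run it in 32-bit R, then read the result back
    r32    <- "C:/Program Files/R/R-2.15.0/bin/i386/Rscript.exe"  # assumed 32-bit Rscript path
    out    <- tempfile(fileext = ".rds")
    script <- tempfile(fileext = ".R")
    writeLines(sprintf('
      library(RPostgreSQL)
      con <- dbConnect(PostgreSQL(), dbname = "mydb")
      res <- dbGetQuery(con, "SELECT * FROM mytable")
      dbDisconnect(con)
      saveRDS(res, "%s")',
      normalizePath(out, winslash = "/", mustWork = FALSE)), script)
    system2(r32, script)
    res <- readRDS(out)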
I'd very sincerely appreciate any advice or assistance!
P.S. I'd rather ask how to compile RPostgreSQL for x64, but as I understand the rules here, such a question would be inappropriate since it's not a general question (e.g. I'd need step-by-step instructions since I don't have the requisite skills).
http://wiki.postgresql.org/wiki/64bit_Windows_port