Hive connector in R / RStudio

Does anybody know whether it's possible to interface Hadoop with R / RStudio? If so, how?
I have some Hive tables and I'd like to access them from R / RStudio and, within Shiny, present them visually (graphs, etc.).
I would appreciate any help (ideas, code examples, ...).

Try the dplyr.hive.spark package. The docs are still geared more towards Spark, but I tested it against Hive with the latest HDP sandbox and things went smoothly. If you give it a try, please report any problems.

If you just want to access Hive tables on HDFS, you can use the RJDBC package with a JDBC connection to HiveServer2 (explained here: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC). You can then use RJDBC just as you would with a relational database, except that a query may launch MapReduce jobs on your cluster to execute.
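A minimal sketch of the RJDBC approach, assuming HiveServer2 listens on its default port 10000 and that the Hive JDBC driver jar (plus any Hadoop jars it needs) is available locally. The helper names `hive_jdbc_url()` and `query_hive()` and the jar path are hypothetical; the driver class `org.apache.hive.jdbc.HiveDriver` and the `jdbc:hive2://host:port/db` URL form are the ones described on the HiveServer2 clients page linked above.

```r
# Sketch: query a Hive table from R over JDBC using RJDBC + DBI.
# Assumption: `jar_path` points at a Hive JDBC (e.g. "standalone") jar; the
# path, host and table names used when calling this are hypothetical.

# Build a HiveServer2 JDBC URL of the form jdbc:hive2://<host>:<port>/<db>
hive_jdbc_url <- function(host, port = 10000L, db = "default") {
  sprintf("jdbc:hive2://%s:%d/%s", host, as.integer(port), db)
}

# Open a connection, run one HiveQL statement, return a data.frame,
# and always close the connection afterwards.
query_hive <- function(sql, host, jar_path, user = "", password = "",
                       port = 10000L, db = "default") {
  drv <- RJDBC::JDBC(driverClass = "org.apache.hive.jdbc.HiveDriver",
                     classPath = jar_path,
                     identifier.quote = "`")  # Hive quotes identifiers with backticks
  conn <- DBI::dbConnect(drv, hive_jdbc_url(host, port, db),
                         user = user, password = password)
  on.exit(DBI::dbDisconnect(conn), add = TRUE)
  DBI::dbGetQuery(conn, sql)
}
```

Usage would look like `df <- query_hive("SELECT * FROM my_table LIMIT 100", host = "my-hive-host", jar_path = "/path/to/hive-jdbc-standalone.jar")`, where all three values are placeholders. In a Shiny app you could call `query_hive()` inside a `reactive()` and plot the resulting data frame with `renderPlot()`; keeping the queries small (with `LIMIT` or aggregation in HiveQL) avoids waiting on long MapReduce jobs for every UI refresh.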

Related

Can people without R installed run an R Notebook file successfully?

I am building an R Notebook to provide an analysis for somebody, and I am wondering whether I should choose another format, since I don't know whether she will be able to run the Notebook without R installed.
Is it possible to run an R Notebook as a standalone file, or must R be installed to do it?
To rerun the notebook, she needs R. But the whole point of R Notebooks is that they produce a static document as output. That document (usually in HTML format) can be shared on its own and needs no software besides a web browser to be viewed.
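As a minimal sketch of producing that static document yourself, assuming the rmarkdown package and pandoc are installed on your machine (RStudio bundles pandoc): `render_for_sharing()` and `analysis.Rmd` are hypothetical names, while `rmarkdown::render()` and `rmarkdown::html_document(self_contained = TRUE)` are standard rmarkdown functions.

```r
# Sketch: render a notebook source (.Rmd) to one self-contained HTML file
# (images, CSS and JS embedded), which the recipient can open in any
# browser without R installed.
render_for_sharing <- function(rmd_file) {
  rmarkdown::render(
    rmd_file,
    output_format = rmarkdown::html_document(self_contained = TRUE)
  )
  # render() returns the path of the generated HTML file (invisibly)
}

# render_for_sharing("analysis.Rmd")  # writes analysis.html next to the .Rmd
```

Only the author needs R here; the recipient just opens the resulting `.html` file.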
A notebook needs R to run. Distributing one without the R dependency is more elaborate, for example installing RStudio Server inside a Docker container. In that case the user needs Docker installed and must know how to start a container; from there on, they can interact with the code through a web browser.
Another option is a cloud-hosted solution, which some companies offer. These provide sharing functionality, and you don't have to worry about infrastructure or distributing your work. Some free plans may work for you, but the real power is in the premium features.

Error connecting to mongoDB using Mongolite

I'm having issues connecting to my MongoDB via Mongolite, and I'm not sure if it is an issue on my side, or if I need to use a different package to connect to the database. Please keep in mind that I cannot change the software being run by the MongoDB server, and I am a novice when it comes to all of this, so it could just be a silly error on my part.
I've run the following code:
m <- mongo(collection = "test", url="mongodb://22.92.59.149:27017")
As far as I can tell from the Mongolite tutorial (https://jeroen.github.io/mongolite/), this is the correct syntax to connect to the database, but I'm not 100% sure. Regardless, I get the following error:
Error: Server at 22.92.59.149:27017 reports wire version 2,
but this version of libmongoc requires at least 3 (MongoDB 3.0)
From what I can tell, this means that mongolite won't work with my database. If that is the case, which other package should I try to connect with? And if that is not the issue, what am I doing wrong?
Thanks in advance!
As the message says, there is a mismatch between versions of the client and the server.
More precisely, mongolite relies on a more general driver written in C, libmongoc, and it seems the version installed automatically by install.packages("mongolite") is too recent for the server's version.
If you can't change anything server-side, you could try manually installing an older version of libmongoc before installing mongolite, but I'm not confident the R package would be compatible with it afterwards.
Alternatively, you could use RMongo, an older, archived package for working with MongoDB from R, but I'm afraid whatever you develop with it won't be stable in future R versions.
I would rather recommend addressing the problem on the server side.
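To confirm which side is too old before deciding between those options, a minimal sketch like the following wraps the `mongolite::mongo()` call from the question in `tryCatch()` and pulls both wire versions out of the error. The helper names are hypothetical, and the parsing assumes the error text keeps the form shown in the question ("reports wire version N, but ... requires at least M").

```r
# Sketch: extract the server's and the client's minimum wire versions from a
# libmongoc error message. Returns NULL if the message has another form.
parse_wire_versions <- function(msg) {
  msg <- gsub("[[:space:]]+", " ", msg)  # the message may span two lines
  pattern <- ".*reports wire version ([0-9]+).*requires at least ([0-9]+).*"
  if (!grepl(pattern, msg)) return(NULL)
  c(server   = as.integer(sub(pattern, "\\1", msg)),
    required = as.integer(sub(pattern, "\\2", msg)))
}

# Try to connect; on a wire-version mismatch, re-raise a clearer error.
connect_mongo <- function(url, collection) {
  tryCatch(
    mongolite::mongo(collection = collection, url = url),
    error = function(e) {
      v <- parse_wire_versions(conditionMessage(e))
      if (!is.null(v) && v[["server"]] < v[["required"]]) {
        stop(sprintf(paste("Server speaks wire version %d, but this libmongoc",
                           "needs at least %d: upgrade the server or use an",
                           "older client."),
                     v[["server"]], v[["required"]]), call. = FALSE)
      }
      stop(e)  # any other connection error is passed through unchanged
    }
  )
}
```

With the error from the question, `parse_wire_versions()` reports a server at wire version 2 against a required 3, which confirms that the server predates MongoDB 3.0 rather than that the `mongo()` call is wrong.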

Error: Unable to establish connection with R session

I am having trouble pulling down data sets from the FDA API (https://api.fda.gov). The connection is fine and it appears to generate more than 4000 JSON records. The problem is that when it is done, R gives me the error below and I can no longer do anything in RStudio; I basically have to force quit. I've seen references to similar issues, but haven't been able to find a solution.
Error: Unable to establish connection with R session
I assume you are using RStudio. RStudio has a problem with pulling large data sets; see Wang's Tech Blog and RStudio Support.
I suggest just using R in the terminal when you run scripts that pull large data sets.
I was able to solve all the problems by deactivating/uninstalling my antivirus software (Avast).
I also ran into this error. Uninstalling Avast did not work for me, but uninstalling and then reinstalling both R and RStudio solved the problem.

Can my R code talk to Spark and Spark SQL?

I have a use case where I run Revolution R on my desktop and want to invoke/talk to Spark and Spark SQL deployed on a Hadoop cluster (which also has SparkR installed). Any suggestions on how to proceed? I've heard it can be done if Spark runs in standalone mode, but I want to use Spark in YARN mode.
You can check out the SparkR package. It also lets you connect to Spark SQL, as you require:
https://spark.apache.org/docs/1.5.1/sparkr.html
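A minimal sketch using the Spark 1.5-era SparkR API from the linked docs (`sparkR.init()`, `sparkRSQL.init()`, `sql()`, `collect()`, `sparkR.stop()`). It assumes a matching Spark 1.5.x distribution is unpacked on the desktop at `spark_home`, that `HADOOP_CONF_DIR`/`YARN_CONF_DIR` point at the cluster's client configuration, and that the cluster can reach the desktop (in "yarn-client" mode the driver runs locally). The helper names are hypothetical.

```r
# Sketch: run a Spark SQL query on YARN from a desktop R session (Spark 1.5.x).

# SparkR ships inside the Spark distribution under R/lib
sparkr_lib_path <- function(spark_home) file.path(spark_home, "R", "lib")

query_spark_sql <- function(query, spark_home = Sys.getenv("SPARK_HOME")) {
  library(SparkR, lib.loc = sparkr_lib_path(spark_home))
  # "yarn-client": driver in this R session, executors on the YARN cluster
  sc <- sparkR.init(master = "yarn-client", sparkHome = spark_home)
  on.exit(sparkR.stop(), add = TRUE)
  sqlContext <- sparkRSQL.init(sc)  # sparkRHive.init(sc) gives access to Hive tables
  collect(sql(sqlContext, query))   # return the result as a local data.frame
}
```

In Spark 2.x these entry points were replaced by `sparkR.session(master = "yarn")` and `sql(query)`, so match the calls to the Spark version actually installed on the cluster.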

Connect R with OrientDB

Are there any R packages for connecting R to OrientDB?
For instance (maybe), something similar to the packages for MongoDB (RMongo, rmongodb).
I searched a little on the web, but couldn't find anything.
Thank you!
Have a look at retrography's OrientR: https://github.com/retrography/OrientR. Right now it only supports a query function.
