I have a use case where I have Revolution R on my desktop and want to invoke/talk to Spark and Spark SQL deployed in a Hadoop cluster (SparkR is also installed there). Any suggestions on how to proceed? I have heard this can be done if Spark is in standalone mode, but I want it with Spark in YARN mode.
You can check out the SparkR package. It also supports connecting to Spark SQL, as you require:
https://spark.apache.org/docs/1.5.1/sparkr.html
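A minimal sketch of what that looks like with the Spark 1.5.x SparkR API, assuming `SPARK_HOME` on the desktop points at a Spark build matching the cluster's version and the table name is hypothetical:

```r
# Hedged sketch: initializing SparkR against a YARN cluster (Spark 1.x API).
Sys.setenv(SPARK_HOME = "/opt/spark")  # assumed path to a local Spark build
library(SparkR, lib.loc = file.path(Sys.getenv("SPARK_HOME"), "R", "lib"))

# "yarn-client" runs the driver on your desktop and executors on the cluster
sc <- sparkR.init(master = "yarn-client")
sqlContext <- sparkRSQL.init(sc)

# Query a Hive/Spark SQL table (table name is a placeholder)
df <- sql(sqlContext, "SELECT * FROM some_table LIMIT 10")
head(df)

sparkR.stop()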
Related
I have a cluster running Cloudera CDH. I need to connect my R programs (running on my laptop) to Spark running in the cluster.
However, if I try to connect local R through sparklyr, it gives an error: it searches for a Spark home on the laptop itself, which means I would have to install Spark on my laptop, but I can't do that.
I googled and found that we can install SparkR and use R with Spark. However, to use SparkR, what do I have to do:
Do I install SparkR on all nodes of the cluster?
How do I configure SparkR?
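One way around the missing local Spark home is sparklyr's Livy connection method, which talks to the cluster over HTTP instead of requiring a local Spark installation. A hedged sketch, assuming a Livy server is running on a cluster edge node (hostname and port are placeholders):

```r
# Hedged sketch: connecting sparklyr to a remote cluster via Livy,
# avoiding any Spark installation on the laptop.
library(sparklyr)

sc <- spark_connect(
  master = "http://cluster-edge-node:8998",  # assumed Livy endpoint
  method = "livy"
)

# From here, dplyr verbs run against the cluster as usual
library(dplyr)
flights_tbl <- copy_to(sc, data.frame(x = 1:3), "demo_tbl")

spark_disconnect(sc)
```

This depends on Livy being installed and reachable on the cluster, which a CDH admin would need to confirm.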
I am new to R and I am working with a dataset that has more than 5 million observations, so I thought it would be a good idea to use RStudio on a virtual machine instead of on my local machine.
I am reading the documentation about virtual machines and R Server, but it is still not clear to me whether I have to use Microsoft R Server to create a VM and then just install RStudio as I would on my local machine, or whether I can create a generic VM and then install RStudio. Which is the correct way, and why?
If both of these options are possible, which one is the best?
Please help me. Sorry for my confusion.
You can do either. If you are using Azure (which I think you are given that you mention Microsoft R Server), there is also the Data Science VM, which will come preinstalled with RStudio and many other useful programs.
R Server is more for production workloads with R, so unless you are planning that you could probably stick with the Data Science VM. If you end up choosing this option, you can connect directly to an RStudio instance on the R Server from the Azure portal.
Does anybody know if it's possible to interface Hadoop with R / RStudio? If yes, how?
I have some Hive tables and I'd like to access them from R / RStudio and build a visual presentation (graphs etc.) with Shiny.
I would appreciate any help (ideas, code examples ...).
Try the package dplyr.hive.spark. The docs are still geared more towards Spark, but I tested it against Hive with the latest HDP sandbox and things went smoothly. If you give it a try, please report any problems.
If you just want to access Hive tables on HDFS, you can use the RJDBC package with a JDBC connection to HiveServer2 (explained here: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-JDBC). Then you can use RJDBC much as you would for a relational database, except that queries may launch map/reduce jobs on your cluster to execute.
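A minimal sketch of the RJDBC route, assuming you have the Hive JDBC standalone jar from your distribution (the jar path, hostname, and credentials are placeholders):

```r
# Hedged sketch: querying Hive via JDBC from R with RJDBC.
library(RJDBC)

# Point classPath at the Hive JDBC jar shipped with your distribution
drv <- JDBC(
  driverClass = "org.apache.hive.jdbc.HiveDriver",
  classPath   = "/path/to/hive-jdbc-standalone.jar"  # assumed location
)

# Default HiveServer2 port is 10000
conn <- dbConnect(drv, "jdbc:hive2://hive-host:10000/default",
                  "username", "password")

tables <- dbGetQuery(conn, "SHOW TABLES")
print(tables)

dbDisconnect(conn)
```

The resulting data frames are ordinary R objects, so they can be handed straight to Shiny plots.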
I'm going to install a new Oracle BI Enterprise Edition alongside an existing Oracle BI source edition. Is there any way I can install it on a single machine? Can I use the same BIPLATFORM schemas?
There isn't much to read about on the Oracle web pages. I see I can use a software-only install for the binaries, or install a new Oracle BI. But my question is this: from the tutorial I can see there is something like: "Ensure you install the new Oracle Business Intelligence to a separate Middleware home."
My Middleware home is .../fmw/, where Oracle BI is, but also WebLogic and others. So should I install a new instance of WebLogic too, or just install it under .../fmw/ in a new folder named e.g. .../fmw/Oracle_BI2/?
I'm quite confused; I'm working with Oracle Business Intelligence for the first time.
If someone experienced can give me a hint, it would be much appreciated.
MATT.
Yes, you can install multiple BI systems on the same machine. Install WebLogic, then do a Software Only install, and then run the Configuration Wizard twice, choosing Auto Port Selection both times.
http://docs.oracle.com/cd/E21764_01/bi.1111/e10539/c2_scenarios.htm#CHDDIDGE
Vijay
I am trying to connect R to a Hadoop cluster. The cluster has HDFS, MapReduce, Hive, Pig and Sqoop installed on it.
R will be running in a Windows environment. I know that rhdfs, rhadoop and rmr exist for Linux, but I can't find anything for Windows.
Does anyone know of a library to use?
Thank you
Revolution Analytics is trying to make a name for itself in this space. They have a couple of nice packages (some of which are open source and/or free for non-commercial use) that let you interact with Hadoop from R in a Windows environment fluidly.
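For a flavor of what those packages look like, here is a hedged sketch using rmr2 (from the RHadoop collection that Revolution Analytics sponsors). The example uses rmr2's local backend so it runs without a cluster; switching the backend to "hadoop" submits real jobs, assuming the Hadoop environment variables are configured:

```r
# Hedged sketch: a tiny MapReduce job with rmr2 (RHadoop).
library(rmr2)

# Local backend runs in-process, useful for testing without a cluster
rmr.options(backend = "local")

# Write a small vector to the (local-backend) DFS
input <- to.dfs(1:10)

# Map each value to its square; no reduce step needed here
result <- mapreduce(
  input = input,
  map   = function(k, v) keyval(v, v^2)
)

from.dfs(result)  # retrieve the key/value pairs back into R
```

Note that getting rmr2's "hadoop" backend working on Windows has historically been fiddly, so check the RHadoop wiki for current platform support before committing to it.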