R integration with Tableau

I am having difficulty integrating R with Tableau.
Initially, when I created a calculated field, Tableau asked for the Rserve package in R and would not let me drag the field onto the worksheet. I have installed this package, but it still shows an error saying:
"Error occurred while communicating with the Rserve service. Tableau is unable to connect to the service. Verify that the server is running and that you have access privileges."
Any input would be appreciated. Thank you.

You need to start Rserve. Once the Rserve package is installed, simply run this (in RGui, RStudio, or wherever you run R scripts):
library(Rserve)
Rserve()
You can then test the connection to Rserve from Tableau under Help > Settings and Performance > Manage R Connection.

As of Tableau 9, you can open *.rdata files directly in Tableau. Tableau 9 reads the first item stored in the *.rdata file. Just open an *.rdata file under "Statistical File" on the Tableau start screen.
To create one:
save(myDataframe, file = "Myfile.rdata")
This saves the file with the data frame stored in it. You can save as many items as you want, but Tableau will only read the first. It can also read vectors and variables if they are the first item. Note that .rdata files compress data quite a bit: I recently compressed 900 MB down to 25 MB. However, Tableau still needs to decompress the data to use it, so be careful about memory.
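A minimal round-trip sketch of the steps above, runnable in plain R. The data frame my_df, the extra object notes, and the temporary file path are made up for illustration. load() returns the names of the restored objects in the order they were saved, which is a quick way to confirm which object Tableau would treat as the first item.

```r
# Hypothetical example objects; my_df is saved first, so it is the item
# Tableau would read. notes is saved second and would be ignored.
my_df <- data.frame(
  region = c("North", "South", "East", "West"),
  sales  = c(120.5, 98.0, 143.2, 87.9)
)
notes <- "quarterly sales, not read by Tableau"

rdata_path <- file.path(tempdir(), "Myfile.rdata")

# file = must be named: a bare second argument is treated as the name of
# another object to save, so save() fails instead of writing the file.
save(my_df, notes, file = rdata_path)

# load() restores the objects into an environment and returns their names
# in the order they were saved.
restored <- new.env()
loaded_names <- load(rdata_path, envir = restored)
print(loaded_names)

# The first stored item is the one Tableau picks up.
first_item <- get(loaded_names[1], envir = restored)
print(first_item)

# save() uses gzip compression by default (compress = TRUE).
print(file.size(rdata_path))
```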

Related

How to change working directory/read from local csv in an R script used as a data source in Power BI?

I am trying to use an R script as a data source for Power BI. I am a regular user of R but new to Power BI. When all the datasets imported by the R script come from SQL databases, I can import the resulting data frames fine. However, I have a script that reads a .csv file that Power BI's R session can't find, which results in the error:
Error: 'times_of_day_grid.csv' does not exist in current working directory ('C:/Users/MyUserName/RScriptWrapper_ac2d4ec7-a4f6-4977-8713-10494f4b0c4f').
The .pbix file and the R script are both stored in the same folder as the csv.
I have tried manually setting the working directory by inserting this into the script:
setwd("C:/Users/MyUserName/Documents/R/Projects/This Project Folder")
But this just results in the message:
"Connecting - Please wait while we establish a connection to R"
And later if I leave it running:
Unable to connect
We encountered an error while trying to connect.
Details: "ADO.NET: R execution timeout. The script execution was
terminated, since it was running for more than 1800000 miliseconds."
I have also tried specifying the full addresses of the csv files in read_csv(), but this results in the same timeout warning.
Any ideas as to how I can edit my script (or the settings in Power BI) to get around this? (The script only takes a minute or so to run in RStudio.)
Don't forget that you can load your csv file with Power BI's built-in Get Data > Text/CSV, then go to Edit Queries and run the R script from there. That way you don't have to worry about setting the working directory in the R script at all.
You can even load multiple files and work on each and every one of them using the approach described in Operations on multiple tables / datasets with Edit Queries and R in Power BI.
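A minimal sketch of what the R step can look like once the csv is loaded through Get Data. When you use Transform > Run R script in the query editor, Power BI passes the current table to R as a data frame named dataset, so the script needs no setwd() or file path. The columns hour and visits and the bucketing logic are hypothetical, and the exists() guard only fakes dataset so the same script can be tested in RStudio.

```r
# Outside Power BI, fake the input table so the script can be tested;
# inside Power BI, `dataset` already exists. Column names are hypothetical.
if (!exists("dataset")) {
  dataset <- data.frame(hour = c(0, 6, 12, 18), visits = c(5, 40, 85, 60))
}

# Hypothetical transformation: label each hour with a time-of-day bucket.
# Intervals are (-Inf,5], (5,11], (11,17], (17,Inf].
result <- dataset
result$time_of_day <- as.character(cut(
  result$hour,
  breaks = c(-Inf, 5, 11, 17, Inf),
  labels = c("night", "morning", "afternoon", "evening")
))
```

As far as I know, Power BI then offers the data frames left in the script's environment (here result) as output tables you can expand in the query.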
Please let me know how this works out for you.

Is it possible to download software using R?

I am writing a user-friendly function to import Access tables using R. Most of the steps will have to be done outside of R, but I want to keep as much as possible within the script. The first step is to download a database driver from Microsoft: https://www.microsoft.com/en-US/download/details.aspx?id=13255.
I am wondering if it is possible to download software from inside R, and what function/package I can use? I have looked into download.file but this seems to be for downloading information files rather than software.
Edit: I have tried
install_url("https://download.microsoft.com/download/2/4/3/24375141-E08D-4803-AB0E-10F2E3A07AAA/AccessDatabaseEngine_X64.exe")
But I get an error:
Downloading package from url: https://download.microsoft.com/download/2/4/3/24375141-E08D-4803-AB0E-10F2E3A07AAA/AccessDatabaseEngine_X64.exe
Installation failed: Don't know how to decompress files with extension exe
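The error arises because install_url() installs R packages from an archive, so it does not know what to do with an .exe. A sketch of an alternative using base R: download.file() can fetch any binary file as long as mode = "wb" is used (on Windows, text mode corrupts binaries). The helper name download_installer() is hypothetical, and launching the installer (shell.exec() is Windows-only) may still prompt for admin rights, so that part is left commented out.

```r
# Hypothetical helper: download any binary file (e.g. an .exe installer)
# and return the local path. mode = "wb" writes the file in binary mode.
download_installer <- function(url,
                               destfile = file.path(tempdir(), basename(url))) {
  status <- download.file(url, destfile = destfile, mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  destfile
}

# Usage (Windows only for the second step):
# exe <- download_installer(
#   "https://download.microsoft.com/download/2/4/3/24375141-E08D-4803-AB0E-10F2E3A07AAA/AccessDatabaseEngine_X64.exe"
# )
# shell.exec(exe)  # opens the installer as if double-clicked
```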

Execute R script from SSIS Package

I want to execute R code from an SSIS package. How can I add a step that executes R code? SSIS script tasks support only VB.NET and C#.
SSIS has many data transformations available, but R is very friendly when it comes to data manipulation.
I want to run R code from SSIS scripts or some other way. Basically, I'm trying to integrate R into an ETL process:
Extract (E) data from a CSV file.
Transform (T) it in R, and load (L) it into a Microsoft database.
Is it possible to get this workflow done in an SSIS package by executing an R script using SSIS control flow or data flow items? Thanks!
Here are a couple of ways you could integrate R into your ETL process.
Quick and dirty: Execute Process Task in the Control Flow. This is similar to calling Rscript from the command line. You would likely perform your transformation, save the result to a file on disk, and pass that filename from your Execute Process Task into a Data Flow task. The upside is that you keep your R code clean and separate from your C#/VB.
Integrated via RDotNet: You could use the RDotNet library (I believe; I haven't tried integrating it myself). You would need to register the DLLs in the GAC, and then you can either work with .NET objects in your SSIS scripts or call R scripts directly.
Integrated in SQL Server 2016: Microsoft has added R support via extended stored procedures. You call the R script via a stored procedure, use a SQL query for the input data, and can store the output. See more detail here. This would mean using an Execute SQL Task in SSIS.
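For the SQL Server 2016 route, the R code is passed as a string to sp_execute_external_script. By default SQL Server hands the result of the input query to R as a data frame called InputDataSet and returns whatever data frame is assigned to OutputDataSet. A sketch of such an R body; the columns customer_id and amount and the banding rule are hypothetical, and the exists() guard just fakes the input so the snippet also runs outside SQL Server.

```r
# Inside SQL Server, InputDataSet is supplied by the input query.
# Outside it, fake the input so the snippet can be tested.
if (!exists("InputDataSet")) {
  InputDataSet <- data.frame(
    customer_id = c(1L, 2L, 3L, 4L),
    amount      = c(250, NA, 75, 410)
  )
}

df <- InputDataSet

# Hypothetical transformation: drop rows with a missing amount and
# add a simple band column.
df <- df[!is.na(df$amount), , drop = FALSE]
df$amount_band <- ifelse(df$amount >= 200, "high", "low")

# Whatever is assigned here is returned to SQL Server as the result set.
OutputDataSet <- df
```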
I hope this helps you or someone else. Since you want data processing, you could export your dataset to a CSV file (through a Data Flow task) and run an R script with "Rscript" (executed as a command from an Execute Process Task). Inside the script, load the dataset into a data frame (for example with read.csv()), do all the math/calculations you need, write the data or results to a CSV file, and read that file back in SSIS.
It is not an elegant solution, but it works :) at least until Microsoft integrates R as a control/data flow component.
CYA
PS: Here is how to run R scripts from the command line: Run R script from command line
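A sketch of the kind of script the Execute Process Task could call, for example with Executable set to Rscript.exe and Arguments set to something like transform.R C:\etl\input.csv C:\etl\output.csv. The script name, the paths, the quantity/unit_price columns, and the transformation itself are all hypothetical. It reads the CSV with read.csv(), which returns a data frame directly.

```r
# Read the CSV exported by SSIS, transform it, and write a CSV that the
# next Data Flow task can load. Column names below are hypothetical.
transform_csv <- function(input_path, output_path) {
  df <- read.csv(input_path, stringsAsFactors = FALSE)

  # Hypothetical transformation: drop incomplete rows, add a derived column.
  df <- df[complete.cases(df), , drop = FALSE]
  df$total <- df$quantity * df$unit_price

  write.csv(df, output_path, row.names = FALSE)
  invisible(nrow(df))  # number of rows written
}

# Run only when invoked via Rscript with exactly two arguments.
args <- commandArgs(trailingOnly = TRUE)
if (!interactive() && length(args) == 2) {
  transform_csv(args[1], args[2])
}
```

If the script hits an error, Rscript exits with a non-zero status, which the Execute Process Task treats as a failure by default, so a broken transformation stops the package instead of silently loading stale data.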

Amazon EMR: Using R code in Amazon EMR

I have a very beginner question. I've just been reading through some of the documentation regarding Amazon's EMR. Before I sign up etc. I just wanted to ask about using R in it.
I have one R module that calls several other modules, and then, just before it finishes running, saves several variables as .txt files.
My rather basic question is, can I do this in Amazon's EMR? And will I be able to access the .txt output files? Finally, my R script reads in some data from Excel spreadsheets. Will it still be able to do this from the EMR if I upload the Excel files into the system?
Thanks
Mike
@Mike, answers to your three questions are below.
Running R on EMR: Yes you can.
You can run R programs on EMR once you have installed R on the EMR instance. I assume that you would write MapReduce modules if you plan to use a multi-instance cluster. If your program is just a "plain" R program, you may only need one sizable instance. I would rather use an EC2 instance with an R AMI (look for Louis Aslett).
Moving output files:
Yes, you can. It is possible to transfer your program output from EMR to an S3 bucket of your choice. You will have to add a step calling the S3DistCp command to move the files. An example from my project:
--jar /home/hadoop/lib/emr-s3distcp-1.0.jar --args '--src,hdfs:///contents,--dest,s3://<bucket-name>/'
Reading spreadsheets: AFAIK, if you are able to do this on a local installation of R, you should also be able to do it on EMR. You have to ensure that the necessary packages/libraries are installed during the bootstrap process.
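A small sketch for the package side: a helper (the name ensure_packages() is hypothetical) that reports which packages are missing on a node and optionally installs them from CRAN, something you could call from a bootstrap script. readxl appears in the usage line only as an example of an Excel-reading package; the question does not say which package the script actually uses.

```r
# Return the packages from `pkgs` that cannot be loaded on this machine;
# if install = TRUE, try to install them from CRAN first.
ensure_packages <- function(pkgs, install = FALSE,
                            repos = "https://cloud.r-project.org") {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (install && length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  missing
}

# Usage on the cluster, e.g. from a bootstrap step:
# ensure_packages(c("readxl"), install = TRUE)
```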
I am able to install squeezy-cran and rmr2 on an EMR instance with all their dependencies (Rcpp, reshape2, digest, RJSONIO, functional, etc.). However, I am still unable to call the R program as a step; I have to use an SSH session and run R CMD commands at the shell prompt. Being on Windows, I use putty.exe for this.

R dataset connection to tableau

Tableau recently added R connectivity in release 8.1. Is there any way I can bring an entire table created in R into Tableau, or an .rds object containing the dataset?
There is a tutorial on the Tableau website for this and a blog post on R-bloggers that discuss it. The tutorial has a number of comments, and one of them (in early December, I think) asks how to get an .rds file in. You need to start Rserve and then execute a script on it to get your data.
Sorry I can't be more help; I only looked into it briefly and put it on the back burner. If you get stuck, they seem to respond quickly to comments posted on the page:
http://www.tableausoftware.com/about/blog/2013/10/tableau-81-and-r-25327
Just pointing out that the Tableau Data Extract API might be useful here, even if the current version of R integration doesn't yet meet your needs. (Note that the link is to the version 8.1 docs released in late 2013, so look for the latest version to see what functionality has been added since.)
If what you want to do is manipulate data in R and then send a table of data to Tableau for visualization, you could first try the simple step of exporting the data from R as a CSV file and then visualizing that data in Tableau. I know that's not sexy, but it's always good to make sure you've got a way to get the output you need before investing time in optimizing the process.
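A minimal sketch of the CSV route, assuming a data frame called dataset (its columns here are made up). row.names = FALSE keeps write.csv() from adding an unnamed index column that would otherwise show up as an extra field in Tableau; reading the file back shows what Tableau will see.

```r
# Hypothetical data frame standing in for your real result.
dataset <- data.frame(
  month   = c("2013-10", "2013-11", "2013-12"),
  revenue = c(1520.75, 1610.20, 1802.40)
)

csv_path <- file.path(tempdir(), "dataset_for_tableau.csv")

# row.names = FALSE avoids an extra unnamed index column.
write.csv(dataset, csv_path, row.names = FALSE)

# Read it back to check the columns and types Tableau will get.
check <- read.csv(csv_path, stringsAsFactors = FALSE)
str(check)
```

In Tableau, the file can then be opened as a text file connection.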
If that gets the effect you want, but you just want to automate more of the steps, then take a look at the Tableau Data Extract API. You could use that library to generate a Tableau Data Extract instead of a CSV file. If you have something in production that needs regular updates, you could presumably create a Python script or JVM program that reads your RDS file periodically and generates a revised extract.
Let's assume your data.frame, tibble, etc. (say, an object called dataset) is ready in R/RStudio and you want to connect it to Tableau.
1. In RStudio (or an R terminal), run the following:
install.packages("Rserve")
library(Rserve)
Rserve()  # starts the Rserve service that Tableau connects to
2. Now go to Tableau (I am using 10.3.2):
Help > Settings and Performance > Manage External Service Connection
Enter localhost in the Server field and click Test Connection.
You have now established a connection between R and Tableau.
3. Come back to RStudio. Now we need an .rdata file containing our R object(s), in this case dataset, the R object we want to use in Tableau. Enter this in the R console:
save(dataset, file="objectName.rdata")
4. Switch to Tableau now.
Connect To a File > Statistical File
Go to the working directory where the newly created objectName.rdata resides. From the file type drop-down list, select R files (*.rdata, *.rda) and select your file. This opens the object you created in R in Tableau. Alternatively, you can drag and drop the .rdata file directly onto Tableau's workspace.
