We currently have a database that can only be accessed via a Tableau front end.
Is it possible, with R or any other means, to extract the underlying data table into CSV or txt? I don't really care what format it ends up in so long as it is structured and can be read into a data frame.
I can see from looking around that I can use R scripting within Tableau, but I cannot find a method to go the other way and pull data out of a Tableau worksheet.
Any help appreciated!
I use R on Zeppelin at work to develop machine learning models. I extract the data from Hive tables using %sparkr, sql(Constring, 'select * from table'), and by default this generates a Spark data frame with 94 million records.
However, I cannot perform all of my R data munging tasks on this Spark data frame, so I try to convert it to an R data frame using collect() and as.data.frame(), but I run into memory and time-out issues on the driver node.
I was wondering if the Stack Overflow community is aware of any other way to convert a Spark data frame to an R data frame while avoiding these time-out issues?
Did you try caching your Spark data frame first? If you cache the data first, it may speed up the collect since the data is already in RAM, which could get rid of the time-out problem; at the same time, it will increase your RAM requirements. I have also seen those time-out issues when serializing or deserializing certain data types, or just large amounts of data, between R and Spark. Serialization and deserialization of large data sets is far from a bulletproof operation with R and Spark. Moreover, 94M records may simply be too much for your driver node to handle in the first place, especially if there is a lot of dimensionality to your dataset.
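If it helps, a minimal sketch of the caching idea in a %sparkr paragraph might look like the following; the table name is hypothetical, and whether sql() takes a context argument depends on your Spark version:

    # Sketch of the caching idea (hypothetical table name; adjust to your setup).
    library(SparkR)

    sdf <- sql("select * from my_table")   # Spark data frame, lazily evaluated
    cache(sdf)                             # mark it for caching in executor memory
    count(sdf)                             # force an action so the cache actually fills

    rdf <- collect(sdf)                    # pull the now-cached data to the driver as an R data frame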
One workaround I've used, though I'm not proud of it, is to have Spark write the data frame out as CSV and then have R read that CSV file back in on the next line of the script. Oddly enough, in a few of the cases where I did this, the write-a-file-then-read-the-file approach actually ended up being faster than a simple collect operation. A lot faster.
Word of advice: watch out for partitioning when writing out CSV files with Spark. You'll get a bunch of CSV files and will have to do something like tmp <- lapply(list_of_csv_files_from_spark, function(x){read.csv(x)}) to read in each CSV file individually, and then maybe df <- do.call("rbind", tmp). It would probably also be best to use fread in place of read.csv to read in the CSVs.
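For what it's worth, here is a rough sketch of that write-out / read-back workaround, assuming SparkR (Spark 2.x), the data.table package, and a made-up output path:

    library(SparkR)
    library(data.table)

    out_dir <- "/tmp/my_table_csv"   # hypothetical output directory

    # Step 1: have Spark write the data frame out as CSV; this normally produces
    # one part-*.csv file per partition inside out_dir.
    write.df(sdf, path = out_dir, source = "csv", mode = "overwrite", header = "true")

    # Step 2: read each part file back into R and bind them together.
    part_files <- list.files(out_dir, pattern = "^part-.*\\.csv$", full.names = TRUE)
    tmp <- lapply(part_files, fread)   # fread is much faster than read.csv here
    rdf <- rbindlist(tmp)              # one R data.table/data.frame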
Perhaps the better question is, what other data munging tasks are you unable to do in Spark that you need R for?
Good luck. I hope this was helpful. -nate
I want to read into R an Excel tab that has the following content configuration and reshape it into a tidier data format. The next picture shows how I want the content to look once all of the code has run. The table below just represents a file that will come in daily. Each day the numbers and the date will change, but the format will be exactly the same, so I need to learn how to automate extracting the pieces of this format from Excel into R. The end goal is to stack the daily data into a format that can be exported to Tableau.
The image above (figure 2) represents the final format I want to arrive at. I know there are several packages for reading Excel data into R, but I cannot figure out how to automate this task with readr, readxl, etc. I am hoping someone has faced this type of problem before and can give me general directions.
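Since the screenshots are not reproduced here, the exact layout is unknown, but as a general pattern a daily workbook can be read and reshaped with readxl plus tidyr along these lines; the folder name, sheet, skip value, the Category column, and the date-in-filename convention are all assumptions:

    library(readxl)
    library(dplyr)
    library(tidyr)

    # Read one daily file and reshape it into a tidy, stackable format.
    read_daily_file <- function(path) {
      raw <- read_excel(path, sheet = 1, skip = 2)   # skip any header rows above the table (assumed)
      raw %>%
        pivot_longer(-Category, names_to = "Metric", values_to = "Value") %>%            # wide -> tidy
        mutate(Date = as.Date(gsub(".*_(\\d{4}-\\d{2}-\\d{2})\\.xlsx$", "\\1", path)))   # date from file name (assumed pattern)
    }

    # Stack every daily file into one table that can be exported for Tableau.
    files <- list.files("daily_exports", pattern = "\\.xlsx$", full.names = TRUE)
    tidy_all <- bind_rows(lapply(files, read_daily_file))
    write.csv(tidy_all, "tidy_for_tableau.csv", row.names = FALSE)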
I am trying to read in data from a data visualization website (Qlik). Table access requires my username and password, and it also requires me to select a date range for the data. Is there a way I can read this kind of data into R instead of having to download it each time into Excel files and then read those into R?
Thanks!
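For illustration only, one generic pattern for pulling data from an authenticated web endpoint into R uses httr; whether it applies depends entirely on whether the Qlik site exposes an export URL or REST API, and the endpoint, parameter names, and basic-auth scheme below are pure placeholders (many Qlik deployments use NTLM or SSO instead):

    library(httr)

    resp <- GET(
      "https://qlik.example.com/api/export",        # hypothetical export endpoint
      authenticate("my_username", "my_password"),   # basic auth; your site may differ
      query = list(start_date = "2024-01-01",       # hypothetical date-range parameters
                   end_date   = "2024-01-31",
                   format     = "csv")
    )

    stop_for_status(resp)
    df <- read.csv(text = content(resp, as = "text", encoding = "UTF-8"))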
I have a Tableau extract.
I want to analyze the data in that Tableau extract using R.
Does anyone know how to load a Tableau extract into R?
A Tableau data extract (.tde file) is a native Tableau format. As far as I know, you cannot read it directly without using Tableau, so you need to open it in Tableau and then export the data if you want to take it out of Tableau.
If you want all the data, you can do something like this: view the full underlying data for the worksheet in Tableau and copy it from there. You can then paste it into Excel and save it as a .csv file, which R can read.
However, if you are happy to keep the data in the .tde format, you can always create R calculations in Tableau. This is done by using Rserve to communicate between R and Tableau. See: https://www.tableau.com/sites/default/files/media/whitepaper-power-tableau-and-r.pdf
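On the R side, the only setup for that route is starting Rserve so Tableau can reach it; the calculated-field line in the comment below is Tableau's standard SCRIPT_* syntax with example field names:

    # Start Rserve so Tableau can send R expressions to this machine
    # (point Tableau at this host/port via its external service connection
    #  settings; Rserve listens on port 6311 by default).
    install.packages("Rserve")   # only needed once
    library(Rserve)
    Rserve()

    # A calculated field in Tableau would then look something like
    # (example field names only):
    #   SCRIPT_REAL("mean(.arg1)", SUM([Sales]))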
I am new to knitr and would appreciate if someone could help me with a pointer on this.
Most examples I see using knitr/Sweave create a data frame inside a chunk and then refer to it in the subsequent processing. However, what if I have already massaged the raw data into a data frame and then want to use that? How do I do that?
I have tried saving the data frame as an R object and then loading it inside a chunk, but I am not sure if this is the best way.
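A minimal sketch of that save-the-object-and-load-it-in-a-chunk idea, using saveRDS()/readRDS() and example file and object names:

    # In the script that massages the raw data (run once, outside the document):
    saveRDS(my_clean_df, "my_clean_df.rds")     # 'my_clean_df' is an example name

    # In the first chunk of the .Rnw/.Rmd document:
    my_clean_df <- readRDS("my_clean_df.rds")   # available to all later chunks
    summary(my_clean_df)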