I am trying to integrate an R time series model with Tableau, and I am new to integration. Please help me resolve the error below. Here is my Tableau calculation for the R integration. The calculation is valid, but I am getting an error.
SCRIPT_REAL(
"library(forecast);
cln_count_ts <- ts(.arg1, frequency = 7);
arima.fit <- auto.arima(log10(cln_count_ts));
forecast_ts <- forecast(arima.fit, h = 10);",
SUM([Count]))
Error : Error in auto.arima(log10(cln_count_ts)) : No suitable ARIMA model found
When Tableau calls R, Python, or another tool, it does so as a "table calc". That means it sends the external system one or more vectors as arguments and expects a single vector in response.
Depending on your data and calculation, you may want to send all your data to R in a single call, passing a very large vector, or call it several times with different vectors - say forecasting each region separately. Or even call R multiple times with many vectors of size one (aka scalars).
So with table calcs, you have other decisions to make beyond just choosing the function to invoke. Chiefly, you have to decide how to partition your data for analysis. And in some cases, you also need to determine the order that the data appears in the vectors you send to R - say if the order implies a time series.
The Tableau terms for specifying how to divide and order data for table calculations are "partitioning and addressing". See the section on that topic in the online help. You can change those settings by using the "Edit Table Calc" menu item.
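Once the partitioning sends R the full series, the script itself also needs to end with something Tableau can consume: the original script ends with an assignment, and a forecast object is not a numeric vector with one value per row. A minimal sketch of a version that honors that contract, returning the back-transformed fitted value for each input point (field names taken from the question):
SCRIPT_REAL(
"library(forecast);
cln_count_ts <- ts(.arg1, frequency = 7);
arima.fit <- auto.arima(log10(cln_count_ts));
# The last expression is what Tableau receives; it must be a numeric
# vector with one value per row, so return the fitted series
# (back-transformed from log10) rather than the forecast object
as.numeric(10 ^ fitted(arima.fit))",
SUM([Count]))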
Related
I connect Tableau to R and execute an R function for recommending products. When R finishes, the return value is a string containing all product details, like below:
ID|Existing_Prod|Recommended_Prod\nC001|NA|PROD008\nC002|PROD003|NA\nF003|NA|PROD_ABC\nF004|NA|PROD_ABC1\nC005|PROD_ABC2|NA\nC005|PRODABC3|PRODABC4
(Each line is separated by \n, indicating the end of a line.)
On Tableau, I display the calculated field which is as below:
ID|Existing_Prod|Recommended_Prod
C001|NA|PROD008
C002|PROD003|NA
F003|NA|PROD_ABC
F004|NA|PROD_ABC1
C005|PROD_ABC2|NA
C005|PRODABC3|PRODABC4
The data above reaches Tableau through a calculated field as a single string, which I need to split into three columns on the pipe character ('|').
I used Split function on the calculated field :
SPLIT([R_Calculated_Field],'|',1)
SPLIT([R_Calculated_Field],'|',2)
SPLIT([R_Calculated_Field],'|',3)
But the error says "SPLIT function cannot be applied on Table calculations", which is self-explanatory. Are there any alternatives to solve this? I googled for best practices for handling integration between R and Tableau, and all I could find were simple k-means clustering examples.
Make sure you understand how partitioning and addressing work for table calcs. Table calcs pass vectors of arguments to the R script and receive a single vector in response. The cardinality of those vectors depends on the partitioning of the table calc. You can view that by editing the table calc and looking at the Specific Dimensions checkboxes. The fields that are not checked determine the partitioning, and thus the cardinality of the arguments you send to and receive from R.
This means it might be tricky to map your problem onto this infrastructure. Not necessarily impossible. It was designed to send a series of vector arguments with one cell per partitioning dimension, say, Manufacturer and get back one vector with one result per Manufacturer (or whatever combination of fields partition your data for the table calc). Sounds like you are expecting an arbitrary length list of recommendations. It shouldn’t be too hard to have your R script turn the string into a vector before returning, but the size of the vector has to make sense.
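For instance, a hedged sketch of that reshaping step inside the script (yourRecommender and the [ID] field are placeholders, not from the question):
SCRIPT_STR(
'recs <- yourRecommender(.arg1);  # placeholder for the real recommender call
# Split the single newline-delimited string into one element per row;
# the vector length must match the rows in the table calc partition
unlist(strsplit(recs, "\n", fixed = TRUE))',
ATTR([ID]))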
As an example of an approach that fits this model more easily, say you had a Tableau view that had one row per Product (and you had N products) - and some other aggregated measure fields in the view per Product. (In Tableau speak, the view’s level of detail is at the Product level.)
It would be straightforward to pass those measures as a series of argument vectors to R - each vector having N values, and then have R return a vector of reals of length N where the value returned at each location was a recommender score for the product at that position. (Which is why the ordering aka addressing of the vectors also matters)
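As a hedged sketch of that shape (the measures [Sales] and [Margin] and the scoring rule are placeholders, not a real recommender):
SCRIPT_REAL(
'# .arg1 and .arg2 each arrive as a vector with one value per product (length N)
# placeholder scoring rule - a real recommender model would go here
0.7 * .arg1 + 0.3 * .arg2',
SUM([Sales]), SUM([Margin]))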
Then you could filter out low scoring products from the view and visually distinguish highly recommended products.
So the first step to understanding R integration is to understand how table calcs operate with partitioning and addressing and to think in terms of vectors of fixed lengths passed in both directions.
If this model doesn’t support your use case well, you might be able to do something useful with URL actions or the JavaScript API.
I have a bunch of sales opportunities in various Excel files - broken down by region, type, etc. - that are one column each and simply list the dollar amount of each opportunity. In R I have run a simulation to determine the likelihood of each opportunity closing with a sale, repeating the simulation 100,000 times. I know that I can't pass the full results table back to Tableau, because it has 100,000 rows - one total for each simulation - while the data I'm pulling into Tableau just has the dollar value of each opportunity, so it only has as many rows as there are opportunities of that type.
What I have in R is basically the first block of code below, repeated a number of times with varying inputs and probabilities; the totals vectors are ultimately combined into a quarterly total vector.
APN <- ncol(APACPipelineNew)  # number of opportunities in this pipeline
# One 0/1 close/no-close draw per opportunity per simulation
# (100,000 simulations x APN opportunities)
APNSales <- matrix(rbinom(100000 * APN, 1, 0.033), 100000, APN)
# Multiply each draw by the corresponding opportunity's dollar amount
APNSales <- sweep(APNSales, 2, APACPipelineNew, '*')
APNTotals <- rowSums(APNSales)  # one sales total per simulation
...
Q1APACN <- APNTotals + ABNTotals + AFNTotals
...
Q1Total <- Q1APACT + Q1EMEAT + Q1NAMT
What I'd like to do is set this up as a dashboard in Tableau so that it can automatically update each week, but I'm not sure how to pass the simulation back into Tableau given the difference in length of the data.
Some suggestions:
For R you can use the Windows Task Scheduler to run a job at any given interval (or use the package taskscheduleR; see the sketch after this list).
After you save the R output, you can manually refresh your dashboard if it lives in Tableau Desktop (I do not know of a way to schedule an extract refresh for a desktop dashboard).
However, if your dashboard lives on Tableau Server, you can schedule an extract refresh every week. Obviously, I would schedule the R update before the Tableau extract refresh.
If you only want the data to update when the number of rows differs from the previous weekly run, you can build that logic into R. Although saving the R data and refreshing the extract with the same data and number of rows should not cause any problems.
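A minimal taskscheduleR sketch, assuming the simulation lives in a standalone script (the task name and path are placeholders):
library(taskscheduleR)
# Run the simulation script every Monday morning, ahead of the
# Tableau extract refresh scheduled later that day
taskscheduler_create(
  taskname  = "weekly_pipeline_sim",
  rscript   = "C:/scripts/pipeline_simulation.R",
  schedule  = "WEEKLY",
  days      = "MON",
  starttime = "06:00"
)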
I am trying to do sentiment analysis on a table that I have.
I want each row of string data to be passed to the R script, but the problem is that Tableau accepts only aggregate data as params for:
SCRIPT_STR(
'output <- .arg1; output', [comments]
)
This gives me an error message:
# All fields must be aggregate or constant.
From the Tableau and R Integration documentation:
Given that the SCRIPT_*() functions work as table calculations, they
require aggregate measures or Tableau parameters to work properly.
Aggregate measures include MIN(), MAX(), ATTR(), SUM(), MEDIAN(), and
any table calculations or R measures. If you want to use a specific
non-aggregated dimension, it needs to be wrapped in an aggregate
function.
In your case you could do:
SCRIPT_STR(
'output <- .arg1; output', ATTR([comments])
)
ATTR() is a special Tableau aggregate that does the following:
IF MIN([Dimension]) = MAX([Dimension]) THEN
    [Dimension]
ELSE
    * (a special version of Null)
END
It’s really useful when building visualizations where you’re not sure of the level of detail of the data and what’s being sent.
Note: It can be significantly slower than MIN() or MAX() in large data sets, so once you get confident your results are accurate then you can switch to one of the other functions for performance.
Try MIN([comments]) instead, and make sure you have appropriate dimensions on your viz to partition the data finely enough to get a single comment for each combination of dimensions.
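Putting that together, a minimal sketch of a per-comment call (the keyword rule is only a placeholder for a real sentiment model):
SCRIPT_STR(
'# .arg1 holds one comment per partition cell
# placeholder rule - swap in a real sentiment package here
ifelse(grepl("good|great|love", .arg1, ignore.case = TRUE), "positive", "other")',
MIN([comments]))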
I have extensively read and re-read the Troubleshooting R Connections and Tableau and R Integration help documents, but as a new Tableau user they just aren't helping me.
I need to be able to calculate Kaplan-Meier survival probabilities across any dimensions that are dragged onto the sheet. Ideally, I would be able to retrieve this in a tabular format at multiple time points, but for now, I would be happy just to get it at a single time point.
My data in Tableau have columns for [event-boolean] and [time to event]. Let's say I also have columns for Gender and District.
Currently, I have a calculated field [surv] as:
SCRIPT_REAL('
library(survival);
fit <- summary(survfit(Surv(.arg2, .arg1) ~ 1), times = 365);
fit$surv'
, MIN([event-boolean])
, MIN([time to event])
)
I have messed with Computed Using, Addressing, Partitions, Aggregate Measures, and parameters to the R function, but no combination I have tried has worked.
If [District] is in Columns, do I need to change my SCRIPT_REAL call or do I just need to change some other combination of levers?
I used Andrew's solution to solve this problem. Essentially,
- Turn off Aggregate Measures
- In the Measure Values shelf, select Compute Using > Cell
- In the calculated field, wrap the script in IF FIRST() == 0 THEN SCRIPT_*() END (see the sketch after this list)
- Ctrl+drag the measure to the Filters shelf and use a Special > Non-null filter.
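Combining that pattern with the calculated field from the question gives a sketch like this (field names are from the question):
IF FIRST() == 0 THEN
    SCRIPT_REAL('
    library(survival);
    fit <- summary(survfit(Surv(.arg2, .arg1) ~ 1), times = 365);
    fit$surv'
    , MIN([event-boolean])
    , MIN([time to event])
    )
END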
I am new to data mining and I am trying to figure out how to cluster cell tower IDs to find their locations from the known location labels (Home, Work, Elsewhere, No signal).
I have a location-driven dataset for user A that contains cellID (unique ID of each detected cell tower), starttime (date & time it detected the particular tower), endtime (last date & time before it connected to a different cell tower), and placenames (user-labelled place names such as home, work). There are unlabelled locations in the dataset as well, left empty by the user, and I want to label these cell towers using a clustering approach so that each represents one of the location names.
I am using R, and I tried to feed the complete dataset to k-means clustering, but it gives me an error and a warning message that I don't have a clue about:
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In kmeans(dataset, 4, 15) : NAs introduced by coercion
Any suggestions on how can I use clustering approach for this problem? Thanks
Since you have all the labeled and unlabeled data available at the training stage, what you are looking for is "transductive learning", which is a little different from clustering (which is "unsupervised learning").
For each cell tower you collect the average starttime, endtime and cellID. You can get lat/lng from cellIDs here: https://mozilla-ichnaea.readthedocs.org/en/latest/api/search.html or http://locationapi.org/api (expensive).
This gives you a 4-dimensional feature vector for each tower; the goal is to assign a ternary labeling based on these continuous features:
[avg_starttime avg_endtime lat lng] home/work/other
I don't know about R, but in python basic transductive learning is available:
http://scikit-learn.org/stable/modules/label_propagation.html#label-propagation
If you don't get good results with label propagation, and since off-the-shelf transductive learning tools are rare, you might just want to ignore some of your data during training and use more standard methods. If you ignore the unlabeled data at the start you can have an "inductive" or "supervised" learning problem (solve with an SVM). If you ignore the labels at the start you can use unsupervised learning (eg "clustering"; use kmeans or DBSCAN) and then assign labels to the clusters after clustering is done.
As for the NaN: the "NAs introduced by coercion" warning usually means non-numeric columns (such as the character cellID or placenames fields) were coerced to numeric; kmeans() needs an all-numeric matrix with no NA/NaN/Inf values.
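A minimal sketch of that preprocessing (the data frame and column names here are assumptions, not from the question):
# Assume `towers` has one row per cell tower with the aggregated
# numeric features described above
feats <- towers[, c("avg_starttime", "avg_endtime", "lat", "lng")]
feats <- scale(feats)                    # features live on very different scales
feats <- feats[complete.cases(feats), ]  # kmeans rejects NA/NaN/Inf
km <- kmeans(feats, centers = 4, iter.max = 15)
table(km$cluster)                        # cluster sizes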