I am working on converting SAS code into R, and since I am relatively new to SAS, I am having trouble understanding the following code snippet:
proc expand data=A out=B;
   by number beg_date;
   id date;
   convert alpha1=calpha1 / transformout=(+1 cuprod -1);
   convert alpha2=calpha2 / transformout=(+1 cuprod -1);
   convert alpha3=calpha3 / transformout=(+1 cuprod -1);
run;
I understand that PROC EXPAND is used for expanding or contracting time series data, for example converting between monthly and quarterly frequencies. But what are the BY and ID statements for?
From referring to SAS Support, I believe that the BY statement specifies grouping variables so that the cumulative product is calculated separately for each group. As for the ID statement, I understand that it is a key to identify the observations. Can anyone tell me if my understanding is correct? Should I use the transform function in R for this purpose?
I don't have a SAS license, so I cannot try this out on sample data to understand the output. Similarly, I don't have a raw data set to work on.
From your code snippet, this PROC EXPAND is going to create three new variables: calpha1, calpha2 and calpha3. CUPROD is one of the TRANSFORMOUT operators in PROC EXPAND that outputs the cumulative product, and the surrounding +1 and -1 operators add 1 to each value before the cumulative product and subtract 1 afterwards, which is the usual way to compound returns. So each CONVERT statement computes the cumulative compounded product of alpha1, alpha2 and alpha3 within every number/beg_date group defined by the BY statement. Note that the data must already be sorted by the BY variables, so a PROC SORT should precede the PROC EXPAND.
Regarding the ID statement, it seems the original writer didn't want to use the default time settings of PROC EXPAND. By specifying the date variable in the ID statement, the calculations are based on the points in time given by date.
http://support.sas.com/documentation/cdl/en/etsug/63348/HTML/default/viewer.htm#etsug_expand_sect008.htm
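If you want to reproduce this in R, the grouped cumulative product is straightforward with dplyr. Here is a minimal sketch, assuming a data frame A with the same columns as the SAS data set:
library(dplyr)

B <- A %>%
  arrange(number, beg_date, date) %>%   # id date: order observations in time
  group_by(number, beg_date) %>%        # by number beg_date: one series per group
  mutate(calpha1 = cumprod(1 + alpha1) - 1,   # transformout=(+1 cuprod -1)
         calpha2 = cumprod(1 + alpha2) - 1,
         calpha3 = cumprod(1 + alpha3) - 1) %>%
  ungroup()
Note that base R's transform function does not do grouped operations on its own, so group_by plus mutate (or ave in base R) is the closer equivalent here.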
I work a lot with the new table and collect commands in Stata 17. Does anybody know how to get the confidence interval in one cell in the table, versus one column for the lower bound and one column for the upper bound estimate?
Alternatively, a quick fix in Word (or Excel, though my final document is in Word, and saving the output to Excel takes so long).
As I see it, there is no option to put it in one column, so maybe a layout workaround?
From the Stata documentation of the collect command, the Quick start mentions:
table (colname) (result), command(_r_b _r_ci: regress y x1 x2 x3)
You should be able to use collect with it, but without a minimal reproducible example of your specific case it is hard to verify whether this works as intended. For the general idea of a minimal reproducible example please see here, and for specific advice on how to create one please see here.
Here is a general example that uses table, collect, and putdocx to create a Word document with the confidence interval in one cell:
// load example data shipped with Stata 17
use https://www.stata-press.com/data/r17/nlsw88.dta
// _r_b collects the coefficients; _r_ci collects each confidence interval as a single result
table (colname) (result), command(_r_b _r_ci: regress wage union occupation married age)
// put covariates in rows and results in columns
collect layout (colname) (result)
// export the collected table to a Word document
putdocx begin
putdocx collect
putdocx save Table, replace
In Enterprise Guide, I draw scatter plots with creation and closing dates of issues to detect when backlogs occur and when they are resolved:
(The straight lines in the graph are batch interventions, like closing a set of issues that were handled outside of the system.)
proc sgplot data=alert;
   scatter x=create_Date y=CloseDate / group=CloseReason;
run;
When I try to do the same in SAS Visual Analytics, I can only put measures on the x-axis and y-axis, and I can't make the date or datetime variable a measure.
Am I doing something wrong? Should I use another graph type?
My take is that the inability of SAS VA Explorer to allow dates as measures is a real weakness. Old-school trickery would perhaps be to create a duplicate data item that computes the raw SAS date value (giving you a numeric result, and thus a measure) and then formatting that with a custom format to render it back as a human-readable date.
However, according to http://support.sas.com/kb/47/100.html#explorer
How SAS Visual Analytics Designer supports formats
In SAS Visual Analytics Designer, the Format property of the data item displays the name of the format for both numeric and character data items. However, there are some differences between numeric and character data items.
Numeric data items
You can change the format. If you change the format, you can restore the user-defined format by selecting Reset to Default in the Format type box.
You can specify to sort by formatted or unformatted values (release 6.2 and later).
(My bolds.) Numeric data items with a user-defined format are classified as categories. You cannot change these data items to measures while the user-defined format is applied.
According to support.sas.com/documentation/cdl/en/vaug/68648/PDF/default/vaug.pdf, page 166, you could work on defining data roles for a scatter plot.
I am not sure whether this solves your situation, but it says:
"In addition to measures, you can assign a Group variable. The Group variable groups the data based on the values of the category data item that you assign. A separate set of scatter points is created for each value of the group variable.
You can add data items to the Data tips role. The values for the data items in the Data tips role are displayed in the data tips for the scatter plot."
Hope it helps.
I would like to know how to use table data as a parameter in Tableau's R integration.
Example
Tableau has a built-in data set called "Superstore" that is used for reproducible examples. Suppose I've used the Superstore data set to create a "text table" (i.e. a spreadsheet) with Region as rows and SUM(Profit) as the data.
Now, suppose I wanted to pass the data in this table to R for a calculation. I would start Rserve
library(Rserve)
Rserve()
and establish a connection with Tableau's UI.
Next I would want to create a calculated field to send the data to R and retrieve the results. I'm not sure how to do this.
My attempt looks like this:
SCRIPT_REAL('
output <- anRFunction(.arg1)
',
[someTableauMeasure])
which should be fine, except that I don't know how to represent the table data where it currently says someTableauMeasure. This is just an arbitrary example, but one reason I might want to do this is that I could provide the user with a filter, such as Country, so that they could filter the results at will and get an updated result from R.
For testing purposes, the function anRFunction could be replaced with something like abs.
Tableau will pass the aggregated values to R, depending on the settings of your worksheet.
So in your case if you use:
SCRIPT_REAL('
output <- anRFunction(.arg1)
',
SUM([Profit]))
You will get the output according to the dimensions on your worksheet, in your case [Region]. If you set up a filter by Country, R will only receive and return the values for the selected country, and if you use [Category] instead, you will get the results of your R function broken down by category.
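As a quick sanity check, you can substitute a built-in function such as abs for the hypothetical anRFunction; this sketch assumes Rserve is running and Profit exists in your data source:
SCRIPT_REAL('
abs(.arg1)
',
SUM([Profit]))
Each region on the worksheet then gets back the absolute value of its profit sum.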
I am trying to do sentiment analysis on a table that I have.
I want each row of string data to be passed to the R script, but the problem is that Tableau accepts only aggregate data as parameters for:
SCRIPT_STR(
'output <- .arg1; output', [comments]
)
This gives me an error message:
All fields must be aggregate or constant.
From the Tableau and R Integration documentation:
Given that the SCRIPT_*() functions work as table calculations, they
require aggregate measures or Tableau parameters to work properly.
Aggregate measures include MIN(), MAX(), ATTR(), SUM(), MEDIAN(), and
any table calculations or R measures. If you want to use a specific
non-aggregated dimension, it needs to be wrapped in an aggregate
function.
In your case you could do:
SCRIPT_STR(
'output <- .arg1; output', ATTR([comments])
)
ATTR() is a special Tableau aggregate that does the following:
IF MIN([Dimension]) = MAX([Dimension]) THEN
    [Dimension]
ELSE
    * (a special version of Null)
END
It’s really useful when building visualizations when you’re not sure of the level of detail of the data and what’s being sent.
Note: It can be significantly slower than MIN() or MAX() on large data sets, so once you are confident your results are accurate, you can switch to one of the other functions for performance.
Try MIN([comments]) and make sure you have appropriate dimensions on your viz to partition the data finely enough to get a single comment for each combination of dimensions.
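For instance, here is a minimal sketch of that approach, using toupper as a stand-in for a real sentiment-scoring function (which you would define on the R side):
SCRIPT_STR('
# .arg1 arrives as a character vector, one element per partition
toupper(.arg1)
', MIN([comments]))
If each partition contains exactly one comment, MIN([comments]) simply returns that comment unchanged.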
I have extensively read and re-read the Troubleshooting R Connections and Tableau and R Integration help documents, but as a new Tableau user I find they just aren't helping me.
I need to be able to calculate Kaplan-Meier survival probabilities across any dimensions that are dragged onto the sheet. Ideally, I would be able to retrieve this in a tabular format at multiple time points, but for now, I would be happy just to get it at a single time point.
My data in Tableau have columns for [event-boolean] and [time to event]. Let's say I also have columns for Gender and District.
Currently, I have a calculated field [surv] as:
SCRIPT_REAL('
library(survival);
fit <- summary(survfit(Surv(.arg2, .arg1) ~ 1), times=365);
fit$surv'
, MIN([event-boolean])
, MIN([time to event])
)
I have messed with Computed Using, Addressing, Partitions, Aggregate Measures, and parameters to the R function, but no combination I have tried has worked.
If [District] is in Columns, do I need to change my SCRIPT_REAL call or do I just need to change some other combination of levers?
I used Andrew's solution to solve this problem. Essentially:
- Turn off Aggregate Measures
- In the Measure Values shelf, select Compute Using > Cell
- In the calculated field, wrap the call as IF FIRST() == 0 THEN SCRIPT_*() END (see the sketch after this list)
- Ctrl+drag the measure to the Filters shelf and use a Special > Non-null filter.
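Putting those steps together with the original calculated field, here is a sketch of [surv] (the same R call, wrapped so it executes once per partition):
IF FIRST() == 0 THEN
SCRIPT_REAL('
library(survival);
# Kaplan-Meier fit; .arg1 = event indicator, .arg2 = time to event
fit <- summary(survfit(Surv(.arg2, .arg1) ~ 1), times=365);
fit$surv'
, MIN([event-boolean])
, MIN([time to event])
)
END
With Compute Using set to Cell, this returns the 365-day survival probability for every combination of dimensions on the sheet, such as [District] in Columns.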