I have one metric and one dimension
I need to create a calculated field like SQL Query:
sum(new_users) where event_action= "manual_widget_click" / Total_new_Users
I can't understand the logic of formulas in Data Studio.
There are no standard functions like 'sumif' or 'where' here,
only 'CASE WHEN', and I can't put a formula together...
It can be achieved by breaking down the formula into 2 Calculated Fields.
First :
CASE
WHEN event_action= "manual_widget_click" THEN new_users
ELSE 0
END
Then, referencing the first calculated field (here assumed to be saved as [manual_widget_new_users]; the name is up to you):
SUM(manual_widget_new_users) / Total_new_Users
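The two-field approach above is just a conditional sum divided by a plain sum. As a sanity check, here is the same logic sketched in plain Python over hypothetical rows (not Data Studio syntax, just to show what the two fields compute):

```python
# Toy rows standing in for the Data Studio data source (hypothetical values).
rows = [
    {"event_action": "manual_widget_click", "new_users": 3},
    {"event_action": "pageview", "new_users": 5},
    {"event_action": "manual_widget_click", "new_users": 2},
]

# First field: CASE WHEN event_action = "manual_widget_click" THEN new_users ELSE 0 END,
# then SUM over it.
clicks = sum(r["new_users"] if r["event_action"] == "manual_widget_click" else 0
             for r in rows)

# Denominator: the plain total of new_users.
total = sum(r["new_users"] for r in rows)

ratio = clicks / total
print(ratio)  # 5 / 10 = 0.5
```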
Related
I’m looking for a simple expression that puts a ‘1’ in column E if ‘SomeContent’ is contained in column D. I’m doing this in Azure ML Workbench through their Add Column (script) function. Here are some examples they give.
row.ColumnA + row.ColumnB is the same as row["ColumnA"] + row["ColumnB"]
1 if row.ColumnA < 4 else 2
datetime.datetime.now()
float(row.ColumnA) / float(row.ColumnB - 1)
'Bad' if pd.isnull(row.ColumnA) else 'Good'
Any ideas on a one-line script I could use for this? Thanks
Without really knowing what you want to look for in column 'D', I still think you can find all the information you need in the examples they give.
The script is being wrapped by a function that collects the value you calculate/provide and puts it in the new column. This assignment happens for each row individually. The value could be a static value, an arbitrary calculation, or it could be dependent on the values in the other columns for the specific row.
In the "Hint" section, you can see two different ways of obtaining values from the other columns:
The current row is referenced using 'row' and then a column qualifier, for example row.colname or row['colname'].
In your case, you obtain the value for column 'D' either by row.D or row['D']
After that, all you need to do is come up with the specific logic for checking whether 'SomeContent' is contained in column 'D' for that specific row. In your case, the one-line script would look something like this:
1 if [logic ensuring 'SomeContent' is contained in row.D] else 0
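Assuming the logic you want is plain substring containment, one hypothetical version of that one-liner, tested here with a stand-in for the row object the Add Column script receives:

```python
from collections import namedtuple

# Stand-in for the row object the Add Column script passes in (hypothetical).
Row = namedtuple("Row", ["D"])

# Assuming plain substring containment is the logic you want:
script = lambda row: 1 if "SomeContent" in str(row.D) else 0

print(script(Row("prefix SomeContent suffix")))  # 1
print(script(Row("nothing relevant here")))      # 0
```

In the Workbench dialog itself you would only enter the `1 if "SomeContent" in str(row.D) else 0` part.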
If you need help with the logic, you need to provide more specific examples.
You can read more in the Azure Machine Learning Documentation:
Sample of custom column transforms (Python)
Data Preparations Python extensions
Hope this helps
In IBM Cognos Report Studio
I have a data structure like so, plain dump of the customer details:
Account|Type|Value
123-123| 19 |2000
123-123| 20 |2000
123-123| 21 |3000
If I remove the Type from my report I get:
Account|Value
123-123|2000
123-123|3000
It seems to have treated the two rows with the amount '2000' as duplicates and removed one from my report.
My assumption was that Cognos would aggregate the data automatically, giving:
Account|Value
123-123|7000
I am lost as to what it is doing. Any pointers? If it is not grouping, I would at least expect 3 rows still:
Account|Value
123-123|2000
123-123|2000
123-123|3000
In any case I would like to end up with one line. I can't figure out the behaviour I'm getting. Thanks for any help.
Gemmo
The 'Auto-group & Summarize' feature is the default on new queries. This will find all unique combinations of attributes and roll up all measures to these unique combinations.
There are three ways to disable the auto-group & summarize behavior:
1. Explicitly turn it off at the query level
2. Include a grain-level unique column, e.g. a key, in the query
3. Not include any measures in the query
My guess is that your problem is #3. The [Value] column in your example has to have its 'Aggregate Function' set to an aggregate function or 'Automatic' for the auto-group behavior to work. It's possible that column's 'Aggregate Function' property is set to 'None'. This is the standard setting for an attribute value and would prevent the roll up from occurring.
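The two behaviours can be mimicked in a few lines of plain Python on the sample rows (just an illustration, not Cognos code): treating [Value] as a measure rolls it up per unique Account, while treating it as an attribute keeps only distinct combinations, which is exactly what the question observed.

```python
from collections import defaultdict

# Plain dump of the customer details from the question: (Account, Type, Value)
rows = [("123-123", 19, 2000), ("123-123", 20, 2000), ("123-123", 21, 3000)]

# Value treated as a measure (Aggregate Function = 'Automatic'):
# unique Account combinations roll the measure up.
rolled = defaultdict(int)
for account, _type, value in rows:
    rolled[account] += value
print(dict(rolled))  # {'123-123': 7000}

# Value treated as an attribute (Aggregate Function = 'None'):
# auto-group keeps only the distinct (Account, Value) combinations.
distinct = sorted({(account, value) for account, _type, value in rows})
print(distinct)  # [('123-123', 2000), ('123-123', 3000)]
```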
I have extensively read and re-read the Troubleshooting R Connections and Tableau and R Integration help documents, but as a new Tableau user they just aren't helping me.
I need to be able to calculate Kaplan-Meier survival probabilities across any dimensions that are dragged onto the sheet. Ideally, I would be able to retrieve this in a tabular format at multiple time points, but for now, I would be happy just to get it at a single time point.
My data in Tableau have columns for [event-boolean] and [time to event]. Let's say I also have columns for Gender and District.
Currently, I have a calculated field [surv] as:
SCRIPT_REAL('
library(survival);
fit <- summary(survfit(Surv(.arg2,.arg1) ~ 1), times=365);
fit$surv'
, min([event-boolean])
, min([time to event])
)
I have messed with Computed Using, Addressing, Partitions, Aggregate Measures, and parameters to the R function, but no combination I have tried has worked.
If [District] is in Columns, do I need to change my SCRIPT_REAL call or do I just need to change some other combination of levers?
I used Andrew's solution to solve this problem. Essentially,
- Turn off Aggregate Measures
- In the Measure Values shelf, select Compute Using > Cell
- In the calculated field, wrap the script in IF FIRST() == 0 THEN SCRIPT_*(...) END
- Ctrl+drag the measure to the Filters shelf and use a Special > Non-null filter.
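Tableau only hands the two columns to R; the estimate survfit returns is the standard Kaplan-Meier product-limit formula. For intuition about what the SCRIPT_REAL call computes, here is a minimal plain-Python sketch of that estimator (not Tableau or R code; an illustration only):

```python
def kaplan_meier(times, events):
    """Product-limit estimator: S(t) = prod over event times t_i <= t
    of (1 - d_i / n_i), where d_i is the number of events at t_i and
    n_i is the number still at risk just before t_i.

    times  -- time to event or censoring for each subject
    events -- 1 if the event occurred, 0 if censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    s = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Group all subjects leaving the risk set at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:  # censoring alone does not change S(t)
            s *= 1 - deaths / n_at_risk
            curve.append((t, s))
        n_at_risk -= removed
    return curve

# Tiny hypothetical cohort: events at t=1, 2, 4; one censored at t=3.
print(kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1]))
```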
I'm looking to be able to perform the equivalent of a count if on a data set similar to the below. I found something similar here, but I'm not sure how to translate it into Enterprise Guide. I would like to create several new columns that count how many date occurrences there are for each primary key by year, so for example:
PrimKey Date
1 5/4/2014
2 3/1/2013
1 10/1/2014
3 9/10/2014
To be this:
PrimKey 2014 2013
1 2 0
2 0 1
3 1 0
I was hoping to use the advanced expression for calculated fields option in query builder, but if there is another better way I am completely open.
Here is what I tried (and failed):
CASE
WHEN Date(t1.DATE) BETWEEN Date(1/1/2014) and Date(12/31/2014)
THEN (COUNT(t1.DATE))
END
But that ended up just counting the total date occurrences without regard to my between statement.
Assuming you're using Query Builder you can use something like the following:
I don't think you need the CASE statement; instead use the YEAR() function to calculate the year and test whether it's equal to 2014/2013. The test for equality returns a 1/0, which can be summed to get the total per group. Make sure to include PrimKey in the GROUP BY section of Query Builder.
sum(year(t1.date)=2014) as Y2014,
sum(year(t1.date)=2013) as Y2013,
I don't like this type of solution because it's not dynamic, i.e. if your years change you have to change your code, and there's nothing in the code to return an error if that happens either. A better solution is to do a Summary Task by Year/PrimKey and then use a Transpose Task to get the data in the structure you want it.
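The dynamic behaviour the Summary + Transpose approach gives you can be sketched outside SAS as well. In plain Python, counting occurrences per (PrimKey, year) and then pivoting looks like this, using the sample data from the question (illustration only, not Enterprise Guide code):

```python
from collections import Counter
from datetime import datetime

# (PrimKey, Date) rows from the question
rows = [(1, "5/4/2014"), (2, "3/1/2013"), (1, "10/1/2014"), (3, "9/10/2014")]

# Count date occurrences per (PrimKey, year) -- no hard-coded year list.
counts = Counter((key, datetime.strptime(d, "%m/%d/%Y").year) for key, d in rows)
years = sorted({y for _, y in counts}, reverse=True)

# Pivot: one row per PrimKey, one column per year actually found in the data.
pivot = {k: [counts.get((k, y), 0) for y in years]
         for k in sorted({k for k, _ in counts})}

print(years)  # [2014, 2013]
print(pivot)  # {1: [2, 0], 2: [0, 1], 3: [1, 0]}
```

Because the year columns are derived from the data, new years appear automatically instead of requiring a code change.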
I am using a parent-child relationship for accounts in the OLAP database icCube. To include financial logic I make use of unary operators. In addition, I have set up several account hierarchies using the many-2-many relationship, and all is working very smoothly, except ....
when I want to apply time logic to the result, e.g. show the YTD value for April 30, 2014 with:
Aggregate(crossjoin ({[View].[View].[Periodiek]},PeriodsToDate([Tijd].[Kalender].[jaar],[Tijd].[Kalender].currentmember)))
I get the message:
Aggregate() : the aggregation 'unary-operator' is not supported (measure or calculated measure/member:[Measures].[bedrag])
Apparently, this is not the way to do this.
How can one achieve cumulative figures (periods-to-date) in this setting?
The current version of icCube - 4.8.2 - does not support the Aggregate function for measures with Aggregation type 'unary operator'. See Aggregation function doc here.
The Aggregate function is a bit dodgy if you're using many-2-many relations as well as special measure aggregation types. For example :
Aggregate( { [Account1], [Account2] }, [Measures].[Special] )
If [Special] is a measure with 'Sum' aggregation type and [Account1] and [Account2] have a many-2-many relation, we would count the same 'shared' amount twice (i.e. the same row is counted twice).
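That double-counting can be made concrete with a toy example in plain Python (hypothetical fact rows, not icCube code): summing per account over a shared row counts it once per related account, whereas summing over the distinct fact rows counts it once.

```python
# Hypothetical m2m setup: the first fact row is shared by two accounts.
fact_rows = [
    {"amount": 100, "accounts": {"Account1", "Account2"}},
    {"amount": 50,  "accounts": {"Account1"}},
]
selected = {"Account1", "Account2"}

# Aggregating over the member set: each (account, row) pair contributes,
# so the shared row's 100 is counted twice.
double_counted = sum(r["amount"] for r in fact_rows
                     for a in r["accounts"] if a in selected)
print(double_counted)  # 250  (100 counted twice, plus 50)

# Counting each distinct fact row once, as a compacted set would:
correct = sum(r["amount"] for r in fact_rows if r["accounts"] & selected)
print(correct)  # 150
```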
Other measure aggregation types are simply not supported, to avoid unexpected results. This applies to aggregation types such as Open / Close / Distinct Count.
The solution in your case is to use the Sum function:
Sum( CompactSet( PeriodsToDate([Tijd].[Kalender].[jaar],[Tijd].[Kalender].currentmember) ) , [View].[View].[Periodiek] )
CompactSet compacts the set in case you're using days or hours, reducing its size. It's a performance booster.
If you want to handle m2m relations and special measure aggregation types properly, you can use Categories in icCube (see the doc here). In short, they allow you to dynamically define a member as a set of tuples.
To properly support Aggregate we should add a new function, e.g. icAggregate, that works as if using Categories. The Aggregate function is a bit strange; for the time being we mimic SSAS to some extent...
To prove the comment of Sourav try changing your measure to the following:
Aggregate(
{[View].[View].[Periodiek]}
* PeriodsToDate([Tijd].[Kalender].[jaar],[Tijd].[Kalender].currentmember)
, [Measures].[MEASURE_NOT_bedrag] //<<replace with actual
)
Do you still get the same error?