Blending 2 dimensions and 2 metrics into 1 dimension and 1 metric - google-analytics

I have a UA account and a GA4 account.
In UA I have:
Dimension: Event Category
Metric: Unique Events
In GA4:
Dimension: Event name
Metric: Event count
Is there a way to blend the data so that I have 1 dimension (Event Category OR Event Name) and 1 metric (Unique Events OR Event Count)?
I really need to get down to a single dimension and a single metric.
Thanks in advance.

Data Studio supports blending data, which means joining two or more datasets. What you asked for is appending data, which in SQL is called a UNION ALL.
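For reference, here is the same append expressed in SQL (table and column names are hypothetical):
SELECT event_category AS name, unique_events AS counts FROM ua_events
UNION ALL
SELECT event_name, event_count FROM ga4_events;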
How to union two datasets in Data Studio
First generate a Google Sheet document with the numbers 0 and 1:
Add this sheet to Data Studio and take care that the column is interpreted as a number:
To each of your datasets from UA and GA4, add a calculated field dummy: in the first dataset with the formula 0, and in the second dataset with the formula 1.
Blend these datasets together:
Since the column test in the left dataset has the values 0 and 1, the other two datasets are unioned.
To combine the two dimensions, add the + twice:
and enter the following formula there:
CASE WHEN test = 0 THEN name ELSE name2 END
and a formula for the metric (the counts and events columns) as well:
CASE WHEN test = 0 THEN counts ELSE events END
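To illustrate with hypothetical values, the blended rows then look like this, with the combined fields filling in from whichever side is present:
test | name     | counts | name2    | events | combined name | combined count
0    | purchase | 120    | null     | null   | purchase      | 120
1    | null     | null   | purchase | 95     | purchase      | 95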

Rather than blending: if you want to UNION two separate tables, Google Data Studio currently does not support it. However, I recommend a tool called Windsor AI. This platform lets you see all kinds of data sources and perform whatever operations you want on them in one place.
Suppose I want data from Google Analytics: I simply choose the data source from the account I want to pull the data from.
From there you can preview your data and then proceed to pull it, even from multiple tables, like here:
You can then export it to any data platform you want.
Hope this helps.
P.S. This is a genuine problem that Data Studio has. Although Data Studio is relatively new, it lacks a lot of features that would make it a viable platform amongst its competitors.

Related

Remove duplicate custom metric events from application insights before plotting in Azure portal

I'm logging some custom metrics in Application Insights using the TelemetryClient.TrackMetric method in .NET, and I've noticed that occasionally some of the events are duplicated when I view them in the Azure portal.
I've drilled into the data, and the duplicate events have the same itemId and timestamp, but if I show the ingestion time by adding | extend ingestionTime = ingestion_time() to the query then I can see that the ingestion times are different.
This GitHub issue indicates that this behavior is expected, as AI uses at-least-once delivery.
I plot these metrics in charts in the Azure portal using a sum aggregation, however these duplicates are creating trust issues with the charts as the duplicates are simply treated as two separate events.
Is there a way to de-dupe the events based on itemId before plotting the data in the Azure portal?
Update
A more specific example:
I'm running an algorithm, triggered by an event, which results in a reward. The algorithm may be triggered several dozen times a day, and the reward is a positive or negative floating point value. It logs the reward each time to Application Insights as a custom metric (called say custom-reward), along with some additional properties for data splitting.
In the Azure portal I'm creating a simple chart by going to Application Insights -> Metrics and customising the chart. I select my custom-reward metric in the Metric dropdown, and select Sum as the aggregation. I may or may not apply splitting. I save the chart to my dashboard.
This simple chart gives me a nice way of monitoring the system to make sure nothing unexpected is happening, and the Sum value in the bottom left of the chart allows me to quickly see whether the sum of the rewards is positive or negative over the chart's range, and by how much.
However, on occasion I've been surprised by the result (say over the last 12 hours the sum of the rewards was surprisingly negative), and on closer inspection I discovered that a few large negative results have been duplicated. Further investigation shows this has been happening with other events, but with smaller results I tend not to notice.
I'm not that familiar with the advanced querying side of Application Insights; I actually just used it for the first time today to dig into the events. But it does sound like there might be something I can do there to create a query that I can then plot, with the results deduped?
Update 2
I've managed to make progress with this thanks to the tips by #JohnGardner, so I'll mark that as the answer. I've deduped and plotted the results by adding the following line to the query:
| summarize timestamp=any(timestamp), value=any(value), name=any(name), customDimensions=any(customDimensions) by itemId
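For context, the full query now looks something like this (a sketch; customMetrics is the table Application Insights writes TrackMetric data to, and the where/render lines are just illustration):
customMetrics
| where name == "custom-reward"
| summarize timestamp=any(timestamp), value=any(value), name=any(name), customDimensions=any(customDimensions) by itemId
| summarize sum(value) by bin(timestamp, 1h)
| render timechart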
Update 3
Adding the following line to the query allowed me to split on custom data (in this case splitting by algorithm ID):
| extend algorithmId = tostring(customDimensions.["algorithm-id"])
With that line added, when you select "Chart" in the query results, algorithmId now shows up as an option in the split dropdown. After that you can click "Pin to dashboard". You lose the handy "sum over the time period" indicator in the bottom left of the chart which you get via the simple "Metrics" chart, however I'm sure I'll be able to recreate that in other ways.
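For reference, appending the following to the deduped query gives a similar split chart directly from the query (a sketch, not exactly what the portal UI produces):
| summarize sum(value) by algorithmId, bin(timestamp, 1h)
| render timechart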
If you are doing your own queries, you would generally be using something like summarize or make-series to do this deduping for a chart. You wouldn't generally plot individual items unless you are looking at a very small time range.
So instead of something like
summarize count() ...
you could do
summarize dcount(itemId) ...
Or you might add a "fake" summarize ... by itemId to a query that didn't need it before, to coalesce multiple rows into just one, using any(x) to grab an individual row's value for each column for each itemId.
But it really depends on what you are doing in your specific query. If you were using something like sum(itemCount) to also deal with sampling, you now have other odd cases, where the at-least-once delivery might have duplicated sampled items. (Updating your question to add a specific query and hypothetical result would possibly lead to a more specific answer.)
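For example, a deduplicated hourly count over the standard requests table might look like this sketch:
requests
| summarize dcount(itemId) by bin(timestamp, 1h)
| render timechart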

Effects of high-cardinality Google Analytics event label fields?

I have a Google Analytics event label with high cardinality that I'd like to implement - it is a string that can take on any combination of a finite-but-large number of names in a comma-separated list.
I'm worried mainly about losing data - I found this Analytics Help support page:
https://support.google.com/analytics/answer/1009671?hl=en
...which states:
Reports containing high-cardinality dimensions may be affected by
Analytics system limits, resulting in the creation of a rolled-up
(other) entry in the report to contain the data that exceeds these
limits.
...and am wondering if that would also affect reports without the label included, i.e., reports just looking at unique category/action pairings - would GA still roll up otherwise-identical rows into (other) entries if the (undisplayed) labels are different?
Also, am wondering if there would be any hits to performance for similar report types (not looking at labels, just category/action pairings).
Maybe this is just bad practice out of the gate? :)
Google Analytics stores, in the daily processed tables, up to a maximum of 50,000 rows (in Google Analytics 360 the limit increases to 1,000,000 rows, making the aggregation problem less frequent). That is how many combinations of unique dimension values can be stored in each day's processed table. If a given table has a larger number of combinations of dimension values, Analytics stores the top N values and creates a row of type (other) for the remaining combinations.
https://www.analyticstraps.com/valori-raggruppati-in-other-nei-report/
Anyway, I tried a custom report with the label and without it (same time period): with the label I got (other), while without that dimension I got the actual values.
So the problem you fear does not exist (unless the event action is also high-cardinality) :)

Average scroll rate in Google Studio

I want to calculate and display the average scroll depth in Data Studio from analytics.
I’m looking to get an average scroll depth in Studio. I’ve got the 10%, 25%, etc. scroll-depth data coming in, but I now need to calculate the average scroll % from this data.
To calculate the average scroll depth:
multiply each scroll threshold by its number of events: (10x500) + (20x400) + (30x475) + (40x300) + (50x200) + (60x100) + (70x75) + (80x60) + (90x20) + (100x10)
Then divide that total by the total number of events: 500 + 400 + 475, etc.
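(With these example numbers, that works out to 68,100 / 2,140 ≈ 31.8%.)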
Because I can’t reference cells in Studio I can’t get it to work. I’ve also tried Google Sheets, which does handle the calculation, but then I can’t use Data Studio’s filter to select a specific page path.
I'm thinking the calculation may need to be done at the data source, but I am not sure how to reference a 'cell'?
Data Studio doesn't work based on a concept of "cells", it works based on a concept of "fields"—which are basically properties of the data source. Similarly, you don't have "formulas" per se, but rather "calculated fields". These fields can be created either at the chart-level (single-use, but doesn't require permissions to modify the data source) or in the data source (reusable across many charts, requires permissions to modify the data source). Most fields also have an aggregation type, which tells the report how to aggregate it in charts by default (e.g. Sum or Average).
When you either edit your data source and hit "Add Field", or choose the option with the same name under the "Add metric" or "Add dimension" menus on a chart, you'll be presented with a box to input the formula. To access a field, just type its name (or, if you're in the data source, select it from the list on the left). The editor will also typically give you an auto-complete list below your cursor based on what you're typing. Once your entry matches a field, it will get a highlight box around it (the color is based on the type; green = dimension/string, blue = metric/number). The functions available are sort of a mash-up of what you'd expect in Google Sheets and in a SQL query, but with more constraints on when you're allowed to use certain functions.
The documentation for calculated fields is pretty simple, so I'd recommend starting there before you try to do too much heavy-lifting in Data Studio. Because of constraints in Data Studio's data model, you'll often find that you need to create separate calculated fields for different parts of the formula, and then combine them in a new calculated field. I'll warn you that the error messages in the field editor aren't super helpful sometimes, so you may need to re-read the documentation for the functions and field types you're working with to ensure you get a valid result.
If you're running into problems, sharing the field names and values that you need in your calculation would help, as would the source of the data (are these GA events?). The more details you give, including what you've already tried, the more helpful we can be. Also, make sure to read the docs first so you have a good handle on the product you're using and the terminology the community is most likely to understand.
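As a sketch of how this might look for the scroll-depth case (field names like Event Label and Total Events are assumptions based on the usual GA event schema, and depending on the connector the pre-aggregated GA metrics may force you to do this in a data extract or blend instead), you could build two calculated fields:
Weighted Depth: CAST(Event Label AS NUMBER) * Total Events
Avg Scroll Depth: SUM(Weighted Depth) / SUM(Total Events)
Setting Avg Scroll Depth as the chart's metric then gives the events-weighted average described in the question.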

Different Active Users count when using segments

I would love to understand what I'm looking at - why are the numbers different in this report when I add a segment?
This is the report without any segmentation:
This is the same report with the Mobile Traffic segment:
There are two methods that Google uses to identify the number of users.
Calculation 1: Pre-calculated data
This calculation relies only on the number of sessions in the given date range and the time of each session. (This is determined by technology managed on the device, like a web browser, and is often referred to as the client-side time.) Because the result of this calculation can be added to the pre-aggregated data tables, Analytics can reference the table to quickly retrieve and serve this data in a report, including when you change the date range.
Calculation 2: Data calculated on the fly
Calculation 2 is based on the way you assign, collect, and store persistent data about your traffic. There are many solutions you can implement to customize this, but the most common way this data is going to be assigned and stored is through cookies managed via a web browser.
Adding a segment will force GA to calculate the data on the fly and that's why you are seeing a difference in the numbers.
Are you using GA Free or 360? And is the time range the same in both reports?
You can also have a look into the Google article https://support.google.com/analytics/answer/2992042?hl=en
You are a victim of sampling:
https://support.google.com/analytics/answer/2637192?hl=en
Sampling applies when:
you customize the reports
the number of sessions for the report time range exceeds 500K (GA) or 100M (GA 360)
The consequence is that:
the report will be based on a subset of the data (the % depends on the total number of sessions)
therefore your report data won't be as accurate as usual
What you can do to reduce sampling:
increase the sample size in the UI (this will only reduce sampling to a certain extent; in most cases it won't remove it completely)
reduce time range
create filtered views so your reports contain the data you need and you don't have to customize them

SSAS facts sharing the same dimension

I'm building a cube with 2 fact tables that share some dimensions.
In the example below, I have Fact_Employee, Fact_Manager, Dim_Date, Dim_Country, Dim_Employee and Dim_Manager, with the respective links.
In SSAS I've created one Dim_Country. In the Cube "Dimension Usage" I am creating 2 dimensions (Man_Country and Emp_Country) and linking to the respective measure groups.
My Fact_Employee has the key for the Dim_Manager, so I can relate them.
My problem here is: when I drag Man_Country, Emp_Country, Emp_Amount and Man_Amount into the pivot table, it doesn't work, because I get the list of all Manager Countries unrelated to the Manager Number, while the Employee Countries are correctly linked to the Employee Number but are duplicated.
The below image shows the result Pivot table and what I am trying to get.
What do I need to change in the data source view or the cube's dimension usage to get the correct results?
The users should be able to filter the pivot by, for example, Manager Country to see all the employee Countries and Numbers and the amounts (for Managers and Employees).
Many thanks in advance for any help.
Regards,
PC
If you have a country dimension, then you should use that one dimension for both measure groups; just remember to configure the dimension usage for this dimension against both measure groups.
There are special cases where you would want to separate those dimensions, e.g. if you want them to act separately: say you have a fact table with parcels and you need both DimFromCountry and DimToCountry. In this case you would use a role-playing dimension - it is the same dimension, but connected differently.
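To make the role-playing idea concrete, relationally it is the same country table joined twice (a sketch with a hypothetical parcel schema):
SELECT f.ParcelId,
       c_from.CountryName AS FromCountry,
       c_to.CountryName   AS ToCountry
FROM   Fact_Parcel f
JOIN   Dim_Country c_from ON f.FromCountryKey = c_from.CountryKey
JOIN   Dim_Country c_to   ON f.ToCountryKey   = c_to.CountryKey;
In SSAS you get the same effect by adding Dim_Country to the cube twice under different names and pointing each at the corresponding foreign key in Dimension Usage.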
