I'm building a cube with 2 fact tables that share some dimensions.
In the example below, I have Fact_Employee, Fact_Manager, Dim_Date, Dim_Country, Dim_Employee and Dim_Manager, with the respective links.
In SSAS I've created one Dim_Country. In the Cube "Dimension Usage" I am creating 2 dimensions (Man_Country and Emp_Country) and linking to the respective measure groups.
My Fact_Employee has the key for the Dim_Manager, so I can relate them.
My problem is that when I drag Man_Country, Emp_Country, Emp_Amount and Man_Amount into the pivot table, the result is wrong. I get the list of all manager countries, not related to the Manager Number. The employee countries are correctly linked to the Employee Number, but they are duplicated.
The below image shows the result Pivot table and what I am trying to get.
What do I need to change in the data source view or cube dimension usage to get the correct results?
The users should be able to filter the pivot by, for example, Manager Country to see all the employee Countries and Numbers and the amounts (for Managers and Employees).
Many thanks in advance for any help.
Regards,
PC
If you have a country dimension, you should use that one dimension for both measure groups. Just remember to configure dimension usage for it against both measure groups.
There are special cases where you would want to separate those dimensions, for example if you want them to act independently. Let's say you have a fact table with parcels and you need both DimFromCountry and DimToCountry. In this case you would use a role-playing dimension: it is still the same dimension, but connected through a different key.
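To make the role-playing idea concrete, here is a minimal Python sketch rather than SSAS configuration. The table contents, the key names (`from_country_key`, `to_country_key`) and the `weight_kg` measure are all made up for illustration; the point is only that one country table is reached through two different fact keys.

```python
# Hypothetical country dimension: surrogate key -> country name.
dim_country = {1: "Norway", 2: "Portugal", 3: "Germany"}

# Hypothetical parcel fact rows. Each row references the SAME dimension twice,
# once per role (origin and destination).
fact_parcel = [
    {"parcel_id": 100, "from_country_key": 1, "to_country_key": 2, "weight_kg": 2.5},
    {"parcel_id": 101, "from_country_key": 1, "to_country_key": 3, "weight_kg": 1.0},
    {"parcel_id": 102, "from_country_key": 2, "to_country_key": 1, "weight_kg": 4.0},
]


def total_by_role(role_key):
    """Sum weight_kg per country, following the given fact key into dim_country."""
    totals = {}
    for row in fact_parcel:
        country = dim_country[row[role_key]]
        totals[country] = totals.get(country, 0.0) + row["weight_kg"]
    return totals


# Same dimension, two roles, two different answers.
weight_by_origin = total_by_role("from_country_key")
weight_by_destination = total_by_role("to_country_key")
```

The two results differ only because of which key column is followed, which is what a role-playing dimension expresses in the cube's dimension usage.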
I have a UA account and a GA4 account.
In UA I have:
Dimension: Event Category
Metric: Unique Events
In GA4:
Dimension: Event name
Metric: Event count
Is there a way to blend the data so that I have 1 dimension (event category OR event name) and 1 metric (unique events OR event count)?
I'd really need to come to 1 single dimension and 1 single metric.
Thanks in advance.
Data Studio supports blending data, which means joining two or more datasets. What you asked for here is appending data, which in SQL is called a UNION ALL.
How to union two datasets in Data Studio
First generate a Google Sheet document with the numbers 0 and 1:
Add this sheet to Data Studio and take care that the column is interpreted as a number:
To each of your datasets from UA and GA4, add a calculated field named dummy: use the formula 0 in the first dataset and the formula 1 in the second dataset.
Blend these datasets together:
Since the column test on the left dataset has the value 0 and 1, the other two datasets are unioned.
To combine the two dimensions, please add the + twice:
and enter the following formula there:
case when test=0 then name else name2 end
and a formula for the metric (counts and events columns) as well:
case when test=0 then counts else events end
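The trick above can be checked outside Data Studio. The following is a rough Python sketch of the same logic, not Data Studio code: the sample rows, the `left_join` helper and the `ua_dummy` / `ga4_dummy` field names are assumptions made for illustration, while `test`, `name`, `name2`, `counts` and `events` follow the formulas above.

```python
# The helper sheet with the values 0 and 1.
helper = [{"test": 0}, {"test": 1}]

# Hypothetical sample rows; each source carries its constant dummy field.
ua = [
    {"ua_dummy": 0, "name": "video", "counts": 10},
    {"ua_dummy": 0, "name": "download", "counts": 4},
]
ga4 = [{"ga4_dummy": 1, "name2": "video_start", "events": 7}]


def left_join(left_rows, right_rows, left_key, right_key):
    """Left outer join: keep every left row, add matching right fields if any."""
    joined = []
    for left in left_rows:
        matches = [r for r in right_rows if r[right_key] == left[left_key]]
        for right in matches or [{}]:
            joined.append({**left, **right})
    return joined


# Blend: helper joined to UA on test = dummy, then to GA4 on test = dummy.
blended = left_join(left_join(helper, ua, "test", "ua_dummy"), ga4, "test", "ga4_dummy")

# The two "case when test=0 ..." formulas collapse the columns into one pair.
unioned = [
    {
        "event": row.get("name") if row["test"] == 0 else row.get("name2"),
        "count": row.get("counts") if row["test"] == 0 else row.get("events"),
    }
    for row in blended
]
```

Because each source matches exactly one of the two helper rows, the join never multiplies rows, and the result contains every UA row followed by every GA4 row.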
If you want to "UNION" two separate tables rather than blend them, Google Data Studio does not currently support it. I recommend a tool called Windsor AI instead. The platform lets you see all kinds of data sources and run whatever operations you want on them in one place.
Suppose I want data from Google Analytics: I simply choose the data source from the account I want to pull the data from.
From there you can preview your data and then proceed to pull it from multiple tables, like here:
You can then export it to any Data Platform you want
Hope this helps.
P.S. This is a genuine problem with Data Studio. Although it is relatively new, it lacks a lot of features that would make it a viable platform amongst its competitors.
I'm logging some custom metrics in Application Insights using the TelemetryClient.TrackMetric method in .NET, and I've noticed that occasionally some of the events are duplicated when I view them in the Azure portal.
I've drilled into the data, and the duplicate events have the same itemId and timestamp, but if I show the ingestion time by adding | extend ingestionTime = ingestion_time() to the query then I can see that the ingestion times are different.
This GitHub issue indicates that this behavior is expected, as AI uses at-least-once delivery.
I plot these metrics in charts in the Azure portal using a sum aggregation. However, the duplicates are creating trust issues with the charts, as they are simply treated as two separate events.
Is there a way to de-dupe the events based on itemId before plotting the data in the Azure portal?
Update
A more specific example:
I'm running an algorithm, triggered by an event, which results in a reward. The algorithm may be triggered several dozen times a day, and the reward is a positive or negative floating point value. It logs the reward each time to Application Insights as a custom metric (called say custom-reward), along with some additional properties for data splitting.
In the Azure portal I'm creating a simple chart by going to Application Insights -> Metrics and customising the chart. I select my custom-reward metric in the Metric dropdown, and select Sum as the aggregation. I may or may not apply splitting. I save the chart to my dashboard.
This simple chart gives me a nice way of monitoring the system to make sure nothing unexpected is happening, and the Sum value in the bottom left of the chart allows me to quickly see whether the sum of the rewards is positive or negative over the chart's range, and by how much.
However, on occasion I've been surprised by the result (say over the last 12 hours the sum of the rewards was surprisingly negative), and on closer inspection I discovered that a few large negative results have been duplicated. Further investigation shows this has been happening with other events, but with smaller results I tend not to notice.
I'm not that familiar with the advanced querying bit of Application Insights, I actually just used it for the first time today to dig into the events. But it does sound like there might be something I can do there to create a query that I can then plot, with the results deduped?
Update 2
I've managed to make progress with this thanks to the tips by @JohnGardner, so I'll mark that as the answer. I've deduped and plotted the results by adding the following line to the query:
| summarize timestamp=any(timestamp), value=any(value), name=any(name), customDimensions=any(customDimensions) by itemId
Update 3
Adding the following line to the query allowed me to split on custom data (in this case splitting by algorithm ID):
| extend algorithmId = tostring(customDimensions.["algorithm-id"])
With that line added, when you select "Chart" in the query results, algorithmId now shows up as an option in the split dropdown. After that you can click "Pin to dashboard". You lose the handy "sum over the time period" indicator in the bottom left of the chart which you get via the simple "Metrics" chart, however I'm sure I'll be able to recreate that in other ways.
if you are doing your own queries, you would generally be using something like summarize or makeseries to do this deduping for a chart. you wouldn't generally plot individual items unless you are looking at a very small time range?
so instead of something like
summarize count() ...
you could do
summarize dcount(itemId) ...
or you might add a "fake" summarize to a query that didn't need it before with by itemId to coalesce multiple rows into just one, using any(x) to grab any individual row's value for each column for each itemId.
but it really depends on what you are doing in your specific query. if you were using something like sum(itemCount) to also deal with sampling, you have other odd cases now, where the at-least-once delivery might have duplicated sampled items? (updating your question to add a specific query and hypothetical result would possibly lead to a more specific answer).
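To illustrate why coalescing by itemId matters for a sum, here is a small Python sketch of the same idea, not a Kusto query. The sample events and their values are invented; keeping the first row seen per itemId plays the role of any(x).

```python
# Hypothetical metric rows; "b2" was delivered twice (at-least-once delivery).
events = [
    {"itemId": "a1", "value": 2.5},
    {"itemId": "b2", "value": -40.0},
    {"itemId": "b2", "value": -40.0},  # duplicate of the row above
    {"itemId": "c3", "value": 1.5},
]

# Summing the raw rows counts the duplicated item twice.
naive_sum = sum(e["value"] for e in events)

# Coalesce to one row per itemId, keeping an arbitrary (here: the first) row,
# similar in spirit to "summarize value=any(value) by itemId".
by_item = {}
for e in events:
    by_item.setdefault(e["itemId"], e)

deduped_sum = sum(e["value"] for e in by_item.values())
```

With this sample data the naive sum is -76.0 while the deduplicated sum is -36.0, which is the kind of surprise described in the question.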
I'm learning dimensional modeling and I'm trying to create a model. I was thinking about a social media platform which rates hotels. The platform has the following data:
hotel information: name and address
a user can rate hotels (1-5 points)
a user can write comments
platform stores the date of the comments
hotel can answer via comment and it stores the date of it
the platform stores the total number of each rating level (i.e.: all rates with 1 point, all rates with 2 point etc.)
platform stores information of the user: sex, name, total number of votes he/she made and address
First, I tried to define which information belongs to a dimension or fact table
(here I also checked which one is additive/semi additive/non-additive)
I realized my example is kind of difficult, because it’s hard to decide if it belongs to a fact table or dimension.
I would like to hear some advice. Would someone agree with my model?
This is how I would model it:
Hotel information -> hotel dimension
User rating -> additive fact – because I can aggregate them with all dimensions
User comment -> semi additive? – because I can aggregate them with the date dimension (I don’t know if my argument is correct, but I know I would have new comments every day, which is for me a reason to store it in a fact table)
Answer as comment -> same handling like with the user comments
Date of comment-> dimension
Total Number of all votes (1/2/3/4/5) -> semi-additive facts – makes no sense to aggregate them, since it's already a total, but I would get the average
User information sex and name, address -> user-dimension
User Information: total number of votes -> could be dimension or fact. It depends how often it changes. If it changes often, I store it in a fact. If it's not that often, then dimension
I still have some questions, and I hope someone can help me:
My Question: should I create two date dimensions, or can I store both information in one date dimension?
2nd Question: each user and hotel has just one address. Are there arguments for separating the address into its own dimension or hierarchy? Can I create a 1:1 relationship between a user dimension and an address dimension?
For your model, it looks well considered, but here are some thoughts:
User comment (and answers to comments): they are an event to be captured (with new ones each day, as you mention) so are factual, with dimensionality of the commenter, type of comment, date, and the measure is at least a 'count' which is additive. But you don't want to store big text in a fact so you would put that in a dimension by itself which is 1:1 with the fact, for situations where you need to query on the comment itself.
Total Number of all votes (1/2/3/4/5) are, as you say, already aggregates, mostly for performance. Totals should be easy from the raw data itself so maybe not worthwhile to store them at all. You might also consider updating the hotel dimension with columns (hotel A has 5 '1' votes and 4 '2' votes) that you'd update as you go on, for easy filtering and categorisation.
User Information: total number of votes: it is factual information about a user (dimension) and it depends on whether you always just want to 'find it out' about a person or whether you are likely to use it to filter other information (i.e. show me all reviews for users who have made 10-20 votes). In that case you might store the total in the user dimension (and/or a banding, like 'number of reviews range' with 10-20, 20-30). You can update dimensions often if you need to, but you're right, it could still just live as a fact only.
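A banding attribute like the 'number of reviews range' mentioned above can be derived with a small function. This is only a sketch: the function name `votes_band` and the band width of 10 are assumptions, and it uses non-overlapping bands (10-19, 20-29) rather than the loosely written 10-20, 20-30 so that every total falls into exactly one band.

```python
def votes_band(total_votes, width=10):
    """Return a label such as '10-19' for the band containing total_votes."""
    if total_votes < 0:
        raise ValueError("total_votes must not be negative")
    lower = (total_votes // width) * width
    return f"{lower}-{lower + width - 1}"


# Hypothetical users and their vote totals, mapped to a band attribute that
# could be stored as a column in the user dimension.
user_votes = {"alice": 3, "bob": 10, "carol": 25}
user_band = {user: votes_band(total) for user, total in user_votes.items()}
```

Storing the band in the user dimension makes a filter such as "reviews by users in the 10-19 band" a plain attribute filter instead of a calculation over the fact table.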
As for date dimensions, if the 'grain' is 'day' then you only need one dimension, that you refer to from multiple facts.
As for addresses, you're right that there are arguments on both sides! Many people separate addresses into their own dimension, referred to from the other dimensions that use them. Kimball suggests you can do that behind the scenes if necessary, but prefers for each dimension to have its own set of address columns (but modelled as consistently as possible).
I would like to link multiple times to the same dimension without making a copy of the dimension. I understand that other products have such a feature (role-playing dimensions) but I can't find such functionality in icCube.
Is it really not possible in icCube?
For the time being you can't link the same dimension multiple times in a measure group. So you have to copy your dimension (there is a button for doing this in the UI).
Is this a problem?
I am trying to create a report that updates entirely according to one location out of 35. In essence, I have 35 locations, and I filter on one of them so that the whole report shows data for that location. However, there are charts I want to create that should be summed for the whole country, and I am having a hard time creating charts with all 35 locations because I don't want to clear my filter.
For example, I want one table that has all 35 locations and their respective categories (e.g., total score, total employer score, etc), but I can only achieve this table if I do not click on any of the locations in the "locations" field.
Is there an expression I need to specify for the dimension "locations" that will show all of its values, even if I have one of them selected for the rest of the report?
You can use bookmarks in QlikView to store different filters, and you can create diagrams and tables that are independent from the current selection using set analysis.
(I'm not sure if that also applies to the dimension.)