Is it possible to combine dimensions and metrics in calculated fields? - google-analytics

We have the following variables:
"Unique Users"
"Version" (Plus or Light, in a 79:21 ratio of all Unique Users)
"Total Events"
"Event Categories"
And the following scenario:
We can't get exact data on how many users are Plus or Light users.
But we do know how many events are triggered by each Version (Plus/Light).
Now we want to know the relative frequency of triggered events, grouped by Version and event category.
So in a pivot table, the row dimension is Version and the column dimension is Event Category.
The measurement should be the relative frequency.
The simple custom calculated field would be "Total Events / Users"... but remember, we can't get the absolute number of Users by Version; we just know the ratio (roughly 80:20).
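For example, if a report's date range contains 10,000 Unique Users, the denominators would be roughly 10,000 * 0.79 = 7,900 Plus users and 10,000 * 0.21 = 2,100 Light users.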
So I built another calculated field called UsersbyVersion with the following statement:
CASE
WHEN (Version = "light") THEN SUM(User) * 0.21
WHEN (Version = "Plus") THEN SUM(User) * 0.79
END
But this formula gives the following error:
Invalid formula - Invalid input expression. - Failed to parse CASE statement
If I use absolute numbers in the statement, it works.
Example:
CASE
WHEN (Version = "Normal") THEN 5000
WHEN (Version = "Plus") THEN 25000
END
But we need the statement to be "Users * ratio"... the ratio won't change much, but the number of users depends on the date range we set on the Data Studio report.
So I guess the problem is that the statement won't work with a combination of metrics and dimensions.
I already tried putting "Users * 0.79" and "Users * 0.21" into custom metrics, but that doesn't work either.
Is there a way to combine dimensions and metrics in a calculated field as a measurement?
Thanks for your help.

Create 2 metrics:
users * 0.2 (let's call this UsersP2)
users * 0.8 (let's call this UsersP8)
Now this should work:
CASE
WHEN (Version = "light") THEN UsersP2
WHEN (Version = "Plus") THEN UsersP8
END
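Putting it together (an untested sketch, assuming the metrics are named Total Events, UsersP2 and UsersP8), the relative-frequency measurement could then be a further calculated field that divides events by the CASE result:
Total Events /
CASE
WHEN (Version = "light") THEN UsersP2
WHEN (Version = "Plus") THEN UsersP8
END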

How to SELECT a single record in table X with the largest value for X.a WHERE values for fields X.b & X.c are specified

I am using the following query to obtain the current component serial number (tr_sim_sn) installed on the host device (tr_host_sn) from the most recent record in a transaction history table (PUB.tr_hist):
SELECT tr_sim_sn FROM PUB.tr_hist
WHERE tr_trnsactn_nbr = (SELECT max(tr_trnsactn_nbr)
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_lot = '99524136'
AND tr_part = '6684112-001')
The actual table has ~190 million records. The excerpt below contains only a few sample records, and only fields relevant to the search to illustrate the query above:
tr_sim_sn |tr_host_sn* |tr_host_pn |tr_domain |tr_trnsactn_nbr |tr_qty_loc
_______________|____________|_______________|___________|________________|___________
... |
356136072015140|99524135 |6684112-000 |vattal_us |178415271 |-1.0000000000
356136072015458|99524136 |6684112-001 |vattal_us |178424418 |-1.0000000000
356136072015458|99524136 |6684112-001 |vattal_us |178628048 |1.0000000000
356136072015050|99524136 |6684112-001 |vattal_us |178628051 |-1.0000000000
356136072015836|99524137 |6684112-005 |vattal_us |178645337 |-1.0000000000
...
* = key field
The excerpt illustrates multiple occurrences of tr_trnsactn_nbr for a single value of tr_host_sn. The largest value for tr_trnsactn_nbr corresponds to the current tr_sim_sn installed within tr_host_sn.
This query works, but it is very slow, ~8 minutes.
I would appreciate suggestions to improve or refactor this query to improve its speed.
Check with your admins to determine when they last updated the SQL statistics. If the answer is "we don't know" or "never" then you might want to ask them to run the following 4GL program, which will create a SQL script to accomplish that:
/* genUpdateSQL.p
*
* mpro dbName -p util/genUpdateSQL.p -param "tmp/updSQLstats.sql"
*
* sqlexp -user userName -password passWord -db dbName -S servicePort -infile tmp/updSQLstats.sql -outfile tmp/updSQLstats.log
*
*/
output to value( ( if session:parameter <> "" then session:parameter else "updSQLstats.sql" )).
for each _file no-lock where _hidden = no:
put unformatted
"UPDATE TABLE STATISTICS AND INDEX STATISTICS AND ALL COLUMN STATISTICS FOR PUB."
'"' _file._file-name '"' ";"
skip
.
put unformatted "commit work;" skip.
end.
output close.
return.
This will generate a script that updates statistics for all tables and all indexes. You could edit the output to only update the tables and indexes that are part of this query if you want.
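For example, trimmed down to just the table used in this query, the generated updSQLstats.sql script would reduce to something like:
UPDATE TABLE STATISTICS AND INDEX STATISTICS AND ALL COLUMN STATISTICS FOR PUB."tr_hist";
commit work;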
Also, if the admins are nervous they could, of course, try this on a test db or a restored backup before implementing in a production environment.
I am posting this as a response to my request for an improved query.
As it turns out, the following query includes two distinct changes that greatly improved its speed. One is to include the tr_domain search criterion in both the main and the nested portions of the query. The second is to narrow the search by increasing the number of search criteria, which in the following are all included in the nested portion of the query:
SELECT tr_sim_sn
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_trnsactn_nbr IN (
SELECT MAX(tr_trnsactn_nbr)
FROM PUB.tr_hist
WHERE tr_domain = 'vattal_us'
AND tr_part = '6684112-001'
AND tr_lot = '99524136'
AND tr_type = 'ISS-WO'
AND tr_qty_loc < 0)
This syntax results in ~0.5s response time. (credit to my colleague, Daniel V.)
To be fair, this query uses criteria beyond the parameters stated in the original post, which made it difficult, if not impossible, for others to offer a reasonable answer. That omission was not intentional; it was due to my being fairly new to the fundamentals of good query design. This query is partly the result of learning that when too few, or non-indexed, fields are used as search criteria on a large table, it can help to narrow the search by adding more criteria. The original had 3; this one has 5.

CreateML Recommender Training Error: Item IDs in the recommender model must be numbered 0, 1, ..., num_items - 1

I'm using CreateML to generate a Recommender model using an implicit dataset of the format: User ID, Item ID. The data is loaded into CreateML as a CSV with about 400k rows.
When attempting to 'Train' the model, I receive the following error:
Training Error: Item IDs in the recommender model must be numbered 0, 1, ..., num_items - 1
My dataset is in the following format:
"user_id","item_id"
"e7ca1b039bca4f81a33b21acc202df24","f7267c60-6185-11ea-b8dd-0657986dc989"
"1cd4285b19424a94b33ad6637ec1abb2","e643af62-6185-11ea-9d27-0657986dc989"
"1cd4285b19424a94b33ad6637ec1abb2","f2fd13ce-6185-11ea-b210-0657986dc989"
"1cd4285b19424a94b33ad6637ec1abb2","e95864ae-6185-11ea-a254-0657986dc989"
"31042cbfd30c42feb693569c7a2d3f0a","e513a2dc-6185-11ea-9b4c-0657986dc989"
"39e95dbb21854534958d53a0df33cbf2","f27f62c6-6185-11ea-b14c-0657986dc989"
"5c26ca2918264a6bbcffc37de5079f6f","ec080d6c-6185-11ea-a6ca-0657986dc989"
I've tried modifying both Item ID and User ID to enumerated IDs, but I still receive the training error. Example:
"item_ids","user_ids"
0,0
1,0
2,0
2,0
0,225
400,225
409,225
0,282
0,4
8,4
8,4
I receive this error both within the CreateML UI and when using CreateML within a Swift playground. I've also tried removing duplicates and verified that the maximum ID for each column is (num_items - 1).
I've searched for documentation on what the exact requirement is for the set of IDs with no luck.
Thank you in advance for any help clarifying this error message.
I was able to discuss this issue with Apple's CoreML developers during WWDC2020. They described this as a known bug which will be fixed with the upcoming OS (Big Sur). The work-around for this bug is:
In the CSV dataset, create records for a single user which interacts with ALL items, and create records for a single item interacted with by ALL users.
Using pandas in Python, I essentially implemented the following:
import csv
import pandas as pd

# Load the original (user_id, item_id) interactions; the path here is illustrative
ratings_df = pd.read_csv('data/ratings.csv')

# Find the unique item ids
item_ids = ratings_df.item_id.unique()
# Find the unique user ids
user_ids = ratings_df.user_id.unique()
# Create a 'dummy user' which interacts with all items
mock_item_interactions_df = pd.DataFrame({'item_id': item_ids, 'user_id': 'mock-user'})
ratings_with_mocks_df = ratings_df.append(mock_item_interactions_df)
# Create a 'dummy item' which is interacted with by all users
mock_user_interactions_df = pd.DataFrame({'item_id': 'mock-item', 'user_id': user_ids})
ratings_with_mocks_df = ratings_with_mocks_df.append(mock_user_interactions_df)
# Export the CSV (DataFrame.append is deprecated in newer pandas; pd.concat is the equivalent there)
ratings_with_mocks_df.to_csv('data/ratings-w-mocks.csv', quoting=csv.QUOTE_NONNUMERIC, index=True)
Using this CSV, I successfully generated a CoreML model using CreateML.
Try adding an unnamed first column to your CSV data which counts rows from 0 ... number of items - 1,
like this:
"","userID","itemID","rating"
0,"a","x",1
1,"a","y",0
...
After adding this column, it started working for me. I use UUIDs for userID and itemID in my training model. Also, be sure to sort the rows by itemID so that all rows for one itemID are next to each other.
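A minimal pandas sketch of that preparation step (column names and file paths here are illustrative, not from the original post):
import csv
import pandas as pd

ratings_df = pd.read_csv('data/ratings.csv')  # columns: userID, itemID, rating

# Keep all rows for one itemID next to each other, then renumber rows 0 ... n-1
ratings_df = ratings_df.sort_values('itemID').reset_index(drop=True)

# index=True writes the row counter as an unnamed first column, as in the example above
ratings_df.to_csv('data/ratings-indexed.csv', quoting=csv.QUOTE_NONNUMERIC, index=True)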

How to use FORMATTED_VALUE in a cumulated graph?

At first, from the Data Render panel in my icCube report, I used context.cumulativeCol(); in the Value field in order to create my cumulated graph.
Now, since the format of my data is not well suited to my application (I have values such as '4.547473508864641e-13' which I want formatted as 0.00), I tried adding parameters to the function:
var col = context.getColumnIndex();
var measure = context.getMeasures();
var property = "FORMATTED_VALUE";
return context.cumulativeCol(col, measure, property);
But I cannot get a proper output.
How should I do it?
You cannot use FORMATTED_VALUE to format numbers calculated on the client side; it's only available for data that comes directly from the server. So in your case you need to implement your own client-side formatting. You could use mathJS, which is bundled with the reporting, e.g.:
return math.format(context.cumulativeCol(col), {notation: "fixed", precision: 2})
Or use any other JS formatting method, like .toFixed(2).
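For instance, a minimal sketch of the Value field using plain JS (assuming the same context API as above; note that .toFixed() returns a string rather than a number):
var col = context.getColumnIndex();
return context.cumulativeCol(col).toFixed(2);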

Carbon Aggregator re-aggregating metric

I have the following aggregation rule:
abc.prod.ALL.<service>.<metric>.count (60) = sum abc.local.*.<service>.<<metric>>.count
Given metrics like:
abc.prod.host1.aservice.ametric.count
abc.prod.host2.aservice.ametric.count
I would expect them to be aggregated to
abc.prod.ALL.aservice.ametric.count
But that metric is never created. In aggregator logs, I see
Allocating new metric buffer for abc.prod.ALL.aservice.ametric.count
but it's not created. If I add a layer to the generated metric like:
abc.prod.extralayer.ALL.<service>.<metric>.count (60) = sum abc.local.*.<service>.<<metric>>.count
then we seem to get a recursive explosion of created metrics like:
abc.prod.extralayer.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.ALL.aservice.ametric.count
abc.prod.extralayer.ALL.ALL.ALL.ALL.aservice.ametric.count
Which led me to believe that the generated metric is then aggregated again...
I added a logging line to AggregationProcessor.process:
else:
    log.clients("Found aggregate " + aggregate_metric + " for " + metric)
    aggregate_metrics.add(aggregate_metric)
I then tried with my original, desired rule... and eventually I started to see log lines like:
Found aggregate abc.prod.ALL.aservice.ametric.count for abc.prod.ALL.aservice.ametric.count
It matched itself as if it was a new incoming metric... Why is it being fed back into the aggregator?
This appears to have been a bug. It was not present in older versions but was in master at the time of my question.
If you are seeing this behaviour, follow the issue on GitHub:
https://github.com/graphite-project/carbon/issues/560
https://github.com/graphite-project/carbon/issues/455
There is no point in continuing the question here on SO.
Note: I am using the older version, 0.9.15, and not seeing the problem - so I recommend that version until the issue is confirmed to be resolved in master.

Kibana: how to do a visualisation with a mathematical expression?

So I have 3 searches.
I'm interested in 3 lines of log (each line is a document; msg is a field):
S1 : msg = Sending to ELK
S2 : msg = ELK failure - rejected
S3 : msg = ELK failure due to us
Search 1 is an attempt, searches 2 and 3 are failures. I need a graph that displays this:
(CountS1 - (CountS2 + CountS3)) / (CountS1 / 100) on the Y axis, and the date of the log on the X axis.
I know how to use the date of the logs on the X axis, but for the Y axis I can only do things such as count, average, sum, etc. of a single search.
Any ideas?
Thanks.
Yes, the best solution is to go to Scripted Fields and create the field that you need.
You can do it this way, for example:
(doc['CountS1'].value - (doc['CountS2'].value + doc['CountS3'].value)) / (doc['CountS1'].value / 100)
With this you have a new field that you can use just by referencing the name you gave it. For example, if you name this field Example1, the new field will appear in your visualization options.
