Divide two metrics in Google Cloud Monitoring / MQL - stackdriver

How do I compute the difference and ratio between two Stackdriver metrics in MQL?
There are two parts to this question, but I would be grateful even if you can only help me with one:
Compute the difference between the two time series
Compute the ratio between the two time series. Bonus, if possible: the case where the denominator is null should be handled gracefully.
This is what I have so far, but it does not yield the expected result (the resulting time series is always zero):
fetch global
| { t_0:
metric 'custom.googleapis.com/http/server/requests/count'
| filter
(metric.service == 'service-a' && metric.uri =~ '/api/started')
| align next_older(1m);
t_1:
metric 'custom.googleapis.com/http/server/requests/count'
| filter
(metric.service == 'service-a' && metric.uri =~ '/api/completed')
| align next_older(1m)
}
| outer_join 0
| div
Obviously, the code has been anonymized. What I want to accomplish is to track whether there is a difference between processes that have been started vs completed.
EDIT / ADDITIONAL INFO 2021-11-18
I used the projects.timeSeries v3 API for further debugging. Apparently the outer_join operation assumes that the labels of the two time series are the same, which is not the case in my example.
Does anybody know how to drop the labels so that I can perform the join and aggregation?
EDIT / ADDITIONAL INFO 2021-11-19
The aggregation now works, as I managed to delete the labels using the map drop [...] operation.
The challenge was indeed the labels, as they are generated by the Spring Boot implementation of Micrometer. Since the label sets are distinct between the two metrics, join always produced an empty result, and outer_join returned only the second time series.

As I can see, you want to know the ratio between the request counts for started and completed processes. You are using the align next_older(1m) aligner, which fetches the most recent value in each period, so it is possible that the count is zero right when a process has started. I would recommend changing the aligner to mean or max to get the request count for started and completed processes in the respective time series:
align mean(1m)
Please see the documentation for more ways to do a ratio query: Examples of Ratio Query

This is what I have so far. The aggregation now works, as the labels are dropped. I will update this example when I know more.
fetch global
| {
t_0:
metric 'custom.googleapis.com/http/server/requests/count'
| filter
(metric.service == 'service-a' && metric.uri =~ '/api/started')
| every (1m)
| map drop [resource.project_id, metric.status, metric.uri, metric.exception, metric.method, metric.service, metric.outcome]
; t_1:
metric 'custom.googleapis.com/http/server/requests/count'
| filter
(metric.service == 'service-a' && metric.uri =~ '/api/completed')
| every (1m)
| map drop [resource.project_id, metric.status, metric.uri, metric.exception, metric.method, metric.service, metric.outcome]
}
| within d'2021/11/18-00:00:00', d'2021/11/18-15:15:00'
| outer_join 0
| value val(0)-val(1)

After using the map add / map drop operators as in your own answer, make sure to use outer_join 0,0, which gives you a full outer join. The 0,0 argument to outer_join means "substitute zeros if either stream's value is missing".
In your case, since the first stream counts "started" tasks and the second stream counts "completed" tasks, you are likely to find cases where the first stream has more rows than the second one. If you want a left join instead, the syntax is outer_join _, 0. The underscore followed by 0 means "don't substitute anything if the first stream's value is missing, but do substitute a zero if the second stream's value is missing".
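Putting the pieces together, a minimal sketch of the ratio query under these assumptions (the mean aligner from the first answer, the dropped Micrometer labels from the question, a full outer join, and div dividing the first stream by the second, per the ratio-query examples):
fetch global
| { t_0:
# numerator: started requests
metric 'custom.googleapis.com/http/server/requests/count'
| filter (metric.service == 'service-a' && metric.uri =~ '/api/started')
| align mean(1m)
| map drop [resource.project_id, metric.status, metric.uri, metric.exception, metric.method, metric.service, metric.outcome]
; t_1:
# denominator: completed requests
metric 'custom.googleapis.com/http/server/requests/count'
| filter (metric.service == 'service-a' && metric.uri =~ '/api/completed')
| align mean(1m)
| map drop [resource.project_id, metric.status, metric.uri, metric.exception, metric.method, metric.service, metric.outcome]
}
| outer_join 0,0
| div
If you want the difference instead, replace div with value val(0) - val(1), as in the answer above. Note that outer_join 0,0 substitutes zeros, so the denominator can still be zero; how div behaves for those points is worth checking against the ratio-query documentation linked above.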

Related

Snowflake, Python/Jupyter analysis

I am new to Snowflake, and running a query to get a couple of days' data - this returns more than 200 million rows and takes a few days. I tried running the same query in Jupyter - and the kernel restarts/dies before the query ends. Even if it got into Jupyter - I doubt I could analyze the data in any reasonable timeframe (but maybe using dask?).
I am not really sure where to start - I am trying to check the data for missing values, and my first instinct was to use Jupyter - but I am lost at the moment.
My next idea is to stay within Snowflake - and check the columns there with case statements (e.g. sum(case when column_value = '' then 1 else 0 end) as number_missing_values).
Does anyone have any ideas/direction I could try - or know if I'm doing something wrong?
Thank you!
Not really the answer you are looking for, but regarding
sum(case when column_value = '' then 1 else 0 end) as number_missing_values
when you say "missing value", note that this will only find values that are an empty string.
It can also be written in a simpler form as:
count_if(column_value = '') as number_missing_values
The database already knows how many rows there are in a table, and it knows how many nulls there are in a column. If you are loading data into a table, it might make more sense not to load empty strings and to use null instead; then, for no compute cost, you can run:
count(*) - count(column) as number_empty_values
Also of note, if you have two tables in Snowflake, you can compare them via MINUS, e.g.
select * from table_1
minus
select * from table_2
This is useful for finding missing rows, though you do have to do it in both directions.
Then you can HASH rows, or hash the whole table via HASH_AGG.
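For example, a whole-table comparison with HASH_AGG can be as small as this (a sketch; the table names are placeholders):
select (select hash_agg(*) from table_1) = (select hash_agg(*) from table_2) as tables_match;
If the result is false, the MINUS queries above tell you which rows differ.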
But normally, when looking for missing data, you have an external system, so the driver is "what can that system handle" and finding common ground.
Also, in the past we were searching for bugs in our processing that caused duplicate data (where we needed/wanted no duplicates), and that is where the above, plus COUNT(DISTINCT ...)-style commands, come in useful.
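For that duplicate hunt, a grouped count is the usual starting point (a sketch with placeholder names):
select key_column, count(*) as occurrences
from my_table
group by key_column
having count(*) > 1
order by occurrences desc;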

Using findall with multiple criteria

The following code finds the indexes in the 50th column of p where the value is equal to 1.
findall(p[:,50].== 1)
But suppose I was interested in screening for multiple criteria. For example, if I was also interested in the indexes where the value is 0.5. I have tried the following in that case, but something goes wrong:
findall(p[:,50].== 1 | p[:,50].== 0.5)
You're forgetting to dot the | operator, and you also need parentheses, because | binds more tightly than .== (so 1 | p[:,50] would be evaluated first):
findall((p[:,50].== 1) .| (p[:,50].== 0.5))
But still, this is a bit wasteful, since you are making two copies of the same column and allocating five intermediate vectors that you don't need. You should use a predicate function instead, for example:
findall(x->x in (0.5, 1.0), p[:,50])
or
findall(x->x==0.5||x==1, p[:,50])
On top of this, you can use view to avoid allocations due to p[:,50]:
findall(x->x==0.5||x==1, view(p, :,50))
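For instance, on a small made-up matrix (a minimal sketch just to show the returned indices):
p = [0.5 1.0; 1.0 0.25; 0.75 0.5]
findall(x -> x == 0.5 || x == 1, view(p, :, 2))  # returns [1, 3]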

jq filtering based on conditions

How can I use jq to do filtering based on certain conditions?
Using the example from https://stedolan.github.io/jq/tutorial/#result2 for e.g., say I want to filter the results by
.commit.author.date>=2016, and
.commit.comment_count>=1
All the items not satisfying such criteria will not show up in the end result.
What's the proper jq expression to do that? Thanks.
The response is an array of objects that contain commit information. Since you want to filter that array, you will usually want to map/1 over it and filter using select/1: map(select(...)). You just need to provide the condition to filter on.
map(select(.commit | (.author.date[:4] | tonumber) >= 2016 and .comment_count >= 1))
The date in this particular case is a date string in ISO format. I'm assuming you want the commits that were made in the year 2016 or later, so extract the year part and compare:
(.author.date[:4] | tonumber) >= 2016
Then combine that with comparing the comment count.
Note, I projected to the commit first to minimize repetition in the filter. I could have easily left that part out.
map(select((.commit.author.date[:4] | tonumber) >= 2016 and .commit.comment_count >= 1))
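To try it end to end, something like this should work (a sketch; the endpoint is the one from the linked jq tutorial):
curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq 'map(select((.commit.author.date[:4] | tonumber) >= 2016 and .commit.comment_count >= 1))'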

How to count occurrence of value and percentage of a subset in tableau public?

I have a set of data in the following format:
Resp | Q1 | Q2
P1 | 4 | 5
P2 | 1 | 2
P3 | 4 | 3
P4 | 6 | 4
I'd like to show the count and % of people who gave an answer greater than 3. So in this case, the output would be:
Question | Count | Percent
Q1 | 3 | 75%
Q2 | 2 | 50%
Any suggestions?
Although it sounds like a fairly easy thing, it is a bit more complicated.
Firstly, your data is not row based, so you will have to pivot it:
Load your data into Tableau
In the Data Source screen, choose columns Q1 and Q2, right click on them and choose "Pivot"
Name the column with the answers "Answer" (just for clarity)
You should get a table with one row per respondent and question.
Now you need to create a calculated field (I called it Overthreshold) to check for your condition:
if [Answer] > 3 then
[Answer]
End
At this point you could substitute the 3 with a parameter in case you want to easily change that condition.
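For example, with an integer parameter named [Threshold] (a made-up name; create it first via Create Parameter), the field would become:
if [Answer] > [Threshold] then
[Answer]
End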
You can already drop the pills onto the view to get the count.
Now, if you want the percentage, it gets a bit more complicated, since you have to relate the count of answers per question to the count of the answers > 3, which is information stored in two different columns.
Create another calculated field with this calculation: COUNT([Overthreshold]) / AVG({fixed [Question]: count([Answer])})
Drop the created pill onto the "Text" shelf or into the Columns shelf to see the percentage values
Right click on the field and choose Default Properties / Number Format to display it as a percentage rather than a float
To explain what the formula does: it takes the count of the answers that are over the threshold and divides it by the count of answers for each question. The fixed part of the formula counts the rows that have the same value in the Question column. The AVG is only there because Tableau needs an aggregation; since the value is the same for every record of a question, you could also use MIN or MAX.
It feels like there should be an easier solution, but right now I cannot think of one.
Here is a variation on Alexander's correct answer. Some folks might find it slightly simpler, and it at least shows some of the Tableau features for calculating percentages.
Starting as in Alexander's answer, revise Overthreshold into a boolean-valued field, defined as [Answer] > 3
Instead of creating a second calculated field for the percentage, drag Question, Overthreshold and SUM(Number of Records) onto the viz.
Right click on SUM(Number of Records) and choose Quick Table Calculation->Percentage of Total
Double click Number of Records in the data pane on the left to add it to the sheet; this is a shortcut for bringing out the Measure Names and Measure Values meta-fields. Then move Measure Names from Rows to Columns. You can also set aliases on Measure Names to shorten the column titles.
If you don't want to show the below threshold data, simply right click on the column header False and choose Hide. (You can unhide it if needed by right clicking on the Overthreshold field)
Finally, to pretty it up a bit, you can move Overthreshold to the detail shelf (you can't remove it from the view though), and adjust the number formatting for the fields being displayed to get your result.
Technically, Alexander's solution uses LOD calculations to compute the percentages on the server side, while this solution uses table calculations to compute the percentage on the client side. Both are useful and can have different performance impacts. This just barely scratches the surface of what you can do with each approach; each has power and complexity that you need to start to understand in order to use it in more complex situations.

Conditional Sum For Multiple Datasets in RDLC

I'm trying to sum up values based on the 'Description' column of a dataset. So far, I have this
=Sum(Cdbl(IIf(First(Fields!Description.Value, "Items") = "ItemA", Sum(Fields!Price.Value, "Items"), 0)))
But it keeps giving me an error saying that it "contains a First, Last, or Previous aggregate in an outer aggregate. These aggregate functions cannot be specified as nested aggregates." Is there something wrong with my syntax here?
What I need to do is take something like this...
Item | Price
Item A | 400.00
Item B | 300.00
Item A | 200.00
Item A | 100.00
And I need to get the summed Price for 'ItemA' - 700.00 in this case.
All of the answers I've found so far only show how to do this for a single dataset OR for use within a tablix. For example, the code below does not work because it does not specify the scope or the dataset to use:
=Sum(Cdbl(IIf(Fields!Description.Value = "ItemA", Fields!Price.Value, 0)))
I also can't specify a dataset to use, because the control I'm loading into is a textbox, not a tablix.
If anyone else sees this and wants an answer: I ended up returning the count I needed on another dataset. The other option I was considering was to create a 1x1 tablix, set its dataset, and then use the second bit of code posted above.
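For anyone who lands here later: in a standalone textbox you can usually pass the dataset name as the scope argument of the aggregate itself, which sidesteps the nested-aggregate error entirely. A sketch, assuming the dataset is called "Items" as in the question:
=Sum(Cdbl(IIf(Fields!Description.Value = "ItemA", Fields!Price.Value, 0)), "Items")
For the sample data above this yields 700.00, since only the ItemA rows contribute to the sum.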
