Say I have a table with three fields: message, environment and function.
I want to count up the records by message, environment and function, and then select the highest-scoring row for any combination.
Getting the counts is easy:
Table
| summarize count() by message, environment, function
...but how do I get just one row with the top count? My solution so far is to create a new table that tallies the counts, then tally max() by environment, function and then do a join, but this seems like an expensive and complicated workaround.
If I understand your original question correctly, you may want to look into summarize arg_max() as well: https://learn.microsoft.com/en-us/azure/kusto/query/arg-max-aggfunction
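To make the two-step idea concrete (count first, then keep the row with the highest count per group), here is a minimal sketch in Python rather than Kusto. The function name `top_message_per_group` and the sample rows are made up for illustration, and it assumes "any combination" means each (environment, function) pair.

```python
from collections import Counter

def top_message_per_group(rows):
    """rows: iterable of (message, environment, function) tuples.

    Returns {(environment, function): (message, count)} with the most
    frequent message for each pair. Ties keep the first message seen.
    """
    # Step 1: count records per (message, environment, function),
    # the same idea as the summarize count() query in the question.
    counts = Counter(rows)
    # Step 2: for each (environment, function), keep the message with the
    # largest count, which is roughly what arg_max does on the counts.
    best = {}
    for (message, environment, function), n in counts.items():
        key = (environment, function)
        if key not in best or n > best[key][1]:
            best[key] = (message, n)
    return best

# Hypothetical sample data.
rows = [
    ("timeout", "prod", "login"),
    ("timeout", "prod", "login"),
    ("bad password", "prod", "login"),
    ("timeout", "test", "search"),
]
print(top_message_per_group(rows))
```

No join is needed here: the second pass only has to look at the already-aggregated counts.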
Ah, just modify the solution in "Add column of totals pr. field value" to use max instead of sum.
In ClickHouse, is there any way to use the topK query on more than one column,
for example:
select topK(10)(AGE,COUNTRY) ...
meaning I want the top10 combinations of AGE+COUNTRY,
I only found a workaround using concat on the fields and topK on the result, and wondered if there is any other way.
You can pass array (or tuple) of columns to topK:
SELECT topK(10)([Age, Country])
FROM table
Or use the straightforward calculation (it is much slower but provides the exact result):
SELECT
Age,
Country
FROM table
GROUP BY
Age,
Country
ORDER BY count() DESC
LIMIT 10
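For anyone who wants to see what the exact GROUP BY version computes, here is a small Python sketch of the same logic (it mirrors the exact query, not the approximate topK). The function name `top_combinations` and the sample rows are hypothetical.

```python
from collections import Counter

def top_combinations(rows, n=10):
    """Return the n most frequent (age, country) pairs with exact counts."""
    # The whole tuple is the key, so combinations are counted,
    # not the individual columns.
    return Counter(rows).most_common(n)

# Hypothetical sample data: (Age, Country) pairs.
rows = [(30, "DE"), (30, "DE"), (25, "FR"), (30, "FR"), (25, "FR"), (30, "DE")]
print(top_combinations(rows, n=2))
```

Treating the pair as a single key is the same trick as passing a tuple or array of columns to topK.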
I have a dataset with a column called Person and a column called Time. Together, these columns indicate the time at which an employee completed a task. A person can complete multiple tasks in one day. I want to know the time difference between two consecutive tasks completed by the same person, and I want to store this in another column. I will certainly have to add a new column, but is this doable in a single step? Or should I first create a column that stores the time of the next task completed by the same person? Any tips on how to do this?
I would tackle this using the dplyr package (though I am sure an equivalent solution exists using the data.table package).
Create a tibble (data frame) with the two columns, Person and Time.
Group the data by Person, and sort the data by Time. This will keep your data grouped by individual people, with each person's tasks in time order.
Then I would use the dplyr mutate command to create a 'TimeSinceLastTask' column. The expression for this needs to use the dplyr lead (or lag) function to look up the following (or previous) value and subtract it from the current value in Time.
If you are using times, I'd strongly recommend the use of lubridate to do your time difference calculations (makes it less messy).
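The group, sort and lag steps above can be sketched roughly like this. Note this is plain Python rather than dplyr, so treat it as an illustration of the logic only; the function name `time_since_last_task` and the sample records are made up.

```python
from datetime import datetime
from itertools import groupby

def time_since_last_task(records):
    """records: iterable of (person, completion_time) tuples.

    Returns (person, time, gap) tuples sorted by person and time, where gap
    is a timedelta since that person's previous task, or None for the
    person's first task.
    """
    result = []
    ordered = sorted(records)  # sort by person, then by time
    for person, group in groupby(ordered, key=lambda r: r[0]):
        previous = None  # plays the role of lag(Time) within the group
        for _, when in group:
            gap = when - previous if previous is not None else None
            result.append((person, when, gap))
            previous = when
    return result

# Hypothetical sample data.
records = [
    ("Ann", datetime(2024, 1, 1, 9, 0)),
    ("Bob", datetime(2024, 1, 1, 9, 30)),
    ("Ann", datetime(2024, 1, 1, 10, 15)),
]
for row in time_since_last_task(records):
    print(row)
```

The first task of each person has no previous task, so its gap is left empty, just as lag would give NA for the first row of each group.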
I hope that makes sense. I'm not near an R terminal, so I can't safely create a reprex that works (i.e. I could guess, but my blind coding never works first time!)
Hope that helps.
Andrew
I'd like to have a Calculated Column in a table that counts the instances of a concatenation.
I get the following error when inputting Abs(Count([concat])) as the column formula for the calculation: The expression Abs(Count([concat])) cannot be used in a calculated column.
Is there any other way to do it without doing a query? I'm pretty sure it can't be done but I figured I'd ask anyways since I didn't see any other posts about it.
No, and even if there was, you should create and use a query for this.
Besides, applying Abs on a count doesn't make much sense, as the count cannot be negative.
I was having a really hard time describing what I need in the Title, so I apologize ahead of time if that makes absolutely no sense.
I have a CSV with 2 columns, one with a person's name and a second with a numeric value. I need to find the duplicates in the names column, then add the numeric values for each person together to get a total in a new CSV.
This is a very simplified version of the real CSV
Name,Number
Dog,1
Cat,2
Fish,1
Dog,3
Dog,2
Cat,2
Fish,1
Given the information above, what I would like to be able to produce is this:
Name,Number
Dog,6
Cat,4
Fish,2
I really don't have any idea how to get there or if it's possible with PowerShell. I can only get as far as using group-object to group by name, but I have no clue how to add the columns after that.
The biggest problem I'm coming across with my research on this is that most if not all the results I get when googling involve adding new columns to a csv and not performing the mathematical calculation.
I finally got it
$csvfile = import-csv c:\csvfile.csv
$csvfile | group name | select name,@{Name="Totals";Expression={($_.group | Measure-Object -sum number).sum}}
Credit goes to:
http://www.hanselman.com/blog/ParsingCSVsAndPoorMansWebLogAnalysisWithPowerShell.aspx
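For comparison, the same group-and-sum idea can be sketched in Python with only the standard library. This is not a PowerShell solution, just an illustration of the logic; the function name `sum_by_name` is hypothetical and the sample text is the simplified CSV from the question.

```python
import csv
import io
from collections import defaultdict

def sum_by_name(csv_text):
    """Sum the Number column for each distinct Name in the CSV text."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Name"]] += int(row["Number"])
    return dict(totals)

sample = """Name,Number
Dog,1
Cat,2
Fish,1
Dog,3
Dog,2
Cat,2
Fish,1
"""
print(sum_by_name(sample))
```

For the sample above this gives Dog 6, Cat 4 and Fish 2, matching the desired output in the question.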
I have a column of year values by which I am sorting. I'd like to find the quantity per year (read: number of repeats of each year value). I'd like to chart said values. I'm not sure how to make this happen.
I am using Apple's Numbers '08, but if possible a general solution that multiple people could use would be preferred.
You should use the countif() function: http://office.microsoft.com/en-us/excel/HP052090291033.aspx
I did a similar thing to count how many hours of work there are for each upcoming version of my iPhone app. I was doing sumif(), but you just want countif().
See cells N4-N6 here: http://spreadsheets.google.com/ccc?key=0AhL0igVI9HVNdGpaS3U1cS1qOGVNd3h0Slg0a21vUWc&hl=en
On a new sheet, list the unique years in one column, then their quantity count in the column next to them. Select the entire range created, then create a chart.
I'm unsure from your question what you would specifically need more than this (and I work in Excel 2003).
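If it helps to see the logic outside a spreadsheet, here is a rough Python sketch of the unique-years-plus-countif table described above. The `countif` helper only imitates an exact-match COUNTIF, and the list of years is made-up sample data.

```python
def countif(values, criterion):
    """Count entries equal to criterion, like COUNTIF with an exact match."""
    return sum(1 for v in values if v == criterion)

# Hypothetical column of year values.
years = [2007, 2008, 2008, 2009, 2008, 2007]

# One row per unique year, with its quantity next to it.
unique_years = sorted(set(years))
table = [(year, countif(years, year)) for year in unique_years]
print(table)
```

The resulting two-column table is the range you would select to build the chart.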