Count distinct values in spreadsheet

I have a Google spreadsheet with a column that looks like this:
City
----
London
Paris
London
Berlin
Rome
Paris
I want to count the appearances of each distinct city (so I need the city name and the number of appearances).
City | Count
-------+------
London | 2
Paris | 2
Berlin | 1
Rome | 1
How do I do that?

Link to Working Examples
Solution 0
This can be accomplished using pivot tables.
Solution 1
Use the unique formula to get all the distinct values. Then use countif to get the count of each value. See the working example link at the top to see exactly how this is implemented.
Unique Values  | Count
---------------+--------------------
=UNIQUE(A3:A8) | =COUNTIF(A3:A8;B3)
               | =COUNTIF(A3:A8;B4)
               | ...
Solution 2
If you set up your data like this:
City
----
London 1
Paris 1
London 1
Berlin 1
Rome 1
Paris 1
Then the following will produce the desired result.
=sort(transpose(query(A3:B8,"Select sum(B) pivot (A)")),2,FALSE)
I'm sure there is a way to get rid of the second column since all values will be 1. Not an ideal solution in my opinion.
via http://googledocsforlife.blogspot.com/2011/12/counting-unique-values-of-data-set.html
Other Possibly Helpful Links
http://productforums.google.com/forum/#!topic/docs/a5qFC4pFZJ8

You can use the query function, so if your data were in col A where the first row was the column title...
=query(A2:A,"select A, count(A) where A != '' group by A order by count(A) desc label A 'City'", 0)
yields
City count
London 2
Paris 2
Berlin 1
Rome 1
Link to working Google Sheet.
https://docs.google.com/spreadsheets/d/1N5xw8-YP2GEPYOaRkX8iRA6DoeRXI86OkfuYxwXUCbc/edit#gid=0
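To sanity-check the expected result outside the spreadsheet, the same group-and-count logic can be sketched in Python. This is only an illustration, not part of the Sheets solution: the cities list copies the sample column from the question, and count_by_value is a hypothetical helper name.

```python
from collections import Counter

# Sample column from the question (header row excluded).
cities = ["London", "Paris", "London", "Berlin", "Rome", "Paris"]

def count_by_value(values):
    """Return (value, count) pairs, most frequent first, skipping blank cells."""
    counts = Counter(v for v in values if v != "")
    # most_common() sorts by count, descending; ties keep first-seen order.
    return counts.most_common()

result = count_by_value(cities)
for city, n in result:
    print(city, n)
```

Skipping empty strings mirrors the where A != '' clause in the query formula.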

=iferror(counta(unique(A1:A100))) counts the number of unique values in cells A1 to A100

Not exactly what the user asked, but an easy way to just count unique values:
Google introduced a new function to count unique values in just one step, and you can use this as an input for other formulas:
=COUNTUNIQUE(A1:B10)

This works if you just want the count of unique values in e.g. the following range
=counta(unique(B4:B21))

This is similar to Solution 1 from @JSuar...
Assume your original city data is a named range called dataCity. In a new sheet, enter the following:
  | A                 | B
--+-------------------+----------------------------------------
1 | =UNIQUE(dataCity) | Count
2 |                   | =DCOUNTA(dataCity,"City",{"City";$A2})
3 |                   | [copy down the formula above]
4 |                   | ...
5 |                   | ...

=UNIQUE({filter(Core!L8:L27,isblank(Core!L8:L27)=false),query(ArrayFormula(countif(Core!L8:L27,Core!L8:L27)),"select Col1 where Col1 <> 0")})
Where Core!L8:L27 is the list in the question.

Related

Generate variable by country for missing years?

Stata and R:
I have two cross-sectional datasets I'm merging. The two datasets cover the same number of countries, but only one of them has no missing years (year). The problem is that the missing years are simply not recorded, so I need to make a new variable that would add the years where there is no other data. Otherwise, I cannot merge the datasets on the two keys, country and year.
Not so -- in Stata (and I would be surprised at a problem in R, but others must speak to that).
Missing observations -- in this context and any similar better called absent -- are not a problem. Here's a demonstration. merge is smart enough to notice gaps and make them explicit as missings. You could "fix" them yourself ahead of the merge, but that is pointless.
clear
input state year y
1 2019 1
1 2020 2
2 2019 3
2 2020 4
end
save tomerge
clear
input state year x
1 2019 42
2 2019 84
end
merge 1:1 state year using tomerge
list
Results
. merge 1:1 state year using tomerge
    Result                      Number of obs
    -----------------------------------------
    Not matched                             2
        from master                         0  (_merge==1)
        from using                          2  (_merge==2)
    Matched                                 2  (_merge==3)
    -----------------------------------------
.
. list
+----------------------------------------+
| state year x y _merge |
|----------------------------------------|
1. | 1 2019 42 1 Matched (3) |
2. | 2 2019 84 3 Matched (3) |
3. | 1 2020 . 2 Using only (2) |
4. | 2 2020 . 4 Using only (2) |
+----------------------------------------+
Otherwise put, 1:1 as syntax specifies the overall pattern and doesn't rule out 0:1 or 1:0 matches. merge will actually append if identifiers don't match at all. You do need the key variables to exist under identical names in both datasets.
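For readers outside Stata, the behaviour shown above (keys present in only one dataset are kept and the other side is filled with missings) can be sketched in plain Python. This is a rough illustration under assumptions, not how merge is implemented: the dictionaries stand in for the two datasets, outer_merge is a hypothetical helper, and None plays the role of Stata's missing value.

```python
# Two datasets keyed by (state, year), mirroring the Stata demonstration.
using = {(1, 2019): 1, (1, 2020): 2, (2, 2019): 3, (2, 2020): 4}  # y values
master = {(1, 2019): 42, (2, 2019): 84}                           # x values

def outer_merge(master, using):
    """Combine two keyed datasets; absent rows become None (Stata's '.')."""
    rows = []
    # Rows come out sorted by key, unlike Stata, which lists master rows first.
    for key in sorted(set(master) | set(using)):
        in_master, in_using = key in master, key in using
        # Same coding as _merge: 1 = master only, 2 = using only, 3 = matched.
        status = 3 if in_master and in_using else (1 if in_master else 2)
        rows.append((*key, master.get(key), using.get(key), status))
    return rows

merged = outer_merge(master, using)
for row in merged:
    print(row)  # (state, year, x, y, _merge)
```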

Get max marks of each student in Kusto

Consider a table
StudentId | Subject | Marks
----------+---------+------
1         | Maths   | 34
1         | Science | 54
2         | Maths   | 64
2         | French  | 85
2         | Science | 74
I'm looking for an output where it will give (note that I'm trying to find MAX marks for each student, irrespective of the subject)
StudentId | Subject | Marks
----------+---------+------
1         | Science | 54
2         | French  | 85
Use the summarize operator:
T
| summarize max(Marks) by StudentId
In addition to the query above from @Avnera: if you also care about the subject in which the student received the maximum marks (your desired output table suggests you do), you can use the arg_max function:
T
| summarize arg_max(Marks, Subject) by StudentId
arg_max(): https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/arg-max-aggfunction
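As a rough illustration of what arg_max(Marks, Subject) by StudentId computes, here is a minimal Python sketch over the sample table from the question. It only mirrors the logic (keep, per student, the row with the highest marks); arg_max_by_student is a hypothetical helper, and on ties this version keeps the first row seen, which is an assumption made for the sketch.

```python
# Sample rows from the question: (StudentId, Subject, Marks).
rows = [
    (1, "Maths", 34),
    (1, "Science", 54),
    (2, "Maths", 64),
    (2, "French", 85),
    (2, "Science", 74),
]

def arg_max_by_student(rows):
    """Map each StudentId to the (Subject, Marks) of its highest-marks row."""
    best = {}
    for student_id, subject, marks in rows:
        # Replace the stored row only when strictly higher marks are found.
        if student_id not in best or marks > best[student_id][1]:
            best[student_id] = (subject, marks)
    return best

top = arg_max_by_student(rows)
for student_id, (subject, marks) in top.items():
    print(student_id, subject, marks)
```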

DynamoDB Table/Index Modeling + Querying

Basic requirements:
I have a table with a bunch of attributes (20-30), but only 3 are used in querying: User, Category, and Date, and would be structured something like this...
User | Category | Date | ...
1 | Red | 5/15
1 | Green | 5/15
1 | Red | 5/16
1 | Green | 5/16
2 | Red | 5/18
2 | Green | 5/18
I want to be able to query this table in the following 2 ways:
Most recent rows (based on Date) by User. e.g., User=1 returns the 2 rows from 5/16 (3rd and 4th row)
Most recent rows (based on Date) by User and Category. e.g., User=1, Category=Red returns the 5/16 row only (the 3rd row).
Is the best way to model this with a HASH on User, RANGE on Date, and a GSI with HASH on User+Category and RANGE on Date? Is there anything else that might be more efficient? If that's the path of least resistance, I'd still need to know how many rows to return, which would require doing a count against distinct categories or something?
I've decided that it's going to be easier to just change the way I'm storing the documents. I'll move the category and other attributes into a sub-document so I can easily query against User+Date and I'll do any User+Category+Date querying with some client-side code against the User+Date result set.
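The client-side step could look roughly like the sketch below. It is only an illustration under assumptions: the items list stands in for whatever a query on User returns, the flat item shape and the zero-padded "MM-DD" date strings (chosen so that string comparison matches date order) are made up for the example, and most_recent is a hypothetical helper.

```python
# Items as they might come back from a query on User=1 (hypothetical shape).
items = [
    {"User": 1, "Category": "Red", "Date": "05-15"},
    {"User": 1, "Category": "Green", "Date": "05-15"},
    {"User": 1, "Category": "Red", "Date": "05-16"},
    {"User": 1, "Category": "Green", "Date": "05-16"},
]

def most_recent(items, category=None):
    """Return the rows carrying the latest Date, optionally for one Category."""
    if category is not None:
        items = [it for it in items if it["Category"] == category]
    if not items:
        return []
    # Assumes Date strings sort chronologically (zero-padded, same year).
    latest = max(it["Date"] for it in items)
    return [it for it in items if it["Date"] == latest]

latest_all = most_recent(items)                  # both 05-16 rows
latest_red = most_recent(items, category="Red")  # the Red 05-16 row only
print(latest_all)
print(latest_red)
```

Filtering on the latest date avoids having to know in advance how many rows to return.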

Power Bi graph like pivot graph

I'm new to Power BI. I have followed most of the Microsoft tutorials but haven't yet figured out how to create a graph that resembles the one below, which I made in Excel as a pivot graph from the same data table.
What I need to recreate in Power BI is a column graph showing the most requested products (pre-order requests as a % of the total sum) in different price ranges.
Pivot Graph
Sample table:
| Date | Product | 3 to 5 Eur | 5 to 8 Eur | 8 to 11 Eur |
----------------------------------------------------------
| mar17| Coffe | 12 | 7 | 2 |
| mar17| Milk | 15 | 3 | 1 |
| mar17| Honey | 17 | 0 | 5 |
| mar17| Sugar | 20 | 9 | 8 |
Thanks in advance for the help.
Bests,
Alberto
Edit - Thanks to Mike Honey for pointing out the original request was for % of grand total. I have added an additional step to accomplish this and cleaned up some existing steps.
When I imported your sample data into Power BI, I got this (looking at the data in the Query Editor window).
From there, select the Date and Product columns and then click on Transform -> Unpivot Columns -> Unpivot Other Columns...
... which results in this.
Just to clean this up, I renamed the Attribute and Value columns and changed the data type of the Value column. In the end, it looks like this.
Then just click on Home -> Close & Apply to get back to the Report Editor window, where you can create a graph and configure it as follows:
Axis:
Price Range
Product
Value:
Quantity
Then click on the forked, drill-down arrow in the top left corner of the graph to show Price Range and Product.
Which looks like this.
Next, although it is not necessary, I think it looks nicer: with the graph selected, click on the paint roller icon and expand the X-Axis category. In there, turn off Concatenate labels.
Finally, to show the bars as a percentage of the grand total, right-click on Quantity in the Value section of the graph's fields and select Show value as -> Percent of grand total.
The final result looks like this.
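For readers who want to see what the unpivot and percent-of-grand-total steps do to the data, here is a minimal Python sketch using the sample table from the question. It is not Power BI code: unpivot is a hypothetical helper, and the Price Range and Quantity names follow the renamed columns described above.

```python
# Wide table from the question: one column per price range.
wide = [
    {"Date": "mar17", "Product": "Coffe", "3 to 5 Eur": 12, "5 to 8 Eur": 7, "8 to 11 Eur": 2},
    {"Date": "mar17", "Product": "Milk", "3 to 5 Eur": 15, "5 to 8 Eur": 3, "8 to 11 Eur": 1},
    {"Date": "mar17", "Product": "Honey", "3 to 5 Eur": 17, "5 to 8 Eur": 0, "8 to 11 Eur": 5},
    {"Date": "mar17", "Product": "Sugar", "3 to 5 Eur": 20, "5 to 8 Eur": 9, "8 to 11 Eur": 8},
]
ID_COLUMNS = ("Date", "Product")

def unpivot(rows, id_columns):
    """Turn every non-id column into its own (Price Range, Quantity) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue
            long_row = {c: row[c] for c in id_columns}
            long_row["Price Range"] = column
            long_row["Quantity"] = value
            long_rows.append(long_row)
    return long_rows

long_rows = unpivot(wide, ID_COLUMNS)

# Percent of grand total: each quantity divided by the sum of all quantities.
grand_total = sum(r["Quantity"] for r in long_rows)
for r in long_rows:
    r["Percent"] = 100 * r["Quantity"] / grand_total
    print(r["Price Range"], r["Product"], round(r["Percent"], 1))
```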

Matching data across two data frames in R

I've found a number of answers to my question that almost get me to the result I want, but not quite!
I've got two data sets that include word lists, something like:
df1:
Word | Speaker
apple 1
dog 1
lobster 1
tree 2
df2:
Word | Speaker
car 2
lobster 2
fish 1
bird 1
I want to create a new column in df1 that will tell me whether or not the same word appears in df2, regardless of exactly where in the list it occurs and who the speaker was. So I want to create a new column in df1, similar to this:
df1
Word | Speaker | Match
apple 1 FALSE
dog 1 FALSE
lobster 1 TRUE
tree 2 FALSE
It seems that it should be very easy but I can't quite get it to do the right thing. Any help much appreciated!
You're right - it is easy! You need %in%...
df1$Match <- (df1$Word %in% df2$Word)
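The same membership test can be sketched in Python for comparison. This is only an illustration of what %in% does here, with plain lists standing in for the Word columns of the two data frames; the variable names are made up for the example.

```python
# Word columns from the two data frames in the question.
df1_words = ["apple", "dog", "lobster", "tree"]
df2_words = ["car", "lobster", "fish", "bird"]

# A set gives fast membership checks, regardless of position or speaker.
lookup = set(df2_words)
match = [word in lookup for word in df1_words]

for word, found in zip(df1_words, match):
    print(word, found)
```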
