Using sqlite3, I have a column "grades" in the table "students", and I want to get the proportion of students who scored over 80 on a test. How do I get that? I can run SELECT COUNT(*) FROM students and then SELECT COUNT(*) FROM students WHERE grades > 80, but how do I get the proportion in one statement?
Here is a simple way to do this:
SELECT
AVG(CASE WHEN grades > 80 THEN 1 ELSE 0 END)
FROM students;
This takes a conditional average over the entire table: each student contributes 1 if their grade is over 80 and 0 otherwise, so the average equals the number of students with a grade over 80 divided by the total number of students.
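A minimal sketch that runs the query above through Python's built-in sqlite3 module against a small in-memory table; the names and grades are made-up sample data. Because SQLite's AVG returns a floating-point value, this also avoids the integer division you would get from dividing one COUNT by another.

```python
import sqlite3

# In-memory database with hypothetical sample grades.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, grades INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("ann", 95), ("bob", 72), ("cat", 81), ("dan", 80)],
)

# Each row contributes 1 when grades > 80 and 0 otherwise, so the
# average of those flags is the proportion of students above 80.
(proportion,) = conn.execute(
    "SELECT AVG(CASE WHEN grades > 80 THEN 1 ELSE 0 END) FROM students"
).fetchone()
print(proportion)  # 0.5
```

Note that a grade of exactly 80 is not counted, since the condition is strictly greater than 80.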
I have a database with species IDs in the rows (a very large number of them) and the places where they occur in the columns (several sites). I need a summary of how many species occur at each site. My observations are categorical in some cases (present) and numerical in others (number of individuals), because they come from different database sources. There are also several NAs throughout the database.
In R, I have only been counting observations one site at a time.
I would appreciate any help on how to count the observations in the different columns at the same time.
You could just do:
SELECT COUNT(*)
FROM tables
WHERE conditions
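Then specify the conditions on the different columns in the WHERE clause (string literals take single quotes; in standard SQL, double quotes denote identifiers such as column names):
WHERE t.COLUMN1 = 'THIS' AND t.COLUMN2 = 'THAT'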
Or use SUM(CASE ...), which is probably the best approach in general, because it evaluates several conditions in a single pass:
SELECT grfield,
SUM(CASE when a=1 then 1 else 0 end) as tcount1,
SUM(CASE when a=2 then 1 else 0 end) as tcount2
FROM T1
GROUP by grfield;
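Applied to the species question, the same SUM(CASE ...) pattern can count every site column in one statement. This is a sketch using Python's sqlite3 under stated assumptions: the table name occurrences, the columns site_a and site_b, NAs being stored as NULL, and the rule that a cell counts as an occurrence when it is neither NULL nor 0 are all hypothetical, so adjust them to your data.

```python
import sqlite3

# Hypothetical layout: one row per species, one column per site.
# Values are the text 'present', a count of individuals, or NULL (NA).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occurrences (species_id TEXT, site_a, site_b)")
conn.executemany(
    "INSERT INTO occurrences VALUES (?, ?, ?)",
    [
        ("sp1", "present", None),
        ("sp2", 12, 5),
        ("sp3", None, 3),
        ("sp4", 0, "present"),
    ],
)

# A species counts at a site when the cell is neither NULL nor 0.
# One SUM(CASE ...) per site column, all computed in a single query.
query = """
SELECT
    SUM(CASE WHEN site_a IS NULL OR site_a = 0 THEN 0 ELSE 1 END) AS species_at_a,
    SUM(CASE WHEN site_b IS NULL OR site_b = 0 THEN 0 ELSE 1 END) AS species_at_b
FROM occurrences
"""
species_at_a, species_at_b = conn.execute(query).fetchone()
print(species_at_a, species_at_b)  # 2 3
```

With many sites you would repeat (or generate) one SUM(CASE ...) line per site column.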
For more complex cases, you can count the rows returned by a subquery:
SELECT COUNT(*) FROM
(
SELECT DISTINCT D
FROM T1
INNER JOIN T2
ON T1.A = T2.B
) AS subquery;
You could also compute several counts in separate subqueries; there are many ways to combine these.
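A runnable version of the subquery count, as a sketch with Python's sqlite3; the tables T1(A, D) and T2(B) and their rows are hypothetical stand-ins for the placeholder names. It also shows that, for this case, COUNT(DISTINCT D) gives the same result without a subquery. The two differ only when D can be NULL: SELECT DISTINCT keeps one NULL row that COUNT(*) counts, while COUNT(DISTINCT D) ignores NULLs.

```python
import sqlite3

# Hypothetical tables: T1(A, D) and T2(B), joined on T1.A = T2.B.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (A INTEGER, D TEXT)")
conn.execute("CREATE TABLE T2 (B INTEGER)")
conn.executemany(
    "INSERT INTO T1 VALUES (?, ?)", [(1, "x"), (2, "x"), (3, "y"), (4, "z")]
)
conn.executemany("INSERT INTO T2 VALUES (?)", [(1,), (2,), (3,), (3,)])

# Count the rows produced by the DISTINCT subquery over the join.
subquery_count = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT DISTINCT D
        FROM T1
        INNER JOIN T2 ON T1.A = T2.B
    ) AS subquery
    """
).fetchone()[0]

# Shortcut for the same question when D has no NULLs.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT D) FROM T1 INNER JOIN T2 ON T1.A = T2.B"
).fetchone()[0]

print(subquery_count, distinct_count)  # 2 2
```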
I am able to count the number of occurrences of each value in a single column by using:
select column_name, count(column_name)
from table_name
group by column_name
order by column_name
But I want a query that counts the occurrences of values across multiple columns.
The COUNT function, when applied directly to a column, just returns the number of rows in which that column is not NULL. Summing those counts over several columns therefore just gives the number of rows times the number of columns (when there are no NULLs). Instead, you can add up the result of DECODE (an Oracle function) for the condition on each column, e.g.:
select mytable.*,
       DECODE(mytable.column1, 'target value', 1, 0)
     + DECODE(mytable.column2, 'target value', 1, 0) as hits
from mytable
For each row, this counts how many of the columns meet the condition. In this case the value of hits can be 0, 1 or 2, because the condition is checked over two columns.
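DECODE is Oracle-specific, so in other databases (SQLite here) the same idea is usually written with CASE. A sketch using Python's sqlite3 with a made-up mytable; it computes the per-row hits and then the total number of matches across both columns. One difference: Oracle's DECODE treats two NULLs as equal, whereas column = 'target value' never matches a NULL, which only matters if you are searching for NULL.

```python
import sqlite3

# Hypothetical table with two columns to search for the same value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (1, "target value", "target value"),
        (2, "other", "target value"),
        (3, "other", None),
    ],
)

# Each CASE yields 1 on a match and 0 otherwise (NULL included),
# so their sum is the number of matching columns in that row.
rows = conn.execute(
    """
    SELECT id,
           CASE WHEN column1 = 'target value' THEN 1 ELSE 0 END
         + CASE WHEN column2 = 'target value' THEN 1 ELSE 0 END AS hits
    FROM mytable
    ORDER BY id
    """
).fetchall()

# Wrapping the same expression in SUM gives the total over the whole table.
total = conn.execute(
    """
    SELECT SUM(CASE WHEN column1 = 'target value' THEN 1 ELSE 0 END
             + CASE WHEN column2 = 'target value' THEN 1 ELSE 0 END)
    FROM mytable
    """
).fetchone()[0]

print(rows)   # [(1, 2), (2, 1), (3, 0)]
print(total)  # 3
```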
I am exploring a new table in SQL and was wondering what the best way is to find the number of occurrences of each value. In essence, I would like to better understand the distribution of values in the column.
At first I ran a SELECT TOP 10000 on the table, and the column I am interested in showed only 2-3 distinct values. Let's call them A, B and C.
But when I run a SELECT DISTINCT on that column, I get 5 million distinct values.
What I want to know is the distribution of the values in the column.
An example of the output I am looking for:
Distinct value    Count of occurrences
A                 A lot
B                 A lot
C                 A lot
D                 1
E                 1
F                 1
G                 1
What you're looking for is GROUP BY.
Example:
SELECT category, COUNT(*) FROM CATALOGS GROUP BY category
This gives you the number of rows per category.
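To get output shaped like the table in the question, with the frequent values first and the long tail of singletons after, you can add an ORDER BY on the count. A sketch using Python's sqlite3; the catalogs table and its rows are made-up sample data.

```python
import sqlite3

# Hypothetical column with a few frequent values and some one-offs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalogs (category TEXT)")
conn.executemany(
    "INSERT INTO catalogs VALUES (?)",
    [("A",)] * 3 + [("B",)] * 2 + [("C",), ("D",)],
)

# One output row per distinct value, most frequent first, ties broken by value.
rows = conn.execute(
    """
    SELECT category, COUNT(*) AS occurrences
    FROM catalogs
    GROUP BY category
    ORDER BY occurrences DESC, category
    """
).fetchall()
print(rows)  # [('A', 3), ('B', 2), ('C', 1), ('D', 1)]
```

With millions of distinct values, adding HAVING COUNT(*) > 1 before the ORDER BY is one way to hide the singletons.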
I am new to SQL and am having trouble with a (fairly simple) query to rank time stamps.
I have one table with survey data from 2014. I am trying to determine the 'learning curve' for good customer satisfaction performance. I want to order and rank each survey at the agent level based on the timestamp of the survey. This would let me see what the average performance is when an agent has 5, 10, 20, etc. total surveys.
I imagine it should be something like (table name is tablerank):
select T1.*,
(select count(*)
from tablerank as T2
where T2.call_date > T1.call_date
) as SurveyRank
from tablerank as T1
where p1.Agent_ID = T2.Agent_ID;
For each agent, it would list each survey in order and tag a 1 for the earliest survey, a 2 for the second earliest, etc. Then I could pivot the data in Excel and see the learning curve based on survey count rather than tenure or time (surveys are rare; sometimes you only get 1 or 2 in a month).
A correlated subquery must have the correlation in the subquery itself; any table names/aliases from the subquery (such as T2) are not visible in the outer query.
For ranking, you want to count the earlier surveys of the same agent and include the current survey, so that the first one gets rank 1; therefore you need to use <= instead of >:
SELECT *,
(SELECT COUNT(*)
FROM tablerank AS T2
WHERE T2.Agent_ID = T1.Agent_ID
AND T2.call_date <= T1.call_date
) AS SurveyRank
FROM tablerank AS T1
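A sketch that runs the corrected correlated subquery through Python's sqlite3 on a few made-up surveys. It assumes call_date is stored as ISO-8601 text (YYYY-MM-DD), so that string comparison orders the dates chronologically. If one agent has two surveys with the same call_date, both receive the same (higher) rank.

```python
import sqlite3

# Hypothetical survey rows; dates as ISO-8601 text so <= compares chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablerank (Agent_ID INTEGER, call_date TEXT)")
conn.executemany(
    "INSERT INTO tablerank VALUES (?, ?)",
    [
        (1, "2014-01-05"),
        (1, "2014-02-10"),
        (1, "2014-03-01"),
        (2, "2014-01-20"),
        (2, "2014-04-15"),
    ],
)

# For each survey, count the same agent's surveys on or before its call_date.
rows = conn.execute(
    """
    SELECT T1.Agent_ID, T1.call_date,
           (SELECT COUNT(*)
            FROM tablerank AS T2
            WHERE T2.Agent_ID = T1.Agent_ID
              AND T2.call_date <= T1.call_date) AS SurveyRank
    FROM tablerank AS T1
    ORDER BY T1.Agent_ID, SurveyRank
    """
).fetchall()
for row in rows:
    print(row)
```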
Say I have a query that groups the population by country, with the country as its first column and the total population of that country as its second column.
To achieve this I have the following query:
select
i.country,
count(1) population
from
individual i
group by
i.country
Now I want to introduce two more columns to that query to display the population of males and females for each country.
What I want to achieve might look something similar to this:
select
i.country,
count(1) total_population,
count(1) over (partition by 1 where i.gender='male') male_population,
count(1) over (partition by 1 where i.gender='female') female_population
from
individual i
group by
i.country
The problems with this are that a "partition by" clause is not allowed in a "group by" query, and a "where" clause is not allowed inside a "partition by" clause.
I hope you get the point. Please excuse my grammar and the way I titled this (I couldn't think of a better description).
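You don't need analytic functions here. Conditional aggregation is enough: a CASE without an ELSE returns NULL for rows that don't match, and COUNT ignores NULLs: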
select
i.country
,count(1) population
,count(case when gender = 'male' then 1 end) male
,count(case when gender = 'female' then 1 end) female
from
individual i
group by
i.country
;
see http://www.sqlfiddle.com/#!4/7dfa5/4
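The same conditional COUNT also works in SQLite. A sketch via Python's sqlite3, with a made-up individual table containing only the country and gender columns the query uses:

```python
import sqlite3

# Hypothetical rows: one per individual.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE individual (country TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO individual VALUES (?, ?)",
    [("FR", "male"), ("FR", "female"), ("FR", "female"), ("JP", "male")],
)

# CASE without ELSE yields NULL for non-matching rows, and COUNT skips NULLs,
# so each conditional COUNT only counts rows of the given gender.
rows = conn.execute(
    """
    SELECT i.country,
           COUNT(1) AS population,
           COUNT(CASE WHEN gender = 'male' THEN 1 END) AS male,
           COUNT(CASE WHEN gender = 'female' THEN 1 END) AS female
    FROM individual i
    GROUP BY i.country
    ORDER BY i.country
    """
).fetchall()
print(rows)  # [('FR', 3, 1, 2), ('JP', 1, 1, 0)]
```

A country with no individuals of one gender gets 0 in that column rather than NULL, because COUNT over only NULLs returns 0.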