Getting a distinct count by year from an Access database

I'm very rusty with my SQL, so this might not be very complicated, but I just can't seem to crack it.
I have a database with two tables: one containing patient details and one containing the visits each patient has had. Patient_ID is the unique identifier for a patient and is also used in the Visits table. I'm trying to pull the number of distinct patients and the total number of visits they've had (e.g. Patient A visited 3 times in 2018).
I'm trying to get a total count of the distinct patients who have visited a centre per Year (a field in the Visits table), and also see information about each patient from the Patient table (gender, country, etc.).
I've tried several COUNT and DISTINCT approaches but can't get anything to work. The query below is one of my last attempts, but DISTINCT doesn't actually return distinct values in this scenario (am I doing something wrong with it?), although it does work in other queries. Any help would be greatly appreciated.
SELECT DISTINCT Visits.Patient_ID, Patient.Gender, Patient.Village, Visits.Months_Of_Visit, Visits.Year
FROM Visits
INNER JOIN Patient ON Patient.Patient_ID=Visits.Patient_ID
WHERE Year='2018';
Expected result:
Unique Patient Id, Patient Gender, Patient Village PER month and PER Year.

If you want the number of times each patient visited each village/month/year:
SELECT Count(*) AS CountVisits, Visits.Patient_ID, Gender, Village, Months_Of_Visit, [Year]
FROM Visits
INNER JOIN Patient ON Patient.Patient_ID=Visits.Patient_ID
GROUP BY Visits.Patient_ID, Gender, Village, Months_Of_Visit, [Year];
If you want the number of DISTINCT patients per village/month/year:
Query1:
SELECT DISTINCT Visits.Patient_ID, Gender, Village, Months_Of_Visit, [Year]
FROM Visits
INNER JOIN Patient ON Patient.Patient_ID=Visits.Patient_ID;
Query2:
SELECT Count(*) AS CountPerVillage, Village, Months_Of_Visit, [Year]
FROM Query1 GROUP BY Village, Months_Of_Visit, [Year];
All in one:
SELECT Count(*) AS CountPerVillage, Village, Months_Of_Visit, [Year]
FROM (SELECT DISTINCT Visits.Patient_ID, Village, Months_Of_Visit, [Year]
FROM Visits INNER JOIN Patient ON Patient.Patient_ID=Visits.Patient_ID) AS Query1
GROUP BY Village, Months_Of_Visit, [Year];
Since Year is a reserved word (it is an intrinsic function), enclose it in [ ] or include the table name prefix in the field reference.
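To see the "all in one" query count each patient only once per village/month/year, here is a minimal sketch. It assumes SQLite (via Python's sqlite3 module) as a stand-in for Access, which also accepts the [Year] bracket syntax. The table and column names follow the question, but the sample rows are invented for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Access; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patient (Patient_ID INTEGER PRIMARY KEY, Gender TEXT, Village TEXT);
CREATE TABLE Visits (Patient_ID INTEGER, Months_Of_Visit TEXT, [Year] TEXT);
INSERT INTO Patient VALUES (1, 'F', 'North'), (2, 'M', 'North'), (3, 'F', 'South');
INSERT INTO Visits VALUES
  (1, 'Jan', '2018'), (1, 'Jan', '2018'), (1, 'Feb', '2018'),
  (2, 'Jan', '2018'), (3, 'Jan', '2018');
""")

# The inner DISTINCT collapses repeat visits by the same patient in the same
# village/month/year; the outer GROUP BY then counts the remaining patients.
distinct_patients = conn.execute("""
SELECT Count(*) AS CountPerVillage, Village, Months_Of_Visit, [Year]
FROM (SELECT DISTINCT Visits.Patient_ID, Village, Months_Of_Visit, [Year]
      FROM Visits INNER JOIN Patient ON Patient.Patient_ID = Visits.Patient_ID) AS Query1
GROUP BY Village, Months_Of_Visit, [Year]
ORDER BY Village, Months_Of_Visit
""").fetchall()

for row in distinct_patients:
    print(row)
```

Patient 1 visited twice in January 2018 but contributes only one to the North/Jan count, which is the difference from the plain Count(*) per visit.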

Related

Select people with a given surname from database

I do the following to get the population in a set of districts for a given year:
SELECT Year, County, District, Count(*) FROM census_data where Year = ? group by Year, County, District;
Then I do the following many thousands of times to get the population in each district for each surname I am interested in:
SELECT Year, County, District, COUNT(*) FROM census_data where Year = ? and Surname = ? group by Year, County, District;
There are 8 million rows in my db covering two specific years. There are roughly 40 counties and a county typically has a few hundred districts.
Should I add an index on my table to speed up the above queries as follows:
CREATE INDEX surname_index ON census_data (surname);
My thinking is that, since there are generally not many people with a given surname, it should be enough just to index that column. Or would you recommend something else? I could also change the query to:
SELECT Year, County, District, COUNT(*) FROM census_data where Surname = ? group by Year, County, District;
since I am usually interested in both years anyway. When running queries, how do I see if my index is being used?
Yes, I would use an index on the columns you're grouping by. Like I mentioned in the comments, I'd also use one query that produces all the desired rows over 1000 queries that produce a fragment of the total apiece. Make the database do all that work only once. Since you mentioned the names you're interested in are the 1000 most common ones, not random names, that actually makes it a bit easier.
The following demonstrates two slightly different approaches to getting the count per (year, county, district, surname) of the most common surnames overall:
First, populate a table with some sample data:
CREATE TABLE census(year INTEGER, county TEXT, district TEXT, surname TEXT);
INSERT INTO census VALUES
(2012, 'Lake', 'West', 'Smith'),
(2012, 'Lake', 'West', 'Jones'),
(2012, 'Lake', 'West', 'Smith'),
(2012, 'Lake', 'West', 'Washington'),
(2012, 'Lake', 'West', 'Washington'),
(2012, 'Lake', 'East', 'Smith'),
(2012, 'Lake', 'East', 'Jackson'),
(2012, 'Williams', 'Downtown', 'Jones'),
(2012, 'Williams', 'Downtown', 'McMaster'),
(2012, 'Williams', 'West Side', 'Jones'),
(2012, 'Williams', 'West Side', 'Jones');
CREATE INDEX census_idx ON census(year, county, district, surname);
(Your real data will, of course, have a lot more rows, and presumably more columns. Depending on space constraints, you might want to drop surname from the index, at the cost of a slower query. With all four columns in the index, it's a covering index for the queries below and the actual table rows never get accessed. With just the first three (or two, or one), it'll need temporary b-trees for the grouping, and more table accesses.)
Approach one: Populate a temporary table with the 1000 most common names overall, and use that table in a join to restrict the results to just those names:
CREATE TEMP TABLE names(name TEXT PRIMARY KEY) WITHOUT ROWID;
INSERT INTO names
SELECT surname FROM census GROUP BY surname ORDER BY count(*) DESC LIMIT 1000;
SELECT year, county, district, surname, count(*) as number
FROM census AS c
JOIN names AS n ON c.surname = n.name
GROUP BY year, county, district, surname
ORDER BY year, county, district, count(*) DESC, surname;
Approach two: Do the same thing, but with a subquery instead of a table for the most common names:
SELECT year, county, district, surname, count(*) as number
FROM census AS c
JOIN (SELECT surname AS name FROM census GROUP BY surname ORDER BY count(*) DESC LIMIT 1000) AS n ON c.surname = n.name
GROUP BY year, county, district, surname
ORDER BY year, county, district, count(*) DESC, surname;
Both produce:
year        county      district    surname     number
----------  ----------  ----------  ----------  ----------
2012        Lake        East        Jackson     1
2012        Lake        East        Smith       1
2012        Lake        West        Smith       2
2012        Lake        West        Washington  2
2012        Lake        West        Jones       1
2012        Williams    Downtown    Jones       1
2012        Williams    Downtown    McMaster    1
2012        Williams    West Side   Jones       2
If you're going to run this query a lot in a session, the first approach will be faster - it only has to build the list of most common names once, while the second one has to do it every time the query is run. It is, however, more involved because it takes multiple SQL statements. For a single run, benchmarking the two on a decent sized dataset is the best guide, of course.
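As for checking whether an index is being used: in SQLite you can prefix a query with EXPLAIN QUERY PLAN and read the plan text. The sketch below assumes Python's sqlite3 module and a cut-down copy of the sample table above. The exact wording of the plan varies between SQLite versions, so the code only looks for the index name in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE census(year INTEGER, county TEXT, district TEXT, surname TEXT);
INSERT INTO census VALUES
  (2012, 'Lake', 'West', 'Smith'),
  (2012, 'Lake', 'East', 'Jackson'),
  (2012, 'Williams', 'Downtown', 'Jones');
CREATE INDEX census_idx ON census(year, county, district, surname);
""")

plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT year, county, district, surname, count(*)
FROM census
GROUP BY year, county, district, surname
""").fetchall()

# The last column of each plan row is a human-readable description of the
# step; it names the index when one is used.
details = [row[-1] for row in plan]
for detail in details:
    print(detail)

uses_index = any("census_idx" in detail for detail in details)
print("census_idx used:", uses_index)
```

If the plan mentions a plain scan of the table with no index name, or a temporary b-tree for GROUP BY, the index is not helping that query.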

SQLite Calculate sum outside group by

I'm trying to calculate the percentage of the total sales value that each customer has spent.
I have calculated the total sales value per customer using sum() and group by, but once I use group by, I can no longer get at the overall total sales value alongside the individual total for each customer.
Is there any way I could get around this?
I got this far and don't know what to do next:
select c.firstname ||' '|| c.lastname as 'Full name',
sum(total) as 'Sales value'
/* plus something to calculate the percentage */
from invoice i inner join customer c on i.customerid = c.customerid
group by i.customerid order by sum(total) desc limit 5;
To calculate the simple sum over the entire table, move it into an independent subquery:
SELECT ...,
sum(total) / (SELECT sum(total) FROM invoice)
FROM ...;
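Filled in for the query from the question, that could look like the sketch below. It assumes Python's sqlite3 module and invented sample rows; the column aliases (full_name, sales_value, pct_of_total) and the invoiceid column are hypothetical. Multiplying by 100.0 turns the ratio into a percentage and also avoids integer division if total happens to be stored as an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer(customerid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
CREATE TABLE invoice(invoiceid INTEGER PRIMARY KEY, customerid INTEGER, total REAL);
INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO invoice(customerid, total) VALUES (1, 30.0), (1, 30.0), (2, 40.0);
""")

# The scalar subquery is independent of the GROUP BY, so it always yields
# the grand total over the whole invoice table.
rows = conn.execute("""
SELECT c.firstname || ' ' || c.lastname AS full_name,
       sum(i.total) AS sales_value,
       100.0 * sum(i.total) / (SELECT sum(total) FROM invoice) AS pct_of_total
FROM invoice AS i
INNER JOIN customer AS c ON i.customerid = c.customerid
GROUP BY i.customerid
ORDER BY sales_value DESC
LIMIT 5
""").fetchall()

for row in rows:
    print(row)
```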

SQLite Ranking Time Stamps

I am new to SQL and am having trouble with a (fairly simple) query to rank time stamps.
I have one table with survey data from 2014. I am trying to determine the 'learning curve' for good customer satisfaction performance. I want to order and rank each survey at an agent level based on the time stamp of the survey. This would let me see what the average performance is when an agent has 5 total surveys, 10, 20 etc.
I imagine it should be something like (table name is tablerank):
select T1.*,
(select count(*)
from tablerank as T2
where T2.call_date > T1.call_date
) as SurveyRank
from tablerank as T1
where p1.Agent_ID = T2.Agent_ID;
For each agent, it would list each survey in order and tag a 1 for the earliest survey, a 2 for the second earliest, etc. Then I could Pivot the data in Excel and see the learning curve based on survey count rather than tenure or time (since surveys are more rare, sometimes you only get 1 or 2 in a month).
A correlated subquery must have the correlation in the subquery itself; any table names/aliases from the subquery (such as T2) are not visible in the outer query.
For ranking, you want to count earlier surveys, and you want to include the current survey so that the first one gets the rank number 1, so you need to use <= instead of >:
SELECT *,
(SELECT COUNT(*)
FROM tablerank AS T2
WHERE T2.Agent_ID = T1.Agent_ID
AND T2.call_date <= T1.call_date
) AS SurveyRank
FROM tablerank AS T1
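A small runnable sketch of that ranking, using Python's sqlite3 module. The sample rows and the score column are invented, and it assumes call_date is stored as ISO-8601 text (YYYY-MM-DD) so that comparing strings matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablerank(Agent_ID TEXT, call_date TEXT, score INTEGER);
INSERT INTO tablerank VALUES
  ('A', '2014-01-05', 3),
  ('A', '2014-02-10', 4),
  ('B', '2014-01-20', 5),
  ('A', '2014-03-15', 5),
  ('B', '2014-04-01', 4);
""")

# For every survey, count the same agent's surveys dated on or before it.
ranked = conn.execute("""
SELECT T1.Agent_ID, T1.call_date,
       (SELECT COUNT(*)
        FROM tablerank AS T2
        WHERE T2.Agent_ID = T1.Agent_ID
          AND T2.call_date <= T1.call_date) AS SurveyRank
FROM tablerank AS T1
ORDER BY T1.Agent_ID, T1.call_date
""").fetchall()

for row in ranked:
    print(row)
```

Note that two surveys for the same agent with identical call_date values would both get the same (higher) rank, since each counts the other.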

Is there a way to partition a query that has a "group by" clause?

Say I have a query that displays population by country, with the country as its first column and the total population of that country as its second column.
To achieve this I have the following query:
select
i.country,
count(1) population
from
individual i
group by
i.country
Now I want to introduce two more columns to that query to display the population of males and females for each country.
What I want to achieve might look something similar to this:
select
i.country,
count(1) population total_population,
count(1) over (partition by 1 where i.gender='male') male_population,
count(1) over (partition by 1 where i.gender='female') female_population,
from
individual i
group by
i.country
The problem with this is that
"partition by clause" is not allowed in a "group by" query
"where clause" is not allowed in "partition by" clause
I hope you get the point. Please excuse my grammar and the way I titled this (couldn't know any better description).
You don't need analytic functions here:
select
i.country
,count(1) population
,count(case when gender = 'male' then 1 end) male
,count(case when gender = 'female' then 1 end) female
from
individual i
group by
i.country
;
see http://www.sqlfiddle.com/#!4/7dfa5/4
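The same conditional aggregation can be tried locally. This is a minimal sketch that assumes SQLite through Python's sqlite3 module rather than the database used in the fiddle, with invented sample rows. It works because count() ignores NULLs, and a CASE without an ELSE yields NULL for non-matching rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE individual(country TEXT, gender TEXT);
INSERT INTO individual VALUES
  ('FR', 'male'), ('FR', 'female'), ('FR', 'female'),
  ('DE', 'male'), ('DE', NULL);
""")

# count(1) counts every row in the group; the CASE expressions only produce
# a non-NULL value for the matching gender, so count() skips the rest.
population = conn.execute("""
SELECT i.country,
       count(1) AS population,
       count(CASE WHEN gender = 'male' THEN 1 END) AS male,
       count(CASE WHEN gender = 'female' THEN 1 END) AS female
FROM individual AS i
GROUP BY i.country
ORDER BY i.country
""").fetchall()

for row in population:
    print(row)
```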

How to select data where sum is greater than x

I am very new to SQL and am using SQLite 3 to run basket analysis on sales data.
The relevant columns are the product ID, a unique transaction ID (which identifies the basket) and the product quantity. Where a customer has bought more than one product type, the unique transaction ID is repeated.
I want to select only baskets where the customer has bought more than 1 item.
Is there any way in SQLite to select the unique transaction ID and the sum of the quantity, but only for unique transaction IDs where the summed quantity is more than one?
So far I have tried:
select uniqID, sum(qty) from salesdata where sum(qty) > 1 group by uniqID;
But SQLite gives me the error 'misuse of aggregate: sum()'
Sorry if this is a simple question but I am struggling to find any relevant information by googling!
Try
select uniqID, sum(qty) from salesdata group by uniqID having sum(qty) > 1;
"where" cannot be used with aggregate functions; it filters rows before grouping, so in this case you can only use where on plain columns such as uniqID.
If you want to put a condition on the result of a group by, you must use having:
select uniqID, sum(qty) as sumqty from salesdata group by uniqID having sumqty > 1;
You can use any condition in having that you would normally use in where:
having sumqty = 1, having sumqty < 1, having sumqty IN (1,2,3), etc.
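To see both behaviours side by side, here is a short sketch using Python's sqlite3 module. The sample rows and the productID column name are invented for illustration. The first query repeats the failing attempt from the question, the second uses having.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesdata(uniqID TEXT, productID TEXT, qty INTEGER);
INSERT INTO salesdata VALUES
  ('T1', 'P1', 1),
  ('T2', 'P1', 1), ('T2', 'P2', 2),
  ('T3', 'P3', 3);
""")

# An aggregate in WHERE is rejected, because WHERE runs before grouping.
try:
    conn.execute(
        "select uniqID, sum(qty) from salesdata where sum(qty) > 1 group by uniqID"
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)
print("WHERE attempt failed with:", where_error)

# HAVING filters the groups after the sums have been computed.
baskets = conn.execute("""
SELECT uniqID, sum(qty) AS sumqty
FROM salesdata
GROUP BY uniqID
HAVING sum(qty) > 1
ORDER BY uniqID
""").fetchall()

for row in baskets:
    print(row)
```

Basket T1 holds a single item and is filtered out; T2 qualifies through two products and T3 through a quantity of 3 on one product.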
