Use Oracle Partition and Over By clause to retrieve section numbers - oracle11g

My Table comprises 4 Columns (Patient, Sample, Analysis and Component). I am trying to write a query that will look at the combination of Patient, Analysis and Component for each record and assign a "Section Number".
The numbering should re-start for every patient.
See expected output below. Patient 1010 has 3 samples but all have same Analysis-component. Hence they all have the same section (1).
Now, counting restarts for Patient 2020. This patient has 2 samples but both have a different Analysis-Component combination. Hence they are placed in separate sections 1 and 2.
Patient  Sample        Analysis  Component    Section Number
_______  ____________  ________  ___________  ______________
1010     720000140249  CALC      Calcium      1
1010     720000140288  CALC      Calcium      1
1010     720000140288  CALC      Calcium      1
2020     720000190504  ALB       Albumin      1
2020     720000160504  ALB       Albumin Pct  2
3030     720000134568  CALC      Calcium      1
3030     720000123404  ALB       Albumin      2
3030     720000160765  ALB       Albumin Pct  3
I have written the following query, but all it does is group samples with the same Component into one section. It does not consider the Patient or Analysis at all.
Your help is much appreciated (as always!)
select
x.patient, x.sample_number, x.analysis, x.component,
a.myRowCount
from
X_PREV_PAT_RESULTS x inner join (
select distinct
x1.COMPONENT
, ROW_NUMBER() OVER (ORDER BY x1.COMPONENT) myRowCount
from X_PREV_PAT_RESULTS x1
group by x1.patient ) A on x.COMPONENT = A.COMPONENT
order by a.myRowCount, x.patient;

My guess is that you want
dense_rank() over (partition by patient
order by analysis desc, component) myRowCount
What happens with rows after a tie? If patient 1010 gets an ALB analysis? Would that have a MyRowCount of 2? Or 4? rank would return 4. dense_rank would return 2.
How are you determining the order of rows for a particular patient? It appears that you're going in reverse alphabetical order for analysis and then alphabetically for component, but that seems like a pretty unusual ordering.
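To check the suggested ordering against the expected output, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database from Python and runs the same dense_rank expression. SQLite stands in for Oracle here (an assumption; window functions need SQLite 3.25 or later), and the alias section_number is chosen for illustration.

```python
import sqlite3

# Sample rows copied from the question's expected-output table.
rows = [
    (1010, "720000140249", "CALC", "Calcium"),
    (1010, "720000140288", "CALC", "Calcium"),
    (1010, "720000140288", "CALC", "Calcium"),
    (2020, "720000190504", "ALB", "Albumin"),
    (2020, "720000160504", "ALB", "Albumin Pct"),
    (3030, "720000134568", "CALC", "Calcium"),
    (3030, "720000123404", "ALB", "Albumin"),
    (3030, "720000160765", "ALB", "Albumin Pct"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE x_prev_pat_results "
    "(patient INTEGER, sample_number TEXT, analysis TEXT, component TEXT)"
)
conn.executemany("INSERT INTO x_prev_pat_results VALUES (?, ?, ?, ?)", rows)

# Numbering restarts per patient; ties share a number and no numbers are skipped.
query = """
SELECT patient, sample_number, analysis, component,
       DENSE_RANK() OVER (PARTITION BY patient
                          ORDER BY analysis DESC, component) AS section_number
FROM x_prev_pat_results
ORDER BY patient, section_number, sample_number
"""
sections = conn.execute(query).fetchall()
for row in sections:
    print(row)
```

With this data the section numbers come out as 1, 1, 1 for patient 1010, 1, 2 for patient 2020 and 1, 2, 3 for patient 3030, which matches the table in the question.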

select x.patient, x.sample_number, x.analysis, x.component,
dense_rank() over(partition by x.patient order by x.analysis, x.component)
from X_PREV_PAT_RESULTS x
where exists (select 1 from X_PREV_PAT_RESULTS x1 where x1.COMPONENT = x.COMPONENT);

Related

divide counts in one column where a condition is met

I am trying to determine the on time delivery rate of orders:
The column of interest is the on-time delivery column, which contains either 0 (not on time) or 1 (on time). How can I calculate the on-time rate for each person in SQL? That is, the count of 1s divided by the total count of 0s and 1s for each person, and likewise the not-on-time rate (the count of 0s divided by the total count)?
Here's a data example:
Week  Delivery on time  Person
1     0                 sARAH
1     0                 sARAH
1     1                 sARAH
2     1                 vIC
2     0                 Vic
You may aggregate by person, and then take the average of the on time statistic:
SELECT Person, AVG(1.0*DeliveryOnTime) AS OnTime,
AVG(1.0 - DeliveryOnTime) AS NotOnTime
FROM yourTable
GROUP BY Person;
Demo
The demo given is for SQL Server, and the above syntax might have to change slightly depending on your actual database, which you did not reveal to us.
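As a rough check of the averaging approach, the sketch below runs the same query against an in-memory SQLite database from Python. The table name deliveries and the snake_case column names are assumptions, and the person names are written with consistent capitalization, because a plain GROUP BY would treat "vIC" and "Vic" as two different people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deliveries (week INTEGER, delivery_on_time INTEGER, person TEXT)"
)
# Rows from the question, with names normalized (assumption, see above).
conn.executemany(
    "INSERT INTO deliveries VALUES (?, ?, ?)",
    [(1, 0, "Sarah"), (1, 0, "Sarah"), (1, 1, "Sarah"), (2, 1, "Vic"), (2, 0, "Vic")],
)

# The average of a 0/1 flag is the share of rows where the flag is 1.
rates = conn.execute(
    """
    SELECT person,
           AVG(1.0 * delivery_on_time) AS on_time,
           AVG(1.0 - delivery_on_time) AS not_on_time
    FROM deliveries
    GROUP BY person
    ORDER BY person
    """
).fetchall()
for person, on_time, not_on_time in rates:
    print(person, round(on_time, 3), round(not_on_time, 3))
```

Sarah has one on-time delivery out of three and Vic one out of two, so the two rates per person always sum to 1.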

Count in group_concat

I have this situation in Mysql table.
-----------------
code gr. state
-----------------
10 a available
10 a sold
10 b available
10 a available
10 a sold
10 a printed
10 b available
10 b sold
10 b available
------------------
I need to group this data by group, getting something like
group a -> available(3), sold(2), printed(1)
group b -> available(2), sold(1), printed(0)
I tried combining group_concat() and count() but can't get the result I need.
My goal is to have 1 single row per group (group by is ok)
The states are always these 3 (available, sold, printed)
thx for help
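SUM with IF could give you the right answer. IF takes a condition followed by the value when true and the value when false:
SELECT gr,
SUM(IF(state = 'available', 1, 0)) AS available,
SUM(IF(state = 'sold', 1, 0)) AS sold,
SUM(IF(state = 'printed', 1, 0)) AS printed
FROM your_table
GROUP BY gr;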
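The same conditional aggregation can be tried out with a small Python sketch against in-memory SQLite. SQLite is only a stand-in for MySQL here, so the portable CASE WHEN form replaces IF, and the table name items is an assumption. The counts are computed from the nine sample rows exactly as listed in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (code INTEGER, gr TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (10, "a", "available"), (10, "a", "sold"), (10, "b", "available"),
        (10, "a", "available"), (10, "a", "sold"), (10, "a", "printed"),
        (10, "b", "available"), (10, "b", "sold"), (10, "b", "available"),
    ],
)

# One row per group; each SUM counts only the rows in one state.
# A state with no rows in a group yields 0 rather than disappearing.
per_group = conn.execute(
    """
    SELECT gr,
           SUM(CASE WHEN state = 'available' THEN 1 ELSE 0 END) AS available,
           SUM(CASE WHEN state = 'sold'      THEN 1 ELSE 0 END) AS sold,
           SUM(CASE WHEN state = 'printed'   THEN 1 ELSE 0 END) AS printed
    FROM items
    GROUP BY gr
    ORDER BY gr
    """
).fetchall()
for row in per_group:
    print(row)
```

Because the three states are fixed, one column per state is enough, and no group_concat is needed.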

Using "shift" function in R to subtract one row from another by group

I have a data.table that looks like this:
dt
id month balance
1: 1 4 100
2: 1 5 50
3: 2 4 200
4: 2 5 135
5: 3 4 100
6: 3 5 100
7: 4 5 300
"id" is the client's ID, "month" indicates what month it is, and "balance" indicates the account balance of a client. In a sense, this is longitudinal data where, say, element (2,3) indicates that Client #1 has an account balance of 50 at the end of month 5.
I want to generate a column that will give me the difference between a client's balance between month 5 and 4 to know the transactions carried out from one month to another.
This new variable should let me know that Client 1 drew 50, Client 2 drew 65 and Client 3 didn't do anything in aggregate terms between April and May. Client 4 is a new client that joined in May.
I thought of the following code:
dt$transactions <- dt$balance - shift(dt$balance, 1, "up")
However, it does not work properly because it's telling me that Client 4 made a 200 dollar deposit (but Client 4 is new!). Therefore, I want to be able to introduce the argument "by=id" to this somehow.
I know the solution lies in using the following notation:
dt[, transactions := balance - shift(balance, ??? ), by=id]
I just need to figure out how to make the aforementioned code work properly.
Thanks in advance.
Given that I only have two observations (at most), the following code gives me an elegant solution:
dt[, transaction := balance - first(balance), by = id]
This prevents any NAs from entering the variable transaction.
However, if I had more observations per id, I would do the following:
dt[,transaction := balance - shift(balance,1), by = id]
Big thanks to @Ryan and @Onyambu for helping.
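For readers who want to see what the grouped shift does step by step, here is a plain-Python sketch of the same idea: within each id, subtract the previous month's balance, and leave the first row of each client empty. It is not data.table; the function name add_transactions is hypothetical and None plays the role of NA.

```python
from itertools import groupby
from operator import itemgetter

# The data.table from the question as a list of dicts.
rows = [
    {"id": 1, "month": 4, "balance": 100},
    {"id": 1, "month": 5, "balance": 50},
    {"id": 2, "month": 4, "balance": 200},
    {"id": 2, "month": 5, "balance": 135},
    {"id": 3, "month": 4, "balance": 100},
    {"id": 3, "month": 5, "balance": 100},
    {"id": 4, "month": 5, "balance": 300},
]

def add_transactions(rows):
    """Return rows with balance minus the previous balance of the same id."""
    ordered = sorted(rows, key=itemgetter("id", "month"))
    out = []
    for _, group in groupby(ordered, key=itemgetter("id")):
        previous = None  # reset for every client, like by = id
        for row in group:
            change = None if previous is None else row["balance"] - previous
            out.append({**row, "transaction": change})
            previous = row["balance"]
    return out

result = add_transactions(rows)
for row in result:
    print(row)
```

Client 4 gets None instead of a spurious deposit, because the previous balance is never carried over from another client.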

Using full name and maiden name strings (and birthdays) to match individuals across time

I've got a set of 20 or so consecutive individual-level cross-sectional data sets which I would like to link together.
Unfortunately, there's no time-stable ID number; there are, however, fields for first, last, and maiden names, as well as year of birth--this should allow for a pretty high (90-95%) match rate, I presume.
Ideally, I would create a time-independent ID for each unique individual.
I can do this for those whose marital status (maiden name) does not change pretty easily in R--stack the data sets to get a long panel, then do something to the effect of:
unique(dt,by=c("first_name","last_name","birth_year"))[,id:=.I]
(I'm of course using R data.table), then merging back to the full data.
However, I'm stuck on how to incorporate the maiden name to this procedure. Any suggestions?
Here's a preview of the data:
first_name last_name nee birth_year year
1: eileen aaldxxxx dxxxx 1977 2002
2: eileen aaldxxxx dxxxx 1977 2003
3: sarah aaxxxx gexxxx 1974 2003
4: kelly aaxxxx nxxxx 1951 2008
5: linda aarxxxx-gxxxx aarxxxx 1967 2008
---
72008: stacey zwirxxxx kruxxxx 1982 2010
72009: stacey zwirxxxx kruxxxx 1982 2011
72010: stacey zwirxxxx kruxxxx 1982 2012
72011: stacey zwirxxxx kruxxxx 1982 2013
72012: jill zydoxxxx gundexxxx 1978 2002
UPDATE:
I've done a lot of chipping and hammering at the problem; here's what I've got so far. I would appreciate any comments for possible improvements to the code so far.
I'm still completely missing something like 3-5% of matches due to inexact matches ("tonya" vs. "tanya", "jenifer" vs. "jennifer"); I haven't come up with a clean way of doing fuzzy matching on the stragglers, so there's room for better matching in that direction if anyone's got a straightforward way to implement that.
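One possible direction for the stragglers, shown here in Python rather than R, is a string-similarity ratio combined with exact agreement on the other fields. This is only a sketch under assumptions: the function names, the 0.75 threshold and the rule that last name and birth year must match exactly are all illustrative choices, not part of the procedure described here.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Similarity in [0, 1] between two strings (difflib's ratio)."""
    return SequenceMatcher(None, a, b).ratio()

def is_probable_match(rec_a, rec_b, threshold=0.75):
    """Hypothetical rule: exact last name and birth year, fuzzy first name."""
    if rec_a["birth_year"] != rec_b["birth_year"]:
        return False
    if rec_a["last_name"] != rec_b["last_name"]:
        return False
    return name_similarity(rec_a["first_name"], rec_b["first_name"]) >= threshold

# Made-up records for illustration only.
a = {"first_name": "tonya", "last_name": "smith", "birth_year": 1977}
b = {"first_name": "tanya", "last_name": "smith", "birth_year": 1977}
c = {"first_name": "john", "last_name": "smith", "birth_year": 1977}
d = {"first_name": "jane", "last_name": "smith", "birth_year": 1977}

print(is_probable_match(a, b))  # near-identical spellings
print(is_probable_match(c, d))  # different names sharing a few letters
```

A threshold this loose will eventually pair distinct people, so it would fit best as one of the last, least trusted matching steps.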
The basic approach is to build cumulatively--assign IDs in the first year, then look for matches in the second year; assign new IDs to the unmatched. Then for year 3, look back at the first 2 years, etc. As to how to match, the idea is to slowly expand the matching criteria--the idea being that the more robust the match, the lower the chances of mismatching accidentally (particularly worried about the John Smiths).
Without further ado, here's the main function for matching a pair of data sets:
get_id<-function(yr,key_from,key_to=key_from,
                 mdis,msch,mard,init,mexp,step){
  #Want to exclude anyone who is matched
  existing_ids<-full_data[.(yr),unique(na.omit(teacher_id))]
  #Get the most recent prior observation of all
  # unmatched teachers, excluding those teachers
  # who cannot be uniquely identified by the
  # current key setting
  unmatched<-
    full_data[.(1996:(yr-1))
              ][!teacher_id %in% existing_ids,
                .SD[.N],by=teacher_id,
                .SDcols=c(key_from,"teacher_id")
                ][,if (.N==1L) .SD,keyby=key_from
                  ][,(flags):=list(mdis,msch,mard,init,mexp,step)]
  #Merge, reset keys
  setkey(setkeyv(
    full_data,key_to)[year==yr&is.na(teacher_id),
                      (update_cols):=unmatched[.SD,update_cols,with=F]],
    year)
  full_data[.(yr),(update_cols):=lapply(.SD,function(x)na.omit(x)[1]),
            by=id,.SDcols=update_cols]
}
Then I basically go through the 19 years yy in a for loop, running 12 progressively looser matches, e.g. step 3 is:
get_id(yy,c("first_name_clean","last_name_clean","birth_year"),
mdis=T,msch=T,mard=F,init=F,mexp=F,step=3L)
The final step is to assign new IDs:
current_max<-full_data[.(yy),max(teacher_id,na.rm=T)]
new_ids<-
setkey(full_data[year==yy&is.na(teacher_id),.(id=unique(id))
][,add_id:=.I+current_max],id)
setkey(setkey(full_data,id)[year==yy&is.na(teacher_id),
teacher_id:=new_ids[.SD,add_id]],year)

SQL Server - Group by, having and count in a mix

I have a database with a long list of records. Most of the columns have foreign keys to other tables.
Example:
ID SectorId BranchId
-- -------- --------
5 3 5
And then I will have tables with sectors, branches, etc.
My issue:
I want to know how many records have sector 1, 2, 3 ... n. So what I want is a group by Sector and then some count(*) which will tell me how many there are of each.
Expected output
So for instance, if I have 20 records the result might look like this:
SectorId Count
-------- -----
1 3
2 10
3 4
4 6
My attempts so far
I do not normally work a lot with databases and I have been trying to solve this for 1.5 hours. I have tried something like this:
SELECT COUNT(*)
FROM Records r
GROUP BY r.Sector
WHERE r.Date BETWEEN '2011-01-01' AND '2011-12-31'
But... errors and problems all over!
I would really appreciate some help. I do know this is probably very simple.
Thanks!
The sequence of your query is not correct; it should be like this: -
SELECT COUNT(*)
FROM Records r
WHERE r.Date BETWEEN '2011-01-01' AND '2011-12-31'
GROUP BY r.Sector
The output will be only counts i.e.
count
-----
3
10
4
6
If you want to fetch both sector and count then you need to modify the query a little
SELECT r.Sector, COUNT(*) as Count
FROM Records r
WHERE r.Date BETWEEN '2011-01-01' AND '2011-12-31'
GROUP BY r.Sector
The output will be like this: -
Sector Count
------ -----
1 3
2 10
3 4
4 6
Your query was partially right, but it needs some modification.
If I write this way:-
SELECT r.SectorID,COUNT(*) AS count
FROM Records r
WHERE r.Date BETWEEN '2011-01-01' AND '2011-12-31'
GROUP BY r.SectorID
Then output will be:-
SectorID Count
1 3
2 10
3 4
4 6
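To see the clause order in action, here is a small Python sketch using an in-memory SQLite database in place of SQL Server (an assumption; the clause ordering rule is the same). The rows are made up to reproduce the counts above, plus one record dated outside 2011 to show that WHERE filters before the groups are counted. The alias record_count is chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Records (ID INTEGER PRIMARY KEY, SectorId INTEGER, Date TEXT)"
)

# Made-up data: 3, 10, 4 and 6 records in 2011 for sectors 1 to 4.
wanted = {1: 3, 2: 10, 3: 4, 4: 6}
rows = [(sector, "2011-06-15") for sector, n in wanted.items() for _ in range(n)]
rows.append((1, "2010-12-31"))  # outside the date range, must not be counted
conn.executemany("INSERT INTO Records (SectorId, Date) VALUES (?, ?)", rows)

# WHERE filters rows first, then GROUP BY forms the groups that COUNT(*) counts.
result = conn.execute(
    """
    SELECT r.SectorId, COUNT(*) AS record_count
    FROM Records r
    WHERE r.Date BETWEEN '2011-01-01' AND '2011-12-31'
    GROUP BY r.SectorId
    ORDER BY r.SectorId
    """
).fetchall()
for sector_id, record_count in result:
    print(sector_id, record_count)
```

Selecting the grouping column alongside the count is what makes each count attributable to its sector.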
