How to Transform Crosstab Query - ms-access-2010

Using a crosstab query in MS Access SQL, how do I transform a table like this
IND LBL TICKETS
1 AN 101
2 AN 1
1 QZ 102
1 AP 100
2 AP 2
To this:
LBL IND1 IND2
AN 101 1
QZ 102 0
AP 100 2
Any ideas?

This should get you most of the way there -
TRANSFORM First(TICKETS)
SELECT LBL
FROM MyTable
GROUP BY LBL
PIVOT IND
Change MyTable to the name of your table.
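Note that the crosstab will leave Null rather than 0 where a LBL has no row for an IND, and the pivot columns will be named 1 and 2 rather than IND1 and IND2. A sketch of one way to get closer to the desired output, assuming Nz and a string expression in the PIVOT clause behave here as they usually do in Access:
TRANSFORM Nz(First(TICKETS), 0)
SELECT LBL
FROM MyTable
GROUP BY LBL
PIVOT "IND" & IND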

Related

SQL select column and count from the next column

My table structure for referrals is as follows; the field ref is unique:
ID pid ref ref_by
1 1 k NAN
2 2 l k
3 3 m k
4 4 n l
And the user table is:
id name
1 john
2 Bob
3 Tim
4 Rob
I need to get the id, pid, ref, and the count of referrals in the next column. Based on the number of referrals, each user is assigned points at a constant 100 per referral; the result should look like this:
pid name ref number_of_referals points_earned
1 john k 2 200
2 Bob l 1 100
3 Tim m 0 0
4 Rob n 0 0
You need two joins: the first from the users table to referrals, and the second to a query that groups and counts the referrals:
select
  r.id, u.name, r.ref,
  case when c.counter is null then 0 else c.counter end as number_of_referals,
  case when c.counter is null then 0 else c.counter end * 100 as points_earned
from users u
inner join referrals r on r.pid = u.id
left join (select ref_by, count(*) as counter
           from referrals group by ref_by) c
  on c.ref_by = r.ref
order by r.id
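The two CASE expressions can also be written more compactly with coalesce, which most SQL engines support; an equivalent sketch:
select
  r.id, u.name, r.ref,
  coalesce(c.counter, 0) as number_of_referals,
  coalesce(c.counter, 0) * 100 as points_earned
from users u
inner join referrals r on r.pid = u.id
left join (select ref_by, count(*) as counter
           from referrals group by ref_by) c
  on c.ref_by = r.ref
order by r.id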

Retrieve all rows with same minimum value for a column with sqldf

I have to retrieve IDs for employees who have completed the minimum number of jobs. There are multiple employees who have completed 1 job. My current sqldf query retrieves only 1 row of data, while there are multiple employee IDs who have completed just 1 job. Why does it stop at the first minimum value? And how do I fetch all rows with the minimum value in a column? Here is a data sample:
ID TaskCount
1 74
2 53
3 10
4 5
5 1
6 1
7 1
The code I have used:
sqldf("select id, min(taskcount) as Jobscompleted
from (select id,count(id) as taskcount
from MyData
where id is not null
group by id order by id)")
Output is
ID Jobscompleted
5 1
What I want is all the rows with the minimum number of jobs completed:
ID Jobscompleted
5 1
6 1
7 1
min(...) always returns a single row, as do all SQL aggregate functions (one row per group when GROUP BY is used). Try this instead:
sqldf("select ID, TaskCount TasksCompleted from MyData
where TaskCount = (select min(TaskCount) from MyData)")
giving:
ID TasksCompleted
1 5 1
2 6 1
3 7 1
Note: The input in reproducible form is:
Lines <- "
ID TaskCount
1 74
2 53
3 10
4 5
5 1
6 1
7 1"
MyData <- read.table(text = Lines, header = TRUE)
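If MyData instead holds one raw row per task (as the inner query in the question suggests) rather than precomputed counts, the same where-equals-min idea can be applied to the grouped counts; a sketch under that assumption:
sqldf("select id, count(id) as Jobscompleted
       from MyData
       where id is not null
       group by id
       having count(id) = (select min(cnt)
                           from (select count(id) as cnt
                                 from MyData
                                 where id is not null
                                 group by id))")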
As an alternative to sqldf, you could use data.table:
library(data.table)
dt <- data.table(ID=1:7, TaskCount=c(74, 53, 10, 5, 1, 1, 1))
dt[TaskCount==min(TaskCount)]
## ID TaskCount
## 1: 5 1
## 2: 6 1
## 3: 7 1

Replace value in data frame with value from other data frame based on set of conditions

In df1 I need to replace values for msec with corresponding values in df2.
df1 <- data.frame(ID = c('rs', 'rs', 'rs', 'tr', 'tr', 'tr'), cond = c(1, 1, 2, 1, 1, 2),
                  block = c(2, 2, 4, 2, 2, 4), correct = c(1, 0, 1, 1, 1, 0),
                  msec = c(456, 678, 756, 654, 625, 645))
df2 <- data.frame(ID = c('rs', 'rs', 'tr', 'tr'), cond = c(1, 2, 1, 2),
                  block = c(2, 4, 2, 4), mean = c(545, 664, 703, 765))
In df1, if correct==0, then reference df2 with the matching values of ID, cond, and block. Replace the value for msec in df1 with the corresponding value for mean in df2.
For example, the second row in df1 has correct==0. So, in df2 find the corresponding row where ID=='rs', cond==1, block==2 and use the value for mean (mean=545) to replace the value for msec (msec=678). Note that in df1 combinations of ID, block, and cond can repeat, but each combination occurs only once in df2.
Using the data.table package:
# load the 'data.table' package
library(data.table)
# convert the data.frame's to data.table's
setDT(df1)
setDT(df2)
# update df1 by reference with a join with df2
df1[df2[, correct := 0], on = .(ID, cond, block, correct), msec := i.mean]
which gives:
> df1
ID cond block correct msec
1: rs 1 2 1 456
2: rs 1 2 0 545
3: rs 2 4 1 756
4: tr 1 2 1 654
5: tr 1 2 1 625
6: tr 2 4 0 765
Note: The above code updates df1 in place instead of creating a new data frame, which is more memory-efficient. Be aware that df2[, correct := 0] also adds a correct column to df2 by reference.
One option would be to use base R with interaction() and match(). How about:
df1[which(df1$correct == 0), "msec"] <- df2[match(
  interaction(df1[which(df1$correct == 0), c("ID", "cond", "block")]),
  interaction(df2[, c("ID", "cond", "block")])
), "mean"]
df1
# ID cond block correct msec
#1 rs 1 2 1 456
#2 rs 1 2 0 545
#3 rs 2 4 1 756
#4 tr 1 2 1 654
#5 tr 1 2 1 625
#6 tr 2 4 0 765
We overwrite the msec values of the correct == 0 rows with the matched values from df2$mean.
Edit: Another option would be an SQL merge; this could look like:
library(sqldf)
merged <- sqldf('SELECT l.ID, l.cond, l.block, l.correct,
                 case when l.correct == 0 then r.mean else l.msec end as msec
                 FROM df1 as l
                 LEFT JOIN df2 as r
                 ON l.ID = r.ID AND l.cond = r.cond AND l.block = r.block')
merged
ID cond block correct msec
1 rs 1 2 1 456
2 rs 1 2 0 545
3 rs 2 4 1 756
4 tr 1 2 1 654
5 tr 1 2 1 625
6 tr 2 4 0 765
With dplyr: this solution left_joins on all the common columns and mutates msec when correct is 0.
library(dplyr)
left_join(df1, df2) %>%
  mutate(msec = ifelse(correct == 0, mean, msec)) %>%
  select(-mean)
ID cond block correct msec
1 rs 1 2 1 456
2 rs 1 2 0 545
3 rs 2 4 1 756
4 tr 1 2 1 654
5 tr 1 2 1 625
6 tr 2 4 0 765

Subsetting rows in R

I have a huge data set in the following format:
ID Interaction Interaction_number
1 abc 1
1 xyz 2
1 pqr 3
1 ced 0
2 ab 0
2 efg 1
3 asdf 2
3 fgh 3
3 abc 0
4 sql 1
4 ghj 2
5 poi 2
6 pqr 1
Now I want to extract all rows for every ID that has an Interaction_number of 0. For example:
ID Interaction Interaction_number
1 abc 1
1 xyz 2
1 pqr 3
1 ced 0
2 ab 0
2 efg 1
3 asdf 2
3 fgh 3
3 abc 0
It's a huge dataset, and I need to extract these rows using R. I tried using the sqldf function:
x<-sqldf("select * from data where data$ID in (select data$ID from data where data$Interaction_number ==0)")
But the query didn't work. I was looking to add a flag column (1 for all IDs that have an Interaction_number of 0) and then subset those rows, but I can't figure out exactly how to do it.
Can we create a data frame of those IDs and then use subset with it to get all the rows?
Please help.
Thank you.
I suggest using the data.table package. Say your data is in a data.frame df. Then:
library(data.table)
# convert to a data.table keyed by ID
dt <- data.table(df, key = 'ID')
# flag each ID that has at least one Interaction_number of 0
tmp <- dt[, list(condition = any(Interaction_number == 0)), by = ID]
# keep only the rows whose ID was flagged
res <- dt[tmp[condition == TRUE, list(ID)]]
Use this:
sqldf("SELECT * FROM data WHERE ID IN (SELECT ID FROM data WHERE Interaction_number=0)")
You do not need the double equal in your test, and do not use data$ID and such to refer to the data columns in the SQL expression (you can use data.ID but it is unnecessary to use the dataframe name in this case).
It may be helpful to read up on SQL before using this function much. Keep in mind that what it will do is turn all your referenced dataframes into tables using the same name as the dataframe, and all of the columns into fields using the same name as the columns. Thus in this case, we are querying a table named data with fields named ID, Interaction, and Interaction_number.
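An equivalent form, if you prefer a correlated test over IN, uses EXISTS; this should behave the same on the SQLite backend that sqldf uses by default:
sqldf("SELECT * FROM data d
       WHERE EXISTS (SELECT 1 FROM data d2
                     WHERE d2.ID = d.ID AND d2.Interaction_number = 0)")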
We can do this with dplyr: group the data by ID and filter if there are any 0 values in Interaction_number.
library(dplyr)
df1 %>%
  group_by(ID) %>%
  filter(any(!Interaction_number))  # !x is TRUE exactly where x == 0
# ID Interaction Interaction_number
# (int) (chr) (int)
#1 1 abc 1
#2 1 xyz 2
#3 1 pqr 3
#4 1 ced 0
#5 2 ab 0
#6 2 efg 1
#7 3 asdf 2
#8 3 fgh 3
#9 3 abc 0
Or using ave from base R
df1[with(df1, ave(!Interaction_number, ID, FUN=any)),]
Or this can be done without any group by
df1[df1$ID %in%subset(df1, !Interaction_number)$ID,]

SELECTing "first" (as determined by ORDER BY) row FROM near-duplicate rows (as determined by GROUP BY, HAVING, COUNT) within SQLite

I have a problem which is a bit beyond me (I'm really awfully glad I'm a Beta) involving duplicates (so GROUP BY, HAVING, COUNT), compounded by keeping the solution within the standard functions that came with SQLite. I am using the sqlite3 module from Python.
Example table workers, Columns:
* ID: integer, auto-incrementing
* ColA: integer
* ColB: varchar(20)
* UserType: varchar(20)
* LoadMe: Boolean
(Yes, SQLite's datatypes are nominal)
My data table, Workers, at start looks like:
ID ColA ColB UserType LoadMe
1 1 a Alpha 0
2 1 b Beta 0
3 2 a Alpha 0
4 2 a Beta 0
5 2 b Delta 0
6 2 b Alpha 0
7 1 a Delta 0
8 1 b Epsilon 0
9 1 c Gamma 0
10 4 b Delta 0
11 5 a Alpha 0
12 5 a Beta 0
13 5 b Gamma 0
14 5 a Alpha 0
I would like to enable, for Loading onto trucks at a new factory, all workers who have unique combinations between ColA and ColB. For those duplicates (twins, triplets, etc., perhaps via Bokanovsky's Process) where unique combinations of ColA and ColB have more than one worker, I would like to select only one from each set of duplicates. To make the problem harder, I would like to additionally be able to make the selection one from each set of duplicates on the basis of UserType in some form of ORDER BY. I may wish to select the first "duplicate" with a UserType of "Alpha," to work on a frightfully clever problem, or ORDER BY UserType DESC, that I may issue an order for black tunics for the lowest of the workers.
You can see that IDs 9, 10, and 13 have unique combinations of ColA and ColB and are most easily identified. The 1-a, 1-b, 2-a, 2-b, and 5-a combinations, however, have duplicates within them.
My current process, as it stands so far:
0) Everyone comes with a unique ID number. This is done at birth.
1) SET all Workers to LoadMe = 1.
UPDATE Workers
SET LoadMe = 1
2) Find my duplicates based on their similarity in two columns (GROUP BY ColA, ColB):
SELECT Wk1.*
FROM Workers AS Wk1
INNER JOIN (
SELECT ColA, ColB
FROM Workers
GROUP BY ColA, ColB
HAVING COUNT(*) > 1
) AS Wk2
ON Wk1.ColA = Wk2.ColA
AND Wk1.ColB = Wk2.ColB
ORDER BY ColA, ColB
3) SET all of my duplicates to LoadMe = 0.
UPDATE Workers
SET LoadMe = 0
WHERE ID IN (
SELECT Wk1.ID
FROM Workers AS Wk1
INNER JOIN (
SELECT ColA, ColB
FROM Workers
GROUP BY ColA, ColB
HAVING COUNT(*) > 1
) AS Wk2
ON Wk1.ColA = Wk2.ColA
AND Wk1.ColB = Wk2.ColB
)
4) For each set of duplicates in my GROUP BY, ORDERed BY UserType, SELECT only one, the first in the list, to have LoadMe SET to 1.
This table would look like:
ID ColA ColB UserType LoadMe
1 1 a Alpha 1
2 1 b Beta 1
3 2 a Alpha 1
4 2 a Beta 0
5 2 b Delta 0
6 2 b Alpha 1
7 1 a Delta 0
8 1 b Epsilon 0
9 1 c Gamma 1
10 4 b Delta 1
11 5 a Alpha 1
12 5 a Beta 0
13 5 b Gamma 1
14 5 a Alpha 0
ORDERed BY ColA, ColB, UserType, then ID, and broken out by the GROUP BY columns, (and finally spaced for clarity) that same data might look like:
ID ColA ColB UserType LoadMe
1 1 a Alpha 1
7 1 a Delta 0
2 1 b Beta 1
8 1 b Epsilon 0
9 1 c Gamma 1
3 2 a Alpha 1
4 2 a Beta 0
6 2 b Alpha 1
5 2 b Delta 0
10 4 b Delta 1
11 5 a Alpha 1
14 5 a Alpha 0
12 5 a Beta 0
13 5 b Gamma 1
I am confounded on the last step and feel like an Epsilon-minus semi-moron. I had previously been pulling the duplicates out of the database into program space and working within Python, but this situation arises not infrequently and I would like to more permanently solve this.
I like to break a problem like this up a bit. The first step is to identify the unique ColA,ColB pairs:
SELECT ColA,ColB FROM Workers GROUP BY ColA,ColB
Now for each of these pairs you want to find the highest-priority record. A join won't work because you'll end up with multiple records for each unique pair, but a subquery will:
SELECT ColA,ColB,
(SELECT id FROM Workers w1
WHERE w1.ColA=w2.ColA AND w1.ColB=w2.ColB
ORDER BY UserType LIMIT 1) AS id
FROM Workers w2 GROUP BY ColA,ColB;
You can change the ORDER BY clause in the subquery to control the priority. LIMIT 1 ensures the subquery returns only one record; without it, SQLite simply takes a single row from the subquery's result, and which row that is isn't guaranteed.
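For example, to get the "black tunics" ordering mentioned in the question (UserType descending), with ID added here as a deterministic tie-break that is not in the original:
SELECT ColA, ColB,
       (SELECT id FROM Workers w1
        WHERE w1.ColA = w2.ColA AND w1.ColB = w2.ColB
        ORDER BY UserType DESC, id LIMIT 1) AS id
FROM Workers w2 GROUP BY ColA, ColB;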
The result of this query is a list of records to be loaded with ColA, ColB, id. I would probably work directly from that and get rid of LoadMe but if you want to keep it you could do this:
BEGIN TRANSACTION;
UPDATE Workers SET LoadMe=0;
UPDATE Workers SET LoadMe=1
WHERE id IN (SELECT
(SELECT id FROM Workers w1
WHERE w1.ColA=w2.ColA AND w1.ColB=w2.ColB
ORDER BY UserType LIMIT 1) AS id
FROM Workers w2 GROUP BY ColA,ColB);
COMMIT;
That clears the LoadMe flag and then sets it to 1 for each of the records returned by our last query. The transaction guarantees that this all takes place or fails as one step and never leaves your LoadMe fields in an inconsistent state.
