proc sql count by two groups or variables

I have some data that looks like this:
ID flag
A 1
A 1
A 2
B 1
B 2
B 3
How do I use proc sql to count the rows for each combination of ID and flag, so that the result looks like this?
ID flag count
A 1 2
A 2 1
B 1 1
B 2 1
B 3 1
The query I used does not seem to be correct since it returns the distinct types of flag, not how many of each type of flag there are.
proc sql;
select Id, flag, count(flag) as count from table
group by Id;

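You need to include flag in your grouping so that each Id and flag pair is counted separately. Grouping by Id alone collapses all flags of an Id into one group.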
proc sql;
select Id, flag, count(1) as count from table
group by Id, flag;
quit;
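As a rough check of the grouping logic outside SAS, the same query can be run against SQLite from Python. This is only a sketch: the table name flags is hypothetical (the question's "table" is a reserved word in SQLite), and proc sql itself is not involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "flags" is a hypothetical table name chosen for this sketch.
conn.execute("CREATE TABLE flags (Id TEXT, flag INTEGER)")
conn.executemany(
    "INSERT INTO flags (Id, flag) VALUES (?, ?)",
    [("A", 1), ("A", 1), ("A", 2), ("B", 1), ("B", 2), ("B", 3)],
)

# Group by both columns so every (Id, flag) pair gets its own count.
counts = conn.execute(
    "SELECT Id, flag, COUNT(*) AS n FROM flags "
    "GROUP BY Id, flag ORDER BY Id, flag"
).fetchall()
print(counts)
# [('A', 1, 2), ('A', 2, 1), ('B', 1, 1), ('B', 2, 1), ('B', 3, 1)]
```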

Related

Updating a column based on conditions

I have a table like:
ID ID2 Name
1 1
2 1
3 2
4 2
5 2
6 3
7 3
8 3
I want to update the Name column with values like Name1, Name2 and so on.
The numbering should restart for each distinct value in the ID2 column. For example, the first two rows share ID2 = 1, so their Name values should be Name1 and Name2. The next three rows share ID2 = 2, so they should get Name1, Name2 and Name3, and so on. Can someone help me with the logic for this?

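ROW_NUMBER with a partition on the ID2 column, ordered by the ID column, should generate the sequences you want. Compute the row numbers over the whole table first and only then match them to the row being updated; if the correlation sits inside the windowed subquery, the WHERE clause leaves a single row before ROW_NUMBER runs and every rn is 1. Try this update query:
UPDATE yourTable t1
SET Name = (SELECT 'Name' || t.rn FROM
(
SELECT t2.ID,
ROW_NUMBER() OVER (PARTITION BY t2.ID2 ORDER BY t2.ID) rn
FROM yourTable t2
) t
WHERE t.ID = t1.ID)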
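A minimal sketch of the ROW_NUMBER approach, run against SQLite from Python. It assumes an SQLite build with window function support (3.25 or later) and that ID is unique. The row numbers are computed over the whole table and matched to each updated row by ID afterwards; the updated table is referenced by its real name rather than an alias to stay compatible with older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (ID INTEGER PRIMARY KEY, ID2 INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO yourTable (ID, ID2) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 2), (6, 3), (7, 3), (8, 3)],
)

# Number the rows inside each ID2 group, then look up the number for the
# row being updated. The window runs before the match on ID is applied.
conn.execute("""
    UPDATE yourTable
    SET Name = (
        SELECT 'Name' || t.rn
        FROM (
            SELECT ID, ROW_NUMBER() OVER (PARTITION BY ID2 ORDER BY ID) AS rn
            FROM yourTable
        ) AS t
        WHERE t.ID = yourTable.ID
    )
""")

names = [row[0] for row in conn.execute("SELECT Name FROM yourTable ORDER BY ID")]
print(names)
# ['Name1', 'Name2', 'Name1', 'Name2', 'Name3', 'Name1', 'Name2', 'Name3']
```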

SQLITE add a column counting values in other columns

I have an SQL table containing values as in this example:
id value
1 value1
2 value2
3 value3
4 value1
5 value2
6 value1
I need to add a column that increments a counter each time a "value1" is present:
id value session
1 value1 1
2 value2 1
3 value3 1
4 value1 2
5 value2 2
6 value1 3
Can you help me?
Thank you

You can use a correlated subquery with aggregation.
Your table
CREATE TABLE test (id integer PRIMARY KEY AUTOINCREMENT NOT NULL, value nvarchar (10));
Populate with data
INSERT INTO test(value) values ('value1'),('value2'),
('value3'),('value1'),('value2'),('value3'),
('value1'),('value1'),('value2');
Then, for each row, count the 'value1' rows up to and including it:
SELECT a.id,a.value, (SELECT COUNT(b.id)
FROM test b
WHERE a.id>=b.id and b.value = 'value1') session
FROM test a
ORDER BY a.id;
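The running count can be tried from Python against the six rows of the question. This is a sketch that reuses the answer's table definition; the column alias session comes from the question's desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, value NVARCHAR(10))"
)
conn.executemany(
    "INSERT INTO test (value) VALUES (?)",
    [("value1",), ("value2",), ("value3",), ("value1",), ("value2",), ("value1",)],
)

# For each row a, count the 'value1' rows whose id is not greater than a.id.
rows = conn.execute("""
    SELECT a.id, a.value,
           (SELECT COUNT(b.id) FROM test b
            WHERE b.id <= a.id AND b.value = 'value1') AS session
    FROM test a
    ORDER BY a.id
""").fetchall()

sessions = [row[2] for row in rows]
print(sessions)
# [1, 1, 1, 2, 2, 3]
```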

SQLite UPDATE Statement

I have two tables:
Table1
ID Num
1
2
3
4
5
Table2
ID
1
1
2
2
2
3
3
4
4
4
5
I need to UPDATE the Num field in Table1 with the number of occurrences of each ID in Table2. Based on the data above, the result should be:
Table1
ID Num
1 2
2 3
3 2
4 3
5 1
If I run this SQLite statement:
SELECT COUNT(t2.ID) FROM Table1 t1,Table2 t2 WHERE t1.ID=t2.ID GROUP BY t2.ID;
I get the correct counts, but when I try to UPDATE with that statement:
UPDATE Table1
SET Num=(SELECT COUNT(t2.ID) FROM Table1 t1,Table2 t2 WHERE t1.ID=t2.ID GROUP BY t2.ID);
I get nonsense output. Any ideas?

You must use a correlated subquery to correlate the value returned by the subquery with the current row in the outer query.
This means that you must not use Table1 again in the subquery, but instead refer to the outer table (with the actual name; the UPDATEd table does not support an alias):
UPDATE Table1
SET Num = (SELECT COUNT(t2.ID)
FROM Table2 t2
WHERE Table1.ID=t2.ID
GROUP BY t2.ID);
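To see the correlated subquery at work, the statement can be run on the question's data from Python. A minimal sketch, assuming integer columns for both tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, Num INTEGER)")
conn.execute("CREATE TABLE Table2 (ID INTEGER)")
conn.executemany("INSERT INTO Table1 (ID) VALUES (?)", [(i,) for i in range(1, 6)])
conn.executemany(
    "INSERT INTO Table2 (ID) VALUES (?)",
    [(i,) for i in (1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 5)],
)

# Table1.ID inside the subquery refers to the row currently being updated,
# so the count is recomputed for every row of Table1.
conn.execute("""
    UPDATE Table1
    SET Num = (SELECT COUNT(t2.ID)
               FROM Table2 t2
               WHERE Table1.ID = t2.ID
               GROUP BY t2.ID)
""")

result = conn.execute("SELECT ID, Num FROM Table1 ORDER BY ID").fetchall()
print(result)
# [(1, 2), (2, 3), (3, 2), (4, 3), (5, 1)]
```

One design detail worth knowing: with the GROUP BY in place, an ID that has no rows in Table2 gets NULL, because the subquery returns no row at all. Without the GROUP BY, COUNT would return 0 for such an ID.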

R sqldf find the second largest

I have a dataframe data like this
data
id time var1
1 a 3 0
2 a 2 2
3 a 1 3
4 b 3 2
5 b 4 6
I want to get the second largest time row of each id like this:
data2
id time var1
1 a 2 2
2 b 3 2
I tried using sqldf:
sqldf("select * from data order by time desc limit 2,1 group by id")
but I got an error:
Error in sqliteSendQuery(con, statement, bind.data) :
error in statement: near "group": syntax error
I also tried:
select max(time),* from data where time not in(select max(time) from data group by id) group by id
but I only got one result, and it is not the right answer.
Thanks !

Try taking the maximum among the rows with values less than the maximum for that id:
sqldf("select id, max(time) time, var1
from data a
where time < (select max(b.time)
from data b
where b.id = a.id)
group by id")
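sqldf runs its queries on SQLite by default, so the statement can be checked directly with Python's sqlite3 module. This sketch loads the question's five rows into a table named data. It relies on an SQLite-specific behavior: in an aggregate query with a single max(), a bare column such as var1 takes its value from the row that holds the maximum. Other databases may reject or handle that differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id TEXT, time INTEGER, var1 INTEGER)")
conn.executemany(
    "INSERT INTO data (id, time, var1) VALUES (?, ?, ?)",
    [("a", 3, 0), ("a", 2, 2), ("a", 1, 3), ("b", 3, 2), ("b", 4, 6)],
)

# Drop each id's largest time, then take the largest of what remains.
second = conn.execute("""
    SELECT a.id, MAX(a.time) AS time, a.var1
    FROM data a
    WHERE a.time < (SELECT MAX(b.time) FROM data b WHERE b.id = a.id)
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
print(second)
# [('a', 2, 2), ('b', 3, 2)]
```

An id with only one distinct time value has no second largest and simply does not appear in the result.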

SELECTing "first" (as determined by ORDER BY) row FROM near-duplicate rows (as determined by GROUP BY, HAVING, COUNT) within SQLite

I have a problem which is a bit beyond me (I'm really awfully glad I'm a Beta) involving duplicates (so GROUP BY, HAVING, COUNT), compounded by keeping the solution within the standard functions that came with SQLite. I am using the sqlite3 module from Python.
Example table workers, Columns:
* ID: integer, auto-incrementing
* ColA: integer
* ColB: varchar(20)
* UserType: varchar(20)
* LoadMe: Boolean
(Yes, SQLite's datatypes are nominal)
My data table, Workers, at start looks like:
ID ColA ColB UserType LoadMe
1 1 a Alpha 0
2 1 b Beta 0
3 2 a Alpha 0
4 2 a Beta 0
5 2 b Delta 0
6 2 b Alpha 0
7 1 a Delta 0
8 1 b Epsilon 0
9 1 c Gamma 0
10 4 b Delta 0
11 5 a Alpha 0
12 5 a Beta 0
13 5 b Gamma 0
14 5 a Alpha 0
I would like to enable, for Loading onto trucks at a new factory, all workers who have unique combinations of ColA and ColB. For those duplicates (twins, triplets, etc., perhaps via Bokanovsky's Process) where a combination of ColA and ColB has more than one worker, I would like to select only one from each set of duplicates. To make the problem harder, I would like to additionally be able to select that one worker from each set of duplicates on the basis of UserType in some form of ORDER BY. I may wish to select the first "duplicate" with a UserType of "Alpha," to work on a frightfully clever problem, or ORDER BY UserType DESC, that I may issue an order for black tunics for the lowest of the workers.
You can see that IDs 9, 10, and 13 have unique combinations of ColA and ColB and are most easily identified. The 1-a, 1-b, 2-a, 2-b, and 5-a combinations, however, have duplicates within them.
My current process, as it stands so far:
0) Everyone comes with a unique ID number. This is done at birth.
1) SET all Workers to LoadMe = 1.
UPDATE Workers
SET LoadMe = 1
2) Find my duplicates based on their similarity in two columns (GROUP BY ColA, ColB):
SELECT Wk1.*
FROM Workers AS Wk1
INNER JOIN (
SELECT ColA, ColB
FROM Workers
GROUP BY ColA, ColB
HAVING COUNT(*) > 1
) AS Wk2
ON Wk1.ColA = Wk2.ColA
AND Wk1.ColB = Wk2.ColB
ORDER BY ColA, ColB
3) SET all of my duplicates to LoadMe = 0.
UPDATE Workers
SET LoadMe = 0
WHERE ID IN (
SELECT Wk1.ID
FROM Workers AS Wk1
INNER JOIN (
SELECT ColA, ColB
FROM Workers
GROUP BY ColA, ColB
HAVING COUNT(*) > 1
) AS Wk2
ON Wk1.ColA = Wk2.ColA
AND Wk1.ColB = Wk2.ColB
)
4) For each set of duplicates in my GROUP BY, ORDERed BY UserType, SELECT only one, the first in the list, to have LoadMe SET to 1.
This table would look like:
ID ColA ColB UserType LoadMe
1 1 a Alpha 1
2 1 b Beta 1
3 2 a Alpha 1
4 2 a Beta 0
5 2 b Delta 0
6 2 b Alpha 1
7 1 a Delta 0
8 1 b Epsilon 0
9 1 c Gamma 1
10 4 b Delta 1
11 5 a Alpha 1
12 5 a Beta 0
13 5 b Gamma 1
14 5 a Alpha 0
ORDERed BY ColA, ColB, UserType, then ID, and broken out by the GROUP BY columns, (and finally spaced for clarity) that same data might look like:
ID ColA ColB UserType LoadMe
1 1 a Alpha 1
7 1 a Delta 0

2 1 b Beta 1
8 1 b Epsilon 0

9 1 c Gamma 1

3 2 a Alpha 1
4 2 a Beta 0

6 2 b Alpha 1
5 2 b Delta 0

10 4 b Delta 1

11 5 a Alpha 1
14 5 a Alpha 0
12 5 a Beta 0

13 5 b Gamma 1
I am confounded on the last step and feel like an Epsilon-minus semi-moron. I had previously been pulling the duplicates out of the database into program space and working within Python, but this situation arises not infrequently and I would like to more permanently solve this.

I like to break a problem like this up a bit. The first step is to identify the unique ColA,ColB pairs:
SELECT ColA,ColB FROM Workers GROUP BY ColA,ColB
Now for each of these pairs you want to find the highest priority record. A join won't work because you'll end up with multiple records for each unique pair but a subquery will work:
SELECT ColA,ColB,
(SELECT id FROM Workers w1
WHERE w1.ColA=w2.ColA AND w1.ColB=w2.ColB
ORDER BY UserType LIMIT 1) AS id
FROM Workers w2 GROUP BY ColA,ColB;
You can change the ORDER BY clause in the subquery to control the priority. LIMIT 1 makes it explicit that each subquery yields a single record (without it, SQLite still uses only the first row of a scalar subquery's result). If two candidates tie on UserType, as IDs 11 and 14 do in the sample data, add a second sort key such as id to make the choice deterministic.
The result of this query is a list of records to be loaded with ColA, ColB, id. I would probably work directly from that and get rid of LoadMe but if you want to keep it you could do this:
BEGIN TRANSACTION;
UPDATE Workers SET LoadMe=0;
UPDATE Workers SET LoadMe=1
WHERE id IN (SELECT
(SELECT id FROM Workers w1
WHERE w1.ColA=w2.ColA AND w1.ColB=w2.ColB
ORDER BY UserType LIMIT 1) AS id
FROM Workers w2 GROUP BY ColA,ColB);
COMMIT;
That clears the LoadMe flag and then sets it to 1 for each of the records returned by our last query. The transaction guarantees that this all takes place or fails as one step and never leaves your LoadMe fields in an inconsistent state.
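Since the question mentions Python's sqlite3 module, here is a sketch of the same two updates driven from Python with the sample data. Two things are assumptions of this sketch rather than part of the answer above: the extra sort key ID, added so the tie between workers 11 and 14 is resolved the same way every time, and the use of `with conn:` in place of explicit BEGIN TRANSACTION and COMMIT statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Workers (ID INTEGER PRIMARY KEY, ColA INTEGER, "
    "ColB VARCHAR(20), UserType VARCHAR(20), LoadMe BOOLEAN DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO Workers (ID, ColA, ColB, UserType) VALUES (?, ?, ?, ?)",
    [(1, 1, "a", "Alpha"), (2, 1, "b", "Beta"), (3, 2, "a", "Alpha"),
     (4, 2, "a", "Beta"), (5, 2, "b", "Delta"), (6, 2, "b", "Alpha"),
     (7, 1, "a", "Delta"), (8, 1, "b", "Epsilon"), (9, 1, "c", "Gamma"),
     (10, 4, "b", "Delta"), (11, 5, "a", "Alpha"), (12, 5, "a", "Beta"),
     (13, 5, "b", "Gamma"), (14, 5, "a", "Alpha")],
)
conn.commit()

# The connection as a context manager commits on success and rolls back
# on an exception, so both updates succeed or fail together.
with conn:
    conn.execute("UPDATE Workers SET LoadMe = 0")
    conn.execute("""
        UPDATE Workers SET LoadMe = 1
        WHERE ID IN (
            SELECT (SELECT w1.ID FROM Workers w1
                    WHERE w1.ColA = w2.ColA AND w1.ColB = w2.ColB
                    ORDER BY w1.UserType, w1.ID  -- ID is an added tie-breaker
                    LIMIT 1)
            FROM Workers w2
            GROUP BY w2.ColA, w2.ColB
        )
    """)

loaded = [row[0] for row in
          conn.execute("SELECT ID FROM Workers WHERE LoadMe = 1 ORDER BY ID")]
print(loaded)
# [1, 2, 3, 6, 9, 10, 11, 13]
```

Changing the inner ORDER BY to `w1.UserType DESC, w1.ID` would pick the last UserType in each group instead.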
