Count in group_concat - count

I have this situation in a MySQL table:
-----------------------
code  gr  state
-----------------------
10    a   available
10    a   sold
10    b   available
10    a   available
10    a   sold
10    a   printed
10    b   available
10    b   sold
10    b   available
-----------------------
I need to group this data by gr to get something like:
group a -> available(2), sold(2), printed(1)
group b -> available(3), sold(1), printed(0)
I tried combining GROUP_CONCAT() and COUNT() but can't get the result I need.
My goal is to have one single row per group (GROUP BY is OK).
The states are always these three (available, sold, printed).
Thanks for the help.

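SUM with IF can give you the right answer. IF() takes a condition, a value for true and a value for false, so the comparison has to be written out:
SELECT gr,
SUM(IF(state = 'available', 1, 0)) AS available,
SUM(IF(state = 'sold', 1, 0)) AS sold,
SUM(IF(state = 'printed', 1, 0)) AS printed
FROM `table`  -- replace with your table name
GROUP BY gr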
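To check the conditional-aggregation idea end to end, here is a small sketch that loads the sample rows into an in-memory SQLite database with Python's standard sqlite3 module. It uses the portable CASE WHEN form, which MySQL accepts as well; the table name `items` is an assumption.

```python
import sqlite3

# Hypothetical table name "items"; columns mirror the question (code, gr, state).
rows = [
    (10, "a", "available"), (10, "a", "sold"), (10, "b", "available"),
    (10, "a", "available"), (10, "a", "sold"), (10, "a", "printed"),
    (10, "b", "available"), (10, "b", "sold"), (10, "b", "available"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (code INTEGER, gr TEXT, state TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)

# Conditional aggregation: each SUM counts only the rows matching one state,
# so every group collapses to a single row with one column per state.
query = """
SELECT gr,
       SUM(CASE WHEN state = 'available' THEN 1 ELSE 0 END) AS available,
       SUM(CASE WHEN state = 'sold'      THEN 1 ELSE 0 END) AS sold,
       SUM(CASE WHEN state = 'printed'   THEN 1 ELSE 0 END) AS printed
FROM items
GROUP BY gr
ORDER BY gr
"""
result = conn.execute(query).fetchall()

for gr, available, sold, printed in result:
    print(f"group {gr} -> available({available}), sold({sold}), printed({printed})")
```

Because a state that never occurs in a group still sums to 0, group b gets printed(0) without any extra handling. In MySQL, SUM(state = 'available') is a shorter equivalent, since a comparison evaluates to 1 or 0.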

Related

Impala: Using CASE to determine if an ID from one table is in another table

Background:
Hey everyone! I'm hoping you can help me with something that I've been trying to figure out. I have a dataset/table called customer_universe that shows all of our in-scope customers. Every row/cust_id in that table is unique.
Let's say this table has 60,000 total rows. Every cust_id entry in this table is unique, so the total row count equals the unique row count.
There is also a dataset that I created (customer_sport_product_purch) that lists all customers (from the customer_universe table) who purchased any of the 3 in-scope sports products, along with the purchase date. This table only contains customers who have purchased at least one of the three sport products, but since a customer may have purchased several of them, the cust_id values in this table are not unique.
Let's say this table has 46,000 total rows but only 25,000 unique customers.
Goal Query Output:
I need to write a query that lists out every customer in the customer_universe table and one more column with a binary (1/0) value that will indicate if they have purchased a sport product or not.
So this query output should have a total of 60,000 records and only two columns.
Environment and Attempted Solutions Details
I'm currently building these queries using Impala in Hue. I'm trying to use a CASE expression to get the desired result, but I'm getting the error messages provided below.
Customer_universe table:
Cust_ID  Customer_Since
-------  --------------
1        02-20-2019
2        01-13-2020
3        06-17-2012
4        06-19-2021
5        06-06-2017
Customer_sport_product_purch table:
Cust_ID  Product     Purch_Dt
-------  ----------  ----------
1        Basketball  01-01-2022
1        BoxGlove    02-01-2020
5        BoxGlove    12-15-2019
Desired query output:
Cust_ID  Sport_Purch
-------  -----------
1        1
2        0
3        0
4        0
5        1
Queries I've attempted and the Error Messages I've Received:
Query 1:
SELECT a.cust_id,
case when (a.cust_id in (select distinct b.cust_id from DB.customer_sport_purch b)
then 1 else 0 end as Sport_Purch
FROM DB.customer_universe
GROUP BY cust_id;
Error Message 1:
Error while compiling statement: FAILED: SemanticException [Error 10249]: line 2:72 Unsupported SubQuery Expression 'cust_id': Currently SubQuery expressions are only allowed as Where Clause predicates
Query 2:
SELECT a.cust_id,
case when (a.cust_id in sportPurch) then 1 else 0 end as Sport_Purch
FROM DB.customer_universe a,
(select distinct cust_id from DB.customer_sport_purch) sportPurch
GROUP BY a.cust_id;
Error Message 2:
Error while compiling statement: FAILED: ParseException line 2:36 cannot recognize input near 'sportPurch' ')' 'then' in expression specification
Other Considerations:
I cannot bring the customer_sport_table.cust_id values into a text file and have the query read from that file, since those values will change frequently and I need to be able to simply re-run the queries.
Thanks in advance!
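One common way around the "SubQuery expressions are only allowed as Where Clause predicates" restriction is to move the subquery into the FROM clause as a LEFT JOIN against the distinct purchaser IDs, then test for a match with CASE. The sketch below runs that query in SQLite through Python's sqlite3 module using the sample rows from the question. It drops the DB. prefix and assumes the purchase table is named customer_sport_product_purch. Impala supports LEFT OUTER JOIN against a subquery, but treat this as a sketch to adapt rather than a tested Impala query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_universe (cust_id INTEGER, customer_since TEXT);
CREATE TABLE customer_sport_product_purch (cust_id INTEGER, product TEXT, purch_dt TEXT);
INSERT INTO customer_universe VALUES
  (1, '02-20-2019'), (2, '01-13-2020'), (3, '06-17-2012'),
  (4, '06-19-2021'), (5, '06-06-2017');
INSERT INTO customer_sport_product_purch VALUES
  (1, 'Basketball', '01-01-2022'), (1, 'BoxGlove', '02-01-2020'),
  (5, 'BoxGlove', '12-15-2019');
""")

# The LEFT JOIN keeps every customer; deduplicating purchasers first guarantees
# exactly one output row per customer, even if they bought several products.
query = """
SELECT a.cust_id,
       CASE WHEN b.cust_id IS NOT NULL THEN 1 ELSE 0 END AS sport_purch
FROM customer_universe a
LEFT JOIN (SELECT DISTINCT cust_id FROM customer_sport_product_purch) b
       ON a.cust_id = b.cust_id
ORDER BY a.cust_id
"""
result = conn.execute(query).fetchall()
print(result)
```

No GROUP BY is needed here: the DISTINCT subquery already ensures the join cannot duplicate customers, so the output has exactly one row per customer_universe row.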

How to extract all the values based on matching a column in two data frames using R?

I have a data frame, say n, like this:
id subject
-------------
1 discount less
2 product good
3 product good
4 wonderful service
5 discount less
and another data frame, say p, like this:
subject Rate
----------------
product good 20
wonderful service 30
discount less 10
I want the output as:
id subject rate
--------------------
1,5 discount less 10
2,3 product good 20
4 wonderful service 30
If I match with p$id <- n$id[match(p$subject, n$subject)], only the first matching element is returned, but I want all the ids.
Can anyone guide me on this?
How about something like this:
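n$subject <- as.character(n$subject)  # factors -> plain strings
# For each distinct subject, collapse all matching ids into one string, e.g. "1, 5"
id <- sapply(unique(n$subject), function(x) paste(as.character(n[n$subject == x, ]$id), collapse = ", "))
subject <- unique(n$subject)
df1 <- data.frame(id = id, subject = subject)
# Attach the rate by joining on the shared subject column
df2 <- merge(df1, p, by = "subject")
df2 <- df2[c("id", "subject", "Rate")]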
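The answer works in two steps: collapse all ids that share a subject into one string, then join the rate on subject. For readers comparing languages, the same two steps in Python look roughly like this. The lists simply transcribe the sample data frames, and `p[subject]` assumes every subject in n has a rate (R's merge() would instead silently drop unmatched subjects).

```python
# Sample data from the question: n holds (id, subject), p maps subject -> Rate.
n = [(1, "discount less"), (2, "product good"), (3, "product good"),
     (4, "wonderful service"), (5, "discount less")]
p = {"product good": 20, "wonderful service": 30, "discount less": 10}

# Step 1: collect every id per subject (dicts keep insertion order in Python 3.7+).
ids_by_subject = {}
for id_, subject in n:
    ids_by_subject.setdefault(subject, []).append(str(id_))

# Step 2: collapse the ids into one string and look up the rate (the "merge").
result = [(",".join(ids), subject, p[subject]) for subject, ids in ids_by_subject.items()]
for row in result:
    print(*row)
```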

sorting data to provide top entries

I have a table like the one below. Each row has a store ID and the discount % for one of its coupons. Each store can have multiple coupons, but (store + discount %) is a primary key. I would like to find the top 10 coupons (in decreasing order of discount %), but with at most 2 coupons from the same store. What is the most efficient way to do this? My logic involves sorting the data multiple times. Is there a better, more efficient way? I would like to do this in R.
Sample data:
df <- data.frame(Store=c("Lowes","Lowes","Lowes","Lowes","HD","HD","HD","ACE",
"ACE","Misc","Misc","Other","Other","Last","Last","Last"),
`discount_%`=c("60%","50%","40%","30%","60%","50%","40%","30%",
"20%","50%","30%","20%","10%","10%","5%","3%"),
check.names = FALSE)
My solution is to ignore the store and sort the table by discount, then
create an ID. ID represents the coupons in descending order.
Then, by Store and discount, create ID2, which ranks the
coupons within each store.
Then filter out all rows where ID2 > 2.
Then sort the table by ID.
Take the top 10 rows.
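The steps above sort more than once. A single sort by discount is enough: walk down the sorted list, keep a per-store counter, skip a coupon once its store already has two, and stop after ten. A minimal Python sketch of that idea (the function name `top_coupons` is illustrative, not from the question):

```python
from collections import Counter

# Sample data from the question; discounts are parsed from strings like "60%".
stores = ["Lowes", "Lowes", "Lowes", "Lowes", "HD", "HD", "HD", "ACE",
          "ACE", "Misc", "Misc", "Other", "Other", "Last", "Last", "Last"]
discounts = ["60%", "50%", "40%", "30%", "60%", "50%", "40%", "30%",
             "20%", "50%", "30%", "20%", "10%", "10%", "5%", "3%"]
coupons = [(s, int(d.rstrip("%"))) for s, d in zip(stores, discounts)]

def top_coupons(coupons, n=10, per_store=2):
    """Return up to n coupons by descending discount, at most per_store per store."""
    taken = Counter()
    result = []
    # One stable sort; coupons with equal discounts keep their original row order.
    for store, pct in sorted(coupons, key=lambda c: -c[1]):
        if taken[store] < per_store:
            taken[store] += 1
            result.append((store, pct))
            if len(result) == n:
                break
    return result

top = top_coupons(coupons)
for store, pct in top:
    print(store, f"{pct}%")
```

The selected coupons are the same ten as in the data.table output, though ties such as ACE 30% and Misc 30% may come out in a different order, because here ties follow the original row order rather than the per-store grouping.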
Try this:
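df$`discount_%` <- as.numeric(gsub("%","",df$`discount_%`))  # "60%" -> 60
library(data.table)
# head(.SD, 2) keeps at most 2 rows per store (.SD[1:2] would pad single-coupon stores with NA)
setDT(df)[order(-`discount_%`), head(.SD, 2), by = Store][order(-`discount_%`)[1:10], ]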
Output:
Store discount_%
1: Lowes 60
2: HD 60
3: Lowes 50
4: HD 50
5: Misc 50
6: Misc 30
7: ACE 30
8: ACE 20
9: Other 20
10: Other 10
Data is easier to work with in R without special characters, but if you need to add the percent sign back, try something like this:
paste0(df$`discount_%`,"%")

SQLite: select rows until a total amount is met in a column

I've seen a similar problem solved for MySQL, but I could barely find any solution for SQLite.
My sample table:
-----------------------------
ID | Product Name | Price
-----------------------------
1 A 2
2 B 2
3 C 1
4 D 3
5 E 2
I need to get rows, in ascending order, for as long as the running total of the Price column is less than or equal to 5.
You could compute a running total ordered by ID, like the one below:
SELECT p1.ID, p1.ProductName, p1.Price,
(SELECT SUM(p2.Price) FROM Products p2 WHERE p1.ID >= p2.ID ORDER BY p2.ID ) as RunningTotal
FROM Products p1
WHERE RunningTotal <= 5
ORDER BY p1.ID
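The query above filters on the alias RunningTotal in WHERE, which SQLite tolerates but many other databases reject. A slightly more portable variant computes the running total in a derived table and filters outside it. A sketch using Python's sqlite3 module and the sample rows (the table name Products and column ProductName come from the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ID INTEGER PRIMARY KEY, ProductName TEXT, Price INTEGER)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, "A", 2), (2, "B", 2), (3, "C", 1), (4, "D", 3), (5, "E", 2)],
)

# Running total in ID order: each row sums its own price and those of all lower IDs.
query = """
SELECT ID, ProductName, Price, RunningTotal
FROM (
    SELECT p1.ID, p1.ProductName, p1.Price,
           (SELECT SUM(p2.Price) FROM Products p2 WHERE p2.ID <= p1.ID) AS RunningTotal
    FROM Products p1
) AS t
WHERE RunningTotal <= 5
ORDER BY ID
"""
result = conn.execute(query).fetchall()
print(result)
```

On SQLite 3.25 or later, SUM(Price) OVER (ORDER BY ID) computes the same running total without the correlated subquery.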
Or order the running total by Price, like the one below:
SELECT p1.ID, p1.ProductName, p1.Price,
(SELECT SUM(p2.Price) FROM Products p2 WHERE p1.Price >= p2.Price ORDER BY Price )
as RunningTotal
FROM Products p1
WHERE RunningTotal <= 5
ORDER BY p1.Price;
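When ordering by Price, rows with equal prices (A, B and E all cost 2) get the same RunningTotal under p1.Price >= p2.Price, because each of them sums all three tied rows at once; with this sample data that leaves only C. Adding ID as a tie-breaker gives every row a distinct position. A sketch, again with sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ID INTEGER PRIMARY KEY, ProductName TEXT, Price INTEGER)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, "A", 2), (2, "B", 2), (3, "C", 1), (4, "D", 3), (5, "E", 2)],
)

# Order by (Price, ID): p2 counts toward p1's total if it is cheaper,
# or equally priced with an ID that is not larger.
query = """
SELECT ID, ProductName, Price, RunningTotal
FROM (
    SELECT p1.ID, p1.ProductName, p1.Price,
           (SELECT SUM(p2.Price) FROM Products p2
             WHERE p2.Price < p1.Price
                OR (p2.Price = p1.Price AND p2.ID <= p1.ID)) AS RunningTotal
    FROM Products p1
) AS t
WHERE RunningTotal <= 5
ORDER BY Price, ID
"""
result = conn.execute(query).fetchall()
print(result)
```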
It's probably best to do it in code, as SQLite (before version 3.25, which added window functions) does not offer an easy way to compute cumulative sums. You can create an index on the Price column so that ordering by it is cheap.
Then run a query like:
SELECT * FROM <table> ORDER BY Price
Note that this will not eagerly fetch all rows from the database, but just provide you with the cursor. Keep fetching the next row from the cursor until you reach the desired sum.
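A minimal sketch of that cursor loop with Python's sqlite3 module. It stops at the first row that would push the total past the limit; since rows arrive in ascending price order, no later row could be cheaper anyway. The helper name `rows_within_budget` and the index name are hypothetical.

```python
import sqlite3

def rows_within_budget(conn, limit):
    """Fetch rows cheapest-first, stopping before the running total exceeds limit."""
    cur = conn.execute("SELECT ID, ProductName, Price FROM Products ORDER BY Price, ID")
    total = 0
    picked = []
    while True:
        row = cur.fetchone()
        # Stop at the end of the data or at the first row that would overshoot.
        if row is None or total + row[2] > limit:
            break
        total += row[2]
        picked.append(row)
    cur.close()
    return picked

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ID INTEGER PRIMARY KEY, ProductName TEXT, Price INTEGER)")
# Hypothetical index name; it can let SQLite return rows in Price order without a separate sort.
conn.execute("CREATE INDEX idx_products_price ON Products (Price)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, "A", 2), (2, "B", 2), (3, "C", 1), (4, "D", 3), (5, "E", 2)],
)
picked = rows_within_budget(conn, 5)
print(picked)
```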

SQL Server - Group by, having and count in a mix

I have a database with a long list of records. Most of the columns have foreign keys to other tables.
Example:
ID SectorId BranchId
-- -------- --------
5 3 5
And then I will have a table with sectors, branches, etc.
My issue:
I want to know how many records have sector 1, 2, 3 ... n. So what I want is to group by sector and then use COUNT(*) to tell me how many there are of each.
Expected output
So for instance, if I have 20 records the result might look like this:
SectorId Count
-------- -----
1 3
2 10
3 4
4 6
My attempts so far
I do not normally work a lot with databases and I have been trying to solve this for 1.5 hours. I have tried something like this:
SELECT COUNT(*)
FROM Records r
GROUP BY r.Sector
WHERE r.Date BETWEEN '2011-01-01' AND '2011-12-31'
But... errors and problems all over!
I would really appreciate some help. I do know this is probably very simple.
Thanks!
The clauses in your query are in the wrong order; WHERE must come before GROUP BY. It should be like this:
SELECT COUNT(*)
FROM Records r
WHERE r.Date BETWEEN '2011-01-01' AND '2011-12-31'
GROUP BY r.Sector
The output will contain only the counts, i.e.:
count
-----
3
10
4
6
If you want to fetch both the sector and the count, you need to modify the query slightly:
SELECT r.Sector, COUNT(*) as Count
FROM Records r
WHERE r.Date BETWEEN '2011-01-01' AND '2011-12-31'
GROUP BY r.Sector
The output will look like this:
Sector Count
------ -----
1 3
2 10
3 4
4 6
Your query was partially right, but it needs some modification.
If I write it this way:
SELECT r.SectorID,COUNT(*) AS count
FROM Records r
WHERE r.Date BETWEEN '2011-01-01' AND '2011-12-31'
GROUP BY r.SectorID
Then the output will be:
SectorID Count
1 3
2 10
3 4
4 6
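To see the corrected clause order run end to end, here is a sketch in SQLite via Python's sqlite3 module; the SELECT itself is the same as in SQL Server. The sample rows are hypothetical, and the 2010 row is there to show the WHERE filter. One caveat for SQL Server: if Date is a datetime column, BETWEEN ... '2011-12-31' stops at midnight on 31 December, so a half-open range (r.Date >= '2011-01-01' AND r.Date < '2012-01-01') avoids missing rows from later that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Records (ID INTEGER PRIMARY KEY, SectorId INTEGER, Date TEXT)")
# Hypothetical sample rows; the 2010 row should be excluded by the WHERE clause.
conn.executemany(
    "INSERT INTO Records VALUES (?, ?, ?)",
    [(1, 1, "2011-02-01"), (2, 1, "2011-03-15"), (3, 1, "2011-07-30"),
     (4, 2, "2011-05-05"), (5, 2, "2011-12-31"), (6, 2, "2010-11-20"),
     (7, 3, "2011-01-01")],
)

# Clause order matters: WHERE filters rows first, then GROUP BY buckets them.
query = """
SELECT r.SectorId, COUNT(*) AS Count
FROM Records r
WHERE r.Date BETWEEN '2011-01-01' AND '2011-12-31'
GROUP BY r.SectorId
ORDER BY r.SectorId
"""
result = conn.execute(query).fetchall()
print(result)
```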
