I have a scenario where i have to correct the history data. The current data is like below:
Status_cd event_id phase_cd start_dt end_dt
110 23456 30 1/1/2017 ?
110 23456 31 1/2/2017 ?
Status_cd event_id phase_cd start_dt end_dt
110 23456 30 1/1/2017 ?
111 23456 30 1/2/2017 ?
The major columns are status_cd and phase_cd. So, if any one of them change the history should be handled with the start dt of the next record as the end date of the previous record.
Here both the records are open which is not correct.
Please suggest on how to handle both the scenarios.
Thanks.
How are your history rows ordered in the table? In other words, how do you decide which history rows to compare to see if a value was changed? And how do you uniquely identify a history row entry?
If you order your history rows by start_dt, for example, you can compare the previous and current row values using window functions, like Rob suggested:
UPDATE MyHistoryTable
FROM (
-- Get source history rows that need to be updated
SELECT
history_row_id, -- Change this field to match your table
MAX(status_cd) OVER(ORDER BY start_dt ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS status_cd_next, -- Get "status_cd" value for "next" history row
MAX(phase_cd) OVER(ORDER BY start_dt ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS phase_cd_next,
MAX(start_dt) OVER(ORDER BY start_dt ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS start_dt_next
FROM MyHistoryTable
WHERE status_cd <> status_cd_next -- Check "status_cd" values are different
OR phase_cd <> phase_cd_next -- Check "phase_cd" values are different
) src
SET MyHistoryTable.end_dt = src.start_dt_next -- Update "end_dt" value of current history row to be "start_dt" value of next history row
WHERE MyHistoryTable.history_row_id = src.history_row_id -- Match source rows to target rows
This assumes you have a column to uniquely identify each history row, called "history_row_id". Give it a try and let me know.
I don't have a TD system to test on, so you may need to futz with the table aliases too. You'll also probably need to handle the edge cases (i.e. first/last rows in the table).
Related
Background:
Hey everyone! I'm hoping you can help me with something that I've been trying to figure out. I have a dataset/table called customer_universe that shows all of our in scope customers. Every row/cust_id in that table is unique.
Let's say this table has 60,000 total rows. Every cust_id entry in this table is unique so total rows = unique row count.
There is also a dataset that I created (customer_sport_product_purch) that lists out all of customers (from the customer_universe table) and any of the 3 in-scope sports products they purchased along with a purchase date. This tables only contains customers who have purchased one of the three sport products but since there are three sport products and a customer may have purchased multiple, cust_id field does not contain only unique customers.
Let's say this table has 46,000 total rows but only 25,000 unique customer.
Goal Query Output:
I need to write a query that lists out every customer in the customer_universe table and one more column with a binary (1/0) value that will indicate if they have purchased a sport product or not.
So this query output should have a total of 60000 records and only two columns.
Environment and Attempted Solutions Details
I'm currently building these queries using Impala in Hue. I'm trying to use a case statement to get me my desired result but I'm getting the error message provided below.
Customer_universe Table:
Cust_ID
Customer_Since
1
02-20-2019
2
01-13-2020
3
06-17-2012
4
06-19-2021
5
06-06-2017
Customer_sport_product_purch Table:
Cust ID
Product
Purch_Dt
1
Basketball
01-01-2022
1
BoxGlove
02-01-2020
5
BoxGlove
12-15-2019
Desired Query Output:
Cust_ID
Sport_Purch
1
1
2
0
3
0
4
0
5
1
Queries I've attempted and the Error Messages I've Received:
Query 1:
SELECT a.cust_id,
case when (a.cust_id in (select distinct b.cust_id from DB.customer_sport_purch b)
then 1 else 0 end as Sport_Purch
FROM DB.customer_universe
GROUP BY cust_id;
Error Message 1:
Error while compiling statement: FAILED: SemanticException [Error 10249]: line 2:72 Unsupported SubQuery Expression 'cust_id': Currently SubQuery expressions are only allowed as Where Clause predicates
Query 2:
SELET a.cust_id,
case when (a.cust_id in sportPurch) then 1 else 0 end as Sport_Purch
FROM DB.customer_universe a,
(select distinct cust_id from DB.customer_sport_purch) sportPurch
GROUP BY a.cust_id;
Error Message 2:
Error while compiling statement: FAILED: ParseException line 2:36 cannot recognize input near 'sportPurch' ')' 'then' in expression specification
Other Considerations:
I cannot bring bring the customer_sport_table.cust_id values into a text file and have the query read from file since those values will change frequently and need to be able to just re-execute queries.
Thanks in advance!
Imagine I have two tables:
Table A
Names
Sales
Department
Dave
5
Shoes
mike
6
Apparel
Dan
7
Front End
Table B
Names
SALES
Department
Dave
5
Shoes
mike
12
Apparel
Dan
7
Front End
Gregg
23
Shoes
Kim
15
Front End
I want to create a query that joins the tables by names and separates sum of sales by table. I additionally want to filter my query to remove string matches or partial matches in this case by certain names.
What I want is the following result
Table C:
A Sales Sum
B Sales Sum
18
24
I know I can do this with a query like the following:
SELECT SUM(A.sales) AS 'A Sales Sum', SUM(B.sales) AS 'B sales Sum' FROM A
JOIN B
ON B.names = A.Names
WHERE Names NOT LIKE '%Gregg%' OR NOT LIKE '%Kim%'
The problem with this is the WHERE clause doesn't seem to apply, or applies to the wrong table. Since the Names column doesn't exactly match between the two, what I think is happening is when they are joined 'ON B.names = A.Names', the extras from B are being excluded? When I flip things around though I get the same result, which is no filter being applied. The wrong result I am getting is the following:
Table D:
A Sales Sum
B Sales Sum
18
62
Clearly I have a syntax issue here since I'm pretty new to SQL. What am I missing? Thanks!
You don't need a join or a union of the tables and you shouldn't do it.
Aggregate in each table separately and return the results with 2 subqueries:
SELECT
(SELECT SUM(Sales) FROM A WHERE Names NOT LIKE '%Gregg%' AND Names NOT LIKE '%Kim%') ASalesSum,
(SELECT SUM(Sales) FROM B WHERE Names NOT LIKE '%Gregg%' AND Names NOT LIKE '%Kim%') BSalesSum
I think you want a union approach here:
SELECT
SUM(CASE WHEN src = 'A' THEN sales ELSE 0 END) AS "A Sales Sum",
SUM(CASE WHEN src = 'B' THEN sales ELSE 0 END) AS "B Sales Sum"
FROM
(
SELECT sales, 'A' AS src FROM A WHERE Names NOT IN ('Gregg', 'Kim')
UNION ALL
SELECT sales, 'B' FROM B WHERE Names NOT IN ('Gregg', 'Kim')
) t;
Here is a demo showing that the above query is working.
So im wondering if its possible for SQLite to understand number ranges.
I want to be able to have a range such as "25-30" and lookup "27" to see if it falls within that range.
The issue is that the range will contain some text beforehand such as "Alice 25-30"
An example of what Id be looking to achieve can be seen in Table3 of this link:
https://dbfiddle.uk/?rdbms=sqlite_3.27&fiddle=483f62c5fbf13998659cd5f7ebbb3ce9
More than happy for solutions that can break the string at the first number, but still keep the number so
Alice | 25-30
Not
Alice | 5-30 (ive seen this suggested before :D)
To actually create Table 3 ill be using either INNER or LEFT OUTER JOIN not just Re-creating the table but was speedier to do this
Thanks in advance.
You can do it with a join of the 2 tables like this:
INSERT INTO Table3 (`ID`, `Age`,'Age Range')
SELECT t1.ID, t1.Age, t2.`Age Range`
FROM Table1 t1 INNER JOIN Table2 t2
ON t1.Age + 0 BETWEEN `Age Range` + 0
AND SUBSTR(`Age Range`, INSTR(`Age Range`, '-') + 1) + 0
SQLite performs implicit conversions of strings to numbers when they are used in expressions with numeric operations like +0, so what the query does is to compare Age to the 1st and the 2nd part of Age Range numerically.
Note that + 0 would not be needed in ON t1.Age + 0 BETWEEN if you had defined Age as REAL which makes more sense.
Change the INNER join to LEFT join if you want the row from Table1 inserted to Table3 even if there is no matching Age Range.
See the demo.
Results:
ID
Age
Age Range
1
30
25-30
2
40.5
31-45
I have 2 tables fees and students. i want to update one field of fees with 3 WHERE conditions, i.e, 2 conditions in table 'fees' and 1 condition in table 'students'.
I tried many queries like
UPDATE fees, students SET fees.dues= 300 WHERE fees.month= November
AND fees.session= 2017-18 AND students.class= Nursery
It gives me error like java.sql.SQLException: near",": syntax error
I am using sqlite as database. Please suggest me a query or let me correct this query.
Thanks
You cannot join tables in a UPDATE command in SQLite. Therefore, use a sub-query in the where condition
UPDATE fees
SET dues = 300
WHERE
month = November AND
session = 2017-18 AND
student_id IN (SELECT id FROM students WHERE class=Nursery)
Also, I am not sure about the types of your columns. String literals must be enclosed in single quotes ('). The expression 2017-18 would yield the number 2017 minus 18 = 1999. Should it be a string literal as well?
UPDATE fees
SET dues = 300
WHERE
month = 'November' AND
session = '2017-18' AND
student_id IN (SELECT id FROM students WHERE class='Nursery')
I've seen the similar problem with mysql, but I barely could find any solution for the problem with sqllite.
My sample table,
-----------------------------
ID | Product Name | Price
-----------------------------
1 A 2
2 B 2
3 C 1
4 D 3
5 E 2
Here I need to get the rows until the total for the price column is equal or smaller than 5 in ascending order.
You could do a Running total using the Product ID and ORDER BY Product ID like the one below:
SELECT p1.ID, p1.ProductName, p1.Price,
(SELECT SUM(p2.Price) FROM Products p2 WHERE p1.ID >= p2.ID ORDER BY p2.ID ) as RunningTotal
FROM Products p1
WHERE RunningTotal <= 5
ORDER BY p1.ID
See Fiddle Demo
Or using the Price and ORDER BY Price like one below:
SELECT p1.ID, p1.ProductName, p1.Price,
(SELECT SUM(p2.Price) FROM Products p2 WHERE p1.Price >= p2.Price ORDER BY Price )
as RunningTotal
FROM Products p1
WHERE RunningTotal <= 5
ORDER BY p1.Price;
See 2nd Fiddle Demo
It's probably best to do it in code as SQLite does not support an easy way to do cumulative sums as far as I know. You can create an index on the Price column.
Then running a query like
SELECT * FROM <table> ORDER BY Price
Note that this will not eagerly fetch all rows from the database, but just provide you with the cursor. Keep fetching the next row from the cursor until you reach the desired sum.