I have a table with three columns: Id, Value, User. For each user, I want a maximum of 50 values in the table. When a user has inserted the value with Id = 50, then before inserting the 51st value, I want to delete the first one (the one with Id = 1) and replace it with the new one, which will get Id = 1. This operation must be specific to each user. Here is an example to make it clearer:
Id   Value   User
50   124     User1
49   67      User2
50   89      User3
Suppose User1 wants to add a new value; the result must be this:
Id   Value   User
49   67      User2
50   89      User3
1    101     User1
Of course, Id, Value and User can't be primary keys. So, first of all: is a primary key needed? If not, the problem is easy to solve. Otherwise, I think I will have to add a new column, let's call it PK, that will be the primary key and auto-generated. From what I read online, if I delete a row, the auto-generated key will not restart automatically, so the situation will be more or less like this:
PK   Id   Value   User
1    50   124     User1
2    49   67      User2
3    50   89      User3
and after the update it will be:
PK   Id   Value   User
2    49   67      User2
3    50   89      User3
4    1    101     User1
I would like that, when the value 101 is inserted, PK is set to 1, and if User2 adds a new value, it is assigned PK = 4 (as 2 and 3 are already used):
PK   Id   Value   User
2    49   67      User2
3    50   89      User3
1    1    101     User1
4    50   33      User2
Is it possible?
I ask because each user inserts a new value roughly every 5 seconds, and the number of users is very high, so I am worried that the auto-generated PK could reach its limit very quickly if it is not "restarted".
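One way to sidestep the auto-increment worry entirely is a composite primary key (User, Id) plus a small per-user cursor table remembering the last slot written, so Ids simply cycle 1..50 and nothing ever grows. A minimal SQLite sketch (the `ring` and `cursors` table names are illustrative, not from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ring (
        Id    INTEGER NOT NULL,     -- slot number, cycles 1..50 per user
        Value INTEGER,
        User  TEXT NOT NULL,
        PRIMARY KEY (User, Id)      -- composite key: no auto-increment needed
    );
    CREATE TABLE cursors (
        User      TEXT PRIMARY KEY,
        last_slot INTEGER NOT NULL  -- last Id written for this user
    );
""")

MAX_SLOTS = 50

def insert_value(conn, user, value):
    # Next slot wraps back to 1 after MAX_SLOTS, overwriting the oldest row.
    row = conn.execute("SELECT last_slot FROM cursors WHERE User = ?",
                       (user,)).fetchone()
    slot = 1 if row is None else row[0] % MAX_SLOTS + 1
    conn.execute("INSERT OR REPLACE INTO ring (User, Id, Value) VALUES (?, ?, ?)",
                 (user, slot, value))
    conn.execute("INSERT OR REPLACE INTO cursors (User, last_slot) VALUES (?, ?)",
                 (user, slot))

# 51 inserts for User1: the 51st overwrites slot 1, so only 50 rows remain
for v in range(1, 52):
    insert_value(conn, "User1", v)
```

Because the key is (User, Id), the table never holds more than 50 rows per user and there is no counter that could overflow; the cursor replaces the "restart the PK" requirement. In a real multi-user system each insert pair would need to run inside a transaction.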
I have a sqlite3 table containing students' marks for an assignment. Below is sample data from the table:
Id   Name    Marks
1    Mark    87
2    John    50
3    Adam    65
4    Cindy   68
5    Ruth    87
I would like to create a new column 'Rank', giving the students a rank according to marks scored.
There are two main criteria to follow:
If both students have the same marks, their rank would be the same
The total rank number would be the same as the total number of students. For example if there are two student with Rank 1, the next student below them would be Rank 3.
Below is a sample output of what I need:
Id   Name    Marks   Rank
1    Mark    87      1
2    John    50      5
3    Adam    65      4
4    Cindy   68      3
5    Ruth    87      1
This is the code that I have at the moment:
import sqlite3
conn = sqlite3.connect('students.sqlite')
cur = conn.cursor()
cur.execute('ALTER TABLE student_marks ADD Rank INTEGER')
conn.commit()
If you are using a recent version of SQLite (3.25 or later), you should probably avoid the update altogether and just use the RANK() window function:

SELECT Id, Name, Marks, RANK() OVER (ORDER BY Marks DESC) "Rank"
FROM student_marks
ORDER BY Id;

Note that there is no tiebreaker in the ORDER BY of the window: students with equal marks share a rank, and the next rank skips ahead, exactly as your two criteria require.
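The query can be checked end-to-end with the same sqlite3 module the question already uses, against an in-memory database (window functions require SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_marks (Id INTEGER PRIMARY KEY, Name TEXT, Marks INTEGER)")
conn.executemany("INSERT INTO student_marks VALUES (?, ?, ?)",
                 [(1, "Mark", 87), (2, "John", 50), (3, "Adam", 65),
                  (4, "Cindy", 68), (5, "Ruth", 87)])

# RANK() gives tied marks the same rank and skips the following rank(s)
rows = conn.execute("""
    SELECT Id, Name, Marks,
           RANK() OVER (ORDER BY Marks DESC) AS "Rank"
    FROM student_marks
    ORDER BY Id
""").fetchall()
print(rows)
# [(1, 'Mark', 87, 1), (2, 'John', 50, 5), (3, 'Adam', 65, 4),
#  (4, 'Cindy', 68, 3), (5, 'Ruth', 87, 1)]
```

This matches the sample output in the question, so no Rank column needs to be stored at all.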
I have a df that represents users' browsing behaviour over time. The df contains a unique user_id, and each row has a timestamp and represents a visit to a certain website. Each website has a unique website_id and a website category, say c("electronics", "clothes", ...).
Now I want to count, per row, how many unique websites per category the user has visited up to and including that row. I call this variable "breadth", since it represents how broadly a user browses the internet.
So far I have only managed to produce naive code that computes the total number of unique websites visited per category, by filtering on each category, taking the length of the unique website vector per user, and then doing a left join.
Therefore I lose the information about the development over time.
Thanks so much in advance!
total_breadth <- df %>% filter(category=="electronics") %>%
group_by(user_id) %>%
mutate(breadth=length(unique(website_id)))
#Structure of the df I want to achieve:
user_id time website_id category breadth
1 1 70 "electronics" 1
1 2 93 "clothing" 1
1 3 34 "electronics" 2
1 4 93 "clothing" 1
1 5 26 "electronics" 3
1 6 70 "electronics" 3
#Structure of the df I produce:
user_id time website_id category breadth
1 1 70 "electronics" 3
1 2 93 "clothing" 1
1 3 34 "electronics" 3
1 4 93 "clothing" 1
1 5 26 "electronics" 3
1 6 70 "electronics" 3
This seems to be a case of split, apply and combine.
Create a binary matrix of 1s and 0s whose dimensions are:
No. of rows = no. of rows in the original data
No. of columns = no. of unique website categories
Each row represents a timestamp and each column represents the respective website category, so a cell equals 1 if and only if the user visited a website of that category at that timestamp, and 0 otherwise.
Take the cumulative sum of each column of this matrix, and then create a final column that, at each timestamp, takes the value from the column of the visited website category.
Though it doesn't seem the most elegant solution, I hope this solves your problem for now.
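The same idea can be stated more directly as "a website counts toward breadth only on its first visit within a (user, category) group". A minimal sketch of that logic in Python (column order as in the sample df):

```python
def running_breadth(rows):
    """rows: iterable of (user_id, time, website_id, category), in time order.
    Returns the running count of distinct websites per (user_id, category)."""
    seen = {}   # (user_id, category) -> set of website_ids visited so far
    out = []
    for user_id, time, website_id, category in rows:
        seen.setdefault((user_id, category), set()).add(website_id)
        out.append(len(seen[(user_id, category)]))
    return out

visits = [
    (1, 1, 70, "electronics"),
    (1, 2, 93, "clothing"),
    (1, 3, 34, "electronics"),
    (1, 4, 93, "clothing"),
    (1, 5, 26, "electronics"),
    (1, 6, 70, "electronics"),
]
print(running_breadth(visits))  # [1, 1, 2, 1, 3, 3]
```

In dplyr the same "cumulative sum over first visits" idea should be expressible as `df %>% group_by(user_id, category) %>% mutate(breadth = cumsum(!duplicated(website_id)))`, which keeps the development over time instead of collapsing to a single total.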
I have a table that has sequence numbers. It's a very big table, 16 million rows give or take. The table has a key, and it has events that happen to that key. Every time the key changes, the seq_nums restart, in theory.
In the original table there was a timestamp associated with each event. In order to get the duration of each event I created a lag column and subtracted it from the timestamp of the current event, giving us the duration. This duration is called time_in_mins in the table below.
The new table has a number of properties:
Each key in this case is a car wash, with each event being assigned a category, so on line 3 the car was submitted to a drying procedure for 45 mins.
The second line, which contains 23 mins, isn't actually 23 mins for the wash; it took the machine 23 minutes to power up.
In key 144 the record for the powering up of the machine is missing. This seems to be prevalent in the data set.
key Event time in mins seq_num
1 Start 0 1
1 Wash 23 2
1 Dry 45 3
1 Wash 56 4
1 Wash 78 5
1 Boil 20 6
1 ShutDown 11 7
2 Start 0 1
2 Wash 11 2
2 Dry 12 3
-------------------------------------------
144 Wash 0 1
144 Wash 11 2
144 Dry 12 3
I would like to move the time_in_mins to the seq_num 1 record if that record is an Event of type Start, so that when we aggregate this later the minutes will be properly assigned to starting up.
I could try to update the table by creating a new column, again with another lag, this time on time_in_mins, but this seems quite expensive.
Does anyone know of a clever way of doing this?
Edit 14/10/2016
The final output for the customer is like below albeit slightly out of order
key event total minutes
1 Start 23
1 Boil 20
1 Dry 45
1 Wash 134
1 ShutDown 11
2 Start 11
2 Dry 12
2 Wash 0
Thanks for your help
This will switch 1st and 2nd value based on your description, resulting in a single STAT-step in Explain:
SELECT key, seq_num, event,
CASE
WHEN seq_num = 1
AND Event = 'Start'
       THEN Min(CASE WHEN seq_num = 2 THEN time_in_mins END)
Over (PARTITION BY key)
WHEN seq_num = 2
AND Min(CASE WHEN seq_num = 1 THEN Event END)
Over (PARTITION BY key) = 'Start' THEN 0
ELSE time_in_mins
END AS new_time_in_mins
FROM tab
Now you can do the sum.
But it might be possible to include this logic in your previous step, when you create the Volatile Table; can you add this SELECT there, too?
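The CASE/window logic above can be sanity-checked on a small sample. The sketch below uses SQLite's window functions rather than Teradata, renames the `key` column to `wash_key` to avoid keyword clashes, and loads only keys 1 and 144 (the power-up-present and power-up-missing cases):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tab
                (wash_key INTEGER, event TEXT, time_in_mins INTEGER, seq_num INTEGER)""")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", [
    (1, "Start", 0, 1), (1, "Wash", 23, 2), (1, "Dry", 45, 3),
    (144, "Wash", 0, 1), (144, "Wash", 11, 2), (144, "Dry", 12, 3),
])

# Swap minutes between seq 1 and seq 2 only when seq 1 is a 'Start' event;
# MIN(CASE ... END) ignores the NULLs produced for all other rows.
rows = conn.execute("""
    SELECT wash_key, seq_num, event,
           CASE
             WHEN seq_num = 1 AND event = 'Start'
               THEN MIN(CASE WHEN seq_num = 2 THEN time_in_mins END)
                    OVER (PARTITION BY wash_key)
             WHEN seq_num = 2
              AND MIN(CASE WHEN seq_num = 1 THEN event END)
                  OVER (PARTITION BY wash_key) = 'Start'
               THEN 0
             ELSE time_in_mins
           END AS new_time_in_mins
    FROM tab
    ORDER BY wash_key, seq_num
""").fetchall()
print(rows)
```

For key 1 the Start row receives the 23 minutes and the Wash row drops to 0; key 144, which has no Start record, passes through unchanged.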
I have a table that lists items like below. It basically has Operation Numbers (OP_NO) that tell where a product is in the process. These OP numbers can be either Released or Completed. They follow a process, as in 10 must happen before 20, 20 must happen before 30, etc. However, users do not update all steps in reality, so we end up with some items complete out of order while the earlier steps are not, as shown below (OP 30 is completed but OP 10 and 20 are not).
I basically want to produce a listing showing the furthest point of completion for each ORDER_ID. I figured I could do this by querying for STATUS = 'Completed' and sorting by OP_NO descending. However, I can't figure out how to produce only one result per ORDER_ID. For example, in ORDER_ID 345, steps 10 and 20 are completed; I would only want to return that step 20 is where it currently is. I was figuring I could do this with WHERE ROWNUM <= 1 but haven't had much luck. Could any experts weigh in?
Thanks!
ORDER_ID | ORDER_SEC | ORDER_RELEASE | OP_NO | STATUS | Description
123 2 3 10 Released Op10
123 2 3 20 Released Op20
123 2 3 30 Completed Op30
123 2 3 40 Released Op40
345 1 8 10 Completed Op10
345 1 8 20 Completed Op20
345 1 8 30 Released Op30
345 1 8 40 Released Op40
If I understand correctly what you want, the below should do what you need. Just replace test_table with your table name.
select *
from test_table tst
where status = 'Completed'
and op_no = (select max(op_no)
from test_table tst1
where tst.order_id = tst1.order_id
and status = 'Completed');
Given your sample data this produced the below results.
Order_Id Order_Sec Order_Release op_no Status Description
123 2 3 30 Completed Op30
345 1 8 20 Completed Op20
Cheers
Shaun Peterson
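The correlated subquery can be verified with a quick in-memory run (SQLite here, purely for convenience; the query itself is portable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE test_table (
    order_id INTEGER, order_sec INTEGER, order_release INTEGER,
    op_no INTEGER, status TEXT, description TEXT)""")
conn.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?, ?, ?)", [
    (123, 2, 3, 10, "Released",  "Op10"), (123, 2, 3, 20, "Released",  "Op20"),
    (123, 2, 3, 30, "Completed", "Op30"), (123, 2, 3, 40, "Released",  "Op40"),
    (345, 1, 8, 10, "Completed", "Op10"), (345, 1, 8, 20, "Completed", "Op20"),
    (345, 1, 8, 30, "Released",  "Op30"), (345, 1, 8, 40, "Released",  "Op40"),
])

# For each order, keep only the completed row with the highest op_no
rows = conn.execute("""
    SELECT *
    FROM test_table tst
    WHERE status = 'Completed'
      AND op_no = (SELECT MAX(op_no)
                   FROM test_table tst1
                   WHERE tst.order_id = tst1.order_id
                     AND status = 'Completed')
    ORDER BY order_id
""").fetchall()
print(rows)
```

One row per ORDER_ID comes back (Op30 for 123, Op20 for 345), with no ROWNUM tricks needed.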
I have two tables
tmp_CID_EIDs:
EID
====
1
2
3
5
EID_PID:
EID PID
==========
1 99
2 99
3 88
5 99
12 55
18 66
I use the following query to get a list of all rows where EID matches in both tables:
SELECT EID,
       PID
FROM   EID_PID
WHERE  EID IN (SELECT EID FROM tmp_CID_EIDs)
-->
EID PID
=========
1 99
2 99
3 88
5 99
But my final goal is to get the list of unique PIDs from this query:
--> 99
88
How can I do that? Thanks..
SELECT DISTINCT PID FROM EID_PID WHERE EID IN (SELECT EID FROM tmp_CID_EIDs);
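With the membership test written as an explicit subquery (a bare `IN table_name` isn't standard SQL), the whole thing can be verified in-memory:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_CID_EIDs (EID INTEGER)")
conn.execute("CREATE TABLE EID_PID (EID INTEGER, PID INTEGER)")
conn.executemany("INSERT INTO tmp_CID_EIDs VALUES (?)", [(1,), (2,), (3,), (5,)])
conn.executemany("INSERT INTO EID_PID VALUES (?, ?)",
                 [(1, 99), (2, 99), (3, 88), (5, 99), (12, 55), (18, 66)])

# DISTINCT collapses the matched rows down to the unique PIDs
pids = conn.execute("""
    SELECT DISTINCT PID
    FROM EID_PID
    WHERE EID IN (SELECT EID FROM tmp_CID_EIDs)
""").fetchall()
print(pids)
```

This returns only 99 and 88 (DISTINCT makes no ordering promise, so treat the result as a set).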