SQLite - performing calculation on preceding rows

The problem is this: if there is a quantity in any location, I want to apply the calculation below to every row of that job_no up to and including that location.
The idea is that if there's a quantity in loc3, the same quantity was previously in loc1 and loc2.
Another way to put it: how do I get 10 in loc1 and loc2?
select s.job_no, s.part, s.location, s.qty,
coalesce(ptime.setup_time, '-') as setup_time,
coalesce(ptime.cycle_time, '-') as cycle_time,
ci.rate
from stock as s join part_timings as pt
on pt.part = s.part
join locations as l on s.location = l.location
left join part_timings as ptime on s.part = ptime.part
and ptime.location = s.location
join costs_internal as ci
group by s.part, s.location
order by s.part, l.stage
job_no | part | location | qty | setup_time | cycle_time | rate | total
123 p1 loc1 0 60 30 0.5 ?
123 p1 loc2 0 30 15 0.5 ?
123 p1 loc3 10 60 15 0.5 ?
123 p1 loc4 0 60 15 0.5 ?
123 p1 loc5 0 60 15 0.5 ?
123 p1 loc6 0 60 15 0.5 ?
123 p1 loc7 20 60 15 0.5 ?
calculation to get total:
coalesce(round((pt.cycle_time * s.qty * ci.rate) +
(pt.setup_time * ci.rate), 2), '-')
EDIT:
I've added loc4 to loc7.
loc3 would need to have the calculation applied to loc1 and loc2 (qty 10).
loc7 would need to have the calculation applied to all locations that are before it (qty 20).
Maybe I'm not explaining it perfectly; I sometimes struggle to get my intentions across with SQL!

Using a simplified version of your data...
select * from stock;
job_no qty location
---------- ---------- ----------
123 0 loc1
123 0 loc2
123 10 loc3
123 0 loc4
456 0 loc1
456 20 loc2
You can use a sub-select to get the quantity for each job and join with it to get the stock for each job.
select stock.*, stocked.qty
from stock
join (select * from stock s where s.qty != 0) as stocked
on stock.job_no = stocked.job_no;
job_no qty location qty
---------- ---------- ---------- ----------
123 0 loc1 10
123 0 loc2 10
123 0 loc4 10
123 10 loc3 10
456 0 loc1 20
456 20 loc2 20
stocked holds the row for each job that currently has stock (qty != 0).
Note that unless you've made a restriction, there may be more than one stocked row for a job.
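The caveat about multiple stocked rows can be seen in a small sketch. It uses Python's built-in sqlite3 module purely as a test harness (an assumption; the thread itself only uses the sqlite shell) and loads the seven-location data from the question's edit, where job 123 is stocked in both loc3 and loc7.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stock (job_no integer, qty integer, location text)")
conn.executemany(
    "insert into stock values (?, ?, ?)",
    [(123, 0, "loc1"), (123, 0, "loc2"), (123, 10, "loc3"), (123, 0, "loc4"),
     (123, 0, "loc5"), (123, 0, "loc6"), (123, 20, "loc7"),
     (456, 0, "loc1"), (456, 20, "loc2")],
)

# Same join as above: each stock row is paired with every stocked row of its job.
pairs = conn.execute(
    """
    select stock.job_no, stock.location, stocked.qty
    from stock
    join (select * from stock s where s.qty != 0) as stocked
      on stock.job_no = stocked.job_no
    order by stock.job_no, stock.location, stocked.qty
    """
).fetchall()

# Job 123 has two stocked rows, so loc1 shows up twice, once per stocked quantity.
loc1_matches = [qty for job_no, location, qty in pairs if (job_no, location) == (123, "loc1")]
print(len(pairs), loc1_matches)
```

With two stocked rows for job 123, every one of its seven rows is duplicated, so the plain join alone cannot say which quantity belongs to which location.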
You wrote: "loc7 would need to have the calculation applied to all locations that are before it (qty 20)."
With this data...
sqlite> select * from stock order by job_no, location;
job_no qty location
---------- ---------- ----------
123 0 loc1
123 0 loc2
123 10 loc3
123 0 loc4
123 0 loc5
123 0 loc6
123 20 loc7
456 0 loc1
456 20 loc2
To accomplish this, use a correlated subselect in the column list instead of joining on the subselect; with a join we'd get one row per stocked location for each stock row. (There's probably also a way to do it with a join.)
To make sure a stocked quantity is applied only to its own location and the ones before it, check that stock.location <= stocked.location. To get the closest stocked location, order the candidates by location and take only the first one. Note that this compares location names as strings, so it relies on the names sorting in stage order (loc10 would sort before loc2).
select stock.*, (
select stocked.qty
from stock stocked
where stock.job_no = stocked.job_no
and stocked.qty != 0
and stock.location <= stocked.location
order by stocked.location asc
limit 1
) as stocked_qty
from stock
order by job_no, location;
job_no qty location stocked_qty
---------- ---------- ---------- -----------
123 0 loc1 10
123 0 loc2 10
123 10 loc3 10
123 0 loc4 20
123 0 loc5 20
123 0 loc6 20
123 20 loc7 20
456 0 loc1 20
456 20 loc2 20
This may be inefficient as a column subselect. It's important that job_no, qty, and location are all indexed.
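As a rough way to try the indexing advice, the sketch below rebuilds the sample table in an in-memory database, adds one composite index, and runs the column subselect. Python's sqlite3 module is used only as a harness, the index name stock_job_location_qty is hypothetical, and the outer table is aliased as s here just to keep the two references to stock apart; whether a single composite index or separate indexes works better for real data is an assumption to verify with EXPLAIN QUERY PLAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table stock (job_no integer, qty integer, location text)")
conn.executemany(
    "insert into stock values (?, ?, ?)",
    [(123, 0, "loc1"), (123, 0, "loc2"), (123, 10, "loc3"), (123, 0, "loc4"),
     (123, 0, "loc5"), (123, 0, "loc6"), (123, 20, "loc7"),
     (456, 0, "loc1"), (456, 20, "loc2")],
)

# Hypothetical index name; it covers the columns the subselect filters and sorts on.
conn.execute("create index stock_job_location_qty on stock (job_no, location, qty)")

rows = conn.execute(
    """
    select s.job_no, s.qty, s.location, (
        select stocked.qty
        from stock as stocked
        where stocked.job_no = s.job_no
          and stocked.qty != 0
          and stocked.location >= s.location
        order by stocked.location asc
        limit 1
    ) as stocked_qty
    from stock as s
    order by s.job_no, s.location
    """
).fetchall()

for row in rows:
    print(row)
```

The rows come back in the same order and with the same stocked_qty values as the table above, so the index changes only how the lookup is done, not the result.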

Related

Assigning unique ID to records based on certain difference between values in consecutive rows using loop in R

This is my df (data.frame)
Time <- c("16:04:56", "16:04:59", "16:05:02", "16:05:04", "16:05:11", "16:05:13", "16:07:59", "16:08:09", "16:09:03", "16:09:51", "16:11:10")
Distance <- c(45,38,156,157,37,159,79,79,78,160,78)
df <- data.frame(Time, Distance); df
Time Distance
16:04:56 45
16:04:59 38
16:05:02 156
16:05:04 157
16:05:11 37
16:05:13 159
16:07:59 79
16:08:09 79
16:09:03 78
16:09:51 160
16:11:10 78
I need to assign an ID to each record based on two conditions:
the absolute difference between two consecutive rows of the Time column is more than 1 minute, and
the absolute difference between two consecutive rows of the Distance column is more than 10.
A new ID should be assigned only when both conditions are satisfied.
Results should be like this
Time Distance ID
16:04:56 45 1
16:04:59 38 1
16:05:02 156 1
16:05:04 157 1
16:05:11 37 1
16:05:13 159 1
16:07:59 79 2
16:08:09 79 2
16:09:03 78 2
16:09:51 160 2
16:11:10 78 3
Thanks to all who contribute any thoughts.
Change the Time column to POSIXct format, take the difference between consecutive rows for the Time and Distance columns, and increment the count using cumsum whenever both differences exceed their thresholds.
library(dplyr)
df %>%
mutate(Time1 = as.POSIXct(Time, format = '%T'),
ID = cumsum(
abs(difftime(Time1, lag(Time1, default = first(Time1)), units = 'mins')) > 1 &
abs(Distance - lag(Distance, default = first(Distance))) > 10) + 1) %>%
select(-Time1)
# Time Distance ID
#1 16:04:56 45 1
#2 16:04:59 38 1
#3 16:05:02 156 1
#4 16:05:04 157 1
#5 16:05:11 37 1
#6 16:05:13 159 1
#7 16:07:59 79 2
#8 16:08:09 79 2
#9 16:09:03 78 2
#10 16:09:51 160 2
#11 16:11:10 78 3
data
df <-data.frame(Time,Distance)
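For readers who want to check the logic outside R, here is a minimal sketch of the same idea in plain Python (standard library only, an assumption since the thread uses dplyr): flag every row where both differences exceed their thresholds, then take a running sum of the flags.

```python
from datetime import datetime
from itertools import accumulate

times = ["16:04:56", "16:04:59", "16:05:02", "16:05:04", "16:05:11", "16:05:13",
         "16:07:59", "16:08:09", "16:09:03", "16:09:51", "16:11:10"]
distances = [45, 38, 156, 157, 37, 159, 79, 79, 78, 160, 78]

parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]

# The first row never starts a new group; every later row is compared with its predecessor.
breaks = [False] + [
    abs((t1 - t0).total_seconds()) > 60 and abs(d1 - d0) > 10
    for t0, t1, d0, d1 in zip(parsed, parsed[1:], distances, distances[1:])
]

# Running count of breaks, shifted so the first ID is 1 (the cumsum(...) + 1 step).
ids = [1 + n for n in accumulate(int(b) for b in breaks)]
print(ids)  # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3]
```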

R group data into equal groups with a metric variable

I'm struggling to get a well-performing script for this problem: I have a table with score, x, and y columns. I want to sort the table by score and then build groups based on the x value. Each group should have an equal sum (not count) of x. x is a metric number in the dataset and represents the historic turnover of a customer.
score x y
0.436024136 3 435
0.282303336 46 56
0.532358015 24 34
0.644236597 0 2
0.99623626 0 4
0.557673456 56 46
0.08898779 0 7
0.702941303 453 2
0.415717835 23 1
0.017497461 234 3
0.426239166 23 59
0.638896238 234 86
0.629610596 26 68
0.073107526 0 35
0.85741877 0 977
0.468612039 0 324
0.740704267 23 56
0.720147257 0 68
0.965212467 23 0
A good way to do this is to add a group variable to the data.frame with cumsum. You can then easily sum the groups with, e.g., subset.
data.frame$group <- cumsum(as.numeric(data.frame$x)) %/% (ceiling(sum(data.frame$x) / 3)) + 1
Remarks:
as.numeric() keeps cumsum from overflowing the integer range on big data.frames
%/% is integer division, so you get a whole number back
the '+ 1' just lets your groups start with 1 instead of 0
Thank you @Ronak Shah!
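The grouping formula can be traced by hand with a short sketch. This one is in plain Python rather than R (an assumption made so it can run anywhere), uses the x column from the question in its given row order (sorting by score first is left out), and keeps the three groups from the answer.

```python
from itertools import accumulate
from math import ceil

x = [3, 46, 24, 0, 0, 56, 0, 453, 23, 234, 23, 234, 26, 0, 0, 0, 23, 0, 23]
n_groups = 3

# Target sum per group, rounded up as in ceiling(sum(x) / 3).
target = ceil(sum(x) / n_groups)

# Integer-divide the running total by the target; + 1 makes groups start at 1.
groups = [running // target + 1 for running in accumulate(x)]

# Sum of x per group, to see how even the split turned out.
group_sums = {g: sum(v for v, gv in zip(x, groups) if gv == g) for g in sorted(set(groups))}
print(groups)
print(group_sums)
```

A row is assigned by where the running total lands after adding it, so one large x value (453 here) can leave the group sums quite uneven. Also, if sum(x) is exactly divisible by the number of groups, the last row lands in an extra group, because its running total equals a full multiple of the target.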

Merge two data frames in R by the closest default options, not by excess

I have two data frames:
df_bestquotes
df_transactions
df_transactions:
day time vol price buy ask bid
1 43688,08 100 195,8 1 195,8 195,74
1 56357,34 20 192,87 1 192,87 192,86
1 57576,14 14 192,48 -1 192,48 192,46
2 50468,29 3 193,83 1 193,86 193,77
2 56107,54 11 194,17 -1 194,2 194,16
7 42549,66 100 188,81 -1 188,85 188,78
7 42724,38 200 188,62 -1 188,66 188,61
7 48924,66 5 189,59 -1 189,62 189,59
8 48950,14 52 187,66 -1 187,7 187,66
9 36242,86 89 186,61 1 186,62 186,56
9 53910,46 1 189,81 -1 189,87 189,81
10 47041,94 15 187,87 -1 187,88 187,86
13 34380,73 87 187,29 -1 187,42 187,27
13 36037,18 100 188,94 1 188,95 188,94
14 46644,64 100 189,29 -1 189,34 189,29
14 57571,12 52 190,03 1 190,03 190
15 36418,71 45 192,07 1 192,07 192,04
15 37223,77 100 191,09 -1 191,07 191,06
17 37245,59 100 186,45 -1 186,47 186,45
23 34200,39 50 189,29 -1 189,29 189,27
24 40294,73 60 193,52 -1 193,54 193,5
29 52813,68 5 202,99 -1 203,01 202,99
29 55279,13 93 203,97 -1 203,98 203,9
30 51356,91 68 204,41 -1 204,45 204,4
30 53530,24 40 204,14 -1 204,18 204,14
df_bestquotes:
day time best_ask best_bid
1 51384,613 31,78 31,75
1 56593,74 31,6 31,55
3 40568,217 31,36 31,32
7 39169,237 31,34 31,28
8 44715,713 31,2 31,17
8 53730,707 31,24 31,19
8 55851,75 31,17 31,14
10 49376,267 31,06 30,99
16 48610,483 30,75 30,66
16 57360,917 30,66 30,64
17 53130,717 30,39 30,32
20 46353,133 30,72 30,63
23 46429,67 29,7 29,64
24 37627,727 29,81 29,63
24 46354,647 29,92 29,77
24 53863,69 30,04 29,93
24 53889,923 30,03 29,95
24 59047,223 29,99 29,2
28 39086,407 30,87 30,83
28 41828,703 30,87 30,8
28 50489,367 30,99 30,87
29 54264,467 30,97 30,85
30 34365,95 31,21 30,99
30 39844,357 31,06 31
30 57550,523 31,18 31,15
For each record of the df_transactions, from the day and time, I need to find the best_ask and the best_bid that was just before that moment, and incorporate this information to df_transactions.
df_joined: df_transactions + df_bestquotes
day time vol price buy ask bid best_ask best_bid
1 43688,08 100 195,8 1 195,8 195,74
1 56357,34 20 192,87 1 192,87 192,86
1 57576,14 14 192,48 -1 192,48 192,46
2 50468,29 3 193,83 1 193,86 193,77
2 56107,54 11 194,17 -1 194,2 194,16
7 42549,66 100 188,81 -1 188,85 188,78
7 42724,38 200 188,62 -1 188,66 188,61
7 48924,66 5 189,59 -1 189,62 189,59
8 48950,14 52 187,66 -1 187,7 187,66
9 36242,86 89 186,61 1 186,62 186,56
9 53910,46 1 189,81 -1 189,87 189,81
10 47041,94 15 187,87 -1 187,88 187,86
13 34380,73 87 187,29 -1 187,42 187,27
13 36037,18 100 188,94 1 188,95 188,94
14 46644,64 100 189,29 -1 189,34 189,29
14 57571,12 52 190,03 1 190,03 190
15 36418,71 45 192,07 1 192,07 192,04
15 37223,77 100 191,09 -1 191,07 191,06
17 37245,59 100 186,45 -1 186,47 186,45
23 34200,39 50 189,29 -1 189,29 189,27
24 40294,73 60 193,52 -1 193,54 193,5
29 52813,68 5 202,99 -1 203,01 202,99
29 55279,13 93 203,97 -1 203,98 203,9
30 51356,91 68 204,41 -1 204,45 204,4
30 53530,24 40 204,14 -1 204,18 204,14
I have tried the following code, but it doesn't work:
library(data.table)
df_joined = df_bestquotes[df_transactions, on="time", roll = "nearest"]
Here are the real files with a lot more records; the ones shown above are an example of only 25 records.
df_transactions_original
df_bestquotes_original
And my code in R:
matching.R
Any suggestions on how to get it? Thanks a lot, guys.
The attempt you made uses data.table syntax, but you don't mention data.table. Have you run library(data.table) beforehand?
I think it should rather be:
df_joined = df_bestquotes[df_transactions, on=.(day, time), roll = TRUE]
But I cannot test without the objects. Does it work? roll = "nearest" doesn't give you the previous best quotes but the nearest ones.
EDIT: Thanks for the objects. I checked, and this works for me:
library(data.table)
dfb <- fread("df_bestquotes.csv", dec=",")
dft <- fread("df_transactions.csv", dec = ",")
dfb[, c("day2", "time2") := .(day,time)] # duplicated to keep track of the best quotes days
joinedDf <- dfb [dft, on=.(day, time), roll = +Inf]
It puts NA when there are no best quotes for the day. If you want to roll across days, I suggest you create a unique measure of time. I don't know exactly what time is. Assuming the unit of time is seconds:
dfb[, uniqueTime := day + time/(60*60*24)]
dft[, uniqueTime := day + time/(60*60*24)]
joinedDf <- dfb [dft, on=.(uniqueTime), roll = +Inf]
This works even if time is not in seconds, as long as time/(60*60*24) stays below 1 so that days do not overlap; only the ordering matters in this case.
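To make the "previous quote" rule concrete, here is a minimal sketch of a backward rolling lookup in plain Python. The function name roll_join and the tuple layout are hypothetical, and it is only meant to mirror the spirit of the first join above (exact match on day, last quote at or before the transaction time), not data.table's implementation. The sample values are the day 1 and day 2 rows from the question.

```python
from bisect import bisect_right

def roll_join(transactions, quotes):
    """Attach to each (day, time) the last quote at or before it on the same day."""
    by_day = {}
    for day, time, ask, bid in sorted(quotes):
        by_day.setdefault(day, []).append((time, ask, bid))

    joined = []
    for day, time in transactions:
        day_quotes = by_day.get(day, [])
        quote_times = [q[0] for q in day_quotes]
        # Number of quotes with quote time <= transaction time.
        i = bisect_right(quote_times, time)
        if i:
            _, ask, bid = day_quotes[i - 1]
        else:
            ask = bid = None  # no earlier quote that day, like NA in the R result
        joined.append((day, time, ask, bid))
    return joined

quotes = [(1, 51384.613, 31.78, 31.75), (1, 56593.74, 31.6, 31.55)]
transactions = [(1, 43688.08), (1, 56357.34), (1, 57576.14), (2, 50468.29)]
joined = roll_join(transactions, quotes)
for row in joined:
    print(row)
```

The first transaction precedes every quote of day 1 and the last one falls on a day without quotes, so both come back empty; rolling across days would need a single combined time key such as uniqueTime.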
Good morning @samuelallain, yes, I have used library(data.table) before.
I've edited it in the question above.
I have tried your solution and RStudio returns the following error:
library(data.table)
df_joined = df_bestquotes[df_transactions, on=.("day", "time"), roll = TRUE]
Error in [.data.frame(df_bestquotes, df_transactions, on = .(day, time), :
unused arguments (on = .("day", "time"), roll = TRUE)
Thank you.

Subsetting data using grep

I have a data frame named Schedule which has data for multiple airlines. I run the table function just to get the breakdown of records by airline. Here is the output:
table(Schedule$airline)
AA AS B6 BA DL F9 FI LH NK QR UA WN
757 4 14 2 65 24 2 2 18 2 36 60
Now I am subsetting this data using grep to get a data frame which has gates in a particular terminal of interest, which in this case is terminal F
Gated_Schedule <- Schedule[grep("F", Schedule$gate), ]
When I run table to get a breakdown here,
table(Gated_Schedule$airline)
AA AS B6 BA DL F9 FI LH NK QR UA WN
362 0 0 0 0 0 0 0 0 0 0 0
Ideally the output should only have been AA like:
AA
362
How do I get rid of this discrepancy?

sqlite selection help needed

I have the following bill table
building name amount payments receiptno
1234 name a 123 0 0
1234 name a 12 10 39
1234 name a 125 125 40
1235 name a 133 10 41
1235 name b 125 125 50
1234 name c 100 90 0
I want to select the rows where amount minus payments is greater than zero and display the max value of receiptno for each name,
so for building 1234 I want to get only the following:
name a 39
name c 0
How can I do this?
Translating your description into SQL results in this:
SELECT building,
name,
MAX(receiptno)
FROM BillTable
WHERE amount - payments > 0
GROUP BY building,
name
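A quick way to check the query against the sample rows is sketched below, using Python's sqlite3 module as a harness (an assumption; any SQLite client would do). The column types and the ORDER BY are added for the sketch, and the second query shows one possible way to restrict the result to building 1234, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table BillTable (building integer, name text, amount integer, "
    "payments integer, receiptno integer)"
)
conn.executemany(
    "insert into BillTable values (?, ?, ?, ?, ?)",
    [(1234, "name a", 123, 0, 0), (1234, "name a", 12, 10, 39),
     (1234, "name a", 125, 125, 40), (1235, "name a", 133, 10, 41),
     (1235, "name b", 125, 125, 50), (1234, "name c", 100, 90, 0)],
)

# The query from the answer, with an ORDER BY added so the output order is stable.
unpaid = conn.execute(
    """
    SELECT building, name, MAX(receiptno)
    FROM BillTable
    WHERE amount - payments > 0
    GROUP BY building, name
    ORDER BY building, name
    """
).fetchall()

# Variant limited to one building via a bound parameter.
unpaid_1234 = conn.execute(
    """
    SELECT name, MAX(receiptno)
    FROM BillTable
    WHERE amount - payments > 0 AND building = ?
    GROUP BY name
    ORDER BY name
    """,
    (1234,),
).fetchall()

print(unpaid)
print(unpaid_1234)
```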
