SQLite: OR and |

The following query works in SQL Server but not in SQLite 3.8.7 and I would like to know why.
Table
l | r
0 | 10
0 | 2
8 | 10
Query
SELECT * FROM Segments AS s1
LEFT JOIN Segments AS s2
ON ((s2.l <= s1.l AND s2.r > s1.r)
OR (s2.l < s1.l AND s2.r >= s1.r));
Expected output
s1.l | s1.r | s2.l | s2.r
0 | 10 | null | null
0 | 2 | 0 | 10
8 | 10 | 0 | 10
However I got
s1.l | s1.r | s2.l | s2.r
0 | 10 | 0 | 2
0 | 2 | 0 | 10
8 | 10 | 0 | 10
And when I switched the expression order, i.e.
((s2.l < s1.l AND s2.r >= s1.r) OR (s2.l <= s1.l AND s2.r > s1.r))
I got
s1.l | s1.r | s2.l | s2.r
0 | 10 | 8 | 10
0 | 2 | 0 | 10
8 | 10 | 0 | 10
This was solved by using | instead of OR, but I am wondering why OR did not work?
Here's the example on SQLFiddle
http://sqlfiddle.com/#!7/15859/22/1
Thanks

This is a bug that was fixed in SQLite 3.8.7.2.
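To check a particular SQLite build, the query can be re-run through Python's built-in sqlite3 module. This is only a sketch: the in-memory database, the INTEGER column types, and the added ORDER BY (there just to make the row order predictable) are assumptions, not part of the original question. On a library that includes the fix, the OR version should return the expected output, with NULLs for the (0, 10) segment. As for the workaround, | is SQLite's bitwise OR, and each comparison here evaluates to 0 or 1, so it yields the same truth value as OR for this data.

```python
import sqlite3

# In-memory database holding the three segments from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Segments (l INTEGER, r INTEGER)")
conn.executemany("INSERT INTO Segments VALUES (?, ?)",
                 [(0, 10), (0, 2), (8, 10)])

# Same join condition as in the question; ORDER BY is an addition
# so that the result order does not depend on the query plan.
query = """
SELECT s1.l, s1.r, s2.l, s2.r
FROM Segments AS s1
LEFT JOIN Segments AS s2
  ON ((s2.l <= s1.l AND s2.r > s1.r)
   OR (s2.l < s1.l AND s2.r >= s1.r))
ORDER BY s1.l, s1.r
"""
rows = conn.execute(query).fetchall()

# Print the library version so the result can be matched to a release.
print("SQLite version:", sqlite3.sqlite_version)
for row in rows:
    print(row)
```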

Related

Subtract column values using coalesce

I want to subtract values in the "place" column for each record returned in a "race", "bib", "split" group by so that a "diff" column appears like so.
Desired Output:
race | bib | split | place | diff
----------------------------------
10 | 514 | 1 | 5 | 0
10 | 514 | 2 | 3 | 2
10 | 514 | 3 | 2 | 1
10 | 17 | 1 | 8 | 0
10 | 17 | 2 | 12 | -4
10 | 17 | 3 | 15 | -3
I'm new to using the coalesce statement and the closest I have come to the desired output is the following
select a.race,a.bib,a.split, a.place,
coalesce(a.place -
(select b.place from ranking b where b.split < a.split), a.place) as diff
from ranking a
group by race,bib, split
which produces:
race | bib | split | place | diff
----------------------------------
10 | 514 | 1 | 5 | 5
10 | 514 | 2 | 3 | 2
10 | 514 | 3 | 2 | 1
10 | 17 | 1 | 8 | 8
10 | 17 | 2 | 12 | 11
10 | 17 | 3 | 15 | 14
Thanks for looking!
To compute the difference, you have to look up the value in the row that has the same race and bib values, and the next-smaller split value:
SELECT race, bib, split, place,
coalesce((SELECT r2.place
FROM ranking AS r2
WHERE r2.race = ranking.race
AND r2.bib = ranking.bib
AND r2.split < ranking.split
ORDER BY r2.split DESC
LIMIT 1
) - place,
0) AS diff
FROM ranking;
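As a quick check, the query can be run against the sample rows with Python's sqlite3 module. This is a sketch under assumptions: the table definition and INTEGER column types are guessed from the question's output, and the outer table is referred to as ranking throughout. The dictionary keyed by (bib, split) is just a convenience for reading the result.

```python
import sqlite3

# Sample data from the question; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ranking (race INTEGER, bib INTEGER, split INTEGER, place INTEGER)"
)
conn.executemany(
    "INSERT INTO ranking VALUES (?, ?, ?, ?)",
    [(10, 514, 1, 5), (10, 514, 2, 3), (10, 514, 3, 2),
     (10, 17, 1, 8), (10, 17, 2, 12), (10, 17, 3, 15)],
)

# For each row, look up the place at the next-smaller split of the same
# race and bib, subtract the current place, and fall back to 0 when
# there is no earlier split.
query = """
SELECT race, bib, split, place,
       coalesce((SELECT r2.place
                 FROM ranking AS r2
                 WHERE r2.race = ranking.race
                   AND r2.bib = ranking.bib
                   AND r2.split < ranking.split
                 ORDER BY r2.split DESC
                 LIMIT 1
                ) - place,
                0) AS diff
FROM ranking
"""
diffs = {(bib, split): diff
         for _race, bib, split, _place, diff in conn.execute(query)}
print(diffs)
```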

R data.table apply increasing values to specific row indices

My data is like this:
Time | State | Event
01 | 0 |
02 | 0 |
03 | 0 |
04 | 2 | A_start
05 | 2 |
06 | 2 |
07 | 2 |
08 | 2 |
09 | 1 | A_end
10 | 1 |
11 | 1 |
12 | 1 |
13 | 1 |
14 | 2 | B_start
15 | 2 |
16 | 2 |
17 | 2 |
18 | 2 |
19 | 0 | B_end
20 | 0 |
21 | 0 |
22 | 0 |
23 | 0 |
24 | 2 | A_start
25 | 2 |
26 | 2 |
27 | 2 |
28 | 2 |
29 | 2 |
30 | 2 |
31 | 1 | A_end
32 | 1 |
33 | 1 |
34 | 1 |
35 | 1 |
36 | 1 |
37 | 2 | B_start
38 | 2 |
39 | 2 |
40 | 2 |
The cycle can repeat with any number of 0s, 1s and 2s in between. Sometimes, 0s, 1s or 2s can be missing entirely. I want to get the difference in the Time column between every A_start and the A_end immediately after it. Similarly, I want the difference in Time between every B_start and the B_end that immediately follows.
For this, I thought it would help if I made a "group" for each cycle, as follows:
Time | State | Event | Group
01 | 0 | |
02 | 0 | |
03 | 0 | |
04 | 2 | A_start | 1
05 | 2 | |
06 | 2 | |
07 | 2 | |
08 | 2 | |
09 | 1 | A_end | 1
10 | 1 | |
11 | 1 | |
12 | 1 | |
13 | 1 | |
14 | 2 | B_start | 1
15 | 2 | |
16 | 2 | |
17 | 2 | |
18 | 2 | |
19 | 0 | B_end | 1
20 | 0 | |
21 | 0 | |
22 | 0 | |
23 | 0 | |
24 | 2 | A_start | 2
25 | 2 | |
26 | 2 | |
27 | 2 | |
28 | 2 | |
29 | 2 | |
30 | 2 | |
31 | 1 | A_end | 2
32 | 1 | |
33 | 1 | |
34 | 1 | |
35 | 1 | |
36 | 1 | |
37 | 2 | B_start | 2
38 | 2 | |
39 | 2 | |
40 | 2 | |
However, because there are sometimes missing values in the State column, this isn't working out too well.
The correct cycle sequence is 0 -> 2 -> 1 -> 2 -> 0. Sometimes, a cycle may miss a 2 and be like this: 0 -> 1 -> 2 -> 0. Various combinations of the cycle 0 -> 2 -> 1 -> 2 -> 0 are possible (44 in total). How should I go about this?
Here is a base solution:
#identify the times where there is a change in the State
timeWithChanges <- which(abs(diff(dat$State)) > 0) + 1
#pivot those times into a m * 2 matrix
startEnd <- matrix(dat$Time[timeWithChanges], ncol=2, byrow=TRUE)
#calculate the time difference and label them as A, B
data.frame(AB=rep(c("A", "B"), nrow(startEnd)/2),
TimeDiff=startEnd[,2] - startEnd[,1])
Please let me know if this works generally enough for you.
data:
dat <- read.table(text="Time | State
01 | 0
02 | 0
03 | 0
04 | 2
05 | 2
06 | 2
07 | 2
08 | 2
09 | 1
10 | 1
11 | 1
12 | 1
13 | 1
14 | 2
15 | 2
16 | 2
17 | 2
18 | 2
19 | 0
20 | 0
21 | 0
22 | 0
23 | 0
24 | 2
25 | 2
26 | 2
27 | 2
28 | 2
29 | 2
30 | 2
31 | 1
32 | 1
33 | 1
34 | 1
35 | 1
36 | 1
37 | 2
38 | 2
39 | 2
40 | 2
41 | 0", sep="|", header=TRUE)

update with query of multiple fields from various tables

I have the following tables:
book_tbl:
book_instance_id | book_type_id | library_instance_id | location_id | book_index
1 | 70000 | 2 | 0 | 1
2 | 70000 | 2 | 0 | 2
3 | 70000 | 2 | 0 | 3
4 | 70000 | 3 | 0 | 1
5 | 70000 | 3 | 0 | 2
6 | 70000 | 3 | 0 | 3
7 | 70000 | 4 | 1 | 1
8 | 70000 | 4 | 1 | 2
9 | 70000 | 4 | 1 | 3
and library_tbl:
library_instance_id | library_type_id | location_id
2 | 1000 | 0
3 | 1001 | 0
4 | 1000 | 1
I would like to update the field book_type_id in book_tbl, but only for the first book (book_index = 1) in each library with library_type_id 1000.
To retrieve these rows I used this SQLite query:
SELECT * FROM ( ( SELECT *
FROM library_tbl
WHERE library_type_id=1000 ) t1
join book_tbl t2 on t1.location_id=t2.location_id
AND t1.library_instance_id=t2.library_instance_id
AND book_index=1 )
How can I use the query above in an UPDATE statement so that only rows 1 and 7 are updated?
UPDATE book_tbl SET book_type_id=15000 WHERE ????
Use EXISTS with a correlated subquery to check whether the corresponding library row exists:
UPDATE book_tbl
SET book_type_id = 15000
WHERE EXISTS (SELECT 1
FROM library_tbl
WHERE library_type_id = 1000
AND location_id = book_tbl.location_id
AND library_instance_id = book_tbl.library_instance_id)
AND book_index = 1;
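The statement can be tried on the sample tables with Python's sqlite3 module. This is a minimal sketch: the INTEGER column types and the in-memory database are assumptions. With the data from the question, only book_instance_id 1 and 7 should end up with the new book_type_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Tables and rows copied from the question; column types are assumed.
conn.executescript("""
CREATE TABLE book_tbl (book_instance_id INTEGER, book_type_id INTEGER,
                       library_instance_id INTEGER, location_id INTEGER,
                       book_index INTEGER);
CREATE TABLE library_tbl (library_instance_id INTEGER,
                          library_type_id INTEGER, location_id INTEGER);
INSERT INTO book_tbl VALUES
  (1,70000,2,0,1),(2,70000,2,0,2),(3,70000,2,0,3),
  (4,70000,3,0,1),(5,70000,3,0,2),(6,70000,3,0,3),
  (7,70000,4,1,1),(8,70000,4,1,2),(9,70000,4,1,3);
INSERT INTO library_tbl VALUES (2,1000,0),(3,1001,0),(4,1000,1);
""")

# The correlated subquery sees the book_tbl row currently being updated.
conn.execute("""
UPDATE book_tbl
SET book_type_id = 15000
WHERE EXISTS (SELECT 1
              FROM library_tbl
              WHERE library_type_id = 1000
                AND location_id = book_tbl.location_id
                AND library_instance_id = book_tbl.library_instance_id)
  AND book_index = 1
""")

# Collect the ids of the rows that were changed.
updated = [row[0] for row in conn.execute(
    "SELECT book_instance_id FROM book_tbl "
    "WHERE book_type_id = 15000 ORDER BY book_instance_id")]
print(updated)
```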

SQLite query select best option depending on a max value

I have a probably pretty hard question/situation:
I have a database to divide several tasks to some workers.
In the next example I have two tasks (Task 1 and Task 2) and 4 employees (1, 2, 3 and 4).
The maximum number of employees that work together on one task is three. Therefore I have 3 columns to list all possible options (in this example, not every option is shown!). The last column is a value which indicates how good the option is (the higher the number, the better).
The goal is to get the optimal situation, which means:
Every employee has to do one task (and cannot do 2 tasks)
The sum of the values is the highest possible value
+------------+------------+------------+------+--------+
| Employee_1 | Employee_2 | Employee_3 | Task | Value |
+------------+------------+------------+------+--------+
| 1 | | | 1 | 5.0 |
| 2 | | | 1 | -2.5 |
| 3 | | | 1 | 1.0 |
| 4 | | | 1 | 0.5 |
| 1 | 2 | | 1 | 0.5 |
| 1 | 4 | | 1 | 5.0 |
| 1 | 2 | 3 | 1 | 0.33 |
| 2 | 3 | | 1 | -4.5 |
| 2 | 3 | 4 | 1 | -6.5 |
| 3 | 4 | | 1 | 3.0 |
| 1 | | | 2 | 1.0 |
| 2 | | | 2 | 2.0 |
| 3 | | | 2 | -5.0 |
| 4 | | | 2 | 3.0 |
| 1 | 2 | | 2 | -2.0 |
| 1 | 2 | 3 | 2 | -3.5 |
| 2 | 3 | | 2 | 5.0 |
| 2 | 3 | 4 | 2 | 0.5 |
| 3 | 4 | | 2 | 2.0 |
+------------+------------+------------+------+--------+
As you can see, working separately is sometimes better for productivity:
Employee 1 gets a value of 5.0 on task 1
Employee 4 gets a value of 0.5 on task 1
Employees 1 and 4 together get a value of 5.0 on task 1
In this situation it is better that employees 1 and 4 work separately (5.0 + 0.5 = 5.5, which beats 5.0), and the query should give both lines:
+------------+-------------+------------+-------+---------+
| Employee_1 | Employee_2 | Employee_3 | Task | Value |
+------------+-------------+------------+-------+---------+
| 1 | | | 1 | 5.0 |
| 4 | | | 1 | 0.5 |
+------------+-------------+------------+-------+---------+
The real solution for this example should be:
+------------+-------------+------------+-------+---------+
| Employee_1 | Employee_2 | Employee_3 | Task | Value |
+------------+-------------+------------+-------+---------+
| 1 | | | 1 | 5.0 |
| 2 | 3 | | 2 | 5.0 |
| 4 | | | 2 | 3.0 |
+------------+-------------+------------+-------+---------+
Employee 1 has a very high value on his own on task 1.
Employee 3 is really bad on his own, but together with employee 2 they do great on task 2.
Employee 4 is the only one left, and this employee is pretty good at task 2.
The problem is to write the query that gets this result.
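The selection rule can at least be sanity-checked outside SQL. The following is not the requested query, only a brute-force sketch in Python under one assumed reading of the rules: every employee must appear in exactly one chosen row, and the same task may be chosen in several rows (as in the expected result above). The function name best_assignment and the tuple layout are hypothetical. It tries every way of covering the employees with rows from the table and keeps the cover with the highest total value.

```python
# Rows copied from the question's table: (employees, task, value).
options = [
    ((1,), 1, 5.0), ((2,), 1, -2.5), ((3,), 1, 1.0), ((4,), 1, 0.5),
    ((1, 2), 1, 0.5), ((1, 4), 1, 5.0), ((1, 2, 3), 1, 0.33),
    ((2, 3), 1, -4.5), ((2, 3, 4), 1, -6.5), ((3, 4), 1, 3.0),
    ((1,), 2, 1.0), ((2,), 2, 2.0), ((3,), 2, -5.0), ((4,), 2, 3.0),
    ((1, 2), 2, -2.0), ((1, 2, 3), 2, -3.5), ((2, 3), 2, 5.0),
    ((2, 3, 4), 2, 0.5), ((3, 4), 2, 2.0),
]


def best_assignment(options, employees):
    """Exhaustively search for the set of rows that covers every
    employee exactly once with the highest summed value."""
    best_total, best_rows = None, None

    def search(remaining, chosen, total):
        nonlocal best_total, best_rows
        if not remaining:
            # Every employee is covered; keep the best total seen so far.
            if best_total is None or total > best_total:
                best_total, best_rows = total, list(chosen)
            return
        # Always place the lowest-numbered unassigned employee next,
        # so each combination of rows is visited only once.
        first = min(remaining)
        for row in options:
            members = set(row[0])
            if first in members and members <= remaining:
                chosen.append(row)
                search(remaining - members, chosen, total + row[2])
                chosen.pop()

    search(frozenset(employees), [], 0.0)
    return best_total, best_rows


total, rows = best_assignment(options, {1, 2, 3, 4})
print(total)
for row in rows:
    print(row)
```

The search is exponential in the number of employees, so it is only meant for checking small examples like this one.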

3-way tabulation in R

I have a dataset that looks like
| ID | Category | Failure |
|----+----------+---------|
| 1 | a | 0 |
| 1 | b | 0 |
| 1 | b | 0 |
| 1 | a | 0 |
| 1 | c | 0 |
| 1 | d | 0 |
| 1 | c | 0 |
| 1 | failure | 1 |
| 2 | c | 0 |
| 2 | d | 0 |
| 2 | d | 0 |
| 2 | b | 0 |
This is data where each ID potentially ends in a failure event, through an intermediate sequence of events {a, b, c, d}. I want to be able to count the number of IDs for which each of those intermediate events occur by failure event.
So, I would like a table of the form
| | a | b | c | d |
|------------+---+---+---+---|
| Failure | 4 | 5 | 6 | 2 |
| No failure | 9 | 8 | 6 | 9 |
where, for example, the number 4 indicates that 4 of the IDs in which a occurred ended in failure.
How would I go about doing this in R?
You can use table for example:
dat <- data.frame(categ=sample(letters[1:4],20,rep=T),
failure=sample(c(0,1),20,rep=T))
res <- table(dat$failure,dat$categ)
# table() orders its rows by value: 0 (no failure) first, then 1 (failure)
rownames(res) <- c('No failure','Failure')
res
a b c d
No failure 3 2 2 1
Failure 1 2 4 5
you can plot it using barplot:
barplot(res)
EDIT to get this by ID, you can use by for example:
dat <- data.frame(ID=c(rep(1,9),rep(2,11)),categ=sample(letters[1:4],20,rep=T),
failure=sample(c(0,1),20,rep=T))
by(dat,dat$ID,function(x)table(x$failure,x$categ))
dat$ID: 1
a b c d
0 1 2 1 3
1 1 1 0 0
---------------------------------------------------------------------------------------
dat$ID: 2
a b c d
0 1 2 3 0
1 1 3 1 0
EDIT using tapply
Another way to get this is using tapply
with(dat,tapply(categ,list(failure,categ,ID),length))

Resources