how to optimize away common subexpressions? - sqlite

select x+y as z,
case
when "x"+"y" < 0 then "Less Than Zero"
when "x"+"y" > 0 then "Non Zero"
else "Zero"
end
from sometable;
This returns the expected result, but the addition is performed multiple times for each row of data.
I tried to optimize the query as follows, but it does not work:
select x+y as z,
case
when "z" < 0 then "Less Than Zero"
when "z" > 0 then "Non Zero"
else "Zero"
end
from sometable;
This always returns "Less Than Zero".
What am I doing wrong in this query? How can I avoid adding x and y multiple times while the query is being executed?

Column aliases in the SELECT clause are not available in other expressions in the same SELECT clause. (What should happen with SELECT x AS y, y AS x ...?)
You can make such an alias available by moving it into a subquery:
SELECT z,
CASE WHEN z < 0 THEN 'Less Than Zero'
WHEN z > 0 THEN 'Non Zero'
ELSE 'Zero'
END
FROM (SELECT x + y AS z
FROM sometable);
However, this only saves typing; it does not actually optimize away the duplicate computation:
sqlite> explain select z, z from (select x+y as z from sometable);
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 11 0 00 Start at 11
1 OpenRead 1 2 0 2 00 root=2 iDb=0; sometable
2 Rewind 1 9 0 00
3 Column 1 0 3 00 r[3]=sometable.x
4 Column 1 1 4 00 r[4]=sometable.y
5 Add 4 3 1 00 r[1]=r[4]+r[3]
6 Add 4 3 2 00 r[2]=r[4]+r[3]
7 ResultRow 1 2 0 00 output=r[1..2]
8 Next 1 3 0 01
9 Close 1 0 0 00
10 Halt 0 0 0 00
11 Transaction 0 0 1 0 01 usesStmtJournal=0
12 TableLock 0 2 0 sometable 00 iDb=0 root=2 write=0
13 Goto 0 1 0 00
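To experiment with the subquery form, here is a minimal sketch using Python's built-in sqlite3 module. The helper name classify_sums and the sample rows are assumptions made for illustration; only the shape of the query comes from the answer above. Note that the string literals use single quotes, as in the answer.

```python
import sqlite3


def classify_sums(rows):
    """Run the subquery form of the query against an in-memory table.

    `rows` is a list of (x, y) pairs. The table name mirrors the question;
    the data passed in is hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sometable (x INTEGER, y INTEGER)")
        conn.executemany("INSERT INTO sometable VALUES (?, ?)", rows)
        query = """
            SELECT z,
                   CASE WHEN z < 0 THEN 'Less Than Zero'
                        WHEN z > 0 THEN 'Non Zero'
                        ELSE 'Zero'
                   END
            FROM (SELECT x + y AS z FROM sometable)
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for z, label in classify_sums([(1, 2), (-5, 1), (3, -3)]):
        print(z, label)
```

As the EXPLAIN output shows, this only makes the alias usable; it should not be read as a guarantee that the addition is computed once.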


Hmmm Assembly Fibonacci Sequence

I need to write a program that prints out the Fibonacci sequence up to a given integer (which I can choose myself).
I have to do this in Hmmm assembly.
It gets stuck in infinite recursion, but I have no idea why.
00 read r4 # User input
01 setn r4 -1 # adds -1 to r4
02 setn r1 1 # r1 == 1
03 setn r2 0 # r2 == 0
04 setn r5 1 # used as the first number of the fibonacci sequence
05 write r5 # 1
06 jeqzn r4 13 # if r4 == 0, the fibonacci sequence stops
07 add r3 r1 r2 # r3 = r1 + r2
08 addn r4 -1 # r4 = r4 -1
09 copy r2 r1 # r2 now equals r1
10 copy r1 r3 # r1 now equals r3
11 write r3 # prints fibonacci number
12 jumpn 06 # checks if r4 == 0
13 halt # stops
current output:
1
1
2
3
5
8
13
21
34
55
89
144
233
377
..
..
Wanted output (example): if input (r4) = 10
1
1
2
3
5
8
13
21
34
55
08 addn r4 -1 (r4 should eventually end up being 0)
06 jeqzn r4 13 (should check when it's true, and it should halt)
What prevents it from halting?
It looks like line 01 sets r4 (the input) to -1 instead of subtracting 1 from it as intended, so it should be addn r4 -1 rather than setn r4 -1.
It is also worth noting that the current implementation is iterative, not recursive, and that the loop does not appear to be infinite, just very long: it would have to count down from -1 until r4 wraps around to 0 (assuming addn does not saturate).
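To check the corrected control flow without a Hmmm simulator, the loop can be mirrored in Python. This is only a sketch: the function name fib_like_hmmm is hypothetical, the variables are named after the registers in the question, and the guard against inputs below 1 is an addition (Python integers do not wrap around, so such inputs would never reach 0).

```python
def fib_like_hmmm(n):
    """Mirror the Hmmm program with line 01 changed to `addn r4 -1`.

    Returns the values the program would write, in order.
    """
    if n < 1:
        # Assumption: reject inputs that would make r4 start below 0.
        raise ValueError("n must be at least 1")
    r4 = n - 1          # 01 addn r4 -1 (the corrected line)
    r1, r2 = 1, 0       # 02, 03
    written = [1]       # 04, 05: write the first Fibonacci number
    while r4 != 0:      # 06 jeqzn r4 13
        r3 = r1 + r2    # 07 add r3 r1 r2
        r4 -= 1         # 08 addn r4 -1
        r2 = r1         # 09 copy r2 r1
        r1 = r3         # 10 copy r1 r3
        written.append(r3)  # 11 write r3
    return written      # 13 halt


if __name__ == "__main__":
    print(fib_like_hmmm(10))
```

With an input of 10 this yields the ten values listed as the wanted output in the question.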

Subsetting data using grep

I have a data frame named Schedule which holds data for multiple airlines. I ran the table function to get the breakdown of records by airline. Here is the output:
table(Schedule$airline)
 AA  AS  B6  BA  DL  F9  FI  LH  NK  QR  UA  WN
757   4  14   2  65  24   2   2  18   2  36  60
Now I am subsetting this data using grep to get a data frame that contains only the gates in a particular terminal of interest, which in this case is terminal F:
Gated_Schedule <- Schedule[grep("F", Schedule$gate), ]
When I run table on the subset to get the same breakdown, I get:
table(Gated_Schedule$airline)
 AA  AS  B6  BA  DL  F9  FI  LH  NK  QR  UA  WN
362   0   0   0   0   0   0   0   0   0   0   0
Ideally, the output should contain only AA, like this:
AA
362
How do I get rid of this discrepancy?

Equivalent bitget function in R

Is there a function in R that performs the same operation as bitget in MATLAB/Octave?
From the bitget help page:
Return the status of bit(s) n of unsigned integers in A; the lowest significant bit is n = 1.
bitget (100, 8:-1:1)
⇒ 0 1 1 0 0 1 0 0
So if you want to get the bit values for an integer in R, you can do:
intToBits(100)[8:1]
# [1] 00 01 01 00 00 01 00 00
That technically returns a raw vector, so if you want just a numeric vector, do
as.numeric(intToBits(100)[8:1])
# [1] 0 1 1 0 0 1 0 0
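For comparison, the same 1-based bit lookup can be sketched in Python with a shift and a mask. The function name bitget is borrowed from Octave purely for illustration; it is not a Python built-in, and the sketch assumes a non-negative integer and positions that start at 1.

```python
def bitget(value, positions):
    """Return the bits of `value` at the given 1-based positions.

    Position 1 is the least significant bit, matching the Octave convention.
    """
    if value < 0:
        raise ValueError("value must be a non-negative integer")
    bits = []
    for n in positions:
        if n < 1:
            raise ValueError("bit positions start at 1")
        bits.append((value >> (n - 1)) & 1)
    return bits


if __name__ == "__main__":
    # Equivalent of bitget(100, 8:-1:1) in Octave.
    print(bitget(100, range(8, 0, -1)))  # [0, 1, 1, 0, 0, 1, 0, 0]
```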

Efficient (repeat) looping

I am trying to check whether the price in a given row k, price(k), is equal to the one in the row above, price(k-1). If it is, I want to sum the volumes of the two rows, volume(k-1)+volume(k), and then remove the row with the duplicate price, row k.
I have the following repeat loop, which I am applying to a large dataset to delete the repeated values.
k <- 1
repeat{
if( Prices$Price[ k + 1 ] == Prices$Price[ k ] ){
Prices$CumVolume[ k + 1 ] <- Prices$CumVolume[ k + 1 ] + Prices$CumVolume[ k ]
Prices <- Prices[ -k , ]
k <- k + 1
if( k > nrow( Prices ) ) break
}
}
The loop is very slow, and I was wondering whether there are ways to speed it up. Unfortunately, I am relatively new to R and am having difficulty working out the best way to go about this.
Also, is there a way in R to observe which iteration the loop is currently up to, i.e. have it displayed in the workspace on each iteration?
Example data:
Date Time Price CumVolume Ret MeanRet VolRet
26 01-JAN-2009 21:30:01.783 96.660 537 0 0 0
31 01-JAN-2009 21:30:58.041 96.650 78 0 0 0
33 01-JAN-2009 21:34:09.589 96.640 60 0 0 0
35 01-JAN-2009 21:34:10.879 96.640 40 0 0 0
37 01-JAN-2009 21:35:55.001 96.635 50 0 0 0
It appears you want something like this:
DF <- read.table(text=" Date Time Price CumVolume Ret MeanRet VolRet
26 01-JAN-2009 21:30:01.783 96.660 537 0 0 0
31 01-JAN-2009 21:30:58.041 96.650 78 0 0 0
33 01-JAN-2009 21:34:09.589 96.640 60 0 0 0
35 01-JAN-2009 21:34:10.879 96.640 40 0 0 0
37 01-JAN-2009 21:35:55.001 96.635 50 0 0 0", header=TRUE)
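# create a run id that increases whenever the price changes
DF$runs <- cumsum(c(TRUE, diff(DF$Price) != 0))
# sum the volume within each price run
DF$CCVolume <- with(DF, ave(CumVolume, runs, FUN=sum))
# keep only the first row of each run (a price that recurs later starts a new run)
DF[!duplicated(DF$runs), ]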
# Date Time Price CumVolume Ret MeanRet VolRet runs CCVolume
#26 01-JAN-2009 21:30:01.783 96.660 537 0 0 0 1 537
#31 01-JAN-2009 21:30:58.041 96.650 78 0 0 0 2 78
#33 01-JAN-2009 21:34:09.589 96.640 60 0 0 0 3 100
#37 01-JAN-2009 21:35:55.001 96.635 50 0 0 0 4 50
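The run-based idea above (group consecutive equal prices, sum the volume per group) can also be sketched outside R. The following Python version uses itertools.groupby, which groups only adjacent equal keys; the function name merge_price_runs and the reduction of each row to a (price, volume) pair are assumptions made for illustration.

```python
from itertools import groupby


def merge_price_runs(rows):
    """Collapse runs of consecutive equal prices, summing their volumes.

    `rows` is a sequence of (price, volume) pairs in time order.
    A price that reappears later, after a different price, starts a new run.
    """
    merged = []
    for price, run in groupby(rows, key=lambda row: row[0]):
        merged.append((price, sum(volume for _, volume in run)))
    return merged


if __name__ == "__main__":
    # Price and CumVolume columns from the example data in the question.
    sample = [(96.660, 537), (96.650, 78), (96.640, 60), (96.640, 40), (96.635, 50)]
    for price, volume in merge_price_runs(sample):
        print(price, volume)
```

Like the vectorised R version, this makes a single pass over the data instead of repeatedly deleting rows from the data frame.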
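I think your code goes into an infinite loop because the increment k <- k + 1 and the break are both inside the if condition, so neither runs when two adjacent prices differ. I hope this is what you want:
# note: this merges every row sharing a price, adjacent or not
z <- unique(Prices$Price)
for (i in seq_along(z)) {
  dupindex <- which(Prices$Price == z[i])
  # only touch prices that occur more than once
  if (length(dupindex) > 1) {
    # put the total volume on the last row with this price
    Prices$CumVolume[tail(dupindex, n = 1)] <- sum(Prices$CumVolume[dupindex])
    # drop the earlier rows with this price
    Prices <- Prices[-head(dupindex, -1), ]
  }
}
I hope it helps, thanks.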

mistake in multivePenal but not in frailtyPenal

The libraries used are:
library(survival)
library(splines)
library(boot)
library(frailtypack)
The functions used are in the frailtypack library.
In my data I have two recurrent events (delta.stable and delta.unstable) and one terminal event (delta.censor). There are some time-varying explanatory variables, such as the unemployment rate (u.rate), which is quarterly; that is why my dataset has been split by quarters.
Here is a link to the subsample used in the code below, in case it is helpful for finding the mistake: https://www.dropbox.com/s/spfywobydr94bml/cr_05_males_services.rda
The problem is that it runs for a long time before the warning message appears.
The main variables of the survival function are as follows.
I have two recurrent events:
delta.unstable (unst.): takes the value one when the individual finds an unstable job.
delta.stable (stable): takes the value one when the individual finds a stable job.
And one terminal event:
delta.censor (d.censor): takes the value one when the individual has died, retired or emigrated.
row id contadorbis unst. stable d.censor .t0 .t
1 78 1 0 1 0 0 88
2 101 2 0 1 0 0 46
3 155 3 0 1 0 0 27
4 170 4 0 0 0 0 61
5 170 4 1 0 0 61 86
6 213 5 0 0 0 0 92
7 213 5 0 0 0 92 182
8 213 5 0 0 0 182 273
9 213 5 0 0 0 273 365
10 213 5 1 0 0 365 394
11 334 6 0 1 0 0 6
12 334 7 1 0 0 0 38
13 369 8 0 0 0 0 27
14 369 8 0 0 0 27 119
15 369 8 0 0 0 119 209
16 369 8 0 0 0 209 300
17 369 8 0 0 0 300 392
When I apply multivePenal I obtain the following message:
Error in aggregate.data.frame(as.data.frame(x), ...) :
arguments must have same length
In addition: Lost warning messages
In Surv(.t0, .t, delta.stable) : Stop time must be > start time, NA created
#### multivePenal function
fit.joint.05_malesP<-multivePenal(Surv(.t0,.t,delta.stable)~cluster(contadorbis)+terminal(as.factor(delta.censor))+event2(delta.unstable),formula.terminalEvent=~1, formula2=~as.factor(h.skill),data=cr_05_males_serv,Frailty=TRUE,recurrentAG=TRUE,cross.validation=FALSE,n.knots=c(7,7,7), kappa=c(1,1,1), maxit=1000, hazard="Splines")
I have checked whether Surv(.t0,.t,delta.stable) contains NAs, and it does not.
In addition, when I apply frailtyPenal to the same data for both possible combinations, the function runs well and I get results. I have spent a week looking at this and cannot find the key. I would appreciate it if someone could shed some light on this problem.
#delta unstable+death
fit.joint.05_males<-frailtyPenal(Surv(.t0,.t,delta.unstable)~cluster(id)+u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(non.manual)+as.factor(municipio)+as.factor(spanish.speakers)+ as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+ as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+ as.factor(responsabilities)+
terminal(delta.censor),formula.terminalEvent=~u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+ as.factor(responsabilities),data=cr_05_males_services,n.knots=12,kappa1=1000,kappa2=1000,maxit=1000, Frailty=TRUE,joint=TRUE, recurrentAG=TRUE)
###Be patient. The program is computing ...
###The program took 2259.42 seconds
#delta stable+death
fit.joint.05_males<-frailtyPenal(Surv(.t0,.t,delta.stable)~cluster(id)+u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(non.manual)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+as.factor(responsabilities)+terminal(delta.censor),formula.terminalEvent=~u.rate+as.factor(h.skill)+as.factor(m.skill)+as.factor(municipio)+as.factor(spanish.speakers)+as.factor(no.spanish.speaker)+as.factor(Aged.16.19)+as.factor(Aged.20.24)+as.factor(Aged.25.29)+as.factor(Aged.30.34)+as.factor(Aged.35.39)+as.factor(Aged.40.44)+as.factor(Aged.45.51)+as.factor(older61)+as.factor(responsabilities),data=cr_05_males_services,n.knots=12,kappa1=1000,kappa2=1000,maxit=1000, Frailty=TRUE,joint=TRUE, recurrentAG=TRUE)
###The program took 3167.15 seconds
Because you provide neither information about the packages used nor the data necessary to run multivePenal or frailtyPenal, I can only help you with the Surv part (because I happened to have that package loaded).
The Surv warning message you provided (In Surv(.t0, .t, delta.stable) : Stop time must be > start time, NA created) suggests that something is strange with your variables .t0 (the time argument in Surv, referred to as 'start time' in the warning) and/or .t (the time2 argument, 'Stop time' in the warning). I checked this possibility with a simple example:
library(survival)
# read the data you feed `Surv` with
df <- read.table(text = "row id contadorbis unst. stable d.censor .t0 .t
1 78 1 0 1 0 0 88
2 101 2 0 1 0 0 46
3 155 3 0 1 0 0 27
4 170 4 0 0 0 0 61
5 170 4 1 0 0 61 86
6 213 5 0 0 0 0 92
7 213 5 0 0 0 92 182
8 213 5 0 0 0 182 273
9 213 5 0 0 0 273 365
10 213 5 1 0 0 365 394
11 334 6 0 1 0 0 6
12 334 7 1 0 0 0 38
13 369 8 0 0 0 0 27
14 369 8 0 0 0 27 119
15 369 8 0 0 0 119 209
16 369 8 0 0 0 209 300
17 369 8 0 0 0 300 392", header = TRUE)
# create survival object
mysurv <- with(df, Surv(time = .t0, time2 = .t, event = stable))
mysurv
# create a new data set where one .t for some reason is less than .t0
# on row five .t0 is 61, so I set .t to 60
df2 <- df
df2$.t[df2$.t == 86] <- 60
# create survival object using new data which contains at least one Stop time that is less than Start time
mysurv2 <- with(df2, Surv(time = .t0, time2 = .t, event = stable))
# Warning message:
# In Surv(time = .t0, time2 = .t, event = stable) :
# Stop time must be > start time, NA created
# i.e. the same warning message as you got
# check the survival object
mysurv2
# as you can see, the fifth interval contains NA
# I would recommend you check .t0 and .t in your data set carefully
# one way to examine rows where Stop time (.t) is not greater than start time (.t0) is:
df2[which(df2$.t0 >= df2$.t), ]
I am not familiar with multivePenal, but it seems that it does not accept a survival object that contains intervals with NA, whereas frailtyPenal might do so.
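The check itself does not depend on R. As a language-neutral sketch, the Python function below flags every interval whose stop time is missing or not strictly greater than its start time, which is the condition named in the warning. The function name invalid_intervals and the (start, stop) tuple layout are assumptions for illustration; this does not call the survival package.

```python
def invalid_intervals(rows):
    """Return (index, start, stop) for intervals that would be flagged.

    `rows` is a sequence of (start, stop) pairs; None stands for a missing
    value. An interval is valid only if stop is strictly greater than start.
    """
    bad = []
    for i, (start, stop) in enumerate(rows):
        if start is None or stop is None or stop <= start:
            bad.append((i, start, stop))
    return bad


if __name__ == "__main__":
    # A few (.t0, .t) pairs from the example, with one stop time set to 60.
    intervals = [(0, 88), (0, 61), (61, 60), (92, 182)]
    print(invalid_intervals(intervals))  # [(2, 61, 60)]
```

Note that the indices returned here are 0-based, unlike row numbers in R.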
The authors of the package have told me that the function is not finished yet, so perhaps that is the reason that it is not working well.
I encountered the same error and arrived at this solution.
frailtyPenal() will not accept data.frames of different length. The data.frame used in Surv and the data.frame named in data= in frailtyPenal must be the same length. I used a Cox regression to identify the incomplete cases, reset the survival object to exclude the missing cases and, finally, ran frailtyPenal:
library(survival)
library(frailtypack)
data(readmission)
#Reproduce the error
#change the first start time to NA
readmission[1,3] <- NA
#create a survival object with one missing time
surv.obj1 <- with(readmission, Surv(t.start, t.stop, event))
#observe the error
frailtyPenal(surv.obj1 ~ cluster(id) + dukes,
data=readmission,
cross.validation=FALSE,
n.knots=10,
kappa=1,
hazard="Splines")
#repair by resetting the surv object to omit the missing value(s)
#identify NAs using a Cox model
cox.na <- coxph(surv.obj1 ~ dukes, data = readmission)
#remove the NA cases from the original set to create complete cases
readmission2 <- readmission[-cox.na$na.action,]
#reset the survival object using the complete cases
surv.obj2 <- with(readmission2, Surv(t.start, t.stop, event))
#run frailtyPenal using the complete cases dataset and the complete cases Surv object
frailtyPenal(surv.obj2 ~ cluster(id) + dukes,
data = readmission2,
cross.validation = FALSE,
n.knots = 10,
kappa = 1,
hazard = "Splines")
