ANTLR Recursive replace - recursion

I want to transform DBMS_LOB.SUBSTR calls using ANTLR:
Example 1:
Input:
DBMS_LOB.SUBSTR(field_name1, 4000, 1)
Output:
SUBSTR(field_name1, 1, 4000)
We need to do 2 things:
1) remove DBMS_LOB.
2) interchange 2nd & 3rd arguments
I'm able to override the rule for the specific token and change it to handle the above.
The problem is handling the recursive case:
I have an input something like this:
DBMS_LOB.SUBSTR(field_name1, DBMS_LOB.SUBSTR(field_name2, 6000, 1), DBMS_LOB.SUBSTR(field_name3, 8000, 1))
I want to change it to something like this:
SUBSTR(field_name1, SUBSTR(field_name3, 1, 8000), SUBSTR(field_name2, 1, 6000))
How do I handle the case where DBMS_LOB.SUBSTR appears at multiple nesting levels within the original statement?
Any help would be appreciated.

It's quite difficult to tell without knowing the approach you use to implement this task.
But in general, the advice may look like this:
Think of the inner DBMS_LOB.SUBSTR statements as single nonterminal statements.
What I mean is that there should be no difference in your replacing algorithm when dealing with either
"DBMS_LOB.SUBSTR(field_name1, 4000, 1)"
or
"DBMS_LOB.SUBSTR(field_name1, DBMS_LOB.SUBSTR(field_name2, 6000, 1), DBMS_LOB.SUBSTR(field_name3, 8000, 1))".
In both cases 4000 and DBMS_LOB.SUBSTR(field_name2, 6000, 1) are just arguments, no matter what they look like or consist of.

Related

How can I compare the values of different columns for each row?

So say I have a dataframe with a column for "play" and two columns with values:
df <- data.frame(Play = c("Comedy", "Midsummer", "Hamlet"),
he = c(105, 20, 210),
she = c(100, 23, 212))
I would like to get two vectors, one containing each Play with a higher value for "he" than "she", and one for the opposite, so each Play that has a higher value for "she" than "he".
I've looked at a few ways of going about it, but none really seems to work. I tried building an 'if (x > y) {print z}' function and then apply()-ing it over my data frame, but I'm really far too inexperienced and ran into so many problems; there ought to be a simpler way than that …
as.character(df$Play)[df$he>df$she]
as.character(df$Play)[df$he<df$she]
Do the above two expressions solve your problem?
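A quick check of the two expressions against the sample data from the question (expected output shown in comments):
df <- data.frame(Play = c("Comedy", "Midsummer", "Hamlet"),
                 he = c(105, 20, 210),
                 she = c(100, 23, 212))
he_higher  <- as.character(df$Play)[df$he > df$she]   # "Comedy"
she_higher <- as.character(df$Play)[df$he < df$she]   # "Midsummer" "Hamlet"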

R replicate function without repetition of drawn numbers

I'm looking for a lotto function, meaning the drawn numbers aren't repeated. If I try either
y <- replicate(39,sample(1:39,1,replace=FALSE))
or
y <- replicate(39,sample(1:39,1,replace=TRUE))
the drawn numbers are repeating.
How can I prevent this?
Try sample(1:39, 39, replace = FALSE). Check ?sample.
Don't use replicate for that. To get 39 draws without repeats, use
sample(1:39, size = 39, replace = FALSE) (or, making use of defaults,
sample(39)).
Work from the inside out.
sample(1:39, 1, replace = FALSE)
picks one number from 1:39 uniformly at random. The replace = FALSE serves no purpose as you are only drawing one number anyway.
Now
replicate(39, sample(1:39, 1, replace = FALSE))
just replicates that 39 times. So there's no reason to expect there to be no duplicates.
You don't say exactly what lotto game you want to simulate, but the usual one is something like a Lotto 6/39 game where 6 numbers are drawn from 1:39. To do this, use:
sample(1:39, 6, replace = FALSE)
If you want to simulate many plays, say 1000 of them, that's when you use replicate:
replicate(1000, sample(1:39, 6, replace = FALSE))
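A small usage sketch (not from the original answer): replicate() then returns a 6-by-1000 matrix with one draw per column, and no number is repeated within any single draw:
set.seed(1)
draws <- replicate(1000, sample(1:39, 6, replace = FALSE))
dim(draws)                                 # 6 1000
any(apply(draws, 2, anyDuplicated) > 0)    # FALSE: no repeats within a draw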

R: Avoid floating point arithmetic when normalizing to sum 0

I need to draw a vector from a normal distribution and normalize it to sum to 0, because I want to simulate power with the pwr.rasch() function from the pwrRasch package. Sounds easy enough.
I create the vector like this:
set.seed(123)
itempars <- rnorm(n = 10, mean = 0, sd = 1.8)
To normalize the parameters to sum 0, I subtract the sum of my vector from the last element of the vector, like this:
itempars[10] <- itempars[10] - sum(itempars)
When I type sum(itempars), it should be 0, but it's -8.326673e-17. How is that possible? How can I get it to exactly 0? I already tried rounding, but that only increases the sum.
I don't want to choose every item parameter by hand. Thanks in advance!
EDIT:
Obviously the reason is floating-point arithmetic, but it's hard to imagine that there is no way around it.
The error message of pwr.rasch() is as follows:
Error in simul.rasch(eval(parse(text = ppar[[1]])), ipar[[1]]) :
Item pararameters are not normalized to sum-0
Sadly, the function is poorly documented. When I estimate groupwise item parameters with eRm's RM() function, which has an extra argument for normalizing to sum 0, it gives me a similar difference to the one in my example.
Any trick would come in handy, as I don't want to create more than 50 normally distributed item parameters by hand. Even worse: if I understood floating-point arithmetic correctly, this problem can appear whenever doubles are used. It would be extremely limiting if I could only use integers as item parameters.
I downloaded the source code of the pwrRasch package and changed the if condition from
if (all(round(unlist(lapply(ipar, sum)), 3) != 0)) {
stop("Item pararameters are not normalized to sum-0")
}
to
if (all(abs(round(unlist(lapply(ipar, sum)), 3)) > 1e-5)) {
stop("Item pararameters are not normalized to sum-0")
}
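For reference, a minimal sketch (not part of the patch above) of the usual R way to test "zero up to floating-point error": compare against a tolerance instead of against exact zero, e.g. with all.equal():
set.seed(123)
itempars <- rnorm(n = 10, mean = 0, sd = 1.8)
itempars[10] <- itempars[10] - sum(itempars)
sum(itempars)                         # something like -8.326673e-17, not exactly 0
isTRUE(all.equal(sum(itempars), 0))   # TRUE: zero within numerical tolerance
abs(sum(itempars)) < 1e-10            # TRUE: explicit tolerance check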

long time to import data using mongo.find.all (rmongodb)

I tried to import data from mongodb to r using:
mongo.find.all(mongo, namespace, query=query,
fields= list('_id'= 0, 'entityEventName'= 1, context= 1, 'startTime'=1 ), data.frame= T)
The command works fine for small data sets, but I want to import 1,000,000 documents.
Using system.time and adding limit= X to the command, I measured the time as a function of the amount of data imported:
system.time(mongo.find.all(mongo, namespace, query=query ,
fields= list('_id'= 0, 'entityEventName'= 1, context= 1, 'startTime'=1 ),
limit= 10000, data.frame= T))
The results:
Data Size    Time (s)
1            0.02
100          0.29
1000         2.51
5000         16.47
10000        20.41
50000        193.36
100000       743.74
200000       2828.33
After plotting the data, I believe that import time grows roughly quadratically with the data size:
Import Time = f(Data Size^2)
Time = -138.3643 + 0.0067807*Data Size + 6.773e-8*(Data Size-45762.6)^2
R^2 = 0.999997
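A fit of that form can be reproduced from the timing table above with a centered quadratic; a sketch (variable names are illustrative):
timings <- data.frame(
  size = c(1, 100, 1000, 5000, 10000, 50000, 100000, 200000),
  time = c(0.02, 0.29, 2.51, 16.47, 20.41, 193.36, 743.74, 2828.33)
)
fit <- lm(time ~ size + I((size - mean(size))^2), data = timings)   # mean(size) is about 45762.6
summary(fit)$r.squared   # close to 1, consistent with roughly quadratic growth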
Am I correct?
Is there a faster command?
Thanks!
lm is cool, but I think that if you try adding power 3, 4, 5, ... features, you'll also get a great R^2 =) You're overfitting =)
One of R's known drawbacks is that you can't efficiently append elements to a vector (or list). Appending an element triggers a copy of the entire object, and here you are seeing a consequence of that effect.
In general, when you fetch data from MongoDB, you don't know the size of the result in advance. You iterate through the cursor and grow the resulting list. In older versions this procedure was incredibly slow because of the R behaviour described above. After this pull request, performance became much better.
The trick with environments helps a lot, but it is still not as fast as a preallocated list (see the sketch below).
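A minimal sketch (my own illustration, not rmongodb code) of why growing a list element by element loses to filling a preallocated one; how large the gap is depends on your R version:
n <- 5e4
system.time({                     # grow the list one element at a time
  grown <- list()
  for (i in seq_len(n)) grown[[length(grown) + 1]] <- i
})
system.time({                     # fill a preallocated list of known size
  prealloc <- vector("list", n)
  for (i in seq_len(n)) prealloc[[i]] <- i
})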
But can we potentially do better? Yes.
1) Simply allow the user to specify the size of the result and preallocate the list, and do this automatically if limit= is passed to mongo.find.all. I filed an issue for this enhancement.
2) Construct the result in C code.
If you know the size of your data in advance, you can do this:
cursor <- mongo.find(mongo, namespace, query = query,
                     fields = list('_id' = 0, 'entityEventName' = 1, context = 1, 'startTime' = 1))
result_lst <- vector('list', NUMBER_OF_RECORDS)   # preallocate to the known size
i <- 1
while (mongo.cursor.next(cursor)) {
  result_lst[[i]] <- mongo.bson.to.list(mongo.cursor.value(cursor))
  i <- i + 1
}
result_dt <- data.table::rbindlist(result_lst)

Programming a sensitivity analysis in R: Vary 1 parameter (column), hold others constant. Better way?

I want to test the sensitivity of a calculation to the value of 4 parameters. To do this, I want to vary one parameter at a time -- i.e., change Variable 1, hold variables 2-4 at a "default" value (e.g., 1). I thought an easy way to organize these values would be in a data.frame(), where each column corresponds to a different variable, and each row to a set of parameters for which the calculation should be made. I would then loop through each row of the data frame, evaluating a function given the parameter values in that row.
This seems like it should be a simple thing to do, but I can't find a quick way to do it.
The problem might be my overall approach to programming the sensitivity analysis, but I can't think of a good, simple way to program the aforementioned data.frame.
My code for generating the data.frame:
Adj_vals <- c(seq(0, 1, by=0.1), seq(1.1, 2, by=0.1)) #a series of values for 3 of the parameters to use
A_Adj_vals <- 10^(seq(1,14,0.5)) #a series of values for another one of the parameters to use
n1 <- length(Adj_vals)
n2 <- length(A_Adj_vals)
data.frame(
  "Dg_Adj" = c(Adj_vals, rep(1, n1*2 + n2)),                  # this parameter's default is 1
  "Df_Adj" = c(rep(1, n1), Adj_vals, rep(1, n1 + n2)),        # this parameter's default is 1
  "sd_Adj" = c(rep(1, n1*2), 0.01, Adj_vals[-1], rep(1, n2)), # default 1; unlike the others using Adj_vals, it can only take values > 0
  "A"      = c(rep(1E7, n1*3), A_Adj_vals)                    # this parameter's default is 10 million
)
This code produces the desired data.frame. Is there a simpler way to achieve the same result? I would accept an answer where sd_Adj takes on 0 instead of 0.01.
It's pretty debatable whether this is better, but another way to do it would be to follow this pattern:
defaults <- data.frame(a = 1, b = 1, c = 1, d = 10000000)
merge(defaults[c("b", "c", "d")], data.frame(a = c(seq(0, 1, by = 0.1), seq(1.1, 2, by = 0.1))))
This should be pretty easy to cook up into a function that automatically removes the correct column from defaults based on the column name in the data frame you are merging with, etc.; a sketch follows below.
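A rough sketch of such a helper (vary_one is a hypothetical name; the one-at-a-time grids are stacked with rbind(), which matches data frame columns by name):
vary_one <- function(defaults, name, values) {
  held <- defaults[setdiff(names(defaults), name)]   # columns held at their defaults
  merge(held, setNames(data.frame(values), name))    # cross join: one row per varied value
}
defaults <- data.frame(Dg_Adj = 1, Df_Adj = 1, sd_Adj = 1, A = 1e7)
rbind(
  vary_one(defaults, "Dg_Adj", Adj_vals),
  vary_one(defaults, "Df_Adj", Adj_vals),
  vary_one(defaults, "sd_Adj", Adj_vals[-1]),   # drop 0 so sd stays > 0
  vary_one(defaults, "A",      A_Adj_vals)
)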
