Similar to R's rbinom function on SQL Server

I've recently been using R's rbinom function to generate success events, such as
rbinom(7, 7, prob = 0.81)
The case is, I now have a table that looks like this:
num  denum  prob
  4      7  0.57
  5      8  0.625
  3      4  0.75
  2      5  0.4
  9     11  0.81
I want to create a new column containing the sum of the success events from the rbinom function, using the num, denum & prob columns from the table for each row, like this:
table %>%
  mutate(new_column = sum(rbinom(num, denum, prob)))
but this returns the same result for each row, even with different num, denum & prob values.
The question is:
Is there a problem with my code that makes it always return the same result for the whole table?
And if the source table is in my SQL Server database, is it possible to achieve the same objective with a SQL Server function (an rbinom equivalent for SQL Server)?
Thank you.
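A minimal sketch of a likely fix (my addition; no answer is included in this thread): without rowwise(), dplyr evaluates sum(rbinom(num, denum, prob)) once over the whole column vectors, so every row receives the same collapsed scalar. Drawing and summing per row avoids that:

library(dplyr)
table %>%
  rowwise() %>%                                           # evaluate each row separately
  mutate(new_column = sum(rbinom(num, denum, prob))) %>%  # num draws from Binomial(denum, prob)
  ungroup()

On the SQL Server side there is no built-in binomial sampler. One hedged approach is to expand each row into denum Bernoulli trials (for example with a numbers table) and sum CASE WHEN RAND(CHECKSUM(NEWID())) < prob THEN 1 ELSE 0 END, which reproduces the draws row by row.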

Related

Dice Rolling Probability matrix with a D20(20 sided-dice)

I am trying to run dice rolling simulations in R. In my project I am using the D20 from Dungeons & Dragons, a 20-sided die used in tabletop games.
One of the features of the game is that when a player is attacking or casting spells, they roll the D20 and the outcome of the roll determines success. Players may have positive or negative modifiers to add to the sum of the roll. I am trying to create a probability matrix of a player rolling a particular result based on their modifier. This is what I have so far:
D20_prob = function(d, nreps) {
  roll = matrix(sample(20, size = d * nreps, replace = T), ncol = nreps)
  result_tot = (roll + 1) # Adds modifier
  return(prop.table(table(result_tot)))
}
D20_prob(1, 100)
And it results in something like this:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
0.02 0.04 0.04 0.02 0.10 0.08 0.05 0.05 0.05 0.03 0.04 0.04 0.03 0.06 0.05 0.05 0.08 0.04 0.06 0.07
In the game of D&D modifiers can range from -5 to +5. Can any one help me modify my code so I can create a table that ranges -5 to +5?
I think the function you want is
sample.int(20, size = 10, replace = TRUE)  # 10 rolls of a 20-sided die
It really depends what you want to simulate.
If you want to see how probable a result is given a certain number of rolls (i.e. if I use this modifier, is it better than another, given the damage I do in the next twenty hours of play), you would use the function mentioned above and simulate as many rolls as you can (e.g. 1000 or more).
The second option is that you want the probability of a result in general (i.e. if I need to roll above 18, what is the chance of rolling that). In that case the probability of rolling any given number is 1/20, since you have a D20.
I once did a similar thing: my simulator used random rolls to see how likely a combination is to succeed, to give players an idea of how their modifiers would work in the long run.
But in that case be aware that random numbers are never truly random (there are better and worse packages).
Edit:
To loop through the modifiers you can use a for loop:
for (modifier in -5:5) {
  result_tot = prop.table(table(roll + modifier))
}
The problem is I have never worked with data.tables and I do not seem to be able to merge all the different tables which are generated. In the example above each iteration of the loop overwrites result_tot, so you need to print it as you go or find a way to merge the resulting tables.
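One possible way to do the merging (a sketch, not from the original answer): collect each modifier's table in a named list instead of overwriting result_tot.

all_results <- list()
roll <- sample(20, size = 1000, replace = TRUE)  # e.g. 1000 simulated D20 rolls
for (modifier in -5:5) {
  # one proportion table per modifier, keyed by the modifier value
  all_results[[as.character(modifier)]] <- prop.table(table(roll + modifier))
}
all_results[["-5"]]  # inspect the table for the -5 modifier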

Creating one output table for multiple results from different functions in R

I tried to search for a solution to my issue but failed, so I'm writing here. At the moment I've created quite a large set of functions in R, around 500 lines of code in each of 7 scripts, producing calculation output in a standardized form:
>getFunction1(Key_ID)
#output
calc1 calc2
category1 0.00 0.3
category2 0.06 0.2
category3 0.00 0.3
>getFunction2(Key_ID)
#output
calc1 calc2
category1 0.10 0.1
category2 0.02 0.3
category3 0.01 0.3
(...)
>getFunction7(Key_ID)
#output
calc1 calc2
category1 0.20 0.15
category2 0.04 0.4
category3 0.02 0.35
The functions within the 7 scripts are the same from a structural point of view but contain different calculations, depending on the category of the function (the specific use case) and, inside each script, on the category of the calculation. Everything works perfectly, but I'm not able to create one coherent table storing the calculations coming from all of getFunction1-7(Key_ID) in one tabular form looking like this:
Key_ID | Name_Function (referring to the use case) | Category (within the function) | Calc1 | Calc2
Key_ID is the crucial part, because it is the ID that allows joining the new calculations with the previous ones in the database. I cannot simply create a tabular structure like:
tableFunction1 <- as.data.frame(getFunction1(Key_ID))
because depending on the Key_ID I can receive different scores, depending on the subject of the calculation; thus Key_ID refers to different objects with different attributes included in the calculations.
Have you ever faced this kind of issue? Any advice for me?
Thanks a lot for any help.
Aleksandra
If the output from each function is a data frame containing categories and calculations, you need a way to add KEY_ID and the name of the function that produced the calculations. Since the OP didn't include the logic of any of the functions being discussed, we'll assume that the current output is a data frame containing the columns category, calc1, and calc2.
First, since KEY_ID is passed as an argument into each of the seven functions, you can create a column in the output data frame containing this value.
# assuming output frame has already been created, add id column
# by using rep() to repeat the value of the KEY_ID argument
output$id <- rep(KEY_ID,nrow(output))
Second, you'll want to similarly add the name of the current function as a column in your output data frame. This can be accomplished with match.call().
# add name of current function to output data frame
output$name_function <- rep(as.character(match.call()[[1]]), nrow(output))  # as.character(): match.call()[[1]] is a symbol, which rep() cannot replicate
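Putting both pieces together, a hedged sketch (the wrapper name collect_results and the category-in-rownames assumption are mine, not from the original post) that runs all seven functions for one Key_ID and stacks the tagged outputs into the single table described above:

collect_results <- function(Key_ID) {
  fun_names <- paste0("getFunction", 1:7)
  pieces <- lapply(fun_names, function(fn) {
    output <- get(fn)(Key_ID)            # look up and call getFunction1 .. getFunction7
    output$category <- rownames(output)  # assumes category1..category3 live in the row names
    output$id <- Key_ID
    output$name_function <- fn
    output
  })
  do.call(rbind, pieces)                 # one row per function x category
}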

do.call to build and execute data.table commands

I have a small data.table representing one record per test cell (AB testing results) and I want to add several more columns that compare each test cell against every other test cell. In other words, the number of columns I want to add depends upon how many test cells are in the AB test in question.
My data.table looks like:
Group Delta SD.diff
Control 0 0
Cell1 0.00200 0.001096139
Cell2 0.00196 0.001095797
Cell3 0.00210 0.001096992
Cell4 0.00160 0.001092716
And I want to add the following columns (numbers are trash here):
Group v.Cell1 v.Cell2 v.Cell3 v.Cell4
Control 0.45 0.41 0.45 0.41
Cell1 0.50 0.58 0.48 0.66
Cell2 0.58 0.50 0.58 0.48
Cell3 0.48 0.58 0.50 0.70
Cell4 0.66 0.48 0.70 0.50
I am sure that do.call is the way to go, but I can't work out how to embed one do.call inside another to generate the script, and I can't work out how to then execute the scripts (20 lines in total). The closest I have currently is:
a <- do.call("paste",c("test.1.results <- mutate(test.1.results, P.Better.",list(unlist(test.1.results[,Group]))," = pnorm(Delta, test.1.results['",list(unlist(test.1.results[,Group])),"'][,Delta], SD.diff,lower.tail=TRUE))", sep=""))
Which produces 5 script lines like:
test.1.results <- mutate(test.1.results, P.Better.Cell2 = pnorm(Delta, test.1.results['Cell2'][,Delta], SD.diff,lower.tail=TRUE))
This only compares each test cell's results against itself: a 0.50 result (difference due to chance). No use whatsoever, as I need each cell compared to every other cell.
Not sure where to go with this one.
Update: In v1.8.11, FR #2077 is now implemented - set() can now add columns by reference. From NEWS:
set() is able to add new columns by reference now. For example, set(DT, i=3:5, j="bla", 5L) is equivalent to DT[3:5, bla := 5L]. This was FR #2077. Tests added.
Tasks like this are often easier with set(). To demonstrate, here's a translation of what you have in the question (untested). But I realise you want something different from what you've posted (which I don't quite understand, quickly).
for (i in paste0("Cell", 1:4))
  set(DT,          # the data.table to update/add a column to by reference
      i = NULL,    # no row subset, NULL is the default anyway
      j = paste0("P.Better.", i),  # column name or position; must be a name when adding
      value = pnorm(DT$Delta, DT[i][, Delta], DT$SD.diff, lower.tail = TRUE))
Note that you can add only a subset of a new column and the rest will be filled with NA. Both with := and set.
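For completeness, a runnable sketch of the same idea (the setkey() call and the example values are my assumptions; the answer above is explicitly untested). Keying the table by Group is what makes the DT[i] lookup inside the loop work:

library(data.table)

DT <- data.table(Group   = c("Control", paste0("Cell", 1:4)),
                 Delta   = c(0, 0.00200, 0.00196, 0.00210, 0.00160),
                 SD.diff = c(0, 0.001096139, 0.001095797, 0.001096992, 0.001092716))
setkey(DT, Group)  # so DT[i] is a keyed lookup by Group

for (i in paste0("Cell", 1:4)) {
  set(DT, j = paste0("P.Better.", i),
      value = pnorm(DT$Delta, DT[i, Delta], DT$SD.diff, lower.tail = TRUE))
}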

compare row values over multiple rows (R)

I don't think this question has been asked yet (most similar questions are about extracting data or returning a count). I am new to R, so any help would be appreciated!
I have a dataset of multiple runs of an experiment in one file, and the data look like this, with all the time steps for each run in rows:
time [info] id (unique per run)
I am attempting to calculate when the system reaches equilibrium, which I define as stable values in 3 interdependent parameters. I would like the contents of the rows compared, and if they are within 5% of each other over 20 time steps, to return the time step at which the stability begins, along with the id.
So far, I'm thinking it will be something like the following (or maybe a while loop); sorry for the bad formatting, this is pseudocode:
y = 1
z = 0  # variables to control the loop
x = 0
for (ID) {
  if (CC at time x is within +-0.05 of CC at time y) {
    if (z <= 20) {  # catalogs the number of periods that match
      y++
      z++
    } else {
      [save value in column]
    }
  } else {  # no match for a sustained period, so start over
    x++
    y = x + 1
    z = 0
  }
}
Edit: CC is one of my parameters of interest and ranges between 0 and 1, although the endpoints are unlikely.
Here's a simple example that might help: this is something like how my data looks:
zz <- textConnection("time CC ID
1 0.99 1
2 0.80 1
3 0.90 1
4 0.91 1
5 0.92 1
6 0.91 1
1 0.99 2
2 0.90 2
3 0.90 2
4 0.91 2
5 0.92 2
6 0.91 2")
Data <- read.table(zz, header = TRUE)
close(zz)
My question is: how can I run through the lines to find out when the value of CC becomes 'stable' (meaning it doesn't change by more than 0.05 over X (here, 3) time steps), so that it would produce the following result:
ID timeToEQ
1 1 3
2 2 2
Does this help? The only way I can think to do this is with a for loop, and I think there must be an easier way!
Here is my code. I will post the explanation shortly.
require(plyr)
ddply(Data, .(ID), summarize, timeToEQ = Position(isTRUE, abs(diff(CC)) < 0.05 ))
ID timeToEQ
1 1 3
2 2 2
EDIT. Here is how it works.
ddply breaks Data into subsets based on ID.
diff(CC) computes the difference between CC of successive rows.
abs(diff(CC)) < 0.05 returns TRUE where the difference has stabilized.
Position locates the first instance of an element which satisfies isTRUE.
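The call above flags the first single small change; if, as in the original requirement, the value must stay stable for several consecutive steps, one hedged extension (the helper firstStable is my invention for illustration) is to scan runs of small differences with rle():

require(plyr)

firstStable <- function(cc, tol = 0.05, k = 3) {
  ok <- abs(diff(cc)) < tol                # TRUE where the step-to-step change is small
  r <- rle(ok)                             # runs of TRUE/FALSE
  starts <- cumsum(r$lengths) - r$lengths + 1
  hit <- which(r$values & r$lengths >= k)  # first run of at least k small changes
  if (length(hit) == 0) NA else starts[hit[1]]
}

ddply(Data, .(ID), summarize, timeToEQ = firstStable(CC, k = 3))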

How do I perform a function on each row of a data frame and have just one element of the output inserted as a new column in that row

It is easy to do an exact binomial test on two values, but what happens if one wants to do the test on a whole bunch of numbers of successes and numbers of trials? I created a data frame of test sensitivities and potential numbers of enrollees in a study, and then for each row I calculate how many successes that would be. Here is the code.
sens <- seq(from = 0.1, to = 0.5, by = 0.05)
enroll <- seq(from = 20, to = 200, by = 20)
df <- expand.grid(sens = sens, enroll = enroll)
df <- transform(df, succes = sens * enroll)
But now how do I use each row's combination of successes and number of trials to do the binomial test?
I am only interested in the upper limit of the 95% confidence interval of the binomial test. I want that single number to be added to the data frame as a column called "upper.limit".
I thought of something along the lines of
binom.test(succes,enroll)$conf.int
alas, conf.int gives something such as
[1] 0.1266556 0.2918427
attr(,"conf.level")
[1] 0.95
All I want is just 0.2918427
Furthermore, I have a feeling that there has to be a do.call in there somewhere, and maybe even an lapply, but I do not know how that will go through the whole data frame. Or should I perhaps be using plyr?
Clearly my head is spinning. Please make it stop.
If this gives you (almost) what you want, then try this:
binom.test(succes,enroll)$conf.int[2]
And apply across the board or across the rows as it were:
> df$UCL <- apply(df, 1, function(x) binom.test(x[3],x[2])$conf.int[2] )
> head(df)
sens enroll succes UCL
1 0.10 20 2 0.3169827
2 0.15 20 3 0.3789268
3 0.20 20 4 0.4366140
4 0.25 20 5 0.4910459
5 0.30 20 6 0.5427892
6 0.35 20 7 0.5921885
Here you go:
R> newres <- do.call(rbind, apply(df, 1, function(x) {
+ bt <- binom.test(x[3], x[2])$conf.int;
+ newdf <- data.frame(t(x), UCL=bt[2]) }))
R>
R> head(newres)
sens enroll succes UCL
1 0.10 20 2 0.31698
2 0.15 20 3 0.37893
3 0.20 20 4 0.43661
4 0.25 20 5 0.49105
5 0.30 20 6 0.54279
6 0.35 20 7 0.59219
R>
This uses apply to loop over your existing data, computes the test, and returns the value you want by sticking it into a new (one-row) data.frame. We then glue all those 90 one-row data.frame objects into a new single one with do.call(rbind, ...) over the list we got from apply.
Ah yes, if you just want to directly insert a single column, the other answer rocks as it is simpler. My longer answer shows how to grow or construct a data.frame during the sweep of apply.
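As a further note (my addition, not from either answer): mapply() can do the same row-wise sweep without apply()'s coercion of the whole data frame to a single type, which matters once the frame has mixed column classes:

# a sketch reusing the df built in the question
df$upper.limit <- mapply(function(s, n) binom.test(s, n)$conf.int[2],
                         df$succes, df$enroll)
head(df)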
