Dice rolling probability matrix with a D20 (20-sided die) in R

I am trying to run dice rolling simulations in R. In my project I am using the Dungeons & Dragons D20, a 20-sided die used in tabletop games.
One of the features of the game is that when a player is attacking or casting spells, they roll the D20 and the outcome of the roll determines success. Players may have positive or negative modifiers to add to the result of the roll. I am trying to create a probability table of a player rolling a particular result based on their modifier. This is what I have so far:
D20_prob <- function(d, nreps) {
  roll <- matrix(sample(20, size = d * nreps, replace = TRUE), ncol = nreps)
  result_tot <- roll + 1  # adds the modifier (+1 in this example)
  return(prop.table(table(result_tot)))
}
D20_prob(1, 100)
And it results in something like this:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
0.02 0.04 0.04 0.02 0.10 0.08 0.05 0.05 0.05 0.03 0.04 0.04 0.03 0.06 0.05 0.05 0.08 0.04 0.06 0.07
In the game of D&D, modifiers can range from -5 to +5. Can anyone help me modify my code so I can create tables for modifiers ranging from -5 to +5?

I think the function you want is
sample.int(20, size = 10, replace = TRUE)  # ten rolls of a D20
It really depends on what you want to simulate.
If you want to see how probable a result is given a certain number of rolls (e.g. whether one modifier is better than another given the damage you deal over the next twenty hours of play), use the function above and simulate as many rolls as you can (say 1000 or more).
The second option is that you want the probability of a result in general (e.g. if I need to roll 18 or above, what is the chance of rolling that). In that case the probability of rolling any particular number is 1/20, since you have a D20.
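For that second case no simulation is needed; the exact probability is a one-liner (a trivial sketch, reading "roll above 18" as 18 or higher; adjust the comparison if you mean strictly above):
# exact chance of rolling 18 or higher on a D20
sum(1:20 >= 18) / 20
# [1] 0.15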
I once did a similar thing: my simulator used random rolls to show how likely a given combination was to succeed, to give players an idea of how their modifiers would work out in the long run.
But in that case be aware that computer-generated random numbers are never truly random (some packages are better than others).
Edit:
To loop through the modifiers you can use a for loop:
for (modifier in -5:5) {
  result_tot <- prop.table(table(roll + modifier))
}
The problem is that I have never worked with data.tables, and I cannot seem to merge the different tables this generates: in the example above, each iteration of the loop overwrites result_tot. So you need to print each table inside the loop, or find a way to collect the results (see the sketch below).
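One way to collect everything is a named list, with one probability table per modifier (a minimal sketch building on the code above, not part of the original answer; the list names are the modifiers):
set.seed(1)  # for reproducibility
roll <- sample(20, 1e4, replace = TRUE)
# one probability table per modifier, kept in a named list
result_tot <- lapply(setNames(-5:5, -5:5),
                     function(modifier) prop.table(table(roll + modifier)))
result_tot[["-5"]]  # the table for a -5 modifier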


In R, how to take the mean of a varying number of elements for each row in a data frame?

So I have a dataframe, PVALUES, like this:
PVALUES <- read.csv(textConnection("PVAL1 PVAL2 PVAL3
0.1 0.04 0.02
0.9 0.001 0.98
0.03 0.02 0.01"),sep = " ")
That corresponds to another dataframe, DATA, like this:
DATA <- read.csv(textConnection("COL1 COL2 COL3
10 2 9
11 20 200
2 3 5"),sep=" ")
For every row in DATA, I'd like to take the mean of the numbers whose positions correspond to entries in PVALUES that are <= 0.05.
So, for example, the first row of PVALUES has only two entries <= 0.05, at [1,2] and [1,3]. Therefore, for the first row of DATA, I want to take the mean of 2 and 9.
In the second row of PVALUES, only the entry at [2,2] is <= 0.05, so instead of taking a mean for the second row of DATA, I would just use DATA[2,2] (which is 20).
So, my output would look like:
MEANS
6.5
20
3.33
I thought I might be able to generate indices for every entry in PVALUES <=0.05, and then use that to select entries in DATA to use for the mean. I tried to use this command to generate indices:
exp <- which(PVALUES[,]<=0.05, arr.ind=TRUE)
...but it only picks up indices for entries in the first column that are <= 0.05. In my example above, it would only output [3,1].
Can anyone see what I'm doing wrong, or have ideas on how to tackle this problem?
Thank you!
It's a bit funny looking, but this should work
rowMeans(`is.na<-`(DATA,PVALUES>=.05), na.rm=T)
The "ugly" part is calling is.na<- without doing the automatic replacement, but here we just set all data with p-values larger than .05 to missing and then take the row means.
It's unclear to me exactly what you were doing with exp, but that type of method could work as well. Maybe with
expx <- which(PVALUES[,] <= 0.05, arr.ind = TRUE)
aggregate(val ~ row, data.frame(expx, val = DATA[expx]), mean)
(renamed so as not to interfere with the built in exp() function)
Tested with
PVALUES<-read.table(text="PVAL1 PVAL2 PVAL3
0.1 0.04 0.02
0.9 0.001 0.98
0.03 0.02 0.01", header=T)
DATA<-read.table(text="COL1 COL2 COL3
10 2 9
11 20 200
2 3 5", header=T)
I usually enjoy MrFlick's responses, but using is.na<- in that manner seems to violate my expectations of R code, because it looks like it destructively modifies the data. I admit that I probably should have been expecting that possibility because of the assignment arrow, but it surprised me nonetheless. (I don't object to data.table code, because it is honest and forthright about modifying its contents with the := function.) I also admit that my efforts to improve on it led me down a rabbit hole where I found this equally "baroque" effort. (You have incorrectly averaged 2 and 9.)
sapply(split(DATA[which(PVALUES <= 0.05, arr.ind = TRUE)],
             which(PVALUES <= 0.05, arr.ind = TRUE)[, "row"]),
       mean)
1 2 3
5.500000 20.000000 3.333333

R comparing unequal vectors with inequality

I have two single-column data frames of unequal length:
aa<-data.frame(c(2,12,35))
bb<-data.frame(c(1,2,3,4,5,6,7,15,22,36))
For each observation in aa, I want to count the number of instances in which bb is less than aa.
My result:
bb<aa
1 1
2 7
3 9
I have been able to do it in two ways, by creating a function and using apply, but my datasets are large and I let one version run all night without it finishing.
What I have:
fun1 <- function(a, b) {
  k <- colSums(b < a)
  k <- k * .000058242
}
system.time(replicate(5000,data.frame(apply(aa,1,fun1,b=bb))))
user system elapsed
3.813 0.011 3.883
Secondly,
fun2 <- function(a, b) {
  k <- length(which(b < a))
  k <- k * .000058242
}
system.time(replicate(5000,data.frame(apply(aa,1,fun2,b=bb))))
user system elapsed
3.648 0.006 3.664
The second function is slightly faster in all my tests, but it was the first one that I let run all night on a dataset where bb > 1.7M and aa > 160K.
I found this post and have tried using with(), but cannot seem to get it to work; I also tried a for loop without success.
Any help or direction is appreciated.
Thank you!
aa<-data.frame(c(2,12,35))
bb<-data.frame(c(1,2,3,4,5,6,7,15,22,36))
sapply(aa[[1]],function(x)sum(bb[[1]]<x))
# [1] 1 7 9
Some more realistic examples:
n <- 1.6e3
bb <- sample(1:n,1.7e6,replace=T)
aa <- 1:n
system.time(sapply(aa,function(x)sum(bb<x)))
# user system elapsed
# 14.63 2.23 16.87
n <- 1.6e4
bb <- sample(1:n,1.7e6,replace=T)
aa <- 1:n
system.time(sapply(aa,function(x)sum(bb<x)))
# user system elapsed
# 148.77 18.11 167.26
So with length(aa) = 1.6e4 this takes about 2.5 min (on my system), and the time scales linearly with length(aa) - no surprise there. Therefore, with your full dataset, it should run in about 25 min. Still kind of slow. Maybe someone else will come up with a better way.
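For what it's worth, a much faster approach (a sketch, not from the original answer; requires R >= 3.2.0 for left.open): sort bb once and let findInterval() do a binary search for each element of aa instead of a full scan. left.open = TRUE makes the comparison strict, i.e. it counts bb < x rather than bb <= x:
bb_sorted <- sort(bb)
counts <- findInterval(aa, bb_sorted, left.open = TRUE)
Since this does one sort plus one binary search per element of aa, it should finish in seconds even on the full-sized data above.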
In my original post I had been looking for the number of times bb < aa.
So in my example
aa<-data.frame(c(2,12,35))
bb<-data.frame(c(1,2,3,4,5,6,7,15,22,36))
x<-ecdf(bb[,1])
x(2)
[1] 0.2
x(12)
[1] 0.7
x(35)
[1] 0.9
To get the answers in my original post I would need to multiply by the number of data points in bb, in this instance 10. The first value does not match, though, because in my original post I had asked for bb < aa (a strict inequality), whereas ecdf() counts bb <= aa.
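To recover the counts, multiply the ecdf fractions by the number of points (a quick check using the objects above):
x(2)  * nrow(bb)   # 2 -- counts bb <= 2; the strict count bb < 2 is 1
x(12) * nrow(bb)   # 7
x(35) * nrow(bb)   # 9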
I am dealing with large datasets of land elevation and water elevation, over 1 million data points each, and in the end I am creating an inundation curve: I want to know how much land will be inundated at water levels with a given exceedance probability.
Using the ecdf() function on all 1 million data points would still be time consuming, but I realized I do not need all the data points, just enough to create my curve.
So I applied the ecdf() function to the entire land dataset, but then created an elevation sequence for the water that was large enough to draw the curve I needed, yet small enough to compute rapidly.
land_elevation <- data.frame(rnorm(1e6))
water_elevation<- data.frame(rnorm(1e6))
cdf_land<- ecdf(land_elevation[,1])
elevation_seq <- seq(from = min(water_elevation[,1]), to = max(water_elevation[,1]), length.out = 1000)
land <- sapply(elevation_seq, cdf_land)
My results are the same, but they are much faster.
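Since the goal is an inundation curve, the result can be plotted directly (a minimal sketch using the objects above):
# fraction of land below each water elevation
plot(elevation_seq, land, type = "l",
     xlab = "water elevation", ylab = "fraction of land inundated")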

How to organize scores that belong to the same condition in R

I've been searching for hours for a solution to my problem, but since I'm new to R and programming, I haven't really got the terminology down well enough to effectively search online for help.
Below is a simplified version of the data I am working with. In the full data there are close to 200 different items, and 24 subjects.
I need to be able to work with the data in terms of which "item" the scores belong with.
For example, I would like to be able to perform basic operations such as calculating the mean of all the First scores on Item 3, or of all the Second scores on Item 2, etc.
How should I approach this? Thanks!
Subject Item First score Second score
1 1 0.92 0.58
1 2 1.00 1.00
1 3 1.00 0.69
2 1 0.90 0.58
2 2 0.95 0.90
2 3 1.00 0.92
You could also use split()
FirstScore <- c(0.92,1.00,1.00,0.90,0.95,1.00)
Item <- rep(1:3,2)
FirstScoreByItem <- split(FirstScore, as.factor(Item))
To access the scores for each item, use
FirstScoreByItem[[1]]
To calculate the mean, use
mean(FirstScoreByItem[[1]])

compare row values over multiple rows (R)

I don't think this question has been asked yet (most similar questions are about extracting data or returning a count). I am new to R, so any help would be appreciated!
I have a dataset of multiple runs of an experiment in one file, with all the time steps for each run in rows, laid out like this:
time [info] id (unique per run)
I am attempting to calculate when the system reaches equilibrium, which I am defining as stable values in 3 interdependent parameters. I would like the contents of rows to be compared, and if they are within 5% of each other over 20 time steps, to return the time step at which the stability begins, along with the id.
So far, I'm thinking it will be something like the following (or maybe a while loop), still in rough pseudocode:
y <- 1
z <- 0  # variables to control the loop
x <- 0
for (id in ids) {
  if (abs(CC[x] - CC[y]) <= 0.05) {
    if (z <= 20) {   # counts the number of periods that match
      y <- y + 1
      z <- z + 1
    } else {
      # save the value in a column
    }
  } else {           # no match for a sustained period, so start over
    x <- x + 1
    y <- x + 1
    z <- 0
  }
}
Edited to add: CC is one of my parameters of interest; it ranges between 0 and 1, although the endpoints are unlikely.
Here's a simple example that might help: this is something like how my data looks:
zz <- textConnection("time CC ID
1 0.99 1
2 0.80 1
3 0.90 1
4 0.91 1
5 0.92 1
6 0.91 1
1 0.99 2
2 0.90 2
3 0.90 2
4 0.91 2
5 0.92 2
6 0.91 2")
Data <- read.table(zz, header = TRUE)
close(zz)
My question is: how can I run through the rows to find out when the value of CC becomes 'stable' (meaning it doesn't change by more than 0.05 over X (here, 3) time steps), so that it would produce the following result:
ID timeToEQ
1 1 3
2 2 2
Does this help? The only way I can think of to do this is with a for loop, and I suspect there must be an easier way!
Here is my code; I will post the explanation shortly.
require(plyr)
ddply(Data, .(ID), summarize, timeToEQ = Position(isTRUE, abs(diff(CC)) < 0.05 ))
ID timeToEQ
1 1 3
2 2 2
EDIT. Here is how it works.
ddply breaks Data into subsets based on ID.
diff(CC) computes the difference between CC of successive rows.
abs(diff(CC)) < 0.05 returns TRUE where the difference has stabilized.
Position locates the first instance of an element which satisfies isTRUE.
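The answer above flags the first single small change; if equilibrium should instead require the value to stay stable over several consecutive time steps, the same idea can be extended with rle(). A base-R sketch (stable_start is a hypothetical helper, not part of the answer above):
stable_start <- function(cc, tol = 0.05, k = 2) {
  ok <- abs(diff(cc)) < tol            # TRUE where successive values are close
  r  <- rle(ok)                        # runs of TRUE/FALSE
  i  <- which(r$values & r$lengths >= k)[1]
  if (is.na(i)) return(NA)             # never stabilizes
  sum(r$lengths[seq_len(i - 1)]) + 1   # time step where the stable run begins
}
sapply(split(Data$CC, Data$ID), stable_start)
# 1 2
# 3 2
Raise k to demand a longer stable run (the question mentions 20 time steps for the real data).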

How do I perform a function on each row of a data frame and have just one element of the output inserted as a new column in that row

It is easy to do an exact binomial test on two values, but what happens if one wants to do the test on a whole collection of numbers of successes and numbers of trials? I created a data frame of test sensitivities and potential numbers of enrollees in a study, and then for each row I calculate how many successes that would be. Here is the code.
sens <- seq(from = 0.1, to = 0.5, by = 0.05)
enroll <- seq(from = 20, to = 200, by = 20)
df <- expand.grid(sens = sens, enroll = enroll)
df <- transform(df, succes = sens * enroll)
But now how do I use each row's combination of successes and number of trials to do the binomial test?
I am only interested in the upper limit of the 95% confidence interval of the binomial test. I want that single number added to the data frame as a column called "upper.limit".
I thought of something along the lines of
binom.test(succes,enroll)$conf.int
alas, conf.int gives something such as
[1] 0.1266556 0.2918427
attr(,"conf.level")
[1] 0.95
All I want is just 0.2918427
Furthermore, I have a feeling that there has to be a do.call in there somewhere, and maybe even an lapply, but I do not know how that would run through the whole data frame. Or should I perhaps be using plyr?
Clearly my head is spinning. Please make it stop.
If this gives you (almost) what you want, then try this:
binom.test(succes,enroll)$conf.int[2]
And apply across the board or across the rows as it were:
> df$UCL <- apply(df, 1, function(x) binom.test(x[3],x[2])$conf.int[2] )
> head(df)
sens enroll succes UCL
1 0.10 20 2 0.3169827
2 0.15 20 3 0.3789268
3 0.20 20 4 0.4366140
4 0.25 20 5 0.4910459
5 0.30 20 6 0.5427892
6 0.35 20 7 0.5921885
Here you go:
R> newres <- do.call(rbind, apply(df, 1, function(x) {
+ bt <- binom.test(x[3], x[2])$conf.int;
+ newdf <- data.frame(t(x), UCL=bt[2]) }))
R>
R> head(newres)
sens enroll succes UCL
1 0.10 20 2 0.31698
2 0.15 20 3 0.37893
3 0.20 20 4 0.43661
4 0.25 20 5 0.49105
5 0.30 20 6 0.54279
6 0.35 20 7 0.59219
R>
This uses apply to loop over your existing data, computes the test, and returns the value you want inside a new (one-row) data.frame. We then glue all 90 of those data.frame objects into a single one with do.call(rbind, ...) over the list we got back from apply.
Ah yes, if you just want to insert a single column directly, the other answer rocks, as it is simpler. My longer answer shows how to grow or construct a data.frame during the sweep of apply.
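For the record, a third variant (a sketch, not from either answer): mapply() passes the two columns directly as vectors, which avoids apply()'s coercion of the whole data frame to a matrix, and writes straight into the upper.limit column the question asked for:
df$upper.limit <- mapply(function(s, n) binom.test(s, n)$conf.int[2],
                         df$succes, df$enroll)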
