do.call to build and execute data.table commands - r

I have a small data.table with one record per test cell (AB testing results) and want to add several more columns that compare each test cell against each other test cell. In other words, the number of columns I need to add depends on how many test cells are in the AB test in question.
My data.table looks like:
Group Delta SD.diff
Control 0 0
Cell1 0.00200 0.001096139
Cell2 0.00196 0.001095797
Cell3 0.00210 0.001096992
Cell4 0.00160 0.001092716
And I want to add the following columns (numbers are trash here):
Group v.Cell1 v.Cell2 v.Cell3 v.Cell4
Control 0.45 0.41 0.45 0.41
Cell1 0.50 0.58 0.48 0.66
Cell2 0.58 0.50 0.58 0.48
Cell3 0.48 0.58 0.50 0.70
Cell4 0.66 0.48 0.70 0.50
I am sure that do.call is the way to go, but I can't work out how to embed one do.call inside another to generate the script... and I can't work out how to then execute the scripts (20 lines in total). The closest I have got so far is:
a <- do.call("paste",
             c("test.1.results <- mutate(test.1.results, P.Better.",
               list(unlist(test.1.results[, Group])),
               " = pnorm(Delta, test.1.results['",
               list(unlist(test.1.results[, Group])),
               "'][,Delta], SD.diff, lower.tail=TRUE))",
               sep = ""))
Which produces 5 script lines like:
test.1.results <- mutate(test.1.results, P.Better.Cell2 = pnorm(Delta, test.1.results['Cell2'][,Delta], SD.diff,lower.tail=TRUE))
Which only compares each test cell's results against itself: a 0.50 result (difference due to chance). No use whatsoever, as I need each test cell compared to each other.
Not sure where to go with this one.

Update: In v1.8.11, FR #2077 is now implemented: set() can now add columns by reference. From NEWS:
set() is able to add new columns by reference now. For example, set(DT, i=3:5, j="bla", 5L) is equivalent to DT[3:5, bla := 5L]. This was FR #2077. Tests added.
Tasks like this are often easier with set(). To demonstrate, here's a translation of what you have in the question (untested). But I realise you want something different from what you've posted (which I don't quite follow at a quick read).
for (i in paste0("Cell", 1:4)) {
  set(DT,                          # the data.table to update/add a column to by reference
      i = NULL,                    # no row subset; NULL is the default anyway
      j = paste0("P.Better.", i),  # column name or position; must be a name when adding
      value = pnorm(DT$Delta, DT[i][, Delta], DT$SD.diff, lower.tail = TRUE))
}
Note that you can assign just a subset of a new column and the rest will be filled with NA, both with := and set().
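To make this concrete, here is a small self-contained sketch of the set() approach (values are illustrative and the P.Better.* column names just follow the question's naming; it assumes the table is keyed by Group so that DT[g] looks rows up by group name):

```r
library(data.table)

# Illustrative data resembling the question's table
DT <- data.table(Group   = c("Control", "Cell1", "Cell2"),
                 Delta   = c(0, 0.00200, 0.00196),
                 SD.diff = c(0, 0.001096139, 0.001095797))
setkey(DT, Group)  # lets DT[g] look up rows by Group

# For each non-control group g, add a column comparing every row's Delta
# against g's Delta, scaled by each row's SD.diff
for (g in setdiff(DT$Group, "Control")) {
  set(DT,
      i = NULL,                    # all rows
      j = paste0("P.Better.", g),  # new column name
      value = pnorm(DT$Delta, mean = DT[g, Delta], sd = DT$SD.diff,
                    lower.tail = TRUE))
}
```

Each cell's column against itself comes out as 0.5 (the diagonal in the desired output), while the off-diagonal entries compare the different cells, which is the each-vs-each table the question asks for.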

Related

Dice Rolling Probability matrix with a D20(20 sided-dice)

I am trying to run dice-rolling simulations in R. In my project I am using the Dungeons and Dragons D20, a 20-sided die used in tabletop games.
One of the features of the game is that when a player is attacking or casting spells, they roll the D20 and the outcome of the roll determines success. Players may have positive or negative modifiers to add to the sum of the roll. I am trying to create a table of the probabilities of a player rolling a particular result based on their modifier. This is what I have so far:
D20_prob = function(d, nreps) {
  roll = matrix(sample(20, size = d * nreps, replace = T), ncol = nreps)
  result_tot = (roll + 1) # adds a modifier of +1
  return(prop.table(table(result_tot)))
}
D20_prob(1, 100)
And it results in something like this:
2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
0.02 0.04 0.04 0.02 0.10 0.08 0.05 0.05 0.05 0.03 0.04 0.04 0.03 0.06 0.05 0.05 0.08 0.04 0.06 0.07
In the game of D&D modifiers can range from -5 to +5. Can any one help me modify my code so I can create a table that ranges -5 to +5?
I think the function you want is
sample.int(20, size = 10, replace = TRUE)  # e.g. ten rolls of a d20
It really depends what you want to simulate.
If you want to see how probable a result is given a certain number of rolls (e.g. "if I use this modifier, is it better than another, given the damage I do in the next twenty hours of play?"), you would use the above function and simulate as many rolls as you can (e.g. 1000 or more).
The second option is that you want the probability of a result in general (e.g. "if I need to roll above 18, what is the chance of rolling that?"). In that case the probability of rolling any given number is 1/20, since you have a D20.
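For that second case, the arithmetic can be wrapped in a tiny helper (p_hit is a hypothetical name, not from the post): the chance of d20 + modifier meeting a target is just the count of qualifying faces divided by 20.

```r
# Exact chance that a d20 roll plus a modifier meets or beats a target number.
# p_hit() is an illustrative helper, not part of the original post.
p_hit <- function(target, modifier) {
  faces <- 1:20
  mean(faces + modifier >= target)  # fraction of the 20 faces that succeed
}

p_hit(18, 0)  # natural 18, 19 or 20: 3/20 = 0.15
p_hit(18, 5)  # with a +5 modifier, 13+ succeeds: 8/20 = 0.40
```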
I once did a similar thing: my simulator used random rolls to see how likely a combination is to succeed, to give players an idea of how their modifiers would work out in the long run.
But in that case, be aware that computer-generated random numbers are never truly random (there are better and worse packages for this).
Edit:
To loop through the damages you can use a for loop:
for (modifier in -5:5){
  result_tot = prop.table(table(roll + modifier))
}
The problem is that I have never worked with data.tables, and I could not merge all the different tables that get generated. In the example above, each loop iteration overwrites result_tot, so you need to print it inside the loop or find a way to merge the resulting tables.
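One way around the overwriting, sketched here under the assumption that the original D20_prob() is given an extra modifier argument, is to collect one probability table per modifier in a named list instead of a single variable:

```r
# Sketch: extend the original function with a modifier argument, then
# collect one probability table per modifier in a named list
D20_prob <- function(d, nreps, modifier = 0) {
  roll <- matrix(sample(20, size = d * nreps, replace = TRUE), ncol = nreps)
  prop.table(table(roll + modifier))
}

set.seed(1)  # reproducible rolls
tables <- lapply(setNames(-5:5, paste0("mod", -5:5)),
                 function(m) D20_prob(1, 100, modifier = m))

tables[["mod-5"]]  # the probability table for a -5 modifier
```

Nothing gets overwritten: each of the 11 modifiers (-5 to +5) keeps its own table, accessible by name.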

Creating one output table for multiple results from different functions in R

I was searching for a solution to my issue but failed, so I'm writing here. At the moment I've created a fairly large set of functions in R, around 500 lines of code in each of 7 scripts, producing the output of the calculations in a standardized form:
>getFunction1(Key_ID)
#output
calc1 calc2
category1 0.00 0.3
category2 0.06 0.2
category3 0.00 0.3
>getFunction2(Key_ID)
#output
calc1 calc2
category1 0.10 0.1
category2 0.02 0.3
category3 0.01 0.3
(...)
>getFunction7(Key_ID)
#output
calc1 calc2
category1 0.20 0.15
category2 0.04 0.4
category3 0.02 0.35
The functions within the 7 scripts are the same from a structural point of view but contain different calculations, depending on the category of the function (the specific use case) and, inside each script, the category of the calculation. Everything works perfectly, but I'm not able to create one coherent table storing the calculations coming from all of getFunction1-7(Key_ID) in tabular form, like this:
Key_ID|Name_Function(refering to use case)|Category(whithin the function)|Cal1|Cal2|
Key_ID is the crucial part, because this is the ID that allows joining the new calculations with the previous ones in the database. I cannot simply create a tabular structure like:
tableFunction1 <- as.data.frame(getFunction1(Key_ID))
because depending on the Key_ID I can receive different scores for the same subject of calculation, since Key_ID refers to different objects with different attributes included in the calculations.
Have you ever faced this kind of issue? Any advice for me?
Thanks a lot for any help.
Aleksandra
If the output from each function is a data frame containing categories and calculations, you need a way to add KEY_ID and the name of the function that produced the calculations. Since the OP didn't include logic of any of the functions being discussed, we'll assume that the current output is a data frame containing the columns category, calc1, and calc2.
First, since KEY_ID is passed as an argument into each of the seven functions, you can create a column in the output data frame containing this value.
# assuming output frame has already been created, add id column
# by using rep() to repeat the value of the KEY_ID argument
output$id <- rep(KEY_ID,nrow(output))
Second, you'll want to similarly add the name of the current function as a column in your output data frame. This can be accomplished with match.call(), converting the call name to character so it can be stored in a column.
# add name of current function to output data frame
output$name_function <- rep(as.character(match.call()[[1]]), nrow(output))
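Putting the two steps together, here is a hedged sketch of one way to stack all the outputs into the requested table. getFunction1/getFunction2 below are stand-ins (the real function bodies weren't posted), and the collect() wrapper name is mine:

```r
# Stand-ins for the real scoring functions, each returning a data frame
# with category row names and columns calc1, calc2
getFunction1 <- function(Key_ID) data.frame(calc1 = c(0, 0.06), calc2 = c(0.3, 0.2),
                                            row.names = c("category1", "category2"))
getFunction2 <- function(Key_ID) data.frame(calc1 = c(0.1, 0.02), calc2 = c(0.1, 0.3),
                                            row.names = c("category1", "category2"))

# Wrap one function call so its output carries Key_ID, function name, category
collect <- function(fun, fun_name, Key_ID) {
  out <- fun(Key_ID)
  data.frame(Key_ID        = Key_ID,
             Name_Function = fun_name,
             Category      = rownames(out),
             out,
             row.names = NULL)
}

# Stack the tagged outputs of all functions into one table
funs <- list(getFunction1 = getFunction1, getFunction2 = getFunction2)
all_results <- do.call(rbind, Map(collect, funs, names(funs), Key_ID = 42))
```

With the real seven functions in the list, all_results would have exactly the Key_ID | Name_Function | Category | Calc1 | Calc2 shape asked for, ready to join back to the database on Key_ID.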

For loops in r to create subsets

I am trying to create a subset from an existing data frame where the variable "Readings" displays values which are greater than the previous reading, as well as the corresponding row entry for the "Time" variable.
The code I have written below only produces "NA" entries.
Data$Readings <- 0
for (i in 1:nrow(Data)){
  Pos.Readings <- Data[Data$Readings[i+1] > Data$Readings[i], ]
}
Pos.Readings
I would like the new data frame to display the row entries for i and i+1 if i+1>i in the Readings variable.
Here is an example of the data
Time Readings
12:00:00 0.1
12:00:01 0.3
12:00:02 0.45
12:00:03 0.2
12:00:04 0.02
12:00:05 -0.7
12:00:06 -0.25
12:00:07 0.27
So, what I am aiming for should look like:
Time Readings
12:00:00 0.1
12:00:01 0.3
12:00:02 0.45
12:00:05 -0.7
12:00:06 -0.25
12:00:07 0.27
I have probably gone about writing the for loop incorrectly, but I hope my intentions are clear to all.
It looks like you care about the absolute value of readings being greater than the previous. If that is the case, try this:
comparisons <- Data$Readings[-nrow(Data)]
Data$prevReading <- 0 #or just a really small number that automatically keeps row 1
Data$prevReading[-1] <- comparisons
subsetData <- subset(Data, abs(prevReading) < abs(Readings))
subsetData <- subsetData[c("Time", "Readings")]
If you wanted the actual readings being compared and not the absolute values, just get rid of the two abs() commands when you subset.
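If loops turn out not to be needed at all, here is a vectorized sketch of the literal "keep rows i and i+1 whenever Readings[i+1] > Readings[i]" rule, rebuilt on the question's example data:

```r
# The question's example data
Data <- data.frame(
  Time     = sprintf("12:00:0%d", 0:7),
  Readings = c(0.1, 0.3, 0.45, 0.2, 0.02, -0.7, -0.25, 0.27)
)

up   <- diff(Data$Readings) > 0               # TRUE where row i+1 increased over row i
keep <- sort(unique(c(which(up), which(up) + 1)))  # rows i and i+1 of each increase
Data[keep, ]
```

On this data the kept rows are 12:00:00 through 12:00:02 and 12:00:05 through 12:00:07, matching the desired output in the question.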

In R, how to take the mean of a varying number of elements for each row in a data frame?

So I have a dataframe, PVALS, like this:
PVALS <- read.csv(textConnection("PVAL1 PVAL2 PVAL3
0.1 0.04 0.02
0.9 0.001 0.98
0.03 0.02 0.01"),sep = " ")
That corresponds to another dataframe, DATA, like this:
DATA <- read.csv(textConnection("COL1 COL2 CO3
10 2 9
11 20 200
2 3 5"),sep=" ")
For every row in DATA, I'd like to take the mean of the numbers whose indices correspond to entries in PVALS that are <= 0.05.
So, for example, the first row of PVALS has only two entries <= 0.05, at [1,2] and [1,3]. Therefore, for the first row of DATA, I want to take the mean of 2 and 9.
In the second row of PVALS, only the entry at [2,2] is <= 0.05, so instead of taking a mean for the second row of DATA, I would just use DATA[2,2] (the value 20).
So, my output would look like:
MEANS
6.5
20
3.33
I thought I might be able to generate indices for every entry in PVALS <= 0.05, and then use those to select the entries in DATA to average. I tried this command to generate the indices:
exp <- which(PVALS <= 0.05, arr.ind = TRUE)
...but it only seems to pick up indices for entries in the first column that are <= 0.05. In my example above, it would only output [3,1].
Can anyone see what I'm doing wrong, or have ideas on how to tackle this problem?
Thank you!
It's a bit funny looking, but this should work:
rowMeans(`is.na<-`(DATA, PVALS > 0.05), na.rm = TRUE)
The "ugly" part is calling `is.na<-` directly rather than through the usual replacement syntax, but here we just set all values whose p-values are larger than 0.05 to missing and then take the row means.
It's unclear to me exactly what you were doing with exp, but that type of method could work as well. Maybe with
expx <- which(PVALS <= 0.05, arr.ind = TRUE)
aggregate(val ~ row, data.frame(expx, val = DATA[expx]), mean)
(renamed so as not to interfere with the built-in exp() function; note that the index object is expx in both lines)
Tested with
PVALS <- read.table(text = "PVAL1 PVAL2 PVAL3
0.1 0.04 0.02
0.9 0.001 0.98
0.03 0.02 0.01", header = TRUE)
DATA <- read.table(text = "COL1 COL2 CO3
10 2 9
11 20 200
2 3 5", header = TRUE)
I usually enjoy MrFlick's responses, but the use of `is.na<-` in that manner seems to violate my expectations of R code, because it destructively modifies the data. I admit that I probably should have been expecting that possibility because of the assignment arrow, but it surprised me nonetheless. (I don't object to data.table code, because it is honest and forthright about modifying its contents with the := function.) I also admit that my efforts to improve on it led me down a rabbit hole where I found this equally "baroque" effort. (Note that you have incorrectly averaged 2 and 9: their mean is 5.5, not 6.5.)
sapply(split(DATA[which(PVALS <= 0.05, arr.ind = TRUE)],
             which(PVALS <= 0.05, arr.ind = TRUE)[, "row"]),
       mean)
1 2 3
5.500000 20.000000 3.333333
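For completeness, the same row means can also be computed without any replacement-function tricks: replace() returns a modified copy, leaving DATA untouched. Rebuilt on the question's data:

```r
PVALS <- read.table(text = "PVAL1 PVAL2 PVAL3
0.1 0.04 0.02
0.9 0.001 0.98
0.03 0.02 0.01", header = TRUE)
DATA <- read.table(text = "COL1 COL2 CO3
10 2 9
11 20 200
2 3 5", header = TRUE)

# Copy-based variant: blank out entries whose p-value exceeds 0.05,
# then average what's left in each row. DATA itself is not modified.
means <- rowMeans(replace(DATA, PVALS > 0.05, NA), na.rm = TRUE)
means  # 5.5, 20, 3.333...
```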

How to organize scores that belong to same condition in R

I've been searching for hours for a solution to my problem, but since I'm new to R and programming, I haven't really got the terminology down well enough to effectively search online for help.
Below is a simplified version of the data I am working with. In the full data there are close to 200 different items, and 24 subjects.
I need to be able to work with the data in terms of which "item" the scores belong with.
For example, I would like to be able to perform basic functions such as calculate the means for all the First scores on Item 3, or all the Second scores for Item 2 etc.
How should I approach this? Thanks!
Subject Item First score Second score
1 1 0.92 0.58
1 2 1.00 1.00
1 3 1.00 0.69
2 1 0.90 0.58
2 2 0.95 0.90
2 3 1.00 0.92
You could also use split()
FirstScore <- c(0.92,1.00,1.00,0.90,0.95,1.00)
Item <- rep(1:3,2)
FirstScoreByItem <- split(FirstScore, as.factor(Item))
To access the scores for each item, use
FirstScoreByItem[[1]]
To calculate the mean, use
mean(FirstScoreByItem[[1]])
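To get the means for all items and both score columns in one step, aggregate() works on the full data frame. The frame below just re-creates the question's sample data, with the column names shortened to First and Second:

```r
# The question's sample data, with shortened score column names
scores <- data.frame(
  Subject = c(1, 1, 1, 2, 2, 2),
  Item    = c(1, 2, 3, 1, 2, 3),
  First   = c(0.92, 1.00, 1.00, 0.90, 0.95, 1.00),
  Second  = c(0.58, 1.00, 0.69, 0.58, 0.90, 0.92)
)

# One row per Item, with the mean of each score column across subjects
item_means <- aggregate(cbind(First, Second) ~ Item, data = scores, FUN = mean)
item_means
```

This scales directly to the full data (200 items, 24 subjects) with no extra code.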
