I have a problem with the sample function: I get an "incorrect number of probabilities" error. Can I specify the probabilities in another way? I don't know whether this function works on intervals.
OL_x = c(15.0:47.0,0.0:15.0,47:80,80:105)
x = sample(OL_x,1000,replace = TRUE,prob = c(0.60,0.22,0.13,0.05) )+ runif(1000,0,1)
You need a probability associated with each value. I don't know a way to assign a probability to an interval, so doing it "by hand" could look like this:
probs = c(rep(0.60, 48-15), rep(0.22,16-0), rep(0.13, 81-47), rep(0.05, 106-80))
x = sample(OL_x, 1000, replace = TRUE, prob = probs) + runif(1000,0,1)
This is not very efficient because you need to compute the size of each interval by hand; there are probably better ways of doing this.
The prob argument must either be left as NULL (equal probabilities) or contain exactly one value for each element of x. OL_x is a vector with 109 elements, since the : integer sequence operator expands out your values. I'm not quite sure what you are trying to create, but if you are after 1000 values drawn from the values presented with the probabilities described, try:
# keep groups separate as a list
OL_x = list(15.0:47.0,0.0:15.0,47:80,80:105)
# number of values in each group
vapply(X = OL_x, FUN = length, FUN.VALUE = 0L)
# [1] 33 16 34 26
# create 109 probabilities
rep(c(0.60,0.22,0.13,0.05), times = vapply(X = OL_x, FUN = length, FUN.VALUE = 0L))
# [1] 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60
# [14] 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.60
# [27] 0.60 0.60 0.60 0.60 0.60 0.60 0.60 0.22 0.22 0.22 0.22 0.22 0.22
# ...
# create 1000 samples
x = sample(
  x = unlist(OL_x),
  size = 1000,
  replace = TRUE,
  prob = rep(c(0.60,0.22,0.13,0.05),
             times = vapply(X = OL_x, FUN = length, FUN.VALUE = 0L))
) + runif(1000,0,1)
head(x)
# [1] 18.826530 36.948981 15.366685 5.142625 47.659682 14.946690
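Both answers give every single value the weight of its interval, and sample() renormalises the weights, so an interval's share of the draws also depends on how many values it contains (the 0:15 interval gets about 3.52/29.04, roughly 12% of the draws, rather than 22%). If the intent is instead that each interval as a whole has the stated probability, one option is to pick an interval first and then a value inside it. A minimal sketch, assuming the bounds below are what OL_x was meant to describe (upper bounds are the last integer + 1, mirroring the + runif(1000, 0, 1) in the question):

```r
# Sketch: each interval as a whole gets the stated probability; a value is then
# drawn uniformly inside the chosen interval.
set.seed(42)
lower <- c(15, 0, 47, 80)      # first value of each interval in OL_x
upper <- c(48, 16, 81, 106)    # last value + 1, mirroring "+ runif(1000, 0, 1)"
p     <- c(0.60, 0.22, 0.13, 0.05)

n   <- 1000
grp <- sample(seq_along(p), n, replace = TRUE, prob = p)  # interval index per draw
x   <- runif(n, min = lower[grp], max = upper[grp])        # uniform within that interval
round(prop.table(table(grp)), 2)   # observed share of draws per interval
```

Within an interval this is equivalent to picking an integer uniformly and adding runif(0, 1); only the interval-level weighting differs.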
I am trying to split the values for the estimates and CIs so that the column with entries of the form 99.99 [-99.9, 99.9] is converted into three separate columns.
Please consider the data:
out <-
structure(list(name = c("total_gray_vol_0_to_psychosis_24", "total_gray_vol_24_to_psychosis_48",
"psychosis_0_to_total_gray_vol_24", "psychosis_24_to_total_gray_vol_48"
), Std.Estimate = c(0.304045656442265, 1.48352171485462, 0.673583361513608,
0.703098685562618), Std.SE = c(0.239964279466103, 2.72428816136731,
0.112111316151443, 0.14890331153936), CI = c("0.3 [-0.17, 0.77]",
"1.48 [-3.86, 6.82]", "0.67 [0.45, 0.89]", "0.7 [0.41, 0.99]"
)), class = "data.frame", row.names = c(NA, -4L))
The farthest I got was to extract the first digit with:
library(stringr)
str_match(out$CI, pattern= "([[0-9]+]*)([[0-9]+]*)([[0-9]+]*)")
But this is not working, as it is returning only the first digits, and for some reason four columns.
How do I split the column CI into three columns (estimate, lower, upper) correctly?
You could also use tidyr::extract for this purpose, as follows. Note that the regex argument must define as many capturing groups as there are names in the into argument; convert = TRUE turns the new columns into numbers instead of leaving them as character.
library(tidyr)
out %>%
  extract(CI, c('estimate', 'lower', 'upper'),
          '([-\\d.]+)\\s+\\[([-\\d.]+)\\W+([-\\d.]+)\\]', convert = TRUE)
name Std.Estimate Std.SE estimate lower upper
1 total_gray_vol_0_to_psychosis_24 0.3040457 0.2399643 0.30 -0.17 0.77
2 total_gray_vol_24_to_psychosis_48 1.4835217 2.7242882 1.48 -3.86 6.82
3 psychosis_0_to_total_gray_vol_24 0.6735834 0.1121113 0.67 0.45 0.89
4 psychosis_24_to_total_gray_vol_48 0.7030987 0.1489033 0.70 0.41 0.99
Here is an option using tidyr::separate:
library(dplyr)
library(stringr)
library(tidyr)
out %>%
  separate(CI, c("estimate", "lower", "upper"), sep = "\\s+") %>%
  mutate(across(
    c(estimate, lower, upper),
    ~ .x %>% str_remove_all("\\[|\\]|,|\\s") %>% as.numeric()))
# name Std.Estimate Std.SE estimate lower upper
#1 total_gray_vol_0_to_psychosis_24 0.3040457 0.2399643 0.30 -0.17 0.77
#2 total_gray_vol_24_to_psychosis_48 1.4835217 2.7242882 1.48 -3.86 6.82
#3 psychosis_0_to_total_gray_vol_24 0.6735834 0.1121113 0.67 0.45 0.89
#4 psychosis_24_to_total_gray_vol_48 0.7030987 0.1489033 0.70 0.41 0.99
First, split the entries on whitespace, then remove the remaining "[", "]" and "," characters from the new columns and coerce them to numeric.
Using base R:
out <- cbind(out, read.table(text = gsub("[][]|,", "", out$CI),
                             header = FALSE, col.names = c("estimate", "lower", "upper")))
Output (after dropping the original CI column):
> out$CI <- NULL
> out
name Std.Estimate Std.SE estimate lower upper
1 total_gray_vol_0_to_psychosis_24 0.3040457 0.2399643 0.30 -0.17 0.77
2 total_gray_vol_24_to_psychosis_48 1.4835217 2.7242882 1.48 -3.86 6.82
3 psychosis_0_to_total_gray_vol_24 0.6735834 0.1121113 0.67 0.45 0.89
4 psychosis_24_to_total_gray_vol_48 0.7030987 0.1489033 0.70 0.41 0.99
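The str_match() attempt in the question returned four columns because the first column of the result is always the whole match, followed by one column per capture group; its character classes also never include the minus sign or the decimal point. A base R sketch of the same capture-group idea, with one group each for the estimate and the two bounds (stringr::str_match() with this pattern should give the same four-column layout), could be:

```r
# Sketch: one capture group per output column; column 1 of each match is the whole string.
ci <- c("0.3 [-0.17, 0.77]", "1.48 [-3.86, 6.82]", "0.67 [0.45, 0.89]", "0.7 [0.41, 0.99]")
pattern <- "^(-?[0-9.]+)[[:space:]]*\\[(-?[0-9.]+),[[:space:]]*(-?[0-9.]+)\\]$"

hits <- regmatches(ci, regexec(pattern, ci))
# each element holds the full match followed by the three groups;
# a non-matching string gives character(0), which becomes NA below
parts <- do.call(rbind, lapply(hits, function(h) as.numeric(h[2:4])))
colnames(parts) <- c("estimate", "lower", "upper")
parts
```

The result can then be bound to out with cbind(out, parts).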
In R, how does one run a tournament simulation?
I have each team's probability of winning against every other team, for example:
prob_res <- matrix(round(runif(64),2), 8, 8)
prob_res[lower.tri(prob_res, diag = TRUE)] <- 0
prob_res <- as.data.frame(prob_res)
colnames(prob_res) <- 1:8
rownames(prob_res) <- 1:8
This gives something like:
1 2 3 4 5 6 7 8
1 0 0.76 0.35 0.81 0.95 0.08 0.47 0.26
2 0 0.00 0.24 0.34 0.54 0.48 0.53 0.54
3 0 0.00 0.00 0.47 0.51 0.68 0.50 0.80
4 0 0.00 0.00 0.00 0.52 0.59 0.38 0.91
5 0 0.00 0.00 0.00 0.00 0.05 0.88 0.64
6 0 0.00 0.00 0.00 0.00 0.00 0.23 0.65
7 0 0.00 0.00 0.00 0.00 0.00 0.00 0.77
8 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00
The next step would be to run a set of simulations, say n = 100000.
First the quarter-finals (best out of 3):
1 vs 8
2 vs 7
3 vs 6
4 vs 5
And then the winners of each pair face off in the semi-finals:
1-8 winner VS 4-5 winner
2-7 winner VS 3-6 winner
Winners move on to the final. All is best out of 3.
What approach/package could I use to run bracket simulations? I did find a package called mRchmadness but it's too specific to handle this simulation.
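I have written some dummy code that should help you figure out how to do it. It is not optimized at all, but it is linear and easy to follow. Note that it seeds the quarter-finals from a simulated league rather than using the fixed 1 v 8, 2 v 7 bracket, and it runs a single tournament.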
prob_res <- matrix(round(runif(64),2), 8, 8)
prob_res[lower.tri(prob_res, diag = TRUE)] <- 0
prob_res <- as.data.frame(prob_res)
colnames(prob_res) <- 1:8
rownames(prob_res) <- 1:8
prob_res
## Total number of combinations
posscombi<-t(combn(1:8, 2))
## This function plays repetitionMatches games between every possible pair of teams
## and counts each team's wins, i.e. it simulates a whole league,
## assuming the winning probabilities are static.
League <- function(repetitionMatches, posscomb, prob_res)
{
  TotalVect <- integer(0)
  for (i in 1:nrow(posscomb)) {
    pair <- posscomb[i, ]
    # combn() always gives pair[1] < pair[2], so prob_res[pair[1], pair[2]] is P(pair[1] wins)
    Vect <- sample(pair,
                   size = repetitionMatches,
                   prob = c(prob_res[pair[1], pair[2]], 1 - prob_res[pair[1], pair[2]]),
                   replace = TRUE)
    TotalVect <- c(TotalVect, Vect)
  }
  return(table(TotalVect))
}
Result<-League(100,posscomb = posscombi, prob_res= prob_res)
Myorder<-order(Result)
### Quarters
pair1<- c(names(Result)[Myorder[c(1,8)]])
pair2<- c(names(Result)[Myorder[c(2,7)]])
pair3<- c(names(Result)[Myorder[c(3,6)]])
pair4<- c(names(Result)[Myorder[c(4,5)]])
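## This function returns the results of numMatches games between the two teams in pairs
## (3 in this example). prob_res only stores P(row team beats column team) above the
## diagonal, so look it up with the lower-numbered team first and take the complement otherwise.
PlayMatch <- function(pairs, numMatches){
  a <- as.integer(pairs[1])
  b <- as.integer(pairs[2])
  p <- if (a < b) prob_res[a, b] else 1 - prob_res[b, a]   # P(pairs[1] wins one game)
  Res <- sample(pairs, size = numMatches, prob = c(p, 1 - p), replace = TRUE)
  return(table(Res))
}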
# Results of the matches
winner1<-PlayMatch(pairs = pair1, 3)
winner2<-PlayMatch(pairs = pair2, 3)
winner3<-PlayMatch(pairs = pair3, 3)
winner4<-PlayMatch(pairs = pair4, 3)
## Semis
#Choosing the winning teams
pair1<- c(names(winner1)[which.max(winner1)],names(winner2)[which.max(winner2)])
pair2<- c(names(winner3)[which.max(winner3)],names(winner4)[which.max(winner4)])
winner1<-PlayMatch(pairs = pair1, 3)
winner2<-PlayMatch(pairs = pair2, 3)
## Final
# Same as before
pair1<- c(names(winner1)[which.max(winner1)],names(winner2)[which.max(winner2)])
winner1<-PlayMatch(pairs = pair1, 3)
paste0( "team ",names(winner1)[which.max(winner1)], " is the winner!")
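The answer above plays one tournament. A minimal sketch of the fixed bracket from the question (1 v 8, 2 v 7, 3 v 6, 4 v 5, then the 1-8 winner against the 4-5 winner and the 2-7 winner against the 3-6 winner), repeated n times in vectorised form, could look like this. It assumes prob_res is a plain matrix (use as.matrix(prob_res) on the data frame) whose entry [i, j] with i < j is the probability that team i beats team j in a single game; p_game, best_of_3 and simulate_bracket are hypothetical helper names made up for this sketch.

```r
# Hypothetical helpers for a vectorised Monte Carlo of the 8-team, best-of-3 bracket.
set.seed(123)
prob_res <- matrix(round(runif(64), 2), 8, 8)
prob_res[lower.tri(prob_res, diag = TRUE)] <- 0   # P[i, j], i < j: P(team i beats team j)

# Single-game win probability of team a over team b (vectors allowed, a != b)
p_game <- function(a, b, P) {
  ifelse(a < b, P[cbind(a, b)], 1 - P[cbind(b, a)])
}

# Best-of-3 series for many simulations at once; returns the winner's id
best_of_3 <- function(a, b, P) {
  wins_a <- rbinom(length(a), size = 3, prob = p_game(a, b, P))  # play all 3 games
  ifelse(wins_a >= 2, a, b)                                       # majority wins
}

simulate_bracket <- function(P, n) {
  team <- function(id) rep(id, n)
  # Quarter-finals: 1 v 8, 2 v 7, 3 v 6, 4 v 5
  q18 <- best_of_3(team(1), team(8), P)
  q27 <- best_of_3(team(2), team(7), P)
  q36 <- best_of_3(team(3), team(6), P)
  q45 <- best_of_3(team(4), team(5), P)
  # Semi-finals: (1-8 winner) v (4-5 winner), (2-7 winner) v (3-6 winner)
  s1 <- best_of_3(q18, q45, P)
  s2 <- best_of_3(q27, q36, P)
  # Final
  champion <- best_of_3(s1, s2, P)
  table(factor(champion, levels = 1:8)) / n   # share of titles per team
}

title_share <- simulate_bracket(prob_res, n = 100000)
title_share
```

Playing all three games and taking the majority gives the same winner distribution as stopping after two wins (both equal p^2(3 - 2p)), which is why a single rbinom() draw per series is enough.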
Working with levelplot in lattice, I have figured out how to display the corresponding value of each cell. For a matrix m:
myPanel <- function(x,y,z, ...){
panel.levelplot(x,y,z,...)
panel.text(x,y, round(m,2),col=bw[col.m])
}
levelplot(m, col.regions=col.range, colorkey=NULL, xlab=NULL, ylab=NULL,
scales = list(x = list(draw = FALSE), y = list(draw = FALSE)),
panel= myPanel)
The rounded matrix values are
round(m,2)
13 14 15 16 17 18
GDcsp -0.44 -0.34 -0.39 -0.35 -0.53 -0.60
GDsor 0.14 0.07 0.03 0.01 0.06 0.09
GDdup 0.43 0.36 0.34 0.36 0.46 0.52
GDhsw 0.22 0.05 0.11 0.00 0.20 0.26
Gdwpa 0.17 0.25 0.32 0.37 0.46 0.47
The problem is that -0.60 and 0.00 are displayed in the corresponding cells as -0.6 and 0, respectively, whereas I would like all numbers to have two decimals. Any idea how to solve this would be most welcome.
library(lattice)
myPanel <- function(x, y, z, ...){
  panel.levelplot(x, y, z, ...)
  panel.text(x, y, sprintf("%.2f", m))   # always two decimals, trailing zeros kept
}
levelplot(m, colorkey=NULL, xlab=NULL, ylab=NULL,
          scales = list(x = list(draw = FALSE), y = list(draw = FALSE)),
          panel= myPanel)
You can use sprintf to force the output to be 2 decimal places.
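The difference is in how the numbers become labels: a plain numeric is presumably converted much like as.character(), which drops trailing zeros, whereas sprintf() (or formatC()) fixes the number of decimals. A quick console check:

```r
vals <- c(-0.60, 0.00, 0.14, 0.5)
as.character(round(vals, 2))              # "-0.6" "0" "0.14" "0.5"
sprintf("%.2f", vals)                     # "-0.60" "0.00" "0.14" "0.50"
formatC(vals, format = "f", digits = 2)   # same strings as sprintf()
```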
I am trying to sample values from two related columns (diversification rates of several sister groups), but I have no idea how to do it. I tried the sample function, but it doesn't let me add any further condition.
df<-data.frame("M"=c(0.06,0.14,0.05,0.07), "H"=c(0.06,0.08,0.04,0.05))
df
# M H
# 1 0.06 0.06
# 2 0.14 0.08
# 3 0.05 0.04
# 4 0.07 0.05
sample(df,size=1000,replace=TRUE)
When I use this command, it resamples whole columns (the rows stay the same):
H M M.1 M.2 M.3
1 0.06 0.06 0.06 0.06 0.06
2 0.08 0.14 0.14 0.14 0.14
3 0.04 0.05 0.05 0.05 0.05
4 0.05 0.07 0.07 0.07 0.07
...
But I want it to only sample one value from each row, and go to the next row with the same condition until the end of the rows. Finally, when there are no more rows, it should start all over again up to size=1000 so I can have a vector of length 1000.
Example of what I want (r = row, c = column): 0.06(r1c1), 0.14(r2c1), 0.05(r3c1), 0.05(r4c2), 0.06(r1c2), 0.14(r2c1), 0.04(r3c2), 0.07(r4c1) and so on.
Thank you in advance for your help!
EDITED:
I think that what I am looking for is something like a loop function, but I still do not know how to do it.
You should first create an indexing matrix of two columns (row index and column index), then index the original matrix with it.
idx <- matrix(c(rep(1:4, 250), sample(1:2, 1000, replace = TRUE)), ncol = 2) # col 1: rows 1-4 repeated, col 2: random column
res <- as.matrix(df)[idx]
With your specifications, you'll need to use a custom function.
Here's one small way to do it:
myfunc <- function(dataframe, nsamples = 1000){
  rows <- ((seq_len(nsamples) - 1) %% nrow(dataframe)) + 1 # use %% to cycle through the rows
  cols <- sample(ncol(dataframe), nsamples, replace = TRUE) # and a random column for each draw
  sapply(seq_len(nsamples), function(x){dataframe[rows[x], cols[x]]}) # sapply to get a vector
}
myfunc(df, 10)
# returns 10 values taken from rows 1, 2, 3, 4, 1, 2, ... with a random column each time
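Both ideas can be combined into one vectorised function that works for any number of rows and columns: cycle through the row indices with rep_len(), pick a random column for each draw, and index the matrix with a two-column (row, column) matrix. The name sample_one_per_row is made up for this sketch:

```r
# Hypothetical helper: one value per row, cycling through the rows in order,
# with the column chosen at random for each draw.
df <- data.frame(M = c(0.06, 0.14, 0.05, 0.07), H = c(0.06, 0.08, 0.04, 0.05))

sample_one_per_row <- function(dataframe, nsamples = 1000) {
  m    <- as.matrix(dataframe)
  rows <- rep_len(seq_len(nrow(m)), nsamples)          # 1, 2, 3, 4, 1, 2, ...
  cols <- sample(ncol(m), nsamples, replace = TRUE)    # random column per draw
  m[cbind(rows, cols)]                                 # (row, col) matrix indexing
}

set.seed(1)
res <- sample_one_per_row(df)
head(res, 8)
```

Matrix indexing avoids the per-element sapply() call, which matters little for 1000 draws but scales better.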
I have a data frame that looks like this:
df <- data.frame(
  ID   = c(1, 1, 1, 1, 1, 2, 2, 2),
  TIME = c(0, 1, 2, 3, 4, 0, 1, 2),
  AMT  = c(50, 0, 0, 50, 0, 100, 0, 0),
  k10  = c(0.10, 0.10, 0.10, 0.10, 0.10, 0.25, 0.25, 0.25),
  k12  = c(0.40, 0.40, 0.40, 0.40, 0.40, 0.50, 0.50, 0.50),
  k21  = c(0.01, 0.01, 0.01, 0.01, 0.01, 0.06, 0.06, 0.06)
)
I am using the values of k10, k12 and k21 to perform certain calculations in the function below. Each of these values is specific to a subject ID and doesn't change with time. My question is: how can I write the function so that it uses only the first value for each subject ID? As you may notice in the function below, this is what I am currently using:
k10 <- d$k10
k12 <- d$k12
k21 <- d$k21
Each of these gives a vector repeating the same value at every time point, which is unnecessary; I just need one value per subject. I think that is one reason why I am getting the warning "number of items to replace is not a multiple of replacement length".
#This is the function that I am using:
TwoCompIVbolus <- function(d){
  #set initial values in the compartments
  d$A1[d$TIME==0] <- d$AMT[d$TIME==0] # drug amount in the central compartment at time zero.
  d$A2[d$TIME==0] <- 0 # drug amount in the peripheral compartment at time zero.
  k10 <- d$k10
  k12 <- d$k12
  k21 <- d$k21
  k20 <- 0
  E1 <- k10+k12
  E2 <- k21+k20
  #calculate hybrid rate constants
  lambda1 <- 0.5*(k12+k21+k10+sqrt((k12+k21+k10)^2-4*k21*k10))
  lambda2 <- 0.5*(k12+k21+k10-sqrt((k12+k21+k10)^2-4*k21*k10))
  for(i in 2:nrow(d))
  {
    t <- d$TIME[i]-d$TIME[i-1]
    A1last <- d$A1[i-1]
    A2last <- d$A2[i-1]
    A1term = (((A1last*E2+A2last*k21)-A1last*lambda1)*exp(-t*lambda1)-((A1last*E2+A2last*k21)-A1last*lambda2)*exp(-t*lambda2))/(lambda2-lambda1)
    d$A1[i] = A1term + d$AMT[i] #Amount in the central compartment
    A2term = (((A2last*E1+A1last*k12)-A2last*lambda1)*exp(-t*lambda1)-((A2last*E1+A1last*k12)-A2last*lambda2)*exp(-t*lambda2))/(lambda2-lambda1)
    d$A2[i] = A2term #Amount in the peripheral compartment
  }
  d
}
#to apply it for each subject (ddply comes from the plyr package)
library(plyr)
simdf <- ddply(df, .(ID), TwoCompIVbolus)
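You can just take the first element of each rate-constant column inside the function, i.e. k10 <- d$k10[1], k12 <- d$k12[1] and k21 <- d$k21[1]. Because ddply passes one subject at a time, the first row holds that subject's values; the hybrid rate constants then become scalars, which also removes the "number of items to replace" warning.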
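For reference, here is a sketch of the whole function with that change, applied per subject with base R split()/lapply() instead of plyr::ddply() (ddply as in the question should work the same way). Two small additions belong to this sketch, not to the original answer: the A1/A2 columns are pre-created, and the loop runs over seq_len(nrow(d))[-1] so that a subject with a single record does not make 2:nrow(d) count backwards.

```r
df <- data.frame(
  ID   = c(1, 1, 1, 1, 1, 2, 2, 2),
  TIME = c(0, 1, 2, 3, 4, 0, 1, 2),
  AMT  = c(50, 0, 0, 50, 0, 100, 0, 0),
  k10  = c(0.10, 0.10, 0.10, 0.10, 0.10, 0.25, 0.25, 0.25),
  k12  = c(0.40, 0.40, 0.40, 0.40, 0.40, 0.50, 0.50, 0.50),
  k21  = c(0.01, 0.01, 0.01, 0.01, 0.01, 0.06, 0.06, 0.06)
)

TwoCompIVbolus <- function(d) {
  d$A1 <- NA_real_; d$A2 <- NA_real_
  d$A1[d$TIME == 0] <- d$AMT[d$TIME == 0]   # central compartment at time zero
  d$A2[d$TIME == 0] <- 0                    # peripheral compartment at time zero
  k10 <- d$k10[1]; k12 <- d$k12[1]; k21 <- d$k21[1]   # one scalar per subject
  k20 <- 0
  E1 <- k10 + k12
  E2 <- k21 + k20
  # hybrid rate constants (scalars now)
  lambda1 <- 0.5 * (k12 + k21 + k10 + sqrt((k12 + k21 + k10)^2 - 4 * k21 * k10))
  lambda2 <- 0.5 * (k12 + k21 + k10 - sqrt((k12 + k21 + k10)^2 - 4 * k21 * k10))
  for (i in seq_len(nrow(d))[-1]) {
    t <- d$TIME[i] - d$TIME[i - 1]
    A1last <- d$A1[i - 1]
    A2last <- d$A2[i - 1]
    A1term <- (((A1last*E2 + A2last*k21) - A1last*lambda1) * exp(-t*lambda1) -
               ((A1last*E2 + A2last*k21) - A1last*lambda2) * exp(-t*lambda2)) / (lambda2 - lambda1)
    d$A1[i] <- A1term + d$AMT[i]   # amount in the central compartment
    A2term <- (((A2last*E1 + A1last*k12) - A2last*lambda1) * exp(-t*lambda1) -
               ((A2last*E1 + A1last*k12) - A2last*lambda2) * exp(-t*lambda2)) / (lambda2 - lambda1)
    d$A2[i] <- A2term              # amount in the peripheral compartment
  }
  d
}

# apply per subject and stack the results
simdf <- do.call(rbind, lapply(split(df, df$ID), TwoCompIVbolus))
simdf
```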