I am fairly new to R and am trying to automate a snake draft with a for loop. Essentially, I have a matrix with 9 columns (one for each of the 9 teams). Every column lists the same 36 players in a different order, ranked by how that team captain expects each player to perform. I want to take the first available player in a team's column and put that player in a blank matrix that will ultimately hold all the finalized teams.
There are 9 teams, each drafting 4 players. Because it is a snake draft, the picking order runs like this:
Team Captain 1 picks their first choice, then
Team Captain 2 picks their first choice (of the players left, Team Captain 1's first choice is no longer available), then
Team Captain 3 makes their first pick,
all the way to
Team Captain 9 who then takes their first pick AND their second pick, then
Team Captain 8 takes their second pick,
and this follows suit back to
Team Captain 1 who picks their second and third pick,
etc.
Because there are 9 Team Captains and 36 players to choose from, each team ultimately has four players (non-repeating). I hope I have explained this well enough. I love this site, and appreciate your help!
Here's a proposed solution. It's not the most elegant, but it should work for your problem:
players <- paste0("player", 1:36)
picks<-sample(players, 36)
draft <- matrix(NA, ncol=9, nrow=4)
for(i in 1:4){
if(i %in% c(1,3)) draft[i, 1:9] <- picks[(9*(i-1)+1):(9*(i-1)+ 9)]
if(i %in% c(2,4)) draft[i, ] <- rev(picks[(9*(i-1)+1):(9*(i-1)+ 9)])
}
draft
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] "player4" "player12" "player29" "player10" "player19" "player26" "player3" "player21" "player20"
[2,] "player17" "player7" "player9" "player5" "player6" "player23" "player15" "player35" "player13"
[3,] "player36" "player34" "player28" "player32" "player33" "player27" "player30" "player31" "player8"
[4,] "player11" "player22" "player2" "player18" "player24" "player25" "player16" "player1" "player14"
Here's a reasonably readable version:
set.seed(47)
players <- cbind(replicate(9, sample(1:36)), ID = 1:36) # column 10 is ID column
pick <- matrix(NA, 4, 9) # matrix to fill
for(round in 1:4){
direction <- if(round %% 2 == 1) {1:9} else {9:1}
for(team in direction){
pick[round, team] <- players[which.min(players[, team]), 'ID'] # store pick
players <- players[-which.min(players[, team]), , drop = FALSE] # erase player's row
}
}
pick # rows are rounds, columns are teams, numbers are player IDs
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,] 18 5 20 6 27 36 24 34 26
# [2,] 19 28 32 1 23 33 30 2 17
# [3,] 21 15 8 9 13 7 35 31 14
# [4,] 16 3 4 22 10 11 29 25 12
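If the preferences are stored the way the question describes them (one column per captain, players listed best-first), the same idea might be sketched as below. This is only an illustration: the names prefs, pick_order, pick_round and taken are made up here, and the preference lists are random stand-ins for real rankings.

```r
set.seed(47)
n_teams  <- 9
n_rounds <- 4
players  <- paste0("player", 1:36)

# Hypothetical input: 36 x 9 matrix, column t is captain t's list, best first
prefs <- replicate(n_teams, sample(players))

# Snake order: 1..9, 9..1, 1..9, 9..1
pick_order <- rep(c(1:n_teams, n_teams:1), length.out = n_teams * n_rounds)
pick_round <- rep(1:n_rounds, each = n_teams)

draft <- matrix(NA_character_, n_rounds, n_teams)  # rows = rounds, cols = teams
taken <- character(0)                              # players already drafted

for (p in seq_along(pick_order)) {
  team <- pick_order[p]
  # setdiff() keeps the order of its first argument, so element 1 is the
  # captain's highest-ranked player who is still available
  choice <- setdiff(prefs[, team], taken)[1]
  draft[pick_round[p], team] <- choice
  taken <- c(taken, choice)
}

draft
```

Precomputing pick_order keeps the loop flat, so the "9 picks twice in a row" behaviour falls out of the order vector rather than from special cases inside the loop.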
I have a big data frame with > 1 million lines representing time series data for several individuals (with different individual data in different columns).
In addition, I have a 3D array that contains encounter frame numbers that indicate from which frame in my time series I want to extract data.
For a given individual and encounter type, I want to extract one time series of e.g. 100 frames. However, as I have many replicates for each meeting type and each individual, I want to directly calculate the average time series per individual and encounter type.
I managed to do so using sapply embedded in two for loops. However, running these for loops is very slow, and I now wonder whether there is a faster way of implementing this calculation in R, or whether I should rather do it in C++. Below is my code and a small sample of my data:
nb_ind = 3;
response_duration = 100;
nb_meeting_types = 2;
nb_variables = 2;
speed_offset = 2;
MEETING_START_OFFSET = 50;
replicate = 20;
# behavior_data is a data frame with columns: frame,speed1,head1,speed2,head2,speed3,head3
# there are about 1 million rows
dim(behavior_data)
[1] 1080000 7
head(behavior_data)
frame speed1 head1 speed2 head2 speed3 head3
1 0 0 25 2.4 179 1.1 16
2 1 1.5 20 2.0 -175 1.6 27
3 2 1.6 28 2.0 -178 1.0 37
4 3 0.8 56 1.6 170 0.8 37
5 4 0.3 56 1.8 162 0 40
# encounters is an array with frame numbers of dimension [nb_ind,replicate,nb_meeting_types]
# these frame numbers correspond to starting points of meetings, for which I want to calculate the speed
dim(encounters)
[1] 3 20 2
head(encounters[,,1])
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 12049 17693 23350 29018 34666 40327 68608 57293 74264 45980 113864 79922 119522 102552 51636 153462 91235 142151 159121 62948
[2,] 12036 17694 23352 29014 34674 40322 68606 57296 74268 45982 113865 79929 119521 102558 51639 153463 91242 142161 159168 62952
[3,] 12037 17694 23351 29011 34669 40329 68606 57298 74263 45985 NA 79921 NA 102550 51641 NA 91234 NA NA 62950
all_average_speeds = array(NaN, c(nb_ind, response_duration, nb_meeting_types))
for (j in 1:nb_ind){
#calculate the average speed response for each meeting type for a given individual
average_speed = numeric(0);
for (i in 1:nb_meeting_types){
# calculate the average speed response across all replicates of a given meeting type for a given individual
average_speed_type = sapply(1:response_duration, function(k){
mean(behavior_data[,(j-1)*nb_variables + speed_offset][which(behavior_data$frame %in% ((encounters[j,,i] + k-1) - MEETING_START_OFFSET))], na.rm=TRUE) # %in% matches every replicate's frame; == would recycle the frame numbers against the rows
})
average_speed = rbind(average_speed, t(average_speed_type))
}
all_average_speeds[j,,] = t(average_speed); # transpose: average_speed is [meeting type x frame], the slice is [frame x meeting type]
}
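One possible way to speed this up while staying in R is to drop the inner sapply and look up all frames of all replicates at once with match(), then average with colMeans(). The sketch below is not benchmarked against the real data; it uses small synthetic stand-ins for behavior_data and encounters, and nb_replicates is a name introduced here (the question calls it replicate).

```r
set.seed(1)
nb_ind <- 3; response_duration <- 100; nb_meeting_types <- 2
nb_variables <- 2; speed_offset <- 2; MEETING_START_OFFSET <- 50
nb_replicates <- 20
n_frames <- 2000   # synthetic size, far smaller than the real data

# Synthetic stand-ins with the same layout as in the question
behavior_data <- data.frame(frame = 0:(n_frames - 1),
                            speed1 = runif(n_frames), head1 = 0,
                            speed2 = runif(n_frames), head2 = 0,
                            speed3 = runif(n_frames), head3 = 0)
encounters <- array(sample(100:1500, nb_ind * nb_replicates * nb_meeting_types),
                    c(nb_ind, nb_replicates, nb_meeting_types))
encounters[3, 11, 1] <- NA   # missing encounters are allowed

all_average_speeds <- array(NaN, c(nb_ind, response_duration, nb_meeting_types))
offsets <- 0:(response_duration - 1)

for (j in 1:nb_ind) {
  speed <- behavior_data[[(j - 1) * nb_variables + speed_offset]]
  for (i in 1:nb_meeting_types) {
    starts <- encounters[j, , i] - MEETING_START_OFFSET
    frames <- outer(starts, offsets, "+")        # replicates x response_duration
    rows   <- match(frames, behavior_data$frame) # one lookup for all frames
    values <- matrix(speed[rows], nrow = length(starts))
    all_average_speeds[j, , i] <- colMeans(values, na.rm = TRUE)
  }
}
```

The two outer loops only run nb_ind * nb_meeting_types times, so the expensive part (scanning a million rows once per frame offset) is replaced by a single match() call per combination.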
How can I split a matrix into many submatrixes? Every submatrix should contain a selection of rows of the initial matrix.
The initial matrix is imported out of an .csv-file:
seri <- read.table("/PATH/seriation_testdaten.csv", sep="\t", head=TRUE, row.names=1)
The matrix seri contains numeric values, similar to this example matrix seritest:
seritest <- matrix(1:100,10)
Now I would like to divide the rows of this matrix into groups. For example, I would like to have groups of three rows each. So one group should contain rows 1, 2 and 3, the next one rows 4, 5 and 6, and so on, until nrow(seri) is reached. It's no problem if the last group contains fewer than three rows.
Matrix 1:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 11 21 31 41 51 61 71 81 91
[2,] 2 12 22 32 42 52 62 72 82 92
[3,] 3 13 23 33 43 53 63 73 83 93
Matrix 2:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[4,] 4 14 24 34 44 54 64 74 84 94
[5,] 5 15 25 35 45 55 65 75 85 95
[6,] 6 16 26 36 46 56 66 76 86 96
Matrix 3:
...
That's the first step. But I would like to go further. In this example I have groups of three rows in the resulting submatrices. But I also want the resulting submatrices for groups of 1 row, 2 rows, 4 rows, 5 rows and so on up to nrow(seri) rows. So basically hundreds of submatrices which are each part of a unit based on the decision how big the group-size should be.
Thanks to the help of #farnsy I was able to produce working code, which does exactly what I want:
seritest <- matrix(1:10000,100)
a = nrow(seritest)
e = 1:nrow(seritest)
seri_sub <- list()
U=1
while(U<=a) {
Q=0
AQ=0
EQ=0
Uk <- U*1000;
repeat{
(e[U]*Q)+1 -> EQ;
Q=Q+1;
e[U]*Q -> AQ;
if(AQ>a) break
seri_sub[[Uk+Q]] <- seritest[EQ:AQ,];
};
U=U+1;
}
I can access the matrices by calling for example seri_sub[[3002]]. Like this I get the second (300**2**) group (rows 4,5 and 6) of the unit which is the result of a division of the initial matrix into groups of three (**3**002).
As already mentioned by #Dason, this code is inefficient. Many list elements are empty (NULL). Maybe somebody has an idea how to improve it.
This code runs without any further packages. R version 3.0.2 (2013-09-25). OS: Ubuntu 14.04 (64bit).
(I'm not a native speaker - please excuse the lack of eloquence)
seri doesn't seem like a list here. Neither do A or B, actually. Are you sure you are talking about lists at all? It looks more like you want to subset a matrix a bunch of times, creating submatrices. I can't imagine you actually want "random" names, either. That's crazy talk.
If you want to break up a matrix, why not store all the resulting matrices in an actual list?
myList <- list()
myList[[1]] <- seri[a:b,]
myList[[2]] <- seri[c:d,]
You can see how it would be pretty easy to put this in a loop. Now myList is a list of matrices and, for example, myList[[i]] would be the i-th matrix. If you want the second row and third column entry, it would be myList[[i]][2,3].
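As a rough sketch of that loop, one option is a nested list in which the outer index is the group size and the inner index is the group number. That avoids the empty (NULL) slots of the U*1000 numbering. The helper name split_rows is made up for this example.

```r
seritest <- matrix(1:10000, 100)

# Split the rows of m into consecutive groups of `size` rows;
# the last group may be smaller.
split_rows <- function(m, size) {
  groups <- ceiling(seq_len(nrow(m)) / size)   # 1,1,1,2,2,2,... for size 3
  lapply(split(seq_len(nrow(m)), groups),
         function(idx) m[idx, , drop = FALSE]) # drop = FALSE keeps 1-row matrices
}

# One entry per group size, from 1 row up to nrow(seritest) rows
seri_sub <- lapply(seq_len(nrow(seritest)),
                   function(size) split_rows(seritest, size))

# Second group of the "groups of three" unit, i.e. rows 4, 5 and 6
seri_sub[[3]][[2]]
```

Unlike the while/repeat version, this keeps the trailing partial group (for size 3, group 34 holds only row 100).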
This is my first post on this forum, and I want to say from the start that I am not a skilled programmer. So please let me know if the question or code is unclear!
I am trying to get the 95% confidence interval (CI) for an interaction (that is my test statistic) by bootstrapping. I am using the package "boot". My problem is that for every resample, I would like the randomization to be done within subjects, so that observations from different subjects are not mixed. Here is the code to generate a data frame similar to mine. As you can see, I have two within-subjects factors ("Num" and "Gram"), and I am interested in the interaction between them:
Subject = rep(c("S1","S2","S3","S4"),4)
Num = rep(c("singular","plural"),8)
Gram = rep(c("gram","gram","ungram","ungram"),4)
RT = c(657,775,678,895,887,235,645,916,930,768,890,1016,590,978,450,920)
data = data.frame(Subject,Num,Gram,RT)
This is the code I used to get the empirical interaction value:
summary(lm(RT ~ Num*Gram, data=data))
As you can see, the interaction between my two factors is -348. I want to get a bootstrap confidence interval for this statistic, which I can generate using the "boot" package:
# You need the following packages
install.packages("car")
install.packages("MASS")
install.packages("boot")
library("car")
library("MASS")
library("boot")
#Function to create the statistic to be bootstrapped
boot.huber <- function(data, indices) {
data <- data[indices, ] #select obs. in bootstrap sample
mod <- lm(RT ~ Num*Gram, data=data)
coefficients(mod) #return coefficient vector
}
#Generate bootstrap estimate
data.boot <- boot(data, boot.huber, 1999)
#Get confidence interval
boot.ci(data.boot, index=4, type=c("norm", "perc", "bca"),conf=0.95) #4 gets the CI for the interaction
My problem is that I think the resamples should be generated without mixing the individual subjects' observations: that is, to generate the new resamples, the observations from subject 1 (S1) should be shuffled within subject 1, not mixed with the observations from subject 2, and so on. I don't know how "boot" does the resampling (I read the documentation but don't understand how the function does it).
Does anyone know how I could make sure that the resampling procedure used by "boot" respects subject level information?
Thanks a lot for your help/advice!
Just modify your call to boot() like this:
data.boot <- boot(data, boot.huber, 1999, strata=factor(data$Subject)) # factor(): strata must be an integer vector or factor
?boot provides this description of the strata= argument, which does exactly what you are asking for:
strata: An integer vector or factor specifying the strata for
multi-sample problems. This may be specified for any
simulation, but is ignored when ‘sim = "parametric"’. When
‘strata’ is supplied for a nonparametric bootstrap, the
simulations are done within the specified strata.
Additional note:
To confirm that it's working as you'd like, you can call debugonce(boot), run the call above, and step through the debugger until the object i (whose rows contain the indices used to resample rows of data to create each bootstrap resample) has been assigned, and then have a look at it.
debugonce(boot)
data.boot <- boot(data, boot.huber, 1999, strata=factor(data$Subject))
# Browse[2]>
## [Press return 34 times]
# Browse[2]> head(i)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
# [1,] 9 10 11 16 9 14 15 16 9 2 15 16 1 10
# [2,] 9 14 7 12 5 6 15 4 13 6 11 16 13 6
# [3,] 5 10 15 16 9 6 3 4 1 2 15 12 5 6
# [4,] 5 10 11 4 9 6 15 16 9 14 11 16 5 2
# [5,] 5 10 3 4 1 10 15 16 9 6 3 8 13 14
# [6,] 13 10 3 12 5 10 3 4 5 14 7 16 5 14
# [,15] [,16]
# [1,] 7 8
# [2,] 11 16
# [3,] 3 16
# [4,] 3 8
# [5,] 7 8
# [6,] 7 12
(You can enter Q to leave the debugger at any time.)
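A non-interactive check is also possible, assuming the boot package's boot.array() helper, which can return the resampling indices of a fitted boot object. The sketch below rebuilds the example data, uses a smaller R = 199 only to keep it quick, and verifies that every resampled row comes from the same subject as the row it replaces. The names idx and same_subject are introduced here.

```r
library(boot)

Subject <- rep(c("S1", "S2", "S3", "S4"), 4)
Num     <- rep(c("singular", "plural"), 8)
Gram    <- rep(c("gram", "gram", "ungram", "ungram"), 4)
RT      <- c(657, 775, 678, 895, 887, 235, 645, 916,
             930, 768, 890, 1016, 590, 978, 450, 920)
data    <- data.frame(Subject, Num, Gram, RT)

boot.huber <- function(data, indices) {
  data <- data[indices, ]                 # rows of this bootstrap sample
  coefficients(lm(RT ~ Num * Gram, data = data))
}

set.seed(1)
# factor() because strata is documented as an integer vector or factor
data.boot <- boot(data, boot.huber, R = 199, strata = factor(data$Subject))

# R x n matrix: row b holds the row numbers drawn for resample b
idx <- boot.array(data.boot, indices = TRUE)

# TRUE for a resample if position p was filled from the same subject as row p
same_subject <- apply(idx, 1, function(r) all(data$Subject[r] == data$Subject))
all(same_subject)
```

If all(same_subject) is TRUE, no resample mixed observations across subjects.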
What I am looking to do is generate n vectors of length k such that each value in the vector is a random number, and that there are no repeated vectors regardless of order of the integers in each vector. For example, there would not be both the vectors 1,2,3 and 2,1,3.
So far what I have is the following to generate one vector, and I was planning on looping through this code to generate the n vectors.
vector<- sample(1:20000,k)
One idea that I had would be to sort all vectors in ascending order and then remove duplicate vectors. Is there an easier/more efficient way?
Thanks!
One approach would be to generate all the possible combinations, then choose from that set. This would guarantee no duplicates:
> tmp <- combn(100, 3)
> dim(tmp)
[1] 3 161700
> tmp[ , sample( ncol(tmp), 10 ) ]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 34 35 28 3 3 29 8 24 50 53
[2,] 54 54 63 57 17 86 31 30 52 81
[3,] 97 79 87 92 53 94 90 83 87 97
Of course this will only work if the total number of combinations can fit in memory. There are, however, ways to iterate through the combinations and only keep some, or to generate a sample of integers and then convert them to the appropriate combinations.
If the number of samples you want is small relative to the number of possible combinations, then the rejection method (sorting each vector and discarding duplicates) will probably be more efficient. But the combination-based approach above could work better if the number of samples is large relative to the number of combinations, since rejection would then produce a large number of duplicates.
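For reference, a minimal sketch of the rejection method could look like this. It draws one vector at a time, sorts it so that order no longer matters, and keeps it only if the sorted vector has not been seen before. The function name sample_unique_sets and its arguments are made up for this example.

```r
sample_unique_sets <- function(n, k, universe = 20000) {
  # Guard against asking for more sets than exist (would loop forever)
  stopifnot(choose(universe, k) >= n)
  out    <- matrix(NA_integer_, nrow = k, ncol = n)  # one vector per column
  seen   <- character(0)                             # keys of accepted vectors
  filled <- 0
  while (filled < n) {
    candidate <- sort(sample.int(universe, k))       # sorted => order-free
    key <- paste(candidate, collapse = ",")
    if (!(key %in% seen)) {                          # reject repeats
      filled <- filled + 1
      out[, filled] <- candidate
      seen <- c(seen, key)
    }
  }
  out
}

set.seed(42)
vecs <- sample_unique_sets(n = 10, k = 3)
dim(vecs)
```

With 20000 values to draw from, collisions are rare, so the loop almost never rejects; the cost only grows when n approaches choose(universe, k).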
I am trying to use R to perform an operation (ideally with similarly displayed output) such as
> x<-1:6
> y<-1:6
> x%o%y
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 2 4 6 8 10 12
[3,] 3 6 9 12 15 18
[4,] 4 8 12 16 20 24
[5,] 5 10 15 20 25 30
[6,] 6 12 18 24 30 36
where each entry is found through addition not multiplication.
I would also be interested in creating the 36 ordered pairs (1,1) , (1,2), etc...
Furthermore, I want to use another vector like
z<-1:4
to create all the ordered triplets possible between x, y, and z.
I am using R to look into the likelihoods of the possible totals when rolling dice with varied numbers of sides.
Thank you for all your help! This site has been a big help to me. I appreciate anyone that takes the time to answer a stranger's question.
UPDATE So I found that outer(x,y,'+') will do what I wanted first. But I still don't know how to create ordered pairs or ordered triplets.
Your first question is easily handled by outer:
outer(1:6,1:6,"+")
For the others, I suggest you try expand.grid, although there are specialized combination and permutation functions out there as well if you do a little searching.
expand.grid can answer your second question:
expand.grid(1:6,1:6)
expand.grid(1:6,1:6,1:4)
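Putting these together for the dice question, a small sketch might be the following. It assumes fair dice, so every triplet is equally likely; the names triplets, totals and probs are chosen here for illustration.

```r
x <- 1:6
y <- 1:6
z <- 1:4

sums     <- outer(x, y, "+")                 # 6 x 6 table of pairwise sums
pairs    <- expand.grid(x = x, y = y)        # 36 ordered pairs, one per row
triplets <- expand.grid(x = x, y = y, z = z) # 144 ordered triplets

# Total shown by each triplet, and the share of triplets giving each total
totals <- rowSums(triplets)
probs  <- table(totals) / nrow(triplets)
probs
```

Each row of triplets is one possible roll, so tabulating rowSums() gives the distribution of totals directly.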