Efficient way of running multiple successive for loops in R?

I am trying to run several for loops in succession in R. I hope this simplified example of the kind of thing I am trying to do provides enough information, and that the question is relevant/interesting enough to a general audience.
Essentially, I have a pool of individuals (here represented by the 26 LETTERS and saved in a vector called 'ids'). I start with 2 of them randomly selected (called 'ids1') and run a for loop (here 5 times as defined by 'runs'). Those letters not picked get put into another vector called 'ids.left1'.
The first thing going on in the for loop in this example is that I am just randomly picking one of the letters, five times. I am storing the result of this in a list called 'result1'. In this example I'm also storing the letters not picked in another list called 'otherresult1'. (My real-world reason for doing this would be using loops containing several different processes, not just these two.)
set.seed(123)
#Initializing
ids<-LETTERS[1:26]
runs<-5
#1st time
result1 <- vector("list",runs)
otherresult1 <- vector("list",runs)
ids1<-sample(ids,2)
ids.left1<-setdiff(ids,ids1)
for (i in 1:runs) {
picked1<-sample(ids1, 1)
result1[[i]] <- picked1
otherresult1[[i]] <- setdiff(ids1,picked1)
}
result1x<-unlist(result1) #[1] "H" "T" "T" "H" "T"
The above is trivial. What I am trying to do next is to add an extra letter (randomly selected) to the pool (so we now have 3) and run the for loop again for the same number of times (5). I also want to store the now 23 letters not being used in a vector (ids.left2) and also store the results of this loop in result2. Those not selected get stored in otherresult2.
#2nd time
result2 <- vector("list",runs)
otherresult2 <- vector("list",runs)
ids2<-c(ids1, sample(ids.left1,1))
ids.left2<-setdiff(ids,ids2)
for (i in 1:runs) {
picked2<- sample(ids2, 1)
result2[[i]] <- picked2
otherresult2[[i]] <- setdiff(ids2,picked2)
}
result2x<-unlist(result2) #[1] "T" "T" "X" "T" "X"
This is repeated again. Another letter is added (so we now have 4), and the same for loop is run 5 times, and the results stored again in another vector. Those not used again get stored in otherresult3.
#3rd time
result3 <- vector("list",runs)
otherresult3 <- vector("list",runs)
ids3<-c(ids2, sample(ids.left2,1))
for (i in 1:runs) {
picked3 <- sample(ids3, 1)
result3[[i]] <- picked3
otherresult3[[i]] <- setdiff(ids3,picked3)
}
result3x<-unlist(result3) #[1] "H" "O" "H" "H" "T"
This is just putting the results all together.
#putting results together
results.final <- c(result1x,result2x,result3x)
results.final #[1] "H" "T" "T" "H" "T" "T" "T" "X" "T" "X" "H" "O" "H" "H" "T"
unlist(otherresult1) #[1] "T" "H" "H" "T" "H"
unlist(otherresult2) #[1] "H" "X" "H" "X" "H" "T" "H" "X" "H" "T"
unlist(otherresult3) #[1] "T" "X" "O" "H" "T" "X" "T" "X" "O" "T" "X" "O" "H" "X" "O"
This is all pretty easy when I am only running the for loop 3 times. However, if I wanted to do the same thing (adding in one individual into a pool of individuals) 1000 times, it would be crazy to manually write the code. (Obviously, I wouldn't be using letters if I ran it 1000 times but some other identifier).
My question is therefore, is it possible to more efficiently code these successive for loops?
EDIT: I added in another process in the for-loop (the result being stored in 'otherresult' vector) to try and make this more realistic.

A perfect time to use recursion
recCount <- 1 #which recursive iteration we are in
allLetters <- LETTERS[1:26]
endPoint <- 6 #after how many recursions do we stop
runs <- 5
recEx <- function(resultList, otherResultList,
                  inLetters, outLetters,
                  recCount)
{
  # pick a letter, 2 if this is the first run
  newLetter <- sample(inLetters, if (recCount == 1) 2 else 1)
  outLetters <- c(outLetters, newLetter) # add this letter to our pool of usable letters
  # subtract the drawn letter(s) from the total pool; setdiff() also handles the
  # first run, where two letters are drawn and `!=` would recycle its argument
  inLetters <- setdiff(inLetters, newLetter)
  excludedList <- includedList <- list() # initialize the lists we will add to
  for (i in 1:runs) {
    picked1 <- sample(outLetters, 1)
    includedList[[i]] <- picked1
    excludedList[[i]] <- setdiff(outLetters, picked1)
  }
  if (recCount == endPoint) {
    # if we're done
    return(list(c(resultList, list(includedList)),
                c(otherResultList, list(excludedList))))
  } else {
    # pass in our results so far, and add the "included" list onto the end;
    # same with the "excluded" list
    return(recEx(c(resultList, list(includedList)),
                 c(otherResultList, list(excludedList)),
                 inLetters, outLetters, recCount + 1))
  }
}
finalResult <- recEx(list(),list(),allLetters,NULL,1)
> finalResult
[[1]]#1 is for your final results, #2 is for the excluded results
[[1]][[1]]# 1 through 6 are your 6 iterations, with 2 through 7 letters in each iteration
[[1]][[1]][[1]] #1 through 5 are your 5 runs
[1] "H"
[[1]][[1]][[2]]
[1] "T"
[[1]][[1]][[3]]
[1] "T"
[[1]][[1]][[4]]
[1] "H"
[[1]][[1]][[5]]
[1] "T"
[[1]][[2]]
[[1]][[2]][[1]]
[1] "T"
[[1]][[2]][[2]]
[1] "T"
[[1]][[2]][[3]]
[1] "X"
[[1]][[2]][[4]]
[1] "T"
[[1]][[2]][[5]]
[1] "X"
[[1]][[3]]
[[1]][[3]][[1]]
[1] "H"
[[1]][[3]][[2]]
[1] "N"
[[1]][[3]][[3]]
[1] "H"
[[1]][[3]][[4]]
[1] "H"
[[1]][[3]][[5]]
[1] "T"
[[1]][[4]]
[[1]][[4]][[1]]
[1] "Y"
[[1]][[4]][[2]]
[1] "N"
[[1]][[4]][[3]]
[1] "N"
[[1]][[4]][[4]]
[1] "Y"
[[1]][[4]][[5]]
[1] "N"
[[1]][[5]]
[[1]][[5]][[1]]
[1] "N"
[[1]][[5]][[2]]
[1] "N"
[[1]][[5]][[3]]
[1] "T"
[[1]][[5]][[4]]
[1] "H"
[[1]][[5]][[5]]
[1] "Q"
[[1]][[6]]
[[1]][[6]][[1]]
[1] "Y"
[[1]][[6]][[2]]
[1] "Q"
[[1]][[6]][[3]]
[1] "H"
[[1]][[6]][[4]]
[1] "N"
[[1]][[6]][[5]]
[1] "Q"
[[2]] #your excluded letters
[[2]][[1]]
[[2]][[1]][[1]]
[1] "T"
[[2]][[1]][[2]]
[1] "H"
[[2]][[1]][[3]]
[1] "H"
[[2]][[1]][[4]]
[1] "T"
[[2]][[1]][[5]]
[1] "H"
[[2]][[2]]
[[2]][[2]][[1]]
[1] "H" "X"
[[2]][[2]][[2]]
[1] "H" "X"
[[2]][[2]][[3]]
[1] "H" "T"
[[2]][[2]][[4]]
[1] "H" "X"
[[2]][[2]][[5]]
[1] "H" "T"
[[2]][[3]]
[[2]][[3]][[1]]
[1] "T" "X" "N"
[[2]][[3]][[2]]
[1] "H" "T" "X"
[[2]][[3]][[3]]
[1] "T" "X" "N"
[[2]][[3]][[4]]
[1] "T" "X" "N"
[[2]][[3]][[5]]
[1] "H" "X" "N"
[[2]][[4]]
[[2]][[4]][[1]]
[1] "H" "T" "X" "N"
[[2]][[4]][[2]]
[1] "H" "T" "X" "Y"
[[2]][[4]][[3]]
[1] "H" "T" "X" "Y"
[[2]][[4]][[4]]
[1] "H" "T" "X" "N"
[[2]][[4]][[5]]
[1] "H" "T" "X" "Y"
[[2]][[5]]
[[2]][[5]][[1]]
[1] "H" "T" "X" "Y" "Q"
[[2]][[5]][[2]]
[1] "H" "T" "X" "Y" "Q"
[[2]][[5]][[3]]
[1] "H" "X" "N" "Y" "Q"
[[2]][[5]][[4]]
[1] "T" "X" "N" "Y" "Q"
[[2]][[5]][[5]]
[1] "H" "T" "X" "N" "Y"
[[2]][[6]]
[[2]][[6]][[1]]
[1] "H" "T" "X" "N" "Q" "V"
[[2]][[6]][[2]]
[1] "H" "T" "X" "N" "Y" "V"
[[2]][[6]][[3]]
[1] "T" "X" "N" "Y" "Q" "V"
[[2]][[6]][[4]]
[1] "H" "T" "X" "Y" "Q" "V"
[[2]][[6]][[5]]
[1] "H" "T" "X" "N" "Y" "V"
This isn't the best structure for results imo, but it is what you specified. Unpacking these lists is trivial, though.
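To make the "unpacking" concrete, here is a minimal sketch. It uses a small hand-made stand-in with the same nesting as finalResult (the values are hypothetical, not the output above), and the names pickedByIteration, pickedAll and excludedByIteration are just illustrative.

```r
# Hypothetical stand-in with the same shape as finalResult:
# [[1]] holds the picked letters, [[2]] the excluded letters,
# with one sub-list per iteration and one entry per run.
finalResult <- list(
  list(list("H", "T"), list("T", "X")),
  list(list("T", "H"), list(c("H", "X"), c("H", "T")))
)

# One character vector of picked letters per iteration
pickedByIteration <- lapply(finalResult[[1]], unlist)

# All picked letters as a single flat vector
pickedAll <- unlist(finalResult[[1]])

# Excluded letters, flattened within each iteration
excludedByIteration <- lapply(finalResult[[2]], unlist)

print(pickedByIteration)
print(pickedAll)
```

The same two calls, lapply(x, unlist) and unlist(x), should work on the real finalResult, since they only rely on the list nesting.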

How about this?
set.seed(123)
#Initializing
ids =LETTERS[1:26]
runs=5
result1 = list()
temp = sample(ids,2)
j=1
results = c()
while(j<6) {
ids.left = ids[!(ids%in%temp)]
for(i in 1:runs){
result1[[i]] = sample(temp,1)
}
temp = c(temp, sample(ids.left,1))
j=j+1
results = c(results, unlist(result1))
}
results # [1] "H" "T" "T" "H" "T" "T" "T" "X" "T" "X" "H" "O" "H" "H" "T" "Y" "O" "O" "Y" "O" "O" "O" "T" "H" "Q"
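The loop above drops the "not picked" letters that the question also wanted to keep. A possible variant that stores both, as a sketch only: the names rounds, pool, results and otherresults are my own, and rounds = 3 simply mirrors the three blocks in the question.

```r
set.seed(123)

ids    <- LETTERS[1:26]
runs   <- 5   # draws per round
rounds <- 3   # number of successive pools (assumed name and value)

# Pre-allocate one slot per round
results      <- vector("list", rounds)
otherresults <- vector("list", rounds)

pool <- sample(ids, 2)  # starting pool of two individuals

for (j in seq_len(rounds)) {
  picked <- character(runs)
  others <- vector("list", runs)
  for (i in seq_len(runs)) {
    picked[i]   <- sample(pool, 1)
    others[[i]] <- setdiff(pool, picked[i])
  }
  results[[j]]      <- picked
  otherresults[[j]] <- others

  # Grow the pool by one unused individual for the next round
  if (j < rounds) {
    pool <- c(pool, sample(setdiff(ids, pool), 1))
  }
}

results.final <- unlist(results)
print(results.final)
```

Changing rounds to a larger number needs no further code, as long as there are enough unused ids left to add one per round.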

Related

R: sample function repeats same results

I ran into something very weird with sample(). If I run the following line 5 times at the start of a session (in either RStudio or R), I would get the following results.
sample(letters,5,replace=TRUE)
[1] "b" "y" "d" "p" "n"
[1] "v" "n" "i" "s" "s"
[1] "d" "q" "a" "m" "x"
[1] "w" "s" "u" "h" "e"
[1] "b" "y" "g" "s" "e"
But if I restart the console and run it 5 times at the beginning of a new session, I would somehow get the same results -- every time. Is sample() (which I believe uses Mersenne Twister by default) supposed to do this? What should I do instead to get results that don't actually repeat?
set.seed(123)
> sample(letters,5,replace=TRUE)
[1] "h" "u" "k" "w" "y"
> sample(letters,5,replace=TRUE)
[1] "b" "n" "x" "o" "l"
> sample(letters,5,replace=TRUE)
[1] "y" "l" "r" "o" "c"
> sample(letters,5,replace=TRUE)
[1] "x" "g" "b" "i" "y"
> sample(letters,5,replace=TRUE)
[1] "x" "s" "q" "z" "r"
If you start a new session and change the set.seed value, you will get different results.
> set.seed(456)
> sample(letters,5,replace=TRUE)
[1] "c" "f" "t" "w" "u"
> sample(letters,5,replace=TRUE)
[1] "i" "c" "h" "g" "k"
> sample(letters,5,replace=TRUE)
[1] "j" "f" "t" "v" "p"
> sample(letters,5,replace=TRUE)
[1] "q" "v" "l" "s" "h"
> sample(letters,5,replace=TRUE)
[1] "e" "s" "x" "l" "v"
Hope that helps.
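A short sketch of how the seed controls what sample() returns. One possible cause of identical draws in every new session (an assumption here, the question does not confirm it) is a saved workspace that restores the generator state on startup. The variable names are illustrative only.

```r
# Same seed, same draws
set.seed(123)
first <- sample(letters, 5, replace = TRUE)
set.seed(123)
second <- sample(letters, 5, replace = TRUE)
identical(first, second)  # TRUE

# A different seed starts a different stream
set.seed(456)
third <- sample(letters, 5, replace = TRUE)

# set.seed(NULL) re-initializes the generator as if no seed had been set,
# so the following draws no longer depend on any earlier state
set.seed(NULL)
fresh <- sample(letters, 5, replace = TRUE)
print(fresh)
```

So a fixed seed is what you want for reproducibility, and set.seed(NULL) (or starting without a restored workspace) is what you want for draws that differ between sessions.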

Matching between a vector and multiple vectors in a list in R

I have a list of vectors such as:
>list
[[1]]
[1] "a" "m" "l" "s" "t" "o"
[[2]]
[1] "a" "y" "o" "t" "e"
[[3]]
[1] "n" "a" "s" "i" "d"
I want to find the matches between each of them and the remaining (i.e. between the 1st and the other 2, the 2nd and the other 2, and so on) and keep the couple with the highest number of matches. I could do it with a "for" loop and intersect by couples. For example
for (i in 2:3) { intersect(list[[1]],list[[i]]) }
and then save the output into a vector or some other structure. However, this seems inefficient to me (given that, rather than 3, I have thousands), and I am wondering if R has some built-in function to do that in a clever way.
So the question would be:
Is there a way to look for matches of one vector to a list of vectors without the explicit use of a "for" loop?
I don't believe there is a built-in function for this. The best you could try is something like:
lsts <- lapply(1:5, function(x) sample(letters, 10)) # make some data (see below)
maxcomb <- which.max(apply(combs <- combn(length(lsts), 2), 2,
function(ix) length(intersect(lsts[[ix[1]]], lsts[[ix[2]]]))))
lsts <- lsts[combs[, maxcomb]]
# [[1]]
# [1] "m" "v" "x" "d" "a" "g" "r" "b" "s" "t"
# [[2]]
# [1] "w" "v" "t" "i" "d" "p" "l" "e" "s" "x"
A dump of the original:
[[1]]
[1] "z" "r" "j" "h" "e" "m" "w" "u" "q" "f"
[[2]]
[1] "m" "v" "x" "d" "a" "g" "r" "b" "s" "t"
[[3]]
[1] "w" "v" "t" "i" "d" "p" "l" "e" "s" "x"
[[4]]
[1] "c" "o" "t" "j" "d" "g" "u" "k" "w" "h"
[[5]]
[1] "f" "g" "q" "y" "d" "e" "n" "s" "w" "i"
datal <- list (a=c(2,2,1,2),
b=c(2,2,2,4,3),
c=c(1,2,3,4))
# all possible combinations
combs <- combn(length(datal), 2)
# split into list
combs <- split(combs, rep(1:ncol(combs), each = nrow(combs)))
# calculate length of intersection for every combination
intersections_length <- sapply(combs, function(y) {
length(intersect(datal[[y[1]]],datal[[y[2]]]))
}
)
# What lists have biggest intersection
combs[which(intersections_length == max(intersections_length))]
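For reference, the same idea applied to the three vectors from the question, as a minimal sketch. The names lst, pairs, n_common and best are illustrative. Note that the number of pairs grows quadratically, so with thousands of vectors this still does a lot of work even without an explicit for loop.

```r
lst <- list(
  c("a", "m", "l", "s", "t", "o"),
  c("a", "y", "o", "t", "e"),
  c("n", "a", "s", "i", "d")
)

# Each column of `pairs` is one pair of list indices
pairs <- combn(length(lst), 2)

# Number of shared elements for every pair
n_common <- apply(pairs, 2, function(ix) {
  length(intersect(lst[[ix[1]]], lst[[ix[2]]]))
})

# Pair with the most matches (the first one, if there are ties)
best <- pairs[, which.max(n_common)]
print(best)                                       # 1 2
print(intersect(lst[[best[1]]], lst[[best[2]]]))  # "a" "t" "o"
```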

R - shuffle a list preserving element sizes

In R, I need an efficient solution to shuffle the elements contained within a list, preserving the total number of elements, and the local element sizes (in this case, each element of the list is a vector)
a<-LETTERS[1:6]
b<-LETTERS[6:10]
c<-LETTERS[c(9:15)]
l=list(a,b,c)
> l
[[1]]
[1] "A" "B" "C" "D" "E" "F"
[[2]]
[1] "F" "G" "H" "I" "J"
[[3]]
[1] "I" "J" "K" "L" "M" "N" "O"
The shuffling should randomly select the letters of the list (without replacement) and put them in a random position of any vector within the list.
I hope I have been clear! Thanks :-)
You could try creating a second list with the skeleton of the first and filling it with all the elements of the first list in random order, like this:
u<-unlist(l)
l2<-relist(u[sample(length(u))],skeleton=l)
> l2
[[1]]
[1] "F" "A" "O" "I" "S" "Q"
[[2]]
[1] "R" "P" "K" "F" "G"
[[3]]
[1] "A" "N" "M" "J" "H" "G" "E" "B" "T" "C" "D" "L"
Hope this helps!
Like this...?
> set.seed(1)
> lapply(l, sample)
[[1]]
[1] "B" "F" "C" "D" "A" "E"
[[2]]
[1] "J" "H" "G" "F" "I"
[[3]]
[1] "J" "M" "O" "L" "N" "K" "I"
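The two answers do different things, which a quick check makes visible: relist() on the shuffled flat vector moves letters across vectors while keeping each vector's length, whereas lapply(l, sample) only reorders letters within each vector. A sketch, with across and within as illustrative names:

```r
set.seed(1)
l <- list(LETTERS[1:6], LETTERS[6:10], LETTERS[9:15])

u <- unlist(l)

# Shuffle all letters, then pour them back into the original shape
across <- relist(sample(u), skeleton = l)

# Shuffle each vector on its own
within <- lapply(l, sample)

# Element sizes are preserved by the relist() approach
print(lengths(across))

# ... and so is the overall set of letters (including duplicates)
print(identical(sort(unlist(across)), sort(u)))
```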

Letter "y" comes after "i" when sorting alphabetically

When using the function sort(x), where x is a character vector, the letter "y" jumps into the middle, right after the letter "i":
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t"
[21] "u" "v" "w" "x" "y" "z"
> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[21] "t" "u" "v" "w" "x" "z"
The reason may be that I am located in Lithuania, and this is "Lithuanian-like" sorting of letters, but I need normal sorting. How do I change the sorting method back to normal inside R code?
I'm using R 2.15.2 on Win7.
You need to change the locale that R is running in. Either do that for your entire Windows install (which seems suboptimal) or within the R sessions via:
Sys.setlocale("LC_COLLATE", "C")
You can use any other valid locale string in place of "C" there, but that should get you back to the sort order for letters you want.
Read ?locales for more.
I suppose it is worth noting the sister function Sys.getlocale(), which queries the current setting of a locale parameter. Hence you could do
(locCol <- Sys.getlocale("LC_COLLATE"))
Sys.setlocale("LC_COLLATE", "lt_LT")
sort(letters)
Sys.setlocale("LC_COLLATE", locCol)
sort(letters)
Sys.getlocale("LC_COLLATE")
## giving:
> (locCol <- Sys.getlocale("LC_COLLATE"))
[1] "en_GB.UTF-8"
> Sys.setlocale("LC_COLLATE", "lt_LT")
[1] "lt_LT"
> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n"
[16] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "z"
> Sys.setlocale("LC_COLLATE", locCol)
[1] "en_GB.UTF-8"
> sort(letters)
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
[16] "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
> Sys.getlocale("LC_COLLATE")
[1] "en_GB.UTF-8"
which of course is what @Hadley's answer shows with_collate() doing somewhat more succinctly once you have devtools installed.
If you want to do this temporarily, devtools provides the with_collate function (in more recent releases this helper lives in the withr package):
library(devtools)
with_collate("C", sort(letters))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
# [20] "t" "u" "v" "w" "x" "y" "z"
with_collate("lt_LT", sort(letters))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "y" "j" "k" "l" "m" "n" "o" "p" "q" "r"
# [20] "s" "t" "u" "v" "w" "x" "z"
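Two base-R alternatives, sketched under the assumption that plain C-locale ordering is what "normal" means here. sort() with method = "radix" always sorts character vectors in the C locale, and a small wrapper with on.exit() can switch the collation temporarily. The helper name sort_with_collate is hypothetical, not part of any package.

```r
x <- c("b", "A", "y", "i", "a", "B")

# Radix sort ignores the session's collation and uses C-locale order
print(sort(x, method = "radix"))   # "A" "B" "a" "b" "i" "y"
print(identical(sort(letters, method = "radix"), letters))  # TRUE

# Hypothetical helper: switch LC_COLLATE only for the duration of the call
sort_with_collate <- function(x, locale = "C") {
  old <- Sys.getlocale("LC_COLLATE")
  on.exit(Sys.setlocale("LC_COLLATE", old), add = TRUE)  # always restore
  Sys.setlocale("LC_COLLATE", locale)
  sort(x)
}

print(sort_with_collate(x))
```

Note that C-locale order puts all uppercase letters before lowercase ones, which may or may not be what you want for mixed-case data.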

How to print a character list from A to Z?

In R, how can I print a character list from A to Z? With integers I can say:
my_list = c(1:10)
> my_list
[1] 1 2 3 4 5 6 7 8 9 10
But can I do the same with characters? e.g.
my_char_list = c(A:Z)
my_char_list = c("A":"Z")
These don't work, I want the output to be: "A" "B" "C" "D", or separated by commas.
LETTERS
"A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X"
[25] "Y" "Z"
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
[25] "y" "z"
> LETTERS[5:10]
[1] "E" "F" "G" "H" "I" "J"
strsplit(intToUtf8(c(97:122)),"")
for a,b,c,...,z
strsplit(intToUtf8(c(65:90)),"")
for A,B,C,...,Z
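Note that strsplit() returns a list, so the result above needs [[1]] to get at the vector. A small sketch of the same code-point idea using intToUtf8() with multiple = TRUE, which returns a plain character vector; the variable names are illustrative.

```r
upper <- intToUtf8(65:90, multiple = TRUE)
lower <- intToUtf8(97:122, multiple = TRUE)

print(identical(upper, LETTERS))  # TRUE
print(identical(lower, letters))  # TRUE

# A sub-range, from "C" to "F", via the letters' code points
c_to_f <- intToUtf8(utf8ToInt("C"):utf8ToInt("F"), multiple = TRUE)
print(c_to_f)                          # "C" "D" "E" "F"

# Comma-separated, as asked in the question
print(paste(c_to_f, collapse = ", "))  # "C, D, E, F"
```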
#' range_ltrs() returns a vector of letters
#'
#' range_ltrs() returns a vector of letters,
#' starting with arg start and ending with arg stop.
#' Start and stop must be the same case.
#' If start is after stop, then a "backwards" vector is returned.
#'
#' @param start an upper or lowercase letter.
#' @param stop an upper or lowercase letter.
#'
#' @examples
#' > range_ltrs(start = 'A', stop = 'D')
#' [1] "A" "B" "C" "D"
#'
#' If start is after stop, then a "backwards" vector is returned.
#' > range_ltrs('d', 'a')
#' [1] "d" "c" "b" "a"
range_ltrs <- function (start, stop) {
is_start_upper <- toupper(start) == start
is_stop_upper <- toupper(stop) == stop
if (is_start_upper) stopifnot(is_stop_upper)
if (is_stop_upper) stopifnot(is_start_upper)
ltrs <- if (is_start_upper) LETTERS else letters
start_i <- which(ltrs == start)
stop_i <- which(ltrs == stop)
ltrs[start_i:stop_i]
}
