I want to randomly disrupt the order of the letters that make up words in sentences. I can do the shuffling for single words, e.g.:
library(stringr)
a <- "bach"
sample(unlist(str_split(a, "")), nchar(a))
[1] "h" "a" "b" "c"
but I fail to do it for sentences, e.g.:
b <- "bach composed fugues and cantatas"
What I've tried so far:
split into words:
b1 <- str_split(b, " ")
[[1]]
[1] "bach" "composed" "fugues" "and" "cantatas"
calculate the number of characters per word:
n <- lapply(b1, function(x) nchar(x))
n
[[1]]
[1] 4 8 6 3 8
split words in b1 into single letters:
b2 <- str_split(unlist(str_split(b, " ")), "")
b2
[[1]]
[1] "b" "a" "c" "h"
[[2]]
[1] "c" "o" "m" "p" "o" "s" "e" "d"
[[3]]
[1] "f" "u" "g" "u" "e" "s"
[[4]]
[1] "a" "n" "d"
[[5]]
[1] "c" "a" "n" "t" "a" "t" "a" "s"
Jumble the letters in each word based on the above:
lapply(b2, function(x) sample(unlist(x), unlist(n), replace = T))
[[1]]
[1] "h" "a" "c" "b"
[[2]]
[1] "o" "p" "o" "s"
[[3]]
[1] "g" "s" "s" "u"
[[4]]
[1] "d" "d" "a" "d"
[[5]]
[1] "c" "n" "s" "a"
That's obviously not the right result. How can I randomly jumble the sequence of letters in each word in the sentence?
After b2 you can randomly shuffle the characters of each word with sample and paste the words back together.
paste0(sapply(b2, function(x) paste0(sample(x), collapse = "")), collapse = " ")
#[1] "bhac moodscpe uefusg and tsatnaac"
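Note that you don't need to pass size to sample if you want the output to be the same length as the input, and replace = FALSE is the default. In your attempt, every word was sampled with size 4 and with replace = T, which is why each result has four letters, some of them repeated.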
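The whole pipeline can also be wrapped in a small helper. This is a minimal sketch using only base R (strsplit instead of stringr's str_split); the function name jumble_words is hypothetical and not part of the answer above.

```r
# Hypothetical helper: shuffle the letters inside each space-separated word.
jumble_words <- function(sentence) {
  # Split the sentence into words on single spaces
  words <- strsplit(sentence, " ", fixed = TRUE)[[1]]
  jumbled <- vapply(words, function(w) {
    chars <- strsplit(w, "")[[1]]          # word -> single letters
    paste(sample(chars), collapse = "")    # permute, then glue back together
  }, character(1), USE.NAMES = FALSE)
  # Re-assemble the sentence in the original word order
  paste(jumbled, collapse = " ")
}

b <- "bach composed fugues and cantatas"
jumble_words(b)
```

Each call gives a different permutation unless set.seed() is called first; word order and word lengths are preserved.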
After defining
> Seq.genes <- as.list(c("ATGCCCAAATTTGATTT","AGAGTTCCCACCAACG"))
I have a list of strings :
> Seq.genes[1:2]
[[1]]
[1] "ATGCCCAAATTTGATTT"
[[2]]
[1] "AGAGTTCCCACCAACG"
I would like to convert it in a list of vectors :
> Seq.genes[1:2]
[[1]]
[1] "A" "T" "G" "C" "C" "C" "A" "A" "A" "T" "T" "T" "G" "A" "T" "T" "T"
[[2]]
[1] "A" "G" "A" "G" "T" "T" "C" "C" "C" "A" "C" "C" "A" "A" "C" "G"
I tried something like :
for (i in length(Seq.genes)){
x <- Seq.genes[i]
Seq.genes[i] <- substring(x, seq(1,nchar(x),2), seq(1,nchar(x),2))
}
It may be better to keep the strings in a vector rather than in a list, so we could unlist and then call strsplit:
strsplit(unlist(Seq.genes), "")
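Alternatively, apply strsplit to each element of the list:
sapply(Seq.genes, strsplit, split = '')
or
lapply(Seq.genes, strsplit, split = '') # note: each element of this result is itself a one-element list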
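To see which variant yields the flat list of character vectors asked for, the results can be compared directly. A small sketch, reusing the Seq.genes definition from the question; the names flat and flat2 are only for illustration.

```r
Seq.genes <- as.list(c("ATGCCCAAATTTGATTT", "AGAGTTCCCACCAACG"))

# strsplit() is vectorised over a character vector and returns a list
# with one character vector per input string.
flat <- strsplit(unlist(Seq.genes), "")
lengths(flat)   # number of letters in each sequence

# lapply() over the list wraps each strsplit() result in an extra
# one-element list, so take [[1]] inside the function to stay flat.
flat2 <- lapply(Seq.genes, function(s) strsplit(s, "")[[1]])
identical(flat, flat2)
```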
I have a vector with several sets of elements. Each set is preceded by a certain name, given by "A", "B" and "C" as an example over here:
v1 <- c("A", letters[1:5], "B", letters[6:7], "C", letters[8:12])
v1
# [1] "A" "a" "b" "c" "d" "e" "B" "f" "g" "C" "h" "i" "j" "k" "l"
The position of the "headers" can be obtained by grep:
start <- grep("[ABC]", v1)
# [1] 1 7 10
How do I proceed from here to extract the three sets of elements as separate vectors with the preceding "headers" as their name?
"A" <- letters[1:5]
"B" <- letters[6:7]
"C" <- letters[8:12]
A
# [1] "a" "b" "c" "d" "e"
B
# [1] "f" "g"
C
# [1] "h" "i" "j" "k" "l"
SOLUTION
I hope the kind soul who answered this question (his id eludes me), but later deleted the answer and all of his comments, can be contacted and the answer reinstated, so that he can be duly rewarded with upvotes.
Contrary to my initial claim, which was caused by a misunderstanding, his answer DID provide a viable solution.
Here's the gist of it, from what I can recall:
library(purrr) # provides map2(), set_names() and the %>% pipe
end <- start-1
end <- end[-1]
end[length(end)+1] <- length(v1)
end
[1] 6 9 15
map2(start+1, end, ~v1[.x:.y]) %>% set_names(v1[start])
$A
[1] "a" "b" "c" "d" "e"
$B
[1] "f" "g"
$C
[1] "h" "i" "j" "k" "l"
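For comparison, the same grouping can be done in base R without purrr. This is an alternative sketch, not the deleted answer's method; it assumes every header is followed by at least one element (an empty set would be dropped by split and the names would no longer line up).

```r
v1 <- c("A", letters[1:5], "B", letters[6:7], "C", letters[8:12])

# Flag the header positions
is_header <- grepl("^[ABC]$", v1)
# cumsum() over the flags numbers each element by the header it follows
group <- cumsum(is_header)

# Drop the headers themselves, split the rest by group, name by header
sets <- split(v1[!is_header], group[!is_header])
names(sets) <- v1[is_header]
sets
```

If separate variables A, B and C are really wanted rather than a named list, list2env(sets, envir = globalenv()) would create them, though keeping the list is usually easier to work with.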
I am trying to run several for loops in succession in R. I hope this simplified example of the kind of thing I am trying to do provides enough information, and that the question is relevant/interesting enough to a general audience.
Essentially, I have a pool of individuals (here represented by the 26 LETTERS and saved in a vector called 'ids'). I start with 2 of them randomly selected (called 'ids1') and run a for loop (here 5 times as defined by 'runs'). Those letters not picked get put into another vector called 'ids.left1'.
The first thing going on in the for loop in this example is that I am just randomly picking one of the letters five times. I am storing the result of this in another vector called result1. In this example I'm also storing those letters not used in another vector called 'otherresult1'. (My real-world reason for doing this would be using loops containing several different processes, not just these two).
set.seed(123)
#Initializing
ids<-LETTERS[1:26]
runs<-5
#1st time
result1 <- vector("list",runs)
otherresult1 <- vector("list",runs)
ids1<-sample(ids,2)
ids.left1<-setdiff(ids,ids1)
for (i in 1:runs) {
picked1<-sample(ids1, 1)
result1[[i]] <- picked1
otherresult1[[i]] <- setdiff(ids1,picked1)
}
result1x<-unlist(result1) #[1] "H" "T" "T" "H" "T"
The above is trivial. What I am trying to do next is to add an extra letter (randomly selected) to the pool (so we now have 3) and run the for loop again for the same number of times (5). I also want to store the now 23 letters not being used in a vector (ids.left2) and also store the results of this loop in result2. Those not selected get stored in otherresult2.
#2nd time
result2 <- vector("list",runs)
otherresult2 <- vector("list",runs)
ids2<-c(ids1, sample(ids.left1,1))
ids.left2<-setdiff(ids,ids2)
for (i in 1:runs) {
picked2<- sample(ids2, 1)
result2[[i]] <- picked2
otherresult2[[i]] <- setdiff(ids2,picked2)
}
result2x<-unlist(result2) #[1] "T" "T" "X" "T" "X"
This is repeated again. Another letter is added (so we now have 4), and the same for loop is run 5 times, and the results stored again in another vector. Those not used again get stored in otherresult3.
#3rd time
result3 <- vector("list",runs)
otherresult3 <- vector("list",runs)
ids3<-c(ids2, sample(ids.left2,1))
for (i in 1:runs) {
picked3 <- sample(ids3, 1)
result3[[i]] <- picked3
otherresult3[[i]] <- setdiff(ids3,picked3)
}
result3x<-unlist(result3) #[1] "H" "O" "H" "H" "T"
This is just putting the results all together.
#putting results together
results.final <- c(result1x,result2x,result3x)
results.final #[1] "H" "T" "T" "H" "T" "T" "T" "X" "T" "X" "H" "O" "H" "H" "T"
unlist(otherresult1) #[1] "T" "H" "H" "T" "H"
unlist(otherresult2) #[1] "H" "X" "H" "X" "H" "T" "H" "X" "H" "T"
unlist(otherresult3) #[1] "T" "X" "O" "H" "T" "X" "T" "X" "O" "T" "X" "O" "H" "X" "O"
This is all pretty easy when I am only running the for loop 3 times. However, if I wanted to do the same thing (adding in one individual into a pool of individuals) 1000 times, it would be crazy to manually write the code. (Obviously, I wouldn't be using letters if I ran it 1000 times but some other identifier).
My question is therefore, is it possible to more efficiently code these successive for loops?
EDIT: I added in another process in the for-loop (the result being stored in 'otherresult' vector) to try and make this more realistic.
A perfect time to use recursion
recCount <- 1 #which recursive iteration we are in
allLetters <- LETTERS[1:26]
endPoint <- 6 #after how many recursions do we stop
runs <- 5
recEx <- function(resultList,otherResultList,
inLetters,outLetters,
recCount)
{
newLetter <- sample(inLetters,ifelse(recCount==1,2,1)) #pick a letter, 2 if this is the first run
outLetters <- c(outLetters,newLetter) #add this letter to our pool of usable letters
inLetters <- inLetters[!inLetters %in% newLetter] #subtract the new letter(s) from the total pool; %in% also handles the two-letter first draw, where != would recycle
excludedList <- includedList <- list() #initialize the lists we will add to
for (i in 1:runs) {
picked1<-sample(outLetters, 1)
includedList[[i]] <- picked1
excludedList[[i]] <- setdiff(outLetters,picked1)
}
if(recCount == endPoint) return(list(c(resultList,list(includedList)), #if we're done
c(otherResultList,list(excludedList)))) else
return(recEx(c(resultList,list(includedList)), #pass in our results so far, and add the "included" list onto the end
c(otherResultList,list(excludedList)), #same with the "excluded" list
inLetters,outLetters,recCount+1))
}
finalResult <- recEx(list(),list(),allLetters,NULL,1)
> finalResult
[[1]]#1 is for your final results, #2 is for the excluded results
[[1]][[1]]# 1 through 6 are your 6 iterations, with 2 through 7 letters in each iteration
[[1]][[1]][[1]] #1 through 5 are your 5 runs
[1] "H"
[[1]][[1]][[2]]
[1] "T"
[[1]][[1]][[3]]
[1] "T"
[[1]][[1]][[4]]
[1] "H"
[[1]][[1]][[5]]
[1] "T"
[[1]][[2]]
[[1]][[2]][[1]]
[1] "T"
[[1]][[2]][[2]]
[1] "T"
[[1]][[2]][[3]]
[1] "X"
[[1]][[2]][[4]]
[1] "T"
[[1]][[2]][[5]]
[1] "X"
[[1]][[3]]
[[1]][[3]][[1]]
[1] "H"
[[1]][[3]][[2]]
[1] "N"
[[1]][[3]][[3]]
[1] "H"
[[1]][[3]][[4]]
[1] "H"
[[1]][[3]][[5]]
[1] "T"
[[1]][[4]]
[[1]][[4]][[1]]
[1] "Y"
[[1]][[4]][[2]]
[1] "N"
[[1]][[4]][[3]]
[1] "N"
[[1]][[4]][[4]]
[1] "Y"
[[1]][[4]][[5]]
[1] "N"
[[1]][[5]]
[[1]][[5]][[1]]
[1] "N"
[[1]][[5]][[2]]
[1] "N"
[[1]][[5]][[3]]
[1] "T"
[[1]][[5]][[4]]
[1] "H"
[[1]][[5]][[5]]
[1] "Q"
[[1]][[6]]
[[1]][[6]][[1]]
[1] "Y"
[[1]][[6]][[2]]
[1] "Q"
[[1]][[6]][[3]]
[1] "H"
[[1]][[6]][[4]]
[1] "N"
[[1]][[6]][[5]]
[1] "Q"
[[2]] #your excluded letters
[[2]][[1]]
[[2]][[1]][[1]]
[1] "T"
[[2]][[1]][[2]]
[1] "H"
[[2]][[1]][[3]]
[1] "H"
[[2]][[1]][[4]]
[1] "T"
[[2]][[1]][[5]]
[1] "H"
[[2]][[2]]
[[2]][[2]][[1]]
[1] "H" "X"
[[2]][[2]][[2]]
[1] "H" "X"
[[2]][[2]][[3]]
[1] "H" "T"
[[2]][[2]][[4]]
[1] "H" "X"
[[2]][[2]][[5]]
[1] "H" "T"
[[2]][[3]]
[[2]][[3]][[1]]
[1] "T" "X" "N"
[[2]][[3]][[2]]
[1] "H" "T" "X"
[[2]][[3]][[3]]
[1] "T" "X" "N"
[[2]][[3]][[4]]
[1] "T" "X" "N"
[[2]][[3]][[5]]
[1] "H" "X" "N"
[[2]][[4]]
[[2]][[4]][[1]]
[1] "H" "T" "X" "N"
[[2]][[4]][[2]]
[1] "H" "T" "X" "Y"
[[2]][[4]][[3]]
[1] "H" "T" "X" "Y"
[[2]][[4]][[4]]
[1] "H" "T" "X" "N"
[[2]][[4]][[5]]
[1] "H" "T" "X" "Y"
[[2]][[5]]
[[2]][[5]][[1]]
[1] "H" "T" "X" "Y" "Q"
[[2]][[5]][[2]]
[1] "H" "T" "X" "Y" "Q"
[[2]][[5]][[3]]
[1] "H" "X" "N" "Y" "Q"
[[2]][[5]][[4]]
[1] "T" "X" "N" "Y" "Q"
[[2]][[5]][[5]]
[1] "H" "T" "X" "N" "Y"
[[2]][[6]]
[[2]][[6]][[1]]
[1] "H" "T" "X" "N" "Q" "V"
[[2]][[6]][[2]]
[1] "H" "T" "X" "N" "Y" "V"
[[2]][[6]][[3]]
[1] "T" "X" "N" "Y" "Q" "V"
[[2]][[6]][[4]]
[1] "H" "T" "X" "Y" "Q" "V"
[[2]][[6]][[5]]
[1] "H" "T" "X" "N" "Y" "V"
This isn't the best structure for the results imo, but it is what you specified. Unpacking these lists is trivial, though.
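As a sketch of what that unpacking could look like, here is a toy stand-in with the same shape as finalResult[[1]] (a list of iterations, each a list of single-letter runs). The object name included and its values are made up for illustration.

```r
# Toy stand-in: 2 iterations x 3 runs, same nesting as finalResult[[1]]
included <- list(list("H", "T", "T"),
                 list("T", "X", "T"))

# One flat character vector across all iterations and runs
all_picks <- unlist(included)

# One character vector per iteration
per_iteration <- lapply(included, unlist)

# A matrix with one row per iteration and one column per run
pick_matrix <- do.call(rbind, per_iteration)
pick_matrix
```

The excluded letters in finalResult[[2]] have a different length per iteration, so for those lapply(..., unlist) or keeping the nested list is probably the safer choice.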
How about this?
set.seed(123)
#Initializing
ids =LETTERS[1:26]
runs=5
result1 = list()
temp = sample(ids,2)
j=1
results = c()
while(j<6) {
ids.left = ids[!(ids%in%temp)]
for(i in 1:runs){
result1[[i]] = sample(temp,1)
}
temp = c(temp, sample(ids.left,1))
j=j+1
results = c(results, unlist(result1))
}
results # [1] "H" "T" "T" "H" "T" "T" "T" "X" "T" "X" "H" "O" "H" "H" "T" "Y" "O" "O" "Y" "O" "O" "O" "T" "H" "Q"
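If the "other" results from the question need to be kept as well, the loop can be wrapped in a function that stores both per round. This is a sketch under assumptions: the name grow_pool and its arguments are hypothetical, and it draws all runs of a round in one sample() call with replace = TRUE, which has the same distribution as repeated single draws but will not reproduce the exact letters shown above.

```r
# Hypothetical helper: grow the pool by one id per round,
# recording the picked id and the unpicked pool members for every run.
grow_pool <- function(ids, rounds, runs, start_size = 2) {
  pool   <- sample(ids, start_size)
  picked <- vector("list", rounds)
  others <- vector("list", rounds)
  for (r in seq_len(rounds)) {
    # 'runs' independent single picks from the current pool
    draws <- sample(pool, runs, replace = TRUE)
    picked[[r]] <- draws
    others[[r]] <- lapply(draws, function(p) setdiff(pool, p))
    # Add one unused id to the pool for the next round, if any remain
    left <- setdiff(ids, pool)
    if (length(left) > 0) pool <- c(pool, sample(left, 1))
  }
  list(picked = picked, others = others)
}

set.seed(123)
res <- grow_pool(LETTERS, rounds = 3, runs = 5)
unlist(res$picked)   # all picks, round after round
```

Changing rounds to 1000 (with a large enough set of ids) needs no further code.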
I have a list of vectors such as:
>list
[[1]]
[1] "a" "m" "l" "s" "t" "o"
[[2]]
[1] "a" "y" "o" "t" "e"
[[3]]
[1] "n" "a" "s" "i" "d"
I want to find the matches between each of them and the remaining (i.e. between the 1st and the other 2, the 2nd and the other 2, and so on) and keep the couple with the highest number of matches. I could do it with a "for" loop and intersect by couples. For example
for (i in 2:3) { intersect(list[[1]],list[[i]]) }
and then save the output into a vector or some other structure. However, this seems inefficient to me (given that I have thousands of vectors rather than 3), and I am wondering if R has some built-in function to do that in a clever way.
So the question would be:
Is there a way to look for matches of one vector to a list of vectors without the explicit use of a "for" loop?
I don't believe there is a built-in function for this. The best you could try is something like:
lsts <- lapply(1:5, function(x) sample(letters, 10)) # make some data (see below)
maxcomb <- which.max(apply(combs <- combn(length(lsts), 2), 2,
function(ix) length(intersect(lsts[[ix[1]]], lsts[[ix[2]]]))))
lsts <- lsts[combs[, maxcomb]]
# [[1]]
# [1] "m" "v" "x" "d" "a" "g" "r" "b" "s" "t"
# [[2]]
# [1] "w" "v" "t" "i" "d" "p" "l" "e" "s" "x"
A dump of the original:
[[1]]
[1] "z" "r" "j" "h" "e" "m" "w" "u" "q" "f"
[[2]]
[1] "m" "v" "x" "d" "a" "g" "r" "b" "s" "t"
[[3]]
[1] "w" "v" "t" "i" "d" "p" "l" "e" "s" "x"
[[4]]
[1] "c" "o" "t" "j" "d" "g" "u" "k" "w" "h"
[[5]]
[1] "f" "g" "q" "y" "d" "e" "n" "s" "w" "i"
datal <- list (a=c(2,2,1,2),
b=c(2,2,2,4,3),
c=c(1,2,3,4))
# all possible combinations
combs <- combn(length(datal), 2)
# split into list
combs <- split(combs, rep(1:ncol(combs), each = nrow(combs)))
# calculate length of intersection for every combination
intersections_length <- sapply(combs, function(y) {
  length(intersect(datal[[y[1]]],datal[[y[2]]]))
})
# What lists have biggest intersection
combs[which(intersections_length == max(intersections_length))]
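Applied to the three vectors from the question, the combn approach can be sketched as follows. The list is called vecs here (an assumed name) to avoid masking the base function list.

```r
vecs <- list(c("a", "m", "l", "s", "t", "o"),
             c("a", "y", "o", "t", "e"),
             c("n", "a", "s", "i", "d"))

# Every unordered pair of list positions, one pair per column
pairs <- combn(length(vecs), 2)

# Number of shared elements for each pair
n_common <- apply(pairs, 2, function(ix) {
  length(intersect(vecs[[ix[1]]], vecs[[ix[2]]]))
})

# Positions of the pair with the most matches (first one on ties)
best <- pairs[, which.max(n_common)]
best
vecs[best]
```

Note that which.max returns only the first maximum; the which(... == max(...)) form above keeps all tied pairs.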
In R, how can I print a character list from A to Z? With integers I can say:
my_list = c(1:10)
> my_list
[1] 1 2 3 4 5 6 7 8 9 10
But can I do the same with characters? e.g.
my_char_list = c(A:Z)
my_char_list = c("A":"Z")
These don't work. I want the output to be: "A" "B" "C" "D", or separated by commas.
LETTERS
"A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X"
[25] "Y" "Z"
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x"
[25] "y" "z"
> LETTERS[5:10]
[1] "E" "F" "G" "H" "I" "J"
strsplit(intToUtf8(c(97:122)),"")
for a,b,c,...,z
strsplit(intToUtf8(c(65:90)),"")
for A,B,C,...,Z
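Since strsplit returns a list, the character vector still has to be extracted with [[1]]. A short sketch of that, plus the multiple = TRUE argument of intToUtf8, which returns one string per code point directly; the variable names are only for illustration.

```r
# strsplit() returns a list, so [[1]] extracts the character vector
lower <- strsplit(intToUtf8(97:122), "")[[1]]

# multiple = TRUE gives one string per code point, no strsplit() needed
upper <- intToUtf8(65:90, multiple = TRUE)

identical(lower, letters)
identical(upper, LETTERS)

# An arbitrary range, using utf8ToInt() to look up the code points
e_to_j <- intToUtf8(utf8ToInt("E"):utf8ToInt("J"), multiple = TRUE)
e_to_j
```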
#' range_ltrs() returns a vector of letters
#'
#' range_ltrs() returns a vector of letters,
#' starting with arg start and ending with arg stop.
#' Start and stop must be the same case.
#' If start is after stop, then a "backwards" vector is returned.
#'
#' @param start an upper or lowercase letter.
#' @param stop an upper or lowercase letter.
#'
#' @examples
#' range_ltrs(start = 'A', stop = 'D')
#' # [1] "A" "B" "C" "D"
#'
#' # If start is after stop, then a "backwards" vector is returned.
#' range_ltrs('d', 'a')
#' # [1] "d" "c" "b" "a"
range_ltrs <- function (start, stop) {
is_start_upper <- toupper(start) == start
is_stop_upper <- toupper(stop) == stop
stopifnot(is_start_upper == is_stop_upper) # start and stop must be the same case
ltrs <- if (is_start_upper) LETTERS else letters
start_i <- which(ltrs == start)
stop_i <- which(ltrs == stop)
ltrs[start_i:stop_i]
}