Generate a sequence of characters from 'A'-'Z' - r

I can make a sequence of numbers like this:
s = seq(from=1, to=10, by=1)
How do I make a sequence of characters from A-Z? This doesn't work:
seq(from=1, to=10)

Use LETTERS and letters (for uppercase and lowercase respectively).

Use the code you have with letters and/or LETTERS:
> LETTERS[seq( from = 1, to = 10 )]
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J"
> letters[seq( from = 1, to = 10 )]
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"

Just use the predefined variables letters and LETTERS.
And for completeness, here is something using seq:
R> rawToChar(as.raw(seq(as.numeric(charToRaw('a')), as.numeric(charToRaw('z')))))
[1] "abcdefghijklmnopqrstuvwxyz"

The R.oo package has an intToChar function that works from ASCII codes, in case LETTERS and letters don't fit your needs. A is 65 in ASCII:
> require(R.oo)
> intToChar(65:79)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O"
or, since the lowest Unicode code points are the ASCII codes, you can use intToUtf8 from base R like this:
> intToUtf8(65:78,multiple=TRUE)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N"
or faff around with rawToChar:
> rawToChar(as.raw(65:78))
[1] "ABCDEFGHIJKLMN"
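If you want this as a reusable function, here is a minimal sketch built on the same code-point idea. The name char_seq is made up for illustration, and it assumes each endpoint is a single character:

```r
# Hypothetical helper: every character between two single-character endpoints,
# walking the Unicode code points (identical to ASCII for A-Z, a-z, 0-9).
char_seq <- function(from, to) {
  codes <- seq(utf8ToInt(from), utf8ToInt(to))
  intToUtf8(codes, multiple = TRUE)
}

char_seq("A", "Z")   # same elements as LETTERS
char_seq("0", "9")   # also works outside the alphabet
```

For plain A-Z the built-in LETTERS is still the simpler choice; this is only worth it when the endpoints vary.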

LETTERS returns A-Z.
To generate A-E, for instance:
Uppercase:
> LETTERS[1:5]
Lowercase:
> letters[1:5]
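If you know the endpoints as letters rather than positions, match() can look the positions up for you. A small sketch, where letter_range is a made-up name and not a base R function:

```r
# Hypothetical helper: letters between two endpoints, located by position
# in the chosen alphabet (LETTERS by default).
letter_range <- function(from, to, alphabet = LETTERS) {
  idx <- match(c(from, to), alphabet)
  if (anyNA(idx)) stop("both endpoints must be in the alphabet")
  alphabet[idx[1]:idx[2]]
}

letter_range("C", "H")
letter_range("e", "a", letters)  # a descending range runs backwards
```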

Related

Print vowels from the vector

I need to extract the vowels from R's built-in LETTERS vector:
"A", "E", etc.
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X"
[25] "Y" "Z"
Maybe someone knows how to do it with if() or other functions. Thank you in advance.
It looks like you need to extract the vowels. Does this work?
> vowels <- c('A','E','I','O','U')
> LETTERS[sapply(vowels, function(ch) grep(ch, LETTERS))]
[1] "A" "E" "I" "O" "U"
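Since the vowels are known up front, a membership test with %in% may be simpler than grep. A minimal sketch (the variable names are just for illustration):

```r
vowels <- c("A", "E", "I", "O", "U")

# Keep the elements of LETTERS that appear in vowels
found <- LETTERS[LETTERS %in% vowels]

# Negating the test gives the consonants instead
consonants <- LETTERS[!LETTERS %in% vowels]

found
length(consonants)
```

This keeps the order of LETTERS and never depends on regular-expression matching.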

Expand a vector by semicolon in some elements in R

Suppose I have a vector in R:
x<-c("a", "b", "c;d", "e", "f;g;h;i;j")
My question is how to expand x by the separator ";", that is, the desired output would be:
x
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
With strsplit:
unlist(strsplit(x, split = ";"))
# [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
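A slightly extended sketch, in case you also need to know which original element each piece came from. fixed = TRUE makes strsplit treat ";" literally instead of as a regular expression; parts, expanded and origin are illustrative names:

```r
x <- c("a", "b", "c;d", "e", "f;g;h;i;j")

parts    <- strsplit(x, ";", fixed = TRUE)      # list, one entry per element of x
expanded <- unlist(parts)                       # flattened character vector
origin   <- rep(seq_along(x), lengths(parts))   # index in x of each piece

expanded
origin
```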

Convert vector with sets of values preceded by "headers", to separate vectors

I have a vector with several sets of elements. Each set is preceded by a certain name, given by "A", "B" and "C" as an example over here:
v1 <- c("A", letters[1:5], "B", letters[6:7], "C", letters[8:12])
v1
# [1] "A" "a" "b" "c" "d" "e" "B" "f" "g" "C" "h" "i" "j" "k" "l"
The position of the "headers" can be obtained by grep:
start <- grep("[ABC]", v1)
# [1] 1 7 10
How do I proceed from here to extract the three sets of elements as separate vectors with the preceding "headers" as their name?
"A" <- letters[1:5]
"B" <- letters[6:7]
"C" <- letters[8:12]
A
# [1] "a" "b" "c" "d" "e"
B
# [1] "f" "g"
C
# [1] "h" "i" "j" "k" "l"
SOLUTION
I hope the kind soul who answered this question (his ID eludes me) but later deleted his answer and all of his comments can be contacted and the answer reinstated, so that he can be duly rewarded with upvotes.
Contrary to my initial claim, which was caused by a misunderstanding, his answer DID provide a viable solution.
Here's the gist of it, from what I can recall:
library(purrr)  # provides map2(), set_names() and %>%
end <- start-1
end <- end[-1]
end[length(end)+1] <- length(v1)
end
[1] 6 9 15
map2(start+1, end, ~v1[.x:.y]) %>% set_names(v1[start])
$A
[1] "a" "b" "c" "d" "e"
$B
[1] "f" "g"
$C
[1] "h" "i" "j" "k" "l"
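The same grouping can be sketched in base R without purrr, by giving every element the running count of headers seen so far and splitting on that. This is an alternative to the approach recalled above, not a reconstruction of it; is_header, group and sets are illustrative names:

```r
v1 <- c("A", letters[1:5], "B", letters[6:7], "C", letters[8:12])

is_header <- grepl("^[ABC]$", v1)   # TRUE at positions 1, 7, 10
group     <- cumsum(is_header)      # 1 for the "A" block, 2 for "B", 3 for "C"

# Drop the headers, split the rest by block. The explicit factor levels
# keep a header that has no elements as an empty vector.
sets <- split(v1[!is_header],
              factor(group[!is_header], levels = seq_len(sum(is_header))))
names(sets) <- v1[is_header]

sets
```

If you really want separate variables A, B and C rather than a named list, list2env(sets, envir = environment()) would create them, though keeping the list is usually easier to work with.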

Complement a DNA sequence

Suppose I have a DNA sequence and want to get its complement. I used the following code, but it does not work. What am I doing wrong?
s=readline()
ATCTCGGCGCGCATCGCGTACGCTACTAGC
p=unlist(strsplit(s,""))
h=rep("N",nchar(s))
unlist(lapply(p,function(d){
for b in (1:nchar(s)) {
if (p[b]=="A") h[b]="T"
if (p[b]=="T") h[b]="A"
if (p[b]=="G") h[b]="C"
if (p[b]=="C") h[b]="G"
}
Use chartr which is built for this purpose:
> s
[1] "ATCTCGGCGCGCATCGCGTACGCTACTAGC"
> chartr("ATGC","TACG",s)
[1] "TAGAGCCGCGCGTAGCGCATGCGATGATCG"
Just give it two equal-length character strings and your string. It is also vectorised over the strings being translated:
> chartr("ATGC","TACG",c("AAAACG","TTTTT"))
[1] "TTTTGC" "AAAAA"
Note I'm doing the replacement on the string representation of the DNA rather than the vector. To convert the vector I'd create a lookup-map as a named vector and index that:
> p
[1] "A" "T" "C" "T" "C" "G" "G" "C" "G" "C" "G" "C" "A" "T" "C" "G" "C" "G" "T"
[20] "A" "C" "G" "C" "T" "A" "C" "T" "A" "G" "C"
> map=c("A"="T", "T"="A","G"="C","C"="G")
> unname(map[p])
[1] "T" "A" "G" "A" "G" "C" "C" "G" "C" "G" "C" "G" "T" "A" "G" "C" "G" "C" "A"
[20] "T" "G" "C" "G" "A" "T" "G" "A" "T" "C" "G"
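For reference, the loop from the question can also be made to work. A minimal sketch, assuming the intent was to fill h position by position: the for header needs the form for (b in ...), the surrounding lapply is unnecessary, and the sequence is hard-coded here instead of read with readline():

```r
s <- "ATCTCGGCGCGCATCGCGTACGCTACTAGC"
p <- unlist(strsplit(s, ""))
h <- rep("N", length(p))   # anything that is not A/T/G/C stays "N"

for (b in seq_along(p)) {
  if (p[b] == "A") h[b] <- "T"
  if (p[b] == "T") h[b] <- "A"
  if (p[b] == "G") h[b] <- "C"
  if (p[b] == "C") h[b] <- "G"
}

complement_loop <- paste(h, collapse = "")
complement_loop
```

It gives the same string as the chartr call, just with a lot more code.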
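The Bioconductor package Biostrings has many useful functions for this sort of operation. Install once (current Bioconductor releases use BiocManager; the older biocLite script has been retired):
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Biostrings")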
then use
library(Biostrings)
dna = DNAStringSet(c("ATCTCGGCGCGCATCGCGTACGCTACTAGC", "ACCGCTA"))
complement(dna)
To complement, in both upper and lower case, you can use chartr():
n <- "ACCTGccatGCATC"
chartr("acgtACGT", "tgcaTGCA", n)
# [1] "TGGACggtaCGTAG"
To take it a step further and reverse complement the nucleotide sequence, you can use the following function:
library(stringi)
rc <- function(nucSeq)
return(stri_reverse(chartr("acgtACGT", "tgcaTGCA", nucSeq)))
rc("AcACGTgtT")
# [1] "AacACGTgT"
There is also the seqinr package:
library(seqinr)
comp(seq) # gives complement
rev(comp(seq)) # gives the reverse complement
Biostrings has a much smaller memory profile, but seqinr is nice also because you can choose the case of the bases (including mixed) and change them to anything you want, for example if you want a mix of T and U in the same sequence. Biostrings forces you to have either T or U.
sapply(p, switch, "A"="T", "T"="A","G"="C","C"="G")
A T C T C G G C G C G C A T C G C G T
"T" "A" "G" "A" "G" "C" "C" "G" "C" "G" "C" "G" "T" "A" "G" "C" "G" "C" "A"
A C G C T A C T A G C
"T" "G" "C" "G" "A" "T" "G" "A" "T" "C" "G"
If you do not want the names (the original bases), you can always strip them with unname.
unname(sapply(p, switch, "A"="T", "T"="A","G"="C","C"="G") )
[1] "T" "A" "G" "A" "G" "C" "C" "G" "C" "G" "C" "G" "T" "A" "G" "C" "G" "C"
[19] "A" "T" "G" "C" "G" "A" "T" "G" "A" "T" "C" "G"
Here is an answer using base R, written with horrible formatting to make the nesting clear while keeping it a one-liner. It supports upper and lower case.
revc = function(s){
  paste0(
    rev(
      unlist(
        strsplit(
          chartr("ATGCatgc","TACGtacg",s)
        , "") # from strsplit
      ) # from unlist
    ) # from rev
  , collapse='') # from paste0
}
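Note that revc above is meant for a single string: with several strings, unlist would merge all their characters before reversing. A possible vectorised variant in base R, sketched under the same character mapping (revc_vec is a made-up name):

```r
# Hypothetical vectorised variant: reverse-complement each string separately.
revc_vec <- function(s) {
  comp <- chartr("ATGCatgc", "TACGtacg", s)   # complement, case preserved
  vapply(strsplit(comp, ""),
         function(chars) paste(rev(chars), collapse = ""),
         character(1))
}

revc_vec("AcACGTgtT")
revc_vec(c("ATCG", "aacc"))
```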
I've generalised the solution rev(comp(seq)) with the seqinr package:
install.packages("devtools")
devtools::install_github("TomKellyGenetics/tktools")
tktools::revcomp(seq)
This version is compatible with string inputs and is vectorised to handle list or vector input of multiple strings. The output class should match the input, including cases and types. This also supports inputs containing "U" for RNA and RNA output sequences.
> seq <- "ATCTCGGCGCGCATCGCGTACGCTACTAGC"
> revcomp(seq)
[1] "GCTAGTAGCGTACGCGATGCGCGCCGAGAT"
> seq <- c("TATAAT", "TTTCGC", "atgcat")
> revcomp(seq)
TATAAT TTTCGC atgcat
"ATTATA" "GCGAAA" "atgcat"
See the manual or the TomKellyGenetics/tktools github package repository.

R - shuffle a list preserving element sizes

In R, I need an efficient solution to shuffle the elements contained within a list, preserving the total number of elements, and the local element sizes (in this case, each element of the list is a vector)
a<-LETTERS[1:6]
b<-LETTERS[6:10]
c<-LETTERS[c(9:15)]
l=list(a,b,c)
> l
[[1]]
[1] "A" "B" "C" "D" "E" "F"
[[2]]
[1] "F" "G" "H" "I" "J"
[[3]]
[1] "I" "J" "K" "L" "M" "N" "O"
The shuffling should randomly select the letters of the list (without replacement) and put them in a random position of any vector within the list.
I hope I have been clear! Thanks :-)
You could build a second list with the skeleton of the first and fill it with all the elements of the first list in random order, like this:
u<-unlist(l)
l2<-relist(u[sample(length(u))],skeleton=l)
> l2
[[1]]
[1] "F" "A" "O" "I" "S" "Q"
[[2]]
[1] "R" "P" "K" "F" "G"
[[3]]
[1] "A" "N" "M" "J" "H" "G" "E" "B" "T" "C" "D" "L"
Hope this helps!
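A reproducible sketch of the same relist idea on the list from the question, with checks that the vector sizes and the overall pool of letters are preserved. The seed value is arbitrary, and the shuffled result itself depends on your R version's random number generator, so no output is shown:

```r
l <- list(LETTERS[1:6], LETTERS[6:10], LETTERS[9:15])

set.seed(42)                            # arbitrary seed, only for reproducibility
u  <- unlist(l)                         # pool all 18 letters
l2 <- relist(sample(u), skeleton = l)   # shuffle the pool, pour it back into the shape of l

lengths(l2)                             # 6 5 7, the same as lengths(l)
identical(sort(unlist(l2)), sort(u))    # TRUE: same letters, duplicates included
```

Because the pool is shuffled as a whole, a letter can end up in any of the three vectors.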
Like this...?
> set.seed(1)
> lapply(l, sample)
[[1]]
[1] "B" "F" "C" "D" "A" "E"
[[2]]
[1] "J" "H" "G" "F" "I"
[[3]]
[1] "J" "M" "O" "L" "N" "K" "I"

Resources