Finding midpoint of string in R (mid character of a word) - r

I'd like to find the midpoint of any word after the following is done to the word:
>x = 'hello'
>y = strsplit(x, '')
>y
[[1]]
[1] "h" "e" "l" "l" "o"
>z = unlist(y)
>z
[1] "h" "e" "l" "l" "o"
Doing this then allows for :
> z[1]
[1] "h"
> z[4]
[1] "l"
Without the split, indexing the original string does not give individual characters: x[1] is the whole word and any other index returns NA. For example:
> x = 'hello'
> strsplit(x, '')
[[1]]
[1] "h" "e" "l" "l" "o"
> x[1]
[1] "hello"
> x[2]
[1] NA
Anyways, what I want to do is find the mid point of words that are in this format so that the output would be something like:
"l"
in the case of the word "hello". That word has 5 letters, so a single character can be designated as the midpoint, but for a word with an even number of letters, such as "bake", I would like to designate both "a" and "k" together as the midpoint.

Try
f1 <- function(str1){
  N <- nchar(str1)
  if(!N%%2){
    # even length: take the two middle characters
    res <- substr(str1, N/2, (N/2)+1)
  } else {
    # odd length: take the single middle character
    N1 <- median(sequence(N))
    res <- substr(str1, N1, N1)
  }
  res
}
f1('bake')
#[1] "ak"
f1('hello')
#[1] "l"

Another option. get_middle assumes the word has already been split into characters, as per your description:
get_middle <- function(x) {
  mid <- (length(x) + 1) / 2
  x[unique(c(ceiling(mid), floor(mid)))]
}
Then:
words <- c("bake", "hello")
lapply(strsplit(words, ""), get_middle)
Produces:
[[1]]
[1] "k" "a"
[[2]]
[1] "l"
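Note that get_middle returns "k" before "a" because ceiling(mid) comes first in the index. A small variant, sketched here under my own assumptions rather than taken from the answer, indexes with floor(mid):ceiling(mid) so the middle characters come back in reading order, and returns early for an empty vector. The name get_middle_ordered is hypothetical.

```r
# Hypothetical variant of get_middle(): same idea, but the middle
# characters are returned in reading order.
get_middle_ordered <- function(x) {
  if (length(x) == 0L) return(x)       # nothing to pick from
  mid <- (length(x) + 1) / 2           # 2.5 for 4 letters, 3 for 5 letters
  x[floor(mid):ceiling(mid)]           # indexes 2:3 or 3:3
}

words <- c("bake", "hello")
middles <- lapply(strsplit(words, ""), get_middle_ordered)
middles[[1]]  # "a" "k"
middles[[2]]  # "l"
```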

You could try this:
midpoint <- function(word) {
  # Split the word into a vector of letters
  split <- strsplit(word, "")[[1]]
  # Get the number of letters in the word
  n <- nchar(word)
  # Get the two middle letters for words of even length,
  # otherwise get the single middle letter
  if (n %% 2 == 0) {
    c(split[n/2], split[n/2+1])
  } else {
    split[ceiling(n/2)]
  }
}
In the case of a word of even length, the middle two characters are returned as a vector.
midpoint("hello")
#[1] "l"
midpoint("bake")
#[1] "a" "k"

How about:
mid<-function(str)substr(str,(nchar(str)+1)%/%2,(nchar(str)+2)%/%2)
Or slightly more legibly:
mid2<-function(str){
  n1<-nchar(str)+1
  substr(str,n1%/%2,(n1+1)%/%2)
}
> mid("bake")
[1] "ak"
> mid("hello")
[1] "l"
This has the advantage that it immediately vectorizes:
> mid(c("bake","hello"))
[1] "ak" "l"
It is slower than @akrun's solution for long words, but my second version (mid2) is faster; apparently counting characters can be costly for longer strings.
If you want the final product in a list, you can just strsplit the result:
mid3<-function(str)strsplit(mid2(str),"")

word = c("bake","hello")
print(nchar(word))
q = ifelse (nchar(word)%%2==0, substr(word,nchar(word)/2,nchar(word)/2+1),substr(word,nchar(word)/2+1,nchar(word)/2+1))
print(q)
[1] 4 5
[1] "ak" "l"

Related

Match all elements of a pattern with a vector and in the same order

I created a function yes.seq that takes two arguments, a pattern pat and data dat. The function checks whether all elements of the pattern occur in the data, in the same order.
For example:
dat <- letters[1:10]
dat
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
pat <- c('a',"c","g")
yes.seq(pat = pat,dat = dat)
# [1] TRUE
because the elements of the pattern occur in the data, in the same order:
"a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
If, for example, 'dat' is reversed, then we get FALSE:
yes.seq(pat = pat, dat = rev(dat))
# [1] FALSE
Here is my function
yes.seq <- function(pat , dat){
  lv <- rep(FALSE,length(pat))
  k <- 1
  # seq_along() avoids the 1:0 trap when dat is empty
  for(i in seq_along(dat)){
    if(dat[i] == pat[k])
    {
      lv[k] <- TRUE
      k <- k+1
    }
    if(k==length(pat)+1) break
  }
  return( all(lv) )
}
Are there any more efficient solutions? This function is too slow for me.
We could paste them and use either grepl
grepl(paste(pat, collapse=".*"), paste(dat, collapse=""))
#[1] TRUE
or str_detect
library(stringr)
str_detect(paste(dat, collapse=""), paste(pat, collapse=".*"))
#[1] TRUE
Another option:
yes.seq <- function(pat, dat) {
all(pat %in% dat) && all(diff(na.omit(match(pat, dat))) > 0)
}
yes.seq(pat, dat)
# [1] TRUE
yes.seq(c(pat, "ZZ"), dat)
# [1] FALSE
yes.seq(pat, rev(dat))
# [1] FALSE
Here is another base R option
yes.seq <- function(pat,dat) identical(order(match(pat, dat)), seq_along(pat))
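A few caveats about the answers above, as far as I can tell: the grepl/str_detect approach treats the elements of pat as regular expressions, match() only reports the first occurrence of each element (so repeated values in dat can give a false negative), and order() places NA last, so the one-liner does not reject pattern elements that are missing from dat. The following is a minimal sketch of a check that avoids those cases; the name is_subsequence is hypothetical, and it assumes pat and dat are plain atomic vectors.

```r
# Hypothetical helper: TRUE if the elements of pat occur in dat in the
# same order (not necessarily adjacent).
is_subsequence <- function(pat, dat) {
  pos <- 0L                            # index in dat of the last matched element
  for (p in pat) {
    hits <- which(dat == p)            # all positions of p in dat
    hits <- hits[hits > pos]           # keep only those after the last match
    if (length(hits) == 0L) return(FALSE)
    pos <- hits[1L]                    # earliest usable position
  }
  TRUE
}

dat <- letters[1:10]
pat <- c("a", "c", "g")
is_subsequence(pat, dat)                     # TRUE
is_subsequence(pat, rev(dat))                # FALSE
is_subsequence(c(pat, "ZZ"), dat)            # FALSE
is_subsequence(c("b", "a"), c("a", "b", "a")) # TRUE, despite the repeated "a"
```

The loop runs once per pattern element rather than once per data element, so it should stay cheap when pat is short, though I have not benchmarked it.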

break and next functions in R on a single word using strsplit() function

I am trying to use break and next in for loop, my code is as below:
for(i in strsplit('chapter', '')){
  if(i == 'p'){
    break
  }
  print(i)
}
Expected output:
c
h
a
for(i in strsplit('chapter', '')){
  if(i == 'p'){
    next
  }
  print(i)
}
Expected Output:
c
h
a
t
e
r
But my output for both of above loops is:
[1] "c" "h" "a" "p" "t" "e" "r"
Warning message:
In if (i == "p") { :
the condition has length > 1 and only the first element will be used
I also don't understand why I am getting the warning message.
I tried another numeric example:
x <- c(1,5,2,6,8,5,9,1)
for (val in x) {
  if (val == 5){
    next
  }
  print(val)
}
Output is:
[1] 1
[1] 2
[1] 6
[1] 8
[1] 9
[1] 1
Here, although the number 5 appears twice in the vector, the output doesn't show the warning "the condition has length > 1 and only the first element will be used".
If you look at the output of strsplit
strsplit('chapter', '')
#[[1]]
#[1] "c" "h" "a" "p" "t" "e" "r"
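It is a list of length 1 whose single element is the character vector of letters. So when you iterate over it in a for loop, the loop runs only once, with i set to the whole vector. The comparison i == 'p' then has length 7, which is what triggers the warning (R 4.2.0 and later treat a condition of length greater than one as an error). What you need is to select the first list element and then iterate over its individual elements.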
strsplit('chapter', '')[[1]]
#[1] "c" "h" "a" "p" "t" "e" "r"
You will get your required output if you do
for(i in strsplit('chapter', '')[[1]]){
  if(i == 'p'){
    break
  }
  print(i)
}
#[1] "c"
#[1] "h"
#[1] "a"
for(i in strsplit('chapter', '')[[1]]){
  if(i == 'p'){
    next
  }
  print(i)
}
#[1] "c"
#[1] "h"
#[1] "a"
#[1] "t"
#[1] "e"
#[1] "r"
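If the loop is only there to pick out characters, the same results can be had with plain vector indexing. This is a sketch under my own assumptions (the variable names chars, stop_at, before_p and without_p are made up for illustration), not part of the answer above.

```r
chars <- strsplit("chapter", "")[[1]]

# Equivalent of the `break` loop: everything before the first "p".
# nomatch covers the case where "p" does not occur at all.
stop_at <- match("p", chars, nomatch = length(chars) + 1L)
before_p <- chars[seq_len(stop_at - 1L)]

# Equivalent of the `next` loop: every character except "p".
without_p <- chars[chars != "p"]

before_p   # "c" "h" "a"
without_p  # "c" "h" "a" "t" "e" "r"
```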

Split a string on alternating index

I have a string similar to "HLeelmloon", which is two words interwoven together. How can I separate this into two separate words, splitting on alternating letters?
I can use strsplit() and a for loop to allocate alternating letters to two new vectors and then join them, but this seems very long-winded:
string <- "HLeelmloon"
split<-el(strsplit(string,''))
> split
[1] "H" "L" "e" "e" "l" "m" "l" "o" "o" "n"
word1<-c()
word2<-c()
for(i in 1:length(split)){
  if(i %% 2 == 1){
    word1<-append(word1, split[i])
  } else {
    word2<-append(word2, split[i])
  }
}
word1 = paste0(word1, collapse = '')
word2 = paste0(word2, collapse = '')
> word1
[1] "Hello"
> word2
[1] "Lemon"
My issue is that it's not very elegant, and it doesn't scale well if I want to split the string into N different words. Is there a better way to do this?
You could use gsub to capture alternating characters into the same group:
gsub("(.)(.)?", "\\1", string)
#[1] "Hello"
gsub("(.)(.)?", "\\2", string)
#[1] "Lemon"
You can do it by using TRUE and FALSE for indexing, i.e.
v1 = strsplit(string, '')[[1]]
paste(v1[c(TRUE, FALSE)], collapse = '')
#[1] "Hello"
paste(v1[c(FALSE, TRUE)], collapse = '')
#[1] "Lemon"
Considering your question is how to split into more than two words, you should use the split function. Using your example data can be a bit confusing because you chose to name one variable 'split'. In the following block, the first 'split' is the function, the second one your split variable.
number_of_words <- 2
lapply(split(split,1:number_of_words),paste0,collapse='')
$`1`
[1] "Hello"
$`2`
[1] "Lemon"
number_of_words <- 3
lapply(split(split,1:number_of_words),paste0,collapse='')
$`1`
[1] "Heln"
$`2`
[1] "Llo"
$`3`
[1] "emo"
To avoid confusion, here's the same code without the variable named split:
number_of_words <- 2
lapply(split(el(strsplit(string,'')),1:number_of_words),paste0,collapse='')
$`1`
[1] "Hello"
$`2`
[1] "Lemon"
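Try this code:
> paste0(split[seq(1,nchar(string),by = 2)],collapse="")
[1] "Hello"
> paste0(split[seq(2,nchar(string),by = 2)],collapse="")
[1] "Lemon"
It pastes together the characters at the odd positions and at the even positions of string, respectively (split is the character vector from the question).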
Another way using your split variable, will work with any number of words:
N <- 2
apply(matrix(split,N),1,paste,collapse="")
# [1] "Hello" "Lemon"
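The split()-based idea can be wrapped in a small function that takes the number of words as an argument. This is only a sketch: the name deinterleave is hypothetical, and rep_len() is used here so that the grouping vector always has exactly one entry per character, including when the string length is not a multiple of n (a case where, as far as I know, matrix() warns and recycles data).

```r
# Hypothetical helper: split an interleaved string into n words by
# assigning characters to groups 1, 2, ..., n, 1, 2, ... in turn.
deinterleave <- function(string, n) {
  chars <- strsplit(string, "")[[1]]
  groups <- rep_len(seq_len(n), length(chars))   # one group id per character
  unname(vapply(split(chars, groups), paste, character(1), collapse = ""))
}

deinterleave("HLeelmloon", 2)  # "Hello" "Lemon"
deinterleave("HLeelmloon", 3)  # "Heln" "Llo" "emo"
```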

grab element of a list R

Hi I have the following data structure:
typeof(snp.seq)
[1] "list"
typeof(snp.seq[1])
[1] "list"
snp.seq[[1]]
[1]"MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWN DPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKSPMLNLFQEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ"
typeof(snp.seq[[1]])
"character"
> dput(snp.seq)
list("MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKSPMLNLFQEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKGPV",
"MDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ")
where snp.seq is of type list, and each element of snp.seq is also a list - as demonstrated in the example.
What I need to do is to specify a number, and return the corresponding letter from the sequence in snp.seq[[1]].
i.e. position 1 is the letter 'M'
I have tried to call it by doing
snp.seq[[1,1]]]
snp.seq[[1],[1]]
Neither of these works, as they are calling a position which is out of bounds.
How would I get any letter by giving a number (its position in the sequence)?
Thanks
With the data in your example -
##
getLetter <- function(elem,pos){
  if(pos > nchar(snp.seq[[elem]])){
    stop("Position argument exceeds sequence length")
  } else {
    substr(snp.seq[[elem]],pos,pos)
  }
}
##
So to get the first letter for the first three elements in snp.seq, you would do
> getLetter(1,1)
[1] "M"
> getLetter(2,1)
[1] "M"
> getLetter(3,1)
[1] "M"
> substr(snp.seq[[1]],1,3)
[1] "MYS"
> substr(snp.seq[[2]],1,3)
[1] "MYS"
> substr(snp.seq[[3]],1,3)
[1] "MDY"
(sampling the first few letters of these elements to validate).
You could use letter from Biostrings
library(Biostrings)
letter(snp.seq[[1]], 1)
#[1] "M"
letter(snp.seq[[3]], 3:1)
#[1] "YDM"
letter(snp.seq[[5]], 5:10)
#[1] "NTLRLY"
Try:
uls = unlist(snp.seq)
substr(uls,1,1)
[1] "M" "M" "M" "M" "M"
substr(uls,1,2)
[1] "MY" "MY" "MD" "MD" "MY"
substr(uls,3,5)
[1] "SFN" "SFN" "YRV" "YRV" "SFN"
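To tie the answers back to the original confusion: single brackets on a list return a sub-list, while double brackets return the element itself, which here is one character string. Below is a minimal sketch using shortened toy sequences in place of snp.seq; the names seqs and residue_at are hypothetical, and unlike getLetter the list is passed in as an argument instead of being read from the workspace.

```r
# Toy data standing in for snp.seq (shortened, hypothetical sequences).
seqs <- list("MYSFNTLRLY", "MDYRVNIFLR")

typeof(seqs[1])    # "list": single brackets return a sub-list
typeof(seqs[[1]])  # "character": double brackets return the string itself

# Letter at position `pos` of element `elem`.
residue_at <- function(seqs, elem, pos) {
  s <- seqs[[elem]]
  stopifnot(pos >= 1, pos <= nchar(s))   # guard against out-of-range positions
  substr(s, pos, pos)
}

residue_at(seqs, 1, 1)  # "M"
residue_at(seqs, 2, 3)  # "Y"

# Same position across all sequences at once
vapply(seqs, substr, character(1), 4, 4)  # "F" "R"
```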

Using character values as object names

I would like to use the characters in a vector as the names of character objects
aiming to get
first as say "d","e","a","t" etc.
tried this approach but am clearly missing some function to apply to x[i]
x <- c("first","second","third"..)
for (i in 1:length(x)) {
x[i] <- sample(letters,4)
}
TIA
The function you are looking for is assign():
> x <- c("first","second","third")
> for (i in 1:length(x)) {
+ assign(x[i], sample(letters,4))
+ }
>
> ls()
[1] "first" "i" "second" "third" "x"
> first
[1] "t" "d" "u" "j"
> second
[1] "o" "i" "p" "l"
> third
[1] "w" "v" "r" "n"
As an alternative, you could build these vectors as different elements of a list:
> mylist <- list()
> for (i in 1:length(x)) {
+ mylist[[x[i]]] <- sample(letters,4)
+ }
> mylist
$first
[1] "e" "l" "y" "d"
$second
[1] "t" "o" "k" "h"
$third
[1] "g" "x" "p" "b"
You don't say what you will be doing with this object. You may get the simplest structure by using a named vector:
names(x) <- x
x[] <- sample(letters, length(x))
If you do not use the empty brackets [] on the left-hand side, the whole vector gets replaced and the names will be lost. You can now access the values with quoted names:
> x
first second third fourth
"w" "c" "r" "x"
> x["second"]
second
"c"
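The list approach can also be written without an explicit loop. This is a sketch of one way to do it, assuming each name should hold four random letters as in the question; the anonymous function ignores its argument and only the names come from x.

```r
x <- c("first", "second", "third")

# One list element per name; each holds four random letters.
mylist <- setNames(lapply(x, function(nm) sample(letters, 4)), x)

names(mylist)        # "first" "second" "third"
mylist[["second"]]   # four random letters, different on each run
```

Keeping the vectors in one named list tends to be easier to loop over later than separate variables created with assign().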
