Let x be the vector
[1] "hi" "hello" "Nyarlathotep"
Is it possible to produce a vector y from x whose components are
[1] "hi" "hello" "Nyarl"
?
In other words, I need an R command that truncates strings to a given length (in the example above, length = 5).
Thanks a lot!
More obvious than substring to me would be strtrim:
> x <- c("hi", "hello", "Nyarlathotep")
> x
[1] "hi" "hello" "Nyarlathotep"
> strtrim(x, 5)
[1] "hi" "hello" "Nyarl"
substring is great for extracting data from within a string at a given position, but strtrim does exactly what you're looking for.
The second argument, width, can also be a vector of the same length as the input, in which case each element is trimmed to its own width:
> strtrim(x, c(1, 2, 3))
[1] "h" "he" "Nya"
Use substring; see ?substring for details.
> x <- c("hi", "hello", "Nyarlathotep")
> substring(x, first=1, last=5)
[1] "hi" "hello" "Nyarl"
You can also use sub with regex
> sub("(.{5}).*", "\\1", x)
[1] "hi" "hello" "Nyarl"
A (probably) faster alternative is sprintf():
sprintf("%.*s", 5, x)
[1] "hi" "hello" "Nyarl"
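To make it easier to compare the approaches suggested above, here is a minimal sketch that runs all four on the same input. The variable names (by_strtrim and so on) are only illustrative. For plain ASCII input like this, the results should agree.

```r
x <- c("hi", "hello", "Nyarlathotep")

# Four ways to keep at most the first 5 characters of each element.
by_strtrim <- strtrim(x, 5)
by_substr  <- substr(x, 1, 5)
by_sub     <- sub("(.{5}).*", "\\1", x)
by_sprintf <- sprintf("%.*s", 5, x)

# Stack the results so each row can be compared at a glance.
results <- rbind(by_strtrim, by_substr, by_sub, by_sprintf)
print(results)
```

One difference worth keeping in mind: strtrim trims to a display width rather than a character count, so it may behave differently from substr on text containing wide characters.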
If I define a vector as:
vec <- c("for", "paste")
Is there any way to apply a selection on this vector and use the result as an operator? I have tried like this:
vec[1](i, 0:10, print("Hello"))
but the result is an error:
Error: attempt to apply non-function
The first element of 'vec', i.e. for, names a primitive function, so we can look it up with .Primitive and call the result:
.Primitive(vec[1])(i, 0:10, print("Hello"))
-output
#[1] "Hello"
#[1] "Hello"
#[1] "Hello"
#[1] "Hello"
#[1] "Hello"
#[1] "Hello"
#[1] "Hello"
#[1] "Hello"
#[1] "Hello"
#[1] "Hello"
#[1] "Hello"
paste, by contrast, is not a primitive, and the post does not make clear what output is expected for the second element. With match.fun, we can use
match.fun(vec[2])(rep("Hello", 10), collapse=", ")
#[1] "Hello, Hello, Hello, Hello, Hello, Hello, Hello, Hello, Hello, Hello"
assuming the OP wants to paste 10 copies of "Hello" into a single string.
How about?
vec <- c("for", "paste")
do.call(vec[[1]], list(as.symbol('i'), 0:10, substitute(print('Hello'))))
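The answers above all come down to resolving a function from its name stored as a string. As a rough sketch of that idea (the names loop_fun, paste_fun, joined and joined2 are made up for illustration), match.fun returns the function object, and do.call accepts the name directly:

```r
vec <- c("for", "paste")

# match.fun() resolves a name to the function object it refers to.
loop_fun  <- match.fun(vec[1])
paste_fun <- match.fun(vec[2])

# `for` is a primitive, while paste is an ordinary closure.
print(is.primitive(loop_fun))
print(is.primitive(paste_fun))

# Once resolved, the function can be called like any other.
joined <- paste_fun("a", "b", sep = "-")

# do.call() takes the function name as a string, with arguments in a list.
joined2 <- do.call(vec[2], list("a", "b", sep = "-"))
print(c(joined, joined2))
```

The plain vec[1](...) form fails because vec[1] is still a character string, not a function.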
I want to extract the string before certain keywords and the first element right after the keyword. Given the following strings and the keywords,
s <- c("E123Apple12", "EJ23ZGrape0Z", "J8BananaZ!")
keywords <- c("Apple", "Grape", "Banana")
I would expect the output to be: E123, EJ23Z, and J8 for strings before the keywords, and 1, 0, and Z for the first element that appears right after the keyword.
Using sub(keywords, "\\1", s) gives me the following warning:
Warning message:
In sub(keywords, "\\1", s) :
argument 'pattern' has length > 1 and only the first element will be used
Your keywords need to be a regex string, rather than an R vector representing multiple matches. Then you can replace any matching keyword with an empty string, leaving just the characters around it:
keywords <- "(Apple|Grape|Banana)"
sub(keywords, "", s) # [1] "E12312" "EJ23Z0Z" "J8Z!"
If you want just the characters before or after the keyword, you can match them with .*:
s <- c("E123AppleABC", "EJ23ZGrapeDEF", "J8BananaGHI")
keywords <- "(Apple|Grape|Banana).*"
sub(keywords, "", s) # [1] "E123" "EJ23Z" "J8"
keywords <- ".*(Apple|Grape|Banana)"
sub(keywords, "", s) # [1] "ABC" "DEF" "GHI"
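If the strings and keywords are parallel vectors (the i-th keyword belongs to the i-th string), one way to do this is to use strsplit, whose split argument is recycled along the input, but you'll need to massage the result a little.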
strsplit(s, keywords)
Results in:
[[1]]
[1] "E123" "12"
[[2]]
[1] "EJ23Z" "0Z"
[[3]]
[1] "J8" "Z!"
You need to select the first member in each list and combine into a vector like this:
unlist(lapply(strsplit(s, keywords), "[[", 1))
Which outputs
[1] "E123" "EJ23Z" "J8"
If you want what comes after the keyword, take the second member instead:
unlist(lapply(strsplit(s, keywords), "[[", 2))
keywords <- "(Apple|Grape|Banana)"
sub(paste0("(.*)",keywords,".*"),'\\1',s)
[1] "E123" "EJ23Z" "J8"
sub(paste0(".*",keywords,"(\\w)",".*$"),'\\2',s)
[1] "1" "0" "Z"
for (x in keywords) {
  # Keep only the text before the keyword; the trailing ".*" drops the rest.
  s <- gsub(paste0("(.*)", x, ".*"), "\\1", s)
}
s
# [1] "E123" "EJ23Z" "J8"
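Both requested pieces (the text before the keyword and the first character after it) can also be captured in a single pass. The following is a sketch only, assuming the keywords contain no regex metacharacters; alternation, pattern, matches, before and after are illustrative names, and regmatches with regexec is swapped in here rather than taken from the answers above.

```r
s <- c("E123Apple12", "EJ23ZGrape0Z", "J8BananaZ!")
keywords <- c("Apple", "Grape", "Banana")

# Collapse the keyword vector into one alternation: "Apple|Grape|Banana".
alternation <- paste(keywords, collapse = "|")

# Group 1: text before the keyword (lazy), group 2: the keyword,
# group 3: the single character right after it.
pattern <- paste0("^(.*?)(", alternation, ")(.)")

matches <- regmatches(s, regexec(pattern, s, perl = TRUE))

# Each list element holds the full match followed by the three groups.
before <- vapply(matches, function(m) m[2], character(1))
after  <- vapply(matches, function(m) m[4], character(1))

print(before)
print(after)
```

Collapsing the vector with paste(..., collapse = "|") is also what avoids the "pattern has length > 1" warning, since sub and friends expect a single pattern string.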
If I have a string,
x <- "Hello World"
How can I access the second word, "World", using string split, after
x <- strsplit(x, " ")
x[[2]] does not do anything.
As mentioned in the comments, it's important to realise that strsplit returns a list object. Since your example is only splitting a single item (a vector of length 1) your list is length 1. I'll explain with a slightly different example, inputting a vector of length 3 (3 text items to split):
input <- c( "Hello world", "Hi there", "Back at ya" )
x <- strsplit( input, " " )
> x
[[1]]
[1] "Hello" "world"
[[2]]
[1] "Hi" "there"
[[3]]
[1] "Back" "at" "ya"
Notice that the returned list has 3 elements, one for each element of the input vector. Each of those list elements is split as per the strsplit call. So we can recall any of these list elements using [[ (this is what your x[[2]] call was doing, but you only had one list element, which is why you couldn't get anything in return):
> x[[1]]
[1] "Hello" "world"
> x[[3]]
[1] "Back" "at" "ya"
Now we can get the second part of any of those list elements by appending a [ call:
> x[[1]][2]
[1] "world"
> x[[3]][2]
[1] "at"
This will return the second item from each list element (note that the "Back at ya" input has returned "at" in this case). You can do this for all items at once using something from the apply family. sapply will return a vector, which will probably be good in this case:
> sapply( x, "[", 2 )
[1] "world" "there" "at"
The last value in the input here (2) is passed to the [ operator, meaning the operation x[2] is applied to every list element.
If instead of the second item, you'd like the last item of each list element, we can use tail within the sapply call instead of [:
> sapply( x, tail, 1 )
[1] "world" "there" "ya"
This time, we've applied tail( x, 1 ) to every list element, giving us the last item.
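As a small variation on the sapply calls above, vapply can be used when you want a guarantee that the result is a character vector. This is just a sketch; parts, second_word and last_word are illustrative names.

```r
input <- c("Hello world", "Hi there", "Back at ya")
parts <- strsplit(input, " ", fixed = TRUE)

# "[" is an ordinary function, so `[`(v, 2) means the same as v[2].
same <- identical(`[`(parts[[1]], 2), parts[[1]][2])

# vapply() checks that every result is a single character value.
second_word <- vapply(parts, function(p) p[2], character(1))
last_word   <- vapply(parts, function(p) p[length(p)], character(1))

print(second_word)
print(last_word)
```

If an input has fewer than two words, p[2] is NA, which still satisfies the character(1) template, so the call does not fail on short inputs.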
As a preference, my favourite way to apply actions like these is with the magrittr pipe, for the second word like so:
library(magrittr)
x <- input %>%
  strsplit( " " ) %>%
  sapply( "[", 2 )
> x
[1] "world" "there" "at"
Or for the last word:
x <- input %>%
  strsplit( " " ) %>%
  sapply( tail, 1 )
> x
[1] "world" "there" "ya"
Another approach that might be a little easier to read and apply to a data frame within a pipeline (though it takes more lines) would be to wrap it in your own function and apply that.
library(tidyverse)
df <- data.frame(
greetings = c( "Hello world", "Hi there", "Back at ya" )
)
split_params = function (x, sep, n) {
  # Splits string into list of substrings separated by 'sep'.
  # Returns nth substring.
  x = strsplit(x, sep)[[1]][n]
  return(x)
}
df = df %>%
  mutate(
    second_word = sapply(
      X = greetings,
      FUN = split_params,
      # Arguments for split_params.
      sep = ' ',
      n = 2
    )
  )
df
### (Output in RStudio Notebook)
greetings second_word
<chr> <chr>
Hello world world
Hi there there
Back at ya at
3 rows
###
With stringr 1.5.0, you can use str_split_i to access the ith element of a split string:
library(stringr)
x <- "Hello World"
str_split_i(x, " ", i = 2)
#[1] "World"
It is vectorized:
x <- c("Hello world", "Hi there", "Back at ya")
str_split_i(x, " ", 2)
#[1] "world" "there" "at"
x=strsplit("a;b;c;d",";")
x
[[1]]
[1] "a" "b" "c" "d"
x=as.character(x[[1]])
x
[1] "a" "b" "c" "d"
x=strsplit(x," ")
x
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c"
[[4]]
[1] "d"
Consider:
x<-strsplit("This is an example",split="\\s",fixed=FALSE)
I am surprised to see that x has length 1 rather than length 4:
> length(x)
[1] 1
As a result, x[3] is NULL. But if I unlist it:
> x<-unlist(x)
> x
[1] "This" "is" "an" "example"
> length(x)
[1] 4
Only now is x[3] equal to "an".
Why isn't the list of length 4 in the first place, so that its elements can be accessed by indexing? This makes it awkward to access the split elements, since I have to unlist first.
This allows strsplit to be vectorized for its input argument. For instance, it will allow you to split a vector such as:
x <- c("string one", "string two", "and string three")
into a list of split results.
You do not need to unlist, but rather, you can refer to the element by a combination of its list index and the vector index. For instance, if you wanted to get the second word in the second item, you can do:
> x <- c("string one", "string two", "and string three")
> y <- strsplit(x, "\\s")
> y[[2]][2]
[1] "two"
That's because strsplit returns a list with one element per input string, and each element is a character vector holding the pieces (words) of that string.
Try
> x[[1]]
#[1] "This" "is" "an" "example"
and
> length(x[[1]])
#[1] 4
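To illustrate why the result is a list, here is a brief sketch using the vector from the earlier answer: the outer length counts input strings, while lengths reports how many pieces each string was split into. The name single is only illustrative.

```r
x <- c("string one", "string two", "and string three")
y <- strsplit(x, "\\s")

# One list element per input string.
print(length(y))

# Number of pieces in each element; these can differ between strings.
print(lengths(y))

# For a single input string, take element 1 once and index as usual.
single <- strsplit("This is an example", "\\s")[[1]]
print(single[3])
```

Because the number of pieces can differ per string, a flat vector could not keep track of which piece came from which input.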
Hi I have the following data structure:
typeof(snp.seq)
[1] "list"
typeof(snp.seq[1])
[1] "list"
snp.seq[[1]]
[1] "MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKSPMLNLFQEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ"
typeof(snp.seq[[1]])
[1] "character"
> dput(snp.seq)
list("MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKSPMLNLFQEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKGPV",
"MDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ")
where snp.seq is of type list, and each element of snp.seq is also a list - as demonstrated in the example.
What I need to do is to specify a number, and return the corresponding letter from the sequence in snp.seq[[1]].
i.e. position 1 is the letter 'M'
I have tried to call it by doing
snp.seq[[1,1]]]
snp.seq[[1],[1]]
Neither of these works, as they call a position which is out of bounds.
How would I get any letter by giving a number (its position in the sequence)?
Thanks
With the data in your example -
snp.seq <- list("MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKSPMLNLFQEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKGPV",
"MDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ")
##
getLetter <- function(elem, pos) {
  if (pos > nchar(snp.seq[[elem]])) {
    stop("Position argument exceeds sequence length")
  } else {
    substr(snp.seq[[elem]], pos, pos)
  }
}
##
So to get the first letter for the first three elements in snp.seq, you would do
> getLetter(1,1)
[1] "M"
> getLetter(2,1)
[1] "M"
> getLetter(3,1)
[1] "M"
> substr(snp.seq[[1]],1,3)
[1] "MYS"
> substr(snp.seq[[2]],1,3)
[1] "MYS"
> substr(snp.seq[[3]],1,3)
[1] "MDY"
(sampling the first few letters of these elements to validate).
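A possible variation on getLetter, sketched under a few assumptions: the list is passed in as an argument instead of being read from the global snp.seq, positions below 1 are rejected as well, and the sequences are shortened stand-ins (the first ten letters of two of the originals). The names seqs, get_letter and chars are hypothetical.

```r
# Shortened stand-ins for the real sequences (illustrative only).
seqs <- list("MYSFNTLRLY", "MDYRVNIFLR")

get_letter <- function(seqs, elem, pos) {
  s <- seqs[[elem]]
  # Reject positions on either side of the valid range.
  if (pos < 1 || pos > nchar(s)) {
    stop("Position is outside the sequence")
  }
  substr(s, pos, pos)
}

print(get_letter(seqs, 1, 1))
print(get_letter(seqs, 2, 3))

# Alternative: split into single characters once, then index freely.
chars <- strsplit(seqs[[1]], "")[[1]]
print(chars[c(1, 5)])
```

Splitting into characters up front may be more convenient when many positions are needed from the same sequence, since ordinary vector indexing then applies.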
You could use letter from Biostrings
library(Biostrings)
letter(snp.seq[[1]], 1)
#[1] "M"
letter(snp.seq[[3]], 3:1)
#[1] "YDM"
letter(snp.seq[[5]], 5:10)
#[1] "NTLRLY"
Try:
snp.seq <- list("MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKSPMLNLFQEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKGPV",
"MDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ",
"MYSFNTLRLYLWETIVFFSLAASKEAEAARSAPKPMSPSDFLDKLMGRTSGYDARIRPNFKGPPVNVSCNIFINSFGSIAETTMDYRVNIFLRQQWNDPRLAYNEYPDDSLDLDPSMLDSIWKPDLFFANEKGAHFHEITTDNKLLRISRNGNVLYSIRITLTLACPMDLKNFPMDVQTCIMQLESFGYTMNDLIFEWQEQGAVQVADGLTLPQFILKEEKDLRYCTKHYNTGKFTCIEARFHLERQMGYYLIQMYIPSLLIVILSWISFWINMDAAPARVGLGITTVLTMTTQSSGSRASLPKVSYVKAIDIWMAVCLLFVFSALLEYAAVNFVSRQHKELLRFRRKRRHHKEDEAGEGRFNFSAYGMGPACLQAKDGISVKGANNSNTTNPPPAPSKSPEEMRKLFIQRAKKIDKISRIGFPMAFLIFNMFYWIIYKIVRREDVHNQ")
##
uls = unlist(snp.seq)
substr(uls,1,1)
[1] "M" "M" "M" "M" "M"
substr(uls,1,2)
[1] "MY" "MY" "MD" "MD" "MY"
substr(uls,3,5)
[1] "SFN" "SFN" "YRV" "YRV" "SFN"