str_extract in a vector by position

I have a vector like this
cod <- c("6W41_CH", "6W41_CL", "6WPS_AH", "7C01_BC", "7C01_BD", "7C01_BL", "7C2L_AH", "7C2L_BI", "7C2L_CJ",
         "7C8V_BA", "7C8W_BA", "7CAH_AD")
I'm iterating over this vector and, at each step, I'd like to get only the letter in the 6th position of cod[i]. I was trying to use str_extract. How can I do this?

Position-based extraction should be faster and more efficient with substr. We can provide the start and stop positions, which are both 6 here
substr(cod, 6, 6)
#[1] "C" "C" "A" "B" "B" "B" "A" "B" "C" "B" "B" "A"
or with str_sub
library(stringr)
str_sub(cod, 6, 6)
#[1] "C" "C" "A" "B" "B" "B" "A" "B" "C" "B" "B" "A"
If we need to use str_extract, specify a regex lookbehind stating that we want to extract the character that follows the first 5 characters
str_extract(cod, '(?<=^.{5}).')
#[1] "C" "C" "A" "B" "B" "B" "A" "B" "C" "B" "B" "A"
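
Since the question mentions a loop, here is a minimal sketch showing that the same substr call works element by element and agrees with the vectorised form. The shortened vector and the names pos, letters_loop and letters_vec are made up for illustration.

```r
# Shortened copy of the vector from the question (illustrative subset)
cod <- c("6W41_CH", "6W41_CL", "6WPS_AH", "7C01_BC")
pos <- 6  # position to extract (hypothetical variable name)

# Per-element extraction inside a loop
letters_loop <- character(length(cod))
for (i in seq_along(cod)) {
  letters_loop[i] <- substr(cod[i], pos, pos)
}

# Vectorised extraction, no loop needed
letters_vec <- substr(cod, pos, pos)

# Both approaches give the same character vector
identical(letters_loop, letters_vec)
```

If the loop exists only to collect the letters, the vectorised call alone is probably enough.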

Smartest way for making a sequence of characters in R

I am going to make the below sequence in R:
A A B B B A A B B B
I have used the below code:
rep(c("A","A","B","B","B"),2)
I got the correct answer as follows:
[1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"
But I don't like my code. I would like to see the smartest way for making the above sequence. I don't know if it is possible to make the above sequence using LETTERS[1:2].
Thank you in advance
You can do it without using rep at all:
LETTERS[(0:9 %% 5 > 1) + 1]
[1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"
Here 0:9 produces 10 elements, so for a sequence of length n you just replace 9 with n - 1.
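
As a rough sketch of how this generalises, the length can be pulled out into a variable. The names n, by_modulo and by_rep are hypothetical, and the rep(..., length.out = ) line is a different technique, shown only for comparison.

```r
n <- 12  # desired length (hypothetical variable)

# Positions 0..(n - 1) modulo 5 give 0 1 2 3 4 0 1 ...;
# remainders 0 and 1 map to "A", remainders 2, 3 and 4 map to "B"
by_modulo <- LETTERS[((seq_len(n) - 1) %% 5 > 1) + 1]

# Same result by recycling the 5-element pattern up to length n
by_rep <- rep(rep(LETTERS[1:2], c(2, 3)), length.out = n)

identical(by_modulo, by_rep)
```

Both forms also handle lengths that are not a multiple of 5, where the pattern is simply cut off part-way.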
You can use rep twice:
rep(rep(LETTERS[1:2], c(2, 3)), 2)
#[1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"
A Reduce() version of @RonakShah's answer.
Reduce(rep, list(c(2, 3), 2), LETTERS[1:2])
# [1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"
Another variant using rep and LETTERS:
LETTERS[rep(rep(1:2, 2:3), 2)]
# [1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"
An option with replicate
unlist(replicate(2, Map(rep, LETTERS[1:2], c(2, 3))))
#[1] "A" "A" "B" "B" "B" "A" "A" "B" "B" "B"

Concatenate two vectors while preserving order in R

This is hard to explain, so I'll try and then leave a simple example. When I concatenate the vectors, I would like the first element of each vector next to each other, then the second elements next to each other, etc. See example below.
x <- c("a","b","c")
y <- c(1,2,3)
c(x,y)
[1] "a" "b" "c" "1" "2" "3"
However, I would like the following:
[1] "a" "1" "b" "2" "c" "3"
I'm sure there is an answer on here already, but I'm having trouble putting in the right search. Any help appreciated!
An option would be to rbind and then concatenate
c(rbind(x, y))
#[1] "a" "1" "b" "2" "c" "3"
and for the general case, when the vectors are not of the same length, order on the position indices of the concatenated elements
c(x, y)[order(c(seq_along(x), seq_along(y)))]
#[1] "a" "1" "b" "2" "c" "3"
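
To illustrate why the second form matters, here is a small sketch with vectors of different lengths. The example values are made up; the point is only that rbind recycles the shorter vector, while ordering on the positions keeps each element exactly once.

```r
x <- c("a", "b", "c", "d")
y <- c(1, 2)

# rbind() recycles the shorter vector, so the values of y appear twice
recycled <- c(rbind(x, y))

# Ordering on the positions keeps each element exactly once;
# ties keep their original order, so x[i] comes before y[i]
interleaved <- c(x, y)[order(c(seq_along(x), seq_along(y)))]

recycled
interleaved
```

With these inputs, recycled has 8 elements and interleaved has 6, with the leftover elements of x at the end.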

How do you retrieve individual DNA sequences after importing an alignment into R?

I imported an alignment in FASTA format into R
read.dna(file.choose(),format="fasta",skip=0)
My alignment looks something like this
Seq1 ATGCGGGAATGGACTCATGCATCG
Seq2 ATTCGATCTTGCTAGCTAGCTCGT
Seq3 ATATCGATGTCGATCGATCGACGA
If I want to retrieve an individual sequence from this alignment (say Seq2, for example), what do I need to do?
I don't know where read.dna() comes from (there are >6000 CRAN packages, and almost 1000 Bioconductor packages). You could use the Biostrings package and
library(Biostrings)
dna = readDNAStringSet("path/to.fasta")
and do many useful things, including those described in the quick reference. If at the end you want a single character vector, then
as.character(dna[1])
or
as.character(dna[names(dna) == "Seq3"])
I am guessing that you are using the ape package. Using the example in ?read.dna
library(ape)
cat(">No305",
"NTTCGAAAAACACACCCACTACTAAAANTTATCAGTCACT",
">No304",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAACCACT",
">No306",
"ATTCGAAAAACACACCCACTACTAAAAATTATCAATCACT",
file = "exdna.txt", sep = "\n")
ex.dna4 <- read.dna("exdna.txt", format = "fasta")
ex.dna4[dimnames(ex.dna4)[[1]]=='No304',]
#1 DNA sequences in binary format stored in a matrix.
#All sequences of same length: 40
#Labels: No304
#Base composition:
# a c g t
#0.475 0.300 0.025 0.200
as.character(ex.dna4[dimnames(ex.dna4)[[1]]=='No304'])
#[1] "a" "t" "t" "c" "g" "a" "a" "a" "a" "a" "c" "a" "c" "a" "c" "c" "c" "a" "c"
#[20] "t" "a" "c" "t" "a" "a" "a" "a" "a" "t" "t" "a" "t" "c" "a" "a" "c" "c" "a"
#[39] "c" "t"

How can I split a string and add them to vector?

I'd like to split a character vector so that additional members are added to the length of the vector.
> va <- c("a", "b", "c;d;e")
[1] "a" "b" "c;d;e"
> vb <- strsplit(va, ";")
[[1]]
[1] "a"
[[2]]
[1] "b"
[[3]]
[1] "c" "d" "e"
How can I get vb in the same format as va, so that vb is a 1-dimensional, 5-member vector like this?
[1] "a" "b" "c" "d" "e"
Appreciate the help.
One possibility:
unlist(vb)
# [1] "a" "b" "c" "d" "e"
Or
scan(text=va, sep=";",what="")
#Read 5 items
# [1] "a" "b" "c" "d" "e"
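
The split and the flattening can also be written as one expression. This is only a sketch of the same idea; fixed = TRUE is an optional addition here so that ";" is treated as a literal string rather than a regular expression.

```r
va <- c("a", "b", "c;d;e")

# Split each element on ";" and flatten the resulting list in one step
vb <- unlist(strsplit(va, ";", fixed = TRUE))

vb
length(vb)
```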

Access nested structure

Are there some nice designs to call data in a nested structure e.g.
a<-list(list(LETTERS[1:3],LETTERS[1:3]),list(LETTERS[4:6]))
lapply(a,function(x) lapply(x, function(x) x))
but unlist is not an option.
Not as good as @SimonO101's answer, but just to provide an alternative, you can do it using do.call
> do.call(c,do.call(c, a))
[1] "A" "B" "C" "A" "B" "C" "D" "E" "F"
Also using Reduce
> do.call(c, Reduce(c, a))
[1] "A" "B" "C" "A" "B" "C" "D" "E" "F"
Recursive lapply... a.k.a rapply?
rapply( a , c )
[1] "A" "B" "C" "A" "B" "C" "D" "E" "F"
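
As a sketch of what rapply adds over a plain flatten: it applies a function to every leaf and, depending on the how argument, either flattens the result or keeps the nesting. tolower is used here purely as an illustrative function, and the names flat and nested are hypothetical.

```r
a <- list(list(LETTERS[1:3], LETTERS[1:3]), list(LETTERS[4:6]))

# Default how = "unlist": apply the function to each leaf, then flatten
flat <- rapply(a, tolower)

# how = "list": apply the function to each leaf but keep the nested shape
nested <- rapply(a, tolower, how = "list")

flat
str(nested)
```

If the nested shape has to survive, how = "list" may be the more suitable choice.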
