Replace text values in a vector [duplicate] - r

This question already has answers here:
Dictionary style replace multiple items
(11 answers)
Closed 5 years ago.
Here's my data :
dataset <- c("h", "H", "homme", "masculin", "f", "femme", "épouse")
How can I replace text values of the vector like :
"femme" -> "f"
"épouse" ->"f"
"Homme"-> "h"
"masculin" -> "h"
What I tried for "femme" -> "f"
test_out <- sapply(dataset, switch,
"f"="femme")
test_out
Expected result:
"h" "h" "h" "h" "f" "f" "f"

Try gsub with regular expressions:
dataset = gsub("^((?!h).*)$", "f", gsub("^((h|H|m).*)$", "h", dataset), perl=TRUE)
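A dictionary-style alternative, sketched here under the assumption that every spelling is known in advance, pairs each spelling with its replacement and looks values up with match(). The names keys, values, idx and recoded are made up for this illustration; entries with no matching key are left unchanged.

```r
dataset <- c("h", "H", "homme", "masculin", "f", "femme", "épouse")

# Hypothetical lookup table: each spelling in `keys` maps to the code
# at the same position in `values`.
keys   <- c("h", "H", "homme", "masculin", "f", "femme", "épouse")
values <- c("h", "h", "h",     "h",        "f", "f",     "f")

idx <- match(dataset, keys)                          # NA where no key matches
recoded <- ifelse(is.na(idx), dataset, values[idx])  # keep unmatched entries
recoded
#> [1] "h" "h" "h" "h" "f" "f" "f"
```

Unlike the regular expression, this does not guess from the first letter, so a new spelling has to be added to the table explicitly.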

Related

Complementary sequence using gsub

I'm trying to make the complementary sequence of a dna chain stored in a vector.
It's supposed to change "A" to "T" and "C" to "G", and vice versa. I need this to happen to the first vector and to print the complementary sequence correctly. This is what I tried, but I got stuck:
pilot_sequence <- c("C","G","A","T","C","C","T","A","T")
complement_sequence_display <- function(pilot_sequence){
complement_chain_Incom <- gsub("A", "T", pilot_sequence)
complement_chain <- paste(complement_chain_Incom, collapse = "")
cat("Complement sequence: ", complement_chain, "\n")
}
complement_chain_Incom <- gsub("A","T", pilot_sequence)
complement_chain <- paste(complement_chain_Incom, collapse= "")
complement_sequence_display(pilot_sequence)
I got CGTTCCTTT as the answer; just the second and penultimate T are correct. How do I fix the rest of the letters?
The pilot_sequence vector is of character type and the function produces no execution errors.
This is an ideal use case for the chartr function:
chartr("ATGC","TACG",pilot_sequence)
output:
[1] "G" "C" "T" "A" "G" "G" "A" "T" "A"
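To print the result as a single string, as the question's function does, the translated vector can be collapsed with paste. This is a minimal sketch that reuses the question's function name; returning the string invisibly is an addition made here so the result can also be stored.

```r
pilot_sequence <- c("C", "G", "A", "T", "C", "C", "T", "A", "T")

complement_sequence_display <- function(pilot_sequence) {
  # chartr swaps A<->T and G<->C in one pass, so no letter is replaced twice
  complement_chain <- paste(chartr("ATGC", "TACG", pilot_sequence), collapse = "")
  cat("Complement sequence: ", complement_chain, "\n")
  invisible(complement_chain)  # assumption: also return the string for reuse
}

result <- complement_sequence_display(pilot_sequence)
```

Chained gsub calls would not work here, because the second call would turn the freshly created "T" values back into "A".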
You can do this with purrr::map_chr and dplyr::case_when:
pilot_sequence |> purrr::map_chr(~ dplyr::case_when(
.x == "T" ~ "A",
.x == "G" ~ "C",
.x == "A" ~ "T",
.x == "C" ~ "G"
))
#> [1] "G" "C" "T" "A" "G" "G" "A" "T" "A"
You can use recode from dplyr
library(dplyr)
recode(pilot_sequence, "C" = "G", "G" = "C", "A" = "T", "T" = "A")
Or in base R, create a named vector, use match to find the position of each value in the named vector, and then call names to get the corresponding names:
pilot_sequence <- c("C","G","A","T","C","C","T","A","T")
values = c("G" = "C", "C" = "G", "A" = "T", "T" = "A")
names(values[match(pilot_sequence, values)])
"G" "C" "T" "A" "G" "G" "A" "T" "A"

Split a string into a vector of single character strings [duplicate]

This question already has answers here:
Split a character vector into individual characters? (opposite of paste or stringr::str_c)
(4 answers)
Closed 4 years ago.
I have a string like "abcde" and want it split into a vector like
> c("a", "b", "c", "d", "e")
[1] "a" "b" "c" "d" "e"
I found one way that I'll post as an answer but I am hoping someone else has a simpler way to do this, either in base R or using a package.
Using the stringr package:
library(stringr)
x <- "abcde"
as.vector(str_split_fixed(x, pattern = "", n = nchar(x)))
[1] "a" "b" "c" "d" "e"
str_split_fixed produces a matrix that has to be coerced into a vector.
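A base R alternative, shown as a short sketch: strsplit with an empty split pattern breaks a string into single characters. It returns a list with one element per input string, so [[1]] extracts the character vector for a single string. The variable name chars is chosen here for illustration.

```r
x <- "abcde"

# strsplit returns a list; take the first (and only) element
chars <- strsplit(x, split = "")[[1]]
chars
#> [1] "a" "b" "c" "d" "e"
```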

Why does my while loop get stuck? - Programming in R

I am trying to write a function that counts the characters between "a" "t" "g" and one of "t" "a" "g", "t" "g" "a" or "t" "a" "a" inside a vector, but my code gets stuck in the while loop. An example would be x = "a" "a" "a" "t" "a" "t" "g" "t" "c" "g" "t" "t" "t" "t" "a" "g". In this example the code should count 6 characters between "a" "t" "g" and "t" "a" "g". Any help will be appreciated :)
orfs<-function(x,p){
count<-0
cntorfs<-0
n<-length(x)
v<-n-2
for (i in 1:v){
if(x[i]=="a"&& x[i+1]=="t"&& x[i+2]=="g"){
k<-i+3;
w<-x[k]
y<-x[k+1]
z<-x[k+2]
while (((w!="t")&&(y!="a")&& (z!="g"))||((w!="t")&&(y!="a")&&(z!="a"))||((w!="t")&&(y!="g")&& (z!="a"))||(i+2>v)){
count<-count+1
k<-k+1
w<-x[k]
y<-x[k+1]
z<-x[k+2]
}
}
if(count>p){
cntorfs<-cntorfs+1
}
if (count!=0){
count<-0
}
}
cat("orf:",cntorfs)
}
This is a very inefficient and un-R-like way to count the number of characters between two patterns.
Here is an alternative using gsub that should get you started and can be extended to account for the other stop codons:
x <- c("a", "a", "a", "t", "a", "t", "g", "t", "c", "g", "t", "t", "t", "t", "a", "g")
nchar(gsub("[actg]*atg([actg]*)tag[actg]*", "\\1", paste0(x, collapse = "")))
#[1] 6
A more robust and general approach can be found here making use of Biostrings::matchPattern. I would strongly advise against reinventing the wheel here, and instead recommend using some of the standard Bioconductor packages that were developed for exactly these kinds of tasks.
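One way the regular expression idea might be extended to all three stop patterns is sketched below. It assumes that the first "atg" and the nearest following "tag", "tga" or "taa" are wanted, and it ignores the reading frame, which a real ORF finder would not. The variable names s, m and between are made up for this example.

```r
x <- c("a", "a", "a", "t", "a", "t", "g", "t", "c", "g", "t", "t", "t", "t", "a", "g")
s <- paste(x, collapse = "")

# Lazy quantifier (*?) stops at the nearest stop pattern after the first "atg";
# perl = TRUE is needed for the lazy quantifier.
m <- regmatches(s, regexec("atg([acgt]*?)(tag|tga|taa)", s, perl = TRUE))[[1]]

between <- m[2]   # first capture group: the characters between start and stop
nchar(between)
#> [1] 6
```

If the string contains no start/stop pair, m is empty and m[2] is NA, so the result should be checked before use.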

Compressing a string in R [duplicate]

This question already has an answer here:
Collapse vector to string of characters with respective numbers of consequtive occurences
(1 answer)
Closed 4 years ago.
I have a large sequence of strings containing only the following characters
"M", "D", "A"
such as:
"M" "M" "A" "A" "D" "D" "M" "D" "A"
and I would like to compress it to:
M2A2D2M1D1A1
in R. Googling has led me to this (a java solution) but before implementing it, it would be interesting to check if I can find something ready online. Thanks!
The R function rle() is your friend.
testVector <- sample(c("M", "D", "A"), 20, replace = TRUE)
res <- rle(testVector)
compressedString <- paste(res$values, res$lengths, collapse = "", sep = "")
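Because the sample above is random, its output changes from run to run. As a quick check, the same steps applied to the exact vector from the question give the requested string. The names v, runs and compressed are chosen here for illustration.

```r
v <- c("M", "M", "A", "A", "D", "D", "M", "D", "A")

runs <- rle(v)   # runs$values: the letters, runs$lengths: how often each repeats
compressed <- paste0(runs$values, runs$lengths, collapse = "")
compressed
#> [1] "M2A2D2M1D1A1"
```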

return number of specific element of vector based of its name [duplicate]

This question already has answers here:
Convert letters to numbers
(5 answers)
Closed 5 years ago.
I need to return the position of an element in a vector based on the element's name. Let's say I have a vector of letters:
myLetters=letters[1:26]
> myLetters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
What I intend to do is create or find a function that returns the position of an element when called, for example:
myFunction(myLetters["b"])
[1] 2
myFunction(myLetters["z"])
[1] 26
In summary, I need a way to refer to Excel columns by writing the letters of a column (A, B, C, later maybe even AA or further) and get the column number.
If you want to refer to Excel column names, you could create a reference vector with all possible Excel column names:
eg1 <- expand.grid(LETTERS, LETTERS)
eg2 <- expand.grid(LETTERS, LETTERS, LETTERS)
excelcols <- c(LETTERS, paste0(eg1[[2]], eg1[[1]]), paste0(eg2[[3]], eg2[[2]], eg2[[1]]))
After which you can use which:
> which(excelcols == 'A')
[1] 1
> which(excelcols == 'AB')
[1] 28
> which(excelcols == 'ABC')
[1] 731
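The same numbers can also be computed directly, without building the lookup vector, by treating the column name as a base-26 number whose digits run from A = 1 to Z = 26. The function name excel_col_to_num is hypothetical, and the sketch assumes the input is a single string of ASCII letters.

```r
excel_col_to_num <- function(col) {
  chars  <- strsplit(toupper(col), split = "")[[1]]
  digits <- match(chars, LETTERS)        # A = 1, ..., Z = 26
  powers <- rev(seq_along(digits)) - 1   # leftmost letter gets the highest power
  sum(digits * 26^powers)
}

excel_col_to_num("A")
#> [1] 1
excel_col_to_num("AB")
#> [1] 28
excel_col_to_num("ABC")
#> [1] 731
```

For a single letter this reduces to match(col, LETTERS), which is the position asked for in the question.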
If you need to find the number of times a specific letter occurs, then the following should work:
myLetters = c("a","a", "b")
myFunction = function(myLetters, findLetter){
length(which(myLetters==findLetter))
}
Let's find how many times "a" occurs in myLetters:
myFunction(myLetters, "a")
# [1] 2
