This question already has answers here:
Trying to return a specified number of characters from a gene sequence in R
(3 answers)
Extracting the last n characters from a string in R
(15 answers)
Closed 5 years ago.
Is there a function in R that can cut out part of each value in a vector?
For example, I have this vector:
40754831597
64278107602
64212163451
From each value in the vector I want to cut out the digits from position 3 to 6, for example, and get a new vector that looks like this:
7548
2781
2121
and so on
I don't really get why you would like to do this, but here you go:
# assuming it's a character vector
substring(vec,3,6)
# if it's numeric
substring(as.character(vec),3,6)
#output
#[1] "7548" "2781" "2121"
We can use sub
sub(".{2}(.{4}).*", "\\1", v1)
#[1] "7548" "2781" "2121"
data
v1 <- c(40754831597, 64278107602, 64212163451)
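Both answers depend on the number being converted to text exactly as written. Here is a minimal sketch, assuming the values are non-negative whole numbers stored as numeric. as.character() can switch to scientific notation for some values (1e11 becomes "1e+11"), so sprintf("%.0f", ...) is used to force plain digits. The helper name cut_digits is hypothetical, not part of any package.

```r
# Hypothetical helper: extract the digits at positions `first`..`last`
# from whole numbers stored as numeric values.
cut_digits <- function(x, first, last) {
  # "%.0f" prints the full integer part without scientific notation
  digits <- sprintf("%.0f", x)
  substring(digits, first, last)
}

vec <- c(40754831597, 64278107602, 64212163451)
res <- cut_digits(vec, 3, 6)
res
as.numeric(res)  # convert back if numbers rather than text are needed
```

The result is a character vector, which keeps any leading zeros in the extracted piece; as.numeric() would drop them.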
This question already has answers here:
How to calculate the number of occurrence of a given character in each row of a column of strings?
(14 answers)
Count the number of pattern matches in a string
(6 answers)
Closed 2 years ago.
Say I have these strings:
seq1 <- "ACTACTGGATGACT"
pattern1 <- "ACT"
What is the best way to count how many times the pattern occurs in the sequence, in R? I would like to use a sliding-window for loop, but I'm not clear on the proper way to handle the character strings.
We can use str_count
library(stringr)
str_count(seq1, pattern1)
#[1] 3
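The question asks about a sliding window, so here is a rough base R sketch of that idea, without an explicit loop. It compares every window of the pattern's length against the pattern, treating the pattern as literal text rather than a regular expression. Unlike str_count, which counts non-overlapping matches, this also counts overlapping ones. The function name count_windows is hypothetical.

```r
# Hypothetical helper: count occurrences of `pattern` in `x` by comparing
# every window of length nchar(pattern), so overlapping matches are counted.
count_windows <- function(x, pattern) {
  n_windows <- nchar(x) - nchar(pattern) + 1
  if (n_windows < 1) return(0L)  # pattern longer than the sequence
  starts <- seq_len(n_windows)
  windows <- substring(x, starts, starts + nchar(pattern) - 1)
  sum(windows == pattern)
}

seq1 <- "ACTACTGGATGACT"
pattern1 <- "ACT"
count_windows(seq1, pattern1)  # 3 for this sequence
count_windows("AAAA", "AA")    # 3, because overlapping windows are counted
```

For a pattern like "ACT" that cannot overlap itself, both approaches give the same count.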
This question already has answers here:
Regex in R: matching the string before a sequence of characters
(4 answers)
Closed 3 years ago.
Data:
"Danger Sign 14x10 Aluminum - No Smoking Beyond This Point"
Desired output: first, find the pattern "x"; second, extract the 7 characters before "x" and the 7 characters after "x".
If anyone has any clue, please reply.
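You can combine regexpr and substr. Note that grep only returns the index of the matching vector element, not the position of "x" inside the string, so regexpr is used to locate the character:
ind <- regexpr("x", Data, fixed = TRUE)
substr(Data, ind - 7, ind + 7)
#[1] "Sign 14x10 Alum"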
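The position of a character inside a string comes from regexpr, which returns -1 when there is no match. A slightly more defensive sketch could look like the following, assuming the pattern is a single literal character and only its first occurrence matters. The helper name extract_around is hypothetical.

```r
# Hypothetical helper: return up to `n` characters on each side of the
# first literal occurrence of `pattern` (counted from the match start).
extract_around <- function(x, pattern, n = 7) {
  pos <- as.integer(regexpr(pattern, x, fixed = TRUE))  # -1 when absent
  out <- substr(x, pmax(pos - n, 1L), pos + n)          # clamp start at 1
  out[pos < 0] <- NA_character_                         # no match -> NA
  out
}

Data <- "Danger Sign 14x10 Aluminum - No Smoking Beyond This Point"
extract_around(Data, "x")
```

Strings without an "x" give NA instead of an unrelated piece of text.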
This question already has answers here:
Getting and removing the first character of a string
(7 answers)
Closed 5 years ago.
I have recently started working in bioinformatics. I have to edit row.names for my variable. Here is my situation:
I have clinical data and gene expression values downloaded from The Cancer Genome Atlas. I have to match row names, but in the clinical data the row names look like "TCGA-6D-AA2E", while in the gene expression data they look like "TCGA-6D-AA2E-01A-11R-A38B-07".
Normally I use the match command to match row names, but the character lengths are not the same. So my question is: is there an easy way to edit the character length of row names?
You could use the grep function instead:
gene.names <- c("TCGA-6D-AA2E-01A-11R-A38B-07", "TCGC-6D-AA2E-01A-11R-A38B-07", "TAGA-6D-AA2E-01A-11R-07", "TCGA-6D-AA2E-A38B-07")
pick <- "TCGA-6D-AA2E"
grep(pick, gene.names)
# [1] 1 4
Edit based on the comment: use substr to pick the first 12 characters (R strings are indexed from 1):
substr(gene.names, 1, 12)
#[1] "TCGA-6D-AA2E" "TCGC-6D-AA2E" "TAGA-6D-AA2E" "TCGA-6D-AA2E"
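To connect this back to match, which the question mentions, here is a small sketch under assumed data. The vector names clinical.ids and expr.ids are hypothetical, and the ID "TCGA-AB-1234" is made up for illustration. The idea is to truncate the long names first and then match as usual.

```r
# Assumed example data: short IDs (clinical) and long barcodes (expression).
# "TCGA-AB-1234" is an invented ID used only for illustration.
clinical.ids <- c("TCGA-6D-AA2E", "TCGA-AB-1234")
expr.ids <- c("TCGA-AB-1234-01A-11R-A38B-07", "TCGA-6D-AA2E-01A-11R-A38B-07")

# Truncate the long names to their first 12 characters
short.ids <- substr(expr.ids, 1, 12)

# For each clinical ID, the position of its row in the expression data
idx <- match(clinical.ids, short.ids)
idx
```

match returns only the first hit, so if several expression samples share the same 12-character prefix, only one of them is picked up per clinical ID.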
This question already has answers here:
Count values separated by a comma in a character string
(5 answers)
How to calculate the number of occurrence of a given character in each row of a column of strings?
(14 answers)
Closed 6 years ago.
I have a column with a pipe-separated list of identifiers:
Identifier
O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786
P28066|P60900|O14818|P20618|P40306
Q99436|P28062|P28065
P28062|P28065|P62191|P35998|P17980|P43686
How do I produce a column with the number of identifiers in each row?
The output should read something like this:
Identifier Count
O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786 7
P28066|P60900|O14818|P20618|P40306 5
Q99436|P28062|P28065 3
P28062|P28065|P62191|P35998|P17980|P43686 6
Thanks in advance!
sapply(strsplit(df$Identifier, '[|]'), length)
To count only the unique identifiers in each row, add the unique function:
sapply(strsplit(df$Identifier, '[|]'), function(i) length(unique(i)))
A base R option without splitting would be
df1$Count <- nchar(gsub("[^|]", "", df1$Identifier)) + 1L
df1$Count
#[1] 7 5 3 6
Or with gregexpr
sapply(gregexpr("[|]", df1$Identifier),
       function(x) sum(attr(x, "match.length")) + 1)
#[1] 7 5 3 6
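For completeness, here is a self-contained sketch that rebuilds the example data from the question and adds the Count column. It assumes the column is stored as character rather than factor, and it uses lengths(), a base R shortcut for sapply(..., length).

```r
# Example data taken from the question
df <- data.frame(
  Identifier = c("O75496|P62979|P62987|P0CG47|P0CG48|O00487|P25786",
                 "P28066|P60900|O14818|P20618|P40306",
                 "Q99436|P28062|P28065",
                 "P28062|P28065|P62191|P35998|P17980|P43686"),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "|" as a literal character, not regex alternation;
# lengths() returns the number of pieces for each row
df$Count <- lengths(strsplit(df$Identifier, "|", fixed = TRUE))
df$Count
```

If the column were a factor, wrapping it in as.character() before strsplit would be needed.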
This question already has answers here:
Digit sum function in R
(4 answers)
Closed 7 years ago.
I've got this simple question: how can I turn a vector of ten numbers into a vector of ten numbers, each being the sum of the digits of the corresponding original number? So 11 in the first vector becomes 2, and 234 becomes 9.
We can use str_extract_all from stringr to get the individual digits, convert them to numeric and get the sum.
library(stringr)
sapply(str_extract_all(c(11, 234), '\\d'), function(x) sum(as.numeric(x)))
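If loading stringr is not desired, a base R sketch of the same idea might look like this. It assumes non-negative whole numbers, and the function name digit_sum is hypothetical. Each number is printed as plain digits, split into single characters, and summed.

```r
# Hypothetical helper: sum of the decimal digits of non-negative whole numbers
digit_sum <- function(x) {
  # "%.0f" avoids scientific notation; "" splits into single characters
  digits <- strsplit(sprintf("%.0f", x), "")
  vapply(digits, function(d) sum(as.integer(d)), integer(1))
}

digit_sum(c(11, 234))
```

vapply is used instead of sapply so the result is always an integer vector, even for empty input.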