This question already has answers here:
Remove part of string after "."
(6 answers)
gsub() in R is not replacing '.' (dot)
(3 answers)
Closed last year.
I have a list
t <- list('mcd.norm_1','mcc.norm_1', 'mcr.norm_1')
How can I convert the list to remove the period and everything after it, so that the list is just
'mcd' 'mcc' 'mcr'
You may try
library(stringr)
lapply(t, function(x) str_split(x, "\\.", simplify = TRUE)[1])
Another possible solution:
library(tidyverse)
t <- list('mcd.norm_1','mcc.norm_1', 'mcr.norm_1')
t %>%
str_remove("\\..*")
#> [1] "mcd" "mcc" "mcr"
This could be another option:
unlist(sapply(t, \(x) regmatches(x, regexec(".*(?=\\.)", x, perl = TRUE))))
[1] "mcd" "mcc" "mcr"
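For comparison, here is a minimal base R sketch that needs no extra packages. It assumes the same example list as in the question; the name res is only illustrative. The pattern "\\..*" matches the first literal dot and everything after it, and sub() replaces that match with an empty string.

```r
# Same example list as in the question.
t <- list('mcd.norm_1', 'mcc.norm_1', 'mcr.norm_1')

# unlist() flattens the list to a character vector;
# sub() then deletes the first literal dot and everything after it.
res <- sub("\\..*", "", unlist(t))
res
#> [1] "mcd" "mcc" "mcr"
```

If a list is needed rather than a character vector, wrapping the result in as.list() should give one.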
This question already has answers here:
Finding the most repeated character in a string in R
(2 answers)
Closed 1 year ago.
Suppose the next character string:
test_string <- "A A B B C C C H I"
Is there any way to extract the most frequent value within test_string?
Something like:
extract_most_frequent_character(test_string)
Output:
#C
We can use scan to read the string as a vector of individual elements by splitting at the spaces, get the frequency counts with table, find the named index that has the maximum count (which.max), and return its name:
extract_most_frequent_character <- function(x) {
names(which.max(table(scan(text = x, what = '', quiet = TRUE))))
}
Testing:
extract_most_frequent_character(test_string)
[1] "C"
Or with strsplit
extract_most_frequent_character <- function(x) {
names(which.max(table(unlist(strsplit(x, "\\s+")))))
}
Here is another base R option (not as elegant as @akrun's answer):
> intToUtf8(names(which.max(table(utf8ToInt(gsub("\\s", "", test_string))))))
[1] "C"
One possibility involving stringr could be:
names(which.max(table(str_extract_all(test_string, "[A-Z]", simplify = TRUE))))
[1] "C"
Or marginally shorter:
names(which.max(table(str_extract_all(test_string, "[A-Z]")[[1]])))
Here is a solution using the stringr package, table and which.max:
library(stringr)
test_string <- str_split(test_string, " ")
test_string <- table(test_string)
names(test_string)[which.max(test_string)]
[1] "C"
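One caveat with the answers above: which.max returns only the first maximum, so ties are silently dropped. A small sketch that returns every tied value, assuming whitespace-separated input as in the question (the function name most_frequent_all is made up for illustration):

```r
# Hypothetical helper: returns all values that share the highest count.
most_frequent_all <- function(x) {
  # Split on runs of whitespace and count each distinct element.
  counts <- table(strsplit(x, "\\s+")[[1]])
  # Keep every name whose count equals the maximum count.
  is_max <- as.vector(counts) == max(counts)
  names(counts)[is_max]
}

most_frequent_all("A A B B C C C H I")
# [1] "C"
most_frequent_all("A A B B")
# [1] "A" "B"
```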
This question already has an answer here:
regular expression match digit and characters
(1 answer)
Closed 2 years ago.
I need an additional solution to the previous question/answer
Move characters from beginning of column name to end of column name
I have a dataset where column names have two parts divided by _ e.g.
pal036a_lon
pal036a_lat
pal036a_elevation
I would like to convert the prefixes into suffixes so that it becomes:
lon_pal036a
lat_pal036a
elevation_pal036a
The answer to the previous question
names(df) <- sub("([a-z])_([a-z]+)", "\\2_\\1", names(df))
does not work for numbers within the prefixes.
Assuming your names have a single _, you could also use strsplit():
sapply(strsplit(names(df), '_'), function(x) paste(rev(x), collapse = '_'))
If the names contain more than one _, you could modify the above as suggested by jay.sf, so that only the last part moves to the front:
sapply(strsplit(x, "_"), function(x) paste(c(x[length(x)], x[-length(x)]), collapse="_"))
You can include alphanumeric characters in the first group:
names(df) <- sub("([a-z0-9]+)_([a-z]+)", "\\2_\\1", names(df))
For example :
x <- c("pal036a_lon","pal036a_lat","pal036a_elevation")
sub("([a-z0-9]+)_([a-z]+)", "\\2_\\1",x)
#[1] "lon_pal036a" "lat_pal036a" "elevation_pal036a"
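If the character set of the prefix is not known in advance, a more permissive pattern may be easier to maintain. The sketch below assumes that the prefix is everything before the first underscore and that the rest of the name should stay together; the name pal036a_wind_speed is a made-up example, not from the question.

```r
# The last element is a hypothetical name with two underscores.
x <- c("pal036a_lon", "pal036a_lat", "pal036a_elevation", "pal036a_wind_speed")

# Group 1: everything up to the first underscore (no underscores allowed).
# Group 2: everything after that underscore.
swapped <- sub("^([^_]+)_(.*)$", "\\2_\\1", x)
swapped
# [1] "lon_pal036a" "lat_pal036a" "elevation_pal036a" "wind_speed_pal036a"
```

Names without any underscore do not match the pattern and are returned unchanged.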
This question already has answers here:
Remove the letters between two patterns of strings in R
(3 answers)
Closed 2 years ago.
I have a data frame with this kind of expression in column C:
GT_rs9628326:N_rs9628326
GT_rs1111:N_rs1111
GT_rs8374:N_rs8374
Using R, I want to remove everything between the first "T" and ":", as well as everything after the "N". I know this can be done with gsub. I would get:
GT:N
GT:N
GT:N
Maybe you can try
gsub("_\\w+","",s)
giving
[1] "GT:N" "GT:N" "GT:N"
Data
s <- c("GT_rs9628326:N_rs9628326","GT_rs1111:N_rs1111","GT_rs8374:N_rs8374")
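To see why this pattern works, it can help to look at what it actually matches. \w covers letters, digits and the underscore, but not ":", so each match starts at an underscore and stops just before the colon or at the end of the string. A short sketch using the same data (the names matched and cleaned are only illustrative):

```r
s <- c("GT_rs9628326:N_rs9628326", "GT_rs1111:N_rs1111", "GT_rs8374:N_rs8374")

# List every substring that the pattern matches in each element.
matched <- regmatches(s, gregexpr("_\\w+", s))
matched[[1]]
# [1] "_rs9628326" "_rs9628326"

# Removing all of those matches leaves only "GT:N".
cleaned <- gsub("_\\w+", "", s)
cleaned
# [1] "GT:N" "GT:N" "GT:N"
```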
Another option is to split the strings at ":", strip the unneeded text from each piece, and then paste the pieces back together with the same separator (I have used @ThomasIsCoding's data, thanks):
#Data
v1 <- c("GT_rs9628326:N_rs9628326","GT_rs1111:N_rs1111","GT_rs8374:N_rs8374")
#Code
unlist(lapply(lapply(strsplit(v1,split = ':'),
function(x) sub("_[^_]+$", "", x)),
function(x) paste0(x,collapse = ':')))
Output:
[1] "GT:N" "GT:N" "GT:N"
Using str_remove_all from stringr
library(stringr)
str_remove_all(s, "_\\w+")
#[1] "GT:N" "GT:N" "GT:N"
data
s <- c("GT_rs9628326:N_rs9628326","GT_rs1111:N_rs1111","GT_rs8374:N_rs8374")
Remove the word characters that follow either "T" or "N", using @ThomasIsCoding's data.
gsub('(?<=T|N)\\w+', '', s, perl = TRUE)
#[1] "GT:N" "GT:N" "GT:N"
This question already has answers here:
Concatenate a vector of strings/character
(8 answers)
Closed 7 years ago.
I would like to turn vector
a <- 1:5
into a string
"1_2_3_4_5"
How to do this?
paste(a,sep="_")
does not work this way.
You could try collapse
paste(a, collapse="_")
#[1] "1_2_3_4_5"
You can try this:
paste0(a, collapse="_")
# [1] "1_2_3_4_5"
Just for fun:
gsub(', ', '_', toString(a))
#[1] "1_2_3_4_5"
This question already has answers here:
Concatenate a vector of strings/character
(8 answers)
Closed 6 years ago.
I was working with the paste command in R, when I found that
a <- c("something", "to", "paste")
paste(a, sep="_")
produces the output
# [1] "something" "to" "paste"
which is the same as when I print a:
# [1] "something" "to" "paste"
So what effect does the sep have on the paste command in R?
sep is the separator placed between the arguments of paste, so it only has an effect when you pass two or more vectors. If you were looking to get "something_to_paste", then you would be looking for the collapse argument, which joins the elements of the result into a single string.
Try the following to get a sense of what the sep argument does:
paste(a, 1:3, sep = "_")
# [1] "something_1" "to_2" "paste_3"
and compare it to collapse:
paste(a, collapse = "_")
# [1] "something_to_paste"
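The two arguments can also be combined. In this sketch, sep first joins the vectors element by element, and collapse then joins the resulting elements into one string (the name combined is only illustrative):

```r
a <- c("something", "to", "paste")

# sep = "_" joins a[i] with i; collapse = " " joins the three results.
combined <- paste(a, 1:3, sep = "_", collapse = " ")
combined
# [1] "something_1 to_2 paste_3"
```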