This question already has answers here:
Does R have function startswith or endswith like python? [closed]
(6 answers)
How to determine if a string "ends with" another string in R?
(5 answers)
Closed 2 days ago.
I'm trying to find all variables names that end with "DE".
I'm already using this "grep" to find variables that start with "TR"
names(df )[ grep( "^(TR)" , names(df) ) ]
Is there a grep way to find the end pattern.
Just an fyi, I know I can do it with "ends_with" but I'm trying to find a base r method.
names( df %>% select(ends_with( "DE" ) ) )
Thanks.
There is already endsWith in base R
names(df)[endsWith(names(df), "DE")]
-a reproducible example
> names(iris)[endsWith(names(iris), "Length")]
[1] "Sepal.Length" "Petal.Length"
Or with grep use the $ to specify the end of the string (also, with grep, we can use value = TRUE which returns the names instead of the numeric index
grep("DE$", names(df), value = TRUE)
# similar to
names(df)[grep("DE$", names(df)]
Or in a base R pipe
grep("DE$", names(df)) |>
`[`(names(df), i = _)
grep("Length", names(iris)) |>
`[`(names(iris), i = _)
[1] "Sepal.Length" "Petal.Length"
Related
This question already has answers here:
Remove part of string after "."
(6 answers)
gsub() in R is not replacing '.' (dot)
(3 answers)
Closed last year.
I have a list
t <- list('mcd.norm_1','mcc.norm_1', 'mcr.norm_1')
How can i convert the list to remove the period and everything after so the list is just
'mcd' 'mcc' 'mcr'
You may try
library(stringr)
lapply(t, function(x) str_split(x, "\\.", simplify = T)[1])
Another possible solution:
library(tidyverse)
t <- list('mcd.norm_1','mcc.norm_1', 'mcr.norm_1')
t %>%
str_remove("\\..*")
#> [1] "mcd" "mcc" "mcr"
This could be another option:
unlist(sapply(t, \(x) regmatches(x, regexec(".*(?=\\.)", x, perl = TRUE))))
[1] "mcd" "mcc" "mcr"
This question already has answers here:
Finding the most repeated character in a string in R
(2 answers)
Closed 1 year ago.
Suppose the next character string:
test_string <- "A A B B C C C H I"
Is there any way to extract the most frequent value within test_string?
Something like:
extract_most_frequent_character(test_string)
Output:
#C
We can use scan to read the string as a vector of individual elements by splitting at the space, get the frequency count with table, return the named index that have the max count (which.count), get its name
extract_most_frequent_character <- function(x) {
names(which.max(table(scan(text = x, what = '', quiet = TRUE))))
}
-testing
extract_most_frequent_character(test_string)
[1] "C"
Or with strsplit
extract_most_frequent_character <- function(x) {
names(which.max(table(unlist(strsplit(x, "\\s+")))))
}
Here is another base R option (not as elegant as #akrun's answer)
> intToUtf8(names(which.max(table(utf8ToInt(gsub("\\s", "", test_string))))))
[1] "C"
One possibility involving stringr could be:
names(which.max(table(str_extract_all(test_string, "[A-Z]", simplify = TRUE))))
[1] "C"
Or marginally shorter:
names(which.max(table(str_extract_all(test_string, "[A-Z]")[[1]])))
Here is solution using stringr package, table and which:
library(stringr)
test_string <- str_split(test_string, " ")
test_string <- table(test_string)
names(test_string)[which.max(test_string)]
[1] "C"
This question already has answers here:
Extract a regular expression match
(12 answers)
Closed 1 year ago.
As per this link, I wrote a regex that does not give the expected result when executed for a specific string in R:
string <- "0,9% BB"
regex <- "^ ?\\d+[\\d ,\\.]*[B-DF-HJ-NP-TV-Z\\/]*%?"
grep(regex, string, value = T, perl = T)
The result output is
[1] "0,9% BB"
instead of the desired (and outputed by the link)
[1] "0,9%"
What am I missing to get the desired output? Preferably base R, please.
This returns "0,9%" using only base R
string <- "0,9% BB"
regex <- "^ ?\\d+[\\d ,\\.]*[B-DF-HJ-NP-TV-Z\\/]*%?"
regmatches(x = string, m = regexpr(regex,string,perl = TRUE))
This question already has answers here:
grep using a character vector with multiple patterns
(11 answers)
Closed 2 years ago.
Is is possible to find the names in a vector that contain either id OR group Or both in the example below?
I have used grepl() without success.
a = c("c-id" = 2, "g_idgroups" = 3, "z+i" = 4)
grepl(c("id", "group"), names(a)) # return name of elements that contain either `id` OR `group` OR both
You can use :
pattern <- c("id", "group")
grep(paste0(pattern, collapse = '|'), names(a), value = TRUE)
#[1] "c-id" "g_igroups"
With grepl you can get logical value
grepl(paste0(pattern, collapse = '|'), names(a))
#[1] TRUE TRUE FALSE
A stringr solution :
stringr::str_subset(names(a), paste0(pattern, collapse = '|'))
#[1] "c-id" "g_igroups"
Using str_detect:
> names(a)[str_detect(names(a), 'id|groups')]
[1] "c-id" "g_idgroups"
> names(a)
[1] "c-id" "g_idgroups" "z+i"
>
I have a question about the use of gsub. The rownames of my data, have the same partial names. See below:
> rownames(test)
[1] "U2OS.EV.2.7.9" "U2OS.PIM.2.7.9" "U2OS.WDR.2.7.9" "U2OS.MYC.2.7.9"
[5] "U2OS.OBX.2.7.9" "U2OS.EV.18.6.9" "U2O2.PIM.18.6.9" "U2OS.WDR.18.6.9"
[9] "U2OS.MYC.18.6.9" "U2OS.OBX.18.6.9" "X1.U2OS...OBX" "X2.U2OS...MYC"
[13] "X3.U2OS...WDR82" "X4.U2OS...PIM" "X5.U2OS...EV" "exp1.U2OS.EV"
[17] "exp1.U2OS.MYC" "EXP1.U20S..PIM1" "EXP1.U2OS.WDR82" "EXP1.U20S.OBX"
[21] "EXP2.U2OS.EV" "EXP2.U2OS.MYC" "EXP2.U2OS.PIM1" "EXP2.U2OS.WDR82"
[25] "EXP2.U2OS.OBX"
In my previous question, I asked if there is a way to get the same names for the same partial names. See this question: Replacing rownames of data frame by a sub-string
The answer is a very nice solution. The function gsub is used in this way:
transfecties = gsub(".*(MYC|EV|PIM|WDR|OBX).*", "\\1", rownames(test)
Now, I have another problem, the program I run with R (Galaxy) doesn't recognize the | characters. My question is, is there another way to get to the same solution without using this |?
Thanks!
If you don't want to use the "|" character, you can try something like :
Rnames <-
c( "U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9" ,
"U2OS.OBX.2.7.9" , "U2OS.EV.18.6.9" ,"U2O2.PIM.18.6.9" ,"U2OS.WDR.18.6.9" )
Rlevels <- c("MYC","EV","PIM","WDR","OBX")
tmp <- sapply(Rlevels,grepl,Rnames)
apply(tmp,1,function(i)colnames(tmp)[i])
[1] "EV" "PIM" "WDR" "MYC" "OBX" "EV" "PIM" "WDR"
But I would seriously consider mentioning this to the team of galaxy, as it seems to be rather awkward not to be able to use the symbol for OR...
I wouldn't recommend doing this in general in R as it is far less efficient than the solution #csgillespie provided, but an alternative is to loop over the various strings you want to match and do the replacements on each string separately, i.e. search for "MYN" and replace only in those rownames that match "MYN".
Here is an example using the x data from #csgillespie's Answer:
x <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
"U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9","U2OS.WDR.18.6.9",
"U2OS.MYC.18.6.9","U2OS.OBX.18.6.9", "X1.U2OS...OBX","X2.U2OS...MYC")
Copy the data so we have something to compare with later (this just for the example):
x2 <- x
Then create a list of strings you want to match on:
matches <- c("MYC","EV","PIM","WDR","OBX")
Then we loop over the values in matches and do three things (numbered ##X in the code):
Create the regular expression by pasting together the current match string i with the other bits of the regular expression we want to use,
Using grepl() we return a logical indicator for those elements of x2 that contain the string i
We then use the same style gsub() call as you were already shown, but use only the elements of x2 that matched the string, and replace only those elements.
The loop is:
for(i in matches) {
rgexp <- paste(".*(", i, ").*", sep = "") ## 1
ind <- grepl(rgexp, x) ## 2
x2[ind] <- gsub(rgexp, "\\1", x2[ind]) ## 3
}
x2
Which gives:
> x2
[1] "EV" "PIM" "WDR" "MYC" "OBX" "EV" "PIM" "WDR" "MYC" "OBX" "OBX" "MYC"