This question already has answers here:
List distinct values in a vector in R
(7 answers)
Closed 2 years ago.
I would like to extract duplicated strings from a list. Since the unique function did not seem to work on my non-numerical data, I used the stri_duplicated function from the stringi package to obtain logical values (TRUE or FALSE). I would like to extract the strings that are duplicated (those for which stri_duplicated reports TRUE).
Here is a minimal example:
library(stringi)
ex1 <- c("SE1", "SE2", "SE5", "SE2")
dupl <- stri_duplicated(ex1)
> dupl
[1] FALSE FALSE FALSE TRUE
Many thanks in advance.
In base R there is duplicated():
duplicated(ex1)
[1] FALSE FALSE FALSE TRUE
If you want to extract the duplicated items, subset the vector with that logical result:
ex1[duplicated(ex1)]
[1] "SE2"
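As a small extension of the subsetting above, here is a sketch for the case where a value occurs more than twice. The input vector is made up for illustration. duplicated() flags every occurrence after the first, so unique() can collapse the result to one entry per duplicated value, and the fromLast argument helps recover every copy, including the first.

```r
# Illustrative vector (an assumption): "SE2" appears three times
ex1 <- c("SE1", "SE2", "SE5", "SE2", "SE2")

# Each duplicated value reported once
dup_values <- unique(ex1[duplicated(ex1)])

# Every copy of a duplicated value, including its first occurrence
all_copies <- ex1[duplicated(ex1) | duplicated(ex1, fromLast = TRUE)]

print(dup_values)   # "SE2"
print(all_copies)   # "SE2" "SE2" "SE2"
```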
This question already has answers here:
Using regex in R to find strings as whole words (but not strings as part of words)
(2 answers)
Closed 1 year ago.
I referred to this question (How to filter Exact match string using dplyr), but mine is slightly different: the word is not necessarily at the start and can occur anywhere in the string. I want TRUE to be returned only for the first element, not the second and third.
library(stringr)
vec <- c("this should be selected", "thisus should not be selected","not selected thisis too")
str_detect(vec,"this")
Current output
TRUE TRUE TRUE
Expected output
TRUE FALSE FALSE
Use a word boundary (\\b) on both sides of the word:
stringr::str_detect(vec,"\\bthis\\b")
#[1] TRUE FALSE FALSE
In base R:
grepl('\\bthis\\b', vec)
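If the search word is held in a variable, the same boundary pattern can be assembled with paste0(). This is a minimal base R sketch; the variable name word is hypothetical, and it assumes the word contains no regex metacharacters that would need escaping.

```r
vec <- c("this should be selected",
         "thisus should not be selected",
         "not selected thisis too")

word <- "this"                            # hypothetical search term
pattern <- paste0("\\b", word, "\\b")     # wrap the word in word boundaries

# perl = TRUE uses PCRE, where \b is a word boundary
whole_word <- grepl(pattern, vec, perl = TRUE)
print(whole_word)   # TRUE FALSE FALSE
```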
This question already has answers here:
Find the words in list of strings
(1 answer)
Regular expression pipe confusion
(5 answers)
Closed 3 years ago.
I want to detect (and then extract) month names from text using str_detect and str_extract.
For this, I create an object containing all month names and abbreviations.
m <- paste(c(month.name, month.abb), collapse = "|")
> m
[1] "January|February|March|April|May|June|July|August|September|October|November|December|Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
Then, I want to detect any of the entries occurring as a single word (surrounded by word boundaries):
stringr::str_detect(c("inJan", "Jan"), str_glue("\\b{m}\\b"))
This, however, returns TRUE TRUE (I expect FALSE TRUE, as the first is not a single word).
I suspect this is due to the collapsing of the list, as stringr::str_detect(c("inJan", "Jan"), str_glue("\\bJan\\b")) returns the expected FALSE TRUE.
I need to detect occurrences of m, however. What's the best way to go about this?
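A likely explanation, offered as a sketch rather than a confirmed answer from the thread: in a regular expression, alternation (|) binds more loosely than \\b, so in "\\b{m}\\b" the leading boundary applies only to "January" and the trailing one only to "Dec". Every alternative in between, including "Jan", is matched without any boundary. Wrapping the alternatives in a group should make both boundaries apply to all of them. The example below uses base R with perl = TRUE instead of stringr, and the sample sentence x is made up for illustration.

```r
m <- paste(c(month.name, month.abb), collapse = "|")

# Ungrouped: \b attaches only to the first and last alternative
ungrouped <- grepl(paste0("\\b", m, "\\b"), c("inJan", "Jan"), perl = TRUE)

# Grouped: \b applies to every alternative
pattern <- paste0("\\b(?:", m, ")\\b")
grouped <- grepl(pattern, c("inJan", "Jan"), perl = TRUE)

print(ungrouped)   # TRUE TRUE
print(grouped)     # FALSE TRUE

# Extraction with the grouped pattern (hypothetical sample text)
x <- "Due in January or Feb, not inJan"
months_found <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(months_found)   # "January" "Feb"
```

The same grouped pattern string can be passed to str_detect and str_extract, since it relies only on standard grouping syntax.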
This question already has answers here:
How to sort a character vector where elements contain letters and numbers?
(6 answers)
Closed 3 years ago.
I have data that looks like the following, except the numbers are out of order:
dat <- paste("Experience", 1:20, sep = "_")
Basically, I am trying to sort the columns in numerical order based on the trailing number, so that they come out in the order the code above produces. However, when I sort the values, they are sorted as text, character by character, so "Experience_10" comes before "Experience_2":
"Experience_1" "Experience_10" "Experience_11" "Experience_12"
"Experience_13" "Experience_14" "Experience_15" "Experience_16"
"Experience_17" "Experience_18" "Experience_19" "Experience_2"
"Experience_20" "Experience_3" "Experience_4" "Experience_5"
"Experience_6" "Experience_7" "Experience_8" "Experience_9"
Thoughts?
The stringr package, part of the tidyverse, has str_sort(), which sorts strings in numeric order when called with numeric = TRUE.
library(stringr)
str_sort(dat, numeric = TRUE)
Another option is mixedsort from gtools:
gtools::mixedsort(dat)
#[1] "Experience_1" "Experience_2" "Experience_3" "Experience_4" "Experience_5" "Experience_6"
#[7] "Experience_7" "Experience_8" "Experience_9" "Experience_10" "Experience_11" "Experience_12"
#[13] "Experience_13" "Experience_14" "Experience_15" "Experience_16" "Experience_17" "Experience_18"
#[19] "Experience_19" "Experience_20"
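If neither package is available, a base R sketch is to pull out the trailing number and order by it. This assumes every element ends in "_" followed by digits, as in the example data; the shuffled input below is made up for illustration.

```r
# Illustrative, deliberately unordered input
dat <- paste("Experience", c(10, 2, 1, 20, 3), sep = "_")

# Strip everything up to the last underscore and convert the rest to integer
num <- as.integer(sub(".*_", "", dat))

# Reorder the strings by that number
sorted <- dat[order(num)]
print(sorted)
# "Experience_1" "Experience_2" "Experience_3" "Experience_10" "Experience_20"
```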
This question already has answers here:
Boolean operators && and ||
(4 answers)
Closed 3 years ago.
I can't find the answer and the simple approaches I've tried haven't worked.
Basically, I have two corresponding dataframes with identical dimensions, full of boolean values.
I want "OR" logic, to produce a third corresponding dataframe with a TRUE wherever either of the starting dataframes has TRUE.
df1 <- data.frame(a = c(TRUE, TRUE),
                  b = c(FALSE, FALSE))
df2 <- data.frame(a = c(FALSE, TRUE),
                  b = c(FALSE, TRUE))
Desired output:
a b
[1,] TRUE FALSE
[2,] TRUE TRUE
It works using the element-wise | operator:
df1 | df2
a b
[1,] TRUE FALSE
[2,] TRUE TRUE
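As the [1,] row labels in the output suggest, applying | to two data frames yields a logical matrix rather than a data frame. A minimal sketch of converting the result back, in case a data frame is needed downstream (the variable names either and either_df are illustrative):

```r
df1 <- data.frame(a = c(TRUE, TRUE), b = c(FALSE, FALSE))
df2 <- data.frame(a = c(FALSE, TRUE), b = c(FALSE, TRUE))

# Element-wise OR; for data frames the result is a logical matrix
either <- df1 | df2
print(is.matrix(either))   # TRUE

# Convert back to a data frame, keeping the column names
either_df <- as.data.frame(either)
print(either_df)
```

The doubled form || is meant for single TRUE/FALSE values in control flow, not for element-wise work like this.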
This question already has answers here:
In R, how do you differentiate a result is vector or matrix?
(2 answers)
Closed 7 years ago.
Example code:
> sapply(list(1:3,25:29),median)
[1] 2 27
Is this output considered to be a vector or a matrix? Is there a command that I can use to determine this kind of information directly?
Hat-tip and thank you to thelatemail:
> is.vector(sapply(list(1:3,25:29),median))
[1] TRUE
> is.matrix(sapply(list(1:3,25:29),median))
[1] FALSE
As he mentioned, the general pattern in R is as.xxx to convert and is.xxx to test.
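To illustrate the is.xxx / as.xxx pattern, here is a short sketch contrasting the original call with one where sapply does return a matrix. The use of range is an added example, chosen because it returns two values per list element.

```r
# One value per element: sapply simplifies to a plain vector
v <- sapply(list(1:3, 25:29), median)
print(is.vector(v))   # TRUE
print(is.matrix(v))   # FALSE

# Two values per element: sapply simplifies to a matrix, one column per element
m <- sapply(list(1:3, 25:29), range)
print(is.matrix(m))   # TRUE
print(dim(m))         # 2 2

# as.xxx converts: the vector becomes a one-column matrix
v_mat <- as.matrix(v)
print(dim(v_mat))     # 2 1
```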