This question already has answers here:
Filter rows which contain a certain string
(5 answers)
Closed 3 years ago.
Using this command I keep the rows that contain exactly the specific word:
df[df$ID == "interesting", ]
If the word exists in a row but has other words around it, how is it possible to detect it and keep that row?
Example input
data.frame(text = c("interesting", " I am interesting for this", "remove")
Expected output
data.frame(text = c("interesting", " I am interesting for this")
Example data:
df <- data.frame(text = c("interesting", " I am interesting for this", "remove"),
stringsAsFactors = FALSE)
Solution using base R, indexing with grepl():
df[grepl("interesting", df$text), ]
This returns:
[1] "interesting" " I am interesting for this"
Edit 1
Changed the code so that it returns a data.frame and not a vector (note the drop = FALSE):
df[grep("interesting", df$text), , drop = FALSE]
This now returns:
text
1 interesting
2 I am interesting for this
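For completeness, subset() gives the same result and always keeps the data.frame class, so no drop = FALSE is needed; a minimal base R sketch on the same df:
# keeps rows whose text contains "interesting"; returns a data.frame
subset(df, grepl("interesting", text))
which returns the same two rows as a data.frame.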
This question already has answers here:
Extract a regular expression match
(12 answers)
Closed 1 year ago.
As per this link, I wrote a regex that does not give the expected result when executed for a specific string in R:
string <- "0,9% BB"
regex <- "^ ?\\d+[\\d ,\\.]*[B-DF-HJ-NP-TV-Z\\/]*%?"
grep(regex, string, value = T, perl = T)
The result output is
[1] "0,9% BB"
instead of the desired result (which the link does produce):
[1] "0,9%"
What am I missing to get the desired output? Preferably base R, please.
This returns "0,9%" using only base R
string <- "0,9% BB"
regex <- "^ ?\\d+[\\d ,\\.]*[B-DF-HJ-NP-TV-Z\\/]*%?"
regmatches(x = string, m = regexpr(regex,string,perl = TRUE))
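For comparison, a sketch of the same extraction with sub(), capturing the match and dropping the rest (note that, unlike regmatches(), sub() returns non-matching strings unchanged):
# wrap the regex in a capture group and delete everything after the match
sub(paste0("(", regex, ").*$"), "\\1", string, perl = TRUE)
# [1] "0,9%"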
This question already has answers here:
in R, use gsub to remove all punctuation except period
(4 answers)
Closed 2 years ago.
In the column text, how is it possible to remove all punctuation marks but keep only the ?
data.frame(id = c(1), text = c("keep<>-??it--!##"))
Expected output
data.frame(id = c(1), text = c("keep??it"))
A more general solution would be to use nested gsub commands: convert ? to an unusual placeholder string (like "foobar"), remove all punctuation, then turn "foobar" back into ?:
gsub("foobar", "?", gsub("[[:punct:]]", "", gsub("\\?", "foobar", df$text)))
#> [1] "keep??it"
Using gsub you could do:
gsub("(\\?+)|[[:punct:]]","\\1",df$text)
[1] "keep??it"
gsub('[[:punct:] ]+', ' ', data) strips all punctuation, including the ?, which is not what you want.
But this is:
library(stringr)
sapply(df, function(x) str_replace_all(x, "<|>|-|!|#", ""))
id text
[1,] "1" "a"
[2,] "2" "keep??it"
Better IMO than other answers because there is no need for nesting, and it lets you define exactly which characters to substitute.
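A slightly more compact pattern for the same str_replace_all() idea is a character class listing the characters to drop (the trailing - is literal when placed last), applied directly to the text column of the df from the Data section below:
library(stringr)
str_replace_all(df$text, "[<>!#-]", "")
# [1] "keep??it"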
Here's another solution using negative lookahead:
gsub("(?!\\?)[[:punct:]]", "", df$text, perl = T)
[1] "keep??it"
The negative lookahead asserts that the next character is not a ? and then matches any punctuation.
Data:
df <- data.frame(id = c(1), text = c("keep<>-??it--!##"))
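For completeness, a keep-list via a negated character class also works in base R on the same df; note that this would strip spaces and underscores as well (not an issue in this example):
# keep only alphanumeric characters and ?
gsub("[^[:alnum:]?]", "", df$text)
# [1] "keep??it"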
This question already has answers here:
How to remove all whitespace from a string?
(9 answers)
Closed 1 year ago.
I want to merge the following two strings in R (and remove the spaces). I was using paste but was not able to get the desired result.
a <- "big earth"
b <- "small moon"
c <- paste(a,b, sep = "")
I want to get c <- "bigearthsmallmoon"
Thank you very much for the help.
You can paste the strings together into one with paste(). Then you can use gsub() to remove all spaces:
gsub(" ", "", paste(a, b))
# [1] "bigearthsmallmoon"
To store the result in c as in your attempt:
c <- gsub(" ", "", paste(a, b, sep = ""))
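If the inputs might contain other whitespace such as tabs or newlines, a character class is a safer pattern; a small variation on the same gsub() idea:
# remove any run of whitespace, not just plain spaces
gsub("[[:space:]]+", "", paste(a, b))
# [1] "bigearthsmallmoon"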
This question already has an answer here:
Split Strings into values in long dataframe format [duplicate]
(1 answer)
Closed 3 years ago.
Having data like this, where several pieces of text sit in one row of the data frame:
data.frame(text = c("in this line ???? another line and ???? one more", "more lines ???? another row"))
how can I separate it into many rows, using ???? as the separator? Here is the expected output:
data.frame(text = c("in this line", "another line and", "one more", "more lines", "another row"))
Here is a base R solution
dfout <- data.frame(text = unlist(strsplit(as.character(df$text),split = " \\?{4} ")))
or a more efficient one (thanks to the comment by @Sotos):
dfout <- data.frame(text = unlist(strsplit(as.character(df$text),split = " ???? ", fixed = TRUE)))
such that
> dfout
text
1 in this line
2 another line and
3 one more
4 more lines
5 another row
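If a tidyverse approach is acceptable, tidyr::separate_rows() gives the same result; this sketch assumes tidyr is installed and builds the data frame with a character column:
library(tidyr)
df <- data.frame(text = c("in this line ???? another line and ???? one more",
                          "more lines ???? another row"),
                 stringsAsFactors = FALSE)
# split the text column into one row per " ???? "-separated piece
separate_rows(df, text, sep = " \\?{4} ")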
This question already has answers here:
Removing a group of words from a character vector
(2 answers)
Closed 5 years ago.
I think the title is a bit confusing, but here is my problem:
I have two vectors, one containing some text, the other containing some words:
text <- c("this is some text","some elements should be removed", "i hope you can help me with this text element problem")
pattern <- c("text", "some","be")
Now I want to remove from text all the words that appear in pattern, so the result vector would be:
text_result
[1] "this is"
[2] "elements should removed"
[3] "i hope you can help me with this element problem"
I tried
text_result <- sapply(pattern, function(x) gsub(x, text, replacement =""))
or
text_result <- sapply(text, function(y) sapply(pattern, function(x)gsub(x,y,replacement ="")))
but in both cases I receive a large matrix with
length(pattern)*length(text) elements
thanks in advance!
You can try:
`%notin%` <- function(x, y) !(x %in% y)
lapply(strsplit(text, " "), function(x) paste(x[x %notin% pattern], collapse = " "))
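A small variation of the same idea that returns a character vector directly (vapply() instead of lapply()):
# split each string into words, drop the words listed in pattern, rejoin
text_result <- vapply(
  strsplit(text, " "),
  function(x) paste(x[!x %in% pattern], collapse = " "),
  character(1)
)
text_result
# matches the expected text_result shown in the question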