This question already has answers here:
Filter rows which contain a certain string
(5 answers)
Closed 5 years ago.
Below is an image describing my data frame. I wish to conditionally delete all rows whose city names contain "Range", as indicated in the snippet. I have tried various approaches but haven't been successful so far.
There are two things to do: detect a pattern in a character vector, which is what stringr::str_detect() is for, and extract a subset of rows, which is dplyr::filter()'s purpose.
library(dplyr)
library(stringr)
df <- df %>%
  filter(!str_detect(City, "Range"))
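As a minimal sketch, assuming a data frame with a City column like the one in the screenshot (the data below is made up):
library(dplyr)
library(stringr)
# Hypothetical example data; only the City column matters here
df <- tibble(City  = c("Front Range", "Denver", "Range View", "Boulder"),
             value = 1:4)
df %>%
  filter(!str_detect(City, "Range"))
# Rows whose City contains "Range" are dropped, leaving Denver and Boulder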
Use grep with the invert option to select all rows that do not contain "Range".
yourDataFrame <- yourDataFrame[grep("Range", yourDataFrame$City, invert = TRUE), ]
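Equivalently, grepl() can be negated instead of inverting the match positions; a sketch assuming the same City column:
yourDataFrame <- yourDataFrame[!grepl("Range", yourDataFrame$City), ]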
This question already has an answer here:
Search within a string that does not contain a pattern
(1 answer)
Closed 10 months ago.
I have a dataframe df (3+ million rows) with a column named Text containing a sentence in each row. I want to filter the data so that it excludes the rows which contain certain keywords.
I know that if you want to keep the rows of a dataframe that contain certain strings, you can do the following:
df <- df %>% filter(grepl('first|second', Text))
And this keeps only the rows containing the keywords first or second.
How can I filter the rows excluding the above two keywords?
You can simply use ! before your grepl like this:
df <- df %>% filter(!grepl('first|second', Text))
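For example, with a small made-up Text column:
library(dplyr)
# Hypothetical example data
df <- data.frame(Text = c("the first sentence",
                          "a second sentence",
                          "a third sentence",
                          "another line entirely"))
df %>% filter(!grepl('first|second', Text))
# Only the rows without "first" or "second" remain (rows 3 and 4)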
This question already has answers here:
Opposite of %in%: exclude rows with values specified in a vector
(13 answers)
Closed last year.
How can I filter out (exclude) rows based on a single column called "record"? I would like to exclude the rows where record is one of (1, 2, 3, 6, 8, 10, 15, 16). The dataset name is "sample". Sorry for the simple question, I am brand new to R.
sample data set below
The dplyr package from the tidyverse is very helpful for these types of problems.
library(dplyr)
df_filtered <- df %>%
  filter(!(record %in% c(1, 2, 3, 6, 8, 10, 15, 16)))
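Applied to the names in the question (dataset sample, column record), a sketch would look like this; the data is invented and the base R equivalent is shown for comparison:
library(dplyr)
# Hypothetical stand-in for the "sample" dataset
sample <- data.frame(record = 1:20, value = rnorm(20))
# dplyr: keep rows whose record is NOT in the exclusion vector
sample_filtered <- sample %>%
  filter(!(record %in% c(1, 2, 3, 6, 8, 10, 15, 16)))
# Base R equivalent
sample_filtered <- sample[!(sample$record %in% c(1, 2, 3, 6, 8, 10, 15, 16)), ]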
This question already has answers here:
Selecting data frame rows based on partial string match in a column
(4 answers)
Closed 1 year ago.
I want to filter a dataframe using dplyr's contains() and filter(). Must be simple, right? The examples I've seen use base R grepl(), which rather defeats the purpose. Here's a simple dataframe:
library(dplyr)
site_type <- c('Urban','Rural','Rural Background','Urban Background','Roadside','Kerbside')
row_id <- seq_along(site_type)  # example row ids, one per site
df <- data.frame(row_id, site_type)
df <- as_tibble(df)  # as.tibble() is deprecated in favour of as_tibble()
df
Now I want to filter the dataframe to all rows where site_type contains the string background.
I can find the string directly if I know the unique values of site_type:
filtered_df <- filter(df, site_type == 'Urban Background')
But I want to do something like:
filtered_df <- filter(df, site_type(contains('background', match_case = False)))
Any ideas how to do that? Can the dplyr helper contains() only be used with columns and not rows?
The contains function in dplyr is a select helper. Its purpose is to help when using the select function, and select is focused on selecting columns, not rows. See the documentation here.
filter is the intended mechanism for selecting rows. The function you are probably looking for is grepl, which does pattern matching for text.
So the solution you are looking for is probably:
filtered_df <- filter(df, grepl("background", site_type, ignore.case = TRUE))
I suspect that contains is mostly a wrapper applying grepl to the column names. So the logic is very similar.
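If you would rather stay inside stringr, the same case-insensitive match can be written with str_detect() and regex(); a sketch, assuming the df defined in the question:
library(dplyr)
library(stringr)
filtered_df <- filter(df, str_detect(site_type, regex("background", ignore_case = TRUE)))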
References:
grep R documentation
high rated question applying exactly this technique
This question already has answers here:
dplyr::group_by_ with character string input of several variable names
(2 answers)
Group by multiple columns in dplyr, using string vector input
(10 answers)
Closed 5 years ago.
My code needs to group by column names. The problem is that the code adds or removes columns from the data.frame automatically, so typing the column names in by hand is not a good solution.
Is there a workaround for this problem? I tried solutions like the one below, but obviously it doesn't work. In addition, the dataframe stretches to over 100 columns.
myDataFrame1 <- myDataFrame %>% group_by( colnames(myDataFrame) )
How can I pass the column names into group_by() automatically?
Thanks for the help.
We can make use of group_by_() when there are many columns. Suppose we want to use the first three columns as the grouping variables:
library(dplyr)
myDataFrame %>%
  group_by_(.dots = names(myDataFrame)[1:3])
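Note that group_by_() has since been deprecated; in current dplyr (1.0.0 or later) the same grouping can be written with across() and all_of(), for example:
library(dplyr)
# Group by the first three columns, whatever their names happen to be
myDataFrame %>%
  group_by(across(all_of(names(myDataFrame)[1:3])))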
This question already has answers here:
Using grep in R to delete rows from a data.frame
(5 answers)
Closed 8 years ago.
I want to find rows in a dataframe that do not match a pattern.
Key = c(1,2,3,4,5)
Code = c("X348","I605","B777","I609","F123")
df1 <- data.frame(Key, Code)
I can find items beginning with I60 using:
df2 <- subset (df1, grepl("^I60", df1$Code))
But I want to be able to find all the other rows (that is, those NOT beginning with I60). The invert argument does not work with grepl, grep on its own does not find all rows, and I cannot pass its results to the subset command. Grateful for any help.
You could use the [ operator and do
df1[!grepl("I60", Code),]
(Suggested clarification from @Hugh:) Another way would be
df1[!grepl("I60",df1$Code),]
Here is the reference manual on array indexing, which is done with [:
http://cran.r-project.org/doc/manuals/R-intro.html#Array-indexing
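Since the question starts from subset(), it is worth noting that the same negation works there as well; a minimal sketch:
df2 <- subset(df1, !grepl("^I60", Code))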
Also, you can try this:
Key = c(1,2,3,4,5)
Code = c("X348","I605","B777","I609","F123")
df1 <- data.frame(Key, Code)
toRemove <- grep("^I60", df1$Code)
df2 <- df1[-toRemove, ]
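One caution about the negative-index version: if the pattern matches nothing, grep() returns integer(0), and df1[-integer(0), ] drops every row instead of keeping them all. A defensive sketch:
toRemove <- grep("^I60", df1$Code)
df2 <- if (length(toRemove) > 0) df1[-toRemove, ] else df1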