R Not in subset [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Standard way to remove multiple elements from a dataframe
I know in R that if you are searching for a subset of another group or matching based on id you'd use something like
subset(df1, df1$id %in% idNums1)
My question is how to do the opposite or choose items NOT matching a vector of ids.
I tried using ! but this gives an error:
subset(df1, df1$id !%in% idNums1)
I think my backup is to do something like this:
matches <- subset(df1, df1$id %in% idNums1)
nonMatches <- df1[(-matches[,1]),]
but I'm hoping there's something a bit more efficient.

The expression df1$id %in% idNums1 produces a logical vector. To negate it, you need to negate the whole vector:
!(df1$id %in% idNums1)
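As a minimal sketch of how the negated vector fits into subset(), here is a small example. The contents of df1 and idNums1 are made up for illustration, and the %notin% operator is a hypothetical helper name, not something from the question.

```r
# Hypothetical data for illustration only
df1 <- data.frame(id = c(1, 2, 3, 4, 5), value = c("a", "b", "c", "d", "e"))
idNums1 <- c(2, 4)

# %in% returns one TRUE/FALSE per row; wrapping it in !( ) flips each element
matches    <- subset(df1, id %in% idNums1)
nonMatches <- subset(df1, !(id %in% idNums1))

# A reusable "not in" operator can be built with Negate() (name is an assumption)
`%notin%` <- Negate(`%in%`)
nonMatches2 <- subset(df1, id %notin% idNums1)
```

Because ! has lower precedence than %in% in R, !id %in% idNums1 parses the same way; the parentheses just make the intent explicit.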

Related

How to exclude rows from dataframe that match rows in same column of another dataframe? [duplicate]

This question already has answers here:
How I can select rows from a dataframe that do not match?
(3 answers)
Closed 1 year ago.
I want to filter out the rows in the first dataframe that are also present in the second dataframe. The command below doesn't produce the dataframe I want.
newdf <- filter(df1, df1$organization_name != df2$organization_name)
Is there an alternative that works?
Try this:
newdf <- filter(df1, !organization_name %in% df2$organization_name)
### or using the pipe:
newdf <- df1 %>%
filter(!organization_name %in% df2$organization_name)
Some people prefer the anti_join() function, but the suggestions above are simpler and would do the trick.
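To see why != does not work for this, it may help to compare it with %in% on a tiny example. This is a base R sketch; the organization names are invented for illustration.

```r
# Hypothetical data for illustration only
df1 <- data.frame(organization_name = c("A", "B", "C", "D"), stringsAsFactors = FALSE)
df2 <- data.frame(organization_name = c("C", "A"), stringsAsFactors = FALSE)

# != compares position by position, recycling the shorter vector,
# so it asks "does element i differ from element i?", not "is it absent?"
elementwise <- df1$organization_name != df2$organization_name

# %in% tests membership in the whole second vector, which is what is needed
not_present <- !df1$organization_name %in% df2$organization_name
newdf <- df1[not_present, , drop = FALSE]
```

Here the elementwise comparison keeps "A" even though it appears in df2, because "A" is compared only against "C".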

Filter row based on a string condition, dplyr filter, contains [duplicate]

This question already has answers here:
Selecting data frame rows based on partial string match in a column
(4 answers)
Closed 1 year ago.
I want to filter a dataframe using dplyr contains() and filter. Must be simple, right? The examples I've seen use base R grepl which sort of defeats the object. Here's a simple dataframe:
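site_type <- c('Urban','Rural','Rural Background','Urban Background','Roadside','Kerbside')
row_id <- seq_along(site_type)  # assumed: row_id was not defined in the original post
df <- data.frame(row_id, site_type)
df <- as_tibble(df)  # as.tibble() is deprecated in favour of as_tibble()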
df
Now I want to filter the dataframe to all rows where site_type contains the string background.
I can find the string directly if I know the unique values of site_type:
filtered_df <- filter(df, site_type == 'Urban Background')
But I want to do something like:
filtered_df <- filter(df, site_type(contains('background', match_case = False)))
Any ideas how to do that? Can dplyr helper contains only be used with columns and not rows?
The contains function in dplyr is a select helper. Its purpose is to help when using the select function, and select is for choosing columns, not rows. See the documentation for select helpers.
filter is the intended mechanism for selecting rows. The function you are probably looking for is grepl, which does pattern matching on text.
So the solution you are looking for is probably:
filtered_df <- filter(df, grepl("background", site_type, ignore.case = TRUE))
I suspect that contains is mostly a wrapper applying grepl to the column names. So the logic is very similar.
References:
grep R documentation
high rated question applying exactly this technique
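The grepl approach can be sketched without dplyr as well, which shows what the logical vector looks like before it is used to pick rows. This is a base R illustration; row_id is assumed to be a simple sequence, since the question does not define it.

```r
site_type <- c("Urban", "Rural", "Rural Background", "Urban Background", "Roadside", "Kerbside")
df <- data.frame(row_id = seq_along(site_type), site_type = site_type)  # row_id is assumed

# grepl returns one logical per element, so it can index rows directly
has_background <- grepl("background", df$site_type, ignore.case = TRUE)
filtered_df <- df[has_background, ]

# Without ignore.case the lowercase pattern matches nothing in this data
case_sensitive <- grepl("background", df$site_type)
```

The same logical vector is what filter() receives when grepl is called inside it.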

Deleting Rows by condition [duplicate]

This question already has answers here:
Filter rows which contain a certain string
(5 answers)
Closed 5 years ago.
Below is the image that describes my data frame, I wish to conditionally delete all city names which have "Range" written in them as indicated in the snippet. I tried various approaches but haven't been successful so far.
There are two steps: detect a pattern in a character vector, which you can do with stringr::str_detect(), and extract a subset of rows, which is the purpose of dplyr::filter().
library(dplyr)
library(stringr)
df <- df %>%
filter( ! str_detect(City, "Range") )
Use grep with the invert option to select all rows without "Range".
yourDataFrame <- yourDataFrame[grep("Range", yourDataFrame$City, invert = TRUE), ]
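The two approaches above should select the same rows. A small base R sketch of that, using invented city names because the original data frame was only shown as an image:

```r
# Hypothetical data; the real data frame is not available in the post
df <- data.frame(City = c("Springfield", "Blue Range", "Shelbyville", "Range East"),
                 stringsAsFactors = FALSE)

# Logical approach: keep rows where the pattern is NOT found
kept_logical <- df[!grepl("Range", df$City), , drop = FALSE]

# Index approach: grep(..., invert = TRUE) returns positions of non-matching rows
keep_idx <- grep("Range", df$City, invert = TRUE)
kept_index <- df[keep_idx, , drop = FALSE]
```

Note that the pattern is case-sensitive by default, so a city containing "range" in lowercase would be kept.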

Delete multiple columns by reference using reverse selection in data.Table [duplicate]

This question already has an answer here:
How do I subset column variables in DF1 based on the important variables I got in DF2?
(1 answer)
Closed 5 years ago.
I want to delete the columns that are not in a list using reference.
library("data.table")
df <- data.frame("ID"=1:10,"A"=1:10,"B"=1:10,"C"=1:10,"D"=1:10)
setDT(df,key="ID")
list_to_keep <- c("ID","A","B","C")
df[,!names(df)%in%list_to_keep,with=FALSE]
gives me a selection of the columns that I want to delete, but when I do:
df <- data.frame("ID"=1:10,"A"=1:10,"B"=1:10,"C"=1:10,"D"=1:10)
setDT(df,key="ID")
list_to_keep <- c("ID","A","B","C")
df[,!names(df)%in%list_to_keep:=NULL,with=FALSE]
I get the error "LHS of := isn't column names ('character') or positions ('integer' or 'numeric')". What is the correct way of doing this?
We can use setdiff to get the column names of the dataset that are not in list_to_keep and assign (:=) them to NULL
df[, setdiff(names(df), list_to_keep) := NULL]
As @rosscova mentioned, which() on the logical vector gives the positions of those columns, which can then be assigned to NULL
df[, which(!names(df)%in%list_to_keep):=NULL]
The LHS of := is "A character vector of column names (or numeric positions) or a variable that evaluates as such."
!names(df)%in%list_to_keep is a logical vector, which is neither of those.
So converting it to column names,
df[,names(df)[!names(df)%in%list_to_keep]:=NULL]
will work.
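The three answers differ only in how they turn the logical vector into something := accepts. A base R sketch of those conversions, working on the column names alone so it runs without data.table (the variable names drop_logical, drop_names, drop_pos and drop_names2 are hypothetical):

```r
cols <- c("ID", "A", "B", "C", "D")          # same names as names(df) in the question
list_to_keep <- c("ID", "A", "B", "C")

# Logical vector: this is what the failing call passed to :=
drop_logical <- !cols %in% list_to_keep

# Character vector of names, via setdiff
drop_names <- setdiff(cols, list_to_keep)

# Integer positions, via which()
drop_pos <- which(drop_logical)

# Character vector of names, via logical indexing
drop_names2 <- cols[drop_logical]
```

Any of drop_names, drop_pos or drop_names2 has a type that the quoted documentation lists as valid for the left-hand side, while drop_logical does not.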

! grep in R - finding items that do not match [duplicate]

This question already has answers here:
Using grep in R to delete rows from a data.frame
(5 answers)
Closed 8 years ago.
I want to find rows in a dataframe that do not match a pattern.
Key = c(1,2,3,4,5)
Code = c("X348","I605","B777","I609","F123")
df1 <- data.frame(Key, Code)
I can find items beginning with I60 using:
df2 <- subset (df1, grepl("^I60", df1$Code))
But I want to be able to find all the other rows (that is, those NOT beginning with I60). The invert argument does not work with grepl. grep on its own does not find all rows, nor can it pass the results to the subset command. Grateful for help.
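You could use the [ operator and do
df1[!grepl("^I60", Code),]
This works here only because Code also exists as a standalone vector in the workspace. (Suggested clarification from @Hugh:) A safer way is to refer to the column explicitly:
df1[!grepl("^I60",df1$Code),]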
Here is the reference manual on array indexing, which is done with [:
http://cran.r-project.org/doc/manuals/R-intro.html#Array-indexing
Also, you can try this:
Key = c(1,2,3,4,5)
Code = c("X348","I605","B777","I609","F123")
df1 <- data.frame(Key, Code)
toRemove<-grep("^I60", df1$Code)
df2 <- df1[-toRemove,]
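One caveat with the negative-index approach: if the pattern matches nothing, grep returns integer(0), and indexing with -integer(0) selects zero rows rather than all of them. A short sketch of the difference, using a pattern ("^Z99") chosen here only because it matches nothing in this data:

```r
Key  <- c(1, 2, 3, 4, 5)
Code <- c("X348", "I605", "B777", "I609", "F123")
df1  <- data.frame(Key, Code)

# Pattern that matches nothing in this data
toRemove <- grep("^Z99", df1$Code)       # integer(0)

# -integer(0) is still integer(0), which selects zero rows
by_negative_index <- df1[-toRemove, ]

# The logical form keeps every row when nothing matches
by_logical <- df1[!grepl("^Z99", df1$Code), ]
```

For that reason the !grepl(...) form is usually the safer choice when the pattern might not match any row.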
