grep in R: finding items that do not match [duplicate]

This question already has answers here:
Using grep in R to delete rows from a data.frame
(5 answers)
Closed 8 years ago.
I want to find rows in a dataframe that do not match a pattern.
Key = c(1,2,3,4,5)
Code = c("X348","I605","B777","I609","F123")
df1 <- data.frame(Key, Code)
I can find items beginning with I60 using:
df2 <- subset (df1, grepl("^I60", df1$Code))
But I want to be able to find all the other rows (that is, those NOT beginning with I60). The invert argument does not work with grepl. grep on its own does not find all rows, nor can it pass the results to the subset command. Grateful for help.

You could use the [ operator and do
df1[!grepl("^I60", Code), ]
(with the ^ anchor so only codes beginning with I60 are excluded; the bare Code works here only because it also exists as a stand-alone vector). As @Hugh suggested, the safer form references the column through the data frame:
df1[!grepl("^I60", df1$Code), ]
Here is the reference manual on array indexing, which is done with [:
http://cran.r-project.org/doc/manuals/R-intro.html#Array-indexing

Also, you can try this:
Key = c(1,2,3,4,5)
Code = c("X348","I605","B777","I609","F123")
df1 <- data.frame(Key, Code)
toRemove <- grep("^I60", df1$Code)
df2 <- df1[-toRemove, ]  # caution: if nothing matches, toRemove is integer(0) and this selects zero rows
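Note that subset() itself also accepts the negated grepl(), so the original subset-based approach can be kept intact (a sketch using the same example data):

```r
Key  <- c(1, 2, 3, 4, 5)
Code <- c("X348", "I605", "B777", "I609", "F123")
df1  <- data.frame(Key, Code)

# keep rows whose Code does NOT begin with "I60"
df2 <- subset(df1, !grepl("^I60", Code))
df2$Code
# "X348" "B777" "F123"
```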


Filter row based on a string condition, dplyr filter, contains [duplicate]

This question already has answers here:
Selecting data frame rows based on partial string match in a column
(4 answers)
Closed 1 year ago.
I want to filter a dataframe using dplyr contains() and filter. Must be simple, right? The examples I've seen use base R grepl which sort of defeats the object. Here's a simple dataframe:
site_type <- c('Urban','Rural','Rural Background','Urban Background','Roadside','Kerbside')
row_id <- seq_along(site_type)
df <- data.frame(row_id, site_type)
df <- tibble::as_tibble(df)  # as.tibble() is deprecated in favour of as_tibble()
df
Now I want to filter the dataframe to all rows where site_type contains the string "background".
I can find the string directly if I know the unique values of site_type:
filtered_df <- filter(df, site_type == 'Urban Background')
But I want to do something like:
filtered_df <- filter(df, site_type(contains('background', match_case = False)))
Any ideas how to do that? Can dplyr helper contains only be used with columns and not rows?
The contains function in dplyr is a select helper. Its purpose is to help when using the select function, and select is focused on choosing columns, not rows. See the documentation here.
filter is the intended mechanism for selecting rows. The function you are probably looking for is grepl which does pattern matching for text.
So the solution you are looking for is probably:
filtered_df <- filter(df, grepl("background", site_type, ignore.case = TRUE))
I suspect that contains is mostly a wrapper applying grepl to the column names. So the logic is very similar.
References:
grep R documentation
high rated question applying exactly this technique
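If you prefer to stay inside the tidyverse, stringr::str_detect() with a regex() modifier is a common equivalent of the grepl() call above; a minimal sketch using the example data from the question:

```r
library(dplyr)
library(stringr)

site_type <- c('Urban','Rural','Rural Background','Urban Background','Roadside','Kerbside')
df <- tibble(row_id = seq_along(site_type), site_type = site_type)

# regex(ignore_case = TRUE) makes the match case-insensitive
filtered_df <- filter(df, str_detect(site_type, regex("background", ignore_case = TRUE)))
filtered_df$site_type
# "Rural Background" "Urban Background"
```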

creating, directly, data.tables with column names from variables, and using variables for column names with := [duplicate]

This question already has answers here:
Select / assign to data.table when variable names are stored in a character vector
(6 answers)
Closed 3 years ago.
The only way I know so far is in two steps: creating the columns with dummy names and then using setnames(). I would like to do it in one step; probably there is some parameter or option, but I am not able to find it.
# the awkward way I have found so far
library(data.table)
col_names <- c("one", "two", "three")
# create the columns with dummy names...
dt <- data.table(V1 = numeric(), V2 = numeric(), V3 = numeric())
# ...then rename them
setnames(dt, col_names)
I am also interested in a way to use a variable with :=, something like
colNameVar <- "dummy_values"
DT[ , colNameVar := 1:10]
This question does not seem to me to be a duplicate of Select / assign to data.table when variable names are stored in a character vector: here I ask about creating a data.table (hence the word "creating" in the title). That is quite different from modifying a data.table that already exists, which is the subject of the question marked as the duplicate; for the latter there are known, clearly documented ways, and they do not work in the case I ask about here.
PS. Note the similar question indicated in a comment by @Ronak Shah: Create empty data frame with column names by assigning a string vector?
For the first question, I'm not absolutely sure, but you may want to try and see if fread is of any help creating an empty data.table with named columns.
As for the second question, try
DT[, (nameOfCols) := 10]
where nameOfCols is a character vector holding the names of the columns you want to modify; the parentheses force it to be evaluated as a variable rather than taken literally as a column name. See ?data.table.
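For the one-step creation itself, one pattern (a sketch, not the only option) is to build a named list of zero-length columns and convert it in place with setDT(); the second snippet shows the parenthesised := form with a variable column name:

```r
library(data.table)

col_names <- c("one", "two", "three")

# named list of zero-length columns -> empty data.table with the right names
dt <- setDT(setNames(replicate(length(col_names), numeric(0), simplify = FALSE),
                     col_names))
names(dt)  # "one" "two" "three"

# using a variable on the left of := (parentheses force evaluation)
DT <- data.table(x = 1:10)
colNameVar <- "dummy_values"
DT[, (colNameVar) := 1:10]
names(DT)  # "x" "dummy_values"
```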

Deleting Rows by condition [duplicate]

This question already has answers here:
Filter rows which contain a certain string
(5 answers)
Closed 5 years ago.
Below is the image that describes my data frame, I wish to conditionally delete all city names which have "Range" written in them as indicated in the snippet. I tried various approaches but haven't been successful so far.
There are two things to do here: detect a pattern in a character vector, which is what stringr::str_detect() is for, and keep only a subset of rows, which is dplyr::filter()'s purpose.
library(dplyr)
library(stringr)
df <- df %>%
filter( ! str_detect(City, "Range") )
Use grep with invert option to select all lines without Range.
yourDataFrame <- yourDataFrame[grep("Range", yourDataFrame$City, invert = TRUE), ]
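Equivalently, logical indexing with !grepl() avoids the index vector entirely (a sketch with hypothetical stand-in data, since the original frame was only shown as an image):

```r
# hypothetical stand-in for the data frame shown in the image
yourDataFrame <- data.frame(City = c("Denver", "Front Range", "Boston", "Range View"),
                            Pop  = c(700, 50, 650, 20))

# keep rows whose City does NOT contain "Range"
kept <- yourDataFrame[!grepl("Range", yourDataFrame$City), ]
kept$City
# "Denver" "Boston"
```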

In R, how can i replace for loops with apply functions? [duplicate]

This question already has answers here:
How to do vlookup and fill down (like in Excel) in R?
(9 answers)
Closed 5 years ago.
I am trying to apply an operation which applies on each row of a dataframe. Below is the code.
for (i in 1:nrow(df)) {
  df$final[i] <- alignfile[alignfile$Response == df$value[i], ]$Aligned
}
It is basically doing a vlookup from the "alignfile" data frame and creating a new column in "df" from a successful lookup of its "value" column.
How do I replace this operation with an apply-family function (or anything vectorised) so I can get rid of the for loop that is making it slow?
Looking for suggestions; please feel free to ask for more clarification.
Thanks
You didn't provide a reproducible example so take my answer with a grain of salt,
I think you don't need a for loop at all in this case (as in most cases in R), nor an apply function. If the two frames are row-aligned, this could be solved with an ifelse in the following way:
df$final <- ifelse(alignfile$Response == df$value, 1, 0)
This puts a 1 in the final column of df whenever the value in alignfile$Response equals the value in df$value for the same row, and a 0 otherwise. It assumes alignfile and df have the same number of rows and the same row order (as it appears from the code you provided).
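If the two frames are not row-aligned, the usual vectorised replacement for this kind of vlookup loop is match() (a sketch with made-up data, since no reproducible example was given):

```r
# hypothetical stand-in data
alignfile <- data.frame(Response = c("a", "b", "c"),
                        Aligned  = c("A", "B", "C"))
df <- data.frame(value = c("b", "c", "a", "b"))

# match() gives, for each df$value, the row index in alignfile$Response;
# indexing Aligned by that yields the looked-up value (NA where there is no match)
df$final <- alignfile$Aligned[match(df$value, alignfile$Response)]
df$final
# "B" "C" "A" "B"
```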

R Not in subset [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Standard way to remove multiple elements from a dataframe
I know in R that if you are searching for a subset of another group or matching based on id you'd use something like
subset(df1, df1$id %in% idNums1)
My question is how to do the opposite or choose items NOT matching a vector of ids.
I tried using ! but get the error message
subset(df1, df1$id !%in% idNums1)
I think my backup is to do something like this:
matches <- subset(df1, df1$id %in% idNums1)
nonMatches <- df1[(-matches[,1]),]
but I'm hoping there's something a bit more efficient.
The expression df1$id %in% idNums1 produces a logical vector. To negate it, you need to negate the whole vector:
!(df1$id %in% idNums1)
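Putting that back into subset() (a sketch with a small made-up example, since the original df1 and idNums1 were not shown):

```r
df1 <- data.frame(id = 1:5, val = letters[1:5])
idNums1 <- c(2, 4)

# rows whose id is NOT in idNums1
nonMatches <- subset(df1, !(id %in% idNums1))
nonMatches$id
# 1 3 5
```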
